We rethink the role of positional encoding in 3D representation learning and propose Positional Prompt Tuning, a simple yet parameter-efficient method for transfer learning.
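As a rough illustration of the idea (our sketch under stated assumptions, not the paper's implementation), the snippet below freezes a pre-trained point-cloud transformer and trains only learnable positional embeddings, treated as prompts, plus a small task head. All module and parameter names are hypothetical.

```python
# Hypothetical sketch of positional prompt tuning: freeze the backbone,
# train only positional prompts and the task head.
import torch
import torch.nn as nn

class PositionalPromptTuner(nn.Module):
    def __init__(self, backbone: nn.Module, num_tokens: int, dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone              # assumed pre-trained encoder: (B, N, D) -> (B, N, D)
        for p in self.backbone.parameters():  # freeze all backbone weights
            p.requires_grad = False
        # learnable positional prompts, one per input token
        self.pos_prompt = nn.Parameter(torch.zeros(1, num_tokens, dim))
        self.head = nn.Linear(dim, num_classes)  # lightweight task head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch embeddings of a point cloud
        x = self.backbone(tokens + self.pos_prompt)  # prompts act as positional encoding
        return self.head(x.mean(dim=1))              # pooled classification logits
```

Only `pos_prompt` and `head` receive gradients, so the tuned parameter count stays tiny relative to the backbone.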
We present ShapeLLM, the first 3D Multimodal Large Language Model designed for embodied interaction, exploring universal 3D object understanding with 3D point clouds and language.
We present DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models empowered with the frequently overlooked synergy between multimodal comprehension and creation.
VPP⚡: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Zekun Qi*, Muzhou Yu*, Runpei Dong, Kaisheng Ma
Conference on Neural Information Processing Systems (NeurIPS), 2023
[arXiv][Code][OpenReview]
We achieve rapid, multi-category 3D conditional generation by sharing the merits of different representations. VPP can generate a 3D shape in less than 0.2 s on a single RTX 2080 Ti.
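A heavily simplified sketch of a voxel-to-point progressive pipeline of this flavor: a coarse stage predicts voxel occupancy from a condition embedding, then a point stage refines the coordinates of occupied voxel centers. Every module below is a placeholder of our own, not VPP's actual architecture.

```python
# Illustrative two-stage voxel-then-point generator; names are assumptions.
import torch
import torch.nn as nn

class VoxelPointGenerator(nn.Module):
    def __init__(self, cond_dim: int, res: int = 32):
        super().__init__()
        self.res = res
        # stage 1: condition embedding -> coarse occupancy logits on a voxel grid
        self.voxel_stage = nn.Linear(cond_dim, res ** 3)
        # stage 2: per-point refinement offsets for coarse voxel centers
        self.point_stage = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, cond: torch.Tensor) -> list:
        B = cond.shape[0]
        occ = self.voxel_stage(cond).view(B, self.res, self.res, self.res)
        points = []
        for b in range(B):
            idx = (occ[b] > 0).nonzero(as_tuple=False).float()  # occupied voxel indices
            centers = (idx + 0.5) / self.res * 2 - 1            # map to [-1, 1]
            points.append(centers + self.point_stage(centers))  # refined coordinates
        return points  # list of (Ni, 3) point clouds, one per sample
```

The design point is that the cheap voxel stage fixes global structure, so the point stage only has to do local refinement, which is what makes sub-second generation plausible.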
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast
Guofan Fan, Zekun Qi, Wenkai Shi, Kaisheng Ma
ACM International Conference on Multimedia (ACM MM), 2024
[arXiv][Code]
We make better use of color information to improve self-supervised 3D scene pre-training.
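One way such a geometry-color contrast could look (a minimal sketch under our own assumptions, not Point-GCC's exact objective): embed the geometry and color views of the same points with separate encoders and align them with a symmetric InfoNCE loss.

```python
# Hypothetical geometry-color contrastive loss; names are illustrative.
import torch
import torch.nn.functional as F

def geometry_color_infonce(geo_feat: torch.Tensor,
                           col_feat: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """geo_feat, col_feat: (B, D) embeddings of the same B points/scenes."""
    geo = F.normalize(geo_feat, dim=-1)
    col = F.normalize(col_feat, dim=-1)
    logits = geo @ col.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(geo.size(0), device=geo.device)
    # symmetric loss: each geometry row should match its own color column
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```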