Advances in Feed-Forward 3D Reconstruction
and View Synthesis
Jiahui Zhang1,
Yuelei Li2,
Anpei Chen3,
Muyu Xu1,
Kunhao Liu1,
Jianyuan Wang4,
Xiaoxiao Long5,
Hanxue Liang6,
Zexiang Xu7,
Hao Su8,
Christian Theobalt9,
Christian Rupprecht4,
Andrea Vedaldi4,
Hanspeter Pfister10,
Shijian Lu1,
Fangneng Zhan10,11
1NTU, 2Caltech, 3Westlake University,
4University of Oxford,
5Nanjing University,
6University of Cambridge,
7Hillbot, 8UCSD,
9MPI-INF, 10Harvard University,
11MIT

Abstract
3D reconstruction and view synthesis are foundational problems in computer vision, graphics, and immersive technologies such as augmented reality (AR), virtual reality (VR), and digital twins. Traditional methods rely on computationally intensive, per-scene iterative optimization within a complex processing chain, which limits their applicability in real-world scenarios. Recent feed-forward approaches, driven by deep learning, have revolutionized this field by enabling fast and generalizable 3D reconstruction and view synthesis. This survey offers a comprehensive review of feed-forward techniques for 3D reconstruction and view synthesis, organized in a taxonomy according to the underlying representation architectures, including point clouds, 3D Gaussian Splatting (3DGS), and Neural Radiance Fields (NeRF). We examine key tasks such as pose-free reconstruction, dynamic 3D reconstruction, and 3D-aware image and video synthesis, and highlight their applications in digital humans, SLAM, robotics, and beyond. In addition, we review commonly used datasets with detailed statistics, along with evaluation protocols for various downstream tasks. We conclude by discussing open research challenges and promising directions for future work, emphasizing the potential of feed-forward approaches to advance the state of the art in 3D vision.
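
To make the contrast in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch rather than any specific method from the survey: optimization-based pipelines fit each scene iteratively from its input views, whereas a feed-forward reconstructor, trained across many scenes, amortizes this into a single network pass. The names `optimize_scene`, `render_fn`, `FeedForwardReconstructor`, `backbone`, and `head` are illustrative placeholders.

```python
# A minimal conceptual sketch (all names hypothetical) contrasting per-scene
# iterative optimization with a single feed-forward prediction pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


def optimize_scene(scene_params: nn.Module, images: torch.Tensor,
                   render_fn, steps: int = 30_000, lr: float = 1e-3) -> nn.Module:
    """Optimization-based pipeline: iteratively fit one scene to its input views."""
    opt = torch.optim.Adam(scene_params.parameters(), lr=lr)
    for _ in range(steps):
        rendered = render_fn(scene_params)      # render the current scene estimate
        loss = F.mse_loss(rendered, images)     # photometric loss against the inputs
        opt.zero_grad()
        loss.backward()
        opt.step()
    return scene_params                         # slow: thousands of steps per scene


class FeedForwardReconstructor(nn.Module):
    """Feed-forward pipeline: a network trained across many scenes maps input
    views directly to a 3D representation (pointmap, Gaussians, radiance field)."""

    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone                # shared image/geometry encoder
        self.head = head                        # representation-specific decoder

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)           # extract multi-view features
        return self.head(feats)                 # single pass: no per-scene fitting
```
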
Summary


Methods
1. NeRF-based Methods

2. Pointmap-based Methods

3. 3DGS-based Methods

4. Representation-free Methods
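
As a rough, hypothetical illustration of the four representation families above, the sketch below shows the kind of output each one predicts in a single forward pass. All tensor names and shapes are illustrative assumptions, not the interface of any particular method.

```python
# Hypothetical output shapes for each representation family; names and sizes
# are illustrative only.
import torch

B, V, H, W = 1, 2, 256, 256                    # batch, input views, image size
images = torch.rand(B, V, 3, H, W)             # posed or pose-free input views

# Pointmap-based: one 3D point per pixel in a shared frame, with confidence.
pointmap = torch.rand(B, V, H, W, 3)
confidence = torch.rand(B, V, H, W)

# 3DGS-based: a set of Gaussian primitives (often one or more per input pixel).
N = V * H * W
gaussians = {
    "means": torch.rand(N, 3),                 # 3D centers
    "scales": torch.rand(N, 3),                # anisotropic extents
    "rotations": torch.rand(N, 4),             # quaternions
    "opacities": torch.rand(N, 1),
    "colors": torch.rand(N, 3),                # or spherical-harmonic coefficients
}

# NeRF-based: a radiance field conditioned on image features; density and color
# are queried per 3D point and composited by volume rendering.
def radiance_field(xyz: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
    # toy stand-in for an MLP decoder: (density, r, g, b) per query point
    return torch.rand(xyz.shape[0], 4)

# Representation-free: the network maps input views (and a target pose) directly
# to novel-view pixels, without an explicit intermediate 3D representation.
novel_view = torch.rand(B, 3, H, W)
```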

