Abstract

Recent advances in neural fields, such as neural radiance fields, have significantly pushed the boundary of scene representation learning. To boost the computational efficiency and rendering quality of 3D scenes, a popular line of research maps the 3D coordinate system to another measuring system, e.g., 2D manifolds or hash tables, for modeling neural fields. This conversion of coordinate systems is typically dubbed a gauge transformation and is usually a pre-defined mapping function, e.g., an orthogonal projection or a spatial hash function. This raises a question: can we directly learn a desired gauge transformation along with the neural field in an end-to-end manner? In this work, we extend this problem to a general paradigm with a taxonomy of discrete and continuous cases, and develop an end-to-end learning framework to jointly optimize the gauge transformation and the neural field. To counter the tendency of learned gauge transformations to collapse, we derive a general regularization mechanism from the principle of information conservation during the gauge transformation. Building on the resulting unified neural gauge field framework, we naturally discover a new type of gauge transformation that achieves a trade-off between learning collapse and computational cost.
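For illustration, a pre-defined (non-learned) gauge transformation of the spatial-hash type can be as simple as the sketch below; the prime constants and table size are generic choices for illustration, not values from this work:

```python
import numpy as np

def spatial_hash(grid_coords: np.ndarray, table_size: int = 2**19) -> np.ndarray:
    """Map integer 3D grid coordinates (N, 3) to hash-table indices.

    A generic spatial hash (XOR of coordinate-wise products with large primes,
    modulo the table size), shown only as an example of a fixed, non-learned
    gauge transformation from 3D coordinates into a hash table.
    """
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    coords = grid_coords.astype(np.uint64)
    h = coords[..., 0] * primes[0]
    h ^= coords[..., 1] * primes[1]
    h ^= coords[..., 2] * primes[2]
    return (h % np.uint64(table_size)).astype(np.int64)
```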

Overview

Video

Reflection Direction Parameterization

Previous approaches directly input the camera's view direction into the MLP to predict outgoing radiance. We show that instead using the reflection of the view direction about the normal makes the emittance function significantly easier to learn and interpolate, greatly improving our results.
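A minimal sketch of this reparameterization (assuming unit-length view directions and surface normals; the helper name is ours for illustration):

```python
import numpy as np

def reflect(view_dir: np.ndarray, normal: np.ndarray) -> np.ndarray:
    """Reflect the (unit) view direction about the (unit) surface normal.

    view_dir: (..., 3) direction pointing from the surface point toward the camera.
    normal:   (..., 3) unit surface normal.
    Returns the reflected direction 2 (omega . n) n - omega, which is fed to
    the directional MLP in place of the raw view direction.
    """
    dot = np.sum(view_dir * normal, axis=-1, keepdims=True)
    return 2.0 * dot * normal - view_dir
```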

Integrated Directional Encoding

We explicitly model object roughness using the expected values of a set of spherical harmonics under a von Mises-Fisher distribution whose concentration parameter varies spatially:
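Concretely, the encoding is the set of expected spherical-harmonic values under this vMF distribution; under the standard closed-form approximation of that expectation (our notation below, with reflection direction $\hat{\omega}_r$ and concentration $\kappa$), each harmonic of degree $\ell$ is simply attenuated by a $\kappa$-dependent factor:

```latex
\mathrm{IDE}(\hat{\omega}_r, \kappa)
  = \left\{ \mathbb{E}_{\omega \sim \mathrm{vMF}(\hat{\omega}_r, \kappa)}
      \!\left[ Y_\ell^m(\omega) \right] \right\}_{(\ell, m)}
  = \left\{ A_\ell(\kappa)\, Y_\ell^m(\hat{\omega}_r) \right\}_{(\ell, m)},
\qquad
A_\ell(\kappa) \approx \exp\!\left( -\frac{\ell(\ell+1)}{2\kappa} \right)
```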


We call this Integrated Directional Encoding, and we show experimentally that it allows emittance functions to be shared between points with different roughnesses. It also enables scene editing after training. Theoretically, our encoding is stationary on the sphere, similar to the Euclidean stationarity of NeRF's positional encoding.
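A minimal NumPy sketch of this encoding, assuming real spherical harmonics up to degree 2 written in Cartesian form and the attenuation above (the actual basis, degree set, and conventions of the released implementation may differ):

```python
import numpy as np

def integrated_dir_enc(refl_dir: np.ndarray, kappa: np.ndarray) -> np.ndarray:
    """Expected spherical harmonics of the reflection direction under a vMF
    distribution with concentration kappa, approximated by attenuating each
    degree-l harmonic by exp(-l(l+1) / (2*kappa)).

    refl_dir: (..., 3) unit reflection directions.
    kappa:    (..., 1) spatially varying vMF concentration.
    Returns a (..., 9) feature vector (degrees 0..2, real Cartesian SH).
    """
    x, y, z = refl_dir[..., 0:1], refl_dir[..., 1:2], refl_dir[..., 2:3]
    # Real spherical harmonics up to degree 2, written in Cartesian form.
    sh = np.concatenate([
        0.282095 * np.ones_like(x),                    # l = 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,      # l = 1
        1.092548 * x * y, 1.092548 * y * z,            # l = 2
        0.315392 * (3.0 * z**2 - 1.0),
        1.092548 * x * z, 0.546274 * (x**2 - y**2),
    ], axis=-1)
    degrees = np.array([0, 1, 1, 1, 2, 2, 2, 2, 2])
    # Rough surfaces (small kappa) suppress high-degree directional detail.
    atten = np.exp(-degrees * (degrees + 1) / (2.0 * kappa))
    return sh * atten
```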

Additional Synthetic Results

Results on Captured Scenes

Our method also produces accurate renderings and surface normals from captured photographs:

Scene Editing

Our structured representation of the directional MLP allows for scene editing after training: we can convincingly change material properties.
We can increase and decrease material roughness:
We can also control the amounts of specular and diffuse colors, or change the diffuse color without affecting the specular reflections:
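The sketch below illustrates how such edits can be expressed, assuming the model exposes per-point diffuse color, specular color, and roughness separately; the interface and the simple additive composition are illustrative assumptions, not the released implementation:

```python
from dataclasses import dataclass, replace
from typing import Optional
import numpy as np

@dataclass
class MaterialOutputs:
    """Per-point outputs assumed to be exposed by the spatial/directional MLPs."""
    diffuse_color: np.ndarray   # (..., 3)
    specular_color: np.ndarray  # (..., 3)
    roughness: np.ndarray       # (..., 1); larger = rougher (lower vMF kappa)

def edit_material(m: MaterialOutputs,
                  roughness_scale: float = 1.0,
                  diffuse_scale: float = 1.0,
                  specular_scale: float = 1.0,
                  new_diffuse: Optional[np.ndarray] = None) -> MaterialOutputs:
    """Post-training edits: rescale roughness, rescale the diffuse and specular
    contributions, or swap in a new diffuse color without touching the
    specular reflections."""
    diffuse = new_diffuse if new_diffuse is not None else m.diffuse_color
    return replace(
        m,
        roughness=m.roughness * roughness_scale,
        diffuse_color=diffuse * diffuse_scale,
        specular_color=m.specular_color * specular_scale,
    )

def compose_color(m: MaterialOutputs) -> np.ndarray:
    """Combine diffuse and specular components into an output color
    (a simple additive composition used here only for illustration)."""
    return np.clip(m.diffuse_color + m.specular_color, 0.0, 1.0)
```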

Citation

Acknowledgements

We would like to thank Lior Yariv and Kai Zhang for helping us evaluate their methods, and Ricardo Martin-Brualla for helpful comments on our text. DV is supported by the National Science Foundation under Cooperative Agreement PHY-2019786 (an NSF AI Institute, http://iaifi.org).
The website template was borrowed from Michaël Gharbi.