GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

ECCV 2024

Xiao Fu¹*, Wei Yin²*, Mu Hu³*, Kaixuan Wang³, Yuexin Ma⁴,
Ping Tan³,⁶, Shaojie Shen³, Dahua Lin¹, Xiaoxiao Long⁵,⁶

¹ CUHK ² The University of Adelaide ³ HKUST ⁴ ShanghaiTech University ⁵ HKU ⁶ Light Illusions

GeoWizard generates state-of-the-art depth and normal maps with high-frequency detail from in-the-wild images.

Hair-level Details

Depth & Normal Comparison Gallery

Point Cloud Rendering Using Depth
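
For readers who want to reproduce this kind of visualization, the following is a minimal sketch of how a depth map can be lifted to a point cloud. It is not taken from the GeoWizard code. It assumes a pinhole camera whose intrinsics (`fx`, `fy`, `cx`, `cy`) are known, and depth values that are already in usable units; a relative depth prediction would first need a scale and shift. The function name `depth_to_points` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project every pixel (u, v) with depth z into camera space.

    Assumes a pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    The intrinsics are assumptions supplied by the caller.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Non-positive depth is treated as invalid and skipped.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth map with made-up values and unit intrinsics.
depth = [[1.0, 1.0], [2.0, 2.0]]
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Each point can then be colored with the corresponding image pixel and passed to any point cloud viewer.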

Image Relighting Using Normal
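
The page does not state which relighting method produced these results. As an illustration of how a normal map enables relighting at all, here is a minimal Lambertian shading sketch. The single directional light, the constant albedo, and the helper names `lambert_shade` and `relight` are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Sequence[float]) -> Vec3:
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def lambert_shade(
    normal: Sequence[float],
    light_dir: Sequence[float],
    albedo: float = 1.0,
) -> float:
    """Lambertian intensity: albedo * max(0, n . l) with unit n and l."""
    n = normalize(normal)
    l = normalize(light_dir)
    cosine = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cosine)


def relight(
    normal_map: List[List[Sequence[float]]],
    light_dir: Sequence[float],
    albedo: float = 1.0,
) -> List[List[float]]:
    """Shade every pixel of a normal map under one directional light."""
    return [
        [lambert_shade(n, light_dir, albedo) for n in row]
        for row in normal_map
    ]


# One surface facing the light and one facing away from it.
shading = relight([[(0, 0, 1), (0, 0, -1)]], light_dir=(0, 0, 1))
```

Changing `light_dir` re-shades the same normal map, which is the basic idea behind normal-based relighting.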

3D Reconstruction Comparison

We use the BiNI algorithm to reconstruct a 3D mesh from the estimated normal map.

Novel View Synthesis Comparison

We use depth-guided 3D photo inpainting to render novel views.

Overall Framework

During fine-tuning, GeoWizard encodes the image, the ground-truth (GT) depth, and the GT normal into latent space with a frozen VAE and forms two concatenated geometric groups. Each group is fed into the U-Net, which generates output in the depth or normal domain under the guidance of a geometry switcher. In addition, a scene prompt is introduced so that the model produces results with one of three possible scene layouts (indoor, outdoor, or object). During inference, given an image, a scene prompt, and initial depth and normal noise, GeoWizard jointly generates high-quality depth and normal maps.
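
The grouping described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the groups are formed by channel-wise concatenation of the image latent with each geometry latent, and that the geometry switcher and scene prompt are encoded as one-hot vectors. The names `GeometricGroup`, `build_groups`, and `one_hot`, and the 4-channel latent size, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# A latent is stored as channels x height x width nested lists.
Latent = List[List[List[float]]]

SCENES = ("indoor", "outdoor", "object")  # scene layouts named in the text
DOMAINS = ("depth", "normal")             # outputs selected by the switcher


def one_hot(label: str, vocabulary: Sequence[str]) -> Tuple[float, ...]:
    """Encode a label as a one-hot tuple (an assumed encoding)."""
    if label not in vocabulary:
        raise ValueError(f"unknown label: {label!r}")
    return tuple(1.0 if item == label else 0.0 for item in vocabulary)


def concat_channels(a: Latent, b: Latent) -> Latent:
    """Concatenate two latents along the channel axis."""
    same_size = len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    if not same_size:
        raise ValueError("latents must share height and width")
    return a + b


@dataclass
class GeometricGroup:
    domain: str                  # "depth" or "normal"
    switcher: Tuple[float, ...]  # tells the U-Net which domain to produce
    scene: Tuple[float, ...]     # scene prompt shared by both groups
    latent: Latent               # image channels followed by geometry channels


def build_groups(
    image_latent: Latent,
    depth_latent: Latent,
    normal_latent: Latent,
    scene: str,
) -> Dict[str, GeometricGroup]:
    """Pair the image latent with each geometry latent, one group per domain."""
    scene_code = one_hot(scene, SCENES)
    geometry = {"depth": depth_latent, "normal": normal_latent}
    return {
        domain: GeometricGroup(
            domain=domain,
            switcher=one_hot(domain, DOMAINS),
            scene=scene_code,
            latent=concat_channels(image_latent, geometry[domain]),
        )
        for domain in DOMAINS
    }


def zeros(channels: int, height: int, width: int) -> Latent:
    """Placeholder latent filled with zeros, for illustration only."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


# Illustrative 4-channel latents; the real latent size is not stated here.
groups = build_groups(zeros(4, 2, 2), zeros(4, 2, 2), zeros(4, 2, 2), scene="indoor")
```

In this sketch the depth and normal latents stand in for the encoded GT maps during fine-tuning and for the initial noise during inference; the same U-Net would process both groups, with the switcher selecting the output domain.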

Citation


@inproceedings{fu2024geowizard,
  title={GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image},
  author={Fu, Xiao and Yin, Wei and Hu, Mu and Wang, Kaixuan and Ma, Yuexin and Tan, Ping and Shen, Shaojie and Lin, Dahua and Long, Xiaoxiao},
  booktitle={ECCV},
  year={2024}
}