GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

ECCV 2024

1 CUHK 2 The University of Adelaide 3 HKUST 4 ShanghaiTech University 5 HKU 6 Light Illusions

* Equal contribution


GeoWizard generates state-of-the-art depth and normal maps with high-frequency detail for in-the-wild images.

Hair-level Details


Depth & Normal Comparison Gallery

Point Cloud Rendering Using Depth
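
As a rough illustration of how a depth map can be turned into a point cloud, the sketch below back-projects each pixel through a pinhole camera model. The function name `depth_to_points` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration; the page does not state which camera model or depth scale was used for the renderings, so treat this as a generic recipe rather than the authors' pipeline.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) point cloud.

    Assumes a pinhole camera with known intrinsics and a depth map
    whose scale is already consistent (both are assumptions here).
    """
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    # Stack into per-pixel XYZ and flatten in row-major pixel order.
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

Calling `depth_to_points(depth, fx, fy, cx, cy)` on a predicted depth map yields points that can be passed to any point cloud viewer; per-point colors can be taken from the input image in the same row-major order.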

Image Relighting Using Normal

3D Reconstruction Comparison

We use the BiNI algorithm to reconstruct a 3D mesh from the estimated normal map.

Novel View Synthesis Comparison

We use depth-guided 3D-photo-inpainting to render novel views.

Overall Framework

During fine-tuning, GeoWizard encodes the image, the ground-truth (GT) depth, and the GT normal into latent space with a frozen VAE and forms two concatenated geometric groups. Each group is fed into the U-Net, which generates output in the depth or normal domain under the guidance of a geometry switcher. In addition, a scene prompt is introduced so that results follow one of three possible scene layouts (indoor, outdoor, or object). During inference, given an image, a scene prompt, and initial depth and normal noise, GeoWizard jointly generates high-quality depth and normal maps.
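
The grouping step described above can be sketched in a few lines. This is only a shape-level illustration, not the released implementation: `encode_stub` is a hypothetical stand-in for the frozen VAE encoder, the 4-channel latent and 8x downsampling are assumed Stable-Diffusion-style values, pairing the image latent with each geometry latent is an assumed reading of "concatenated geometric groups", and the one-hot encoding of the geometry switcher is likewise an assumption. Noise injection and the scene prompt are omitted for brevity.

```python
import numpy as np

LATENT_CHANNELS = 4  # assumption: Stable-Diffusion-style latent width
DOWNSAMPLE = 8       # assumption: 8x spatial reduction in the VAE

def encode_stub(x):
    """Hypothetical placeholder for the frozen VAE encoder.

    x: (3, H, W) array with H and W divisible by DOWNSAMPLE.
    Returns a (LATENT_CHANNELS, H/8, W/8) array via average pooling.
    """
    c, h, w = x.shape
    pooled = x.reshape(
        c, h // DOWNSAMPLE, DOWNSAMPLE, w // DOWNSAMPLE, DOWNSAMPLE
    ).mean(axis=(2, 4))
    gray = pooled.mean(axis=0, keepdims=True)
    return np.repeat(gray, LATENT_CHANNELS, axis=0)

# Assumed one-hot codes telling the U-Net which domain to produce.
SWITCHER = {"depth": np.array([1.0, 0.0]), "normal": np.array([0.0, 1.0])}

def make_group(image_latent, geometry_latent, domain):
    """Concatenate image and geometry latents along the channel axis."""
    group = np.concatenate([image_latent, geometry_latent], axis=0)
    return group, SWITCHER[domain]

def build_training_groups(image, gt_depth, gt_normal):
    """Form the depth group and the normal group for one training sample."""
    # Replicate single-channel depth to 3 channels so the same encoder applies.
    depth_3ch = np.repeat(gt_depth[None], 3, axis=0)
    z_img = encode_stub(image)
    z_depth = encode_stub(depth_3ch)
    z_normal = encode_stub(gt_normal)
    return (
        make_group(z_img, z_depth, "depth"),
        make_group(z_img, z_normal, "normal"),
    )
```

Each returned pair holds the channel-wise concatenated latent and its switcher code; in a real setup both groups would be passed through the same U-Net, with the switcher selecting the output domain.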


Citation


@article{fu2024geowizard,
  title   = {GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image},
  author  = {Fu, Xiao and Yin, Wei and Hu, Mu and Wang, Kaixuan and Ma, Yuexin and Tan, Ping and Shen, Shaojie and Lin, Dahua and Long, Xiaoxiao},
  journal = {arxiv},
  year    = {2024}
}