fuxiao0719 / geowizard

[ECCV'24] GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Home Page: https://fuxiao0719.github.io/projects/geowizard/

Python 99.69% Shell 0.31%

geowizard's People

Contributors

fuxiao0719, jugghm, xxlong0, yvanyin

geowizard's Issues

Which base diffusion model do you use?

Hi, Thanks for your great work.

You mentioned your model is fine-tuned from "pre-trained stable diffusion v2, which has been fine-tuned with image conditions." Did you do this image-conditioned pre-training yourself, or did you use an open-sourced checkpoint? As far as I understand, SD v2 is originally a text-to-image model.

Looking forward to your reply.

What's the meaning of these parameters?

--k ${k} \
--iter ${iterations} \
--tol ${tol}

Sorry to bother you during your busy schedule. What do these three parameters mean, and how would changing them affect performance? Thanks!

Could you share the testing code?

Hi,

I'm trying to replicate your reported results. Using “run_infer.py” with the default settings (ensemble_size=3 and denoise_steps=10) for surface normal estimation, my scores for Mean Angular Error and Accuracy (11.25) are lower than those in your paper.

Could you provide more details on your testing setup or share the relevant testing code? It would help me better understand your findings. Thanks in advance!

Below are my results compared with those you reported.
[image not preserved]
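
For anyone comparing numbers, here is a minimal NumPy sketch of how the two metrics mentioned above (mean angular error and accuracy within 11.25 degrees) are commonly computed. It is not the authors' evaluation script; the function name `normal_metrics`, the (H, W, 3) layout, and the optional validity mask are assumptions made for illustration.

```python
import numpy as np

def normal_metrics(pred, gt, mask=None, thresholds=(11.25, 22.5, 30.0)):
    """Angular-error metrics between two (H, W, 3) normal maps."""
    # Normalize both maps so their dot product is a cosine.
    pred = pred / np.clip(np.linalg.norm(pred, axis=-1, keepdims=True), 1e-8, None)
    gt = gt / np.clip(np.linalg.norm(gt, axis=-1, keepdims=True), 1e-8, None)
    # Clip before arccos so rounding error cannot produce nan.
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    if mask is not None:
        err = err[mask]  # keep only pixels with valid ground truth
    result = {"mean": float(err.mean()), "median": float(np.median(err))}
    for t in thresholds:
        # Percentage of pixels whose angular error is below t degrees.
        result[f"acc_{t}"] = float((err < t).mean() * 100.0)
    return result
```

Lower is better for the mean and median error, and higher is better for the accuracy percentages, so it helps to state which direction a mismatch goes when reporting it.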

Code location of cross-domain geometric self-attention

Hello,
Thank you for the wonderful work you've done!
I am currently analyzing your code on Hugging Face.
However, I am having trouble locating the cross-domain geometric self-attention feature mentioned in the paper.

If it's not too much trouble, could you please point me in the right direction?

Thank you for your assistance!

Low resolution normals / blocky normals

Hi, thanks for your hard work. GeoWizards works well, however, for characters portraits and sometimes for interiors, it tends to produce a bit low resolution / blocky normal maps. I tried about 25 different settings to avoid that, but the results are always much more low-res than the original image. Is there a way to improve this and smoothing the results? (it's especially a problem on rounded edges like hair, mouth, etc.).
It would be great if it could have a post-process filter that cleans-up this kind of problems a bit. I tried both run-infer and run-infer_object but both have this issue.

Thank you

Numeric bug, and question regarding the normal ensemble logic

The numeric bug
torch.acos has numerical issues and produces nan for inputs near +1 and -1, as mentioned in pytorch/pytorch#8069.

This causes ensemble_normals() to always select the first tensor that contains any nan values after torch.acos:

angle_error = torch.acos(torch.cosine_similarity(normal_pred[None], normal_preds, dim=1))

A simple solution is to clamp the values by a small epsilon before feeding the tensor into torch.acos.
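
A minimal sketch of that clamp, written with NumPy so it runs on its own (in PyTorch the same idea would use `torch.clamp` before `torch.acos`). The helper name `safe_angle_error` and the value of `eps` are assumptions, not part of the repository.

```python
import numpy as np

def safe_angle_error(a, b, eps=1e-6):
    """Angle in radians between vectors along the last axis, safe near +/-1."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    cos = np.sum(a * b, axis=-1)
    # Rounding can push cos slightly outside [-1, 1], where arccos returns nan.
    cos = np.clip(cos, -1.0 + eps, 1.0 - eps)
    return np.arccos(cos)
```

Clamping to exactly [-1, 1] would already avoid nan in the forward pass; the epsilon margin also keeps the derivative of acos finite, at the cost of a small floor on the smallest angle that can be reported.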


Question regarding the ensemble_normals implementation
Meanwhile, I am curious about the design choice behind ensemble_normals().
ensemble_depths() takes the per-pixel mean or median, so it truly ensembles the predictions by fusing the multiple depth maps.
ensemble_normals(), on the other hand, calculates an error score for each predicted normal map, selects one map in its entirety, and disregards the rest.
Have you tried a mean/median reduction similar to ensemble_depths(), and do you have any insights that led to the current design?
Sincere thanks!
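
To make the two strategies contrasted above concrete, here is a small NumPy sketch. It is not the repository's ensemble_normals(); the function names are hypothetical, and using the normalized mean as the reference direction for scoring is an assumption.

```python
import numpy as np

def _unit(v):
    """Normalize vectors along the last axis."""
    return v / np.clip(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8, None)

def select_best_map(normal_preds):
    """Whole-map selection: keep the single (H, W, 3) map closest to the mean."""
    preds = _unit(normal_preds)                      # (N, H, W, 3)
    mean = _unit(preds.mean(axis=0))                 # (H, W, 3)
    cos = np.clip(np.sum(preds * mean[None], axis=-1), -1.0, 1.0)
    # One scalar score per prediction: its mean angular error to the reference.
    score = np.arccos(cos).reshape(len(preds), -1).mean(axis=1)
    return preds[int(np.argmin(score))]

def per_pixel_mean(normal_preds):
    """Per-pixel fusion: average all maps, then renormalize to unit length."""
    return _unit(_unit(normal_preds).mean(axis=0))
```

The first function discards all but one prediction, while the second uses every prediction at every pixel; a plain average of unit vectors is no longer unit length, which is why the renormalization step is needed.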

Training on whole unet or just lora

Thanks for the great work! I am curious whether you trained only the attention layers and LoRA in the UNet, or the entire UNet after loading the pretrained SD weights. Thank you very much!

About multi-resolution noise

Hey, thanks for your excellent work.
As Marigold and GeoWizard both use multi-resolution noise to achieve faster and more stable convergence, could you provide pseudocode describing how the multi-resolution noise is generated? I want to make sure I get it right in my implementation.
Thanks a lot!
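
Until the authors reply, here is a rough NumPy sketch of the general pyramid-noise idea: Gaussian noise is drawn at progressively coarser resolutions, upsampled, and summed with decaying weights. It is not taken from GeoWizard or Marigold; the halving schedule, the `discount` value, the nearest-neighbour upsampling, and the function name are all assumptions.

```python
import numpy as np

def multi_res_noise(shape, discount=0.9, max_levels=6, rng=None):
    """Pyramid-style noise for a (C, H, W) latent.

    Sums Gaussian noise drawn at progressively coarser resolutions,
    each upsampled to (H, W) and weighted by discount**level.
    """
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = shape
    noise = rng.standard_normal((c, h, w))
    for level in range(1, max_levels + 1):
        lh, lw = h // 2**level, w // 2**level
        if lh < 1 or lw < 1:
            break
        coarse = rng.standard_normal((c, lh, lw))
        # Nearest-neighbour upsampling via index lookup (an assumption;
        # implementations often use bilinear interpolation instead).
        rows = np.arange(h) * lh // h
        cols = np.arange(w) * lw // w
        noise += discount**level * coarse[:, rows][:, :, cols]
    # Rescale so the result has unit standard deviation.
    return noise / noise.std()
```

A call such as `multi_res_noise((4, 64, 64))` returns an array of the same shape that could stand in for the usual Gaussian noise sample during training.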

Question about predicting normals and depth independently

Hello, thank you for your brilliant work on Stable Diffusion-based geometry estimation!
I wonder whether your model can predict normals and depth independently, i.e., can I input an image and have the model output only a depth map or only a normal map?

Thank you :)

Logical bug of `self.img_embed`

The persistent attribute self.img_embed is assigned to the current image latent here:

if self.img_embed is None:
    self.__encode_img_embed(input_rgb)

This is understandable as a way to save AE computation, but the attribute is not reset after the depth/normal ensemble. As a result, a second call to the pipeline runs inference with the wrong input condition.

A simple (but not so clean) fix is to assign self.img_embed = None before the return:

normal_colored_img = Image.fromarray(normal_colored)
return DepthNormalPipelineOutput(
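
The stale-cache behaviour and the reset can be illustrated with a toy class. This is only a sketch of the pattern; `CachedEmbedPipeline` and its methods are hypothetical and are not the real pipeline.

```python
class CachedEmbedPipeline:
    """Toy stand-in for a pipeline that caches an image embedding."""

    def __init__(self):
        self.img_embed = None

    def _encode(self, image):
        # Placeholder for an expensive encoder call.
        return ("embed", image)

    def run_buggy(self, image):
        if self.img_embed is None:
            self.img_embed = self._encode(image)
        return self.img_embed          # stale on every call after the first

    def run_fixed(self, image):
        try:
            if self.img_embed is None:
                self.img_embed = self._encode(image)
            return self.img_embed
        finally:
            self.img_embed = None      # reset so the next call re-encodes
```

Using `try`/`finally` means the cache is also cleared if inference raises an exception, which a plain assignment before the return would not guarantee.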

Reproduce results: evaluation settings

Hi, thanks for your great work!

I was trying to reproduce your normal estimation results from the paper and couldn't quite get the same numbers. Would it be possible that you share some more details on the particular evaluation settings that you used?

As far as I understood, you used 50 denoising steps and 10 ensemble size. However, I'm not sure which image resolution you used as input to the diffusion model. Did you down/upsample the images to a certain processing resolution or did you just input the original images without any rescaling?

About video testing

When testing on video, the flicker between frames is very serious. What can be done to reduce it?

About the results

Thanks for sharing this excellent work!
I noticed that the image resolution used in your paper (576x768) differs from that in Marigold (480x640). Is this comparison fair? The results from the higher resolution in your paper are directly compared with those in Marigold.

Code for 3D reconstruction

Hi @fuxiao0719, could you please share the code for Section 3.3 (3D reconstruction)? I'm wondering how the scale and shift parameters are calculated and how the metric depth is obtained. Thanks!
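
For context, one common way to recover scale and shift is a closed-form least-squares fit against some reference depth. The sketch below assumes such a reference is available (for example sparse metric measurements) and does not reproduce the procedure in Section 3.3 of the paper; `align_scale_shift` is a hypothetical name.

```python
import numpy as np

def align_scale_shift(pred, target, mask=None):
    """Least-squares s, t minimizing ||s * pred + t - target||^2."""
    if mask is None:
        mask = np.ones(pred.shape, dtype=bool)
    x = pred[mask].ravel()
    y = target[mask].ravel()
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)
```

The aligned depth is then `s * pred + t`, and the optional mask restricts the fit to pixels where the reference is valid.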

Request for code of datasets preprocess

Thank you for your excellent work!

I noticed that you used four larger datasets (Hypersim, Replica, 3D Ken Burns, and Objaverse) and filtered them before training. I think this filtering step is crucial for the final training result, so I would like to ask whether you could release this part of the code or the list of filtered filenames. This would help a lot.

Waiting for your early reply.
Best wishes!

insufficient installation guide

summary

  • tested on an A100 GPU instance from LambdaCloud
  • the installation guide in README.md does not seem to list all the dependencies
  • it's not possible to run the inference code because so many dependencies are missing

reproduction of the error

installation process

git clone https://github.com/fuxiao0719/GeoWizard.git
cd GeoWizard

conda create -n geowizard python=3.9
conda activate geowizard
pip install -r requirements.txt
cd geowizard

command I ran:

python run_infer.py \
    --input_dir input/example \
    --output_dir output \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --domain "indoor"

errors complaining about missing dependencies:

(geowizard) ubuntu@129-146-73-36:~/A100/GeoWizard/geowizard$ python run_infer.py \
    --input_dir input/example \
    --output_dir output \
    --ensemble_size 3 \
    --denoise_steps 10 \
    --domain "indoor"
Traceback (most recent call last):
  File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 7, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
Traceback (most recent call last):
  File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 10, in <module>
    from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm'
Traceback (most recent call last):
  File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 13, in <module>
    import cv2
ModuleNotFoundError: No module named 'cv2'
Traceback (most recent call last):
  File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 16, in <module>
    from models.geowizard_pipeline import DepthNormalEstimationPipeline
  File "/home/ubuntu/A100/GeoWizard/geowizard/models/geowizard_pipeline.py", line 10, in <module>
    from diffusers import (
ModuleNotFoundError: No module named 'diffusers'
Traceback (most recent call last):
  File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 16, in <module>
    from models.geowizard_pipeline import DepthNormalEstimationPipeline
  File "/home/ubuntu/A100/GeoWizard/geowizard/models/geowizard_pipeline.py", line 10, in <module>
    from diffusers import (
  File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/__init__.py", line 5, in <module>
    from .utils import (
  File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/utils/__init__.py", line 18, in <module>
    from packaging import version
ModuleNotFoundError: No module named 'packaging'

I then installed the following packages through conda:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install tqdm
conda install conda-forge::opencv
conda install -c conda-forge diffusers 
conda install conda-forge::packaging  

But this is the final error message I got:

Traceback (most recent call last):
  File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 16, in <module>
    from models.geowizard_pipeline import DepthNormalEstimationPipeline
  File "/home/ubuntu/A100/GeoWizard/geowizard/models/geowizard_pipeline.py", line 10, in <module>
    from diffusers import (
  File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/__init__.py", line 5, in <module>
    from .utils import (
  File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/utils/__init__.py", line 21, in <module>
    from .constants import (
  File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/utils/constants.py", line 17, in <module>
    from huggingface_hub.constants import HF_HOME
ImportError: cannot import name 'HF_HOME' from 'huggingface_hub.constants' (/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/huggingface_hub/constants.py)

I probably won't be able to run the code successfully.
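
A small standard-library check can list everything that is missing in one pass instead of one failed run per package. This is a sketch; the import names are taken from the tracebacks and install commands above and may not cover everything the repository needs.

```python
import importlib.util

# Import names seen in the tracebacks and install commands above;
# the real requirement list may be longer.
REQUIRED = ["numpy", "torch", "tqdm", "cv2", "diffusers", "packaging", "huggingface_hub"]

def find_missing(modules=REQUIRED):
    """Return the import names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only reports whether a package can be found, not whether its version is compatible. The final `HF_HOME` import error looks like a version mismatch between diffusers and huggingface_hub rather than a missing package, though that is a guess from the traceback alone.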

Questions About the Dataset

Thank you for your excellent work!
I'm really amazed by the quality of your paper and code.
I have a few questions regarding the dataset:

  1. I noticed that training with GeoWizard is more expensive compared to Marigold. Is this due to the size of the dataset?
  2. Considering the costs associated with data collection, what do you think is the ideal size for a high-quality dataset?
  3. How did you filter out high-quality meshes from Objaverse?
  4. Are there any plans to make the training data available, such as a filtered list from Objaverse or the urban rendered data?

Again, thank you very much for your nice work.

Which pretrained Stable Diffusion did you use?

I tried to use the Stable Diffusion checkpoint from https://huggingface.co/stabilityai/stable-diffusion-2.
But I cannot initialize the model correctly.

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at /home//GeoWizard_edit/sdv2 and are newly initialized: ['class_embedding.linear_1.weight', 'class_embedding.linear_2.bias', 'class_embedding.linear_1.bias', 'class_embedding.linear_2.weight']

  • conv_in.weight: found shape torch.Size([320, 4, 3, 3]) in the checkpoint and torch.Size([320, 8, 3, 3]) in the model instantiated
  • down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: found shape torch.Size([320, 1024]) in the checkpoint and torch.Size([320, 768]) in the model instantiated
  • down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: found shape torch.Size([320, 1024]) in the checkpoint and torch.Size([320, 768]) in the model instantiated
  • down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight: found shape torch.Size([320, 1024]) in the checkpoint and torch.Size([320, 768]) in the model instantiated
  • down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight: found shape torch.Size([320, 1024]) in the checkpoint and torch.Size([320, 768]) in the model instantiated
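
The log shows two separate mismatches: conv_in has 4 input channels in the checkpoint but 8 in the instantiated model, and the cross-attention key/value projections expect 1024-dimensional inputs in the checkpoint but 768 in the model. For the first one, a common trick in image-conditioned fine-tuning is to duplicate the input kernel and halve it. The sketch below shows only that idea in NumPy; whether GeoWizard was initialized this way is not stated here, the function name is hypothetical, and it does not address the 1024 vs 768 mismatch.

```python
import numpy as np

def expand_conv_in(weight):
    """Turn a (out, 4, k, k) kernel into (out, 8, k, k) by duplicating and halving."""
    # Halving keeps the response unchanged when both input halves are identical.
    return np.concatenate([weight, weight], axis=1) * 0.5
```

With this initialization, feeding the same 4-channel latent twice into the 8-channel layer gives the same response as the original 4-channel layer.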

Can GeoWizard generate normal/depth maps without RGB image input?

Hi authors. Thank you for this amazing work!

I see in your paper that GeoWizard was trained with classifier-free guidance. Does this mean that it was trained sometimes with RGB input as a condition, but also sometimes unconditioned? Is it possible to generate normal or depth maps without RGB image input?

Thank you!
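
For readers unfamiliar with the term, classifier-free guidance generally has two parts: the condition is randomly dropped during training, and the conditional and unconditional predictions are combined at sampling time. A minimal sketch follows; which condition GeoWizard drops and what replaces it are not stated in this thread, so the zero placeholder and the function names are assumptions.

```python
import numpy as np

def drop_condition(cond, p_uncond, rng):
    """Training side: replace the condition with zeros with probability p_uncond."""
    return np.zeros_like(cond) if rng.random() < p_uncond else cond

def guided_prediction(eps_uncond, eps_cond, scale):
    """Sampling side: extrapolate from the unconditional toward the conditional output."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With `scale = 1` the guided prediction equals the conditional one, and with `scale = 0` it equals the unconditional one.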

About parameters setting

Hello, thanks again for the nice work! I noticed in your documentation that you mentioned the default values for ensemble_size and denoise_steps are 3 and 10, respectively, while for academic purposes, they are 10 and 50. Could you specify how much the performance differs between these two configurations? The inference time is quite long when ensemble_size and denoise_steps are set to 10 and 50. Specifically, when using GeoWizard as prior information of depth map for computer vision, which configuration would you recommend?
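
As a rough way to see why the second configuration is so much slower, the number of denoising iterations can be counted. This assumes each ensemble member runs its own full denoising loop and that cost grows linearly with the number of iterations; it says nothing about the accuracy difference.

```python
def denoising_passes(ensemble_size, denoise_steps):
    """Total denoising iterations if each ensemble member runs a full loop."""
    return ensemble_size * denoise_steps

default = denoising_passes(3, 10)     # 30
academic = denoising_passes(10, 50)   # 500
ratio = academic / default            # about 16.7x
```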

Could you please provide the code for optimizing the scale and shift?

Hi, thank you for open-sourcing your great work.
In your paper you mention that you use a least-squares method to optimize the scale and shift.
Could you provide the code for this part, or give us some instructions about your implementation? I tried to implement it, but I found that obtaining normals from depth is a non-linear process (I use the cross product of neighboring points as the normal). Could you give us some hints?
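
To show where the non-linearity comes from, here is a sketch of the cross-product construction described above, assuming a pinhole camera. The intrinsics `fx, fy, cx, cy`, the function name, and the sign convention are assumptions, and this is not the authors' implementation.

```python
import numpy as np

def depth_to_normals(depth, fx, fy, cx, cy):
    """Estimate (H, W, 3) unit normals from an (H, W) depth map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel to a 3D point in the camera frame.
    pts = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)
    # Tangent vectors from neighbouring points along the image x and y axes.
    dx = np.gradient(pts, axis=1)
    dy = np.gradient(pts, axis=0)
    n = np.cross(dx, dy)
    return n / np.clip(np.linalg.norm(n, axis=-1, keepdims=True), 1e-8, None)
```

Scaling the depth scales every 3D point uniformly and leaves these normals unchanged, whereas adding a shift changes the shape of the surface and therefore the normals.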

Depthmap generation?

Hi,

I'm currently in the process of trying different depthmap generators, as I use them to convert 2D movies into 3D stereoscopy.
I went from MiDaS v3.1 to Depth Anything v1, and I'm now trying out Depth Anything v2.
I didn't use ZoeDepth, as it takes too long. A movie is generally around 130,000 to 190,000 images to process.

How does GeoWizard compare to these? I only see heatmaps in the examples; can it also generate depth maps (which are basically grayscale heatmaps, but with slightly fewer gradations)?
As it is for movies, it needs to handle different scenarios: indoor, outdoor, small and large objects or people, and special effects (like explosions, electricity, fantasy projectiles, ...), as this is where I notice most depth map generators fall short.

Looking at the installation, it obviously also installs things like Python, as the other ones do. Since I've already installed these, will it overwrite them or conflict with them?

Cheers.
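
On the grayscale question: a colored heatmap is typically just a colormap applied to a single-channel depth array, so a grayscale depth map can be produced from the same array. A minimal NumPy sketch, assuming access to the raw float depth and that smaller values mean closer to the camera (both assumptions; `depth_to_gray16` is a hypothetical helper):

```python
import numpy as np

def depth_to_gray16(depth, near_is_bright=True):
    """Normalize an (H, W) float depth map to a 16-bit grayscale array."""
    d = depth.astype(np.float64)
    lo, hi = d.min(), d.max()
    # Guard against a constant map to avoid dividing by zero.
    norm = (d - lo) / (hi - lo) if hi > lo else np.zeros_like(d)
    if near_is_bright:
        norm = 1.0 - norm   # assumes smaller values mean closer to the camera
    return np.round(norm * 65535.0).astype(np.uint16)
```

Keeping 16 bits gives many more gradations than an 8-bit image, and the resulting array can be written as a 16-bit PNG with an image library such as OpenCV.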