fuxiao0719 / geowizard
[ECCV'24] GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Home Page: https://fuxiao0719.github.io/projects/geowizard/
Hi, thanks for your great work.
You mentioned that your model is fine-tuned from a "pre-trained stable diffusion v2, which has been fine-tuned with image conditions." Did you pre-train this image-conditioned model yourself, or did you use an open-source one? If I understand correctly, SD v2 is originally a text-to-image model.
Looking forward to your reply.
How can I avoid writing out the large .npy files?
--k ${k} \
--iter ${iterations} \
--tol ${tol}
Sorry to bother you during your busy schedule. I'd like to ask what these three parameters mean, and how changing them would affect performance. Thanks!
Hi,
I'm trying to replicate your reported results. Using run_infer.py with the default settings (ensemble_size=3 and denoise_steps=10) for surface normal estimation, my Mean Angular Error and Accuracy (11.25°) scores are worse than those in your paper.
Could you provide more details on your testing setup or share the relevant testing code? It would help me better understand your findings. Thanks in advance!
Below are the results I obtained, compared with those you reported.
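For comparing numbers like these, here is a minimal sketch of the metrics as they are commonly defined in normal-estimation benchmarks: per-pixel angle between predicted and ground-truth normals in degrees, its mean and median, and the percentage of valid pixels below 11.25°, 22.5° and 30°. It is not the authors' evaluation script; the function names are hypothetical, and details such as the validity mask, the resolution at which the comparison is done, and strict vs. non-strict thresholds are assumptions that can shift the results.

```python
import numpy as np

def normal_angular_errors(pred, gt, valid=None):
    """Per-pixel angle in degrees between predicted and ground-truth normals.

    pred, gt: (H, W, 3) arrays; both are re-normalized to unit length here.
    valid: optional (H, W) boolean mask of pixels that have ground truth.
    """
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True).clip(min=1e-8)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True).clip(min=1e-8)
    # Clip the cosine so rounding slightly outside [-1, 1] cannot produce NaN.
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    return err[valid] if valid is not None else err.ravel()

def normal_metrics(pred, gt, valid=None):
    """Mean/median angular error and accuracy (% of pixels under each threshold)."""
    err = normal_angular_errors(pred, gt, valid)
    return {
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "a11.25": float((err < 11.25).mean() * 100.0),
        "a22.5": float((err < 22.5).mean() * 100.0),
        "a30": float((err < 30.0).mean() * 100.0),
    }
```

Computing both your numbers and a re-run of the paper's setting with the same function at least removes metric-definition differences from the comparison.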
How much memory is needed, and how long does it take to generate a model? Also, what is the open-source license?
Hello,
Thank you for the wonderful work you've done!
I am currently analyzing your code on Hugging Face.
However, I am having trouble locating the cross-domain geometric self-attention feature mentioned in the paper.
If it's not too much trouble, could you please point me in the right direction?
Thank you for your assistance!
Hi, thanks for your hard work. GeoWizard works well; however, for character portraits and sometimes for interiors, it tends to produce somewhat low-resolution, blocky normal maps. I tried about 25 different settings to avoid that, but the results are always much lower-resolution than the original image. Is there a way to improve this and smooth the results? (It's especially a problem on rounded edges like hair, mouths, etc.)
It would be great if there were a post-processing filter that cleans up this kind of problem a bit. I tried both run_infer.py and run_infer_object.py, and both have this issue.
Thank you
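As a stopgap, a generic post-process can hide some of the blockiness: smooth each channel of the normal map and then re-normalize every pixel back to unit length (averaging unit vectors shortens them, so skipping the re-normalization leaves invalid normals). This is a plain Gaussian sketch, not a GeoWizard feature; it softens real edges too. The function names and the default sigma are assumptions. Since run_infer.py already imports cv2, cv2.GaussianBlur or the edge-preserving cv2.bilateralFilter on a float32 map are faster alternatives to the pure-numpy convolution below, followed by the same re-normalization.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalized 1-D Gaussian kernel; radius defaults to about 3 sigma."""
    radius = int(3 * sigma + 0.5) if radius is None else radius
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur_axis(img, kernel, axis):
    """Convolve img with a symmetric 1-D kernel along one axis, keeping its size."""
    r = len(kernel) // 2
    pad = [(0, 0)] * img.ndim
    pad[axis] = (r, r)
    padded = np.pad(img, pad, mode="reflect")  # mirror the borders
    return np.apply_along_axis(
        lambda line: np.convolve(line, kernel, mode="valid"), axis, padded
    )

def smooth_normals(normals, sigma=1.5):
    """Gaussian-smooth an (H, W, 3) normal map in [-1, 1], then re-normalize."""
    k = gaussian_kernel1d(sigma)
    out = blur_axis(blur_axis(normals.astype(np.float64), k, axis=0), k, axis=1)
    out /= np.linalg.norm(out, axis=-1, keepdims=True).clip(min=1e-8)
    return out
```

Larger sigma removes more blockiness but also more detail; an edge-preserving filter trades some of that loss back at the cost of extra tuning.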
The numeric bug
torch.acos has numeric issues that produce nan values near +1 and -1, as mentioned here: pytorch/pytorch#8069. This problem causes the ensemble_normals() function to always select the first tensor that contains any nan values after torch.acos. A simple solution is to clamp the values by a small epsilon before feeding the tensor into torch.acos.
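The clamp fix described above can be sketched as a drop-in helper (safe_acos is a hypothetical name, and eps=1e-6 an assumed default). Clamping to [-1 + eps, 1 - eps] removes the NaNs from dot products of unit vectors that round to slightly above 1, and it also keeps the gradient finite, since d/dx acos(x) = -1 / sqrt(1 - x^2) diverges at x = ±1.

```python
import torch

def safe_acos(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """acos that does not return NaN for inputs drifting slightly outside [-1, 1].

    Values are clamped to [-1 + eps, 1 - eps] before torch.acos, which also
    keeps gradients finite near the endpoints.
    """
    return torch.acos(torch.clamp(x, -1.0 + eps, 1.0 - eps))

# Cosine between unit normals (here, each normal with itself) can exceed 1 by rounding.
normals = torch.nn.functional.normalize(torch.randn(2, 3, 8, 8), dim=1)
angles = safe_acos((normals * normals).sum(dim=1))
```

The trade-off is that angles below acos(1 - eps), roughly sqrt(2 * eps) radians (about 0.08° for eps = 1e-6), can no longer be represented, which is negligible for ranking ensemble members.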
Question regarding the ensemble_normals implementation
Meanwhile, I am curious about the design choice behind the ensemble_normals() function. ensemble_depths() takes the per-pixel mean or median, and thus truly ensembles the predictions by fusing the multiple depth maps. ensemble_normals(), on the other hand, calculates an error score for each normal map prediction, selects one normal map in its entirety, and disregards the remaining ones.
Have you tried a mean/median reduction similar to ensemble_depths(), and do you have insights that led to the current design?
Many thanks!
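To make the two options concrete, here is a minimal numpy sketch of both: a per-pixel mean of unit normals (re-normalized, since averaged unit vectors are shorter than 1) and a "select one whole map" variant. The scoring rule in the selection variant, mean angular distance to the per-pixel mean direction, is an assumption for illustration; the actual score used by ensemble_normals() is not shown in this thread. Function names are hypothetical.

```python
import numpy as np

def _unit(v, axis=-1):
    """Normalize vectors to unit length along the given axis."""
    return v / np.linalg.norm(v, axis=axis, keepdims=True).clip(min=1e-8)

def ensemble_normals_mean(preds):
    """preds: (N, H, W, 3). Per-pixel average of unit normals, re-normalized."""
    return _unit(_unit(preds).mean(axis=0))

def ensemble_normals_select(preds):
    """Return the single prediction closest (on average) to the per-pixel mean.

    Scoring by mean angular distance to the mean direction is an assumed rule.
    """
    preds = _unit(preds)
    mean_dir = ensemble_normals_mean(preds)
    cos = np.clip(np.sum(preds * mean_dir[None], axis=-1), -1.0, 1.0)  # (N, H, W)
    scores = np.arccos(cos).mean(axis=(1, 2))                           # (N,)
    best = int(np.argmin(scores))
    return preds[best], best
```

One plausible reason to prefer selection is that a whole-map pick keeps each output internally consistent, whereas averaging can blur sharp creases where ensemble members disagree; whether that matches the authors' reasoning is for them to confirm.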
Thanks for the great work! I am curious whether you trained only the attention layers and LoRA in the UNet, or the entire UNet after loading the pretrained SD weights. Thank you very much!
Hey, thanks for your excellent work.
Since Marigold and GeoWizard both use multi-resolution noise to achieve faster and more stable convergence, could you provide pseudocode describing how the multi-resolution noise is generated? I want to make sure I get it right in my implementation.
Thanks a lot!
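Until the authors answer, here is a sketch of the widely used multi-resolution ("pyramid") noise recipe: start from full-resolution Gaussian noise, add Gaussian noise drawn at progressively coarser resolutions and bilinearly upsampled back, weight level i by strength**i, and rescale the sum to unit standard deviation. The constants (strength=0.9, up to 10 levels, a random downscale factor in [2, 4)) are assumptions and may differ from what Marigold or GeoWizard actually use.

```python
import torch
import torch.nn.functional as F

def multi_res_noise_like(x, strength=0.9, levels=10):
    """Multi-resolution noise with the same shape as x: (B, C, H, W).

    Level i adds noise sampled at roughly (H, W) / r**i, upsampled to (H, W)
    and weighted by strength**i; the result is rescaled to unit std.
    """
    b, c, h, w = x.shape
    noise = torch.randn(x.shape, device=x.device, dtype=x.dtype)
    for i in range(1, levels):
        r = 2.0 + 2.0 * torch.rand(1).item()  # random downscale factor in [2, 4)
        hi, wi = max(1, int(h / r**i)), max(1, int(w / r**i))
        low = torch.randn(b, c, hi, wi, device=x.device, dtype=x.dtype)
        up = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
        noise += up * strength**i
        if hi == 1 or wi == 1:  # coarsest useful level reached
            break
    return noise / noise.std()
```

In a diffusers-style training loop, the result would replace torch.randn_like(latents) in the usual scheduler.add_noise(latents, noise, timesteps) call; the low-frequency components are what let the model learn coarse structure faster.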
Hello, thank you for your brilliant work in Stable Diffusion-based geometry estimation!
I wonder whether your model can predict normal and depth independently, i.e., whether I can input an image and have the model output only depth or only normal.
Thank you :)
The persistent attribute self.img_embed is assigned to the current image latent here:
GeoWizard/geowizard/models/geowizard_pipeline.py, lines 248 to 249 (commit 0a60193)
Caching it to save autoencoder (VAE) computation is understandable, but the attribute is not reset after the depth/normal ensemble. As a result, a second call to the pipeline reuses the previous image's latent as the input condition.
A simple (but not very clean) fix is to assign self.img_embed = None before returning:
GeoWizard/geowizard/models/geowizard_pipeline.py, lines 199 to 201 (commit 0a60193)
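A slightly more robust variant of the reset fix is to clear the cache in a try/finally block, so it is also cleared when inference raises partway through. The toy class below only mirrors the pattern around the self.img_embed attribute discussed above; the class name, the encode callable and the ensemble loop are hypothetical stand-ins, not the real pipeline code.

```python
class CachedConditionPipeline:
    """Toy stand-in for a pipeline that caches the encoded input image (hypothetical)."""

    def __init__(self, encode):
        self.encode = encode      # stands in for the VAE encoder call
        self.img_embed = None     # per-call cache, like the attribute discussed above

    def _get_condition(self, image):
        # Encode once per call; later ensemble members reuse the cached latent.
        if self.img_embed is None:
            self.img_embed = self.encode(image)
        return self.img_embed

    def __call__(self, image, ensemble_size=3):
        try:
            return [self._get_condition(image) for _ in range(ensemble_size)]
        finally:
            # Runs on every exit path, so the next call always re-encodes its input.
            self.img_embed = None
```

A cleaner design avoids instance state altogether: encode the image once into a local variable at the top of the call and pass it down to the per-member denoising function.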
Hi, thanks for your great work!
I was trying to reproduce your normal estimation results from the paper and couldn't quite get the same numbers. Would it be possible that you share some more details on the particular evaluation settings that you used?
As far as I understand, you used 50 denoising steps and an ensemble size of 10. However, I'm not sure which image resolution you used as input to the diffusion model. Did you down/upsample the images to a certain processing resolution, or did you input the original images without any rescaling?
For video input, flicker is very severe. What can be done to reduce it?
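One cheap post-processing idea (a sketch under stated assumptions, not something GeoWizard provides): per-frame depth predictions are only defined up to scale and shift, so consecutive frames can jump in both. Aligning each frame to the previous output with a least-squares scale/shift fit (np.polyfit with degree 1 returns exactly that slope and intercept) and then blending with an exponential moving average removes much of this jitter. The function names and the alpha default are hypothetical.

```python
import numpy as np

def align_to_reference(depth, ref, mask=None):
    """Least-squares scale/shift so that scale * depth + shift best matches ref."""
    m = np.ones(depth.shape, dtype=bool) if mask is None else mask
    scale, shift = np.polyfit(depth[m].ravel(), ref[m].ravel(), deg=1)
    return scale * depth + shift

def temporally_smooth(frames, alpha=0.6):
    """frames: iterable of (H, W) affine-invariant depth maps, one per video frame.

    Each frame is aligned to the previous smoothed output, then blended with it
    by an exponential moving average; alpha is the weight of the current frame.
    """
    out, prev = [], None
    for d in frames:
        d = np.asarray(d, dtype=np.float64)
        if prev is not None:
            d = align_to_reference(d, prev)
            d = alpha * d + (1.0 - alpha) * prev
        out.append(d)
        prev = d
    return out
```

Lower alpha gives smoother output but more ghosting on fast motion, since the moving average ignores where pixels moved; motion-compensated methods (e.g. warping the previous frame with optical flow before blending) handle that better at a higher cost.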
Hi @fuxiao0719, could you please share the code for Section 3.3 (3D reconstruction)? I'm wondering how the scale and shift parameters are calculated to obtain metric depth. Thanks!
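A common way to turn an affine-invariant depth prediction into metric depth, widely used in relative-depth evaluation, is to fit a scale s and shift t by least squares against whatever metric depth is available (ground truth, sparse SfM points, or sensor readings), then apply s * pred + t. The relative prediction alone carries no absolute scale, so some metric reference is required. This is a sketch of that standard fit, not necessarily the procedure in Section 3.3; the function name is hypothetical, and some pipelines fit in inverse-depth (disparity) space instead.

```python
import numpy as np

def fit_scale_shift(pred, target, mask):
    """Closed-form solution of min_{s,t} sum over mask of (s * pred + t - target)^2.

    pred:   (H, W) relative (affine-invariant) depth from the model.
    target: (H, W) metric depth, valid only where mask is True.
    """
    x = pred[mask].ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, target[mask].ravel(), rcond=None)
    return float(s), float(t)
```

Usage: s, t = fit_scale_shift(pred, gt, gt > 0), then metric = s * pred + t.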
Thank you for your excellent work!
I noticed that you used four large datasets (Hypersim, Replica, 3D Ken Burns, and Objaverse) and filtered them before training. I think this filtering step is crucial for the final training results, so I would like to ask whether you could release this part of the code or the filtered filename lists. This would help a lot.
Looking forward to your reply.
Best wishes!
Installation process:
git clone https://github.com/fuxiao0719/GeoWizard.git
cd GeoWizard
conda create -n geowizard python=3.9
conda activate geowizard
pip install -r requirements.txt
cd geowizard
Command I ran:
python run_infer.py \
--input_dir input/example \
--output_dir output \
--ensemble_size 3 \
--denoise_steps 10 \
--domain "indoor"
Errors complaining about missing dependencies:
(geowizard) ubuntu@129-146-73-36:~/A100/GeoWizard/geowizard$ python run_infer.py \
--input_dir input/example \
--output_dir output \
--ensemble_size 3 \
--denoise_steps 10 \
--domain "indoor"
Traceback (most recent call last):
File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 7, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
Traceback (most recent call last):
File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 10, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm'
Traceback (most recent call last):
File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 13, in <module>
import cv2
ModuleNotFoundError: No module named 'cv2'
Traceback (most recent call last):
File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 16, in <module>
from models.geowizard_pipeline import DepthNormalEstimationPipeline
File "/home/ubuntu/A100/GeoWizard/geowizard/models/geowizard_pipeline.py", line 10, in <module>
from diffusers import (
ModuleNotFoundError: No module named 'diffusers'
Traceback (most recent call last):
File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 16, in <module>
from models.geowizard_pipeline import DepthNormalEstimationPipeline
File "/home/ubuntu/A100/GeoWizard/geowizard/models/geowizard_pipeline.py", line 10, in <module>
from diffusers import (
File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/__init__.py", line 5, in <module>
from .utils import (
File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/utils/__init__.py", line 18, in <module>
from packaging import version
ModuleNotFoundError: No module named 'packaging'
I installed the following packages through conda and pip:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install tqdm
conda install conda-forge::opencv
conda install -c conda-forge diffusers
conda install conda-forge::packaging
But the final error message I got:
Traceback (most recent call last):
File "/home/ubuntu/A100/GeoWizard/geowizard/run_infer.py", line 16, in <module>
from models.geowizard_pipeline import DepthNormalEstimationPipeline
File "/home/ubuntu/A100/GeoWizard/geowizard/models/geowizard_pipeline.py", line 10, in <module>
from diffusers import (
File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/__init__.py", line 5, in <module>
from .utils import (
File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/utils/__init__.py", line 21, in <module>
from .constants import (
File "/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/diffusers/utils/constants.py", line 17, in <module>
from huggingface_hub.constants import HF_HOME
ImportError: cannot import name 'HF_HOME' from 'huggingface_hub.constants' (/home/ubuntu/anaconda3/envs/geowizard/lib/python3.9/site-packages/huggingface_hub/constants.py)
I probably won't be able to run the code successfully.
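Two hedged observations on the tracebacks above. numpy being missing right after pip install -r requirements.txt suggests (this is a guess) that pip installed into a different interpreter than the activated env's python; python -m pip install -r requirements.txt ties pip to that interpreter. The final ImportError says the installed diffusers expects a huggingface_hub that defines HF_HOME in huggingface_hub.constants, so the installed huggingface_hub is too old for it, a typical symptom of mixing conda-forge diffusers with pip-installed packages. The sketch below (package list is an assumption) prints which interpreter runs and which versions it actually sees, to compare against the pins in requirements.txt.

```python
import importlib.metadata as md
import sys

# Packages worth checking for this error; extend as needed (assumed list).
PACKAGES = ["torch", "numpy", "diffusers", "huggingface_hub", "transformers", "accelerate"]

def report_environment(packages=PACKAGES):
    """Return {package: installed version or None} for the current interpreter."""
    versions = {}
    for name in packages:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    print("python:", sys.executable)  # should point inside the geowizard env
    for name, ver in report_environment().items():
        print(f"{name:>16}: {ver or 'NOT INSTALLED'}")
```

If versions differ from requirements.txt, reinstalling the pinned versions with python -m pip in a fresh env is usually simpler than patching individual conda packages.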
Thank you for your excellent work!
I am really amazed by the quality of your paper and code.
I have a few questions regarding the dataset:
Again, thank you very much for your nice work.
I tried to use Stable Diffusion 2 from here: https://huggingface.co/stabilityai/stable-diffusion-2.
But I cannot initialize the model correctly:
Some weights of UNet2DConditionModel were not initialized from the model checkpoint at /home//GeoWizard_edit/sdv2 and are newly initialized: ['class_embedding.linear_1.weight', 'class_embedding.linear_2.bias', 'class_embedding.linear_1.bias', 'class_embedding.linear_2.weight']
Hi authors. Thank you for this amazing work!
I see in your paper that GeoWizard was trained with classifier-free guidance. Does this mean it was sometimes trained with the RGB input as a condition and sometimes unconditioned? Is it possible to generate normal or depth maps without an RGB input image?
Thank you!
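For reference, the standard classifier-free guidance recipe has two parts: during training, the condition is replaced by a null condition for a random fraction of samples, so one network learns both conditional and unconditional predictions; at sampling time, the two noise predictions are combined as eps_uncond + w * (eps_cond - eps_uncond). The sketch below shows only that generic recipe; using zeros as the null condition and p_uncond=0.1 are assumptions, and whether the released GeoWizard weights produce meaningful geometry from a null RGB condition is something only the authors can confirm.

```python
import numpy as np

def drop_condition(cond, p_uncond=0.1, rng=None):
    """Training-time dropout: zero each sample's condition with probability p_uncond.

    cond: (B, ...) batch of condition embeddings (e.g. encoded RGB latents).
    Zeros as the "null" condition is an assumption; learned null embeddings also exist.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(cond.shape[0]) >= p_uncond  # True = keep the condition
    return cond * keep.reshape((-1,) + (1,) * (cond.ndim - 1))

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Sampling-time guidance: move from the unconditional toward the conditional prediction."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With guidance_scale = 1 this reduces to the plain conditional prediction, and with 0 to the unconditional one, which is the "no RGB input" case in principle.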
How can I output the maps without colorization, so that they are usable?
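The colored images are visualizations; for usable data, one option is to write the raw predictions as lossless PNGs: depth linearly mapped to a 16-bit grayscale PNG (more gradations than 8-bit) and normals mapped from [-1, 1] to 8-bit RGB via (n + 1) / 2 * 255. This also tends to be much smaller than float32 .npy files. The sketch assumes you already have the depth as an (H, W) array and normals as an (H, W, 3) array; where those arrays live inside run_infer.py is not shown in this thread, and all function names here are hypothetical.

```python
import numpy as np
from PIL import Image

def depth_to_uint16(depth, d_min=None, d_max=None):
    """Linearly map depth to 0..65535 (0 = d_min, 65535 = d_max)."""
    d_min = float(np.min(depth)) if d_min is None else d_min
    d_max = float(np.max(depth)) if d_max is None else d_max
    scaled = (depth - d_min) / max(d_max - d_min, 1e-12)
    return np.round(np.clip(scaled, 0.0, 1.0) * 65535).astype(np.uint16)

def normals_to_uint8(normals):
    """Map unit normals from [-1, 1] to RGB bytes via (n + 1) / 2 * 255."""
    return np.round((np.clip(normals, -1.0, 1.0) + 1.0) * 0.5 * 255).astype(np.uint8)

def save_raw_maps(depth, normals, depth_path, normal_path):
    """Write depth as a 16-bit grayscale PNG and normals as an 8-bit RGB PNG."""
    Image.fromarray(depth_to_uint16(depth)).save(depth_path)
    Image.fromarray(normals_to_uint8(normals)).save(normal_path)
```

Two design notes: min/max normalization is per image, so for video, fixed d_min/d_max values avoid per-frame brightness jumps; and many 2D-to-3D tools expect near = white, in which case store 65535 - depth_to_uint16(depth). The normal axis convention (which direction is +x/+y) also differs between tools.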
Hello, thanks again for the nice work! I noticed that your documentation gives the default values of ensemble_size and denoise_steps as 3 and 10, respectively, while for academic purposes they are 10 and 50. Could you specify how much the performance differs between these two configurations? Inference is quite slow when ensemble_size and denoise_steps are set to 10 and 50. Specifically, when using GeoWizard's depth maps as prior information for computer vision tasks, which configuration would you recommend?
Hi, thank you for open-sourcing your great work.
In your paper, you mentioned that you use the least squares method to optimize the scale and shift.
Could you provide the code for this part, or give us some instructions about your implementation? I tried to implement it, but I found that obtaining normals from depth (using the cross product of neighboring points as the normal) is a non-linear process. Could you give us some hints?
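A useful observation for this problem: with pinhole back-projection X = d * K^-1 [u, v, 1]^T, scaling depth by s scales every 3D point by s, which leaves normals unchanged, and s * d + t = s * (d + t / s). So normals computed from depth depend only on the ratio t / s. One possible approach (not necessarily the paper's) is a 1-D search over that ratio to best match the predicted normals, followed by the closed-form linear fit for s with the ratio fixed. The sketch below computes normals from depth via cross products of neighboring back-projected points; the intrinsics fx, fy, cx, cy are assumed known, and the sign convention (normals facing the camera, x right, y down, z forward) is an assumption that may differ from GeoWizard's.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection of an (H, W) depth map to (H, W, 3) camera-space points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def normals_from_depth(depth, fx, fy, cx, cy, eps=1e-12):
    """Unit normals from forward differences of neighboring 3D points.

    Returns an (H-1, W-1, 3) array; normals point toward the camera (negative z).
    """
    p = backproject(depth, fx, fy, cx, cy)
    du = p[:-1, 1:] - p[:-1, :-1]  # step to the right neighbor
    dv = p[1:, :-1] - p[:-1, :-1]  # step to the lower neighbor
    n = np.cross(dv, du)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + eps
    return n
```

Because normals_from_depth(s * d + t, ...) equals normals_from_depth(d + t / s, ...), the non-linear part reduces to a single scalar, which is cheap to grid-search or optimize.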
Hello, thank you for doing wonderful work!
I'm wondering whether you are going to share the training code.
Hi,
I'm currently trying out different depth map generators, as I use them to convert 2D movies into 3D stereoscopy.
I went from MiDaS v3.1 to Depth Anything v1, and am now trying out Depth Anything v2.
I didn't use ZoeDepth, as it takes too long: a movie is generally around 130,000 to 190,000 frames to process.
How does GeoWizard compare to these? I only see heatmaps in the examples; can it generate depth maps (which are basically grayscale heatmaps, but with slightly fewer gradations)?
As this is for movies, it needs to handle different scenarios: indoor, outdoor, small and large objects/people, and special effects (explosions, electricity, fantasy projectiles, ...), which is where I notice most depth map generators fall short.
Looking at the installation, it obviously also installs things like Python, like the other tools do. Since I've already installed these, will it overwrite them or conflict with them?
Cheers.
Thanks for the nice work! I encountered some issues while running this project, which seem to be caused by pre-trained models missing from Hugging Face: "requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/lemonaddie/Geowizard/resolve/8e910b828c6f7ae4795d83e9c286cef3b7d95d28/unet_object/diffusion_pytorch_model.bin" Could you please provide a solution? Looking forward to your response!