fudan-generative-vision / champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Home Page: https://fudan-generative-vision.github.io/champ/

License: MIT License

Python 100.00%
human-animation image-animation video-generation


champ's Issues

VRAM required?

Thanks for the great work! What is the minimum VRAM required?

Commercial usage?

Are the checkpoints available for commercial usage?
Thank you for your work!

CUDA out of memory

Hi :) Champ is really inspiring work! During my experiments, Champ shows a high demand for memory and I cannot run the inference code on a 3090 due to out-of-memory errors. Is there any solution other than switching to an A10?

Torch not compiled with CUDA enabled

My machine is an RTX 4080 on Windows.
I installed all pretrained models and packages and run it in conda. When I use torch==2.0.1, it says "Torch not compiled with CUDA enabled".

  File "D:\Github\champ\inference.py", line 284, in <module>
    main(cfg)
  File "D:\Github\champ\inference.py", line 162, in main
    ).to(dtype=weight_dtype, device="cuda")
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\transformers\modeling_utils.py", line 1902, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\cuda\__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I also tried `conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia`; with that install it says:

`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 40

How can I solve this?
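This assertion usually means a CPU-only PyTorch wheel is installed. A minimal diagnostic sketch (generic PyTorch, not Champ code) to confirm which build is actually installed:

```python
# Generic check for a CUDA-enabled PyTorch build; nothing Champ-specific here.
import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.version.cuda)         # None on a CPU-only build, e.g. "12.1" otherwise
print(torch.cuda.is_available())  # must be True for .to(device="cuda") to work
```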

What model is to be used for extraction of the semantic segmentation map?

Thank you for your awesome work released in open source! I really appreciate the impact this paper and code will bring to the community.

I would like to test this model on in-the-wild videos, which requires preprocessing.
I am planning to use the following:

  • Depth: Depth Anything with greyscale
  • Dwpose: official dwpose repository
  • Normal: ICON normal map
  • Mask: unsure whether it is necessary, as "inference.yaml" does not include mask in its guidance_types by default

Meanwhile, I am unsure what model I should use for the semantic segmentation map. Please guide me if there is any model that is suitable for this data preprocessing stage.

I cannot find the file "smpl_rendering.blend"

Hello,
Thank you for your great work; it is solid and meaningful.
I am interested in your four conditions (depth, normal, seg_map, pose) rendered from the SMPL model; this is also meaningful for my current work.
I followed your old version of the code (https://github.com/Leoooo333/champ/tree/master?tab=readme-ov-file). In the SMPL section, I finished Fit SMPL and Transfer SMPL successfully.
THE QUESTION concerns the command "blender smpl_rendering.blend --background --python rendering.py --driving_path test_smpl/transfer_result/smpl_results --reference_path test_smpl/reference_imgs/images/ref.png" in the RENDERING section: I cannot find the file smpl_rendering.blend in the code.
I am confused. How can I find this file to finish my rendering? Thank you!

Parametric shape alignment

Hi, thanks for sharing this work -- it looks great!
The paper mentions parametric shape alignment, which sounds intriguing. However, I'm not seeing any reference to those models in the codebase. Do you plan to release the inference code/models for that as well?

CUDA out of memory, feature request

This looks really wonderful. Many thanks for sharing with the community.
I nearly gave up after looking at the system and graphics card requirements, but thought there was no harm in trying.

So I tested on Windows 11 + RTX 4060 8 GB VRAM + 16 GB RAM, and it worked.
The change I made was to keep only 20 frames inside the motion folder and delete the rest from all folders within it. [EDIT] Tried with 40 motion frames and it also worked without complaining. :)
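For anyone who wants to do the same trimming without deleting files by hand, a small sketch is below; the guidance folder names are assumed from the example data layout, and this is not functionality provided by the repository.

```python
# Hedged sketch: keep only the first `keep` frames in each guidance folder of
# one motion directory. Folder names are assumptions from the example data;
# files are deleted permanently, so run this on a copy.
import os

def trim_motion(motion_dir: str, keep: int = 20) -> None:
    for guidance in ("depth", "dwpose", "normal", "semantic_map", "mask"):
        folder = os.path.join(motion_dir, guidance)
        if not os.path.isdir(folder):
            continue
        for frame in sorted(os.listdir(folder))[keep:]:
            os.remove(os.path.join(folder, frame))

# trim_motion("example_data/motions/<motion-name>", keep=20)
```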

  1. Requesting a batch-processing feature, and the ability to specify the number of frames in configs/inference.yaml.
  2. Correct the system requirements to also include Windows, along with Ubuntu 20.04.

Here is the output
https://github.com/fudan-generative-vision/champ/assets/2102186/b3fc3b93-a5cc-4ef7-94d3-22cd4e5ed9f3

V100 runs out of VRAM

Running on a V100 (32 GB):
CUDA_VISIBLE_DEVICES=6 python inference.py --config configs/inference.yaml

It runs out of VRAM as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.95 GiB (GPU 0; 31.75 GiB total capacity; 24.84 GiB already allocated; 1.96 GiB free; 28.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How much VRAM is needed to run this, or what should I modify?
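For what it's worth, the error message itself suggests trying max_split_size_mb. Setting it is a generic PyTorch allocator tweak, not a Champ-specific fix, and it only helps with fragmentation rather than an outright shortage of VRAM. A sketch:

```python
# Set the allocator option suggested by the error message before torch is
# imported anywhere; the 128 MB value is an arbitrary starting point.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import only after the environment variable is in place
```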

Blender script for animations

Hi, your script is awesome! I have a question: when will you make the Blender file available for generating new animations from video? At the moment, I can only use the example data for animation.

Why do you choose to use Blender to render condition images?

Hi dear author @Leoooo333,

Thanks for releasing your code and providing guidance on preparing condition images to run with your method!

I noticed that you use pyrender (as in HMR2) for semantic condition rendering and Blender for rendering the rest of the conditions. I am wondering why you made this choice -- is it intentional? Will there be any issue we should keep in mind if we only use, say, pyrender to render all conditions?

Thanks!
Hang

Some doubts after testing Champ

Without data preprocessing, a random picture is used as ref_image with the provided motion_6 for inference. The result is as follows. The consistency of the character's movements is very good, but the character's face is badly damaged. This is presumably due to the lack of preprocessing: the human body in ref_image and the figure in the motion data are not aligned.

grid_wguidance.mp4

Because the paper mentions that Champ was tested on the UBC fashion dataset, in order to test the data preprocessing pipeline, the following video from the UBC fashion dataset was selected as the guidance motion.

91D23ZVV6NS.mp4

Based on the data preprocessing doc, after completing the environment setup, the required depth, normal, semantic_map and dwpose features can be successfully obtained from the motion guidance video. But I encountered a problem: the resulting semantic_map was missing two frames for some reason. Have you encountered this during data preprocessing? Since the 14 s motion guidance video has 422 frames in total, the difference between adjacent frames is small, so for the two missing semantic_map frames I simply copied the previous frame as a substitute (a sketch of this workaround is shown below).
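For reference, the copy-the-previous-frame workaround described above can be scripted; this is only a sketch, and the zero-padded frame naming is an assumption about the preprocessed data.

```python
# Hedged sketch: fill gaps in a guidance folder by copying the closest earlier
# frame. Assumes frames are named 0000.png, 0001.png, ... and frame 0 exists.
import os
import shutil

def fill_missing_frames(folder: str, num_frames: int) -> None:
    for idx in range(1, num_frames):
        path = os.path.join(folder, f"{idx:04d}.png")
        if not os.path.exists(path):
            prev = os.path.join(folder, f"{idx - 1:04d}.png")
            shutil.copyfile(prev, path)

# fill_missing_frames("<processed_motion>/semantic_map", num_frames=422)
```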

In the figure below, the left side is the first frame of the guidance motion video (960×1254), and the right side is the reference image (451×677). The middle is the depth map of the first frame after data preprocessing; you can see that its size is aligned to 451×677, and the human body parts are also well aligned.
[figure: guidance frame, aligned depth map, and reference image]

However, using data preprocessed from the above reference image and guidance motion video for inference, the result is very bad, as shown below: there is a lot of jitter in the video, and there are serious distortions in the characters' faces and bodies.

animation.mp4

Can somebody tell me the reason for the poor performance or provide some suggestions for improvement? Thanks

Strong image distortion

Hi

Thank you very much for your great work.

I have tried your model using reference images of my own and the end result is often not visually pleasing at all (face distorted, image proportions changed, ...).

I would be grateful if you could provide any potential constraints regarding the source image, for instance:

  • any required height/width ratio for the image
  • proportion of the head versus the rest of the image
  • location of the head and body in the image (horizontally centered ?)
  • amount of body visible and the pose of the body
  • max dimensions of the image

Many thanks in advance

How to get depth images

Wonderful Work!!
I really appreciate that you released your model and testing code. I have a question about the depth images: how do you obtain them?
Thank you!

The output video is messy and disorganized

I used the configs/inference.yaml file for testing, and my graphics memory is only 20 GB, so I deleted a lot of the action images of motion-09, leaving only the first 100 images, with an output resolution of 512x512. But the output video had serious errors; what is the reason for this? I tested a total of 4 images and motions in example_data, but none of them yielded correct results.

Is it possible to use a customized base model?

I would like to run inference using a base model other than SD 1.5, such as majicMIX-realistic on Hugging Face.

I faced a problem running it with another base model. Simply changing cfg.base_model_path to majicMIX-realistic does not work:
denoising_unet/reference_unet.load_state_dict( ... ) at lines 200-213 resets the UNet to the base model.
When I nullify lines 200-213, what I obtain as a result is a basic grey noise image.

I would like to know whether it is possible to use a .safetensors checkpoint from CivitAI to change the base model.
Or are the denoising_unet.pth and reference_unet.pth in the provided checkpoint specialized for their own task, which makes other base models unable to function?

If there is a method I can easily implement to use another base model, please guide me. Thank you!
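Not an answer from the authors, but one thing that may be worth trying: recent diffusers versions can convert a single-file .safetensors checkpoint into the folder layout that cfg.base_model_path appears to expect. This is an untested sketch with a hypothetical file name; whether Champ's denoising_unet.pth / reference_unet.pth remain compatible with a different SD 1.5 derivative is exactly the open question above.

```python
# Hedged sketch: convert a CivitAI-style single-file SD 1.5 checkpoint into a
# diffusers directory. The checkpoint file name here is hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "majicmixRealistic.safetensors",
    torch_dtype=torch.float16,
)
pipe.save_pretrained("pretrained_models/majicmix-realistic")
# Then point cfg.base_model_path at "pretrained_models/majicmix-realistic".
```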

Video flickers severely on my own data

Hi, thank you very much for your great work. It's really awesome!

I successfully ran the entire project on my own dataset, but the generated results seem to flicker much more severely compared to the example data. Is there a way to stabilize the results like the examples do?
The tools I used are as follows:

0x1. To ensure the stability of the running results, I deliberately resized the ref_image and motion_data to the same dimensions. The ref_image was regenerated based on the pose from the motion data.
0x2. I obtained complete motion data based on the project at https://github.com/kijai/ComfyUI-champWrapper, and then imported the data into CHAMP to run the process:
1.1 DSINE Normal Map to obtain normal data
1.2 DWpose Estimator to obtain dwpose data
1.3 Depth Anything to obtain depth data
1.4 DensePose Estimator to obtain semantic_map data
0x3. The motion data and ref image I used are attached.
Thank you very much.

data.zip

Memory Consumption

Hello,

This looks like an excellent piece of work - thank you for releasing openly with models available!

Is there any means by which we can reduce VRAM usage, for those of us who don't have an A100? :)

Cheers.
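Not an official answer, but a couple of generic PyTorch/diffusers ideas may be worth experimenting with; whether Champ's custom pipeline exposes the diffusers hooks is an assumption, so treat this as a sketch rather than a supported option.

```python
# Untested, generic memory-saving ideas; none of this is documented Champ API.
import torch

weight_dtype = torch.float16  # fp16 weights roughly halve memory vs fp32

# If the UNets follow the standard diffusers ModelMixin interface (assumption):
# denoising_unet.enable_xformers_memory_efficient_attention()
# reference_unet.enable_xformers_memory_efficient_attention()

# Measure the peak usage of a given configuration to guide frame/resolution cuts:
torch.cuda.reset_peak_memory_stats()
# ... run inference here ...
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```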

Thumbs up for the project! The whole pipeline for producing the Motion data is technically very impressive!

✌✌✌ Using the 06 sample, which has the fewest frames, I got it running...
Starting up, please wait patiently......
03/26/2024 15:38:35 - INFO - root - Running inference ...
03/26/2024 15:38:39 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
03/26/2024 15:38:46 - INFO - models.unet_3d - Load motion module params from pretrained_models\champ\motion_module.pth
D:\AITest\Champ\runtime\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
03/26/2024 15:38:49 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
90%|████████████████████████████████████████████████████████████████████████ | 18/20 [18:10<02:01, 60.65s/it]

Some problems in data_process.md

When I followed data_process.md to set up the environment and download the models, I ran into some questions about where the models should go. The original text says "download our Pose model dw-ll_ucoco_384.onnx and Det model yolox_l.onnx, then put them into Champ/annotator/ckpts/". Is "Champ" the root directory of the project, or "pretrained_models/champ", or should the models be placed in a not-yet-released "annotator/ckpts" directory?

As shown in the red boxes in the screenshots below, "annotator" and "hmr2" don't look like third-party libraries, but there are no corresponding directories in the repo. Have they not been released yet?
[screenshots]

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

(champ) meme@ubuntugpu:~/champ$ /mnt/data/meme/.conda/envs/champ/bin/python  inference.py --config configs/inference.yaml
[2024-04-06 14:13:13,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-04-06 14:13:14.193352: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-06 14:13:14.243075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-06 14:13:15.100858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/meme/.local/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
04/06/2024 14:13:16 - INFO - root - Running inference ...
04/06/2024 14:13:29 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
04/06/2024 14:13:56 - INFO - models.unet_3d - Load motion module params from pretrained_models/champ/motion_module.pth
04/06/2024 14:14:14 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
  File "/mnt/data/meme/champ/inference.py", line 312, in <module>
    main(cfg)
  File "/mnt/data/meme/champ/inference.py", line 260, in main
    result_video_tensor = inference(
  File "/mnt/data/meme/champ/inference.py", line 134, in inference
    video = pipeline(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/meme/champ/pipelines/pipeline_aggregation.py", line 387, in __call__
    clip_image_embeds = self.image_encoder(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1310, in forward
    vision_outputs = self.vision_model(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Where is model_config.yaml?

Traceback (most recent call last):
  File "/workspace/champ/4D-Humans/inference_smpl.py", line 74, in <module>
    model, model_cfg = load_hmr2(DEFAULT_CHECKPOINT)
  File "/workspace/champ/4D-Humans/hmr2/models/__init__.py", line 72, in load_hmr2
    model_cfg = get_config(model_cfg, update_cachedir=True)
  File "/workspace/champ/4D-Humans/hmr2/configs/__init__.py", line 103, in get_config
    cfg.merge_from_file(config_file)
  File "/opt/conda/lib/python3.10/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/4DHumans/logs/train/multiruns/hmr2/0/model_config.yaml'

No matter how hard I look, I can't find the relevant yaml file. Is there anything I missed?

ComfyUI version: the size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

When I use the ComfyUI version, I get a tensor error. I really don't know what is going on; can someone help me? The workflow is attached below.

Error occurred when executing champ_sampler:

The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

ERROR:root:Traceback (most recent call last):
  File "/root/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/root/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/root/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 418, in process
    result_video_tensor = inference(
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 471, in inference
    video = pipeline(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/pipelines/pipeline_aggregation.py", line 550, in __call__
    pred = self.denoising_unet(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/models/unet_3d.py", line 484, in forward
    sample = sample + guidance_fea
RuntimeError: The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4
Champ_replace_person_01 (1).json

can't open file 'C:\\sd1\\champ\\inference_smpl.py'

Hello,

I am trying to SMPL & Rendering for own video.

Followed all the steps but it comes up with an error

(venv) C:\sd1\champ>python inference_smpl.py  --reference_imgs_folder test_smpl/reference_imgs --driving_videos_folder test_smpl/driving_videos --device 1
C:\Program Files\Python310\python.exe: can't open file 'C:\\sd1\\champ\\inference_smpl.py': [Errno 2] No such file or directory

I can't locate this file on GitHub either.

Am I supposed to manually create the folders for the models?

./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

I don't have folders called pretrained_models, champ (unless that means the main app folder, which is called champ), image_encoder, sd-vae-ft-mse, or stable-diffusion-v1-5.

Inference time

Hi, I'm grateful for your excellent work! I've implemented the code as per the instructions, and it runs without errors. However, the inference time is slow, approximately 176 seconds per iteration. I tested it on an 80G A100 GPU, and it seems to be using around 71G of GPU memory. Is this normal?

How to obtain normal map

According to your previous advice, I have tried SMPLer-X and Depth Anything to obtain the 3D human body and the corresponding depth images. I also have a question about how to obtain the normal map. Could you please tell me which model you use?

Thank you!

2 Issues regarding example data

  1. In example_data/motions/motion-0X:
    There are extra output.mp4 files which should not be in the dataset.
    This causes an error at inference.py line 78, where the PIL.Image module tries to open a video file.
    I fixed this locally by wrapping the load in a try/except after line 73, so non-image files are skipped (see the sketch after this list):
    try: Image.open(guidance_image_path).convert("RGB")
    except: continue

  2. In example_data/motions/motion-07:
    There is an extra 0389_all.png in motion-07/semantic_map, while the depth, dwpose, mask and normal folders do not have a 0389_all.png.
    This causes an assertion error at inference.py line 87.
    I manually fixed it by deleting the file from my folder, but it would be great if you added code that automatically skips a frame when it does not match the other guidance folders.
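Building on the two fixes above, a small sketch of an automatic filter is shown here; the folder layout and file naming are assumptions based on the example data, and this is not code from the repository.

```python
# Hedged sketch: keep only frames that are valid images and that exist in every
# guidance folder, so stray files (output.mp4) or an unmatched frame
# (0389_all.png) no longer break inference.
import os

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}

def common_frames(motion_dir, guidance_types):
    per_type = []
    for gtype in guidance_types:
        folder = os.path.join(motion_dir, gtype)
        names = {
            os.path.splitext(name)[0]
            for name in os.listdir(folder)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        }
        per_type.append(names)
    return sorted(set.intersection(*per_type))

# frames = common_frames("example_data/motions/motion-07",
#                        ["depth", "dwpose", "mask", "normal", "semantic_map"])
```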

The code for computing PSNR in the DisCo repository is wrong

The DisCo code for computing PSNR, at the line linked below, is wrong:
https://github.com/Wangt-CN/DisCo/blob/8538889c9ee9edd8dd43ffee182d1a91ce7a9828/tool/metrics/ssim_l1_lpips_psnr.py#L13.


As pointed out in Wangt-CN/DisCo#86, the correct code is mse = np.mean((original/1.0 - compressed/1.0) ** 2) instead of mse = np.mean((original - compressed) ** 2), because the original and compressed images are uint8 in their code, and (original - compressed) ** 2 causes numerical overflow.

If you use their PSNR evaluation code, please update your results.
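For concreteness, a minimal PSNR sketch applying the fix quoted above, casting to float before subtracting so uint8 frames cannot wrap around:

```python
# Minimal PSNR helper following the corrected DisCo formula discussed above.
import numpy as np

def psnr(original: np.ndarray, compressed: np.ndarray, max_val: float = 255.0) -> float:
    diff = original.astype(np.float64) - compressed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 20 * np.log10(max_val / np.sqrt(mse))
```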

Salute your open source spirit!

The earth and the sky will praise your generosity, and countless algorithm engineers will praise you, the selfless devotee, the great architect.

The resulting video flickers badly

Thank you for your work! I have followed your instructions to complete the entire process, but the generated video flickers very badly. Is this caused by the SMPL sequence not being smoothed? Would smoothing the SMPL results fully solve this problem?

grid_wguidance.mp4
