fudan-generative-vision / champ

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Home Page: https://fudan-generative-vision.github.io/champ/

License: MIT License

Python 100.00%
human-animation image-animation video-generation


champ's Issues

VRAM required?

Thanks for the great work! What is the minimum VRAM required?

Commercial usage?

Are the checkpoints available for commercial usage?
Thank you for your work!

CUDA out of memory

Hi :) Champ is really inspiring work! During my experiments, Champ shows a high demand for memory and I cannot run the inference code on a 3090 due to out-of-memory errors. Is there any solution other than switching to an A10?

Torch not compiled with CUDA enabled

My machine is an RTX 4080 on Windows.
I installed all pretrained models and packages and run it in conda. When I use torch==2.0.1, it says "Torch not compiled with CUDA enabled".

  File "D:\Github\champ\inference.py", line 284, in <module>
    main(cfg)
  File "D:\Github\champ\inference.py", line 162, in main
    ).to(dtype=weight_dtype, device="cuda")
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\transformers\modeling_utils.py", line 1902, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Charlie\miniconda3\Lib\site-packages\torch\cuda\__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I also tried `conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia`; with that install it says:

`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 40

How can I solve this?
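This assertion usually means a CPU-only PyTorch wheel is installed. A minimal diagnostic sketch (generic PyTorch, not Champ code) to confirm which build is actually installed:

```python
# Generic check for a CUDA-enabled PyTorch build; nothing Champ-specific here.
import torch

print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.version.cuda)         # None on a CPU-only build, e.g. "12.1" otherwise
print(torch.cuda.is_available())  # must be True for .to(device="cuda") to work
```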

What model is to be used for extraction of the semantic segmentation map?

Thank you for your awesome work released in open source! I really appreciate the impact this paper and code will bring to the community.

I would like to test this model on in-the-wild videos, which requires preprocessing.
I am planning to use the following:

  • Depth: Depth Anything with greyscale
  • Dwpose: official dwpose repository
  • Normal: ICON normal map
  • Mask: unsure whether it is necessary, as "inference.yaml" does not include mask in its guidance_types by default

Meanwhile, I am unsure what model I should use for the semantic segmentation map. Please guide me if there is any model that is suitable for this data preprocessing stage.

I cannot find the file "smpl_rendering.blend"

Hello,
Thank you for your great work; it is solid and meaningful.
I am interested in your four conditions (depth, normal, seg_map, pose) rendered from the SMPL model; this is also meaningful for my current work.
I followed your old version of the code (https://github.com/Leoooo333/champ/tree/master?tab=readme-ov-file). In the SMPL section, I finished Fit SMPL and Transfer SMPL successfully.
THE QUESTION concerns the command "blender smpl_rendering.blend --background --python rendering.py --driving_path test_smpl/transfer_result/smpl_results --reference_path test_smpl/reference_imgs/images/ref.png" in the RENDERING section: I cannot find the file smpl_rendering.blend in the code.
I am confused. How can I find this file to finish my rendering? Thank you!

Parametric shape alignment

Hi, thanks for sharing this work -- it looks great!
The paper mentions parametric shape alignment, which sounds intriguing. However, I'm not seeing any reference to those models in the codebase. Do you plan to release the inference code/models for that as well?

CUDA out of memory, feature request

This looks really wonderful. Many thanks for sharing with the community.
I nearly gave up after looking at the system and graphics card requirements, but thought there was no harm in trying.

So I tested on Windows 11 + RTX 4060 8 GB VRAM + 16 GB RAM, and it worked.
The change I made was to keep only 20 frames inside the motion folder and delete the rest from all folders within it. [EDIT] Tried with 40 motion frames and it also worked without complaining. :)
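For anyone who wants to do the same trimming without deleting files by hand, a small sketch is below; the guidance folder names are assumed from the example data layout, and this is not functionality provided by the repository.

```python
# Hedged sketch: keep only the first `keep` frames in each guidance folder of
# one motion directory. Folder names are assumptions from the example data;
# files are deleted permanently, so run this on a copy.
import os

def trim_motion(motion_dir: str, keep: int = 20) -> None:
    for guidance in ("depth", "dwpose", "normal", "semantic_map", "mask"):
        folder = os.path.join(motion_dir, guidance)
        if not os.path.isdir(folder):
            continue
        for frame in sorted(os.listdir(folder))[keep:]:
            os.remove(os.path.join(folder, frame))

# trim_motion("example_data/motions/<motion-name>", keep=20)
```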

  1. Requesting a batch-processing feature, and the ability to specify the number of frames in configs/inference.yaml.
  2. Correct the system requirements to also include Windows, along with Ubuntu 20.04.

Here is the output
https://github.com/fudan-generative-vision/champ/assets/2102186/b3fc3b93-a5cc-4ef7-94d3-22cd4e5ed9f3

V100 runs out of VRAM

Running on a V100 (32 GB):
CUDA_VISIBLE_DEVICES=6 python inference.py --config configs/inference.yaml

It runs out of VRAM as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.95 GiB (GPU 0; 31.75 GiB total capacity; 24.84 GiB already allocated; 1.96 GiB free; 28.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

How much VRAM is needed to run this, or what should I modify?
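For what it's worth, the error message itself suggests trying max_split_size_mb. Setting it is a generic PyTorch allocator tweak, not a Champ-specific fix, and it only helps with fragmentation rather than an outright shortage of VRAM. A sketch:

```python
# Set the allocator option suggested by the error message before torch is
# imported anywhere; the 128 MB value is an arbitrary starting point.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import only after the environment variable is in place
```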

Blender script for animations

Hi, your script is awesome! I have a question: when will you make the Blender file available for generating new animations from video? At the moment, I can only use the example data for animation.

Why do you choose to use Blender to render condition images?

Hi dear author @Leoooo333,

Thanks for releasing your code and providing guidance on preparing condition images to run with your method!

I noticed that you use pyrender (as in HMR2) for semantic condition rendering and Blender for rendering the rest of the conditions. I am wondering why you made this choice -- is it intentional? Will there be any issue we should keep in mind if we only use, say, pyrender to render all conditions?

Thanks!
Hang

Some doubts after testing Champ

Without data preprocessing, a random picture is used as ref_image with the provided motion_6 for inference. The result is as follows. The consistency of the character's movements is very good, but the character's face is badly damaged. This is presumably due to the lack of preprocessing: the human body in ref_image and the figure in the motion data are not aligned.

grid_wguidance.mp4

Because the paper mentions that Champ was tested on the UBC fashion dataset, in order to test the data preprocessing pipeline, the following video from the UBC fashion dataset was selected as the guidance motion.

91D23ZVV6NS.mp4

Based on the data preprocessing doc, after completing the environment setup, the required depth, normal, semantic_map and dwpose features can be successfully obtained from the motion guidance video. But I encountered a problem: the resulting semantic_map was missing two frames for some reason. Have you encountered this during data preprocessing? Since the 14 s motion guidance video has 422 frames in total, the difference between adjacent frames is small, so for the two missing semantic_map frames I simply copied the previous frame as a substitute (a sketch of this workaround is shown below).
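For reference, the copy-the-previous-frame workaround described above can be scripted; this is only a sketch, and the zero-padded frame naming is an assumption about the preprocessed data.

```python
# Hedged sketch: fill gaps in a guidance folder by copying the closest earlier
# frame. Assumes frames are named 0000.png, 0001.png, ... and frame 0 exists.
import os
import shutil

def fill_missing_frames(folder: str, num_frames: int) -> None:
    for idx in range(1, num_frames):
        path = os.path.join(folder, f"{idx:04d}.png")
        if not os.path.exists(path):
            prev = os.path.join(folder, f"{idx - 1:04d}.png")
            shutil.copyfile(prev, path)

# fill_missing_frames("<processed_motion>/semantic_map", num_frames=422)
```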

In the figure below, the left side is the first frame of the guidance motion video (960×1254), and the right side is the reference image (451×677). The middle is the depth map of the first frame after data preprocessing; you can see that its size is aligned to 451×677, and the human body parts are also well aligned.
[figure: guidance frame, aligned depth map, and reference image]

However, using data preprocessed from the above reference image and guidance motion video for inference, the result is very bad, as shown below: there is a lot of jitter in the video, and there are serious distortions in the characters' faces and bodies.

animation.mp4

Can somebody tell me the reason for the poor performance or provide some suggestions for improvement? Thanks

Strong image distortion

Hi

Thank you very much for your great work.

I have tried your model using reference images of my own and the end result is often not visually pleasing at all (face distorted, image proportions changed, ...).

I would be grateful if you could provide any potential constraints regarding the source image, for instance:

  • any required height/width ratio for the image
  • proportion of the head versus the rest of the image
  • location of the head and body in the image (horizontally centered ?)
  • amount of body visible and the pose of the body
  • max dimensions of the image

Many thanks in advance

How to get depth images

Wonderful Work!!
I really appreciate that you released your model and testing code. I have a question about the depth images: how do you obtain them?
Thank you!

The output video is messy and disorganized

I used the configs/inference.yaml file for testing, and my graphics memory is only 20 GB, so I deleted a lot of the action images of motion-09, leaving only the first 100 images, with an output resolution of 512x512. But the output video had serious errors; what is the reason for this? I tested a total of 4 images and motions in example_data, but none of them yielded correct results.

Is it possible to use a customized base model?

I would like to run inference using a base model other than SD 1.5, such as majicMIX-realistic on Hugging Face.

I faced a problem running it with another base model. Simply changing cfg.base_model_path to majicMIX-realistic does not work:
denoising_unet/reference_unet.load_state_dict( ... ) at lines 200-213 resets the UNet to the base model.
When I nullify lines 200-213, what I obtain as a result is a basic grey noise image.

I would like to know whether it is possible to use a .safetensors checkpoint from CivitAI to change the base model.
Or are the denoising_unet.pth and reference_unet.pth in the provided checkpoint specialized for their own task, which makes other base models unable to function?

If there is a method I can easily implement to use another base model, please guide me. Thank you!
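Not an answer from the authors, but one thing that may be worth trying: recent diffusers versions can convert a single-file .safetensors checkpoint into the folder layout that cfg.base_model_path appears to expect. This is an untested sketch with a hypothetical file name; whether Champ's denoising_unet.pth / reference_unet.pth remain compatible with a different SD 1.5 derivative is exactly the open question above.

```python
# Hedged sketch: convert a CivitAI-style single-file SD 1.5 checkpoint into a
# diffusers directory. The checkpoint file name here is hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "majicmixRealistic.safetensors",
    torch_dtype=torch.float16,
)
pipe.save_pretrained("pretrained_models/majicmix-realistic")
# Then point cfg.base_model_path at "pretrained_models/majicmix-realistic".
```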

Video flickers severely on my own data

Hi, thank you very much for your great work. It's really awesome!

I successfully ran the entire project on my own dataset, but the generated results seem to flicker much more severely compared to the example data. Is there a way to stabilize the results like the examples do?
The tools I used are as follows:

0x1. To ensure the stability of the running results, I deliberately resized the ref_image and motion_data to the same dimensions. The ref_image was regenerated based on the pose from the motion data.
0x2. I obtained complete motion data based on the project at https://github.com/kijai/ComfyUI-champWrapper, and then imported the data into CHAMP to run the process:
1.1 DSINE Normal Map to obtain normal data
1.2 DWpose Estimator to obtain dwpose data
1.3 Depth Anything to obtain depth data
1.4 DensePose Estimator to obtain semantic_map data
0x3. The motion data and ref image I used are attached.
Thank you very much.

data.zip

Memory Consumption

Hello,

This looks like an excellent piece of work - thank you for releasing openly with models available!

Is there any means by which we can reduce VRAM usage, for those of us who don't have an A100? :)

Cheers.
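Not an official answer, but a couple of generic PyTorch/diffusers ideas may be worth experimenting with; whether Champ's custom pipeline exposes the diffusers hooks is an assumption, so treat this as a sketch rather than a supported option.

```python
# Untested, generic memory-saving ideas; none of this is documented Champ API.
import torch

weight_dtype = torch.float16  # fp16 weights roughly halve memory vs fp32

# If the UNets follow the standard diffusers ModelMixin interface (assumption):
# denoising_unet.enable_xformers_memory_efficient_attention()
# reference_unet.enable_xformers_memory_efficient_attention()

# Measure the peak usage of a given configuration to guide frame/resolution cuts:
torch.cuda.reset_peak_memory_stats()
# ... run inference here ...
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```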

Thumbs up for the project! The whole pipeline for producing the Motion data is technically very impressive!

✌✌✌ Using the 06 sample, which has the fewest frames, I got it running...
Starting up, please wait patiently......
03/26/2024 15:38:35 - INFO - root - Running inference ...
03/26/2024 15:38:39 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models\stable-diffusion-v1-5\unet ...
03/26/2024 15:38:46 - INFO - models.unet_3d - Load motion module params from pretrained_models\champ\motion_module.pth
D:\AITest\Champ\runtime\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
03/26/2024 15:38:49 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight']
90%|████████████████████████████████████████████████████████████████████████ | 18/20 [18:10<02:01, 60.65s/it]

Some problems in data_process.md

When I followed data_process.md to set up the environment and download the models, I ran into some questions about where the models should go. The original text says "download our Pose model dw-ll_ucoco_384.onnx and Det model yolox_l.onnx, then put them into Champ/annotator/ckpts/". Is "Champ" the root directory of the project, or "pretrained_models/champ", or should the models be placed in a not-yet-released "annotator/ckpts" directory?

As shown in the red boxes in the screenshots below, "annotator" and "hmr2" don't look like third-party libraries, but there are no corresponding directories in the repo. Have they not been released yet?
[screenshots]

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

(champ) meme@ubuntugpu:~/champ$ /mnt/data/meme/.conda/envs/champ/bin/python  inference.py --config configs/inference.yaml
[2024-04-06 14:13:13,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-04-06 14:13:14.193352: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-04-06 14:13:14.243075: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-06 14:13:15.100858: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/mnt/data/meme/.local/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
04/06/2024 14:13:16 - INFO - root - Running inference ...
04/06/2024 14:13:29 - INFO - models.unet_3d - loaded temporal unet's pretrained weights from pretrained_models/stable-diffusion-v1-5/unet ...
04/06/2024 14:13:56 - INFO - models.unet_3d - Load motion module params from pretrained_models/champ/motion_module.pth
04/06/2024 14:14:14 - INFO - models.unet_3d - Loaded 453.20928M-parameter motion module
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel: 
 ['conv_norm_out.weight, conv_norm_out.bias, conv_out.weight, conv_out.bias']
Traceback (most recent call last):
  File "/mnt/data/meme/champ/inference.py", line 312, in <module>
    main(cfg)
  File "/mnt/data/meme/champ/inference.py", line 260, in main
    result_video_tensor = inference(
  File "/mnt/data/meme/champ/inference.py", line 134, in inference
    video = pipeline(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/data/meme/champ/pipelines/pipeline_aggregation.py", line 387, in __call__
    clip_image_embeds = self.image_encoder(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 1310, in forward
    vision_outputs = self.vision_model(
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 865, in forward
    hidden_states = self.embeddings(pixel_values)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 195, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/data/meme/.local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Where is model_config.yaml?

Traceback (most recent call last):
  File "/workspace/champ/4D-Humans/inference_smpl.py", line 74, in <module>
    model, model_cfg = load_hmr2(DEFAULT_CHECKPOINT)
  File "/workspace/champ/4D-Humans/hmr2/models/__init__.py", line 72, in load_hmr2
    model_cfg = get_config(model_cfg, update_cachedir=True)
  File "/workspace/champ/4D-Humans/hmr2/configs/__init__.py", line 103, in get_config
    cfg.merge_from_file(config_file)
  File "/opt/conda/lib/python3.10/site-packages/yacs/config.py", line 211, in merge_from_file
    with open(cfg_filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/4DHumans/logs/train/multiruns/hmr2/0/model_config.yaml'

No matter how hard I look, I can't find the relevant yaml file. Is there anything I missed?

ComfyUI version: the size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

When I use the ComfyUI version, I get a tensor error. I really don't know what is going on; can someone help me? The workflow is attached below.

Error occurred when executing champ_sampler:

The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4

ERROR:root:Traceback (most recent call last):
  File "/root/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/root/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/root/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 418, in process
    result_video_tensor = inference(
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/nodes.py", line 471, in inference
    video = pipeline(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/pipelines/pipeline_aggregation.py", line 550, in __call__
    pred = self.denoising_unet(
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/ComfyUI/custom_nodes/ComfyUI-champWrapper/models/unet_3d.py", line 484, in forward
    sample = sample + guidance_fea
RuntimeError: The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 4
Champ_replace_person_01 (1).json

can't open file 'C:\\sd1\\champ\\inference_smpl.py'

Hello,

I am trying to SMPL & Rendering for own video.

Followed all the steps but it comes up with an error

(venv) C:\sd1\champ>python inference_smpl.py  --reference_imgs_folder test_smpl/reference_imgs --driving_videos_folder test_smpl/driving_videos --device 1
C:\Program Files\Python310\python.exe: can't open file 'C:\\sd1\\champ\\inference_smpl.py': [Errno 2] No such file or directory

I can't locate this file on GitHub either.

Am I supposed to manually create the folders for the models?

./pretrained_models/
|-- champ
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|   `-- motion_module.pth
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

I don't have folders called pretrained_models, champ (unless that means the main app folder, which is called champ), image_encoder, sd-vae-ft-mse, or stable-diffusion-v1-5.

Inference time

Hi, I'm grateful for your excellent work! I've implemented the code as per the instructions, and it runs without errors. However, the inference time is slow, approximately 176 seconds per iteration. I tested it on an 80G A100 GPU, and it seems to be using around 71G of GPU memory. Is this normal?

How to obtain normal map

According to your previous advice, I have tried SMPLer-X and Depth Anything to obtain the 3D human body and the corresponding depth images. I also have a question about how to obtain the normal map. Could you please tell me which model you use?

Thank you!

2 Issues regarding example data

  1. In example_data/motions/motion-0X:
    There are extra output.mp4 files which should not be in the dataset.
    This causes an error at inference.py line 78, where the PIL.Image module tries to open a video file.
    I fixed this locally by wrapping the load in a try/except after line 73, so non-image files are skipped (see the sketch after this list):
    try: Image.open(guidance_image_path).convert("RGB")
    except: continue

  2. In example_data/motions/motion-07:
    There is an extra 0389_all.png in motion-07/semantic_map, while the depth, dwpose, mask and normal folders do not have a 0389_all.png.
    This causes an assertion error at inference.py line 87.
    I manually fixed it by deleting the file from my folder, but it would be great if you added code that automatically skips a frame when it does not match the other guidance folders.
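Building on the two fixes above, a small sketch of an automatic filter is shown here; the folder layout and file naming are assumptions based on the example data, and this is not code from the repository.

```python
# Hedged sketch: keep only frames that are valid images and that exist in every
# guidance folder, so stray files (output.mp4) or an unmatched frame
# (0389_all.png) no longer break inference.
import os

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}

def common_frames(motion_dir, guidance_types):
    per_type = []
    for gtype in guidance_types:
        folder = os.path.join(motion_dir, gtype)
        names = {
            os.path.splitext(name)[0]
            for name in os.listdir(folder)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        }
        per_type.append(names)
    return sorted(set.intersection(*per_type))

# frames = common_frames("example_data/motions/motion-07",
#                        ["depth", "dwpose", "mask", "normal", "semantic_map"])
```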

The code for computing PSNR in the DisCo repository is wrong

The DisCo code for computing PSNR, at the line linked below, is wrong:
https://github.com/Wangt-CN/DisCo/blob/8538889c9ee9edd8dd43ffee182d1a91ce7a9828/tool/metrics/ssim_l1_lpips_psnr.py#L13.


As pointed out in Wangt-CN/DisCo#86, the correct code is mse = np.mean((original/1.0 - compressed/1.0) ** 2) instead of mse = np.mean((original - compressed) ** 2), because the original and compressed images are uint8 in their code, and (original - compressed) ** 2 causes numerical overflow.

If you use their PSNR evaluation code, please update your results.
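For concreteness, a minimal PSNR sketch applying the fix quoted above, casting to float before subtracting so uint8 frames cannot wrap around:

```python
# Minimal PSNR helper following the corrected DisCo formula discussed above.
import numpy as np

def psnr(original: np.ndarray, compressed: np.ndarray, max_val: float = 255.0) -> float:
    diff = original.astype(np.float64) - compressed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 20 * np.log10(max_val / np.sqrt(mse))
```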

Salute your open source spirit!

The earth and the sky will praise your generosity, and countless algorithm engineers will praise you, the selfless devotee, the great architect.

The resulting video flickers badly

Thank you for your work! I have followed your instructions to complete the entire process, but the generated video flickers very badly. Is this caused by the SMPL sequence not being smoothed? Would smoothing the SMPL results fully solve this problem?

grid_wguidance.mp4
