Hi, Thanks a lot for this great work! I am trying to run the finetun

Human specific finetune about disco HOT 6 CLOSED

alicranck commented on August 22, 2024

Human specific finetune

from disco.

Comments (6)

Wangt-CN commented on August 22, 2024

Hi, thanks a lot for the report of this problem.
There is an argument web_data_root in the config file, e.g., https://github.com/Wangt-CN/DisCo/blob/main/config/ref_attn_clip_combine_controlnet_imgspecific_ft/webtan_S256L16_xformers_upsquare.py#L26C5-L26C101. Please change it to your own local path.

You may also need to modify ft_idx here (https://github.com/Wangt-CN/DisCo/blob/main/dataset/tiktok_controlnet_t2i_imagevar_combine_specifcimg_web_upsquare.py#L109), which is the directory name of the specific video frames.

Sorry, I am not at my workstation right now, so I cannot refine this human-specific ft dataloader. If you work it out in the end, you are welcome to submit a pull request.
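As a rough illustration of why both settings matter, the sketch below reproduces the path in the FileNotFoundError quoted later in this thread. It assumes the dataloader builds the frames directory by joining web_data_root and ft_idx with a plain "/". That is inferred from the error message only, not from the DisCo source, and the helper names frames_dir and split_for_config are hypothetical.

```python
import posixpath
from typing import Tuple


def frames_dir(web_data_root: str, ft_idx: str) -> str:
    """Hypothetical stand-in for the loader's path logic (assumed: plain string join)."""
    return f"{web_data_root}/{ft_idx}"


def split_for_config(data_dir: str) -> Tuple[str, str]:
    """Split a data directory into a (web_data_root, ft_idx) pair."""
    root, name = posixpath.split(posixpath.normpath(data_dir))
    return root, name


# Values copied from the FileNotFoundError reported in this thread.
default_root = "./blob_dir/debug_output/video_sythesis/dataset/Lindsey_0504_youtube/frames_tan"
absolute_ft_idx = "/home/ubuntu/dataspan/common_data/human_specific_finetune"

# Passing an absolute path as ft_idx while the default root is still in the
# config yields the same broken path as in the error message.
broken = frames_dir(default_root, absolute_ft_idx)

# Splitting the directory gives a root for the config and a bare folder name.
web_data_root, ft_idx = split_for_config(absolute_ft_idx)
fixed = frames_dir(web_data_root, ft_idx)

print(broken)
print(fixed)
```

Under that assumption, web_data_root would be set to the parent directory in the config and --ft_idx would receive only the folder name.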

Best,
Tan


Wangt-CN commented on August 22, 2024

Hi, thanks for your interest.

May I first confirm whether the stage you are asking about is "fine-tuning" or "human-specific fine-tuning"?
We use TikTok data for fine-tuning, but for human-specific fine-tuning you can use any data you want.


Wangt-CN commented on August 22, 2024

The training process is: 1) Pre-training --> 2) General fine-tuning (on TikTok data) --> 3) Human-specific fine-tuning (optional; you can expect much better results if you have human-specific training data, e.g., anime).


alicranck commented on August 22, 2024

Thanks for the quick response :)
Yes, exactly. I'm interested in performing human-specific fine-tuning on my own data.


Wangt-CN commented on August 22, 2024

Generally, you can follow this setup: https://github.com/Wangt-CN/DisCo#human-specific-fine-tuning.
But actually, we do not use TikTok data in human-specific ft. Could you please provide the command you ran and the error message?

For your own data, you first need to get the human masks and the OpenPose annotations. You can either process the data yourself or use the toolkit we provide.

Best,
Tan


alicranck commented on August 22, 2024

Indeed, I am following the instructions provided, and I have prepared the masks and poses as well.

The command is:

cmd = f'{python_path} {script_path} \
--cf {config_path} --do_train --root_dir {project_dir} \
--local_train_batch_size 32 --local_eval_batch_size 32 --log_dir {project_dir} \
--epochs 20 --deepspeed --eval_step 500 --save_step 500 --gradient_accumulate_steps 1 \
--learning_rate 1e-3 --fix_dist_seed --loss_target "noise" \
--unet_unfreeze_type "crossattn" \
--refer_sdvae --ref_null_caption False --combine_clip_local --combine_use_mask --conds "poses" "masks" \
--freeze_pose True --freeze_background False \
--pretrained_model {base_model_ckpt} \
--ft_iters 500 --ft_one_ref_image False --ft_idx {hf_data_dir} --strong_aug_stage1 True --strong_rand_stage2 True'

and the error is:
Traceback (most recent call last):

/home/ubuntu/dataspan/dataspan_research/dataspan_research/external_repos/DisCo/finetune_sdm_yaml.py:209 in <module>

  206     # main_worker(parsed_args)
  207     from utils.args import sharedArgs
  208     parsed_args = sharedArgs.parse_args()
❱ 209     main_worker(parsed_args)
  210

/home/ubuntu/dataspan/dataspan_research/dataspan_research/external_repos/DisCo/finetune_sdm_yaml.py:99 in main_worker

   96             train_dataset = BaseDataset(args, args.train_yaml, split='train', preprocess
   97             eval_dataset = BaseDataset(args, args.val_yaml, split='val', preprocesser=mo
   98         else:
❱  99             train_dataset = BaseDataset(args, args.train_yaml, split='train')
  100             eval_dataset = BaseDataset(args, args.val_yaml, split='val')
  101
  102         train_info = get_loader_info(args, args.local_train_batch_size,

/home/ubuntu/dataspan/dataspan_research/dataspan_research/external_repos/DisCo/dataset/tiktok_controlnet_t2i_imagevar_combine_specifcimg_web_upsquare.py:133 in __init__

  130                     image_files_idx = list(open(os.path.join(folder_path, 'image_list.tx
  131                     image_files_idx = [file.strip() for file in image_files_idx]
  132                 else:
❱ 133                     image_files_idx = os.listdir(folder_path)
  134                     image_files_idx = [file for file in image_files_idx if file.endswith
  135             else:
  136                 assert split == 'val'

FileNotFoundError: [Errno 2] No such file or directory: './blob_dir/debug_output/video_sythesis/dataset/Lindsey_0504_youtube/frames_tan//home/ubuntu/dataspan/common_data/human_specific_finetune'

I assume that some paths are coded into the scripts pointing to data that I don't have?

Thanks!

