modelscope / facechain

FaceChain is a deep-learning toolchain for generating your Digital-Twin.

License: Apache License 2.0

Python 17.95% Shell 0.09% CSS 0.19% Jupyter Notebook 81.77%

facechain's Issues

mat1 and mat2 must have the same dtype

08/17/2023 14:39:07 - INFO - __main__ - ***** Running training *****
08/17/2023 14:39:07 - INFO - __main__ -   Num examples = 3
08/17/2023 14:39:07 - INFO - __main__ -   Num Epochs = 200
08/17/2023 14:39:07 - INFO - __main__ -   Instantaneous batch size per device = 1
08/17/2023 14:39:07 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/17/2023 14:39:07 - INFO - __main__ -   Gradient Accumulation steps = 1
08/17/2023 14:39:07 - INFO - __main__ -   Total optimization steps = 600
Steps:   0%|                                                                                    | 0/600 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ facechain/facechain/train_text_to_image_lora.py:1103 in <module>           │
│                                                                                                  │
│   1100                                                                                           │
│   1101                                                                                           │
│   1102 if __name__ == "__main__":                                                                │
│ ❱ 1103 │   main()                                                                                │
│   1104                                                                                           │
│                                                                                                  │
│ facechain/facechain/train_text_to_image_lora.py:924 in main                │
│                                                                                                  │
│    921 │   │   │   │   │   raise ValueError(f"Unknown prediction type {noise_scheduler.config.p  │
│    922 │   │   │   │                                                                             │
│    923 │   │   │   │   # Predict the noise residual and compute loss                             │
│ ❱  924 │   │   │   │   model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sampl  │
│    925 │   │   │   │   loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")   │
│    926 │   │   │   │                                                                             │
│    927 │   │   │   │   # Gather the losses across all processes for logging (if we use distribu  │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:805 in │
│ forward                                                                                          │
│                                                                                                  │
│   802 │   │   # there might be better ways to encapsulate this.                                  │
│   803 │   │   t_emb = t_emb.to(dtype=sample.dtype)                                               │
│   804 │   │                                                                                      │
│ ❱ 805 │   │   emb = self.time_embedding(t_emb, timestep_cond)                                    │
│   806 │   │   aug_emb = None                                                                     │
│   807 │   │                                                                                      │
│   808 │   │   if self.class_embedding is not None:                                               │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/diffusers/models/embeddings.py:192 in        │
│ forward                                                                                          │
│                                                                                                  │
│   189 │   def forward(self, sample, condition=None):                                             │
│   190 │   │   if condition is not None:                                                          │
│   191 │   │   │   sample = sample + self.cond_proj(condition)                                    │
│ ❱ 192 │   │   sample = self.linear_1(sample)                                                     │
│   193 │   │                                                                                      │
│   194 │   │   if self.act is not None:                                                           │
│   195 │   │   │   sample = self.act(sample)                                                      │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/linear.py:114 in forward    │
│                                                                                                  │
│   111 │   │   │   init.uniform_(self.bias, -bound, bound)                                        │
│   112 │                                                                                          │
│   113 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 114 │   │   return F.linear(input, self.weight, self.bias)                                     │
│   115 │                                                                                          │
│   116 │   def extra_repr(self) -> str:                                                           │
│   117 │   │   return 'in_features={}, out_features={}, bias={}'.format(                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: mat1 and mat2 must have the same dtype
Steps:   0%|                                                                                    | 0/600 [00:02<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home//.local/bin/accelerate:8 in <module>                                              │
│                                                                                                  │
│   5 from accelerate.commands.accelerate_cli import main                                          │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in  │
│ main                                                                                             │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/launch.py:941 in         │
│ launch_command                                                                                   │
│                                                                                                  │
│   938 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA   │
│   939 │   │   sagemaker_launcher(defaults, args)                                                 │
│   940 │   else:                                                                                  │
│ ❱ 941 │   │   simple_launcher(args)                                                              │
│   942                                                                                            │
│   943                                                                                            │
│   944 def main():                                                                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/launch.py:603 in         │
│ simple_launcher                                                                                  │
│                                                                                                  │
│   600 │   process.wait()                                                                         │
│   601 │   if process.returncode != 0:                                                            │
│   602 │   │   if not args.quiet:                                                                 │
│ ❱ 603 │   │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)    │
│   604 │   │   else:                                                                              │
│   605 │   │   │   sys.exit(1)                                                                    │
│   606                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['miniconda3/bin/python', 'facechain/train_text_to_image_lora.py',
'--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film',
'--dataset_name=./imgs', '--output_dataset_name=./processed', '--caption_column=text', '--resolution=512',
'--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000',
'--learning_rate=1e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=./output',
'--lora_r=32', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32']' returned non-zero exit
status 1.
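
This error usually comes from a precision mismatch: the UNet's time-embedding Linear weights are in one dtype (for example float16) while the latents and timestep embeddings arrive in another. Below is a minimal, self-contained sketch of the failure mode and the usual fix (casting inputs to the module's weight dtype); exactly where such a cast would go inside train_text_to_image_lora.py is an assumption, not verified against the script.

import torch

# Stand-in for a layer whose weights are in a different dtype than its input.
layer = torch.nn.Linear(320, 1280).to(torch.float64)
x = torch.randn(1, 320, dtype=torch.float32)

try:
    layer(x)                                   # raises a dtype-mismatch RuntimeError, as in the log above
except RuntimeError as err:
    print(err)

# Common fix in diffusers-style training loops: cast inputs to the model's weight dtype.
weight_dtype = next(layer.parameters()).dtype
print(layer(x.to(weight_dtype)).shape)         # torch.Size([1, 1280])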

Can't download 3.2 GB model

It's currently not possible to download the 3.20 GB model.
The download fails at ~95%. This is reproducible both on Colab and locally.

Downloading:  92% 2.95G/3.20G [01:54<00:05, 46.1MB/s]
Downloading:  93% 2.97G/3.20G [01:54<00:05, 48.3MB/s]
Downloading:  93% 2.98G/3.20G [01:55<00:06, 34.6MB/s]
Downloading:  94% 3.00G/3.20G [01:55<00:06, 32.4MB/s]
Downloading:  94% 3.01G/3.20G [01:57<00:10, 19.7MB/s]
Downloading:  95% 3.04G/3.20G [01:58<00:07, 21.9MB/s]
Downloading:  95% 3.05G/3.20G [01:59<00:08, 20.7MB/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 710, in _error_catcher
    yield
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 814, in _raw_read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 799, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/usr/lib/python3.10/http/client.py", line 466, in read
    s = self.fp.read(amt)
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 940, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 879, in read
    data = self._raw_read(amt)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 813, in _raw_read
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 727, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 184, in run
    data_process_fn(instance_data_dir, True)
  File "/content/facechain/facechain/inference.py", line 23, in data_process_fn
    data_process_fn = Blipv2()
  File "/content/facechain/facechain/data_process/preprocessing.py", line 202, in __init__
    self.model = DeepDanbooru()
  File "/content/facechain/facechain/data_process/deepbooru.py", line 721, in __init__
    snapshot_path = snapshot_download(foundation_model_id, revision='v4.0')
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/snapshot_download.py", line 140, in snapshot_download
    parallel_download(
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/file_download.py", line 243, in parallel_download
    list(executor.map(download_part, tasks))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/file_download.py", line 203, in download_part
    for chunk in r.iter_content(chunk_size=API_FILE_DOWNLOAD_CHUNK_SIZE):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
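
Connection resets on a multi-gigabyte file are often transient, and files that already finished downloading are normally reused from the local ModelScope cache, so retrying usually gets past the failure. A hedged retry wrapper around snapshot_download (the model id is a placeholder; use the one that failed for you):

import time
from modelscope import snapshot_download

def download_with_retries(model_id, revision=None, attempts=5, wait_seconds=10):
    # Retry the snapshot download a few times when the connection drops mid-transfer.
    for attempt in range(1, attempts + 1):
        try:
            return snapshot_download(model_id, revision=revision)
        except Exception as err:   # e.g. requests.exceptions.ChunkedEncodingError
            print(f'attempt {attempt} failed: {err!r}')
            time.sleep(wait_seconds)
    raise RuntimeError(f'could not download {model_id} after {attempts} attempts')

# model_dir = download_with_retries('<model id that failed>', revision='v4.0')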

Error: nms_impl: implementation for device cuda:0 not found.

When uploading pictures and starting training, an error occurs on the server side.
2023-08-19 16:09:33,371 - modelscope - INFO - load model done
cathed for image process of 000.jpg
Error: nms_impl: implementation for device cuda:0 not found.

[]
Error: result is empty.
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "D:\dev\facechain\app.py", line 174, in run
    data_process_fn(instance_data_dir, True)
  File "D:\dev\facechain\facechain\inference.py", line 24, in data_process_fn
    out_json_name = data_process_fn(input_img_dir)
  File "D:\dev\facechain\facechain\data_process\preprocessing.py", line 335, in __call__
    exit()
  File "C:\ProgramData\anaconda3\envs\fchain\lib\_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None
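
"nms_impl: implementation for device cuda:0 not found" usually means the installed mmcv-full build has no compiled CUDA ops, or was compiled against a different torch/CUDA combination than the one in the environment. A hedged diagnostic, assuming mmcv-full 1.x is installed:

import torch
print('torch:', torch.__version__, '| torch built with CUDA:', torch.version.cuda)

# Importing mmcv.ops loads the compiled extension; if mmcv._ext is missing, this
# import itself fails, which already identifies the problem.
import mmcv
from mmcv.ops import get_compiling_cuda_version, get_compiler_version

print('mmcv:', mmcv.__version__)
print('mmcv compiled with CUDA:', get_compiling_cuda_version())
print('mmcv compiler:', get_compiler_version())
# If the reported CUDA version is missing or differs from torch.version.cuda, reinstall
# mmcv-full for the matching torch/CUDA build (see the Windows mmcv-full issue below,
# where building it from source via pip resolved the same error).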

OSError: [Errno 122] Disk quota exceeded when running "Start inference"

I got it running the way the ModelScope notebook describes; after the model reports that training succeeded, running "Start inference" fails with OSError: [Errno 122] Disk quota exceeded.

Runtime environment: ModelScope free-tier instance, PAI-DSW, GPU environment

8 cores, 32 GB RAM, 16 GB GPU memory
Preinstalled ModelScope Library
Preinstalled image: ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

The workspace disk usage is as follows:

root@dsw:/mnt/workspace# du -h -d 1
14G     ./.cache
73K     ./.ipynb_checkpoints
8.5K    ./.virtual_documents
574K    ./facechain
14G     .

Is the disk on the ModelScope free-tier instance simply too small? The official facechain README asks for "Disk: About 50GB".

Is there any way to complete the full workflow on a ModelScope free-tier instance?
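
The 14 GB under ./.cache is mostly downloaded model weights, and the free-tier disk is well below the roughly 50 GB the README asks for. One hedged workaround, if the instance exposes a larger mount, is to point the ModelScope cache there before anything imports modelscope; MODELSCOPE_CACHE is the environment variable the library reads for its cache root, and the target path below is only an example:

import os, shutil

# Assumption: some mount on the instance has more room than the current cache location.
os.environ['MODELSCOPE_CACHE'] = '/path/to/larger/volume/modelscope_cache'   # example path

total, used, free = shutil.disk_usage('/mnt/workspace')
print(f'workspace free space: {free / 2**30:.1f} GiB')   # the README recommends about 50 GB overall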

No such file or directory: '/tmp/qw/personalizaition_lora/pytorch_lora_weights.bin'

  File "/home/yyy/facechain/facechain/inference.py", line 47, in main_diffusion_inference
    pipe = merge_lora(pipe, lora_human_path, multiplier_human, from_safetensor=False)
  File "/home/yyy/facechain/facechain/merge_lora.py", line 15, in merge_lora
    checkpoint = torch.load(os.path.join(lora_path, 'pytorch_lora_weights.bin'),
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/qw/personalizaition_lora/pytorch_lora_weights.bin'
Training produces a safetensors file, so why does the code look for a .bin file, and why can't it be found? Any guidance is appreciated.
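
If training wrote pytorch_lora_weights.safetensors while merge_lora.py only looks for pytorch_lora_weights.bin, a fallback is to load whichever file actually exists. This is an illustrative sketch, not facechain's own code; safetensors.torch.load_file is the standard loader for .safetensors weights:

import os
import torch
from safetensors.torch import load_file

def load_lora_state_dict(lora_path):
    # Prefer the .bin name the current code expects, fall back to .safetensors.
    bin_path = os.path.join(lora_path, 'pytorch_lora_weights.bin')
    st_path = os.path.join(lora_path, 'pytorch_lora_weights.safetensors')
    if os.path.exists(bin_path):
        return torch.load(bin_path, map_location='cpu')
    if os.path.exists(st_path):
        return load_file(st_path, device='cpu')
    raise FileNotFoundError(f'no LoRA weights found under {lora_path}')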

Error when running on Colab

Running on Colab with an A100, everything works up to the last step: the web page opens, but after uploading photos and clicking "Start training" it reports "CUDA is not available".
The log is as follows:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 123, in run
    raise gr.Error('CUDA is not available.')
gradio.exceptions.Error: 'CUDA is not available.'
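
app.py raises this error when PyTorch cannot see a GPU, presumably after a torch.cuda.is_available() check; on Colab this usually means a CPU runtime was selected or the installed torch build is CPU-only. A quick check to run in a Colab cell before launching the app:

import torch

print('torch:', torch.__version__)
print('built with CUDA:', torch.version.cuda)        # None for CPU-only wheels
print('cuda available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('device:', torch.cuda.get_device_name(0))
# If this prints False, switch the Colab runtime to a GPU (Runtime -> Change runtime type)
# or reinstall a CUDA-enabled torch wheel, then restart the app.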

Could not find a version that satisfies the requirement tf-estimator-nightly==2.8.0.dev2021122109

TensorFlow 2.8.0 seems to be the problem; can this package even be installed with Python 3.8?
INFO: pip is looking at multiple versions of tensorflow to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version: 1.11.0 Requires-Python <3.13,>=3.9; 1.11.0rc1 Requires-Python <3.13,>=3.9; 1.11.0rc2 Requires-Python <3.13,>=3.9; 1.11.1 Requires-Python <3.13,>=3.9; 1.11.2 Requires-Python <3.13,>=3.9; 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 3.8.0rc1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement tf-estimator-nightly==2.8.0.dev2021122109 (from tensorflow) (from versions: none)
ERROR: No matching distribution found for tf-estimator-nightly==2.8.0.dev2021122109

Error when training data

On Windows, training fails with the error below, which then makes inference impossible.

 File "D:\ProgramData\anaconda3\envs\facechain\lib\site-packages\datasets\packaged_modules\folder_based_builder\folder_based_builder.py", line 311, in _generate_examples
    raise ValueError(
ValueError: image at tmp.png doesn't have metadata in D:\AI\qw\training_data\personalizaition_lora_labeled\metadata.jsonl.

Looking at the backend output, there is an error about an "rm" command: rm is a Linux command and does not exist on Windows.

2023-08-20 00:15:28.975118: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8700
000.jpg 0.9607361331582069
1girl, brown_eyes, brown_hair, earrings, jewelry, lips, long_hair, looking_at_viewer, open_mouth, simple_background, smile, solo, teeth, transparent_background
[['1girl', 'brown_eyes', 'brown_hair', 'earrings', 'jewelry', 'lips', 'long_hair', 'looking_at_viewer', 'open_mouth', 'simple_background', 'smile', 'solo', 'teeth', 'transparent_background']]
'rm' is not recognized as an internal or external command, operable program or batch file.
0.png a beautiful woman, brown_hair, earrings, jewelry, long_hair, looking_at_viewer, open_mouth, simple_background, smile, solo, transparent_background
08/20/2023 00:15:31 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
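
The 'rm' message indicates the preprocessing step shells out to the Linux rm command, which does not exist on Windows, so the intermediate files may never be cleaned up and metadata.jsonl can end up out of sync with the images. A hedged, portable sketch of the equivalent cleanup using the standard library instead of the shell (the path is a placeholder, not the exact one facechain uses):

import os
import shutil

def remove_path(path):
    # Cross-platform replacement for the shell commands rm -rf <dir> and rm <file>.
    if os.path.isdir(path):
        shutil.rmtree(path, ignore_errors=True)
    elif os.path.exists(path):
        os.remove(path)

remove_path(os.path.join('training_data', 'personalizaition_lora_labeled', 'tmp.png'))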

mmcv and modelscope version issue

Many tasks in the modelscope library still appear to be written against mmcv<2.0.0, which would require changes in a lot of places (for example from mmcv.parallel import MMDataParallel). Will this be updated later?

/opt/conda/bin/python: can't open file 'facechain/train_text_to_image_lora.py': [Errno 2] No such file or directory

Deployed in a container; GPU: A10, NVIDIA-SMI 525.105.17, Driver Version: 525.105.17, CUDA Version: 12.0.
After starting training from the web UI, the backend log prints the error below; the web UI shows training as completed, but the portrait experience step reports Error.

/opt/conda/bin/python: can't open file 'facechain/train_text_to_image_lora.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 979, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=/tmp/qw/training_data/personalizaition_lora', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=/tmp/qw/personalizaition_lora', '--lora_r=32', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32']' returned non-zero exit status 2.
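
The launcher runs the relative path facechain/train_text_to_image_lora.py, so the web UI has to be started from the repository root (the directory containing app.py and the facechain/ package). A small sanity check before launching, with the expected layout as an assumption:

import os

expected = os.path.join('facechain', 'train_text_to_image_lora.py')
print('current working directory:', os.getcwd())
print('training script found:', os.path.exists(expected))
# If this prints False, change into the cloned facechain checkout first and then run
# python3 app.py, so the relative path used by the training command resolves.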

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

Running the script: PYTHONPATH=. sh train_lora.sh "ly261666/cv_portrait_model" "v2.0" "film/film" "./imgs" "./processed" "./output"

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 399410) of binary: /home/disk01/wyw/.conda/envs/facechain/bin/python
Traceback (most recent call last):
  File "/home/disk01/wyw/.conda/envs/facechain/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/launch.py", line 970, in launch_command
    multi_gpu_launcher(args)
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

facechain/train_text_to_image_lora.py FAILED

Does it support running on a local machine?

I tried to train the LoRA on my local machine, but it raises an error:

In [1]: from modelscope import snapshot_download
^[[A2023-08-14 11:23:11,600 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2023-08-14 11:23:11,602 - modelscope - INFO - TensorFlow version 2.13.0 Found.
2023-08-14 11:23:11,602 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-08-14 11:23:11,631 - modelscope - INFO - Loading done! Current index file version is 1.8.1, with md5 bbb8dd73324c667bf9ab6594815ac903 and a total number of 893 components indexed

In [2]: model_dir = snapshot_download('Cherrytest/rot_bgr', revision='v1.0.0')
2023-08-14 11:23:13,696 - modelscope - ERROR - Authentication token does not exist, failed to access model Cherrytest/rot_bgr which may not exist or may be                 private. Please login first.
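
This message also appears when the model id or revision is wrong or the hub cannot be reached, so it is worth double-checking those first. If the model really is private or gated, modelscope supports logging in with an SDK access token before downloading; the token string below is a placeholder:

from modelscope.hub.api import HubApi
from modelscope import snapshot_download

api = HubApi()
api.login('<your ModelScope SDK access token>')   # placeholder; obtained from your ModelScope account page

model_dir = snapshot_download('Cherrytest/rot_bgr', revision='v1.0.0')
print(model_dir)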

About CUDA 11.7

When deploying facechain the system requires CUDA 11.7, but on Ubuntu the driver for a 4090 supports CUDA 12.2, and 11.7 cannot be installed. Is CUDA 12.2 also usable? Why do I keep getting errors during training on my side?

Two more questions:
1. With Python 3.10.6 installed via conda and CUDA 12.2, mim install mmcv-full==1.7.0 cannot be installed at all.
2. With Python 3.8, mim install mmcv-full==1.7.0 installs successfully and the program starts, but training then fails with an error.

Error: result is empty.

[]
Error: result is empty.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 149, in run
    data_process_fn(instance_data_dir, True)
  File "/content/facechain/facechain/inference.py", line 24, in data_process_fn
    out_json_name = data_process_fn(input_img_dir)
  File "/content/facechain/facechain/data_process/preprocessing.py", line 335, in __call__
    exit()
  File "/usr/lib/python3.10/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None

When running on Colab (T4 runtime).

CUDA out of memory error on training

I run into the following error on Alibaba Cloud DSW with an NVIDIA V100 instance:

image: modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

DSW NVIDIA V100

08/18/2023 19:46:51 - INFO - __main__ - ***** Running training *****
08/18/2023 19:46:51 - INFO - __main__ -   Num examples = 9
08/18/2023 19:46:51 - INFO - __main__ -   Num Epochs = 200
08/18/2023 19:46:51 - INFO - __main__ -   Instantaneous batch size per device = 1
08/18/2023 19:46:51 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/18/2023 19:46:51 - INFO - __main__ -   Gradient Accumulation steps = 1
08/18/2023 19:46:51 - INFO - __main__ -   Total optimization steps = 1800
Steps:   0%|                                           | 0/1800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "facechain/train_text_to_image_lora.py", line 1103, in <module>
    main()
  File "facechain/train_text_to_image_lora.py", line 924, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 956, in forward
    sample = upsample_block(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 2127, in forward
    hidden_states = attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 291, in forward
    hidden_states = block(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention.py", line 154, in forward
    attn_output = self.attn1(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 321, in forward
    return self.processor(
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 601, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 362, in get_attention_scores
    attention_scores = torch.baddbmm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 15.78 GiB total capacity; 8.13 GiB already allocated; 469.75 MiB free; 8.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
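
On a 16 GB card the attention blocks at 512x512 can still exceed memory in full precision. Common, hedged mitigations are the allocator hint the error message itself suggests, gradient checkpointing on the UNet, and memory-efficient attention when xformers is installed; whether train_text_to_image_lora.py already exposes flags for these is not verified here, so the sketch only shows the underlying calls:

import os

# 1) Reduce allocator fragmentation, as suggested in the OOM message itself.
os.environ.setdefault('PYTORCH_CUDA_ALLOC_CONF', 'max_split_size_mb:128')

# 2) Inside the training script, on the UNet2DConditionModel instance (diffusers API):
def apply_memory_savers(unet):
    unet.enable_gradient_checkpointing()                   # trades compute for activation memory
    try:
        unet.enable_xformers_memory_efficient_attention()  # requires xformers to be installed
    except Exception as err:
        print('xformers not available:', err)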

On Windows you must use pip to install mmcv-full

When I use

mim install mmcv-full==1.7.0

I always get the following error:

RuntimeError: nms_impl: implementation for device cuda:0 not found.

I thought it was a CUDA version problem and tried downgrading CUDA from 12.2 to 11.8.

Finally, when I use

mim uninstall mmcv-full
pip install mmcv-full

the build takes about twenty minutes, but after restarting the app the error is gone.

Expected all tensors to be on the same device

Error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)

With multiple GPUs installed, the specified GPU cannot be found. How can this be resolved?
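
A common workaround is to expose only one GPU to the process, so every model component is placed on the same device. CUDA_VISIBLE_DEVICES must be set before CUDA is initialized, for example at the very top of app.py or in the shell environment:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'    # pick the GPU index to use

import torch
print(torch.cuda.device_count())            # should report 1 after the restriction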

Question about image preprocessing during training

Hi, reading the training source code I see that images are only rotation-corrected; the other corrections mentioned in the README, such as face beautification, do not appear in the code. Also, are faces not labeled when training the face LoRA?

After repeated tweaks on Windows 11 I have it running, but at Portrait experience -> "Start generating" it fails and I really cannot find the cause. Please help confirm whether this error pinpoints the problem.

Windows 11
Python 3.8
CUDA 11.7
GPU: GeForce RTX 4060

Differences from the environment in the README:
1. mmcv-full==1.7.0 fails with "nms_impl: implementation for device cuda:0 not found."; repeated uninstall/reinstall did not help, so I switched to 1.7.1, which worked.
At Portrait experience -> "Start generating" the error is:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\AI\facechain\tmp/qw/personalizaition_lora\pytorch_lora_weights.bin'

CUDA is not available issue with Colab

Everything works fine, but when I start training I get this error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 123, in run
    raise gr.Error('CUDA is not available.')
gradio.exceptions.Error: 'CUDA is not available.'

Problem running after installing mmcv==1.7.0

2023-08-21 16:42:31,201 - modelscope - INFO - Model revision not specified, use the latest revision: v1.1
2023-08-21 16:42:31,396 - modelscope - INFO - initiate model from /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd
2023-08-21 16:42:31,396 - modelscope - INFO - initiate model from location /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd.
2023-08-21 16:42:31,397 - modelscope - INFO - initialize model from /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd
Traceback (most recent call last):
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/utils/registry.py", line 210, in build_from_cfg
    return obj_cls._instantiate(**args)
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 66, in _instantiate
    return cls(**kwargs)
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/cv/face_detection/scrfd/damofd_detect.py", line 31, in __init__
    super().__init__(model_dir, **kwargs)
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/cv/face_detection/scrfd/scrfd_detect.py", line 36, in __init__
    from mmdet.models import build_detector
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/__init__.py", line 2, in <module>
    from .backbones import * # noqa: F401,F403
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/backbones/__init__.py", line 2, in <module>
    from .csp_darknet import CSPDarknet
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/backbones/csp_darknet.py", line 11, in <module>
    from ..utils import CSPLayer
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/utils/__init__.py", line 13, in <module>
    from .point_sample import (get_uncertain_point_coords_with_randomness,
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/utils/point_sample.py", line 3, in <module>
    from mmcv.ops import point_sample
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/ops/__init__.py", line 2, in <module>
    from .active_rotated_filter import active_rotated_filter
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in <module>
    ext_module = ext_loader.load_ext(
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
    ext = importlib.import_module('mmcv.' + name)
  File "/home/hx/anaconda3/envs/facechain/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'mmcv._ext'

Hi, I also tried installing mmcv 2.0.0, but training then fails with "cannot import name 'Config' from mmcv", so I switched back to 1.7.0 and hit the problem above. How should this be resolved?

Are there more outfits available?

At the moment there are only a few, like the "silver armor" entry:

examples = {
    'prompt_male': [
        ['silver armor'],
        ['T-shirt']
    ],
    'prompt_female': [
        ['beautiful traditional hanfu, upper_body'],
        ['an elegant evening gown']
    ],
}

example_styles = [
    {'name': '默认风格(default style)'},
    {'name': '凤冠霞帔(Chinese traditional gorgeous suit)',
     'model_id': 'ly261666/civitai_xiapei_lora',
     'revision': 'v1.0.0',
     'bin_file': 'xiapei.safetensors',
     'multiplier_style': 0.35,
     'add_prompt_style': 'red, hanfu, tiara, crown, '},
]
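
Extra outfits are just additional entries in these structures; a hedged example that mirrors the fields shown above (the values below are illustrative and not shipped with facechain):

# Append illustrative prompts; the keys and nesting follow the examples dict above.
examples['prompt_male'].append(['business suit, upper_body'])
examples['prompt_female'].append(['white summer dress, upper_body'])

# A new style entry can also reference an extra LoRA, like the xiapei entry above.
example_styles.append({
    'name': 'formal portrait (example)',
    'add_prompt_style': 'business attire, studio lighting, ',
})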

mmcv-full cannot be installed with torch 2; it hangs forever at "Building wheel for mmcv-full (setup.py) ... /"

mim install mmcv-full==1.7.0
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Looking in links: https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.0/index.html
Collecting mmcv-full==1.7.0
Downloading http://mirrors.aliyun.com/pypi/packages/a1/81/89120850923f4c8b49efba81af30160e7b1b305fdfa9671a661705a8abbf/mmcv-full-1.7.0.tar.gz (593 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 593.6/593.6 kB 4.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: addict in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (2.4.0)
Requirement already satisfied: numpy in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (1.22.0)
Requirement already satisfied: packaging in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (23.1)
Requirement already satisfied: Pillow in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (10.0.0)
Requirement already satisfied: pyyaml in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (6.0.1)
Requirement already satisfied: yapf in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (0.40.1)
Requirement already satisfied: importlib-metadata>=6.6.0 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (6.8.0)
Requirement already satisfied: platformdirs>=3.5.1 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (3.10.0)
Requirement already satisfied: tomli>=2.0.1 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (2.0.1)
Requirement already satisfied: zipp>=0.5 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from importlib-metadata>=6.6.0->yapf->mmcv-full==1.7.0) (3.16.2)
Building wheels for collected packages: mmcv-full
Building wheel for mmcv-full (setup.py) ... /

Cannot download the face fusion model cv_unet-image-face-fusion_damo

Hi, could you provide a download link?

The error is as follows:

Problems when running on k8s

Dockerfile

FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.0
RUN pip3 install gradio

SHELL ["/bin/bash", "--login", "-c"]
RUN GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
WORKDIR facechain
ENV NVIDIA_DISABLE_REQUIRE=true

ENTRYPOINT ["python3","app.py"]

ECS instances scheduled by Alibaba Cloud k8s

2023-08-20 09:08:15.817344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
app.py:302: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
output_images = gr.Gallery(label='Output', show_label=False).style(columns=3, rows=2, height=600,

Error when training data

Running app.py on Windows 11 gives this error: 'PYTHONPATH' is not recognized as an internal or external command, operable program or batch file.
