modelscope / facechain

FaceChain is a deep-learning toolchain for generating your Digital-Twin.

License: Apache License 2.0

Python 16.39% Shell 0.10% CSS 0.19% Jupyter Notebook 83.32%

facechain's Introduction




Introduction

ModelScope is built upon the notion of "Model-as-a-Service" (MaaS). It seeks to bring together the most advanced machine learning models from the AI community and to streamline the process of leveraging AI models in real-world applications. The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training, and evaluation.

In particular, with rich layers of API abstraction, the ModelScope library offers a unified experience for exploring state-of-the-art models spanning domains such as CV, NLP, speech, multi-modality, and scientific computation. Model contributors across these areas can integrate their models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access. Once integrated, model inference, fine-tuning, and evaluation can be done with only a few lines of code. At the same time, flexibility is provided so that different components of a model application can be customized wherever necessary.

Apart from harboring implementations of a wide range of models, the ModelScope library also enables the necessary interactions with ModelScope backend services, particularly the Model-Hub and the Dataset-Hub. These interactions let the management of various entities (models and datasets) happen seamlessly under the hood, including entity lookup, version control, cache management, and more.

Models and Online Accessibility

Hundreds of models are made publicly available on ModelScope (700+ and counting), covering the latest developments in areas such as NLP, CV, audio, multi-modality, and AI for science. Many of these models represent the SOTA in their specific fields and made their open-source debut on ModelScope. Users can visit ModelScope (modelscope.cn) and experience first-hand how these models perform, with just a few clicks. Immediate developer experience is also possible through the ModelScope Notebook, which is backed by a ready-to-use CPU/GPU development environment in the cloud, only one click away on ModelScope.



Some representative examples span the following areas: LLM, Multi-Modal, CV, Audio, and AI for Science.

Note: most models on ModelScope are public and can be downloaded without account registration on the ModelScope website (www.modelscope.cn). Please refer to the instructions for model download for downloading models either with the API provided by the ModelScope library or with git.
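
For instance, a model can be fetched programmatically into the local cache with snapshot_download; a minimal sketch, reusing the word-segmentation model that appears in the QuickTour below:

>>> from modelscope import snapshot_download
>>> # downloads the model files (or reuses the cached copy) and returns the local directory
>>> model_dir = snapshot_download('damo/nlp_structbert_word-segmentation_chinese-base')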

QuickTour

We provide a unified interface for inference using pipeline, and for fine-tuning and evaluation using Trainer, across different tasks.

For any given task with any type of input (image, text, audio, video, ...), an inference pipeline can be set up with only a few lines of code; it automatically loads the underlying model and returns the inference result, as exemplified below:

>>> from modelscope.pipelines import pipeline
>>> word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')
>>> word_segmentation('今天天气不错,适合出去游玩')
{'output': '今天 天气 不错 , 适合 出去 游玩'}

Given an image, portrait matting (a.k.a. background removal) can be accomplished with the following code snippet:


>>> import cv2
>>> from modelscope.pipelines import pipeline

>>> portrait_matting = pipeline('portrait-matting')
>>> result = portrait_matting('https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_matting.png')
>>> cv2.imwrite('result.png', result['output_img'])

The output is the input image with the background removed.

Fine-tuning and evaluation can also be done with a few more lines of code to set up a training dataset and a trainer, with the heavy lifting of training and evaluating a model encapsulated in the trainer.train() and trainer.evaluate() interfaces.

For example, the GPT-3 base model (1.3B) can be fine-tuned with the chinese-poetry dataset, resulting in a model that can be used for Chinese poetry generation.

>>> from modelscope.metainfo import Trainers
>>> from modelscope.msdatasets import MsDataset
>>> from modelscope.trainers import build_trainer

>>> train_dataset = MsDataset.load('chinese-poetry-collection', split='train').remap_columns({'text1': 'src_txt'})
>>> eval_dataset = MsDataset.load('chinese-poetry-collection', split='test').remap_columns({'text1': 'src_txt'})
>>> max_epochs = 10
>>> tmp_dir = './gpt3_poetry'

>>> kwargs = dict(
     model='damo/nlp_gpt3_text-generation_1.3B',
     train_dataset=train_dataset,
     eval_dataset=eval_dataset,
     max_epochs=max_epochs,
     work_dir=tmp_dir)

>>> trainer = build_trainer(name=Trainers.gpt3_trainer, default_args=kwargs)
>>> trainer.train()
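
Once training finishes, evaluation over the eval dataset goes through the same trainer object; a minimal sketch (the exact metrics returned depend on the task configuration):

>>> metrics = trainer.evaluate()
>>> print(metrics)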

Why should I use the ModelScope library?

  1. A unified and concise user interface is abstracted over different tasks and models. Model inference and training can be implemented in as few as 3 and 10 lines of code, respectively, making it convenient to explore models across fields in the ModelScope community. All models integrated into ModelScope are ready to use, which makes it easy to get started with AI in both educational and industrial settings.

  2. ModelScope offers a model-centric development and application experience. It streamlines support for model training, inference, export, and deployment, and helps users build their own MLOps on top of the ModelScope ecosystem.

  3. The model inference and training processes follow a modular design, and a wealth of functional module implementations is provided, making it convenient for users to customize their own inference, training, and other pipelines.

  4. For distributed model training, especially of large models, rich training-strategy support is provided, including data parallelism, model parallelism, hybrid parallelism, and more.

Installation

ModelScope Library currently supports the popular deep learning frameworks for model training and inference, including PyTorch, TensorFlow, and ONNX. All releases are tested and run on Python 3.7+, PyTorch 1.8+, and TensorFlow 1.15 or 2.0+.

Docker

To allow out-of-the-box usage of all models on ModelScope, official Docker images are provided for all releases. Based on these images, developers can skip environment installation and configuration entirely. The latest CPU and GPU images can currently be obtained from:

CPU docker image

# py37
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-1.6.1

# py38
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py38-torch2.0.1-tf2.13.0-1.9.5

GPU docker image

# py37
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.3.0-py37-torch1.11.0-tf1.15.5-1.6.1

# py38
registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.5
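
To start an interactive container from one of these images (a sketch: it assumes Docker and the NVIDIA Container Toolkit are installed; drop --gpus all when using the CPU image):

# start a shell inside the py38 GPU image
docker run -it --gpus all registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.8.0-py38-torch2.0.1-tf2.13.0-1.9.5 /bin/bash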

Setup Local Python Environment

One can also set up a local ModelScope environment using pip and conda. ModelScope supports Python 3.7 and above; we suggest Anaconda for creating the local Python environment:

conda create -n modelscope python=3.8
conda activate modelscope

PyTorch or TensorFlow can then be installed separately, according to each model's requirements.

  • Install PyTorch (doc)
  • Install TensorFlow (doc)

After installing the necessary machine-learning framework, you can install the modelscope library as follows:

If you only want to play around with the modelscope framework, or try out model/dataset download, you can install the core modelscope components:

pip install modelscope

If you want to use multi-modal models:

pip install modelscope[multi-modal]

If you want to use NLP models:

pip install modelscope[nlp] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

If you want to use CV models:

pip install modelscope[cv] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

If you want to use audio models:

pip install modelscope[audio] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

If you want to use science models:

pip install modelscope[science] -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

Notes:

  1. Currently, some audio-task models only support Linux environments with Python 3.7 and TensorFlow 1.15.4. Most other models can be installed and used on Windows and Mac (x86).

  2. Some models in the audio field use the third-party library SoundFile for wav file processing. On Linux, users need to manually install libsndfile, the system dependency of SoundFile (doc link). On Windows and macOS, it is installed automatically without user intervention. For example, on Ubuntu, you can use the following commands:

    sudo apt-get update
    sudo apt-get install libsndfile1
  3. Some computer-vision models need mmcv-full; you can refer to the mmcv installation guide. A minimal installation is as follows:

    pip uninstall mmcv # if you have installed mmcv, uninstall it
    pip install -U openmim
    mim install mmcv-full

Learn More

We provide additional documentation in the repository docs.

License

This project is licensed under the Apache License (Version 2.0).

facechain's People

Contributors

bubbliiiing, bwdforce, chkhu, cleaner-cyber, eltociear, foggy-whale, haoyu-xie, hehaha68, hiswitch, hudcase, iiiiiiint, iotang, ly19965, metrosir, mowunian, potazinc, prhloveayg, rentingxutx, slpal, sunbaigui, tastelikefeet, trumpool, ultimatech-cn, wangxingjun778, wenmengzhou, wuziheng, wwdok, yingdachen, you-cun, zanghyu


facechain's Issues

On Windows you must use pip to install mmcv-full

When I use

mim install mmcv-full==1.7.0

I always get the following error:
RuntimeError: nms_impl: implementation for device cuda:0 not found.

I thought it was a CUDA version problem and tried to downgrade my CUDA from 12.2 to 11.8.

Finally, when I used

mim uninstall mmcv-full
pip install mmcv-full

the build took about twenty minutes, and after restarting the app the error was gone.

Expected all tensors to be on the same device

Error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)

When there are multiple GPUs, the specified GPU cannot be found. How should this be resolved?
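
A common workaround (an assumption, not an official fix) is to pin the process to a single visible GPU before launching, so that all tensors land on one device:

CUDA_VISIBLE_DEVICES=0 python app.py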

mmcv-full cannot be installed for torch 2; it hangs forever at "Building wheel for mmcv-full (setup.py) ... /"

mim install mmcv-full==1.7.0
Looking in indexes: http://mirrors.aliyun.com/pypi/simple
Looking in links: https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.0/index.html
Collecting mmcv-full==1.7.0
Downloading http://mirrors.aliyun.com/pypi/packages/a1/81/89120850923f4c8b49efba81af30160e7b1b305fdfa9671a661705a8abbf/mmcv-full-1.7.0.tar.gz (593 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 593.6/593.6 kB 4.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: addict in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (2.4.0)
Requirement already satisfied: numpy in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (1.22.0)
Requirement already satisfied: packaging in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (23.1)
Requirement already satisfied: Pillow in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (10.0.0)
Requirement already satisfied: pyyaml in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (6.0.1)
Requirement already satisfied: yapf in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from mmcv-full==1.7.0) (0.40.1)
Requirement already satisfied: importlib-metadata>=6.6.0 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (6.8.0)
Requirement already satisfied: platformdirs>=3.5.1 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (3.10.0)
Requirement already satisfied: tomli>=2.0.1 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from yapf->mmcv-full==1.7.0) (2.0.1)
Requirement already satisfied: zipp>=0.5 in /root/autodl-tmp/conda/envs/facechain/lib/python3.10/site-packages (from importlib-metadata>=6.6.0->yapf->mmcv-full==1.7.0) (3.16.2)
Building wheels for collected packages: mmcv-full
Building wheel for mmcv-full (setup.py) ... /

On Windows 11, after repeated tweaking I got it running, but it fails at the portrait experience -> start generation step. I could not find the cause; please help confirm whether this error pinpoints the problem.

windows11
python3.8
CUDA 11.7
GPU GeForce RTX 4060

Differences from the environment in the README:
1. mmcv-full==1.7.0 raised "nms_impl: implementation for device cuda:0 not found." Repeated uninstalls and reinstalls did not help; switching to 1.7.1 succeeded.
At portrait experience -> start generation, the error is:
FileNotFoundError: [Errno 2] No such file or directory: 'D:\AI\facechain\tmp/qw/personalizaition_lora\pytorch_lora_weights.bin'

Error when training data

Training fails on Windows, which then prevents inference.

 File "D:\ProgramData\anaconda3\envs\facechain\lib\site-packages\datasets\packaged_modules\folder_based_builder\folder_based_builder.py", line 311, in _generate_examples
    raise ValueError(
ValueError: image at tmp.png doesn't have metadata in D:\AI\qw\training_data\personalizaition_lora_labeled\metadata.jsonl.

Looking at the backend logs, I found an error about the "rm" command; rm is a Linux command and does not exist on Windows.

2023-08-20 00:15:28.975118: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8700
000.jpg 0.9607361331582069
1girl, brown_eyes, brown_hair, earrings, jewelry, lips, long_hair, looking_at_viewer, open_mouth, simple_background, smile, solo, teeth, transparent_background
[['1girl', 'brown_eyes', 'brown_hair', 'earrings', 'jewelry', 'lips', 'long_hair', 'looking_at_viewer', 'open_mouth', 'simple_background', 'smile', 'solo', 'teeth', 'transparent_background']]
'rm' is not recognized as an internal or external command, operable program or batch file.
0.png a beautiful woman, brown_hair, earrings, jewelry, long_hair, looking_at_viewer, open_mouth, simple_background, smile, solo, transparent_background
08/20/2023 00:15:31 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
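
A cross-platform way to remove a file, sketched here as the kind of change needed (a hypothetical snippet, not the project's actual patch):

import os

path = 'tmp.png'  # the temporary file named in the log above
if os.path.exists(path):
    os.remove(path)  # works on Windows and Linux alike, unlike shelling out to rm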

Error: result is empty.

[]
Error: result is empty.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/content/facechain/app.py", line 149, in run
data_process_fn(instance_data_dir, True)
File "/content/facechain/facechain/inference.py", line 24, in data_process_fn
out_json_name = data_process_fn(input_img_dir)
File "/content/facechain/facechain/data_process/preprocessing.py", line 335, in call
exit()
File "/usr/lib/python3.10/_sitebuiltins.py", line 26, in call
raise SystemExit(code)
SystemExit: None

When running on Colab (T4 runtime)

No such file or directory: '/tmp/qw/personalizaition_lora/pytorch_lora_weights.bin'

File "/home/yyy/facechain/facechain/inference.py", line 47, in main_diffusion_inference
pipe = merge_lora(pipe, lora_human_path, multiplier_human, from_safetensor=False)
File "/home/yyy/facechain/facechain/merge_lora.py", line 15, in merge_lora
checkpoint = torch.load(os.path.join(lora_path, 'pytorch_lora_weights.bin'),
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/qw/personalizaition_lora/pytorch_lora_weights.bin'
Training produces a safetensors file, so why does the code look for a .bin? And why can't the file be found? Any advice appreciated.
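
A sketch of loading the LoRA weights from whichever format actually exists (it assumes the safetensors package; the directory path is taken from the log above):

import os
import torch
from safetensors.torch import load_file

lora_path = '/tmp/qw/personalizaition_lora'
bin_file = os.path.join(lora_path, 'pytorch_lora_weights.bin')
safetensors_file = os.path.join(lora_path, 'pytorch_lora_weights.safetensors')

# prefer whichever file the training run actually produced
if os.path.exists(bin_file):
    checkpoint = torch.load(bin_file, map_location='cpu')
else:
    checkpoint = load_file(safetensors_file)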

mat1 and mat2 must have the same dtype

08/17/2023 14:39:07 - INFO - __main__ - ***** Running training *****
08/17/2023 14:39:07 - INFO - __main__ -   Num examples = 3
08/17/2023 14:39:07 - INFO - __main__ -   Num Epochs = 200
08/17/2023 14:39:07 - INFO - __main__ -   Instantaneous batch size per device = 1
08/17/2023 14:39:07 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/17/2023 14:39:07 - INFO - __main__ -   Gradient Accumulation steps = 1
08/17/2023 14:39:07 - INFO - __main__ -   Total optimization steps = 600
Steps:   0%|                                                                                    | 0/600 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ facechain/facechain/train_text_to_image_lora.py:1103 in <module>           │
│                                                                                                  │
│   1100                                                                                           │
│   1101                                                                                           │
│   1102 if __name__ == "__main__":                                                                │
│ ❱ 1103 │   main()                                                                                │
│   1104                                                                                           │
│                                                                                                  │
│ facechain/facechain/train_text_to_image_lora.py:924 in main                │
│                                                                                                  │
│    921 │   │   │   │   │   raise ValueError(f"Unknown prediction type {noise_scheduler.config.p  │
│    922 │   │   │   │                                                                             │
│    923 │   │   │   │   # Predict the noise residual and compute loss                             │
│ ❱  924 │   │   │   │   model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sampl  │
│    925 │   │   │   │   loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")   │
│    926 │   │   │   │                                                                             │
│    927 │   │   │   │   # Gather the losses across all processes for logging (if we use distribu  │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:805 in │
│ forward                                                                                          │
│                                                                                                  │
│   802 │   │   # there might be better ways to encapsulate this.                                  │
│   803 │   │   t_emb = t_emb.to(dtype=sample.dtype)                                               │
│   804 │   │                                                                                      │
│ ❱ 805 │   │   emb = self.time_embedding(t_emb, timestep_cond)                                    │
│   806 │   │   aug_emb = None                                                                     │
│   807 │   │                                                                                      │
│   808 │   │   if self.class_embedding is not None:                                               │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/diffusers/models/embeddings.py:192 in        │
│ forward                                                                                          │
│                                                                                                  │
│   189 │   def forward(self, sample, condition=None):                                             │
│   190 │   │   if condition is not None:                                                          │
│   191 │   │   │   sample = sample + self.cond_proj(condition)                                    │
│ ❱ 192 │   │   sample = self.linear_1(sample)                                                     │
│   193 │   │                                                                                      │
│   194 │   │   if self.act is not None:                                                           │
│   195 │   │   │   sample = self.act(sample)                                                      │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1194 in           │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/torch/nn/modules/linear.py:114 in forward    │
│                                                                                                  │
│   111 │   │   │   init.uniform_(self.bias, -bound, bound)                                        │
│   112 │                                                                                          │
│   113 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 114 │   │   return F.linear(input, self.weight, self.bias)                                     │
│   115 │                                                                                          │
│   116 │   def extra_repr(self) -> str:                                                           │
│   117 │   │   return 'in_features={}, out_features={}, bias={}'.format(                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: mat1 and mat2 must have the same dtype
Steps:   0%|                                                                                    | 0/600 [00:02<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home//.local/bin/accelerate:8 in <module>                                              │
│                                                                                                  │
│   5 from accelerate.commands.accelerate_cli import main                                          │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in  │
│ main                                                                                             │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/launch.py:941 in         │
│ launch_command                                                                                   │
│                                                                                                  │
│   938 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA   │
│   939 │   │   sagemaker_launcher(defaults, args)                                                 │
│   940 │   else:                                                                                  │
│ ❱ 941 │   │   simple_launcher(args)                                                              │
│   942                                                                                            │
│   943                                                                                            │
│   944 def main():                                                                                │
│                                                                                                  │
│ /home//.local/lib/python3.10/site-packages/accelerate/commands/launch.py:603 in         │
│ simple_launcher                                                                                  │
│                                                                                                  │
│   600 │   process.wait()                                                                         │
│   601 │   if process.returncode != 0:                                                            │
│   602 │   │   if not args.quiet:                                                                 │
│ ❱ 603 │   │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)    │
│   604 │   │   else:                                                                              │
│   605 │   │   │   sys.exit(1)                                                                    │
│   606                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['miniconda3/bin/python', 'facechain/train_text_to_image_lora.py',
'--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film',
'--dataset_name=./imgs', '--output_dataset_name=./processed', '--caption_column=text', '--resolution=512',
'--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000',
'--learning_rate=1e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=./output',
'--lora_r=32', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32']' returned non-zero exit
status 1.

Problems running after installing mmcv==1.7.0

2023-08-21 16:42:31,201 - modelscope - INFO - Model revision not specified, use the latest revision: v1.1
2023-08-21 16:42:31,396 - modelscope - INFO - initiate model from /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd
2023-08-21 16:42:31,396 - modelscope - INFO - initiate model from location /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd.
2023-08-21 16:42:31,397 - modelscope - INFO - initialize model from /home/hx/.cache/modelscope/hub/damo/cv_ddsar_face-detection_iclr23-damofd
Traceback (most recent call last):
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/utils/registry.py", line 210, in build_from_cfg
return obj_cls._instantiate(**args)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/base/base_model.py", line 66, in _instantiate
return cls(**kwargs)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/cv/face_detection/scrfd/damofd_detect.py", line 31, in init
super().init(model_dir, **kwargs)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/modelscope/models/cv/face_detection/scrfd/scrfd_detect.py", line 36, in init
from mmdet.models import build_detector
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/init.py", line 2, in
from .backbones import * # noqa: F401,F403
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/backbones/init.py", line 2, in
from .csp_darknet import CSPDarknet
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/backbones/csp_darknet.py", line 11, in
from ..utils import CSPLayer
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/utils/init.py", line 13, in
from .point_sample import (get_uncertain_point_coords_with_randomness,
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmdet/models/utils/point_sample.py", line 3, in
from mmcv.ops import point_sample
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/ops/init.py", line 2, in
from .active_rotated_filter import active_rotated_filter
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/ops/active_rotated_filter.py", line 10, in
ext_module = ext_loader.load_ext(
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/site-packages/mmcv/utils/ext_loader.py", line 13, in load_ext
ext = importlib.import_module('mmcv.' + name)
File "/home/hx/anaconda3/envs/facechain/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named 'mmcv._ext'

Hello, I tried installing mmcv 2.0.0, but training then failed with: cannot import name 'Config' from mmcv.
So I switched back to version 1.7.0, but then hit the problem above. How should I solve this?

CUDA out of memory error on training

I run into the following error on Alibaba Cloud DSW with an NVIDIA V100 instance:

image: modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

DSW NVIDIA V100

08/18/2023 19:46:51 - INFO - __main__ - ***** Running training *****
08/18/2023 19:46:51 - INFO - __main__ -   Num examples = 9
08/18/2023 19:46:51 - INFO - __main__ -   Num Epochs = 200
08/18/2023 19:46:51 - INFO - __main__ -   Instantaneous batch size per device = 1
08/18/2023 19:46:51 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
08/18/2023 19:46:51 - INFO - __main__ -   Gradient Accumulation steps = 1
08/18/2023 19:46:51 - INFO - __main__ -   Total optimization steps = 1800
Steps:   0%|                                           | 0/1800 [00:00<?, ?it/s]Traceback (most recent call last):
  File "facechain/train_text_to_image_lora.py", line 1103, in <module>
    main()
  File "facechain/train_text_to_image_lora.py", line 924, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/unet_2d_condition.py", line 956, in forward
    sample = upsample_block(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 2127, in forward
    hidden_states = attn(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 291, in forward
    hidden_states = block(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention.py", line 154, in forward
    attn_output = self.attn1(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 321, in forward
    return self.processor(
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 601, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask)
  File "/opt/conda/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 362, in get_attention_scores
    attention_scores = torch.baddbmm(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 15.78 GiB total capacity; 8.13 GiB already allocated; 469.75 MiB free; 8.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
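
As the error message itself suggests, one mitigation worth trying before reducing resolution or batch size (a judgment call, not a guaranteed fix) is to cap the allocator's split size:

# reduce fragmentation in the CUDA caching allocator
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128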

mmcv and modelscope version issue

It looks like many tasks in the modelscope library are still written against mmcv<2.0.0, e.g. from mmcv.parallel import MMDataParallel, which would require changes in many places. Will there be an update?

Error: nms_impl: implementation for device cuda:0 not found.

When uploading a picture and starting training, the following error occurs on the server side.
2023-08-19 16:09:33,371 - modelscope - INFO - load model done
cathed for image process of 000.jpg
Error: nms_impl: implementation for device cuda:0 not found.

[]
Error: result is empty.
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\ProgramData\anaconda3\envs\fchain\lib\site-packages\gradio\utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "D:\dev\facechain\app.py", line 174, in run
data_process_fn(instance_data_dir, True)
File "D:\dev\facechain\facechain\inference.py", line 24, in data_process_fn
out_json_name = data_process_fn(input_img_dir)
File "D:\dev\facechain\facechain\data_process\preprocessing.py", line 335, in call
exit()
File "C:\ProgramData\anaconda3\envs\fchain\lib_sitebuiltins.py", line 26, in call
raise SystemExit(code)
SystemExit: None

Are there more outfits?

At the moment there are only a few options, like the (admittedly awesome) silver armor:

examples = {
    'prompt_male': [
        ['silver armor'],
        ['T-shirt']
    ],
    'prompt_female': [
        ['beautiful traditional hanfu, upper_body'],
        ['an elegant evening gown']
    ],
}

example_styles = [
    {'name': '默认风格(default style)'},
    {'name': '凤冠霞帔(Chinese traditional gorgeous suit)',
     'model_id': 'ly261666/civitai_xiapei_lora',
     'revision': 'v1.0.0',
     'bin_file': 'xiapei.safetensors',
     'multiplier_style': 0.35,
     'add_prompt_style': 'red, hanfu, tiara, crown, '},
]
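
Judging from this structure, adding an outfit appears to be as simple as appending another prompt list; a hypothetical example (the prompt text is made up):

# add a new male outfit prompt
examples['prompt_male'].append(['classic black suit, upper_body'])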

torch.distributed.elastic.multiprocessing.errors.ChildFailedError

I ran the script PYTHONPATH=. sh train_lora.sh "ly261666/cv_portrait_model" "v2.0" "film/film" "./imgs" "./processed" "./output" and got:

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 399410) of binary: /home/disk01/wyw/.conda/envs/facechain/bin/python
Traceback (most recent call last):
File "/home/disk01/wyw/.conda/envs/facechain/bin/accelerate", line 8, in
sys.exit(main())
File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/launch.py", line 970, in launch_command
multi_gpu_launcher(args)
File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
distrib_run.run(args)
File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/disk01/wyw/.conda/envs/facechain/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

facechain/train_text_to_image_lora.py FAILED

/opt/conda/bin/python: can't open file 'facechain/train_text_to_image_lora.py': [Errno 2] No such file or directory

Deployed with a container on a GPU A10 (NVIDIA-SMI 525.105.17, Driver Version 525.105.17, CUDA Version 12.0).
After running training from the web UI, the backend log shows the error below; the web page reports training as completed, but the portrait-experience step fails with an Error.
/opt/conda/bin/python: can't open file 'facechain/train_text_to_image_lora.py': [Errno 2] No such file or directory
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 8, in
sys.exit(main())
File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 979, in launch_command
simple_launcher(args)
File "/opt/conda/lib/python3.8/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'facechain/train_text_to_image_lora.py', '--pretrained_model_name_or_path=ly261666/cv_portrait_model', '--revision=v2.0', '--sub_path=film/film', '--output_dataset_name=/tmp/qw/training_data/personalizaition_lora', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=1', '--num_train_epochs=200', '--checkpointing_steps=5000', '--learning_rate=1e-04', '--lr_scheduler=cosine', '--lr_warmup_steps=0', '--seed=42', '--output_dir=/tmp/qw/personalizaition_lora', '--lora_r=32', '--lora_alpha=32', '--lora_text_encoder_r=32', '--lora_text_encoder_alpha=32']' returned non-zero exit status 2.
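
The relative path in the failing command suggests the process was not launched from the repository root; a likely workaround (an assumption, not a confirmed fix) is to set the working directory before starting the app:

# run from the directory that contains facechain/train_text_to_image_lora.py
cd /path/to/facechain
PYTHONPATH=. python3 app.py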

Unable to download the face fusion model cv_unet-image-face-fusion_damo

Hello, could you provide a download link?

Can't download 3.2 GB model

It's currently not possible to download the 3.20 GB model.
The download fails at ~95%. This is reproducible on Colab and locally.

Downloading:  92% 2.95G/3.20G [01:54<00:05, 46.1MB/s]
Downloading:  93% 2.97G/3.20G [01:54<00:05, 48.3MB/s]
Downloading:  93% 2.98G/3.20G [01:55<00:06, 34.6MB/s]
Downloading:  94% 3.00G/3.20G [01:55<00:06, 32.4MB/s]
Downloading:  94% 3.01G/3.20G [01:57<00:10, 19.7MB/s]
Downloading:  95% 3.04G/3.20G [01:58<00:07, 21.9MB/s]
Downloading:  95% 3.05G/3.20G [01:59<00:08, 20.7MB/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 710, in _error_catcher
    yield
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 814, in _raw_read
    data = self._fp_read(amt) if not fp_closed else b""
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 799, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "/usr/lib/python3.10/http/client.py", line 466, in read
    s = self.fp.read(amt)
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 940, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 879, in read
    data = self._raw_read(amt)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 813, in _raw_read
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 727, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "/content/facechain/app.py", line 184, in run
    data_process_fn(instance_data_dir, True)
  File "/content/facechain/facechain/inference.py", line 23, in data_process_fn
    data_process_fn = Blipv2()
  File "/content/facechain/facechain/data_process/preprocessing.py", line 202, in __init__
    self.model = DeepDanbooru()
  File "/content/facechain/facechain/data_process/deepbooru.py", line 721, in __init__
    snapshot_path = snapshot_download(foundation_model_id, revision='v4.0')
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/snapshot_download.py", line 140, in snapshot_download
    parallel_download(
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/file_download.py", line 243, in parallel_download
    list(executor.map(download_part, tasks))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/modelscope/hub/file_download.py", line 203, in download_part
    for chunk in r.iter_content(chunk_size=API_FILE_DOWNLOAD_CHUNK_SIZE):
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

Errors when running on Colab

Running on Colab with an A100, everything works until the last step: the web page opens, but after uploading photos and clicking start training, it reports "CUDA is not available".
The log is as follows:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/content/facechain/app.py", line 123, in run
raise gr.Error('CUDA is not available.')
gradio.exceptions.Error: 'CUDA is not available.'

Error when training data

Running app.py on Windows 11 gives this error: 'PYTHONPATH' is not recognized as an internal or external command, operable program or batch file.
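
On Windows cmd, the Unix-style VAR=value command prefix is not understood; the usual equivalent (in PowerShell, $env:PYTHONPATH='.') is to set the variable first:

set PYTHONPATH=.
python app.py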

Problems encountered running on k8s

Dockerfile

FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.0
RUN pip3 install gradio

SHELL ["/bin/bash", "--login", "-c"]
RUN GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/modelscope/facechain.git --depth 1
WORKDIR facechain
ENV NVIDIA_DISABLE_REQUIRE=true

ENTRYPOINT ["python3","app.py"]

Alibaba Cloud ECS scheduled by k8s

2023-08-20 09:08:15.817344: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
/opt/conda/lib/python3.8/site-packages/torch/cuda/init.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
app.py:302: GradioDeprecationWarning: The style method is deprecated. Please set these arguments in the constructor instead.
output_images = gr.Gallery(label='Output', show_label=False).style(columns=3, rows=2, height=600,

Could not find a version that satisfies the requirement tf-estimator-nightly==2.8.0.dev2021122109

TensorFlow 2.8.0 seems to have a problem. Can this package be downloaded with Python 3.8?
INFO: pip is looking at multiple versions of tensorflow to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version: 1.11.0 Requires-Python <3.13,>=3.9; 1.11.0rc1 Requires-Python <3.13,>=3.9; 1.11.0rc2 Requires-Python <3.13,>=3.9; 1.11.1 Requires-Python <3.13,>=3.9; 1.11.2 Requires-Python <3.13,>=3.9; 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 3.8.0rc1 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement tf-estimator-nightly==2.8.0.dev2021122109 (from tensorflow) (from versions: none)
ERROR: No matching distribution found for tf-estimator-nightly==2.8.0.dev2021122109

The generated images look nothing like me?

Following the README, I deployed and tried it in a notebook. The results bear almost no resemblance to my actual features; you can hardly tell it's me at all, yet the samples don't seem to have this problem. Which parameters should be tuned to improve the results?

CUDA is not available issue with Colab

Everything works fine, but when I start training I get this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/content/facechain/app.py", line 123, in run
raise gr.Error('CUDA is not available.')
gradio.exceptions.Error: 'CUDA is not available.'
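
A quick diagnostic (not a fix) to confirm the selected runtime actually exposes a GPU before training:

import torch
# both values should be truthy/non-zero on a GPU runtime
print(torch.cuda.is_available(), torch.cuda.device_count())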

Does it support running on a local machine?

I tried to train a LoRA on my machine, but it raised an error:

In [1]: from modelscope import snapshot_download
^[[A2023-08-14 11:23:11,600 - modelscope - INFO - PyTorch version 2.0.0+cu118 Found.
2023-08-14 11:23:11,602 - modelscope - INFO - TensorFlow version 2.13.0 Found.
2023-08-14 11:23:11,602 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-08-14 11:23:11,631 - modelscope - INFO - Loading done! Current index file version is 1.8.1, with md5 bbb8dd73324c667bf9ab6594815ac903 and a total number of 893 components indexed

In [2]: model_dir = snapshot_download('Cherrytest/rot_bgr', revision='v1.0.0')
2023-08-14 11:23:13,696 - modelscope - ERROR - Authentication token does not exist, failed to access model Cherrytest/rot_bgr which may not exist or may be private. Please login first.
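
For models that are not public, the hub expects an access token before download; a sketch using the library's HubApi (the token value is a placeholder):

from modelscope.hub.api import HubApi

api = HubApi()
api.login('YOUR_MODELSCOPE_ACCESS_TOKEN')  # token from your ModelScope account settings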

About CUDA 11.7

When deploying facechain, the system asks for CUDA 11.7, but on Ubuntu the driver for a 4090 supports CUDA 12.2, and 11.7 cannot be installed. Is CUDA 12.2 also acceptable? Why do I keep getting errors during training?

Two more questions:
1. With Python 3.10.6 installed via conda and CUDA 12.2, mim install mmcv-full==1.7.0 cannot be installed at all.
2. With Python 3.8, mim install mmcv-full==1.7.0 installs successfully and the program starts, but training then fails.
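
Prebuilt mmcv-full wheels are keyed to specific CUDA/torch combinations, so pointing pip at the matching OpenMMLab index (the same index that appears in a build log earlier on this page) often avoids both the slow source build and a CUDA mismatch; cu117/torch2.0.0 is shown as an example, adjust to your setup:

pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch2.0.0/index.html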

执行"开始推理"时报错 OSError: [Errno 122] Disk quota exceeded

Following the ModelScope notebook instructions, everything ran and training reported success, but running "start inference" raised OSError: [Errno 122] Disk quota exceeded.

Environment: free ModelScope instance, PAI-DSW, GPU environment

8 CPU cores, 32 GB RAM, 16 GB GPU memory
ModelScope Library preinstalled
Preinstalled image: ubuntu20.04-cuda11.7.1-py38-torch2.0.1-tf1.15.5-1.8.1

The workspace disk usage is as follows:

root@dsw:/mnt/workspace# du -h -d 1
14G     ./.cache
73K     ./.ipynb_checkpoints
8.5K    ./.virtual_documents
574K    ./facechain
14G     .

Is the disk provided by the free ModelScope instance too small? The facechain README asks for about 50 GB of disk.

Is there any way to run the full workflow successfully on a free ModelScope instance?
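
One thing worth trying (an assumption based on the library's default cache location, not verified on the free instance) is relocating the model cache to whichever volume has the most free quota:

# modelscope caches downloaded models under ~/.cache/modelscope by default
export MODELSCOPE_CACHE=/path/to/larger/volume/modelscope_cache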
