openbmb / cpm-live
Live Training for Open-source Big Models
I changed the configuration to:
export CUDA_VISIBLE_DEVICES=0
GPUS_PER_NODE=1
NNODES=1
However, I got the error below. What could be causing this?
Traceback (most recent call last):
File "tune_cpm_ant.py", line 47, in <module>
tune.run(data)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 222, in run
self.forward(train_dataloader, eval_dataloader, cls_num=self.cls_num)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 122, in forward
loss = self._forward(train_data, cls_num=cls_num)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 350, in _forward
loss = self.loss_function(logits, targets.view(-1))
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/bmtrain/loss/cross_entropy.py", line 200, in forward
w = (target != self.ignore_index).int()
RuntimeError: CUDA error: no kernel image is available for execution on the device
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 16863) of binary: /home/shanhoo3/anaconda3/envs/cpm/bin/python
Traceback (most recent call last):
File "/home/shanhoo3/anaconda3/envs/cpm/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.10.1', 'console_scripts', 'torchrun')())
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/run.py", line 719, in main
run(args)
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
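After tuning cpm ant++ on several hundred thousand samples, I got a best.pt file. At inference time, however, feeding in the corresponding input does not give the expected result: the output is all English characters and symbols like ---------. I don't know which step went wrong.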
AdamOffloadOptimizer has no parameter named scale
self.optimizer = bmt.optim.AdamOffloadOptimizer(
model.parameters(), weight_decay=0.01, scale=1048576
)
Traceback (most recent call last):
File "tune_cpm_ant.py", line 33, in <module>
tune = config_dict["tune"](
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 291, in __init__
super().__init__(**kwargs)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 57, in __init__
self.optimizer = bmt.optim.AdamOffloadOptimizer(
TypeError: __init__() got an unexpected keyword argument 'scale'
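The TypeError says that the AdamOffloadOptimizer constructor in the installed bmtrain does not accept scale. Another report on this page shows pip installing bmtrain==0.1.8.post1, so a mismatch between the bmtrain version the example code targets and the one installed is a plausible cause. Pinning the version the repository expects is probably the safer fix. As far as I know, newer bmtrain releases configure loss scaling through bmt.optim.OptimManager(loss_scale=...) rather than on the optimizer, but please check the bmtrain changelog. For a quick diagnosis, the hypothetical helper below uses inspect.signature to report which keyword arguments a callable would reject. Silently dropping scale may change loss-scaling behaviour, so treat this as a diagnostic rather than a fix.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def split_supported_kwargs(fn: Callable[..., Any], **kwargs: Any) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (accepted, rejected) according to fn's signature."""
    params = inspect.signature(fn).parameters
    # A **kwargs parameter accepts any keyword, so nothing is rejected up front.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    accepted = {k: v for k, v in kwargs.items() if k in params}
    rejected = {k: v for k, v in kwargs.items() if k not in params}
    return accepted, rejected


class OptimizerWithoutScale:
    """Hypothetical stand-in for a constructor that no longer takes `scale`."""

    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        self.params, self.lr, self.weight_decay = params, lr, weight_decay


accepted, rejected = split_supported_kwargs(OptimizerWithoutScale, weight_decay=0.01, scale=1048576)
print(accepted, rejected)  # {'weight_decay': 0.01} {'scale': 1048576}
# With bmtrain installed, the real class can be inspected the same way:
#   split_supported_kwargs(bmt.optim.AdamOffloadOptimizer, weight_decay=0.01, scale=1048576)
```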
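I have two tasks, task1 and task2, and task2 needs to be trained starting from the checkpoint produced by task1. The best.pt produced by task1 loads fine during task1's inference, but it cannot be loaded during task2's tuning. The keys in the model's state_dict and in best.pt's state_dict are identical, yet I get the error Unexpected key(s) in state_dict: "generator.encoder.layers.0.self_att.self_attention.project_q.lora.lora_A". What causes this?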
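In PyTorch's load_state_dict, an "unexpected" key is one that exists in the checkpoint but not in the model being loaded. The error therefore suggests that, when best.pt was loaded in task2, the model had no lora submodule under project_q. That could happen if LoRA was inserted after loading, or with different modified_modules. This is an inference from the error message, not something the report confirms. Comparing the two key sets directly shows which side lacks what. In practice the inputs would be model.state_dict().keys() and the keys of the loaded best.pt, assuming it is a flat state dict. The helper name and the sample keys below are illustrative.

```python
from typing import Iterable, Set, Tuple


def diff_state_dict_keys(model_keys: Iterable[str], ckpt_keys: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) with the same meaning as torch's load_state_dict."""
    model_set, ckpt_set = set(model_keys), set(ckpt_keys)
    missing = model_set - ckpt_set      # the model expects these, but the checkpoint lacks them
    unexpected = ckpt_set - model_set   # the checkpoint has these, but the model lacks them
    return missing, unexpected


# Hypothetical keys: a model built without LoRA vs. a checkpoint saved with LoRA.
prefix = "generator.encoder.layers.0.self_att.self_attention.project_q"
model_keys = [f"{prefix}.weight"]
ckpt_keys = model_keys + [f"{prefix}.lora.lora_A"]
missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print(missing)     # set()
print(unexpected)  # {'generator.encoder.layers.0.self_att.self_attention.project_q.lora.lora_A'}
```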
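Hello! I really appreciate your work. At the BAAI Conference I learned that your architecture is a Transformer encoder. During training, do you also randomly add attention masks, or do generation tasks require additional fine-tuning?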
1 GPU
Command: python3 text_generation.py --use-bminf --memory-limit 4
Error:
Traceback (most recent call last):
File "text_generation.py", line 34, in <module>
model = bminf.wrapper(model, quantization=False, memory_limit=args.memory_limit << 30)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 55, in wrapper
model, found_linear = _wrapper(model, quantization, False, memory_limit)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 30, in _wrapper
model._modules[kw], fd = _wrapper(model._modules[kw], quantization, is_module_list or is_in_blocklist, memory_limit)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 30, in _wrapper
model._modules[kw], fd = _wrapper(model._modules[kw], quantization, is_module_list or is_in_blocklist, memory_limit)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 34, in _wrapper
model = TransformerBlockList([
File "/usr/local/lib/python3.8/dist-packages/bminf/scheduler/__init__.py", line 382, in __init__
raise ValueError("Missing some parameters in layer %d" % i)
ValueError: Missing some parameters in layer 2
In a single-GPU environment it runs fine without bminf, but adding bminf produces the error above. Any advice?
![image](https://user-images.githubusercontent.com/92961574/205198889-81b48fa4-e5dd-4aa7-8a77-2251e9881c91.png)
The code configuration is shown above; train.json has 90,000 samples and the eval set has 9,000.
Originally posted by @touwenameng in #254 (comment)
Error message:
Collecting bmtrain==0.1.8.post1
Using cached bmtrain-0.1.8.post1.tar.gz (48 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-c0_v5dr2/bmtrain_c0a1c88cbb974a93b109264909d72dd8/setup.py", line 52, in <module>
CUDAExtension('bmtrain.nccl._C', [
File "/home/bmxm/anaconda3/envs/apm-ant-plus/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 932, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/home/bmxm/anaconda3/envs/apm-ant-plus/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1040, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/home/bmxm/anaconda3/envs/apm-ant-plus/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2058, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
[end of output]
It seems this CUDA version isn't supported. How should I deal with it?
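The traceback shows torch.utils.cpp_extension failing because CUDA_HOME is not set. bmtrain compiles CUDA extensions while installing, so it needs a CUDA toolkit (with nvcc), not just the GPU driver. To my knowledge PyTorch also tries nvcc on PATH and /usr/local/cuda before raising this error. If so, the error hints that no toolkit was found at all. The usual fix is to install a toolkit matching the PyTorch build, then set CUDA_HOME to its root before running pip install (for example export CUDA_HOME=/usr/local/cuda; that path is an assumption). The sketch below is a hypothetical best-effort lookup in a similar order that prints the export line to use.

```python
import os
import shutil
from typing import Optional


def guess_cuda_home() -> Optional[str]:
    """Best-effort guess of the CUDA toolkit root (None if nothing is found)."""
    # 1. An explicit environment variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            return value
    # 2. Derive the root from nvcc on PATH: <root>/bin/nvcc -> <root>.
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    # 3. Conventional Linux install location (an assumption; adjust for your system).
    if os.path.isdir("/usr/local/cuda"):
        return "/usr/local/cuda"
    return None


if __name__ == "__main__":
    root = guess_cuda_home()
    if root is None:
        print("No CUDA toolkit found; install one that matches your PyTorch build.")
    else:
        print(f"export CUDA_HOME={root}")
```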
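Why does 模力表格 require logging in by scanning a WeChat QR code? I really don't want to pick up my phone. My phone is broken and has gone unrepaired for several months; I only top it up to keep the number active, and I can't receive SMS messages.
Could you support logging in with GitHub?
If GitHub login isn't possible, please allow logging in with a username and password.
Many websites and open-source projects on GitHub support GitHub login.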
Running inference with cpm_ant_plus (i.e., text_generation.py) works fine, but running scripts/CCPM_ddp.sh fails with the following error:
python3 -m torch.distributed.launch --master_addr localhost --master_port 1234 --nproc_per_node 2 --nnodes 1 tune_cpm_ant.py --dataset-name CCPM --dataset-path cpm_ant_plus/CPM-Live/cpm-live/examples/data/oss_cuge/CCPM --output-path cpm_ant_plus/CPM-Live/cpm-live/examples/fintune_model/CCPM --model-path cpm_ant_plus/CPM-Live/cpm-live/model/cpm-ant-plus-10b.pt --config-path cpm_ant_plus/CPM-Live/cpm-live/model/cpm-ant-plus-10b.json --batch-size 32 --early-stop-patience 10 --eval-interval 50 --tune-maxlen 256 --lr 5e-3 --warmup-iters 50 --epochs 20 --infer-maxlen 1
/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
/root/anaconda3/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (2.1.1)/charset_normalizer (2.1.1) doesn't match a supported version!
warnings.warn(
/root/anaconda3/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (2.1.1)/charset_normalizer (2.1.1) doesn't match a supported version!
warnings.warn(
====================== Initialization ======================
rank : 0
local_rank : 0
world_size : 2
local_size : 2
master : localhost:1234
device : 0
cpus : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
====================== Initialization ======================
rank : 1
local_rank : 1
world_size : 2
local_size : 2
master : localhost:1234
device : 1
cpus : [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
root
├── encoder (Encoder)
│ ├── layers (TransformerBlockList)
│ │ └── 0-47(CheckpointBlock)
│ │ ├── self_att (SelfAttentionBlock)
│ │ │ ├── layernorm_before_attention (LayerNorm) weight:[0]
│ │ │ └── self_attention (Attention)
│ │ │ ├── project_q,project_v(Linear) weight:[0]
│ │ │ │ └── lora (LowRankLinear) lora_A:[8, 4096] lora_B:[4096, 8]
│ │ │ └── project_k,attention_out(Linear) weight:[0]
│ │ └── ffn (FFNBlock)
│ │ ├── layernorm_before_ffn (LayerNorm) weight:[0]
│ │ └── ffn (FeedForward)
│ │ ├── w_in (DenseGatedACT)
│ │ │ ├── w_0 (Linear) weight:[12587008]
│ │ │ └── w_1 (Linear) weight:[41943040]
│ │ └── w_out (Linear) weight:[41943040]
│ └── output_layernorm (LayerNorm) weight:[2048]
├── segment_embedding (Embedding) weight:[65536]
├── input_embedding (Embedding) weight:[179410944]
└── position_bias (SegmentPositionEmbedding) relative_attention_bias:[24576]
[INFO|(OpenDelta)basemodel:696]2023-02-17 19:26:12,415 >> Trainable Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:698]2023-02-17 19:26:12,415 >> Delta Parameter Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:700]2023-02-17 19:26:12,415 >> Static Memory 8.97 GB, Max Memory 9.63 GB
root
├── encoder (Encoder)
│ ├── layers (TransformerBlockList)
│ │ └── 0-47(CheckpointBlock)
│ │ ├── self_att (SelfAttentionBlock)
│ │ │ ├── layernorm_before_attention (LayerNorm) weight:[4096]
│ │ │ └── self_attention (Attention)
│ │ │ ├── project_q,project_v(Linear) weight:[16777216]
│ │ │ │ └── lora (LowRankLinear) lora_A:[8, 4096] lora_B:[4096, 8]
│ │ │ └── project_k,attention_out(Linear) weight:[16777216]
│ │ └── ffn (FFNBlock)
│ │ ├── layernorm_before_ffn (LayerNorm) weight:[4096]
│ │ └── ffn (FeedForward)
│ │ ├── w_in (DenseGatedACT)
│ │ │ ├── w_0 (Linear) weight:[29356032]
│ │ │ └── w_1 (Linear) weight:[0]
│ │ └── w_out (Linear) weight:[0]
│ └── output_layernorm (LayerNorm) weight:[2048]
├── segment_embedding (Embedding) weight:[65536]
├── input_embedding (Embedding) weight:[179410944]
└── position_bias (SegmentPositionEmbedding) relative_attention_bias:[24576]
[INFO|(OpenDelta)basemodel:696]2023-02-17 19:26:12,508 >> Trainable Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:698]2023-02-17 19:26:12,509 >> Delta Parameter Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:700]2023-02-17 19:26:12,509 >> Static Memory 8.97 GB, Max Memory 10.30 GB
[INFO] Tuning begins...
Traceback (most recent call last):
File "tune_cpm_ant.py", line 47, in <module>
tune.run(data)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 220, in run
self.forward(train_dataloader, eval_dataloader, cls_num=self.cls_num)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 121, in forward
global_loss = bmt.sum_loss(loss).item()
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/synchronize.py", line 34, in sum_loss
return distributed.all_reduce(loss, "avg")
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 92, in all_reduce
return OpAllReduce.apply(x, op)
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 50, in forward
ncclAllReduce(
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/nccl/__init__.py", line 118, in allReduce
C.ncclAllReduce(
RuntimeError: NCCL Error: invalid argument
Traceback (most recent call last):
File "tune_cpm_ant.py", line 47, in <module>
tune.run(data)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 220, in run
self.forward(train_dataloader, eval_dataloader, cls_num=self.cls_num)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 121, in forward
global_loss = bmt.sum_loss(loss).item()
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/synchronize.py", line 34, in sum_loss
return distributed.all_reduce(loss, "avg")
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 92, in all_reduce
return OpAllReduce.apply(x, op)
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 50, in forward
ncclAllReduce(
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/nccl/__init__.py", line 118, in allReduce
C.ncclAllReduce(
RuntimeError: NCCL Error: invalid argument
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 308683) of binary: /root/anaconda3/bin/python3
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
tune_cpm_ant.py FAILED
Other Failures:
[1]:
time: 2023-02-17_19:26:16
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 308684)
error_file: <N/A>
msg: "Process failed with exitcode 1"
Hi all,
Where can I find the specific license for the weights of these two models?
Following the description on the Hugging Face openbmb/cpm-ant-10b model card, I ran:
from transformers import CPMAntTokenizer, CPMAntModel
texts = ["今天天气真好!"]
model = CPMAntModel.from_pretrained("openbmb/cpm-ant-10b")
tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
input_ids = tokenizer.get_model_input(texts)
logits, hidden = model(**input_ids)
print(logits.shape)
print(hidden.shape)
and got the following error:
cannot import name 'CPMAntTokenizer' from 'transformers'
ImportError: cannot import name 'CPMAntModel' from 'transformers'
I have already updated transformers to the latest version (4.24.0). What is going on?
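The ImportError means the installed transformers package does not export classes with those exact names. To my knowledge, CPM-Ant support was added to transformers in a release after 4.24 (v4.27). It uses the spellings CpmAntModel and CpmAntTokenizer rather than CPMAntModel and CPMAntTokenizer. Please verify this against the transformers release notes for your version. The hypothetical helper below lists which candidate names a module actually provides.

```python
import importlib
from typing import Iterable, List


def exported_names(module_name: str, candidates: Iterable[str]) -> List[str]:
    """Return the candidate attribute names that the imported module actually provides."""
    module = importlib.import_module(module_name)
    return [name for name in candidates if hasattr(module, name)]


# Usage (requires transformers to be installed):
#   import transformers
#   print(transformers.__version__)
#   print(exported_names("transformers",
#                        ["CPMAntModel", "CPMAntTokenizer", "CpmAntModel", "CpmAntTokenizer"]))
```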
Hi, without bminf, text_generation.py uses 12 GB of GPU memory by default. My GPU is a 16 GB V100, yet running text_generation.py with the default configuration reports CUDA out of memory. What could be the cause?
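A back-of-the-envelope estimate may explain this. Suppose the 10B checkpoint is loaded onto the GPU in fp16 (2 bytes per parameter). The weights alone then take about 18.6 GiB, which already exceeds a 16 GB V100 before activations and the CUDA context are counted. Whether text_generation.py loads the whole model this way without bminf is an assumption here. For comparison, the sketch also covers the 3B compressed model mentioned in the README snippet on this page.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GiB (activations, buffers and CUDA context excluded)."""
    return num_params * bytes_per_param / 2**30


for label, n_params in [("10B", 10e9), ("3B", 3e9)]:
    fp16 = weight_memory_gib(n_params, 2)
    fp32 = weight_memory_gib(n_params, 4)
    print(f"{label}: fp16 ~{fp16:.1f} GiB, fp32 ~{fp32:.1f} GiB")
```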
If not, never mind; if so, please write a Chinese README before promoting it!!!
Hi, I'm trying to do full fine-tuning on CPM-ANT+. I followed the instructions in the README and used preprocess_dataset.py to generate the binary data file. However, when world_size > 1 (distributed mode), the read() method in DistributedDataset raises an "Empty Dataset" error, while the data is read successfully in single-node mode. Could you help me fix it? Thanks.
CPM-Live/cpm-live/pretrain_cpm_ant_plus.py
Line 427 in e0cee47
I want to serve a cpm-plus-10b model, but all my attempts have failed.
When I use the bminf wrapper, it's too slow and always runs out of memory. Since I have 4 GPUs, I then tried wrapping the model with DeepSpeed, but that failed too.
Is there any example code for serving?
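On top of the Ant+ base model, we did some additional business-specific pretraining and then used a weighted average to obtain sentence embeddings.
The results show some positive effect but fall short of expectations (they feel somewhat ambiguous). Is there an official recommended approach for this?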
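The issue does not say which weights were used for the average. As one illustration, take a masked weighted mean over the last-layer hidden states, with weights that grow linearly with position. Later tokens then count more, on the reasoning that under causal attention they have seen more of the sentence. This is not an official cpm-live method. The function names and the choice of position weights are assumptions, and the hidden states are assumed to be a [seq_len][dim] list for one sentence, with mask set to 0 at padding positions.

```python
from typing import List, Sequence


def position_weights(seq_len: int) -> List[float]:
    """Linearly increasing weights 1, 2, ..., seq_len (later tokens count more)."""
    return [float(i + 1) for i in range(seq_len)]


def weighted_mean_pool(
    hidden: Sequence[Sequence[float]],
    mask: Sequence[int],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of token vectors, skipping padded positions (mask == 0)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    weight_sum = 0.0
    for vec, keep, w in zip(hidden, mask, weights):
        if not keep:
            continue  # padding tokens contribute nothing
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("all positions are masked out")
    return [t / weight_sum for t in total]


# Toy example: three 2-d token vectors, the last one is padding.
hidden = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(weighted_mean_pool(hidden, [1, 1, 0], position_weights(3)))
```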
I am adapting CPM-Ant to an NLG task using LoRA, and I think there may be a bug in the code. When tuning, padding_side is set to "right", e.g. in tune.py:
padded[key] = pad(items, key, _padding_value, padding_side="right")
However, during generation, padding_side is set to "left", e.g. in generation/ant.py:
padded[key] = pad(input_tensors, key, padding_side='left')
This inconsistency may degrade the performance of model when conducting model inference with batch size larger than 1.
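The difference between the two settings is easy to see on a toy batch. With left padding, every row's last real token ends up in the final column, so a generation step can read the next-token logits from one column for the whole batch. With right padding, that column differs per row. Whether training with right padding and generating with left padding actually degrades results depends on how cpm-live builds positions, spans and the length mask for pad tokens, which I have not verified. pad_sequences and last_real_index are hypothetical helpers, not the pad function from tune.py or generation/ant.py.

```python
from typing import List, Sequence


def pad_sequences(seqs: Sequence[Sequence[int]], pad_value: int, side: str) -> List[List[int]]:
    """Pad token-id lists to a common length on the given side ('left' or 'right')."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [pad_value] * (max_len - len(s))
        out.append(pad + list(s) if side == "left" else list(s) + pad)
    return out


def last_real_index(padded_row: Sequence[int], length: int, side: str) -> int:
    """Column that holds the last real (non-pad) token of a padded row."""
    return len(padded_row) - 1 if side == "left" else length - 1


batch = [[11, 12, 13], [21]]
for side in ("right", "left"):
    padded = pad_sequences(batch, pad_value=0, side=side)
    print(side, padded, [last_real_index(p, len(s), side) for p, s in zip(padded, batch)])
# right [[11, 12, 13], [21, 0, 0]] [2, 0]
# left [[11, 12, 13], [0, 0, 21]] [2, 2]
```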
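I tried downloading the CPM-Ant 3b, 7b and 10b models, but none of them can be extracted on Windows or Ubuntu. The 300M one works. Could you check whether the larger models extract properly on Ubuntu?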
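When large archives fail to extract on every OS while a small one works, a truncated or corrupted download is a common suspect. That is an assumption, since the issue does not say how the files were downloaded or what archive format they use. zipfile.is_zipfile looks for the end-of-central-directory record, so a truncated .zip usually fails it. tarfile.is_tarfile only checks the start of the file, so a truncated tar may still pass. Comparing a SHA-256 digest against a published checksum is the stronger test, if the download page provides one (also an assumption). The helper names are hypothetical.

```python
import hashlib
import tarfile
import zipfile
from typing import Optional


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so multi-GB files never sit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_kind(path: str) -> Optional[str]:
    """'zip' or 'tar' if the file is recognised as that archive type, else None."""
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return None


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, archive_kind(path), sha256_of(path))
```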
Hi, what format are the files under "data_bin_new" in? Could you give an example?
Studying the CPM_Live code.
As mentioned in the title, since ChatGPT is now publicly accessible, we may be able to obtain enough high-quality corpus through ChatGPT to train cpm-ant or cpm-bee. 😄
Some useful information:
Can LoRA + CPM be decoupled at deployment time?
That is, a scenario where CPM is deployed on machine A and LoRA on machine B.
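The standard LoRA formulation makes this question concrete. An adapted layer computes W x + s * B (A x), where A has shape [r, d_in], B has shape [d_out, r], and s is a scaling factor (commonly alpha / r; the value OpenDelta uses is not stated on this page). These shapes match the lora_A:[8, 4096] and lora_B:[4096, 8] entries in the OpenDelta log elsewhere on this page. Because the two terms are simply added, the backbone term and the LoRA term can in principle be computed on different machines. However, the LoRA branch needs the same input x at every adapted layer, and that log shows project_q and project_v adapted in each of 48 blocks. Splitting the model across machines would therefore mean cross-machine traffic inside every forward pass. Folding the adapter into W + s * B A avoids that, at the cost of keeping one merged copy of the weights per adapter. The sketch uses tiny made-up matrices and plain Python lists, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]
Vector = Sequence[float]


def matvec(m: Matrix, v: Vector) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def base_output(w: Matrix, x: Vector) -> List[float]:
    """Frozen backbone projection W x (the part that could run on machine A)."""
    return matvec(w, x)


def lora_delta(a: Matrix, b: Matrix, x: Vector, scaling: float) -> List[float]:
    """Low-rank update s * B (A x) (the part that could run on machine B)."""
    return [scaling * y for y in matvec(b, matvec(a, x))]


def merged_weight(w: Matrix, a: Matrix, b: Matrix, scaling: float) -> List[List[float]]:
    """W + s * B A, i.e. the adapter folded back into a single weight matrix."""
    rank = len(a)
    return [
        [w[i][j] + scaling * sum(b[i][k] * a[k][j] for k in range(rank)) for j in range(len(w[0]))]
        for i in range(len(w))
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # d_out x d_in
a = [[1.0, 1.0]]              # r x d_in  (r = 1)
b = [[2.0], [0.0]]            # d_out x r
x = [3.0, 4.0]
split = [p + q for p, q in zip(base_output(w, x), lora_delta(a, b, x, scaling=0.5))]
print(split)                                   # [10.0, 4.0]
print(matvec(merged_weight(w, a, b, 0.5), x))  # [10.0, 4.0]
```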
Is there corresponding code for multi-GPU fine-tuning and inference?
For the code given in the README, where can the checkpoint be downloaded?
from cpm_live.models import CPMAnt, CPMAntConfig
import bmtrain as bmt
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-10b.json")
ckpt_path = "YOUR_PATH/cpm-ant-10b.pt"
# You can load the compressed models in the same way!
# config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-3b.json")
# ckpt_path = "YOUR_PATH/cpm-ant-3b.pt"
model = CPMAnt(config=config)
bmt.load(model, ckpt_path)
I tried several approaches:
1. model.load_state_dict(torch.load(args.model_path), strict=False)
2. bmt.load(model, args.LoRA_path, strict=False)
However, when I printed the model parameters, I found the weights had not been loaded. Why does this happen?
# load model
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file(args.config_path)
model = CPMAntPlus(config=config)
bmt.load(model, args.model_path,strict=True)
# insert LoRA
#delta_model = AutoDeltaModel.from_finetuned(args.LoRA_path, backbone_model=model)
delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"], backend="bmt")
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)
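In the snippet above, bmt.load(model, args.model_path, strict=True) runs before LoraModel is created, and the only line that would load the LoRA weights from args.LoRA_path is commented out. With PyTorch's load_state_dict(strict=False), keys the model does not have are skipped rather than raising an error; they are reported in the returned unexpected_keys. So if LoRA weights are loaded before LoraModel inserts the lora_A/lora_B parameters, they stay at their initial values. I have not checked whether bmt.load behaves the same way. The toy below uses plain dicts and hypothetical helpers (load_non_strict, insert_lora) only to show why the order matters. With a LoRA-only checkpoint, backbone keys show up as missing, which is expected. What matters is that unexpected comes back empty.

```python
from typing import Dict, List, Tuple


def load_non_strict(params: Dict[str, float], checkpoint: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Toy stand-in for load_state_dict(strict=False): copy matching keys, report the rest."""
    missing = [k for k in params if k not in checkpoint]
    unexpected = [k for k in checkpoint if k not in params]
    for k in params:
        if k in checkpoint:
            params[k] = checkpoint[k]
    return missing, unexpected


def insert_lora(params: Dict[str, float]) -> None:
    """Hypothetical stand-in for LoraModel(...): adds freshly initialised adapter entries."""
    params["project_q.lora.lora_A"] = 0.0
    params["project_q.lora.lora_B"] = 0.0


lora_ckpt = {"project_q.lora.lora_A": 1.5, "project_q.lora.lora_B": -2.0}

# Wrong order: load first, insert LoRA afterwards, so the adapter keys are silently skipped.
model = {"project_q.weight": 1.0}
print(load_non_strict(model, lora_ckpt))  # (['project_q.weight'], ['project_q.lora.lora_A', 'project_q.lora.lora_B'])
insert_lora(model)
print(model["project_q.lora.lora_A"])     # 0.0

# Right order: insert LoRA first, then load the adapter weights.
model = {"project_q.weight": 1.0}
insert_lora(model)
print(load_non_strict(model, lora_ckpt))  # (['project_q.weight'], [])
print(model["project_q.lora.lora_A"])     # 1.5
```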
Thank you for your good work. However, I have some questions about the following code (source: ant_torch.py#L144, when I ran the Ant model). In _convert_to_tensors, context is all set to True and span is all set to 0, so the mask seems to end up all ones after the following code. What does this code do?
with torch.no_grad():
    device = input.device
    directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(
        seqlen, device=device
    ).view(-1, 1)
    attention_mask = context[:, None, :] | (
        context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
    )
    attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
    mask_1d = (
        torch.arange(seqlen, device=device)[None, :].repeat(batch, 1) < length[:, None]
    )
    attention_mask = (
        mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
    )
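Reading the tensor code, position i may attend to position j when j is a context token, or when i is not a context token and j <= i. The pair must also share a span, and both positions must lie inside length. With context all True and a single span, every real token therefore sees every other real token, so the mask is all ones apart from padding. Non-context tokens (for example, ones being generated) get causal attention over everything before them, which amounts to a prefix-LM style mask. The plain-Python mirror below is my own transcription for illustration, not code from the repository, and the example inputs are made up.

```python
from typing import List, Sequence


def build_attention_mask(
    context: Sequence[Sequence[bool]],
    span: Sequence[Sequence[int]],
    length: Sequence[int],
) -> List[List[List[int]]]:
    """Pure-Python mirror of the tensor code; returns mask[b][i][j] (1 = i may attend to j)."""
    batch, seqlen = len(context), len(context[0])
    masks = []
    for b in range(batch):
        rows = []
        for i in range(seqlen):
            row = []
            for j in range(seqlen):
                directional = j <= i  # directional_mask_2d[i][j]
                allowed = context[b][j] or (not context[b][i] and directional)
                allowed = allowed and span[b][i] == span[b][j]  # same span only
                allowed = allowed and i < length[b] and j < length[b]  # mask_1d on both axes
                row.append(int(allowed))
            rows.append(row)
        masks.append(rows)
    return masks


# All-context input, as described in the question: fully bidirectional.
print(build_attention_mask([[True] * 3], [[0] * 3], [3])[0])
# Two context tokens followed by two non-context tokens: bidirectional prefix, causal rest.
for row in build_attention_mask([[True, True, False, False]], [[0] * 4], [4])[0]:
    print(row)
```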
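After testing it online, the results seem good. But without the corresponding corpus, it is hard to fine-tune a model that performs similarly.
So, has the 智取标题 (title generation) model been open-sourced?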