openbmb / cpm-live

Live Training for Open-source Big Models

Python 99.34% Shell 0.66%

deep-learning multi-task-learning natural-language-generation natural-language-understanding nlp parameter-efficient-learning pretrained-language-model natural-language-processing

cpm-live's Issues

Why does fine-tuning CPM-Ant+ with bminf no longer have any effect? It used to work, and the example code still includes it.

Problem running on a single machine with a single GPU: 3090 + cuda11.3 + torch1.10

I changed the configuration to:
export CUDA_VISIBLE_DEVICES=0
GPUS_PER_NODE=1
NNODES=1

but got the error below. What might be causing it?
Traceback (most recent call last):
File "tune_cpm_ant.py", line 47, in <module>
tune.run(data)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 222, in run
self.forward(train_dataloader, eval_dataloader, cls_num=self.cls_num)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 122, in forward
loss = self._forward(train_data, cls_num=cls_num)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 350, in _forward
loss = self.loss_function(logits, targets.view(-1))
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/bmtrain/loss/cross_entropy.py", line 200, in forward
w = (target != self.ignore_index).int()
RuntimeError: CUDA error: no kernel image is available for execution on the device
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 16863) of binary: /home/shanhoo3/anaconda3/envs/cpm/bin/python
Traceback (most recent call last):
File "/home/shanhoo3/anaconda3/envs/cpm/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch==1.10.1', 'console_scripts', 'torchrun')())
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/run.py", line 719, in main
run(args)
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
elastic_launch(
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/shanhoo3/anaconda3/envs/cpm/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
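The message "no kernel image is available for execution on the device" generally means that a compiled CUDA extension (the trace ends inside bmtrain's cross_entropy) contains neither a binary nor PTX usable on this GPU. An RTX 3090 has compute capability 8.6. The sketch below only illustrates CUDA's compatibility rule in plain Python; the function name `device_is_covered` is hypothetical, and the `sm_XY` / `compute_XY` strings stand for whatever architectures the extension was actually built for, which the report does not state.

```python
from typing import Iterable, Tuple


def device_is_covered(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if a build targeting `arch_list` can run on `capability`.

    Rule sketched here:
      * "sm_XY" is a binary; it runs on devices with the same major version
        and an equal or higher minor version.
      * "compute_XY" is PTX; it can be JIT-compiled for any device whose
        capability is equal or higher.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if len(digits) < 2 or not digits.isdigit():
            continue  # ignore entries this sketch does not understand
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


# Hypothetical build lists, for illustration only.
rtx_3090 = (8, 6)
print(device_is_covered(rtx_3090, ["sm_70", "sm_75"]))   # no Ampere binary, no PTX
print(device_is_covered(rtx_3090, ["sm_70", "sm_80"]))   # sm_80 binary also runs on 8.6
```

If the check fails for the real build list, rebuilding the extension with the GPU's architecture included is the usual remedy; whether that applies here is an assumption.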

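After tuning cpm ant++, inference does not produce the expected results

After tuning cpm ant++ on several hundred thousand training examples I obtained a best.pt file, but at inference time the corresponding input does not give the expected output. The output consists entirely of English characters and symbols such as ---------. I don't know which step went wrong.

Does CPM-Ant support zero-shot text continuation from a prompt?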

TypeError: __init__() got an unexpected keyword argument 'scale'

AdamOffloadOptimizer has no such parameter: scale

self.optimizer = bmt.optim.AdamOffloadOptimizer(
model.parameters(), weight_decay=0.01, scale=1048576
)

Traceback (most recent call last):
File "tune_cpm_ant.py", line 33, in <module>
tune = config_dict["tune"](
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 291, in __init__
super().__init__(**kwargs)
File "/home/shanhoo3/fkb/remote_project/cpm/cpm-live/examples/tune.py", line 57, in __init__
self.optimizer = bmt.optim.AdamOffloadOptimizer(
TypeError: __init__() got an unexpected keyword argument 'scale'
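The TypeError says that the installed `AdamOffloadOptimizer` constructor does not accept `scale`, which suggests (as an assumption) that the example script and the installed BMTrain release are out of step. A quick way to confirm which keyword arguments a constructor accepts is to inspect its signature. The sketch below uses a hypothetical stand-in class, `FakeOptimizer`, so that it runs without BMTrain; `supported_kwargs` is likewise an illustrative helper, not part of any library.

```python
import inspect


def supported_kwargs(factory, **kwargs):
    """Keep only the keyword arguments that `factory` declares."""
    params = inspect.signature(factory).parameters
    # A **kwargs catch-all means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class FakeOptimizer:
    """Hypothetical stand-in whose constructor has no `scale` parameter."""

    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay


accepted = supported_kwargs(FakeOptimizer, weight_decay=0.01, scale=1048576)
print(accepted)  # `scale` is filtered out for this stand-in
```

This is a diagnostic only: silently dropping `scale` would also drop the loss scaling the script intended, so the better fix is to install the library version the example was written for, or to configure loss scaling the way that version expects.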

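Finetune task2 cannot continue training from the best.pt of an already-trained Finetune task1?

I currently have two tasks, task1 and task2, and task2 needs to be trained starting from the checkpoint produced by task1. The best.pt generated by task1 can be loaded during task1 inference, but it cannot be loaded during task2 tuning. The keys in the model's state_dict and the keys in best.pt's state_dict are identical, yet I get the error Unexpected key(s) in state_dict: "generator.encoder.layers.0.self_att.self_attention.project_q.lora.lora_A". What is causing this?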
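"Unexpected key(s)" means that, at the moment of loading, the checkpoint contained keys the target model did not have. One possible explanation, offered only as an assumption, is that in the task2 tuning path the checkpoint is loaded before the LoRA modules are inserted into the backbone, so the `lora.lora_A` entries have nowhere to go. Comparing the two key sets right before the load call shows whether that is the case. The helper name and the example keys below are illustrative.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Report keys the model lacks a value for and keys the model does not know."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),      # in model, not in checkpoint
        "unexpected": sorted(checkpoint_keys - model_keys),   # in checkpoint, not in model
    }


# Illustrative keys: a backbone without LoRA versus a checkpoint saved with LoRA.
backbone = ["encoder.layers.0.self_att.self_attention.project_q.weight"]
checkpoint = backbone + [
    "encoder.layers.0.self_att.self_attention.project_q.lora.lora_A",
    "encoder.layers.0.self_att.self_attention.project_q.lora.lora_B",
]
print(diff_state_dict_keys(backbone, checkpoint))
```

In a real run the inputs would be `model.state_dict().keys()` and the keys of the loaded best.pt, taken at the exact point where loading fails.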

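A question about the model

Hello! I really admire your work. From the BAAI Conference (智源大会) I learned that the architecture you use is a Transformer Encoder. Do you also randomly apply attention masks during training, or does the model need to be fine-tuned for generation tasks?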

CPM-Ant with bminf error

1 GPU
command : python3 text_generation.py --use-bminf --memory-limit 4
error :
Traceback (most recent call last):
File "text_generation.py", line 34, in <module>
model = bminf.wrapper(model, quantization=False, memory_limit=args.memory_limit << 30)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 55, in wrapper
model, found_linear = _wrapper(model, quantization, False, memory_limit)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 30, in _wrapper
model._modules[kw], fd = _wrapper(model._modules[kw], quantization, is_module_list or is_in_blocklist, memory_limit)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 30, in _wrapper
model._modules[kw], fd = _wrapper(model._modules[kw], quantization, is_module_list or is_in_blocklist, memory_limit)
File "/usr/local/lib/python3.8/dist-packages/bminf/wrapper.py", line 34, in _wrapper
model = TransformerBlockList([
File "/usr/local/lib/python3.8/dist-packages/bminf/scheduler/__init__.py", line 382, in __init__
raise ValueError("Missing some parameters in layer %d" % i)
ValueError: Missing some parameters in layer 2

In a single-GPU environment it runs fine without bminf, but enabling bminf produces the error above. Any advice would be appreciated.

What is the difference between CPM-Ant+ and CPM-Ant?

![image](https://user-images.githubusercontent.com/92961574/205198889-81b48fa4-e5dd-4aa7-8a77-2251e9881c91.png)

The code configuration is shown above. train.json contains 90,000 examples and the eval set contains 9,000.

Originally posted by @touwenameng in #254 (comment)

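Is there a CPM-Ant model that supports English and is compatible with Hugging Face?

There is too little project documentation and model usage guidance, and there are no examples, which makes the project very hard to use.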

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Error message:
Collecting bmtrain==0.1.8.post1
Using cached bmtrain-0.1.8.post1.tar.gz (48 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-install-c0_v5dr2/bmtrain_c0a1c88cbb974a93b109264909d72dd8/setup.py", line 52, in
CUDAExtension('bmtrain.nccl._C', [
File "/home/bmxm/anaconda3/envs/apm-ant-plus/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 932, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/home/bmxm/anaconda3/envs/apm-ant-plus/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1040, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/home/bmxm/anaconda3/envs/apm-ant-plus/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2058, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

It seems this CUDA version is not supported. How should I deal with this?
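The traceback comes from torch's cpp_extension failing to locate a CUDA toolkit while pip builds bmtrain from source, which usually points to a missing toolkit (nvcc) or an unset `CUDA_HOME` rather than an unsupported CUDA version. The sketch below approximates the usual lookup order (environment variable, then nvcc on PATH, then a default directory) so the situation can be checked before running pip. The function name `find_cuda_home` and the default path are assumptions made for illustration.

```python
import os
import shutil
from typing import Mapping, Optional


def find_cuda_home(environ: Optional[Mapping[str, str]] = None,
                   default: str = "/usr/local/cuda") -> Optional[str]:
    """Best-effort guess of the CUDA install root, or None if nothing is found."""
    env = os.environ if environ is None else environ
    # 1. An explicitly exported variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            return env[var]
    # 2. Otherwise derive the root from the location of nvcc (<root>/bin/nvcc).
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    # 3. Finally fall back to the conventional install directory, if present.
    return default if os.path.isdir(default) else None


print(find_cuda_home())  # None means the toolkit could not be located
```

If this prints None, installing the CUDA toolkit and exporting `CUDA_HOME` to its root before `pip install bmtrain` is the step the error message itself asks for.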

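When will cpm-bee be released?

Login requires scanning a WeChat QR code; GitHub login is not available

Why does 模力表格 insist on login by scanning a WeChat QR code? I really don't want to pick up my phone. My phone is currently broken and has gone unrepaired for several months; I only top up the credit to keep the number, and I can't receive SMS messages.
Could GitHub login be supported?
If GitHub login isn't possible, please allow login with a username and password.
Many websites and open-source projects hosted on GitHub support GitHub login.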

Error when running scripts/CCPM_ddp.sh with cpm_ant_plus

Running the inference test (text_generation.py) with cpm_ant_plus works fine, but testing scripts/CCPM_ddp.sh fails with an error:

python3 -m torch.distributed.launch --master_addr localhost --master_port 1234 --nproc_per_node 2 --nnodes 1 tune_cpm_ant.py --dataset-name CCPM --dataset-path cpm_ant_plus/CPM-Live/cpm-live/examples/data/oss_cuge/CCPM --output-path cpm_ant_plus/CPM-Live/cpm-live/examples/fintune_model/CCPM --model-path cpm_ant_plus/CPM-Live/cpm-live/model/cpm-ant-plus-10b.pt --config-path cpm_ant_plus/CPM-Live/cpm-live/model/cpm-ant-plus-10b.json --batch-size 32 --early-stop-patience 10 --eval-interval 50 --tune-maxlen 256 --lr 5e-3 --warmup-iters 50 --epochs 20 --infer-maxlen 1
/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions

warnings.warn(
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

/root/anaconda3/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (2.1.1)/charset_normalizer (2.1.1) doesn't match a supported version!
warnings.warn(
/root/anaconda3/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (2.1.1)/charset_normalizer (2.1.1) doesn't match a supported version!
warnings.warn(
====================== Initialization ======================
rank : 0
local_rank : 0
world_size : 2
local_size : 2
master : localhost:1234
device : 0
cpus : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]

====================== Initialization ======================
rank : 1
local_rank : 1
world_size : 2
local_size : 2
master : localhost:1234
device : 1
cpus : [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]

root
├── encoder (Encoder)
│ ├── layers (TransformerBlockList)
│ │ └── 0-47(CheckpointBlock)
│ │ ├── self_att (SelfAttentionBlock)
│ │ │ ├── layernorm_before_attention (LayerNorm) weight:[0]
│ │ │ └── self_attention (Attention)
│ │ │ ├── project_q,project_v(Linear) weight:[0]
│ │ │ │ └── lora (LowRankLinear) lora_A:[8, 4096] lora_B:[4096, 8]
│ │ │ └── project_k,attention_out(Linear) weight:[0]
│ │ └── ffn (FFNBlock)
│ │ ├── layernorm_before_ffn (LayerNorm) weight:[0]
│ │ └── ffn (FeedForward)
│ │ ├── w_in (DenseGatedACT)
│ │ │ ├── w_0 (Linear) weight:[12587008]
│ │ │ └── w_1 (Linear) weight:[41943040]
│ │ └── w_out (Linear) weight:[41943040]
│ └── output_layernorm (LayerNorm) weight:[2048]
├── segment_embedding (Embedding) weight:[65536]
├── input_embedding (Embedding) weight:[179410944]
└── position_bias (SegmentPositionEmbedding) relative_attention_bias:[24576]
[INFO|(OpenDelta)basemodel:696]2023-02-17 19:26:12,415 >> Trainable Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:698]2023-02-17 19:26:12,415 >> Delta Parameter Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:700]2023-02-17 19:26:12,415 >> Static Memory 8.97 GB, Max Memory 9.63 GB
root
├── encoder (Encoder)
│ ├── layers (TransformerBlockList)
│ │ └── 0-47(CheckpointBlock)
│ │ ├── self_att (SelfAttentionBlock)
│ │ │ ├── layernorm_before_attention (LayerNorm) weight:[4096]
│ │ │ └── self_attention (Attention)
│ │ │ ├── project_q,project_v(Linear) weight:[16777216]
│ │ │ │ └── lora (LowRankLinear) lora_A:[8, 4096] lora_B:[4096, 8]
│ │ │ └── project_k,attention_out(Linear) weight:[16777216]
│ │ └── ffn (FFNBlock)
│ │ ├── layernorm_before_ffn (LayerNorm) weight:[4096]
│ │ └── ffn (FeedForward)
│ │ ├── w_in (DenseGatedACT)
│ │ │ ├── w_0 (Linear) weight:[29356032]
│ │ │ └── w_1 (Linear) weight:[0]
│ │ └── w_out (Linear) weight:[0]
│ └── output_layernorm (LayerNorm) weight:[2048]
├── segment_embedding (Embedding) weight:[65536]
├── input_embedding (Embedding) weight:[179410944]
└── position_bias (SegmentPositionEmbedding) relative_attention_bias:[24576]
[INFO|(OpenDelta)basemodel:696]2023-02-17 19:26:12,508 >> Trainable Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:698]2023-02-17 19:26:12,509 >> Delta Parameter Ratio: 6291456/4816502784=0.130623%
[INFO|(OpenDelta)basemodel:700]2023-02-17 19:26:12,509 >> Static Memory 8.97 GB, Max Memory 10.30 GB
[INFO] Tuning begins...
Traceback (most recent call last):
File "tune_cpm_ant.py", line 47, in <module>
tune.run(data)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 220, in run
self.forward(train_dataloader, eval_dataloader, cls_num=self.cls_num)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 121, in forward
global_loss = bmt.sum_loss(loss).item()
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/synchronize.py", line 34, in sum_loss
return distributed.all_reduce(loss, "avg")
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 92, in all_reduce
return OpAllReduce.apply(x, op)
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 50, in forward
ncclAllReduce(
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/nccl/__init__.py", line 118, in allReduce
C.ncclAllReduce(
RuntimeError: NCCL Error: invalid argument
Traceback (most recent call last):
File "tune_cpm_ant.py", line 47, in <module>
tune.run(data)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 220, in run
self.forward(train_dataloader, eval_dataloader, cls_num=self.cls_num)
File "/search/ai/kaitongyang/cpm_ant_plus/CPM-Live/cpm-live/examples/tune.py", line 121, in forward
global_loss = bmt.sum_loss(loss).item()
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/synchronize.py", line 34, in sum_loss
return distributed.all_reduce(loss, "avg")
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 92, in all_reduce
return OpAllReduce.apply(x, op)
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/distributed/ops.py", line 50, in forward
ncclAllReduce(
File "/root/anaconda3/lib/python3.8/site-packages/bmtrain/nccl/__init__.py", line 118, in allReduce
C.ncclAllReduce(
RuntimeError: NCCL Error: invalid argument
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 308683) of binary: /root/anaconda3/bin/python3
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

     tune_cpm_ant.py FAILED

=======================================
Root Cause:
[0]:
time: 2023-02-17_19:26:16
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 308683)
error_file: <N/A>
msg: "Process failed with exitcode 1"

Other Failures:
[1]:
time: 2023-02-17_19:26:16
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 308684)
error_file: <N/A>
msg: "Process failed with exitcode 1"
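Separately from the NCCL failure, the FutureWarning at the top of this log asks scripts to read the local rank from `os.environ['LOCAL_RANK']` instead of a `--local_rank` argument. The sketch below shows one way a script could accept both, so it works under `torch.distributed.launch` and `torchrun` alike. `resolve_local_rank` is a hypothetical helper and is not taken from tune_cpm_ant.py; it does not address the "NCCL Error: invalid argument" itself.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def resolve_local_rank(argv: Optional[Sequence[str]] = None,
                       environ: Optional[Mapping[str, str]] = None) -> int:
    """Prefer the LOCAL_RANK environment variable, fall back to --local_rank, then 0."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    # parse_known_args leaves the script's other flags untouched.
    args, _unknown = parser.parse_known_args(argv)

    env = os.environ if environ is None else environ
    if "LOCAL_RANK" in env:
        return int(env["LOCAL_RANK"])
    if args.local_rank is not None:
        return args.local_rank
    return 0


print(resolve_local_rank(["--batch-size", "32"], {"LOCAL_RANK": "1"}))
```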

Question about the license of the CPM-Ant and CPM-Bee model weights

Hi all,

Where can I find the specific license for the weights of these two models?

cannot import name 'CPMAntTokenizer' from 'transformers'

Following the description on the Hugging Face openbmb/cpm-ant-10b model card:

from transformers import CPMAntTokenizer, CPMAntModel
texts = ["今天天气真好！"]
model = CPMAntModel.from_pretrained("openbmb/cpm-ant-10b")
tokenizer = CPMAntTokenizer.from_pretrained("openbmb/cpm-ant-10b")
input_ids = tokenizer.get_model_input(texts)
logits, hidden = model(**input_ids)
print(logits.shape)
print(hidden.shape)

I get the following errors:
cannot import name 'CPMAntTokenizer' from 'transformers'
ImportError: cannot import name 'CPMAntModel' from 'transformers'
I have already updated transformers to the latest version (4.24.0). What is going on?
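The ImportError means the installed transformers release simply does not export those names. To my knowledge, CPM-Ant support was added to transformers in a release later than 4.24.0, with classes spelled `CpmAntTokenizer` and `CpmAntModel`; treat both the version and the spelling as assumptions to verify against the library's documentation. The sketch below is a generic check of what a package exports; `describe_symbol` is an illustrative helper, and it is demonstrated on the standard-library `json` module so that it runs anywhere.

```python
import importlib
from importlib import metadata
from typing import Optional, Tuple


def describe_symbol(package: str, symbol: str) -> Tuple[Optional[str], bool]:
    """Return (installed version or None, whether `package` exposes `symbol`)."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # not an installed distribution (or a stdlib module)
    try:
        module = importlib.import_module(package)
    except ImportError:
        return version, False
    return version, hasattr(module, symbol)


# Demonstration on a module that is always available.
print(describe_symbol("json", "loads"))

# In the reporter's environment one would call, for example:
#   describe_symbol("transformers", "CPMAntTokenizer")
#   describe_symbol("transformers", "CpmAntTokenizer")   # assumed spelling
```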

CUDA out of memory

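Hello. Without bminf, the default GPU memory in text_generation.py is 12 GB. My GPU is a 16 GB V100, yet running text_generation.py with the default configuration reports CUDA out of memory. What could be the reason?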
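A rough memory estimate helps here. The report does not say which checkpoint was loaded, so the sketch below assumes a nominal 10 billion parameters stored in half precision (2 bytes each); both numbers are assumptions. Under them the weights alone come to roughly 18.6 GiB, more than a 16 GB card can hold, before counting activations or the key/value cache.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


ASSUMED_PARAMS = 10e9      # nominal size of a "10b" checkpoint (assumption)
GPU_MEMORY_GIB = 16        # V100 16 GB, as stated in the report

needed = weight_memory_gib(ASSUMED_PARAMS)
print(f"weights: {needed:.1f} GiB, fits on the GPU: {needed < GPU_MEMORY_GIB}")
```

If the estimate matches the model in use, the options are a smaller or compressed checkpoint, or an offloading wrapper such as bminf with a memory limit, as in the other reports on this page.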

Do you speak ** ?

If not, ignore the question. If so, please write a Chinese README before promoting the project!

Empty Dataset in distributed mode

Hi, I'm trying to fully fine-tune CPM-Ant+. I followed the instructions in the README and used preprocess_dataset.py to generate the binary data file. However, when world_size > 1 (distributed mode), the read() method of DistributedDataset raises an "Empty Dataset" error, whereas the data is read successfully in single-node mode. Could you help me fix it? Thanks.

CPM-Live/cpm-live/pretrain_cpm_ant_plus.py

Line 427 in e0cee47

DistributedDataset("path/to/binary/file", bmt.rank(), bmt.world_size()),
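One way such an error can arise, stated here purely as an assumption about how the data is partitioned, is when the binary dataset consists of fewer blocks than there are ranks: if blocks are dealt out round-robin, some ranks receive nothing. The sketch below illustrates that arithmetic with a hypothetical `blocks_for_rank` function; it is not the actual `DistributedDataset` implementation.

```python
from typing import List


def blocks_for_rank(num_blocks: int, rank: int, world_size: int) -> List[int]:
    """Round-robin assignment of block indices to one rank (illustrative only)."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return [block for block in range(num_blocks) if block % world_size == rank]


# A dataset written as a single block, read by two ranks:
for rank in range(2):
    print(rank, blocks_for_rank(num_blocks=1, rank=rank, world_size=2))
```

If this is what happens, writing the dataset with at least as many blocks as ranks would give every rank some data; checking how many blocks preprocess_dataset.py produced is a cheap first step.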

about serving

I want to serve a cpm-plus-10b model, but every attempt has failed.
With the bminf wrapper, inference is too slow and always runs out of memory. Since I have 4 GPUs, I then tried wrapping the model with DeepSpeed, but that failed too.
Is there any example code for serving?

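I want to use the ant+ model to obtain sentence embeddings for downstream tasks. Is there an official approach?

I have done some additional domain-specific pre-training on top of the ant+ base model and then used a weighted-average scheme to obtain sentence embeddings.

The results show some positive correlation but fall short of expectations (they feel somewhat ambiguous). So I would like to ask whether there is an officially recommended approach.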

[bug] Inconsistent padding_side between tuning and generation gives wrong results at inference with batch size larger than 1

I am adapting CPM-Ant to an NLG task using LoRA, and I think there may be a bug in the code. During tuning, padding_side is set to "right", e.g. in tune.py:

    padded[key] = pad(items, key, _padding_value, padding_side="right")

However, during generation, padding_side is set to "left", e.g. in generation/ant.py:

    padded[key] = pad(input_tensors, key, padding_side='left')

This inconsistency may degrade model performance when running inference with a batch size larger than 1.
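To make the difference concrete, here is a simplified stand-in for the repository's `pad` helper that works on plain Python lists; `pad_batch` and its signature are illustrative, not the project's API. With a batch of one there is nothing to pad, so the two settings coincide, which is consistent with the problem only showing up at batch sizes above 1.

```python
from typing import List, Sequence


def pad_batch(seqs: Sequence[Sequence[int]], pad_value: int = 0,
              padding_side: str = "right") -> List[List[int]]:
    """Pad every sequence to the length of the longest one."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    width = max(len(seq) for seq in seqs)
    padded = []
    for seq in seqs:
        filler = [pad_value] * (width - len(seq))
        padded.append(list(seq) + filler if padding_side == "right" else filler + list(seq))
    return padded


batch = [[11, 12, 13], [21]]
print(pad_batch(batch, padding_side="right"))  # [[11, 12, 13], [21, 0, 0]]
print(pad_batch(batch, padding_side="left"))   # [[11, 12, 13], [0, 0, 21]]
```

The short sequence's real token sits at position 0 in one layout and at position 2 in the other, so any position-dependent component sees different inputs at tuning time and at generation time.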

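Unable to extract the provided large-model archives on Ubuntu

I downloaded the CPM-Ant 3b, 7b and 10b models, but none of them can be extracted on either Windows or Ubuntu. The 300M model extracts fine. Please check whether the large-model archives can be extracted normally on Ubuntu.

Hello, what is the format of the files under "data_bin_new"?

Hello, what is the format of the files under "data_bin_new"? Could you provide an example?
I am studying the CPM_Live code.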

[New Feature]Any idea for chatCPM like chatgpt 😄

As mentioned in the title, since chatgpt is now publicly accessible, it is possible that we can obtain enough high-quality corpus for training cpm-ant or cpm-bee through chatgpt. 😄

Some useful information:

How should I run torchrun in the background?

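Decoupling LoRA and CPM at deployment time

Can LoRA and CPM be decoupled when deployed?
That is, a scenario where CPM is deployed on machine A and LoRA on machine B.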
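Mathematically, a LoRA-adapted linear layer computes the frozen base output plus a low-rank correction, y = W x + s · B (A x), so the two terms can be evaluated independently and added. The pure-Python sketch below checks that identity on tiny matrices; all names and numbers are illustrative, and nothing here is confirmed to be supported by the project's tooling.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def base_forward(W: Matrix, x: Vector) -> Vector:
    """Frozen backbone projection (conceptually on machine A)."""
    return matvec(W, x)


def lora_delta(A: Matrix, B: Matrix, x: Vector, scaling: float = 1.0) -> Vector:
    """Low-rank correction s * B(Ax) (conceptually on machine B)."""
    return [scaling * value for value in matvec(B, matvec(A, x))]


def merged_weight(W: Matrix, A: Matrix, B: Matrix, scaling: float = 1.0) -> Matrix:
    """W + s * B A, the single-machine alternative."""
    rank = len(A)
    return [
        [W[i][j] + scaling * sum(B[i][k] * A[k][j] for k in range(rank))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 1.0]]        # rank 1: shape (r, in)
B = [[1.0], [2.0]]      # shape (out, r)
x = [1.0, 1.0]

separate = [b + d for b, d in zip(base_forward(W, x), lora_delta(A, B, x))]
merged = matvec(merged_weight(W, A, B), x)
print(separate, merged)
```

The practical catch is that every adapted projection in every layer needs its own input activations, so a split deployment implies a network round trip per adapted layer; merging the LoRA weights into the backbone avoids that cost at the price of one backbone copy per adapter.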

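Multi-GPU fine-tuning and inference

Is there code available for multi-GPU fine-tuning and inference?

How do I run inference?

For the code given in the README, where can the ckpt be downloaded?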

from cpm_live.models import CPMAnt, CPMAntConfig
import bmtrain as bmt

bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-10b.json")
ckpt_path = "YOUR_PATH/cpm-ant-10b.pt"
# You can load the compressed models in the same way! 
# config = CPMAntConfig.from_json_file("YOUR_PATH/cpm-ant-3b.json")
# ckpt_path = "YOUR_PATH/cpm-ant-3b.pt"

model = CPMAnt(config=config)
bmt.load(model, ckpt_path)

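How do I load already-trained delta-tuning weights when training with bmtrain?

I tried several approaches:
1 model.load_state_dict(torch.load(args.model_path), strict=False)
2 bmt.load(model, args.LoRA_path,strict=False)
However, after printing the model parameters I found that the weights had not been loaded. Why does this happen?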

# load model
bmt.init_distributed(seed=0)
config = CPMAntConfig.from_json_file(args.config_path)
model = CPMAntPlus(config=config)
bmt.load(model, args.model_path,strict=True)

# insert LoRA
#delta_model = AutoDeltaModel.from_finetuned(args.LoRA_path, backbone_model=model)
delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"], backend="bmt")
delta_model.freeze_module(exclude=["deltas"], set_state_dict=True)

Puzzled in mask operation

Thank you for your good work. I have some questions about the following code (source: ant_torch.py#L144, reached when running the Ant model).

What is the main logic of this part? I could not work it out.
During inference, context is set to all True and span to all 0 in _convert_to_tensors, so the mask seems to be all 1 after the code below. What does this code actually do?

with torch.no_grad():
      device = input.device
      directional_mask_2d = torch.arange(seqlen, device=device) <= torch.arange(
          seqlen, device=device
      ).view(-1, 1)
      attention_mask = context[:, None, :] | (
          context[:, :, None].logical_not() & directional_mask_2d.view(1, seqlen, seqlen)
      )
      attention_mask = attention_mask & (span[:, None, :] == span[:, :, None])
      mask_1d = (
          torch.arange(seqlen, device=device)[None, :].repeat(batch, 1) < length[:, None]
      )
      attention_mask = (
          mask_1d.view(batch, seqlen, 1) & mask_1d.view(batch, 1, seqlen) & attention_mask
      )
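Reading the snippet element by element: position i may attend to position j if j is a context token, or if i is not a context token and j <= i (causal); both positions must also share the same span id and lie within the valid length. The pure-Python sketch below re-expresses that rule for a single sequence without a batch dimension; `build_attention_mask` is an illustrative rewrite, not the project's code.

```python
from typing import List, Sequence


def build_attention_mask(context: Sequence[bool], span: Sequence[int],
                         length: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    seqlen = len(context)
    mask = []
    for i in range(seqlen):          # query position
        row = []
        for j in range(seqlen):      # key position
            # Context keys are visible to every query; non-context queries
            # additionally see earlier (and their own) positions.
            visible = context[j] or ((not context[i]) and j <= i)
            same_span = span[i] == span[j]
            in_range = i < length and j < length   # drop padding positions
            row.append(visible and same_span and in_range)
        mask.append(row)
    return mask


# Two context (prompt) tokens followed by two generated tokens.
for row in build_attention_mask([True, True, False, False], [0, 0, 0, 0], length=4):
    print([int(v) for v in row])
```

With context all True and span all 0, as described in the question, every valid position sees every other valid position, so an all-ones mask (up to padding) is the expected result. The causal part only matters for non-context positions, and the span comparison only matters when several segments are packed into one sequence.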

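Are there plans to open-source the CPM-Ant-based 智取标题 model?

I tested it online and the results seem good, but without a corresponding corpus it is hard to fine-tune a model with similar quality.
So I would like to ask whether the 智取标题 model is open source.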
