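After cloning the repo, run.py failed with an error saying bert-base-chinese could not be found. I found Google's bert-base-chinese model on Hugging Face, downloaded it locally, and run.py then ran successfully.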

Error when running predict.py in example/ee (deepke issue, CLOSED)

zhuweigang commented on July 30, 2024

Error when running predict.py in example/ee

from deepke.

Comments (10)

shengyumao commented on July 30, 2024

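Hello. When RuntimeError: CUDA error: device-side assert triggered appears, it usually means that the output dimension of the final classification head does not match the number of labels in the test dataset.
Please check whether the task type set in predict.yaml (i.e. whether the task_name parameter is trigger or role) matches the model you trained. The default task type in predict.yaml is role (trigger evaluation already runs during training), whereas the default task type in train.yaml is trigger. To evaluate role, change task_name to role in train.yaml, train a separate model, and then run the evaluation.
If this does not solve your problem, please share the exact run parameters (for example, your config files).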
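
As a rough illustration of the mismatch described above, the sketch below checks label ids against the size of the classification head on the CPU, before anything reaches the GPU. The function name find_out_of_range_labels and the ignore_index default of -100 are assumptions made for this example and are not part of DeepKE.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_labels(
    label_ids: Iterable[Iterable[int]],
    num_labels: int,
    ignore_index: int = -100,  # assumed padding label id, not taken from DeepKE
) -> List[Tuple[int, int, int]]:
    """Return (sentence_idx, token_idx, label_id) for every invalid label id."""
    bad = []
    for i, sentence in enumerate(label_ids):
        for j, label in enumerate(sentence):
            if label == ignore_index:
                continue  # padded positions are skipped
            if not 0 <= label < num_labels:
                bad.append((i, j, label))
    return bad


if __name__ == "__main__":
    # A head with 4 outputs cannot score label id 7.
    batch = [[0, 1, 2, -100], [3, 7, 1, -100]]
    print(find_out_of_range_labels(batch, num_labels=4))  # [(1, 1, 7)]
```

A non-empty result would point to a label set that is larger than the head the checkpoint was trained with, which is the situation a trigger/role mix-up produces.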

zxlzr commented on July 30, 2024

Do you have any other questions?

zhuweigang commented on July 30, 2024

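Thanks for the helpful reply. Following the suggestion, I changed task_name in train.yaml from trigger to role, and now I get a different error. This time it is not a CUDA error: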

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /root/anaconda3/envs/deepke/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/root/anaconda3/envs/deepke/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /root/anaconda3/envs/deepke did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/anaconda3/envs/deepke/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/root/anaconda3/envs/deepke/lib/python3.9/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @Package directive train.yaml in file:///root/DeepKE/example/ee/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
warnings.warn(message=msg, category=UserWarning)
Traceback (most recent call last):
File "/root/DeepKE/example/ee/standard/predict.py", line 49, in main
args.dev_trigger_pred_file = os.path.join(args.cwd, args.dev_trigger_pred_file) if args.do_pipeline_predict and args.task_name=="role" else None
File "/root/anaconda3/envs/deepke/lib/python3.9/posixpath.py", line 90, in join
genericpath._check_arg_types('join', a, *p)
File "/root/anaconda3/envs/deepke/lib/python3.9/genericpath.py", line 152, in _check_arg_types
raise TypeError(f'{funcname}() argument must be str, bytes, or '
TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

zhuweigang commented on July 30, 2024

I would also like to confirm one thing: I downloaded Google's bert-base-chinese model from Hugging Face to use with the ee example. Is that okay?

shengyumao commented on July 30, 2024

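Hello. From the error message, it looks like you ran predict.py directly after changing the parameter? After changing task_name to role in train.yaml, you need to run python run.py again to train an event argument extraction model. As mentioned in the README, the event extraction task requires training models for two stages.
The error also suggests that the dev_trigger_pred_file parameter in predict.yaml was deleted or set to empty. Its default value is ./exp/DuEE/trigger/bert-base-chinese/eval_pred.json, which holds the trigger predictions produced while training the trigger model; the pipeline then uses them for event argument extraction. Please check this again.

As for your other question about using the bert-base-chinese model downloaded from Hugging Face: downloading it from Hugging Face is fine.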
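
To make the failure at predict.py line 49 easier to read, the path could be validated before it is joined. This is only a sketch: resolve_pred_file and its must_exist flag are hypothetical helpers written for this thread, not functions that exist in DeepKE.

```python
import os
from typing import Optional


def resolve_pred_file(
    cwd: Optional[str],
    pred_file: Optional[str],
    must_exist: bool = True,
) -> str:
    """Join cwd and pred_file, failing with a readable message instead of a TypeError."""
    if cwd is None:
        raise ValueError("cwd is None; the working directory was not set")
    if not pred_file:
        raise ValueError(
            "trigger prediction file is empty or null; "
            "check dev_trigger_pred_file in predict.yaml"
        )
    path = os.path.normpath(os.path.join(cwd, pred_file))
    if must_exist and not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; train the trigger model first to produce it"
        )
    return path


if __name__ == "__main__":
    try:
        resolve_pred_file("/root/DeepKE/example/ee/standard", None)
    except ValueError as err:
        print(err)
```

With a guard like this, a null value loaded from the config would be reported by name rather than surfacing as a TypeError inside os.path.join.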

zhuweigang commented on July 30, 2024

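Thanks for the kind reply :-)

I changed task_name in train.yaml to role, ran run.py again, and then ran predict.py; that is what produced this error.
I did not touch the dev_trigger_pred_file parameter in predict.yaml. It is still ./exp/DuEE/trigger/bert-base-chinese/eval_pred.json, and that file exists.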

zhuweigang commented on July 30, 2024

Attaching the two config files.

************** train.yaml ******************

data_name: DuEE # [ACE, DuEE]
model_name_or_path: bert-base-chinese # [bert-base-uncased, bert-base-chinese] english for ace, chinese for duee
#task_name: trigger # [trigger, role]
task_name: role
model_type: bertcrf
do_train: True
do_eval: True
do_predict: False # True for ACE, False for DuEE
labels: ""
config_name: ""
tokenizer_name: ""
cache_dir: ""
evaluate_during_training: True
do_lower_case: True
weight_decay: 0.0
learning_rate: 5e-5
adam_epsilon: 1e-8
per_gpu_train_batch_size: 16
per_gpu_eval_batch_size: 16
gradient_accumulation_steps: 1
max_seq_length: 256
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: 500
warmup_steps: 0
logging_steps: 500
save_steps: 500
eval_all_checkpoints: False
no_cuda: False
n_gpu: 0
overwrite_output_dir: True
overwrite_cache: True
seed: 42
fp16: False
fp16_opt_level: "01"
local_rank: -1
data_dir: "" # parsing in run.py
tag_path: "" # parsing in run.py
output_dir: "" # parsing in run.py
dev_trigger_pred_file: null
test_trigger_pred_file: null

*************** predict.yaml ***************

defaults:
  - train

data_name: DuEE # [ACE, DuEE]
model_name_or_path: ./exp/DuEE/role/bert-base-chinese
task_name: role # the trigger prediction is done during the training process.
do_train: False
do_eval: True
do_predict: False # True for ACE, False for DuEE

do_pipeline_predict: True
overwrite_cache: True

dev_trigger_pred_file: ./exp/DuEE/trigger/bert-base-chinese/eval_pred.json # change to your pred file of trigger classification
test_trigger_pred_file: ./exp/DuEE/trigger/bert-base-chinese/test_pred.json

shengyumao commented on July 30, 2024

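Hello. The error here is TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType', which means that either args.cwd or args.dev_trigger_pred_file is None. You could set a breakpoint or print the relevant variables. I reran it in my own environment, and with the default parameters predict.py L49 does not raise an error, so please check which variable goes wrong in your environment.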
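
One way to do the printing suggested above is to list which of the relevant settings are None right before the failing line. The sketch uses a plain dict as a stand-in for the loaded config, and the helper name missing_keys is made up for this example; the values shown are illustrative, not taken from a real run.

```python
from typing import Any, Iterable, List, Mapping


def missing_keys(cfg: Mapping[str, Any], keys: Iterable[str]) -> List[str]:
    """Return the keys that are absent from cfg or whose value is None."""
    return [key for key in keys if cfg.get(key) is None]


if __name__ == "__main__":
    # Stand-in for the merged config; the None mimics the null in train.yaml.
    cfg = {
        "cwd": "/root/DeepKE/example/ee/standard",
        "task_name": "role",
        "dev_trigger_pred_file": None,
    }
    wanted = ["cwd", "dev_trigger_pred_file", "test_trigger_pred_file"]
    print(missing_keys(cfg, wanted))
    # ['dev_trigger_pred_file', 'test_trigger_pred_file']
```

If the two pred-file keys show up here even though predict.yaml sets them, the null defaults from train.yaml are the values that reached the script.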

zhuweigang commented on July 30, 2024

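Thanks for the guidance. I am not sure what was wrong with my yaml files, but after fixing the empty dev_trigger_pred_file and test_trigger_pred_file parameters I now get the error below. If convenient, could I add you on WeChat so we can look at it together? Many thanks.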

[2024-06-24 17:34:19,321][deepke.event_extraction.standard.bertcrf.processor_ee][INFO] - LOOKING AT /root/DeepKE/example/ee/standard/./data/DuEE/role/dev_with_pred_trigger.tsv train
[2024-06-24 17:34:19,345][run][INFO] - Creating features from dataset file at /root/DeepKE/example/ee/standard/./data/DuEE/role
###############
[2024-06-24 17:34:19,345][deepke.event_extraction.standard.bertcrf.processor_ee][INFO] - Writing example 0 of 2015
###############
[2024-06-24 17:34:23,558][run][INFO] - Saving features into cached file /root/DeepKE/example/ee/standard/./data/DuEE/role/cached_dev_bert-base-chinese_256
[2024-06-24 17:34:24,415][run][INFO] - ***** Running evaluation *****
[2024-06-24 17:34:24,416][run][INFO] - Num examples = 2015
[2024-06-24 17:34:24,416][run][INFO] - Batch size = 16
[2024-06-24 17:34:24,416][run][INFO] - Mode = dev
Evaluating: 0%| | 0/126 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [0,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [1,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [2,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [3,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [4,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [5,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [6,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [7,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [8,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [9,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [10,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [11,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [12,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [13,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [14,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [15,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Evaluating: 0%| | 0/126 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/DeepKE/example/ee/standard/predict.py", line 124, in main
result, eval_pred_list = evaluate(args, model, eval_dataset, tokenizer, labels, pad_token_label_id, mode="dev", device=device)
File "/root/DeepKE/example/ee/standard/run.py", line 219, in evaluate
outputs = model(pad_token_label_id=pad_token_label_id, **inputs)
File "/root/anaconda3/envs/deepke/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/deepke/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/DeepKE/src/deepke/event_extraction/standard/bertcrf/bert_crf.py", line 89, in forward
loss = self.crf.neg_log_likelihood(crf_logits, crf_mask, crf_labels)
File "/root/DeepKE/src/deepke/event_extraction/standard/bertcrf/crf.py", line 273, in neg_log_likelihood
gold_score = self._score_sentence(scores, mask, tags)
File "/root/DeepKE/src/deepke/event_extraction/standard/bertcrf/crf.py", line 258, in _score_sentence
tg_energy = tg_energy.masked_select(mask.transpose(1, 0))
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

zhuweigang commented on July 30, 2024

The ee problem is solved. The cause was that my hydra-core version was not 1.3.1. Thank you very much.
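
For anyone hitting the same thing, a quick way to confirm the installed version is sketched below using the standard library's importlib.metadata. The helper names installed_version and check_pinned are invented for this example; the expected version 1.3.1 is the one reported above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pinned(package: str, expected: str) -> str:
    """Describe whether the installed version equals the expected one."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed (expected {expected})"
    if found != expected:
        return f"{package} {found} is installed, expected {expected}"
    return f"{package} {found} matches the expected version"


if __name__ == "__main__":
    print(check_pinned("hydra-core", "1.3.1"))
```

Running this inside the deepke conda environment shows whether the environment matches the version that resolved the issue here.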
