cooelf / awesomemrc



IJCAI 2021 Tutorial & code for Retrospective Reader for Machine Reading Comprehension (AAAI 2021)

Home Page: https://arxiv.org/abs/2005.06249

Python 99.93% Shell 0.07%

question-answering reading-comprehension transformers


awesomemrc's Issues

ERROR when running sh_albert_cls.sh

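Hello, I downloaded the ALBERT pre-trained model from Google and converted it with convert_albert_original_tf_checkpoint_to_pytorch.py.
After conversion, the files look like this:

When I run sh_albert_cls.sh, I get the following error:

What could be causing this?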
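The screenshots of the converted files and the error did not survive, so this is only one thing worth ruling out. When from_pretrained is given a local directory, transformers looks for config.json and pytorch_model.bin, and the ALBERT tokenizer looks for spiece.model. As far as I recall, Google's original ALBERT release ships albert_config.json and a 30k-clean.model SentencePiece file instead, and the conversion script writes only the weights file, so the config and vocabulary usually have to be copied in under the expected names. A minimal sketch that lists what is missing (the file-name list is an assumption to verify against your installed version):

```python
from pathlib import Path
from typing import List

# ASSUMPTION: file names transformers 2.x expects in a local model directory
# (config and PyTorch weights) plus the ALBERT SentencePiece vocabulary.
EXPECTED_FILES = ("config.json", "pytorch_model.bin", "spiece.model")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    import sys

    missing = missing_model_files(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("missing:", missing or "nothing")
```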

sh_albert_cls.sh gets a weird score of 0.499

Has anyone run sh_albert_cls.sh and gotten a strange score that stays fixed at 0.499?

Is this caused by a self-defined file that was not uploaded?
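A score frozen at 0.499 is suspiciously close to the fraction of answerable questions in the SQuAD 2.0 dev set. Assuming the reported score is the answerability classifier's accuracy on that set (the issue does not say so), a model that labels every question as answerable would score exactly this, which would mean the classifier has collapsed to a single class. The arithmetic, using the dev-set counts from the evaluation output further down this page:

```python
def constant_prediction_accuracy(has_ans: int, no_ans: int, predict_answerable: bool) -> float:
    """Accuracy of a classifier that gives every question the same label."""
    total = has_ans + no_ans
    correct = has_ans if predict_answerable else no_ans
    return correct / total


if __name__ == "__main__":
    # SQuAD 2.0 dev counts: 5928 answerable, 5945 unanswerable.
    print(round(constant_prediction_accuracy(5928, 5945, predict_answerable=True), 4))   # 0.4993
    print(round(constant_prediction_accuracy(5928, 5945, predict_answerable=False), 4))  # 0.5007
```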

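Error when running ./sh_albert_cls.sh: ValueError: Task not found: squad

Thank you very much for open-sourcing the code for your paper! However, I ran into some problems while reproducing your results, and I would be very grateful if you could take the time to help.

Environment

Following the instructions in transformers, I installed transformers v2.3.0 from source, along with the dependencies required by examples.

Error

However, when I run ./sh_albert_cls.sh, I get the following error: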

Traceback (most recent call last):
  File "./examples/run_cls.py", line 645, in <module>
    main()
  File "./examples/run_cls.py", line 498, in main
    raise ValueError("Task not found: %s" % (args.task_name))
ValueError: Task not found: squad

Investigation

I found that sh_albert_cls.sh sets export TASK_NAME=squad and passes --task_name $TASK_NAME when running python ./examples/run_cls.py.
However, when I run python ./examples/run_cls.py -h, I get the following output:

  --task_name TASK_NAME
                        The name of the task to train selected in the list: cola, mnli, mnli-mm, mrpc, sst-2, sts-b, qqp, qnli, rte, wnli

So it seems that python ./examples/run_cls.py does not accept --task_name squad?

Question

I'm really confused about which step went wrong. Thanks in advance for any reply!
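The -h output lists only GLUE tasks, which is what stock transformers v2.3.0 registers. One plausible explanation (unconfirmed) is that run_cls.py is importing the pip-installed transformers rather than the copy bundled with the repo; a later traceback on this page shows a transformer-mrc/transformers/modeling_albert.py path, which suggests the repo ships its own modified package. A minimal stdlib sketch to see which file Python would load for a module, without importing it:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not importable, or a namespace package with no single file
    return Path(spec.origin).resolve()


def is_inside(path: Path, directory: Path) -> bool:
    """True if path lies somewhere under directory."""
    try:
        path.resolve().relative_to(directory.resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    origin = module_origin("transformers")
    print("transformers would be imported from:", origin)
    # Hypothetical check, assuming the working directory is transformer-mrc/.
    if origin is not None:
        print("bundled copy:", is_inside(origin, Path("transformers")))
```

Run it the same way the failing script is launched (for example, saved under examples/ and started as python ./examples/<name>.py from transformer-mrc/), because sys.path[0] is the directory of the script being run. An interactive console instead puts the current directory on sys.path, which could explain a console session finding squad while the script does not.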

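Hardware requirements for running the code: my 2080Ti keeps running out of GPU memory

Hi, I set every batch_size to 1 but still run out of GPU memory. The card has 10 GB. Is my hardware insufficient, or is there a problem in the code?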
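A back-of-the-envelope estimate helps separate the two. Assuming the scripts fine-tune albert-xxlarge-v2 (the output directories elsewhere on this page are named after it), which the ALBERT paper lists at roughly 235M parameters, fp32 weights, gradients and the two Adam moment buffers alone take about 16 bytes per parameter, before any activations:

```python
def model_state_gib(num_params: float, bytes_per_param: int = 16) -> float:
    """Rough memory for weights + gradients + Adam moments, in GiB.

    ASSUMPTION: fp32 master weights (4 B) + fp32 gradients (4 B)
    + two fp32 Adam moment buffers (8 B) = 16 bytes per parameter.
    Activations, the CUDA context and allocator overhead are NOT included.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    print(f"{model_state_gib(235e6):.2f} GiB")  # about 3.50 GiB for ~235M parameters
```

Activations for a 4096-wide model at max_seq_length 512 come on top of this, so batch size 1 can plausibly still exceed 10 GB. Trying a smaller model such as albert-base-v2, or a shorter max_seq_length, is a cheap way to tell a hardware limit from a bug.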

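About reproducing Retro-Reader

Hi, I read your Retrospective Reader paper and have a question about the Rear Verification section: what do \hat{y} and \bar{y} in Equation (12) refer to? I looked through your code (run_cls.py, run_squad_av.py and run_verifier.py) but could not find the code corresponding to Equation (12).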

I am not able to run the model for inference. I am trying to run model1 from CodaLab using the sh.albert script, but the error says the model is corrupted. Please help.
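The issue doesn't include the exact message, so this only covers a few things worth ruling out: a download that is empty or truncated, an archive that still needs extracting, or a weights file written by PyTorch 1.6 or later (which saves in a zip-based format by default) being loaded by an older PyTorch. A minimal stdlib sketch (checkpoint_container is a hypothetical helper) to see what kind of file was actually downloaded:

```python
import tarfile
import zipfile
from pathlib import Path


def checkpoint_container(path: str) -> str:
    """Classify a downloaded checkpoint file by its container format."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return "missing-or-empty"
    if zipfile.is_zipfile(str(p)):
        return "zip"  # e.g. the default torch.save format since PyTorch 1.6
    if tarfile.is_tarfile(str(p)):
        return "tar"  # an archive that probably needs extracting first
    return "other"  # e.g. a legacy torch.save file, or something else entirely


if __name__ == "__main__":
    import sys

    print(checkpoint_container(sys.argv[1]))
```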

Issue with the task name

Can you please explain how the task name should be specified and what it is?

Please help me with a step-by-step guide to running the code.

It does not accept squad when I run sh_albert_cls.sh.
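The task name is a key into a registry of data processors. In the stock transformers example scripts, the name is lowercased and looked up, and the ValueError("Task not found: %s") seen in the run_cls.py traceback earlier on this page is raised when the key is absent. A minimal sketch of that check, with a hypothetical registry, shows why squad is accepted only if whichever copy of the code gets imported registers it:

```python
from typing import Callable, Dict


def make_task_lookup(processors: Dict[str, Callable[[], object]]) -> Callable[[str], object]:
    """Build a lookup that mimics the 'Task not found' check (hypothetical helper)."""

    def get_processor(task_name: str) -> object:
        name = task_name.lower()
        if name not in processors:
            raise ValueError("Task not found: %s" % name)
        return processors[name]()

    return get_processor


if __name__ == "__main__":
    # Hypothetical GLUE-only registry, like the one the -h output implies.
    lookup = make_task_lookup({"mrpc": lambda: "mrpc processor"})
    print(lookup("MRPC"))
    try:
        lookup("squad")
    except ValueError as err:
        print(err)  # Task not found: squad
```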

Problem description:
RuntimeError: Found param albert.embeddings.word_embeddings.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
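The error is raised at line 90 of AwesomeMRC/transformer-mrc/examples/run_squad_av.py.

https://github.com/ThilinaRajapakse/simpletransformers/issues/32 reports the same problem and offers a fix, but it is only one line of code. Would that work, and where should that line be added?
If that approach doesn't work, is there another way?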

Out-of-memory error even on Colab

I fixed every environment problem to get the code running on Colab, but even with 14 GB of GPU memory it still crashes with

RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 14.76 GiB total capacity; 13.46 GiB already allocated; 17.75 MiB free; 13.71 GiB reserved in total by PyTorch)

I moved to Colab in the first place because I thought my own V100 might be too small. I cut the batch size all the way down to 1, and it still crashes with OOM.

How can this code use so much memory?

How can I fix this?
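One reason batch size 1 may not be enough is that attention memory grows with the square of the sequence length. A rough lower-bound sketch, assuming albert-xxlarge-v2's published configuration of 12 layers and 64 attention heads and fp32 activations, counting only one stored copy of the attention probabilities per layer:

```python
def attention_scores_gib(
    batch: int, heads: int, seq_len: int, layers: int, bytes_per_elem: int = 4
) -> float:
    """Memory for one stored copy of the attention probabilities, in GiB.

    This is only one of several activation tensors kept for backprop, so treat it
    as a lower bound on activation memory, not a full estimate.
    """
    return batch * heads * seq_len * seq_len * layers * bytes_per_elem / 2**30


if __name__ == "__main__":
    for length in (512, 256):
        print(length, attention_scores_gib(batch=1, heads=64, seq_len=length, layers=12))
```

At max_seq_length 512 that single tensor is 0.75 GiB per example; at 256 it drops to 0.1875 GiB. The full activation footprint is likely several times larger, which is why a shorter max_seq_length or a smaller model tends to help more than shrinking the batch further.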

Could anyone take a look at why I'm getting this error? Thanks.

Error when running ./sh_albert_av.sh

  File "./examples/run_squad_av.py", line 19, in <module>
    from transformers.data.processors.squad import SquadV1Processor, SquadV2Processor, SquadResult
ModuleNotFoundError: No module named 'transformers'

Once I install the Hugging Face transformers library, it says:

2020-06-01 14:44:44.649987: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "./examples/run_squad_av.py", line 21, in <module>
    from examples.evaluate_official2 import eval_squad
ModuleNotFoundError: No module named 'examples.evaluate_official2'
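Both errors are consistent with how Python builds its import path. When a script is run as python ./examples/run_squad_av.py, the first entry of sys.path is the script's own directory (examples/), not the directory you launched from, so neither the repo's bundled transformers package nor examples.evaluate_official2 is importable unless the repo root (transformer-mrc/) is on PYTHONPATH. Installing the PyPI transformers only moves the failure to the next import. The self-contained demo below recreates that layout in a temporary directory; the file names are stand-ins, not the repo's files:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def try_import_from_script(add_root_to_pythonpath: bool) -> bool:
    """Return True if a script inside examples/ can import examples.evaluate_official2."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "examples"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # Stand-ins for the repo's evaluation module and training script.
        (pkg / "evaluate_official2.py").write_text("def eval_squad():\n    return 'ok'\n")
        (pkg / "run_script.py").write_text(
            "from examples.evaluate_official2 import eval_squad\nprint(eval_squad())\n"
        )
        env = dict(os.environ)
        env.pop("PYTHONPATH", None)
        if add_root_to_pythonpath:
            env["PYTHONPATH"] = root
        # Mirrors `python ./examples/run_squad_av.py` launched from the repo root.
        result = subprocess.run(
            [sys.executable, "examples/run_script.py"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return result.returncode == 0


if __name__ == "__main__":
    print("without PYTHONPATH:", try_import_from_script(False))  # False
    print("with PYTHONPATH:   ", try_import_from_script(True))   # True
```

Under that assumption, running export PYTHONPATH="$PWD:$PYTHONPATH" from transformer-mrc/ before ./sh_albert_av.sh would be the first thing to try.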

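Left/right order of question and context

Hello!
Regarding sequence_output = self.albert_att(context_sequence_output, sequence_output, context_attention_mask) # here context_sequence_output is the question (the first sequence)
I don't quite understand this part. In the original features, the question is on the left and the context is on the right, correct?
Does "context_sequence_output is the question" mean the order has been swapped by split_ques_context?
Thanks a lot!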
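For a question-first input (the layout BERT-style SQuAD preprocessing uses for ALBERT, as far as I know), segment 0 holds [CLS] question [SEP] and segment 1 holds the context, so a variable holding the first segment does contain the question even if its name says context. I can't confirm what split_ques_context does internally; the sketch below is a hypothetical stand-in that splits a packed pair by token_type_ids and drops padding:

```python
from typing import List, Sequence, Tuple


def split_by_segment(
    tokens: Sequence[str], token_type_ids: Sequence[int], attention_mask: Sequence[int]
) -> Tuple[List[str], List[str]]:
    """Split a packed pair into (segment 0, segment 1), skipping padding positions."""
    if not len(tokens) == len(token_type_ids) == len(attention_mask):
        raise ValueError("all inputs must have the same length")
    first: List[str] = []
    second: List[str] = []
    for token, segment, mask in zip(tokens, token_type_ids, attention_mask):
        if mask == 0:
            continue  # padding position
        (first if segment == 0 else second).append(token)
    return first, second


if __name__ == "__main__":
    tokens = ["[CLS]", "who", "wrote", "it", "?", "[SEP]", "alice", "did", ".", "[SEP]", "<pad>"]
    types = [0] * 6 + [1] * 4 + [0]
    mask = [1] * 10 + [0]
    question, context = split_by_segment(tokens, types, mask)
    print(question)  # the first segment: [CLS] + question + [SEP]
    print(context)   # the second segment: context + [SEP]
```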

TPU usage

Hi, does your code support TPUs?

TypeError: unsupported operand type(s) for +=: 'float' and 'list'

I tried to run run_verifier.py, but I got this error.

python run_verifier.py
Traceback (most recent call last):
  File "run_verifier.py", line 85, in <module>
    main()
  File "run_verifier.py", line 82, in main
    get_score1(args)
  File "run_verifier.py", line 40, in get_score1
    mean_score += score
TypeError: unsupported operand type(s) for +=: 'float' and 'list'

Does anyone know what is happening?
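The traceback says mean_score is a float but score is a list, so at least one of the files passed via --input_null_files seems to store a list per question ID rather than a single number. Which entry of that list is the no-answer score is not stated anywhere on this page, so the sketch below takes the index as a parameter; the default of -1 is purely an assumption to check against how each file was written. mean_null_scores is a hypothetical helper, not the repo's code:

```python
from typing import Dict, List, Sequence, Union

Score = Union[float, Sequence[float]]


def as_scalar(score: Score, list_index: int = -1) -> float:
    """Collapse a per-question score that may be stored as a list into one float.

    ASSUMPTION: when a file stores a list, the entry at `list_index` is the
    no-answer score. Verify this against the code that wrote the file.
    """
    if isinstance(score, (list, tuple)):
        return float(score[list_index])
    return float(score)


def mean_null_scores(score_files: List[Dict[str, Score]], list_index: int = -1) -> Dict[str, float]:
    """Average each question's null score over several loaded JSON dicts."""
    return {
        qid: sum(as_scalar(scores[qid], list_index) for scores in score_files) / len(score_files)
        for qid in score_files[0]
    }


if __name__ == "__main__":
    print(mean_null_scores([{"q1": 0.2}, {"q1": [0.9, 0.4]}]))  # q1 is about 0.3
```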

Project dependencies

Hi, could you provide the dependency environment for this project?

Can you post a requirements.txt?
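Until an official requirements.txt exists, a report of the versions actually installed is the most useful thing to attach to an issue. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the package list reflects libraries mentioned on this page and is an assumption, and a copy of transformers bundled in the repo (rather than pip-installed) will not show up here:

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def requirements_lines(versions: Dict[str, Optional[str]]) -> List[str]:
    """Render pinned requirement lines, skipping packages that are not installed."""
    return [f"{name}=={version}" for name, version in versions.items() if version]


if __name__ == "__main__":
    found = installed_versions(["torch", "transformers", "apex", "tensorflow"])
    print("\n".join(requirements_lines(found)))
```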

When will the code be released?

Hi there, I am a lead researcher at NLMatics. I have been monitoring the SQuAD 2.0 leaderboard continuously and have noticed your work since late January. It is a remarkable achievement, and I am eager to try it out. Would you please let me know when the code will be released? Thanks -- Yi

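Gradient explosion problem

When I run sh_albert_av.sh, it ends with the message "No module named 'amp_C'", and the gradients also explode. What is the cause, and how can I fix it?
The error is as follows: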

Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.
Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'",)
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
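The "Gradient overflow. Skipping step ... reducing loss scale" lines are what apex's dynamic loss scaling prints when an fp16 step overflows: it skips that step, halves the scale, and later doubles it again after a run of clean steps. Occasional overflows, especially early in training, are expected and are not by themselves gradient explosion; a scale that keeps collapsing toward its minimum would be more worrying. The amp_C message is a warning, and the log itself says apex falls back to a Python implementation. The sketch below simulates that update rule, assuming an initial scale of 2**16 and a doubling window of 2000 clean steps (my understanding of apex's defaults; the minimum scale of 1.0 is my own choice):

```python
from typing import Iterable, List


def simulate_dynamic_loss_scale(
    overflows: Iterable[bool],
    init_scale: float = 2.0 ** 16,
    factor: float = 2.0,
    window: int = 2000,
    min_scale: float = 1.0,
) -> List[float]:
    """Return the loss scale after each step under a simple dynamic scaling rule.

    On overflow the step is skipped and the scale is divided by `factor`;
    after `window` consecutive clean steps it is multiplied by `factor`.
    """
    scale = init_scale
    clean_steps = 0
    history: List[float] = []
    for overflowed in overflows:
        if overflowed:
            scale = max(min_scale, scale / factor)
            clean_steps = 0
        else:
            clean_steps += 1
            if clean_steps == window:
                scale *= factor
                clean_steps = 0
        history.append(scale)
    return history


if __name__ == "__main__":
    print(simulate_dynamic_loss_scale([True, True, True]))  # [32768.0, 16384.0, 8192.0]
```

Three consecutive overflows reproduce the first three values in the log above (32768, 16384, 8192), and the later jumps back up to 8192 fit the doubling rule.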

Question about running the code

Hi,

I ran sh sh_albert_cls.sh, but it raises an error because squad is missing:

squad does appear when I test it in the console, so I'm using the same transformers library (2.3.0):

Could you please help me out? Thanks!

KeyError: '56ddde6b9a695914005b9629' when running run_squad_av.py

When running run_squad_av.py, many "Missing prediction" messages appear ("Missing prediction for 56ddde6b9a695914005b9629"), and then it fails with:
('exact', 100.0 * sum(exact_scores[k] for k in qid_list) / total),
KeyError: '56ddde6b9a695914005b9629'
I debugged it in VS Code and traced the problem to this return collections.OrderedDict(...) statement in the make_eval_dict function of evaluate_official2.py:
return collections.OrderedDict([
    ('exact', 100.0 * sum(exact_scores[k] for k in qid_list) / total),
    ('f1', 100.0 * sum(f1_scores[k] for k in qid_list) / total),
    ('total', total),
])
qid_list has more than 5,000 entries, but exact_scores has only 35, so when sum() runs, some k has no entry in exact_scores and the program crashes.
But evaluate_official2.py appears to be the official evaluation script. How can I fix this, or did I misconfigure something elsewhere? Any guidance would be much appreciated!

PS: run_cls.py runs fine, so the base environment should be OK. My arguments for run_squad_av.py are:
"args":[
"--model_type","albert",
"--model_name_or_path","albert-base-v2",
"--do_train",
"--do_eval",
"--do_lower_case",
"--version_2_with_negative",
"--data_dir","/home/amax/anaconda3/AwesomeMRC/transformer-mrc/data/",
"--train_file","/home/amax/anaconda3/AwesomeMRC/transformer-mrc/data/train-v2.0.json",
"--predict_file","/home/amax/anaconda3/AwesomeMRC/transformer-mrc/data/dev-v2.0.json",
"--learning_rate","2e-5",
"--num_train_epochs","2",
"--max_seq_length","512",
"--doc_stride","128",
"--max_query_length","64",
"--per_gpu_train_batch_size","6",
"--per_gpu_eval_batch_size","8",
"--warmup_steps","814",
"--output_dir","squad/squad2_albert-base-v2_lr2e-5_len512_bs48_ep2_wm814_av_ce_fp16",
"--eval_all_checkpoints",
"--save_steps","2500",
"--n_best_size","20",
"--max_answer_length","30",
"--fp16",
"--overwrite_output_dir"
]
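As I read the official SQuAD 2.0 evaluation logic, questions without a prediction are reported as "Missing prediction" and skipped when raw scores are computed, but the answerable/unanswerable breakdown then looks up every question ID, which raises this KeyError. With predictions for only 35 of 5,000+ questions, the predictions file itself looks incomplete, so the cause is probably upstream of the evaluator. A minimal sketch for checking coverage before evaluating (question_ids and missing_predictions are hypothetical helpers):

```python
from typing import Dict, List


def question_ids(dataset: Dict) -> List[str]:
    """All question IDs in a SQuAD-format JSON object (top-level key "data")."""
    return [
        qa["id"]
        for article in dataset["data"]
        for paragraph in article["paragraphs"]
        for qa in paragraph["qas"]
    ]


def missing_predictions(dataset: Dict, predictions: Dict[str, str]) -> List[str]:
    """Question IDs that have no entry in the predictions dict."""
    return [qid for qid in question_ids(dataset) if qid not in predictions]


if __name__ == "__main__":
    import json
    import sys

    # Usage (placeholder paths): python check_preds.py dev-v2.0.json predictions.json
    with open(sys.argv[1], encoding="utf-8") as f:
        dev = json.load(f)
    with open(sys.argv[2], encoding="utf-8") as f:
        preds = json.load(f)
    missing = missing_predictions(dev, preds)
    print(f"{len(missing)} of {len(question_ids(dev))} questions have no prediction")
```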

fp16 compatibility

I am running sh_albert_cls.sh, and it crashes with:

Iteration: 0%| | 0/10860 [00:03<?, ?it/s]
Epoch: 0%| | 0/2 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "./examples/run_cls.py", line 645, in <module>
    main()
  File "./examples/run_cls.py", line 533, in main
    global_step, tr_loss = train(args, train_dataset, model, tokenizer)
  File "./examples/run_cls.py", line 159, in train
    outputs = model(**inputs)
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/bruce/AwesomeMRC/transformer-mrc/transformers/modeling_albert.py", line 688, in forward
    inputs_embeds=inputs_embeds
  File "/data/anaconda3/envs/mrc/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/bruce/AwesomeMRC/transformer-mrc/transformers/modeling_albert.py", line 524, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
StopIteration

I commented out the argument

--fp16

but I still get the same error.
The error messages don't tell me much about what's wrong. Any ideas?
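The failing line calls next(self.parameters()) inside a DataParallel replica. My understanding is that in recent PyTorch releases (1.5 and later), the replicas created by nn.DataParallel no longer expose their parameters through .parameters(), so next() gets an empty iterator and raises StopIteration. That line runs with or without --fp16 (the comment only describes its purpose), which would explain why removing the flag changed nothing. The mechanism, in plain Python:

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def first_or_default(items: Iterable[T], default: T) -> T:
    """Return the first item, or default when the iterable is empty."""
    return next(iter(items), default)


if __name__ == "__main__":
    empty_params = iter([])  # stands in for a replica whose .parameters() yields nothing
    try:
        next(empty_params)
    except StopIteration:
        print("bare next() on an empty iterator raises StopIteration")
    print(first_or_default([], "float32"))  # falls back to the default instead
```

Patching modeling_albert.py with a fallback dtype is possible, but the fallback has to match what the model actually runs in. Restricting the run to a single GPU, for example with CUDA_VISIBLE_DEVICES=0, avoids the DataParallel wrapper entirely in the usual example-script setup, and using an older PyTorch release is another commonly reported workaround.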

Run Squad Predictions with pre-trained model

Hello -

This is an amazing project, but I have a quick question.
Is there a straightforward way to use a pre-trained model to run predictions on a given text? For example, something like:

python run_squad.py --model_type="xlnet" --model_name="XLNetForQuestionAnswering" --context="There are 29 countries and 1983 states in the world" --question="How many states are there in the world?"

Or is there any other way to run the pre-trained model on text that I provide?
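I'm not aware of a --context/--question style command in this repo. One workaround, assuming run_squad_av.py reads --predict_file through the SQuAD v2 processor its imports suggest (SquadV2Processor), is to wrap your own text in the SQuAD 2.0 JSON layout and pass that file as --predict_file with --do_eval and without --do_train. A minimal sketch (build_squad_v2_input and the custom-N IDs are hypothetical):

```python
import json
from typing import Dict, List


def build_squad_v2_input(context: str, questions: List[str], title: str = "custom") -> Dict:
    """Wrap one context and its questions in the SQuAD 2.0 JSON layout."""
    return {
        "version": "v2.0",
        "data": [
            {
                "title": title,
                "paragraphs": [
                    {
                        "context": context,
                        "qas": [
                            # No gold answers, so evaluation metrics on this file are
                            # meaningless; only the written predictions matter.
                            {"id": f"custom-{i}", "question": q, "answers": [], "is_impossible": False}
                            for i, q in enumerate(questions)
                        ],
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    squad = build_squad_v2_input(
        "There are 29 countries and 1983 states in the world",
        ["How many states are there in the world?"],
    )
    with open("my_questions.json", "w", encoding="utf-8") as f:
        json.dump(squad, f, ensure_ascii=False, indent=2)
```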

{
  "exact": 85.48808220331846,
  "f1": 88.88667867457023,
  "total": 11873,
  "HasAns_exact": 83.23211875843455,
  "HasAns_f1": 90.03905801335617,
  "HasAns_total": 5928,
  "NoAns_exact": 87.73759461732548,
  "NoAns_f1": 87.73759461732548,
  "NoAns_total": 5945
}

As opposed to the results you reported:

{
  "exact": 87.75372694348522, 
  "f1": 90.91630165754992, 
  "total": 11873, 
  "HasAns_exact": 83.1140350877193, 
  "HasAns_f1": 89.4482539777485, 
  "HasAns_total": 5928, 
  "NoAns_exact": 92.38015138772077, 
  "NoAns_f1": 92.38015138772077, 
  "NoAns_total": 5945
  }

I ran these commands:

./sh_albert_cls.sh
./sh_albert_av.sh
python3 run_verifier.py --input_null_files="squad/cls_squad2_albert-xxlarge-v2_lr2e-5_len512_bs48_ep2_wm814/cls_score.json,squad/squad2_albert-xxlarge-v2_lr2e-5_len512_bs48_ep2_wm814_av_ce_fp16/null_odds_5_len512_bs48_ep2_wm814_av_ce_fp16.json" --input_nbest_files="squad/squad2_albert-xxlarge-v2_lr2e-5_len512_bs48_ep2_wm814_av_ce_fp16/nbest_predictions_5_len512_bs48_ep2_wm814_av_ce_fp16.json"
python3 evaluate-v2.0.py data/dev-v2.0.json predictions.json

Do you have any idea what may have gone wrong? Thanks a lot.
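Converting the percentages above back to counts (percentage × total / 100) shows where the gap comes from:

```python
def correct_count(percentage: float, total: int) -> int:
    """Number of correct questions implied by a SQuAD percentage score."""
    return round(percentage * total / 100)


if __name__ == "__main__":
    reproduced = {"HasAns": (83.23211875843455, 5928), "NoAns": (87.73759461732548, 5945)}
    reported = {"HasAns": (83.1140350877193, 5928), "NoAns": (92.38015138772077, 5945)}
    for split in ("HasAns", "NoAns"):
        mine = correct_count(*reproduced[split])
        theirs = correct_count(*reported[split])
        print(f"{split}: reproduced {mine}, reported {theirs}, difference {mine - theirs}")
```

The reproduction gets 7 more answerable questions right (4934 vs 4927) but 276 fewer unanswerable ones (5216 vs 5492), so the gap appears to sit almost entirely in the no-answer decision (the classifier scores and the verification step that combines the null scores) rather than in span extraction. Which of those is responsible cannot be told from these numbers alone.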

We cannot access the model on CodaLab for reproduction. Could you provide the trained weights? Thank you!

Is there any way I can use this model for inference?

I want to apply this model to answer SQuAD-style questions. Is there any way to do that? Could you point me in the right direction?
