xinyadu / eeqa



Event Extraction by Answering (Almost) Natural Questions

License: MIT License

Shell 2.44% Python 97.56%


eeqa's Issues

Multi-token event triggers in ACE

Hi Xinya,
According to the data example, there is only one index provided for the trigger. Does this mean that in your preprocessing you removed all multi-token triggers?

Why add a hard-coded 12?

Hey Xinya, while reading the code I wondered why you add a hard-coded 12 here.

eeqa/code/run_trigger_qa.py

Line 73 in 11bda7d

self.max_sent_length += 12

I don't understand why you add 12 for [CLS].
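One plausible reading, not confirmed by the author, is that the margin reserves room for a question and the special tokens placed before the sentence, not for [CLS] alone. The sketch below assumes a standard BERT pair layout, [CLS] question [SEP] sentence [SEP]; the question tokens are illustrative placeholders, and with WordPiece the question would be measured after tokenization. It derives the required length instead of using a fixed margin.

```python
def build_qa_input(question_tokens, sentence_tokens):
    """Lay out a BERT-style pair input: [CLS] question [SEP] sentence [SEP].

    Returns the token list and segment ids (0 for the [CLS]/question/[SEP]
    part, 1 for the sentence and its closing [SEP]).
    """
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    tokens += sentence_tokens + ["[SEP]"]
    segment_ids += [1] * (len(sentence_tokens) + 1)
    return tokens, segment_ids


def required_length(question_tokens, max_sent_length):
    # Three special tokens plus the question, computed rather than hard-coded.
    return max_sent_length + len(question_tokens) + 3
```

With a four-token question such as ["what", "is", "the", "trigger"], the overhead is 4 + 3 = 7 positions; a fixed 12 would cover any question of up to nine tokens.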

Do inference using trained trigger_qa model?

Hello! I've got training to work and now I want the trained model to label unlabeled text with event triggers.

When I set --model_dir to the epoch#-step# directory that gave the best performance during training, set the --eval_test flag, and set --test_file to my unlabeled inference dataset (which has no "event" field), I get the error below. It indicates that --eval_test only controls evaluation of a labeled test split, not inference on new, unlabeled data.

Is inference implemented for this repo? If so, what code can I look at to enact inference?

~/event-extraction-pipeline$ ./eeqa_run.sh 
12/17/2021 17:09:16 - INFO - __main__ - device: cuda, n_gpu: 1, 16-bits training: False
12/17/2021 17:09:16 - INFO - __main__ - Namespace(add_lstm=False, dev_file='./RAMS_1.0/data/dev_convert.json', do_eval=True, do_lower_case=False, do_train=True, eval_batch_size=12, eval_metric='f1_c', eval_per_epoch=20, eval_test=True, fp16=False, gradient_accumulation_steps=1, learning_rate=8e-06, loss_scale=0, lstm_lr=None, model='bert-base-uncased', model_dir='eeqa/trigger_qa_output/epoch5-step2959', no_cuda=False, nth_query=5, num_train_epochs=6.0, output_dir='eeqa/trigger_qa_output_rams1', seed=42, test_file='french_election_2017.jsonl', train_batch_size=12, train_file='./data/train_convert.json', train_mode='random_sorted', verbose_logging=False, warmup_proportion=0.1)
12/17/2021 17:09:16 - WARNING - pytorch_pretrained_bert.tokenization - The pre-trained model you are loading is an uncased model but you have set `do_lower_case` to False. We are setting `do_lower_case=True` for you but you may want to check this behavior.
12/17/2021 17:09:17 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/lthistlethwaite_parenthetic_io/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "eeqa/code/run_trigger_qa.py", line 632, in <module>
    main(args)
  File "eeqa/code/run_trigger_qa.py", line 378, in main
    category_vocab.create_vocab([args.train_file, args.dev_file, args.test_file])
  File "eeqa/code/run_trigger_qa.py", line 62, in create_vocab
    events, sentence = example["event"], example["sentence"] 
KeyError: 'event'
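The traceback shows create_vocab reading example["event"] from every file, including the test file. A possible workaround, assuming the code only needs the key to exist and that each record's "sentence" is already a token list like the training data, is to add an empty "event" list (and default "s_start" to 0) to each unlabeled record. This is a sketch; other parts of the script may also expect gold labels, and any scores printed against empty labels are meaningless.

```python
import json


def add_empty_labels(lines):
    """Yield JSON Lines records with an empty "event" list added where missing.

    Assumptions: the scripts only need the "event" key to be present, and
    single-sentence records without "s_start" can use offset 0.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        record.setdefault("event", [])
        record.setdefault("s_start", 0)
        yield json.dumps(record, ensure_ascii=False)
```

To use it, iterate over the open unlabeled file, write each yielded line plus a newline to a new file (the output name is your choice), and pass that new file as --test_file.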

What does gold_file refer to

Hi,

In code/script_args_qa_thresh.sh and code/script_args_qa.sh, what does the argument gold_file refer to?

It is used on both line 548 and line 698 of code/run_args_qa_thresh.py, once to evaluate the dev set and once to evaluate the test set. How can a single gold file evaluate both sets?

Thank you!

Metrics never improve on Chinese data

I used only the single example below for training, dev, and test:
{"sentence": ["网上车市从布鲁克斯自动化公司官方获悉，布鲁克斯自动化公司全新蓝河正式发布。"], "s_start": 0, "ner": [[[5, 14, "CompanyName"], [19, 28, "CompanyName"], [30, 32, "Product"]]], "event": [[[32, 36, "NewService.正式发布"], [5, 14, "CompanyName"], [19, 28, "CompanyName"], [30, 32, "Product"]]]}

Test result:
[["NewService.正式发布_Product", [30, 32]]]

Parameters (args):
batch_size = 4
epoch = 0
f1_c = 50.0
f1_i = 50.0
global_step = 36
learning_rate = 1e-06
prec_c = 100.0
prec_i = 100.0
recall_c = 33.333333333333336
recall_i = 33.333333333333336
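These numbers are consistent with one correct prediction out of three gold arguments: the example above has three argument spans (two CompanyName and one Product), and the test result contains only the Product span. Assuming precision and recall are computed over predicted versus gold argument spans (the repo's exact scoring code may differ in detail), a quick check:

```python
def prf(n_pred, n_gold, n_correct):
    """Precision, recall and F1 in percent, computed from span counts."""
    p = 100.0 * n_correct / n_pred if n_pred else 0.0
    r = 100.0 * n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# One predicted span (Product), which is correct, against three gold spans.
precision, recall, f1 = prf(n_pred=1, n_gold=3, n_correct=1)
```

So precision is perfect and recall is the bottleneck: the model is predicting too few arguments, which is unsurprising after training on a single example.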

Why is only one event considered in the preprocessing step?

eeqa/code/run_trigger_qa.py

Line 65 in 0432ae9

event_type = event[0][1]

Hey Xinya, I wonder why only the first event type of the sentence is considered here. For example, I can have an event parse like this:

{'sentence': ['Orders', 'went', 'out', 'today', 'to', 'deploy', '17,000', 'U.S.', 'Army', 'soldiers', 'in', 'the', 'Persian', 'Gulf', 'region', '.'], 's_start': 24, 'ner': [[31, 31, 'GPE'], [32, 32, 'ORG'], [33, 33, 'PER'], [36, 37, 'LOC'], [38, 38, 'LOC']], 'relation': [[32, 32, 31, 31, 'PART-WHOLE.Subsidiary'], [33, 33, 32, 32, 'ORG-AFF.Employment'], [33, 33, 38, 38, 'PHYS.Located'], [36, 37, 38, 38, 'PART-WHOLE.Geographical']], 'event': [[[29, 'Movement.Transport'], [33, 33, 'Artifact'], [38, 38, 'Destination']]]}
Why are 'Artifact' and 'Destination' not captured? I compared the code's output with the official statistics: there are 33 event types, and the code does give me 33 event types in total, but I'm not sure why the other events are not considered.
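Assuming the quoted line sits inside a loop over all events of a sentence (line 65 is shown out of context), `event[0][1]` does not pick the first event; it picks the type of each event's trigger. In the example, event[0] is [29, 'Movement.Transport'] and event[1:] holds the arguments, so 'Artifact' and 'Destination' are argument roles rather than event types and would not be expected in an event-type vocabulary. That would also explain why the count matches the 33 official event types. A sketch of the layout:

```python
def split_event(event):
    """Split one event entry into trigger index, event type and arguments.

    Assumed layout (taken from the example above): event[0] is
    [trigger_index, event_type]; each later item is [start, end, role].
    """
    trigger_index, event_type = event[0]
    arguments = [(start, end, role) for start, end, role in event[1:]]
    return trigger_index, event_type, arguments


example_events = [[[29, "Movement.Transport"], [33, 33, "Artifact"], [38, 38, "Destination"]]]
for event in example_events:
    trigger, event_type, arguments = split_event(event)
    print(trigger, event_type, arguments)
```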

len(start_token) != 1 in parse_ace_event.py

When preprocessing the ACE 2005 dataset with parse_ace_event.py, I get:

d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(780)()
-> main()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(776)main()
-> include_pronouns=args.include_pronouns)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(750)one_fold()
-> js = document.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(726)to_json()
-> js = doc.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(229)to_json()
-> self.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(216)remove_whitespace()
-> entry.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(160)remove_whitespace()
-> self.align()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(152)align()
-> entity.align(self.sent)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(43)align()
-> self.span_sentence = get_token_indices(self, sent.as_doc())
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(248)get_token_indices()
-> debug_if(len(start_token) != 1)

As shown here, len(start_token) != 1.

The bug originates here:

def get_token_indices(entity, sent):
    start_token = [tok for tok in sent if tok.idx == entity.start_char]

The reason is that when sent.as_doc() is passed to get_token_indices(), token.idx is counted from the beginning of the sentence, whereas entity.start_char is counted from the start of the whole document, so start_token ends up as [].

Could you please fix this bug, or is there something I did wrong?

Best regards.
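In spaCy, `Span.start_char` is a span's character offset in its parent `Doc`, and `Span.as_doc()` copies the span into a new `Doc` whose token offsets restart at 0. Assuming `sent` is a spaCy sentence `Span` and `entity.start_char` is document-level, as described above, one possible fix is to compare `tok.idx + sent.start_char` against the entity offset (or to iterate over `sent` itself, whose tokens keep document-level `idx`). The sketch below uses a hypothetical stand-in `Token` class so it runs without spaCy:

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    idx: int  # character offset within the sentence-level doc (as after as_doc())


def get_start_tokens(tokens, entity_start_char, sent_start_char):
    """Match tokens to an entity start given in document-level characters.

    Token offsets are sentence-relative, so the sentence's own
    document-level start is added before comparing.
    """
    return [tok for tok in tokens if tok.idx + sent_start_char == entity_start_char]


# Hypothetical sentence "Orders went out today" starting at character 100.
tokens = [Token("Orders", 0), Token("went", 7), Token("out", 12), Token("today", 16)]
print([t.text for t in get_start_tokens(tokens, entity_start_char=107, sent_start_char=100)])
# prints ['went']
```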

Are the arguments passed to BertForQuestionAnswering_withIfTriggerEmbedding during training correct?

The signature of BertForQuestionAnswering_withIfTriggerEmbedding's forward is:
def forward(self, input_ids, if_trigger_ids, token_type_ids=None, attention_mask=None, start_positions=None, end_positions=None):
but the training code in run_args_qa.py at line 556 is:
if not args.add_if_trigger_embedding:
    loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
else:
    loss = model(input_ids, segment_ids, if_trigger_ids, input_mask, start_positions, end_positions)

Is this correct?

The second parameter of forward is if_trigger_ids, but the argument actually passed in that position is segment_ids.
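Python binds positional arguments strictly by order, so with the signature quoted above, the call at line 556 would pass segment_ids as if_trigger_ids and if_trigger_ids as token_type_ids. This assumes the quoted code matches the current repository. The stand-in function below is hypothetical (it only mirrors the parameter order) and shows the binding, along with a keyword-argument call that does not depend on order:

```python
def forward(input_ids, if_trigger_ids, token_type_ids=None, attention_mask=None,
            start_positions=None, end_positions=None):
    # Stand-in with the same parameter order as the model's forward();
    # it just reports which value landed in which parameter.
    return {"if_trigger_ids": if_trigger_ids,
            "token_type_ids": token_type_ids,
            "attention_mask": attention_mask}


input_ids, segment_ids, if_trigger_ids, input_mask = "ids", "seg", "trig", "mask"
start_positions, end_positions = 0, 1

# Positional call as written at line 556: the order does not match the signature.
positional = forward(input_ids, segment_ids, if_trigger_ids, input_mask,
                     start_positions, end_positions)

# Keyword call: each value goes to the parameter it is meant for.
keyword = forward(input_ids, if_trigger_ids=if_trigger_ids, token_type_ids=segment_ids,
                  attention_mask=input_mask, start_positions=start_positions,
                  end_positions=end_positions)
```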

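Does every trigger in the dataset occupy only a single position?

Hi, while reading and debugging the code, I found that throughout the data, event[0] is always a list of only two elements: the first is the trigger's offset and the second is the event_type label. Does this mean a trigger can only consist of a single character (word)?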

 "event": [[[463, "Contact.Meet"], [466, 466, "Entity"]]]
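Two trigger encodings appear in this thread: [index, type] in the ACE data above, and [start, end, type] in the Chinese example earlier (event[0] = [32, 36, "NewService.正式发布"]). To check whether a converted file contains any multi-position triggers, one can count both shapes. A minimal sketch, assuming each record has an "event" list whose items start with the trigger:

```python
from collections import Counter


def trigger_shapes(records):
    """Count trigger encodings across records.

    Assumes event[0] is either [index, type] (single position) or
    [start, end, type] (a span). Records without "event" are skipped.
    """
    counts = Counter()
    for record in records:
        for event in record.get("event", []):
            counts["single" if len(event[0]) == 2 else "span"] += 1
    return counts
```

Feeding it the parsed lines of a JSONL file answers the question empirically for that file.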

How to download the SciERC and GENIA datasets?

How can I get the SciERC and GENIA datasets? The eeqa/proc/DATA.md file has download instructions for these two datasets, but I can't find the ./scripts/data/get_scierc.sh and ./scripts/data/get_genia.sh scripts it refers to. Thanks.

Can't run the preprocessing code.

Hi Xinya,
I ran your code and got an error during data preprocessing, at the step that runs 'python scripts/data/ace-event/convert_examples.py':

  File "scripts/data/ace-event/convert_examples.py", line 11, in <module>
    line = json.loads(line)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

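I checked the process: I followed the README steps exactly and re-downloaded the raw data from the official website.

When parsing the ACE05 data, I used the default settings: python ./scripts/data/ace-event/parse_ace_event.py default-settings

I also checked the data produced by parse_ace_event.py and found some problems; for example, the 'events' field contains a large number of meaningless empty lists.

What is the cause of this?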

reproduce

What results did you get when reproducing the argument extraction part?

Why can't I reproduce the result in the paper?

@xinyadu
Hello, I ran the code as instructed, but with the best trigger question strategy I only get trigger_classification: P=73.333 R=68.238 F1=70.694, whereas the paper reports P=71.12 R=73.70 F1=72.39. Are there any other recommendations or parameter settings that could improve the result?

convert_examples.py

When I run convert_examples.py (in ./proc/) as proc/README says, it always produces this error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
Could anyone tell me why this happens and how to solve it? Thanks!
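The reported position (line 2, column 1, char 2) is exactly what json.loads reports for the two-character string "{\n". This suggests, though it is not confirmed in these threads, that the input file is pretty-printed JSON rather than one object per line, so reading it line by line hands json.loads a lone "{". The sketch below reproduces the message and shows a reader that accepts both layouts by decoding by position with `json.JSONDecoder.raw_decode` instead of by line:

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value in text.

    Handles JSON Lines (one object per line) and pretty-printed objects
    spanning several lines, because it decodes by character position.
    """
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # Skip whitespace (including newlines) between objects.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Reading a pretty-printed file line by line: the first line is "{" plus "\n".
try:
    json.loads("{\n")
except json.JSONDecodeError as err:
    print(err)
    # prints: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
```

If the file turns out to be pretty-printed, it may be simpler to regenerate it (or rewrite it with one json.dumps per line) than to change convert_examples.py.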

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331

Why does this RuntimeError occur?
Environment:
GPU: RTX 3090
CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Error:
06/19/2021 16:37:19 - INFO - __main__ - device: cuda, n_gpu: 2, 16-bits training: False
06/19/2021 16:51:17 - INFO - __main__ - Start epoch #0 (lr = 4e-05)...
Traceback (most recent call last):
  File "code/run_trigger_qa.py", line 629, in <module>
    main(args)
  File "code/run_trigger_qa.py", line 480, in main
    loss = model(input_ids, token_type_ids = segment_ids, attention_mask = input_mask, labels = labels)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 1198, in forward
    sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 734, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 411, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 396, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 354, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 311, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331
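The nvcc output above reports CUDA 9.1, while the RTX 3090 is an Ampere GPU with compute capability 8.6, which CUDA toolkits older than 11.1 cannot target. Prebuilt PyTorch binaries use the CUDA runtime they were built against, so the version that matters is the one PyTorch reports, not the system nvcc. A plausible cause of this cublas failure, not confirmed in this thread, is a PyTorch build whose CUDA is too old for sm_86. A small lookup sketch covering a few well-known architectures:

```python
# Minimum CUDA toolkit version able to target a given compute capability.
# Only a few well-known architectures are listed.
MIN_CUDA_FOR_ARCH = {
    (6, 1): (8, 0),   # Pascal consumer GPUs (e.g. GTX 10xx)
    (7, 0): (9, 0),   # Volta (V100)
    (7, 5): (10, 0),  # Turing (RTX 20xx)
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere consumer GPUs (RTX 30xx, including RTX 3090)
}


def parse_version(text):
    """Turn a version string such as "10.0" into a comparable tuple (10, 0)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_supports(capability, cuda_version):
    """True/False if the architecture is in the table, None if it is unknown."""
    required = MIN_CUDA_FOR_ARCH.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required
```

In practice, the inputs would come from `torch.cuda.get_device_capability(0)` and `torch.version.cuda` (which is None on CPU-only builds). A False result suggests installing a PyTorch build compiled against CUDA 11.1 or newer.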
