xinyadu / eeqa
Event Extraction by Answering (Almost) Natural Questions
License: MIT License
Hi Xinya,
According to the data example, there is only one index provided for the trigger. Does this mean that in your preprocessing you removed all multi-token triggers?
Hey Xinya, looking at the code, I have a question: why is a hard-coded 12 added here?
Line 73 in 11bda7d
[CLS]
you are adding 12?
What are the hardware requirements for the training machine?
Hello! I've got training to work, and now I want the trained model to label unlabeled text with event triggers.
When I set --model-dir to the epoch#-step# directory with the best performance during training, set the --eval_test flag, and point --test_data at my unlabeled inference dataset (which has no "events" column), I get the error below. It suggests that --eval_test only controls evaluation of a labeled test partition, not inference on new, unlabeled data.
Is inference implemented in this repo? If so, which code should I look at to run it?
~/event-extraction-pipeline$ ./eeqa_run.sh
12/17/2021 17:09:16 - INFO - __main__ - device: cuda, n_gpu: 1, 16-bits training: False
12/17/2021 17:09:16 - INFO - __main__ - Namespace(add_lstm=False, dev_file='./RAMS_1.0/data/dev_convert.json', do_eval=True, do_lower_case=False, do_train=True, eval_batch_size=12, eval_metric='f1_c', eval_per_epoch=20, eval_test=True, fp16=False, gradient_accumulation_steps=1, learning_rate=8e-06, loss_scale=0, lstm_lr=None, model='bert-base-uncased', model_dir='eeqa/trigger_qa_output/epoch5-step2959', no_cuda=False, nth_query=5, num_train_epochs=6.0, output_dir='eeqa/trigger_qa_output_rams1', seed=42, test_file='french_election_2017.jsonl', train_batch_size=12, train_file='./data/train_convert.json', train_mode='random_sorted', verbose_logging=False, warmup_proportion=0.1)
12/17/2021 17:09:16 - WARNING - pytorch_pretrained_bert.tokenization - The pre-trained model you are loading is an uncased model but you have set `do_lower_case` to False. We are setting `do_lower_case=True` for you but you may want to check this behavior.
12/17/2021 17:09:17 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/lthistlethwaite_parenthetic_io/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
File "eeqa/code/run_trigger_qa.py", line 632, in <module>
main(args)
File "eeqa/code/run_trigger_qa.py", line 378, in main
category_vocab.create_vocab([args.train_file, args.dev_file, args.test_file])
File "eeqa/code/run_trigger_qa.py", line 62, in create_vocab
events, sentence = example["event"], example["sentence"]
KeyError: 'event'
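One possible workaround, sketched under assumptions: the traceback shows `create_vocab` reading `example["event"]` from every input file, so an unlabeled file could be padded with an empty `event` list before it is passed in. The helper name `add_empty_events` and the sample record are hypothetical, and whether the rest of `run_trigger_qa.py` handles empty event lists (and writes usable predictions) is not confirmed here.

```python
import json


def add_empty_events(lines, key="event"):
    """Yield JSON strings in which every record carries an `event` list.

    Hypothetical helper: it only guarantees the key exists, it does not
    make evaluation scores on unlabeled data meaningful.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        record.setdefault(key, [])  # leave already-labeled records untouched
        yield json.dumps(record, ensure_ascii=False)


# Made-up unlabeled record with the same fields as the repo's data examples.
unlabeled = ['{"sentence": ["Voters", "went", "to", "the", "polls", "."], "s_start": 0}']
padded = list(add_empty_events(unlabeled))
print(padded[0])
```

Any precision/recall numbers reported for a file padded this way would be computed against empty gold labels, so only the predictions themselves would be of interest.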
Hi,
In code/script_args_qa_thresh.sh and code/script_args_qa.sh, what does the argument gold_file refer to?
It is used at both line 548 and line 698 of code/run_args_qa_thresh.py, which evaluate the dev set and the test set respectively. How can a single gold file be used to evaluate both sets?
Thank you!
I used only one example, shown below, for train, dev, and test:
{"sentence": ["网上车市从布鲁克斯自动化公司官方获悉,布鲁克斯自动化公司全新蓝河正式发布。"], "s_start": 0, "ner": [[[5, 14, "CompanyName"], [19, 28, "CompanyName"], [30, 32, "Product"]]], "event": [[[32, 36, "NewService.正式发布"], [5, 14, "CompanyName"], [19, 28, "CompanyName"], [30, 32, "Product"]]]}
Test result:
[["NewService.正式发布_Product", [30, 32]]]
Args:
batch_size = 4
epoch = 0
f1_c = 50.0
f1_i = 50.0
global_step = 36
learning_rate = 1e-06
prec_c = 100.0
prec_i = 100.0
recall_c = 33.333333333333336
recall_i = 33.333333333333336
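The reported scores are consistent with one correct prediction out of three gold arguments. The sketch below only re-derives the numbers; it assumes the scorer counts each of the three arguments in the `event` entry as one gold item, which is inferred from the output above rather than taken from the repo's evaluation code.

```python
def precision_recall_f1(num_correct, num_predicted, num_gold):
    """Return precision, recall and F1 as percentages."""
    precision = 100.0 * num_correct / num_predicted if num_predicted else 0.0
    recall = 100.0 * num_correct / num_gold if num_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# One predicted argument (the Product span), and it is correct;
# three gold arguments (two CompanyName spans and one Product span).
p, r, f1 = precision_recall_f1(num_correct=1, num_predicted=1, num_gold=3)
print(f"prec={p} recall={r} f1={f1:.1f}")
```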
Line 65 in 0432ae9
Hey Xinya, I wonder why only the first event type of the sentence is considered here. For example, I can have an event parse like this:
{'sentence': ['Orders', 'went', 'out', 'today', 'to', 'deploy', '17,000', 'U.S.', 'Army', 'soldiers', 'in', 'the', 'Persian', 'Gulf', 'region', '.'], 's_start': 24, 'ner': [[31, 31, 'GPE'], [32, 32, 'ORG'], [33, 33, 'PER'], [36, 37, 'LOC'], [38, 38, 'LOC']], 'relation': [[32, 32, 31, 31, 'PART-WHOLE.Subsidiary'], [33, 33, 32, 32, 'ORG-AFF.Employment'], [33, 33, 38, 38, 'PHYS.Located'], [36, 37, 38, 38, 'PART-WHOLE.Geographical']], 'event': [[[29, 'Movement.Transport'], [33, 33, 'Artifact'], [38, 38, 'Destination']]]}
Why are 'Artifact' and 'Destination' not picked up? I compared the code's output with the official statistics: there are 33 event types, and the code here also gives me 33 event types in total, but I am not sure why the other events are not considered.
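For reference, here is one reading of the record layout, as a sketch rather than the repo's own code: in each event, the first item looks like `[trigger_offset, event_type]` and the remaining items look like `[start, end, role]`, with offsets counted at document level so that subtracting `s_start` gives positions inside `sentence`. The function name `unpack_events` is hypothetical.

```python
example = {
    "sentence": ["Orders", "went", "out", "today", "to", "deploy", "17,000",
                 "U.S.", "Army", "soldiers", "in", "the", "Persian", "Gulf",
                 "region", "."],
    "s_start": 24,
    "event": [[[29, "Movement.Transport"], [33, 33, "Artifact"],
               [38, 38, "Destination"]]],
}


def unpack_events(example):
    """Split each event into its trigger (first item) and its arguments (the rest)."""
    tokens, offset = example["sentence"], example["s_start"]
    unpacked = []
    for event in example["event"]:
        trigger_idx, event_type = event[0]
        arguments = [
            (role, " ".join(tokens[start - offset:end - offset + 1]))
            for start, end, role in event[1:]
        ]
        unpacked.append({
            "type": event_type,
            "trigger": tokens[trigger_idx - offset],
            "arguments": arguments,
        })
    return unpacked


events = unpack_events(example)
print(events)
```

Under this reading, 'Artifact' and 'Destination' are argument roles of the single 'Movement.Transport' event rather than event types of their own, which would explain why a count of event types does not include them.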
When preprocessing the ACE 2005 dataset with parse_ace_event.py, I hit the following:
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(780)()
-> main()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(776)main()
-> include_pronouns=args.include_pronouns)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(750)one_fold()
-> js = document.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(726)to_json()
-> js = doc.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(229)to_json()
-> self.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(216)remove_whitespace()
-> entry.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(160)remove_whitespace()
-> self.align()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(152)align()
-> entity.align(self.sent)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(43)align()
-> self.span_sentence = get_token_indices(self, sent.as_doc())
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(248)get_token_indices()
-> debug_if(len(start_token) != 1)
As you can see here, len(start_token) != 1.
The bug arises here:
def get_token_indices(entity, sent):
    start_token = [tok for tok in sent if tok.idx == entity.start_char]
The reason is that when sent.as_doc() is passed to get_token_indices(), token.idx is counted from the beginning of the sentence, while entity.start_char is counted from the start of the whole document. So start_token is empty.
Could you please fix this bug? Or did I do something wrong?
Best regards.
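The mismatch described above can be reproduced without spaCy. This is a minimal sketch with a hypothetical `Token` class and whitespace tokenizer standing in for the real objects; the actual fix in parse_ace_event.py may differ, for example by pinning the library version the script was written for.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    idx: int  # character offset of the token's first character


def tokenize(text):
    """Whitespace tokenizer whose offsets start at 0 for whatever text it is given."""
    tokens, pos = [], 0
    for word in text.split():
        pos = text.index(word, pos)
        tokens.append(Token(word, pos))
        pos += len(word)
    return tokens


document = "Troops arrived . Orders went out today ."
sent_start_char = document.index("Orders")
sentence = document[sent_start_char:]
entity_start_char = document.index("today")  # document-level offset

# Tokenizing the sentence on its own restarts offsets at 0.
sent_tokens = tokenize(sentence)

# Comparing a sentence-level offset with a document-level one finds nothing.
broken = [t for t in sent_tokens if t.idx == entity_start_char]

# Shifting the entity offset by the sentence start restores the match.
fixed = [t for t in sent_tokens if t.idx == entity_start_char - sent_start_char]

print(broken, fixed)
```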
The parameters of BertForQuestionAnswering_withIfTriggerEmbedding's forward are:
    def forward(self, input_ids, if_trigger_ids, token_type_ids=None, attention_mask=None, start_positions=None, end_positions=None):
but the training code in run_args_qa.py at line 556 is:
    if not args.add_if_trigger_embedding:
        loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
    else:
        loss = model(input_ids, segment_ids, if_trigger_ids, input_mask, start_positions, end_positions)
Is this correct?
The second parameter of forward is if_trigger_ids, but the argument actually passed in that position is segment_ids.
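To make the concern concrete, the stand-in below copies only the parameter order quoted above and reports how Python binds the positional arguments. It is not the model; the strings are placeholders for tensors, and whether the signature in the repo really matches the one quoted is taken from this issue.

```python
def forward(input_ids, if_trigger_ids, token_type_ids=None, attention_mask=None,
            start_positions=None, end_positions=None):
    """Stand-in with the quoted parameter order; returns the argument bindings."""
    return dict(locals())


# Positional call in the order used by the training loop.
positional = forward("input_ids", "segment_ids", "if_trigger_ids",
                     "input_mask", "start_positions", "end_positions")
print(positional["if_trigger_ids"])   # receives segment_ids
print(positional["token_type_ids"])   # receives if_trigger_ids

# Keyword arguments make the binding independent of parameter order.
keyword = forward(input_ids="input_ids",
                  token_type_ids="segment_ids",
                  if_trigger_ids="if_trigger_ids",
                  attention_mask="input_mask",
                  start_positions="start_positions",
                  end_positions="end_positions")
```

If the signature is as quoted, passing keyword arguments in the training loop would be one way to rule out the swap.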
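Hello, while reading and debugging the code, I noticed that event[0] in all of the data is a list with only two elements: the first is the trigger's offset and the second is the event_type label. Does this mean a trigger can only consist of a single token?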
"event": [[[463, "Contact.Meet"], [466, 466, "Entity"]]]
How can I get the SciERC and GENIA datasets? In the eeqa/proc/DATA.md file, I see instructions for downloading the two datasets, but I can't find the ./scripts/data/get_scierc.sh and ./scripts/data/get_genia.sh files they refer to. Thanks.
Hi Xinya,
I ran your code and got an error during data preprocessing. The step was running 'python scripts/data/ace-event/convert_examples.py':
File "scripts/data/ace-event/convert_examples.py", line 11, in <module>
line = json.loads(line)
File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
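I checked the process: I followed the README steps exactly and re-downloaded the raw data from the official site.
I also checked the data produced by parse_ace_event.py, and it has some problems. For example, the 'events' field contains a large number of meaningless empty lists.
What is the cause of this?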
What about the result of the argument extraction part you reproduced?
@xinyadu
Hello, I ran the code as instructed and only got trigger_classification: P=73.333 R=68.238 F1=70.694 with the best trigger question strategy (the corresponding result in the paper is P=71.12 R=73.70 F1=72.39). Are there any other recommendations or parameter options to improve this?
When I try to run convert_example.py (in .proc/) as the proc/README says, it always produces this error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
Could anyone tell me why this happens and how to solve it? Thanks!
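A possible explanation, offered as a guess: `json.loads` gives exactly this message, `line 2 column 1 (char 2)`, when it is handed a line that contains only `{` and a newline. That would happen if the input file holds pretty-printed, multi-line JSON objects while the script parses it one line at a time. The sketch below reproduces the message and shows a reader that accepts both layouts; `iter_json_objects` is a hypothetical helper, not part of the repo, and whether the affected files really are pretty-printed has not been checked.

```python
import json


def iter_json_objects(text):
    """Yield every top-level JSON value in `text`, whether or not it spans lines."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # Skip whitespace between consecutive objects.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two pretty-printed objects, one after another (made-up content).
pretty = ('{\n  "sentence": ["a"],\n  "event": []\n}\n'
          '{\n  "sentence": ["b"],\n  "event": []\n}\n')

# Parsing the first physical line on its own reproduces the reported error.
message = ""
try:
    json.loads(pretty.splitlines(keepends=True)[0])
except json.JSONDecodeError as err:
    message = str(err)
print(message)

records = list(iter_json_objects(pretty))
print(len(records))
```

The same reader also works on one-object-per-line files, so it can be used to inspect an input file without knowing which layout it has.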
Why do I get a RuntimeError?
Environment:
RTX3090
CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Error:
06/19/2021 16:37:19 - INFO - main - device: cuda, n_gpu: 2, 16-bits training: False
06/19/2021 16:51:17 - INFO - main - Start epoch #0 (lr = 4e-05)...
Traceback (most recent call last):
File "code/run_trigger_qa.py", line 629, in
main(args)
File "code/run_trigger_qa.py", line 480, in main
loss = model(input_ids, token_type_ids = segment_ids, attention_mask = input_mask, labels = labels)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 1198, in forward
sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 734, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 411, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 396, in forward
attention_output = self.attention(hidden_states, attention_mask)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 354, in forward
self_output = self.self(input_tensor, attention_mask)
File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 311, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331
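One thing worth checking, stated as a likely cause and not a confirmed one: the RTX 3090 is an Ampere card with compute capability 8.6, which CUDA toolkits older than 11.1 cannot target, and cuBLAS failures of this kind are a common symptom of running a PyTorch build compiled against an older CUDA on such a card. The table and helper below are an illustrative sketch; `cuda_supports` is a hypothetical name. Note that the CUDA version that matters is the one PyTorch was built with (`torch.version.cuda`), which can differ from what `nvcc` reports.

```python
# Minimum CUDA toolkit release able to target each compute capability.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
}


def cuda_supports(capability, cuda_version):
    """Return True/False, or None when the capability is not in the table."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return tuple(cuda_version) >= required


# The environment reported above: RTX 3090 with CUDA 9.1.
print(cuda_supports((8, 6), (9, 1)))

# Optional: inspect the local installation when PyTorch is available.
try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available() and torch.version.cuda:
    capability = torch.cuda.get_device_capability(0)
    built_with = tuple(int(x) for x in torch.version.cuda.split(".")[:2])
    print(capability, built_with, cuda_supports(capability, built_with))
```

If the check comes back False, installing a PyTorch build compiled against CUDA 11.1 or newer would be the first thing to try, though the pinned dependencies of this repo may then need adjusting.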