
eeqa's People

Contributors

xinyadu


eeqa's Issues

Multi-token event triggers in ACE

Hi Xinya,
According to the data example, there is only one index provided for the trigger. Does this mean that in your preprocessing you removed all multi-token triggers?

How to run inference using a trained trigger_qa model?

Hello! I've got training to work and now I want the trained model to label unlabeled text with event triggers.

I set --model-dir to the epoch#-step# directory with the best performance during training, set the --eval_test flag, and pointed --test_data at my unlabeled inference dataset, which has no "events" column. However, I get the error below, which suggests that --eval_test only controls evaluation of a labeled test partition, not inference on new, unlabeled data.

Is inference implemented in this repo? If so, which code should I look at to run it?

~/event-extraction-pipeline$ ./eeqa_run.sh 
12/17/2021 17:09:16 - INFO - __main__ - device: cuda, n_gpu: 1, 16-bits training: False
12/17/2021 17:09:16 - INFO - __main__ - Namespace(add_lstm=False, dev_file='./RAMS_1.0/data/dev_convert.json', do_eval=True, do_lower_case=False, do_train=True, eval_batch_size=12, eval_metric='f1_c', eval_per_epoch=20, eval_test=True, fp16=False, gradient_accumulation_steps=1, learning_rate=8e-06, loss_scale=0, lstm_lr=None, model='bert-base-uncased', model_dir='eeqa/trigger_qa_output/epoch5-step2959', no_cuda=False, nth_query=5, num_train_epochs=6.0, output_dir='eeqa/trigger_qa_output_rams1', seed=42, test_file='french_election_2017.jsonl', train_batch_size=12, train_file='./data/train_convert.json', train_mode='random_sorted', verbose_logging=False, warmup_proportion=0.1)
12/17/2021 17:09:16 - WARNING - pytorch_pretrained_bert.tokenization - The pre-trained model you are loading is an uncased model but you have set `do_lower_case` to False. We are setting `do_lower_case=True` for you but you may want to check this behavior.
12/17/2021 17:09:17 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/lthistlethwaite_parenthetic_io/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
Traceback (most recent call last):
  File "eeqa/code/run_trigger_qa.py", line 632, in <module>
    main(args)
  File "eeqa/code/run_trigger_qa.py", line 378, in main
    category_vocab.create_vocab([args.train_file, args.dev_file, args.test_file])
  File "eeqa/code/run_trigger_qa.py", line 62, in create_vocab
    events, sentence = example["event"], example["sentence"] 
KeyError: 'event'
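
The traceback shows the failure in vocabulary construction, which reads example["event"] from every input file, including the test file. A minimal sketch of one possible workaround follows, assuming each line of the unlabeled file is a JSON object with a "sentence" field and that a missing "event" field can be treated as "no gold events". `collect_event_types` is a hypothetical helper, not a function from the repository, and the event layout is assumed from the data examples quoted in other issues on this page.

```python
import json

def collect_event_types(lines):
    """Collect event-type labels from JSON lines, tolerating unlabeled examples."""
    types = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        example = json.loads(line)
        # Assumption: unlabeled inference data simply lacks the "event" key.
        for event in example.get("event", []):
            # Assumption: each event is [[trigger_offset, event_type], *arguments].
            types.add(event[0][1])
    return types

labeled = '{"sentence": ["Orders", "went", "out"], "event": [[[1, "Movement.Transport"]]]}'
unlabeled = '{"sentence": ["Nothing", "labeled", "here"]}'
print(collect_event_types([labeled, unlabeled]))
```

With this reading, an unlabeled file contributes no labels to the vocabulary instead of raising KeyError. Whether the rest of the evaluation path also needs gold labels is not addressed by this sketch.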

What does gold_file refer to?

Hi,

In code/script_args_qa_thresh.sh and code/script_args_qa.sh, what does the argument gold_file refer to?

It is used at both line 548 and line 698 of code/run_args_qa_thresh.py, which evaluate the dev set and the test set respectively. How can one gold file be used to evaluate both sets?

Thank you!

Metrics stay low on Chinese data

I used only the single example below for train, dev, and test:
{"sentence": ["网上车市从布鲁克斯自动化公司官方获悉,布鲁克斯自动化公司全新蓝河正式发布。"], "s_start": 0, "ner": [[[5, 14, "CompanyName"], [19, 28, "CompanyName"], [30, 32, "Product"]]], "event": [[[32, 36, "NewService.正式发布"], [5, 14, "CompanyName"], [19, 28, "CompanyName"], [30, 32, "Product"]]]}

Test result:
[["NewService.正式发布_Product", [30, 32]]]

Arguments and results:
batch_size = 4
epoch = 0
f1_c = 50.0
f1_i = 50.0
global_step = 36
learning_rate = 1e-06
prec_c = 100.0
prec_i = 100.0
recall_c = 33.333333333333336
recall_i = 33.333333333333336
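
The reported numbers are consistent with the single test prediction shown above: the gold event lists three arguments and the model returned one of them. A short check of the arithmetic follows, assuming the standard precision, recall, and F1 definitions over predicted and gold argument spans. The repository's scorer is not shown in this issue, so `prf` is only an illustrative helper.

```python
def prf(predicted, gold):
    """Return (precision, recall, f1) in percent for two collections of spans."""
    correct = len(set(predicted) & set(gold))
    precision = 100.0 * correct / len(predicted) if predicted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Gold arguments of the single training example, as (start, end, role).
gold = [(5, 14, "CompanyName"), (19, 28, "CompanyName"), (30, 32, "Product")]
# The one argument the model predicted on the test pass.
predicted = [(30, 32, "Product")]

precision, recall, f1 = prf(predicted, gold)
print(precision, recall, round(f1, 2))
```

Under these assumptions the low recall comes from the two CompanyName arguments that were not predicted, not from a scoring problem specific to Chinese text.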

Why is only one event considered in the pre-processing step?

event_type = event[0][1]

Hey Xinya, I wonder why only the first event type of the sentence is considered here. For example, I can have a parsed example like this:

{'sentence': ['Orders', 'went', 'out', 'today', 'to', 'deploy', '17,000', 'U.S.', 'Army', 'soldiers', 'in', 'the', 'Persian', 'Gulf', 'region', '.'], 's_start': 24, 'ner': [[31, 31, 'GPE'], [32, 32, 'ORG'], [33, 33, 'PER'], [36, 37, 'LOC'], [38, 38, 'LOC']], 'relation': [[32, 32, 31, 31, 'PART-WHOLE.Subsidiary'], [33, 33, 32, 32, 'ORG-AFF.Employment'], [33, 33, 38, 38, 'PHYS.Located'], [36, 37, 38, 38, 'PART-WHOLE.Geographical']], 'event': [[[29, 'Movement.Transport'], [33, 33, 'Artifact'], [38, 38, 'Destination']]]}
Why are 'Artifact' and 'Destination' not picked up? I compared the code's output with the official statistics: there are 33 event types, and the code here does give me 33 event types in total, but I am not sure why the other events are not considered.
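
For reference, the example above can be read as a list of events in which each event starts with the trigger and continues with its arguments. Under that reading, `event[0][1]` selects the type of one event's trigger, and 'Artifact' and 'Destination' are argument roles, not event types. A small sketch of that reading follows; `split_event` is a hypothetical helper, not code from the repository.

```python
def split_event(event):
    """Split one event into (trigger_offset, event_type, arguments)."""
    trigger, *arguments = event
    trigger_offset, event_type = trigger
    return trigger_offset, event_type, [tuple(arg) for arg in arguments]

example = {
    "event": [
        [[29, "Movement.Transport"], [33, 33, "Artifact"], [38, 38, "Destination"]]
    ]
}

for event in example["event"]:
    offset, event_type, arguments = split_event(event)
    print(offset, event_type, arguments)
```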

len(start_token) != 1 in parse_ace_event.py

When preprocessing the ACE 2005 dataset with parse_ace_event.py, I end up in the debugger with the following stack:

d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(780)<module>()
-> main()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(776)main()
-> include_pronouns=args.include_pronouns)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(750)one_fold()
-> js = document.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(726)to_json()
-> js = doc.to_json()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(229)to_json()
-> self.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(216)remove_whitespace()
-> entry.remove_whitespace()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(160)remove_whitespace()
-> self.align()
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(152)align()
-> entity.align(self.sent)
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(43)align()
-> self.span_sentence = get_token_indices(self, sent.as_doc())
d:\github project\eeqa-master\proc\scripts\data\ace-event\parse_ace_event.py(248)get_token_indices()
-> debug_if(len(start_token) != 1)

As you can see, len(start_token) != 1.

The bug arises here:

def get_token_indices(entity, sent):
     start_token = [tok for tok in sent if tok.idx == entity.start_char] 

The reason is that when sent.as_doc() is passed to get_token_indices(), token.idx is counted from the beginning of the sentence, while entity.start_char is counted from the start of the whole document. As a result, start_token is an empty list.

Could you please fix this bug? Or is there anything I did wrong?
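
The mismatch described above can be illustrated without spaCy. In this sketch the token offsets are counted from the start of the sentence while the entity offset is counted from the start of the document, so the lookup comes back empty until the sentence's starting offset is subtracted. `Token`, `Entity`, and `sent_start_char` are stand-ins invented for the illustration, and whether rebasing is the right fix for parse_ace_event.py has not been confirmed by the maintainers.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    idx: int  # character offset relative to the start of the sentence

@dataclass
class Entity:
    start_char: int  # character offset relative to the start of the document

def find_start_tokens(entity, sent_tokens, sent_start_char=0):
    """Return tokens whose offset equals the entity start after rebasing."""
    target = entity.start_char - sent_start_char
    return [tok for tok in sent_tokens if tok.idx == target]

# The sentence "U.S. Army soldiers" is assumed to begin at document offset 100.
tokens = [Token("U.S.", 0), Token("Army", 5), Token("soldiers", 10)]
entity = Entity(start_char=105)

not_rebased = find_start_tokens(entity, tokens)
rebased = find_start_tokens(entity, tokens, sent_start_char=100)
print(not_rebased)
print(rebased)
```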

Best regards.

Are the arguments passed to BertForQuestionAnswering_withIfTriggerEmbedding during training correct?

The signature of BertForQuestionAnswering_withIfTriggerEmbedding's forward method is:
def forward(self, input_ids, if_trigger_ids, token_type_ids=None, attention_mask=None, start_positions=None, end_positions=None):
but the training code in run_args_qa.py at line 556 is:
if not args.add_if_trigger_embedding:
    loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
else:
    loss = model(input_ids, segment_ids, if_trigger_ids, input_mask, start_positions, end_positions)

Is this correct?

The second parameter of forward is "if_trigger_ids", but the argument actually passed in that position is "segment_ids".
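
The concern can be checked with a stub that has the same parameter order as the forward method quoted above. Passing arguments positionally in the order used by the training call binds segment_ids to if_trigger_ids, while passing them by keyword removes the ambiguity. The stub and the string placeholders are illustrative only and do not reproduce the model.

```python
def forward(input_ids, if_trigger_ids, token_type_ids=None, attention_mask=None,
            start_positions=None, end_positions=None):
    """Stub with the quoted parameter order; reports what each parameter received."""
    return {
        "input_ids": input_ids,
        "if_trigger_ids": if_trigger_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }

# Positional call in the order used at the training call site.
positional = forward("input_ids", "segment_ids", "if_trigger_ids", "input_mask")

# Keyword call that binds each value to the intended parameter.
keyword = forward(
    "input_ids",
    if_trigger_ids="if_trigger_ids",
    token_type_ids="segment_ids",
    attention_mask="input_mask",
)

print(positional["if_trigger_ids"], keyword["if_trigger_ids"])
```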

Do all triggers in the dataset occupy only one position?

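Hello, while reading and debugging the code, I found that event[0] in every data example is a list of only two elements: the first is the trigger's offset and the second is the event_type label. Does this mean that a trigger can only consist of a single character (word)?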

 "event": [[[463, "Contact.Meet"], [466, 466, "Entity"]]]

How to download the SciERC and GENIA datasets?

How can I get the SciERC and GENIA datasets? The eeqa/proc/DATA.md file gives instructions for downloading the two datasets, but I can't find the ./scripts/data/get_scierc.sh and ./scripts/data/get_genia.sh files it refers to. Thanks

Can't run the pre-processing code

Hi Xinya,
I ran your code and got an error during data preprocessing, at the step that runs 'python scripts/data/ace-event/convert_examples.py':

  File "scripts/data/ace-event/convert_examples.py", line 11, in <module>
    line = json.loads(line)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yangpan/anaconda3/envs/ace-event-preprocess/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

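I checked my process: I followed the README steps exactly and re-downloaded the raw data from the official site:

  • When parsing the ACE05 data, I used the default settings: python ./scripts/data/ace-event/parse_ace_event.py default-settings

I inspected the data produced by parse_ace_event.py and found some problems. For example, the 'events' field contains a large number of meaningless empty lists (shown in a screenshot in the original issue).

What is the cause of this?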

reproduce

What about the result of the argument extraction part you reproduced?

Why can't I reproduce the result in the paper?

@xinyadu
Hello, I ran the code as instructed but only get trigger_classification: P=73.333 R=68.238 F1=70.694 when choosing the best trigger question strategy (the corresponding result in the paper is P=71.12 R=73.70 F1=72.39). Are there any other recommendations or parameter options that would improve this?

convert_example.py

When I try to run convert_example.py (in proc/) as proc/README says, it always raises this error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
Could anyone tell me why this happens and how to solve it? Thanks!
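
One way to produce exactly this message is to call json.loads on a line that contains only an opening brace, which happens when a file of pretty-printed (multi-line) JSON objects is read one line at a time. Whether the parse step wrote such a file here is an assumption. The sketch below reproduces the message and shows a reader that accepts both one-object-per-line and multi-line input. `iter_json_objects` is a hypothetical helper, not part of the repository.

```python
import json

def iter_json_objects(text):
    """Yield consecutive JSON values from text, regardless of line breaks."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between consecutive objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

pretty = (
    '{\n  "sentence": ["a"],\n  "event": []\n}\n'
    '{\n  "sentence": ["b"],\n  "event": []\n}\n'
)

message = None
try:
    # The first line of the pretty-printed text is just "{" plus a newline.
    json.loads(pretty.splitlines(keepends=True)[0])
except json.JSONDecodeError as err:
    message = str(err)
print(message)

objects = list(iter_json_objects(pretty))
print(objects)
```

If the input does turn out to be multi-line JSON, either re-serializing it as one object per line or reading it with a streaming decoder like the one above avoids the error.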

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331

Why does this runtime error occur?
Environment:
GPU: RTX 3090
CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Error:
06/19/2021 16:37:19 - INFO - __main__ - device: cuda, n_gpu: 2, 16-bits training: False
06/19/2021 16:51:17 - INFO - __main__ - Start epoch #0 (lr = 4e-05)...
Traceback (most recent call last):
  File "code/run_trigger_qa.py", line 629, in <module>
    main(args)
  File "code/run_trigger_qa.py", line 480, in main
    loss = model(input_ids, token_type_ids = segment_ids, attention_mask = input_mask, labels = labels)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 1198, in forward
    sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 734, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 411, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 396, in forward
    attention_output = self.attention(hidden_states, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 354, in forward
    self_output = self.self(input_tensor, attention_mask)
  File "/home/hdu/anaconda3/envs/ace-event-qa/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zf1/xqs/eeqa-master/code/pytorch_pretrained_bert/modeling.py", line 311, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:331
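
A possible cause, offered as an assumption and not a confirmed diagnosis: the RTX 3090 is an Ampere GPU (compute capability 8.6) and needs a PyTorch build compiled against CUDA 11 or newer, so an older build can fail inside cuBLAS calls such as this matmul. The version that matters is the one PyTorch was built with (torch.version.cuda), which can differ from the nvcc toolkit on the PATH. The sketch below checks this. `build_too_old` is a hypothetical helper, and "capability 8.x or newer needs CUDA 11 or newer" is the only rule it encodes.

```python
def build_too_old(cuda_version, capability):
    """True if the build's CUDA major version is below 11 on a capability >= 8 GPU."""
    major = int(cuda_version.split(".")[0])
    return capability[0] >= 8 and major < 11

try:
    import torch
except ImportError:  # the check below is skipped when PyTorch is not installed
    torch = None

if torch is not None and torch.cuda.is_available() and torch.version.cuda:
    capability = torch.cuda.get_device_capability(0)
    print("PyTorch", torch.__version__, "built with CUDA", torch.version.cuda)
    print("GPU compute capability:", capability)
    print("Build too old for this GPU:", build_too_old(torch.version.cuda, capability))
```

If the check reports an old build, the environment would need a PyTorch release built for CUDA 11 or newer. Whether the repository's code runs unchanged on such a release is not something this sketch verifies.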
