
dureader's Introduction

DuReader

DuReader focuses on benchmarks and models for machine reading comprehension (MRC) and question answering.

Datasets:

DuReader-vis: The first Chinese Open-domain Document Visual Question Answering (Open-Domain DocVQA) dataset. [Paper]

DuReader Retrieval: A large-scale Chinese dataset for passage retrieval. [Paper] [Code] [Leaderboard]

DuQM: Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models. [Paper] [Code] [Leaderboard]

DuReader Checklist: A dataset challenging model understanding capabilities in vocabulary, phrase, semantic role, and reasoning. [Code] [Leaderboard]

DuReader Yes/No: A dataset challenging models in opinion polarity judgment. [Code] [Leaderboard]

DuReader Robust: A dataset challenging models in (1) over-sensitivity, (2) over-stability and (3) generalization. [Paper] [Code] [Leaderboard]

DuReader 2.0: A large-scale, real-world, human-sourced MRC dataset. [Paper] [Code] [Leaderboard]

DuReader Robust, DuReader Yes/No, DuReader Checklist and DuQM can be downloaded from the qianyan official website. DuReader-vis can be downloaded by following the instructions in DuReader-vis/README.md in this repository. DuReader 2.0 can be downloaded by following the instructions in DuReader-2.0/README.md in this repository.

Models:

KT-NET: A machine reading comprehension (MRC) model which integrates knowledge from knowledge bases (KBs) into pre-trained contextualized representations. [Paper] [Code] [Leaderboard]

D-NET: A simple pre-training and fine-tuning framework focused on the generalization of machine reading comprehension (MRC) models. [Paper] [Code] [Leaderboard]

News

  • May 2022, DuReader-vis was accepted by ACL 2022 Findings.
  • March 2022, DuReader Retrieval was released, holding the Passage retrieval challenge.
  • September 2021, we released DuQM, a Chinese dataset of linguistically perturbed natural questions for evaluating the robustness of question matching models; it was also included in qianyan.
  • June 2021, DuReader Robust, DuReader Yes/No and DuReader Checklist were included in qianyan.
  • May 2021, DuReader Robust was accepted by ACL 2021.
  • March 2021, DuReader Checklist was released, holding the DuReader Checklist challenge.
  • March 2020, DuReader Robust was released, holding the DuReader Robust challenge.
  • December 2019, DuReader Yes/No was released, holding the DuReader Yes/No challenge. After that, DuReader Yes/No Individual Challenge and Team Challenge were held.
  • August 2019, D-NET was released and ranked first in the MRQA-2019 shared task.
  • July 2019, KT-NET was accepted by ACL 2019.
  • March 2019, the second MRC challenge was held based on DuReader 2.0, including hard samples in the test set.
  • April 2018, DuReader 2.0 was accepted by ACL 2018 at the Workshop on Machine Reading for Question Answering.
  • March 2018, the first MRC challenge was held based on DuReader 2.0.

Detailed Description

DuReader contains five datasets: DuReader 2.0, DuReader Robust, DuReader Yes/No, DuReader Checklist and DuReader-vis. The main features of these datasets include:

  • Real question, Real article, Real answer, Real application scenario;
  • Rich question types, including entity, number, opinion, etc;
  • Various task types, including span-based tasks and classification tasks;
  • Rich task challenges, including model retrieval capability, model robustness, model checklist etc.

DuReader 2.0: Real question, Real article, Real answer

[Paper] [Code] [Leaderboard]

DuReader 2.0 is a large-scale, real-world, human-sourced Chinese MRC dataset. It focuses on real-world open-domain question answering. Its advantages over existing datasets are as follows: Real question, Real article, Real answer, Real application scenario and Rich annotation.

KT-NET: Integrate knowledge into pre-trained LMs.

[Paper] [Code] [Leaderboard]

KT-NET (Knowledge and Text fusion NET) is a machine reading comprehension (MRC) model which integrates knowledge from knowledge bases (KBs) into pre-trained contextualized representations. The model was proposed in the ACL 2019 paper Enhancing Pre-Trained Language Representations with Rich Knowledge for Machine Reading Comprehension.
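As a rough, hypothetical illustration of this idea (not the released KT-NET code), the sketch below fuses a pre-trained contextual token vector with retrieved KB concept embeddings through an attention-weighted sum; all names and dimensions are made up for the example:

import numpy as np

def fuse_with_kb(token_repr, concept_embs):
    """Fuse one contextual token vector with candidate KB concept embeddings.

    token_repr:   shape (dim,), contextual representation of a token
    concept_embs: shape (num_concepts, dim), embeddings of retrieved KB concepts
    """
    # Attention scores between the token and each candidate concept
    # (this toy example assumes the two spaces share the same dimension).
    scores = concept_embs @ token_repr                # (num_concepts,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over concepts
    attended = weights @ concept_embs                 # (dim,) attended knowledge vector
    # Knowledge-enriched representation: concatenate the text and KB views.
    return np.concatenate([token_repr, attended])

rng = np.random.default_rng(0)
fused = fuse_with_kb(rng.normal(size=128), rng.normal(size=(5, 128)))
print(fused.shape)  # (256,)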

D-NET: Model generalization

[Paper] [Code] [Leaderboard]

D-NET is the system Baidu submitted to the MRQA (Machine Reading for Question Answering) 2019 Shared Task, which focused on the generalization of machine reading comprehension (MRC) models. The system is built on a pre-training and fine-tuning framework, exploring pre-trained language models and multi-task learning to improve the generalization of MRC models. D-NET ranked first among all participants in terms of averaged F1 score.

DuReader Robust: Model Robustness

[Paper] [Code] [Leaderboard]

DuReader Robust is designed to challenge MRC models on the following aspects: (1) over-sensitivity, (2) over-stability and (3) generalization. In addition, DuReader Robust has another advantage over previous datasets: its questions and documents come from Baidu Search, so it exposes the robustness issues that arise when MRC models are applied in real-world scenarios.

DuReader Yes/No: Opinion Yes/No Questions

[Code] [Leaderboard]

Span-based MRC tasks adopt F1 and EM metrics to measure the difference between predicted answers and labeled answers. However, opinion polarity cannot be measured well by these metrics. DuReader Yes/No is proposed to challenge MRC models on opinion polarity, complementing existing MRC tasks and evaluating the effectiveness of existing models more reasonably.
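For reference, a minimal sketch of span-level EM and F1 in their standard token-overlap form (illustrative only, not the official evaluation scripts of these tasks):

from collections import Counter

def exact_match(pred_tokens, ref_tokens):
    # 1 if the predicted answer tokens equal the reference tokens exactly, else 0.
    return int(pred_tokens == ref_tokens)

def f1_score(pred_tokens, ref_tokens):
    # Token-level overlap between the predicted and the reference answer.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match(['北京', '大学'], ['北京', '大学']))            # 1
print(round(f1_score(['北京', '很', '好'], ['北京', '好']), 2))   # 0.8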

DuReader Checklist: Natural Language Understanding Capabilities

[Code] [Leaderboard]

DuReader Checklist is a high-quality Chinese machine reading comprehension dataset for real application scenarios. It is designed to challenge natural language understanding capabilities from multiple aspects via systematic evaluation (i.e., a checklist), including understanding of vocabulary, phrases, semantic roles, reasoning and so on.

DuQM: Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models

[Paper] [Code] [Leaderboard]

DuQM is a Chinese question matching robustness dataset containing natural questions with linguistic perturbations, built to evaluate the robustness of question matching models. DuQM is designed to be fine-grained, diverse and natural, and it contains 3 categories and 13 subcategories with 32 types of linguistic perturbations.

DuReader Retrieval: A large-scale Chinese dataset for passage retrieval from a web search engine

[Paper] [Code] [Leaderboard]

DuReader Retrieval is a large-scale Chinese dataset for passage retrieval from a web search engine. The dataset contains more than 90K queries and over 8M unique passages from realistic data sources.
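As a generic illustration of how such a passage retrieval benchmark is typically scored (a sketch, not the official DuReader Retrieval evaluation tool), recall@k checks whether any gold passage appears in the top k retrieved results for each query:

def recall_at_k(run, qrels, k=50):
    """run:   {query_id: [passage_id, ...]} ranked passage ids per query
    qrels: {query_id: {relevant_passage_id, ...}} gold passages per query"""
    hits = 0
    for qid, ranked in run.items():
        gold = qrels.get(qid, set())
        if gold and gold & set(ranked[:k]):
            hits += 1
    return hits / len(run)

run = {'q1': ['p3', 'p7', 'p1'], 'q2': ['p9', 'p2']}
qrels = {'q1': {'p1'}, 'q2': {'p4'}}
print(recall_at_k(run, qrels, k=3))  # 0.5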

DuReader-vis: A Chinese Dataset for Open-domain Document Visual Question Answering

[Paper]

DuReader-vis is the first Chinese open-domain DocVQA dataset, built from a web search engine. The dataset contains more than 15K labeled question-document pairs and over 158K unique documents from realistic data sources.

Dataset and Evaluation Tools

We have released a dataset loading and evaluation tool named qianyan. You can use this package easily by following the instructions in the qianyan repo.

Copyright and License

Copyright 2017 Baidu.com, Inc. All Rights Reserved

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contact Information

For help or issues using DuReader, including datasets and baselines, please submit a Github issue.

For other communication or cooperation, please contact Jing Liu ([email protected]) or Hongyu Li ([email protected]).


dureader's Issues

Test loss does not decrease in the training process

When I ran the BiDAF model (original code, without any modification) in the TensorFlow version, the training loss decreased to about 2. However, when running a dev pass after each epoch, the dev loss did not vary (always about 15), and the BLEU-4 score was only about 20-25 on the dev set.
I have also tried adding dropout and L2 regularization during training, and replacing the embedding matrix with a pre-trained embedding model, but none of these had much effect on the dev results; the dev loss still does not change after each epoch.
In addition, the dataset is the training and dev set of the full baidu_search dataset.
Is there anything special we can do to train the TensorFlow version and get a better result?

Only support for single GPU?

I used the argument --gpu 0,1,3,4, but it seems the code only uses one GPU. Does it support multiple GPUs?

Thanks.

Error when running the paddle training step with data from dureader_preprocessed.zip

I downloaded dureader_preprocessed.zip with data/download.sh and extracted the testset, trainset and devset JSON files, then truncated each JSON to its first 3000 lines with head -n 3000. The earlier steps ran without errors, but the train step printed a pile of SKIP messages.

After that, infer did not report any errors either, but when it finished, the infer directory under models was empty.

I previously tried running on the full, untruncated data; the train step also printed a pile of SKIP messages, but my machine could not keep up and crashed before training finished, so I never got to the subsequent infer step.

The answer indicated by answer_span does not match fake_answer in the data generated by DuReader/paddle/paragraph_extraction.py

After running the run.sh script under paddle, which calls paragraph_extraction to generate the new files, I found that the answer indicated by answer_span is not always identical to the one in fake_answer. In principle the two should be the same, right?

The following 3 examples are taken from the new file generated from demo/devset/search.dev.json. In most cases the two match, but in some cases they do not.

question id 181623 fake_answer 大众牛逼关键在于是个德国车企,**人普遍崇拜德国。
question id 181623 span_answer <splitter>我就想知道路上那么多大众怎么破,这不是事实么

question id 181625 fake_answer 结构不固定,形式多变,或正或反,或分或合,笔画或多或少,相当灵活,具有很大的随意性
question id 181625 span_answer 

question id 181611 fake_answer 1、将干海参用自来水直接冲洗1分钟,洗掉表面少许微尘。2、置于1-10度凉纯净水中24小时左右,中间换水2次直至将海参泡软。3、将泡软的海参从腹部纵向剖开,去掉海参前端牙状物和体内白筋。4、添纯净水上无油锅加盖煮沸,用中火煮15-25分钟。5、换新的凉纯净水,泡24小时左右,中间换水2次直至发泡到2倍左右长度。6、泡好后,即可食用。可把多余的单独零度以下冷冻,建议2周内用完。7、如有个别海参没有发大,属于正常现象,可重复4、5步骤
question id 181611 span_answer 凉水泡24小时直至海参变软。第二步清洁剪掉海参的沙嘴,切断筋,清洗干净。第三步;将海参放入无油的,装入凉水的干净锅内,大火煮开改用小火煮50至60分钟左右,将海参捞出,用海参掐海参侧壁肉,能掐透或者稍变软即可,如没有则继续煮。第四步;水发,将煮好的海参捞出来,自然凉透之后

The results above were printed with the following code.

import io
import json

dev_path = 'search.dev.json'  # path to the file generated by paragraph_extraction (placeholder)

with io.open(dev_path, 'r', encoding='utf-8') as fin:
    data_set = []
    for lidx, line in enumerate(fin):
        sample = json.loads(line.strip())
        if len(sample['answer_spans']) == 0:
            continue
        if len(sample['answer_docs']) == 0:
            continue
        if sample['answer_docs'][0] >= len(sample['documents']):
            continue

        print('fake answer', sample['fake_answers'][0])

        answer_doc_idx = sample['answer_passages'][0]
        start = sample['answer_spans'][0][0]
        end = sample['answer_spans'][0][1]
        print('span answer',
              ''.join(sample['documents'][answer_doc_idx]['segmented_paragraphs'][0][start: end + 1]))

About the code in match_layer.py

In the class MatchLSTMAttnCell, in the __call__ function, there is:

tf.expand_dims(tc.layers.fully_connected(ref_vector,
                                         num_outputs=self._num_units,
                                         activation_fn=None), 1)

Why does the dimension need to be expanded at axis 1?
Thanks
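For reference, one likely reason (my reading, not an official answer): the projected ref_vector has shape [batch, units], and expanding at axis 1 gives [batch, 1, units] so that it can broadcast against a per-position tensor of shape [batch, seq_len, units]. A NumPy sketch of the same broadcasting, with made-up shapes:

import numpy as np

batch, seq_len, units = 2, 4, 3
per_position = np.ones((batch, seq_len, units))   # e.g. a projection of every encoder position
ref = np.ones((batch, units))                     # projected reference vector

expanded = ref[:, None, :]                        # like tf.expand_dims(..., 1) -> (2, 1, 3)
print((per_position + expanded).shape)            # (2, 4, 3): ref broadcasts over seq_len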

paddle infer mode KeyError: 'answer_docs'

bash run_demo.sh test_bidaf_demo bidaf infer --testset ../data/preprocessed/testset/search.test.json
The TensorFlow version has no problem and runs through. For the PaddlePaddle version, the training and dev sets have ground truth, so "answer_docs" exists; the test set has no ground truth, so "answer_docs" does not exist. However, dataset.py in the paddle version does parse the JSON and requires the "answer_docs" field. Do I need to further preprocess the data, or should I modify this part of the code at test time?

Performance decline when using pretrained embedding.

I used a pre-trained embedding (fastText zh-300-vec) instead of the randomly initialized embedding.
The loss was lower during evaluation, but performance declined at inference time.
Is this an OOV issue or some other problem?

The implementation of BiDAF is not exactly same as the one in the original paper

For the calculation of the similarity matrix,

sim_matrix = tf.matmul(passage_encodes, question_encodes, transpose_b=True)

This is a simple dot product. But the original paper uses a trainable similarity function,

α(h, u) = w^T [h; u; h ∘ u]

where h is the passage representation and u is the question representation.
This may not be a big problem, but it may still cause some difference.
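For comparison, a small NumPy sketch of the trainable similarity from the BiDAF paper, S[t, j] = w^T [h_t; u_j; h_t * u_j], next to the plain dot product used in the repo (shapes and names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
p_len, q_len, d = 4, 3, 6
H = rng.normal(size=(p_len, d))     # passage representations h_t
U = rng.normal(size=(q_len, d))     # question representations u_j
w = rng.normal(size=3 * d)          # trainable weight vector of the paper's similarity

# Dot-product similarity, as in the repo's implementation.
S_dot = H @ U.T                                                   # (p_len, q_len)

# BiDAF paper similarity: S[t, j] = w^T [h_t; u_j; h_t * u_j].
S_paper = np.empty((p_len, q_len))
for t in range(p_len):
    for j in range(q_len):
        S_paper[t, j] = w @ np.concatenate([H[t], U[j], H[t] * U[j]])

print(S_dot.shape, S_paper.shape)   # (4, 3) (4, 3)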

NoneType error when running `sh run.sh --para_extraction`

When I run the command sh run.sh --para_extraction, the following error occurs:

Start paragraph extraction, this may take a few hours
Source dir: ../data/preprocessed
Target dir: ../data/extracted
Processing trainset
Processing devset
Processing testset
Traceback (most recent call last):
  File "paragraph_extraction.py", line 197, in <module>
    paragraph_selection(sample, mode)
  File "paragraph_extraction.py", line 111, in paragraph_selection
    status = dup_remove(doc)
  File "paragraph_extraction.py", line 66, in dup_remove
    if p_idx < para_id:
TypeError: '<' not supported between instances of 'int' and 'NoneType'

So I opened pull request #45 to fix this bug.

Error for preprocessing data

In the README, you said:

To preprocess the raw data, you should first segment 'question', 'title', 'paragraphs' and then store the segemented result into 'segmented_question', 'segmented_title', 'segmented_paragraphs' like the downloaded preprocessed data

Actually, the 'answers' should be segmented into 'segmented_answers' too; please note that.

Also, how long should preprocessing the data take? I've been running the Python file for over 12 hours...

About Match-LSTM code

Hi,
When I read the code in ./tensorflow/nn_layers/match_layer.py, class MatchLSTMAttnCell,
I found that you concatenate H^p, H^qα, H^p - H^qα and H^p * H^qα as the input of the Match-LSTM, but the model proposed by Wang only concatenates H^p and H^qα as the new input of the Match-LSTM.
Could you tell me why you made this change?

Unresolved reference

Unresolved references brc_eval and find_answer in utils/baseline_eval.py, line 27.

about tf.reduce_max function

DuReader/tensorflow/layers/match_layer.py, line 94:
b = tf.nn.softmax(tf.expand_dims(tf.reduce_max(sim_matrix, 2), 1), -1)
I think tf.reduce_max(t, 2) is not valid here; the axis should be 0 or 1.
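For reference, if sim_matrix here is a batched 3-D tensor of shape [batch, p_len, q_len], then axis 2 is a valid reduction over the question dimension; a NumPy sketch under that assumption:

import numpy as np

sim_matrix = np.arange(2 * 4 * 3).reshape(2, 4, 3)   # assumed shape: [batch, p_len, q_len]
max_over_q = sim_matrix.max(axis=2)                  # analogous to tf.reduce_max(sim_matrix, 2)
print(max_over_q.shape)                              # (2, 4): one score per passage position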

Question about batch_size

Question about batch_size:
The run.py file has a batch_size argument, and I set it to 64.
In the _train_epoch() method, I inserted the code below to print the runtime shapes:

def _train_epoch(self, train_batches, dropout_keep_prob):
    # some code.......
    for bitx, batch in enumerate(train_batches, 1):
        feed_dict = {self.p: batch['passage_token_ids'],
                     self.q: batch['question_token_ids'],
                     self.p_length: batch['passage_length'],
                     self.q_length: batch['question_length'],
                     self.start_label: batch['start_id'],
                     self.end_label: batch['end_id'],
                     self.dropout_keep_prob: dropout_keep_prob}
        # inserted code
        p_shape, q_shape, sl_shape = self.sess.run([self.p, self.q, self.start_label], feed_dict)
        self.logger.info('p_shape {}\nq_shape {}\nsl_shape {}'.format(p_shape.shape, q_shape.shape, sl_shape.shape))

The output is:

p_shape (320, 500)
q_shape (320, 12)
sl_shape (64,)

The first dimensions of p, q and start_label differ, and the first dimension of p and q is a fixed 320 regardless of the batch_size set in run.py.
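One plausible reading for the run shown (a guess based on the default max_p_num=5, not a confirmed explanation): each question contributes max_p_num passages and the passages are flattened into the batch, so the passage tensors get a first dimension of batch_size * max_p_num while start_label stays per question:

batch_size = 64
max_p_num = 5                     # default in the baseline arguments (assumed here)
print(batch_size * max_p_num)     # 320, matching p_shape and q_shape above
print(batch_size)                 # 64, matching sl_shape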

Found an error

For the question with id 181573, the paragraphs at bs_rank_pos=2 and bs_rank_pos=40 are almost identical, yet one has select=true and the other select=false.

Of course, I am not sure how common this kind of error is; just reporting it.

About implementation details

In the paper, S is computed with the trainable similarity function α(h, u) = w^T [h; u; h ∘ u], but this implementation just multiplies p and q directly. Why is that? Also, the Modeling Layer in the paper is a two-layer bidirectional LSTM; why does this implementation use only one layer?

What are the baseline performance numbers for the TensorFlow version?

The PaddlePaddle baseline reports Dev ROUGE-L and Test ROUGE-L on the DuReader 2.0 dataset, but the TensorFlow baseline does not. Could you please provide the TensorFlow baseline's results on the dev and test sets? Also, can those numbers be reached with the default parameters of the TensorFlow version, or are there other hyperparameters that have not been provided? Could you publish the corresponding parameters? Thanks!

Five questions arose when I ran "sh run.sh --train --pass_num 5 --use_gpu=False".

1) ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. An example can be found in compiler.py.

2) [3548 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the 'OriginProgram()' method!

3) You can try our memory optimize feature to save your memory usage: ...

4) The number of graphs should be only one, but the current graph has 8 sub_graphs. If you want to see the nodes of the sub_graphs, you should use 'FLAGS_print_sub_graph_dir' to specify the output dir. NOTES: if you are not doing training, please don't pass loss_var_name.

5) Traceback (most recent call last):
  File "run.py", line 645, in <module>
    train(logger, args)
  File "run.py", line 464, in train
    args)
  File "run.py", line 308, in validation
    ave_loss = 1.0 * total_loss / count
ZeroDivisionError: float division by zero

About the entity answers

The test results of the baseline model all output an empty list for entity_answers. So, for the evaluation of entity questions, which answer is used: entity_answers or answers?

Problems running paddle with the demo data

Running the paddle infer step requires the data under preprocessed/testset.

However, the GitHub repository does not include the preprocessed data corresponding to the demo. When I followed the commands in the 'Preprocess the Data' section to generate the preprocessed testset data, it failed (trainset and devset succeeded; only testset failed; on inspection, search.test.json indeed has no segmented_answers key).

As a result, the paddle infer step cannot be run with the demo data, and after it finishes the infer directory under models is empty.

Is a CNN used?

Does this TensorFlow version use a CNN for character embeddings?

KeyError: 'segmented_paragraphs'

mldl@mldlUB1604:~/ub16_prj/DuReader$ cat data/raw/trainset/search.train.json | python3 utils/preprocess.py > data/preprocessed/trainset/search.train.json
Traceback (most recent call last):
  File "utils/preprocess.py", line 217, in <module>
    find_fake_answer(sample)
  File "utils/preprocess.py", line 158, in find_fake_answer
    for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
KeyError: 'segmented_paragraphs'
mldl@mldlUB1604:~/ub16_prj/DuReader$

About the preprocessed data

I see that in the preprocessed data there are several answers in the "answers" field, but only one element in "answer_docs" and "answer_spans". Which answer should I choose? I also want to know whether the element in "answer_docs" means the index of the selected passage. Thank you.

what should I do if I want to use my data?

I want to know what the various keys of the JSON dataset represent, for example 'is_selected', 'answer_spans', and 'match_scores'. And I see that these keys are not present in the raw data.

Incorrect file path in line 126 of README.md

Seems a small change should be made in README.md:

126: python run.py --predict --algo BIDAF --test_files ../data/demo/search.dev.json 

=>

126: python run.py --predict --algo BIDAF --test_files ../data/demo/devset/search.dev.json 

Questions about the dataset format and its usage

Looking at the first few lines of the downloaded raw and preprocessed data, I roughly summarized the following fields:
RAW
{
question,question_type?,fact_or_option?,question_id,
documents[{title,bs_rank_pos?,is_selected?,paragraphs[]}],
entity_answers[],
answers[]
}

PREPROCESSED
{
question,question_type,fact_or_option,question_id,
documents[{title,bs_rank_pos,is_selected,paragraphs[],+most_related_para?,+segmented_title[title tokenized],+segmented_paragraphs[paragraphs tokenized]}],
answers[],
+answer_spans[],?
+fake_answers[],?
+segmented_answers[answers tokenized],
+answer_docs[],?
+segmented_question[question tokenized],
+match_scores[]?,
+yesno_type?
}

I have a few questions:

  1. What do the fields marked with ? mean? Are they REQUIRED or OPTIONAL? What effect do different values, or omitting them, roughly have on the results? Fields generated by preprocess.py can be skipped without explanation.
  2. I have looked through the closed issues; bs_rank_pos in the raw data is the search ranking. Does a larger value mean more preferred, or a smaller value?
  3. Both answers and entity_answers in the raw data seem to be answers. What is the difference between them?
  4. I plan to use MRC to build a medical question answering system: the machine reads text from different editions of textbooks as the articles, and training uses course exercises with their answers, or Q&A extracted from popular medical-science magazines, rather than data collected from a search engine. In this case, where should the reading articles be placed? In documents.paragraphs? And how should search-engine-related fields such as bs_rank_pos and is_selected be configured?

Any guidance would be greatly appreciated. Thanks!

About filter_tokens_by_cnt

The purpose of this function is to rebuild the token x id map, but the token x id map produced by the implementation below should be identical to the original one:

# rebuild the token x id map
self.token2id = {}
self.id2token = {}
for token in self.initial_tokens:
    self.add(token, cnt=0)
for token in filtered_tokens:
    self.add(token, cnt=0)

Shouldn't it be changed to:

self.initial_tokens = filtered_tokens
self.initial_tokens.extend([self.pad_token, self.unk_token])
# rebuild the token x id map
self.token2id = {}
self.id2token = {}
for token in self.initial_tokens:
    self.add(token, cnt=0)

About using Baidu AI Studio

Has anyone managed to run this on Baidu AI Studio? It is my first time using it and I cannot even find where the error logs are. Could anyone help me out...

I do not understand the meaning of this line and cannot locate which module output this stdout.

2019-02-22 13:16:30,550 - brc - INFO - Training the model for epoch 10
2019-02-22 13:16:36,357 - brc - INFO - Average train loss for epoch 10 is 8.694370711477179
2019-02-22 13:16:36,358 - brc - INFO - Evaluating the model after epoch 10
{'testlen': 5920, 'reflen': 9147, 'correct': [2409, 1195, 762, 577], 'guess': [5920, 5821, 5722, 5623]}
ratio: 0.6472067344483823
2019-02-22 13:16:42,349 - brc - INFO - Dev eval loss 14.789813613891601
2019-02-22 13:16:42,351 - brc - INFO - Dev eval result: {'Bleu-3': 0.12942842413184924, 'Bleu-1': 0.2359285965766089, 'Bleu-4': 0.10657137097095022, 'Bleu-2': 0.1675745997143726, 'Rouge-L': 0.24175041960869123}
2019-02-22 13:16:42,911 - brc - INFO - Model saved in ../data/models/, with prefix BIDAF.

Code implementation different with paper in match layer

In the file match_layer.py, class MatchLSTMAttnCell, function __call__:

The variable new_inputs is computed as follows:

new_inputs = tf.concat([inputs, attended_context,
             inputs - attended_context, inputs * attended_context],
            -1)

However, I noticed that in the match-LSTM paper it should be

z_i = [h_i^p ; H^q * α_i^T]

where z_i corresponds to the variable new_inputs, h_i^p corresponds to the variable inputs, and H^q * α_i^T corresponds to the variable attended_context.

According to the algorithm in paper, the computation should be:

new_inputs = tf.concat([inputs, attended_context], -1)

So, I am wondering why inputs - attended_context and inputs * attended_context were involved.

@EastonWang

tensorflow error

[root@hd-master tensorflow]# python run.py --train --algo BIDAF --epochs 10
/opt/linuxsir/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From /opt/linuxsir/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
2018-04-27 13:52:09,567 - brc - INFO - Running with args : Namespace(algo='BIDAF', batch_size=32, brc_dir='../data/baidu', dev_files=['../data/demo/devset/search.dev.json'], dropout_keep_prob=1, embed_size=300, epochs=10, evaluate=False, gpu='0', hidden_size=150, learning_rate=0.001, log_path=None, max_a_len=200, max_p_len=500, max_p_num=5, max_q_len=60, model_dir='../data/models/', optim='adam', predict=False, prepare=False, result_dir='../data/results/', summary_dir='../data/summary/', test_files=['../data/demo/testset/search.test.json'], train=True, train_files=['../data/demo/trainset/search.train.json'], vocab_dir='../data/vocab/', weight_decay=0)
2018-04-27 13:52:09,567 - brc - INFO - Load data_set and vocab...
2018-04-27 13:52:10,653 - brc - INFO - Train set size: 95 questions.
2018-04-27 13:52:11,078 - brc - INFO - Dev set size: 100 questions.
2018-04-27 13:52:11,078 - brc - INFO - Converting text into ids...
2018-04-27 13:52:11,211 - brc - INFO - Initialize the model...
2018-04-27 13:52:18,377 - brc - INFO - Time to build graph: 7.16313290596 s
2018-04-27 13:52:26,644 - brc - INFO - There are 4995603 parameters in the model
2018-04-27 13:52:28,206 - brc - INFO - Training the model...
2018-04-27 13:52:28,206 - brc - INFO - Training the model for epoch 1
已杀死 (Killed)

problem of fake answer

in util/preprocess.py

find_fake_answer(sample)

'Fake answer' sounds like a bad answer in English, but I find it is selected as the golden answer in your code. What does 'fake answer' actually mean?
