
alibaba / esim-response-selection


ESIM for Multi-turn Response Selection Task

Home Page: https://arxiv.org/pdf/1901.02609.pdf

License: Apache License 2.0

Python 98.52% Shell 1.48%

esim-response-selection's Introduction

ESIM for Multi-turn Response Selection Task

Introduction

If you use this code as part of any published research, please acknowledge one of the following papers.

@inproceedings{chen2019sequential,
  title={Sequential Matching Model for End-to-end Multi-turn Response Selection},
  author={Chen, Qian and Wang, Wen},
  booktitle={ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7350--7354},
  year={2019},
  organization={IEEE}
}
@article{DBLP:journals/corr/abs-1901-02609,
  author    = {Chen, Qian and Wang, Wen},
  title     = {Sequential Attention-based Network for Noetic End-to-End Response Selection},
  journal   = {CoRR},
  volume    = {abs/1901.02609},
  year      = {2019},
  url       = {http://arxiv.org/abs/1901.02609},
}

Requirements

  1. gensim

     pip install gensim

  2. TensorFlow 1.9-1.12 + Python 2.7

Steps

  1. Download the Ubuntu dataset released by (Xu et al., 2017)

  2. Unzip the dataset and put the data directory into data/

  3. Preprocess the dataset, including concatenating the context and building the vocabulary

     cd data
     python prepare.py

  4. Train word2vec

     bash run_train_word2vec.sh

  5. Train and test ESIM; the log information is written to the log.txt file. You can find an example log file in log_example.txt.

     cd scripts/esim
     bash run.sh
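The preprocessing in step 3 can be sketched roughly as follows. This is a minimal illustration with hypothetical helper names, not the repo's actual prepare.py: each context utterance is followed by the `__eou__ __eot__` markers, and a word-frequency vocabulary is counted over the resulting text.

```python
# Minimal sketch of step 3 (hypothetical helpers, not the repo's prepare.py):
# concatenate the multi-turn context with end-of-utterance/end-of-turn markers
# and build a word vocabulary from the tokenized text.
from collections import Counter

def concat_context(turns):
    # Append ' __eou__ __eot__ ' after every utterance, as prepare.py does.
    return ' __eou__ __eot__ '.join(turns) + ' __eou__ __eot__ '

def build_vocab(texts, min_count=1):
    counts = Counter(w for t in texts for w in t.split())
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(kept)}

context = concat_context(["how do i install java", "use apt-get install"])
vocab = build_vocab([context])
print(context)
print(len(vocab))  # 9 unique tokens, including __eou__ and __eot__
```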

esim-response-selection's People

Contributors

enningxie, lukecq1231


esim-response-selection's Issues

Chinese corpus

Is there a Chinese corpus available? What format is the corpus in?

Mismatch between training and real-world application

In the training data, the negative samples are drawn at random, so they tend to be wildly off-topic. During training, P@1 can therefore reach a very high value, because picking the best among bad candidates is easy.
In real-world use, however, the candidates a retrieval-style strategy returns for a query are of much higher quality than the randomly sampled negatives seen during training, so the model performs considerably worse.
How can this mismatch between training and deployment be eliminated?
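One common remedy, sketched below purely as an illustration (it is not part of this repository, and the lexical-overlap retriever is an assumption), is to mine "hard" negatives that resemble deployment-time retrieval candidates instead of sampling them uniformly at random:

```python
# Illustrative hard-negative mining (not from this repo): rank candidate
# responses by word overlap with the context and keep the top-k as training
# negatives, so negatives look like real retrieval candidates.
def overlap(a, b):
    return len(set(a.split()) & set(b.split()))

def hard_negatives(context, pool, k=2):
    # Stable sort by descending overlap, keep the k hardest candidates.
    return sorted(pool, key=lambda r: overlap(context, r), reverse=True)[:k]

pool = ["try sudo apt-get update", "i like pizza", "run apt-get install first"]
print(hard_negatives("how to apt-get install java", pool))
# ['run apt-get install first', 'try sudo apt-get update']
```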

Data preprocessing

In textIterator, the code below looks wrong. According to the comment, instances should be sorted by the combined length of the context and the response, but it is written as "current_length = ins[1] + ins[2]"; it should presumably be "current_length = len(ins[1]) + len(ins[2])".

            # sort by length of sum of target buffer and target_buffer
            length_list = []
            for ins in self.instance_buffer:
                current_length = ins[1] + ins[2]
                length_list.append(current_length)

            length_array = numpy.array(length_list)
            length_idx = length_array.argsort()
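The fix proposed above can be checked in isolation. This standalone sketch (with a made-up toy buffer) sorts instance indices by the summed token lengths of the context (ins[1]) and the response (ins[2]):

```python
import numpy

# Each toy instance: (label, context token ids, response token ids).
instance_buffer = [
    (1, [3, 4, 5], [6, 7]),         # total length 5
    (0, [1], [2]),                  # total length 2
    (1, [8, 9], [10, 11, 12, 13]),  # total length 6
]

# Corrected version: sum the *lengths* of the id lists, not the lists.
length_list = [len(ins[1]) + len(ins[2]) for ins in instance_buffer]
length_idx = numpy.array(length_list).argsort()
print(list(length_idx))  # [1, 0, 2] -- shortest instance first
```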

Why are the contents in the corpus run together?

Thanks to the Alibaba authors for generously sharing this work. After reading the corpus I have a question. For example, in one line of the Chinese E-commerce corpus (screenshot omitted):
the chat content is run together; only the first two and last two utterances are separated by tabs, while the dialogue turns in the middle are not split at all and are merged together.
Could the authors explain what is going on here? I don't understand it.

How to use the model

How do I use the model obtained after training? Is there a simple demo?

About the prepare_data function

for l_x, s_x, l_y, s_y, l in zip(lengths_x, seqs_x, lengths_y, seqs_y, labels):
    if l_x > maxlen_1:
        new_seqs_x.append(s_x[-maxlen_1:])
        new_lengths_x.append(maxlen_1)
    else:
        new_seqs_x.append(s_x)
        new_lengths_x.append(l_x)
    if l_y > maxlen_2:
        new_seqs_y.append(s_y[:maxlen_2])
        new_lengths_y.append(maxlen_2)
    else:
        new_seqs_y.append(s_y)
        new_lengths_y.append(l_y)

Why does s_x keep the last maxlen_1 tokens here, while s_y keeps the first maxlen_2 tokens?
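As a guess at the rationale (not confirmed by the authors): the context s_x keeps its last maxlen_1 tokens because the most recent turns are usually the most relevant to the response, while the response s_y keeps its first maxlen_2 tokens. The truncation itself can be exercised in a standalone sketch:

```python
# Standalone sketch of the truncation above (toy data, hypothetical helper).
def truncate_pair(s_x, s_y, maxlen_1, maxlen_2):
    # Context: keep the tail (most recent turns); response: keep the head.
    s_x = s_x[-maxlen_1:] if len(s_x) > maxlen_1 else s_x
    s_y = s_y[:maxlen_2] if len(s_y) > maxlen_2 else s_y
    return s_x, s_y

ctx, rsp = truncate_pair(list(range(10)), list(range(10)), maxlen_1=4, maxlen_2=3)
print(ctx, rsp)  # [6, 7, 8, 9] [0, 1, 2]
```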

Does TensorFlow need to be the GPU version?

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' with these attrs. Registered devices: [CPU], Registered kernels:

Performance is poor on the validation set using the e-commerce dataset

(screenshot omitted)

Can someone help or give advice on how to make it work?

I changed the hidden size to 100 so that batch size 32 can be used.

The hyper-parameters are as below:

CUDA_VISIBLE_DEVICES=6 python -u main.py \
  --train_file=$DATA_DIR/train.txt \
  --valid_file=$DATA_DIR/valid.txt \
  --test_file=$DATA_DIR/test.txt \
  --vocab_file=$DATA_DIR/vocab.txt \
  --output_dir=result \
  --embedding_file=../../data/embedding_w2v_d300.txt \
  --maxlen_1=300 \
  --maxlen_2=150 \
  --hidden_size=100 \
  --train_batch_size=32 \
  --valid_batch_size=16 \
  --test_batch_size=16 \
  --fix_embedding=True \
  --patience=1 \
  > log.txt 2>&1 &

context = ' __eou__ __eot__ '.join(arr[1:-1]) + ' __eou__ __eot__ '

In the data-preparation module, why is each utterance delimited with the two tokens "__eou__ __eot__" instead of a single, simpler token such as "__eou__"?
The paper says: "The multi-turn context was concatenated and two special tokens, __eou__ and __eot__, were inserted, where __eou__ denotes end-of-utterance and __eot__ denotes end-of-turn."
So what does "turn" mean here? Does it mean a question-answer pair of two utterances?
If so, a context should look like this, with an extra __eot__ only after every even-numbered utterance:
A1 __eou__ B1 __eou__ __eot__ A2 __eou__ B2 __eou__ __eot__ A3 __eou__
I hope the authors can clarify this. Thanks!
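For reference, what the quoted line actually produces can be checked directly. In the released code every utterance is followed by both markers, which amounts to treating each utterance as a one-utterance turn (this sketch only reproduces the quoted line on toy data; it does not settle the question above):

```python
# Reproduces the quoted prepare-step line on a toy example: the first field
# is the label, the last is the response, everything in between is context.
arr = ["label", "A1", "B1", "A2", "response"]
context = ' __eou__ __eot__ '.join(arr[1:-1]) + ' __eou__ __eot__ '
print(context)  # every utterance is followed by both __eou__ and __eot__
```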

model restore problem

Hi, I have a perplexing issue when I try to restore the model for inference on a corpus. The issue stems from using tf.contrib.cudnn_rnn.CudnnLSTM: after training with this function, I cannot restore the model. There is no issue when I train and restore with tf.contrib.rnn.LSTMCell instead. I found that many developers encounter the same issue and no one has solved it cleanly. Reading your code, you simply restore with saver.restore(sess, os.path.join(FLAGS.output_dir, "model_epoch_{}...)), but that does not work for me (screenshot omitted).
Please help me figure this issue out, thank you! My TensorFlow version is 1.10.0.

Hello, looking forward to your open-source release

I tried to reproduce the model following your paper, but the performance would not improve; perhaps it is a hyper-parameter issue. I look forward to your open-source release and would like to ask for your advice.

Multi-GPU parallelism

Is there a multi-GPU parallel version, or could you briefly explain how to modify the code for it?
