K-NRM

This is the implementation of the Kernel-based Neural Ranking Model (K-NRM) from the paper End-to-End Neural Ad-hoc Ranking with Kernel Pooling.

If you use this code for your scientific work, please cite it as follows (the BibTeX entry is given at the end of this README):

C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. 
In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval. 
ACM. 2017.

Requirements


  • TensorFlow 0.12
  • Numpy
  • traitlets

Coming soon: K-NRM with TensorFlow 1.0

Guide To Use


Configure: first, configure the model through the config file. Configurable parameters are listed in the Configurations section below.

sample.config

Training : pass the config file, training data and validation data as

python ./knrm/model/model_knrm.py config-file \
    --train \
    --train_file: path to training data \
    --validation_file: path to validation data \
    --train_size: size of training data (number of training samples) \
    --checkpoint_dir: directory to store/load model checkpoints \
    --load_model: True or False. Start with a new model or continue training

sample-train.sh
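
For example, a concrete invocation could look like the following; the paths and the size are placeholders, and sample-train.sh in the repository is the authoritative reference for the exact flag syntax:

python ./knrm/model/model_knrm.py sample.config \
    --train \
    --train_file ./data/train.txt \
    --validation_file ./data/valid.txt \
    --train_size 1000000 \
    --checkpoint_dir ./checkpoints/ \
    --load_model False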

Testing: pass the config file and testing data as

python ./knrm/model/model_knrm.py config-file \
    --test \
    --test_file: path to testing data \
    --test_size: size of testing data (number of testing samples) \
    --checkpoint_dir: directory to load the trained model \
    --output_score_file: file to write document scores

Relevance scores will be written to output_score_file, one score per line, in the same order as test_file. We provide a script to convert scores into TREC format.

./knrm/tools/gen_trec_from_score.py
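
If you prefer to write your own converter, below is a minimal sketch. It assumes a hypothetical id file with one "query_id document_id" pair per line, aligned with the score file; the repository's script may expect a different input.

# Illustrative sketch, not the repository's gen_trec_from_score.py.
import sys
from collections import defaultdict

def to_trec(score_file, id_file, run_name="knrm"):
    # Collect (doc_id, score) per query; scores and ids are aligned line by line.
    runs = defaultdict(list)
    with open(score_file) as sf, open(id_file) as idf:
        for score_line, id_line in zip(sf, idf):
            qid, docid = id_line.split()
            runs[qid].append((docid, float(score_line)))
    # TREC run format: query_id Q0 doc_id rank score run_name
    for qid, docs in runs.items():
        docs.sort(key=lambda pair: pair[1], reverse=True)
        for rank, (docid, score) in enumerate(docs, start=1):
            print("{} Q0 {} {} {:.6f} {}".format(qid, docid, rank, score, run_name))

if __name__ == "__main__":
    to_trec(sys.argv[1], sys.argv[2])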

Data Preparation


All queries and documents must be mapped into sequences of integer term ids. Term ids start at 1; -1 indicates an OOV term or non-existence. Term ids are separated by commas (,).
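
As a rough illustration, mapping tokenized text to this format could look like the sketch below; the vocab dict here is a toy stand-in for whatever term-to-id mapping you build from your corpus (the repository does not ship one).

# Illustrative sketch: map tokenized text to comma-separated term ids.
def to_term_ids(tokens, vocab):
    # vocab maps term -> id (ids start at 1); unknown terms become -1 (OOV).
    return ",".join(str(vocab.get(t, -1)) for t in tokens)

vocab = {"cat": 1, "sat": 2, "mat": 3}               # toy vocabulary
print(to_term_ids("the cat sat".split(), vocab))     # -> -1,1,2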

Training Data Format

Each training sample is a tuple of (query, positive document, negative document):

query \t positive_document \t negative_document \t score_difference

Example: 177,705,632 \t 177,705,632,-1,2452,6,98 \t 177,705,632,3,25,14,37,2,146,159,-1 \t 0.119048

If score_difference < 0, the data generator will swap the positive document and the negative document.

If score_difference < DataGenerator.min_score_diff, the training sample will be omitted.

We recommend shuffling the training samples to ease model convergence.
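
To make the format concrete, here is a minimal reader sketch that mirrors the behavior described above; it is illustrative only, not the repository's data generator, and the exact order of the swap and the min_score_diff check in the real code may differ.

# Illustrative reader for the training format described above.
def read_training_line(line, min_score_diff=0.0):
    query, pos_doc, neg_doc, diff = line.rstrip("\n").split("\t")
    diff = float(diff)
    if diff < 0:
        # Swap so that pos_doc is always the preferred document.
        pos_doc, neg_doc = neg_doc, pos_doc
        diff = -diff
    if diff < min_score_diff:
        return None                                   # sample is omitted
    to_ids = lambda s: [int(t) for t in s.split(",")]
    return to_ids(query), to_ids(pos_doc), to_ids(neg_doc)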

Testing Data Format

Each testing sample is a tuple of (query, document)

query \t document

Example: 177,705,632 \t 177,705,632,-1,2452,6,98

Configurations


Model Configurations

  • BaseNN.n_bins: number of kernels (soft bins) (default: 11: one exact-match kernel and 10 soft kernels)
  • Knrm.lamb: defines the Gaussian kernels' sigma value, sigma = lamb * bin_size (default: 0.5, which gives sigma = 0.1 with the default bins)
  • BaseNN.embedding_size: embedding dimension (default: 300)
  • BaseNN.max_q_len: max query length (default: 10)
  • BaseNN.max_d_len: max document length (default: 50)
  • DataGenerator.max_q_len: max query length. Should be the same as BaseNN.max_q_len (default: 10)
  • DataGenerator.max_d_len: max document length. Should be the same as BaseNN.max_d_len (default: 50)
  • BaseNN.vocabulary_size: vocabulary size.
  • DataGenerator.vocabulary_size: vocabulary size.

Data

  • Knrm.emb_in: initial embeddings
  • DataGenerator.min_score_diff: minimum score difference between positive documents and negative ones (default: 0)

Training Parameters

  • BaseNN.batch_size: batch size (default: 16)
  • BaseNN.max_epochs: max number of epochs to train
  • BaseNN.eval_frequency: evaluate the model on the validation set every this many steps (default: 1000)
  • BaseNN.checkpoint_steps: save a model checkpoint every this many steps (default: 10000)
  • Knrm.learning_rate: learning rate for the Adam optimizer (default: 0.001)
  • Knrm.epsilon: epsilon for the Adam optimizer (default: 0.00001)
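
Putting these together, a config file in traitlets style might look like the sketch below; the values are the documented defaults, the paths are placeholders, and sample.config in the repository is the authoritative reference.

# Illustrative config sketch; see sample.config for the real one.
c = get_config()

c.BaseNN.n_bins = 11
c.BaseNN.embedding_size = 300
c.BaseNN.max_q_len = 10
c.BaseNN.max_d_len = 50
c.BaseNN.vocabulary_size = 300000
c.BaseNN.batch_size = 16
c.BaseNN.max_epochs = 10                    # placeholder
c.BaseNN.eval_frequency = 1000
c.BaseNN.checkpoint_steps = 10000

c.Knrm.lamb = 0.5                           # sigma = lamb * bin_size
c.Knrm.learning_rate = 0.001
c.Knrm.epsilon = 0.00001
c.Knrm.emb_in = "./data/embeddings.txt"     # placeholder path

c.DataGenerator.max_q_len = 10              # keep equal to BaseNN.max_q_len
c.DataGenerator.max_d_len = 50              # keep equal to BaseNN.max_d_len
c.DataGenerator.vocabulary_size = 300000
c.DataGenerator.min_score_diff = 0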

Efficiency

During training, it takes about 60ms to process one batch on a single-GPU machine with the following settings:

  • batch size: 16
  • max_q_len: 10
  • max_d_len: 50
  • vocabulary_size: 300K

A smaller vocabulary and shorter documents speed up training.

Click2Vec


We also provide the click2vec model as described in our paper.

  • ./knrm/click2vec/generate_click_term_pair.py: generate <query_term, clicked_title_term> pairs (see the sketch after this list)
  • ./knrm/click2vec/run_word2vec.sh: call Google's word2vec tool to train click2vec.
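
As a rough sketch of what the pair generation does, assume a hypothetical tab-separated click log with a query and a clicked title per line; the actual input format of generate_click_term_pair.py may differ.

# Illustrative sketch of <query_term, clicked_title_term> pair generation.
import sys

def emit_pairs(click_log):
    # Each input line: query \t clicked_title, both whitespace-tokenized.
    with open(click_log) as f:
        for line in f:
            query, title = line.rstrip("\n").split("\t")
            for q_term in query.split():
                for t_term in title.split():
                    print("{} {}".format(q_term, t_term))

if __name__ == "__main__":
    emit_pairs(sys.argv[1])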

Cite the paper


If you use this code for your scientific work, please cite it as:

C. Xiong, Z. Dai, J. Callan, Z. Liu, and R. Power. End-to-end neural ad-hoc ranking with kernel pooling. 
In Proceedings of the 40th International ACM SIGIR Conference on Research & Development in Information Retrieval. 
ACM. 2017.
@inproceedings{xiong2017neural,
  author          = {{Xiong}, Chenyan and {Dai}, Zhuyun and {Callan}, Jamie and {Liu}, Zhiyuan and {Power}, Russell},
  title           = "{End-to-End Neural Ad-hoc Ranking with Kernel Pooling}",
  booktitle       = {Proceedings of the 40th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
  organization    = {ACM},
  year            = 2017,
}


k-nrm's Issues

Dataset Preparation

Hi,
I have a dataset of 16,000 docs and some queries. For each query there can be more than one relevant document. Can you tell me how I can prepare my data, and how to run the evaluation?

Can I use samples whose score difference is 1?

Hello,
I use the LETOR 4.0 dataset, which has three relevance levels (2, 1, 0) for query-document pairs. The score difference between two samples is then always 1, because I generate training and testing samples by pairing "level-2 doc with level-1 doc" and "level-1 doc with level-0 doc". Is that OK?

About embedding usage

Hello! I'd like to know the details of embedding usage in this model: should the values in the embedding be in the range (0, 1)? And is it OK if I set the ids of words that do not appear in the embedding to 0 instead of -1 (I believe this is fine according to the implementation)? Thanks!

Question about the training data

Hi, sorry to bother you. How is the training data (query \t positive_document \t negative_document \t score_difference) generated? Could you explain score_difference once more: a difference between which scores? And how do I construct such samples? Thanks!

Maybe unfair to compare the simple k-nrm-max/mean-pool approach with k-nrm-kernel-pool in the paper

In the paper, k-nrm-kernel-pool can be summarized as sum(log(sum(exp(-(x-u)**2 / (2*sigma**2))))) with K different kernel parameters. Thus, the output of the pooling (and the input of the fully connected W, b layer) is batch_size * K, and the shape of W, b is K*1.

In my understanding (correct me if I am wrong), the k-nrm-max/mean-pool in this paper simply calculates sum(sum(x)) or avg(avg(x)). Thus, the output of the pooling (and the input of the fully connected layer) is batch_size * 1, and the shape of W, b is 1*1.

It might be unfair to compare these two approaches because of the difference in the fully-connected/LTR layer.

I therefore suggest approximating sum(exp(.)) by max(exp(.)). The kernel pooling can then be approximated by

sum(log(max(exp(-(x-u)**2 / (2*sigma**2))))) = sum(max(-(x-u)**2 / (2*sigma**2)))

which can be regarded as a simple square kernel + 1D max pooling + 1D avg pooling. The output size of the pooling is thus still batch_size * K with K different kernel parameters.

In this case, the superiority of the RBF kernel could be demonstrated more convincingly.
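
For reference, a small NumPy sketch of the two pooling variants being compared is given below; it is illustrative only, not the repository's TensorFlow implementation:

import numpy as np

def kernel_pool(sim, mus, sigmas):
    # sim: (n_q, n_d) translation matrix of query-document term similarities;
    # mus, sigmas: (K,) kernel means and widths.
    k = np.exp(-(sim[:, :, None] - mus) ** 2 / (2 * sigmas ** 2))   # (n_q, n_d, K)
    soft_tf = k.sum(axis=1)                  # sum over document terms -> (n_q, K)
    return np.log(np.maximum(soft_tf, 1e-10)).sum(axis=0)           # -> (K,)

def max_pool_variant(sim, mus, sigmas):
    # The approximation proposed above: max over document terms instead of sum.
    s = -(sim[:, :, None] - mus) ** 2 / (2 * sigmas ** 2)
    return s.max(axis=1).sum(axis=0)         # -> (K,)

Both variants produce a K-dimensional feature vector per query-document pair, so the learning-to-rank layer has the same K*1 shape in either case.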

How are documents processed?

Hello, regarding the training data: if
1,2,3 \t 4,5,6 \t 7,8,9
represents one sample, then 1,2,3 are the ids of the seed (query) words. Are 4,5,6 the ids of three documents? How are those ids assigned, randomly? I don't quite understand how documents are processed here: do they need to be tokenized first? The step of mapping them to ids is unclear to me. If you have time, please explain. Thank you!

Small questions about the model

Hi, I recently used your approach for user sequence prediction: the user's earlier clicked items serve as q, and whether a later item is clicked serves as d. The experiment reached an AUC of 0.6, though the training AUC reached 0.7, which may indicate overfitting. The item vocabulary is about 100 million dimensions and there are over 200 million parameters; it does beat a plain item-CTR baseline. My questions: 1) Where does the (slight) overfitting come from? 2) After the embedding, kernel pooling acts as a feature extractor; is the improvement in this experiment mainly due to this layer? It seems the model can roughly learn, from the user's earlier click sequence, what the user wants next. 3) On some samples the model performs poorly: the scores are barely distinguishable, and for some sequences different items get identical relevance scores. What causes this? Looking forward to your reply, thanks!
