
apollo007fd / urank_umart


This project forked from xiaofengzhu/urank_umart


Listwise Learning to Rank by Exploring Unique Ratings (ICDM 2020)

License: Apache License 2.0

Shell 11.04% Python 68.27% C++ 4.21% Perl 16.47%


uRank_uMart

Listwise Learning to Rank by Exploring Unique Ratings (accepted at ICDM 2020)

Configuration

Set the environment variable RAW_RANK_DATA to the folder that contains the raw ranking data.

Set the environment variable TF_RANK_DATA to the folder that will contain the serialized ranking data.
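As a hedged illustration of how these two variables might be consumed, the sketch below resolves them into paths and fails early with a clear message if one is missing. The helper name resolve_data_dirs is hypothetical and is not taken from this repo; only the variable names come from this README.

```python
import os
from pathlib import Path

# Variable names from this README; the helper itself is a hypothetical sketch.
REQUIRED_VARS = ("RAW_RANK_DATA", "TF_RANK_DATA")


def resolve_data_dirs(env=None):
    """Return {name: Path} for each required variable, failing loudly if one is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Please set: " + ", ".join(missing))
    return {name: Path(env[name]).expanduser() for name in REQUIRED_VARS}
```

Passing an explicit mapping (instead of reading os.environ) keeps the check easy to test without touching the real environment.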

Convert raw ranking data to TFRecords

python prepare_data.py

The TF_RANK_DATA folder should now contain the serialized TFRecords used to run the experiments.
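The logic of prepare_data.py is not reproduced here. As a minimal sketch, assuming the raw files use the LETOR text format (label, then qid:<id>, then <feature>:<value> pairs, then an optional # comment), a hypothetical helper parse_letor_line could read one row before serialization; the TFRecord-writing step itself is omitted.

```python
from typing import Dict, Tuple


def parse_letor_line(line: str) -> Tuple[int, str, Dict[int, float]]:
    """Parse '<label> qid:<id> <fid>:<value> ... # comment' (assumed LETOR format)."""
    body = line.split("#", 1)[0].split()  # drop the trailing comment, then tokenize
    label = int(float(body[0]))            # relevance grade, e.g. 0, 1 or 2
    if not body[1].startswith("qid:"):
        raise ValueError(f"expected a qid token, got {body[1]!r}")
    qid = body[1][len("qid:"):]
    features: Dict[int, float] = {}
    for token in body[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)
    return label, qid, features
```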

params

src/experiments/base_model/params.json contains all of the hyper-parameters.

For instance, num_learners controls how many weak learners are used in uBoost and urBoost.

mlp_sizes controls the number of MLP layers and the number of neurons per layer.

residual_mlp_sizes controls the MLP network parameters for the gradient boosting step.
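To make these keys concrete, the sketch below loads a params.json-style document and derives the dense-layer shapes implied by mlp_sizes. The key names come from this README, but every value is made up, and the assumption that mlp_sizes is a list of per-layer neuron counts is ours; check the real params.json before relying on it. The 46 input features match the MQ2007/MQ2008 feature count.

```python
import json

# Hypothetical params.json content: keys from this README, values invented for illustration.
EXAMPLE_PARAMS = """
{
  "num_learners": 3,
  "mlp_sizes": [256, 128, 64],
  "residual_mlp_sizes": [64, 32],
  "pooling": "MP",
  "rnn": "C1",
  "batch_size": 1
}
"""


def layer_shapes(input_dim, sizes):
    """(fan_in, fan_out) per dense layer, assuming `sizes` lists neurons per layer."""
    shapes = []
    for width in sizes:
        shapes.append((input_dim, width))
        input_dim = width
    return shapes


params = json.loads(EXAMPLE_PARAMS)
main_shapes = layer_shapes(46, params["mlp_sizes"])  # 46 features, as in MQ2007/MQ2008
```

With these invented values, the main MLP would have three dense layers, and num_learners = 3 would correspond to a boosted model.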

pooling and rnn are used for urRank and urBoost only. "pooling": "MP" denotes max-pooling, and "pooling": "AP" denotes average-pooling. "rnn": "C1" generally works better than "rnn": "C2".
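This README does not say which tensor urRank and urBoost pool over, so the sketch below only illustrates what the two pooling values mean. It takes a list of per-document vectors for one query and collapses them element-wise with either a max ("MP") or a mean ("AP"). The function name pool is hypothetical.

```python
from typing import List, Sequence


def pool(doc_vectors: Sequence[Sequence[float]], mode: str) -> List[float]:
    """Collapse one query's per-document vectors into a single vector.

    "MP" = element-wise max-pooling, "AP" = element-wise average-pooling.
    """
    if not doc_vectors:
        raise ValueError("a query needs at least one document")
    columns = list(zip(*doc_vectors))  # one tuple per vector dimension
    if mode == "MP":
        return [max(col) for col in columns]
    if mode == "AP":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode!r}")


docs = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
```

For docs, max-pooling keeps the largest value in each dimension ([3.0, 4.0]). Average-pooling returns the per-dimension means ([2.0, about 0.667]).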

batch_size must be 1: each batch holds exactly one query, i.e., all query-document features and labels that belong to the same query are placed in the same batch.
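The one-query-per-batch layout can be sketched as a grouping step over (label, qid, features) rows. The function batches_by_query and the row layout are hypothetical; the repo builds its batches from the serialized TFRecords. The number of documents per batch varies with the query.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str, List[float]]  # (label, qid, feature vector); assumed layout


def batches_by_query(rows: Iterable[Row]) -> Iterator[Tuple[str, List[List[float]], List[int]]]:
    """Yield one batch per query: (qid, its documents' features, their labels)."""
    grouped: Dict[str, Tuple[List[List[float]], List[int]]] = {}  # keeps first-seen order
    for label, qid, features in rows:
        feats, labels = grouped.setdefault(qid, ([], []))
        feats.append(features)
        labels.append(label)
    for qid, (feats, labels) in grouped.items():
        yield qid, feats, labels
```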

uRank

Set num_learners to 1, and change the GPU number in the script to match your machine.

./run_OHSUMED_uRank.sh

uBoost

Set num_learners to an integer larger than 1.

./run_OHSUMED_uRank.sh

urRank

This model is not described in the paper. It is uRank with an RNN and without boosting.

./run_OHSUMED_urRank.sh

urBoost

Set num_learners to an integer larger than 1.

./run_OHSUMED_urBoost.sh
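The four variants above differ only in num_learners and in whether the RNN is used. The hypothetical helper below summarizes that mapping as this README describes it: uBoost reuses the uRank script with num_learners larger than 1. It covers only the OHSUMED scripts named here.

```python
def select_run_script(num_learners: int, use_rnn: bool) -> str:
    """Map the settings described in this README to the OHSUMED run script.

    uRank  : num_learners == 1, no RNN   -> ./run_OHSUMED_uRank.sh
    uBoost : num_learners  > 1, no RNN   -> ./run_OHSUMED_uRank.sh (same script)
    urRank : num_learners == 1, with RNN -> ./run_OHSUMED_urRank.sh
    urBoost: num_learners  > 1, with RNN -> ./run_OHSUMED_urBoost.sh
    """
    if num_learners < 1:
        raise ValueError("num_learners must be a positive integer")
    if not use_rnn:
        return "./run_OHSUMED_uRank.sh"
    return "./run_OHSUMED_urBoost.sh" if num_learners > 1 else "./run_OHSUMED_urRank.sh"
```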

uMart

Set the environment variable LIGHTGBM_DATA to the folder that will contain the LightGBM ranking data.

Convert the raw learning-to-rank data sets to LightGBM inputs.

python msltr2libsvm.py
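msltr2libsvm.py is not reproduced here. As a sketch of what such a conversion involves, LightGBM reads ranking data as LibSVM-style rows plus a companion query file ("<data file>.query") whose lines give the number of rows in each consecutive query. The hypothetical to_lightgbm below assumes LETOR-format input and produces both pieces.

```python
from typing import Iterable, List, Tuple


def to_lightgbm(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Convert LETOR lines into LibSVM rows plus LightGBM query (group) sizes.

    Rows of a query must be contiguous: a qid that reappears later starts a new group.
    """
    libsvm_rows: List[str] = []
    group_sizes: List[int] = []
    previous_qid = None
    for line in lines:
        tokens = line.split("#", 1)[0].split()  # drop comments, tokenize
        if not tokens:
            continue  # skip blank lines
        label, qid_token, features = tokens[0], tokens[1], tokens[2:]
        qid = qid_token.split(":", 1)[1]
        if qid != previous_qid:
            group_sizes.append(0)
            previous_qid = qid
        group_sizes[-1] += 1
        libsvm_rows.append(" ".join([label] + features))  # qid moves to the .query file
    return libsvm_rows, group_sizes
```

The rows would be written to the data file and the group sizes, one per line, to the matching .query file.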

Replace the rank_objective.hpp in LightGBM (https://github.com/microsoft/LightGBM) with the rank_objective.hpp in this repo, then compile LightGBM to obtain the binary for training uMart.

Notice

Keep in mind that LightGBM's default NDCG calculation assigns NDCG = 1 at every cutoff to queries whose labels are all 0, i.e., NDCG@1 = NDCG@3 = NDCG@5 = NDCG@10 = 1.

The original NDCG script for the MQ2007 and MQ2008 data sets assigns 0 in this case (see mslr-eval-score-mslr-original.pl).

The Yahoo learning to rank data set assigns 0.5 in this case.
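The three conventions differ only in what NDCG returns when the ideal DCG is 0. The sketch below uses the common 2^rel - 1 gain with a log2(rank + 1) discount and makes the all-zero value an explicit parameter. It is not claimed to match each evaluation script exactly, and ndcg_at_k is a hypothetical name.

```python
import math
from typing import Sequence


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    """DCG@k with gain 2^rel - 1 and discount log2(rank + 1), ranks starting at 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(labels[:k]))


def ndcg_at_k(ranked_labels: Sequence[int], k: int, all_zero_value: float) -> float:
    """NDCG@k for one query; ranked_labels are the labels in the model's ranked order.

    all_zero_value is returned when every label is 0: 1.0 mimics LightGBM's default,
    0.0 the original MQ2007/MQ2008 script, 0.5 the Yahoo convention.
    """
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    if ideal == 0:
        return all_zero_value
    return dcg_at_k(ranked_labels, k) / ideal
```

The same all-zero query therefore scores 1.0, 0.0 or 0.5 depending only on the convention.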

To avoid this ambiguity, we remove queries with all 0 labels from every data set. This is done by python prepare_data.py.

We fixed this logic and added an ERR calculation based on the original Perl script (see mslr-eval-score-mslr.pl).
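A minimal sketch of that filtering step follows, assuming queries are held as a mapping from qid to (feature vectors, labels). The layout and the name drop_all_zero_queries are hypothetical, not taken from prepare_data.py.

```python
from typing import Dict, List, Tuple

Query = Tuple[List[List[float]], List[int]]  # (feature vectors, labels); assumed layout


def drop_all_zero_queries(queries: Dict[str, Query]) -> Dict[str, Query]:
    """Keep only queries with at least one non-zero label, so no NDCG convention applies."""
    return {qid: q for qid, q in queries.items() if any(label != 0 for label in q[1])}
```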
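For reference, the sketch below follows the standard ERR definition (Chapelle et al., 2009). A document with grade g stops the user with probability (2^g - 1) / 2^max_grade. ERR sums the reciprocal rank of each stop weighted by the probability of reaching that rank. The choice of max_grade (2 for the 0 to 2 labels of MQ2007/MQ2008) is an assumption, and this is not a transcription of mslr-eval-score-mslr.pl.

```python
from typing import Sequence


def err_at_k(ranked_labels: Sequence[int], k: int, max_grade: int) -> float:
    """Expected Reciprocal Rank over the top-k documents in ranked order."""
    err, p_reach = 0.0, 1.0  # p_reach: probability the user reaches the current rank
    for rank, grade in enumerate(ranked_labels[:k], start=1):
        stop = (2 ** grade - 1) / (2 ** max_grade)  # probability of stopping here
        err += p_reach * stop / rank
        p_reach *= 1.0 - stop
    return err
```

For example, with max_grade = 2 the ranking [1, 2] gives 1/4 + (3/4)(3/4)/2 = 0.53125.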

This repo also implements RankNet, ListNet, and ListMLE. However, these implementations may not be as efficient as TF-Ranking (https://github.com/tensorflow/ranking). Other implementations can be found in LightGBM (https://github.com/microsoft/LightGBM) and RankLib (https://sourceforge.net/p/lemur/wiki/RankLib/).
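The per-query losses of these three baselines can be written in a few lines. The sketch below uses the textbook forms: pairwise logistic loss for RankNet, top-one cross entropy for ListNet, and Plackett-Luce negative log-likelihood for ListMLE. It is not the repo's implementation, and the function names are hypothetical. Ties in ListMLE are broken by the original document order.

```python
import math
from typing import List, Sequence


def _log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in values))


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Logistic loss summed over pairs (i, j) where document i is more relevant than j."""
    n = len(scores)
    return sum(_softplus(-(scores[i] - scores[j]))
               for i in range(n) for j in range(n) if labels[i] > labels[j])


def listnet_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Top-one ListNet: cross entropy between softmax(labels) and softmax(scores)."""
    log_z_s, log_z_y = _log_sum_exp(scores), _log_sum_exp(labels)
    return -sum(math.exp(y - log_z_y) * (s - log_z_s) for s, y in zip(scores, labels))


def listmle_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Negative Plackett-Luce log-likelihood of the label-sorted permutation."""
    order = sorted(range(len(scores)), key=lambda i: labels[i], reverse=True)
    permuted: List[float] = [scores[i] for i in order]
    return -sum(permuted[i] - _log_sum_exp(permuted[i:]) for i in range(len(permuted)))
```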

Citation

If you use our code, please cite our work.

Xiaofeng Zhu and Diego Klabjan. 2020. Listwise Learning to Rank by Exploring Unique Ratings. In The Thirteenth ACM International Conference on Web Search and Data Mining (WSDM ’20), February 3–7, 2020, Houston, TX, USA.
