irdm's Introduction

Information Retrieval and Data Mining Coursework 2017

Learning to Rank

Group 20 - Anna Aleksieva, Hugo Chu, Raluca Georgescu, Julija Bainiaksinaite

This project evaluates a range of Learning to Rank algorithms (RankNet, LambdaMART and AdaRank), together with an implementation of ranking as a classification task, and proposes an improved approach based on neural networks.

The following are instructions for running each algorithm:

  • RankNet

    • The RankNet algorithm was run using the RankLib library from the open-source Lemur Project (version used: RankLib-2.8.jar). The commands below were run directly from the command line. The user should be in the same directory as the RankLib-2.8.jar file and have access to the MSLR-WEB10K dataset.
    • To train the RankNet model with 5-fold cross-validation using the train.txt file from Fold1 (evaluation metric: NDCG@10):
    java -jar RankLib-2.8.jar -kcv 5 -train MSLR-WEB10K/Fold1/train.txt -ranker 1 -epoch 100 -layer 2 -node 25 -lr 0.00005 -metric2T NDCG@10  -save modelranknet_trial10.txt -kcvmd ~/Desktop -kcvmn modelranknet_trial10_crval.txt -tvs 0.7
    
    • To train the RankNet model using the train.txt, vali.txt and test.txt files (without cross-validation) from Fold1 (evaluation metric: NDCG@10):
    java -jar RankLib-2.8.jar -train MSLR-WEB10K/Fold1/train.txt -validate MSLR-WEB10K/Fold1/vali.txt -test MSLR-WEB10K/Fold1/test.txt -ranker 1  -epoch 50 -layer 2 -node 20 -lr 0.0001 -metric2t NDCG@10 -save IRDMCW/modelranknet_trial4.txt
    
    • The above examples refer to the data in Fold1 only. Due to limited computational power, the models were run separately on each of the 5 folds (Fold1-Fold5) and the results were then averaged. The models were trained using the NDCG@10 evaluation metric.
    • Parameters tuned: Epochs [50, 100], Layers [1, 2], Nodes [5, 10, 15, 20, 25, 50], Learning Rate [0.00005, 0.0001, 0.0005, 0.005]. All trained models are saved under RankNet/RankNet_Models/.
    • After training and parameter tuning, the models were used to rerank the test data and produce new rankings for the test.txt files. The following command was used for the reranking:
    java -jar RankLib-2.8.jar -load IRDMCW/modelranknet_trial10.txt -rank MSLR-WEB10K/Fold1/test.txt -score IRDMCW2017/reranking_trial10.txt
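
The per-fold runs and the tuned parameter values described above can be scripted rather than typed by hand. The following is a minimal sketch, assuming the MSLR-WEB10K/Fold1 to Fold5 layout used in the commands above. The names `ranknet_command` and `parameter_grid` and the `modelranknet_fold{N}.txt` output name are hypothetical and not part of this repository; only RankLib flags that already appear in the commands above are used.

```python
import itertools
import shlex

FOLDS = range(1, 6)  # Fold1 .. Fold5

# Values listed under "Parameters tuned" above. Learning rates are kept
# as strings so they reach the command line exactly as written.
GRID = {
    "epochs": [50, 100],
    "layers": [1, 2],
    "nodes": [5, 10, 15, 20, 25, 50],
    "lr": ["0.00005", "0.0001", "0.0005", "0.005"],
}


def parameter_grid(grid=GRID):
    """Yield one dict per combination of the tuned parameter values."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def ranknet_command(fold, epochs=50, layers=2, nodes=20, lr="0.0001",
                    jar="RankLib-2.8.jar", data_root="MSLR-WEB10K"):
    """Build the argument list for training RankNet on a single fold."""
    fold_dir = f"{data_root}/Fold{fold}"
    return [
        "java", "-jar", jar,
        "-train", f"{fold_dir}/train.txt",
        "-validate", f"{fold_dir}/vali.txt",
        "-test", f"{fold_dir}/test.txt",
        "-ranker", "1",  # same ranker id as in the commands above
        "-epoch", str(epochs),
        "-layer", str(layers),
        "-node", str(nodes),
        "-lr", str(lr),
        "-metric2t", "NDCG@10",
        "-save", f"modelranknet_fold{fold}.txt",  # hypothetical file name
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually launch RankLib.
    for fold in FOLDS:
        print(shlex.join(ranknet_command(fold)))
```

Printing the commands first makes it easy to check paths before starting a long training run. To cover the tuned values, call `ranknet_command(fold, **params)` for each `params` produced by `parameter_grid()`.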
    
  • LambdaMART

    • The LambdaMART implementation uses the RankPy library (https://bitbucket.org/tunystom/rankpy).
    • Assumptions:
      • The RankPy library is installed in the LambdaMART directory by following the instructions at the link above.
      • The MSLR-WEB10K data folder is located in the same directory.
    • First, run the prep.py script to convert the data .txt files to binary for easier loading.
    python prep.py
    
    • All the pretrained models are placed in the subdirectory models/. To reload the pretrained models and evaluate them on nDCG@10 and MAP, run the following command within the directory:
    python reload.py
    
    • To run the grid search on hyperparameters:
    python grid_models.py
    
    • To train a model for each fold and average over the results for nDCG@10 and MAP:
    python model_folds.py
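
For background on the conversion step: the MSLR-WEB10K .txt files store one query-document pair per line as a relevance label, a `qid:` field, and numbered `index:value` features. The following is a minimal sketch of reading that text format with the standard library only. It is not the code in prep.py or RankPy; the function names `parse_letor_line` and `group_by_query` are hypothetical.

```python
from collections import defaultdict


def parse_letor_line(line):
    """Parse 'label qid:ID 1:v1 2:v2 ...' into (label, qid, features)."""
    # Drop an optional trailing '# comment' and split on whitespace.
    tokens = line.split("#", 1)[0].split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features


def group_by_query(lines):
    """Collect (label, features) pairs per query id, skipping blank lines."""
    queries = defaultdict(list)
    for line in lines:
        if line.strip():
            label, qid, features = parse_letor_line(line)
            queries[qid].append((label, features))
    return dict(queries)


if __name__ == "__main__":
    # Made-up sample lines, shortened to three features for readability.
    sample = [
        "2 qid:10 1:0.5 2:3 3:0",
        "0 qid:10 1:0.1 2:1 3:4",
        "1 qid:11 1:0.9 2:0 3:2",
    ]
    for qid, docs in group_by_query(sample).items():
        print(qid, [label for label, _ in docs])
```

Grouping by query id matters because ranking metrics such as nDCG@10 and MAP are computed per query and then averaged.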
    
  • AdaRank

    • AdaRank is based on the RankLib implementation from the Lemur Project. The following commands were run from the command line. The user has to be in the same directory as the downloaded RankLib-2.1-patched.jar library file and must have access to the MSLR-WEB10K dataset from that same folder.
    java -jar RankLib-2.1-patched.jar -train MSLR-WEB10K/Fold1/train.txt -test MSLR-WEB10K/Fold1/test.txt -validate MSLR-WEB10K/Fold1/vali.txt -ranker 6 -metric2t NDCG@10 -metric2T ERR@10 -save model_fold1_NDCG.txt -noeq -max 1
    
    java -jar RankLib-2.1-patched.jar -train MSLR-WEB10K/Fold1/train.txt -test MSLR-WEB10K/Fold1/test.txt -validate MSLR-WEB10K/Fold1/vali.txt -ranker 6 -metric2t MAP -metric2T ERR@10 -save model_fold1_MAP.txt -noeq -max 1
    
    • The examples refer only to Fold1; they should be run for all 5 folds and the results averaged.
    • To obtain the ranking scores, the following command was run (again, the command is repeated for all 5 folds and the results are averaged):
    java -jar RankLib-2.1-patched.jar -rank MSLR-WEB10K/Fold1/test.txt -load model_fold1_NDCG.txt -score scores_NDCG.txt
    
    • The code for evaluating the rankings against the implemented NDCG@10 and MAP metrics is available in the AdaRank/AdaRank_Evaluation notebook.
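
As an illustration of the two metrics named above, here is a minimal sketch of NDCG@10 and average precision for a single query. It is not the code from the AdaRank/AdaRank_Evaluation notebook. It assumes the common exponential gain `2^rel - 1` with a `log2` discount, treats any label of 1 or higher as relevant for MAP, and returns 0.0 for a query with no relevant documents; the notebook may use different conventions.

```python
import math


def rank_labels(labels, scores):
    """Return the relevance labels reordered by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top k positions."""
    return sum((2 ** rel - 1) / math.log2(position + 2)
               for position, rel in enumerate(ranked_labels[:k]))


def ndcg_at_k(ranked_labels, k=10):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumption: queries without relevant documents score 0
    return dcg_at_k(ranked_labels, k) / ideal


def average_precision(ranked_labels, threshold=1):
    """Mean of precision@rank taken at each relevant document's rank."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel >= threshold:  # assumption: graded labels binarised here
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # Made-up labels and scores for one query, for illustration only.
    ranked = rank_labels([0, 2, 1], [0.1, 0.9, 0.5])
    print(ranked, ndcg_at_k(ranked), average_precision(ranked))
```

MAP is then the mean of `average_precision` over all queries, and the reported NDCG@10 is the mean of `ndcg_at_k` over all queries, each averaged again across the 5 folds.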
  • Logistic Regression

    • Go to the LR_and_DeepNet/Implementations folder and run test_logistic_regression.py to see the models train. The parameters have been tuned to converge consistently within 15000 steps.
    • You will need one fold of the dataset in LR_and_DeepNet/All.

The project should be run with Python 3.
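
To illustrate what treating ranking as a classification task means, the following is a minimal pointwise sketch in plain Python: a logistic regression is trained to predict whether a document is relevant, and documents are then ranked by predicted probability. This is not the project's implementation in test_logistic_regression.py; the function names, the toy data, the learning rate and the step count are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Predicted probability that a document with features x is relevant."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, targets, lr=0.1, steps=1000):
    """Batch gradient descent on the mean logistic loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, targets):
            error = predict(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_documents(features, weights, bias):
    """Return document indices sorted by descending predicted probability."""
    scores = [predict(x, weights, bias) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy data: one feature, binary relevance (1 = relevant).
    train_x = [[0.0], [1.0], [2.0], [3.0]]
    train_y = [0, 0, 1, 1]
    weights, bias = train_logistic_regression(train_x, train_y)
    print(rank_documents([[0.5], [2.5], [1.5]], weights, bias))
```

On MSLR-WEB10K the graded relevance labels would first need to be mapped to classes; how the project does this is not stated here.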

  • DeepNet

    • Go to the LR_and_DeepNet/Implementations folder and run DeepNet.py to get the NDCG and MAP of the trained model. Note that this takes a lot of computational resources and time to run.
    • You will need one fold of the dataset in LR_and_DeepNet/All.

Use TensorFlow 1.0.
