
learning2rank's Introduction

Learning to Rank

A simple implementation of learning-to-rank algorithms: a pairwise approach (RankNet) and a listwise approach (ListNet). A simple neural-network regression of the score is also implemented. [Contributions welcome!]

Requirements

RankNet

Pairwise comparison of rank

The original paper was written by Chris Burges et al., "Learning to Rank using Gradient Descent." (available at http://research.microsoft.com/en-us/um/people/cburges/papers/ICML_ranking.pdf)
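As a rough illustration of the pairwise idea, the sketch below computes the cross-entropy cost from the RankNet paper for a single pair of scores. The function name `ranknet_pair_loss` and the `target` argument are hypothetical names chosen for this example; whether this repository computes the loss in exactly this form is an assumption.

```python
import math

def ranknet_pair_loss(s_i, s_j, target=1.0):
    """Cross-entropy between a target probability that item i ranks above
    item j and the modelled probability sigmoid(s_i - s_j).

    Hypothetical helper for illustration; not taken from this repository.
    """
    diff = s_i - s_j
    # log(1 + exp(-diff)), written so that large |diff| does not overflow
    softplus = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    # -t*log(P) - (1-t)*log(1-P) simplifies to (1-t)*diff + log(1+exp(-diff))
    return (1.0 - target) * diff + softplus

# Scoring the preferred item higher gives a smaller loss than scoring it lower.
print(ranknet_pair_loss(2.0, 0.0), ranknet_pair_loss(0.0, 2.0))
```

The loss depends only on the score difference, which is why the model can be trained on pairs without knowing absolute ranks.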

Usage

Import and initialize

from learning2rank.rank import RankNet
Model = RankNet.RankNet()

Fitting (training and validation run automatically)

Model.fit(X, y)

Here, X is a NumPy array of shape (num_samples, num_features) and y is a NumPy array of shape (num_samples,). y is the score you want to rank by (e.g., product sales, page views).
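A minimal sketch of inputs with these shapes, using made-up toy data. The helper `check_inputs` is a hypothetical name for this example and is not part of learning2rank; it only verifies the shapes described above before the arrays are passed to `fit`.

```python
import numpy as np

def check_inputs(X, y):
    """Verify X is (num_samples, num_features) and y is (num_samples,).

    Hypothetical helper for illustration; not part of learning2rank.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim != 2:
        raise ValueError("X must be 2-D, got shape %r" % (X.shape,))
    if y.shape != (X.shape[0],):
        raise ValueError("y must have shape (%d,), got %r" % (X.shape[0], y.shape))
    return X, y

# Toy data: 6 items with 3 features each, and one score per item.
rng = np.random.default_rng(0)
X = rng.random((6, 3))
y = np.array([120.0, 45.0, 300.0, 10.0, 87.0, 230.0])

X, y = check_inputs(X, y)
print(X.shape, y.shape)
```

A column vector such as `np.array([[1], [2], [3]])` has shape (3, 1) and would be rejected by this check; `y.ravel()` flattens it to (3,).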

Possible options and defaults:

batchsize=100, n_iter=5000, n_units1=512, n_units2=128, tv_ratio=0.95, optimizerAlgorithm="Adam", savefigName="result.pdf", savemodelName="RankNet.model"

n_units1 and n_units2 are the numbers of nodes in hidden layers 1 and 2 of the neural net.

tv_ratio is the fraction of the data used for training; the remainder is used for validation.
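To make tv_ratio concrete, here is a sketch of a split that puts `int(n * tv_ratio)` samples in the training set and the rest in the validation set. This is an assumption about how the split works, not the repository's code; it is consistent with the counts reported in one of the issue logs on this page (950198 train, 50011 validate for the default 0.95). The function name, the shuffling, and the `seed` argument are hypothetical.

```python
import random

def split_train_validate(n_samples, tv_ratio=0.95, seed=0):
    """Return (train_indices, validate_indices) for n_samples rows.

    Hypothetical helper: the shuffle and seed are assumptions for illustration.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * tv_ratio)   # truncates toward zero
    return indices[:n_train], indices[n_train:]

train_idx, validate_idx = split_train_validate(1000209, tv_ratio=0.95)
print(len(train_idx), len(validate_idx))  # 950198 50011
```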

Predict

Model.predict(X)
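The sketch below turns predicted scores into a ranking. It assumes `predict` returns one score per row as an array of shape (num_samples, 1), which is what the printed output in one of the issues on this page suggests, and that a higher score means a better rank. The function `scores_to_ranking` and the sample scores are hypothetical.

```python
import numpy as np

def scores_to_ranking(scores):
    """Return (order, ranks): item indices from best to worst, and each
    item's 1-based rank. Hypothetical helper for illustration."""
    flat = np.asarray(scores, dtype=float).ravel()
    order = np.argsort(-flat, kind="stable")     # highest score first
    ranks = np.empty(len(flat), dtype=int)
    ranks[order] = np.arange(1, len(flat) + 1)   # rank 1 = highest score
    return order, ranks

# Stand-in for the output of Model.predict(X)
scores = np.array([[0.2], [1.5], [-0.3], [0.9]])
order, ranks = scores_to_ranking(scores)
print(order.tolist())  # [1, 3, 0, 2]
print(ranks.tolist())  # [3, 1, 4, 2]
```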

ListNet

Listwise comparison of rank

The original paper was written by Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li "Learning to Rank: From Pairwise Approach to Listwise Approach." (Available at http://research.microsoft.com/en-us/people/tyliu/listnet.pdf)

NOTICE: The top-k probability is not implemented. This is a listwise approach with neural nets that compares two arrays by Jensen-Shannon divergence.
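A minimal sketch of comparing two score arrays by Jensen-Shannon divergence. It assumes each array is first turned into a probability distribution with a softmax (the top-one probability of the ListNet paper); how this repository normalises the arrays is not stated here, so treat that step and all function names as assumptions.

```python
import math

def softmax(scores):
    """Convert raw scores to a probability distribution (assumed step)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Symmetric divergence between p and q via their midpoint distribution."""
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

true_dist = softmax([3.0, 1.0, 0.5])
pred_dist = softmax([2.0, 2.5, 0.1])
print(js_divergence(true_dist, pred_dist))
```

Unlike the cross-entropy used in the original paper, this quantity is symmetric in its two arguments and bounded above by log 2.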

Usage

Import and initialize

from learning2rank.rank import ListNet
Model = ListNet.ListNet()

Fitting (training and validation run automatically)

Model.fit(X, y)

As with RankNet, X is a NumPy array of shape (num_samples, num_features) and y is a NumPy array of shape (num_samples,). y is the score you want to rank by (e.g., product sales, page views).

Possible options and defaults:

batchsize=100, n_epoch=200, n_units1=512, n_units2=128, tv_ratio=0.95, optimizerAlgorithm="Adam", savefigName="result.pdf", savemodelName="ListNet.model"

Predict

Model.predict(X)

Regression

Regression of the scores with a neural network

Usage

Import and initialize

from learning2rank.regression import NN
Model = NN.NN()

Fitting (training and validation run automatically)

Model.fit(X, y)

Possible options and defaults:

batchsize=100, n_iter=5000, n_units1=512, n_units2=128, tv_ratio=0.95, optimizerAlgorithm="Adam", savefigName="result.pdf", savemodelName="RankNet.model"

n_units1 and n_units2 are the numbers of nodes in hidden layers 1 and 2 of the neural net.

tv_ratio is the fraction of the data used for training; the remainder is used for validation.

Predict

Model.predict(X)

Author

If you have any trouble or questions, please contact shiba24.

March, 2016

learning2rank's People

Contributors

akashrajkn, betterenvi, shiba24


learning2rank's Issues

Examples of vector X and Y

Thank you for your code. Sorry, but I don't understand how to use it. Could you please explain how to set up the training data in the arrays X and y? Could you provide more details?

Invalid value encountered

Hi,

thanks a lot for sharing the code!

I am trying to use ListNet to learn a ranking problem:

import numpy as np
import random

from learning2rank.rank import ListNet

n = 1000
d = 200
X = np.random.rand(n,d)
y = np.random.rand(n)

model = ListNet.ListNet()
random.seed(1313)
model.fit(X, y, 
          batchsize=16,
          n_epoch=10,
          n_units1=256,
          n_units2=256,
          tv_ratio=0.67,
          optimizerAlgorithm="Adam",
          savefigName="result.pdf",
          savemodelName="ListNet.model")

However, I run into the following error messages.

Start training and validation loop......
epoch 1
0%| | 0/42 [00:00<?, ?it/s]C:\ProgramData\Anaconda3\lib\site-packages\chainer\functions\math\exponential.py:51: RuntimeWarning: invalid value encountered in log
return utils.force_array(numpy.log(x[0])),
C:\ProgramData\Anaconda3\lib\site-packages\chainer\functions\activation\relu.py:38: RuntimeWarning: invalid value encountered in maximum
return utils.force_array(numpy.maximum(x, 0, dtype=x.dtype)),
C:\ProgramData\Anaconda3\lib\site-packages\learning2rank\rank\ListNet.py:58: RuntimeWarning: invalid value encountered in greater
ind = vec_true.data * vec_compare.data > 0
C:\ProgramData\Anaconda3\lib\site-packages\chainer\functions\activation\relu.py:97: RuntimeWarning: invalid value encountered in greater

Any idea what's going on here? I am clueless...

Runtime Warning and unchanged loss

Hi,
I am facing two problems when using ListNet from this code with the LETOR dataset.
Problem 1:
My loss does not seem to be decreasing. This is the situation at the start:

epoch: 2
NDCG@100 | train: 0.2016394568476937, test: 0.19944033792067814

and these values are the same at epoch 200:
train mean loss=0.0
test mean loss=0.0
epoch: 201
NDCG@100 | train: 0.2016394568476937, test: 0.19944033792067814

Can you please comment on why the loss isn't changing at all?

Problem 2:
Chainer math warnings in log functions. Can you please tell me how to get rid of the following runtime warnings?
..Anaconda3\lib\site-packages\chainer\functions\math\exponential.py:47: RuntimeWarning: divide by zero encountered in log
return utils.force_array(numpy.log(x[0])),
..Anaconda3\lib\site-packages\chainer\functions\math\exponential.py:47: RuntimeWarning: invalid value encountered in log
return utils.force_array(numpy.log(x[0])),
..Anaconda3\lib\site-packages\chainer\functions\math\basic_math.py:240: RuntimeWarning: invalid value encountered in multiply
return utils.force_array(x[0] * x[1]),

Thank you
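For context on the warnings quoted above: in NumPy, `log(0)` gives `-inf` with a "divide by zero" warning, and `0 * -inf` gives `nan` with an "invalid value" warning. The sketch below reproduces that on toy values and shows one common workaround, clipping probabilities away from zero before taking the log. Whether this is the cause in the ListNet code is an assumption; `safe_cross_entropy` and `eps` are hypothetical names.

```python
import numpy as np

def safe_cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross-entropy with predictions clipped to [eps, 1] so log never sees 0.

    Hypothetical helper for illustration; eps is an arbitrary small constant.
    """
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0)
    return float(-np.sum(p_true * np.log(p_pred)))

p_true = np.array([0.0, 1.0])
p_pred = np.array([0.0, 1.0])

# Naive version: log(0) = -inf, then 0 * -inf = nan (warnings silenced here).
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.sum(p_true * np.log(p_pred))

print(naive, safe_cross_entropy(p_true, p_pred))
```

Once a `nan` enters the loss it propagates through every later update, which would also be consistent with scores or losses that stop changing.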

setup and installation

Can you please guide me on how to do the installation?
The README does not mention anything about setup. I would appreciate it if you could share how to install it.

Merge from betterenvi repo

Hi @betterenvi
thank you for your modification of this repo!
I looked through your repository (and commits), and think it is a nice change. Could you send a PR to this master branch, if it doesn't bother you?
Thank you for reading!

input/output

Hi,
Please explain the input and output formats required for this. Is the output of RankNet a probability or a rank?
Please clarify.

Does this work with (almost) binary y's?

Hi,

Thanks so much for making your code available online!

I had a question: does your approach work if the y's are almost binary (very close to 0 or very close to 1)? Because I tried it and when I did

 from learning2rank.rank import RankNet, ListNet
 Model = RankNet.RankNet()
 Model.fit(X, y)
 predy = Model.predict(X)

 np.min(predy),np.max(predy)

I got 0.0, 0.0.

My X data consist of 6 features (float rankings of objects according to 6 different approaches), and about 100,000 rows. The y's are close to either 0 or 1, depending on whether the objects appeared in a gold standard dataset. I am not sure if the code is designed to work for this type of setup?

Thank you!

model initiating error

Hi, I am applying the model to the MovieLens dataset and am facing an issue with model training.
When I start training on the dataset, it raises the following error:

`InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: x.shape[1] == W.shape[1]
Actual: 5 != 950198`

The complete output of model.fit(X,y) is as follows:

`load dataset
The number of data, train: 950198 validate: 50011
prepare initialized model!

0%| | 0/5000 [00:00<?, ?it/s]


InvalidType Traceback (most recent call last)
in
----> 1 model.fit(X,y)

C:/Users/ppawar/Desktop/Genesys_PDP_code/ml-1m/learning2rank/rank\RankNet.py in fit(self, fit_X, fit_y, batchsize, n_iter, n_units1, n_units2, tv_ratio, optimizerAlgorithm, savefigName, savemodelName)
131 self.initializeModel(Model, train_X, n_units1, n_units2, optimizerAlgorithm)
132
--> 133 self.trainModel(train_X, train_y, validate_X, validate_y, n_iter)
134
135 plot_result.acc(self.train_loss, self.test_loss, savename=savefigName)

C:/Users/ppawar/Desktop/Genesys_PDP_code/ml-1m/learning2rank/rank\RankNet.py in trainModel(self, x_train, y_train, x_test, y_test, n_iter)
111 y_j = chainer.Variable(y_train[j])
112
--> 113 self.optimizer.update(self.model, x_i, x_j, y_i, y_j)
114
115 if (step + 1) % loss_step == 0:

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\optimizer.py in update(self, lossfun, *args, **kwds)
678 if lossfun is not None:
679 use_cleargrads = getattr(self, '_use_cleargrads', True)
--> 680 loss = lossfun(*args, **kwds)
681 if use_cleargrads:
682 self.target.cleargrads()

C:/Users/ppawar/Desktop/Genesys_PDP_code/ml-1m/learning2rank/rank\RankNet.py in call(self, x_i, x_j, t_i, t_j)
35 )
36 def call(self, x_i, x_j, t_i, t_j):
---> 37 s_i = self.l3(F.relu(self.l2(F.relu(self.l1(x_i)))))
38 s_j = self.l3(F.relu(self.l2(F.relu(self.l1(x_j)))))
39 s_diff = s_i - s_j

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\link.py in call(self, *args, **kwargs)
240 if forward is None:
241 forward = self.forward
--> 242 out = forward(*args, **kwargs)
243
244 # Call forward_postprocess hook

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\links\connection\linear.py in forward(self, x, n_batch_axes)
136 in_size = functools.reduce(operator.mul, x.shape[1:], 1)
137 self._initialize_params(in_size)
--> 138 return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\functions\connection\linear.py in linear(x, W, b, n_batch_axes)
286 args = x, W, b
287
--> 288 y, = LinearFunction().apply(args)
289 if n_batch_axes > 1:
290 y = y.reshape(batch_shape + (-1,))

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\function_node.py in apply(self, inputs)
243
244 if configuration.config.type_check:
--> 245 self._check_data_type_forward(in_data)
246
247 hooks = chainer.get_function_hooks()

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\function_node.py in _check_data_type_forward(self, in_data)
328 in_type = type_check.get_types(in_data, 'in_types', False)
329 with type_check.get_function_check_context(self):
--> 330 self.check_type_forward(in_type)
331
332 def check_type_forward(self, in_types):

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\functions\connection\linear.py in check_type_forward(self, in_types)
25 x_type.ndim == 2,
26 w_type.ndim == 2,
---> 27 x_type.shape[1] == w_type.shape[1],
28 )
29 if type_check.eval(n_in) == 3:

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\utils\type_check.py in expect(*bool_exprs)
544 for expr in bool_exprs:
545 assert isinstance(expr, Testable)
--> 546 expr.expect()
547
548

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\utils\type_check.py in expect(self)
481 raise InvalidType(
482 '{0} {1} {2}'.format(self.lhs, self.exp, self.rhs),
--> 483 '{0} {1} {2}'.format(left, self.inv, right))
484
485

InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: x.shape[1] == W.shape[1]
Actual: 5 != 950198
`

listwise_cost

I want to train ListNet to re-rank retrieved documents, but I got this error:
Traceback (most recent call last):
File "ranking/rank/train.py", line 88, in
model.fit(X_train, y_train, X_test, y_test, Query, Query2, batchsize, n_epoch, n_hidden1, n_hidden2)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 193, in fit
self.trainModel(train_X, train_y, validate_X, validate_y, query, query_validate, n_epoch, batchsize)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 160, in trainModel
self.optimizer.update(self.model, x, t)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/optimizer.py", line 392, in update
loss = lossfun(*args, **kwds)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 61, in call
self.loss = self.listwise_cost(y, t)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 91, in listwise_cost
return - np.sum(self.topkprob(list_ans) * np.log(self.topkprob(list_pred)))
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 85, in topkprob
vec_sort = np.sort(vec)[-1::-1]
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 824, in sort
a = asanyarray(a).copy(order="K")
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 533, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/functions/array/get_item.py", line 71, in get_item
return GetItem(slices)(x)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/function.py", line 189, in call
self._check_data_type_forward(in_data)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/function.py", line 271, in _check_data_type_forward
six.raise_from(
AttributeError: 'module' object has no attribute 'raise_from'

IndexError: list index out of range

I wrote the following code

import numpy as np

import utils, rank, regression

from rank import RankNet

Model = RankNet.RankNet()

X = np.array([[1, 2], [2, 3], [4, 5], [1, 3], [0, 0]]);
y = np.array([1, 2, 3, 4, 5]);
Model.fit(X, y);
Model.predict(X);

and got the following error. Could you please help me with it?

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Users/igladush/opensource-projects/learning2rank/__init__.py", line 3, in <module>
    import utils, rank, regression
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/igladush/opensource-projects/learning2rank/rank/__init__.py", line 1, in <module>
    import ListNet, RankNet
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/igladush/opensource-projects/learning2rank/rank/ListNet.py", line 18, in <module>
    from learning2rank.utils import plot_result
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/__init__.py", line 11, in <module>
    Model.fit(X, y);
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/rank/RankNet.py", line 123, in fit
    self.trainModel(train_X, train_y, validate_X, validate_y, n_iter)
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/rank/RankNet.py", line 108, in trainModel
    train_ndcg = self.ndcg(y_train, train_score)
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/rank/RankNet.py", line 84, in ndcg
    ideal_dcg += (2 ** y_true_sorted[i] - 1.) / np.log2(i + 2)
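The last frame accumulates the ideal DCG one position at a time. Below is a sketch of NDCG@k with the same gain and discount, `(2 ** rel - 1) / log2(i + 2)`, that clamps k to the number of items. It assumes, without confirmation from the source, that the IndexError arises when k (other output on this page shows NDCG@100) exceeds the number of samples, as it would with the five-row toy data above. The function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (k is clamped)."""
    k = min(k, len(relevances))   # avoid indexing past the end of the list
    return sum((2 ** relevances[i] - 1.0) / math.log2(i + 2) for i in range(k))

def ndcg_at_k(y_true, y_score, k=100):
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order."""
    order = sorted(range(len(y_true)), key=lambda i: y_score[i], reverse=True)
    ranked = [y_true[i] for i in order]
    ideal_dcg = dcg_at_k(sorted(y_true, reverse=True), k)
    if ideal_dcg == 0.0:
        return 0.0                # all relevances are zero
    return dcg_at_k(ranked, k) / ideal_dcg

# Works even though k=100 is larger than the 3 items supplied.
print(ndcg_at_k([3, 2, 1], [0.9, 0.8, 0.1]))
print(ndcg_at_k([3, 2, 1], [0.1, 0.8, 0.9]))
```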

Getting following error while running this code

RESTART: G:\Implementation of the Project\Learning to Rank Algorithms\Python\ListNet+RankNet\learning2rank-master\rank\ListNet.py
Traceback (most recent call last):
File "G:\Implementation of the Project\Learning to Rank Algorithms\Python\ListNet+RankNet\learning2rank-master\rank\ListNet.py", line 18, in
from learning2rank.utils import NNfuncs
ModuleNotFoundError: No module named 'learning2rank'

simple regression example

I tried a simple regression example,
but I received the following error messages.

import sys, os
import numpy as np
from learning2rank.regression import NN

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
y = np.array([[1], [2], [3]])

Model = NN.NN()
Model.fit(X, y)

X = np.array([[1, 1, 1]])
Model.predict(X)


AssertionError Traceback (most recent call last)
in ()
11
12 X = np.array([[1, 1, 1]])
---> 13 Model.predict(X)

/root/learning2rank/utils/NNfuncs.pyc in predict(self, predict_X)
66
67 def predict(self, predict_X):
---> 68 return self.model.predict(predict_X.astype(np.float32))
69
70 # def predict(self, predict_X, batchsize=100):

/root/learning2rank/regression/NN.pyc in predict(self, x)
41
42 def predict(self, x):
---> 43 h1 = F.relu(self.l1(x))
44 h2 = F.relu(self.l2(h1))
45 h = F.relu(self.l3(h2))

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/links/connection/linear.pyc in call(self, x)
63
64 """
---> 65 return linear.linear(x, self.W, self.b)

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/functions/connection/linear.pyc in linear(x, W, b)
79 return LinearFunction()(x, W)
80 else:
---> 81 return LinearFunction()(x, W, b)

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/function.pyc in call(self, *inputs)
100 in_data = tuple([x.data for x in inputs])
101 if self.type_check_enable:
--> 102 self._check_data_type_forward(in_data)
103 # Forward prop
104 with cuda.get_device(*in_data):

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/function.pyc in _check_data_type_forward(self, in_data)
134
135 def _check_data_type_forward(self, in_data):
--> 136 in_type = type_check.get_types(in_data, 'in_types', False)
137 try:
138 self.check_type_forward(in_type)

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/utils/type_check.pyc in get_types(data, name, accept_none)
44
45 info = TypeInfoTuple(
---> 46 _get_type(name, i, x, accept_none) for i, x in enumerate(data))
47 # I don't know a method to set an attribute in an initializer of tuple.
48 info.name = name

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/utils/type_check.pyc in ((i, x))
44
45 info = TypeInfoTuple(
---> 46 _get_type(name, i, x, accept_none) for i, x in enumerate(data))
47 # I don't know a method to set an attribute in an initializer of tuple.
48 info.name = name

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/utils/type_check.pyc in _get_type(name, index, array, accept_none)
58
59 assert(isinstance(array, numpy.ndarray) or
---> 60 isinstance(array, cuda.ndarray))
61 return Variable(TypeInfo(array.shape, array.dtype), var)
62

AssertionError:

ListNet Loss Function

How is the loss function computed? Is using only one sequence enough, or do we need to use different sequences to get the loss?

How to run/use this code?

I did the following :

$ git clone  https://github.com/shiba24/learning2rank.git
$ python learning2rank/__init__.py
Traceback (most recent call last):
  File "learning2rank/__init__.py", line 1, in <module>
    import utils, rank, regression
  File "/mnt/E4481D43481D1640/Various/books/ML-Course/pw/learning2rank/rank/__init__.py", line 1, in <module>
    import ListNet, RankNet
ModuleNotFoundError: No module named 'ListNet'

Shows this error.

What should I do?

ListNet predict nan

Here's my example code:

import numpy as np
from learning2rank.rank import RankNet, ListNet
Model = ListNet.ListNet()
X = np.array([[1, 2], [2, 3], [4, 5], [1, 3], [0, 0]])
y = np.array([1, 2, 3, 4, 5])
Model.fit(X, y)
score = Model.predict(X)
print(score)

I get the score as:
[[nan]
[nan]
[nan]
[nan]
[nan]]

and I have changed the loss function as
def ndcg(self, y_true, y_score, k=1):
