shiba24 / learning2rank

Learning to rank with neuralnet - RankNet and ListNet

learning2rank's Introduction

Learning to Rank

A simple implementation of learning-to-rank algorithms, covering the pairwise (RankNet) and listwise (ListNet) approaches. A simple neural-network regression of the score is also implemented. [Contributions welcome!]

Requirements

python 2.7
tqdm
matplotlib v1.5.1
numpy v1.13+
scipy
chainer v1.5.1+
scikit-learn
and some basic packages.

RankNet

Pairwise comparison of rank

The original paper was written by Chris Burges et al., "Learning to Rank using Gradient Descent." (available at http://research.microsoft.com/en-us/um/people/cburges/papers/ICML_ranking.pdf)
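As a rough illustration of the pairwise idea, here is a sketch of the loss described in the paper. It is not the code in this repository, and the function names are made up for illustration. For two items i and j with model scores s_i and s_j, the modelled probability that i ranks above j is sigmoid(s_i - s_j), which is trained with cross-entropy against a target probability.

```python
import math

def pairwise_probability(s_i, s_j):
    """Modelled probability that item i should rank above item j."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def ranknet_pair_loss(s_i, s_j, y_i, y_j):
    """Cross-entropy between the target and the modelled pair probability."""
    # Target: 1 if i is truly better, 0 if it is worse, 0.5 if tied.
    if y_i > y_j:
        target = 1.0
    elif y_i < y_j:
        target = 0.0
    else:
        target = 0.5
    p = pairwise_probability(s_i, s_j)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# Item i has the higher true score (3 vs 1).
good_order = ranknet_pair_loss(2.0, 0.5, y_i=3, y_j=1)  # model agrees
bad_order = ranknet_pair_loss(0.5, 2.0, y_i=3, y_j=1)   # model disagrees
print(good_order < bad_order)  # True
```

The loss is small when the predicted order agrees with the true order and grows as the model becomes confidently wrong. Very large score differences would need a numerically stable formulation, which this sketch omits.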

Usage

Import and initialize

from learning2rank.rank import RankNet
Model = RankNet.RankNet()

Fitting (training and validation run automatically)

Model.fit(X, y)

Here, X is a numpy array of shape (num_samples, num_features) and y is a numpy array of shape (num_samples,). y is the score you would like to rank by (e.g., product sales, page views, etc.).
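A minimal sketch of what such inputs could look like. The feature values, the scores, and the float32 dtype are assumptions chosen for illustration, not requirements stated by the library.

```python
import numpy as np

# Hypothetical data: 5 products, each described by 3 features.
X = np.array([
    [0.2, 10.0, 1.0],
    [0.5,  3.0, 0.0],
    [0.9,  7.5, 1.0],
    [0.1,  1.0, 0.0],
    [0.7,  4.2, 1.0],
], dtype=np.float32)

# One score per row of X (e.g. units sold); a higher score should rank higher.
y = np.array([120.0, 15.0, 300.0, 2.0, 80.0], dtype=np.float32)

assert X.shape == (5, 3)  # (num_samples, num_features)
assert y.shape == (5,)    # (num_samples,)

# Row indices in the order implied by y, best first.
order = np.argsort(-y)
print(order)  # [2 0 4 1 3]
```

Arrays shaped like this can then be passed to Model.fit(X, y). Note that y is one-dimensional, not a column vector of shape (num_samples, 1).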

Possible options and defaults:

batchsize=100, n_iter=5000, n_units1=512, n_units2=128, tv_ratio=0.95, optimizerAlgorithm="Adam", savefigName="result.pdf", savemodelName="RankNet.model"

n_units1 and n_units2 are the numbers of nodes in hidden layers 1 and 2 of the neural net.

tv_ratio is the fraction of the data used for training; the rest is used for validation.
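A small sketch of how a training fraction translates into split sizes. The helper name is hypothetical, and whether the library shuffles the data before splitting is not covered here. The numbers match the log reported in one of the issues below (train: 950198, validate: 50011 with the default tv_ratio).

```python
def split_sizes(num_samples, tv_ratio=0.95):
    """Return (n_train, n_validate) for a given training fraction."""
    n_train = int(num_samples * tv_ratio)  # truncate toward zero
    return n_train, num_samples - n_train

print(split_sizes(1000209, 0.95))  # (950198, 50011)
```

With very small datasets the validation part can shrink to a single sample or to none at all, so a lower tv_ratio may be worth trying in that case.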

Predict

Model.predict(X)

ListNet

Listwise comparison of rank

The original paper was written by Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li "Learning to Rank: From Pairwise Approach to Listwise Approach." (Available at http://research.microsoft.com/en-us/people/tyliu/listnet.pdf)

NOTICE: The top-k probability is not implemented. This is a listwise approach with neural networks that compares two arrays using the Jensen-Shannon divergence.
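To make the comparison concrete, here is a sketch of a Jensen-Shannon divergence between two score lists. It is not the repository's implementation. Normalising the scores with a softmax first is an assumption borrowed from the top-one probability in the ListNet paper, and all function names are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and always finite."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

true_scores = [3.0, 1.0, 2.0]
shifted_pred = [2.5, 0.5, 1.5]   # same ordering, constant offset
reversed_pred = [1.0, 3.0, 2.0]  # best and worst items swapped

p_true = softmax(true_scores)
print(js_divergence(p_true, softmax(shifted_pred)))
print(js_divergence(p_true, softmax(reversed_pred)))
```

The first value is (numerically) zero because a constant offset does not change the softmax, while the second is strictly positive. Skipping zero-probability terms avoids taking log(0), which would otherwise produce nan or inf values.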

Usage

Import and initialize

from learning2rank.rank import ListNet
Model = ListNet.ListNet()

Fitting (training and validation run automatically)

Model.fit(X, y)

As with RankNet, X is a numpy array of shape (num_samples, num_features) and y is a numpy array of shape (num_samples,). y is the score you would like to rank by (e.g., product sales, page views, etc.).

Possible options and defaults:

batchsize=100, n_epoch=200, n_units1=512, n_units2=128, tv_ratio=0.95, optimizerAlgorithm="Adam", savefigName="result.pdf", savemodelName="ListNet.model"

Predict

Model.predict(X)

Regression

Regression of the scores with a neural network

Usage

Import and initialize

from learning2rank.regression import NN
Model = NN.NN()

Fitting (training and validation run automatically)

Model.fit(X, y)

Possible options and defaults:

batchsize=100, n_iter=5000, n_units1=512, n_units2=128, tv_ratio=0.95, optimizerAlgorithm="Adam", savefigName="result.pdf", savemodelName="RankNet.model"

n_units1 and n_units2 are the numbers of nodes in hidden layers 1 and 2 of the neural net.

tv_ratio is the fraction of the data used for training; the rest is used for validation.

Predict

Model.predict(X)

Author

If you have any trouble or questions, please contact shiba24.

March, 2016

learning2rank's Issues

IndexError: list index out of range

I wrote the following code

import numpy as np

import utils, rank, regression

from rank import RankNet

Model = RankNet.RankNet()

X = np.array([[1, 2], [2, 3], [4, 5], [1, 3], [0, 0]])
y = np.array([1, 2, 3, 4, 5])
Model.fit(X, y)
Model.predict(X)

and got the following error. Could you please help me with it?

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Users/igladush/opensource-projects/learning2rank/__init__.py", line 3, in <module>
    import utils, rank, regression
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/igladush/opensource-projects/learning2rank/rank/__init__.py", line 1, in <module>
    import ListNet, RankNet
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/igladush/opensource-projects/learning2rank/rank/ListNet.py", line 18, in <module>
    from learning2rank.utils import plot_result
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/__init__.py", line 11, in <module>
    Model.fit(X, y);
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/rank/RankNet.py", line 123, in fit
    self.trainModel(train_X, train_y, validate_X, validate_y, n_iter)
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/rank/RankNet.py", line 108, in trainModel
    train_ndcg = self.ndcg(y_train, train_score)
  File "/Users/igladush/opensource-projects/learning2rank/rank/../../learning2rank/rank/RankNet.py", line 84, in ndcg
    ideal_dcg += (2 ** y_true_sorted[i] - 1.) / np.log2(i + 2)
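One possible cause, not confirmed against the repository code: the failing line sums an ideal DCG over the first k positions, and if k is larger than the number of samples (five here, fewer after the train/validation split) the index runs past the end of the list. Below is a minimal sketch of an NDCG@k that clamps k to the list length. The function name and the default k are assumptions; only the gain and discount formula is taken from the traceback line above.

```python
import math

def ndcg_at_k(y_true, y_score, k=100):
    """NDCG@k with k clamped to the number of items."""
    k = min(k, len(y_true))  # never index past the end of the list
    # True labels ordered by predicted score, best first.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    by_score = [label for _, label in ranked]
    ideal = sorted(y_true, reverse=True)

    def dcg(labels):
        # Same gain and discount as in the traceback: (2^rel - 1) / log2(i + 2)
        return sum((2 ** labels[i] - 1.0) / math.log(i + 2, 2) for i in range(k))

    ideal_dcg = dcg(ideal)
    return dcg(by_score) / ideal_dcg if ideal_dcg > 0 else 0.0

# Five items only, yet k=100 no longer raises IndexError.
print(ndcg_at_k([1, 2, 3, 4, 5], [0.1, 0.2, 0.3, 0.4, 0.5]))  # 1.0
```

Training on more than a handful of samples would sidestep the problem as well, since the split would then leave at least k items on each side.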

setup and installation

Can you please guide me on how to do the installation?
The readme does not mention anything about setup. I would appreciate it if you could share how to install the package.

How to input query and document?

Do you put the query and document data into X? I can't understand how to input my query and document data to train the model.

model initiating error

Hi, I am applying the model to the MovieLens dataset and am facing an issue with model training.
When I start training on the dataset, it raises the following error:

`InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: x.shape[1] == W.shape[1]
Actual: 5 != 950198`

The complete output of model.fit(X,y) is as follows:

`load dataset
The number of data, train: 950198 validate: 50011
prepare initialized model!

0%| | 0/5000 [00:00<?, ?it/s]

InvalidType Traceback (most recent call last)
in
----> 1 model.fit(X,y)

C:/Users/ppawar/Desktop/Genesys_PDP_code/ml-1m/learning2rank/rank\RankNet.py in fit(self, fit_X, fit_y, batchsize, n_iter, n_units1, n_units2, tv_ratio, optimizerAlgorithm, savefigName, savemodelName)
131 self.initializeModel(Model, train_X, n_units1, n_units2, optimizerAlgorithm)
132
--> 133 self.trainModel(train_X, train_y, validate_X, validate_y, n_iter)
134
135 plot_result.acc(self.train_loss, self.test_loss, savename=savefigName)

C:/Users/ppawar/Desktop/Genesys_PDP_code/ml-1m/learning2rank/rank\RankNet.py in trainModel(self, x_train, y_train, x_test, y_test, n_iter)
111 y_j = chainer.Variable(y_train[j])
112
--> 113 self.optimizer.update(self.model, x_i, x_j, y_i, y_j)
114
115 if (step + 1) % loss_step == 0:

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\optimizer.py in update(self, lossfun, *args, **kwds)
678 if lossfun is not None:
679 use_cleargrads = getattr(self, '_use_cleargrads', True)
--> 680 loss = lossfun(*args, **kwds)
681 if use_cleargrads:
682 self.target.cleargrads()

C:/Users/ppawar/Desktop/Genesys_PDP_code/ml-1m/learning2rank/rank\RankNet.py in call(self, x_i, x_j, t_i, t_j)
35 )
36 def call(self, x_i, x_j, t_i, t_j):
---> 37 s_i = self.l3(F.relu(self.l2(F.relu(self.l1(x_i)))))
38 s_j = self.l3(F.relu(self.l2(F.relu(self.l1(x_j)))))
39 s_diff = s_i - s_j

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\link.py in call(self, *args, **kwargs)
240 if forward is None:
241 forward = self.forward
--> 242 out = forward(*args, **kwargs)
243
244 # Call forward_postprocess hook

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\links\connection\linear.py in forward(self, x, n_batch_axes)
136 in_size = functools.reduce(operator.mul, x.shape[1:], 1)
137 self._initialize_params(in_size)
--> 138 return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\functions\connection\linear.py in linear(x, W, b, n_batch_axes)
286 args = x, W, b
287
--> 288 y, = LinearFunction().apply(args)
289 if n_batch_axes > 1:
290 y = y.reshape(batch_shape + (-1,))

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\function_node.py in apply(self, inputs)
243
244 if configuration.config.type_check:
--> 245 self._check_data_type_forward(in_data)
246
247 hooks = chainer.get_function_hooks()

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\function_node.py in _check_data_type_forward(self, in_data)
328 in_type = type_check.get_types(in_data, 'in_types', False)
329 with type_check.get_function_check_context(self):
--> 330 self.check_type_forward(in_type)
331
332 def check_type_forward(self, in_types):

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\functions\connection\linear.py in check_type_forward(self, in_types)
25 x_type.ndim == 2,
26 w_type.ndim == 2,
---> 27 x_type.shape[1] == w_type.shape[1],
28 )
29 if type_check.eval(n_in) == 3:

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\utils\type_check.py in expect(*bool_exprs)
544 for expr in bool_exprs:
545 assert isinstance(expr, Testable)
--> 546 expr.expect()
547
548

~\AppData\Local\Continuum\anaconda3\envs\tensorflow_gpu_keras\lib\site-packages\chainer\utils\type_check.py in expect(self)
481 raise InvalidType(
482 '{0} {1} {2}'.format(self.lhs, self.exp, self.rhs),
--> 483 '{0} {1} {2}'.format(left, self.inv, right))
484
485

InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: x.shape[1] == W.shape[1]
Actual: 5 != 950198
`

Invalid value encountered

Hi,

thanks a lot for sharing the code!

I am trying to use ListNet to learn a ranking problem:

import numpy as np
import random

from learning2rank.rank import ListNet

n = 1000
d = 200
X = np.random.rand(n,d)
y = np.random.rand(n)

model = ListNet.ListNet()
random.seed(1313)
model.fit(X, y, 
          batchsize=16,
          n_epoch=10,
          n_units1=256,
          n_units2=256,
          tv_ratio=0.67,
          optimizerAlgorithm="Adam",
          savefigName="result.pdf",
          savemodelName="ListNet.model")

However, I run into the following error messages.

Start training and validation loop......
epoch 1
0%| | 0/42 [00:00<?, ?it/s]C:\ProgramData\Anaconda3\lib\site-packages\chainer\functions\math\exponential.py:51: RuntimeWarning: invalid value encountered in log
return utils.force_array(numpy.log(x[0])),
C:\ProgramData\Anaconda3\lib\site-packages\chainer\functions\activation\relu.py:38: RuntimeWarning: invalid value encountered in maximum
return utils.force_array(numpy.maximum(x, 0, dtype=x.dtype)),
C:\ProgramData\Anaconda3\lib\site-packages\learning2rank\rank\ListNet.py:58: RuntimeWarning: invalid value encountered in greater
ind = vec_true.data * vec_compare.data > 0
C:\ProgramData\Anaconda3\lib\site-packages\chainer\functions\activation\relu.py:97: RuntimeWarning: invalid value encountered in greater

Any idea what's going on here? I am clueless...

How to specify different queries?

I don't know how to set up the parameters for different queries.

Does this work with (almost) binary y's?

Hi,

Thanks so much for making your code available online!

I had a question: does your approach work if the y's are almost binary (very close to 0 or very close to 1)? Because I tried it and when I did

 from learning2rank.rank import RankNet, ListNet
 Model = RankNet.RankNet()
 Model.fit(X, y)
 predy = Model.predict(X)

 np.min(predy),np.max(predy)

I got 0.0, 0.0.

My X data consist of 6 features (float rankings of objects according to 6 different approaches), and about 100,000 rows. The y's are close to either 0 or 1, depending on whether the objects appeared in a gold standard dataset. I am not sure if the code is designed to work for this type of setup?

Thank you!

ListNet Loss Function

How is the loss function computed? Is it enough to use only one sequence, or do we need several different sequences to get the loss?

Runtime Warning and unchanged loss

Hi
I am facing two problems when using ListNet from this code with the LETOR dataset.
Problem 1:
My loss does not seem to be decreasing. This is the situation at the start:

epoch: 2
NDCG@100 | train: 0.2016394568476937, test: 0.19944033792067814

and these values are the same at epoch 200:
train mean loss=0.0
test mean loss=0.0
epoch: 201
NDCG@100 | train: 0.2016394568476937, test: 0.19944033792067814

Can you please comment on why the loss isn't changing at all?

Problem 2:
Chainer emits runtime warnings from its math log functions. Can you please tell me how to get rid of the following warnings?
..Anaconda3\lib\site-packages\chainer\functions\math\exponential.py:47: RuntimeWarning: divide by zero encountered in log
return utils.force_array(numpy.log(x[0])),
..Anaconda3\lib\site-packages\chainer\functions\math\exponential.py:47: RuntimeWarning: invalid value encountered in log
return utils.force_array(numpy.log(x[0])),
..Anaconda3\lib\site-packages\chainer\functions\math\basic_math.py:240: RuntimeWarning: invalid value encountered in multiply
return utils.force_array(x[0] * x[1]),

Thank you

listwise_cost

I want to train ListNet to re-rank retrieved documents, but I got this error:
Traceback (most recent call last):
File "ranking/rank/train.py", line 88, in
model.fit(X_train, y_train, X_test, y_test, Query, Query2, batchsize, n_epoch, n_hidden1, n_hidden2)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 193, in fit
self.trainModel(train_X, train_y, validate_X, validate_y, query, query_validate, n_epoch, batchsize)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 160, in trainModel
self.optimizer.update(self.model, x, t)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/optimizer.py", line 392, in update
loss = lossfun(*args, **kwds)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 61, in call
self.loss = self.listwise_cost(y, t)
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 91, in listwise_cost
return - np.sum(self.topkprob(list_ans) * np.log(self.topkprob(list_pred)))
File "/home/ama/mhidy/nn_ranking/RankNet_chainer/ranking/rank/ListNet.py", line 85, in topkprob
vec_sort = np.sort(vec)[-1::-1]
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 824, in sort
a = asanyarray(a).copy(order="K")
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 533, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/functions/array/get_item.py", line 71, in get_item
return GetItem(slices)(x)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/function.py", line 189, in call
self._check_data_type_forward(in_data)
File "/home/ama/mhidy/.local/lib/python2.7/site-packages/chainer/function.py", line 271, in _check_data_type_forward
six.raise_from(
AttributeError: 'module' object has no attribute 'raise_from'

ListNet predict nan

here's my example code:

import numpy as np
from learning2rank.rank import RankNet, ListNet
Model = ListNet.ListNet()
X = np.array([[1, 2], [2, 3], [4, 5], [1, 3], [0, 0]])
y = np.array([1, 2, 3, 4, 5])
Model.fit(X, y)
score = Model.predict(X)
print(score)

I get the score as:
[[nan]
[nan]
[nan]
[nan]
[nan]]

and I have changed the loss function as
def ndcg(self, y_true, y_score, k=1):

Merge from betterenvi repo

Hi @betterenvi
thank you for your modification of this repo!
I looked through your repository (and commits), and think it is a nice change. Could you send a PR to this master branch, if it doesn't bother you?
Thank you for reading!

Any possibility to port and continue this project on Python 3.x?

What data format is expected for X and y?

Can you give an example? @shiba24

input/output

Hi
Please explain the input and output formats required for this. Is the output of RankNet a probability or a rank?
Please clarify.

Examples of vector X and Y

Thank you for your code. But sorry, I don't understand how to use it. Could you please explain how to set up the training data in the vectors X and y? Could you please provide more details?

How to run/use this code?

I did the following :

$ git clone  https://github.com/shiba24/learning2rank.git
$ python learning2rank/__init__.py
Traceback (most recent call last):
  File "learning2rank/__init__.py", line 1, in <module>
    import utils, rank, regression
  File "/mnt/E4481D43481D1640/Various/books/ML-Course/pw/learning2rank/rank/__init__.py", line 1, in <module>
    import ListNet, RankNet
ModuleNotFoundError: No module named 'ListNet'

Shows this error.

What should I do?

Installation & Setup

Could you tell me how to install this package?

Oops, why does the Python shell behave this way after I run ListNet.py?

I get the following error while running this code:

RESTART: G:\Implementation of the Project\Learning to Rank Algorithms\Python\ListNet+RankNet\learning2rank-master\rank\ListNet.py
Traceback (most recent call last):
File "G:\Implementation of the Project\Learning to Rank Algorithms\Python\ListNet+RankNet\learning2rank-master\rank\ListNet.py", line 18, in
from learning2rank.utils import NNfuncs
ModuleNotFoundError: No module named 'learning2rank'
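A possible workaround, assuming the repository was downloaded but never installed: `from learning2rank.utils import ...` only resolves when the directory that contains the package folder is on `sys.path`, and the folder itself has to be named `learning2rank` (in the path above it is `learning2rank-master`). The helper name and the example location below are hypothetical.

```python
import os
import sys

def add_package_parent(package_dir):
    """Put the directory that contains package_dir on sys.path."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent

# Hypothetical location of the renamed clone; adjust to your own path.
parent = add_package_parent(os.path.join("projects", "learning2rank"))
print(parent in sys.path)  # True
```

After this, `from learning2rank.rank import ListNet` can at least be located. Note that `ModuleNotFoundError` only exists in Python 3, while the README lists Python 2.7 as a requirement, so further import errors may follow under Python 3.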

simple regression example

I tried a simple regression example,
but I received the following error messages.

import sys, os
import numpy as np
from learning2rank.regression import NN

X = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
y = np.array([[1], [2], [3]])

Model = NN.NN()
Model.fit(X, y)

X = np.array([[1, 1, 1]])
Model.predict(X)

AssertionError Traceback (most recent call last)
in ()
11
12 X = np.array([[1, 1, 1]])
---> 13 Model.predict(X)

/root/learning2rank/utils/NNfuncs.pyc in predict(self, predict_X)
66
67 def predict(self, predict_X):
---> 68 return self.model.predict(predict_X.astype(np.float32))
69
70 # def predict(self, predict_X, batchsize=100):

/root/learning2rank/regression/NN.pyc in predict(self, x)
41
42 def predict(self, x):
---> 43 h1 = F.relu(self.l1(x))
44 h2 = F.relu(self.l2(h1))
45 h = F.relu(self.l3(h2))

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/links/connection/linear.pyc in call(self, x)
63
64 """
---> 65 return linear.linear(x, self.W, self.b)

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/functions/connection/linear.pyc in linear(x, W, b)
79 return LinearFunction()(x, W)
80 else:
---> 81 return LinearFunction()(x, W, b)

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/function.pyc in call(self, *inputs)
100 in_data = tuple([x.data for x in inputs])
101 if self.type_check_enable:
--> 102 self._check_data_type_forward(in_data)
103 # Forward prop
104 with cuda.get_device(*in_data):

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/function.pyc in _check_data_type_forward(self, in_data)
134
135 def _check_data_type_forward(self, in_data):
--> 136 in_type = type_check.get_types(in_data, 'in_types', False)
137 try:
138 self.check_type_forward(in_type)

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/utils/type_check.pyc in get_types(data, name, accept_none)
44
45 info = TypeInfoTuple(
---> 46 _get_type(name, i, x, accept_none) for i, x in enumerate(data))
47 # I don't know a method to set an attribute in an initializer of tuple.
48 info.name = name

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/utils/type_check.pyc in ((i, x))
44
45 info = TypeInfoTuple(
---> 46 _get_type(name, i, x, accept_none) for i, x in enumerate(data))
47 # I don't know a method to set an attribute in an initializer of tuple.
48 info.name = name

/root/.pyenv/versions/anaconda3-5.2.0/envs/python27/lib/python2.7/site-packages/chainer/utils/type_check.pyc in _get_type(name, index, array, accept_none)
58
59 assert(isinstance(array, numpy.ndarray) or
---> 60 isinstance(array, cuda.ndarray))
61 return Variable(TypeInfo(array.shape, array.dtype), var)
62

AssertionError:

It looks like just a logistic package

How to deal with many queries?
