
jma127 / pyltr

461 stars · 17 watchers · 106 forks · 80 KB

Python learning to rank (LTR) toolkit

License: BSD 3-Clause "New" or "Revised" License

Python 99.78% Shell 0.22%
machine-learning machine-learning-library machine-learning-algorithms learning-to-rank

pyltr's People

Contributors

ducthienbui97, forin-xyz, jma127, soldni, songweige, stiebels, wararaki718

pyltr's Issues

MovieLens Dataset

Hi, I am trying to apply this model to the MovieLens dataset, but I am stuck because the significance and meaning of the "qid" parameter is not clear. I am not sure how we can generate these qids for the MovieLens dataset. You can have a look at the data (MovieLens 1M). Please suggest ways to apply your LambdaMART model to this dataset.
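A minimal sketch of one common interpretation (this is an assumption about the data, not pyltr API): in LETOR-style datasets, qid simply groups all rows judged for the same query. For MovieLens you could treat each user as a "query" and each rated movie as a "document", with the rating as the relevance label:

import numpy as np
import pandas as pd

# Column names below assume the MovieLens 1M ratings file (user::movie::rating::timestamp).
ratings = pd.read_csv('ratings.dat', sep='::', engine='python',
                      names=['user_id', 'movie_id', 'rating', 'timestamp'])

# pyltr generally expects rows belonging to the same qid to be contiguous, so sort first.
ratings = ratings.sort_values('user_id')

qids = ratings['user_id'].to_numpy()   # one "query" per user
y = ratings['rating'].to_numpy()       # relevance label = rating
# X would be whatever per-(user, movie) feature matrix you engineer
# (genres, popularity, user averages, ...), aligned row-for-row with qids and y.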

OverflowError: math range error

I am trying to train on the MQ2007-list dataset.

import pyltr

with open('/home/shivaraj/Downloads/MQ2007-list/Fold1/train.txt') as trainfile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/vali.txt') as valifile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/test.txt') as evalfile:
    TX, Ty, Tqids, T_ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, V_ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, E_ = pyltr.data.letor.read_dataset(evalfile)

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)

This is the error log:

OverflowError Traceback (most recent call last)
in
16 )
17
---> 18 model.fit(TX, Ty, Tqids, monitor=monitor)

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in fit(self, X, y, qids, monitor)
199
200 n_stages = self.fit_stages(X, y, qids, y_pred,
--> 201 random_state, begin_at_stage, monitor)
202
203         if n_stages < self.estimators_.shape[0]:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stages(self, X, y, qids, y_pred, random_state, begin_at_stage, monitor)
406 y_pred = self._fit_stage(i, X, y, qids, y_pred, sample_weight,
407 sample_mask, query_groups_to_use,
--> 408 random_state)
409
410 train_total_score, oob_total_score = 0.0, 0.0

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stage(self, i, X, y, qids, y_pred, sample_weight, sample_mask, query_groups, random_state)
332 for qid, a, b, _ in query_groups:
333 lambdas, deltas = self._calc_lambdas_deltas(qid, y[a:b],
--> 334 y_pred[a:b])
335 all_lambdas[a:b] = lambdas
336 all_deltas[a:b] = deltas

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _calc_lambdas_deltas(self, qid, y, y_pred)
267 actual = y[positions]
268
--> 269 swap_deltas = self.metric.calc_swap_deltas(qid, actual)
270 max_k = self.metric.max_k()
271 if max_k is None or ns < max_k:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/dcg.py in calc_swap_deltas(self, qid, targets, coeff)
33 for j in range(i + 1, n_targets):
 34                 deltas[i, j] = coeff * \
---> 35                     (self._gain_fn(targets[i]) - self._gain_fn(targets[j])) * \
 36                     (self._get_discount(j) - self._get_discount(i))
37

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/gains.py in _exp2_gain(x)
16
17 def _exp2_gain(x):
---> 18 return math.exp(x * _LOG2) - 1.0
19
20

OverflowError: math range error
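For what it's worth, a likely cause and a possible workaround (an inference from the traceback, not a confirmed fix): the list-wise labels in MQ2007-list can be large, and the default exp2 gain computes 2**label - 1 via math.exp, which overflows for large labels. Since the NDCG constructor quoted elsewhere on this page takes a gain_type argument, switching to the linear gain (assumed here to be named 'identity' in pyltr/metrics/gains.py) or rescaling the labels may avoid the overflow:

# Assumed workaround: use a linear gain instead of 2**label - 1.
# 'identity' is the assumed name of the linear gain -- check pyltr/metrics/gains.py.
metric = pyltr.metrics.NDCG(k=10, gain_type='identity')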

How to upload the model trained in pyltr to ElasticSearch?

The trained ranking model needs to be dumped to a text file before it can be uploaded to Elasticsearch.
Does anyone know how to dump the trained model?
Also, how can the trained model be converted to a PMML file?

Thank you very much!
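A minimal sketch for persisting the fitted model with Python's pickle (standard library only); note that the Elasticsearch LTR plugin and PMML each expect their own model formats, so a pickled pyltr model would still need a separate conversion/export step that, as far as I know, pyltr does not provide:

import pickle

# Save the fitted LambdaMART model to disk (Python-specific format).
with open('lambdamart_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load it back later in another Python process.
with open('lambdamart_model.pkl', 'rb') as f:
    model = pickle.load(f)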

Kendall-Tau correlation

Hi Jerry,
Are you planning to implement the calc_swap_deltas method for the KT metric?
As I recall, there is a built-in function in scipy to calculate KT.
Thanks
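For reference, the scipy function mentioned above is scipy.stats.kendalltau; a small usage sketch (this is the scipy API, not part of pyltr):

from scipy.stats import kendalltau

targets = [3, 1, 2, 0]         # true relevance labels for one query
preds = [0.9, 0.2, 0.5, 0.1]   # model scores for the same documents

tau, p_value = kendalltau(targets, preds)
print(tau)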

error when running "import pyltr"

     10 
---> 11 import data
     12 import metrics
     13 import models

ImportError: No module named 'data'

I get this import error under Python 3.5.1, whether I install pyltr or import it directly from the source files.
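A guess at the cause (not verified against this exact install): the package is using Python 2-style implicit relative imports, which Python 3 removed. The fix, inside the package's own __init__.py, would be explicit relative imports:

# Hypothetical fix inside pyltr/__init__.py -- replace bare "import data" etc. with:
from . import data
from . import metrics
from . import models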

how to use GridSearchCV module in sklearn to tune parameters?

Hi, thanks for your work, which is very useful. As the title says, how can I use sklearn's GridSearchCV module to tune the parameters automatically? I tried several approaches, for example having the model class inherit from BaseEstimator and RegressorMixin. But I ran into a problem; the error is below:

File "E:/Python projects/others/lambdamart_t.py", line 105, in training_model
gscv.fit(training_set[1], training_set[2], groups=training_set[0])
File "E:\Python35\lib\site-packages\sklearn\model_selection_search.py", line 638, in fit
cv.split(X, y, groups)))
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 779, in call
while self.dispatch_one_batch(iterator):
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "E:\Python35\lib\site-packages\sklearn\externals\joblib_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "E:\Python35\lib\site-packages\sklearn\externals\joblib_parallel_backends.py", line 332, in init
self.results = batch()
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in call
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "E:\Python35\lib\site-packages\sklearn\model_selection_validation.py", line 437, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
TypeError: fit() missing 1 required positional argument: 'qids'

I also implemented the get_params() and set_params() methods in lambdamart.py, but the same error occurred again. The implementation is as follows:

def get_params(self, deep=True):
    return {'metric': self.metric,
            'learning_rate': self.learning_rate,
            'n_estimators': self.n_estimators,
            'query_subsample': self.query_subsample ,
            'subsample': self.subsample,
            'min_samples_split': self.min_samples_split,
            'min_samples_leaf': self.min_samples_leaf,
            'max_depth': self.max_depth,
            'random_state': self.random_state,
            'max_features': self.max_features,
            'verbose': self.verbose,
            'max_leaf_nodes': self.max_leaf_nodes,
            'warm_start': self.warm_start}

def set_params(self, **params):
    """Sets the parameters of this estimator.
            # Arguments
                **params: Dictionary of parameter names mapped to their values.
            # Returns
                self
    """
    for parameter, value in params.items():
        setattr(self, parameter, value)
    return self

My calling code is:

gscv = GridSearchCV(pyltr.models.LambdaMART(), params_lst, scoring=pyltr.metrics.AUCROC.calc_mean, n_jobs=1, cv=5, verbose=1)
gscv.fit(training_set[1], training_set[2], groups=training_set[0])

training_set is a 3-tuple containing training_qids, training_data and training_labels in that order, all arrays.
Looking forward to your reply, thanks very much!
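One workaround sketch (an assumption, not an official pyltr feature): GridSearchCV only ever calls fit(X, y), so the qids can be carried in the first column of X and split back out inside a small wrapper estimator. The wrapper class below is hypothetical; the constructor parameters mirror a few of LambdaMART's own arguments:

import numpy as np
import pyltr
from sklearn.base import BaseEstimator, RegressorMixin


class LambdaMARTWrapper(BaseEstimator, RegressorMixin):
    """Hypothetical wrapper: expects the qid of each row in column 0 of X."""

    def __init__(self, metric=None, n_estimators=100, learning_rate=0.1):
        self.metric = metric
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        X = np.asarray(X)
        qids, feats = X[:, 0], X[:, 1:]        # split qids from the real features
        self.model_ = pyltr.models.LambdaMART(
            metric=self.metric,
            n_estimators=self.n_estimators,
            learning_rate=self.learning_rate,
        )
        self.model_.fit(feats, y, qids)        # qids passed positionally, as LambdaMART expects
        return self

    def predict(self, X):
        X = np.asarray(X)
        return self.model_.predict(X[:, 1:])

Two caveats: the default regression scoring (R squared) is not a ranking metric, so a custom scorer is still needed, and an ordinary KFold split can cut a query across folds, so something like GroupKFold keyed on the qid column is usually preferable.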

TypeError: only integer scalar arrays can be converted to a scalar index

Hi, I am using the same code as provided in the readme with different data, and I keep getting this error. I spent a lot of time trying to fix it but had no luck. Do you have any idea what might have caused it?

File "somefile.py", line 691, in ltr
print 'Our model:', metric.calc_mean(Eqids, Ey, Epred)
File "/anaconda2/lib/python2.7/site-packages/pyltr-0.2.4-py2.7.egg/pyltr/metrics/_metrics.py", line 157, in calc_mean
for qid, a, b in query_groups])
File "/anaconda2/lib/python2.7/site-packages/pyltr-0.2.4-py2.7.egg/pyltr/metrics/_metrics.py", line 108, in evaluate_preds
return self.evaluate(qid, get_sorted_y(targets, preds))
File "/anaconda2/lib/python2.7/site-packages/pyltr-0.2.4-py2.7.egg/pyltr/util/sort.py", line 36, in get_sorted_y
return y[get_sorted_y_positions(y, y_pred, check=check)]
TypeError: only integer scalar arrays can be converted to a scalar index
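A guess at the usual cause of that numpy error (not confirmed for this report): it is raised when fancy indexing like y[positions] is applied to a plain Python list instead of a numpy array. Converting the labels and predictions to arrays before scoring usually avoids it:

import numpy as np

Ey = np.asarray(Ey, dtype=float)
Epred = np.asarray(Epred, dtype=float)
print(metric.calc_mean(Eqids, Ey, Epred))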

non IR applications

Hey! I want to learn to rank several items for a non-IR application using your code. I do not have any query ids. If I just pass the same qid (for example 1) for all the items, will that solve the problem?
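As far as the API goes, a constant qid is just an array like any other; a minimal sketch (whether a single query group is appropriate for the application is a separate modelling question):

import numpy as np

qids = np.ones(len(y), dtype=int)   # the same qid for every item
model.fit(X, y, qids)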

Model prediction output

How can we map the regression values in the model output to a class label?

I see that this library uses a GradientBoostingRegressor rather than a GradientBoostingClassifier, but I am having trouble understanding what happens in the metrics that allows the continuous output to be evaluated against discrete labels. Thanks!
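A sketch of how the continuous output is consumed (based on how ranking metrics work in general, not on a classifier-style mapping): the scores are only used to order the documents within a query, and the metric is then computed on the true labels in that order, so the scores never need to be mapped back onto the label scale. Simplified to a single query:

import numpy as np

scores = model.predict(EX)             # continuous score per document
order = np.argsort(-scores)            # rank documents by descending score
ranked_labels = np.asarray(Ey)[order]  # true labels, in predicted order
# NDCG@k etc. is computed from ranked_labels; if discrete class labels are
# really needed downstream, thresholding or binning the scores is a separate
# post-processing choice outside the library.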

Dataset format

What format does the model need for training and testing? (Can only the LETOR format be used with the model?)
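An illustrative sketch of the LETOR / SVMlight-style line format that pyltr.data.letor.read_dataset parses in the readme example (the values here are made up, and I am assuming read_dataset accepts any file-like object, as it does for open file handles):

import io
import pyltr

# Each line: <label> qid:<query id> <feat#>:<value> ... # optional comment
sample = io.StringIO(
    "2 qid:1 1:0.32 2:0.11 3:0.98 # doc A\n"
    "0 qid:1 1:0.07 2:0.40 3:0.12 # doc B\n"
    "1 qid:2 1:0.51 2:0.25 3:0.66 # doc C\n"
)
X, y, qids, comments = pyltr.data.letor.read_dataset(sample)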

Example input dataset broken link

Hello!

Great library - the link to the example training data is missing.

Would it be possible to share an example raw dataset?

Thanks!

question about calculating NDCG score

Your code shows:

class NDCG(Metric):
    def __init__(self, k=10, gain_type='exp2'):
        super(NDCG, self).__init__()
        self.k = k
        self.gain_type = gain_type
        self._dcg = DCG(k=k, gain_type=gain_type)
        self._ideals = {}

    def evaluate(self, qid, targets):
        return (self._dcg.evaluate(qid, targets) /
                max(_EPS, self._get_ideal(qid, targets)))

Question:

  1. Is the evaluate method used to calculate the NDCG score of the model?
     If yes, what is the format of targets? Is it the array of labels for each query-document pair?
     If no, how do we calculate the NDCG score of the selected model?
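A usage sketch based on the readme-style snippet earlier on this page (so the call pattern, not a guarantee about internals): evaluate() scores one query given its relevance labels in the order being evaluated, while calc_mean() sorts each query's documents by the predicted scores for you and averages the per-query values:

metric = pyltr.metrics.NDCG(k=10)

# NDCG of a single query whose documents, in the evaluated order, have these labels:
print(metric.evaluate(1, [3, 2, 0, 1]))

# Mean NDCG over a test set, given per-row qids, true labels and predictions:
Epred = model.predict(EX)
print(metric.calc_mean(Eqids, Ey, Epred))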
