
jma127 / pyltr

461 stars · 17 watchers · 106 forks · 80 KB

Python learning to rank (LTR) toolkit

License: BSD 3-Clause "New" or "Revised" License

Python 99.78% Shell 0.22%
machine-learning machine-learning-library machine-learning-algorithms learning-to-rank

pyltr's People

Contributors

ducthienbui97, forin-xyz, jma127, soldni, songweige, stiebels, wararaki718

pyltr's Issues

MovieLens Dataset

Hi, I am trying to apply this model to the MovieLens dataset, but I am stuck because the significance and meaning of the "qid" parameter is not clear. I am not sure how we can generate these qids for the MovieLens dataset. You can have a look at the data (MovieLens 1M). Please suggest ways to apply your LambdaMART model to this dataset.
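A minimal sketch of one common interpretation (this is an assumption about the data, not pyltr API): in LETOR-style datasets, qid simply groups all rows judged for the same query. For MovieLens you could treat each user as a "query" and each rated movie as a "document", with the rating as the relevance label:

import numpy as np
import pandas as pd

# Column names below assume the MovieLens 1M ratings file (user::movie::rating::timestamp).
ratings = pd.read_csv('ratings.dat', sep='::', engine='python',
                      names=['user_id', 'movie_id', 'rating', 'timestamp'])

# pyltr generally expects rows belonging to the same qid to be contiguous, so sort first.
ratings = ratings.sort_values('user_id')

qids = ratings['user_id'].to_numpy()   # one "query" per user
y = ratings['rating'].to_numpy()       # relevance label = rating
# X would be whatever per-(user, movie) feature matrix you engineer
# (genres, popularity, user averages, ...), aligned row-for-row with qids and y.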

OverflowError: math range error

I am trying to train on the MQ2007-list dataset.

import pyltr

with open('/home/shivaraj/Downloads/MQ2007-list/Fold1/train.txt') as trainfile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/vali.txt') as valifile, \
        open('/home/shivaraj/Downloads/MQ2007-list/Fold1/test.txt') as evalfile:
    TX, Ty, Tqids, T_ = pyltr.data.letor.read_dataset(trainfile)
    VX, Vy, Vqids, V_ = pyltr.data.letor.read_dataset(valifile)
    EX, Ey, Eqids, E_ = pyltr.data.letor.read_dataset(evalfile)

metric = pyltr.metrics.NDCG(k=10)

# Only needed if you want to perform validation (early stopping & trimming)
monitor = pyltr.models.monitors.ValidationMonitor(
    VX, Vy, Vqids, metric=metric, stop_after=250)

model = pyltr.models.LambdaMART(
    metric=metric,
    n_estimators=1000,
    learning_rate=0.02,
    max_features=0.5,
    query_subsample=0.5,
    max_leaf_nodes=10,
    min_samples_leaf=64,
    verbose=1,
)

model.fit(TX, Ty, Tqids, monitor=monitor)

This is the error log:

OverflowError Traceback (most recent call last)
in
16 )
17
---> 18 model.fit(TX, Ty, Tqids, monitor=monitor)

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in fit(self, X, y, qids, monitor)
199
200 n_stages = self.fit_stages(X, y, qids, y_pred,
--> 201 random_state, begin_at_stage, monitor)
202
203         if n_stages < self.estimators_.shape[0]:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stages(self, X, y, qids, y_pred, random_state, begin_at_stage, monitor)
406 y_pred = self._fit_stage(i, X, y, qids, y_pred, sample_weight,
407 sample_mask, query_groups_to_use,
--> 408 random_state)
409
410 train_total_score, oob_total_score = 0.0, 0.0

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _fit_stage(self, i, X, y, qids, y_pred, sample_weight, sample_mask, query_groups, random_state)
332 for qid, a, b, _ in query_groups:
333 lambdas, deltas = self._calc_lambdas_deltas(qid, y[a:b],
--> 334 y_pred[a:b])
335 all_lambdas[a:b] = lambdas
336 all_deltas[a:b] = deltas

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/models/lambdamart.py in _calc_lambdas_deltas(self, qid, y, y_pred)
267 actual = y[positions]
268
--> 269 swap_deltas = self.metric.calc_swap_deltas(qid, actual)
270 max_k = self.metric.max_k()
271 if max_k is None or ns < max_k:

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/dcg.py in calc_swap_deltas(self, qid, targets, coeff)
33 for j in range(i + 1, n_targets):
 34                 deltas[i, j] = coeff * \
---> 35                     (self._gain_fn(targets[i]) - self._gain_fn(targets[j])) * \
 36                     (self._get_discount(j) - self._get_discount(i))
37

~/miniconda3/envs/smartDB/lib/python3.6/site-packages/pyltr/metrics/gains.py in _exp2_gain(x)
16
17 def _exp2_gain(x):
---> 18 return math.exp(x * _LOG2) - 1.0
19
20

OverflowError: math range error
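For what it's worth, a likely cause and a possible workaround (an inference from the traceback, not a confirmed fix): the list-wise labels in MQ2007-list can be large, and the default exp2 gain computes 2**label - 1 via math.exp, which overflows for large labels. Since the NDCG constructor quoted elsewhere on this page takes a gain_type argument, switching to the linear gain (assumed here to be named 'identity' in pyltr/metrics/gains.py) or rescaling the labels may avoid the overflow:

# Assumed workaround: use a linear gain instead of 2**label - 1.
# 'identity' is the assumed name of the linear gain -- check pyltr/metrics/gains.py.
metric = pyltr.metrics.NDCG(k=10, gain_type='identity')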

How to upload the model trained in pyltr to ElasticSearch?

The trained ranking model needs to be dumped to a text file before it can be uploaded to Elasticsearch.
Does anyone know how to dump the trained model?
Also, how can the trained model be converted to a PMML file?

Thank you very much!
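A minimal sketch for persisting the fitted model with Python's pickle (standard library only); note that the Elasticsearch LTR plugin and PMML each expect their own model formats, so a pickled pyltr model would still need a separate conversion/export step that, as far as I know, pyltr does not provide:

import pickle

# Save the fitted LambdaMART model to disk (Python-specific format).
with open('lambdamart_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load it back later in another Python process.
with open('lambdamart_model.pkl', 'rb') as f:
    model = pickle.load(f)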

Kendall-Tau correlation

Hi Jerry,
Are you planning to implement the calc_swap_deltas method for the KT metric?
As I recall, there is a built-in function in scipy to calculate KT.
Thanks
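For reference, the scipy function mentioned above is scipy.stats.kendalltau; a small usage sketch (this is the scipy API, not part of pyltr):

from scipy.stats import kendalltau

targets = [3, 1, 2, 0]         # true relevance labels for one query
preds = [0.9, 0.2, 0.5, 0.1]   # model scores for the same documents

tau, p_value = kendalltau(targets, preds)
print(tau)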

error when running "import pyltr"

     10 
---> 11 import data
     12 import metrics
     13 import models

ImportError: No module named 'data'

I get this import error under Python 3.5.1, whether I install pyltr or import it directly from the source files.
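A guess at the cause (not verified against this exact install): the package is using Python 2-style implicit relative imports, which Python 3 removed. The fix, inside the package's own __init__.py, would be explicit relative imports:

# Hypothetical fix inside pyltr/__init__.py -- replace bare "import data" etc. with:
from . import data
from . import metrics
from . import models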

how to use GridSearchCV module in sklearn to tune parameters?

Hi, thanks for your work, which is very useful. As the title says, how can I use sklearn's GridSearchCV module to tune the parameters automatically? I tried several approaches, for example having the model class inherit from BaseEstimator and RegressorMixin. But I ran into a problem; the error is below:

File "E:/Python projects/others/lambdamart_t.py", line 105, in training_model
gscv.fit(training_set[1], training_set[2], groups=training_set[0])
File "E:\Python35\lib\site-packages\sklearn\model_selection_search.py", line 638, in fit
cv.split(X, y, groups)))
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 779, in call
while self.dispatch_one_batch(iterator):
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 625, in dispatch_one_batch
self._dispatch(tasks)
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 588, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "E:\Python35\lib\site-packages\sklearn\externals\joblib_parallel_backends.py", line 111, in apply_async
result = ImmediateResult(func)
File "E:\Python35\lib\site-packages\sklearn\externals\joblib_parallel_backends.py", line 332, in init
self.results = batch()
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in call
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "E:\Python35\lib\site-packages\sklearn\externals\joblib\parallel.py", line 131, in
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "E:\Python35\lib\site-packages\sklearn\model_selection_validation.py", line 437, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
TypeError: fit() missing 1 required positional argument: 'qids'

I also implemented the get_params() and set_params() methods in lambdamart.py, but the same error occurred again. The implementation is as follows:

def get_params(self, deep=True):
    return {'metric': self.metric,
            'learning_rate': self.learning_rate,
            'n_estimators': self.n_estimators,
            'query_subsample': self.query_subsample ,
            'subsample': self.subsample,
            'min_samples_split': self.min_samples_split,
            'min_samples_leaf': self.min_samples_leaf,
            'max_depth': self.max_depth,
            'random_state': self.random_state,
            'max_features': self.max_features,
            'verbose': self.verbose,
            'max_leaf_nodes': self.max_leaf_nodes,
            'warm_start': self.warm_start}

def set_params(self, **params):
    """Sets the parameters of this estimator.
            # Arguments
                **params: Dictionary of parameter names mapped to their values.
            # Returns
                self
    """
    for parameter, value in params.items():
        setattr(self, parameter, value)
    return self

My calling code is:

gscv = GridSearchCV(pyltr.models.LambdaMART(), params_lst, scoring=pyltr.metrics.AUCROC.calc_mean, n_jobs=1, cv=5, verbose=1)
gscv.fit(training_set[1], training_set[2], groups=training_set[0])

training_set is a 3-tuple containing training_qids, training_data and training_labels in that order, all arrays.
Looking forward to your reply, thanks very much!
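One workaround sketch (an assumption, not an official pyltr feature): GridSearchCV only ever calls fit(X, y), so the qids can be carried in the first column of X and split back out inside a small wrapper estimator. The wrapper class below is hypothetical; the constructor parameters mirror a few of LambdaMART's own arguments:

import numpy as np
import pyltr
from sklearn.base import BaseEstimator, RegressorMixin


class LambdaMARTWrapper(BaseEstimator, RegressorMixin):
    """Hypothetical wrapper: expects the qid of each row in column 0 of X."""

    def __init__(self, metric=None, n_estimators=100, learning_rate=0.1):
        self.metric = metric
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        X = np.asarray(X)
        qids, feats = X[:, 0], X[:, 1:]        # split qids from the real features
        self.model_ = pyltr.models.LambdaMART(
            metric=self.metric,
            n_estimators=self.n_estimators,
            learning_rate=self.learning_rate,
        )
        self.model_.fit(feats, y, qids)        # qids passed positionally, as LambdaMART expects
        return self

    def predict(self, X):
        X = np.asarray(X)
        return self.model_.predict(X[:, 1:])

Two caveats: the default regression scoring (R squared) is not a ranking metric, so a custom scorer is still needed, and an ordinary KFold split can cut a query across folds, so something like GroupKFold keyed on the qid column is usually preferable.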

TypeError: only integer scalar arrays can be converted to a scalar index

Hi, I am using the same code as provided in the readme with different data, and I keep getting this error. I spent a lot of time trying to fix it but had no luck. Do you have any idea what might have caused it?

File "somefile.py", line 691, in ltr
print 'Our model:', metric.calc_mean(Eqids, Ey, Epred)
File "/anaconda2/lib/python2.7/site-packages/pyltr-0.2.4-py2.7.egg/pyltr/metrics/_metrics.py", line 157, in calc_mean
for qid, a, b in query_groups])
File "/anaconda2/lib/python2.7/site-packages/pyltr-0.2.4-py2.7.egg/pyltr/metrics/_metrics.py", line 108, in evaluate_preds
return self.evaluate(qid, get_sorted_y(targets, preds))
File "/anaconda2/lib/python2.7/site-packages/pyltr-0.2.4-py2.7.egg/pyltr/util/sort.py", line 36, in get_sorted_y
return y[get_sorted_y_positions(y, y_pred, check=check)]
TypeError: only integer scalar arrays can be converted to a scalar index
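A guess at the usual cause of that numpy error (not confirmed for this report): it is raised when fancy indexing like y[positions] is applied to a plain Python list instead of a numpy array. Converting the labels and predictions to arrays before scoring usually avoids it:

import numpy as np

Ey = np.asarray(Ey, dtype=float)
Epred = np.asarray(Epred, dtype=float)
print(metric.calc_mean(Eqids, Ey, Epred))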

non IR applications

Hey! I want to learn to rank several items for a non-IR application using your code. I do not have any query ids. If I just pass the same qid (for example 1) for all the items, will that solve the problem?
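As far as the API goes, a constant qid is just an array like any other; a minimal sketch (whether a single query group is appropriate for the application is a separate modelling question):

import numpy as np

qids = np.ones(len(y), dtype=int)   # the same qid for every item
model.fit(X, y, qids)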

Model prediction output

How can we map the regression values in the model output to a class label?

I see that this library uses a GradientBoostingRegressor rather than a GradientBoostingClassifier, but I am having trouble understanding what happens in the metrics that allows the continuous output to be evaluated against discrete labels. Thanks!
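A sketch of how the continuous output is consumed (based on how ranking metrics work in general, not on a classifier-style mapping): the scores are only used to order the documents within a query, and the metric is then computed on the true labels in that order, so the scores never need to be mapped back onto the label scale. Simplified to a single query:

import numpy as np

scores = model.predict(EX)             # continuous score per document
order = np.argsort(-scores)            # rank documents by descending score
ranked_labels = np.asarray(Ey)[order]  # true labels, in predicted order
# NDCG@k etc. is computed from ranked_labels; if discrete class labels are
# really needed downstream, thresholding or binning the scores is a separate
# post-processing choice outside the library.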

Dataset format

What format does the model need for training and testing? (Can only the LETOR format be used with the model?)
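An illustrative sketch of the LETOR / SVMlight-style line format that pyltr.data.letor.read_dataset parses in the readme example (the values here are made up, and I am assuming read_dataset accepts any file-like object, as it does for open file handles):

import io
import pyltr

# Each line: <label> qid:<query id> <feat#>:<value> ... # optional comment
sample = io.StringIO(
    "2 qid:1 1:0.32 2:0.11 3:0.98 # doc A\n"
    "0 qid:1 1:0.07 2:0.40 3:0.12 # doc B\n"
    "1 qid:2 1:0.51 2:0.25 3:0.66 # doc C\n"
)
X, y, qids, comments = pyltr.data.letor.read_dataset(sample)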

Example input dataset broken link

Hello!

Great library - the link to the example training data is missing.

Would it be possible to share an example raw dataset?

Thanks!

question about calculating NDCG score

Your code shows:

class NDCG(Metric):
    def __init__(self, k=10, gain_type='exp2'):
        super(NDCG, self).__init__()
        self.k = k
        self.gain_type = gain_type
        self._dcg = DCG(k=k, gain_type=gain_type)
        self._ideals = {}

    def evaluate(self, qid, targets):
        return (self._dcg.evaluate(qid, targets) /
                max(_EPS, self._get_ideal(qid, targets)))

Question:

  1. Is the evaluate method used to calculate the NDCG score of the model?
     If yes, what is the format of targets? Is it the array of labels for each query-document pair?
     If no, how do we calculate the NDCG score of the selected model?
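A usage sketch based on the readme-style snippet earlier on this page (so the call pattern, not a guarantee about internals): evaluate() scores one query given its relevance labels in the order being evaluated, while calc_mean() sorts each query's documents by the predicted scores for you and averages the per-query values:

metric = pyltr.metrics.NDCG(k=10)

# NDCG of a single query whose documents, in the evaluated order, have these labels:
print(metric.evaluate(1, [3, 2, 0, 1]))

# Mean NDCG over a test set, given per-row qids, true labels and predictions:
Epred = model.predict(EX)
print(metric.calc_mean(Eqids, Ey, Epred))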
