ultr-community / ultra

Unbiased Learning To Rank Algorithms (ULTRA)

Home Page: https://ultr-community.github.io/ULTRA/

License: Apache License 2.0

Python 44.73% Shell 2.62% Makefile 0.13% CSS 1.75% JavaScript 7.54% HTML 38.37% Batchfile 0.09% Gherkin 4.76%

ultra's People

Contributors

huazhengwang, keytoyze, qingyaoai, taosheng-ty

ultra's Issues

SetRank doesn't work when the number of input documents varies from training to testing.

The current version of SetRank doesn't work when the number of input documents varies. For example,

  • If you create a SetRank model with 100 input documents in training, you cannot use it to rank a test query with 10 candidate documents without explicit padding (see the sketch after this list).

  • If you create a SetRank model with 10 input documents in training, you cannot use it to rank a test query with 100 candidate documents.
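
For illustration, the sketch below (with hypothetical feature dimensions, not the repository's actual data format) shows the kind of explicit padding the first case refers to: a model built for 100 input documents can only score a shorter test list if the feature matrix is padded up to the training list size.

import numpy as np

train_list_size = 100                      # list size the SetRank model was built with
test_features = np.random.rand(10, 136)    # hypothetical: 10 candidate docs, 136-dim features

pad_rows = train_list_size - test_features.shape[0]
padded = np.vstack([test_features, np.zeros((pad_rows, 136))])
print(padded.shape)  # (100, 136) -- matches the input size the model expects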

Getting ModuleNotFoundError when running Click Simulation Example

When I run "Estimate examination propensity with result randomization" in the Click Simulation Example, I get this error:
Traceback (most recent call last):
File "ultra/utils/propensity_estimator.py", line 6, in
from ultra.utils import data_utils
ModuleNotFoundError: No module named 'ultra'

My command is: python ultra/utils/propensity_estimator.py example/ClickModel/pbm_0.1_1.0_4_1.0.json $Data_path/tmp_toy/tmp/ example/PropensityEstimator/

Could you please tell me why this happens and how to solve it? I would really appreciate any help!
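
For context, this kind of ModuleNotFoundError usually means the directory containing the `ultra` package is not on Python's module search path when the script is executed directly. A minimal, hypothetical workaround (not part of the repository) is to prepend the repository root to sys.path near the top of propensity_estimator.py, before the `from ultra.utils import data_utils` line:

import os
import sys

# Assumes this file lives at <repo_root>/ultra/utils/propensity_estimator.py,
# so the repository root is two directories above it.
repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

from ultra.utils import data_utils  # should now resolve when the file is run as a script

Setting PYTHONPATH to the repository root before running the command should have the same effect without editing the file.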

Tensorboard stuck at "namespace hierarchy: finding similar subgraphs stuck"

When I try to open the computational graph in TensorBoard, it always gets stuck at this line:
"namespace hierarchy: finding similar subgraphs"
After a few minutes of waiting, the graph shows successfully only when I use the DNN model, which may be because of that model's simple structure. The graphs for the other models still do not show.
How can I work around this? Thank you very much!

Loss not reducing, high validation and test metric values

I tried to run the code with the DLA algorithm on the Yahoo dataset; the output is attached below. I am not sure about the following observations: I am getting an almost constant training loss of about 4 (with both the rank loss and the exam loss around 2), and high validation and test metric values of more than 0.9. I did check the parameter values of the two models, and they are actually updating, but the loss just fluctuates in the range of 3.9 to 4.5. Is there something I should change in the hyperparameters? I have kept the default learning rate of 0.05 and selection_bias_cutoff = 10. This is with respect to the PyTorch implementation of the code.
[Attached screenshot: training output, 2022-07-08, 4:01 PM]

Getting import error when running the toy example

Hi,

I get this error when I follow the instructions and run the toy example:

I am using macOS, Python 3.6.13, tensorflow==1.8.0.

/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "main.py", line 21, in
import ultra
File "/Users/xxx/ULTRA/ultra/init.py", line 4, in
from . import learning_algorithm
File "/Users/xxx/ULTRA/ultra/learning_algorithm/init.py", line 4, in
from .dla import *
File "/Users/xxx/ULTRA/ultra/learning_algorithm/dla.py", line 24, in
from tensorflow import dtypes
ImportError: cannot import name 'dtypes'

Can you please give some advice? Thank you very much!
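
For what it's worth, the public `tensorflow.dtypes` module may not exist in a release as old as TensorFlow 1.8, which would make the `from tensorflow import dtypes` line in dla.py fail. A hedged sketch of a possible workaround (an assumption on my side, not the project's documented fix) is to fall back to the internal framework module:

try:
    # Newer TF 1.x releases expose dtypes at the top level.
    from tensorflow import dtypes
except ImportError:
    # Older releases may only have the internal module.
    from tensorflow.python.framework import dtypes

Alternatively, upgrading to a newer TensorFlow 1.x release may avoid the problem entirely.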

`selection_bias_cutoff` setting for Multileave Gradient Descent (MGD) algorithm

I would like to get the letor_features for documents beyond the top 10, so I followed the implementation of MGD, but I had difficulty setting the parameters, specifically selection_bias_cutoff.

If I understand correctly, MGD will first fetch the predictions for all docs under a query, as in
https://github.com/ULTR-Community/ULTRA/blob/master/ultra/learning_algorithm/mgd.py#L85

However, in main.py, we have

exp_settings['selection_bias_cutoff'] = min(exp_settings['selection_bias_cutoff'], exp_settings['max_candidate_num'])

when the model is initialized, we have,

self.rank_list_size = self.exp_settings['selection_bias_cutoff']

and when creating the input_feed, we have,

self.rank_list_size = model.rank_list_size


for x in range(self.rank_list_size):
    if data_set.initial_list[i][x] >= 0:
        letor_features.append(
            data_set.features[data_set.initial_list[i][x]])

Therefore, if we set the argument selection_bias_cutoff = 10 as usual, then letor_features in one input feed will always have size [?, selection_bias_cutoff].

Then, when we try to fetch the features from docid_inputs, any key outside the top 10 will be invalid. Hence, self.output = tf.concat( self.get_ranking_scores( self.docid_inputs, is_training=self.is_training, scope='ranking_model'), 1) will trigger an error.

However, we cannot simply set selection_bias_cutoff = 0, since:

  1. it will show more than 10 documents to the user and generate clicks;
  2. the ranking model will also be trained with more than the top 10 documents.

I am wondering if there is any quick way to circumvent this issue, or whether I have misunderstood the pipeline.
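
To make the failure mode concrete, here is a toy, framework-free sketch (hypothetical sizes, not the repository's actual code) of the input-feed construction described above: only the first selection_bias_cutoff positions contribute rows to letor_features, so docid indices beyond the cutoff have no feature row to look up.

selection_bias_cutoff = 10                       # rank_list_size ends up equal to this
initial_list = list(range(25))                   # pretend this query has 25 candidate docs
features = [[0.0] * 5 for _ in initial_list]     # dummy 5-dimensional feature vectors

letor_features = []
for x in range(selection_bias_cutoff):           # mirrors the loop over rank_list_size
    if initial_list[x] >= 0:
        letor_features.append(features[initial_list[x]])

print(len(letor_features))  # 10, not 25: documents 10..24 cannot be fetched later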

Projection parameter matrices in MultiHeadAttention are missing?

In SetRank.py, lines 52-54, the q, k, v linear projection layers are commented out. I understand that this is to meet the permutation-equivariance requirement, but should we instead create a single dense layer and pass q, k, and v through it, so that we get three identical projection matrices rather than just using the identical inputs?
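
As a concrete illustration of the suggestion, here is a hedged PyTorch sketch (hypothetical names, not the repository's code) of a self-attention block in which one shared dense layer produces q, k and v, so the three projection matrices are identical instead of the inputs simply being reused unprojected:

import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjectionSelfAttention(nn.Module):
    """Single-head self-attention whose q, k and v all come from one dense layer."""

    def __init__(self, d_model: int):
        super().__init__()
        self.shared_proj = nn.Linear(d_model, d_model)  # one matrix projects q, k and v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_docs, d_model); with no positional encoding the block
        # remains permutation-equivariant over the document axis.
        q = k = v = self.shared_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v


x = torch.randn(2, 10, 64)                          # 2 queries, 10 docs, 64-dim inputs
print(SharedProjectionSelfAttention(64)(x).shape)   # torch.Size([2, 10, 64])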

Specifying the number of query sessions

Hi,

I am trying to understand the workflow of the package for some experiments on IPS.

I was wondering if we can specify the number of query sessions to be simulated for click simulation?

Context: I want to try training IPS on different sizes of simulated log data.

I tried reading the offline experiment bash scripts in the example folders, but couldn't figure out whether this is possible. Apologies if this is something obvious I am missing from the documentation.

Thanks.

Add support for ignoring unknown hyper-parameters

The same hyper-parameter string may be used in multiple places, and each place only needs a subset of the hyper-parameters. TF 1.x throws an exception when there are unknown hyper-parameters. It may make sense to provide support for automatically ignoring unknown hyper-parameters.
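
A minimal sketch of the requested behaviour (a hypothetical helper with example parameter names, not part of ULTRA) would filter the shared hyper-parameter string down to the keys a given component knows about before handing it to the TF 1.x parser:

def filter_hparams(hparam_str, known_keys):
    """Keep only the key=value pairs whose key is in known_keys."""
    pairs = [p for p in hparam_str.split(",") if p]
    kept = [p for p in pairs if p.split("=", 1)[0] in known_keys]
    return ",".join(kept)


shared = "learning_rate=0.05,selection_bias_cutoff=10,propensity_clip=1.0"
print(filter_hparams(shared, {"learning_rate", "selection_bias_cutoff"}))
# -> learning_rate=0.05,selection_bias_cutoff=10

A real implementation would also need to handle list-valued parameters, whose values themselves contain commas.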
