ultr-community / ultra

Unbiased Learning To Rank Algorithms (ULTRA)

Home Page: https://ultr-community.github.io/ULTRA/

License: Apache License 2.0

Python 44.73% Shell 2.62% Makefile 0.13% CSS 1.75% JavaScript 7.54% HTML 38.37% Batchfile 0.09% Gherkin 4.76%

ultra's People

Contributors

huazhengwang, keytoyze, qingyaoai, taosheng-ty

ultra's Issues

SetRank doesn't work when the number of input documents varies from training to testing.

The current version of SetRank doesn't work when the number of input documents varies. For example,

  • If you create a SetRank model with 100 input documents in training, you cannot use it to rank a test query with 10 candidate documents without explicit padding (see the sketch after this list).

  • If you create a SetRank model with 10 input documents in training, you cannot use it to rank a test query with 100 candidate documents.
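
For illustration, the sketch below (with hypothetical feature dimensions, not the repository's actual data format) shows the kind of explicit padding the first case refers to: a model built for 100 input documents can only score a shorter test list if the feature matrix is padded up to the training list size.

import numpy as np

train_list_size = 100                      # list size the SetRank model was built with
test_features = np.random.rand(10, 136)    # hypothetical: 10 candidate docs, 136-dim features

pad_rows = train_list_size - test_features.shape[0]
padded = np.vstack([test_features, np.zeros((pad_rows, 136))])
print(padded.shape)  # (100, 136) -- matches the input size the model expects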

Getting ModuleNotFoundError when running Click Simulation Example

When I run "Estimate examination propensity with result randomization" in the Click Simulation Example, I get this error:
Traceback (most recent call last):
File "ultra/utils/propensity_estimator.py", line 6, in
from ultra.utils import data_utils
ModuleNotFoundError: No module named 'ultra'

My command is: python ultra/utils/propensity_estimator.py example/ClickModel/pbm_0.1_1.0_4_1.0.json $Data_path/tmp_toy/tmp/ example/PropensityEstimator/

Could you please tell me why this happens and how to solve it? I would really appreciate any help!
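
For context, this kind of ModuleNotFoundError usually means the directory containing the `ultra` package is not on Python's module search path when the script is executed directly. A minimal, hypothetical workaround (not part of the repository) is to prepend the repository root to sys.path near the top of propensity_estimator.py, before the `from ultra.utils import data_utils` line:

import os
import sys

# Assumes this file lives at <repo_root>/ultra/utils/propensity_estimator.py,
# so the repository root is two directories above it.
repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)

from ultra.utils import data_utils  # should now resolve when the file is run as a script

Setting PYTHONPATH to the repository root before running the command should have the same effect without editing the file.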

Tensorboard stuck at "namespace hierarchy: finding similar subgraphs stuck"

When I try to open the computational graph in TensorBoard, it always gets stuck at this line:
"namespace hierarchy: finding similar subgraphs"
After a few minutes of waiting, the graph shows successfully only when I use the DNN model, which may be because of that model's simple structure. The graphs for the other models still do not show.
How can I work around this? Thank you very much!

Loss not reducing, high validation and test metric values

I tried to run the code with the DLA algorithm on the Yahoo dataset; the output is attached below. I am not sure about the following observations: I am getting an almost constant training loss of about 4 (with both the rank loss and the exam loss around 2), and high validation and test metric values of more than 0.9. I did check the parameter values of the two models, and they are actually updating, but the loss just fluctuates in the range of 3.9 to 4.5. Is there something I should change in the hyperparameters? I have kept the default learning rate of 0.05 and selection_bias_cutoff = 10. This is with respect to the PyTorch implementation of the code.
[Attached screenshot: training output, 2022-07-08, 4:01 PM]

Getting import error when running the toy example

Hi,

I get this error when I follow the instructions and run the toy example:

I am using macOS, Python 3.6.13, tensorflow==1.8.0.

/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:521: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:522: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/Users/xxx/anaconda3/envs/py3.6/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "main.py", line 21, in
import ultra
File "/Users/xxx/ULTRA/ultra/init.py", line 4, in
from . import learning_algorithm
File "/Users/xxx/ULTRA/ultra/learning_algorithm/init.py", line 4, in
from .dla import *
File "/Users/xxx/ULTRA/ultra/learning_algorithm/dla.py", line 24, in
from tensorflow import dtypes
ImportError: cannot import name 'dtypes'

Can you please give some advice? Thank you very much!
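
For what it's worth, the public `tensorflow.dtypes` module may not exist in a release as old as TensorFlow 1.8, which would make the `from tensorflow import dtypes` line in dla.py fail. A hedged sketch of a possible workaround (an assumption on my side, not the project's documented fix) is to fall back to the internal framework module:

try:
    # Newer TF 1.x releases expose dtypes at the top level.
    from tensorflow import dtypes
except ImportError:
    # Older releases may only have the internal module.
    from tensorflow.python.framework import dtypes

Alternatively, upgrading to a newer TensorFlow 1.x release may avoid the problem entirely.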

`selection_bias_cutoff` setting for Multileave Gradient Descent (MGD) algorithm

I would like to get the letor_features for documents beyond the top 10, so I followed the implementation of MGD, but I had difficulty setting the parameters, specifically selection_bias_cutoff.

If I understand correctly, MGD will first fetch the predictions for all docs under a query, as in
https://github.com/ULTR-Community/ULTRA/blob/master/ultra/learning_algorithm/mgd.py#L85

However, in main.py, we have

exp_settings['selection_bias_cutoff'] = min(exp_settings['selection_bias_cutoff'], exp_settings['max_candidate_num'])

when the model is initialized, we have,

self.rank_list_size = self.exp_settings['selection_bias_cutoff']

and when creating the input_feed, we have,

self.rank_list_size = model.rank_list_size


for x in range(self.rank_list_size):
    if data_set.initial_list[i][x] >= 0:
        letor_features.append(
            data_set.features[data_set.initial_list[i][x]])

Therefore, if we set the argument selection_bias_cutoff = 10 as usual, then letor_features in one input feed will always have size [?, selection_bias_cutoff].

Then, when we try to fetch the features from docid_inputs, any key outside the top 10 will be invalid. Hence, self.output = tf.concat( self.get_ranking_scores( self.docid_inputs, is_training=self.is_training, scope='ranking_model'), 1) will trigger an error.

However, we cannot simply set selection_bias_cutoff = 0, since:

  1. it will show more than 10 documents to the user and generate clicks;
  2. the ranking model will also be trained with more than the top 10 documents.

I am wondering if there is any quick way to circumvent this issue, or whether I have misunderstood the pipeline.
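
To make the failure mode concrete, here is a toy, framework-free sketch (hypothetical sizes, not the repository's actual code) of the input-feed construction described above: only the first selection_bias_cutoff positions contribute rows to letor_features, so docid indices beyond the cutoff have no feature row to look up.

selection_bias_cutoff = 10                       # rank_list_size ends up equal to this
initial_list = list(range(25))                   # pretend this query has 25 candidate docs
features = [[0.0] * 5 for _ in initial_list]     # dummy 5-dimensional feature vectors

letor_features = []
for x in range(selection_bias_cutoff):           # mirrors the loop over rank_list_size
    if initial_list[x] >= 0:
        letor_features.append(features[initial_list[x]])

print(len(letor_features))  # 10, not 25: documents 10..24 cannot be fetched later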

Projection parameter matrices in MultiHeadAttention are missing?

In SetRank.py, lines 52-54, the q, k, v linear projection layers are commented out. I understand that this is to meet the permutation-equivariance requirement, but should we instead create a single dense layer and pass q, k, and v through it, so that we get three identical projection matrices rather than just using the identical inputs?
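
As a concrete illustration of the suggestion, here is a hedged PyTorch sketch (hypothetical names, not the repository's code) of a self-attention block in which one shared dense layer produces q, k and v, so the three projection matrices are identical instead of the inputs simply being reused unprojected:

import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjectionSelfAttention(nn.Module):
    """Single-head self-attention whose q, k and v all come from one dense layer."""

    def __init__(self, d_model: int):
        super().__init__()
        self.shared_proj = nn.Linear(d_model, d_model)  # one matrix projects q, k and v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_docs, d_model); with no positional encoding the block
        # remains permutation-equivariant over the document axis.
        q = k = v = self.shared_proj(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v


x = torch.randn(2, 10, 64)                          # 2 queries, 10 docs, 64-dim inputs
print(SharedProjectionSelfAttention(64)(x).shape)   # torch.Size([2, 10, 64])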

Specifying the number of query sessions

Hi,

I am trying to understand the workflow of the package for some experiments on IPS.

I was wondering if we can specify the number of query sessions to be simulated for click simulation?

Context: I want to try training IPS on different sizes of simulated log data.

I tried reading the offline experiment bash scripts in the example folders, but couldn't figure out whether this is possible. Apologies if this is something obvious I am missing from the documentation.

Thanks.

Add support for ignoring unknown hyper-parameters

The same hyper-parameter string may be used in multiple places, and each place only needs a subset of the hyper-parameters. TF 1.x throws an exception when there are unknown hyper-parameters. It may make sense to provide support for automatically ignoring unknown hyper-parameters.
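
A minimal sketch of the requested behaviour (a hypothetical helper with example parameter names, not part of ULTRA) would filter the shared hyper-parameter string down to the keys a given component knows about before handing it to the TF 1.x parser:

def filter_hparams(hparam_str, known_keys):
    """Keep only the key=value pairs whose key is in known_keys."""
    pairs = [p for p in hparam_str.split(",") if p]
    kept = [p for p in pairs if p.split("=", 1)[0] in known_keys]
    return ",".join(kept)


shared = "learning_rate=0.05,selection_bias_cutoff=10,propensity_clip=1.0"
print(filter_hparams(shared, {"learning_rate", "selection_bias_cutoff"}))
# -> learning_rate=0.05,selection_bias_cutoff=10

A real implementation would also need to handle list-valued parameters, whose values themselves contain commas.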
