This project forked from awslabs/recurrent-intensity-model-experiments

Recurrent Intensity Model Experiments


Repository to reproduce the experiments in the paper:

Recurrent Intensity Modeling for User Recommendation and Online Matching

@inproceedings{ma2021recurrent,
    Author = {Ma, Yifei and Liu, Ge and Deoras, Anoop},
    Booktitle = {ICML Time Series Workshop},
    Title = {Recurrent Intensity Modeling for User Recommendation and Online Matching},
    Year = {2021}
}

Getting Started

  1. Download and install via pip install -e .
    • If direct installation fails, some dependencies might be more easily installed by conda: conda env update --file environment.yml --name target-env-to-update
    • Note: conda env update may overwrite the current Python version; it is recommended to pin the version manually in the yml file.
  2. Add data to the data folder. Some downloading and preparing scripts may be found in data/util.py.
  3. Run experiment as
    from rime import main, plot_results, Experiment, evaluate_assigned
    self = main("prepare_ml_1m_data") # or "prepare_minimal_dataset"
    # print out item_rec and user_rec metrics for all included methods
    
  4. Run pytest -s -x --pdb for unit tests including the end-to-end workflow.

More Examples

Perform Offline-Greedy optimization for diversity-relevance trade-offs

mult=[0, 0.1, 0.2, 0.5, 1, 3, 10, 30, 100]
self = main("prepare_ml_1m_data", mult=mult)
fig = plot_results(self)

greedy-ml-1m

Perform CVX-Online allocation for diversity-relevance trade-offs

cvx_online = main("prepare_ml_1m_data", mult=mult, cvx=True, online=True)
fig = plot_results(cvx_online)

online-ml-1m

An optional configuration excludes training user-item pairs from reappearing in predictions and targets via a large penalization prior. For other types of block (or approval) lists, provide a negative (or positive) prior_score input to the Dataset constructor, following the source code of this example.

D, V = rime.dataset.prepare_ml_1m_data(exclude_train=True)
self = Experiment(D, V)
self.run()
self.print_results()

With the exclude-train option, the performance of ALS, BPR, and LogisticMF improves significantly. (Plot generated from a modified scripts/everything_ml_1m.py)

exclude-train-ml-1m
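The penalization prior above can be sketched with toy scores. This is a hypothetical illustration of the idea, not RIME's internal code: adding a very large negative prior to observed training pairs pushes them out of the top-k ranking.

```python
import numpy as np

# Toy relevance scores for 2 users x 3 items (illustrative values).
scores = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.7, 0.6],
])
# 1 marks user-item pairs already seen in training.
train_pairs = np.array([
    [1, 0, 0],
    [0, 1, 0],
])

# Large negative penalization prior excludes seen pairs from the ranking.
penalized = scores + train_pairs * (-1e9)
top1 = penalized.argmax(axis=1)
print(top1.tolist())  # [1, 2]: each user's best *unseen* item
```

Without the prior, both users would be recommended their already-consumed top item; with it, the ranking falls through to the best unseen item.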

Code Organization

Here is the content of the main function:

D, V = prepare_some_dataset(...) # output instances of rime.dataset.base.Dataset
self = rime.Experiment(D, V, ...) # V is required only for Hawkes-Poisson and CVX-Online.
self.run()
self.results.print_results()

Here is what Experiment.run basically does:

Step 1. Predictions.

Let x be a user-time state and y be a unique item. Traditional top-k item-recommendation aims to predict p(y|x), the next item given the current user-state. In contrast, we introduce symmetry via user-recommendation, which allows for comparisons across x. To this end, we redefine the problem as the prediction of user-item engagement intensities in a unit time window in the immediate future, λ(x,y), and utilize a marked temporal point process (MTPP) decomposition as λ(x,y) = λ(x) p(y|x). Here is the code to do that:

rnn = rime.models.rnn.RNN(**self.model_hyps["RNN"]).fit(D.training_data)
hawkes = rime.models.hawkes.Hawkes(D.horizon).fit(D.training_data)
S = rnn.transform(D) * hawkes.transform(D)

S is a low-rank dataframe-like object with shape (len(D.user_in_test), len(D.item_in_test)).
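The elementwise product behind that decomposition can be sketched with plain NumPy. This is a hypothetical illustration (shapes and values are made up, not RIME's internals): a per-user intensity λ(x) scales a per-user next-item distribution p(y|x).

```python
import numpy as np

# p(y|x) from an RNN-like model: each row is a distribution over 4 items.
p_y_given_x = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.25, 0.25, 0.25, 0.25],
    [0.4, 0.3, 0.2, 0.1],
])
# lambda(x) from a Hawkes-like model: expected events per unit time window.
lam_x = np.array([2.0, 0.5, 1.0])

# Broadcasting the column vector mirrors S = rnn.transform(D) * hawkes.transform(D)
S = lam_x[:, None] * p_y_given_x

# Each row of S sums back to lambda(x): intensity is distributed across items.
print(S.sum(axis=1))  # [2.0, 0.5, 1.0]
```

The product stays low-rank because λ(x) only rescales rows; no user-item cross terms are materialized beyond the factors themselves.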

Step 2. Offline decisions.

Ranking the items (or users) and then comparing against the ground-truth targets can be laborious. Instead, we utilize the scipy.sparse library to easily calculate the recommendation hit rates through point-wise multiplication. The sparsity property allows the evaluations to scale to large numbers of user-item pairs.

item_rec_assignments = rime.util._assign_topk(score_mat, item_rec_topk, device='cuda')
item_rec_metrics = evaluate_assigned(D.target_csr, item_rec_assignments, axis=1)
user_rec_assignments = rime.util._assign_topk(score_mat.T, user_rec_C, device='cuda').T
user_rec_metrics = evaluate_assigned(D.target_csr, user_rec_assignments, axis=0)
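The sparse point-wise trick can be sketched as follows. The matrix values are illustrative and the snippet is not RIME's actual evaluation code: multiplying a 0/1 assignment matrix against the 0/1 target matrix counts hits without any explicit ranking loop.

```python
import numpy as np
from scipy import sparse

# Ground-truth targets: 1 where the user actually engaged the item.
target_csr = sparse.csr_matrix(np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]))
# Top-2 recommendations assigned per user (0/1 indicator matrix).
assignments = sparse.csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
]))

hits = target_csr.multiply(assignments)    # point-wise product stays sparse
hit_rate = hits.sum() / assignments.sum()  # hits per recommendation made
print(hit_rate)  # 2 hits / 6 recommendations ≈ 0.333
```

Because both operands are sparse, the cost scales with the number of nonzeros rather than with len(users) × len(items).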

Step 3. Online simulation.

RIME contains an optional configuration, "CVX-Online", which simulates a scenario where we may not observe the full set of users ahead of time, but must make immediate, irrevocable decisions as each user arrives one at a time. This scenario is useful for multi-day marketing campaigns with budgets allocated for long-term prospects. Our basic idea is to approximate a quantile threshold v(y) per item y from an observable user sample and then generalize it to the testing set. We pick the user sample from a "validation" data split V. Additionally, we align the item_in_test between D and V, because CVX also considers the competition among items for the limited user capacities.

V = V.reindex(D.item_in_test.index, axis=1) # align on the item_in_test to generalize
T = rnn.transform(V) * hawkes.transform(V)  # solve CVX based on the predicted scores.
cvx_online = rime.metrics.cvx.CVX(S, item_rec_topk, user_rec_C, ...) # set hyperparameters
online_assignments = cvx_online.fit(T).transform(S)
out = evaluate_assigned(D.target_csr, online_assignments, axis=0)

CVX-Online is integrated as self.metrics_update("RNN-Hawkes", S, T), when self.online=True and T is not None.
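The per-item quantile-threshold idea can be sketched without the full convex solver. Everything below is a simplified, hypothetical illustration (random scores, no capacity coupling between items): learn a threshold v(y) on a validation sample so each item fits its capacity, then apply the fixed thresholds to test users one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((100, 5))   # validation scores: 100 sample users x 5 items
capacity_frac = 0.2        # serve each item to roughly 20% of users

# v(y): the (1 - capacity_frac)-quantile of each item's score column.
v = np.quantile(T, 1 - capacity_frac, axis=0)

# Online phase: each arriving test user is admitted to items whose
# predicted score clears the pre-computed threshold v(y).
S_test = rng.random((50, 5))
online_assignments = S_test >= v

# Each item ends up assigned to roughly capacity_frac of test users.
print(online_assignments.mean(axis=0))
```

The real CVX-Online additionally couples items through the per-user capacity constraint, which is why D and V must share item_in_test; this sketch treats each item's threshold independently.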

More information may be found in auto-generated documentation at ReadTheDocs. To extend to other datasets, one may follow the two examples to create a minimal Dataset instance as:

D = rime.dataset.Dataset(
    target_csr=..., user_in_test=..., item_in_test=...,
    training_data=argparse.Namespace(event_df=..., user_df=..., item_df=...),
    # optional sparse negative prior for exclusions or positive prior for approvals
    ...)

The main functions are covered by tests.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
