
(KDD2023) "Off-Policy Evaluation of Ranking Policies under Diverse User Behavior"

License: Apache License 2.0


Off-Policy Evaluation of Ranking Policies under Diverse User Behavior


About

This repository contains the code to replicate the synthetic experiment conducted in the paper "Off-Policy Evaluation of Ranking Policies under Diverse User Behavior" by Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, and Yuta Saito, which has been accepted to KDD2023. [paper] [arXiv] [slides]

Abstract

Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), aiming towards an accurate performance evaluation of ranking policies using logged data. A de-facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces. To deal with this problem, previous studies assume either independent or cascade user behavior, resulting in some ranking versions of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single universal assumption to every user, causing excessive bias and variance. Therefore, this work explores a far more general formulation where user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased estimators based on IPS. We further develop a procedure to identify the appropriate user behavior model to minimize the MSE of AIPS in a data-driven fashion. Extensive experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior.
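To make the contrast in the abstract concrete, the following is a minimal sketch of two baseline estimators it mentions: slate-level IPS, which uses one importance weight per logged ranking, and a position-wise variant under the independent user behavior assumption. This is not the repository's implementation and it is not AIPS. The function names, the argument layout, and the simplification that every position carries weight 1 are assumptions made for illustration.

```python
import numpy as np


def standard_ips(reward, pscore_b, pscore_e):
    """Slate-level IPS (illustrative, hypothetical helper).

    reward:   (n, K) rewards observed at each of K positions
    pscore_b: (n,) probability of the logged ranking under the logging policy
    pscore_e: (n,) probability of the same ranking under the evaluation policy
    """
    weight = pscore_e / pscore_b                  # one weight per ranking
    return float(np.mean(weight * reward.sum(axis=1)))


def independent_ips(reward, pscore_b_pos, pscore_e_pos):
    """Position-wise IPS under the independence assumption (illustrative).

    pscore_*_pos: (n, K) marginal probability of the logged item at each position
    """
    weight = pscore_e_pos / pscore_b_pos          # one weight per position
    return float(np.mean((weight * reward).sum(axis=1)))


# Toy logged data: n = 2 rankings of length K = 2 (made-up numbers).
reward = np.array([[1.0, 0.0], [0.0, 1.0]])
print(standard_ips(reward, np.array([0.5, 0.5]), np.array([1.0, 0.25])))
print(independent_ips(reward, np.full((2, 2), 0.5), np.array([[1.0, 0.5], [0.5, 0.25]])))
```

The slate-level weight is a ratio of probabilities over whole rankings, so it can become very large as the number of possible rankings grows. That is the source of the high variance the abstract describes. The position-wise weight is typically smaller, but it is only unbiased when users really do behave independently across positions.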

If you find this code useful in your research, please cite:

@inproceedings{kiyohara2023off,
  author = {Kiyohara, Haruka and Uehara, Masatoshi and Narita, Yusuke and Shimizu, Nobuyuki and Yamamoto, Yasuo and Saito, Yuta},
  title = {Off-Policy Evaluation of Ranking Policies under Diverse User Behavior},
  booktitle = {Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages = {1154--1163},
  year = {2023},
}

Dependencies

This repository supports Python 3.7 or newer.

  • numpy==1.21.2
  • pandas==1.4.3
  • scikit-learn==1.0.2
  • matplotlib==3.4.3
  • seaborn==0.11.2
  • obp==0.5.4
  • hydra-core==1.0.6

Running the code

To conduct the synthetic experiment, run the following command.

python src/main.py setting={data_size/slate_size/user_behavior_variation} setting.{additional_config}={value} ...

Once the code has finished executing, you can find the result (estimation_{setting}.csv) in the ./logs/{setting}/ directory. Lower values are better. Please refer to ./conf for the experimental configurations.
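As a quick check that a run produced its output, the sketch below builds the result path described above for each setting and prints the CSV header if the file exists. The helper name `result_path` is hypothetical. The column names of the CSV are not documented here, so the script only reports whatever header it finds.

```python
import csv
from pathlib import Path

# Setting names taken from the command template above.
SETTINGS = ("data_size", "slate_size", "user_behavior_variation")


def result_path(setting, log_dir="logs"):
    """Return ./logs/{setting}/estimation_{setting}.csv (hypothetical helper)."""
    return Path(log_dir) / setting / f"estimation_{setting}.csv"


for setting in SETTINGS:
    path = result_path(setting)
    if path.exists():
        with path.open(newline="") as f:
            header = next(csv.reader(f), [])   # first row, or [] if the file is empty
        print(f"{path}: columns = {header}")
    else:
        print(f"{path}: not found (run src/main.py for this setting first)")
```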

Visualize the results

To visualize the results, run the following command. Make sure that the result files are located at logs/{setting}/estimation_{setting}.csv.

python src/visualize.py setting={data_size/slate_size/user_behavior_variation}

You will then find the following three figures ({setting}_{mse/squared_bias/variance}.png) in the ./figs/ directory. Lower values are better for all metrics.

The figures cover the following experiment parameters, each reported in terms of MSE, squared bias, and variance:

  • varying data size (n)
  • varying length of ranking (K)
  • varying user behavior (δ)
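The experiment and visualization commands can be enumerated for all three settings, together with the figure files each one is expected to produce. The sketch below only prints the commands. The helper names are hypothetical, and no additional `setting.{additional_config}` overrides are included because their names depend on the files in ./conf.

```python
from pathlib import Path

SETTINGS = ("data_size", "slate_size", "user_behavior_variation")
METRICS = ("mse", "squared_bias", "variance")


def commands(setting):
    """Experiment and visualization commands for one setting, as argument lists."""
    return [
        ["python", "src/main.py", f"setting={setting}"],
        ["python", "src/visualize.py", f"setting={setting}"],
    ]


def expected_figures(setting, fig_dir="figs"):
    """Figure files that src/visualize.py is described as writing for one setting."""
    return [Path(fig_dir) / f"{setting}_{metric}.png" for metric in METRICS]


for setting in SETTINGS:
    for cmd in commands(setting):
        print(" ".join(cmd))
    for fig in expected_figures(setting):
        print(f"  expects {fig}")
```

To execute the commands rather than print them, each argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root.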
