Basic example with smallest dataset/random rewards (onboarding) (contextualbandits issue, closed)

david-cortes commented on July 19, 2024


I'm not sure if I understand it correctly.

As I understand it, you observe both user features and item features, at each turn you make a recommendation for a different user, and each potential recommendation gets a continuous score. That is not the kind of scenario this library deals with.

If the item embeddings are supposed to be invisible / not available to the algorithm and there is a threshold on the obtained scores to make them binary (reward vs. no reward), then it does sound like the kind of problem for this library. In that case you could treat the items as arms (you'll need to enumerate them) and the user vector as features (you'll need to convert it to a matrix with 1 row and pass it as a NumPy array).
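The data preparation described above (enumerate the items as arms, turn the user vector into a 1-row matrix) can be sketched roughly as follows. The item names and the embedding values are made up for illustration and do not come from the gist. Only NumPy is used, so nothing here depends on the library's API.

```python
import numpy as np

# Hypothetical catalogue: these item names are illustrative only.
item_names = ["item_a", "item_b", "item_c"]

# Enumerate the items so that each one becomes an arm index 0..n_arms-1.
arm_of_item = {name: arm for arm, name in enumerate(item_names)}
n_arms = len(arm_of_item)

# Hypothetical user embedding: the context observed in one round.
user_embedding = [0.2, 0.7, 0.1, 0.0]

# Convert the vector to a 2-D array with a single row: shape (1, n_features).
X = np.asarray(user_embedding, dtype=float).reshape(1, -1)
```

The resulting `X` has one row per round and one column per feature, which is the layout a policy that expects a feature matrix would need. The chosen item is then reported back as its integer arm index rather than by name.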

But be aware that (a) by default this software switches to a non-contextual MAB when the number of seen data points is small, so if you explicitly want it to be contextual while running for only a few rounds, you'll have to check the specific parameters that you are using; and (b) if you know the specific reward-generating function and the algorithm is supposed to be aware of it, you might want to select a classifier and its hyperparameters accordingly instead of following the example notebooks.
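To make point (a) concrete, here is a minimal sketch of the general idea of falling back to a non-contextual bandit while an arm has few observations. It is not the library's code. The function name, the `min_samples` cutoff, and the uniform Beta(1, 1) prior are all assumptions chosen for illustration. In the library itself this behaviour is, as far as I know, governed by the `beta_prior` argument of the online policies, which is worth verifying against the documentation.

```python
import numpy as np

def score_arm(n_pos, n_neg, contextual_score, rng, prior=(1.0, 1.0), min_samples=2):
    """Score one arm, ignoring the context while the arm has too little data.

    Until the arm has at least `min_samples` positive and `min_samples`
    negative observations, the score is drawn from a Beta posterior over
    its reward rate (non-contextual). After that, the contextual score
    produced by the classifier is used as-is.
    """
    if n_pos < min_samples or n_neg < min_samples:
        a, b = prior
        return rng.beta(a + n_pos, b + n_neg)
    return contextual_score

rng = np.random.default_rng(0)
cold_score = score_arm(0, 0, contextual_score=0.9, rng=rng)  # random draw in [0, 1]
warm_score = score_arm(5, 5, contextual_score=0.9, rng=rng)  # uses the classifier score
```

With only a few rounds, every arm stays in the "cold" branch, so the user features never influence the choice. That is why the relevant parameter needs checking if contextual behaviour is wanted from the first rounds.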


qathom commented on July 19, 2024

Thanks for your prompt reply @david-cortes, I really appreciate your message!
The simulation is for 1 user (user_embeddings). The turns simulate a conversation in which the user gives their preferences step by step (comments in lines 83+).

Indeed, the current gist returns a continuous value between 0 and 1, where 1 means a 100% match between user preferences and item features, but I could define a threshold to return 1 or 0 (reward/no reward).
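A thresholding step like that could look like the sketch below. The function name and the default cutoff of 0.5 are assumptions, not something taken from the gist. The right threshold depends on how the match score is distributed.

```python
def binarize_reward(score, threshold=0.5):
    """Map a match score in [0, 1] to a binary reward (1 = reward, 0 = no reward)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return 1 if score >= threshold else 0

rewards = [binarize_reward(s) for s in (0.8, 0.3, 0.5)]
```

A score exactly at the threshold counts as a reward here. Flipping `>=` to `>` would make it a non-reward instead.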

I think a contextual MAB makes sense for my project (a chatbot asks questions about user preferences) because the idea is to use a hybrid approach when it comes to recommending items (user-item and item-item filtering). The Gist tries to illustrate both concepts.
An idea: the goal of the algorithm is to find the best "end result" by maximizing 2 reward functions (user-item and item-item).
