fidelity / mabwiser



[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library

Home Page: https://fidelity.github.io/mabwiser/

License: Apache License 2.0

Python 99.85% Makefile 0.07% Batchfile 0.08%

multi-armed-bandits contextual-bandits parametric-bandits non-parametric-bandits machine-learning recsys

mabwiser's Issues

Randomizing the predicted result instead of picking the first max one?

The current predict logic picks the first arm with the maximum value in 'arm_to_expectation':

https://github.com/fmr-llc/mabwiser/blob/b97868fd0ac162619ca66830f0b70079fb64e2dc/mabwiser/greedy.py#L41

How about randomizing the pick among the tied arms, as in the implementation below?

https://github.com/alison-carrera/mabalgs/blob/50e520ba7461a8b7126aa308c2a6670422cc8fcf/mab/ranked_algs.py#L40
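The suggestion can be illustrated with a minimal sketch, assuming expectations are held in a plain dict from arm to value. The function name `argmax_random_tie` is hypothetical and is not part of mabwiser or mabalgs.

```python
import random


def argmax_random_tie(arm_to_expectation, rng=random):
    """Return an arm with the highest expectation, breaking ties uniformly at random."""
    best = max(arm_to_expectation.values())
    # Collect every arm that shares the maximum value, not just the first one.
    candidates = [arm for arm, value in arm_to_expectation.items() if value == best]
    return rng.choice(candidates)


expectations = {"a": 0.5, "b": 0.7, "c": 0.7}
rng = random.Random(0)
# Over many calls both tied arms get picked, while "a" never does.
picks = {argmax_random_tie(expectations, rng) for _ in range(100)}
```

Passing a seeded `random.Random` keeps the tie-breaking reproducible, which matters when results need to be replayed.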

interpreting `predict_expectations`

Hi there,

I have been exploring your amazing library, and I am wondering if it would make sense to interpret the output of predict_expectations as the counterfactual rewards for each time step. In other words, predict_expectations represents what would have happened if a different action had been chosen at each time t. Would it make sense?

There's no good way of getting the rewards of arms, period.

I saw #86, which a developer marked as "closed" after suggesting that we access a private field of the MAB class to get the rewards of, e.g., EpsilonGreedy, because the original method predict_expectations has a random chance of returning random rewards.

As I already commented, this is not only bad practice, it also (predictably) leads to broken behavior that is no longer trivial to work around when it is used on nonparametric contextual bandits.

My suggestion is to either rewrite the method predict_expectations so that it returns only the expectations with no extra RNG or, in case this RNG was implemented for some theoretical reason, add a get_expectations method that returns the stored values.
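The proposed split can be sketched with a toy epsilon-greedy class. This is an illustration of the idea only: `ToyEpsilonGreedy` and `get_expectations` are hypothetical names, and the exploration behavior is modeled on the description in this issue rather than on mabwiser's source.

```python
import random


class ToyEpsilonGreedy:
    """Toy bandit that keeps stored means separate from randomized output."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.sums = {arm: 0.0 for arm in arms}
        self.counts = {arm: 0 for arm in arms}

    def partial_fit(self, decisions, rewards):
        for arm, reward in zip(decisions, rewards):
            self.sums[arm] += reward
            self.counts[arm] += 1

    def get_expectations(self):
        # Deterministic: always the stored mean reward per arm (0.0 if unseen).
        return {
            arm: self.sums[arm] / self.counts[arm] if self.counts[arm] else 0.0
            for arm in self.sums
        }

    def predict_expectations(self):
        # With probability epsilon, return random values to force exploration.
        if self.rng.random() < self.epsilon:
            return {arm: self.rng.random() for arm in self.sums}
        return self.get_expectations()


bandit = ToyEpsilonGreedy(arms=[1, 2], epsilon=1.0, seed=0)
bandit.partial_fit(decisions=[1, 1, 2], rewards=[0.0, 1.0, 1.0])
stored = bandit.get_expectations()
```

Even with `epsilon=1.0`, where every call to `predict_expectations` is random, `get_expectations` still returns the stored means, so they can be plotted between calls to `partial_fit`.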

[Question] How to deal with cold start

Hi mabwiser team,

Thank you for this great library!

I am wondering what would be the best approach for mabwiser to be used in cold start situations, where there aren't yet any recorded decisions and rewards. Could it be possible to initialize rewards for all arms with 0? The use case I have in mind has binary rewards [0,1].

Thanks in advance!

Need an LP and NP Type Definition

In the constructor of MAB, we use a Union of LPs and Union of NPs written explicitly.

https://github.com/fidelity/mabwiser/blob/master/mabwiser/mab.py#L732

This is problematic.

It means ALNS cannot refer to a generic LP type or NP type unless it also explicitly states all options as a Union, which is not good.

This issue is to create type definitions, LPType and NPType, similar to the Arm type: https://github.com/fidelity/mabwiser/blob/master/mabwiser/utils.py#L13

This can simplify the MAB() constructor by using these types. In addition, ALNS and other libraries built on top can refer to these types without necessarily knowing what is in them. When mabwiser introduces new policies, pip upgrading mabwiser in the top-level library will pick up the new types automatically.

Note: once this feature is complete, it also requires a minor code change in ALNS to refer to these new types in MABSelector().
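A minimal sketch of the proposal, assuming the policies are simple classes. The four policy classes below are simplified stand-ins written for this example, and `make_selector` is a hypothetical downstream function; only the alias names LPType and NPType come from the issue.

```python
from typing import NamedTuple, Optional, Union


# Simplified stand-ins for learning policies (not mabwiser's real classes).
class EpsilonGreedy(NamedTuple):
    epsilon: float = 0.1


class UCB1(NamedTuple):
    alpha: float = 1.0


# Simplified stand-ins for neighborhood policies.
class KNearest(NamedTuple):
    k: int = 1


class Radius(NamedTuple):
    radius: float = 0.05


# The aliases are defined once, next to the policies themselves.
LPType = Union[EpsilonGreedy, UCB1]
NPType = Union[KNearest, Radius]


def make_selector(learning_policy: LPType,
                  neighborhood_policy: Optional[NPType] = None) -> dict:
    """Downstream code annotates with the aliases instead of spelling out each Union."""
    return {"learning_policy": learning_policy,
            "neighborhood_policy": neighborhood_policy}


selector = make_selector(UCB1(alpha=2.0), KNearest(k=3))
```

With this layout, adding a new policy means extending one alias, and downstream signatures such as `make_selector` stay unchanged.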

Save the state of Contextual MAB

Hi!

We're using your implementation of MAB in our project and it's a great pleasure! However, I'd like to ask: is there a possibility to save the current state of MAB at any time? For example, it would be useful to plot actions probabilities for different contexts throughout the evolution process in our case.

Thank you in advance!

Make protocols out of LearningPolicyType and NeighborhoodPolicyType?

Thanks for the union types and PR for ALNS! 🎉

I noticed that currently MAB only takes learning and neighborhood policies that are already predefined as part of the LearningPolicyType or NeighborhoodPolicyType typevars. It makes sense to do this, because MAB handles each implemented policy on a case-by-case basis, using isinstance checks to decide which policy is used.

A downside to this is that users cannot easily provide their own policies without having to explicitly patch MAB (at least, if I understand the docs correctly). It might be possible to rewrite in such a way that all that a new policy needs to do is implement a few required methods. Those required methods could then be specified in an interface protocol. We do this in ALNS with the stopping and acceptance criteria, as well as the operator selection schemes: as long as users implement the required methods, their classes can be passed directly to ALNS without any modification to the library.

As I write this issue, I realize this would require significant refactoring. But it might be worthwhile because the resulting implementation would be much less coupled than it currently is.
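The protocol idea could look roughly like the sketch below. The interface `LearningPolicyProtocol`, its two methods, and `choose_arm` are assumptions made for illustration; the methods a real mabwiser policy would need are not specified in this issue.

```python
from typing import Dict, Hashable, List, Protocol, runtime_checkable


@runtime_checkable
class LearningPolicyProtocol(Protocol):
    """Hypothetical interface: any class with these methods qualifies."""

    def fit(self, decisions: List[Hashable], rewards: List[float]) -> None: ...

    def predict_expectations(self) -> Dict[Hashable, float]: ...


class MeanRewardPolicy:
    """User-defined policy; it does not inherit from anything in the library."""

    def __init__(self) -> None:
        self.sums: Dict[Hashable, float] = {}
        self.counts: Dict[Hashable, int] = {}

    def fit(self, decisions, rewards):
        for arm, reward in zip(decisions, rewards):
            self.sums[arm] = self.sums.get(arm, 0.0) + reward
            self.counts[arm] = self.counts.get(arm, 0) + 1

    def predict_expectations(self):
        return {arm: self.sums[arm] / self.counts[arm] for arm in self.sums}


def choose_arm(policy: LearningPolicyProtocol) -> Hashable:
    # A structural check replaces isinstance checks against concrete classes.
    if not isinstance(policy, LearningPolicyProtocol):
        raise TypeError("policy must implement fit and predict_expectations")
    expectations = policy.predict_expectations()
    return max(expectations, key=expectations.get)


policy = MeanRewardPolicy()
policy.fit(["a", "a", "b"], [0.0, 1.0, 1.0])
chosen = choose_arm(policy)
```

Note that `runtime_checkable` only verifies that the methods exist, not their signatures, so a static type checker is still needed to catch mismatched arguments.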

UCB1 arm_to_expectation not updated for all arms (potential bug)

When partial_fit() is called with UCB1, arm_to_expectation is updated only for the arms that received a reward (because of https://github.com/fmr-llc/mabwiser/blob/0c860253be017d1f393e18bf9d9d7e1739f93dca/mabwiser/ucb.py#L62 ). If an arm does not have a reward, its arm_to_expectation is not updated.

arm_to_expectation depends on self.total_count, which is updated when "any" of the arms is invoked. Thus, arm_to_expectation of all arms needs to be updated when self.total_count changes.

Solution: remove the above condition (if arm_rewards.size).

Happy to submit a fix if this change can be made.
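The dependency on the total count can be seen in a small worked example. It uses the textbook UCB1 score, mean plus an exploration bonus of sqrt(2 ln N / n); the `alpha` scaling factor and the function name are assumptions and may differ from the exact expression in ucb.py.

```python
import math


def ucb1_expectation(mean, arm_count, total_count, alpha=1.0):
    """Textbook UCB1 score: mean reward plus an exploration bonus."""
    return mean + alpha * math.sqrt(2 * math.log(total_count) / arm_count)


# An arm with 10 pulls and mean 0.5, scored when 20 pulls were made overall.
before = ucb1_expectation(mean=0.5, arm_count=10, total_count=20)

# Ten more pulls go to OTHER arms: this arm's own statistics are unchanged,
# yet its score still moves because total_count grew from 20 to 30.
after = ucb1_expectation(mean=0.5, arm_count=10, total_count=30)
```

Since `after` is larger than `before`, leaving the stored value stale for arms without new rewards understates their exploration bonus.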

Using categorical variables in the context

Do I need to convert categorical variables using one-hot encoding in order to use them in the context? Does the context accept only numerical features, or can it accept categorical features as well?
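For reference, one-hot encoding itself is simple to do before building the context. The sketch below is library-independent, assumes the context has to be numeric, and uses a hypothetical helper name `one_hot`.

```python
def one_hot(values, drop_first=False):
    """Encode a list of category labels as rows of 0.0/1.0 indicator columns."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one level avoids perfectly collinear columns.
        categories = categories[1:]
    rows = [[1.0 if value == category else 0.0 for category in categories]
            for value in values]
    return rows, categories


rows, columns = one_hot(["red", "green", "red"])
# columns lists the category behind each indicator column, in sorted order.
```

The resulting rows can be concatenated with the numeric features to form each context vector.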

How to implement the training and prediction process asynchronously?

Very nice project, but how can the training and prediction process be implemented asynchronously?

`context` isn't passed to `_parallel_fit` in Thompson Sampling

Hi,

I noticed that in the fit function of _ThompsonSampling, contexts is never passed to self._parallel_fit(decisions, rewards). https://github.com/fidelity/mabwiser/blob/master/mabwiser/thompson.py#L38

    def fit(self, decisions: np.ndarray, rewards: np.ndarray, contexts: np.ndarray = None) -> NoReturn:

        # If rewards are non binary, convert them
        rewards = self._get_binary_rewards(decisions, rewards)

        # Reset the success and failure counters to 1 (beta distribution is undefined for 0)
        reset(self.arm_to_success_count, 1)
        reset(self.arm_to_fail_count, 1)

        # Reset warm started arms
        self._reset_arm_to_status()

        # Calculate fit
        self._parallel_fit(decisions, rewards)

        # Update trained arms
        self._set_arms_as_trained(decisions=decisions, is_partial=False)

        # Leave the calculation of expectations to predict methods

I'm curious why this is the case. In simulator.py, ThompsonSampling appears in both contextual_mabs and context_free_mabs. https://github.com/fidelity/mabwiser/blob/master/examples/simulator.py#L30-L42

If _parallel_fit in _ThompsonSampling never receives the context, how does it solve the contextual bandits problem?

Thanks.

Thompson Sampling for Gaussian priors?

Hi!

I learned that mabwiser does not implement Thompson Sampling for Gaussian priors. As far as I know (please note that I'm quite new to multi-armed bandits), the Gaussian distribution is a conjugate prior and it's possible to apply the procedure of Thompson Sampling on Gaussian priors. Would the maintainers be open to adding that as a learning policy? Or, is it the case that the optimality of Thompson sampling is only proved for Beta priors?

Thanks a lot!
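For context, the conjugate Gaussian update the question refers to can be sketched as follows. It assumes a normal prior on each arm's mean and a known reward noise precision; `GaussianArm` and `thompson_pick` are hypothetical names, and this is not a mabwiser learning policy.

```python
import random


class GaussianArm:
    """Normal prior on the mean reward, with known reward noise precision."""

    def __init__(self, prior_mean=0.0, prior_precision=1.0, noise_precision=1.0):
        self.mean = prior_mean
        self.precision = prior_precision
        self.noise_precision = noise_precision

    def update(self, reward):
        # Conjugate update: precisions add, means combine weighted by precision.
        new_precision = self.precision + self.noise_precision
        self.mean = (self.precision * self.mean
                     + self.noise_precision * reward) / new_precision
        self.precision = new_precision

    def sample(self, rng):
        # Draw a plausible mean reward from the current posterior.
        return rng.gauss(self.mean, self.precision ** -0.5)


def thompson_pick(arms, rng):
    """Sample once from each posterior and play the arm with the largest draw."""
    samples = {name: arm.sample(rng) for name, arm in arms.items()}
    return max(samples, key=samples.get)


rng = random.Random(0)
arms = {"a": GaussianArm(), "b": GaussianArm()}
arms["a"].update(2.0)
choice = thompson_pick(arms, rng)
```

After one reward of 2.0, arm "a" has posterior mean 1.0 and precision 2.0, so its draws concentrate around a higher value than those of the untouched arm "b".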

Evaluation erroring out

I am getting the following error



  File "C:\Users\ayush\AppData\Local\Temp\2\ipykernel_5524\1442836565.py", line 12, in <module>
    response_col = 'sales_net')

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\mab2rec\pipeline.py", line 417, in benchmark
    return _bench(**args)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\mab2rec\pipeline.py", line 531, in _bench
    recommendations[name])

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\combined.py", line 121, in get_score
    return_extended_results)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\auc.py", line 131, in get_score
    sorted_clicks = get_sorted_clicks(predicted_results, self._user_id_column, self.click_column, self.k)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\hash_utils.py", line 80, in wrapper
    return cached_wrapper(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\hash_utils.py", line 87, in cached_wrapper
    return user_function(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\utils.py", line 347, in get_sorted_clicks
    sorted_clicks = results.sort_values(click_column, ascending=False).groupby(user_id_column).head(k)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\pandas\core\frame.py", line 6259, in sort_values
    k = self._get_label_or_level_values(by, axis=axis)

  File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\pandas\core\generic.py", line 1779, in _get_label_or_level_values
    raise KeyError(key)

KeyError: 'sales_net'

This is how I have initialized the metrics:


from jurity.recommenders import BinaryRecoMetrics, RankingRecoMetrics

# Column names for the response, user, and item id columns
metric_params = {'click_column': 'sales_net', 'user_id_column': 'ID', 'item_id_column':'MailerID'}

# Evaluate performance at different k-recommendations
top_k_list = [4]

# List of metrics to benchmark
metrics = []
for k in top_k_list:
    metrics.append(BinaryRecoMetrics.AUC(**metric_params, k=k))
    metrics.append(BinaryRecoMetrics.CTR(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.Precision(**metric_params,  k=k))
    metrics.append(RankingRecoMetrics.Recall(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.NDCG(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.MAP(**metric_params, k=k))

Predict and Predict_expectation difference in results

Hello,
First of all, I would like to say thank you for your work on that package, which considerably simplified the work on contextual bandits!
I was looking at your package to use in my pet project but had a hard time understanding the returned results.
In the attached screenshot one can see that the first 5 values returned by the "prediction" method do not match the maximum of the arm rewards for the first 5 rows, although I would expect them to match. Could you please explain the logic here?

Parallel fit/predict for contextual policies

In MAB.py, line 876, there is a comment that says not to use parallel fit or predict for contextual policies, but it's not clear why that needs to be the case.

Any help would be appreciated.

Cascading feedback type

Hello

In the cascading feedback type (the term coined by Craswell et al., 2008), we assume the user looks at the displayed items sequentially, starting at the top slot. As soon as the user finds an item worth clicking, they click and never return to the current ranked list. They don't even look at items below the clicked item. Not clicking on any item is also a possibility; this happens when none of the displayed items is worth clicking. In this case, the user does look at all the items.

The feedback signal is composed of two elements: The index of the chosen element, and the value of the click. Then it is the agent's task to translate this information to scores. In our implementation in the bandit library, we implemented the convention that seen but unclicked items receive some low score (typically 0 or -1), the clicked item receives the click value, and the items beyond the clicked one are ignored by the agent.

Does this repo support a cascading feedback mechanism?
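The scoring convention described in this issue can be sketched as a small function. The name `cascade_scores` and its parameters are hypothetical; the logic follows the convention stated above: seen but unclicked items get a low score, the clicked item gets the click value, and items below the click are ignored.

```python
def cascade_scores(displayed, clicked_index=None, click_value=1.0,
                   unclicked_score=0.0):
    """Turn cascading feedback into per-item scores.

    Items below the clicked position are omitted because the user never saw them.
    """
    if clicked_index is None:
        # No click: the user looked at every displayed item and rejected all.
        return {item: unclicked_score for item in displayed}
    if not 0 <= clicked_index < len(displayed):
        raise IndexError("clicked_index is outside the displayed list")
    # Items above the click were seen and skipped.
    scores = {item: unclicked_score for item in displayed[:clicked_index]}
    scores[displayed[clicked_index]] = click_value
    return scores


clicked = cascade_scores(["a", "b", "c", "d"], clicked_index=2)
no_click = cascade_scores(["a", "b", "c", "d"])
```

In the first call, "d" is absent from the result because it sits below the clicked item, so no update would be made for it.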

[Question] A way to only predict arms from a given subset?

We are using mabwiser for a production use case and have had a really good experience with the package so far!

In our use case, it's possible that at serve time, the set of available arms is a subset of all the arms that the model was trained on. So let's say we train the model on arms [1, 2, 3] but when we run predictions online, we can only show arms [1, 2]. Note, the set of available arms is known at serve time.

I'm looking for the best way to make sure that the model only returns one of [1, 2].

I've come up with a way in which I:

call predict_expectations()
filter the expectations to keep just the available arms
return the arm with the highest expectation

Shown in the code below:

from mabwiser.mab import LearningPolicy,MAB

# TRAIN TIME
mab = MAB(
    arms=[1, 2, 3],
    learning_policy=LearningPolicy.EpsilonGreedy())

mab.fit(decisions=[1, 1, 2, 2, 3, 3], rewards=[.2, .3, .4, .2, .5, .6])

# SERVE TIME
available_arms = {1, 2}

model_arms = set(mab.arms)
print(model_arms)
# {1, 2, 3}

arms_to_remove = model_arms.difference(available_arms)
print(arms_to_remove)
# {3}

# remove arms from predicted expectations and get argmax
predicted_expectations = mab.predict_expectations()
for arm in arms_to_remove:
    predicted_expectations.pop(arm)

prediction = max(predicted_expectations, key=predicted_expectations.get)
print(prediction)
# 2

I was curious whether you think this is appropriate or have other suggestions. I can already see that with this approach I will have to adjust the code when I start using one of the linear learning policies.

I also tried a different approach where I deep copy the mab model and call remove_arms. But this led to buggy results, where the same arm is always returned. I suspect this is because I take a deep copy of the model each time before I call predict.

Thanks a lot!
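The same three steps can be wrapped in a helper that filters instead of mutating the dict. This is a minimal sketch that takes a plain dict of arm to expectation, like the one the context-free example above works with; `best_available_arm` is a hypothetical name.

```python
def best_available_arm(expectations, available_arms):
    """Return the available arm with the highest expectation."""
    # Build a filtered copy so the original expectations stay untouched.
    candidates = {arm: value for arm, value in expectations.items()
                  if arm in available_arms}
    if not candidates:
        raise ValueError("none of the available arms is known to the model")
    return max(candidates, key=candidates.get)


expectations = {1: 0.25, 2: 0.3, 3: 0.55}
prediction = best_available_arm(expectations, available_arms={1, 2})
```

Raising on an empty candidate set makes the case where serve-time arms were never trained explicit, rather than letting `max` fail with a less helpful message.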

number of arms

Hi Mabwiser team,
Thank you so much for the great work on the MAB library with neighborhood policy. I really loved it.
I am using mabwiser for recommendation systems, but my number of arms can be in the thousands. The library works fine with a few hundred arms, and I am looking into ways to make better recommendations with mabwiser at a larger scale. Is there a limit or maximum number of arms that can be used without sacrificing performance? Thank you in advance.

Consistently get the actual expected value for each arm

Hello, everyone!

First of all, congratulations (and thank you!) for putting this amount of work in a package that is not only easy to use, but also extremely efficient. This here is probably what will make MAB more popular in the next few years.

I would like to ask whether it is possible for the user to consistently get the expected revenues for each arm before each `partial_fit`, with the EpsilonGreedy policy. Using `.predict_expectations` does not solve this entirely, since epsilon% of the predictions are going to be random.

I understand that the ultimate purpose behind `predict_expectations` is to work like `predict_proba` in case we really need those expectations. But I would like to plot the actual expected value the bandit is storing.

I couldn't find any method to retrieve this information in the official documentation, and I apologize in advance if the instruction is there and I didn't find it.

Thank you!!

init order

In class _Linear, __init__ takes its arguments in the order l2_lambda, alpha,
but in customized_mab.py, class LinUCBColdStart(_Linear) is initialized with the order alpha, l2_lambda.
Is this correct?

Is there a way to retrieve DecisionTree output?

I have chosen a decision tree as neighborhood policy. Is there any way to get the output of the decision tree, e.g. as an image?

Simulator usage - train and test split for target encoded features to avoid leakage

Hello again dear authors,

Firstly, I would like to say that I purchased your paper, and it helped quite a lot to gain the "context" of the package and motivated me to go deeper. Going deeper, I was thinking of using the Simulator class to find the optimal model and hyperparameters. However, since I am working on a contextual bandit, a lot of feature encoding is involved. For example, I use target encoding on my categorical features: I split the data into train and test sets, train my encoder on the train set, and then apply this encoder to the test set (targets from the test part are not involved in the encoder training process, to avoid leakage). Looking at the Simulator class, I can only see that the whole dataset can be fed in, and the train-test split then happens inside the package. In this case, I would need to target encode my features based on the whole dataset, which would create leakage, since the targets of the test set are supposed to be unknown to the encoder. Therefore, my question is whether you could give me a hint on how to overcome this problem. I hope my explanation is not too messy, and I will be grateful for any kind of advice on this matter.
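The leakage-free encoding described in this issue can be sketched as follows. It is library-independent and says nothing about how the Simulator splits data; `fit_target_encoder` and `apply_target_encoder` are hypothetical helper names.

```python
def fit_target_encoder(categories, targets):
    """Learn the mean target per category from TRAINING rows only."""
    sums, counts = {}, {}
    for category, target in zip(categories, targets):
        sums[category] = sums.get(category, 0.0) + target
        counts[category] = counts.get(category, 0) + 1
    mapping = {category: sums[category] / counts[category] for category in sums}
    global_mean = sum(targets) / len(targets)
    return mapping, global_mean


def apply_target_encoder(categories, mapping, default):
    """Encode rows with the learned mapping; unseen categories get the default."""
    return [mapping.get(category, default) for category in categories]


# Split first, then fit the encoder on the train part only.
train_categories, train_targets = ["a", "a", "b"], [1.0, 0.0, 1.0]
test_categories = ["a", "c"]

mapping, global_mean = fit_target_encoder(train_categories, train_targets)
encoded_test = apply_target_encoder(test_categories, mapping, global_mean)
```

The test targets never enter `fit_target_encoder`, and the unseen category "c" falls back to the training global mean.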

How to use Categorical variables as context?

Hi team,
I am working on a contextual MAB, and the example mentions "subscriber" as 0/1. I was curious to know whether I could use a different categorical variable, let's say one with 10 categories, so that 9 additional dummy columns would be created. Will this impact the learning, or can the library handle it?

Thanks
