fidelity / mabwiser
[IJAIT 2021] MABWiser: Contextual Multi-Armed Bandits Library
Home Page: https://fidelity.github.io/mabwiser/
License: Apache License 2.0
The current predict method picks the first arm with the maximum value in `arm_to_expectation`.
How about randomizing the pick among the arms that are tied for the maximum?
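A minimal sketch of such a randomized pick, assuming the expectations are held in a plain dict from arm to value. The function name `argmax_with_random_ties` is made up for this illustration and is not part of mabwiser:

```python
import random


def argmax_with_random_ties(arm_to_expectation, rng=random):
    """Return an arm with the highest expectation, chosen uniformly among ties."""
    best_value = max(arm_to_expectation.values())
    # Exact equality is used here; a tolerance may be preferable for floats.
    best_arms = [arm for arm, value in arm_to_expectation.items() if value == best_value]
    return rng.choice(best_arms)


expectations = {"a": 0.5, "b": 0.5, "c": 0.1}
rng = random.Random(7)
picks = {argmax_with_random_ties(expectations, rng) for _ in range(200)}
print(sorted(picks))  # arms that were picked at least once
```

Passing a seeded `random.Random` keeps the behavior reproducible, which matters if predictions have to be replayed.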
Hi there,
I have been exploring your amazing library, and I am wondering if it would make sense to interpret the output of `predict_expectations` as the counterfactual rewards for each time step. In other words, `predict_expectations` represents what would have happened if a different action had been chosen at each time t. Would it make sense?
I saw #86 and how it was marked as "closed" by a developer after suggesting we access a private field of the MAB class in order to get the expected rewards of e.g. EpsilonGreedy, because the original method `predict_expectations` has a random chance to return random rewards.
As I already commented, this is not only bad practice, it also (predictably) leads to broken behavior (which is no longer trivial to work around) when used on nonparametric contextual bandits.
My suggestion is to either rewrite the method `predict_expectations` so that it only returns the expectations with no extra RNG, or (in case this RNG was implemented for some theoretical reason) add a `get_expectations` method that returns the values.
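To make the second option concrete, here is a rough sketch on a stand-in class. `ToyEpsilonGreedy` is not mabwiser's implementation, and `get_expectations` is the hypothetical accessor proposed above; the only point is that the stored means and the exploratory output are separated:

```python
import random


class ToyEpsilonGreedy:
    """Stand-in for an epsilon-greedy policy (illustration only)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.arm_to_sum = {arm: 0.0 for arm in arms}
        self.arm_to_count = {arm: 0 for arm in arms}

    def partial_fit(self, decisions, rewards):
        for arm, reward in zip(decisions, rewards):
            self.arm_to_sum[arm] += reward
            self.arm_to_count[arm] += 1

    def get_expectations(self):
        """Hypothetical accessor: stored mean rewards, no randomness involved."""
        return {
            arm: self.arm_to_sum[arm] / self.arm_to_count[arm] if self.arm_to_count[arm] else 0.0
            for arm in self.arm_to_sum
        }

    def predict_expectations(self):
        """With probability epsilon, return random values to drive exploration."""
        if self.rng.random() < self.epsilon:
            return {arm: self.rng.random() for arm in self.arm_to_sum}
        return self.get_expectations()


policy = ToyEpsilonGreedy(arms=[1, 2, 3], epsilon=0.5, seed=0)
policy.partial_fit([1, 1, 2, 2, 3, 3], [0.2, 0.3, 0.4, 0.2, 0.5, 0.6])
stored_means = policy.get_expectations()  # identical on every call
```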
Hi mabwiser team,
Thank you for this great library!
I am wondering what would be the best approach for mabwiser to be used in cold start situations, where there aren't yet any recorded decisions and rewards. Could it be possible to initialize rewards for all arms with 0? The use case I have in mind has binary rewards [0,1].
Thanks in advance!
In the constructor of MAB, we use a Union of LPs and Union of NPs written explicitly.
https://github.com/fidelity/mabwiser/blob/master/mabwiser/mab.py#L732
This is problematic.
It prevents ALNS from referring to a generic LP type or NP type unless it also explicitly states all options as a Union, which is not good.
This issue is to create type definitions, LPType and NPType, similar to the ARM type: https://github.com/fidelity/mabwiser/blob/master/mabwiser/utils.py#L13
This can simplify the MAB() constructor by using these types. In addition, ALNS and other libraries built on top can refer to these types without necessarily knowing what is in them. When mabwiser introduces new policies, pip-upgrading it in the top-level library will pick up the new types automatically.
Note: once this feature is complete, it also requires a minor code change in ALNS to refer to these new types in the MABSelector().
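A small sketch of the idea, with empty stand-in classes instead of the real policies. `LPType` and `NPType` are the names proposed in this issue, while `ToySelector` and `describe` are hypothetical and only play the role of a downstream consumer such as MABSelector():

```python
from typing import Optional, Union, get_args


# Stand-in policy classes; the real ones are defined inside mabwiser.
class EpsilonGreedy: ...
class UCB1: ...
class Radius: ...
class KNearest: ...


# Defined once, next to the policies themselves.
LPType = Union[EpsilonGreedy, UCB1]
NPType = Union[Radius, KNearest]


class ToySelector:
    """Hypothetical downstream class that only refers to the aliases."""

    def __init__(self, learning_policy: LPType, neighborhood_policy: Optional[NPType] = None):
        if not isinstance(learning_policy, get_args(LPType)):
            raise TypeError("learning_policy must be one of the LPType members")
        if neighborhood_policy is not None and not isinstance(neighborhood_policy, get_args(NPType)):
            raise TypeError("neighborhood_policy must be one of the NPType members")
        self.learning_policy = learning_policy
        self.neighborhood_policy = neighborhood_policy


selector = ToySelector(UCB1(), Radius())
```

Adding a new policy then only means extending the alias in one place; the downstream signature stays unchanged.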
Hi!
We're using your implementation of MAB in our project and it's a great pleasure! However, I'd like to ask: is it possible to save the current state of the MAB at any time? For example, in our case it would be useful to plot action probabilities for different contexts throughout the evolution process.
Thank you in advance!
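One generic way to approach this in Python, sketched on a stand-in object: take a `copy.deepcopy` of the model after each update and query the copies later. `ToyBandit` is hypothetical; whether a fitted MAB can be deep-copied (or pickled to disk) this way is an assumption that would need to be checked:

```python
import copy


class ToyBandit:
    """Stand-in model whose state changes with every update (illustration only)."""

    def __init__(self, arms):
        self.arm_to_expectation = {arm: 0.0 for arm in arms}

    def partial_fit(self, arm, reward):
        # Simple moving average, only there to make the state evolve.
        self.arm_to_expectation[arm] += 0.5 * (reward - self.arm_to_expectation[arm])


bandit = ToyBandit(["a", "b"])
snapshots = []
for arm, reward in [("a", 1.0), ("b", 0.0), ("a", 1.0)]:
    bandit.partial_fit(arm, reward)
    # Each snapshot is an independent copy, so later updates do not alter it.
    snapshots.append(copy.deepcopy(bandit))

history_of_a = [snapshot.arm_to_expectation["a"] for snapshot in snapshots]
```

The list of snapshots can then be used to plot how the expectations evolved over time.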
Thanks for the union types and PR for ALNS! 🎉
I noticed that currently `MAB` only takes learning and neighborhood policies that are already predefined as part of the `LearningPolicyType` or `NeighborhoodPolicyType` typevars. It makes sense to do this, because `MAB` handles each implemented policy on a case-by-case basis, using `isinstance` checks to decide which policy is used.
A downside to this is that users cannot easily provide their own policies without having to explicitly patch `MAB` (at least, if I understand the docs correctly). It might be possible to rewrite in such a way that all that a new policy needs to do is implement a few required methods. Those required methods could then be specified in an interface protocol. We do this in ALNS with the stopping and acceptance criteria, as well as the operator selection schemes: as long as users implement the required methods, their classes can be passed directly to ALNS without any modification to the library.
As I write this issue, I realize this would require significant refactoring. But it might be worthwhile because the resulting implementation would be much less coupled than it currently is.
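A rough sketch of what such an interface protocol could look like. Everything here is hypothetical (`LearningPolicyProtocol`, `MeanRewardPolicy`, `ToyMAB`, and the chosen set of required methods); it only illustrates structural typing, not how MAB is organized internally:

```python
from typing import Dict, Hashable, Protocol, Sequence, runtime_checkable


@runtime_checkable
class LearningPolicyProtocol(Protocol):
    """Methods a user-provided policy would have to implement (assumed set)."""

    def fit(self, decisions: Sequence[Hashable], rewards: Sequence[float]) -> None: ...
    def predict(self) -> Hashable: ...
    def predict_expectations(self) -> Dict[Hashable, float]: ...


class MeanRewardPolicy:
    """User-defined policy: no inheritance from the library is needed."""

    def __init__(self, arms):
        self.arm_to_sum = {arm: 0.0 for arm in arms}
        self.arm_to_count = {arm: 0 for arm in arms}

    def fit(self, decisions, rewards):
        for arm, reward in zip(decisions, rewards):
            self.arm_to_sum[arm] += reward
            self.arm_to_count[arm] += 1

    def predict_expectations(self):
        return {arm: self.arm_to_sum[arm] / max(self.arm_to_count[arm], 1) for arm in self.arm_to_sum}

    def predict(self):
        expectations = self.predict_expectations()
        return max(expectations, key=expectations.get)


class ToyMAB:
    """Delegates to any object that satisfies the protocol."""

    def __init__(self, learning_policy: LearningPolicyProtocol):
        # runtime_checkable only verifies that the methods exist, not their signatures.
        if not isinstance(learning_policy, LearningPolicyProtocol):
            raise TypeError("learning_policy does not implement the required methods")
        self._policy = learning_policy

    def fit(self, decisions, rewards):
        self._policy.fit(decisions, rewards)

    def predict(self):
        return self._policy.predict()


mab = ToyMAB(MeanRewardPolicy(arms=[1, 2]))
mab.fit(decisions=[1, 2, 2], rewards=[0.1, 0.9, 0.7])
best_arm = mab.predict()
```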
When `partial_fit()` is done with UCB1, the `arm_to_expectation` of only the arms having a reward is updated (because of https://github.com/fmr-llc/mabwiser/blob/0c860253be017d1f393e18bf9d9d7e1739f93dca/mabwiser/ucb.py#L62 ). If an arm does not have a reward, its `arm_to_expectation` is not updated.
`arm_to_expectation` depends on `self.total_count`, which gets updated when any of the arms is invoked. Thus, the `arm_to_expectation` of all arms needs to be updated when `self.total_count` changes.
Solution: remove the above condition (`if arm_rewards.size`).
Happy to submit a fix if this change can be made.
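To illustrate why every arm is affected, here is a small sketch using the textbook UCB1 formula, mean + alpha * sqrt(2 ln(N) / n). The function `ucb1_expectations` and the handling of untried arms are assumptions for this example and may differ from the code in ucb.py:

```python
import math


def ucb1_expectations(arm_to_mean, arm_to_count, total_count, alpha=1.0):
    """Recompute the UCB1 value of every arm from the current total_count."""
    expectations = {}
    for arm, count in arm_to_count.items():
        if count == 0:
            # Assumption: an untried arm gets an unbounded exploration bonus.
            expectations[arm] = math.inf
        else:
            bonus = alpha * math.sqrt(2.0 * math.log(total_count) / count)
            expectations[arm] = arm_to_mean[arm] + bonus
    return expectations


arm_to_mean = {"a": 0.5, "b": 0.4}
arm_to_count = {"a": 2, "b": 2}
before = ucb1_expectations(arm_to_mean, arm_to_count, total_count=4)

# Two more rewards arrive for arm "a" only (its mean is kept fixed for simplicity).
arm_to_count["a"] += 2
after = ucb1_expectations(arm_to_mean, arm_to_count, total_count=6)

# Arm "b" received no new reward, yet its value changes through total_count.
change_in_b = after["b"] - before["b"]
```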
Do I need to convert categorical variables using one-hot encoding in order to use them in the context? Does the context only accept numerical features, or can it accept categorical features as well?
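For reference, a minimal one-hot encoding that turns a categorical column into numeric context columns, written without any library. The helper names `fit_one_hot` and `one_hot` and the sample data are made up, and the sketch assumes that contexts have to be numeric:

```python
def fit_one_hot(values):
    """Map each distinct category to a column index."""
    categories = sorted(set(values))
    return {category: index for index, category in enumerate(categories)}


def one_hot(value, category_to_index):
    """Encode one value; categories unseen during fitting become all zeros."""
    row = [0.0] * len(category_to_index)
    index = category_to_index.get(value)
    if index is not None:
        row[index] = 1.0
    return row


# (age, plan) pairs, where plan is categorical
rows = [(34, "basic"), (51, "premium"), (27, "basic")]
mapping = fit_one_hot(plan for _, plan in rows)

# Numeric context rows: one age column followed by one column per category
contexts = [[float(age)] + one_hot(plan, mapping) for age, plan in rows]
```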
Very nice project! How can the training and prediction processes be run asynchronously?
Hi,
I noticed that in the `fit` function of `_ThompsonSampling`, `contexts` is never passed to `self._parallel_fit(decisions, rewards)`: https://github.com/fidelity/mabwiser/blob/master/mabwiser/thompson.py#L38
def fit(self, decisions: np.ndarray, rewards: np.ndarray, contexts: np.ndarray = None) -> NoReturn:

    # If rewards are non binary, convert them
    rewards = self._get_binary_rewards(decisions, rewards)

    # Reset the success and failure counters to 1 (beta distribution is undefined for 0)
    reset(self.arm_to_success_count, 1)
    reset(self.arm_to_fail_count, 1)

    # Reset warm started arms
    self._reset_arm_to_status()

    # Calculate fit
    self._parallel_fit(decisions, rewards)

    # Update trained arms
    self._set_arms_as_trained(decisions=decisions, is_partial=False)

    # Leave the calculation of expectations to predict methods
I'm curious why this is the case. In `simulator.py`, `ThompsonSampling` appears in both `contextual_mabs` and `context_free_mabs`: https://github.com/fidelity/mabwiser/blob/master/examples/simulator.py#L30-L42
If `_parallel_fit` in `_ThompsonSampling` never receives the context, how does it solve the contextual bandits problem?
Thanks.
Hi!
I learned that `mabwiser` does not implement Thompson Sampling for Gaussian priors. As far as I know (please note that I'm quite new to multi-armed bandits), the Gaussian distribution is a conjugate prior and it's possible to apply the procedure of Thompson Sampling on Gaussian priors. Would the maintainers be open to adding that as a learning policy? Or is it the case that the optimality of Thompson Sampling is only proved for Beta priors?
Thanks a lot!
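For context, the conjugate update I have in mind can be sketched as below. It assumes a Normal prior on each arm's mean and a known reward noise variance; `GaussianThompsonArm` and `choose_arm` are names invented for this example and are not part of mabwiser:

```python
import random


class GaussianThompsonArm:
    """Normal prior on the mean reward, Normal likelihood with known variance."""

    def __init__(self, prior_mean=0.0, prior_variance=1.0, noise_variance=1.0):
        self.mean = prior_mean
        self.variance = prior_variance
        self.noise_variance = noise_variance

    def update(self, reward):
        # Conjugate update: precisions add, means are precision-weighted.
        precision = 1.0 / self.variance + 1.0 / self.noise_variance
        self.mean = (self.mean / self.variance + reward / self.noise_variance) / precision
        self.variance = 1.0 / precision

    def sample(self, rng):
        return rng.gauss(self.mean, self.variance ** 0.5)


def choose_arm(arms, rng):
    """Thompson step: draw one sample per arm and play the largest draw."""
    return max(arms, key=lambda name: arms[name].sample(rng))


# Small simulation with made-up true means and unit-variance noise
true_means = {"a": 0.0, "b": 1.0}
rng = random.Random(0)
arms = {name: GaussianThompsonArm() for name in true_means}
for _ in range(200):
    name = choose_arm(arms, rng)
    arms[name].update(rng.gauss(true_means[name], 1.0))
```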
I am getting the following error
File "C:\Users\ayush\AppData\Local\Temp\2\ipykernel_5524\1442836565.py", line 12, in <module>
response_col = 'sales_net')
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\mab2rec\pipeline.py", line 417, in benchmark
return _bench(**args)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\mab2rec\pipeline.py", line 531, in _bench
recommendations[name])
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\combined.py", line 121, in get_score
return_extended_results)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\recommenders\auc.py", line 131, in get_score
sorted_clicks = get_sorted_clicks(predicted_results, self._user_id_column, self.click_column, self.k)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\hash_utils.py", line 80, in wrapper
return cached_wrapper(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\hash_utils.py", line 87, in cached_wrapper
return user_function(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\jurity\utils.py", line 347, in get_sorted_clicks
sorted_clicks = results.sort_values(click_column, ascending=False).groupby(user_id_column).head(k)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\pandas\core\frame.py", line 6259, in sort_values
k = self._get_label_or_level_values(by, axis=axis)
File "C:\ProgramData\Anaconda3\envs\test_env\lib\site-packages\pandas\core\generic.py", line 1779, in _get_label_or_level_values
raise KeyError(key)
KeyError: 'sales_net'
This is how I have initialized the metrics:
from jurity.recommenders import BinaryRecoMetrics, RankingRecoMetrics

# Column names for the response, user, and item id columns
metric_params = {'click_column': 'sales_net', 'user_id_column': 'ID', 'item_id_column': 'MailerID'}

# Evaluate performance at different k-recommendations
top_k_list = [4]

# List of metrics to benchmark
metrics = []
for k in top_k_list:
    metrics.append(BinaryRecoMetrics.AUC(**metric_params, k=k))
    metrics.append(BinaryRecoMetrics.CTR(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.Precision(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.Recall(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.NDCG(**metric_params, k=k))
    metrics.append(RankingRecoMetrics.MAP(**metric_params, k=k))
Hello,
First of all, I would like to say thank you for your work on that package, which considerably simplified the work on contextual bandits!
I was looking at your package to use in my pet project but had a hard time understanding the returned results.
In the attached screenshot one can see that the first 5 values returned by the "prediction" method do not match the arms with the maximum reward in the first 5 rows, although I would expect them to match. Could you please explain the logic here?
In mab.py, line 876, there is a comment that says not to use parallel fit or predict for contextual policies, but it's not clear why that needs to be the case.
Any help would be appreciated.
Hello
In the cascading feedback type (the term coined by Craswell et al., 2008), we assume the user looks at the displayed items in a sequential manner, starting at the top slot. As soon as the user finds an item worthy of clicking, they click and never return to the current ranked list. They don't even look at items below the item clicked. Not clicking on any item is also a possibility; this happens when none of the displayed items are worthy of clicking. In this case, the user does look at all the items.
The feedback signal is composed of two elements: The index of the chosen element, and the value of the click. Then it is the agent's task to translate this information to scores. In our implementation in the bandit library, we implemented the convention that seen but unclicked items receive some low score (typically 0 or -1), the clicked item receives the click value, and the items beyond the clicked one are ignored by the agent.
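The convention above could be written roughly as follows. The function name `cascade_scores` and its parameters are made up for this illustration, and `clicked_index=None` is used to mean that nothing was clicked:

```python
def cascade_scores(displayed_items, clicked_index, click_value=1.0, unclicked_score=0.0):
    """Translate cascading feedback into per-item scores.

    Items above the click were seen but not clicked, the clicked item gets the
    click value, and items below the click are ignored (left out of the result).
    """
    if clicked_index is None:
        # No click: the user looked at every item and rejected all of them.
        return {item: unclicked_score for item in displayed_items}
    if not 0 <= clicked_index < len(displayed_items):
        raise IndexError("clicked_index is outside the displayed list")
    scores = {item: unclicked_score for item in displayed_items[:clicked_index]}
    scores[displayed_items[clicked_index]] = click_value
    return scores


ranked = ["a", "b", "c", "d"]
click_on_third = cascade_scores(ranked, clicked_index=2)
no_click = cascade_scores(ranked, clicked_index=None, unclicked_score=-1.0)
```

Here "d" is absent from `click_on_third` because it sits below the clicked item, while `no_click` penalizes all four items.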
Does this repo support a cascading feedback mechanism?
We are using `mabwiser` for a production use case and have had a really good experience with the package so far!
In our use case, it's possible that at serve time, the set of available arms is a subset of all the arms that the model was trained on. So let's say we train the model on arms [1, 2, 3] but when we run predictions online, we can only show arms [1, 2]. Note, the set of available arms is known at serve time.
I'm looking for the best way to make sure that the model only returns one of [1, 2].
I've come up with a way in which I call `predict_expectations()`, remove the unavailable arms from the result, and take the argmax of what remains, as shown in the code below:
from mabwiser.mab import LearningPolicy, MAB

# TRAIN TIME
mab = MAB(
    arms=[1, 2, 3],
    learning_policy=LearningPolicy.EpsilonGreedy())
mab.fit(decisions=[1, 1, 2, 2, 3, 3], rewards=[.2, .3, .4, .2, .5, .6])

# SERVE TIME
available_arms = {1, 2}
model_arms = set(mab.arms)
print(model_arms)
# {1, 2, 3}
arms_to_remove = model_arms.difference(available_arms)
print(arms_to_remove)
# {3}

# remove arms from predicted expectations and get argmax
predicted_expectations = mab.predict_expectations()
for arm in arms_to_remove:
    predicted_expectations.pop(arm)
prediction = max(predicted_expectations, key=predicted_expectations.get)
print(prediction)
# 2
I was really curious if you thought this was appropriate or had other suggestions. I can already see that with this approach I will have to adjust the code when I start using one of the linear learning policies.
I also tried a different approach where I deep copy the mab model and call `remove_arms`. But this led to buggy results, where the same strategy is always returned. I suspect this is because of taking a deep copy of the model each time before I call `predict`.
Thanks a lot!
Hi Mabwiser team,
Thank you so much for the great work on the MAB library with neighborhood policy. I really loved it.
I am using mabwiser for recommendation systems, but my number of arms can be in the thousands. The library works fine with a few hundred arms, and I am looking into ways to make better recommendations with mabwiser at a larger scale. Is there a limit or maximum number of arms beyond which performance suffers? Thank you in advance.
Hello, everyone!
First of all, congratulations (and thank you!) for putting this amount of work in a package that is not only easy to use, but also extremely efficient. This here is probably what will make MAB more popular in the next few years.
I would like to ask whether it is possible for the user to consistently get the expected revenues for each arm before each `partial_fit`, with the EpsilonGreedy policy. Using `.predict_expectations` does not solve this entirely, since epsilon% of the predictions are going to be random.
I understand that the ultimate purpose behind `predict_expectations` is to work like `predict_proba` in case we really need those expectations. But I would like to plot the actual expected value the bandit is storing.
I couldn't find any method to retrieve this information in the official documentation, and I apologize in advance if the instruction is there and I didn't find it.
Thank you!!
In class `_Linear`, `__init__` takes its arguments in the order `l2_lambda`, `alpha`, but in customized_mab.py, class `LinUCBColdStart(_Linear)` uses the order `alpha`, `l2_lambda`. Is this correct?
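To show why the order matters, here is a sketch on stand-in classes. `BaseLinear`, `PositionalChild`, and `KeywordChild` are hypothetical and do not reproduce the real signatures; I have not verified how the library forwards these arguments:

```python
class BaseLinear:
    """Stand-in parent whose parameters are ordered (l2_lambda, alpha)."""

    def __init__(self, l2_lambda, alpha):
        self.l2_lambda = l2_lambda
        self.alpha = alpha


class PositionalChild(BaseLinear):
    """Child ordered (alpha, l2_lambda) that forwards by position."""

    def __init__(self, alpha, l2_lambda):
        # The values end up swapped because the parent expects the other order.
        super().__init__(alpha, l2_lambda)


class KeywordChild(BaseLinear):
    """Same child, but forwarding by keyword, so the order no longer matters."""

    def __init__(self, alpha, l2_lambda):
        super().__init__(l2_lambda=l2_lambda, alpha=alpha)


swapped = PositionalChild(alpha=1.5, l2_lambda=0.1)
correct = KeywordChild(alpha=1.5, l2_lambda=0.1)
```

So a differing parameter order is harmless as long as the values are passed by keyword, and a silent bug if they are passed by position.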
I have chosen a decision tree as neighborhood policy. Is there any way to get the output of the decision tree, e.g. as an image?
Hello again dear authors,
Firstly, I would like to say that I purchased your paper, and it helped quite a lot to gain the "context" of the package and motivated me to go deeper. Going deeper, I was thinking of using the Simulator class to find the optimal model and hyperparameters. However, since I am working on a contextual bandit, a lot of feature encoding is involved. For example, I am using target encoding on my categorical features: I split the data into train and test sets, train my encoder on the train set, and then apply this encoder to the test set (targets from the test part are not involved in the encoder training process, to avoid leakage). Looking at the Simulator class, I can only see that the whole dataset can be fed in, and the train-test split then happens inside the package. In this case, I would need to target encode my features based on the whole dataset, which would create leakage, since the encoder would use targets from the test set that are supposed to be unknown. Therefore, my question is whether you could give me a hint on how to overcome this problem. I hope my explanation is not too messy, and I would be grateful for any kind of advice on this matter.
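For clarity, this is the kind of leakage-free target encoding I mean, in a simplified form. The helpers `fit_target_encoder` and `apply_target_encoder` and the sample data are made up, and how the encoded rows would then be handed to the Simulator is exactly the open question:

```python
from collections import defaultdict


def fit_target_encoder(categories, targets):
    """Learn the mean target per category from the training rows only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    mapping = {category: sums[category] / counts[category] for category in sums}
    global_mean = sum(targets) / len(targets)
    return mapping, global_mean


def apply_target_encoder(categories, mapping, global_mean):
    """Encode rows; categories unseen in training fall back to the global mean."""
    return [mapping.get(category, global_mean) for category in categories]


# Split first, then fit the encoder on the training part only
train_categories, train_targets = ["x", "x", "y"], [1.0, 0.0, 1.0]
test_categories = ["x", "z"]

mapping, global_mean = fit_target_encoder(train_categories, train_targets)
train_encoded = apply_target_encoder(train_categories, mapping, global_mean)
test_encoded = apply_target_encoder(test_categories, mapping, global_mean)
```

The test targets never enter `fit_target_encoder`, which is the property that gets lost if the encoding has to be computed on the whole dataset before the split.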
Hi team,
I am working on a contextual MAB, and the example uses "subscriber" as a 0/1 feature. I was curious to know whether I could use a different categorical variable, let's say one with 10 categories, so that 9 additional dummy columns would be created. Will this impact the learning, or can the library handle it?
Thanks