david-cortes / contextualbandits
Python implementations of contextual bandits algorithms
Home Page: http://contextual-bandits.readthedocs.io
License: BSD 2-Clause "Simplified" License
I figured commenting here would be easier than forking with a pull request.
The final example (using Mediamill_data.txt) has an extra parameter update_freq = 5.
I'm guessing this is just a leftover artifact, but I couldn't find any committed version of evaluateRejectionSampling that had it. I don't believe removing it substantially changes any of the points made in the tutorial, but the cell result did change from ~(0.4028, 422) to ~(0.4539, 445).
Great library btw!
I found an error that occurs when beta_prior is set to "auto" and nchoices is a list at the time the online contextual bandits are initialized.
Here is the error message:
File "/usr/local/lib/python3.8/dist-packages/contextualbandits/online.py", line 1894, in __init__
beta_prior = ((3./nchoices, 4.), 2)
TypeError: unsupported operand type(s) for /: 'float' and 'list'
The beta prior should be handled depending on the type of nchoices (int or list-like).
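The traceback shows the "auto" default being computed as ((3./nchoices, 4.), 2), which only works when nchoices is a number. A minimal sketch of one way to cover both cases follows. It assumes that a list-like nchoices enumerates the arms, so its length is the arm count; resolve_auto_beta_prior is a hypothetical helper name, not part of the library.

```python
import numbers


def resolve_auto_beta_prior(nchoices):
    """Hypothetical helper: build the 'auto' prior for an int or list-like nchoices."""
    # Assumption: a list-like nchoices holds arm names, so its length is the arm count.
    n_arms = nchoices if isinstance(nchoices, numbers.Integral) else len(nchoices)
    if n_arms < 1:
        raise ValueError("nchoices must describe at least one arm")
    # Same formula as in the traceback, applied to the arm count instead of the raw input.
    return ((3.0 / n_arms, 4.0), 2)


print(resolve_auto_beta_prior(4))
print(resolve_auto_beta_prior(["a", "b", "c"]))
```

Checking against numbers.Integral rather than int also lets NumPy integer scalars through.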
@david-cortes
I have saved a model in the following way:
base_algorithm = SGDClassifier(random_state=123, loss='log')
beta_prior = ((3, 7), 2)
model = BootstrappedUCB(deepcopy(base_algorithm), nchoices = nchoices, batch_train=True, beta_prior=beta_prior)
for i in range(iters):  # for loop with several iterations
    # shape for X: [batch, 2626]
    # shape for a: [batch, 1]
    # shape for r: [batch, 1]
    model.partial_fit(X, a, r)
target_model = "20190521.dill"
dill.dump(model, open(target_model, "wb"))
But I got different prediction results every time for the same input. Here is a simulation:
>>> model = dill.load(open("20190521.dill", "rb"))
>>> X = np.random.normal(size=(1, 2626))
>>> res01 = model.decision_function(X)
>>> res01[0][:5]
array([0.447249 , 0.27269542, 0.48439773, 0.26759085, 0.1235832 ])
>>>
>>> res02 = model.decision_function(X)
>>> res02[0][:5]
array([0.1319437 , 0.21268724, 0.40948264, 0.13509549, 0.15605585])
>>>
>>>
>>> pred01 = model.predict(X)
>>> pred01
array([651])
>>>
>>> model.predict(X)
array([210])
>>> model.predict(X)
array([1741])
20190521.dill is the model trained with BootstrappedUCB in the above way.
Hello, I'm trying to find the "Bibtex_data.txt" dataset referenced in the examples, but could not find it. Could you please point me to it?
Thanks,
A.
The third argument of the _check_constructor_input method is supposed to be the batch_train boolean. The constructors for BayesianTS and BayesianUCB are instead passing a tuple.
AssertionError Traceback (most recent call last)
<ipython-input-117-96a94fb40784> in <module>()
4 nchoices=50
5
----> 6 bayesian_ts=BayesianTS(nchoices)
7
8 bayesian_ucb=BayesianUCB(nchoices)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\contextualbandits\online.py in __init__(self, nchoices, method, beta_prior)
1912 """
1913 def __init__(self, nchoices, method='advi', beta_prior=((1,1),3)):
-> 1914 _check_constructor_input(_BetaPredictor(1,1),nchoices,((1,1),2))
1915 self.beta_prior = beta_prior
1916 self.nchoices = nchoices
~\AppData\Local\Continuum\anaconda3\lib\site-packages\contextualbandits\utils.py in _check_constructor_input(base_algorithm, nchoices, batch_train)
54 assert ('fit' in dir(base_algorithm)) and ('predict' in dir(base_algorithm))
55 if batch_train:
---> 56 assert 'partial_fit' in dir(base_algorithm)
57 return None
58
It is said in the documentation that only binary rewards are supported. If continuous values are passed, the following sklearn exception is thrown:
ValueError: Unknown label type: 'continuous'
However, it looks like there are no exceptions or errors when a regressor is used as base_algorithm, e.g.
agent = SeparateClassifiers(base_algorithm=RandomForestRegressor(), n_choices=...)
I haven't faced any unexpected behaviour for my use case. So, was it just luck, or is it really a way to work with continuous rewards?
First of all, thank you for the code for using CB :>
When I run your example notebook (online_contextual_bandits.ipynb), I get an AssertionError in the '3.3 Streaming models' part. Could you give me a hint on how to fix that error?
AssertionError Traceback (most recent call last)
in
62 lst_actions[model],
63 X_batch, y_batch,
---> 64 rnd_seed = batch_st)
in simulate_rounds_stoch(model, rewards, actions_hist, X_batch, y_batch, rnd_seed)
31
32 ## choosing actions for this batch
---> 33 actions_this_batch = model.predict(X_batch).astype('uint8')
34
35 # keeping track of the sum of rewards received
/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in predict(self, X, exploit)
2003 if not self.is_fitted:
2004 return self._predict_random_if_unfit(X, False)
-> 2005 return self._name_arms(self._predict(X, exploit, True))
2006
2007 def _predict(self, X, exploit = False, choose = True):
/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in _predict(self, X, exploit, choose)
2029 # case 1: number of predictions to make would still fit within current window
2030 if remainder_window > X.shape[0]:
-> 2031 pred, pred_max = self._calc_preds(X, choose)
2032 self.window_cnt += X.shape[0]
2033 self.window = np.r_[self.window, pred_max]
/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in _calc_preds(self, X, choose)
2076
2077 def _calc_preds(self, X, choose = True):
-> 2078 pred_proba = self._oracles.decision_function(X)
2079 np.nan_to_num(pred_proba, copy=False)
2080 pred_max = pred_proba.max(axis = 1)
/databricks/python/lib/python3.7/site-packages/contextualbandits/utils.py in decision_function(self, X)
927 Parallel(n_jobs=self.njobs, verbose=0, require="sharedmem")
928 (delayed(self._decision_function_single)(choice, X, preds, 1)
--> 929 for choice in range(self.n))
930 _apply_smoothing(preds, self.smooth, self.counters,
931 self.noise_to_smooth, self.random_state)
/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time
/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())
/usr/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):
/usr/lib/python3.7/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
119 job, i, func, args, kwds = task
120 try:
--> 121 result = (True, func(*args, **kwds))
122 except Exception as e:
123 if wrap_exception and func is not _helper_reraises_exception:
/databricks/python/lib/python3.7/site-packages/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
606 def __call__(self, *args, **kwargs):
607 try:
--> 608 return self.func(*args, **kwargs)
609 except KeyboardInterrupt:
610 # We capture the KeyboardInterrupt and reraise it as
/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
/databricks/python/lib/python3.7/site-packages/contextualbandits/utils.py in _decision_function_single(self, choice, X, preds, depth)
955 preds[:, choice] = self.algos[choice].decision_function_w_sigmoid(X)
956 else:
--> 957 preds[:, choice] = self.algos[choice].predict(X)
958
959 ### Note to self: it's not a problem to mix different methods from the
/databricks/python/lib/python3.7/site-packages/contextualbandits/linreg/__init__.py in predict(self, X)
512 The predicted values given 'X'.
513 """
--> 514 assert self.is_fitted_
515
516 pred = X.dot(self.coef_[:self._n])
AssertionError:
The utilities module uses the as_matrix method three times. Fitting a model generates the warning:
<path>\contextualbandits\utils.py:85: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
X=X.as_matrix()
The method is also used on lines 97 and 482.
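The warning itself names the replacement (.values), and newer pandas versions also provide .to_numpy(). A minimal sketch of a conversion helper that avoids .as_matrix() follows. The name to_ndarray is hypothetical and this is not the library's actual fix; it only illustrates the substitution.

```python
import numpy as np


def to_ndarray(X):
    """Hypothetical helper: return X as a NumPy array without calling .as_matrix()."""
    to_numpy = getattr(X, "to_numpy", None)
    if callable(to_numpy):
        # pandas >= 0.24 DataFrame / Series
        return to_numpy()
    values = getattr(X, "values", None)
    if values is not None and not callable(values):
        # older pandas objects expose the data as the .values attribute
        return np.asarray(values)
    # plain lists, tuples and arrays
    return np.asarray(X)


print(to_ndarray([[1, 2], [3, 4]]).shape)
```

The callable check on values keeps plain dicts, whose .values is a method, from being mistaken for pandas objects.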
I had to change a couple of things to get this compiling on Mac OS X:
extra_compile_args member if none were specified
-lomp and preprocessor flag is different
Something like:
diff --git a/setup.py b/setup.py
index ea0ff35..8b7e15b 100644
--- a/setup.py
+++ b/setup.py
@@ -9,16 +9,17 @@ class build_ext_subclass( build_ext ):
if compiler == 'msvc': # visual studio
for e in self.extensions:
e.extra_compile_args += ['/O2', '/openmp']
+ elif platform.startswith('darwin'):
+ for e in self.extensions:
+ e.extra_link_args += ["-lomp"]
+ e.extra_compile_args = ["-Xpreprocessor", "-fopenmp", "-O3", "-march=native"]
+ if e.language == "c++":
+ e.extra_compile_args += ["-std=c++17"]
else:
for e in self.extensions:
e.extra_compile_args += ['-O3', '-march=native', '-fopenmp']
e.extra_link_args += ['-fopenmp']
-
- ### Remove this code if you have a mac with gcc or clang + openmp
- if platform[:3] == "dar":
- for e in self.extensions:
- e.extra_compile_args = [arg for arg in extra_compile_args if arg != '-fopenmp']
- e.extra_link_args = [arg for arg in extra_link_args if arg != '-fopenmp']
+
build_ext.build_extensions(self)
setup(
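The flag selection in the diff above can be restated as a standalone function, which makes the three branches easier to compare. This is only a sketch of the proposed logic: openmp_build_flags is a hypothetical name, platform is assumed to be a string such as sys.platform, and the C++17 flag from the diff is left out for brevity.

```python
def openmp_build_flags(platform, compiler):
    """Return (extra_compile_args, extra_link_args) following the diff above."""
    if compiler == "msvc":
        # Visual Studio uses its own switch syntax and needs no extra link flag
        return ["/O2", "/openmp"], []
    if platform.startswith("darwin"):
        # Apple clang: OpenMP is enabled through the preprocessor and linked via libomp
        return ["-Xpreprocessor", "-fopenmp", "-O3", "-march=native"], ["-lomp"]
    # gcc / clang elsewhere: the same flag is used for compiling and linking
    return ["-O3", "-march=native", "-fopenmp"], ["-fopenmp"]


compile_args, link_args = openmp_build_flags("darwin", "unix")
print(compile_args, link_args)
```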
The n_features in parameter X (array (n_samples, n_features)) of predict(X, exploit=False, gradient_calc='weighted') should be referred to as the context, summarizing information about both the user u and the arm a. It should have a form like CONCAT<user_vec, arm_vec>.
If so,
predict(CONCAT<user1_vec, arm1_vec>, exploit=False, gradient_calc='weighted')
predict(CONCAT<user1_vec, arm2_vec>, exploit=False, gradient_calc='weighted')
may give different action predictions for the same user1.
That is confusing me.
How should I understand the n_features in parameter X (array (n_samples, n_features))?
After training the LinUCB model, I shared the model with another user. The other user was getting different scores for the arms for the same data. Any idea why?
The bibtex dataset is no longer available online at the link provided
When I execute this code block:
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import BootstrappedUCB, BootstrappedTS, LogisticUCB, \
SeparateClassifiers, EpsilonGreedy, AdaptiveGreedy, ExploreFirst, \
ActiveExplorer, SoftmaxExplorer
from copy import deepcopy
nchoices = y.shape[1]
base_algorithm = LogisticRegression(solver='lbfgs', warm_start=True)
beta_prior = ((3./nchoices, 4), 2) # until there are at least 2 observations of each class, will use this prior
beta_prior_ucb = ((5./nchoices, 4), 2) # UCB gives higher numbers, thus the higher positive prior
beta_prior_ts = ((2./np.log2(nchoices), 4), 2)
### Important!!! the default values for beta_prior will be changed in version 0.3
## The base algorithm is embedded in different metaheuristics
bootstrapped_ucb = BootstrappedUCB(deepcopy(base_algorithm), nchoices = nchoices,
beta_prior = beta_prior_ucb, percentile = 80,
random_state = 1111)
bootstrapped_ts = BootstrappedTS(deepcopy(base_algorithm), nchoices = nchoices,
beta_prior = beta_prior_ts, random_state = 2222)
one_vs_rest = SeparateClassifiers(deepcopy(base_algorithm), nchoices = nchoices,
beta_prior = beta_prior, random_state = 3333)
epsilon_greedy = EpsilonGreedy(deepcopy(base_algorithm), nchoices = nchoices,
beta_prior = beta_prior, random_state = 4444)
logistic_ucb = LogisticUCB(nchoices = nchoices, percentile = 70,
beta_prior = beta_prior_ts, random_state = 5555)
adaptive_greedy_thr = AdaptiveGreedy(deepcopy(base_algorithm), nchoices=nchoices,
decay_type='threshold',
beta_prior = beta_prior, random_state = 6666)
adaptive_greedy_perc = AdaptiveGreedy(deepcopy(base_algorithm), nchoices = nchoices,
decay_type='percentile', decay=0.9997,
beta_prior=beta_prior, random_state = 7777)
explore_first = ExploreFirst(deepcopy(base_algorithm), nchoices = nchoices,
explore_rounds=1500, beta_prior=None, random_state = 8888)
active_explorer = ActiveExplorer(deepcopy(base_algorithm), nchoices = nchoices,
beta_prior=beta_prior, random_state = 9999)
adaptive_active_greedy = AdaptiveGreedy(deepcopy(base_algorithm), nchoices = nchoices,
active_choice='weighted', decay_type='percentile', decay=0.9997,
beta_prior=beta_prior, random_state = 1234)
softmax_explorer = SoftmaxExplorer(deepcopy(base_algorithm), nchoices = nchoices,
beta_prior=beta_prior, random_state = 5678)
models = [bootstrapped_ucb, bootstrapped_ts, one_vs_rest, epsilon_greedy, logistic_ucb,
adaptive_greedy_thr, adaptive_greedy_perc, explore_first, active_explorer,
adaptive_active_greedy, softmax_explorer]
I got an error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-f97973a242eb> in <module>
1 from sklearn.linear_model import LogisticRegression
----> 2 from contextualbandits.online import BootstrappedUCB, BootstrappedTS, LogisticUCB, \
3 SeparateClassifiers, EpsilonGreedy, AdaptiveGreedy, ExploreFirst, \
4 ActiveExplorer, SoftmaxExplorer
5 from copy import deepcopy
C:\Python38\lib\site-packages\contextualbandits\__init__.py in <module>
----> 1 from . import online
2 from . import offpolicy
3 from . import evaluation
4 from . import linreg
C:\Python38\lib\site-packages\contextualbandits\online.py in <module>
2
3 import numpy as np, warnings, ctypes
----> 4 from .utils import _check_constructor_input, _check_beta_prior, \
5 _check_smoothing, _check_fit_input, _check_X_input, _check_1d_inp, \
6 _ZeroPredictor, _OnePredictor, _OneVsRest,\
C:\Python38\lib\site-packages\contextualbandits\utils.py in <module>
8 from sklearn.linear_model import LogisticRegression
9 from sklearn.tree import DecisionTreeClassifier
---> 10 from .linreg import LinearRegression, _wrapper_double
11 from ._cy_utils import _matrix_inv_symm, _create_node_counters
12
C:\Python38\lib\site-packages\contextualbandits\linreg\__init__.py in <module>
4 import warnings
5 from sklearn.base import BaseEstimator
----> 6 from . import _wrapper_double, _wrapper_float
7
8 __all__ = ["LinearRegression", "ElasticNet"]
contextualbandits\linreg\linreg_double.pyx in init contextualbandits.linreg._wrapper_double()
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Does anyone have any idea why it happened?
Hi there,
The contextual bandits problem is interesting and useful for many realistic problems. Thanks for the code!
When I tried to reproduce the online contextual bandits experiments with the file online_contextual_bandits.ipynb, I encountered the error below. I used Python 2.7.15 (Anaconda 2.7) under Ubuntu 16.04.
I went over possible reasons, like the numpy version and the joblib version. It looks like a non-positive value of self.a is somehow passed to np.random.beta(self.a, self.b, size=X.shape[0])
Thanks in advance.
<contextualbandits.online.AdaptiveGreedy instance at 0x7f6888ee3200>
ValueError Traceback (most recent call last)
in ()
59 lst_actions[model],
60 X, y,
---> 61 batch_st, batch_end)
in simulate_rounds(model, rewards, actions_hist, X_global, y_global, batch_st, batch_end)
30
31 ## choosing actions for this batch
---> 32 actions_this_batch = model.predict(X_global[batch_st:batch_end, :]).astype('uint8')
33
34 # keeping track of the sum of rewards received
/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/online.pyc in predict(self, X, exploit)
877 """
878 # TODO: add option to output scores
--> 879 return self._name_arms(self._predict(X, exploit))
880
881 def _predict(self, X, exploit = False):
/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/online.pyc in _predict(self, X, exploit)
898 # case 1: number of predictions to make would still fit within current window
899 if remainder_window > X.shape[0]:
--> 900 pred, pred_max = self._calc_preds(X)
901 self.window_cnt += X.shape[0]
902 self.window = np.r_[self.window, pred_max]
/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/online.pyc in _calc_preds(self, X)
944
945 def _calc_preds(self, X):
--> 946 pred_proba = self._oracles.decision_function(X)
947 pred_max = pred_proba.max(axis=1)
948 pred = np.argmax(pred_proba, axis=1)
/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/utils.pyc in decision_function(self, X)
624 def decision_function(self, X):
625 preds = np.zeros((X.shape[0], self.n))
--> 626 Parallel(n_jobs=self.njobs, verbose=0, require="sharedmem")(delayed(self._decision_function_single)(choice, X, preds, 1) for choice in range(self.n))
627 _apply_smoothing(preds, self.smooth, self.counters)
628 return preds
/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in __call__(self, iterable)
915 # remaining jobs.
916 self._iterating = False
--> 917 if self.dispatch_one_batch(iterator):
918 self._iterating = self._original_iterator is not None
919
/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/_parallel_backends.pyc in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/_parallel_backends.pyc in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/utils.pyc in _decision_function_single(self, choice, X, preds, depth)
640 preds[:, choice] = self.algos[choice].predict_proba_robust(X)[:, 1]
641 elif 'predict_proba' in dir(self.base):
--> 642 preds[:, choice] = self.algos[choice].predict_proba(X)[:, 1]
643 else:
644 if depth == 0:
/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/utils.pyc in predict_proba(self, X)
313
314 def predict_proba(self, X):
--> 315 preds = np.random.beta(self.a, self.b, size=X.shape[0]).reshape((-1, 1))
316 return np.c_[1 - preds, preds]
317
mtrand.pyx in mtrand.RandomState.beta()
ValueError: a <= 0
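The last frame shows np.random.beta rejecting a non-positive first parameter. A minimal sketch of a guard that reports the offending values before drawing is given here; safe_beta_draw is a hypothetical helper, not library code. One possible cause, which is an assumption and not confirmed in this thread, is integer division under Python 2, where an expression like 3/nchoices evaluates to 0 whenever nchoices is an int greater than 3.

```python
import numpy as np


def safe_beta_draw(a, b, size, rng=None):
    """Hypothetical helper: draw from Beta(a, b) after validating the parameters."""
    if not (a > 0 and b > 0):
        # Fail early with the actual values, which a bare "a <= 0" does not show.
        raise ValueError(f"Beta parameters must be positive, got a={a!r}, b={b!r}")
    rng = np.random.default_rng() if rng is None else rng
    return rng.beta(a, b, size=size)


# Float division keeps the prior positive regardless of the Python version.
nchoices = 10
draws = safe_beta_draw(3.0 / nchoices, 4.0, size=5, rng=np.random.default_rng(0))
print(draws.shape)
```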
@david-cortes
Is there a way to do incremental training? That seems like an important requirement.
Please add more documentation, or explain here, how to use the method and how it works with or without existing training data when adding a new arm.
In any of the online CBs, is it possible for the covariate input (X) to be from a continuous space?
(btw, thx for the awesome work!!)
contextualbandits/contextualbandits/utils.py
Line 819 in 78ea43a
Trying to line up with algo 1 in http://proceedings.mlr.press/v15/chu11a/chu11a.pdf
Hello @david-cortes, thanks for this Contextual Bandits package.
While using some of the online methods (BootstrappedTS, AdaptiveGreedy, maybe some others) from this package, I've faced some unexpected (at least to me) behaviour of decision_function and other related functions like predict.
Let's use some simple dummy data (it doesn't matter much) as an example:
import numpy as np
from contextualbandits.online import *
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
RANDOM_STATE = 42
X, y = load_iris(return_X_y=True)
a = np.random.randint(3, size=len(y))
r = 1 * (y == a)
cb_model_1 = BootstrappedTS(
base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_2 = BootstrappedTS(
base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_1.fit(X, a, r)
cb_model_2.fit(X, a, r)
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))
The output I get is
[[0.96298824 0.11472752 0.00019669]]
[[0.96298824 0.11472752 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]
Setting random_state makes the predictions of cb_model_1 and cb_model_2 equal, as it should, but it's unclear to me why calling decision_function a second time changes the output. Another way to see this behaviour is to compare two predictions of the same model:
pred_1 = cb_model_1.predict(X)
pred_2 = cb_model_1.predict(X)
print((pred_1 == pred_2).mean())
outputs 0.92.
But the most confusing case is when both the per-arm scores from the decision function and the action prediction are needed:
pred = cb_model_1.predict(X)
dec_func = cb_model_1.decision_function(X)
print((np.argmax(dec_func, axis=1) == pred).mean())
outputs 0.96.
So, is this type of behaviour expected? I think it can be related to how some methods work, e.g.
Bootstrapped Thompson Sampling
Performs Thompson Sampling by fitting several models per class on bootstrapped samples, then makes predictions by taking one of them at random for each class.
But in my opinion, setting random state should block this randomization, especially in decision_function, or there should be a way to block it with another parameter.
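The pattern in the output above (two identically seeded models agree with each other, yet consecutive calls on one model differ) is what a seeded but stateful random generator would produce. The following toy sketch illustrates that mechanism only; it assumes the model keeps one generator seeded from random_state and consumes it on every call, and ToyBootstrappedTS is a hypothetical stand-in, not the library class.

```python
import numpy as np


class ToyBootstrappedTS:
    """Toy stand-in: several score vectors per arm, one picked at random per call."""

    def __init__(self, scores_per_resample, random_state):
        # scores_per_resample has shape (n_resamples, n_arms)
        self.scores = np.asarray(scores_per_resample, dtype=float)
        self.rng = np.random.default_rng(random_state)

    def decision_function(self):
        n_resamples, n_arms = self.scores.shape
        # One random resample index per arm; drawing advances the generator state,
        # so the next call may pick different resamples.
        picks = self.rng.integers(n_resamples, size=n_arms)
        return self.scores[picks, np.arange(n_arms)]


scores = np.array([[0.96, 0.11, 0.01],
                   [0.97, 0.22, 0.02],
                   [0.90, 0.15, 0.03]])
m1 = ToyBootstrappedTS(scores, random_state=42)
m2 = ToyBootstrappedTS(scores, random_state=42)
first_1, first_2 = m1.decision_function(), m2.decision_function()
second_1, second_2 = m1.decision_function(), m2.decision_function()
print(first_1, second_1)
```

In this toy, the seed fixes the whole sequence of draws, not each individual draw: the n-th call on m1 always matches the n-th call on m2, while two consecutive calls on the same object need not match.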
I'm extremely new to the subject of contextual bandits and reinforcement learning. What I am interested in is how people use contextual bandits for advertising at scale. How do you average the rewards out?
Are there any implemented algorithms in this library that use a single unified model for all arms?
Hello,
Thanks for this nice package.
Why aren't two-armed bandits allowed?
Changing:
def _check_constructor_input(base_algorithm, nchoices, batch_train=False):
    assert nchoices > 2
    assert isinstance(nchoices, int)
    assert ('fit' in dir(base_algorithm)) and ('predict' in dir(base_algorithm))
    if batch_train:
        assert 'partial_fit' in dir(base_algorithm)
    return None
To:
def _check_constructor_input(base_algorithm, nchoices, batch_train=False):
    assert nchoices >= 2
    assert isinstance(nchoices, int)
    assert ('fit' in dir(base_algorithm)) and ('predict' in dir(base_algorithm))
    if batch_train:
        assert 'partial_fit' in dir(base_algorithm)
    return None
Was enough to make it work.
Just out of curiosity, is there any concern over adding the costsensitive package to the install_requires list? The package could be used by users, but it was not installed automatically.
try:
    from costsensitive import _BinTree
except ImportError:
    raise ValueError("This functionality requires package 'costsensitive'.\nCan be installed with 'pip install costsensitive'.")
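If the dependency is meant to stay optional, one common pattern is a small import helper that catches only ImportError and keeps the original error attached. The sketch below is an illustration under that assumption; require_optional is a hypothetical name and is not part of the package.

```python
import importlib


def require_optional(module_name, pip_name=None):
    """Hypothetical helper: import an optional dependency or raise with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        pip_name = pip_name or module_name
        # Chain the original exception so the underlying import failure stays visible.
        raise ImportError(
            f"This functionality requires package '{module_name}'.\n"
            f"Can be installed with 'pip install {pip_name}'."
        ) from exc


json_module = require_optional("json")
print(json_module.__name__)
```

Catching ImportError rather than using a bare except avoids hiding unrelated failures, such as a KeyboardInterrupt during a slow import.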
Hi, first I want to thank you for sharing this marvellous work.
I'm trying to follow the example in the notebook, section 3.3 Streaming models.
When using softmax_explorer.partial_fit(), I get the error below:
Traceback (most recent call last):
File "/anaconda3/envs/project/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3325, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
runfile('/Users/user/projects/project/project/test/test_adressa_with_contextual_bandit_agent.py', wdir='/Users/user/projects/project/project/test')
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/user/projects/project/project/test/test_adressa_with_contextual_bandit_agent.py", line 64, in
np.array(current_batch_rewards))
File "/Users/user/projects/project/project/example_agents/contextual_bandits.py", line 86, in re_train
self.selected_model = self.selected_model.partial_fit(X=states, a=actions, r=rewards)
File "/anaconda3/envs/project/lib/python3.7/site-packages/contextualbandits/online.py", line 210, in partial_fit
self._oracles.partial_fit(X, a, r)
File "/anaconda3/envs/project/lib/python3.7/site-packages/contextualbandits/utils.py", line 598, in partial_fit
Parallel(n_jobs=self.njobs, verbose=0, require="sharedmem")(delayed(self._partial_fit_single)(choice, X, a, r) for choice in range(self.n))
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 934, in __call__
self.retrieve()
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/anaconda3/envs/project/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/anaconda3/envs/project/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
return self.func(*args, **kwargs)
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
for func, args, kwargs in self.items]
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
for func, args, kwargs in self.items]
File "/anaconda3/envs/project/lib/python3.7/site-packages/contextualbandits/utils.py", line 608, in _partial_fit_single
self.algos[choice].partial_fit(xclass, yclass, classes = [0, 1])
AttributeError: '_BetaPredictor' object has no attribute 'partial_fit'
What is being passed to _partial_fit_single is shown below:
self.algos: [<contextualbandits.utils._BetaPredictor object at 0x1a1dd01518>, <contextualbandits.utils._BetaPredictor object at 0x1a20aaff98>]
choice: 1
Any idea why? Any help will be appreciated.
Thanks in advance.
When using BootstrappedUCB.add_arm(), a "NameError: name 'base_algorithm' is not defined" is raised on line 297, inside the _BasePolicy.add_arm() method.
fitted_classifier = self._make_bootstrapped(base_algorithm, self._percentile,
self._ts_byrow, self._ts_weighted)
Looking at the code, base_algorithm is not part of the function signature, but it is referred to as self.base_algorithm on line 300.
if isinstance(self.base_algorithm, list):
if (fitted_classifier is not None):
raise ValueError("Must pass 'fitted_classifier' when using different 'base_algorithm' per arm.")
To be corrected.
I am getting the following error in the __init__.py of linreg
ImportError: cannot import name '_wrapper_double' from partially initialized module 'contextualbandits.linreg' (most likely due to a circular import) (C:...\python\contextualbandits-master\contextualbandits\linreg\__init__.py)
Is there an easy fix to this?
Trying to use ParametricTS causes the following error.
usage:
base_model = XGBRegressor(n_estimators=20)
cb_model = cb.online.ParametricTS(base_model, nchoices=actions)
...
cb_model.predict(df_context)
env:
contextualbandits 0.3.17.post3 (installed via pip)
Python 3.9.13
OS: Mac OSX 12.5.1
Stack trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/online.py in predict(self, X, exploit, output_score)
608 scores = self._exploit(X)
609 else:
--> 610 scores = self.decision_function(X)
611 pred = self._name_arms(np.argmax(scores, axis = 1))
612
~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/online.py in decision_function(self, X)
507 else:
508 return self._predict_from_beta_prior_and_smoothing(X.shape[0])
--> 509 return self._score_matrix(X)
510
511 def _predict_from_beta_prior_and_smoothing(self, n):
~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/online.py in _score_matrix(self, X)
3341 def _score_matrix(self, X):
3342 pred = self._oracles.decision_function(X)
-> 3343 counters = self._oracles.get_nobs_by_arm()
3344 with_model = counters >= self.beta_prior[1]
3345 counters = counters.reshape((1,-1))
~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/utils.py in get_nobs_by_arm(self)
1006
1007 def get_nobs_by_arm(self):
-> 1008 return self.beta_counters[1] + self.beta_counters[2]
1009
1010 def exploit(self, X):
AttributeError: '_OneVsRest' object has no attribute 'beta_counters'
Do you have a recommendation for an online course teaching these subjects?
def _predict(self, X, exploit = False):
    X = _check_X_input(X)
    if X.shape[0] == 0:
        return np.array([])
    if exploit:
        return self._oracles.predict(X)
    if self.explore_cnt < self.explore_rounds:
        self.explore_cnt += X.shape[0]
        # case 1: all predictions are within allowance
        if self.explore_cnt <= self.explore_rounds:
            return np.random.randint(self.nchoices, size = X.shape[0])
        # case 2: some predictions are within allowance, others are not
        else:
            n_explore = self.explore_rounds - self.explore_cnt
            pred = np.zeros(X.shape[0])
            pred[:n_explore] = np.random.randint(self.nchoices, n_explore)
            pred[n_explore:] = self._oracles.predict(X)
            return pred
    else:
        return self._oracles.predict(X)
This part of the code in online.py results in a (low >= high) error in case 2. I guess the problem is that n_explore is negative.
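Two details of case 2 seem to combine here. The second positional argument of np.random.randint is high rather than size, so randint(self.nchoices, n_explore) asks for integers in [nchoices, n_explore), and n_explore is negative because explore_cnt has already been incremented when it is computed. A minimal sketch of the intended logic, written as a standalone function, could look as follows. It is not the library's actual fix; explore_first_predict and exploit_fn (a stand-in for self._oracles.predict) are hypothetical names.

```python
import numpy as np


def explore_first_predict(X, explore_cnt, explore_rounds, nchoices, exploit_fn, rng):
    """Return (actions, updated explore_cnt) for one batch of contexts X."""
    n = X.shape[0]
    # Exploration rounds still available, computed BEFORE this batch is counted.
    n_explore = max(explore_rounds - explore_cnt, 0)
    if n_explore >= n:
        # case 1: the whole batch is within the exploration allowance
        pred = rng.integers(nchoices, size=n)
    else:
        # case 2: explore the first n_explore rows, exploit only the remaining rows
        pred = np.empty(n, dtype=np.int64)
        pred[:n_explore] = rng.integers(nchoices, size=n_explore)
        pred[n_explore:] = exploit_fn(X[n_explore:])
    return pred, explore_cnt + n


rng = np.random.default_rng(0)
X = np.zeros((10, 3))
# 6 of 10 exploration rounds already used, so 4 rows explore and 6 rows exploit.
pred, cnt = explore_first_predict(X, 6, 10, 5, lambda Z: np.full(Z.shape[0], 99), rng)
print(pred, cnt)
```

Passing size by keyword and slicing X for the exploit part also keeps the two halves of pred the same length as the rows they are assigned to.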
pip install contextualbandits
returned the following error. Any advice? Thanks.
......
Using cached
https://pypi.tuna.tsinghua.edu.cn/packages/bd/79/d6b51a83c84b047cf8012ef70f4dae62654e4ca5dfac792589e5709d71f1/contextualbandits-0.3.6.tar.gz (58 kB)
......
Building wheels for collected packages: contextualbandits
Building wheel for contextualbandits (PEP 517) ... error
.......
In file included from contextualbandits/_cy_utils.cpp:629:
contextualbandits/cy_cpp_helpers.cpp:3:10: fatal error: 'random' file not found
#include <random>
When using XGBoost models as base models for online models such as EpsilonGreedy, properties are being added to XGBoost models that prevent the models from being deserialized with either pickle or dill, as these properties don't exist in the original model class.
usage:
import contextualbandits as cb
from xgboost import XGBClassifier
arms = ["a", "b", "c"]
base_model = XGBClassifier(n_estimators=20)
cb_model = cb.online.EpsilonGreedy(base_model, nchoices=arms)
X = pd.DataFrame([0])
a = pd.Series(["a"])
r = pd.Series([1])
cb_model.fit(X, a, r)
dill.loads(dill.dumps(cb_model)) # dill.loads() fails with the error below:
AttributeError Traceback (most recent call last)
/var/folders/g5/lpnvjwrd2h95lf50zlb22dt00000gn/T/ipykernel_70859/3296061279.py in <module>
9 cb_model.fit(X, a, r)
10
---> 11 dill.loads(dill.dumps(cb_model))
~/miniconda3/envs/env1/lib/python3.9/site-packages/dill/_dill.py in loads(str, ignore, **kwds)
385 """
386 file = StringIO(str)
--> 387 return load(file, ignore, **kwds)
388
389 # def dumpzs(obj, protocol=None):
~/miniconda3/envs/env1/lib/python3.9/site-packages/dill/_dill.py in load(file, ignore, **kwds)
371 See :func:`loads` for keyword arguments.
372 """
--> 373 return Unpickler(file, ignore=ignore, **kwds).load()
374
375 def loads(str, ignore=None, **kwds):
~/miniconda3/envs/env1/lib/python3.9/site-packages/dill/_dill.py in load(self)
644
645 def load(self): #NOTE: if settings change, need to update attributes
--> 646 obj = StockUnpickler.load(self)
647 if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
648 if not self._ignore:
AttributeError: 'XGBClassifier' object has no attribute '_decision_function_w_sigmoid_from_predict'
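One plausible mechanism for this kind of failure, shown here as a minimal sketch rather than a diagnosis of the library's code: when a bound method is attached to an instance, Python pickles it as a `getattr(instance, name)` call. On loading, that call runs before the instance's attributes have been restored, so the lookup fails if the class itself does not define the name. `BaseModel`, `_patched_predict`, and `dumps_without` are hypothetical names used only for illustration; they are not part of contextualbandits or XGBoost.

```python
import pickle
import types


class BaseModel:
    """Stand-in for a third-party estimator class that we cannot edit."""

    def predict(self, x):
        return x


def _patched_predict(self, x):
    # Hypothetical helper playing the role of a method added by a wrapper.
    return 2 * self.predict(x)


model = BaseModel()
# Attach a bound method to the *instance*; the class has no such attribute.
model._patched_predict = types.MethodType(_patched_predict, model)

blob = pickle.dumps(model)  # serializing succeeds
try:
    pickle.loads(blob)  # deserializing looks the attribute up too early
    load_failed = False
except AttributeError:
    load_failed = True


def dumps_without(obj, names):
    """Pickle obj with the given instance attributes temporarily removed."""
    saved = {n: obj.__dict__.pop(n) for n in names if n in obj.__dict__}
    try:
        return pickle.dumps(obj)
    finally:
        obj.__dict__.update(saved)  # leave the original object untouched


clean_blob = dumps_without(model, ["_patched_predict"])
restored = pickle.loads(clean_blob)
# Re-attach the helper after loading.
restored._patched_predict = types.MethodType(_patched_predict, restored)
```

Stripping and re-attaching the extra attributes is only one possible workaround; whether it is safe for a fitted contextualbandits model depends on what the wrapper expects to find on the base estimator.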
Where is Bibtex_data.txt located?
Hi there,
Whenever I try to predict using bootstrapped_ucb, active_explorer, adaptive_active_greedy etc. with an SGD classifier as the base, I'm seeing the error below:
AttributeError Traceback (most recent call last)
in
----> 1 adaptive_active_greedy.predict(batch_S)
~/anaconda3/envs/tensorflow/lib/python3.6/site-packages/contextualbandits/online.py in predict(self, X, exploit)
910 """
911 # TODO: add option to output scores
--> 912 return self._name_arms(self._predict(X, exploit))
913
914 def _predict(self, X, exploit = False):
~/anaconda3/envs/tensorflow/lib/python3.6/site-packages/contextualbandits/online.py in _predict(self, X, exploit)
931 # case 1: number of predictions to make would still fit within current window
932 if remainder_window > X.shape[0]:
--> 933 pred, pred_max = self._calc_preds(X)
934 self.window_cnt += X.shape[0]
935 self.window = np.r_[self.window, pred_max]
~/anaconda3/envs/tensorflow/lib/python3.6/site-packages/contextualbandits/online.py in _calc_preds(self, X)
975
976 def _calc_preds(self, X):
--> 977 pred_proba = self._oracles.decision_function(X)
978 pred_max = pred_proba.max(axis = 1)
979 pred = np.argmax(pred_proba, axis = 1)
AttributeError: 'AdaptiveGreedy' object has no attribute '_oracles'
Is there any way I can change the reward from binary {0,1} to continuous [0,1[?
Hello @david-cortes, thanks for making contextualbandits!
I'm currently exploring solutions to recommend products based on user preferences and it looks like contextual bandits is an interesting approach.
In this small gist (105 lines), I built a simulation of conversation turns with updated item scores based on updated user preferences.
If the user likes a recommendation (let's say that the system is presenting max 3 items), the RS will take the liked items into consideration by finding similar items (cosine similarity) as well.
I would love to use contextualbandits to run experiments but I'm not sure how to use it with a very simple dataset (like the one in my gist). Basically, a vector of item features.
I would be more than grateful if I could get any help or advice.
Hi everyone,
I am working on solving a peg-in-hole problem. Initially I started with an RL approach, but it seems like it's not the right approach for my problem.
Task description
Setup: A robot placed on the table, a board with a cylindrical hole on the table, and a cylindrical peg.
Task: Use the robot to insert the peg into the cylindrical hole in the board despite a small error in the exact location of the hole.
Description: To accomplish the task, the parameters of the controller need to be learnt. The goal is to learn one set of parameters per episode that can solve the problem for a given radius of the peg and hole, with the ability to generalise to other sizes.
Because only one set of controller parameters should be learnt, the problem is not an RL problem but more of a contextual bandit problem.
The states are not fed to the policy at each timestep. Instead, the context (the position of the hole) is fed to the policy only at the beginning of each episode. Given the context, the policy outputs actions (the controller parameters), which are used throughout the episode.
During the episode, the reward is calculated at each timestep. At the end of the episode the rewards are summed and saved, and the total should be used to update the policy.
As I am more familiar with the RL approach, I was wondering if someone more experienced could advise whether using contextual bandits is the right way to go and, if so, which algorithm you would recommend.
For simulation I use robosuite which has gym-like structure
Thank you for your help
On the EpsilonGreedy algorithm, the documentation says that:
But from the code, it appears it is the other way. Am I misunderstanding something?
contextualbandits/contextualbandits/utils.py
Lines 613 to 625 in 719c2c9
Are this framework's operations GPU-bound or CPU-bound?
For Linear Thompson Sampling predictions, there's a value v_sq that multiplies the covariance matrix of the model parameters when sampling from the multivariate normal defined by it. From my research, it looks like v is supposed to be the sum of squared residuals of the output value divided by the number of training rows seen so far, a quantity that comes up in many standard-error-of-coefficient calculations. So how does someone determine this value without storing all the data and re-computing the predictions?
Meanwhile, the Agrawal paper sets v to R*sqrt(9*d*ln(T/delta)), where d is the number of dimensions or features in the linear regression, R is... something (it looks like an absolute bound on the residuals), T is the time horizon for regret calculations, so it can be replaced by the current number of training rows, and delta is a free parameter between 0 and 1 which determines the regret behavior (higher delta means less variance in predictions and a higher likelihood of exceeding the guaranteed regret bounds).
So... uh... this is really confusing. How do you set v_sq, then? There's not much documentation in the code.
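To make the two options in this question concrete, here is a minimal sketch, not the library's implementation. It assumes a ridge-regularized linear model per arm and keeps only running sufficient statistics, from which the sum of squared residuals can be recovered without storing past rows. The class `LinTSArm`, the function `v_sq_from_bound`, and the regularization `lam` are hypothetical names introduced for illustration; the bound formula is the one quoted above.

```python
import numpy as np


class LinTSArm:
    """Running statistics for one arm of a linear Thompson sampler (illustrative)."""

    def __init__(self, d, lam=1.0):
        self.lam = lam
        self.B = lam * np.eye(d)  # lam * I + sum of x x^T
        self.f = np.zeros(d)      # sum of r * x
        self.yty = 0.0            # sum of r^2
        self.n = 0                # number of rows seen

    def update(self, x, r):
        self.B += np.outer(x, x)
        self.f += r * x
        self.yty += r * r
        self.n += 1

    def mean(self):
        return np.linalg.solve(self.B, self.f)

    def residual_variance(self):
        # SSR = y'y - 2 mu'X'y + mu'X'X mu, with X'X recovered from B.
        mu = self.mean()
        XtX = self.B - self.lam * np.eye(len(mu))
        ssr = self.yty - 2.0 * mu @ self.f + mu @ XtX @ mu
        return max(ssr, 0.0) / max(self.n, 1)

    def sample(self, v_sq, rng):
        cov = v_sq * np.linalg.inv(self.B)
        cov = (cov + cov.T) / 2.0  # guard against tiny numerical asymmetries
        return rng.multivariate_normal(self.mean(), cov)


def v_sq_from_bound(R, d, t, delta):
    """v^2 for v = R * sqrt(9 * d * ln(t / delta)), as quoted in the question."""
    return R ** 2 * 9.0 * d * np.log(t / delta)


rng = np.random.default_rng(0)
d = 3
arm = LinTSArm(d)
X = rng.normal(size=(50, d))
y = X @ np.array([0.5, -0.2, 0.1]) + 0.1 * rng.normal(size=50)
for x_i, r_i in zip(X, y):
    arm.update(x_i, r_i)

# Option 1: scale the covariance by the empirical residual variance.
theta_empirical = arm.sample(arm.residual_variance(), rng)
# Option 2: scale it by the theoretical bound (R is assumed known here).
theta_bound = arm.sample(v_sq_from_bound(R=0.1, d=d, t=arm.n, delta=0.1), rng)
```

The theoretical value grows with d and ln(t), so it typically explores far more than the empirical variance does; in practice v_sq is often treated as a tuning parameter.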
I think this is because _robust_predict, which is used for batch training, is a bound method and does not get pickled properly. I get the following error when I try to load the pickle file:
AttributeError: 'SGDClassifier' object has no attribute '_robust_predict'
First of all, thank you for fixing my issue #62.
I want to set up my CB model with custom integer 'choice_names' so that I can use the serial numbers of the choices in my example data. I get a TypeError when predicting on new data with the topN method.
This error does not occur when no custom 'choice_names' are set. Could I get a hint on how to fix it?
model.topN(X_new, model.nchoices)
TypeError: only integer scalar arrays can be converted to a scalar index
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<command-3531050> in <module>
67 , batch_size = 10
68 , noptions = models[0].nchoices
---> 69 , mode = "allscore") # <- this part is just my custom function code
70
71 t_end = time() # --------------------------------------------------------
<command-3531050> in predict_cb(X_new, model, batch_size, noptions, mode)
40
41 elif mode == "allscore":
---> 42 imp_temp = model.topN(X_batch,noptions) # <- this part is just my custom function code
43 imp_new = np.concatenate((imp_new,imp_temp),axis=0)
44
/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in topN(self, X, n)
574 else:
575 topN = topN_byrow(scores, n, self.njobs)
--> 576 return self._name_arms(topN)
577
578
/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in _name_arms(self, pred)
143 return pred
144 else:
--> 145 return self.choice_names[pred]
146
147 def drop_arm(self, arm_name):
TypeError: only integer scalar arrays can be converted to a scalar index
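The last frame of the traceback indexes `self.choice_names` with an array of predicted arm indices. As a minimal sketch of why that can fail, assuming (not confirmed by the source) that the names are held as a plain Python list: a list cannot be indexed with a NumPy array holding more than one element, while a NumPy array of names can. The names and indices below are made up for illustration.

```python
import numpy as np

# Hypothetical integer arm names, stored as a plain Python list.
choice_names = [101, 205, 309]

# Hypothetical top-N output: arm indices, one row per observation.
top_idx = np.array([[2, 0, 1],
                    [1, 2, 0]])

try:
    choice_names[top_idx]  # a list cannot be indexed with an index array
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# Converting the names to an array makes fancy indexing work.
named = np.asarray(choice_names)[top_idx]
```

Until the library handles this case, one workaround is to fit the model without custom names and map the returned indices to names yourself, as in the last line.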
Hi @david-cortes,
Do you have an example data set, or a reference to the one used in your simulation?
Thank you,
Saran
I've been trying to access an sklearn model's coefficients and got an AttributeError; it seems like something modifies the base model.
This issue gets resolved if the arms are integers.
Minimal code that causes the exception:
import numpy as np
from contextualbandits import online as cb
from sklearn.naive_bayes import BernoulliNB
aa = np.array([0.2, 0.1, 0.2, 0.1, 0.4, 1.1, 0.3, 0.5, 0.6, 0.7])
XX = np.array([[1, 0, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0]])
rr = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
n_arms = len({t for t in aa})
model = cb.SeparateClassifiers(BernoulliNB(), n_arms, batch_train=True)
model.partial_fit(XX, aa, rr)
[m.coef_ for m in model._oracles.algos]
Is it possible to get probabilities and topN arms in other methods, like DoublyRobust and OffsetTree, apart from Bootstrapped TS?
Hi @david-cortes,
I was wondering if you have the example data set that is used in your examples.
That or a reference would be much appreciated.
Hi,
A group I was in tried to reproduce some of the results from the paper associated with this repo.
To be honest, it was a bit more rushed than I'd like, and there is probably a fair amount we could have done better by reaching out. I don't think there are any really significant findings, except that we did find some curious behavior when comparing the Cython and pure-Python implementations of some algorithms. We ran out of time to isolate what was causing the change; we mainly noticed it because one of the group members was having issues with the Cython version and decided to use the pure-Python version.
I was able to reproduce his results (comparing our own implementation, contextual-bandits 0.1.8.5 (matched), and contextual-bandits 0.3.13 (didn't match)). The link to the repo/paper is below - Figure 4 under Section 3.2.1 highlights the issue.
https://github.com/MrinalJain17/ml-tools-project/blob/main/Contextual_Bandit_Comparison.pdf
I believe one of the main reasons we were unable to completely reproduce the original paper may stem from this issue. Having group members use different environments (versions of the library) quite possibly skewed some of our findings, and this isn't really addressed in our write-up.
That said, I hope you have better luck figuring out what exactly is changing the current branch's behavior. Considering we implemented the ContextualAdaptiveGreedy algorithm from scratch and it tracked the 0.1.8.5 version - I'm assuming the problem isn't there...
Hey,
I want to use MAB in an LTR problem. Can you help me figure out which algorithm to use and a little about how to use it?
(a beginner in MAB I am)
I'm thinking of using the functionality for ranking top-N arms instead of always picking the single best one.
Any suggestion would help. Thanks in advance.