david-cortes / contextualbandits

Python implementations of contextual bandits algorithms

Home Page: http://contextual-bandits.readthedocs.io

License: BSD 2-Clause "Simplified" License

Python 89.11% C++ 1.48% Cython 9.41%

contextual-bandits multiarmed-bandits reinforcement-learning exploration-exploitation

contextualbandits's Issues

extra param in example/policy_evaluation.ipynb

I figured commenting here would be easier than forking with a pull request.

The final example (using Mediamill_data.txt) has an extra parameter update_freq = 5

I'm guessing this is a leftover artifact, but I couldn't find any committed version of evaluateRejectionSampling that accepts it. I don't believe removing it substantially changes any of the points made in the tutorial, but the cell result did change from ~(0.4028, 422) to ~(0.4539, 445).

Great library btw!

Type error if beta_prior == "auto" and nchoices is list

Initializing an online contextual bandit policy fails when beta_prior is set to "auto" and nchoices is a list.

Here is the error message:

File "/usr/local/lib/python3.8/dist-packages/contextualbandits/online.py", line 1894, in __init__
    beta_prior = ((3./nchoices, 4.), 2)
TypeError: unsupported operand type(s) for /: 'float' and 'list'

The beta prior should be handled depending on the type of nchoices (int or list-like).
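
The type-dependent handling suggested above could look roughly like the following. This is a sketch, not the library's code: the helper name `resolve_auto_beta_prior` is hypothetical, and the formula is the one shown in the traceback, applied to the number of arms instead of to `nchoices` itself.

```python
def resolve_auto_beta_prior(nchoices):
    """Hypothetical helper: build the "auto" prior from the number of arms.

    `nchoices` may be an int (number of arms) or a list-like of arm names.
    """
    if isinstance(nchoices, int):
        n_arms = nchoices
    else:
        n_arms = len(nchoices)  # list-like of arm names
    # Same expression as in the traceback, but dividing by a number.
    return ((3.0 / n_arms, 4.0), 2)


print(resolve_auto_beta_prior(3))
print(resolve_auto_beta_prior(["a", "b", "c"]))
```

Both calls return the same prior, since a list of three arm names describes three arms.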

Predictions vary every time with the loaded model

@david-cortes
I have saved a model in the following way:

base_algorithm = SGDClassifier(random_state=123, loss='log')
beta_prior = ((3, 7), 2)
model = BootstrappedUCB(deepcopy(base_algorithm), nchoices = nchoices, batch_train=True, beta_prior=beta_prior)
for i in range(iters):   # loop over several training iterations
    # shape of X: [batch, 2626]
    # shape of a: [batch, 1]
    # shape of r: [batch, 1]
    model.partial_fit(X, a, r)
target_model = "20190521.dill"
dill.dump(model, open(target_model, "wb"))

But I got different prediction results every time for the same input. Here is a simulation:

>>> model = dill.load(open("20190521.dill", "rb"))
>>> X = np.random.normal(size=(1, 2626))
>>> res01 = model.decision_function(X)
>>> res01[0][:5]
array([0.447249  , 0.27269542, 0.48439773, 0.26759085, 0.1235832 ])
>>> 
>>> res02 = model.decision_function(X)
>>> res02[0][:5]
array([0.1319437 , 0.21268724, 0.40948264, 0.13509549, 0.15605585])
>>> 
>>> 
>>> pred01 = model.predict(X)
>>> pred01
array([651])
>>> 
>>> model.predict(X)
array([210])
>>> model.predict(X)
array([1741])

20190521.dill is the model trained with BootstrappedUCB in the above way.

Bibtex_data

Hello, I'm trying to find the "Bibtex_data.txt" dataset referenced in the examples, but could not find it. Could you please point me to it?

Thanks,
A.

AssertionError for BayesianTS and BayesianUCB constructors

The third argument of the _check_constructor_input method is supposed to be the batch_train boolean. The constructors for BayesianTS and BayesianUCB pass a tuple instead.

AssertionError Traceback (most recent call last)
<ipython-input-117-96a94fb40784> in <module>()
      4 nchoices=50
      5 
----> 6 bayesian_ts=BayesianTS(nchoices)
      7 
      8 bayesian_ucb=BayesianUCB(nchoices)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\contextualbandits\online.py in __init__(self, nchoices, method, beta_prior)
   1912     """
   1913     def __init__(self, nchoices, method='advi', beta_prior=((1,1),3)):
-> 1914         _check_constructor_input(_BetaPredictor(1,1),nchoices,((1,1),2))
   1915         self.beta_prior = beta_prior
   1916         self.nchoices = nchoices

~\AppData\Local\Continuum\anaconda3\lib\site-packages\contextualbandits\utils.py in _check_constructor_input(base_algorithm, nchoices, batch_train)
     54     assert ('fit' in dir(base_algorithm)) and ('predict' in dir(base_algorithm))
     55     if batch_train:
---> 56         assert 'partial_fit' in dir(base_algorithm)
     57     return None
     58
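
The assertion fires because a non-empty tuple is truthy, so the `if batch_train:` branch runs even though no batch training was requested. A minimal sketch reproducing this, using a simplified copy of the check from the traceback and a hypothetical stand-in class `FitPredictOnly` in place of `_BetaPredictor`:

```python
class FitPredictOnly:
    """Hypothetical stand-in: has fit/predict but no partial_fit."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


def check_constructor_input(base_algorithm, nchoices, batch_train=False):
    # Simplified from the check shown in the traceback above.
    assert hasattr(base_algorithm, "fit") and hasattr(base_algorithm, "predict")
    if batch_train:
        assert hasattr(base_algorithm, "partial_fit")


def raises_assertion(third_arg):
    """Return True if the check fails for the given third positional argument."""
    try:
        check_constructor_input(FitPredictOnly(), 50, third_arg)
    except AssertionError:
        return True
    return False


print(raises_assertion(((1, 1), 2)))  # the tuple lands in batch_train
print(raises_assertion(False))        # an actual boolean passes the check
```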

Support for continuous rewards

It is said in the documentation that only binary rewards are supported. If continuous values are passed, the following sklearn exception is thrown:

ValueError: Unknown label type: 'continuous'

However, there appear to be no exceptions or errors when a regressor is used as base_algorithm, e.g.

agent = SeparateClassifiers(base_algorithm=RandomForestRegressor(), n_choices=...)

I haven't faced any unexpected behaviour for my use case. So, was it just luck, or is this really a way to work with continuous rewards?

AssertionError: online_contextual_bandits.ipynb

First of all, thank you for the code for working with contextual bandits.

When I run your example notebook (online_contextual_bandits.ipynb), I get an AssertionError in the '3.3 Streaming models' part. Could you give me a hint on how to fix that error?

AssertionError:

AssertionError Traceback (most recent call last)
in
62 lst_actions[model],
63 X_batch, y_batch,
---> 64 rnd_seed = batch_st)

in simulate_rounds_stoch(model, rewards, actions_hist, X_batch, y_batch, rnd_seed)
31
32 ## choosing actions for this batch
---> 33 actions_this_batch = model.predict(X_batch).astype('uint8')
34
35 # keeping track of the sum of rewards received

/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in predict(self, X, exploit)
2003 if not self.is_fitted:
2004 return self._predict_random_if_unfit(X, False)
-> 2005 return self._name_arms(self._predict(X, exploit, True))
2006
2007 def _predict(self, X, exploit = False, choose = True):

/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in _predict(self, X, exploit, choose)
2029 # case 1: number of predictions to make would still fit within current window
2030 if remainder_window > X.shape[0]:
-> 2031 pred, pred_max = self._calc_preds(X, choose)
2032 self.window_cnt += X.shape[0]
2033 self.window = np.r_[self.window, pred_max]

/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in _calc_preds(self, X, choose)
2076
2077 def _calc_preds(self, X, choose = True):
-> 2078 pred_proba = self._oracles.decision_function(X)
2079 np.nan_to_num(pred_proba, copy=False)
2080 pred_max = pred_proba.max(axis = 1)

/databricks/python/lib/python3.7/site-packages/contextualbandits/utils.py in decision_function(self, X)
927 Parallel(n_jobs=self.njobs, verbose=0, require="sharedmem")
928 (delayed(self._decision_function_single)(choice, X, preds, 1)
--> 929 for choice in range(self.n))
930 _apply_smoothing(preds, self.smooth, self.counters,
931 self.noise_to_smooth, self.random_state)

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in call(self, iterable)
1015
1016 with self._backend.retrieval_context():
-> 1017 self.retrieve()
1018 # Make sure that we get a last message telling us we are done
1019 elapsed_time = time.time() - self._start_time

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
907 try:
908 if getattr(self._backend, 'supports_timeout', False):
--> 909 self._output.extend(job.get(timeout=self.timeout))
910 else:
911 self._output.extend(job.get())

/usr/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):

/usr/lib/python3.7/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
119 job, i, func, args, kwds = task
120 try:
--> 121 result = (True, func(*args, **kwds))
122 except Exception as e:
123 if wrap_exception and func is not _helper_reraises_exception:

/databricks/python/lib/python3.7/site-packages/joblib/_parallel_backends.py in call(self, *args, **kwargs)
606 def call(self, *args, **kwargs):
607 try:
--> 608 return self.func(*args, **kwargs)
609 except KeyboardInterrupt:
610 # We capture the KeyboardInterrupt and reraise it as

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in call(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def len(self):

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in (.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def len(self):

/databricks/python/lib/python3.7/site-packages/contextualbandits/utils.py in _decision_function_single(self, choice, X, preds, depth)
955 preds[:, choice] = self.algos[choice].decision_function_w_sigmoid(X)
956 else:
--> 957 preds[:, choice] = self.algos[choice].predict(X)
958
959 ### Note to self: it's not a problem to mix different methods from the

/databricks/python/lib/python3.7/site-packages/contextualbandits/linreg/__init__.py in predict(self, X)
512 The predicted values given 'X'.
513 """
--> 514 assert self.is_fitted_
515
516 pred = X.dot(self.coef_[:self._n])

AssertionError:

Method .as_matrix will be removed in a future version. Use .values instead.

The utilities module uses the as_matrix method three times. Fitting a model generates the warning:

<path>\contextualbandits\utils.py:85: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
 X=X.as_matrix()

The method is also used on lines 97 and 482.
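
A possible replacement for those three call sites, sketched as a small helper. The name `to_ndarray` is hypothetical; `DataFrame.to_numpy()` exists in pandas 0.24 and later, and `np.asarray` covers plain lists and arrays.

```python
import numpy as np


def to_ndarray(X):
    """Hypothetical helper: convert input to a NumPy array without .as_matrix()."""
    if hasattr(X, "to_numpy"):  # pandas >= 0.24 DataFrame / Series
        return X.to_numpy()
    return np.asarray(X)        # lists, tuples, arrays


print(to_ndarray([[1, 2], [3, 4]]).shape)
```

On older pandas versions without `to_numpy`, `X.values` is the attribute that the warning itself recommends.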

Build errors on Mac OS X

I had to change a couple of things to get this compiling on Mac OS X:

python extension object in setup.py doesn't have an extra_compile_args member if none were specified
openmp library flag in clang is -lomp and preprocessor flag is different
clang needs a recent C++ standard specified to handle some of the lambdas

Something like:

diff --git a/setup.py b/setup.py
index ea0ff35..8b7e15b 100644
--- a/setup.py
+++ b/setup.py
@@ -9,16 +9,17 @@ class build_ext_subclass( build_ext ):
         if compiler == 'msvc': # visual studio
             for e in self.extensions:
                 e.extra_compile_args += ['/O2', '/openmp']
+        elif platform.startswith('darwin'):
+            for e in self.extensions:
+                e.extra_link_args += ["-lomp"]
+                e.extra_compile_args = ["-Xpreprocessor", "-fopenmp", "-O3", "-march=native"]
+                if e.language == "c++":
+                    e.extra_compile_args += ["-std=c++17"]
         else:
             for e in self.extensions:
                 e.extra_compile_args += ['-O3', '-march=native', '-fopenmp']
                 e.extra_link_args += ['-fopenmp']
-            
-            ### Remove this code if you have a mac with gcc or clang + openmp
-            if platform[:3] == "dar":
-                for e in self.extensions:
-                    e.extra_compile_args = [arg for arg in extra_compile_args if arg != '-fopenmp']
-                    e.extra_link_args    = [arg for arg in extra_link_args    if arg != '-fopenmp']
+
         build_ext.build_extensions(self)
 
 setup(

Context Understanding of the API

The n_features in parameter X (array (n_samples, n_features)) of predict(X, exploit=False, gradient_calc='weighted') should be referred to as the context, summarizing information of both the user u and arm a. It should be a form like CONCAT<user_vec, arm_vec>.

If so,

predict(CONCAT<user1_vec, arm1_vec>, exploit=False, gradient_calc='weighted')
predict(CONCAT<user1_vec, arm2_vec>, exploit=False, gradient_calc='weighted')

may give different action predictions for the same user1.

That confuses me.
How should I understand the n_features in parameter X (array (n_samples, n_features))?

Different predictions of the same model.dill file in different CPUs for the LinUCB algorithm

After training the LinUCB model, I shared the model with another user. The other user was getting different scores for the arms for the same data. Any idea why?

Data unavailable for online example

The bibtex dataset is no longer available online at the link provided

Error Occurs when Running the Online Contextual Bandits Tutorial

When I execute this code block:

from sklearn.linear_model import LogisticRegression
from contextualbandits.online import BootstrappedUCB, BootstrappedTS, LogisticUCB, \
            SeparateClassifiers, EpsilonGreedy, AdaptiveGreedy, ExploreFirst, \
            ActiveExplorer, SoftmaxExplorer
from copy import deepcopy

nchoices = y.shape[1]
base_algorithm = LogisticRegression(solver='lbfgs', warm_start=True)
beta_prior = ((3./nchoices, 4), 2) # until there are at least 2 observations of each class, will use this prior
beta_prior_ucb = ((5./nchoices, 4), 2) # UCB gives higher numbers, thus the higher positive prior
beta_prior_ts = ((2./np.log2(nchoices), 4), 2)
### Important!!! the default values for beta_prior will be changed in version 0.3

## The base algorithm is embedded in different metaheuristics
bootstrapped_ucb = BootstrappedUCB(deepcopy(base_algorithm), nchoices = nchoices,
                                   beta_prior = beta_prior_ucb, percentile = 80,
                                   random_state = 1111)
bootstrapped_ts = BootstrappedTS(deepcopy(base_algorithm), nchoices = nchoices,
                                 beta_prior = beta_prior_ts, random_state = 2222)
one_vs_rest = SeparateClassifiers(deepcopy(base_algorithm), nchoices = nchoices,
                                  beta_prior = beta_prior, random_state = 3333)
epsilon_greedy = EpsilonGreedy(deepcopy(base_algorithm), nchoices = nchoices,
                               beta_prior = beta_prior, random_state = 4444)
logistic_ucb = LogisticUCB(nchoices = nchoices, percentile = 70,
                           beta_prior = beta_prior_ts, random_state = 5555)
adaptive_greedy_thr = AdaptiveGreedy(deepcopy(base_algorithm), nchoices=nchoices,
                                     decay_type='threshold',
                                     beta_prior = beta_prior, random_state = 6666)
adaptive_greedy_perc = AdaptiveGreedy(deepcopy(base_algorithm), nchoices = nchoices,
                                      decay_type='percentile', decay=0.9997,
                                       beta_prior=beta_prior, random_state = 7777)
explore_first = ExploreFirst(deepcopy(base_algorithm), nchoices = nchoices,
                             explore_rounds=1500, beta_prior=None, random_state = 8888)
active_explorer = ActiveExplorer(deepcopy(base_algorithm), nchoices = nchoices,
                                 beta_prior=beta_prior, random_state = 9999)
adaptive_active_greedy = AdaptiveGreedy(deepcopy(base_algorithm), nchoices = nchoices,
                                        active_choice='weighted', decay_type='percentile', decay=0.9997,
                                        beta_prior=beta_prior, random_state = 1234)
softmax_explorer = SoftmaxExplorer(deepcopy(base_algorithm), nchoices = nchoices,
                                   beta_prior=beta_prior, random_state = 5678)

models = [bootstrapped_ucb, bootstrapped_ts, one_vs_rest, epsilon_greedy, logistic_ucb,
          adaptive_greedy_thr, adaptive_greedy_perc, explore_first, active_explorer,
          adaptive_active_greedy, softmax_explorer]

I got an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-f97973a242eb> in <module>
      1 from sklearn.linear_model import LogisticRegression
----> 2 from contextualbandits.online import BootstrappedUCB, BootstrappedTS, LogisticUCB, \
      3             SeparateClassifiers, EpsilonGreedy, AdaptiveGreedy, ExploreFirst, \
      4             ActiveExplorer, SoftmaxExplorer
      5 from copy import deepcopy

C:\Python38\lib\site-packages\contextualbandits\__init__.py in <module>
----> 1 from . import online
      2 from . import offpolicy
      3 from . import evaluation
      4 from . import linreg

C:\Python38\lib\site-packages\contextualbandits\online.py in <module>
      2 
      3 import numpy as np, warnings, ctypes
----> 4 from .utils import _check_constructor_input, _check_beta_prior, \
      5             _check_smoothing, _check_fit_input, _check_X_input, _check_1d_inp, \
      6             _ZeroPredictor, _OnePredictor, _OneVsRest,\

C:\Python38\lib\site-packages\contextualbandits\utils.py in <module>
      8 from sklearn.linear_model import LogisticRegression
      9 from sklearn.tree import DecisionTreeClassifier
---> 10 from .linreg import LinearRegression, _wrapper_double
     11 from ._cy_utils import _matrix_inv_symm, _create_node_counters
     12 

C:\Python38\lib\site-packages\contextualbandits\linreg\__init__.py in <module>
      4 import warnings
      5 from sklearn.base import BaseEstimator
----> 6 from . import _wrapper_double, _wrapper_float
      7 
      8 __all__ = ["LinearRegression", "ElasticNet"]

contextualbandits\linreg\linreg_double.pyx in init contextualbandits.linreg._wrapper_double()

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Does anyone have any idea why it happened?

mtrand.pyx in mtrand.RandomState.beta() a<=0 error in Adaptive Greedy Algorithm

Hi there,

The contextual bandits problem is interesting and useful for many realistic problems. Thanks for the code!

When I tried to reproduce the online contextual bandits experiments with the file online_contextual_bandits.ipynb, I encountered the error below. I used Python 2.7.15 (Anaconda 2.7) under Ubuntu 16.04.

I went over possible reasons, like numpy version, joblib version. It looks like somehow a non-positive number, i.e., self.a, is generated for np.random.beta(self.a, self.b, size=X.shape[0])

Thanks in advance.

<contextualbandits.online.AdaptiveGreedy instance at 0x7f6888ee3200>

ValueError Traceback (most recent call last)
in ()
59 lst_actions[model],
60 X, y,
---> 61 batch_st, batch_end)

in simulate_rounds(model, rewards, actions_hist, X_global, y_global, batch_st, batch_end)
30
31 ## choosing actions for this batch
---> 32 actions_this_batch = model.predict(X_global[batch_st:batch_end, :]).astype('uint8')
33
34 # keeping track of the sum of rewards received

/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/online.pyc in predict(self, X, exploit)
877 """
878 # TODO: add option to output scores
--> 879 return self._name_arms(self._predict(X, exploit))
880
881 def _predict(self, X, exploit = False):

/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/online.pyc in _predict(self, X, exploit)
898 # case 1: number of predictions to make would still fit within current window
899 if remainder_window > X.shape[0]:
--> 900 pred, pred_max = self._calc_preds(X)
901 self.window_cnt += X.shape[0]
902 self.window = np.r_[self.window, pred_max]

/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/online.pyc in _calc_preds(self, X)
944
945 def _calc_preds(self, X):
--> 946 pred_proba = self._oracles.decision_function(X)
947 pred_max = pred_proba.max(axis=1)
948 pred = np.argmax(pred_proba, axis=1)

/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/utils.pyc in decision_function(self, X)
624 def decision_function(self, X):
625 preds = np.zeros((X.shape[0], self.n))
--> 626 Parallel(n_jobs=self.njobs, verbose=0, require="sharedmem")(delayed(self._decision_function_single)(choice, X, preds, 1) for choice in range(self.n))
627 _apply_smoothing(preds, self.smooth, self.counters)
628 return preds

/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in call(self, iterable)
915 # remaining jobs.
916 self._iterating = False
--> 917 if self.dispatch_one_batch(iterator):
918 self._iterating = self._original_iterator is not None
919

/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761

/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to

/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/_parallel_backends.pyc in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)

/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/_parallel_backends.pyc in init(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):

/home/yluo/anaconda2/lib/python2.7/site-packages/joblib/parallel.pyc in call(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def len(self):

/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/utils.pyc in _decision_function_single(self, choice, X, preds, depth)
640 preds[:, choice] = self.algos[choice].predict_proba_robust(X)[:, 1]
641 elif 'predict_proba' in dir(self.base):
--> 642 preds[:, choice] = self.algos[choice].predict_proba(X)[:, 1]
643 else:
644 if depth == 0:

/home/yluo/anaconda2/lib/python2.7/site-packages/contextualbandits/utils.pyc in predict_proba(self, X)
313
314 def predict_proba(self, X):
--> 315 preds = np.random.beta(self.a, self.b, size=X.shape[0]).reshape((-1, 1))
316 return np.c_[1 - preds, preds]
317

mtrand.pyx in mtrand.RandomState.beta()

ValueError: a <= 0
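
One possible cause, not confirmed anywhere in this thread: under Python 2, `/` between two integers is floor division, so a prior written as `3/nchoices` would evaluate to 0 whenever nchoices > 3. The sketch below imitates that with `//` under Python 3 and adds a guard before sampling. The helper `sample_beta`, the value of `nchoices`, and the use of the standard library's `random.betavariate` in place of `np.random.beta` are all assumptions made for illustration.

```python
import random

nchoices = 159  # illustrative value, not taken from the report

a_py2_style = 3 // nchoices  # what `3 / nchoices` gives for ints under Python 2
a_float = 3.0 / nchoices     # true division, always positive here


def sample_beta(a, b, size, seed=0):
    """Draw `size` Beta(a, b) samples, rejecting non-positive parameters up front."""
    if a <= 0 or b <= 0:
        raise ValueError("beta parameters must be > 0, got a=%r, b=%r" % (a, b))
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(size)]


print(a_py2_style)                      # 0
print(len(sample_beta(a_float, 4, 5)))  # 5
```

Writing the numerator as a float (`3.`) or adding `from __future__ import division` would avoid the floor division under Python 2.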

Model Saving and Incremental Training

@david-cortes
Is there a way to do incremental training? That seems like an important requirement.

Need more understanding for the method add_arm

Please add more documentation, or explain here, how to use the method and how it behaves with and without existing training data when a new arm is added.

Continuous Covariates

In any of the online CBs, is it possible for the covariate input (X) to be from a continuous space?

Pretty sure this should have alpha multiplied into it

(btw, thx for the awesome work!!)

contextualbandits/contextualbandits/utils.py

Line 819 in 78ea43a

pred += (X.dot(self.Ainv) * X).sum(axis=1)

Trying to line up with algo 1 in http://proceedings.mlr.press/v15/chu11a/chu11a.pdf
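
For reference, the per-arm score in Algorithm 1 of the linked paper is theta^T x + alpha * sqrt(x^T A^-1 x). A small NumPy sketch of that score follows; the function name `linucb_score` and its arguments are illustrative and do not mirror the library's internals.

```python
import numpy as np


def linucb_score(X, theta, Ainv, alpha):
    """UCB score per row of X: X @ theta + alpha * sqrt(x^T Ainv x)."""
    mean = X.dot(theta)
    # Row-wise quadratic form x^T Ainv x, the same expression as the quoted line.
    quad = (X.dot(Ainv) * X).sum(axis=1)
    return mean + alpha * np.sqrt(quad)


X = np.array([[1.0, 0.0], [0.0, 2.0]])
theta = np.array([0.5, 0.25])
Ainv = np.eye(2)
print(linucb_score(X, theta, Ainv, alpha=1.0))  # [1.5 2.5]
```

With alpha set to 0 the score reduces to the plain linear prediction, which is one way to check where the exploration term enters.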

Possibly unexpected behaviour of decision function

Hello @david-cortes, thanks for this Contextual Bandits package.

While using some of the online methods (BootstrappedTS, AdaptiveGreedy, maybe some others) from this package, I've faced some unexpected (at least to me) behaviour of decision_function and other related functions like predict.

Let's use some simple dummy data (it doesn't matter much) as an example:

import numpy as np 
from contextualbandits.online import * 
from sklearn.datasets import load_iris 
from sklearn.linear_model import LogisticRegression

RANDOM_STATE = 42

X, y = load_iris(return_X_y=True)
a = np.random.randint(3, size=len(y))
r = 1 * (y == a)

cb_model_1 = BootstrappedTS(
    base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_2 = BootstrappedTS(
    base_algorithm=LogisticRegression(max_iter=10000), random_state=RANDOM_STATE, nchoices=3
)
cb_model_1.fit(X, a, r)
cb_model_2.fit(X, a, r)

print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))
print(cb_model_1.decision_function(X[0]))
print(cb_model_2.decision_function(X[0]))

The output I get is

[[0.96298824 0.11472752 0.00019669]]
[[0.96298824 0.11472752 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]
[[0.97498834 0.22001592 0.00019669]]

Setting random_state makes the predictions of cb_model_1 and cb_model_2 equal, as it should, but it's unclear to me why calling decision_function a second time changes the output. Another way to see this behaviour is to compare two predictions from the same model:

pred_1 = cb_model_1.predict(X)
pred_2 = cb_model_1.predict(X)
print((pred_1 == pred_2).mean())

outputs 0.92.

But the most confusing case is when both the per-arm scores from decision_function and the predicted actions are needed:

pred = cb_model_1.predict(X)
dec_func = cb_model_1.decision_function(X)
print((np.argmax(dec_func, axis=1) == pred).mean())

outputs 0.96.

So, is this type of behaviour expected? I think it can be related to how some methods work, e.g.

Bootstrapped Thompson Sampling

Performs Thompson Sampling by fitting several models per class on bootstrapped samples, then makes predictions by taking one of them at random for each class.

But in my opinion, setting random state should block this randomization, especially in decision_function, or there should be a way to block it with another parameter.
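
One way to read the output above, offered as an interpretation and not as a statement about the library's internals: a seed fixes the whole random stream, not each individual call, so two models with the same seed agree call by call while successive calls on one model can differ. The toy class below (`ToyBootstrappedTS`, entirely hypothetical, with made-up scores) imitates that with the standard library.

```python
import random


class ToyBootstrappedTS:
    """Toy model, not the library's implementation: each arm has several
    fixed "bootstrapped" scores and one is picked at random per call."""

    def __init__(self, scores_per_arm, random_state):
        self.scores_per_arm = scores_per_arm
        self.rng = random.Random(random_state)  # seeded once, then advances

    def decision_function(self):
        return [self.rng.choice(scores) for scores in self.scores_per_arm]

    def exploit(self):
        # Deterministic alternative: average the resamples for each arm.
        return [sum(s) / len(s) for s in self.scores_per_arm]


scores = [[0.96, 0.97, 0.95], [0.11, 0.22, 0.15], [0.0002, 0.0002, 0.0002]]
m1 = ToyBootstrappedTS(scores, random_state=42)
m2 = ToyBootstrappedTS(scores, random_state=42)

print(m1.decision_function() == m2.decision_function())  # True: same seed, first call
print(m1.decision_function() == m2.decision_function())  # True: same seed, second call
print(m1.exploit() == m1.exploit())                      # True: no randomness involved
```

The tracebacks quoted elsewhere on this page show `predict(self, X, exploit)`, so an exploit-style call may be the closest thing to a deterministic prediction, though whether it fits this use case is not established here.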

Is There A Message Board For This?

I'm extremely new to the subject of contextual bandits and reinforcement learning. What I am interested in is how people use contextual bandits for advertising at scale. How do you average the rewards out?

Single Model

Are there any algorithms implemented in this library that use a single unified model for all arms?

Allowing two armed bandits?

Hello,

Thanks for this nice package.

Why aren't two-armed bandits allowed?

Changing:

def _check_constructor_input(base_algorithm, nchoices, batch_train=False):
    assert nchoices > 2
    assert isinstance(nchoices, int)
    assert ('fit' in dir(base_algorithm)) and ('predict' in dir(base_algorithm))
    if batch_train:
        assert 'partial_fit' in dir(base_algorithm)
    return None

To:

def _check_constructor_input(base_algorithm, nchoices, batch_train=False):
    assert nchoices >= 2
    assert isinstance(nchoices, int)
    assert ('fit' in dir(base_algorithm)) and ('predict' in dir(base_algorithm))
    if batch_train:
        assert 'partial_fit' in dir(base_algorithm)
    return None

Was enough to make it work.

Is there any Concern over Adding costsensitive Package as a Dependency?

Just out of curiosity, is there any concern over adding the costsensitive package to the install_requires list? The package could be used by users, but it was not installed automatically.

try:
    from costsensitive import _BinTree
except:
    raise ValueError("This functionality requires package 'costsensitive'.\nCan be installed with 'pip install costsensitive'.")
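
If the package stays optional, the guard could catch only import failures and re-raise them as ImportError. A sketch of that pattern follows; the helper `require_optional` is hypothetical, and it is demonstrated with the standard library's `json` module so that it runs without `costsensitive` installed.

```python
import importlib


def require_optional(module_name, feature="This functionality"):
    """Import an optional dependency, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "%s requires package '%s'.\nCan be installed with 'pip install %s'."
            % (feature, module_name, module_name)
        ) from exc


json_mod = require_optional("json")  # stdlib module, always importable
print(json_mod.dumps({"ok": True}))
```

Catching `ImportError` specifically avoids hiding unrelated errors, which a bare `except:` would swallow.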

AttributeError: '_BetaPredictor' object has no attribute 'partial_fit'

Hi, first want to thank you for sharing this marvellous job.

I'm trying to follow the example in section 3.3, Streaming models, of the notebook.
When calling softmax_explorer.partial_fit(), I get the error below:

Traceback (most recent call last):
File "/anaconda3/envs/project/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3325, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
runfile('/Users/user/projects/project/project/test/test_adressa_with_contextual_bandit_agent.py', wdir='/Users/user/projects/project/project/test')
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/user/projects/project/project/test/test_adressa_with_contextual_bandit_agent.py", line 64, in
np.array(current_batch_rewards))
File "/Users/user/projects/project/project/example_agents/contextual_bandits.py", line 86, in re_train
self.selected_model = self.selected_model.partial_fit(X=states, a=actions, r=rewards)
File "/anaconda3/envs/project/lib/python3.7/site-packages/contextualbandits/online.py", line 210, in partial_fit
self._oracles.partial_fit(X, a, r)
File "/anaconda3/envs/project/lib/python3.7/site-packages/contextualbandits/utils.py", line 598, in partial_fit
Parallel(n_jobs=self.njobs, verbose=0, require="sharedmem")(delayed(self._partial_fit_single)(choice, X, a, r) for choice in range(self.n))
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 934, in call
self.retrieve()
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/anaconda3/envs/project/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/anaconda3/envs/project/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in call
return self.func(*args, **kwargs)
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 225, in call
for func, args, kwargs in self.items]
File "/anaconda3/envs/project/lib/python3.7/site-packages/joblib/parallel.py", line 225, in
for func, args, kwargs in self.items]
File "/anaconda3/envs/project/lib/python3.7/site-packages/contextualbandits/utils.py", line 608, in _partial_fit_single
self.algos[choice].partial_fit(xclass, yclass, classes = [0, 1])
AttributeError: '_BetaPredictor' object has no attribute 'partial_fit'

What is being passed to _partial_fit_single is shown below:

self.algos: [<contextualbandits.utils._BetaPredictor object at 0x1a1dd01518>, <contextualbandits.utils._BetaPredictor object at 0x1a20aaff98>]
choice: 1

Any idea why? Any help will be appreciated.
Thanks in advance. 🌻

_BasePolicy.add_arm(); NameError: name 'base_algorithm' is not defined

Calling BootstrappedUCB.add_arm() raises NameError: name 'base_algorithm' is not defined on line 297, inside the _BasePolicy.add_arm() method.

fitted_classifier = self._make_bootstrapped(base_algorithm, self._percentile, 
                                            self._ts_byrow, self._ts_weighted)

Looking at the code, base_algorithm is not part of the function signature, but it is referred to as self.base_algorithm at line 300.

if isinstance(self.base_algorithm, list):
    if (fitted_classifier is not None):
        raise ValueError("Must pass 'fitted_classifier' when using different 'base_algorithm' per arm.")

To be corrected.
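
The failure mode can be reproduced in isolation. The classes below are hypothetical and only mimic the pattern described: a method that refers to a bare name that exists solely as an instance attribute.

```python
class BrokenPolicy:
    def __init__(self, base_algorithm):
        self.base_algorithm = base_algorithm

    def add_arm(self):
        # Bug: `base_algorithm` is neither a parameter nor a local variable.
        return base_algorithm


class FixedPolicy(BrokenPolicy):
    def add_arm(self):
        # Fix: read the attribute stored on the instance.
        return self.base_algorithm


try:
    BrokenPolicy("clf").add_arm()
except NameError as err:
    print(type(err).__name__)

print(FixedPolicy("clf").add_arm())
```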

Import error: Cyclic dependency

I am getting the following error in the __init__.py of linreg:

ImportError: cannot import name '_wrapper_double' from partially initialized module 'contextualbandits.linreg' (most likely due to a circular import) (C:...\python\contextualbandits-master\contextualbandits\linreg\__init__.py)

Is there an easy fix to this?

ParametricTS fails with: '_OneVsRest' object has no attribute 'beta_counters'

Trying to use ParametricTS causes the following error.

usage:

base_model = XGBRegressor(n_estimators=20)
cb_model = cb.online.ParametricTS(base_model, nchoices=actions)
...
cb_model.predict(df_context)

env:
contextualbandits 0.3.17.post3 (installed via pip)
Python 3.9.13
OS: Mac OSX 12.5.1

Stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/online.py in predict(self, X, exploit, output_score)
    608             scores = self._exploit(X)
    609         else:
--> 610             scores = self.decision_function(X)
    611         pred = self._name_arms(np.argmax(scores, axis = 1))
    612 

~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/online.py in decision_function(self, X)
    507             else:
    508                 return self._predict_from_beta_prior_and_smoothing(X.shape[0])
--> 509         return self._score_matrix(X)
    510 
    511     def _predict_from_beta_prior_and_smoothing(self, n):

~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/online.py in _score_matrix(self, X)
   3341     def _score_matrix(self, X):
   3342         pred = self._oracles.decision_function(X)
-> 3343         counters = self._oracles.get_nobs_by_arm()
   3344         with_model = counters >= self.beta_prior[1]
   3345         counters = counters.reshape((1,-1))

~/miniconda3/envs/env1/lib/python3.9/site-packages/contextualbandits/utils.py in get_nobs_by_arm(self)
   1006 
   1007     def get_nobs_by_arm(self):
-> 1008         return self.beta_counters[1] + self.beta_counters[2]
   1009 
   1010     def exploit(self, X):

AttributeError: '_OneVsRest' object has no attribute 'beta_counters'

Reference to the methods described

Do you have a hint for an online course teaching these subjects?

Problem with explore_rounds in online.py

def _predict(self, X, exploit = False):
        X = _check_X_input(X)
        
        if X.shape[0] == 0:
            return np.array([])
        
        if exploit:
            return self._oracles.predict(X)
        
        if self.explore_cnt < self.explore_rounds:
            self.explore_cnt += X.shape[0]
            
            # case 1: all predictions are within allowance
            if self.explore_cnt <= self.explore_rounds:
                return np.random.randint(self.nchoices, size = X.shape[0])
            
            # case 2: some predictions are within allowance, others are not
            else:
                n_explore = self.explore_rounds - self.explore_cnt
                pred = np.zeros(X.shape[0])
                pred[:n_explore] = np.random.randint(self.nchoices, n_explore)
                pred[n_explore:] = self._oracles.predict(X)
                return pred
        else:
            return self._oracles.predict(X)

This part of the code in online.py raises a (low >= high) error in case 2. I guess the problem is that n_explore is negative.
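Tracing case 2 by hand: explore_cnt is incremented before n_explore is computed, so explore_rounds - explore_cnt is negative there, and that value is then passed as the second positional argument of np.random.randint, which is high rather than size. Below is a minimal sketch of one possible fix, written as a standalone function and not as the library's method. The names predict_with_exploration and oracle_predict are hypothetical and stand in for _predict and self._oracles.predict.

```python
import numpy as np

def predict_with_exploration(X, explore_cnt, explore_rounds, nchoices,
                             oracle_predict, rng=None):
    """Return (predictions, updated explore_cnt) for a batch X.

    Standalone illustration, not the library's code: oracle_predict is a
    hypothetical callable standing in for self._oracles.predict.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_rows = X.shape[0]
    # Compute the remaining allowance BEFORE updating the counter,
    # so it can never go negative.
    n_explore = min(max(explore_rounds - explore_cnt, 0), n_rows)
    pred = np.empty(n_rows, dtype=int)
    # Pass the count as size=, not as the second positional argument (high).
    pred[:n_explore] = rng.integers(nchoices, size=n_explore)
    if n_explore < n_rows:
        # Only the rows past the allowance are sent to the oracle.
        pred[n_explore:] = oracle_predict(X[n_explore:])
    return pred, explore_cnt + n_rows

# Example: 2 exploration rounds are left and the batch has 5 rows.
X_demo = np.zeros((5, 3))
pred_demo, cnt_demo = predict_with_exploration(
    X_demo, explore_cnt=8, explore_rounds=10, nchoices=4,
    oracle_predict=lambda X: np.full(X.shape[0], 9),
    rng=np.random.default_rng(0),
)
```

In the example the first two rows get random arms and the remaining three come from the stub oracle, which also avoids the shape mismatch of assigning predictions for all of X into pred[n_explore:].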

pip install error on macosx:'random' file not found

pip install contextualbandits
returned the following error. Any advice? Thanks.
......
Using cached
https://pypi.tuna.tsinghua.edu.cn/packages/bd/79/d6b51a83c84b047cf8012ef70f4dae62654e4ca5dfac792589e5709d71f1/contextualbandits-0.3.6.tar.gz (58 kB)
......
Building wheels for collected packages: contextualbandits
Building wheel for contextualbandits (PEP 517) ... error
.......
In file included from contextualbandits/_cy_utils.cpp:629:
contextualbandits/cy_cpp_helpers.cpp:3:10: fatal error: 'random' file not found
#include <random>

XGBClassifier becomes un-serializable after being used as a base_model

When using XGBoost models as base models for online models such as EpsilonGreedy, properties are being added to XGBoost models that prevent the models from being deserialized with either pickle or dill, as these properties don't exist in the original model class.

usage:

import dill
import pandas as pd
import contextualbandits as cb
from xgboost import XGBClassifier

arms = ["a", "b", "c"]
base_model = XGBClassifier(n_estimators=20)
cb_model = cb.online.EpsilonGreedy(base_model, nchoices=arms)
X = pd.DataFrame([0])
a = pd.Series(["a"])
r = pd.Series([1])
cb_model.fit(X, a, r)

dill.loads(dill.dumps(cb_model)) # dill.loads() fails with the error below:

AttributeError                            Traceback (most recent call last)
/var/folders/g5/lpnvjwrd2h95lf50zlb22dt00000gn/T/ipykernel_70859/3296061279.py in <module>
      9 cb_model.fit(X, a, r)
     10 
---> 11 dill.loads(dill.dumps(cb_model))

~/miniconda3/envs/env1/lib/python3.9/site-packages/dill/_dill.py in loads(str, ignore, **kwds)
    385     """
    386     file = StringIO(str)
--> 387     return load(file, ignore, **kwds)
    388 
    389 # def dumpzs(obj, protocol=None):

~/miniconda3/envs/env1/lib/python3.9/site-packages/dill/_dill.py in load(file, ignore, **kwds)
    371     See :func:`loads` for keyword arguments.
    372     """
--> 373     return Unpickler(file, ignore=ignore, **kwds).load()
    374 
    375 def loads(str, ignore=None, **kwds):

~/miniconda3/envs/env1/lib/python3.9/site-packages/dill/_dill.py in load(self)
    644 
    645     def load(self): #NOTE: if settings change, need to update attributes
--> 646         obj = StockUnpickler.load(self)
    647         if type(obj).__module__ == getattr(_main_module, '__name__', '__main__'):
    648             if not self._ignore:

AttributeError: 'XGBClassifier' object has no attribute '_decision_function_w_sigmoid_from_predict'

Bibtex_data.txt

Where is Bibtex_data.txt located?

object has no attribute '_oracles'

Hi there,

Whenever I try to predict using bootstrapped_ucb, active_explorer, adaptive_active_greedy etc. with an SGD classifier as the base, I'm seeing the error below:

AttributeError Traceback (most recent call last)
in
----> 1 adaptive_active_greedy.predict(batch_S)

~/anaconda3/envs/tensorflow/lib/python3.6/site-packages/contextualbandits/online.py in predict(self, X, exploit)
910 """
911 # TODO: add option to output scores
--> 912 return self._name_arms(self._predict(X, exploit))
913
914 def _predict(self, X, exploit = False):

~/anaconda3/envs/tensorflow/lib/python3.6/site-packages/contextualbandits/online.py in _predict(self, X, exploit)
931 # case 1: number of predictions to make would still fit within current window
932 if remainder_window > X.shape[0]:
--> 933 pred, pred_max = self._calc_preds(X)
934 self.window_cnt += X.shape[0]
935 self.window = np.r_[self.window, pred_max]

~/anaconda3/envs/tensorflow/lib/python3.6/site-packages/contextualbandits/online.py in _calc_preds(self, X)
975
976 def _calc_preds(self, X):
--> 977 pred_proba = self._oracles.decision_function(X)
978 pred_max = pred_proba.max(axis = 1)
979 pred = np.argmax(pred_proba, axis = 1)

AttributeError: 'AdaptiveGreedy' object has no attribute '_oracles'

How to change the reward to a continuous interval?

Is there any way I can change the reward from binary {0,1} to continuous [0,1[?

Basic example with smallest dataset/random rewards (onboarding)

Hello @david-cortes, thanks for making contextualbandits!
I'm currently exploring solutions to recommend products based on user preferences and it looks like contextual bandits is an interesting approach.
In this small gist (105 lines), I built a simulation of conversation turns with updated item scores based on updated user preferences.
If the user likes a recommendation (let's say that the system is presenting max 3 items), the RS will take the liked items into consideration by finding similar items (cosine similarity) as well.

I would love to use contextualbandits to run experiments but I'm not sure how to use it with a very simple dataset (like the one in my gist). Basically, a vector of item features.
I would be more than grateful if I could get any help or advice.

Question about using contextual bandits in specific case

Hi everyone,
I am working on solving a peg-in-hole problem. Initially I started with an RL approach, but it seems like it's not the right approach for my problem.

Task description

Setup: A robot placed on the table, a board with a cylindrical hole on the table, and a cylindrical peg.

Task: Use robot to insert peg inside the cylindrical hole in the board despite small error in the exact location of the cylindrical hole.

Description: To accomplish the task, the parameters of the controller need to be learned. The goal is to learn one set of parameters per episode that can solve the problem for a given radius of the peg and hole, with the ability to generalise to other sizes.
Because only one set of controller parameters should be learned, the problem is not an RL problem but more of a contextual bandit problem.
The states are not fed to the policy at each timestep; instead, the context (the position of the hole) is fed to the policy only at the beginning of each episode. Given the context, the policy outputs actions (the controller parameters), which are used throughout the episode.
During the episode, the reward is calculated at each timestep; at the end of the episode the rewards are summed, saved, and used to update the policy.

As I am more familiar with the RL approach, I was wondering if someone more experienced could advise whether using contextual bandits is the right way to go and, if so, which algorithm you would recommend.
For simulation I use robosuite which has gym-like structure

Thank you for your help

Doubt with documentation regarding EpsilonGreedy 'assume_unique_reward' parameter

On the EpsilonGreedy algorithm, the documentation says that:

assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to False, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.

But from the code, it appears it is the other way. Am I misunderstanding something?

contextualbandits/contextualbandits/utils.py

Lines 613 to 625 in 719c2c9

def _filter_arm_data(self, X, a, r, choice):
    if self.assume_un:
        this_choice = (a == choice)
        arms_w_rew = (r == 1)
        yclass = r[this_choice | arms_w_rew]
        yclass[arms_w_rew & (~this_choice) ] = 0
        this_choice = this_choice | arms_w_rew
    else:
        this_choice = (a == choice)
        yclass = r[this_choice]

    ## Note: don't filter X here as in many cases it won't end up used
    return yclass, this_choice
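To make the question concrete, here is a small worked example of what the quoted logic appears to do for one arm. This is a re-implementation for illustration only, not the library's code: filter_arm_data is a hypothetical standalone function, and the zeroing mask is restricted to the kept rows so that it lines up with yclass.

```python
import numpy as np

def filter_arm_data(a, r, choice, assume_unique_reward):
    """Select the rows and labels used to train the classifier of one arm.

    Illustrative re-implementation (not the library's code).
    """
    this_choice = (a == choice)
    if not assume_unique_reward:
        # Only rounds in which this arm was actually played.
        return r[this_choice], this_choice
    arms_w_rew = (r == 1)
    keep = this_choice | arms_w_rew
    yclass = r[keep].copy()
    # Rounds where a different arm was rewarded become negatives for this arm.
    yclass[(arms_w_rew & ~this_choice)[keep]] = 0
    return yclass, keep

a = np.array([0, 1, 1, 2])   # arm chosen in each round
r = np.array([1, 0, 1, 0])   # reward observed in each round

y_true, rows_true = filter_arm_data(a, r, choice=0, assume_unique_reward=True)
y_false, rows_false = filter_arm_data(a, r, choice=0, assume_unique_reward=False)
```

With the flag on, arm 0 is also trained on the third round (where arm 1 was rewarded) as a negative example; with the flag off, arm 0 only sees its own round. Under this reading, the extra negative labels are added when the flag is True, which is the opposite of what the quoted documentation sentence says.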

Ques: GPU or CPU ops?

Are this framework's operations GPU-bound or CPU-bound?

How to set v_sq in linTS?

For Linear Thompson Sampling predictions, there's a value v_sq, which is used to multiply the covariance matrix of model parameters when sampling from the multivariate norm defined by it. In my research, it looks like v is supposed to be the sum of squared residuals of the output value divided by the number of training rows up to this point which comes up in many standard error of coefficient calculations. So how does someone determine this value without storing all the data and re-computing the prediction?

Meanwhile, the Agarwal paper sets v to R*sqrt(9*d*ln(T/delta)), where d is the number of dimensions or features in the linear regression, R is... something... it looks like an absolute bound on the residuals, T is the time horizon for regret calculations (so it can be replaced by the current number of training rows), and delta is a fungible parameter between 0 and 1 which determines the regret behavior (higher delta means less variance in predictions and a higher likelihood of exceeding the guaranteed regret bounds).

So... uh... this is really confusing. How do you set v_sq, then? There's not much documentation in the code.
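For reference, the formula quoted in the question can be evaluated directly without storing any past data, since it depends only on R, d, T and delta. The sketch below does nothing more than that. The function name v_squared is hypothetical, and whether the library's v_sq parameter is meant to receive this quantity is an assumption that this thread does not confirm.

```python
import math

def v_squared(R, d, T, delta):
    """Return v^2 for v = R * sqrt(9 * d * ln(T / delta)), as quoted above.

    R: assumed bound on the noise/residuals, d: number of features,
    T: horizon (or rows seen so far), delta: value in (0, 1).
    """
    if not (0 < delta < 1):
        raise ValueError("delta must be in (0, 1)")
    if T <= 0 or d <= 0:
        raise ValueError("T and d must be positive")
    return (R ** 2) * 9.0 * d * math.log(T / delta)

# Example: 4 features, 100 rows seen, delta = 0.1, unit noise bound.
v_sq_demo = v_squared(R=1.0, d=4, T=100, delta=0.1)
```

Values derived from regret bounds like this one are often reported to explore more than necessary in practice, so treating v_sq as a hyperparameter to tune on held-out or simulated data is a reasonable alternative.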

Unable to pickle batch training models

I think this is because _robust_predict, used for batch training, is a bound method and does not get pickled properly. I get the following error when I try to load the pickle file:

AttributeError: 'SGDClassifier' object has no attribute '_robust_predict'
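The mechanism suspected above can be reproduced with the standard library alone. In the sketch below, TinyModel is a hypothetical stand-in for the estimator and _robust_predict is a hypothetical helper attached at runtime; neither is the library's code. A bound method stored on an instance is pickled as "look up this attribute on the object", and at load time that lookup runs before the instance's attributes have been restored, so it fails.

```python
import pickle
import types

class TinyModel:
    """Stand-in for an estimator (hypothetical, for illustration)."""
    def predict(self, x):
        return x

def _robust_predict(self, x):
    # Hypothetical helper attached at runtime, mimicking the reported pattern.
    return self.predict(x)

patched = TinyModel()
patched._robust_predict = types.MethodType(_robust_predict, patched)

payload = pickle.dumps(patched)        # serializing succeeds
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:          # rebuilding the bound method fails
    load_error = exc

class RobustWrapper:
    """One alternative: keep the helper on a wrapper class, not on the instance."""
    def __init__(self, model):
        self.model = model

    def robust_predict(self, x):
        return self.model.predict(x)

# The wrapper round-trips because its method lives on the class itself.
restored = pickle.loads(pickle.dumps(RobustWrapper(TinyModel())))
```

The wrapper is only one possible design. Removing the attached method in __getstate__ and re-attaching it in __setstate__ would be another way to get the same effect.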

Need more understanding for the method beta_prior

Any good strategies to run a simulation other than the one in the online policy example?

TypeError: contextual bandits with custom 'choice_names' (online.py)

First of all, thank you for fixing my issue #62.

I want to set up my CB model with custom 'choice_names' (integers), so that I can use the serial numbers of the choices in my example data. I get a TypeError when predicting on new data with the 'topN' method.

This error does not occur when no custom 'choice_names' are set. Could I get a hint on how to fix it?

model.topN(X_new, model.nchoices)

TypeError: only integer scalar arrays can be converted to a scalar index
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-3531050> in <module>
     67                          , batch_size = 10
     68                          , noptions   = models[0].nchoices
---> 69                          , mode       = "allscore")  # <- this part is just my custom function code
     70 
     71 t_end   = time() # -------------------------------------------------------- end

<command-3531050> in predict_cb(X_new, model, batch_size, noptions, mode)  
     40 
     41       elif mode == "allscore":
---> 42         imp_temp    = model.topN(X_batch,noptions)  # <- this part is just my custom function code
     43         imp_new     = np.concatenate((imp_new,imp_temp),axis=0)
     44 

/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in topN(self, X, n)
    574         else:
    575             topN = topN_byrow(scores, n, self.njobs)
--> 576         return self._name_arms(topN)
    577 
    578 

/databricks/python/lib/python3.7/site-packages/contextualbandits/online.py in _name_arms(self, pred)
    143             return pred
    144         else:
--> 145             return self.choice_names[pred]
    146 
    147     def drop_arm(self, arm_name):

TypeError: only integer scalar arrays can be converted to a scalar index
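This exact message is what NumPy raises when a plain Python list is indexed with a multi-element integer array, which is what self.choice_names[pred] would do if choice_names were held as a list and pred were the 2-D output of a top-N call. The traceback does not confirm how choice_names is stored, so that part is an assumption. The sketch below uses made-up arm names to show the failure and how indexing a NumPy array instead avoids it.

```python
import numpy as np

choice_names = [101, 102, 103]          # custom integer arm names (hypothetical)
top2 = np.array([[2, 0], [1, 2]])       # per-row arm indices, as a top-N call might return

try:
    choice_names[top2]                  # a list cannot be indexed by an array
    list_error = None
except TypeError as exc:
    list_error = exc

# Converting the names to an array lets every index be mapped to its name.
named = np.asarray(choice_names)[top2]
```

If this is indeed the cause, passing the names as a NumPy array rather than a list might serve as a workaround until the indexing is fixed on the library side.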

Regarding Bibtex data in example

Hi @david-cortes,

Do you have an example data set or reference to this that is used in your simulation?

Thank you,

Saran

How to deal with this Scenario while applying CB techniques

Cannot access model attributes when arms are floats

I've been trying to access an sklearn model's coefficients and got an AttributeError. It seems like something modifies the base model.

This issue gets resolved if the arms are integers.

Minimal code that causes the exception:


import numpy as np
from contextualbandits import online as cb
from sklearn.naive_bayes import BernoulliNB
aa = np.array([0.2, 0.1, 0.2, 0.1, 0.4, 1.1, 0.3, 0.5, 0.6, 0.7])
XX = np.array([[1, 0, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]])
rr = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
n_arms = len({t for t in aa})
model = cb.SeparateClassifiers(BernoulliNB(), n_arms, batch_train=True)
model.partial_fit(XX, aa, rr)
[m.coef_ for m in model._oracles.algos]
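Since the report notes that the problem disappears when the arms are integers, one workaround is to map the float labels to integer arm indices before fitting and to translate back afterwards. The sketch below uses only NumPy on the aa array from the report; whether the library is meant to accept float labels directly is not something this thread establishes.

```python
import numpy as np

aa = np.array([0.2, 0.1, 0.2, 0.1, 0.4, 1.1, 0.3, 0.5, 0.6, 0.7])

# arm_values[i] is the original float label of arm index i;
# arm_idx holds the integer arm index for every row of aa.
arm_values, arm_idx = np.unique(aa, return_inverse=True)
n_arms = arm_values.shape[0]

# Integer predictions can be mapped back to the original labels like this.
recovered = arm_values[arm_idx]
```

Here arm_idx would be passed in place of aa, and arm_values is kept around to convert predicted indices back to the float labels.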

Getting topN arm features in Offpolicy method

Is it possible to get probabilities and topN arms in other methods like DoublyRobust and OffsetTree, apart from Bootstrapped TS?

Bibtex_data In Examples?

Hi @david-cortes,

I was wondering if you had the example data set that is used in your examples?

That or a reference would be much appreciated.

Reproducibility issue with current branch of library?

Hi,

A group I was in tried to reproduce some of the results from the paper associated with this repo.
To be honest, it was a bit more rushed than I'd like, and there is probably a fair amount we could have done better by reaching out. I don't think there are any really significant findings, except that we did find some curious behavior when comparing the Cython and pure-Python implementations of some algorithms. We ran out of time to isolate what was causing the change; we mainly noticed it because one of the group members was having issues with Cython and decided to use the pure-Python version.

I was able to reproduce his results (comparing our own implementation, contextual-bandits 0.1.8.5 (matched), and contextual-bandits 0.3.13 (didn't match)). The link to the repo/paper is below - Figure 4 under Section 3.2.1 highlights the issue.

https://github.com/MrinalJain17/ml-tools-project/blob/main/Contextual_Bandit_Comparison.pdf

I believe one of the main reasons why we were unable to completely reproduce the original paper may stem from this issue. Having some group members using different environments (versions of the library) quite possibly skewed some of our findings and isn't really addressed in our write-up.

That said, I hope you have better luck figuring out what exactly is changing the current branch's behavior. Considering we implemented the ContextualAdaptiveGreedy algorithm from scratch and it tracked the 0.1.8.5 version - I'm assuming the problem isn't there...

Question regarding using contextual bandits for Learning-To-Rank

Hey,

I want to use MAB in an LTR problem. Can you help me figure out which algorithm to use and a little about how to use it?
(I am a beginner in MAB.)

I'm thinking of using the functionality for ranking top-N arms instead of always picking the single best one.
Any suggestion would help. Thanks in advance.

