titu1994 / keras-adabound
Keras implementation of AdaBound
License: MIT License
I think it might make sense to add this optimizer to the main keras repo!
  File "/adabound.py", line 40, in __init__
    self.lr = K.variable(lr, name='lr')
AttributeError: can't set attribute
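One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the base optimizer class in the installed Keras version exposes `lr` as a read-only property, so assigning `self.lr` in a subclass fails. The sketch below reproduces that failure mode in plain Python. `Base`, `Child`, `FixedChild` and `try_build` are hypothetical stand-ins, not Keras classes.

```python
class Base:
    """Hypothetical stand-in for a base class with a read-only `lr` property."""

    def __init__(self):
        self.learning_rate = 0.001

    @property
    def lr(self):
        # Getter only: no setter is defined, so `obj.lr = ...` raises AttributeError.
        return self.learning_rate


class Child(Base):
    def __init__(self, lr):
        super().__init__()
        self.lr = lr  # fails: the inherited property has no setter


class FixedChild(Base):
    def __init__(self, lr):
        super().__init__()
        self.learning_rate = lr  # assign the underlying attribute instead


def try_build(cls, lr):
    """Return the constructed object, or the AttributeError message on failure."""
    try:
        return cls(lr)
    except AttributeError as exc:
        return str(exc)
```

If that is the cause, assigning to the underlying attribute (as `FixedChild` does) or pinning an older Keras version would be the two obvious directions to try.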
https://github.com/CyberZHG/keras-adabound/blob/master/keras_adabound/optimizers.py
K.minimum(K.maximum(step, lower_bound), upper_bound)
will not work?
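For scalar values the nested expression is an ordinary clamp. The sketch below uses the built-in `min` and `max` as stand-ins for `K.minimum` and `K.maximum`, which apply the same operation element-wise. Whether it drops into the linked implementation unchanged depends on the tensor shapes there, which this example does not cover.

```python
def clip(step, lower_bound, upper_bound):
    """Clamp `step` into [lower_bound, upper_bound] with nested max/min."""
    return min(max(step, lower_bound), upper_bound)
```

A value below the lower bound is raised to it, a value above the upper bound is lowered to it, and anything in between passes through unchanged.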
I cannot find the "final lr" term in the original paper.
Can you please explain what it is?
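In the implementation quoted further down this page, `final_lr` is the value that both clipping bounds converge to as the step count grows. A small sketch of those two bound expressions in plain Python, using the default `final_lr=0.1` and `gamma=1e-3` from the quoted constructor (the function name `adabound_bounds` is made up for illustration):

```python
def adabound_bounds(t, final_lr=0.1, gamma=1e-3):
    """Lower and upper bounds at step t (t >= 1), per the expressions quoted on this page."""
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    return lower, upper


# The gap between the bounds is wide at first and narrows around final_lr.
for t in (1, 1_000, 1_000_000):
    lower, upper = adabound_bounds(t)
    print(f"t={t:>9}: lower={lower:.6f}, upper={upper:.6f}")
```

Early on the bounds barely constrain the step, so the optimizer behaves like Adam. Late in training both bounds sit close to `final_lr`, so every parameter is updated at roughly that rate.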
I have downloaded the files and placed them in a folder in the site packages for my virtual environment but I can't get this to work. I have added the folder path to sys.path and verified it is listed. I'm running Tensorflow 2.1.0. What am I doing wrong?
same as my PR keras-team/keras-contrib#478
works only with TF backend
class AdaBound(Optimizer):
    """AdaBound optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        final_lr: float >= 0. Final learning rate.
        beta_1: float, 0 < beta < 1. Generally close to 1.
        beta_2: float, 0 < beta < 1. Generally close to 1.
        gamma: float >= 0. Convergence speed of the bound function.
        epsilon: float >= 0. Fuzz factor. If `None`, defaults to `K.epsilon()`.
        decay: float >= 0. Learning rate decay over each update.
        weight_decay: Weight decay weight.
        amsbound: boolean. Whether to apply the AMSBound variant of this
            algorithm.
        tf_cpu_mode: only for tensorflow backend
            0 - default, no changes.
            1 - allows to train x2 bigger network on same VRAM consuming RAM
            2 - allows to train x3 bigger network on same VRAM consuming RAM*2
                and CPU power.

    # References
        - [Adaptive Gradient Methods with Dynamic Bound of Learning Rate]
          (https://openreview.net/forum?id=Bkg3g2R9FX)
        - [Adam - A Method for Stochastic Optimization]
          (https://arxiv.org/abs/1412.6980v8)
        - [On the Convergence of Adam and Beyond]
          (https://openreview.net/forum?id=ryQu7f-RZ)
    """

    def __init__(self, lr=0.001, final_lr=0.1, beta_1=0.9, beta_2=0.999, gamma=1e-3,
                 epsilon=None, decay=0., amsbound=False, weight_decay=0.0, tf_cpu_mode=0, **kwargs):
        super(AdaBound, self).__init__(**kwargs)

        if not 0. <= gamma <= 1.:
            raise ValueError("Invalid `gamma` parameter. Must lie in [0, 1] range.")

        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.lr = K.variable(lr, name='lr')
            self.beta_1 = K.variable(beta_1, name='beta_1')
            self.beta_2 = K.variable(beta_2, name='beta_2')
            self.decay = K.variable(decay, name='decay')

        self.final_lr = final_lr
        self.gamma = gamma

        if epsilon is None:
            epsilon = K.epsilon()
        self.epsilon = epsilon
        self.initial_decay = decay
        self.amsbound = amsbound

        self.weight_decay = float(weight_decay)
        self.base_lr = float(lr)
        self.tf_cpu_mode = tf_cpu_mode

    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                                      K.dtype(self.decay))))

        t = K.cast(self.iterations, K.floatx()) + 1

        # Applies bounds on actual learning rate
        step_size = lr * (K.sqrt(1. - K.pow(self.beta_2, t)) /
                          (1. - K.pow(self.beta_1, t)))

        final_lr = self.final_lr * lr / self.base_lr
        lower_bound = final_lr * (1. - 1. / (self.gamma * t + 1.))
        upper_bound = final_lr * (1. + 1. / (self.gamma * t))

        e = K.tf.device("/cpu:0") if self.tf_cpu_mode > 0 else None
        if e: e.__enter__()
        ms = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        vs = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        if self.amsbound:
            vhats = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        else:
            vhats = [K.zeros(1) for _ in params]
        if e: e.__exit__(None, None, None)

        self.weights = [self.iterations] + ms + vs + vhats

        for p, g, m, v, vhat in zip(params, grads, ms, vs, vhats):
            # apply weight decay
            if self.weight_decay != 0.:
                g += self.weight_decay * K.stop_gradient(p)

            e = K.tf.device("/cpu:0") if self.tf_cpu_mode == 2 else None
            if e: e.__enter__()
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
            if self.amsbound:
                vhat_t = K.maximum(vhat, v_t)
                self.updates.append(K.update(vhat, vhat_t))
            if e: e.__exit__(None, None, None)

            if self.amsbound:
                denom = (K.sqrt(vhat_t) + self.epsilon)
            else:
                denom = (K.sqrt(v_t) + self.epsilon)

            # Compute the bounds
            step_size_p = step_size * K.ones_like(denom)
            step_size_p_bound = step_size_p / denom
            bounded_lr_t = m_t * K.minimum(K.maximum(step_size_p_bound,
                                                     lower_bound), upper_bound)

            p_t = p - bounded_lr_t

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))
            new_p = p_t

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'final_lr': float(self.final_lr),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'gamma': float(self.gamma),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon,
                  'weight_decay': self.weight_decay,
                  'amsbound': self.amsbound}
        base_config = super(AdaBound, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
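To make the update rule in `get_updates` easier to follow, here is a scalar sketch of one step in plain Python. It is a simplification under stated assumptions: a single float parameter, no learning-rate decay, no weight decay, no AMSBound, and `epsilon=1e-7` picked as an illustrative value. The name `adabound_step` is hypothetical.

```python
import math


def adabound_step(p, g, m, v, t, lr=1e-3, final_lr=0.1,
                  beta_1=0.9, beta_2=0.999, gamma=1e-3, epsilon=1e-7):
    """One scalar AdaBound update (no decay, weight decay, or AMSBound).

    t is the 1-based step count. Returns the new (p, m, v).
    """
    # Bias-corrected step size, as in Adam.
    step_size = lr * math.sqrt(1.0 - beta_2 ** t) / (1.0 - beta_1 ** t)

    # Bounds that tighten around final_lr as t grows.
    lower_bound = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper_bound = final_lr * (1.0 + 1.0 / (gamma * t))

    # Exponential moving averages of the gradient and its square.
    m_t = beta_1 * m + (1.0 - beta_1) * g
    v_t = beta_2 * v + (1.0 - beta_2) * g * g

    # Per-parameter rate, clipped into [lower_bound, upper_bound].
    rate = step_size / (math.sqrt(v_t) + epsilon)
    rate = min(max(rate, lower_bound), upper_bound)

    return p - m_t * rate, m_t, v_t


def minimise_square(steps=200, start=1.0):
    """Run the sketch on f(p) = p**2, whose gradient is 2 * p."""
    p, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        p, m, v = adabound_step(p, 2.0 * p, m, v, t)
    return p
```

Calling `minimise_square()` runs 200 steps from `p = 1.0` and returns the final parameter value, which gives a quick sanity check that the clipped rate still moves the parameter towards the minimum.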
Thanks for a good optimizer
According to usage
optm = AdaBound(lr=1e-03,
                final_lr=0.1,
                gamma=1e-03,
                weight_decay=0.,
                amsbound=False)
Does the learning rate gradually increase by the number of steps?
final_lr is described as the final learning rate,
but is it actually a learning rate relative to the base lr and the current learning rate?
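In the PR code quoted above, the value the bounds converge to is `final_lr * lr / base_lr`, where `lr` is the decayed learning rate. So with `decay=0` it equals `final_lr` exactly, and with `decay > 0` it shrinks in proportion to the decayed `lr`. A scalar sketch of that arithmetic (the function name `effective_final_lr` is made up for illustration):

```python
def effective_final_lr(iterations, lr=1e-3, final_lr=0.1, decay=0.0):
    """Target of the bounds after `iterations` updates, per the quoted code."""
    base_lr = lr
    if decay > 0:
        # Same time-based decay as in get_updates.
        lr = lr * (1.0 / (1.0 + decay * iterations))
    return final_lr * lr / base_lr
```

For example, `effective_final_lr(1000, decay=1e-3)` halves the target, because the decayed `lr` is half of `base_lr` at that point.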
Line 72 in 5ce819b
this param is not saved.
I looked at the official PyTorch implementation from the original paper.
https://github.com/Luolc/AdaBound/blob/master/adabound/adabound.py
it has
# State initialization
if len(state) == 0:
    state['step'] = 0
state is saved with the optimizer.
also it has
# Exponential moving average of gradient values
state['exp_avg'] = torch.zeros_like(p.data)
# Exponential moving average of squared gradient values
state['exp_avg_sq'] = torch.zeros_like(p.data)
these values should also be saved
So your keras implementation is wrong.
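Why the step counter and the two moving averages matter for checkpointing can be shown with a scalar sketch. The key names are borrowed from the PyTorch snippet above; everything else (`ema_step`, `fresh_state`, `run`, the gradient values) is hypothetical, and the example says nothing about how either library actually serialises its state.

```python
import copy


def fresh_state():
    return {"step": 0, "exp_avg": 0.0, "exp_avg_sq": 0.0}


def ema_step(state, grad, beta_1=0.9, beta_2=0.999):
    """Advance the step counter and both moving averages in place."""
    state["step"] += 1
    state["exp_avg"] = beta_1 * state["exp_avg"] + (1.0 - beta_1) * grad
    state["exp_avg_sq"] = beta_2 * state["exp_avg_sq"] + (1.0 - beta_2) * grad * grad
    return state


def run(state, grads):
    for g in grads:
        ema_step(state, g)
    return state


grads = [0.5, -0.2, 0.1, 0.4]

# Uninterrupted run over all four gradients.
full = run(fresh_state(), grads)

# Two steps, checkpoint, then resume from a copy of the checkpoint.
checkpoint = copy.deepcopy(run(fresh_state(), grads[:2]))
resumed = run(copy.deepcopy(checkpoint), grads[2:])

# Resume with the optimizer state thrown away instead.
reset = run(fresh_state(), grads[2:])
```

Here `resumed` matches `full`, while `reset` ends up with a different step count and different averages, which is the behaviour a checkpoint without optimizer state would produce.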
I installed with
pip install keras-adabound
imported with:
from keras_adabound import AdaBound
and declared the optimizer as:
opt = AdaBound(lr=1e-03,final_lr=0.1, gamma=1e-03, weight_decay=0., amsbound=False)
Then, I'm getting the error:
TypeError: Unexpected keyword argument passed to optimizer: amsbound
After changing the pip install to adabound (instead of keras-adabound) and the import to "from adabound import AdaBound", the keyword amsbound is recognized, but then I get the error:
TypeError: __init__() missing 1 required positional argument: 'params'
Am I mixing something up here or missing something?
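The second error hints that the `adabound` package provides a PyTorch-style optimizer whose constructor expects the model parameters first; that is an inference from the error message, not something confirmed in this thread. One way to check which class an import actually resolved to is to inspect its module and constructor signature. In the sketch below, `describe_optimizer_class` is a hypothetical helper, and `KerasStyle` and `TorchStyle` are stand-ins so the example runs without either library installed.

```python
import inspect


def describe_optimizer_class(cls):
    """Return the defining module and the constructor parameter names of `cls`."""
    params = list(inspect.signature(cls.__init__).parameters)
    return cls.__module__, params


class KerasStyle:
    """Hypothetical stand-in: hyperparameters only, like a Keras optimizer."""

    def __init__(self, lr=1e-3, final_lr=0.1, **kwargs):
        pass


class TorchStyle:
    """Hypothetical stand-in: model parameters come first, like a PyTorch optimizer."""

    def __init__(self, params, lr=1e-3, final_lr=0.1, amsbound=False):
        pass
```

Running `describe_optimizer_class(AdaBound)` on the real import would show whether `amsbound` is an accepted keyword and whether a leading `params` argument is required, which separates the two failure modes reported here.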