Comments (7)
Quick questions:
- Which Pyro example are you referring to?
- What do you mean by "the std of `y_mean`"?
  - If you are talking about `y_logstd` (aleatoric uncertainty): in this example (and in most BNN code) it is a hyperparameter that we optimize, i.e. we do not place a prior on it or do posterior inference over it, so it should only appear in the likelihood.
  - If you are talking about `Var_{q(w)}[y_mean]` (epistemic uncertainty): it is accounted for in `q(w)` by definition.
  - If you are talking about
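For reference, a minimal sketch of how the two kinds of uncertainty above could be combined into a predictive mean and std. Here `y_mean_samples` and `y_logstd` are stand-in names for the corresponding quantities in the example (with `y_mean` evaluated under a large `n_particles`); this is not code from the example itself:

```python
import numpy as np

def predictive_moments(y_mean_samples, y_logstd):
    # y_mean_samples: [n_particles, n_test] forward passes under w ~ q(w)
    # y_logstd: the optimized log std of the observation noise (aleatoric part)
    epistemic_var = np.var(y_mean_samples, axis=0)   # Var_{q(w)}[y_mean]
    aleatoric_var = np.exp(y_logstd) ** 2            # from the likelihood N(y | y_mean, exp(y_logstd)^2)
    mean = np.mean(y_mean_samples, axis=0)
    std = np.sqrt(epistemic_var + aleatoric_var)
    return mean, std
```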
from zhusuan.
Thank you for responding so fast.
I want to compute p(y*|x*, w, x, y), including the std and mean of y*. As far as I know, q(w) is normally used to make the likelihood tractable, so should I compute y* through q(w)? I did it with ZhuSuan just now, and it seems to look good.
The code is here:
def mean_field_variational(x, layer_sizes, n_particles):
    with zs.BayesianNet() as variational:
        ws = []
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1],
                                              layer_sizes[1:])):
            w_mean = tf.get_variable(
                'w_mean_' + str(i), shape=[1, n_out, n_in + 1],
                initializer=tf.constant_initializer(0.))
            w_logstd = tf.get_variable(
                'w_logstd_' + str(i), shape=[1, n_out, n_in + 1],
                initializer=tf.constant_initializer(0.))
            ws.append(
                zs.Normal('w' + str(i), w_mean, logstd=w_logstd,
                          n_samples=n_particles, group_ndims=2))
        # forward
        ly_x = tf.expand_dims(
            tf.tile(tf.expand_dims(x, 0), [n_particles, 1, 1]), 3)
        for i in range(len(ws)):
            w = tf.tile(ws[i], [1, tf.shape(x)[0], 1, 1])
            ly_x = tf.concat(
                [ly_x, tf.ones([n_particles, tf.shape(x)[0], 1, 1])], 2)
            ly_x = tf.matmul(w, ly_x) / tf.sqrt(tf.to_float(tf.shape(ly_x)[2]))
            if i < len(ws) - 1:
                ly_x = tf.nn.relu(ly_x)
        y_mean = tf.squeeze(ly_x, [2, 3])
    return variational, y_mean

variational, y_qw = mean_field_variational(x, layer_sizes, n_particles)
`y_qw` is what I want; its shape is [1, 5000, 200], where 5000 is the number of samples. I compute the std and mean from it, and it looks good...
I also tested with Pyro, using its example https://github.com/uber/pyro/blob/dev/examples/bayesian_regression.py, which I adapted into a Bayesian NN.
from zhusuan.
- Your code seems equivalent to the following (keeping the graph construction code in the example intact, and doing this for the test):

      y_mean_val = sess.run(
          y_mean,
          feed_dict={n_particles: ll_samples,  # insert a large value here
                     x: x_test, y: y_test})

  which is neater IMO. I would be surprised if they produce different results.
- The Pyro code corresponds to a Bayesian linear regression model, and when you adapt it to a BNN I am not sure you have made the modified model exactly the same as our BNN example. For example, our code uses N(0, 1) for the weights and biases and scales the output by `1/sqrt(n_in)`, while the Pyro example uses N(0, 2). Which choice is more appropriate depends on the dataset you are dealing with, and you should invest some effort in model specification (*); the scaling point is sketched right after this comment. You should also pay attention to things like optimizer hyperparameters when you change the model.

If you have controlled all variables and still get very different results, there should be a bug on one side (which seems unlikely). Let me know if this is the case.

(*) The more principled approach is to build a hierarchical model. But you still need to think about the hierarchical prior, or do some diagnosis afterwards.
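As a standalone illustration of the `1/sqrt(n_in)` point (a toy check under the assumption of unit-scale inputs, not code from either example): with N(0, 1) weights the pre-activation variance grows linearly with `n_in`, and the `1/sqrt(n_in)` factor brings it back to roughly 1:

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_samples = 50, 100000
x = rng.randn(n_samples, n_in)            # unit-scale inputs
w = rng.randn(n_samples, n_in)            # N(0, 1) weights, one draw per row
pre_act = np.sum(w * x, axis=1)           # one linear unit per row
print(np.var(pre_act))                    # roughly n_in (about 50)
print(np.var(pre_act / np.sqrt(n_in)))    # roughly 1 after the scaling
```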
from zhusuan.
Below is the code of the BNN:
X_train, Y_train = Variable(torch.tensor(X)), Variable(torch.tensor(Y)).reshape(200, 1)
X_test, Y_test = Variable(torch.tensor(X)), Variable(torch.tensor(Y)).reshape(200, 1)
print(X_train.shape, Y_train.shape)
data = torch.cat((X_train, Y_train), 1)

def get_batch_indices(N, batch_size):
    all_batches = np.arange(0, N, batch_size)
    if all_batches[-1] != N:
        all_batches = list(all_batches) + [N]
    return all_batches

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)  # hidden layer
        self.predict = torch.nn.Linear(n_hidden, 1)          # output layer

    def forward(self, x):
        x = self.hidden(x)
        x = self.predict(x)
        return x

first_layer = len(X_train.data.numpy()[0])
second_layer = 25
softplus = nn.Softplus()
regression_model = Net(first_layer, second_layer)

def model(data):
    mu = Variable(torch.zeros(second_layer, first_layer)).type_as(data)
    sigma = Variable(torch.ones(second_layer, first_layer)).type_as(data)
    bias_mu = Variable(torch.zeros(second_layer)).type_as(data)
    bias_sigma = Variable(torch.ones(second_layer)).type_as(data)
    w_prior, b_prior = Normal(mu, sigma), Normal(bias_mu, bias_sigma)
    mu2 = Variable(torch.zeros(1, second_layer)).type_as(data)
    sigma2 = Variable(torch.ones(1, second_layer)).type_as(data)
    bias_mu2 = Variable(torch.zeros(1)).type_as(data)
    bias_sigma2 = Variable(torch.ones(1)).type_as(data)
    w_prior2, b_prior2 = Normal(mu2, sigma2), Normal(bias_mu2, bias_sigma2)
    priors = {'hidden.weight': w_prior,
              'hidden.bias': b_prior,
              'predict.weight': w_prior2,
              'predict.bias': b_prior2}
    lifted_module = pyro.random_module("module", regression_model, priors)
    lifted_reg_model = lifted_module()
    with pyro.iarange("map", N, subsample=data):
        x_data = data[:, :-1]
        y_data = data[:, -1]
        # run the regressor forward conditioned on inputs
        prediction_mean = lifted_reg_model(x_data).squeeze()
        pyro.sample("obs",
                    Normal(prediction_mean, Variable(torch.ones(data.size(0))).type_as(data)),
                    obs=y_data.squeeze())

def guide(data):
    w_mu = Variable(torch.randn(second_layer, first_layer).type(torch.float64), requires_grad=True)
    w_log_sig = Variable(0.1 * torch.ones(second_layer, first_layer).type_as(data.data), requires_grad=True)
    b_mu = Variable(torch.randn(second_layer).type_as(data.data), requires_grad=True)
    b_log_sig = Variable(0.1 * torch.ones(second_layer).type_as(data.data), requires_grad=True)
    # register learnable params in the param store
    mw_param = pyro.param("guide_mean_weight", w_mu)
    sw_param = softplus(pyro.param("guide_log_sigma_weight", w_log_sig))
    mb_param = pyro.param("guide_mean_bias", b_mu)
    sb_param = softplus(pyro.param("guide_log_sigma_bias", b_log_sig))
    # gaussian guide distributions for w and b
    w_dist = Normal(mw_param, sw_param)
    b_dist = Normal(mb_param, sb_param)
    w_mu2 = Variable(torch.randn(1, second_layer).type_as(data.data), requires_grad=True)
    w_log_sig2 = Variable(0.1 * torch.randn(1, second_layer).type_as(data.data), requires_grad=True)
    b_mu2 = Variable(torch.randn(1).type_as(data.data), requires_grad=True)
    b_log_sig2 = Variable(0.1 * torch.ones(1).type_as(data.data), requires_grad=True)
    # register learnable params in the param store
    mw_param2 = pyro.param("guide_mean_weight2", w_mu2)
    sw_param2 = softplus(pyro.param("guide_log_sigma_weight2", w_log_sig2))
    mb_param2 = pyro.param("guide_mean_bias2", b_mu2)
    sb_param2 = softplus(pyro.param("guide_log_sigma_bias2", b_log_sig2))
    # gaussian guide distributions for w and b
    w_dist2 = Normal(mw_param2, sw_param2)
    b_dist2 = Normal(mb_param2, sb_param2)
    dists = {'hidden.weight': w_dist,
             'hidden.bias': b_dist,
             'predict.weight': w_dist2,
             'predict.bias': b_dist2}
    # overloading the parameters in the module with random samples from the guide distributions
    lifted_module = pyro.random_module("module", regression_model, dists)
    # sample a regressor
    return lifted_module()

# instantiate optim and inference objects
optim = Adam({"lr": 0.001})
elbo = Trace_ELBO()
svi = SVI(model, guide, optim, loss=elbo)
N = len(X_train)

for j in range(10000):
    epoch_loss = 0.0
    perm = torch.randperm(N)
    # shuffle data
    data = data[perm]
    # get indices of each batch
    all_batches = get_batch_indices(N, 64)
    for ix, batch_start in enumerate(all_batches[:-1]):
        batch_end = all_batches[ix + 1]
        batch_data = data[batch_start:batch_end]
        epoch_loss += svi.step(batch_data)
    if j % 100 == 0:
        print(j, "avg loss {}".format(epoch_loss / float(N)))

preds = []
for i in range(10000):
    sampled_reg_model = guide(X_test)
    pred = sampled_reg_model(X_test).data.numpy().flatten()
    preds.append(pred)
preds = np.array(preds)
mean = np.mean(preds, axis=0)
std = np.std(preds, axis=0) / 10
y_test = Y_test.data.numpy()
x = np.arange(len(y_test))
plt.xlim((0, y_test.shape[0]))
sm = np.array([x for x in range(y_test.shape[0])])
plt.plot(sm, y_test[:y_test.shape[0]])
plt.fill_between(x, mean - std, mean + std, alpha=0.3, color='orange')
plt.show()
from zhusuan.
And this is the code where I use ZhuSuan now:
def main():
    # Collect training and test data from rollouts of a random policy
    x_train, y_train = rollout(policy=random_policy, timesteps=STEP)
    for i in range(1, 50):
        X_, Y_ = rollout(policy=random_policy, timesteps=STEP)
        x_train = np.vstack((x_train, X_)).astype('float32')
        y_train = np.vstack((y_train, Y_)).astype('float32')
    N, n_x = x_train.shape

    x_test, y_test1 = rollout(policy=random_policy, timesteps=STEP)
    for i in range(1, 50):
        X_, Y_ = rollout(policy=random_policy, timesteps=STEP)
        x_test = np.vstack((x_test, X_)).astype('float32')
        y_test1 = np.vstack((y_test1, Y_)).astype('float32')

    # Standardize data
    x_train, x_test, _, _ = dataset.standardize(x_train, x_test)
    y_train, y_test, mean_y_train, std_y_train = dataset.standardize(
        y_train, y_test1)
    y_train = y_train.reshape(-1)
    y_test = y_test.reshape(-1)

    # Define model parameters
    n_hiddens = [50]
    print(N, n_x)

    # Build the computation graph
    n_particles = tf.placeholder(tf.int32, shape=[], name='n_particles')
    x = tf.placeholder(tf.float32, shape=[None, n_x])
    y = tf.placeholder(tf.float32, shape=[None])
    layer_sizes = [n_x] + n_hiddens + [1]
    print(layer_sizes)
    w_names = ['w' + str(i) for i in range(len(layer_sizes) - 1)]

    def log_joint(observed):
        model, _ = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
        log_pws = model.local_log_prob(w_names)
        log_py_xw = model.local_log_prob('y')
        return tf.add_n(log_pws) + log_py_xw * N

    variational, y_qw = mean_field_variational(x, layer_sizes, n_particles)
    qw_outputs = variational.query(w_names, outputs=True, local_log_prob=True)
    latent = dict(zip(w_names, qw_outputs))
    lower_bound = zs.variational.elbo(
        log_joint, observed={'y': y}, latent=latent, axis=0)
    cost = tf.reduce_mean(lower_bound.sgvb())
    lower_bound = tf.reduce_mean(lower_bound)

    optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
    infer_op = optimizer.minimize(cost)

    # prediction: rmse & log likelihood
    observed = dict((w_name, latent[w_name][0]) for w_name in w_names)
    observed.update({'y': y})
    model, y_mean = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
    y_pred = tf.reduce_mean(y_mean, 0)
    rmse = tf.sqrt(tf.reduce_mean((y_pred - y) ** 2)) * std_y_train
    log_py_xw = model.local_log_prob('y')
    log_likelihood = tf.reduce_mean(zs.log_mean_exp(log_py_xw, 0)) - \
        tf.log(std_y_train)

    # Define training/evaluation parameters
    lb_samples = 10
    ll_samples = 5000
    epochs = 500
    batch_size = 10
    iters = int(np.floor(x_train.shape[0] / float(batch_size)))
    test_freq = 10

    # Run the inference
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(1, epochs + 1):
            lbs = []
            for t in range(iters):
                x_batch = x_train[t * batch_size:(t + 1) * batch_size]
                y_batch = y_train[t * batch_size:(t + 1) * batch_size]
                _, lb, cost_ = sess.run(
                    [infer_op, lower_bound, cost],
                    feed_dict={n_particles: lb_samples,
                               x: x_batch, y: y_batch})
                lbs.append(lb)
            print('Epoch {}: Lower bound = {}, cost = {}'.format(
                epoch, np.mean(lbs), cost_))
            if epoch % test_freq == 0:
                test_lb, test_rmse, test_ll = sess.run(
                    [lower_bound, rmse, log_likelihood],
                    feed_dict={n_particles: ll_samples,
                               x: x_test, y: y_test})
                print('>> TEST')
                print('>> Test lower bound = {}, rmse = {}, log_likelihood = {}'
                      .format(test_lb, test_rmse, test_ll))

        # Sample y under q(w) for the test set and undo the standardization
        y_pred_ = sess.run([y_qw],
                           feed_dict={n_particles: ll_samples,
                                      x: x_test, y: y_test})
        y_pred_ = np.array(y_pred_[0])
        preds = np.array(y_pred_ * std_y_train + mean_y_train)
        mean = np.mean(preds, axis=0)
        print(mean.shape)
        std = np.std(preds, axis=0)
        print(std.shape)

        x = np.arange(200)
        plt.figure()
        plt.plot(x, y_test1)
        plt.plot(x, mean, linestyle='--')
        plt.fill_between(x, mean - std, mean + std, alpha=0.3, color='orange')
        plt.show()
from zhusuan.
> Your code seems equivalent to (keeping the graph construction code in the example intact, and doing the following for the test) `y_mean_val = sess.run(y_mean, feed_dict={n_particles: ll_samples, x: x_test, y: y_test})`, which is neater IMO. I would be surprised if they produce different results.
I changed the code, I did not just reuse it. The original code is:
def mean_field_variational(x, layer_sizes, n_particles):
    with zs.BayesianNet() as variational:
        ws = []
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
            w_mean = tf.get_variable('w_mean_' + str(i), shape=[1, n_out, n_in + 1],
                                     initializer=tf.constant_initializer(0.))
            w_logstd = tf.get_variable('w_logstd_' + str(i), shape=[1, n_out, n_in + 1],
                                       initializer=tf.constant_initializer(0.))
            ws.append(zs.Normal('w' + str(i), w_mean, logstd=w_logstd,
                                n_samples=n_particles, group_ndims=2))
    return variational
without the forward part in mean_field_variational; I added the forward part myself.
from zhusuan.
It seems your torch code implements a deep linear network, while the ZhuSuan model has a ReLU nonlinearity. In that case you should certainly expect different results...
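For concreteness, `Net.forward` in the torch code above stacks two `Linear` layers with nothing in between. A sketch of the change that would add the missing nonlinearity (an illustration, not code posted in this thread):

```python
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)  # hidden layer
        self.predict = torch.nn.Linear(n_hidden, 1)          # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))  # nonlinearity between the two layers
        return self.predict(x)
```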
from zhusuan.