
Comments (7)

meta-inf commented on June 12, 2024

Quick questions:

  • Which Pyro example are you referring to?
  • What do you mean by "the std of y_mean"?
    • If you are talking about y_logstd (aleatoric uncertainty): in this example (and in most BNN code) it is a hyperparameter that we optimize, i.e. we do not place a prior on it or do posterior inference over it, so it should only appear in the likelihood.
    • If you are talking about Var_{q(w)}[y_mean] (epistemic uncertainty), it is accounted for in q(w) by definition (the sketch below shows how the two combine).
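
To make the two bullets concrete, here is a minimal sketch (mine, not from the example) of how the two uncertainty sources combine once you have Monte Carlo samples of the network output under q(w); the names y_mean_samples (shape [n_particles, n_test]) and the scalar y_logstd are assumptions:

import numpy as np

def predictive_moments(y_mean_samples, y_logstd):
    # epistemic part: spread of the network output across sampled weights w ~ q(w)
    epistemic_var = np.var(y_mean_samples, axis=0)
    # aleatoric part: the learned observation noise, std = exp(y_logstd)
    aleatoric_var = np.exp(2.0 * y_logstd)
    mean = np.mean(y_mean_samples, axis=0)
    std = np.sqrt(epistemic_var + aleatoric_var)
    return mean, std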


28huasheng commented on June 12, 2024

Thank you for responding so fast.
I want to compute p(y*|x*,w,x,y), including the std and the mean of y*. As far as I know, q(w) is normally used to make the likelihood tractable, so I think I should compute y* through q(w)? I just did this with ZhuSuan and it seems to look good.
The code is here:

import tensorflow as tf
import zhusuan as zs

def mean_field_variational(x, layer_sizes, n_particles):
    with zs.BayesianNet() as variational:
        ws = []
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1],
                                              layer_sizes[1:])):
            w_mean = tf.get_variable(
                'w_mean_' + str(i), shape=[1, n_out, n_in + 1],
                initializer=tf.constant_initializer(0.))
            w_logstd = tf.get_variable(
                'w_logstd_' + str(i), shape=[1, n_out, n_in + 1],
                initializer=tf.constant_initializer(0.))
            ws.append(
                zs.Normal('w' + str(i), w_mean, logstd=w_logstd,
                          n_samples=n_particles, group_ndims=2))

        # forward
        ly_x = tf.expand_dims(
            tf.tile(tf.expand_dims(x, 0), [n_particles, 1, 1]), 3)
        for i in range(len(ws)):
            w = tf.tile(ws[i], [1, tf.shape(x)[0], 1, 1])
            ly_x = tf.concat(
                [ly_x, tf.ones([n_particles, tf.shape(x)[0], 1, 1])], 2)
            ly_x = tf.matmul(w, ly_x) / tf.sqrt(tf.to_float(tf.shape(ly_x)[2]))
            if i < len(ws) - 1:
                ly_x = tf.nn.relu(ly_x)
        #print("qw ly_xshape")
        #print(ly_x.shape)
        y_mean = tf.squeeze(ly_x, [2, 3])
    return variational, y_mean

variational, y_qw = mean_field_variational(x, layer_sizes, n_particles)

y_qw is what I want; its shape is [1, 5000, 200], where 5000 is the number of samples. I compute the std and mean from it, and it looks good...

And I tested with Pyro based on its example https://github.com/uber/pyro/blob/dev/examples/bayesian_regression.py, which I adapted into a Bayesian NN.


meta-inf commented on June 12, 2024
  1. Your code seems equivalent to the following (keeping the graph construction code in the example intact, and doing this at test time):
y_mean_val = sess.run(
                    y_mean,
                    feed_dict={n_particles: ll_samples, # insert large value here
                               x: x_test, y: y_test})

which is neater IMO (see the sketch at the end of this comment). I would be surprised if they produce different results.

  2. The Pyro code corresponds to a Bayesian linear regression model, and I'm not sure that, when you adapted it to a BNN, you made the modified model exactly the same as our BNN example. For example, our code uses N(0, 1) for the weights and biases and scales the output by 1/sqrt(n_in), while the Pyro example uses N(0, 2). Which choice is more appropriate depends on the dataset you are dealing with, and you should invest some effort in model specification (*). You should also pay attention to things like optimizer hyperparameters when you change the model.
    If you have controlled all these variables and still get very different results, there should be a bug on one side (which seems unlikely). Let me know if this is the case.

(*) The more principled approach is to build a hierarchical model. But you still need to think about the hierarchical prior, or do some diagnosis afterwards.
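
A minimal sketch of the test path in item 1, assuming the graph from the ZhuSuan BNN example (y_mean, n_particles, x, y, and the standardization statistics mean_y_train / std_y_train all come from that script); the variable names below are otherwise my own:

y_mean_val = sess.run(
    y_mean,
    feed_dict={n_particles: 5000,  # insert a large value here
               x: x_test, y: y_test})
# y_mean_val has shape [n_particles, n_test]; undo the y-standardization
preds = y_mean_val * std_y_train + mean_y_train
pred_mean = preds.mean(axis=0)
# this std covers the epistemic spread only; add the noise term for the full predictive std
pred_std = preds.std(axis=0)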


28huasheng commented on June 12, 2024

Below is the code of the BNN:

# Imports assumed by this snippet (older Pyro API with Variable / iarange / random_module);
# X and Y are the raw data arrays defined elsewhere.
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.autograd import Variable
import pyro
from pyro.distributions import Normal
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

X_train, Y_train = Variable(torch.tensor(X)), Variable(torch.tensor(Y)).reshape(200, 1)
X_test, Y_test = Variable(torch.tensor(X)), Variable(torch.tensor(Y)).reshape(200,1)
print(X_train.shape,Y_train.shape)
data = torch.cat((X_train, Y_train), 1)

def get_batch_indices(N, batch_size):
    all_batches = np.arange(0, N, batch_size)
    if all_batches[-1] != N:
        all_batches = list(all_batches) + [N]
    return all_batches

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, 1)   # output layer

    def forward(self, x):
        x = self.hidden(x)
        x = self.predict(x)
        return x

first_layer = len(X_train.data.numpy()[0])
second_layer = 25   
    
softplus = nn.Softplus()
regression_model = Net(first_layer, second_layer)

def model(data):
    mu = Variable(torch.zeros(second_layer, first_layer)).type_as(data)
    sigma = Variable(torch.ones(second_layer, first_layer)).type_as(data)
    bias_mu = Variable(torch.zeros(second_layer)).type_as(data)
    bias_sigma = Variable(torch.ones(second_layer)).type_as(data)
    w_prior, b_prior = Normal(mu, sigma), Normal(bias_mu, bias_sigma)
    
    mu2 = Variable(torch.zeros(1, second_layer)).type_as(data)
    sigma2 = Variable(torch.ones(1, second_layer)).type_as(data)
    bias_mu2 = Variable(torch.zeros(1)).type_as(data)
    bias_sigma2 = Variable(torch.ones(1)).type_as(data)
    w_prior2, b_prior2 = Normal(mu2, sigma2), Normal(bias_mu2, bias_sigma2)    
    
    priors = {'hidden.weight': w_prior, 
              'hidden.bias': b_prior,
              'predict.weight': w_prior2,
              'predict.bias': b_prior2}
    

    lifted_module = pyro.random_module("module", regression_model, priors)

    lifted_reg_model = lifted_module()

    with pyro.iarange("map", N, subsample=data):
        x_data = data[:, :-1]
        y_data = data[:, -1]
        # run the regressor forward conditioned on inputs
        prediction_mean = lifted_reg_model(x_data).squeeze()
        pyro.sample("obs",
                    Normal(prediction_mean, Variable(torch.ones(data.size(0))).type_as(data)),
                    obs=y_data.squeeze())
        

def guide(data):
    #print(data.data)
    w_mu = Variable(torch.randn(second_layer, first_layer).type(torch.float64), requires_grad=True)
    w_log_sig = Variable(0.1 * torch.ones(second_layer, first_layer).type_as(data.data), requires_grad=True)
    b_mu = Variable(torch.randn(second_layer).type_as(data.data), requires_grad=True)
    b_log_sig = Variable(0.1 * torch.ones(second_layer).type_as(data.data), requires_grad=True)
    
    # register learnable params in the param store
    mw_param = pyro.param("guide_mean_weight", w_mu)
    sw_param = softplus(pyro.param("guide_log_sigma_weight", w_log_sig))
    mb_param = pyro.param("guide_mean_bias", b_mu)
    sb_param = softplus(pyro.param("guide_log_sigma_bias", b_log_sig))
    
    # gaussian guide distributions for w and b
    w_dist = Normal(mw_param, sw_param)
    b_dist = Normal(mb_param, sb_param)
    
    w_mu2 = Variable(torch.randn(1, second_layer).type_as(data.data), requires_grad=True)
    w_log_sig2 = Variable(0.1 * torch.randn(1, second_layer).type_as(data.data), requires_grad=True)
    b_mu2 = Variable(torch.randn(1).type_as(data.data), requires_grad=True)
    b_log_sig2 = Variable(0.1 * torch.ones(1).type_as(data.data), requires_grad=True)
    
    # register learnable params in the param store
    mw_param2 = pyro.param("guide_mean_weight2", w_mu2)
    sw_param2 = softplus(pyro.param("guide_log_sigma_weight2", w_log_sig2))
    mb_param2 = pyro.param("guide_mean_bias2", b_mu2)
    sb_param2 = softplus(pyro.param("guide_log_sigma_bias2", b_log_sig2))
    
    # gaussian guide distributions for w and b
    w_dist2 = Normal(mw_param2, sw_param2)
    b_dist2 = Normal(mb_param2, sb_param2)
      
    dists = {'hidden.weight': w_dist, 
              'hidden.bias': b_dist,
              'predict.weight': w_dist2,
              'predict.bias': b_dist2}
    
    # overloading the parameters in the module with random samples from the guide distributions
    lifted_module = pyro.random_module("module", regression_model, dists)
    # sample a regressor
    return lifted_module()

# instantiate optim and inference objects
optim = Adam({"lr": 0.001})
elbo = Trace_ELBO()
svi = SVI(model, guide, optim, loss=elbo)

N = len(X_train)

for j in range(10000):
    epoch_loss = 0.0
    perm = torch.randperm(N)
    # shuffle data
    data = data[perm]
    # get indices of each batch
    all_batches = get_batch_indices(N, 64)
    for ix, batch_start in enumerate(all_batches[:-1]):
        batch_end = all_batches[ix + 1]
        batch_data = data[batch_start: batch_end]        
        epoch_loss += svi.step(batch_data)
    if j % 100 == 0:
        print(j, "avg loss {}".format(epoch_loss/float(N)))

preds = []
for i in range(10000):
    sampled_reg_model = guide(X_test)
    pred = sampled_reg_model(X_test).data.numpy().flatten()
    preds.append(pred)

preds = np.array(preds)
mean = np.mean(preds, axis=0)
std = np.std(preds, axis=0)/10 
y_test = Y_test.data.numpy()
x = np.arange(len(y_test))



plt.xlim((0, y_test.shape[0]))
sm = np.array([x for x in range(y_test.shape[0])])
plt.plot(sm, y_test[:y_test.shape[0]])
plt.fill_between(x, mean-std, mean+std, alpha = 0.3, color = 'orange')
plt.show()


28huasheng commented on June 12, 2024

And this is the code I am using with ZhuSuan now:

def main():
    #tf.set_random_seed(1237)
    #np.random.seed(1234)

    # Load UCI Boston housing data
    #data_path = os.path.join(conf.data_dir, 'housing.data')
    #x_train, y_train, x_valid, y_valid, x_test, y_test = \
        #dataset.load_uci_boston_housing(data_path)
    #x_train = np.vstack([x_train, x_valid])
    #y_train = np.hstack([y_train, y_valid])
    x_train,y_train = rollout(policy=random_policy, timesteps=STEP)

    for i in range(1,50):
        X_, Y_ = rollout(policy=random_policy, timesteps=STEP)
        x_train = np.vstack((x_train, X_)).astype('float32')
        y_train = np.vstack((y_train, Y_)).astype('float32')
    N, n_x = x_train.shape

    x_test,y_test1 = rollout(policy=random_policy, timesteps=STEP)

    for i in range(1,50):
        X_, Y_ = rollout(policy=random_policy, timesteps=STEP)
        x_test = np.vstack((x_test, X_)).astype('float32')
        y_test1 = np.vstack((y_test1, Y_)).astype('float32')

    # Standardize data
    x_train, x_test, _, _ = dataset.standardize(x_train, x_test)
    y_train, y_test, mean_y_train, std_y_train = dataset.standardize(
        y_train, y_test1)
    #print(x_train.shape)
    #print(y_train.shape)
    #print(mean_y_train)
    #print(std_y_train)
    # Define model parameters
    #std_y_train = np.std(y_train)
    y_train = y_train.reshape(-1)
    y_test = y_test.reshape(-1)
    #print(y_train.shape)
    n_hiddens = [50]
    print(N,n_x)
    # Build the computation graph
    n_particles = tf.placeholder(tf.int32, shape=[], name='n_particles')
    x = tf.placeholder(tf.float32, shape=[None, n_x])
    y = tf.placeholder(tf.float32, shape=[None])
    layer_sizes = [n_x] + n_hiddens + [1]
    print(layer_sizes)
    w_names = ['w' + str(i) for i in range(len(layer_sizes) - 1)]

    def log_joint(observed):
        model, _ = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
        log_pws = model.local_log_prob(w_names)
        log_py_xw = model.local_log_prob('y')
        return tf.add_n(log_pws) + log_py_xw * N

    variational,y_qw = mean_field_variational(x,layer_sizes, n_particles)
    qw_outputs = variational.query(w_names, outputs=True, local_log_prob=True)
    latent = dict(zip(w_names, qw_outputs))
    lower_bound = zs.variational.elbo(
        log_joint, observed={'y': y}, latent=latent, axis=0)
    cost = tf.reduce_mean(lower_bound.sgvb())
    lower_bound = tf.reduce_mean(lower_bound)

    optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
    infer_op = optimizer.minimize(cost)

    # prediction: rmse & log likelihood
    observed = dict((w_name, latent[w_name][0]) for w_name in w_names)
    observed.update({'y': y})
    model, y_mean = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
    #y_output = variational.outputs('y')
    y_pred = tf.reduce_mean(y_mean, 0)
    rmse = tf.sqrt(tf.reduce_mean((y_pred - y) ** 2)) * std_y_train
    log_py_xw = model.local_log_prob('y')
    log_likelihood = tf.reduce_mean(zs.log_mean_exp(log_py_xw, 0)) - \
        tf.log(std_y_train)

    # Define training/evaluation parameters
    lb_samples = 10
    ll_samples = 5000
    epochs = 500
    batch_size = 10
    iters = int(np.floor(x_train.shape[0] / float(batch_size)))
    test_freq = 10



    # Run the inference
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(1, epochs + 1):
            lbs = []
            for t in range(iters):
                x_batch = x_train[t * batch_size:(t + 1) * batch_size]
                #print(x_batch.shape)
                y_batch = y_train[t * batch_size:(t + 1) * batch_size]
                #print(y_batch.shape)
                _, lb,cost_ = sess.run(
                    [infer_op, lower_bound,cost],
                    feed_dict={n_particles: lb_samples,
                               x: x_batch, y: y_batch})
                lbs.append(lb)
            print('Epoch {}: Lower bound = {}:cost={}'.format(epoch, np.mean(lbs),cost_))

            if epoch % test_freq == 0:
                test_lb, test_rmse, test_ll = sess.run(
                    [lower_bound, rmse, log_likelihood],
                    feed_dict={n_particles: ll_samples,
                               x: x_test, y: y_test})
                print('>> TEST')
                print('>> Test lower bound = {}, rmse = {}, log_likelihood = {}'
                      .format(test_lb, test_rmse, test_ll))
        y_pred1 = []
        y_pred_ = sess.run([y_qw],feed_dict={n_particles: ll_samples,x: x_test, y: y_test})
        y_pred_ = np.array(y_pred_[0])
        preds = np.array(y_pred_*std_y_train+mean_y_train)
        mean = np.mean(preds,axis=0)
        print(mean.shape)
        std = np.std(preds,axis = 0)
        print(std.shape)
        #for i in range(100):
            #y_pred_ = sess.run([y_qw],feed_dict={n_particles: ll_samples,x: x_test, y: y_test})
            #y_pred_ = np.array(y_pred_[0]).reshape(-1,1)
            #y_pred1.append(y_pred_)
            #print(np.array(y_pred_).shape)

            
        #print(y_pred1.shape)
        #preds = np.array(y_pred1*std_y_train+mean_y_train)
        #print(preds.shape)
        #mean = np.mean(preds, axis=0).reshape(200)
        #std = np.std(preds, axis=0).reshape(200)
        x = np.arange(200).reshape(200)
        #print(y_test.shape)
        #print(preds.shape)
        #print(mean.shape)
        #print(std)
        #print(x.shape)
        #plt.figure()
        #plt.plot(data[:,0:-1])
        #plt.plot(data[:,-1], linestyle = '--')
        #plt.show()

        plt.figure()
        plt.plot(x, y_test1)
        plt.plot(x, mean, linestyle = '--')
        #plt.plot(x, mean-std, linestyle = '--')
        #plt.plot(x, mean+std, linestyle = '--')
        plt.fill_between(x, mean-std, mean+std, alpha = 0.3, color = 'orange')
        plt.show()


28huasheng commented on June 12, 2024
(quoting the reply above)

I changed the code rather than just reusing it; the original code was:

def mean_field_variational(x, layer_sizes, n_particles):
    with zs.BayesianNet() as variational:
        ws = []
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
            w_mean = tf.get_variable('w_mean_' + str(i), shape=[1, n_out, n_in + 1],
                                     initializer=tf.constant_initializer(0.))
            w_logstd = tf.get_variable('w_logstd_' + str(i), shape=[1, n_out, n_in + 1],
                                       initializer=tf.constant_initializer(0.))
            ws.append(zs.Normal('w' + str(i), w_mean, logstd=w_logstd,
                                n_samples=n_particles, group_ndims=2))
    return variational

without the forward pass in mean_field_variational; I added the forward part myself.


meta-inf commented on June 12, 2024

It seems your torch code implements a deep linear network, while the ZhuSuan model has a ReLU nonlinearity. In that case you should certainly expect different results...
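
For reference, a minimal sketch of the torch Net with the missing nonlinearity added, so that it matches the ReLU MLP used in the ZhuSuan example; matching the priors and the 1/sqrt(n_in) output scaling discussed above is still up to you:

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, 1)          # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # ReLU between the layers, as in the ZhuSuan BNN
        x = self.predict(x)
        return x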

