
Comments (7)

meta-inf commented on June 12, 2024

Quick questions:

  • Which Pyro example are you referring to?
  • What do you mean by "the std of y_mean"?
    • If you are talking about y_logstd (aleatoric uncertainty): in this example (and in most BNN code) it is a hyperparameter that we optimize, i.e. we do not place a prior on it or do posterior inference over it, so it should only appear in the likelihood.
    • If you are talking about Var_{q(w)}[y_mean] (epistemic uncertainty), it is accounted for in q(w) by definition (the sketch below shows how the two combine).
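
To make the two bullets concrete, here is a minimal sketch (mine, not from the example) of how the two uncertainty sources combine once you have Monte Carlo samples of the network output under q(w); the names y_mean_samples (shape [n_particles, n_test]) and the scalar y_logstd are assumptions:

import numpy as np

def predictive_moments(y_mean_samples, y_logstd):
    # epistemic part: spread of the network output across sampled weights w ~ q(w)
    epistemic_var = np.var(y_mean_samples, axis=0)
    # aleatoric part: the learned observation noise, std = exp(y_logstd)
    aleatoric_var = np.exp(2.0 * y_logstd)
    mean = np.mean(y_mean_samples, axis=0)
    std = np.sqrt(epistemic_var + aleatoric_var)
    return mean, std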


28huasheng commented on June 12, 2024

Thank you for responding so fast.
I want to compute p(y*|x*,w,x,y), including the std and the mean of y*. As far as I know, q(w) is normally used to make the likelihood tractable, so I think I should compute y* through q(w)? I just did this with ZhuSuan and it seems to look good.
The code is here:

import tensorflow as tf
import zhusuan as zs

def mean_field_variational(x, layer_sizes, n_particles):
    with zs.BayesianNet() as variational:
        ws = []
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1],
                                              layer_sizes[1:])):
            w_mean = tf.get_variable(
                'w_mean_' + str(i), shape=[1, n_out, n_in + 1],
                initializer=tf.constant_initializer(0.))
            w_logstd = tf.get_variable(
                'w_logstd_' + str(i), shape=[1, n_out, n_in + 1],
                initializer=tf.constant_initializer(0.))
            ws.append(
                zs.Normal('w' + str(i), w_mean, logstd=w_logstd,
                          n_samples=n_particles, group_ndims=2))

        # forward
        ly_x = tf.expand_dims(
            tf.tile(tf.expand_dims(x, 0), [n_particles, 1, 1]), 3)
        for i in range(len(ws)):
            w = tf.tile(ws[i], [1, tf.shape(x)[0], 1, 1])
            ly_x = tf.concat(
                [ly_x, tf.ones([n_particles, tf.shape(x)[0], 1, 1])], 2)
            ly_x = tf.matmul(w, ly_x) / tf.sqrt(tf.to_float(tf.shape(ly_x)[2]))
            if i < len(ws) - 1:
                ly_x = tf.nn.relu(ly_x)
        #print("qw ly_xshape")
        #print(ly_x.shape)
        y_mean = tf.squeeze(ly_x, [2, 3])
    return variational, y_mean

variational, y_qw = mean_field_variational(x, layer_sizes, n_particles)

y_qw is what I want; its shape is [1, 5000, 200], where 5000 is the number of samples. I compute the std and mean from it, and it looks good...

And I tested with Pyro based on its example https://github.com/uber/pyro/blob/dev/examples/bayesian_regression.py, which I adapted into a Bayesian NN.


meta-inf commented on June 12, 2024
  1. Your code seems equivalent to the following (keeping the graph construction code in the example intact, and doing this at test time):
y_mean_val = sess.run(
                    y_mean,
                    feed_dict={n_particles: ll_samples, # insert large value here
                               x: x_test, y: y_test})

which is neater IMO (see the sketch at the end of this comment). I would be surprised if they produce different results.

  2. The Pyro code corresponds to a Bayesian linear regression model, and I'm not sure that, when you adapted it to a BNN, you made the modified model exactly the same as our BNN example. For example, our code uses N(0, 1) for the weights and biases and scales the output by 1/sqrt(n_in), while the Pyro example uses N(0, 2). Which choice is more appropriate depends on the dataset you are dealing with, and you should invest some effort in model specification (*). You should also pay attention to things like optimizer hyperparameters when you change the model.
    If you have controlled all these variables and still get very different results, there should be a bug on one side (which seems unlikely). Let me know if this is the case.

(*) The more principled approach is to build a hierarchical model. But you still need to think about the hierarchical prior, or do some diagnosis afterwards.
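
A minimal sketch of the test path in item 1, assuming the graph from the ZhuSuan BNN example (y_mean, n_particles, x, y, and the standardization statistics mean_y_train / std_y_train all come from that script); the variable names below are otherwise my own:

y_mean_val = sess.run(
    y_mean,
    feed_dict={n_particles: 5000,  # insert a large value here
               x: x_test, y: y_test})
# y_mean_val has shape [n_particles, n_test]; undo the y-standardization
preds = y_mean_val * std_y_train + mean_y_train
pred_mean = preds.mean(axis=0)
# this std covers the epistemic spread only; add the noise term for the full predictive std
pred_std = preds.std(axis=0)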


28huasheng commented on June 12, 2024

Below is the code of the BNN:

# Imports assumed by this snippet (older Pyro API with Variable / iarange / random_module);
# X and Y are the raw data arrays defined elsewhere.
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.autograd import Variable
import pyro
from pyro.distributions import Normal
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

X_train, Y_train = Variable(torch.tensor(X)), Variable(torch.tensor(Y)).reshape(200, 1)
X_test, Y_test = Variable(torch.tensor(X)), Variable(torch.tensor(Y)).reshape(200,1)
print(X_train.shape,Y_train.shape)
data = torch.cat((X_train, Y_train), 1)

def get_batch_indices(N, batch_size):
    all_batches = np.arange(0, N, batch_size)
    if all_batches[-1] != N:
        all_batches = list(all_batches) + [N]
    return all_batches

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, 1)   # output layer

    def forward(self, x):
        x = self.hidden(x)
        x = self.predict(x)
        return x

first_layer = len(X_train.data.numpy()[0])
second_layer = 25   
    
softplus = nn.Softplus()
regression_model = Net(first_layer, second_layer)

def model(data):
    mu = Variable(torch.zeros(second_layer, first_layer)).type_as(data)
    sigma = Variable(torch.ones(second_layer, first_layer)).type_as(data)
    bias_mu = Variable(torch.zeros(second_layer)).type_as(data)
    bias_sigma = Variable(torch.ones(second_layer)).type_as(data)
    w_prior, b_prior = Normal(mu, sigma), Normal(bias_mu, bias_sigma)
    
    mu2 = Variable(torch.zeros(1, second_layer)).type_as(data)
    sigma2 = Variable(torch.ones(1, second_layer)).type_as(data)
    bias_mu2 = Variable(torch.zeros(1)).type_as(data)
    bias_sigma2 = Variable(torch.ones(1)).type_as(data)
    w_prior2, b_prior2 = Normal(mu2, sigma2), Normal(bias_mu2, bias_sigma2)    
    
    priors = {'hidden.weight': w_prior, 
              'hidden.bias': b_prior,
              'predict.weight': w_prior2,
              'predict.bias': b_prior2}
    

    lifted_module = pyro.random_module("module", regression_model, priors)

    lifted_reg_model = lifted_module()

    with pyro.iarange("map", N, subsample=data):
        x_data = data[:, :-1]
        y_data = data[:, -1]
        # run the regressor forward conditioned on inputs
        prediction_mean = lifted_reg_model(x_data).squeeze()
        pyro.sample("obs",
                    Normal(prediction_mean, Variable(torch.ones(data.size(0))).type_as(data)),
                    obs=y_data.squeeze())
        

def guide(data):
    #print(data.data)
    w_mu = Variable(torch.randn(second_layer, first_layer).type(torch.float64), requires_grad=True)
    w_log_sig = Variable(0.1 * torch.ones(second_layer, first_layer).type_as(data.data), requires_grad=True)
    b_mu = Variable(torch.randn(second_layer).type_as(data.data), requires_grad=True)
    b_log_sig = Variable(0.1 * torch.ones(second_layer).type_as(data.data), requires_grad=True)
    
    # register learnable params in the param store
    mw_param = pyro.param("guide_mean_weight", w_mu)
    sw_param = softplus(pyro.param("guide_log_sigma_weight", w_log_sig))
    mb_param = pyro.param("guide_mean_bias", b_mu)
    sb_param = softplus(pyro.param("guide_log_sigma_bias", b_log_sig))
    
    # gaussian guide distributions for w and b
    w_dist = Normal(mw_param, sw_param)
    b_dist = Normal(mb_param, sb_param)
    
    w_mu2 = Variable(torch.randn(1, second_layer).type_as(data.data), requires_grad=True)
    w_log_sig2 = Variable(0.1 * torch.randn(1, second_layer).type_as(data.data), requires_grad=True)
    b_mu2 = Variable(torch.randn(1).type_as(data.data), requires_grad=True)
    b_log_sig2 = Variable(0.1 * torch.ones(1).type_as(data.data), requires_grad=True)
    
    # register learnable params in the param store
    mw_param2 = pyro.param("guide_mean_weight2", w_mu2)
    sw_param2 = softplus(pyro.param("guide_log_sigma_weight2", w_log_sig2))
    mb_param2 = pyro.param("guide_mean_bias2", b_mu2)
    sb_param2 = softplus(pyro.param("guide_log_sigma_bias2", b_log_sig2))
    
    # gaussian guide distributions for w and b
    w_dist2 = Normal(mw_param2, sw_param2)
    b_dist2 = Normal(mb_param2, sb_param2)
      
    dists = {'hidden.weight': w_dist, 
              'hidden.bias': b_dist,
              'predict.weight': w_dist2,
              'predict.bias': b_dist2}
    
    # overloading the parameters in the module with random samples from the guide distributions
    lifted_module = pyro.random_module("module", regression_model, dists)
    # sample a regressor
    return lifted_module()

# instantiate optim and inference objects
optim = Adam({"lr": 0.001})
elbo = Trace_ELBO()
svi = SVI(model, guide, optim, loss=elbo)

N = len(X_train)

for j in range(10000):
    epoch_loss = 0.0
    perm = torch.randperm(N)
    # shuffle data
    data = data[perm]
    # get indices of each batch
    all_batches = get_batch_indices(N, 64)
    for ix, batch_start in enumerate(all_batches[:-1]):
        batch_end = all_batches[ix + 1]
        batch_data = data[batch_start: batch_end]        
        epoch_loss += svi.step(batch_data)
    if j % 100 == 0:
        print(j, "avg loss {}".format(epoch_loss/float(N)))

preds = []
for i in range(10000):
    sampled_reg_model = guide(X_test)
    pred = sampled_reg_model(X_test).data.numpy().flatten()
    preds.append(pred)

preds = np.array(preds)
mean = np.mean(preds, axis=0)
std = np.std(preds, axis=0)/10 
y_test = Y_test.data.numpy()
x = np.arange(len(y_test))



plt.xlim((0, y_test.shape[0]))
sm = np.array([x for x in range(y_test.shape[0])])
plt.plot(sm, y_test[:y_test.shape[0]])
plt.fill_between(x, mean-std, mean+std, alpha = 0.3, color = 'orange')
plt.show()


28huasheng commented on June 12, 2024

And this is the code I am using with ZhuSuan now:

def main():
    #tf.set_random_seed(1237)
    #np.random.seed(1234)

    # Load UCI Boston housing data
    #data_path = os.path.join(conf.data_dir, 'housing.data')
    #x_train, y_train, x_valid, y_valid, x_test, y_test = \
        #dataset.load_uci_boston_housing(data_path)
    #x_train = np.vstack([x_train, x_valid])
    #y_train = np.hstack([y_train, y_valid])
    x_train,y_train = rollout(policy=random_policy, timesteps=STEP)

    for i in range(1,50):
        X_, Y_ = rollout(policy=random_policy, timesteps=STEP)
        x_train = np.vstack((x_train, X_)).astype('float32')
        y_train = np.vstack((y_train, Y_)).astype('float32')
    N, n_x = x_train.shape

    x_test,y_test1 = rollout(policy=random_policy, timesteps=STEP)

    for i in range(1,50):
        X_, Y_ = rollout(policy=random_policy, timesteps=STEP)
        x_test = np.vstack((x_test, X_)).astype('float32')
        y_test1 = np.vstack((y_test1, Y_)).astype('float32')

    # Standardize data
    x_train, x_test, _, _ = dataset.standardize(x_train, x_test)
    y_train, y_test, mean_y_train, std_y_train = dataset.standardize(
        y_train, y_test1)
    #print(x_train.shape)
    #print(y_train.shape)
    #print(mean_y_train)
    #print(std_y_train)
    # Define model parameters
    #std_y_train = np.std(y_train)
    y_train = y_train.reshape(-1)
    y_test = y_test.reshape(-1)
    #print(y_train.shape)
    n_hiddens = [50]
    print(N,n_x)
    # Build the computation graph
    n_particles = tf.placeholder(tf.int32, shape=[], name='n_particles')
    x = tf.placeholder(tf.float32, shape=[None, n_x])
    y = tf.placeholder(tf.float32, shape=[None])
    layer_sizes = [n_x] + n_hiddens + [1]
    print(layer_sizes)
    w_names = ['w' + str(i) for i in range(len(layer_sizes) - 1)]

    def log_joint(observed):
        model, _ = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
        log_pws = model.local_log_prob(w_names)
        log_py_xw = model.local_log_prob('y')
        return tf.add_n(log_pws) + log_py_xw * N

    variational,y_qw = mean_field_variational(x,layer_sizes, n_particles)
    qw_outputs = variational.query(w_names, outputs=True, local_log_prob=True)
    latent = dict(zip(w_names, qw_outputs))
    lower_bound = zs.variational.elbo(
        log_joint, observed={'y': y}, latent=latent, axis=0)
    cost = tf.reduce_mean(lower_bound.sgvb())
    lower_bound = tf.reduce_mean(lower_bound)

    optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
    infer_op = optimizer.minimize(cost)

    # prediction: rmse & log likelihood
    observed = dict((w_name, latent[w_name][0]) for w_name in w_names)
    observed.update({'y': y})
    model, y_mean = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
    #y_output = variational.outputs('y')
    y_pred = tf.reduce_mean(y_mean, 0)
    rmse = tf.sqrt(tf.reduce_mean((y_pred - y) ** 2)) * std_y_train
    log_py_xw = model.local_log_prob('y')
    log_likelihood = tf.reduce_mean(zs.log_mean_exp(log_py_xw, 0)) - \
        tf.log(std_y_train)

    # Define training/evaluation parameters
    lb_samples = 10
    ll_samples = 5000
    epochs = 500
    batch_size = 10
    iters = int(np.floor(x_train.shape[0] / float(batch_size)))
    test_freq = 10



    # Run the inference
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(1, epochs + 1):
            lbs = []
            for t in range(iters):
                x_batch = x_train[t * batch_size:(t + 1) * batch_size]
                #print(x_batch.shape)
                y_batch = y_train[t * batch_size:(t + 1) * batch_size]
                #print(y_batch.shape)
                _, lb,cost_ = sess.run(
                    [infer_op, lower_bound,cost],
                    feed_dict={n_particles: lb_samples,
                               x: x_batch, y: y_batch})
                lbs.append(lb)
            print('Epoch {}: Lower bound = {}:cost={}'.format(epoch, np.mean(lbs),cost_))

            if epoch % test_freq == 0:
                test_lb, test_rmse, test_ll = sess.run(
                    [lower_bound, rmse, log_likelihood],
                    feed_dict={n_particles: ll_samples,
                               x: x_test, y: y_test})
                print('>> TEST')
                print('>> Test lower bound = {}, rmse = {}, log_likelihood = {}'
                      .format(test_lb, test_rmse, test_ll))
        y_pred1 = []
        y_pred_ = sess.run([y_qw],feed_dict={n_particles: ll_samples,x: x_test, y: y_test})
        y_pred_ = np.array(y_pred_[0])
        preds = np.array(y_pred_*std_y_train+mean_y_train)
        mean = np.mean(preds,axis=0)
        print(mean.shape)
        std = np.std(preds,axis = 0)
        print(std.shape)
        #for i in range(100):
            #y_pred_ = sess.run([y_qw],feed_dict={n_particles: ll_samples,x: x_test, y: y_test})
            #y_pred_ = np.array(y_pred_[0]).reshape(-1,1)
            #y_pred1.append(y_pred_)
            #print(np.array(y_pred_).shape)

            
        #print(y_pred1.shape)
        #preds = np.array(y_pred1*std_y_train+mean_y_train)
        #print(preds.shape)
        #mean = np.mean(preds, axis=0).reshape(200)
        #std = np.std(preds, axis=0).reshape(200)
        x = np.arange(200).reshape(200)
        #print(y_test.shape)
        #print(preds.shape)
        #print(mean.shape)
        #print(std)
        #print(x.shape)
        #plt.figure()
        #plt.plot(data[:,0:-1])
        #plt.plot(data[:,-1], linestyle = '--')
        #plt.show()

        plt.figure()
        plt.plot(x, y_test1)
        plt.plot(x, mean, linestyle = '--')
        #plt.plot(x, mean-std, linestyle = '--')
        #plt.plot(x, mean+std, linestyle = '--')
        plt.fill_between(x, mean-std, mean+std, alpha = 0.3, color = 'orange')
        plt.show()


28huasheng commented on June 12, 2024
(quoting the reply above)

I changed the code rather than just reusing it; the original code was:

def mean_field_variational(x, layer_sizes, n_particles):
    with zs.BayesianNet() as variational:
        ws = []
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
            w_mean = tf.get_variable('w_mean_' + str(i), shape=[1, n_out, n_in + 1],
                                     initializer=tf.constant_initializer(0.))
            w_logstd = tf.get_variable('w_logstd_' + str(i), shape=[1, n_out, n_in + 1],
                                       initializer=tf.constant_initializer(0.))
            ws.append(zs.Normal('w' + str(i), w_mean, logstd=w_logstd,
                                n_samples=n_particles, group_ndims=2))
    return variational

without the forward pass in mean_field_variational; I added the forward part myself.


meta-inf commented on June 12, 2024

It seems your torch code implements a deep linear network, while the ZhuSuan model has a ReLU nonlinearity. In that case you should certainly expect different results...
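
For reference, a minimal sketch of the torch Net with the missing nonlinearity added, so that it matches the ReLU MLP used in the ZhuSuan example; matching the priors and the 1/sqrt(n_in) output scaling discussed above is still up to you:

import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, 1)          # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # ReLU between the layers, as in the ZhuSuan BNN
        x = self.predict(x)
        return x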

