Adding RGB Data? (pointnet, 10 comments, closed)

charlesq34 commented on May 22, 2024
Adding RGB Data?

from pointnet.

Comments (10)

Logrus commented on May 22, 2024

You are correct. Here's how I did it:

# TensorFlow 1.x code. Imports are assumed to match the repo's model files
# (tf_util from utils/, the T-Nets from models/transform_nets.py).
import tensorflow as tf
import tf_util
from transform_nets import input_transform_net, feature_transform_net

def placeholder_inputs_rgb(batch_size, num_point):
    # Geometry (XYZ) and the extra per-point channels are fed through separate placeholders.
    pointclouds_pl = tf.placeholder(tf.float32, shape=(batch_size, num_point, 3))
    pointclouds_rgb_pl = tf.placeholder(tf.float32, shape=(batch_size, num_point, 6))
    labels_pl = tf.placeholder(tf.int32, shape=(batch_size,num_point))
    return pointclouds_pl, pointclouds_rgb_pl, labels_pl

def get_model(point_cloud, point_cloud_rgb, is_training, bn_decay=None):
    """ Segmentation PointNet, input is BxNx3 and BxNx6, output BxNx13 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    # Input transform: the 3x3 T-Net sees and transforms XYZ only.
    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)
    point_cloud_transformed = tf.matmul(point_cloud, transform)
    input_image = tf.expand_dims(point_cloud_transformed, -1)  # BxNx3x1

    # The [1,3] kernel spans all three XYZ columns, giving BxNx1x64.
    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)

    # Feature transform: 64x64 T-Net applied to the per-point features.
    with tf.variable_scope('transform_net2') as sc:
        transform = feature_transform_net(net, is_training, bn_decay, K=64)
    end_points['transform'] = transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=2), transform)
    point_feat = tf.expand_dims(net_transformed, [2])  # BxNx1x64

    # Append the untransformed extra channels: BxNx1x64 + BxNx1x6 -> BxNx1x70.
    point_cloud_rgb = tf.expand_dims(point_cloud_rgb, [2])
    concat_rgb = tf.concat(axis=3, values=[point_feat, point_cloud_rgb])

    net = tf_util.conv2d(concat_rgb, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)
    # The snippet stops here; the rest of the network follows the usual architecture.

charlesq34 commented on May 22, 2024

@vaidarnav that's surely a reasonable way to do it! You can experiment with different variations, though, to see the differences and gain more insight.

@Logrus Thanks for sharing your code :) looks good to me

charlesq34 commented on May 22, 2024

Hi @soulslicer

RGB or normals can be included in the network. The most naive way is to add more channels to the input points: currently each point has 3 channels (XYZ), and you can make it 9 channels (XYZ, RGB and normal).

From our experience the network is quite robust to inconsistent sampling density :)
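A minimal NumPy sketch of the 9-channel input described above, assuming XYZ, RGB and normals are already available as three (N, 3) arrays. The function name `stack_point_channels` and the scaling of 8-bit colors by 255 are illustrative assumptions, not something taken from the pointnet repo.

```python
import numpy as np

def stack_point_channels(xyz, rgb, normals):
    """Concatenate per-point XYZ, RGB and normal vectors into one (N, 9) array."""
    xyz = np.asarray(xyz, dtype=np.float32)
    rgb = np.asarray(rgb, dtype=np.float32)
    normals = np.asarray(normals, dtype=np.float32)
    if not (xyz.ndim == 2 and xyz.shape[1] == 3
            and xyz.shape == rgb.shape == normals.shape):
        raise ValueError("expected three arrays of shape (N, 3)")
    # Assumption: colors are 8-bit, so divide by 255 to bring them into [0, 1].
    return np.concatenate([xyz, rgb / 255.0, normals], axis=1)

# Synthetic data, only to show the shapes involved.
rng = np.random.default_rng(0)
num_point = 1024
xyz = rng.uniform(-1.0, 1.0, size=(num_point, 3))
rgb = rng.integers(0, 256, size=(num_point, 3))
normals = rng.normal(size=(num_point, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)  # unit-length normals

points = stack_point_channels(xyz, rgb, normals)
print(points.shape)  # (1024, 9)
```

Each row is then one point with columns ordered as XYZ, RGB, normal. Any other consistent column order would work as long as training and inference agree.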

vaidarnav commented on May 22, 2024

I'm also interested in adding RGB and normals to the network. Right now the n-point input has dimension (n X 3 X 1). Would we need to change it to (n X 9 X 1)? Or would we add RGB and normal data as separate channels and make it (n X 3 X 3)?

Logrus commented on May 22, 2024

Hi @vaidarnav, you should add all the channels you want as you described, like (n X 9 X 1). You also need to change the width of the first convolution kernel to match the new number of channels.
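To illustrate why the first kernel has to change, here is a hedged NumPy sketch, not code from the repo. It treats a [1, C] 'VALID' convolution over an n X C X 1 input as a per-point matrix product. The helper names `valid_output_width` and `first_layer` are made up for this example, and batch norm is left out.

```python
import numpy as np

def valid_output_width(input_width, kernel_width, stride=1):
    """Output width of a 'VALID' (no padding) convolution along one axis."""
    return (input_width - kernel_width) // stride + 1

def first_layer(points, kernel, bias):
    """Apply a [1, C] 'VALID' convolution to a (B, N, C) batch as a per-point matmul.

    kernel uses the TensorFlow layout [height, width, in_channels, out_channels],
    here (1, C, 1, F).
    """
    _, c, _, f = kernel.shape
    if points.shape[-1] != c:
        raise ValueError("kernel width must equal the number of input channels")
    out = points @ kernel.reshape(c, f) + bias    # (B, N, F)
    return np.maximum(out, 0.0)[:, :, None, :]    # ReLU, shape (B, N, 1, F)

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 1024, 9))             # B x N x 9 (XYZ, RGB, normal)
kernel = rng.normal(size=(1, 9, 1, 64))           # [1, 9] kernel, 64 output features
out = first_layer(batch, kernel, np.zeros(64))

print(valid_output_width(9, 3))  # 7: a [1,3] kernel leaves 7 columns per point
print(valid_output_width(9, 9))  # 1: a [1,9] kernel collapses each point to one column
print(out.shape)                 # (4, 1024, 1, 64)
```

With a 9-column input, keeping the [1,3] kernel would slide across the channel axis and leave 7 columns per point, whereas a [1,9] kernel maps each point to a single 64-dimensional feature vector.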

vaidarnav commented on May 22, 2024

Thanks @Logrus .
Another question I had was about the affine transformation step involving the T-net. As far as I understand, the input of this network should only be the xyz data (of dim n X 3), and the output (dim 3 X 3) should only be multiplied with the xyz data, not affecting the rgb or normal values. Is this correct?

Logrus commented on May 22, 2024

@vaidarnav, that is correct. You should have a couple of placeholders (one nX3 for geometry and another nX6 for rgb+normals). Then you apply the two transformer networks to the geometry, concatenate the resulting nX64 features with your nX6 rgb+normals placeholder, and get nX70. After that, the usual architecture follows.

Unfortunately in this repo there's no example of this, but it isn't hard to do.

By the way, for the dataset from the paper (building parsing), spatial transformer nets don't help much, but I saw a +6% mIoU increase on another dataset where rotations were very important. So keep that in mind.
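The two-placeholder layout described in this comment can be sketched in NumPy to check the shapes. This is only an illustration under stated assumptions: random matrices stand in for the outputs of the two T-Nets and for the shared per-point MLP, and the function name `fuse_geometry_and_extras` is hypothetical.

```python
import numpy as np

def fuse_geometry_and_extras(xyz, extras, input_transform, mlp_weights, feature_transform):
    """Transform XYZ only, then append the untouched extra channels.

    xyz: (B, N, 3), extras: (B, N, 6), input_transform: (B, 3, 3),
    mlp_weights: (3, 64), feature_transform: (B, 64, 64).
    """
    xyz_aligned = xyz @ input_transform                         # (B, N, 3)
    # Stand-in for the shared per-point layers (a single linear map plus ReLU).
    features = np.maximum(xyz_aligned @ mlp_weights, 0.0)       # (B, N, 64)
    features_aligned = features @ feature_transform             # (B, N, 64)
    return np.concatenate([features_aligned, extras], axis=-1)  # (B, N, 70)

rng = np.random.default_rng(0)
B, N = 2, 1024
xyz = rng.normal(size=(B, N, 3))
extras = rng.uniform(size=(B, N, 6))              # rgb + normals
# Random stand-ins for what the T-Nets would predict and the MLP would learn.
input_transform = rng.normal(size=(B, 3, 3))
mlp_weights = rng.normal(size=(3, 64))
feature_transform = rng.normal(size=(B, 64, 64))

fused = fuse_geometry_and_extras(xyz, extras, input_transform,
                                 mlp_weights, feature_transform)
print(fused.shape)                               # (2, 1024, 70)
print(np.array_equal(fused[..., 64:], extras))   # True: extras pass through unchanged
```

The last six columns of the fused tensor are exactly the rgb+normals input, which is the point of the design: neither transform ever multiplies the color or normal values.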

vaidarnav commented on May 22, 2024

So, to be completely clear, you should feed XYZ from "input transform" all the way to "feature transform", and then append RGB/norm data to the output of "feature transform"?

Shanfeng-Hu commented on May 22, 2024

Hello @charlesq34, very nice work! I'm wondering whether the architecture (global pooling + local transformation + dimension expansion) is able to learn the geodesic distance between any pair of points in a point cloud, i.e., an isometric embedding in a parametric manner. You showed a demo of normal vector prediction in the paper, but the success seems almost guaranteed to me since the normal is a rather local quantity, while the geodesic distance is far more global. Anyway, I'm not sure about this, and I would appreciate your comments very much!

charlesq34 commented on May 22, 2024

@Shanfeng-Hu

Honestly I'm not sure how well the network can perform in learning geodesic distance. It's a very interesting direction though -- I can imagine some siamese network with geodesic loss will be a good starting point for the project :)
