charlesq34 / pointnet

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

License: Other

Python 99.68% Shell 0.32%
classification geometry-processing neural-network point-cloud segmentation tensorflow

pointnet's People

Contributors

barryridge, charlesq34, daerduocarey, tangchengcheng


pointnet's Issues

Windows compatibility

Hello,

I tried to use PointNet on Windows, but it failed at the following two lines:

os.system('cp %s %s' % (MODEL_FILE, LOG_DIR)) # bkp of model def
os.system('cp train.py %s' % (LOG_DIR)) # bkp of train procedure

So I changed the os.system calls to shutil's copy2 function:

import shutil
shutil.copy2(MODEL_FILE, LOG_DIR)
shutil.copy2('train.py', LOG_DIR)

After that, it works nicely on Windows.

BTW, great work!

What would be the optimised nn layer num_output_channels for 8192 points?

Hi Charles,
I'm training PointNet with 8192 points per cloud. Should the network tensors be resized to match? Currently the per-layer output channels are 64 (with a (1,3) kernel), 64 (1,1), 128 (1,1), and 1024, and I guess these are tuned for a maximum of 2048 points.
What would be the optimal num_output_channels for 8192 points?

Detection pipeline for scene segmentation

Hi Charles,
How exactly do you build the graph for the BFS that finds connected components in the detection pipeline? I could not find the detection pipeline code in the repo. Are you planning to add it in the near future? Thanks.

how to download data?

The instructions say that point clouds of ModelNet40 models in HDF5 files will be automatically downloaded (416MB) to the data folder, but that didn't happen for me.
I downloaded the ModelNet40 data from the link, but it was in .off format. I'm sorry, I'm new to Python and TensorFlow.
Environment: Windows 7, Python 3.5.3, TensorFlow 1.1.0

Cannot run train.py in part_seg

Hi,

I'm currently running the PointNet code, but I cannot run train.py in part_seg as described in the README. There also seem to be bugs in the code; I fixed some myself. Is this the final code? It looks incomplete, or perhaps the wrong version.

Thanks,
Yiru

Sem Seg: Reason for reducing xyz values by xyz_min in `indoor3d_util` for data preprocessing

Why do we subtract xyz_min from the xyz values during data pre-processing for semantic segmentation? The following link points to the lines I am referring to:

https://github.com/charlesq34/pointnet/blob/master/sem_seg/indoor3d_util.py#L52-L62

There is a note in the method which states:

the points are shifted before save, the most negative point is now at origin.

What does this mean, and why do we need to do it? Do I need to do the same when preparing my own data?
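For reference, this appears to be a minimal numpy equivalent of the shift those lines perform (variable and file names here are illustrative, not from the repo):

import numpy as np

# points: (N, 6) array of x, y, z, r, g, b for one room (hypothetical input file)
points = np.loadtxt('room.txt')

xyz_min = np.amin(points, axis=0)[0:3]  # most negative corner of the room
points[:, 0:3] -= xyz_min               # shift so that corner sits at the origin

After the shift every coordinate is non-negative, which makes the later block partitioning simpler.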

Confusion with pointnet_cls.py

In pointnet_cls.py, why have you expanded the dimensions of the transformed input point cloud?

def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)
    point_cloud_transformed = tf.matmul(point_cloud, transform)
    input_image = tf.expand_dims(point_cloud_transformed, -1)

    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
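For context, a short note on the shapes involved (my own explanation, not from the thread): the expand_dims only adds a trailing channel axis so the cloud can be fed to conv2d as a one-channel image.

# point_cloud_transformed has shape (B, N, 3)
input_image = tf.expand_dims(point_cloud_transformed, -1)  # (B, N, 3, 1)
# conv1 with kernel [1,3] and VALID padding then collapses the three
# coordinates of each point into 64 channels: output shape (B, N, 1, 64)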

Input Ranges

Hi there,

I have a question regarding the ranges in the input. According to the paper, the proof that any function f can be approximated by the pointnet network uses the assumption that all the inputs are in the range [0,1]. I want to ask for some clarification regarding this:

  1. For the object classification task, where you uniformly sample 1024 points and "normalize into a unit sphere": does this mean that the sphere is centered at (0.5, 0.5, 0.5) with a diameter of 1, in order to maintain the [0,1] input range?

  2. For the scene segmentation application, with the 9-dim vector: are X, Y, Z also in the range [0,1]? According to this (#7) post, X, Y, Z are shifted so that the most negative point is at the origin, meaning the coordinates are not confined to [0,1]?

  3. Does this input range mean that the data is not zero-centered?

Thank you!

Adding RGB Data?

Are there any plans to incorporate RGB information into the network? Have you tried feeding normal data into the network, rather than having it predict normals?

Also, does your algorithm work with inconsistent sampling? For example, suppose I have a box point cloud, but the areas with lower curvature (flat parts) are downsampled much more aggressively than those with higher curvature (intersection points) in order to conserve space/memory.

pointnet/part_seg/

Hi, I have trained the segmentation network with the default batch size of 32, 2048 points, and 50 epochs. However, the test IoU is 83.00, lower than the 83.7 reported in the paper. Is there a way to improve it to match 83.7, or a reason for the lower IoU?
Thanks

Semantic segmentation : How to generate my own room_filelist.txt

Hi, I want to regenerate the room_filelist.txt that is produced by gen_indoor3d_h5.py. I already ran gen_indoor3d_h5.py, but room_filelist.txt has since been deleted, and I don't want to re-run the script because my data is large and it takes about 4 hours. Is there a way to get room_filelist.txt without running gen_indoor3d_h5.py again?
Note: I use my own dataset not the Stanford3dDataset

Accuracy not consistent?

After training, I got 88.65% accuracy and 85.62% average class accuracy. Why is this not consistent with the 89.2% and 86.2% reported in the paper?

object segmentation

Hello Charles,

I am trying to use raw point cloud data (PCD file format). My goal is to segment random objects from a tabletop scene.
How can I process the dataset and visualize the segmented point cloud?
Thank you in advance!

Trained models

It would be quite useful if the trained models were also provided. I was trying to train for part segmentation and quickly ran out of memory.

Network architecture inconsistent with paper

hello, in your paper, supplementary C it says:

As to semantic segmentation task, we used the architecture as in Fig 2 in the main paper.

but in the get_model function of pointnet/sem_seg/model.py I found some inconsistencies: there is no T-Net implemented, and the concatenated layer comes from two other layers rather than the layers marked by the dotted arrows in Fig 2. Is this a mistake, or did I misunderstand something?

By the way, I want to ask how you came up with the idea of the T-Net. Is there a reason behind it? Is the T-Net related to some rigid transformation of the point cloud? It seems like magic to me.

low accuracy when batch_size = 1 or 2

Hi Charles,
Thank you for this great work.
I found something very strange: in the classification task (train.py), the training and testing accuracy are very low (around 0.1) when I set the batch size to 1 or 2. Do you have any idea why this happens?

I want to use batch size 1 because I want to process point clouds with varying number of points.

Best,

Can PointNet deal with varying densities?

In my application, the measured points are not equally spaced: some regions are empty while others have rather high density. Is PointNet able to deal with this?
Here is an example image for which I would like to do semantic segmentation:
[image]

about training date generation pipeline

Hello, thanks for your excellent work. I read your code but could not find a pipeline demo for converting the ModelNet40 files (.off) into HDF5 files, although the helper functions in utils are provided. Could you please enrich your documentation or code with such a demo? I would appreciate a prompt reply.
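In the meantime, here is a minimal sketch of what such a pipeline could look like (the area-weighted sampling scheme and all file names are my assumptions, not the authors' exact procedure):

import numpy as np
import h5py

def read_off(path):
    """Read vertices and triangular faces from an OFF file."""
    with open(path) as f:
        if f.readline().strip() != 'OFF':
            raise ValueError('not a plain OFF header')
        n_verts, n_faces, _ = map(int, f.readline().split())
        verts = np.array([list(map(float, f.readline().split())) for _ in range(n_verts)])
        faces = np.array([list(map(int, f.readline().split()))[1:4] for _ in range(n_faces)])
    return verts, faces

def sample_points(verts, faces, n=2048):
    """Sample n points on the mesh surface, uniformly by triangle area."""
    tris = verts[faces]                                           # (F, 3, 3)
    cross = np.cross(tris[:, 1] - tris[:, 0], tris[:, 2] - tris[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    idx = np.random.choice(len(faces), n, p=areas / areas.sum())
    u, v = np.random.rand(n, 1), np.random.rand(n, 1)
    flip = (u + v) > 1                                            # fold points back into the triangle
    u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
    t = tris[idx]
    return t[:, 0] + u * (t[:, 1] - t[:, 0]) + v * (t[:, 2] - t[:, 0])

# hypothetical file list and labels; the provided HDF5 files use the keys 'data' and 'label'
files, labels = ['airplane_0001.off'], np.array([0], dtype='uint8')
clouds = np.stack([sample_points(*read_off(p)) for p in files]).astype('float32')
with h5py.File('my_modelnet40.h5', 'w') as f:
    f.create_dataset('data', data=clouds)    # (B, 2048, 3)
    f.create_dataset('label', data=labels)   # (B,)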

The fluctuation of the performance during training

Hi,
Thanks for sharing your amazing work! I have a quick question. I have trained several trials of PointNet on the ModelNet40 dataset with all the default parameters in the code. I see quite large performance fluctuations during training, so it's a bit hard to pinpoint when the model converges. Is there any way to alleviate this fluctuation? Thanks

Sincerely,
Maggie

Could this code be modified for a flexible number of points?

Hi Charles, thanks for sharing such a nice implementation. I have a question: could this code be modified so that it can take point clouds of flexible size? For example, declaring the point cloud placeholder as:
pointclouds_pl = tf.placeholder(tf.float32, shape=(batch_size, None, 3))

However, this causes an error during graph construction:

num_point = pointclouds_pl.get_shape()[1].value
net = tf_util.max_pool2d(net, [num_point,1], padding='VALID', scope='maxpool')

TypeError: Expected int for argument 'ksize' not None.

==== Added Sep. 3 ====
This error comes from tf.nn.max_pool, which requires that the kernel size not be None. A direct solution could be to create a new operation layer in place of tf.nn.max_pool.
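One possible sketch of such a replacement (my own suggestion, not code from the repo): tf.reduce_max over the point axis computes the same symmetric max function without needing a static kernel size.

# net has shape (batch_size, num_point, 1, 1024) with num_point unknown (None)
global_feat = tf.reduce_max(net, axis=1, keepdims=True)  # (batch_size, 1, 1, 1024)
# (older TF 1.x versions spell the argument keep_dims=True)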

Questions about ModelNet40 point clouds data

Excellent work! Thank you very much.

I have two questions about the point cloud data you provide:

(1) How did you generate fixed-size (1024 = 2^10) point clouds from the mesh files?
I tried ray casting, but it does not produce a number of points equal to a power of 2.

(2) For each point cloud, I notice the first half (0-1023) and the second half (1024-2047) are identical.
Is it still meaningful to train PointNet with num_point = 2048?

Evaluating part segmentation results

Hi, excellent work. After reading your evaluation code for part segmentation, I find that:
(1) The evaluation part (these lines) uses cur_gt_label (the ground truth) to compute the iou_oids. Why don't you use the predicted labels to evaluate the IoU, or simply compute the mask by argmax over seg_pred_res and compare it to the ground-truth segmentation mask?
(2) I also find that you don't use the .h5 test files, but instead the folders containing the test point cloud data. From my observation, the .h5 files contain 2048 points per test example, whereas each item in the test folders has a variable number of points. Which one is the correct input for evaluating the model? Could you give some tips?
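For reference, an IoU over predicted labels could look like this minimal numpy sketch (here iou_oids is assumed to be the list of part ids valid for the object's category, following the thread's naming):

import numpy as np

pred = np.argmax(seg_pred_res, axis=-1)          # predicted part id per point
ious = []
for oid in iou_oids:
    inter = np.sum((pred == oid) & (cur_gt_label == oid))
    union = np.sum((pred == oid) | (cur_gt_label == oid))
    ious.append(inter / float(union) if union > 0 else 1.0)  # absent part counts as perfect
mean_iou = np.mean(ious)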

Confusion about pointcloud data generation

Hi Charles,

Thank you for your state-of-the-art work and code. It's a genuinely new way to use raw 3D data. I'm interested in 3D object classification on ModelNet40. Here is my question.

The number of points is fixed at 2048 in the hdf5 files. But in ModelNet40 itself, the models' vertex counts are always more or less than 2048. It's easy to sample from models with more than 2048 vertices, but how do you deal with models with fewer than 2048, or even fewer than 1024? (for example wardrobe_000000112.off, which has only 46 vertices)

I have tried using MeshLab to mesh the models and then uniformly sample 2048 points from each mesh. But when I train and test with these data, the test accuracy is only about 86.0%, far below the roughly 89.1% obtained with the provided hdf5 files. Could you please share the method used to generate the point cloud data?

PS: In the last paragraph of Appendix C: since the Adam optimizer is used, perhaps the momentum parameter is not needed?

I would appreciate a prompt reply.

Many thanks,
Jerry Li

sem_seg learning rate clip typo

Dear Charles,

In sem_seg/train.py there is probably a typo that eliminates the learning rate clipping you emphasize so strongly.

Shouldn't learing_rate be learning_rate?

def get_learning_rate(batch):
    learning_rate = tf.train.exponential_decay(
                        BASE_LEARNING_RATE,  # Base learning rate.
                        batch * BATCH_SIZE,  # Current index into the dataset.
                        DECAY_STEP,          # Decay step.
                        DECAY_RATE,          # Decay rate.
                        staircase=True)
    learing_rate = tf.maximum(learning_rate, 0.00001) # CLIP THE LEARNING RATE!!
    return learning_rate

sem_seg testing issue

Hello Charles,
After training, I followed the testing instructions in sem_seg's README.md, but as you can see below, there is an error.
[screenshot of the error]

pointnet++

I'm interested in your follow-up work - the improved Pointnet (Pointnet++).

To me it's not totally clear how exactly you apply the (vanilla) PointNet after the data grouping stage: namely, what is the value of the parameter C' (the length of the feature vector for every group) in your paper?

For example, for the notation SA(512, 0.2, [64, 64, 128]): how exactly does the explanation of the PointNet layer in Section 3.2 apply here? What are the exact sizes of the input to the PointNet (after the sampling and grouping stage) and of the output?

Thanks!

model retrieval

I would like more information about the model retrieval part of your article. How can I extract the data for nearest neighbor search? Do you have that part of the code?

some questions about code

Dear Mr./Ms.,
I am a graduate student from XDU, and my project is about deep learning. I am writing to ask you about your paper PointNet. Here is my question:
I have already reproduced the training and testing parts for 3D object classification; thank you for your code. Now I want to understand what the network has learnt, so could you tell me how to achieve the visualization, or what I should do?
I would also appreciate the code for segmentation and the other experiments.
Looking forward to hearing from you, and thank you in advance for your help.
Best regards,
Zhang GH.

Example of the 9-dim vector version of Semantic Segmentation

Hi Charles,

In the paper you mention that each point in the semantic segmentation task is represented by a 9-dim vector. I couldn't find the corresponding code/network that expects such input. How do you deal with the other 6 elements of the point representation? Does the first set of filters become [1,9]? Do you process the RGB data separately? Can you upload an example with the Stanford 3D data?

Best,
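For what it's worth, my reading of the data preparation in sem_seg/indoor3d_util.py suggests the 9 dimensions are assembled roughly like this (a sketch of my interpretation, not an official answer; the names are illustrative):

import numpy as np

# x, y, z, rgb: per-point columns of one sampled block;
# block_min_*, block_size, max_room_*: extents of the block and of the whole room
feat = np.zeros((num_point, 9))
feat[:, 0] = x - (block_min_x + block_size / 2)   # x centred on the block
feat[:, 1] = y - (block_min_y + block_size / 2)   # y centred on the block
feat[:, 2] = z                                    # raw height
feat[:, 3:6] = rgb / 255.0                        # colours scaled to [0, 1]
feat[:, 6] = x / max_room_x                       # normalized location in the room
feat[:, 7] = y / max_room_y
feat[:, 8] = z / max_room_z

The sem_seg model then consumes all 9 channels at once; its first convolution does appear to use a [1,9] kernel.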

Training on my own dataset

Hello, I tried to train the model on my own dataset, which was generated by lidar, but the clouds have different numbers of points depending on distance. Is there any suggested algorithm to increase or decrease (normalize) the number of points?
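One simple, commonly used approach (my suggestion, not the authors'): randomly subsample clouds that are too large, and sample with replacement to pad clouds that are too small.

import numpy as np

def resample(points, n=1024):
    """Return exactly n points: subsample when there are enough,
    otherwise duplicate points by sampling with replacement."""
    replace = points.shape[0] < n
    idx = np.random.choice(points.shape[0], n, replace=replace)
    return points[idx]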

about segmentation network

Hi! Thanks for sharing your nice work. I wonder why the segmentation network needs to concatenate the one-hot label with the pooled feature. Would removing the encoded label decrease performance? Thanks. @charlesq34

error if batch size =1

Hi..

I'm getting the following error while trying to change the batch size to 1:

python evaluate.py --visu --batch_size 1

.....
......
ValueError: Shape must be rank 2 but is rank 3 for 'MatMul_1' (op: 'MatMul') with input shapes: [1024,64], [1,64,64].

My final goal is to have the network predict from data coming from a depth sensor (e.g. Kinect).

I'm trying to call it with only one point cloud and see what it predicts; any idea how to achieve that?

Modifying the code and changing the batch size to None, so that it depends on the actual input, results in a different error:

TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [None, -1]. Consider casting elements to a supported type.
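A workaround I have seen for this kind of setup (an assumption, not verified against this repo): keep the batch size the graph was built with and tile the single cloud to fill the batch, then read off the first prediction. Names like pred_op, pointclouds_pl, and is_training_pl are placeholders for the ops in evaluate.py.

import numpy as np

BATCH_SIZE = 4                                 # whatever the graph was built with
batch = np.tile(my_cloud[np.newaxis, ...], (BATCH_SIZE, 1, 1))   # (BATCH_SIZE, N, 3)
pred = sess.run(pred_op, feed_dict={pointclouds_pl: batch,
                                    is_training_pl: False})
prediction = pred[0]                           # all rows are identical; keep the first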

IOU computation

Hi Charles, I see that you use the following line of code:

iou = true_positive_classes[i]/float(gt_classes[i]+positive_classes[i]-true_positive_classes[i])

I'm a bit confused about the difference between these variables. Others use this or this definition, both of which seem equivalent to each other, but not to yours.

[screenshots: two alternative IoU definitions]
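For what it's worth, the definitions appear to coincide. For class i, gt_classes[i] counts the ground-truth points of the class (TP + FN), positive_classes[i] counts the predicted points of the class (TP + FP), and true_positive_classes[i] is TP, so the denominator is

gt + positive - tp = (TP + FN) + (TP + FP) - TP = TP + FP + FN,

which gives the standard IoU = TP / (TP + FP + FN).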

Can the pointnet be used for the segmentation of outdoor scenes?

Hi, Mr. Qi:
I am trying to run PointNet for point cloud segmentation. Your repository provides indoor scene segmentation; I trained on the sample data, and while the process takes a lot of time, the final segmentation result is acceptable. Now I would like to use it for outdoor scene segmentation.
My questions are: how should I prepare the point cloud data for training? How can I get outdoor point clouds with labels? Should I convert the original format to HDF5?

Thanks a lot!

Semantic segmentation experiments

Hi,

I am currently trying to do experiments on semantic segmentation similar to what is described in the paper.

I took the Stanford 3D dataset (building parser) and cut it into 1 m^3 cubes (example video of assembling the pieces back into the scene). After that I sample 4096 points in each cube.
So now I have a (16651 x 4096 x 6) array, where 6 is the vector (x, y, z, r, g, b).

However, I find the next step of creating 9-dim vectors for training confusing, and would appreciate it if you could clarify what exactly is in the 9-dim vector you used.
From the paper I understand it the following way:

  1. Original x, y, z in global space
  2. Normalized x', y', z' in [0,1] (normalized from the x, y, z values above, or should it be relative to the whole room/scene?)
  3. RGB values (they range from 0 to 255 and should be normalized to [0,1], right?)

At the same time, mixing the original x, y, z (potentially huge numbers) with values normalized between 0 and 1 doesn't seem right, because the magnitude of the inputs is likely to affect training. Do you normalize them to something like [0,1] or [-1,1]?

I am basically curious about how important the RGB information is for PointNet. It obviously helps, but it would be interesting to know how much. Have you done, or do you plan to do, such experiments?

Best,
Vladislav

How to extract features?

Firstly, thank you for making the code available.
I used the train.py script to learn the parameters of the classification network on the ModelNet database (with default parameters), and I want to extract features with the learned network. I have trouble understanding how to compute features for a given input point cloud. Specifically, I'm interested in the features after the transform_net_2 (i.e. feature transform) layer; these should be the "key points" referred to in the paper, if I understood correctly?
After restoring the session and trying to compute the net_transformed layer, I get:

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value transform_net1/tfc2/biases_1

I'm not experienced with tensorflow, so help would be welcome. Sorry if I missed something simple.
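For reference, that error usually means the checkpoint was never restored into the session. A minimal TF1-style sketch (the checkpoint path and tensor names are placeholders):

# build the model first so the variables exist, then restore ALL of them
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'log/model.ckpt')      # hypothetical checkpoint path
    feats = sess.run(net_transformed,
                     feed_dict={pointclouds_pl: batch, is_training_pl: False})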

Semantic segmentation

Hi,
Could you provide a sample for the semantic segmentation? Your project is really great, by the way. Thanks!

Data Availability

Great work! I'm excited to try this out.

I'm having an issue fetching the data, the connection has been timing out today and yesterday for
wget https://shapenet.cs.stanford.edu/media/modelnet40_ply_hdf5_2048.zip

It does not seem like https://shapenet.cs.stanford.edu/ is available from a browser either. Are there any other sources of this data? I checked out the Princeton ModelNet40 set, but the files are of type .off instead of hdf5.

confusion about T-Net.

Hi Charles,
I have some confusion about the T-Net:
1. How can the T-Net update its weights in the network? Backpropagation needs an error signal to update the weights, but I can't find any loss term for the T-Net.
2. You say in the paper that the T-Net's role is "to align all input sets to a canonical space before feature extraction", but how can an architecture like the T-Net achieve that goal?
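For context (my reading of the paper and of the loss code, so treat it as an interpretation): the T-Net needs no loss of its own; gradients flow to it through the matmul with the points, exactly as for any other layer. The paper does add an orthogonality regularizer on the 64x64 feature transform A, L_reg = ||I - A A^T||_F^2, which could be sketched as:

import numpy as np

# transform: (B, 64, 64) predicted feature alignment matrices
mat_diff = tf.matmul(transform, tf.transpose(transform, perm=[0, 2, 1])) \
           - np.eye(64, dtype=np.float32)      # broadcast over the batch
reg_loss = tf.nn.l2_loss(mat_diff)             # added to the classification loss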

No nonlinearities?

I can't see any non-linearities/activation functions in your model. Is this intentional?

I suspect this is why you need such a big embedding layer. In my own experiments I found that you can make this layer pretty small (even as low as 8 in some cases!), especially if you use set-wide information in the transformations.

Recognize bottle from ycb data as monitor!

hi..
I'm trying to use PointNet to recognize an object from the YCB dataset (http://www.ycbbenchmarks.com/) in the bottle category. It has a very similar shape to the training data, but PointNet did not recognize it correctly. I think the issue may be related to how I prepare/down-sample the data.

I tried 006_mustard_bottle.ply, which contains 847289 points, and randomly down-sampled it to 1024 points:

[image]

resulting in the following:

[image]

[image]

The classification result was:

airplane = -10065.7
bathtub = -10174.7
bed = -5508.48
bench = -6663.82
bookshelf = -7400.75
bottle = -15664.8
bowl = -6521.66
car = -14696.8
chair = -11587.8
cone = -8178.83
cup = -13763.6
curtain = -4454.49
desk = -19952.4
door = -12188.9
dresser = -4633.64
flower_pot = 1036.69
glass_box = -7719.06
guitar = -8977.43
keyboard = -15161.3
lamp = -5138.02
laptop = -6836.96
mantel = -13274.9
monitor = -9731.3
night_stand = -11224.4
person = -16864.7
piano = -17242.8
plant = 1167.26
radio = 1352.54
range_hood = -13157.3
sink = -9762.18
sofa = -18294.6
stairs = -5749.73
stool = -10743.8
table = -6021.3
tent = -11153.3
toilet = -10596.6
tv_stand = -3238.71
vase = -7211.66
wardrobe = -13034.5
xbox = -6992.64

I tried a different approach, voxelizing to get a better result:

[image]

[image]

loss value 13.4148

airplane = -9.70358
bathtub = -19.6874
bed = -13.8628
bench = -15.0484
bookshelf = -14.7929
bottle = -10.4973
bowl = -14.3144
car = -7.3393
chair = -12.6577
cone = -8.53707
cup = -7.65949
curtain = -19.0757
desk = -16.9046
door = -20.5851
dresser = -8.47661
flower_pot = 1.46643
glass_box = -15.3492
guitar = -12.4239
keyboard = -20.1066
lamp = -8.02352
laptop = -17.523
mantel = -13.7787
monitor = 2.63108
night_stand = -6.75316
person = -11.2164
piano = -3.55436
plant = -2.09806
radio = -3.64729
range_hood = -14.5621
sink = -10.3373
sofa = -14.9684
stairs = -3.0044
stool = -14.2689
table = -19.0393
tent = -3.54769
toilet = -4.67468
tv_stand = -15.6987
vase = -5.87417
wardrobe = -15.3384
xbox = -11.0024

What is the best approach to prepare the data to get more accurate results?
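One thing worth checking (my guess, not a confirmed diagnosis): the ModelNet40 training clouds are normalized into a zero-centred unit sphere, so a test cloud should be preprocessed the same way before classification.

import numpy as np

def normalize_unit_sphere(points):
    """Centre the cloud and scale it into the unit sphere."""
    points = points - points.mean(axis=0)
    points /= np.max(np.linalg.norm(points, axis=1))
    return points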

Semantic Segmentation issue.

I used sh download_data.sh to download the dataset, then ran python collect_indoor3d_data.py, but it hits a problem that I don't know how to deal with (as the picture below shows).
[screenshot of the error]

Bug in training/evaluation: some data missed

Dear authors,

I have noticed that for training/evaluation (https://github.com/charlesq34/pointnet/blob/master/train.py#L187) you iterate over the whole number of batches. However, when the number of instances is not divisible by batch_size, some instances are never seen during training or evaluation. For example, for ModelNet40 the train size is 9840 and the test size is 2468. Neither is divisible by the default batch size of 32, which means you miss 4 instances for testing and 16 for training. This can have dramatic consequences for much larger batch sizes (which make sense for GPUs with large amounts of RAM).

If you confirm this, I can create a pull request, which fixes this problem.

Best
Dmytro Bobkov
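For reference, the kind of fix such a pull request might contain (a sketch under my own assumptions about the eval loop, with current_data and the graph calls left as in train.py): pad the final, partial batch and only count the real rows.

num_batches = (file_size + BATCH_SIZE - 1) // BATCH_SIZE   # ceil instead of floor
for batch_idx in range(num_batches):
    start = batch_idx * BATCH_SIZE
    end = min(start + BATCH_SIZE, file_size)
    batch = np.zeros((BATCH_SIZE,) + current_data.shape[1:], current_data.dtype)
    batch[0:end - start] = current_data[start:end, ...]
    # ... run the graph on `batch`, then accumulate statistics
    # only over the first (end - start) outputs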

fine tuning

Hi

How can I fine-tune the model for a different number of classes and a custom dataset?
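A common TF1 pattern for this (a sketch under my assumption that, as in pointnet_cls.py, the final layer lives in scope 'fc3'): restore every variable except the final classification layer, rebuild that layer with your class count, and train.

# restore all variables except the final fully connected layer
vars_to_restore = [v for v in tf.global_variables()
                   if not v.name.startswith('fc3')]
saver = tf.train.Saver(vars_to_restore)
saver.restore(sess, 'log/model.ckpt')   # hypothetical checkpoint path
# 'fc3', now sized for your classes, keeps its fresh initialization and is trained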

Extra character in v1.2 data in Area_5/hallway_6

Hi Charles,
I ran into this problem when running collect_indoor3d_data.py.

D:\pointnet\data\Stanford3dDataset_v1.2_Aligned_Version\Area_5/hallway_6/Annotations
D:\pointnet\data\Stanford3dDataset_v1.2_Aligned_Version\Area_5/hallway_6/Annotations ERROR!!

In the code, you mention that this is because of "an extra character in the v1.2 data in Area_5/hallway_6. It's fixed manually."
May I know what you mean by this, and how you fixed it manually?
Thank you!
