facebookresearch / 3detr

Code & Models for 3DETR - an End-to-end transformer model for 3D object detection

License: Apache License 2.0

Python 95.78% Shell 1.31% Cython 2.90%

3detr's Introduction

3DETR: An End-to-End Transformer Model for 3D Object Detection

PyTorch implementation and models for 3DETR.

3DETR (3D DEtection TRansformer) is a simpler alternative to complex hand-crafted 3D detection pipelines. It does not rely on 3D backbones such as PointNet++ and uses few 3D-specific operators. 3DETR obtains comparable or better performance than 3D detection methods such as VoteNet. The encoder can also be used for other 3D tasks such as shape classification. More details in the paper "An End-to-End Transformer Model for 3D Object Detection".

[website] [arXiv] [bibtex]

Code description. Our code is based on prior work such as DETR and VoteNet and we aim for simplicity in our implementation. We hope it can ease research in 3D detection.

[Figure: 3DETR approach and decoder detections]

Pretrained Models

We provide the pretrained model weights and the corresponding metrics on the val set (per-class APs and recalls). We also provide a Python script, utils/download_weights.py, to download the weights/metrics files.

| Arch | Dataset | Epochs | AP25 | AP50 | Model weights | Eval metrics |
| --- | --- | --- | --- | --- | --- | --- |
| 3DETR-m | SUN RGB-D | 1080 | 59.1 | 30.3 | weights | metrics |
| 3DETR | SUN RGB-D | 1080 | 58.0 | 30.3 | weights | metrics |
| 3DETR-m | ScanNet | 1080 | 65.0 | 47.0 | weights | metrics |
| 3DETR | ScanNet | 1080 | 62.1 | 37.9 | weights | metrics |

Model Zoo

For convenience, we provide model weights for 3DETR trained for different numbers of epochs.

| Arch | Dataset | Epochs | AP25 | AP50 | Model weights | Eval metrics |
| --- | --- | --- | --- | --- | --- | --- |
| 3DETR-m | SUN RGB-D | 90 | 51.0 | 22.0 | weights | metrics |
| 3DETR-m | SUN RGB-D | 180 | 55.6 | 27.5 | weights | metrics |
| 3DETR-m | SUN RGB-D | 360 | 58.2 | 30.6 | weights | metrics |
| 3DETR-m | SUN RGB-D | 720 | 58.1 | 30.4 | weights | metrics |
| 3DETR | SUN RGB-D | 90 | 43.7 | 16.2 | weights | metrics |
| 3DETR | SUN RGB-D | 180 | 52.1 | 25.8 | weights | metrics |
| 3DETR | SUN RGB-D | 360 | 56.3 | 29.6 | weights | metrics |
| 3DETR | SUN RGB-D | 720 | 56.0 | 27.8 | weights | metrics |
| 3DETR-m | ScanNet | 90 | 47.1 | 19.5 | weights | metrics |
| 3DETR-m | ScanNet | 180 | 58.7 | 33.6 | weights | metrics |
| 3DETR-m | ScanNet | 360 | 62.4 | 37.7 | weights | metrics |
| 3DETR-m | ScanNet | 720 | 63.7 | 44.5 | weights | metrics |
| 3DETR | ScanNet | 90 | 42.8 | 15.3 | weights | metrics |
| 3DETR | ScanNet | 180 | 54.5 | 28.8 | weights | metrics |
| 3DETR | ScanNet | 360 | 59.0 | 35.4 | weights | metrics |
| 3DETR | ScanNet | 720 | 61.1 | 40.2 | weights | metrics |

Running 3DETR

Installation

Our code is tested with PyTorch 1.9.0, CUDA 10.2 and Python 3.6. It may work with other versions.

You will need to install pointnet2 layers by running

cd third_party/pointnet2 && python setup.py install

You will also need the following Python dependencies (install with either conda or pip):

matplotlib
opencv-python
plyfile
'trimesh>=2.35.39,<2.35.40'
'networkx>=2.2,<2.3'
scipy
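
For example, the list above can be installed in one go with pip (versions as pinned above):

pip install matplotlib opencv-python plyfile 'trimesh>=2.35.39,<2.35.40' 'networkx>=2.2,<2.3' scipy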

Some users have experienced issues using CUDA 11 or higher. Please try using CUDA 10.2 if you run into CUDA issues.

Optionally, you can install a Cythonized implementation of gIOU for faster training.

conda install cython
cd utils && python cython_compile.py build_ext --inplace

Benchmarking

Dataset preparation

We follow the VoteNet codebase for preprocessing our data. The instructions for preprocessing SUN RGB-D are here and those for ScanNet are here.

You can edit the dataset paths in datasets/sunrgbd.py and datasets/scannet.py, or choose to specify them at runtime.

Testing

Once you have the datasets prepared, you can test the pretrained models with:

python main.py --dataset_name <dataset_name> --nqueries <number of queries> --test_ckpt <path_to_checkpoint> --test_only [--enc_type masked]

We use 128 queries for the SUN RGB-D dataset and 256 queries for the ScanNet dataset. You will need to add the flag --enc_type masked when testing the 3DETR-m checkpoints. Please note that the testing process is stochastic (due to randomness in point cloud sampling and sampling the queries) and so results can vary within 1% AP25 across runs. This stochastic nature of the inference process is also common for methods such as VoteNet.

If you have not edited the dataset paths for the files in the datasets folder, you can pass the path to the datasets using the --dataset_root_dir flag.
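
For example, evaluating a 3DETR-m checkpoint on ScanNet could look like the following (paths are placeholders):

python main.py --dataset_name scannet --nqueries 256 --test_ckpt <path_to_scannet_masked_checkpoint> --test_only --enc_type masked --dataset_root_dir <path_to_scannet_data>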

Training

The model can be trained simply by running main.py:

python main.py --dataset_name <dataset_name> --checkpoint_dir <path to store outputs>

To reproduce the results in the paper, we provide the training arguments in the scripts folder. A variation of about 1% AP25 across different training runs can be expected.
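
For reference, a full ScanNet training command assembled from the arguments used elsewhere in this page might look like the command below; the files in the scripts folder remain the authoritative source:

python main.py --dataset_name scannet --max_epoch 1080 --nqueries 256 --matcher_giou_cost 2 --matcher_cls_cost 1 --matcher_center_cost 0 --matcher_objectness_cost 0 --loss_giou_weight 1 --loss_no_object_weight 0.25 --checkpoint_dir outputs/scannet_ep1080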

You can quickly verify your installation by training a 3DETR model for 90 epochs on ScanNet using scripts/scannet_quick.sh and comparing the result to the corresponding pretrained checkpoint from the Model Zoo.

License

The majority of 3DETR is licensed under the Apache 2.0 license as found in the LICENSE file, however portions of the project are available under separate license terms: licensing information for pointnet2 is available at https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/UNLICENSE

Contributing

We welcome your pull requests! Please see CONTRIBUTING and CODE_OF_CONDUCT for more info.

Citation

If you find this repository useful, please consider starring ⭐ us and citing

@inproceedings{misra2021-3detr,
    title={{An End-to-End Transformer Model for 3D Object Detection}},
    author={Misra, Ishan and Girdhar, Rohit and Joulin, Armand},
    booktitle={{ICCV}},
    year={2021},
}

3detr's People

Contributors

chaoyivision, daveredrum, ghostish, imisra, sabughazal


3detr's Issues

The Fourier positional embedding

Hi, thank you for your excellent work.

Is the shift_scale_points() step important inside get_fourier_embeddings(xyz, num_channels, input_range)?

My model uses your version of get_fourier_embeddings(), but the performance is similar to the MLP variant. The only difference is that I do not call shift_scale_points(), because my input is already normalized before being fed to get_fourier_embeddings().
Do you have any suggestions? Could the learning rate or something else affect how well the Fourier features work?

I appreciate it.
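
For readers hitting the same question, here is a minimal sketch (my own illustration, not the repo's exact code) of a Fourier positional embedding with a shift/scale step. The point is that shift_scale_points rescales coordinates into [0, 1] relative to the scene extent before the random-frequency projection, so inputs normalized differently will see a different effective frequency range:

import math
import torch

def fourier_pos_embedding(xyz, gauss_B, input_range):
    # xyz:         (B, N, 3) point/query coordinates
    # gauss_B:     (3, D // 2) fixed random Gaussian projection matrix
    # input_range: (min_xyz, max_xyz), each (B, 3), the per-scene extents
    min_xyz, max_xyz = input_range
    extent = (max_xyz - min_xyz).clamp(min=1e-6)
    xyz_norm = (xyz - min_xyz[:, None, :]) / extent[:, None, :]   # shift/scale step: map into [0, 1]
    proj = 2 * math.pi * xyz_norm @ gauss_B                       # (B, N, D // 2) random frequencies
    return torch.cat([proj.sin(), proj.cos()], dim=-1)            # (B, N, D)

# usage sketch with made-up shapes
xyz = torch.rand(2, 128, 3) * 10.0
emb = fourier_pos_embedding(xyz, torch.randn(3, 128),
                            (xyz.min(dim=1).values, xyz.max(dim=1).values))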

need your help

Hello, I am interested in your great work. Thanks.
I have a question: what does "ret_unique_cnt" mean?
What are "ret" and "cnt" short for?

about the non-parametric query embeddings

Hi, thanks for the great work and the open-sourced codebase!

In the paper, you mentioned that the non-parametric query embedding gives much better results than the original parametric one used in the DETR paper. I noticed that the non-parametric version involves random sampling of points, so I am wondering: how does this randomness affect performance at inference time?

Thanks!

Visualizing ground-truth point clouds and 3D bounding boxes

Hi,
Thanks for sharing the great work and helping me so far!

I'm visualizing the input point cloud and the corresponding ground-truth 3D bounding boxes.
visualize is my code that takes the point cloud and 3D boxes and does not change them.

pcd  = batch_data_label['point_clouds'][0]   # pcd.shape  == [N=20k, 3]
obbs = batch_data_label['gt_box_corners'][0] # obbs.shape == [K=64, 8, 3]
visualize(pcd, obbs)

but they do not seem to match each other, as shown below.
[screenshot]

But when I swap the Y and Z axes of the input point cloud (pcd = pcd[:,[0,2,1]]), they match each other a little better.
[screenshot]

Is this expected visualization?
Is the input point cloud preprocessed before being fed into the model?

Thanks in advance!

Questions about the code about the criterion.py

Hello, I do not understand the line "assign = linear_sum_assignment(final_cost[b, :, : nactual_gt[b]])", especially the nactual_gt part. Why not "assign = linear_sum_assignment(final_cost[b, :, :])"? Thanks for your great work; I hope to hear from you soon!
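
For context, a small sketch (assumed shapes, not the repo's code) of why the cost matrix is sliced: ground-truth boxes are padded to a fixed maximum per batch, so columns beyond nactual_gt[b] correspond to padding and should not receive assignments.

import numpy as np
from scipy.optimize import linear_sum_assignment

num_queries, max_gt = 4, 3
nactual_gt = 2                                   # only 2 real boxes; column 2 is padding
final_cost = np.random.rand(num_queries, max_gt)

# restricting the columns keeps queries from being matched to padded (empty) boxes
rows, cols = linear_sum_assignment(final_cost[:, :nactual_gt])
print(rows, cols)                                # every assigned column index is < nactual_gt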

training on my own data in sunrgbd format raises error

Hello @likethesky , @Celebio , @colesbury , @pdollar , @minqi ,

thank you for this amazing work on 3DETR.
I have built my dataset in the SUN RGB-D format and it already works with VoteNet, but when I use it with 3DETR it raises the following error:

Training started at epoch 0 until 720.
One training epoch = 18 iters.
One eval epoch = 2 iters.
Traceback (most recent call last):
  File "main.py", line 435, in <module>
    launch_distributed(args)
  File "main.py", line 423, in launch_distributed
    main(local_rank=0, args=args)
  File "main.py", line 416, in main
    best_val_metrics,
  File "main.py", line 191, in do_train
    logger,
  File "/home/mad/3detr/engine.py", line 87, in train_one_epoch
    outputs = model(inputs)
  File "/home/mad/miniconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mad/3detr/models/model_3detr.py", line 327, in forward
    query_xyz, query_embed = self.get_query_embeddings(enc_xyz, point_cloud_dims)
  File "/home/mad/3detr/models/model_3detr.py", line 179, in get_query_embeddings
    pos_embed = self.pos_embedding(query_xyz, input_range=point_cloud_dims)
  File "/home/mad/miniconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mad/3detr/models/position_embedding.py", line 134, in forward
    return self.get_fourier_embeddings(xyz, num_channels, input_range)
  File "/home/mad/3detr/models/position_embedding.py", line 112, in get_fourier_embeddings
    xyz_proj = torch.mm(xyz2d.view(-1, d_in), self.gauss_B[:, :d_out]).view(
RuntimeError: expected scalar type Double but found Float

I have also run the original SUN RGB-D dataset and it works fine; I built my dataloader from the SUN RGB-D loader and customized it for my data.
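
A likely cause, though only a guess on my part: the custom dataloader returns float64 arrays, while the model's Gaussian projection matrix is float32. Casting the point cloud to float32 before it is converted to a tensor avoids the dtype mismatch (illustrative names):

import numpy as np

point_cloud = np.zeros((20000, 3))            # stand-in for the array loaded in __getitem__
point_cloud = point_cloud.astype(np.float32)  # torch will then see Float, not Double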

Plans on supporting Kitti

First, thank you very much for this great repo.
I was wondering if you plan on supporting the KITTI dataset in the future? I tried to add it myself but ended up with very bad results, so I suppose I am doing something wrong.

How to visualize the output bbox

Thanks for your amazing work!
I would like to visualize the predicted bounding box output, but I don't know how to do it.
I would really appreciate it if anyone can offer some guidance!

A possible bug about enc_inds

There are two downsampling operations when using MaskedTransformerEncoder, but the final enc_inds is confusing.

The range of pre_enc_inds (0 to N) is different from that of enc_inds (0 to preenc_npoints) returned by self.encoder(). The final output indices should therefore be obtained by indexing pre_enc_inds with enc_inds. I am not sure whether this is a bug.

    def run_encoder(self, point_clouds):
        xyz, features = self._break_up_pc(point_clouds)
        pre_enc_xyz, pre_enc_features, pre_enc_inds = self.pre_encoder(xyz, features)
        # xyz: batch x npoints x 3
        # features: batch x channel x npoints
        # inds: batch x npoints

        # nn.MultiHeadAttention in encoder expects npoints x batch x channel features
        pre_enc_features = pre_enc_features.permute(2, 0, 1)

        # xyz points are in batch x npointx channel order
        enc_xyz, enc_features, enc_inds = self.encoder(
            pre_enc_features, xyz=pre_enc_xyz
        )
        if enc_inds is None:
            # encoder does not perform any downsampling
            enc_inds = pre_enc_inds
        return enc_xyz, enc_features, enc_inds
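
A possible fix, assuming enc_inds indexes the pre-encoder outputs rather than the raw points (my reading of the code, not a confirmed patch): map the encoder's local indices back through pre_enc_inds before returning them.

import torch

# pre_enc_inds: (B, preenc_npoints) indices into the original N input points
# enc_inds:     (B, M)              indices into the preenc_npoints pre-encoder outputs
pre_enc_inds = torch.randint(0, 20000, (2, 2048))
enc_inds = torch.randint(0, 2048, (2, 1024))

final_inds = torch.gather(pre_enc_inds, 1, enc_inds)  # (B, M) indices into the original points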

Problems with training on my own datasets

Dear authors:
I have a dataset for 3D point cloud detection, called "center", in which objects do not need to be classified, only localized. I generated training and testing files (xxbbox.npy, xxpc.npz, votes.npz) following the SUN RGB-D data format. The modifications in the dataset configuration file are as follows:
[screenshot of the dataset configuration changes]

The parameters set during training are as follows:
--dataset_name center
--max_epoch 90
--nqueries 128
--base_lr 7e-4
--matcher_giou_cost 3
--matcher_cls_cost 1
--matcher_center_cost 5
--matcher_objectness_cost 5
--loss_giou_weight 0
--loss_no_object_weight 0.1
--save_separate_checkpoint_every_epoch -1
--checkpoint_dir outputs/certer_90

The following problem occurred during training:
[screenshot of the training error]

Do you know the possible causes of this problem? Looking forward to your answer.

Questions on query embeddings

Hi! Thank you for releasing such wonderful work. I have a few questions regarding the query embeddings:

  1. I was wondering why the query embeddings are called "non-parametric" in the paper, since an MLP is still applied to project the Fourier positional embedding into actual queries?
  2. If the MLP is learned, it feels like the major difference is to ensure good coverage with FPS sampling, whereas the DETR-style query embeddings are "free"; the learning capacity of the embeddings is about the same as DETR-style embedding. Am I missing anything?
  3. Is there a performance comparison to using DETR style embeddings?

Looking forward to hearing from you. Thanks!

Hang because targets["num_boxes_replica"] = 0

Hi, dear author, thanks for the excellent work!

When I try to train 3DETR with 8 GPUs, I find that training hangs after some iterations.
I found that the reason is that targets["num_boxes_replica"] = 0.
But I am curious why it becomes 0.

Looking forward to your reply. Thanks.

Big doubt about the performance

Dear authors, I have trained your model on the SUN RGB-D dataset with --dataset_num_workers 4 and --batchsize_per_gpu 4 (limited by my GPU), keeping all other parameters the same as in your "sunrgbd_quick.sh", but my AP25 after training is far lower than your 43.7: I get 27.3. To verify whether my data preprocessing is wrong, I tested the model with the "sunrgbd_ep1080.pth" file you provided, and the results are: mAP0.25, mAP0.50: 79.81, 54.79;
AR0.25, AR0.50: 88.97, 66.01. I'm really confused. Can you tell me why? Thanks!

Question About Offset

Hi, thanks for the good research.

I have a question about the offset.
The predicted offset value is in [-0.5, 0.5].
But when the sampled query_xyz points have a minimum distance of, say, 10 between them, the predicted centers cannot cover all possible center locations.

For example, take two query points for simplicity:
query_xyz0------target center----query_xyz1
The distance between query_xyz0 and the target center is 5.
The distance between query_xyz1 and the target center is 4.
Since the offset can only take values in [-0.5, 0.5], the predicted center can never match any target center (the best achievable error is 3.5, which seems large).

Please help me. Thank you.

Question about abnormal gIOU Loss

Hi, I have been troubled for days by the gIoU loss staying high while training on my own dataset. It is computed between outputs["box_corners"] and the GT box corners, so I visualized my GT boxes and corners to check that they are correct. In my visualization the GT centers and sizes are correct, but the corners are not aligned with the 3D boxes. All box corners are computed by box_parametrization_to_corners or box_parametrization_to_corners_np; however, if I use the corners computed by my_compute_box_3d(), they are aligned well. So I suspect the problem lies in box_parametrization_to_corners / box_parametrization_to_corners_np.
In these two functions the inputs are the box center, size and angle. I see that the code first rotates the coordinates to the image axes but does not rotate back to the original world coordinates. Does this matter for the gIoU computation, given that both the GT and predicted corners are computed by these two functions? If that is not the reason, could you help me understand why the gIoU loss remains high during training while the classification loss decreases? Thanks a lot!
[screenshot: gIoU calculation]
[screenshot: box corners calculation]
[screenshot: gIoU loss remains high while classification loss decreases]
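
For reference, a minimal sketch of computing the 8 corners of a box from its center, size and heading angle (my own convention for corner ordering and rotation axis; the repo's functions may differ):

import numpy as np

def box_corners(center, size, heading):
    # center: (3,), size: (3,) full extents (l, w, h), heading: yaw in radians
    l, w, h = size
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    y = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    z = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    corners = np.stack([x, y, z], axis=0)               # (3, 8), box-centered frame
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # rotation about the up axis
    return (rot @ corners).T + center                   # (8, 3) in world coordinates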

Questions about AP on scannet

Hi, Thanks for your great work.
By training your model 3DETR-m, I got the following results on ScanNet (epoch 1080):
====================Final Eval Numbers.
mAP0.25, mAP0.50: 62.12, 43.58
AR0.25, AR0.50: 74.09, 54.71
====================Best Eval Numbers.
mAP0.25, mAP0.50: 63.78, 39.67
AR0.25, AR0.50: 78.57, 53.71
With your help, the current result is close to 65.0. Is this a reasonable result?
Can you help me see why the result does not reach 65.0 / 47.0?

How do you visualize attention weights?

Thanks for sharing the great work.

I would like to reproduce the paper's visualization results for the decoder self-attention weights in Figure 1 and the encoder self-attention weights in Figure 8.
Please help me.

  • How do you visualize attention weights?
  • Does "all points in the scene" mean no down sampling?
  • Which layer and which head's attention weights did you visualize?
  • Could you share the code for the attention weights visualization?

Thank you.
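
Not an official answer, but with stock PyTorch the attention weights of an nn.MultiheadAttention module can be obtained by calling it with need_weights=True; coloring the point cloud by one row of the returned matrix gives a per-query attention map. The shapes below are assumptions for illustration:

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=4)  # stand-in for one encoder layer's self-attention
x = torch.randn(2048, 1, 256)                             # (npoints, batch, channels), as the encoder expects

out, weights = attn(x, x, x, need_weights=True)
print(weights.shape)                                      # (batch, npoints, npoints); each row sums to 1
# weights[0, i] is the attention distribution of point i over all points; one visualization
# is to color the point cloud by weights[0, i] for a chosen query point i.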

Hangs with ngpus > 1

I know this is explicitly not supported, but I was wondering if you (or anyone) have run into hangs when trying to parallelize across multiple GPUs? I don't get any explicit errors; the process just seems to stop (and hang) after 1000 or so steps.

I've tried PyTorch 1.8 (LTS) with CUDA 11.1 and PyTorch 1.9 (also with CUDA 11.1). Our cluster uses 3090s, so we are unable to run with CUDA 10.2. I'm going to try CUDA 11.3 / PyTorch 1.10 this week after our NVIDIA driver is updated.

Reproduce results on SUNRGBD

Hi, thanks for your great work!

I have had trouble reproducing the results on SUN RGB-D; the model does not seem to converge after the first epoch.

I followed this instruction to preprocess the dataset. I ran python sunrgbd_data.py --viz and used MeshLab to view the data, and the result looks fine. My PyTorch version is 1.9.0 and my CUDA version is 10.2.

Under a single-GPU setup, I simply run scripts/sunrgbd_quick.sh with the batch size changed to 4 (I don't have a V100), and the network does not converge. I think this is because the batch size is too small, so I tried training with 8 GPUs (using SyncBN and scaling the learning rate 4x, since the effective batch size is 32). But I found that after several iterations the training hangs (in my experiments, at iteration 104 of epoch 0). The reason is that some parameters (for DistributedDataParallel) or inputs (for SyncBN) cannot reduce gradients across all GPUs: some GPUs have gradients and some do not (for example, the first SyncBN of the center head on GPU 0 does not receive a grad_input). I work around this with a small trick:
for p in model.parameters(): loss += p.sum() * 0.

But even after enlarging the batch size to 32, the model still does not converge (the first-epoch loss goes from 35 to 17 and never drops further). Is there a problem with my experiments?

By the way, how many GPUs do you use to train for 1080 epochs? My training iteration time is about 1.8 s, which means training for 90 epochs will take about 7 hours... I have installed the Cythonized implementation of gIoU following your instructions.

Question about VRAM

Hi, I am very impressed with your research work on 3DETR! I want to reproduce your experimental results: did you use V100-16G or V100-32G GPUs?

Question about the reported AP mismatch between the paper and GitHub

Hi,
Thanks for your great work.
I am just wondering about the AP difference for the ScanNet results (AP25 and AP50):

  • In the paper, it is 62.7 | 37.5
  • In Github, it is 62.1 | 37.9
    here AP25 in GitHub is lower than that in the paper.

Besides, I am also confused by the ScanNet 720-epoch AP values: on GitHub they are 61.1 | 40.2, and I am quite confused by the AP50 result in particular...

Code understanding

What does "ret_unique_cnt" mean?
What are "ret" and "cnt" short for?

RuntimeError: ReluBackward0, is at version 1; expected version 0 instead

I tried a 90-epoch training run as a sanity check on a 3090 GPU with torch 1.13, and the following exception occurred:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 8, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

So I removed every inplace=True in this repo (they are all on Dropout, even though the traceback mentions ReLU), and the problem is solved. It may slow things down a bit, but that is better than nothing.

Leaving a message here in case someone faces the same problem.
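
As a variant of the fix above (a sketch, not a tested patch), the inplace flags can also be switched off programmatically instead of editing the source:

import torch.nn as nn

def disable_inplace(model: nn.Module) -> None:
    # Turn off in-place ReLU/Dropout so autograd keeps the unmodified activations.
    for m in model.modules():
        if isinstance(m, (nn.ReLU, nn.Dropout)) and getattr(m, "inplace", False):
            m.inplace = False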

Doesn't Work with Pytorch 1.10

With PyTorch 1.9, I get no errors. However, with PyTorch 1.10, I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Even though PyTorch 1.10 isn't supported, I was wondering: did any large behavior change happen between PyTorch 1.9 and 1.10? It seems odd that this error wouldn't have been raised with 1.9.

Sharing models through the Hugging Face Hub

Hi @imisra, @rohitgirdhar and rest of 3DETR team

3DETR is quite interesting and it would be great to make it more visible to the rest of the machine learning ecosystem!

Would you be interested in sharing your models in the Hugging Face Hub? The Hub offers free hosting of over 25K models, and it would make your work more accessible and visible to the rest of the ML community. There's an existing Facebook organization with models such as DETR and DeiT.

Some of the benefits of sharing your models through the Hub would be:

  • wider reach of your work to the ecosystem
  • versioning, commit history and diffs
  • repos provide useful metadata about their tasks, languages, metrics, etc that make them discoverable
  • multiple features from TensorBoard visualizations, PapersWithCode integration, and more

Creating the repos and adding new models should be a relatively straightforward process if you've used Git before. This is a step-by-step guide explaining the process in case you're interested. Please let us know if you would be interested and if you have any questions.

Happy to hear your thoughts,
Omar and the Hugging Face team

cc @mishig25 @LysandreJik @NielsRogge

Super slow training when predicting bounding box orientations

Thanks for the great work and the amazing code base! I am interested in using your code to predict oriented bounding boxes. However, when doing so, the training speed decreases dramatically compared to predicting axis-aligned bounding boxes. I am training on a single RTX 2080 with 11 GB and a batch size of 7. The training time for 1080 epochs on ScanNet with axis-aligned bounding boxes is 30 hours; however, when I attempt to train with oriented bounding boxes (pretending ScanNet boxes have an orientation), the estimated training time shoots up to between 40 and 80 days, making it unusable for me. The main increase in time seems to come from computing 3D bounding box overlap with generalized_box3d_iou_tensor() and from backpropagating gradients, presumably because the computational graph has become much more complex. Do these numbers seem reasonable to you, or am I missing something? Is there some way to speed up the 3D bounding box overlap calculation?
Thanks!

_pickle.PicklingError: Can't pickle <class 'numpy.core._exceptions.UFuncTypeError'>: it's not the same object as numpy.core._exceptions.UFuncTypeError

Sorry, I accidentally closed my question. Have you tried to use Python 3.8?
Traceback (most recent call last):
File "/usr/local/python38/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/local/python38/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'numpy.core._exceptions.UFuncTypeError'>: it's not the same object as numpy.core._exceptions.UFuncTypeError
^CTraceback (most recent call last):
File "/home/lab30201/sdb/lwb/3detr-main/main.py", line 430, in
launch_distributed(args)
File "/home/lab30201/sdb/lwb/3detr-main/main.py", line 418, in launch_distributed
main(local_rank=0, args=args)
File "/home/lab30201/sdb/lwb/3detr-main/main.py", line 403, in main
do_train(
File "/home/lab30201/sdb/lwb/3detr-main/main.py", line 179, in do_train
aps = train_one_epoch(
File "/home/lab30201/sdb/lwb/3detr-main/engine.py", line 74, in train_one_epoch
for batch_idx, batch_data_label in enumerate(dataset_loader):
File "/usr/local/python38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in next
data = self._next_data()
File "/usr/local/python38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
idx, data = self._get_data()
File "/usr/local/python38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1034, in _get_data
success, data = self._try_get_data()
File "/usr/local/python38/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/local/python38/lib/python3.8/multiprocessing/queues.py", line 107, in get
if not self._poll(timeout):
File "/usr/local/python38/lib/python3.8/multiprocessing/connection.py", line 257, in poll
return self._poll(timeout)
File "/usr/local/python38/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
r = wait([self], timeout)
File "/usr/local/python38/lib/python3.8/multiprocessing/connection.py", line 930, in wait
ready = selector.select(timeout)
File "/usr/local/python38/lib/python3.8/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)

Different matcher cost weights than reported in the paper

Hi, thanks for this work and the code release!

In the current arXiv version of your paper, section A2 of the supplementary states that λ1, λ2, λ3, λ4 are set to 2, 1, 0, 0 for ScanNet, i.e. there is no weight on the semantic assignment cost.
However, in the code they are set to 2, 0, 1, 0 respectively.
Which setting is the correct one?

Basic question about dataset

Hi, I just wonder why you chose RGB-D data for training.

For example, is it because point-cloud-only data such as KITTI-3D has not reached the expected performance, or was there no particular reason?

Just simple curiosity, thanks.

DDP performance varies with the number of GPUs

I have found that when using 4 GPUs, the results are inferior to those with 2 GPUs.

After checking the code, I found that it all-reduces the losses and then divides by the world_size. That seems odd to me: I think loss.backward() under DDP already averages the gradients implicitly, so I removed this:

#def all_reduce_average(tensor):
#    val = all_reduce_sum(tensor)
#    return val / get_world_size()

def all_reduce_average(tensor):
    return tensor

I'm still running the experiments, and after getting the result, I'll record it here.

Predictions with colored point cloud input

I notice that there is a script called scannet_masked_ep1080_color.sh that uses color + point cloud as the model input. However, I did not see any discussion of using color input in the paper or in this code repo. Do you intend to release checkpoints of a model that takes color input, for reference? Furthermore, since SUN RGB-D also has color information, is there any plan to support SUN RGB-D with color input as well?

Comment on VoxSeT statement about outdoor PCs?

In their recent paper on a "Voxel Set Transformer", He et al. mention that 3DETR can only be applied to indoor datasets:

3DETR present a promising solution by computing self-attention on a reduced set of seed points, this solution is only applicable to indoor scenes, where the point clouds are relatively dense and concentrated.

I saw issues #2 #15 #20, where you seem to suggest 3DETR could work on outdoor data, so do you agree with the claim of the authors?
Do you know of any successful applications of 3DETR outside of indoor datasets?

question regarding the optimizer

Hi Ishan
Thanks a lot for such amazing work.
I have a question and would appreciate your insight and opinion on it.
I see that in this work and a couple of your other works you used AdamW, and I notice that the parameters for AdamW are quite different from the ones we typically use for Adam.

  • I was wondering why that is and whether there is a reason behind it?

My other question is:

  • I notice that there are no established parameters for AdamW yet; across your different works I see different AdamW parameters. I would be really thankful if you could suggest what I should take into account to choose the right hyperparameters for AdamW, and share any additional suggestions on that matter.
    Thanks a lot for your help :)

RuntimeError: output with shape [32, 2048, 2048] doesn't match the broadcast shape [1, 32, 2048, 2048]

Hi,
Thank you for sharing your great work!

I'm running your pre-trained model to reproduce the results but I see the runtime error below.
The error seems to be about an input shape issue.
Do you have any suggestions for this?

This is the command that I put.

python main.py --dataset_name sunrgbd --nqueries 128 --test_ckpt weights/sunrgbd_masked_ep1080.pth --test_only --enc_type masked

This is the error that I have.

Traceback (most recent call last):
  File "main.py", line 427, in <module>
    launch_distributed(args)
  File "main.py", line 415, in launch_distributed
    main(local_rank=0, args=args)
  File "main.py", line 388, in main
    test_model(args, model, model_no_ddp, criterion, dataset_config, dataloaders)
  File "main.py", line 316, in test_model
    curr_iter,
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/user/Desktop/3detr/engine.py", line 179, in evaluate
    outputs = model(inputs)
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Desktop/3detr/models/model_3detr.py", line 306, in forward
    enc_xyz, enc_features, enc_inds = self.run_encoder(point_clouds)
  File "/home/user/Desktop/3detr/models/model_3detr.py", line 200, in run_encoder
    pre_enc_features, xyz=pre_enc_xyz
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Desktop/3detr/models/transformer.py", line 190, in forward
    output = layer(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask, pos=pos)
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Desktop/3detr/models/transformer.py", line 288, in forward
    return self.forward_pre(src, src_mask, src_key_padding_mask, pos, return_attn_weights)
  File "/home/user/Desktop/3detr/models/transformer.py", line 272, in forward_pre
    key_padding_mask=src_key_padding_mask)
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 819, in forward
    attn_mask=attn_mask)
  File "/home/user/anaconda3/envs/3detr/lib/python3.6/site-packages/torch/nn/functional.py", line 3362, in multi_head_attention_forward
    attn_output_weights += attn_mask
RuntimeError: output with shape [32, 2048, 2048] doesn't match the broadcast shape [1, 32, 2048, 2048]

Model Training is Behind in Performance

When training with a batch size of 1 and scaling all learning-rate parameters (except the weight decay) by sqrt(1/8), I notice that my model's performance falls well behind the released checkpoints (90, 180, 360, 720 epochs, etc.), often reaching only 40% to 60% of their mAP.
I also tried gradient accumulation up to an effective batch size of 8 with the included training parameters, but even when training with the scannet_quick script I still only reach an mAP of 30-35.
Any tips for getting full performance with a batch size of 1, for setups with limited memory?

The network does not perform well on ScanNet

Dear authors, your work is very good, but when I run your code the results are not very good. Can you help me see what's wrong?

environment:
PyTorch 1.9.0, CUDA 10.2, Python 3.6.

Run:
python main.py
--dataset_name scannet
--max_epoch 1080
--nqueries 256
--matcher_giou_cost 2
--matcher_cls_cost 1
--matcher_center_cost 0
--matcher_objectness_cost 0
--loss_giou_weight 1
--loss_no_object_weight 0.25
--save_separate_checkpoint_every_epoch -1
--checkpoint_dir outputs/scannet_ep1080
Result:
Training Finished.
====================Final Eval Numbers.
mAP0.25, mAP0.50: 16.68, 4.53
AR0.25, AR0.50: 23.55, 9.46

IOU Thresh=0.25
cabinet Average Precision: 5.25
bed Average Precision: 46.70
chair Average Precision: 40.31
sofa Average Precision: 16.30
table Average Precision: 15.51
door Average Precision: 9.95
window Average Precision: 2.23
bookshelf Average Precision: 2.74
picture Average Precision: 0.56
counter Average Precision: 10.11
desk Average Precision: 32.43
curtain Average Precision: 6.47
refrigerator Average Precision: 0.02
showercurtrain Average Precision: 18.62
toilet Average Precision: 44.26
sink Average Precision: 23.71
bathtub Average Precision: 16.13
garbagebin Average Precision: 8.95
cabinet Recall: 11.83
bed Recall: 53.09
chair Recall: 46.49
sofa Recall: 29.90
table Recall: 29.71
door Recall: 16.06
window Recall: 6.03
bookshelf Recall: 9.09
picture Recall: 1.80
counter Recall: 17.31
desk Recall: 50.39
curtain Recall: 8.96
refrigerator Recall: 3.51
showercurtrain Recall: 25.00
toilet Recall: 48.28
sink Recall: 33.67
bathtub Recall: 16.13
garbagebin Recall: 16.60

IOU Thresh=0.5
cabinet Average Precision: 0.74
bed Average Precision: 13.76
chair Average Precision: 11.42
sofa Average Precision: 5.46
table Average Precision: 3.97
door Average Precision: 1.42
window Average Precision: 0.47
bookshelf Average Precision: 2.01
picture Average Precision: 0.03
counter Average Precision: 0.96
desk Average Precision: 9.44
curtain Average Precision: 0.50
refrigerator Average Precision: 0.02
showercurtrain Average Precision: 7.14
toilet Average Precision: 13.95
sink Average Precision: 3.75
bathtub Average Precision: 3.23
garbagebin Average Precision: 3.19
cabinet Recall: 2.96
bed Recall: 22.22
chair Recall: 22.73
sofa Recall: 14.43
table Recall: 13.43
door Recall: 5.14
window Recall: 2.13
bookshelf Recall: 5.19
picture Recall: 0.45
counter Recall: 1.92
desk Recall: 24.41
curtain Recall: 1.49
refrigerator Recall: 1.75
showercurtrain Recall: 7.14
toilet Recall: 20.69
sink Recall: 10.20
bathtub Recall: 6.45
garbagebin Recall: 7.55
====================Best Eval Numbers.
mAP0.25, mAP0.50: 31.15, 7.70
AR0.25, AR0.50: 58.95, 20.68

IOU Thresh=0.25
cabinet Average Precision: 12.00
bed Average Precision: 70.82
chair Average Precision: 56.46
sofa Average Precision: 56.72
table Average Precision: 29.51
door Average Precision: 15.84
window Average Precision: 8.84
bookshelf Average Precision: 22.21
picture Average Precision: 0.26
counter Average Precision: 31.04
desk Average Precision: 49.17
curtain Average Precision: 6.81
refrigerator Average Precision: 15.72
showercurtrain Average Precision: 29.42
toilet Average Precision: 76.88
sink Average Precision: 32.54
bathtub Average Precision: 37.64
garbagebin Average Precision: 8.90
cabinet Recall: 47.58
bed Recall: 86.42
chair Recall: 76.83
sofa Recall: 89.69
table Recall: 67.14
door Recall: 36.62
window Recall: 28.01
bookshelf Recall: 67.53
picture Recall: 7.21
counter Recall: 57.69
desk Recall: 85.83
curtain Recall: 50.75
refrigerator Recall: 61.40
showercurtrain Recall: 60.71
toilet Recall: 89.66
sink Recall: 50.00
bathtub Recall: 61.29
garbagebin Recall: 36.79

IOU Thresh=0.5
cabinet Average Precision: 0.91
bed Average Precision: 29.13
chair Average Precision: 11.63
sofa Average Precision: 20.43
table Average Precision: 5.33
door Average Precision: 1.59
window Average Precision: 1.30
bookshelf Average Precision: 4.18
picture Average Precision: 0.00
counter Average Precision: 0.25
desk Average Precision: 16.56
curtain Average Precision: 0.13
refrigerator Average Precision: 4.87
showercurtrain Average Precision: 1.69
toilet Average Precision: 26.55
sink Average Precision: 1.95
bathtub Average Precision: 10.95
garbagebin Average Precision: 1.18
cabinet Recall: 10.75
bed Recall: 51.85
chair Recall: 30.12
sofa Recall: 45.36
table Recall: 22.29
door Recall: 8.14
window Recall: 6.38
bookshelf Recall: 28.57
picture Recall: 0.45
counter Recall: 7.69
desk Recall: 39.37
curtain Recall: 5.97
refrigerator Recall: 24.56
showercurtrain Recall: 10.71
toilet Recall: 41.38
sink Recall: 9.18
bathtub Recall: 19.35
garbagebin Recall: 10.19

Looking forward to your reply!

Question about size loss.

In the function loss_size, "gt_box_sizes" is scaled by point_cloud_dims and so lies in [0, ∞), while "pred_box_sizes" is squashed into (0, 1) by a sigmoid. F.l1_loss is then used to match "gt_box_sizes" and "pred_box_sizes". Is this the right thing to do?

In [1]: pred_box_sizes.max()
Out[1]: tensor(0.8369, device='cuda:0', grad_fn=<MaxBackward1>)

In [2]: gt_box_sizes.max()
Out[2]: tensor(2.5446, device='cuda:0')

In [3]:

Query about center offset

self.mlp_heads["center_head"](box_features).sigmoid().transpose(1, 2) - 0.5

Thank you for your work, it seems interesting!

  • Is 0.5 subtracted from the center head output to put the offset in the range [-0.5, 0.5]?
  • Since the center offset is added to the query points to get the unnormalized center, is this based on the assumption that the object center lies within ±0.5 m of the query point?
    Can you please shed some light on this?

stuck in the 88 epoch

When I train on SUN RGB-D, it always gets stuck at epoch 88 without any output. I run the following command:
sh scripts/sunrgbd_quick.sh
[screenshots of the stalled training log]

Possible to avoid PointNet?

Hi,

thanks for the wonderful work and the nice repo! So far, both the code and the paper have been a pleasure to read.

Is it possible to avoid using PointNet2 for the initial downsampling (the PointnetSAModuleVotes module)? I have problems compiling the code for pointnet, probably because I'm on a newer cuda version. Even if the performance is not exactly identical (just in case somebody cares, each error pertains to a wrong number of arguments to std::tuple), some workaround might be useful.
