second.pytorch's Introduction

This project is DEPRECATED; please use OpenPCDet or mmdetection3d instead. Both implement SECOND and support spconv 2.x.

SECOND for KITTI/NuScenes object detection (1.6.0 Alpha)

SECOND detector.

"Alpha" means there may be many bugs, config format may change, spconv API may change.

ONLY supports Python 3.6+ and PyTorch 1.0.0+. Tested on Ubuntu 16.04/18.04 and Windows 10.

If you want to train on the NuScenes dataset, see this.

News

2019-4-1: SECOND V1.6.0alpha released: New Data API, NuScenes support, PointPillars support, fp16 and multi-gpu support.

2019-3-21: SECOND V1.5.1 (minor improvement and bug fix) released!

2019-1-20: SECOND V1.5 released! Sparse convolution-based network.

See release notes for more details.

WARNING: you should rerun info generation after every code update.

Performance in KITTI validation set (50/50 split)

car.fhd.config + 160 epochs (25 fps on a 1080Ti):

Car AP@0.70, 0.70, 0.70:
bbox AP:90.77, 89.50, 80.80
bev  AP:90.28, 87.73, 79.67
3d   AP:88.84, 78.43, 76.88

car.fhd.config + 50 epochs + super converge (6.5 hours) + (25 fps on a 1080Ti):

Car AP@0.70, 0.70, 0.70:
bbox AP:90.78, 89.59, 88.42
bev  AP:90.12, 87.87, 86.77
3d   AP:88.62, 78.31, 76.62

car.fhd.onestage.config + 50 epochs + super converge (6.5 hours) + (25 fps on a 1080Ti):

Car AP@0.70, 0.70, 0.70:
bbox AP:97.65, 89.59, 88.72
bev  AP:90.38, 88.20, 86.98
3d   AP:89.16, 78.78, 77.41

Performance in NuScenes validation set (all.pp.config, NuScenes mini train set, 3517 samples, not v1.0-mini)

car Nusc dist AP@0.5, 1.0, 2.0, 4.0
62.90, 73.07, 76.77, 78.79
bicycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
0.00, 0.00, 0.00, 0.00
bus Nusc dist AP@0.5, 1.0, 2.0, 4.0
9.53, 26.17, 38.01, 40.60
construction_vehicle Nusc dist AP@0.5, 1.0, 2.0, 4.0
0.00, 0.00, 0.44, 1.43
motorcycle Nusc dist AP@0.5, 1.0, 2.0, 4.0
9.25, 12.90, 13.69, 14.11
pedestrian Nusc dist AP@0.5, 1.0, 2.0, 4.0
61.44, 62.61, 64.09, 66.35
traffic_cone Nusc dist AP@0.5, 1.0, 2.0, 4.0
11.63, 13.14, 15.81, 21.22
trailer Nusc dist AP@0.5, 1.0, 2.0, 4.0
0.80, 9.90, 17.61, 23.26
truck Nusc dist AP@0.5, 1.0, 2.0, 4.0
9.81, 21.40, 27.55, 30.34

Install

1. Clone code

git clone https://github.com/traveller59/second.pytorch.git
cd ./second.pytorch/second

2. Install Python package dependencies

It is recommended to use the Anaconda package manager.

conda install scikit-image scipy numba pillow matplotlib
pip install fire tensorboardX protobuf opencv-python

If you don't have Anaconda:

pip install numba scikit-image scipy pillow

Follow instructions in spconv to install spconv.

If you want to train with fp16 mixed precision (training is faster on RTX-series, Titan V/RTX and Tesla V100 GPUs, but I only have a 1080Ti), you need to install apex.

If you want to use NuScenes dataset, you need to install nuscenes-devkit.

3. Set up CUDA for numba (will be removed in the 1.6.0 release)

You need to add the following environment variables for numba.cuda; you can add them to ~/.bashrc:

export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice

4. Add second.pytorch/ to PYTHONPATH (see the example below)
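For example, a line like the following in ~/.bashrc (the path is wherever you cloned the repo):

export PYTHONPATH=$PYTHONPATH:/path/to/second.pytorch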

Prepare dataset

  • KITTI Dataset preparation

Download KITTI dataset and create some directories first:

└── KITTI_DATASET_ROOT
       ├── training    <-- 7481 train data
       |   ├── image_2 <-- for visualization
       |   ├── calib
       |   ├── label_2
       |   ├── velodyne
       |   └── velodyne_reduced <-- empty directory
       └── testing     <-- 7518 test data
           ├── image_2 <-- for visualization
           ├── calib
           ├── velodyne
           └── velodyne_reduced <-- empty directory

Then run

python create_data.py kitti_data_prep --data_path=KITTI_DATASET_ROOT

Download NuScenes dataset:

└── NUSCENES_TRAINVAL_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       └── v1.0-trainval <-- metadata and annotations
└── NUSCENES_TEST_DATASET_ROOT
       ├── samples       <-- key frames
       ├── sweeps        <-- frames without annotation
       ├── maps          <-- unused
       └── v1.0-test     <-- metadata

Then run

python create_data.py nuscenes_data_prep --data_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --max_sweeps=10
python create_data.py nuscenes_data_prep --data_path=NUSCENES_TEST_DATASET_ROOT --version="v1.0-test" --max_sweeps=10 --dataset_name="NuscenesDataset"

This will create a gt database without velocity. To add velocity, use the dataset name NuscenesDatasetVelo, as shown below.
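For example, a hedged variant of the trainval command above with only the dataset name changed:

python create_data.py nuscenes_data_prep --data_path=NUSCENES_TRAINVAL_DATASET_ROOT --version="v1.0-trainval" --max_sweeps=10 --dataset_name="NuscenesDatasetVelo"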

  • Modify config file

There are some paths that need to be configured in the config file:

train_input_reader: {
  ...
  database_sampler {
    database_info_path: "/path/to/dataset_dbinfos_train.pkl"
    ...
  }
  dataset: {
    dataset_class_name: "DATASET_NAME"
    kitti_info_path: "/path/to/dataset_infos_train.pkl"
    kitti_root_path: "DATASET_ROOT"
  }
}
...
eval_input_reader: {
  ...
  dataset: {
    dataset_class_name: "DATASET_NAME"
    kitti_info_path: "/path/to/dataset_infos_val.pkl"
    kitti_root_path: "DATASET_ROOT"
  }
}

Usage

train

I recommend using script.py to train and evaluate; see script.py for more details.

train with single GPU

python ./pytorch/train.py train --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir

train with multiple GPUs (needs testing; I only have one GPU)

Assume you have 4 GPUs and want to train with 3 GPUs:

CUDA_VISIBLE_DEVICES=0,1,3 python ./pytorch/train.py train --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir --multi_gpu=True

Note: the batch_size and num_workers in the config file are per-GPU; if you use multi-GPU, they will be multiplied by the number of GPUs. Don't modify them manually.

You need to modify the total step count in the config file. For example, 50 epochs = 15500 steps for car.lite.config on a single GPU; if you use 4 GPUs, divide steps and steps_per_eval by 4, as sketched below.
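A sketch of the arithmetic (illustrative Python, not part of the codebase):

# Scale single-GPU step counts for multi-GPU training.
single_gpu_steps = 15500               # 50 epochs, car.lite.config, one GPU
num_gpus = 4
steps = single_gpu_steps // num_gpus   # 3875 steps in the config for 4 GPUs
# divide steps_per_eval in the config by num_gpus the same way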

train with fp16 (mixed precision)

Modify the config file: set enable_mixed_precision to true.
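For example, somewhere in the config (the exact location of the field depends on the config schema):

enable_mixed_precision: true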

  • Make sure "/path/to/model_dir" doesn't exist if you want to train a new model. A new directory will be created if model_dir doesn't exist; otherwise checkpoints in it will be read.

  • The training process uses batch_size=6 by default for a 1080Ti; reduce the batch size if your GPU has less memory.

  • Currently only single-GPU training is supported, but training a model needs only 20 hours (165 epochs) on a single 1080Ti, and only 50 epochs to reach 78.3 AP with super convergence on car moderate 3D in the KITTI validation dataset.

evaluate

python ./pytorch/train.py evaluate --config_path=./configs/car.fhd.config --model_dir=/path/to/model_dir --measure_time=True --batch_size=1
  • Detection results will be saved as a result.pkl file in model_dir/eval_results/step_xxx, or in the official KITTI label format if you use --pickle_result=False; see the loading sketch below.
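A minimal sketch for inspecting the pickle output (assuming it loads with the standard library; the exact structure of the stored detections is not documented here):

import pickle

# Hypothetical path: substitute your model_dir and the actual step directory.
with open("/path/to/model_dir/eval_results/step_xxx/result.pkl", "rb") as f:
    detections = pickle.load(f)
print(type(detections), len(detections))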

pretrained model

You can download pretrained models from Google Drive. The car_fhd model corresponds to car.fhd.config.

Note that this pretrained model was trained before a sparse convolution bug was fixed, so the eval results may be slightly worse.

Docker (Deprecated; I can't push the docker image due to network problems.)

You can use a prebuilt docker for testing:

docker pull scrin/second-pytorch 

Then run:

nvidia-docker run -it --rm -v /media/yy/960evo/datasets/:/root/data -v $HOME/pretrained_models:/root/model --ipc=host second-pytorch:latest
python ./pytorch/train.py evaluate --config_path=./configs/car.config --model_dir=/root/model/car

Try Kitti Viewer Web

Major step

  1. run python ./kittiviewer/backend/main.py main --port=xxxx on your server or locally.

  2. run cd ./kittiviewer/frontend && python -m http.server to launch a local web server.

  3. open your browser and enter the frontend url (e.g. http://127.0.0.1:8000 by default).

  4. input backend url (e.g. http://127.0.0.1:16666)

  5. input root path, info path and det path (optional)

  6. click load, then loadDet (optional), input an image index at the bottom center of the screen, and press Enter.

Inference step

First, the load button must be clicked and the load must succeed.

  1. input checkpointPath and configPath.

  2. click buildNet.

  3. click inference.

GuidePic

Try Kitti Viewer (Deprecated)

You should use the kitti viewer, based on pyqt and pyqtgraph, to check data before training.

Run python ./kittiviewer/viewer.py and check the following picture to use the kitti viewer: GuidePic

Concepts

  • Kitti lidar box

A kitti lidar box consists of 7 elements: [x, y, z, w, l, h, rz]; see the figure.

Kitti Box Image

All training and inference code uses the kitti box format, so other formats need to be converted to KITTI format before training.

  • Kitti camera box

A kitti camera box consists of 7 elements: [x, y, z, l, h, w, ry]; a sketch of the element order follows.
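A small sketch of the two element orders above (illustrative only; a real lidar-to-camera conversion also transforms x, y, z with the calibration matrices, as box_np_ops.box_lidar_to_camera does, and maps rz to ry):

import numpy as np

lidar_box = np.array([10.0, 2.0, -1.0, 1.6, 3.9, 1.5, 0.3])  # [x, y, z, w, l, h, rz]
x, y, z, w, l, h, rz = lidar_box
# The camera layout swaps the size dims to [l, h, w]; x, y, z and the rotation
# here are still in lidar coordinates until the calibration transform is applied.
camera_order = np.array([x, y, z, l, h, w, rz])               # [x, y, z, l, h, w, ry]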

second.pytorch's People

Contributors

dashidhy, finddefinition, traveller59


second.pytorch's Issues

can't compile the nms_kernel.cu.cc

I met this problem while compiling "../cc/nms/nms_kernel.cu.cc" and "../cc/nms/nms.cc";
the log output is below:

/usr/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
nvcc -std=c++14 -c -o ../cc/nms/nms_kernel.cu.o ../cc/nms/nms_kernel.cu.cc -I/usr/local/cuda/include -x cu -Xcompiler -fPIC -arch=sm_52 --expt-relaxed-constexpr
nvcc fatal : Value 'c++14' is not defined for option 'std'
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/non_max_suppression/nms_cpu.py", line 10, in <module>
from second.core.non_max_suppression.nms import (
ModuleNotFoundError: No module named 'second.core.non_max_suppression.nms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib64/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/lib64/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
return [fn(*args) for args in chunk]
File "/usr/lib64/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
return [fn(*args) for args in chunk]
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/buildtools/command.py", line 256, in compile_func
raise RuntimeError("compile failed with retcode", ret.returncode)
RuntimeError: ('compile failed with retcode', 1)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "create_data.py", line 9, in <module>
from second.core import box_np_ops
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/box_np_ops.py", line 7, in <module>
from second.core.non_max_suppression.nms_gpu import rotate_iou_gpu_eval
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/non_max_suppression/__init__.py", line 1, in <module>
from second.core.non_max_suppression.nms_cpu import nms_jit, soft_nms_jit
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/core/non_max_suppression/nms_cpu.py", line 18, in <module>
cuda=True)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/buildtools/pybind11_build.py", line 113, in load_pb11
cmds, cwd, num_workers=num_workers, compiler=compiler)
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/buildtools/command.py", line 278, in compile_libraries
if any([r.returncode != 0 for r in rets]):
File "/mnt/data-3/data/wenyong.zheng/vxlnet/second.pytorch/second/utils/buildtools/command.py", line 278, in <listcomp>
if any([r.returncode != 0 for r in rets]):
File "/usr/lib64/python3.6/concurrent/futures/process.py", line 366, in _chain_from_iterable_of_lists
for element in iterable:
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
RuntimeError: ('compile failed with retcode', 1)

Error in save model during training

I tried to train a model with the command

python pytorch/train.py train --config_path=./configs/car_test.config --model_dir=./predicts

but encountered the following error message:

  File "pytorch/train.py", line 396, in train
    net.get_global_step())
  File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 173, in save_models
    save(model_dir, model, name, global_step, max_to_keep, keep_latest)
  File "/mine/KITTI/second.pytorch.mine/torchplus/train/checkpoint.py", line 107, in save
    os.remove(str(Path(model_dir) / ckpt_to_delete))
FileNotFoundError: [Errno 2] No such file or directory: 'predicts/predicts/voxelnet-2487.tckpt'

The path seems incorrect ('predicts' is doubled), which leads to the removal error.
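A guess at the cause, reproduced in isolation (hypothetical, based only on the traceback above): if ckpt_to_delete already carries the model_dir prefix, joining it onto model_dir again yields the doubled path.

from pathlib import Path

model_dir = "predicts"
ckpt_to_delete = "predicts/voxelnet-2487.tckpt"   # already prefixed
print(Path(model_dir) / ckpt_to_delete)           # predicts/predicts/voxelnet-2487.tckpt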

kitti viewer

I am now running my code on a server cluster; how can I view the results locally?

nvcc fatal : Value 'sm_75' is not defined for option 'gpu-architecture'

Everything worked perfectly with my old GTX 960, but after replacing it with a new RTX 2070, I get an nvcc error:

nvcc -std=c++11 -c -o ../cc/nms/nms_kernel.cu.o ../cc/nms/nms_kernel.cu.cc -I/usr/local/cuda/include -x cu -Xcompiler -fPIC -arch=sm_75 --expt-relaxed-constexpr

nvcc fatal : Value 'sm_75' is not defined for option 'gpu-architecture'
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/yangyang/second.pytorch/second/core/non_max_suppression/nms_cpu.py", line 10, in <module>
from second.core.non_max_suppression.nms import (
ModuleNotFoundError: No module named 'second.core.non_max_suppression.nms'

My environment is:

yangyang@WS016:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176


yangyang@WS016:~$ nvidia-smi
Tue Nov 20 12:04:05 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.73 Driver Version: 410.73 CUDA Version: 9.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:42:00.0 On | N/A |
| 0% 44C P0 51W / 175W | 342MiB / 7949MiB | 2% Default |
+-------------------------------+----------------------+----------------------+

BTW, by directly hacking the value of arch to sm_70 in ~/second.pytorch/second/utils/find.py, I can bypass this error, but I'm not sure if it will cause other problems.

Question about data augment for targets

Hi, I have two questions about data augmentation.
First, in the config file I notice you set the target random rotation to the range [-45°, 45°]:

groundtruth_rotation_uniform_noise: [-0.78539816, 0.78539816]

and random shifting at
groundtruth_localization_noise_std: [1.0, 1.0, 0.5]

My question is whether these two ranges are a little too wide, which may cause collisions between targets.
Second,
I notice you do not switch on this flag:
remove_points_after_sample: false

But I think this operation will help generate samples closer to real data. Have you ever tried it?

Thank you in advance.

Inference performance in KITTI dataset

Hello,

Thanks @traveller59 for sharing the code! I tried to implement it as a ROS node (repository link) and tested the performance with the KITTI raw dataset 2011_09_26_drive_0005. You can find a video at the YouTube link.

I suspect that I might have done something wrong and the performance could be improved. If any of you can check out my code and let me know if you have any suggestions / comments, feel free to do so. Thank you.

Best Regards,

Yuesong

train error

Hello, an error occurred; can you help me? Thank you.

Traceback (most recent call last):
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1500, in _ParseAbstractInteger
return int(text, 0)
ValueError: invalid literal for int() with base 0: 'upsample_strides'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1449, in _ConsumeInteger
result = ParseInteger(tokenizer.token, is_signed=is_signed, is_long=is_long)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1471, in ParseInteger
result = _ParseAbstractInteger(text, is_long=is_long)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1502, in _ParseAbstractInteger
raise ValueError('Couldn\'t parse integer: %s' % text)
ValueError: Couldn't parse integer: upsample_strides

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./pytorch/train.py", line 643, in <module>
fire.Fire()
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "./pytorch/train.py", line 107, in train
text_format.Merge(proto_str, config)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 536, in Merge
descriptor_pool=descriptor_pool)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 590, in MergeLines
return parser.MergeLines(lines, message)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 623, in MergeLines
self._ParseOrMerge(lines, message)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 638, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
merger(tokenizer, message, field)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 837, in _MergeMessageField
self._MergeField(tokenizer, sub_message)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
merger(tokenizer, message, field)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 837, in _MergeMessageField
self._MergeField(tokenizer, sub_message)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 763, in _MergeField
merger(tokenizer, message, field)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 837, in _MergeMessageField
self._MergeField(tokenizer, sub_message)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 757, in _MergeField
merger(tokenizer, message, field)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 878, in _MergeScalarField
value = _ConsumeUint32(tokenizer)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1377, in _ConsumeUint32
return _ConsumeInteger(tokenizer, is_signed=False, is_long=False)
File "/home/dingjiangang/anaconda3/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1451, in _ConsumeInteger
raise tokenizer.ParseError(str(e))
google.protobuf.text_format.ParseError: 27:7 : Couldn't parse integer: upsample_strides

sparse_rpn

self._total_forward_time += time.time() - t

Hi, many thanks for sharing this meaningful work.

self.sparse_rpn, at line 673 in voxelnet.py, is still not implemented. Do you plan to finish this function in the near future?

Regards,

Different state_dict in pre-trained model

Hello,

Thank you for sharing your code. This is such excellent work.

I am trying to get the pre-trained model working on my computer. I made the change to SparseConvNet according to your README and rebuilt SparseConvNet. However, when I try to run evaluation with your pre-trained model, I get the following error:
screenshot from 2018-11-02 14-57-24

From my understanding, this is caused by different names between the current convnet middle layer and the pretrained model, so I cannot load the pretrained model's convnet middle layer. I tried to load the pre-trained model with "load_state_dict(state_dict, strict=False)", but clearly all conv layers are still not loaded; this is verified by the source code. What would you suggest to fix it? I am thinking about opening the voxelnet-204270.tckpt file somehow and editing the state_dict, but without success.

Thanks for your time.
Vince

Van and truck detection

Could you kindly share the config files for other types of objects, e.g. van and truck?

Thanks in advance

training error

When I changed the sparse conv to normal conv, the errors below occurred:

Traceback (most recent call last):
File "pytorch/train.py", line 647, in <module>
fire.Fire()
File "/home/users/wenyong.zheng/.local/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/users/wenyong.zheng/.local/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/users/wenyong.zheng/.local/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "pytorch/train.py", line 402, in train
raise e
File "pytorch/train.py", line 249, in train
ret_dict = net(example_torch)
File "/home/users/wenyong.zheng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "../second/pytorch/models/voxelnet.py", line 679, in forward
voxel_features, coors, batch_size_dev)
File "/home/users/wenyong.zheng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "../second/pytorch/models/voxelnet.py", line 331, in forward
ret = scatter_nd(coors, voxel_features, output_shape)
File "../torchplus/ops/array_ops.py", line 20, in scatter_nd
ret[slices] = updates.view(*output_shape)
RuntimeError: tensors used as indices must be long or byte tensors
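A hedged workaround (untested): the error says index tensors must be long or byte, so casting the voxel coordinates before the scatter may help:

# sketch, not a verified fix: in voxelnet.py around the failing call
ret = scatter_nd(coors.long(), voxel_features, output_shape)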

Is the pretrained model necessary for the provided performance?

Hi, thanks for your work.
I wonder whether the pretrained model is necessary for the detection performance, because I tried to train from scratch and the results seemed terrible.

Besides, I cannot open Google Drive. Could you provide the pretrained models another way?

Unable to Run

The results look excellent; however, unfortunately I am unable to run it.

OS: Ubuntu 18.04
GPU: 1050ti

python3 ./pytorch/train.py evaluate --config_path=./configs/car.config --model_dir=pretrained_models/car/

This results in Segmentation fault (core dumped).

pip3 freeze

Do you have any video?

Hi, can you provide a video of what we should expect to see in Rviz (or any other kind of visualization tool)?
Thank you very much!

Could not initialize OpenGL

I got this problem, "Could not initialize OpenGL Aborted (core dumped)", when I try to run python ./kittiviewer/viewer.py. I don't know how to fix it.

New version of the sparse conv implementation?

Hello,

I have carefully checked your paper and your code. Nice work! Thanks for sharing this work.
May I know where the implementation of the sparse convolution mentioned in your paper is
(i.e. the GPU-based rule generation algorithm)?

Thanks,
Lin


eval error during training

Hi,
I am trying to train my own model with pytorch-1.0.
But it reports an EVALUATION error during training.

I am not familiar with pytorch and need help.
Thank you

......
......
step=6150, steptime=0.651, cls_loss=0.257, cls_loss_rt=0.186, loc_loss=0.466, loc_loss_rt=0.401, rpn_acc=0.997, prec@10=0.0855, rec@10=0.897, prec@30=0.539, rec@30=0.604, prec@50=0.808, rec@50=0.346, prec@70=0.981, rec@70=0.116, prec@80=0.986, rec@80=0.0299, prec@90=0.885, rec@90=0.000593, prec@95=0.087, rec@95=1.82e-06, loss.loc_elem=[0.00809, 0.00449, 0.0655, 0.0176, 0.0335, 0.0395, 0.0319], loss.cls_pos_rt=0.143, loss.cls_neg_rt=0.0429, loss.dir_rt=0.409, num_vox=16246, num_pos=71, num_neg=15578, num_anchors=15766, lr=0.0002, image_idx=851
#################################
# EVAL
#################################
Generate output labels...
Traceback (most recent call last):
  File "./pytorch/train.py", line 643, in <module>
    fire.Fire()
  File "/usr/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/usr/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/usr/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "./pytorch/train.py", line 398, in train
    raise e
  File "./pytorch/train.py", line 358, in train
    model_cfg.lidar_input)
  File "./pytorch/train.py", line 472, in predict_kitti_to_anno
    predictions_dicts = net(example)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/mine/KITTI/second.pytorch.orig/second/pytorch/models/voxelnet.py", line 746, in forward
    return self.predict(example, preds_dict)
  File "/mine/KITTI/second.pytorch.orig/second/pytorch/models/voxelnet.py", line 874, in predict
    device=total_scores.device).type_as(total_scores)
RuntimeError: legacy constructor for device type: cpu was passed device type: cuda, but device type must be: cpu

Error in kitti viewer

I tried to use the kitti viewer to show my training results.
There was no error when I clicked the load info and load detection buttons.
But when I clicked the plot button, the kitti viewer crashed.

Traceback (most recent call last):
  File "kittiviewer/viewer.py", line 1248, in on_plotButtonPressed
    if self.plot_all(image_idx):
  File "kittiviewer/viewer.py", line 1233, in plot_all
    self.draw_gt_in_image()
  File "kittiviewer/viewer.py", line 995, in draw_gt_in_image
    self.gt_boxes, rect, Trv2c)
  File "/root/second/second/core/box_np_ops.py", line 632, in box_lidar_to_camera
    xyz_lidar = data[:, 0:3]
TypeError: 'NoneType' object is not subscriptable
Aborted (core dumped)

[screenshot: selection_008]

How can I fix it? Thank you.

Training with a NVIDIA GTX 1050

Hello,
Sorry for asking another question.
I'm not sure if it's possible to train the neural network with an NVIDIA GTX 1050,
as it always says there was a CUDA Error (Out Of Memory).
I already changed the batch size from 3 to 1. Do I have to do something else?
(maybe changing the maximum voxels, ...?)
Thank you for your help!
Greetings
Patrick

RGB or point cloud data used?

Thanks a lot for your great work.

I have two questions about SECOND 3D detection. Can you give me some advice?

Q1: Is RGB or point cloud data used during detection?
Q2: Can you recommend some reference documentation to understand the SECOND theory?

Thank you!

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Hello, when I do training for people detection, the following error occurs:

Traceback (most recent call last):
File "./pytorch/train.py", line 643, in <module>
fire.Fire()
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "./pytorch/train.py", line 398, in train
raise e
File "./pytorch/train.py", line 245, in train
ret_dict = net(example_torch)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/torch/nn/modules/module.py", line 479, in __call__
result = self.forward(*input, **kwargs)
File "/home/yangyang/second.pytorch/second/pytorch/models/voxelnet.py", line 671, in forward
voxel_features = self.voxel_feature_extractor(voxels, num_points)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/torch/nn/modules/module.py", line 479, in __call__
result = self.forward(*input, **kwargs)
File "/home/yangyang/second.pytorch/second/pytorch/models/voxelnet.py", line 140, in forward
x = self.vfe1(features)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/torch/nn/modules/module.py", line 479, in __call__
result = self.forward(*input, **kwargs)
File "/home/yangyang/second.pytorch/second/pytorch/models/voxelnet.py", line 82, in forward
x = self.norm(x.permute(0, 2, 1).contiguous()).permute(0, 2,
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/torch/nn/modules/module.py", line 479, in __call__
result = self.forward(*input, **kwargs)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 67, in forward
exponential_average_factor, self.eps)
File "/home/yangyang/anaconda3/envs/dl-second/lib/python3.6/site-packages/torch/nn/functional.py", line 1429, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

My environment is: CUDA 9.0 on an Ubuntu 16.04 workstation.

BTW, the problem appeared after the old GTX 960 graphics card was replaced by the new RTX 2070; everything was OK before that.

Unique parallel algorithm

Hi.

I read your SECOND paper and became curious about the "unique parallel algorithm" mentioned on page 7.
What parallel algorithm did you use to obtain the unique output indexes?

Thanks in advance!

How to generate curves?

I've finished the training step, but I don't know how to generate the performance curve. Can you give me some help?

How to train multi-classes at the same time?

Hi, if I want to train it to detect multiple classes, what should I change?

model: {
  second: {
    voxel_generator {
      point_cloud_range : [0, -40, -3, 70.4, 40, 1]
      # point_cloud_range : [0, -32.0, -3, 52.8, 32.0, 1]
      voxel_size : [0.2, 0.2, 0.4]
      max_number_of_points_per_voxel : 35
    }

    num_class: 1
    voxel_feature_extractor: {
      module_class_name: "VoxelFeatureExtractor"
      num_filters: [32, 128]
      with_distance: false
    }
.....

RuntimeError

Hi, have you ever met the following bug? Every time the code runs to loss.backward(), the bug appears.

Traceback (most recent call last):
  File "/home/b/miniconda3/lib/python3.7/site-packages/pudb/__init__.py", line 119, in runscript
    dbg._runscript(mainpyfile)
  File "/home/b/miniconda3/lib/python3.7/site-packages/pudb/debugger.py", line 457, in _runscript
    self.run(statement, globals=globals_, locals=locals_)
  File "/home/b/miniconda3/lib/python3.7/bdb.py", line 585, in run 
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "./pytorch/train.py", line 653, in <module>
    fire.Fire()
  File "/home/b/miniconda3/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/home/b/miniconda3/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/home/b/miniconda3/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "./pytorch/train.py", line 408, in train
    raise e
  File "./pytorch/train.py", line 271, in train
    loss.backward()
  File "/home/b/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/b/miniconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function ConvolutionFunctionBackward returned an invalid gradient at index 0 - got [0] but expected shape compatible with [730, 512]

train.py core dumped in scn_input()

python3 ./second/pytorch/train.py train --config_path=./second/configs/car.config --model_dir=/media/1t/data/kitti/second_model
Segmentation fault (core dumped)

It seems to core dump at voxelnet.py, line 278:
ret = self.scn_input((coors.cpu(), voxel_features, batch_size))

Here is the docker file I used to generate the docker image. (I copied extension.h from pytorch 1.0.)

#docker build -f Dockerfile-python35-pytorch41  -t vacuum/pytorch:python35-pytorch41-simple-v1
From nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04
#From nvidia/cuda:9.1-base-ubuntu16.04
RUN apt update -y
RUN apt-get install software-properties-common python-software-properties -y
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt update -y && apt install -y \
    python3.6 \
    python3-pip \
    python3-tk \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libfontconfig1 \
    libxrender1 \
    vim \
    less \
    git 

RUN python3.6 -m pip install torch torchvision opencv-python
RUN python3.6 -m pip install shapely fire pybind11 pyqtgraph tensorboardX protobuf numba
RUN apt-get install libboost-all-dev -y
RUN apt-get install -y cuda-nvprof-9-1
RUN apt-get install -y libsparsehash-dev
RUN apt-get install -y python3.6-dev
RUN python3.6 -m pip install pillow
RUN rm -fr /usr/bin/python
RUN rm -fr /usr/bin/python3
RUN ln -s /usr/bin/python3.6 /usr/bin/python
RUN ln -s /usr/bin/python3.6 /usr/bin/python3
RUN git clone https://github.com/facebookresearch/SparseConvNet.git
COPY extension.h /usr/local/lib/python3.6/dist-packages/torch/lib/include/torch/extension.h
RUN cd SparseConvNet && bash build.sh && cd ..
ENV NUMBAPRO_CUDA_DRIVER /usr/lib/x86_64-linux-gnu/libcuda.so
ENV NUMBAPRO_NVVM /usr/local/cuda/nvvm/lib64/libnvvm.so
ENV NUMBAPRO_LIBDEVICE /usr/local/cuda/nvvm/libdevice
RUN apt install -y gdb psmisc

only 70 AP on car moderate validation dataset

Hi traveller, thank you for sharing such well-designed code.

I trained the model from scratch without any modification, but the performance (shown below) is far from the paper's. Do you know why? Maybe there's something wrong with the code?

Car AP@0.70, 0.70, 0.70:
bbox AP:87.55, 83.90, 77.61
bev AP:87.04, 83.55, 77.43
3d AP:80.62, 70.94, 65.31
aos AP:87.41, 83.10, 76.56
Car AP@0.50, 0.50, 0.50:
bbox AP:87.55, 83.90, 77.61
bev AP:88.46, 86.74, 85.87
3d AP:88.41, 86.22, 85.26
aos AP:87.41, 83.10, 76.56

Best Regards

error evaluate

When I tried to evaluate the trained model, I got:
python ./pytorch/train.py evaluate --config_path=./configs/car.config --model_dir=./data/models
/home/users/benjin.zhu/data2/libs/anaconda3.6/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/home/users/benjin.zhu/data2/libs/anaconda3.6/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
[ 11 400 352]
Restoring parameters from data/models/voxelnet-0.tckpt
remain number of infos: 3769
Generate output labels...
[1] 20341 segmentation fault  python ./pytorch/train.py evaluate --config_path=./configs/car.config

Then I modified convolution.py and submanifoldConvolution.py as described in the README, and I got:

RuntimeError: Error(s) in loading state_dict for VoxelNet:
size mismatch for middle_feature_extractor.middle_conv.0.weight: copying a param of torch.Size([27, 128, 64]) from checkpoint, where the shape is torch.Size([3456, 64]) in current model.
size mismatch for middle_feature_extractor.middle_conv.2.weight: copying a param of torch.Size([3, 64, 64]) from checkpoint, where the shape is torch.Size([192, 64]) in current model.
size mismatch for middle_feature_extractor.middle_conv.4.weight: copying a param of torch.Size([27, 64, 64]) from checkpoint, where the shape is torch.Size([1728, 64]) in current model.
size mismatch for middle_feature_extractor.middle_conv.6.weight: copying a param of torch.Size([27, 64, 64]) from checkpoint, where the shape is torch.Size([1728, 64]) in current model.
size mismatch for middle_feature_extractor.middle_conv.8.weight: copying a param of torch.Size([3, 64, 64]) from checkpoint, where the shape is torch.Size([192, 64]) in current model.

Discrepancy between the paper and your README

The AP for the hard case differs between the paper and the README.
For example, for 3D detection the README reports 74.66, but the paper has only 69.10.
The BEV AP for the hard case is also different.

Problem with nvcc

Hello,
I tried to execute the "create_data.py" Python file, but the nvcc command always fails,
saying "nvcc: not found":

g++ /tmp/tmppqrwb6ac.cc -o tmppqrwb6ac -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart
nvcc -std=c++11 -c -o ../cc/nms/nms_kernel.cu.o ../cc/nms/nms_kernel.cu.cc -I/usr/local/cuda/include -x cu -Xcompiler -fPIC -arch=sm_61 --expt-relaxed-constexpr 
/bin/sh: 1: nvcc: not found
concurrent.futures.process._RemoteTraceback: 
"""

Anyway, if I use nvcc normally in the shell, it works fine:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Sun_Oct_22_03:08:45_CDT_2017
Cuda compilation tools, release 9.0, V9.0.225

I use Ubuntu 16.04
with Python 3.6.6
and pyTorch 0.4.1

Thank you very much for your help!

Greetings Patrick

training error

While training the model, I met this problem:
Traceback (most recent call last):
File "pytorch/train.py", line 21, in <module>
from second.builder import target_assigner_builder, voxel_builder
File "../second/builder/target_assigner_builder.py", line 4, in <module>
from second.protos import target_pb2, anchors_pb2
File "../second/protos/target_pb2.py", line 16, in <module>
from second.protos import similarity_pb2 as second_dot_protos_dot_similarity__pb2
File "../second/protos/similarity_pb2.py", line 22, in <module>
serialized_pb=_b('\n\x1esecond/protos/similarity.proto\x12\rsecond.protos"\xff\x01\n\x1aRegionSimilarityCalculator\x12\x43\n\x15rotate_iou_similarity\x18\x01 \x01(\x0b\x32".second.protos.RotateIouSimilarityH\x00\x12\x45\n\x16nearest_iou_similarity\x18\x02 \x01(\x0b\x32#.second.protos.NearestIouSimilarityH\x00\x12@\n\x13\x64istance_similarity\x18\x03 \x01(\x0b\x32!.second.protos.DistanceSimilarityH\x00\x42\x13\n\x11region_similarity"\x15\n\x13RotateIouSimilarity"\x16\n\x14NearestIouSimilarity"Z\n\x12\x44istanceSimilarity\x12\x15\n\rdistance_norm\x18\x01 \x01(\x02\x12\x15\n\rwith_rotation\x18\x02 \x01(\x08\x12\x16\n\x0erotation_alpha\x18\x03 \x01(\x02\x62\x06proto3')
TypeError: new() got an unexpected keyword argument 'serialized_options'

Can I omit the argument 'serialized_options'?

Harvesting negatives from Don'tCare examples

Hi Yan. Thanks so much for making your code public.

I was wondering about the way Don'tCare examples (int label -1) are handled. Based on my understanding of the source code, it seems that these examples are removed (filtered) rather than given a weight of 0.0 for purposes of computing cls and reg loss. Doesn't this mean that some anchors overlapping with Don'tCare regions will be chosen as negatives by the TargetAssigner?

My suggestion is that all anchors having any overlap with a Don'tCare example should be given a target class of -1, similar to how anchors with IOU between 0.45 and 0.6 are handled. This is also how Faster R-CNN handles cross-boundary ground truth objects. What do you think?
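A sketch of that suggestion (illustrative only, not the repo's implementation): mark every anchor that overlaps a Don'tCare box with target class -1 so it is ignored rather than sampled as a negative.

import numpy as np

def ignore_dontcare_anchors(labels, anchor_dontcare_iou, iou_thresh=0.0):
    # labels: (N,) per-anchor class targets; anchor_dontcare_iou: (N,) max IoU
    # of each anchor with any Don'tCare box. -1 means "ignore" in the loss.
    labels = labels.copy()
    labels[anchor_dontcare_iou > iou_thresh] = -1
    return labels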

How to detect all points?

Hi,
I ran detection on VLP-16 data; the result is shown in RViz and it's pretty good.
But I have noticed that only points with x > 0 are detected. Do you know how to make the net detect all points?

[screenshot: car_dect]
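A hedged guess based on the configs quoted elsewhere on this page: the default point_cloud_range starts at x = 0, so points behind the sensor are cropped before detection. Extending the range to negative x (with matching voxel and anchor settings) should cover them:

voxel_generator {
  # default was: point_cloud_range : [0, -40, -3, 70.4, 40, 1]
  point_cloud_range : [-70.4, -40, -3, 70.4, 40, 1]
  voxel_size : [0.2, 0.2, 0.4]
  max_number_of_points_per_voxel : 35
}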

GPU-based 3D Rule Generation

Hi author, thank you very much for sharing such good code.

I have one question: the core idea of SECOND is to use GPU-based 3D rule generation to accelerate sparse convolution, but I couldn't find which part of the code in the repository does rule generation. Would you mind explaining how you implemented GPU-based 3D rule generation in your code?

How to inference

Hello
May I know how to run inference on an example from the kitti test dataset, or on a point cloud file in kitti format but without an info file? I have checked second/pytorch/inference.py; it seems a little different from train.py (e.g. should I create a reduced point cloud version of the original point cloud?).

Just want to make sure I am using the inference in the correct way.

Thanks in advance.
Lin
