3d-bounding-box-estimation-for-autonomous-driving's Introduction

3D Bounding Box Estimation for Autonomous Driving

This project is a full implementation of the paper "3D Bounding Box Estimation Using Deep Learning and Geometry", based on previous work by image-to-3d-bbox (https://github.com/experiencor/image-to-3d-bbox).

Dependencies:

  • Python 3.6
  • Tensorflow 1.12.0

Modifications and Improvements:

  1. No prior knowledge of the object location is needed. Instead of reducing the number of candidate corner configurations to 64 as in the paper's exhaustive search, the location of each object is solved analytically from its local orientation and 2D location (see the sketch after this list).

  2. Soft constraints are added to improve the stability of the 3D bounding box at certain locations.

  3. A MobileNetV2 backbone is used to significantly reduce the number of parameters and make the model fully convolutional.

  4. The orientation loss is changed to the correct form.

  5. Bird's-eye view visualization is added.
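
To illustrate item 1: once the dimensions and orientation are estimated, each side of the 2D box constrains one 3D box corner to project exactly onto it, giving four linear equations in the translation T that can be solved by least squares. Below is a minimal sketch of that linear solve (function and argument names are illustrative, not the repo's actual API, and the full solver must also search over the valid corner-to-edge assignments):

import numpy as np

def solve_translation(P2, ry, box2d, corners):
    # P2      : 3x4 KITTI camera projection matrix
    # ry      : global yaw (rotation around the camera y-axis)
    # box2d   : (xmin, ymin, xmax, ymax) of the 2D detection
    # corners : four object-frame 3D corners, one assigned to each 2D edge
    R = np.array([[ np.cos(ry), 0., np.sin(ry)],
                  [ 0.,         1., 0.        ],
                  [-np.sin(ry), 0., np.cos(ry)]])
    xmin, ymin, xmax, ymax = box2d
    # each pair (r, v): row index of P2 (0 -> u coordinate, 1 -> v) and the edge value
    edges = [(0, xmin), (1, ymin), (0, xmax), (1, ymax)]
    A, b = [], []
    for (r, v), X in zip(edges, corners):
        # derived from  P2[r] . [R X + T; 1] = v * P2[2] . [R X + T; 1]
        row = P2[r, :3] - v * P2[2, :3]
        A.append(row)
        b.append(v * P2[2, 3] - P2[r, 3] - row @ (R @ np.asarray(X)))
    T, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return T  # (tx, ty, tz) in the camera frame

Here each corner is expressed in the object frame, e.g. (±l/2, 0 or -h, ±w/2) under the KITTI convention where the origin sits at the bottom center of the box.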

Results on KITTI raw data:

MobileNetV2 with ground-truth 2D bounding boxes (example frame: 347.png).

Video: https://www.youtube.com/watch?v=IIReDnbLQAE

Train and Evaluate:

First, prepare your KITTI dataset in the following layout:

kitti_dateset/
├── 2011_09_26
│   └── 2011_09_26_drive_0084_sync
│           ├── box_3d       <- predicted data
│           ├── calib_02
│           ├── calib_cam_to_cam.txt
│           ├── calib_velo_to_cam.txt
│           ├── image_02
│           ├── label_02
│           └── tracklet_labels.xml
│
└── training
    ├── box_3d    <- predicted data
    ├── calib
    ├── image_2
    └── label_2

To train:

  1. Specify parameters in config.py.
  2. Run train.py to train the model:
python3 train.py

To predict:

  1. Change dir in read_dir.py to your prediction folder.
  2. Run prediction.py to predict 3D bounding boxes: set -d to your dataset directory, -a to the dataset type (train/val split or raw), and -w to the trained weights, as in the example below.
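
For example (the directory, split flag value, and weights filename here are placeholders, not fixed by the repo):

python3 prediction.py -d ./kitti_dateset -a training -w ./weights.h5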

To visualize 3D bounding box:

  1. Run visualization3Dbox.py: specify -s to choose between saving the figures and viewing the plot, and -p to set your output image folder (example below).
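
For example (flag values are placeholders):

python3 visualization3Dbox.py -s True -p ./kitti_dateset/training/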

Performance:

(w/o = without soft constraint, w/ = with soft constraint; Easy/Moderate/Hard are KITTI difficulty levels)

| backbone    | parameters / model size | inference time (s/img, CPU/GPU) | metric | w/o: Easy | w/o: Moderate | w/o: Hard | w/: Easy | w/: Moderate | w/: Hard |
|-------------|-------------------------|---------------------------------|--------|-----------|---------------|-----------|----------|--------------|----------|
| VGG         | 40.4 mil. / 323 MB      | 2.041 / 0.081                   | AP2D   | 100       | 100           | 100       | 100      | 100          | 100      |
|             |                         |                                 | AOS    | 99.98     | 99.82         | 99.57     | 99.98    | 99.82        | 99.57    |
|             |                         |                                 | APBV   | 26.42     | 28.15         | 27.74     | 32.89    | 29.40        | 33.46    |
|             |                         |                                 | AP3D   | 20.53     | 22.17         | 25.71     | 27.04    | 27.62        | 27.06    |
| MobileNetV2 | 2.2 mil. / 19 MB        | 0.410 / 0.113                   | AP2D   | 100       | 100           | 100       | 100      | 100          | 100      |
|             |                         |                                 | AOS    | 99.78     | 99.23         | 98.18     | 99.78    | 99.23        | 98.18    |
|             |                         |                                 | APBV   | 11.04     | 8.99          | 10.51     | 11.62    | 8.90         | 10.42    |
|             |                         |                                 | AP3D   | 7.98      | 7.95          | 9.32      | 10.42    | 7.99         | 9.32     |

Offline evaluation: 50% of the data for training / 50% for testing.
CPU: Intel Core i5 (7th gen); GPU: NVIDIA TITAN X.


3d-bounding-box-estimation-for-autonomous-driving's Issues

VGG model

If I use "VGG", should I download a pre-trained VGG model?

Question about the orientation loss implementation

Hello author:
Very nice reproduction. I have a question: is your implementation of the angle localization loss different from the one in the original paper? I have recently been reproducing this paper myself. Hoping for your advice, thanks!

Training early stopping

My training stopped at about epoch 50 and the program says "early stopping". What does this mean?

How to get the prediction or visualization of new images, not on Kitti?

I trained the model and it works fine on KITTI dataset images. I went through your prediction code and found that you use ground-truth labels to generate predictions: you create an image patch from the ground-truth 2D box and feed that patch to the trained model instead of the raw image. So what if I do not have ground-truth labels, which is the normal case for test images or videos?

How to plot a 3D bounding box without P2

Hi,

firstly, excellent repo, very nicely written code, love it!

But I am a bit confused by the math.

The model outputs two orientation values, the sine and cosine of the local orientation alpha, as predictions.

But looking at visualization3Dbox.py, it appears to use the P2 matrix provided by the calibration data.

So how will plotting work for an unseen image, when P2 is not available for it?

Let me know if I failed to explain myself. Thanks!
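
For context: in KITTI, P2 is the 3x4 projection matrix of the left color camera, so projecting the recovered 3D corners into the image does require calibration; for a non-KITTI camera one would substitute its own intrinsics. A minimal sketch of the projection step, with illustrative names:

import numpy as np

def project_to_image(P2, corners3d):
    # corners3d: (N, 3) box corners in the rectified camera frame
    pts = np.hstack([corners3d, np.ones((len(corners3d), 1))])  # (N, 4) homogeneous
    uvw = pts @ P2.T                                            # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]                             # perspective divide -> pixel (u, v)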

Camera Calibration Data ?

Hello, I have one question:

What kind of data do we need for the camera calibration? If we want to compute 3D bounding boxes for other cameras in particular, which calibration data do we need to get good results, or at least better ones?

Kind Regards.

Loss is Nan, VGG backbone not working

I was able to set up and run the training, but if I use VGG as the backbone, the training log looks like this:

ETA: 3:15:50 - loss: inf - dimensions_loss: inf - orientation_loss: 1.74
ETA: 2:11:50 - loss: nan - dimensions_loss: nan - orientation_loss: 1.69
ETA: 1:39:53 - loss: nan - dimensions_loss: nan - orientation_loss: 1.34

MobileNetV2 works fine, but its performance is not as good as VGG's. I really need to use VGG.

Some problems with the KITTI dataset

I have a problem preparing the KITTI dataset. I downloaded the dataset you mention, but the box_3d, label_02, and calib_02 folders are missing. Can you please give me the proper link that includes these folders?
Thank you

Confused about label files.

Hi Author, I am confused about the label_02 and calib_02 folders in this tree:

kitti_dateset/
├── 2011_09_26
│   └── 2011_09_26_drive_0084_sync
│           ├── box_3d       <- predicted data
│           ├── calib_02
│           ├── calib_cam_to_cam.txt
│           ├── calib_velo_to_cam.txt
│           ├── image_02
│           ├── label_02
│           └── tracklet_labels.xml

I don't find these in the KITTI raw data. Can you please explain where you got this data from?

About the orientation loss

@lzccccc Hi, you said the orientation loss was changed to the correct form. So is it different from the paper? I checked the code you uploaded, and I don't see any difference from the paper. Did I make a mistake or misunderstand you?
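
For reference, the localization part of the paper's MultiBin angle loss is L_loc = -(1/n) * sum_i cos(theta* - c_i - delta_theta_i), averaged over the n bins that cover the ground-truth angle. With a network that regresses a (cos, sin) pair per bin, one way to write this in Keras is sketched below; it illustrates the paper's formula and is not necessarily this repo's exact code:

import tensorflow as tf

def orientation_loss(y_true, y_pred):
    # y_true, y_pred: (batch, bins, 2) tensors of (cos, sin) pairs;
    # rows of y_true for uncovered bins are assumed to be all zeros
    y_pred = tf.nn.l2_normalize(y_pred, axis=-1)          # put (cos, sin) on the unit circle
    cos_diff = tf.reduce_sum(y_true * y_pred, axis=-1)    # cos(theta* - theta_pred) per bin
    covered = tf.reduce_sum(tf.square(y_true), axis=-1)   # ~1 for covered bins, 0 otherwise
    n = tf.maximum(tf.reduce_sum(covered, axis=-1), 1.0)  # number of covered bins per sample
    return -tf.reduce_mean(tf.reduce_sum(cos_diff, axis=-1) / n)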

Data?

Could you specify all the data, and the corresponding download links, needed for the training process? Thank you.

License?

Hi, I was just wondering about the license for the project, since it doesn't seem to have one. Is it OK for people to use the repo?

Training with an ImageNet pre-trained model

Hello lzccccc, good morning.
I want to know: when training with pretrained ImageNet weights, is the usage something like the code below? Do we need to set the layers to non-trainable or not? Training without any ImageNet pre-trained weights gives me bad results.

import tensorflow as tf
from tensorflow.keras import layers

# cfg() is the configuration object from this repo's config.py

inputs = layers.Input(shape=(cfg().norm_h, cfg().norm_w, 3))

# Block 1
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputs)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

# Block 3
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

# Block 4
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

# Block 5
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

x = layers.GlobalAveragePooling2D()(x)

model1 = tf.keras.Model([inputs], [x], name='vgg16_1')
model1.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')
model1.summary()

# freeze all layers loaded from the ImageNet weights
for layer in model1.layers[:]:
    layer.trainable = False

for layer in model1.layers:
    print(layer, layer.trainable)
