3d-bounding-box-estimation-for-autonomous-driving's Introduction

3D Bounding Box Estimation for Autonomous Driving

This project is a full implementation of the paper "3D Bounding Box Estimation Using Deep Learning and Geometry", based on previous work by image-to-3d-bbox (https://github.com/experiencor/image-to-3d-bbox).

Dependencies:

  • Python 3.6
  • Tensorflow 1.12.0

Modifications and Improvements:

  1. No prior knowledge of the object location is needed. Instead of reducing the number of candidate corner configurations to 64 as in the paper's exhaustive search, the location of each object is solved analytically from its local orientation and 2D location (see the sketch after this list).

  2. Soft constraints are added to improve the stability of the 3D bounding box at certain locations.

  3. A MobileNetV2 backbone is used to significantly reduce the number of parameters and make the model fully convolutional.

  4. The orientation loss is changed to the correct form.

  5. Bird's-eye view visualization is added.
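
To illustrate item 1: once the dimensions and orientation are estimated, each side of the 2D box constrains one 3D box corner to project exactly onto it, giving four linear equations in the translation T that can be solved by least squares. Below is a minimal sketch of that linear solve (function and argument names are illustrative, not the repo's actual API, and the full solver must also search over the valid corner-to-edge assignments):

import numpy as np

def solve_translation(P2, ry, box2d, corners):
    # P2      : 3x4 KITTI camera projection matrix
    # ry      : global yaw (rotation around the camera y-axis)
    # box2d   : (xmin, ymin, xmax, ymax) of the 2D detection
    # corners : four object-frame 3D corners, one assigned to each 2D edge
    R = np.array([[ np.cos(ry), 0., np.sin(ry)],
                  [ 0.,         1., 0.        ],
                  [-np.sin(ry), 0., np.cos(ry)]])
    xmin, ymin, xmax, ymax = box2d
    # each pair (r, v): row index of P2 (0 -> u coordinate, 1 -> v) and the edge value
    edges = [(0, xmin), (1, ymin), (0, xmax), (1, ymax)]
    A, b = [], []
    for (r, v), X in zip(edges, corners):
        # derived from  P2[r] . [R X + T; 1] = v * P2[2] . [R X + T; 1]
        row = P2[r, :3] - v * P2[2, :3]
        A.append(row)
        b.append(v * P2[2, 3] - P2[r, 3] - row @ (R @ np.asarray(X)))
    T, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return T  # (tx, ty, tz) in the camera frame

Here each corner is expressed in the object frame, e.g. (±l/2, 0 or -h, ±w/2) under the KITTI convention where the origin sits at the bottom center of the box.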

Results on KITTI raw data:

MobileNetV2 with ground-truth 2D bounding boxes (example frame: 347.png).

Video: https://www.youtube.com/watch?v=IIReDnbLQAE

Train and Evaluate:

First, prepare your KITTI dataset in the following layout:

kitti_dateset/
├── 2011_09_26
│   └── 2011_09_26_drive_0084_sync
│           ├── box_3d       <- predicted data
│           ├── calib_02
│           ├── calib_cam_to_cam.txt
│           ├── calib_velo_to_cam.txt
│           ├── image_02
│           ├── label_02
│           └── tracklet_labels.xml
│
└── training
    ├── box_3d    <- predicted data
    ├── calib
    ├── image_2
    └── label_2

To train:

  1. Specify parameters in config.py.
  2. Run train.py to train the model:
python3 train.py

To predict:

  1. Change dir in read_dir.py to your prediction folder.
  2. Run prediction.py to predict 3D bounding boxes: set -d to your dataset directory, -a to the dataset type (train/val split or raw), and -w to the trained weights, as in the example below.
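
For example (the directory, split flag value, and weights filename here are placeholders, not fixed by the repo):

python3 prediction.py -d ./kitti_dateset -a training -w ./weights.h5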

To visualize 3D bounding box:

  1. Run visualization3Dbox.py: specify -s to choose between saving the figures and viewing the plot, and -p to set your output image folder (example below).
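
For example (flag values are placeholders):

python3 visualization3Dbox.py -s True -p ./kitti_dateset/training/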

Performance:

(w/o = without soft constraint, w/ = with soft constraint; Easy/Moderate/Hard are KITTI difficulty levels)

| backbone    | parameters / model size | inference time (s/img, CPU/GPU) | metric | w/o: Easy | w/o: Moderate | w/o: Hard | w/: Easy | w/: Moderate | w/: Hard |
|-------------|-------------------------|---------------------------------|--------|-----------|---------------|-----------|----------|--------------|----------|
| VGG         | 40.4 mil. / 323 MB      | 2.041 / 0.081                   | AP2D   | 100       | 100           | 100       | 100      | 100          | 100      |
|             |                         |                                 | AOS    | 99.98     | 99.82         | 99.57     | 99.98    | 99.82        | 99.57    |
|             |                         |                                 | APBV   | 26.42     | 28.15         | 27.74     | 32.89    | 29.40        | 33.46    |
|             |                         |                                 | AP3D   | 20.53     | 22.17         | 25.71     | 27.04    | 27.62        | 27.06    |
| MobileNetV2 | 2.2 mil. / 19 MB        | 0.410 / 0.113                   | AP2D   | 100       | 100           | 100       | 100      | 100          | 100      |
|             |                         |                                 | AOS    | 99.78     | 99.23         | 98.18     | 99.78    | 99.23        | 98.18    |
|             |                         |                                 | APBV   | 11.04     | 8.99          | 10.51     | 11.62    | 8.90         | 10.42    |
|             |                         |                                 | AP3D   | 7.98      | 7.95          | 9.32      | 10.42    | 7.99         | 9.32     |

Offline evaluation: 50% of the data for training / 50% for testing.
CPU: Intel Core i5 (7th gen); GPU: NVIDIA TITAN X.


3d-bounding-box-estimation-for-autonomous-driving's Issues

VGG model

If I use "VGG", should I download a pre-trained VGG model?

Question about the orientation loss implementation

Hello author:
Very nice reproduction. I have a question: is your implementation of the angle localization loss different from the one in the original paper? I have recently been reproducing this paper myself. Hoping for your advice, thanks!

Training early stopping

My training stopped at about epoch 50 and the program says "early stopping". What does this mean?

How to get the prediction or visualization of new images, not on Kitti?

I trained the model and it works fine on KITTI dataset images. I went through your prediction code and found that you use ground-truth labels to generate predictions: you create an image patch from the ground-truth 2D box and feed that patch to the trained model instead of the raw image. So what if I do not have ground-truth labels, which is the normal case for test images or videos?

How to plot a 3D bounding box without P2

Hi,

firstly, excellent repo, very nicely written code, love it!

But I am a bit confused by the math.

The model outputs two orientation values, the sine and cosine of the local orientation alpha, as predictions.

But looking at visualization3Dbox.py, it appears to use the P2 matrix provided by the calibration data.

So how will plotting work for an unseen image, when P2 is not available for it?

Let me know if I failed to explain myself. Thanks!
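
For context: in KITTI, P2 is the 3x4 projection matrix of the left color camera, so projecting the recovered 3D corners into the image does require calibration; for a non-KITTI camera one would substitute its own intrinsics. A minimal sketch of the projection step, with illustrative names:

import numpy as np

def project_to_image(P2, corners3d):
    # corners3d: (N, 3) box corners in the rectified camera frame
    pts = np.hstack([corners3d, np.ones((len(corners3d), 1))])  # (N, 4) homogeneous
    uvw = pts @ P2.T                                            # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]                             # perspective divide -> pixel (u, v)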

Camera Calibration Data ?

Hello, I have one question:

What kind of data do we need for the camera calibration? If we want to compute 3D bounding boxes for other cameras in particular, which calibration data do we need to get good results, or at least better ones?

Kind Regards.

Loss is Nan, VGG backbone not working

I was able to set up and run the training, but if I use VGG as the backbone, the training log looks like this:

ETA: 3:15:50 - loss: inf - dimensions_loss: inf - orientation_loss: 1.74
ETA: 2:11:50 - loss: nan - dimensions_loss: nan - orientation_loss: 1.69
ETA: 1:39:53 - loss: nan - dimensions_loss: nan - orientation_loss: 1.34

MobileNetV2 works fine, but its performance is not as good as VGG's. I really need to use VGG.

Some problems with the KITTI dataset

I have a problem preparing the KITTI dataset. I downloaded the dataset you mention, but the box_3d, label_02, and calib_02 folders are missing. Can you please give me the proper link that includes these folders?
Thank you

Confused about label files.

Hi Author, I am confused about the label_02 and calib_02 folders in this tree:

kitti_dateset/
├── 2011_09_26
│   └── 2011_09_26_drive_0084_sync
│           ├── box_3d       <- predicted data
│           ├── calib_02
│           ├── calib_cam_to_cam.txt
│           ├── calib_velo_to_cam.txt
│           ├── image_02
│           ├── label_02
│           └── tracklet_labels.xml

I don't find these in the KITTI raw data. Can you please explain where you got this data from?

About the orientation loss

@lzccccc Hi, you said the orientation loss was changed to the correct form. So is it different from the paper? I checked the code you uploaded, and I don't see any difference from the paper. Did I make a mistake or misunderstand you?
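
For reference, the localization part of the paper's MultiBin angle loss is L_loc = -(1/n) * sum_i cos(theta* - c_i - delta_theta_i), averaged over the n bins that cover the ground-truth angle. With a network that regresses a (cos, sin) pair per bin, one way to write this in Keras is sketched below; it illustrates the paper's formula and is not necessarily this repo's exact code:

import tensorflow as tf

def orientation_loss(y_true, y_pred):
    # y_true, y_pred: (batch, bins, 2) tensors of (cos, sin) pairs;
    # rows of y_true for uncovered bins are assumed to be all zeros
    y_pred = tf.nn.l2_normalize(y_pred, axis=-1)          # put (cos, sin) on the unit circle
    cos_diff = tf.reduce_sum(y_true * y_pred, axis=-1)    # cos(theta* - theta_pred) per bin
    covered = tf.reduce_sum(tf.square(y_true), axis=-1)   # ~1 for covered bins, 0 otherwise
    n = tf.maximum(tf.reduce_sum(covered, axis=-1), 1.0)  # number of covered bins per sample
    return -tf.reduce_mean(tf.reduce_sum(cos_diff, axis=-1) / n)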

Data?

Could you specify all the data, and the corresponding download links, needed for the training process? Thank you.

License?

Hi, I was just wondering about the license for the project, since it doesn't seem to have one. Is it OK for people to use the repo?

Training with an ImageNet pre-trained model

Hello lzccccc, good morning.
I want to know: when training with pretrained ImageNet weights, is the usage something like the code below? Do we need to set the layers to non-trainable or not? Training without any ImageNet pre-trained weights gives me bad results.

import tensorflow as tf
from tensorflow.keras import layers

# cfg() is the configuration object from this repo's config.py

inputs = layers.Input(shape=(cfg().norm_h, cfg().norm_w, 3))

# Block 1
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputs)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

# Block 2
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

# Block 3
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

# Block 4
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

# Block 5
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
x = layers.Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)

x = layers.GlobalAveragePooling2D()(x)

model1 = tf.keras.Model([inputs], [x], name='vgg16_1')
model1.load_weights('vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')
model1.summary()

# freeze all layers loaded from the ImageNet weights
for layer in model1.layers[:]:
    layer.trainable = False

for layer in model1.layers:
    print(layer, layer.trainable)
