robert-junwang / pelee

Pelee: A Real-Time Object Detection System on Mobile Devices

License: Apache License 2.0

Python 13.94% Jupyter Notebook 86.06%

pelee's Introduction

Pelee: A Real-Time Object Detection System on Mobile Devices

This repository contains the code for the following paper.

Pelee: A Real-Time Object Detection System on Mobile Devices (NeurIPS 2018)

The code is based on the SSD framework.

Citation

If you find this work useful in your research, please consider citing:


@incollection{NIPS2018_7466,
title = {Pelee: A Real-Time Object Detection System on Mobile Devices},
author = {Wang, Robert J and Li, Xiang and Ling, Charles X},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {1967--1976},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7466-pelee-a-real-time-object-detection-system-on-mobile-devices.pdf}
}

Results on VOC 2007

The table below shows the results on PASCAL VOC 2007 test.

| Method | mAP (%) | FPS (Intel i7) | FPS (NVIDIA TX2) | FPS (iPhone 8) | # parameters |
|---|---|---|---|---|---|
| YOLOv2-288 | 69.0 | 1.0 | - | - | 58.0M |
| DSOD300_smallest | 73.6 | 1.3 | - | - | 5.9M |
| Tiny-YOLOv2 | 57.1 | 2.4 | - | 23.8 | 15.9M |
| SSD+MobileNet | 68.0 | 6.1 | 82 | 22.8 | 5.8M |
| Pelee | 70.9 | 6.7 | 125 | 23.6 | 5.4M |

mAP (%) with different training data:

| Method | 07+12 | 07+12+coco |
|---|---|---|
| SSD300 | 77.2 | 81.2 |
| SSD+MobileNet | 68.0 | 72.7 |
| Pelee | 70.9 | 76.4 |

Results on COCO

The table below shows the results on COCO test-dev2015.

| Method | mAP@[0.5:0.95] | mAP@0.5 | mAP@0.75 | FPS (NVIDIA TX2) | # parameters |
|---|---|---|---|---|---|
| SSD300 | 25.1 | 43.1 | 25.8 | - | 34.30 M |
| YOLOv2-416 | 21.6 | 44.0 | 19.2 | 32.2 | 67.43 M |
| YOLOv3-320 | - | 51.5 | - | 21.5 | 67.43 M |
| TinyYOLOv3-416 | - | 33.1 | - | 105 | 12.3 M |
| SSD+MobileNet-300 | 18.8 | - | - | 80 | 6.80 M |
| SSDLite+MobileNet V2-320 | 22.0 | - | - | 61 | 6.80 M |
| Pelee-304 | 22.4 | 38.3 | 22.9 | 120 | 5.98 M |

Preparation

  1. Install SSD (https://github.com/weiliu89/caffe/tree/ssd) following the instructions there, including: (1) Install SSD caffe; (2) Download PASCAL VOC 2007 and 2012 datasets; and (3) Create LMDB file. Make sure you can run it without any errors.

  2. Download the pretrained PeleeNet model. By default, we assume the model is stored in $CAFFE_ROOT/models/

  3. Clone this repository and create a soft link to $CAFFE_ROOT/examples

git clone https://github.com/Robert-JunWang/Pelee.git
ln -sf `pwd`/Pelee $CAFFE_ROOT/examples/pelee
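Before launching training, it can help to verify the layout. The snippet below is a minimal sketch, not part of the repository; the paths are assumptions based on the default SSD data scripts and the pretrained-model name used elsewhere on this page, so adjust them to your setup.

import os

# Quick sanity check of the layout expected by train_voc.py.
caffe_root = os.environ.get('CAFFE_ROOT', '.')
expected = [
    'examples/VOC0712/VOC0712_trainval_lmdb',   # LMDB created by the SSD data scripts (assumed path)
    'examples/VOC0712/VOC0712_test_lmdb',
    'models/peleenet_inet_acc7243.caffemodel',  # pretrained PeleeNet weights (assumed filename)
    'examples/pelee/train_voc.py',              # soft link created above
]
for rel in expected:
    path = os.path.join(caffe_root, rel)
    print(('OK      ' if os.path.exists(path) else 'MISSING ') + path)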

Training & Testing

  • Train a Pelee model on VOC 07+12:

    cd $CAFFE_ROOT
    python examples/pelee/train_voc.py
  • Evaluate the model:

    cd $CAFFE_ROOT
    python examples/pelee/eval_voc.py
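    To evaluate a specific checkpoint rather than the default, eval_voc.py also accepts a --weights argument. A hedged example, using the checkpoint path quoted in the issue reports further down this page (your path may differ):

    cd $CAFFE_ROOT
    python examples/pelee/eval_voc.py --weights=models/pelee/VOC0712/SSD_304x304/pelee_304x304_acc7094.caffemodel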
    
    

Models

pelee's People

Contributors

robert-junwang


pelee's Issues

pelee speed

Hi, what is the Pelee test time on a 1080Ti?

Paper Clarification

@Robert-JunWang Hi

The paper states:
"Residual prediction block make it possible for us to apply 1x1 convolutional kernels to predict the category scores and box offsets."

Why can 1x1 kernels be used here?

The idea of the Residual Prediction Block comes from the paper "Residual Features and Unified Prediction Network for Single Stage Detection".

However, in the code provided here, the prediction kernels are still 3x3.
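For readers puzzling over the same passage: below is a minimal PyTorch sketch of a residual prediction block as described in the paper, not the repository's Caffe implementation. The intermediate channel widths (128/256), the 704 input channels, and the 6 priors are assumptions for illustration; the point is that once the ResBlock sits in front of the feature map, the loc/conf heads themselves can be plain 1x1 convolutions.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block placed on each feature map before prediction."""
    def __init__(self, in_ch, mid_ch=128, out_ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)

# With the ResBlock in front, 1x1 heads are enough to predict scores/offsets.
num_priors, num_classes = 6, 21          # assumed values for illustration
res = ResBlock(in_ch=704)                # 704 input channels is an assumption
loc_head = nn.Conv2d(256, num_priors * 4, kernel_size=1)
conf_head = nn.Conv2d(256, num_priors * num_classes, kernel_size=1)

feat = torch.randn(1, 704, 19, 19)       # e.g. a 19x19 feature map
out = res(feat)
print(loc_head(out).shape, conf_head(out).shape)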

try TensorRT on pelee

Has anyone tried TensorRT on Pelee?
How is the performance?
Could you share your experience with us?

How

How do we use the merged prototxt? Do we need to modify the convolution layer code?

Any plan to add a detection program?

Hi guys, thank you for your work and contribution. I would like to know whether there is any plan to add a simple C++ or Python script that only performs inference on a single image (just like ssd_detect.py)?

Is there something wrong with pycaffe?

Hi, @Robert-JunWang
Thanks for your great work; it's really fast. I trained PeleeNet on my own data and got an excellent mAP on the test set. But when I use pycaffe to run detection on a single image and check the results, I get nothing: the 'detection_output' blob contains only [0, -1, -1, -1, -1, -1, -1]. I'm sure the code is OK.
Is there something in PeleeNet that pycaffe does not support?
Thanks!
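For context on the [0, -1, ...] output (my reading of the Caffe SSD convention, not an official answer): each detection_out row is [image_id, label, score, xmin, ymin, xmax, ymax], and a row with label -1 is a placeholder emitted when nothing passes the confidence/NMS thresholds. A tiny sketch of filtering such rows:

import numpy as np

# Example output mimicking the "no detection" case reported above.
det = np.array([[0, -1, -1, -1, -1, -1, -1]], dtype=np.float32)

valid = det[det[:, 1] != -1]   # drop placeholder rows (label == -1)
print(valid.shape)             # (0, 7): nothing was detected above threshold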

Small object detection

@Robert-JunWang Hello, thanks for your great work! I want to use this network to detect small objects in a drone dataset. The images are about 1400x1080 px, and the objects are only about 50x50 px. The training mbox_loss stays at 4-5 after 20,000 iterations, and the detection accuracy is only 30%. Are there any suggestions for modifying the network, or any tricks for the training phase? Please help me, thank you!

Can't start training

I cloned Pelee and created a soft link to $CAFFE_ROOT/examples.
But when I run python examples/pelee/train_voc.py, I get an error: No such file or directory.
Please help me. Thanks!

training on custom data

Hi,
Thanks for this new approach to CNN object detection.
I have to train this on my custom data, but the steps are not clear from the documentation. What are they?

Also, how can the trained model be used in C++ code?

training script does not work in CPU mode?

python examples/pelee/train_voc.py
Because I have not installed cuDNN yet, I can only run training in CPU mode, and it does not work.
Even after commenting out the GPU-related lines and making changes such as:
gpus = # "0"
num_gpus = 0 # len(gpulist)

it still fails. The core-dump output is:

I0505 13:59:09.810305 15159 net.cpp:157] Top shape: 32 16 10 10 (51200)
I0505 13:59:09.810309 15159 net.cpp:165] Memory required for data: 5256052864
I0505 13:59:09.810313 15159 layer_factory.hpp:77] Creating layer stage4_4/branch2a
I0505 13:59:09.810322 15159 net.cpp:100] Creating Layer stage4_4/branch2a
I0505 13:59:09.810328 15159 net.cpp:434] stage4_4/branch2a <- stage4_3/concat_stage4_3/concat_0_split_1
I0505 13:59:09.810335 15159 net.cpp:408] stage4_4/branch2a -> stage4_4/branch2a
@ (nil) (unknown)
Aborted (core dumped)

@Robert-JunWang, could you help me figure it out? Thanks.

is there 14x14 pooling in pelee net?

Hello, I read your code and reproduced it in PyTorch. I found a 14x14 pooling on top of the inner-product layer; am I wrong? Also, why does the last transition layer not perform a pooling operation? Does this 'big' pooling work well?

does it cost 6 min per 50 iterations during training?

The batch size is 32 (x4 = 128) on 4 GPUs. When I measure with build/tools/caffe time, the reported time is much shorter, about 1 minute 27 seconds. I am confused by this; can anyone help me? Thanks.

Hi, I want to test Pelee + SSD using a webcam

Hi, I am studying object detection. I read your paper, and now I am trying to test Pelee + SSD with a webcam.
I changed the ssd_pascal_webcam.py source: I removed the VGGNet body and used Pelee instead.

But it does not seem to detect properly, and I am not sure I am doing it correctly.
I used the pelee_voc pretrained model.

Thanks

Paper Clarification

@Robert-JunWang Hi

The paper states:
"To compensate for the negative impact on accuracy caused by this change, we use a shallow and wide network structure. We also add a 1x1 convolution layer after the last dense block to get the stronger representational abilities."

This design does not seem to appear in the code.

How do I get testing accuracy?

When I run python examples/pelee/eval_voc.py --weights=models/pelee/VOC0712/SSD_304x304/pelee_304x304_acc7094.caffemodel, it runs and produces the output shown in the attached screenshot (2018-05-28, 6:04 pm).
The only results I get are the files in the /data/VOCdevkit/results/VOC2007/SSD_304x304/Main/ directory, shown in the second screenshot (2018-05-28, 6:05 pm).

How do I get the overall test-set accuracy in terms of mAP, loss, etc.?

19x19 features are fed twice

According to the dot file I generated for deploy_merged.prototxt, the 19x19 feature maps are fed to the confidence and location extraction layers twice. Is this intentional or a copy/paste error?

pelee
deploy_merged.pdf

detection_eval= 0.000040

Hello,
when I evaluate the model with:
cd $CAFFE_ROOT
python examples/pelee/eval_voc.py
I get the result shown in the attached screenshot (pelee1). I used the pretrained model.

BatchNorm

Why don't you use a BatchNorm layer? Does it hurt the performance?

test accuracy

Hello, I'm testing on VOC with the trained models you provided. The pelee_SSD_304x304_iter_112000.caffemodel model gives an mAP of 0.701925, while the mAP given in your paper is 70.9. Similarly, the pelee_304x304_voc_coco_iter2k_7637.caffemodel model gives an mAP of 0.710299, while your result is 76.4. I don't know whether there is a problem with my testing or whether additional tricks are needed. Does anyone have the same problem? Thank you.

train_voc.py about mAP

Hello, the mAP I get when training on VOC0712 is 69.9, one percentage point lower than the reported 70.9. The pretrained caffemodel I used is peleenet_inet_acc7243.caffemodel. Do you know why?

Error when changing the number of classes

Hi,

@Robert-JunWang When I change num_classes in train_voc.py to 2 for my custom object detection, I get the following error:

F0628 20:45:42.785300 15254 multibox_loss_layer.cpp:139] Check failed: num_priors_ * loc_classes_ * 4 == bottom[0]->channels() (6856 vs. 17600) Number of priors must match number of location predictions.

Do I have to change the mbox_loss parameters to match the priors? With the original SSD, there was no problem when I changed num_classes to 2.
Thanks!

Bounding box sizes too large

Robert, many thanks for your great work!

I am having trouble understanding why I am getting larger than expected bounding boxes for Pelee detections.

The heights and widths are not as tightly cropped as in MobileNet-SSD implementations. I have read that you trained the model with PyTorch; could the convolution padding be a problem, or is there something else I have missed?

Many Thanks,

Simon

I am using the following Python script for my test:

import os
import cv2
import numpy as np
import caffe

net_file = 'pelee.prototxt'
caffe_model = 'pelee_304x304_acc7637.caffemodel'
test_dir = "images"

if not os.path.exists(caffe_model):
    print("caffemodel does not exist")
    exit()
net = caffe.Net(net_file,caffe_model,caffe.TEST)  

CLASSES = ('background',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')

def preprocess(src):
    # Resize to the 304x304 network input, subtract the BGR channel means,
    # and apply the 0.017 scale factor used for PeleeNet training.
    img = cv2.resize(src, (304, 304))
    img_mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)
    img = img.astype(np.float32, copy=True) - img_mean
    img = img * 0.017
    return img

def postprocess(img, out):
    # Scale the normalized [xmin, ymin, xmax, ymax] coordinates back to the
    # original image size and return boxes, confidences, and class ids.
    h = img.shape[0]
    w = img.shape[1]
    box = out['detection_out'][0,0,:,3:7] * np.array([w, h, w, h])
    cls = out['detection_out'][0,0,:,1]
    conf = out['detection_out'][0,0,:,2]
    return (box.astype(np.int32), conf, cls)

def detect(imgfile, thresh):
    origimg = cv2.imread(imgfile)
    img = preprocess(origimg)
    img = img.astype(np.float32)
    img = img.transpose((2, 0, 1))

    net.blobs['data'].data[...] = img
    out = net.forward()  
    box, conf, cls = postprocess(origimg, out)
    for i in range(len(box)):
       if conf[i] > thresh :
          p1 = (box[i][0], box[i][1])
          p2 = (box[i][2], box[i][3])
          cv2.rectangle(origimg, p1, p2, (0,255,0))
          p3 = (max(p1[0], 15), max(p1[1], 15))
          title = "%s:%.2f" % (CLASSES[int(cls[i])], conf[i])
          cv2.putText(origimg, title, p3, cv2.FONT_ITALIC, 0.6, (0, 255, 0), 1)
    cv2.imshow("Pelee", origimg)
 
    k = cv2.waitKey(0) & 0xff
    if k == 27 : return False
    return True

for f in os.listdir(test_dir):
    if detect(test_dir + "/" + f, 0.2) == False:
       break

ssd_screenshot_04 05 2018

About model size

Hi all, in the paper the trained model is 5.4M (parameters) on PASCAL VOC 2007 test. However, the trained model linked by the author is about 20 MB. Is there some difference in the network? Can anyone explain this for me? Thanks!
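A possible way to reconcile the two numbers (my own back-of-envelope, not the author's explanation): 5.4M in the results table is a parameter count, while the download size is the .caffemodel file storing those parameters as 32-bit floats.

num_params = 5.4e6                            # parameter count from the VOC 2007 table above
bytes_per_param = 4                           # float32 weights
print(num_params * bytes_per_param / 1e6)     # ~21.6 MB, close to the ~20 MB download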

train/test.prototxt

@Robert-JunWang

I found that in the final prediction, stage4_tb/ext/pm2/res is used to make two predictions; except for the min_size and max_size settings in the "PriorBox" layers, everything else is the same.

Why do you make two separate predictions instead of merging them? What is the advantage of having these two predictions?

./build/tools/caffe: not found

Hello, thank you for your code!
When I run train_voc.py, I run into a problem:

args: Namespace(arch='pelee', batch_size=32, image_size=304, kernel_size=1, lr=0.005, posfix='', run_soon=True, step_value=[80000, 100000, 120000], weight_decay=0.0005, weights='models/peleenet_inet_acc7243.caffemodel')
jobs/pelee/VOC0712/SSD_304x304/pelee_SSD_304x304.sh: 2: jobs/pelee/VOC0712/SSD_304x304/pelee_SSD_304x304.sh: ./build/tools/caffe: not found

How can I solve it? I would appreciate any advice. Thank you!

fps of Pelee on TX1

First of all, thank you for the good work.

Including pre- and post-processing, I get a detection rate of 14 fps on an Nvidia TX1 board, without using the TensorRT engine. These are pretty good results, but I would still like your comments: is this reasonable with respect to the fps figures given for the iPhone?

Pretrained model does not seem to give good detection_eval

First of all many thanks for providing the code, greatly appreciated!!

I have run your evaluation code on your pre-trained model, i.e.

python examples/pelee/eval_voc.py --weights=models/pelee/peleenet_inet_acc7243.caffemodel

which, to my surprise, gives a loss value of loss=29.92 and detection_eval = 0.002 on the PASCAL VOC validation data.

I have then retrained the model for 120,000 iterations and I am obtaining a loss value of loss=1.81 and detection_eval = 0.705, so fairly close to what you have published.

Would you maybe be so kind as to look once more at the pre-trained model at https://drive.google.com/file/d/1OBzEnD5VEB_q_B8YkLx-i3PMHVO-wagk/view?usp=sharing? Does that model give you good validation results?

Cannot match performance on COCO

Hi @Robert-JunWang ,

Thanks for the contribution.
I downloaded and tested your trained model using the COCO API, but I only get 21.8% mAP instead of the 22.4% reported in the paper. The detailed results are:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.218
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.376
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.219
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.034
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.209
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.212
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.309
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.072
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.345
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.562

Could you provide your evaluation code to get the target evaluation performance? Thank you!

MAX vs. AVE Pooling

Hi Robert,

I noticed that you commented out line 80 in the peleenet.py file, which is the max-pooling layer that would replace the original average-pooling layer. Could you share your experience with the performance difference between the two options? Thank you!
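For anyone experimenting with the same swap, a tiny PyTorch sketch of the two options; the channel count and spatial size are placeholders, not PeleeNet's actual values.

import torch
import torch.nn as nn

x = torch.randn(1, 704, 7, 7)            # hypothetical final feature map
avg = nn.AdaptiveAvgPool2d(1)(x)         # average pooling (the default described above)
mx = nn.AdaptiveMaxPool2d(1)(x)          # the commented-out max-pooling alternative
print(avg.shape, mx.shape)               # both: torch.Size([1, 704, 1, 1])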

MXNet model

Can you share a pretrained model in MXNet? I want to try this network. Thanks.

Train on voc07+12+coco dataset mAP problem

Hi, I use the COCO model you uploaded; its mAP is 38.3 (IoU = 0.5).
Then I fine-tune the model on the VOC0712 dataset with the following settings:

  • I follow the uploaded voc0712+coco model's min_size and max_size settings, e.g. (21.28, 45.6), (45.6, 100.32), (100.32, 155.04), (155.04, 209.76), (209.76, 264.48), (264.48, 319.2).

  • I rename all confidence layers and keep all loc layers.

  • I set the training batch size to 20 and the test batch size to 4 (due to my GPU memory).

  • All other hyper-parameters (in solver.prototxt) are kept the same as yours.

I get a 73% mAP model after training for 200k iterations, 3 percentage points lower than yours.
Can you give me some advice?
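For reference, the (min_size, max_size) pairs listed in this post appear to follow the standard SSD convention of scaling a ratio sequence by the 304x304 input size; the back-calculation below is my own, not the author's script.

min_dim = 304
ratios = [0.07, 0.15, 0.33, 0.51, 0.69, 0.87, 1.05]   # inferred from the numbers above
sizes = [round(min_dim * r, 2) for r in ratios]
pairs = list(zip(sizes[:-1], sizes[1:]))
print(pairs)
# [(21.28, 45.6), (45.6, 100.32), (100.32, 155.04),
#  (155.04, 209.76), (209.76, 264.48), (264.48, 319.2)]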

mvNCCompile failed

When I trained the model and converted it to a graph, I encountered the following error. Has anyone run into this, and how can it be solved? Thank you.

Traceback (most recent call last):
File "/usr/local/bin/mvNCCompile", line 118, in
create_graph(args.network, args.inputnode, args.outputnode, args.outfile, args.nshaves, args.inputsize, args.weights)
File "/usr/local/bin/mvNCCompile", line 101, in create_graph
net = parse_caffe(args, myriad_config)
File "/usr/local/bin/ncsdk/Controllers/CaffeParser.py", line 503, in parse_caffe
node.concat_axis = layer.concat_param.axis
AttributeError: 'list' object has no attribute 'concat_axis'
