
develooper1994 / singleshotmultiboxdetector_demo


A job interview challenge for the "Turk AI" company

License: MIT License



Single Shot Multibox Detector demo

The interview challenge

Dataset: https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip

Desired Input:
[image: desired input]

Desired Output:
[image: desired output]

Faster R-CNN result (detectron2) Output:
[image: Faster R-CNN result output]

SSD result (torch hub) Output:
[image: SSD result output]
There is a small erroneous region on the person on the right, but I don't mind it. This is not a full project!
Note: Unfortunately, for now detectron2 and torchvision do not provide ready-to-use SSD models, but they are actively working on it. Right now, researchers have to write their own custom detectron2 models.

Installation requirements

I am using the Anaconda Python distribution.

First, PyTorch:

"conda install pytorch torchvision cudatoolkit=10.1 -c pytorch"

Second, the Detectron2 API:

Note: use Linux to make this easier.
SSD fork: "https://github.com/ArutyunovG/detectron2/tree/master"
"cd detectron2 && pip install -e ."

Or, if you are on macOS:
CC=clang CXX=clang++ pip install -e .


Note: If "from detectron2.engine import DefaultPredictor" raises an error, please reinstall detectron2 and compile all of its dependencies.
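Before opening the notebook, it can help to check which of the packages above Python can actually find. The sketch below uses only the standard library. The helper name `check_packages` is made up for this illustration, and it only tells you whether a package is discoverable, not whether it was compiled correctly.

```python
import importlib.util


def check_packages(names):
    """Return {package name: True if a top-level module with that name can be found}."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The packages this README installs; extend the list as needed.
    for name, found in check_packages(["torch", "torchvision", "detectron2"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If detectron2 is reported as found but the import from the note above still fails, the build itself is the likely problem, and reinstalling is the next step.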

Explanation

Video: https://youtu.be/7-jbJ8Ga8_s
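There are two popular choices for object detection and segmentation: R-CNN and YOLO. The main difference is that the R-CNN family first proposes where the objects should be and then classifies those regions (Mask R-CNN additionally works with black-and-white masks marking where each object is), while YOLO just classifies parts of the image and combines them. YOLO is also a single neural network, which is why it is very fast.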

R-CNN

It helps to detect objects in the image. There are several versions of R-CNN: Mask R-CNN, Fast R-CNN, Faster R-CNN (built entirely from CNN layers), and Keypoint R-CNN. The proposal stage of each only gives the probability of where an object should be: the algorithm produces the boxes and a classifier classifies them. The proposals themselves don't classify the object. A backbone (or backend) neural network is needed for that. "torchvision" generally uses resnet50 or resnet101 as the backbone. If the application should run on mobile, you can swap in mobilenet_v2. Pre-masked data helps to identify the object. R-CNN performs its detection at the end of the neural network, on the features the backbone produces.
It first reduces the processing requirement by narrowing the image down to candidate regions. Then it searches for the object in those regions, and the backbone features are used to classify it. Mask R-CNN adds an extra branch to Faster R-CNN, which also predicts a segmentation mask for each instance. The PyTorch Faster R-CNN supports many backbone classifiers.
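In the Penn-Fudan dataset, each mask image stores 0 for the background and a distinct positive integer for every pedestrian, so one bounding box per person can be read directly off the mask. The following is a minimal pure-Python sketch of that step. The function name `boxes_from_mask` and the tiny hand-made mask are assumptions for illustration; a real pipeline would load the PNG masks from disk and would probably use NumPy.

```python
def boxes_from_mask(mask):
    """Map each instance id (> 0) in a 2-D mask to its box (xmin, ymin, xmax, ymax)."""
    boxes = {}
    for y, row in enumerate(mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:  # 0 marks background pixels
                continue
            if instance_id not in boxes:
                boxes[instance_id] = [x, y, x, y]
            else:
                box = boxes[instance_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return {instance_id: tuple(box) for instance_id, box in boxes.items()}


# Toy mask with two "pedestrians" (ids 1 and 2); not real dataset content.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
]
print(boxes_from_mask(mask))  # {1: (1, 0, 2, 1), 2: (4, 1, 4, 2)}
```

The resulting boxes, together with the per-instance masks, are the kind of targets a Mask R-CNN training loop consumes.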

YOLO (You Only Look Once)

It splits the image into a grid of squares and searches for objects in them. It doesn't use masks. YOLO is really fast, but it slows down if the image is split into too many cells. YOLO struggles with small objects because it has to sample the whole image; if the sampling is too coarse, YOLO can't detect them. It also needs anchor boxes to detect objects.
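The grid idea can be illustrated with a few lines of arithmetic. In the original YOLO formulation, the grid cell that contains the center of an object is the one responsible for predicting it. The sketch below assumes a 7x7 grid and a hypothetical helper named `responsible_cell`; it is not taken from any particular YOLO implementation.

```python
def responsible_cell(box, image_size, grid_size=7):
    """Return (row, col) of the grid cell that contains the center of the box.

    box is (xmin, ymin, xmax, ymax) in pixels; image_size is (width, height).
    """
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    center_x = (xmin + xmax) / 2.0
    center_y = (ymin + ymax) / 2.0
    # Clamp so that a center on the right or bottom edge stays inside the grid.
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col


# A made-up pedestrian box in a 448x448 image split into 7x7 cells of 64 pixels.
print(responsible_cell((100, 50, 180, 300), (448, 448)))  # (2, 2)
```

Doubling `grid_size` to 14 quadruples the number of cells (196 instead of 49), which is the slowdown described above. A coarser grid is cheaper, but several small objects can then fall into the same cell.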

Note: This name isn't used only for image processing or neural networks. I have also seen it used for extremely high-performance BLAS kernels.

How I Made It

First, I built a Mask R-CNN with the "PennFudan dataset". After a short training run, I figured out that the data and the training time weren't enough. I found a bigger dataset on the MathWorks site; however, training was very slow and I had little time, because I have other projects to complete.
Finally, I tried a bunch of pre-trained models and inference APIs. I know PyTorch (I did my master's thesis on audio denoising with it), so I decided to go with a PyTorch inference API: Facebook's detection API, "DETECTRON-2".
This is the first time I have used an inference API. I usually prefer to research and code things myself.

THANKS TO TURK AI FOR GIVING ME THIS CHALLENGE. I got to experience the detectron2 API.
