
This project forked from trungtvo/spatial-pyramid-matching-scene-recognition



Spatial Pyramid Matching Scene Recognition

Trained a classifier to recognize 3000 images across 15 categories using the Bag of Features model and the Spatial Pyramid Matching algorithm. Improved accuracy from ~50% to ~70%.

[Figure: dataset]

Install dependencies

The project needs the common Python packages (NumPy, Matplotlib, ...) as well as OpenCV.

pip install numpy matplotlib opencv-python

We use OpenCV's built-in functions to extract SIFT or SURF features from images. Since OpenCV 3.x.x, the default opencv-python package no longer includes these functions. One option is to downgrade to version 2.x.x; a simpler workaround, used here, is to replace the default package with the contrib build. So first uninstall the old version if it is already installed:

pip uninstall opencv-python

Then install the contrib build:

pip install opencv-contrib-python

Now we should be able to use the SIFT/SURF descriptors in OpenCV.

Note that we can also use other feature descriptors, such as HOG, instead of SIFT.
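A quick way to confirm the install worked is to ask OpenCV for a SIFT extractor. The sketch below is only an illustration: `make_sift` is a hypothetical helper name, not part of this project. It assumes that newer OpenCV builds expose `cv2.SIFT_create` and that 3.x contrib builds expose `cv2.xfeatures2d.SIFT_create`, and it returns `None` instead of failing when neither is usable.

```python
def make_sift():
    """Return an OpenCV SIFT extractor, or None if this install has none.

    Hypothetical helper for illustration only.
    """
    try:
        import cv2
    except ImportError:
        return None  # OpenCV is not installed at all

    # Newer OpenCV releases expose SIFT in the main module.
    if hasattr(cv2, "SIFT_create"):
        return cv2.SIFT_create()

    # OpenCV 3.x contrib builds expose it under the xfeatures2d module.
    xfeatures2d = getattr(cv2, "xfeatures2d", None)
    if xfeatures2d is not None and hasattr(xfeatures2d, "SIFT_create"):
        try:
            return xfeatures2d.SIFT_create()
        except cv2.error:
            return None  # present, but disabled in this build

    return None


sift = make_sift()
print("SIFT available:", sift is not None)
```

If this prints `False`, the installed build probably does not ship the descriptor, and trying a different OpenCV version is the next thing to check.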

Overall

  • Try classifying on raw features (accuracy ~18% - 25%).
  • Build a bag-of-visual-words representation from SIFT features, i.e. a histogram of "visual word" frequencies. Each image yields a set of SIFT descriptors, each of which is a 1D vector of size 128. Stack the descriptors from all training images into one array with one 128-element row per descriptor, then use K-means to cluster these rows into K groups; the cluster centers are the "visual words". Now, for each image, assign each of its SIFT descriptors to the nearest cluster and count the assignments per cluster. The resulting K-bin histogram of "visual word" frequencies is the feature vector for that image. Do the same for the rest of the images to form the feature representation of the data before feeding it into the classifier. First, we try K-nearest neighbors (accuracy ~50% - 60%).
  • Build the same SIFT histogram representation of "visual words" as above, now with a multiclass Support Vector Machine as the classification model, and compare the results (accuracy ~60% - 70%).
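The histogram-building steps in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's actual code: the helper names (`kmeans`, `assign_words`, `bovw_histogram`) are hypothetical, the K-means is a plain hand-written Lloyd's loop, and random arrays stand in for real SIFT output so the block runs without OpenCV.

```python
import numpy as np


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns the (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_words(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def assign_words(descriptors, centers):
    """Index of the nearest center ("visual word") for each descriptor."""
    diff = descriptors[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)


def bovw_histogram(descriptors, centers):
    """Normalized histogram of visual-word frequencies for one image."""
    words = assign_words(descriptors, centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)


# ASSUMPTION: synthetic stand-in for SIFT output. Each "image" is an
# (n_i, 128) array of descriptors; real ones would come from OpenCV.
rng = np.random.default_rng(0)
images = [rng.random((int(rng.integers(30, 60)), 128)) for _ in range(10)]

all_descriptors = np.vstack(images)   # one 128-D row per descriptor
K = 8                                 # vocabulary size, chosen arbitrarily
vocabulary = kmeans(all_descriptors, K)
features = np.array([bovw_histogram(d, vocabulary) for d in images])
print(features.shape)  # (10, 8)
```

Each row of `features` is one image's K-bin histogram, which is what a K-nearest-neighbors or SVM classifier would then take as input.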

Here is the intuition for constructing the SIFT descriptor: the neighborhood around each keypoint is divided into a 4 x 4 grid of 16 cells, and an 8-bin histogram of gradient orientations is computed in each cell.

Finally, concatenate the 16 histograms together to get the final 128-element (16 x 8) SIFT descriptor.
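To make the 16 x 8 = 128 arithmetic concrete, here is a simplified sketch of the descriptor layout. It assumes a 16 x 16 grayscale patch split into a 4 x 4 grid of cells with an 8-bin orientation histogram per cell. It deliberately omits parts of real SIFT (Gaussian weighting, rotation normalization, interpolation between bins), and `sift_like_descriptor` is a hypothetical name, not an OpenCV function.

```python
import numpy as np


def sift_like_descriptor(patch):
    """Simplified 128-D descriptor from a 16x16 grayscale patch.

    Illustration of the layout only; not the full SIFT algorithm.
    """
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")

    # Per-pixel gradient magnitude and orientation.
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    # Quantize orientation into 8 bins; clamp guards against rounding to 8.
    bins = np.minimum((angle / (2 * np.pi) * 8).astype(int), 7)

    histograms = []
    for row in range(0, 16, 4):          # 4 rows of cells
        for col in range(0, 16, 4):      # 4 columns of cells
            cell_bins = bins[row:row + 4, col:col + 4].ravel()
            cell_mag = magnitude[row:row + 4, col:col + 4].ravel()
            # Magnitude-weighted orientation histogram for this cell.
            histograms.append(np.bincount(cell_bins, weights=cell_mag, minlength=8))

    descriptor = np.concatenate(histograms)   # 16 cells * 8 bins = 128
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


patch = np.random.default_rng(0).random((16, 16))
print(sift_like_descriptor(patch).shape)  # (128,)
```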

Spatial Pyramid Matching

One drawback of Bag of Visual Words is that all local features are encoded into a single code vector regardless of where they occur in the image, so the spatial arrangement of the words is discarded. To incorporate spatial information into the final code vector, we can apply Spatial Pyramid Matching, a very simple but powerful idea proposed in Lazebnik et al. 2006. It partitions the image into increasingly fine grids, computes a visual-word histogram in each cell, and concatenates the weighted histograms.
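A rough sketch of how such a pyramid feature could be assembled is shown below. It assumes each keypoint already has an (x, y) position and a visual-word index, and it uses the level weights from Lazebnik et al. (1/2^L for level 0, 1/2^(L-l+1) for level l >= 1). The function name, its parameters, and the final sum-to-one normalization are illustrative choices, not taken from the notebook.

```python
import numpy as np


def spatial_pyramid_histogram(xy, words, image_size, vocab_size, levels=2):
    """Concatenate weighted per-cell visual-word histograms over a pyramid.

    xy:         (n, 2) keypoint positions as (x, y)
    words:      (n,) visual-word index of each keypoint
    image_size: (width, height)
    Hypothetical helper for illustration only.
    """
    width, height = image_size
    xy = np.asarray(xy, dtype=float).reshape(-1, 2)
    words = np.asarray(words, dtype=int)

    parts = []
    for level in range(levels + 1):
        cells = 2 ** level  # the grid is cells x cells at this level
        # Finer levels get larger weights.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)

        # Grid cell of each keypoint; clamp points on the right/bottom edge.
        cx = np.minimum((xy[:, 0] / width * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] / height * cells).astype(int), cells - 1)
        cell_id = cy * cells + cx

        # One histogram per cell, stored back to back in a flat array.
        flat_index = cell_id * vocab_size + words
        hist = np.bincount(flat_index, minlength=cells * cells * vocab_size)
        parts.append(weight * hist.astype(float))

    feature = np.concatenate(parts)
    total = feature.sum()
    return feature / total if total > 0 else feature


# Four keypoints, one per image quadrant, with a 2-word vocabulary.
xy = [[10, 10], [90, 10], [10, 90], [90, 90]]
words = [0, 1, 0, 1]
feature = spatial_pyramid_histogram(xy, words, (100, 100), vocab_size=2, levels=1)
print(feature.shape)  # (10,)
```

With a vocabulary of size K and L = 2, the feature has K * (1 + 4 + 16) = 21K entries, so two images with the same words in different places no longer map to the same vector.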

