syashakash / handwritten-digits-recognition

This project forked from sdimi/handwritten-digits-recognition

Image recognition of handwritten digits [MNIST]

Python 100.00%

handwritten-digits-recognition's Introduction

Machine learning semester project for the Statistical Learning course at Aristotle University of Thessaloniki. The task was to apply machine learning algorithms to the MNIST benchmark dataset in order to recognize images of handwritten digits. MNIST was introduced by Yann LeCun and contains 70,000 images of 28x28 pixels each, which gives a 784-dimensional feature vector. The training set comprises the first 60,000 images and the test set the last 10,000.

I performed classification, clustering, dimensionality reduction and embedding. At best, the SVM achieved a 1.8% error rate.

#### Dependencies

  • Python 2.7+
  • Scikit-learn
  • Matplotlib
  • Numpy

#### Classification

Running `svm_mnist.py` executes the SVM classification code. The script first loads the dataset through the helper function provided by scikit-learn, then normalizes each pixel to the range [0, 1].

    X_train = np.float32(mnist.data[:60000]) / 255.
    y_train = np.float32(mnist.target[:60000])
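The normalization and positional split described above can be sketched as follows. This is a minimal illustration, not the project's script: the function name `normalize_and_split` and the random stand-in arrays are assumptions introduced here so the snippet runs without downloading MNIST.

```python
import numpy as np


def normalize_and_split(data, target, n_train):
    """Scale 8-bit pixel values to [0, 1] and split by position."""
    X = np.float32(data) / 255.0
    y = np.float32(target)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


# Hypothetical stand-in for mnist.data / mnist.target: 70 random "images"
# of 784 pixels each, instead of the real 70,000 MNIST samples.
rng = np.random.default_rng(0)
fake_data = rng.integers(0, 256, size=(70, 784), dtype=np.uint8)
fake_target = rng.integers(0, 10, size=70)

X_train, y_train, X_test, y_test = normalize_and_split(
    fake_data, fake_target, n_train=60
)
print(X_train.shape, X_test.shape, X_train.dtype)
```

With the real dataset, `n_train` would be 60000, matching the slice in the line above.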

To make this task feasible on an ordinary machine, we reduce the dimensionality from 784 to 90 with PCA, which retains around 91% of the original information (explained variance).

PICTURE
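A rough sketch of how the retained information can be measured with scikit-learn's `PCA`. To stay self-contained it uses the small 8x8 `load_digits` dataset bundled with scikit-learn as a stand-in for MNIST, and 20 components instead of 90; both choices, and the helper name `reduce_with_pca`, are assumptions made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA


def reduce_with_pca(X, n_components):
    """Project X onto its top principal components.

    Returns the reduced data and the fraction of variance retained.
    """
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X)
    retained = pca.explained_variance_ratio_.sum()
    return X_reduced, retained


# Stand-in data: load_digits pixels range from 0 to 16, so divide by 16
# to get the same [0, 1] scaling used for MNIST.
digits = load_digits()
X = digits.data / 16.0

X_reduced, retained = reduce_with_pca(X, n_components=20)
print(X_reduced.shape, round(float(retained), 3))
```

On MNIST the same call with `n_components=90` would correspond to the reduction described above.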

After dimensionality reduction, we train SVMs with various kernels and hyperparameters. The following accuracy results were obtained with 5-fold cross-validation.

PICTURE
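One possible way to set up such a comparison is sketched below. It is not the code from `svm_mnist.py`: the kernels, the value `C=10.0`, the 20 PCA components, the helper name `cv_accuracy` and the bundled `load_digits` stand-in dataset are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def cv_accuracy(X, y, kernel, C, n_components=20, folds=5):
    """Return per-fold accuracy of PCA followed by an SVM."""
    model = make_pipeline(
        PCA(n_components=n_components),
        SVC(kernel=kernel, C=C, gamma="scale"),
    )
    return cross_val_score(model, X, y, cv=folds)


# Stand-in data (8x8 digits, pixel range 0..16) instead of MNIST.
digits = load_digits()
X, y = digits.data / 16.0, digits.target

for kernel in ("linear", "rbf", "poly"):
    scores = cv_accuracy(X, y, kernel=kernel, C=10.0)
    print(kernel, round(float(scores.mean()), 3))
```

Placing PCA inside the pipeline means it is refit on the training part of every fold, so the held-out fold never influences the projection.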

Some examples of correct and incorrect classifications are shown below. On MNIST, the digit "9" is sometimes confused with "4".

| Correct prediction | False prediction |
| --- | --- |
| PICTURE | PICTURE |

#### Dimensionality Reduction

Running `kpca_mnist.py` executes the LDA + kernel PCA code. On the reduced dimensions, we then run kNN and NearestCentroid classifiers. Note that kernel PCA is memory-intensive, so we limit the training set to 15,000 samples. The following table presents the classification accuracy after the final reduction to 9 dimensions.

PICTURE
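The sketch below shows one way to chain these steps with scikit-learn. The ordering (kernel PCA first, then LDA down to 9 dimensions) is an assumption inferred from the final dimensionality, since LDA on 10 classes yields at most 9 components; the RBF kernel, the 40 intermediate components, the helper name `reduced_accuracy` and the bundled `load_digits` stand-in dataset are likewise illustrative choices rather than the project's settings.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.pipeline import make_pipeline


def reduced_accuracy(classifier, X_train, y_train, X_test, y_test):
    """Fit kernel PCA -> LDA -> classifier and return test accuracy."""
    model = make_pipeline(
        KernelPCA(n_components=40, kernel="rbf"),
        LinearDiscriminantAnalysis(n_components=9),
        classifier,
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Stand-in data (8x8 digits, pixel range 0..16) instead of MNIST.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16.0,
    digits.target,
    test_size=0.25,
    random_state=0,
    stratify=digits.target,
)

knn_acc = reduced_accuracy(
    KNeighborsClassifier(n_neighbors=5), X_train, y_train, X_test, y_test
)
centroid_acc = reduced_accuracy(
    NearestCentroid(), X_train, y_train, X_test, y_test
)
print("kNN:", round(knn_acc, 3), "NearestCentroid:", round(centroid_acc, 3))
```

Kernel PCA builds an n-by-n kernel matrix over the training samples, which is why memory use grows quadratically and a subsample such as the 15,000 images mentioned above becomes necessary.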

#### Embedding Projections & Clustering

Finally, `cluster_mnist.py` projects the dataset into two-dimensional space using Spectral and Isomap embeddings. We keep 5,000 samples for visualization and perform spectral clustering on them. To evaluate clustering quality, we compute the completeness score, which is under 0.5 in both cases. The following scatterplots display the embeddings.

| Isomap | Spectral |
| --- | --- |
| PICTURE | PICTURE |
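A minimal sketch of the embedding and clustering steps, assuming that spectral clustering is applied to each 2-D embedding in turn. The neighbor counts, the 500-sample subset, the helper name `embed_and_cluster` and the bundled `load_digits` stand-in dataset are assumptions for illustration and do not reproduce `cluster_mnist.py`.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap, SpectralEmbedding
from sklearn.metrics import completeness_score


def embed_and_cluster(embedder, X, y, n_clusters=10):
    """Embed X in 2-D, cluster the embedding, and score completeness."""
    X_2d = embedder.fit_transform(X)
    clusterer = SpectralClustering(
        n_clusters=n_clusters,
        affinity="nearest_neighbors",
        n_neighbors=10,
        random_state=0,
    )
    labels = clusterer.fit_predict(X_2d)
    return X_2d, completeness_score(y, labels)


# Stand-in data: first 500 samples of the 8x8 digits set, scaled to [0, 1].
digits = load_digits()
X, y = digits.data[:500] / 16.0, digits.target[:500]

results = {
    "Isomap": embed_and_cluster(
        Isomap(n_neighbors=15, n_components=2), X, y
    ),
    "Spectral": embed_and_cluster(
        SpectralEmbedding(n_components=2, n_neighbors=15, random_state=0), X, y
    ),
}
for name, (X_2d, score) in results.items():
    print(name, X_2d.shape, round(float(score), 3))
```

Completeness is 1.0 when all samples of a given digit land in the same cluster, so a low score indicates that digits are being split across clusters. Each `X_2d` array can be passed to `matplotlib.pyplot.scatter`, colored by `y`, to draw scatterplots like the ones above.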

handwritten-digits-recognition's People

Contributors

sdimi

Watchers

James Cloos, Akash Shivram
