GithubHelp home page GithubHelp logo

gopirajumatta / ocr_caffe Goto Github PK

View Code? Open in Web Editor NEW

This project forked from anant-agarwal/ocr_caffe

0.0 0.0 0.0 761 KB

Segmentation using OpenCV and OCR using Caffe with Python

Jupyter Notebook 100.00%

ocr_caffe's Introduction

Optical Character Recognition using Neural Networks
By: Anant Agarwal (aa2387), Deekshith Belchappada (db786)

The code is based on Python 2.7, uses Anaconda, and Caffe.

Setting up Anaconda package:
i) Download the anaconda package based on your Operating System from -  https://docs.continuum.io/anaconda/install 
ii) Make sure you are downloading the "Python 2.7 version"
iii) Follow the instructions on the page, it should just require a few clicks to get it done.

Setting up Caffe:
i) Caffe installation requires a lot of intricate changes in setting it up the right way, and in installing the right set of dependencies. Describing all of these details here will make this README file very complicated.
ii) We list the blogpost which describes all the intricacies in setting up caffe in detail and was used by us extensively to setup our version. We feel that repeating the same content here in this file will be wasteful.
iii) Blogpost:  http://installing-caffe-the-right-way.wikidot.com/start
iv) Caffe official installation instructions: http://caffe.berkeleyvision.org/installation.html

How to run the code:
i) Code is contained in 2 python notebooks: Segmentation.ipynb, OCR.ipynb
ii) Jupyter python notebooks are part of Anaconda, and can be triggered by the following command in shell/terminal:
	jupyter notebook
iii) Once you have the notebook server running, open these notebooks.
iv) Both the notebooks are decribed below.

Segmentation.ipynb:
i) This file contains all the code related to segmentaion, the file temp/1-this_is_a_test-1.png is provided as a sample image to run this code.
ii) It will extract characters from this sample image and keep them in a sorted order.
iii) You can change the input image file to any other file, and the output will be generated in temp/segmentation_pdfimage/
iv) The code is heavily commented and provides explanation of all the chunks.
v) To run the code: in the top menu, go to Cell -> Run Cells.

OCR.ipynb:
This file contains following functionalities:
i)Data Generation:
This code block cell generates data using popular fonts.
The code has been explained with the help of comments in the file.
Output folder generated as a result of this code - "data_gen_new" has been included as well.

ii)Generate Training and Testing files:
This code block cell generates training_index.txt and test_index.txt
These files are used by caffe model for training and testing purposes.
They essentially contain the image file address and the image class label. Sample files created as a result of this code are present in the data_gen_new folder, and are named training_index.txt and test_index.txt

iii)Loading Caffe model in Python:
You need to train the caffe model first, and then provide its location to the python code along with its configuration file.
The process of training the model is described later in this file.

iv) Demo, Testing, Classification code: (You need to load the model first for any of this to run)
a)Demo code: We take one image per class and try to classify it using the model trained, the input folder has been provided, and can be found at "temp/0-9A-Za-z". The code also calculates Mean Reciprocal Rank using top 3 guesses.
b)Code for Testing the model: This one loads your test file, and uses the model to classify and calculate accuracy. A sample test file has been provided and can be found at "data_gen_new/test_index.txt"
c) Classify segmented images: This is essentially same as above, just that we are passing in the images generated as a result of character segmentation on a image. You can generate the inputs for this code piece by running the segmentation code from segmentation.ipynb 

Training the caffe Model:
We have two models - lenet (with 62 classes), alpha (referred as 'LeNet Modified' in report)
Caffe requires following 3 files:
Model File: temp/lenet.prototxt, temp/alpha.prototxt
Solver File: temp/lenet_solver.prototxt, temp/alpha_solver.prototxt
Deploy File: temp/lenet_deploy.prototxt, temp/alpha_deploy.prototxt
Details for each of these are mention in the report.
For training the following needs to be done:
i) Fix the paths in Model File for training_index.txt, test_index.txt
ii) See that the data(training image files) are located on the locations mentioned in the training and testing files.
iii) In the solver file, make sure the path to model file on line 2 is correct.
iv) In terminal/shell give the following command with desired solver file:
	caffe train -solver <solver file>
v) The model will start training, and a version of model will be stored after every few iterations.


All the details above, along with the report(it has additional details about caffe) should be sufficient in getting the code up and running. However, if there are any troubles, please feel free to reach out to any of us. ([email protected], [email protected])

Thanks!

ocr_caffe's People

Contributors

anant-agarwal avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.