OCR-Handwriting Project

1. Summary of Design Decisions

This project follows an abstraction-based design with four levels: letters, words, lines, and entire documents. Every document can be broken down into these levels of abstraction:

(i) An entire document.
(ii) A collection of lines in a document.
(iii) A collection of words that are consecutively placed on each line.
(iv) Single characters that make up the words.

Each level of abstraction relies on the one below it, all the way down to the individual letters on the document. Given the nature of that abstraction, Dr. Johnson suggested we start from the ground up: first we will build the data set for letters and train a model to recognize other letters of a similar (1800s English) style. Our current priority is to build this large data set of characters for our neural network to train on. Once this set is built, we will work out the optimal design of our model and start to train it. When this stage is complete, we will have a network that can identify individual characters. From this base level we will then move to the next level of abstraction, which will identify the words in a line. The project will continue level by level in this way until we can use every level to read an entire document.
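The four levels can be sketched as nested data types, with reading done from the bottom up. This is only an illustrative sketch, not the project's code: the class names, the `label` field, and the `read_*` helpers are assumptions, and the classifier is passed in as a plain function so that any character model could be plugged in later.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Character:
    """One cropped letter image (hypothetical layout: rows of grayscale pixels)."""
    pixels: List[List[float]]
    label: str = "?"  # ground-truth letter, known only for training data


@dataclass
class Word:
    characters: List[Character] = field(default_factory=list)


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Document:
    lines: List[Line] = field(default_factory=list)


# Any function that maps a character image to a letter can act as the model.
CharClassifier = Callable[[Character], str]


def read_word(word: Word, classify: CharClassifier) -> str:
    # A word is the concatenation of its recognized characters.
    return "".join(classify(ch) for ch in word.characters)


def read_line(line: Line, classify: CharClassifier) -> str:
    # A line is its words separated by single spaces.
    return " ".join(read_word(w, classify) for w in line.words)


def read_document(doc: Document, classify: CharClassifier) -> str:
    # A document is its lines separated by newlines.
    return "\n".join(read_line(ln, classify) for ln in doc.lines)


def oracle(ch: Character) -> str:
    """Stand-in classifier for illustration: returns the stored label."""
    return ch.label


def word_from_text(text: str) -> Word:
    """Build a Word of labeled characters with empty placeholder images."""
    return Word([Character(pixels=[[0.0]], label=c) for c in text])
```

With a trained character model in place of `oracle`, the same `read_document` call would produce the transcription, which is why the character level is built first.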

2. Past Progress

Currently the project is in the data collection phase. We have scans of manuscripts from John Quincy Adams that we are imaging. The imaging process will conclude shortly, after which we will work on building the networks for the various phases of the model. We anticipate a schedule that proceeds as follows:

(i) Data collection -- Complete
(ii) Model outline
(iii) Model optimization

3. Current Progress

Example of our neural network correctly predicting an image from an alternate author.

Currently the project is progressing nicely. We are now in a phase of basic R&D in which we are using our collected data set to find an optimal model (a convolutional neural network) to categorize the characters. We will continue to update this page as significant pieces of development are completed.
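To illustrate what a single convolutional layer computes on a character image, here is a minimal pure-Python sketch of a convolution followed by ReLU and 2x2 max pooling. It is not the project's model: the function names, the toy 5x5 image, and the hand-picked kernel are assumptions for illustration only. A real network would learn its kernels from the data set and would normally be built with a deep learning framework.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image (stride 1, no padding) and
    sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(grid: Grid) -> Grid:
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in grid]


def max_pool_2x2(grid: Grid) -> Grid:
    """Keep the largest value in each non-overlapping 2x2 block."""
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, len(grid[0]) - 1, 2)
        ]
        for r in range(0, len(grid) - 1, 2)
    ]


# Toy 5x5 "image": a vertical ink stroke in the middle column.
stroke = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]

# Hand-picked kernel that responds where ink begins, scanning left to right.
left_edge = [[-1.0, 1.0],
             [-1.0, 1.0]]

features = max_pool_2x2(relu(conv2d_valid(stroke, left_edge)))
```

Here `features` is a 2x2 map whose left column is active (the stroke's left edge) and whose right column is zero. A classifier stacks many such learned filters and ends with a layer that scores each candidate letter.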

mattlm0831 / ocr-handwriting
