This project forked from hiarindam/document-image-classification-tl-sg

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

Home Page: https://arxiv.org/abs/1801.09321

License: MIT License

Contributors: Arindam Das, Saikat Roy, Ujjwal Bhattacharya, S.K. Parui

This research work is available on arXiv: https://arxiv.org/abs/1801.09321

This page is published to provide region-based pre-trained models for document image classification, for document structure learning. When using the weight matrices, please note that we used Theano as the backend for all our experiments, so everything follows Theano's ordering conventions.

Please cite our work if you find it useful for your research.

@inproceedings{das2018document,
  title={Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks},
  author={Das, Arindam and Roy, Saikat and Bhattacharya, Ujjwal and Parui, Swapan K},
  booktitle={2018 24th International Conference on Pattern Recognition (ICPR)},
  pages={3180--3185},
  year={2018},
  organization={IEEE}
}

Theano to TensorFlow Weight Converter

Users have repeatedly reported being unable to load the weights properly in TensorFlow, with or without a converter, because the versions of Theano and Keras used for this project are quite old (late 2017/early 2018). This section covers converting the weights from Theano to TensorFlow; please also see the section on preprocessing the input.

The conversion module was developed by Auke Zijlstra. Although he was unable to replicate our exact results with this script, he did get things working. We provide excerpts from his communication with us on the usage of the script.

"... Although I have not been able to fully replicate your results, I have been able to achieve 0.87 accuracy score on the RVL-CDIP test set using your holistic model weights with a Keras+tensorflow setup. My steps to convert your Theano ordered weights into Tensorflow ordering were as follows:

Hopefully this gives a way forward for people who have trouble using our weights with newer versions of Keras, Theano, TensorFlow and the like.
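The kind of reordering such a conversion involves can be sketched in NumPy. This is not the script referred to above; it is a minimal illustration under stated assumptions: convolution kernels may be stored Theano-style as (out_channels, in_channels, rows, cols), Theano applies a true convolution (flipped kernel) while TensorFlow applies cross-correlation, and the first dense layer after flattening was trained on channels-first feature maps. Which of these steps a given weight file needs depends on the Keras version that saved it. All function names are hypothetical.

```python
import numpy as np


def to_tf_kernel_layout(kernel_th):
    """Reorder a conv kernel from (out, in, rows, cols) to (rows, cols, in, out)."""
    return np.transpose(kernel_th, (2, 3, 1, 0))


def flip_kernel_spatial(kernel_tf):
    """Rotate a (rows, cols, in, out) kernel by 180 degrees in its spatial axes.

    Assumption: the weights were trained with a flipped-kernel (true)
    convolution and will be used with cross-correlation, or vice versa.
    """
    return np.ascontiguousarray(kernel_tf[::-1, ::-1, :, :])


def reorder_dense_after_flatten(dense_kernel, channels, rows, cols):
    """Reorder the input rows of the first dense layer after a Flatten.

    The rows are assumed to be indexed by a flattened (channels, rows, cols)
    feature map; the result expects a flattened (rows, cols, channels) map.
    """
    units = dense_kernel.shape[1]
    k = dense_kernel.reshape(channels, rows, cols, units)
    k = np.transpose(k, (1, 2, 0, 3))
    return k.reshape(rows * cols * channels, units)


# Sanity check on small placeholder shapes: the dense output must be the
# same whether we use channels-first data with the original weights or
# channels-last data with the reordered weights.
rng = np.random.default_rng(0)
fmap_chw = rng.normal(size=(4, 3, 3))        # (channels, rows, cols)
dense_th = rng.normal(size=(4 * 3 * 3, 8))   # (inputs, units)
out_th = fmap_chw.reshape(-1) @ dense_th
fmap_hwc = np.transpose(fmap_chw, (1, 2, 0))
out_tf = fmap_hwc.reshape(-1) @ reorder_dense_after_flatten(dense_th, 4, 3, 3)
assert np.allclose(out_th, out_tf)
```

A check like the one at the end of the block is worth running layer by layer on real weights, since a silently wrong ordering typically loads without error and only shows up as degraded accuracy.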

Dataset

RVL-CDIP has been used to validate the proposed methodology. This dataset consists of 400,000 scanned grayscale images distributed among 16 categories. The collection is subdivided into training, validation and test sets containing 320,000, 40,000 and 40,000 images respectively.
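As a quick check of these figures, the split sizes sum to the dataset total and correspond to an 80/10/10 partition. The sketch below also counts images per class from label lines of the form `<relative path> <class index>`; that line format and the helper names are assumptions made for illustration and should be checked against the label files you actually have.

```python
from collections import Counter

NUM_CLASSES = 16
TOTAL_IMAGES = 400_000
SPLIT_SIZES = {"train": 320_000, "val": 40_000, "test": 40_000}


def split_fractions(sizes):
    """Return each split's share of the total number of images."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


def count_labels(lines):
    """Count images per class from '<path> <label>' lines (assumed format)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _path, label_text = line.rsplit(maxsplit=1)
        label = int(label_text)
        if not 0 <= label < NUM_CLASSES:
            raise ValueError(f"label {label} outside 0..{NUM_CLASSES - 1}")
        counts[label] += 1
    return counts


assert sum(SPLIT_SIZES.values()) == TOTAL_IMAGES
print(split_fractions(SPLIT_SIZES))
```

Running `count_labels` over each split's label file is a cheap way to confirm that a download is complete before training.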

Preprocessing

Please see this comment for a small example of how to preprocess the input for the networks.

Proposed Architecture

Experimental Results

Performance Comparison with State-of-the-art Approaches

| Method | Accuracy (%) | Comments |
| --- | --- | --- |
| Harley et al. [1] | 89.90 | Document region based DCNN models with transfer learning |
| Tensmeyer et al. [2] | 89.31 | Spatial pyramidal pooling based AlexNet without transfer learning |
| Tensmeyer et al. [2] | 90.94 | Same model as above with increased image dimension (384×384), keeping the aspect ratio the same |
| Csurka et al. [3] | 90.70 | GoogLeNet with weights transferred from ImageNet |
| Afzal et al. [4] | 90.97 | VGG-16 with weights transferred from ImageNet |
| Kölsch et al. [5] | 90.05 | Weights transferred from ImageNet to VGG-16, with an ELM in place of the MLP |
| Proposed | 91.11 | VGG-16 model trained on holistic samples with weights transferred from ImageNet |
| Proposed | 92.21 | Inter- and intra-domain transfer learning on region based DCNNs and MLNN based stacking |
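The last row combines several region based networks through stacked generalization. One common way to assemble the input of such a meta-classifier is to let each base network emit a probability vector over the 16 classes and concatenate those vectors per sample. The sketch below shows only that assembly step; the number of base models, the array shapes and the function name are illustrative assumptions, and the meta-classifier itself (an MLNN in the table) is not shown.

```python
import numpy as np

NUM_CLASSES = 16  # number of RVL-CDIP categories


def build_stacking_features(base_probs):
    """Concatenate per-model class probabilities into meta-classifier inputs.

    base_probs: list of arrays, each of shape (n_samples, NUM_CLASSES).
    Returns an array of shape (n_samples, n_models * NUM_CLASSES).
    """
    arrays = [np.asarray(p, dtype=np.float64) for p in base_probs]
    if not arrays:
        raise ValueError("need at least one base model")
    n_samples = arrays[0].shape[0]
    for p in arrays:
        if p.shape != (n_samples, NUM_CLASSES):
            raise ValueError(f"unexpected prediction shape {p.shape}")
    return np.concatenate(arrays, axis=1)


# Placeholder predictions from three hypothetical base models on four samples.
rng = np.random.default_rng(0)
fake_probs = [rng.dirichlet(np.ones(NUM_CLASSES), size=4) for _ in range(3)]
features = build_stacking_features(fake_probs)
print(features.shape)
```

To avoid leaking training labels into the second stage, the base-model predictions used to fit the meta-classifier are normally produced on data the base models were not trained on.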

Pre-trained Models

The trained models from this publication have been made available here. Please note that all weight matrices are formatted for the Theano backend, not TensorFlow. This includes Theano-style input dimension ordering.
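Theano-style input ordering places the channel axis before the spatial axes, i.e. (batch, channels, height, width), whereas TensorFlow's default is (batch, height, width, channels). A minimal NumPy sketch of converting a batch between the two follows; the function names are hypothetical and the image size in the example is a placeholder, not the networks' actual input size.

```python
import numpy as np


def to_channels_first(batch_hwc):
    """(batch, height, width, channels) -> (batch, channels, height, width)."""
    return np.transpose(batch_hwc, (0, 3, 1, 2))


def to_channels_last(batch_chw):
    """(batch, channels, height, width) -> (batch, height, width, channels)."""
    return np.transpose(batch_chw, (0, 2, 3, 1))


# Placeholder batch: two 3-channel images of 32 x 24 pixels.
batch = np.zeros((2, 32, 24, 3), dtype=np.float32)
print(to_channels_first(batch).shape)
```

Feeding a channels-last batch to a model built for channels-first input either fails on a shape mismatch or, when the shapes happen to be compatible, runs and produces meaningless predictions, so it is worth asserting the expected shape before calling the model.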

References

[1] A. W. Harley, A. Ufkes, and K. G. Derpanis, “Evaluation of deep convolutional nets for document image classification and retrieval,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 991–995.

[2] C. Tensmeyer and T. Martinez, “Analysis of convolutional neural networks for document image classification,” arXiv preprint arXiv:1708.03273, 2017.

[3] G. Csurka, D. Larlus, A. Gordo, and J. Almazan, “What is the right way to represent document images?” arXiv preprint arXiv:1603.01076, 2016.

[4] M. Z. Afzal, A. Kölsch, S. Ahmed, and M. Liwicki, “Cutting the error by half: Investigation of very deep CNN and advanced training strategies for document image classification,” arXiv preprint arXiv:1704.03557, 2017.

[5] A. Kölsch, M. Z. Afzal, M. Ebbecke, and M. Liwicki, “Real-time document image classification using deep CNN and extreme learning machines,” in Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017.
