pharos-alexandria / ocr-greek_cursive

ocr-greek_cursive

Training files for Greek cursive script (in early print)

This repository contains ground truth and pre-trained models for Kraken and for Calamari.

Ground truth

The folder gt contains ground truth made from two early printed editions.

The subfolder Savile is based on the exemplar of the edition of John Chrysostom's works by Henry Savile, ed. Eton (John Norton), Tom. V, 1612, which is today at Bayerische Staatsbibliothek München, Res/2 P.gr. 55-5, and was made available in digitized form at http://mdz-nbn-resolving.de/urn:nbn:de:bvb:12-bsb10870413-4.

The subfolder CatenaLipsiensis is based on the exemplar of ΣΕΙΡΑ ΕΝΟΣ ΚΑΙ ΠΕΝΤΗΚΟΝΤΑ ΥΠΟΜΝΗΜΑΤΙΣΤΩΝ ΕΙΣ ΤΗΝ ΟΚΤΑΤΕΥΧΟΝ ΚΑΙ ΤΑ ΤΩΝ ΒΑΣΙΛΕΙΩΝ ΗΔΗ ΠΡΩΤΟΝ ΤΥΠΟΙΣ ΕΚΔΟΘΕΙΣΑ ... ΓΡΗΓΟΡΙΟΥ ΑΛΕΞΑΝΔΡΟΥ ΓΚΙΚΑ, ΤΟΜΟΣ ΠΡΩΤΟΣ, Leipzig 1772, which is today at Staatsbibliothek zu Berlin Preußischer Kulturbesitz, 2" B 1774-1, and was made available in digitized form at http://resolver.staatsbibliothek-berlin.de/SBB00028A5400000000. This ground truth is in Unicode NFC; there is also a subfolder with ground truth in NFD.

The photos were pre-processed with ScanTailor; in particular, the two columns of the Catena Lipsiensis pages were split manually.

The transcripts of Savile use the text in the Patristic Text Archive; the transcripts of Catena Lipsiensis were made by Janina Skóra and Karin Metzler and converted to plain text.

The models

The folder calamari-models contains an ensemble of models that was trained via calamari-cross-fold-train on the Catena Lipsiensis ground truth.

The folder kraken-models contains models that were trained on the Savile or the Catena Lipsiensis ground truth on several machines. The -nfc and -nfd models were trained on Unicode precomposed (NFC) and decomposed (NFD) text, respectively.
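The difference between precomposed and decomposed text matters when comparing model output with ground truth, because the same polytonic Greek letter can be stored as one code point or as a base letter followed by combining marks. The following is a minimal sketch using Python's standard unicodedata module; the helper names to_nfc and to_nfd are illustrative and not part of this repository.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode NFC (precomposed form)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return text in Unicode NFD (decomposed form)."""
    return unicodedata.normalize("NFD", text)


# U+1F04: Greek small letter alpha with psili and oxia, one precomposed code point.
precomposed = "\u1F04"

# In NFD the same letter is the base alpha plus two combining marks.
decomposed = to_nfd(precomposed)

# Both strings render identically but compare as unequal until normalized.
same_after_normalizing = to_nfc(decomposed) == to_nfc(precomposed)
```

Normalizing both the recognized text and the reference text to the same form before comparing them avoids counting such purely encoding-level differences as recognition errors.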

Evaluation data is in the folder eval.

ocr-greek_cursive's Issues

how to add one of those recognition models to Kraken

Hi,

I am struggling to add models to Kraken other than those available in the Zenodo repository (for those, the simple command kraken get id-zenodo-repository loads the model into the proper directory).
How can I proceed to add the models from this repository to Kraken?

Thank you in advance for your reply

Regards,

Damien Belvèze
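One possible approach, given here as a sketch under assumptions and not as an answer confirmed by the repository: kraken's ocr subcommand takes a model through its -m option, and this can be a path to a local model file instead of a model fetched with kraken get. The example below builds such a command line from Python and assumes that kraken is installed and on the PATH. The file names page.png, page.txt and kraken-models/example.mlmodel are hypothetical placeholders, and the binarize and segment steps may need adjusting for the installed kraken version.

```python
import subprocess
from pathlib import Path


def build_kraken_command(image, output, model):
    """Return the argument list for a kraken run that uses a local model file."""
    return [
        "kraken",
        "-i", str(image), str(output),  # input image and output text file
        "binarize", "segment",          # preprocessing steps before recognition
        "ocr", "-m", str(model),        # -m points at the local model file
    ]


def recognize(image, output, model):
    """Run kraken on one image; raises if the model file is missing or kraken fails."""
    if not Path(model).is_file():
        raise FileNotFoundError(model)
    subprocess.run(build_kraken_command(image, output, model), check=True)


# Hypothetical file names, for illustration only.
command = build_kraken_command("page.png", "page.txt", "kraken-models/example.mlmodel")
```

Calling recognize with real paths would then run the equivalent of typing the assembled command in a shell, without copying the model into kraken's own model directory.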
