
calamari_models's People

Contributors

chreul, chwick


calamari_models's Issues

forever "Upgrading from version ..." with fraktur_19th_century (via OCR-D)

Thanks!
Just trying it... but it seems to take forever:

12:38:15.117 INFO ocrd.task_sequence.run_tasks - Start processing task 'calamari-recognize -I OCR-D-N11 -O OCR-D-OCR -p {"checkpoint":"/usr/local/ocrd_models/calamari/calamari_models/fraktur_19th_century/*.ckpt.json"}'
12:38:16.414 INFO ocrd.workspace_validator - input_file_grp=['OCR-D-N11'] output_file_grp=['OCR-D-OCR']
Upgrading from version 2
Upgrading from version 3
Upgrading from version 4
Upgrading from version 5
Upgrading from version 6
Upgrading from version 7
Upgrading from version 8
...
Upgrading from version 1738637
Upgrading from version 1738638
Upgrading from version 1738639
...

Now aborting... what have I done wrong?

PS: The same command works with https://qurator-data.de/calamari-models/GT4HistOCR/model.tar.xz

python version of models: "bad marshal data"

When loading Keras models, the Python version must be the same on the system the model was trained on and the system loading the file (cf. keras-team/keras#7440). I stumbled upon this when transferring models for inference to another machine running 3.8 instead of 3.7. Wouldn't it be helpful to include this version in the JSON and provide a more useful error message based on that information? Is there a way to load and save the models that updates them to another Python version?
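To illustrate the suggestion, here is a minimal sketch of what such a check could look like. The `python_version` key and both helper functions are hypothetical; nothing here is taken from Calamari's actual checkpoint format or code.

```python
import json
import sys


def current_python_tag():
    """Return the running interpreter's major.minor version, e.g. '3.8'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def check_python_version(metadata, running=None):
    """Raise a readable error if the checkpoint was saved under another Python.

    `metadata` is the parsed checkpoint JSON. "python_version" is a
    hypothetical key, not something Calamari is known to write.
    """
    running = running or current_python_tag()
    saved = metadata.get("python_version")
    if saved is None:
        return  # checkpoint without the marker: nothing to compare
    if saved != running:
        raise RuntimeError(
            f"Model was saved with Python {saved} but is being loaded with "
            f"Python {running}; Keras may fail with 'bad marshal data'."
        )


# Writing side: store the marker next to the other checkpoint metadata.
metadata = {"python_version": current_python_tag()}
serialized = json.dumps(metadata)

# Loading side: same interpreter, so this passes silently.
check_python_version(json.loads(serialized))
```

This only improves the error message; it does not convert a model between Python versions.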

Models from UW3 training?

Is there a place where we can find the models that were trained in the published paper? Or must we perform training ourselves if we wish to use calamari on modern text?

U+EADA when using antiqua_historical_ligs 2020-06-05

When using https://github.com/Calamari-OCR/calamari_models/raw/d61781a9a17e20ca38faf71478185585ea227fd9/antiqua_historical_ligs/0.ckpt.h5 +*.ckpt.*
with current ocrd_all docker image and this scan:

https://digi.ub.uni-heidelberg.de/diglitData/v/montfaucon1719bd2_1.210.tif

I get this XML:

...
ſe mit devant les rangs; & approchant de Xanthe, il uſa dune tromperie
qui lui reuit: E ce agir en honnete homme, dit. il, damener un ſecond,

What are these models trained on?

Hello there--

I was wondering if you could disclose any details on what these different models were trained on? Much appreciated!

Best,
James

GT4HistOCR models

"The current model gt4histocr was trained on the entire GT4HistOCR corpus."

A combined model spanning all subcorpora of GT4HistOCR is not a good idea for several reasons:

  1. The subcorpora used different transcription guidelines (labeling similar-looking glyphs with different Unicode characters). The resulting model will therefore get confused about these glyphs.
  2. The size of the subcorpora is vastly different. Currently the bulk is made up of 19th c. Fraktur and this will dominate the model and may even crowd out the learning of comparably rare glyphs in the smaller subcorpora.
  3. The quality of the subcorpora (i.e., the remaining errors in ground truth) also varies widely. The incunabula and early modern Latin corpus have been checked much more carefully than others.

For these reasons it is a much better idea to train separate models for the subcorpora.

recommended way to download models

Short of cloning the entire repo or clicking on each individual checkpoint HDF5 and JSON file, is there a simple way to download models individually from this site? (Ideally one that can also be scripted...)

Or could you try to make GH release archives from them?
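Until something like that exists, a scripted download could be sketched as follows. The URL layout is copied from the raw link quoted in another issue on this page; the function names are made up for illustration, and `folds=5` (checkpoints named 0 to 4, each with an `.h5` and a `.json` file) is an assumption that may not hold for every model.

```python
import urllib.request
from pathlib import Path

# URL layout taken from the raw link quoted in another issue on this page.
RAW_BASE = "https://github.com/Calamari-OCR/calamari_models/raw"


def checkpoint_urls(model, ref="master", folds=5):
    """Build raw URLs for the .h5 and .json file of each fold.

    `folds=5` is an assumption (checkpoints named 0..4); adjust as needed.
    """
    urls = []
    for i in range(folds):
        for ext in ("h5", "json"):
            urls.append(f"{RAW_BASE}/{ref}/{model}/{i}.ckpt.{ext}")
    return urls


def download_model(model, target_dir, ref="master", folds=5):
    """Fetch every checkpoint file of one model into target_dir/model."""
    out = Path(target_dir) / model
    out.mkdir(parents=True, exist_ok=True)
    for url in checkpoint_urls(model, ref, folds):
        # Use the last URL component (e.g. "0.ckpt.h5") as the local file name.
        urllib.request.urlretrieve(url, out / url.rsplit("/", 1)[1])
    return out
```

Calling `download_model("fraktur_19th_century", "models")` would then place the files under `models/fraktur_19th_century/`, provided the assumed file names exist at that ref.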

What version to run the model?

Am trying to run inference on

calamari_models-2.0/uw3-modern-english/0.ckpt

What version of everything should we be using? I am getting all kinds of complaints about mismatches and can't resolve them so far.

  • what python version
  • what tensorflow version
  • what calamari version

Exception: Downgrading of models is not supported (5 to 2). Please upgrade your Calamari instance (currently installed: 1.0.5)

You also get this when trying to install the latest calamari:

error: tensorflow 2.9.1 is installed but tensorflow<2.7.0,>=2.4.0 is required by {'tfaip'}

And then there is no version of that in pip.
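For anyone debugging the same conflict, a small sketch like the one below can show whether an installed package falls inside the range named in the error. The helper names are illustrative only, and the parser handles plain numeric versions such as `2.9.1`, not pre-release tags.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_version(text):
    """Turn '2.9.1' into (2, 9, 1); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, lower, upper):
    """True if lower <= installed < upper, mirroring '>=lower,<upper'."""
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)


# Bounds copied from the error message: tensorflow<2.7.0,>=2.4.0
print(satisfies("2.9.1", "2.4.0", "2.7.0"))  # False: 2.9.1 is above the upper bound
print(installed_version("tensorflow"))       # None if tensorflow is not installed
```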

Missing json file in fraktur_19th_century?

It seems that 0.ckpt.json is missing from https://github.com/Calamari-OCR/calamari_models/tree/master/fraktur_19th_century, which gives the following error when loading:

  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in __init__
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in <listcomp>
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 102, in __init__
    ckpt = Checkpoint(checkpoint, auto_update=self.auto_update_checkpoints)
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/checkpoint.py", line 20, in __init__
    with open(self.json_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/arbeit/Documents/calamari-models/calamari_official_fraktur_19th_century/0.ckpt.json'
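A quick way to spot this before loading is to check that every weights file has its JSON companion. The sketch below assumes each checkpoint N consists of `N.ckpt.h5` plus `N.ckpt.json`, as the path in the traceback suggests; `missing_json_files` is a hypothetical helper, not part of Calamari.

```python
from pathlib import Path


def missing_json_files(model_dir):
    """Return names of the .ckpt.json files that have no file on disk.

    Assumes each checkpoint N consists of N.ckpt.h5 plus N.ckpt.json.
    """
    missing = []
    for weights in sorted(Path(model_dir).glob("*.ckpt.h5")):
        # "0.ckpt.h5" -> "0.ckpt.json" (only the final suffix is swapped)
        json_path = weights.with_suffix(".json")
        if not json_path.exists():
            missing.append(json_path.name)
    return missing
```

An empty list means every `.h5` file found in the directory has a matching `.json` file.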
