calamari-ocr / calamari_models

Pretrained mixed models to be used with Calamari.

License: MIT License

calamari_models's Issues

forever "Upgrading from version ..." with fraktur_19th_century (via OCR-D)

Thanks!
Just trying it... but it seems to take forever:

12:38:15.117 INFO ocrd.task_sequence.run_tasks - Start processing task 'calamari-recognize -I OCR-D-N11 -O OCR-D-OCR -p {"checkpoint":"/usr/local/ocrd_models/calamari/calamari_models/fraktur_19th_century/*.ckpt.json"}'
12:38:16.414 INFO ocrd.workspace_validator - input_file_grp=['OCR-D-N11'] output_file_grp=['OCR-D-OCR']
Upgrading from version 2
Upgrading from version 3
Upgrading from version 4
Upgrading from version 5
Upgrading from version 6
Upgrading from version 7
Upgrading from version 8
...
Upgrading from version 1738637
Upgrading from version 1738638
Upgrading from version 1738639
...

Now aborting... what have I done wrong?

PS: The same command works with https://qurator-data.de/calamari-models/GT4HistOCR/model.tar.xz

python version of models: "bad marshal data"

When loading Keras models, the Python version on the system loading the file must match the version on the system the model was trained on (cf. keras-team/keras#7440). I stumbled upon this when transferring models for inference to another machine running 3.8 instead of 3.7. Wouldn't it be helpful to include this version in the JSON and provide a more useful error message based on that information? Is there a way to load and save the models so that they are updated to another Python version?
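To make the suggestion concrete, here is a minimal sketch of how a version stamp in the checkpoint JSON could work. The `python_version` key and both function names are hypothetical and are not part of Calamari's actual checkpoint format; the sketch only shows the idea of failing early with a readable message instead of "bad marshal data".

```python
import json
import platform
from pathlib import Path

# Hypothetical metadata key; NOT part of Calamari's real checkpoint format.
PY_KEY = "python_version"


def current_python():
    """Return the running interpreter's version as 'major.minor'."""
    return ".".join(platform.python_version_tuple()[:2])


def stamp_python_version(json_path):
    """Write the current major.minor Python version into the checkpoint JSON."""
    path = Path(json_path)
    meta = json.loads(path.read_text(encoding="utf-8"))
    meta[PY_KEY] = current_python()
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")


def check_python_version(json_path):
    """Raise a readable error if the checkpoint was stamped by another Python."""
    meta = json.loads(Path(json_path).read_text(encoding="utf-8"))
    saved = meta.get(PY_KEY)
    if saved is not None and saved != current_python():
        raise RuntimeError(
            f"Checkpoint {json_path} was saved with Python {saved}, "
            f"but this interpreter is Python {current_python()}."
        )
```

Checkpoints without the key pass the check unchanged, so older files would still load under this scheme.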

The result given by Calamari is not completed

Hello
@ChWick
When I test Calamari OCR on my printed database, it only shows me statistical results; no lines of characters are shown:

Thank you for your response

Models from UW3 training?

Is there a place where we can find the models that were trained in the published paper? Or must we perform training ourselves if we wish to use calamari on modern text?

U+EADA when using antiqua_historical_ligs 2020-06-05

When using https://github.com/Calamari-OCR/calamari_models/raw/d61781a9a17e20ca38faf71478185585ea227fd9/antiqua_historical_ligs/0.ckpt.h5 +*.ckpt.*
with current ocrd_all docker image and this scan:

https://digi.ub.uni-heidelberg.de/diglitData/v/montfaucon1719bd2_1.210.tif

I get this XML:

...
ſe mit devant les rangs; &amp; approchant de Xanthe, il uſa dune tromperie
qui lui reuit: E ce agir en honnete homme, dit. il, damener un ſecond,

uw3-modern-english = antiqua_modern (1.0.zip)

Thanks for renaming antiqua_modern to uw3-modern-english, because the model seems to miss some accented characters. Example:

https://digi.hadw-bw.de/view/di016/0032

»Grabstein eines Hildebertus. Urspriinglich in der Peterskirche ...«

Is there a Calamari model for modern Antiqua fonts and German?

What are these models trained on?

Hello there--

I was wondering if you could disclose any details on what these different models were trained on? Much appreciated!

Best,
James

GT4HistOCR models

"The current model gt4histocr was trained on the entire GT4HistOCR corpus."

A combined model spanning all subcorpora of GT4HistOCR is not a good idea for several reasons:

The subcorpora used different transcription guidelines (labeling similar-looking glyphs with different Unicode characters). The resulting model will therefore get confused about these glyphs.
The sizes of the subcorpora are vastly different. Currently the bulk is made up of 19th-century Fraktur, which will dominate the model and may even crowd out the learning of comparatively rare glyphs in the smaller subcorpora.
The quality of the subcorpora (i.e., the remaining errors in ground truth) also differs widely. The incunabula and early modern Latin corpora have been checked much more carefully than the others.

For these reasons it is a much better idea to train separate models for the subcorpora.

How to train our own model?

I want to train a model on my own handwriting and would like to know how to do that.

fraktur_19th_century vs github.com/qurator-spk/train-calamari-gt4histocr

Dear reader,
do you have any details about the model in fraktur_19th_century?

Is it based on gt4histocr ground truth?

Kind regards.

recommended way to download models

Short of cloning the entire repo or clicking on each individual checkpoint HDF5 and JSON file, is there a simple way to download models individually from this site? (Ideally one that can also be scripted...)

Or could you try to make GH release archives from them?
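Until release archives exist, a scripted download could look roughly like the sketch below. It builds raw-file URLs following the pattern of the links quoted in other issues here (`.../raw/<ref>/<model>/<n>.ckpt.h5` plus the matching `.json`). The function names, the default branch `master`, and the default of five checkpoints per model are assumptions; adjust `n_checkpoints` to whatever the model directory actually contains.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE = "https://github.com/Calamari-OCR/calamari_models/raw"


def checkpoint_urls(model, ref="master", n_checkpoints=5):
    """List raw URLs for the HDF5 and JSON files of one model directory.

    n_checkpoints is an assumption; check the directory listing first.
    """
    urls = []
    for i in range(n_checkpoints):
        for ext in ("h5", "json"):
            urls.append(f"{BASE}/{ref}/{model}/{i}.ckpt.{ext}")
    return urls


def download_model(model, dest, ref="master", n_checkpoints=5):
    """Download one model's checkpoint files into dest/<model>/."""
    target = Path(dest) / model
    target.mkdir(parents=True, exist_ok=True)
    for url in checkpoint_urls(model, ref, n_checkpoints):
        # Keep the remote file name, e.g. 0.ckpt.h5
        urlretrieve(url, str(target / url.rsplit("/", 1)[-1]))
    return target


# Example (performs network access, so it is left commented out):
# download_model("fraktur_19th_century", "calamari_models")
```

Passing a commit hash as `ref` pins the download to a fixed state of the repository, which helps when results need to be reproducible.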

What version to run the model?

I am trying to run inference on

calamari_models-2.0/uw3-modern-english/0.ckpt

What version of everything should we be using? I am getting all kinds of complaints about mismatches and can't resolve them so far.

what python version
what tensorflow version
what calamari version

Exception: Downgrading of models is not supported (5 to 2). Please upgrade your Calamari instance (currently installed: 1.0.5)

And you also get this trying to install latest calamari:

error: tensorflow 2.9.1 is installed but tensorflow<2.7.0,>=2.4.0 is required by {'tfaip'}

And then there is no version of that in pip.
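When debugging mismatches like these, it can help to print what is actually installed and compare it against the range from the error message. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, the distribution names passed to `installed` are assumptions about how the packages are registered, and the simplified version parser only looks at the leading numeric parts.

```python
from importlib.metadata import PackageNotFoundError, version


def parse(v):
    """Turn '2.9.1' into (2, 9, 1); suffixes such as 'rc0' are ignored."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:  # pad '2.4' to (2, 4, 0)
        parts.append(0)
    return tuple(parts)


def satisfies(v, lower, upper):
    """True if lower <= v < upper, e.g. the tfaip range >=2.4.0,<2.7.0."""
    return parse(lower) <= parse(v) < parse(upper)


def installed(name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# Distribution names below are assumptions; adjust to your environment.
for dist in ("calamari_ocr", "tfaip", "tensorflow"):
    print(dist, installed(dist))
```

With the numbers from the error above, `satisfies("2.9.1", "2.4.0", "2.7.0")` is false, which is why the installer rejects that TensorFlow build.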

Which model is better for modern French as a pre-training model

Missing json file in fraktur_19th_century?

It seems that 0.ckpt.json is missing from https://github.com/Calamari-OCR/calamari_models/tree/master/fraktur_19th_century, which gives the following error when loading:

  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in __init__
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in <listcomp>
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 102, in __init__
    ckpt = Checkpoint(checkpoint, auto_update=self.auto_update_checkpoints)
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/checkpoint.py", line 20, in __init__
    with open(self.json_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/arbeit/Documents/calamari-models/calamari_official_fraktur_19th_century/0.ckpt.json'
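A quick way to confirm which files are missing before handing a directory to the predictor is sketched below. It assumes each checkpoint consists of a `<n>.ckpt.h5` weights file plus a `<n>.ckpt.json` description, as the file names in this repository suggest; the function name is hypothetical.

```python
from pathlib import Path


def missing_json(model_dir):
    """Return names of .ckpt.json files that have no counterpart on disk.

    Assumes every '<n>.ckpt.h5' should sit next to a '<n>.ckpt.json'.
    """
    missing = []
    for weights in sorted(Path(model_dir).glob("*.ckpt.h5")):
        # '0.ckpt.h5' -> '0.ckpt.json' (only the last suffix is replaced)
        json_path = weights.with_suffix(".json")
        if not json_path.exists():
            missing.append(json_path.name)
    return missing
```

An empty list means every weights file has its JSON; otherwise the returned names are the ones to fetch again.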
