calamari-ocr / calamari_models
Pretrained mixed models to be used with Calamari.
License: MIT License
Thanks!
Just trying it... but it seems to take forever:
12:38:15.117 INFO ocrd.task_sequence.run_tasks - Start processing task 'calamari-recognize -I OCR-D-N11 -O OCR-D-OCR -p {"checkpoint":"/usr/local/ocrd_models/calamari/calamari_models/fraktur_19th_century/*.ckpt.json"}'
12:38:16.414 INFO ocrd.workspace_validator - input_file_grp=['OCR-D-N11'] output_file_grp=['OCR-D-OCR']
Upgrading from version 2
Upgrading from version 3
Upgrading from version 4
Upgrading from version 5
Upgrading from version 6
Upgrading from version 7
Upgrading from version 8
...
Upgrading from version 1738637
Upgrading from version 1738638
Upgrading from version 1738639
...
I am aborting now... what have I done wrong?
PS: The same command works with https://qurator-data.de/calamari-models/GT4HistOCR/model.tar.xz
When loading Keras models, the Python version needs to be the same on the system the model was trained on and on the system loading the file (cf. keras-team/keras#7440). I stumbled upon this when transferring models for inference to another machine running 3.8 instead of 3.7. Wouldn't it be helpful to include this version in the JSON and provide a more useful error message based on that information? Is there a way to load and save the models in a way that updates them to another Python version?
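To make the suggestion concrete, here is a minimal sketch of such a check. It assumes the checkpoint JSON carried a `python_version` entry, which is a hypothetical key and not part of the current checkpoint format; the function name `check_python_version` is made up for illustration as well.

```python
import json
import os
import sys
import tempfile


def check_python_version(json_path, current=sys.version_info[:2]):
    """Return a warning string on a major.minor mismatch, else None.

    Assumption: the checkpoint JSON contains a "python_version" string
    such as "3.7.4". This key is hypothetical.
    """
    with open(json_path, "r") as f:
        meta = json.load(f)
    trained = meta.get("python_version")
    if trained is None:
        return None  # field absent (older checkpoint): nothing to compare
    trained_mm = tuple(int(part) for part in trained.split(".")[:2])
    if trained_mm != tuple(current):
        return (
            f"Model was saved with Python {trained_mm[0]}.{trained_mm[1]} "
            f"but is being loaded with Python {current[0]}.{current[1]}."
        )
    return None


# Demo with a throwaway file standing in for a real 0.ckpt.json
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0.ckpt.json")
    with open(path, "w") as f:
        json.dump({"python_version": "3.7.4"}, f)
    mismatch = check_python_version(path, current=(3, 8))
    match = check_python_version(path, current=(3, 7))

print(mismatch)
```

Only major.minor is compared here, since the case reported above was 3.7 versus 3.8; whether patch releases matter is not something this sketch tries to answer.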
Hello
@ChWick
When I test Calamari OCR on my printed database, it only shows statistical results and no lines of recognized characters:
Thank you for your response
Is there a place where we can find the models that were trained in the published paper? Or must we perform training ourselves if we wish to use calamari on modern text?
When using https://github.com/Calamari-OCR/calamari_models/raw/d61781a9a17e20ca38faf71478185585ea227fd9/antiqua_historical_ligs/0.ckpt.h5 +*.ckpt.*
with current ocrd_all docker image and this scan:
https://digi.ub.uni-heidelberg.de/diglitData/v/montfaucon1719bd2_1.210.tif
I get this XML:
...
ſe mit devant les rangs; & approchant de Xanthe, il uſa dune tromperie
qui lui reuit: E ce agir en honnete homme, dit. il, damener un ſecond,
Thanks for renaming antiqua_modern to uw3-modern-english, because the model seems to miss some accented characters. Example:
https://digi.hadw-bw.de/view/di016/0032
»Grabstein eines Hildebertus. Urspriinglich in der Peterskirche ...«
Is there a calamari model for modern antiqua fonts & german?
Hello there--
I was wondering if you could disclose any details on what these different models were trained on? Much appreciated!
Best,
James
"The current model gt4histocr was trained on the entire GT4HistOCR corpus."
A combined model spanning all subcorpora of GT4HistOCR is not a good idea for several reasons:
For these reasons it is a much better idea to train separate models for the subcorpora.
I want to train a model on my own handwriting and would like to know how to do that.
Dear reader,
do you have any details about the model in fraktur_19th_century?
Is it based on gt4histocr ground truth?
Kind regards.
Short of cloning the entire repo or clicking on each individual checkpoint HDF5 and JSON file, is there a simple way to download models individually from this site? (Ideally one that can also be scripted...)
Or could you try to make GH release archives from them?
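Until there are release archives, something like the following might do for scripting. It is only a sketch: it builds raw-file URLs following the pattern of the `0.ckpt.h5` link quoted in another issue on this page, and it assumes each model directory holds files named `N.ckpt.h5` and `N.ckpt.json`. The number of checkpoints per model (`n_checkpoints`) is a guess that has to be checked against the repository, and both function names are made up.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE = "https://github.com/Calamari-OCR/calamari_models/raw"


def checkpoint_urls(model, ref="master", n_checkpoints=5):
    """List raw URLs for the .h5/.json pair of each checkpoint.

    n_checkpoints is an assumption; adjust it to what the model
    directory actually contains.
    """
    urls = []
    for i in range(n_checkpoints):
        for ext in ("h5", "json"):
            urls.append(f"{BASE}/{ref}/{model}/{i}.ckpt.{ext}")
    return urls


def download_model(model, target_dir, ref="master", n_checkpoints=5):
    """Download one model directory file by file (needs network access)."""
    target = Path(target_dir) / model
    target.mkdir(parents=True, exist_ok=True)
    for url in checkpoint_urls(model, ref, n_checkpoints):
        filename = url.rsplit("/", 1)[-1]
        urlretrieve(url, str(target / filename))


# Building the URLs does not touch the network:
urls = checkpoint_urls(
    "antiqua_historical_ligs",
    ref="d61781a9a17e20ca38faf71478185585ea227fd9",
    n_checkpoints=1,
)
print(urls)
```

Calling `download_model("fraktur_19th_century", "models")` would then fetch that one directory only. Pinning `ref` to a commit hash instead of `master` keeps the download reproducible.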
I am trying to run inference on
calamari_models-2.0/uw3-modern-english/0.ckpt
What version of everything should we be using? I am getting all kinds of complaints about mismatches and can't resolve them so far.
Exception: Downgrading of models is not supported (5 to 2). Please upgrade your Calamari instance (currently installed: 1.0.5)
And you also get this trying to install latest calamari:
error: tensorflow 2.9.1 is installed but tensorflow<2.7.0,>=2.4.0 is required by {'tfaip'}
And then there is no version of that in pip.
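One way to see what a checkpoint expects before trying to load it is to read the version number out of its JSON file. The sketch below assumes that the number lives under a top-level `version` key; that is an assumption about the file layout, and the helper names are made up. The values 5 and 2 are the ones from the exception above.

```python
import json
import os
import tempfile


def checkpoint_version(json_path):
    """Read the checkpoint format version (assumed key: "version")."""
    with open(json_path, "r") as f:
        return json.load(f).get("version", 0)


def describe(json_path, supported_version):
    """Compare the file's version with what the installed Calamari supports."""
    found = checkpoint_version(json_path)
    if found > supported_version:
        return f"checkpoint is version {found}, newer than supported {supported_version}"
    if found < supported_version:
        return f"checkpoint is version {found}, older than supported {supported_version}"
    return f"checkpoint version {found} matches"


# Demo with a throwaway file standing in for a real checkpoint JSON
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0.ckpt.json")
    with open(path, "w") as f:
        json.dump({"version": 5}, f)
    report = describe(path, supported_version=2)

print(report)
```

This only reports the mismatch; it does not resolve the tensorflow/tfaip pin conflict quoted above.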
It seems that 0.ckpt.json
is missing from https://github.com/Calamari-OCR/calamari_models/tree/master/fraktur_19th_century, which gives the following error when loading:
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in __init__
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 228, in <listcomp>
    data_preproc=data_preproc, processes=processes) for cp in checkpoints]
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/predictor.py", line 102, in __init__
    ckpt = Checkpoint(checkpoint, auto_update=self.auto_update_checkpoints)
  File "/opt/miniconda3/envs/nteract/lib/python3.7/site-packages/calamari_ocr/ocr/checkpoint.py", line 20, in __init__
    with open(self.json_path, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/arbeit/Documents/calamari-models/calamari_official_fraktur_19th_century/0.ckpt.json'
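For anyone hitting the same error, a quick pre-flight check of a downloaded model directory can show which JSON files are missing before Calamari is started. This is a small sketch, assuming every `N.ckpt.h5` is expected to sit next to an `N.ckpt.json`, as the traceback suggests; the function name is made up.

```python
import tempfile
from pathlib import Path


def missing_json_sidecars(model_dir):
    """Return the names of .ckpt.json files missing next to .ckpt.h5 files."""
    missing = []
    for h5 in sorted(Path(model_dir).glob("*.ckpt.h5")):
        sidecar = h5.with_suffix(".json")  # 0.ckpt.h5 -> 0.ckpt.json
        if not sidecar.exists():
            missing.append(sidecar.name)
    return missing


# Demo: a throwaway directory where 0.ckpt.json is absent
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0.ckpt.h5", "1.ckpt.h5", "1.ckpt.json"):
        (Path(tmp) / name).touch()
    missing = missing_json_sidecars(tmp)

print(missing)
```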