Comments (3)
Hm, both tesseract-ocr and r-tesseract seem like they could be included in Guix proper, and thus they should go there. I'm not sure about the training data files, though. According to the repo they seem to be open source, with the source available in theory, but there are no instructions on how to build them. It would be nice to have them in Guix proper too, since tesseract is quite useless without data.
edit: It seems the LSTM source data is in this repo: https://github.com/tesseract-ocr/langdata_lstm
from guix-science.
My packaging-fu is quite weak, but is packaging this training data a use case for the copy-build-system? This just reminded me of something like a Vim plugin, which uses it. I'm also not sure what to do about the fact that this git repository of traineddata files is huge. The eng.traineddata file is 22.4 MB by itself, and there are dozens of languages. Can git-fetch handle partial, shallow, or sparse clones?
I saw this for tesseract 4, which looks like a build plan: https://github.com/tesseract-ocr/tesstrain
I'm not sure tesstrain would work. As far as I know, it expects image files, which the langdata repo does not contain; the latter relies on something to generate images from fonts and the supplied text. Also, training the LSTM apparently takes weeks, so building it from source may not be a good idea.
But yeah, the binary training data is a case for copy-build-system. I think the size is not that much of a problem, but I'm not sure about shallow/sparse clones.
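For illustration, a minimal sketch of what a copy-build-system package for the binary training data might look like. Everything specific here is an assumption rather than something from this thread: the package name, the https://github.com/tesseract-ocr/tessdata repository, the 4.1.0 tag, the share/tessdata install location and the Apache 2.0 license. The sha256 is a zero placeholder, so the build will stop with a hash mismatch until the real hash is filled in.

```scheme
;; Hypothetical package for the binary traineddata files.  The name, URL,
;; tag, install path and license below are assumptions, not from the thread.
(use-modules (guix packages)
             (guix git-download)
             (guix build-system copy)
             ((guix licenses) #:prefix license:))

(define tesseract-ocr-tessdata            ;hypothetical package name
  (package
    (name "tesseract-ocr-tessdata")
    (version "4.1.0")                      ;assumed upstream tag
    (source
     (origin
       (method git-fetch)
       (uri (git-reference
             (url "https://github.com/tesseract-ocr/tessdata") ;assumed repo
             (commit version)))
       (file-name (git-file-name name version))
       (sha256
        ;; PLACEHOLDER: replace with the real hash of the checkout,
        ;; e.g. from `guix hash -x --serializer=nar .` run inside it.
        (base32 "0000000000000000000000000000000000000000000000000000"))))
    (build-system copy-build-system)
    (arguments
     ;; Copy every *.traineddata file (keeping subdirectories) into
     ;; share/tessdata; this install location is an assumption.
     '(#:install-plan
       '(("." "share/tessdata" #:include-regexp ("\\.traineddata$")))))
    (home-page "https://github.com/tesseract-ocr/tessdata")
    (synopsis "Trained models for the Tesseract OCR engine")
    (description
     "This package provides pre-trained language data files for
Tesseract.  They are installed as binaries rather than built from source.")
    (license license:asl2.0)))             ;assumed; check upstream LICENSE

;; Evaluating to the package lets `guix build -f FILE` build it.
tesseract-ocr-tessdata
```

Saved to a file (say tessdata.scm, a hypothetical name), this could be tried with `guix build -f tessdata.scm`. Whether git-fetch can avoid downloading the full repository history is left open here, as in the thread.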