Comments (3)
We've done some experiments with other Latin-based languages. The results were mostly satisfactory, although any special characters from these languages will be replaced with the closest equivalent from the Latin alphabet.
Non-Latin script languages are not covered by the training set at all, so it won't work in those cases.
from nougat.
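To get a feel for what "closest equivalent from the Latin alphabet" can look like in practice, here is a minimal sketch. It is an illustration only: it assumes the substitution resembles dropping diacritics, which is not confirmed in this thread, and the model's behaviour is learned from its training data rather than implemented this way. The helper name `fold_to_basic_latin` is made up for this example.

```python
import unicodedata


def fold_to_basic_latin(text: str) -> str:
    """Approximate accented characters by their unaccented base letters.

    Hypothetical helper for illustration; not part of nougat.
    """
    # NFKD decomposition splits e.g. "é" into "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep only the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold_to_basic_latin("résumé à Zürich"))  # resume a Zurich
```

Comparing a page's ground-truth text with its folded form is one rough way to estimate how much of a document might be affected. Letters without a Unicode decomposition (for example "ß" or "ø") pass through this sketch unchanged, so it says nothing about how the model treats them.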
For text line position detection, is mupdf better than cptn?
I have no insight on that matter. We are not performing a text detection step.