Comments (5)
Tried a fix in [issue_55_fix_chinese_stats](https://github.com/jzohrab/lute/tree/issue_55_fix_chinese_stats), but the code is way too slow for real prod usage. There may be a far better way to do this, but I'm not sure what just yet.
from lute.
Tried another way that didn't rely on a full render calculation; it also failed spectacularly with a timeout. New class TokenCoverage in the same branch. Messy code too, which is great.
from lute.
Tried yet another method using regex matches. It still comes nowhere near completing before the 30s timeout, so it's way too slow for prod.
Have sunk several hours into this, because a) it's interesting, and b) if I could figure this out, I'd be able to drop the TextTokens table, which takes up a lot of space. Currently, I'm really only using the TextTokens table for calculating stats -- ... actually, the current methods would probably suffice for calculating stats, so I may revisit this idea for that.
Regardless, I'm still not sure how to calculate coverage accurately for Chinese at the moment. The first method used (do a fake render) seemed to be the best: still slow-ish, but maybe there are some good optimizations possible in the rendering calculations, which feel overcomplicated.
from lute.
Returned to the first method (effectively rendering each page in code) and found some good simplifications to the renderable calculator class, but it's still not good enough. For a book of ~100K Spanish words, the stats calc takes ~20s on my Mac, which is not usable.
Still, I found some good code optimizations. They're pushed to the branch and can be pulled into the develop branch. Will handle that separately. Leaving this issue open.
from lute.
Reducing the calc size makes it workable. Merged into the dev branch, added wiki faq page about it -- https://github.com/jzohrab/lute/wiki/Stats-calculation -- and will include it in next launch. Phew.
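For anyone curious what "reducing the calc size" could look like, here is a minimal Python sketch. It is not the code in the branch. It assumes that the reduction means computing stats over only the first part of a text, and that terms are matched by greedy longest match, which is one common way to handle languages without word spaces. The names `sample_coverage`, `term_status`, `sample_chars`, and `max_term_len` are all hypothetical.

```python
from collections import Counter


def sample_coverage(text, term_status, sample_chars=2000, max_term_len=4):
    """Estimate the share of characters per term status.

    Hypothetical helper: only the first `sample_chars` characters are
    scanned, so the cost no longer grows with the size of the book.
    `term_status` maps a term string to a status label.
    """
    sample = text[:sample_chars]
    counts = Counter()
    i = 0
    while i < len(sample):
        # Greedy longest match: try the longest candidate term first.
        for length in range(min(max_term_len, len(sample) - i), 0, -1):
            chunk = sample[i:i + length]
            if chunk in term_status:
                counts[term_status[chunk]] += length
                i += length
                break
        else:
            # No defined term starts here; count one character as unknown.
            # Punctuation also lands here in this simplified sketch.
            counts["unknown"] += 1
            i += 1
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()} if total else {}


terms = {"喜欢": "known", "学习": "learning", "中文": "known"}
stats = sample_coverage("我喜欢学习中文", terms)
```

The trade-off is accuracy: the result is an estimate based on the start of the text, not an exact figure for the whole book.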
from lute.