Comments (2)
Hmm, so based on how I've understood the code thus far, the most critical elements for just getting the kana breakdown seem to be the surface form, left ID, right ID, and cost:
var surface_form = entry[0];
var left_id = entry[1];
var right_id = entry[2];
var word_cost = entry[3];
var feature = entry.slice(4).join(","); // TODO Optimize
The other features (everything after the 4th element in each CSV row) seem to be used only for informational output and are not needed for the analysis itself. Is that right?
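To make the idea concrete, here is a minimal sketch of dropping the feature columns from one dictionary CSV row. `stripFeatures` is a hypothetical helper, not part of kuromoji.js, and the sample row uses made-up IDs and cost purely for illustration. It assumes a naive split on commas, so a row whose surface form itself contains a comma would need real CSV parsing.

```javascript
// Hypothetical helper (not part of kuromoji.js): keep only the four fields
// that seem to matter for analysis and drop the trailing feature columns.
// Assumes fields never contain commas; quoted fields would need a CSV parser.
function stripFeatures(csvLine) {
  var entry = csvLine.split(",");
  if (entry.length < 4) {
    throw new Error("Expected at least 4 fields, got " + entry.length);
  }
  // surface_form, left_id, right_id, word_cost
  return entry.slice(0, 4).join(",");
}

// Illustrative row; the numeric values are invented for this example.
var sample = "東京,1293,1293,3003,名詞,固有名詞,地域,一般,*,*,東京,トウキョウ,トーキョー";
var stripped = stripFeatures(sample);
console.log(stripped);
```

Running something like this over every row before building the dictionary would be one way to measure the size difference, assuming the builder accepts rows with an empty feature string.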
from kuromoji.js.
Without the extra feature elements (part of speech, pronunciation, etc.) we can shave off ~20% (20MB) from the uncompressed version. After compression this saves about 13% (2MB), which is quite significant.
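As a rough sanity check, the totals implied by those figures can be back-calculated from the savings and percentages. The numbers below are derived only from the rounded values quoted above, not measured, and `impliedTotalMB` is a hypothetical helper name.

```javascript
// Back-calculate the total size implied by "saved X MB, which is Y% of the total".
// Inputs are the rounded figures quoted above, so the results are approximate.
function impliedTotalMB(savedMB, savedFraction) {
  return savedMB / savedFraction;
}

var uncompressedMB = impliedTotalMB(20, 0.20); // from ~20% (20MB)
var compressedMB = impliedTotalMB(2, 0.13);    // from ~13% (2MB)

console.log("implied uncompressed total: ~" + uncompressedMB.toFixed(0) + "MB");
console.log("implied compressed total: ~" + compressedMB.toFixed(1) + "MB");
```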
@takuyaa do you think there are additional ways to bring down the dictionary file sizes?
Thanks again for your work on this very useful project.