Comments (6)
All languages present in JMdict will be included in the next scheduled release as separate JSON files.
from jmdict-simplified.
Hello @rbleuse
Yes, that's possible, but it's quite a lot of work because of the sheer size of the multilingual JMdict file. (Languages other than English don't have separate files of their own.) Memory is the primary concern: right now the conversion runs on 6 GB of RAM, and a bigger file would require splitting the processing into pieces, as I did for JMnedict.
But that would be a great feature, so I'll definitely look into that.
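To illustrate what piece-by-piece processing could look like, here is a minimal Python sketch. It is not the project's actual conversion code. It assumes the JMdict XML layout of `<entry>` elements containing `<sense>`/`<gloss>` children, with the gloss language in `xml:lang` and English implied when the attribute is absent. The function name `count_entries_per_language` and the `SAMPLE` data are made up for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny hypothetical sample in the assumed JMdict shape.
SAMPLE = b"""<JMdict>
<entry><ent_seq>1</ent_seq><sense><gloss>a</gloss><gloss xml:lang="ger">b</gloss></sense></entry>
<entry><ent_seq>2</ent_seq><sense><gloss>c</gloss></sense>
<sense><gloss xml:lang="ger">d</gloss><gloss xml:lang="dut">e</gloss></sense></entry>
</JMdict>"""


def count_entries_per_language(source):
    """Stream the XML and count entries that have at least one gloss per language."""
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "entry":
            # Assumption: a gloss without xml:lang is English.
            langs = {g.get(XML_LANG, "eng") for g in elem.iter("gloss")}
            counts.update(langs)
            # Drop the processed entry so memory use stays roughly constant.
            root.clear()
    return counts


counts = count_entries_per_language(io.BytesIO(SAMPLE))
print(dict(counts))
```

Because each entry is discarded as soon as it has been counted, peak memory depends on the largest single entry rather than on the size of the whole file.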
This makes a lot of sense, @aehlke! I am only worried about the granularity: if you have, as in your example, Dutch+German as a single file, that may be easier for some users than a scenario where they need to download and import two files separately (one for Dutch, another for German).
I will do some testing and add language-specific builds sometime in the near future.
Hello @rbleuse! Is this still relevant? If so, I have a question about your use case:
Do you need versions for each language separately, or something like French+English, German+English, etc.? The reason I'm asking is that most languages have a fairly small number of entries translated into them. See the table below. Thus, it may be useful to have English as a default, included in every other version. Or maybe it's fine to have small language-specific versions.
Let me know what you think. I can do it either way, but I don't want to clutter releases with versions which nobody will use.
| Language | # of entries |
|---|---|
| all | 198680 |
| eng | 198680 |
| ger | 123519 |
| rus | 67379 |
| hun | 41803 |
| dut | 40964 |
| spa | 34110 |
| fre | 15307 |
| swe | 14562 |
| slv | 8757 |
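
Both options (English bundled with another language, or a small single-language version) amount to filtering glosses by a set of language codes. The sketch below is hypothetical and assumes each word is a dict with a `sense` list whose items carry a `gloss` list of `{"lang": ..., "text": ...}` objects. The helper name `filter_by_languages` and the sample words are invented for illustration.

```python
def filter_by_languages(words, languages):
    """Return copies of `words` keeping only glosses whose lang is in `languages`.

    Senses with no remaining glosses, and words with no remaining senses,
    are dropped. The input list is not modified.
    """
    result = []
    for word in words:
        senses = []
        for sense in word.get("sense", []):
            glosses = [g for g in sense.get("gloss", []) if g.get("lang") in languages]
            if glosses:
                senses.append({**sense, "gloss": glosses})
        if senses:
            result.append({**word, "sense": senses})
    return result


# Hypothetical sample data in the assumed shape.
WORDS = [
    {"id": "1", "sense": [{"gloss": [{"lang": "eng", "text": "cat"},
                                     {"lang": "ger", "text": "Katze"}]}]},
    {"id": "2", "sense": [{"gloss": [{"lang": "eng", "text": "dog"}]}]},
]

german_only = filter_by_languages(WORDS, {"ger"})          # small language-specific version
german_plus_english = filter_by_languages(WORDS, {"ger", "eng"})  # English as a default
print(len(german_only), len(german_plus_english))
```

Passing the language set as a parameter keeps the choice between the two packaging schemes out of the filtering logic itself.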
I personally use your JSON to generate a Realm database for mobile, so it would be fine either way in my use case. However, since the data files are large, the goal is to allow users to download only the dictionary sets relevant to their needs. So I split my Realm files and would encourage the same for use cases where users work with the JSON directly.
I imagine a user who selects, say, Dutch may also want to download German rather than English.
What I meant by that comment is that I can see valuable use cases for letting users select individual language packs, rather than always including English by default or always bundling combinations.