Comments (6)

scriptin commented on August 30, 2024

All languages present in JMdict will be included in the next scheduled release as separate JSON files.

from jmdict-simplified.

scriptin commented on August 30, 2024

Hello @rbleuse

Yes, that's possible, but it is quite a lot of work because of the sheer size of the multilingual JMdict file. (Languages other than English don't have separate files of their own.) Memory limits are the primary concern: right now the conversion runs in 6 GB of RAM, and a bigger file would require splitting the processing into pieces, as I did for JMnedict.

But that would be a great feature, so I'll definitely look into that.
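
To illustrate what piece-by-piece processing could look like, here is a minimal Python sketch that streams a JMdict-style XML file one `<entry>` at a time instead of loading it whole. This is not how the project's converter is written; the function name `iter_entry_languages` and the tiny sample document are made up for illustration, and a real JMdict file is far larger and richer.

```python
import io
import xml.etree.ElementTree as ET

# Attribute key that ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_entry_languages(source):
    """Yield (ent_seq, set of gloss languages) one <entry> at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "entry":
            # In JMdict, a gloss without xml:lang is English by default.
            langs = {g.get(XML_LANG, "eng") for g in elem.iter("gloss")}
            yield elem.findtext("ent_seq"), langs
            root.clear()  # detach processed entries so they can be freed


# Made-up two-entry sample, only for illustration.
SAMPLE = b"""<JMdict>
<entry><ent_seq>1</ent_seq><sense><gloss>cat</gloss><gloss xml:lang="ger">Katze</gloss></sense></entry>
<entry><ent_seq>2</ent_seq><sense><gloss>dog</gloss></sense></entry>
</JMdict>"""

result = list(iter_entry_languages(io.BytesIO(SAMPLE)))
```

Because each entry is handled and then detached from the root, memory use should depend on the size of an entry rather than the size of the file.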

scriptin commented on August 30, 2024

This makes a lot of sense, @aehlke! I am only worried about the granularity: if you have, as in your example, Dutch+German as a single file, this may be easier for some users than a scenario where they need to download and import two files separately (one for Dutch, another for German).

I will do some testing and add language-specific builds in the near future.

scriptin commented on August 30, 2024

Hello @rbleuse! Is this still relevant? If so, I have a question about your use case:

Do you need versions for each language separately, or something like French+English, Dutch+English, etc.? The reason I'm asking is that most languages have a fairly small number of entries translated into them. See the table below. Thus, it may be useful to have English as a default, included in every other version. Or maybe it's fine to have small language-specific versions.

Let me know what you think. I can do it either way, but I don't want to clutter releases with versions which nobody will use.

Language   # of entries
all        198680
eng        198680
ger        123519
rus        67379
hun        41803
dut        40964
spa        34110
fre        15307
swe        14562
slv        8757
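
For reference, per-language counts like these could be computed by counting the entries that have at least one gloss in each language. The sketch below is only one plausible way to define the numbers, not necessarily how the table was produced. It assumes entries shaped like `{"sense": [{"gloss": [{"lang": ..., "text": ...}]}]}`; that shape, the function name `entries_per_language`, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def entries_per_language(words):
    """Count entries that have at least one gloss in each language."""
    counts = Counter()
    for word in words:
        langs = {
            gloss["lang"]
            for sense in word.get("sense", [])
            for gloss in sense.get("gloss", [])
        }
        counts.update(langs)  # each entry counts once per language
    return counts


# Tiny made-up sample, only for illustration.
words = [
    {"id": "1", "sense": [{"gloss": [{"lang": "eng", "text": "cat"},
                                     {"lang": "ger", "text": "Katze"}]}]},
    {"id": "2", "sense": [{"gloss": [{"lang": "eng", "text": "dog"}]},
                          {"gloss": [{"lang": "eng", "text": "hound"}]}]},
]
counts = entries_per_language(words)
```

Collecting the languages into a set first means an entry with several glosses or senses in the same language is still counted once.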

aehlke commented on August 30, 2024

I personally use your JSON to generate a Realm database for mobile, so either way would be fine for my use case. However, since the data files are large, the goal is to let users download only the dictionary sets relevant to their needs. So I split my Realm files, and I would encourage the same for cases where users consume the JSON directly.

I imagine a user who selects, say, Dutch may also want to download German rather than English.

aehlke commented on August 30, 2024

What I meant by that comment is that I can see valuable use cases for users selecting individual language packs, rather than defaulting to always including English or always bundling combinations.
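
As a rough sketch of the idea, individual or combined language packs could be derived from the full data by filtering glosses. The entry shape (`sense` lists holding `gloss` lists with a `lang` field), the function name `build_language_pack`, and the sample data are assumptions made for illustration, not the project's actual build step.

```python
def build_language_pack(words, languages):
    """Keep only glosses in `languages`; drop senses and entries left empty."""
    wanted = set(languages)
    pack = []
    for word in words:
        senses = []
        for sense in word.get("sense", []):
            glosses = [g for g in sense.get("gloss", []) if g.get("lang") in wanted]
            if glosses:
                # Copy the sense so the input data is left untouched.
                senses.append({**sense, "gloss": glosses})
        if senses:
            pack.append({**word, "sense": senses})
    return pack


# Tiny made-up sample, only for illustration.
words = [
    {"id": "1", "sense": [{"gloss": [{"lang": "eng", "text": "cat"},
                                     {"lang": "dut", "text": "kat"},
                                     {"lang": "ger", "text": "Katze"}]}]},
    {"id": "2", "sense": [{"gloss": [{"lang": "eng", "text": "dog"},
                                     {"lang": "ger", "text": "Hund"}]}]},
    {"id": "3", "sense": [{"gloss": [{"lang": "eng", "text": "bird"}]}]},
]

dutch = build_language_pack(words, ["dut"])                # single-language pack
dutch_german = build_language_pack(words, ["dut", "ger"])  # combined pack
```

The same function covers both options discussed here: one pack per language, or a bundle of several languages chosen by the user.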
