Comments (8)
I can confirm it is a Zorba bug: 28msec/zorba/issues/228
from jmdict-simplified.
Fixed by adding a function which checks whether a string is duplicated and, if it is, returns only one half of it.
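A minimal sketch of such a check, written in Python for illustration only (the project's actual fix lives in its XQuery sources and may differ; the name `undouble` is hypothetical):

```python
def undouble(text: str) -> str:
    """Return one half of `text` if it is the same string repeated twice.

    Any other input, including the empty string, is returned unchanged.
    """
    half, remainder = divmod(len(text), 2)
    # Only an even-length string can be an exact doubling of another string.
    if text and remainder == 0 and text[:half] == text[half:]:
        return text[:half]
    return text


doubled = "noun (common) (futsuumeishi)noun (common) (futsuumeishi)"
print(undouble(doubled))
```

One caveat with this approach: a value that legitimately consists of two identical halves would also be cut in half, so it is a workaround for the doubling rather than a fix for its cause.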
from jmdict-simplified.
Ah, after running a build myself, it's fine! For some reason the archive I downloaded from the release has the question marks.
from jmdict-simplified.
Yes, all tags are broken in the release. Thank you for reporting that.
A question mark is the default value when converting tags: if none of the cases match, it inserts a question mark. This whole tagging thing in the original files is so yucky because it uses custom XML entities (e.g. <!ENTITY oK "word containing out-dated kanji">), which are substituted with text, and because of that I had to implement the backwards conversion.
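To illustrate the backwards conversion described above, here is a rough Python sketch (not the project's XQuery code). The two entity texts are taken from this thread; the names `ENTITIES`, `TEXT_TO_TAG`, and `text_to_tag` are hypothetical:

```python
# Small illustrative subset of entity definitions: entity name -> expanded text.
# The real dictionary declares many more entities than these two.
ENTITIES = {
    "oK": "word containing out-dated kanji",
    "n": "noun (common) (futsuumeishi)",
}

# Invert the mapping so that expanded text can be turned back into a short tag.
TEXT_TO_TAG = {text: name for name, text in ENTITIES.items()}


def text_to_tag(text: str) -> str:
    """Map expanded entity text back to its tag, or "?" if nothing matches."""
    return TEXT_TO_TAG.get(text, "?")


print(text_to_tag("word containing out-dated kanji"))
print(text_to_tag("noun (common) (futsuumeishi)noun (common) (futsuumeishi)"))
```

With a lookup like this, any expanded text that arrives in an unexpected form (for example, doubled) matches no case and falls through to the question mark.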
Since all tags are broken, and the original XML file hasn't changed, I think it's something related to how Zorba handles XML entities. I'll have to figure out what it is.
Can you tell which version of Zorba you are using?
from jmdict-simplified.
Zorba NoSQL Query Processor, Version: 3.1.0
from jmdict-simplified.
All entities appear to be parsed "doubled" inside sense → pos elements, e.g. &n; is interpreted as noun (common) (futsuumeishi)noun (common) (futsuumeishi) instead of just noun (common) (futsuumeishi). The only exception is the very first entry of the dictionary. (I discovered that by adding trace messages to the code.) At first it made me think that I was somehow concatenating variables, but I can't find any part of the code that could do that, unless I terribly misunderstand how variables in XQuery work.
I also added a script for generating tag-related code to make sure I haven't messed that up. It doesn't resolve this problem either.
My version of Zorba is the same.
Can you pull master as it is right now, re-run the build as ./build.sh dev dev, and tell me if you get the same error as below? I'm starting to think it's a bug in Zorba.
$ ./build.sh dev dev
Processing a full EN version
</home/d/jmdict-simplified/src/tags.xq>:180,18: error [unknown-tag]: Unknown tag 'noun (common) (futsuumeishi)noun (common) (futsuumeishi)' on entity 1000010
Processing EN version with common words only
</home/d/jmdict-simplified/src/tags.xq>:180,18: error [unknown-tag]: Unknown tag 'noun (common) (futsuumeishi)noun (common) (futsuumeishi)' on entity 1000110
Done
from jmdict-simplified.
May I recommend documenting a Docker-based build process to avoid system environment differences?
from jmdict-simplified.
Well, the Docker image uses the same version we both use, so there is no difference. As for documenting, I'd rather throw away Zorba altogether than document it. After using it a bit I absolutely hate it: its error messages are really bad, and it's not in active development anymore.
If you know good alternatives, please tell me.
from jmdict-simplified.