Comments (8)

scriptin commented on July 26, 2024

I can confirm it is a Zorba bug: 28msec/zorba/issues/228

from jmdict-simplified.

scriptin commented on July 26, 2024

Fixed by adding a function that checks whether a string is duplicated (the same text repeated twice) and, if it is, returns only the first half.
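The actual fix lives in the project's XQuery code, which isn't shown in this thread. Here is a minimal Python sketch of the same idea, where the function name undouble is hypothetical: if a string has even length and its two halves are identical, keep only the first half.

```python
def undouble(s: str) -> str:
    """Return the first half of s if s is some string repeated twice; otherwise return s unchanged."""
    half, rem = divmod(len(s), 2)
    # Only even-length, non-empty strings whose halves match are treated as doubled.
    if rem == 0 and half > 0 and s[:half] == s[half:]:
        return s[:half]
    return s


doubled = "noun (common) (futsuumeishi)noun (common) (futsuumeishi)"
print(undouble(doubled))                          # noun (common) (futsuumeishi)
print(undouble("noun (common) (futsuumeishi)"))  # noun (common) (futsuumeishi)
```

One caveat of this approach: any legitimate value that happens to be a string repeated twice would also be halved, so it is only safe if no real tag text has that shape.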

aehlke commented on July 26, 2024

Ah, after running a build myself, it's fine! For some reason, the archive I downloaded from the release has the question marks.

scriptin commented on July 26, 2024

Yes, all tags are broken in the release. Thank you for reporting that.

A question mark is the default value when converting tags: if none of the cases match, the converter inserts a question mark. This whole tagging scheme in the original files is so yucky because it uses custom XML entities (e.g. <!ENTITY oK "word containing out-dated kanji">), which get substituted with their text, and because of that I had to implement the backwards conversion from text to tag names.
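To illustrate the backwards conversion and the question-mark fallback, here is a Python sketch (not the project's XQuery). The entity table holds just two declarations taken from this thread, and the names ENTITIES, TEXT_TO_TAG and text_to_tag are hypothetical.

```python
# Hypothetical subset of entity declarations: entity name -> expansion text.
ENTITIES = {
    "oK": "word containing out-dated kanji",
    "n": "noun (common) (futsuumeishi)",
}

# Invert the mapping so that expanded text can be turned back into a tag name.
TEXT_TO_TAG = {text: name for name, text in ENTITIES.items()}


def text_to_tag(text: str) -> str:
    """Map expanded entity text back to its tag name, falling back to '?' when nothing matches."""
    return TEXT_TO_TAG.get(text, "?")


print(text_to_tag("noun (common) (futsuumeishi)"))                              # n
print(text_to_tag("noun (common) (futsuumeishi)noun (common) (futsuumeishi)"))  # ?
```

With this kind of fallback, text that the parser has doubled matches no entry and silently becomes "?", which is consistent with the question marks seen in the release archive.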

Since all tags are broken and the original XML file hasn't changed, I think it's related to how Zorba handles XML entities. I'll have to figure out what exactly it is.

Can you tell me which version of Zorba you are using?

aehlke commented on July 26, 2024

Zorba NoSQL Query Processor, Version: 3.1.0

scriptin commented on July 26, 2024

All entities appear to be parsed "doubled" inside sense → pos elements: e.g., &n; is interpreted as "noun (common) (futsuumeishi)noun (common) (futsuumeishi)" instead of just "noun (common) (futsuumeishi)". The only exception is the very first entry of the dictionary. (I discovered this by adding trace messages to the code.) At first I thought I was somehow concatenating variables, but I can't find any part of the code that could do that, unless I badly misunderstand how variables work in XQuery.

I also added a script that generates the tag-related code, to make sure I hadn't messed that up. That didn't resolve the problem either.
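The thread doesn't show that generator script. As a hypothetical illustration, the Python sketch below parses simple <!ENTITY name "text"> declarations and emits a reverse text-to-tag table as Python source (a stand-in for generated XQuery). The regex handles only internal entities with double-quoted values, and all names here are made up for this example.

```python
import re

# Matches simple internal entity declarations such as
# <!ENTITY oK "word containing out-dated kanji"> (double-quoted values only).
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)"\s*>')


def parse_entities(dtd: str) -> dict[str, str]:
    """Return a mapping of entity name -> replacement text."""
    return dict(ENTITY_DECL.findall(dtd))


def generate_reverse_map(entities: dict[str, str]) -> str:
    """Emit Python source for a text -> tag table (a stand-in for generated XQuery)."""
    texts = list(entities.values())
    if len(texts) != len(set(texts)):
        # Two entities with the same text would collide in the reverse table.
        raise ValueError("duplicate entity texts; reverse mapping would be ambiguous")
    lines = ["TEXT_TO_TAG = {"]
    for name, text in sorted(entities.items()):
        lines.append(f"    {text!r}: {name!r},")
    lines.append("}")
    return "\n".join(lines)


dtd = '''
<!ENTITY n "noun (common) (futsuumeishi)">
<!ENTITY oK "word containing out-dated kanji">
'''
print(generate_reverse_map(parse_entities(dtd)))
```

Generating the table from the DTD instead of writing it by hand rules out typos in the tag texts, and the duplicate check guards against two entities collapsing into a single reverse entry. As the thread shows, though, a correct table doesn't help if the parser hands it doubled text.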

My version of Zorba is the same.

Could you pull master as it is right now, re-run the build with ./build.sh dev dev, and tell me whether you get the same error as below? I'm starting to think it's a bug in Zorba.

$ ./build.sh dev dev
Processing a full EN version
</home/d/jmdict-simplified/src/tags.xq>:180,18: error [unknown-tag]: Unknown tag 'noun (common) (futsuumeishi)noun (common) (futsuumeishi)' on entity 1000010
Processing EN version with common words only
</home/d/jmdict-simplified/src/tags.xq>:180,18: error [unknown-tag]: Unknown tag 'noun (common) (futsuumeishi)noun (common) (futsuumeishi)' on entity 1000110
Done

aehlke commented on July 26, 2024

May I recommend documenting a Docker-based build process to avoid differences between system environments?

scriptin commented on July 26, 2024

Well, the Docker image uses the same version we both use, so there is no difference. As for documentation, I'd rather throw away Zorba altogether than document it. After using it for a while, I absolutely hate it: its error messages are really bad, and it's no longer actively developed.

If you know good alternatives, please tell me.
