
Revise N-Quads parser (cayley issue, closed)

cayleygraph commented on July 17, 2024
Revise N-Quads parser

from cayley.

Comments (5)

kortschak commented on July 17, 2024

I had some spare time, so there is a branch with this parser implemented here.

@barakmich I have added a converter in nquads (quadfix.go; run zcat 30kmovies.nt.gz | go run nquads/quadfix.go > 30kmovies.nq) that converts, as closely as I can tell, from the nt format used in 30kmovies to the N-Quads spec. The converted form is rejected by the parser (for clear reasons), but I'm not sure how far I can make the movie data conform to the spec without breaking things elsewhere: the objects in 30kmovies.nt are prefixed with ':', which I have taken to mean they are blank nodes, but '/' is not a legal character in a blank node label. I could convert them to literals, but I don't know what the '"' wrapping them will do to queries. I'd like to know what you think here.


barakmich commented on July 17, 2024

Oh, that's actually really nice.

So the ":" prefix is an ancient thing that I can safely explain and trace. One way to look at the set is as all literals, with the : prefix merely meaning that it's in the local URI space. The truly blank nodes are the ones matching :\d{6} -- and if we're truly doing NQuads we could use a story here. Those blank node IDs were generated ahead of time, so they're unique and consistent. They don't have to keep that ID once they're loaded; they just need to be file-consistent. Right now, the current code treats them as literally those strings, and we might care to fix that -- assigning them some reasonable unique ID would be correct.
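To make the "file-consistent" idea concrete, here is a minimal Go sketch, not code from the branch. It assumes blank nodes are exactly the terms matching :\d{6} (the data currently seems to contain shorter labels too), and the names blankMapper and rewrite and the "_:b<n>" ID scheme are invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// blankLabel matches the legacy six-digit labels, e.g. ":000123".
// Assumption: exactly six digits, as described in the comment above.
var blankLabel = regexp.MustCompile(`^:\d{6}$`)

// blankMapper (hypothetical) hands out a fresh ID the first time it sees
// a label and returns the same ID on every later lookup, so IDs are
// consistent within one load but independent of the labels in the file.
type blankMapper struct {
	next int
	seen map[string]string
}

func newBlankMapper() *blankMapper {
	return &blankMapper{seen: make(map[string]string)}
}

// rewrite returns term unchanged unless it is a legacy blank node label.
func (m *blankMapper) rewrite(term string) string {
	if !blankLabel.MatchString(term) {
		return term
	}
	id, ok := m.seen[term]
	if !ok {
		id = fmt.Sprintf("_:b%d", m.next)
		m.next++
		m.seen[term] = id
	}
	return id
}

func main() {
	m := newBlankMapper()
	terms := []string{":000123", ":/en/larry_fine_1902", ":000123", ":000456"}
	for _, t := range terms {
		fmt.Println(t, "->", m.rewrite(t))
	}
}
```

The mapper is per-load state, so two files that happen to reuse the same label would get different IDs if each is loaded with its own mapper.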

As for the other ones, they could reasonably be, say, </en/larry_fine_1902> or "/en/larry_fine_1902" -- both are equivalent, I expect, when it comes to the triples created, so whichever is kosher -- and of course literal strings (which are, I think, consistently object fields) like "Humphrey Bogart" need not change (this already happens).
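The two candidate rewrites can be sketched as follows. This is only an illustration, not what quadfix.go does: toIRIRef and toLiteral are hypothetical helpers, and they assume the path contains no characters that would need escaping. Strict N-Quads also expects absolute IRIs, so a real converter might need to prepend a base.

```go
package main

import (
	"fmt"
	"strings"
)

// toIRIRef (hypothetical) rewrites a legacy ":/path" term as "</path>".
// Terms without the ":/" prefix are returned unchanged.
func toIRIRef(term string) string {
	if !strings.HasPrefix(term, ":/") {
		return term
	}
	return "<" + term[1:] + ">"
}

// toLiteral (hypothetical) rewrites a legacy ":/path" term as "\"/path\"".
// Assumption: the path needs no escaping inside a literal.
func toLiteral(term string) string {
	if !strings.HasPrefix(term, ":/") {
		return term
	}
	return `"` + term[1:] + `"`
}

func main() {
	term := ":/en/larry_fine_1902"
	fmt.Println(toIRIRef(term))
	fmt.Println(toLiteral(term))
	// Existing literals pass through untouched.
	fmt.Println(toLiteral(`"Humphrey Bogart"`))
}
```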

This may cause the removal of a few leading ":"s in the documentation, but probably pretty rarely. And ultimately for the better!


kortschak commented on July 17, 2024

> the : prefix merely meaning that it's in the local URI space

Yes, I figured that was the case.

> ones matching :\d{6}

Do you want me to fix them to :\d{6}? At the moment they are :\d{,6}. I don't see any real reason to, but while I'm here.

> As for the other ones, they could reasonably be, say, </en/larry_fine_1902> or "/en/larry_fine_1902" -- both are equivalent, I expect, when it comes to the triples created, so whichever is kosher -- and of course literal strings (which are, I think, consistently object fields) like "Humphrey Bogart" need not change (this already happens).

The path types can be made into IRIRef or literal, whichever you prefer, though it's not quite true to say that they are equivalent once they're in the graph.Triple. By design, I have kept the markers that distinguish literals from IRIRef parts so that the information can be used later, rather than stripping the leading ["<] and trailing [">]. The upshot is that if you want them to be identical, the markers need to be removed; otherwise, if identity is important, the queries need to be handled in such a way as to ignore the first and last character in those cases (and possibly the leading "_:" in blank nodes). I prefer this last case.
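A rough Go sketch of the query-side option, where the markers stay in the stored term and are ignored only when comparing. The helpers bare and sameValue are made up for this example and are not part of the branch; literals carrying a language tag or datatype suffix are deliberately not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// bare (hypothetical) returns the term without its syntactic markers:
// <...> for IRIRefs, "..." for plain literals, and the leading "_:" for
// blank nodes. Anything else is returned unchanged.
func bare(term string) string {
	switch {
	case len(term) >= 2 && strings.HasPrefix(term, "<") && strings.HasSuffix(term, ">"):
		return term[1 : len(term)-1]
	case len(term) >= 2 && strings.HasPrefix(term, `"`) && strings.HasSuffix(term, `"`):
		return term[1 : len(term)-1]
	case strings.HasPrefix(term, "_:"):
		return term[2:]
	}
	return term
}

// sameValue compares two terms while ignoring their markers, so the
// stored terms still record whether they were IRIRefs or literals.
func sameValue(a, b string) bool {
	return bare(a) == bare(b)
}

func main() {
	fmt.Println(sameValue("</en/larry_fine_1902>", `"/en/larry_fine_1902"`))
	fmt.Println(sameValue(`"Humphrey Bogart"`, "</en/larry_fine_1902>"))
}
```

Keeping the markers in storage and stripping them only at comparison time means the literal/IRIRef distinction remains available to any later code that wants it.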


kortschak commented on July 17, 2024

Update on this. I have what I think should work without additional effort, on the branch above. Now working through the Unicode mess.


barakmich commented on July 17, 2024

Closed with the merge of #82 -- though there's still more semantics around it, the revision is there.

