query-panel's Issues

Idea for server-side SPARQL over a Solid Pod

Introduction

There are numerous problems with having public SPARQL endpoints, stemming mainly from the very power of SPARQL: it is very expressive. More lightweight interfaces should therefore be the first concern, but since there may be use cases where SPARQL makes sense, it is also worth discussing relatively simple approaches to enabling it. The more elaborate way to enable SPARQL endpoints is to limit the language's expressiveness, and there are various ways to do that. This proposal, however, focuses on limiting the amount of data that is queried, and thereby the impact on the server.

We also note that SPARQL has the notion of quad patterns, not just triple patterns. However, we can ensure that most queries stay simple to write by wisely choosing what is known as the default graph, so that most queries can use plain triple patterns and never enter the complexity of named graphs. This proposal therefore addresses two problems: 1) ensuring that server-side SPARQL is evaluated over reasonably sized graphs, and 2) defining graphs so that most queries are simple to write, which also helps with problem 1.

Quad Semantics

An RDF graph is a set of triples, where each triple consists of a subject, a predicate and an object. The Turtle serialization (among others) serializes such plain triples, and resources on Solid are typically just a bunch of triples.

With graph names, a triple is extended to a quad. A graph name identifies a set of triples and can be used to group them and to partition the dataset for various purposes. In a Solid context where users may be authorized to write data, one might, for example, want to partition the data so that triples from unverified users are kept in a different graph from triples that have been verified by some party.
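
To illustrate (the "unverified" graph name here is just a made-up example), a SPARQL Update that stores such a partitioned triple names the graph explicitly, turning the triple into a quad:

INSERT DATA {
  GRAPH <https://alice.dev.inrupt.net/unverified> {
    <https://example.org/foo> a <https://example.com/Bar> .
  }
}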

The default graph

The SPARQL 1.1 Query Language specification defines the RDF Dataset, and SPARQL queries will be executed over the data in that dataset. What comprises the dataset may be influenced in the query itself or in the protocol.

Moreover,

An RDF Dataset comprises one graph, the default graph, which does not have a name, and zero or more named graphs, where each named graph is identified by an IRI.

In other words, the default graph is what is queried if nothing else is defined, and these queries are then quite simple in that they only use triple patterns.

How can we limit the amount of data that is queried in this case in Solid? There is a clear partitioning of data in the Linked Data Platform: the Container. My proposal is therefore:

A Container may expose a SPARQL endpoint, and if so, the RDF Dataset of that endpoint must have a default graph that is given by the RDF merge of all of its contained RDF Documents.
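
To sketch what this buys us (the documents and FOAF data below are made up): say the container https://alice.dev.inrupt.net/bar/ contains doc1.ttl stating <doc1.ttl#alice> foaf:knows <doc2.ttl#bob>, and doc2.ttl stating <doc2.ttl#bob> foaf:name "Bob". Against the container's endpoint, whose default graph is the merge of both documents, a join across the two documents needs nothing but triple patterns:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:knows ?friend .
  ?friend foaf:name ?name .
}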

Further named graphs

Although it is not needed for the above, I think it is interesting to discuss what could reasonably be considered a "named graph" in a Solid context:

Since any data can be written to a resource, not just data whose URIs match the Request-URI, I might, for example, PUT

<https://example.org/foo> a <https://example.com/Bar> .

to https://alice.dev.inrupt.net/foo/bar.ttl

In effect, that makes https://alice.dev.inrupt.net/foo/bar.ttl similar to a graph name in that it groups certain triples. I think we should simply formalize this intuition: systems using quad semantics should use the Request-URI of a resource as the graph name of that RDF Document.
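
Under that convention, a quad-aware query can scope its patterns to exactly the triples written to that one document by using the Request-URI as the graph name (a sketch, reusing the PUT example above):

SELECT ?type WHERE {
  GRAPH <https://alice.dev.inrupt.net/foo/bar.ttl> {
    <https://example.org/foo> a ?type .
  }
}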

Discussing all the implications of this is beyond the scope of this issue; the advantage is that the RDF dataset can be amended with data from outside the container in cases where the query writer needs some data beyond the container's graph.

So, for example, a FROM clause can be used in the query. Say that the endpoint, and thus the default graph, is https://alice.dev.inrupt.net/bar/; then

SELECT ?foo
FROM <https://alice.dev.inrupt.net/foo/bar.ttl>
WHERE {
  ?foo a [] .
}

would select from everything in /bar/ as well as /foo/bar.ttl. For advanced queries, the entire repertoire of dataset clauses can be used, thus making simple things easy and hard things possible.
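
For instance (just a sketch), FROM NAMED together with a GRAPH pattern keeps the triples of the added document distinguishable from the rest of the dataset:

SELECT ?s ?p ?o
FROM NAMED <https://alice.dev.inrupt.net/foo/bar.ttl>
WHERE {
  GRAPH <https://alice.dev.inrupt.net/foo/bar.ttl> { ?s ?p ?o }
}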

With this, a Pod would have many SPARQL endpoints, each with a different default graph, but they could query all documents in the Pod by naming them. Cross-pod queries would still require SERVICE or some client-side federation, though; FROM would only be for queries within a Pod.
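
A cross-pod query might then look like the following sketch, where the endpoint URL for Bob's container is purely hypothetical:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?person foaf:knows ?friend .
  SERVICE <https://bob.dev.inrupt.net/bar/sparql> {
    ?friend foaf:name ?name .
  }
}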

This feature is a bit dangerous, since in principle a client might add all resources in a Pod to the dataset, but SPARQL endpoint implementations should have some defenses against such things anyway.

Considerations for internals

While this feature could make it harder to use named graphs for more advanced purposes, I note that it greatly simplifies the implementation of quad-store-based storage layers under Solid: Web Access Control can be computed over the graph names, and thus integration with some existing SPARQL implementations becomes much easier. The resourceStore interface can also simply use the graph name of a backend quad store for the concrete implementation of the interface.
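
As a sketch of that last point (not the actual resourceStore implementation, and the document URI is just an example), retrieving a stored resource from a quad store backend then amounts to a CONSTRUCT scoped to the graph named by its Request-URI:

CONSTRUCT { ?s ?p ?o } WHERE {
  GRAPH <https://alice.dev.inrupt.net/foo/bar.ttl> { ?s ?p ?o }
}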

Lightweight semaphore mechanism

@timbl suggested in his RWW Design Issue that DELETE INSERT WHERE queries could be used for a semaphore mechanism, so that if the DELETE part fails, a conflict flag should be raised.
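
In that combined form, the update might look like the following sketch (the triple is made up); the idea is that if the WHERE pattern no longer matches, nothing is deleted and the conflict flag is raised:

DELETE { <foo> <baz> "Dahut" }
INSERT { <foo> <baz> "Bar" }
WHERE  { <foo> <baz> "Dahut" }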

I think the mechanism can be illustrated with simpler DELETE DATA/INSERT DATA queries, so here's my description:

Say that client 1 goes:

DELETE DATA { <foo> <baz> "Dahut" } ;
INSERT DATA { <foo> <baz> "Bar" }

independently, client 2 goes

DELETE DATA { <foo> <baz> "Dahut" } ;
INSERT DATA { <foo> <baz> "Foobar" }

before the first client has finished. In that case, the Solid implementation would return a 409 Conflict to the second client.

There are a few problems: as noted in solid/solid-spec#193, it is a willful violation of the SPARQL 1.1 specification, which says:

Deleting triples that are not present, or from a graph that is not present will have no effect and will result in success.

Moreover, reporting a success or failure status for DELETE queries is problematic from a confidentiality perspective, which is the topic of #2.

Potential confidentiality breach with DELETE queries.

Ideally, DELETE should only require acl:Write permissions. Currently, it seems that DELETE also requires acl:Read, and the reason for this is that a success or failure status is reported for DELETE queries.

For example, imagine a malicious user "Mallory": Mallory is authorized to write, but not to read, and does not particularly care if he destroys things; he just wants to check whether certain triples were there. In that case, he can send the query

PREFIX ex: <http://example.org/> # prefix declaration assumed for illustration
DELETE DATA {
  <alice/profile#me> ex:age 14 .
}

In SPARQL 1.1, Mallory cannot tell whether the triple was there, since the update will always succeed, so he can't tell that Alice was in fact 14 years old. So, DELETE with acl:Write alone is OK. But since Solid currently reports whether the DELETE succeeded or failed, acl:Read is required to execute the query, to ensure that Mallory can't tell that Alice is 14.
