streamreasoning / rsp-ql

A home of RSP-QL syntax and semantics discussion

License: Apache License 2.0

Web Ontology Language 3.25% HTML 96.75%

rsp-ql's People

lpdanh jpcik dellaglio beta2k josixp ocorcho alasdairgray axelpolleres webdata keski greentara pbmdq jamirescosta

rsp-ql's Issues

Define Window Function parameters

https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md#window-functions
Many parameters are used in the window function definitions but are never introduced.

Mandatory and optional characteristics of a time-stamped graph

The definition of time-stamped graph in https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md does not rigorously capture a group consensus about the concept. This issue may be used to formulate a list of mandatory and optional characteristics, that can be accepted or rejected by the group. Once the concept is agreed, then a rigorous definition may be drafted.

functional requirements do not provide details about query operators

The Functional Requirements section could explicitly mention what operators an RSP-QL language should support. This could be too specific, so it may depend on the level of granularity we want to give to this section.

comments/issues on the Time Vocabulary

Please post any comments/issues on the first proposal of a Time Vocabulary, which we will use to relate a graph to a time instant for timestamped graphs.

The Time Vocab can be found here: https://github.com/streamreasoning/RSP-QL/blob/master/TimeVocab.owl
A visually more appealing version is here: http://www.essepuntato.it/lode/owlapi/https://raw.githubusercontent.com/streamreasoning/RSP-QL/master/TimeVocab.owl

Some initial discussion already took place on the list in this thread: https://lists.w3.org/Archives/Public/public-rsp/2015Jun/0040.html

Add concept of convex substream

In order to assist in characterising window functions, I think it would be helpful to define the relationship of "convex substream", being a substream which doesn't leave out any elements of the original stream that fall between any two elements that are included.

This would allow us to say something like:
A time-based window function is a mapping from RDF streams to their time-bounded convex substreams such that ...
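The convexity condition can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the proposal: the stream is modelled as a finite list in sequence order, items are compared by equality, and the name `is_convex_substream` is hypothetical.

```python
from typing import Hashable, Sequence


def is_convex_substream(stream: Sequence[Hashable], sub: Sequence[Hashable]) -> bool:
    """Return True if `sub` is a contiguous run of `stream`.

    Assumption: a finite stream is a list in sequence order, so "convex"
    reduces to "no gap between the first and last included element".
    """
    if not sub:
        return True  # the empty substream is trivially convex
    n, m = len(stream), len(sub)
    # Try every possible start position of a run of length m.
    return any(list(stream[i:i + m]) == list(sub) for i in range(n - m + 1))


stream = ["g1", "g2", "g3", "g4"]
print(is_convex_substream(stream, ["g2", "g3"]))  # contiguous run
print(is_convex_substream(stream, ["g1", "g3"]))  # skips g2
```

A time-based window function would then pick, among the convex substreams, the ones whose timestamps fall inside the window bounds.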

Question on storage for processing

Currently the document says "RSPs should process streams of data actively and in-stream, without the need of storing them". However, there are some reasonable operations that would require partial storage, just not storage of the entire past history. For example, to determine if two streams are isomorphic, some storage would be required due to the possibility of different sequential ordering in the serialization.

Should identifying an RDF Stream by an IRI be part of the abstract syntax?

Consider the analogy to graphs and named graph pairs.

Neither the abstract syntax for RDF graphs nor, for that matter, the concrete syntaxes include the identification of the graph by an IRI.
Of course, an RDF graph can be considered an RDF resource, so an IRI can always "denote" an RDF graph.
But it is only when the named graph pair is introduced, leading up to the definition of RDF dataset (https://www.w3.org/TR/rdf11-concepts/#section-dataset), that the data model is extended to explicitly identify an RDF graph by an IRI.
Prior to the publication of RDF 1.1, named graphs were developed outside of the RDF specification by necessity.

It seems unlikely that there will be a new structure built on top of RDF streams, at least any time soon (an RDF stream bundle?). The need to identify RDF streams by IRI has already been noted, e.g. to refer to them for querying.
To avoid a proliferation of methods for identifying an RDF stream by an IRI, it seems best to include this in scope.

However, if the abstract syntax is introduced, then it is necessary to talk about semantics and queries.

The semantics of RDF datasets is in its current messy state because several different semantics for named graphs were developed independently. To avoid such a situation for a "named RDF stream", should the semantics be standardized at this time?
When a query refers to several streams by their IRIs, does this mean that the query is applied to the merger or union of the streams? Should the query be able to specify one or the other?
Can the stream IRI be used elsewhere in the query, e.g. in selecting from the default graphs of only one stream?

Reference existing RSP QL

Section 4 starts by stating that there have been many existing proposals. These, or at least the key ones, should be cited.

Functional Requirements

In general, the functional requirements should avoid introducing terms that are undefined or vague, and should not get too specific - they should not anticipate the design. On the other hand, they should not be so general that they have little effect.

In particular, I suggest the following:
#1. Stronger: "RDF streams should be representable in an abstract model, and the semantics of this abstract model should provide the basis of the results of RSP queries."
#2. Weaker: "The RDF stream abstract model should be serialized in concrete formats derived from standard formats, extending beyond the standard format only when necessary."
#4. Correction: "RDF streams may have timestamps based on different notions of time (time instants, intervals) with different semantics (application, validity, transactional)."

Re: " In case no timestamp is associated to an RDF stream data item, the system is responsible of managing time-based ordering of stream items."
This is something quite different than the earlier part of the requirement, and somewhat controversial. I suggest it be made a separate item and discussed at greater length.
#5. Re: "reactively" - what exactly do we mean by this requirement? How does this requirement impose constraints on the abstract model, semantics, concrete language, or query language?
#7. Greater generality: "RSP engines should be able to query a portion of the knowledge expressed in an RDF stream."

We need a common terminology for the components of an RDF stream: items, elements, members, but please not "events" or "streaming graphs".
#9. It is unclear from the wording whether the statement includes:

"RSP engines should support combining multiple RDF streams."
It could be put into one:
"RSP engines should support combining multiple RDF streams as well as stored RDF (aka static RDF graphs or datasets)."
#12. More general:

"RSP queries should be able to access all knowledge explicitly expressed in the stream, including names of named graphs and triples containing such names."
#13. A window is a stream, so there is no need to have "stream/windows"

vocabularies and datatypes for time intervals

Hi,

I saw some examples of RDF encodings of RDF streams. I am wondering whether standardizing the vocabulary and datatypes is within the goals of this WG.

:g1 {:axel :isIn :RedRoom. :darko :isIn :RedRoom} {:g1 :atInterval 2016-03-01T13:00:00Z/2016-03-01T13:00:10Z}
:g2 {:axel :isIn :BlueRoom. }                     {:g2 :atInterval 2016-03-01T13:00:20Z/2016-03-01T13:00:40Z}
:g3 {:minh :isIn :RedRoom. }                      {:g3 :atInterval 2016-03-01T13:00:50Z/2016-03-01T13:00:60Z}

For instance, based on the above example

What is the prefix for :atInterval? Do we want to define other properties like ":startAt" or ":endAt"?
What is the datatype for the literal 2016-03-01T13:00:00Z/2016-03-01T13:00:10Z?
Do we want to introduce some built-in RDF functions to retrieve the start and end of an interval literal?
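To make the last question concrete, here is a minimal Python sketch of what "start" and "end" accessors for such a literal might do. It assumes the literal follows the ISO 8601 "start/end" interval notation used in the example above; the function names `interval_bounds` and `_parse_instant` are hypothetical and not part of any proposed vocabulary.

```python
from datetime import datetime
from typing import Tuple


def _parse_instant(text: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def interval_bounds(literal: str) -> Tuple[datetime, datetime]:
    """Split an assumed 'start/end' interval literal into its two instants."""
    start_text, sep, end_text = literal.partition("/")
    if not sep:
        raise ValueError("expected an interval of the form 'start/end'")
    start, end = _parse_instant(start_text), _parse_instant(end_text)
    if end < start:
        raise ValueError("interval ends before it starts")
    return start, end


start, end = interval_bounds("2016-03-01T13:00:00Z/2016-03-01T13:00:10Z")
print(start.isoformat(), end.isoformat())
```

Note that this sketch would reject the literal used for :g3 above, because Python's datetime does not accept a seconds field of 60. Whether such values are legal is one more thing a datatype definition would have to settle.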

Refine terminology about RDF Streams

There is a need to distinguish RDF streams that have only predicates whose associated order is total.
(There are a number of important considerations that depend on this property, so it will be a great convenience to have a term for just these kinds of RDF streams.)

Some options:

Use a different term (e.g. RDF braid) for all things that are currently called RDF streams, and use the term "RDF stream" only for those with totally-ordered predicates.
Retain the term RDF stream for the more general class, as it is now, and create a new term, e.g. "RDF totally-ordered stream" or "RDF t.o.-stream".

Querying RDF graphs in windows

Maybe this has been clarified somewhere but I feel that there is an assumption that although RDF streams consist of graphs, with some timestamp or similar, the graphs themselves may not be accessible in the windows.
Basically, we've touched upon this previously, and my understanding was that some argued against making the graphs available and instead proposed that all streaming graphs be put in "a default graph" that represents a window. But to me the arguments for this view were not well motivated, other than "in this and this example we can manage without it". So, if possible, I would like it to be clarified whether a query like the one below for filtering a stream would be valid:

# Assume that the graphs in this particular stream have more than one timestamp.
# The timestamp property used by the engine is :generatedAt. Now we wish to create
# a substream based on a specific time property (:observedAt) excluding all non-
# applicable events in the stream.

PREFIX : <http://examplel.org#>
REGISTER STREAM :filteredEventStream AS

CONSTRUCT ISTREAM {
   GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
   ?g :observedAt ?obj2 .
   ?g ?prop3 ?obj3 .
}
FROM NAMED WINDOW :w ON :fullEventStream [RANGE PT10S]
WHERE {
   WINDOW :w {
      GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
      ?g :observedAt ?obj2 .
      OPTIONAL { ?g ?prop3 ?obj3 . }
   }
}

Are there any arguments for why this query should not be valid? How would this query be expressed in a general way if the graphs cannot be referenced?

Reorganize material for correct sequential flow

As described in several ednotes, there is a need to reorder certain definitions.

Identify a time vocabulary

To support the data model where a graph can be annotated with different types of timestamp, we need to agree on a vocabulary of standard predicates for representing these time relationships.
Alejandro has mentioned that there is work taking place on this in the W3C Spatial Data on the Web Working Group.

rename "timestamped graph"

In section 2.3 of RDF Stream Abstract Syntax and Semantics timestamped graphs are defined. The first part of the definition begins:

A timestamped graph is defined as an RDF Dataset [...]

But referring to an RDF dataset as a graph seems to be a bit of a misnomer. I suggest that "timestamped graph" be replaced by "timestamped dataset".

Revisit and Harmonize Examples

After the current pull requests are resolved, the examples should be cleaned up.

Discuss the practical uses of isomorphism

The various theoretical concepts of isomorphism are introduced in the Abstract Syntax and Semantics document to support the definitions of window functions and entailments. The practical side of isomorphism is out of scope for this document, but is worthy of discussion somewhere. E.g., under what circumstances is isomorphism decidable? This appears to be the case iff the stream is finite.

Missing Definition of Timestamp Predicate

In a timestamped graph, the predicate in the timestamp triple cannot be just any predicate.
It should have a specified range (of temporal entities).
This can be accomplished, to some extent, with RDFS, by identifying an rdfs:range.
Also the (partial or total) order that will be used to structure a stream must be specified.

Namespace for the time vocabulary

We should work out what the final namespace would be. Maybe we should consider a purl. Another alternative would be a namespace under the W3C community groups namespace.

Revise FILTER MINUS query

Check whether the example is OK and whether it would work as stated.

Add examples in the Serialisation section

Window function definitions revised

The current definitions of window functions have the following issues:

They contain some wording referring to how the function application is iterated. A window function should only be applied once, with the mechanism for iteration being an extra structure defined later.
They must respect S-isomorphism. That is, the two streams resulting from the application of a window function on two S-isomorphic streams should be S-isomorphic.

Define isomorphism of RDF stream relative to sequence order

Even in the case of an RDF stream using a single predicate with a totally ordered set of temporal entities (e.g. time instants), there is still the possibility that two timestamped graphs in the stream have the same timestamp. These stream items could therefore be ordered differently in the sequence while still satisfying the stream constraints.

The consensus is (to my knowledge) that these streams should always give the same query results (when the same query is applied). This suggests a kind of equivalence class.

There should be a definition of the relationship between two streams that contain the same items, differing only in their order in the sequence. This kind of relationship is usually called "isomorphism". There is already a notion of isomorphism in RDF graphs, due to bnode renaming. We could still call this some qualified kind of isomorphism (e.g. S-isomorphism, where S is for sequence). Bnode (only) isomorphism could be called B-isomorphism, and the combination of the two, simply "isomorphism".
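As an illustration of the proposed S-isomorphism, the following Python sketch checks it for finite streams. It rests on assumptions that go beyond the text: a stream is a finite list of (graph name, timestamp) pairs over a single timestamp predicate, timestamps are comparable integers, and bnode renaming (B-isomorphism) is ignored. The names `respects_order` and `s_isomorphic` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Item = Tuple[Hashable, int]  # (graph name, timestamp), a simplifying assumption


def respects_order(stream: Sequence[Item]) -> bool:
    """Stream constraint: timestamps never decrease along the sequence."""
    return all(a[1] <= b[1] for a, b in zip(stream, stream[1:]))


def s_isomorphic(s1: Sequence[Item], s2: Sequence[Item]) -> bool:
    """Both are valid streams and contain the same items with the same multiplicity."""
    return respects_order(s1) and respects_order(s2) and Counter(s1) == Counter(s2)


s1 = [("g1", 1), ("g2", 1), ("g3", 2)]
s2 = [("g2", 1), ("g1", 1), ("g3", 2)]  # g1 and g2 swapped: same timestamp
print(s_isomorphic(s1, s2))
```

Under these assumptions, two S-isomorphic streams can differ only in the relative order of items that share a timestamp, which is the equivalence class described above.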

I propose that these definitions, and discussion about them, be contained in a new section called "Isomorphism", following the "Data Model" section.

Add windows in the past

Windows whose upper bound is not now, but some time in the past.

add requirements that are not in the scope of this document

S2S Section, evaluate if it is appropriate

It has been argued that RSP queries without windows should be possible. In fact, they make sense for some cases, e.g. filtering streams. Should they be kept?

Example 30

I believe the title should state that it is RSP-QL rather than SPARQLStream.

Clarify Definition of Timestamped Graph Re Multiplicity of Timestamp Triples

The definition has the word "single" with a strike-out.
All further developments of the RDF stream have a single timestamp triple as the default graph in the RDF Dataset that is a timestamped graph.

Therefore, we need to either accept that a timestamped graph has this restricted structure, or modify RDF streams to accommodate timestamped graphs with multiple triples in their default graph.

Note that the definition of RDF stream allows the same named graph (n, g) to be used in multiple items in the stream, so it is possible to assign multiple timestamps to the same named graph within an RDF stream:
(n, g) (g p1 t1)
(n, g) (g p2 t2)

If it is desired to transmit nontemporal metadata of the named graphs as part of an RDF stream, then this must be handled by another mechanism (if the default graph is restricted to just the timestamp triple).
Here is one option: timestamp the metadata and add it to the stream:
(n, g) (n p1 t1)
(n, g) (n p2 t2)
(m, h) (m p3 t3)
where h contains metadata about g, e.g. (n r x) where x is not a temporal entity.

include time zone information in timestamps in examples

The current RGN_Location_TempC_Minute_Merged.json includes, for example, the following observations.

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T01:01:00"  },
... ]

It is important to know whether a processor located in one of those cities would interpret this to be the "same" stream as one which contained

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00+01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T00:01:00Z"  },
... ]
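The ambiguity can be shown with a short Python sketch. It assumes that a processor fills in some default zone for timestamps that carry no offset; that default is precisely the unspecified part, and the helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(text: str, default_tz: timezone = timezone.utc) -> datetime:
    """Parse a timestamp and normalise it to UTC.

    Timestamps without an offset are interpreted in `default_tz`,
    which is an assumption each processor would have to make.
    """
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # fromisoformat() accepts "Z" only from 3.11
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=default_tz)
    return parsed.astimezone(timezone.utc)


cet = timezone(timedelta(hours=1))
# The Madrid and Paris values of the second snippet denote the same instant.
print(to_utc("2015-01-01T01:01:00+01:00") == to_utc("2015-01-01T00:01:00Z"))
# The zone-less Berlin value matches them only if the processor assumes +01:00.
print(to_utc("2015-01-01T01:01:00", cet) == to_utc("2015-01-01T00:01:00Z"))
print(to_utc("2015-01-01T01:01:00", timezone.utc) == to_utc("2015-01-01T00:01:00Z"))
```

Two processors with different defaults would therefore disagree on whether the two snippets describe the same stream, which is why the examples should carry explicit time zone information.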

Time vocabulary deployment

We should set up a LODE page or Neologism (if it is still available) to allow us to discuss the vocabulary more easily. Examples:

LODE: an example deployment can be found with the PAV vocabulary, which uses purls.
Neologism: an example deployment can be found on the WikiPathways-hosted gene ontology.

Example of rdf stream could be expanded to show other cases

Including multiple time annotation predicates, blank nodes or intervals

Conformance

Currently conformance notes are entered in Section 4.11, a subsection of the RDF Stream Query Language.

In W3C specs conformance notes typically come at the start of the document so that you understand how to read the document before you read it!

This section should be moved to the top of the document. I would also suggest extending the use of RFC2119 to include the phrases MUST, SHOULD for the requirements section.

Use "element" to refer to the time-stamped graphs in an RDF stream

Temporal Entities and OWL Time

There is an ambiguity or inconsistency in the Abstract Syntax and Semantics document regarding temporal entities. The section "Temporal Entities" says the specification is neutral in regard to temporal ontologies. The section "Instants and Intervals" references the OWL Time Ontology.

Structure of Section 4

S2R, R2R and R2S should be at the same structure level as S2S, not as sub-sections

Punctuation

Section 3.3.1 of https://github.com/streamreasoning/RSP-QL/blob/gh-pages/RSP_Requirements_Design_Document/index.html is unclear as to whether it is proposing the use of punctuations or HTTP chunking. More clarity is needed in this section.

Example of FROM clause could show more cases

Such as multiple streams, multiple windows on same stream or combination with named graphs, i.e. more complex dataset

Discuss streamed named graph

In 3.3.2 Immutability and Event Derivation we state that a new unique graph should be generated for each derived event, which could then link back to other events.
In my understanding this means that events would be referenced by their named graphs (?). When I read the document this confused me a bit, since there is no indication that the named graph structure is used for anything other than transporting a set of triples from A to B. All examples seem to assume that either 1) no named graphs are being streamed, only triples, or 2) the named graph structure is lost and all triples are added to the default graph of the window upon arrival.
If 2), will we be able to use RSP-QL for CEP? Perhaps this is out of scope for this document, but if possible it would be nice if we could be a bit clearer about this.

Separate references to timestamped graph from references to its (named) graph

As in the first example, it is not precise to say "The following timestamped graph :g1 contains 2 triples ..."
The timestamped graph is not named :g1, and it contains a default graph and a named graph. It is the named graph whose name is :g1 .

Time-bounded and count-bounded substream definitions replaced

As currently expressed, these definitions are mixed up with the concept of window function.

What is actually needed are the following concepts:

Time-bounded RDF Stream
An RDF Stream where for every timestamp predicate that occurs in the stream, there is a temporal entity in its range that bounds (is greater than or equal to) all timestamps of that predicate in the stream.
Finite RDF Stream
An RDF stream with a finite number of items in it.

Note that these are properties that are preserved under finite merger/union, and so form closed algebras under these operations.

RDF Stream Profile: Linked List

There is a form of RDF stream that does not give complete information about timestamps, but only provides order information. A useful form is a linked list, e.g.

:g1 :p _:t1.
:g1 {...}.

:g2 :p _:t2.
_:t2 time:after _:t1.
:g2 {...}.

...

where _:t1 and _:t2 are OWL-TIME temporal entities.

It would be helpful to have a concrete usecase for this profile, as well as a specification. The temporal information provided in such an RDF stream should be sufficient for detection of complex events where the pattern is based on the order of its sub-events.
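As a rough sketch of how order could be recovered from such a stream, the "after" links can be treated as edges of a dependency graph and sorted topologically. The example below uses Python's standard `graphlib` module (Python 3.9+). The dictionary encoding of the links and the function name `order_from_after` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def order_from_after(after: Dict[str, Set[str]]) -> List[str]:
    """Order temporal entities using only "after" links.

    `after[t]` is the set of entities that `t` comes after. A CycleError is
    raised by graphlib if the links are contradictory.
    """
    return list(TopologicalSorter(after).static_order())


# _:t2 time:after _:t1 ; _:t3 time:after _:t2 (the third link is hypothetical)
links = {"_:t2": {"_:t1"}, "_:t3": {"_:t2"}}
print(order_from_after(links))
```

For a simple chain the result is a single sequence. If the links only give a partial order, several outputs are admissible, which matches the idea that this profile provides order information but not complete timestamps.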

Interplay of timestamped graph predicate with stream order and time-bounded substreams

Should the predicate p in a timestamped graph (g,p,t) be considered when deciding upon the order of timestamped graphs in the stream S or when considering which timestamped graphs are included in a time-bounded substream?

Is R2S, S2R, R2R an appropriate notation?

Blank node as the name of a streamed graph

From 2.3 Timestamped Graphs:

There is exactly one named graph pair <n, G> in the RDF Dataset
(where G is an RDF graph, and n is an IRI or blank node).

This means that the name of the graph in the streamed element/dataset (I'll call it an event from here) can be represented as a blank node, e.g. as TriG:

_:b { :John :isIn :Room1 } .
_:b :observedAt "2017-08-16T16:35:00Z" .

However, blank nodes are always locally scoped to the file or RDF store (or in this case the streamed element), which effectively means that a stream using blank nodes can't contain references to other events in the stream, e.g. if the intention is:

_:b0 { :John :isIn :Room1 } .
_:b0 :observedAt "2017-08-16T16:35:00Z" .

_:b1 { :John :isIn :Room2 } .
_:b1 :observedAt "2017-08-16T16:35:05Z" .
_:b1 :after _:b0 .

_:b2 { :John :isIn :Room3 } .
_:b2 :observedAt "2017-08-16T16:35:10Z" .
_:b2 :after _:b1 .

but each element is streamed separately, then the blank node labels do not carry over from one element to the next. I'm not saying that we should remove the alternative of having a blank node as the name of a graph, but I'm not sure we've covered the implications of actually doing so. For example, from 3.3.2 Immutability and Event Derivation in the RSP Requirements Design Document:

For RSP this means: (1) create a new (unique) graph for the derived event and (2) possibly
link back to the base event(s) thus enabling drill-down or root cause / provenance analysis
of the derived event.

Is (2) possible under the assumption that the streamed event is referenced using a blank node?
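The scoping problem can be simulated with a small Python sketch. It assumes each streamed element is parsed on its own and that the parser replaces blank node labels with fresh identifiers, as RDF parsers are free to do. The triple encoding, the `_:genid` naming and the function `scope_element` are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
_fresh_ids = itertools.count()


def scope_element(triples: List[Triple]) -> List[Triple]:
    """Relabel blank nodes so that labels are only meaningful within one element."""
    mapping: Dict[str, str] = {}

    def relabel(term: str) -> str:
        if not term.startswith("_:"):
            return term  # IRIs and literals are left untouched
        if term not in mapping:
            mapping[term] = "_:genid%d" % next(_fresh_ids)
        return mapping[term]

    return [(relabel(s), relabel(p), relabel(o)) for s, p, o in triples]


first = scope_element([("_:b0", ":observedAt", "2017-08-16T16:35:00Z")])
second = scope_element([
    ("_:b1", ":observedAt", "2017-08-16T16:35:05Z"),
    ("_:b1", ":after", "_:b0"),
])
# The object of :after no longer refers to the node that named the first event.
print(first[0][0], second[1][2])
```

In this model the link in the second element points to a new, unrelated blank node, so linking back to a base event would need an IRI, or some agreed scoping rule for blank nodes across elements.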

Windows in S2S operators

The text in Section 4.4 argues for not needing to define windows for streaming operators. The examples 6&7 then go and give example queries with windows. Why was the following form of query not used for Example 6?

SELECT ?room ?person
FROM STREAM ex:social
WHERE {
  ?person :isIn ?room
}

The existing proposal seems to require a lot of syntax for no benefit.

RDF Stream Profile: Time Series

Certain usecases or application domains do not need the full generality of the RDF stream definition, and so may be able to implement more efficient reasoning methods when the input is confined to be some subclass of RDF streams. It is common to call such subclasses "profiles" (e.g. OWL profiles RL, EL, QL). A new section of the Abstract Syntax and Semantics document should be devoted to defining and naming some important profiles.

Section 4.1: Input

What is the purpose of Section 4? Is it to define a new query language? If so, then the way to define a stream should be stated, particularly as Examples 3&4 both use a stream that is presumably defined somewhere. More clarity is required here.

Specify Semantics of Timestamped Graph Precisely

The document refers to the RDF Semantics WG Note (https://www.w3.org/TR/2014/NOTE-rdf11-datasets-20140225/#each-named-graph-defines-its-own-context). But there it says there are several possible formalizations, so it is necessary to state the formalization exactly in our document.

E.g. "One way is to interpret the graph name as denoting a graph, and a named graph pair is true if this graph entails the graph inside the pair." If this is the semantics we want (and also for streams), then we can adopt the formalization that follows in that document.

Define RDF Stream Merge and Union

The definition is complicated by the current definition of RDF stream such that it is a sequence, while in general the merge or union of streams is not a deterministic sequence, but is an equivalence class of isomorphic streams.
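The non-determinism can be seen in a minimal Python sketch that merges finite streams by timestamp with the standard `heapq.merge`. The representation of stream items as (graph name, timestamp) pairs and the function name `merge_streams` are assumptions for illustration only.

```python
import heapq
from typing import Hashable, List, Sequence, Tuple

Item = Tuple[Hashable, int]  # (graph name, timestamp), a simplifying assumption


def merge_streams(*streams: Sequence[Item]) -> List[Item]:
    """Merge streams that are each already ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda item: item[1]))


a = [("g1", 1), ("g3", 2)]
b = [("g2", 1)]
# g1 and g2 share a timestamp, so their relative order depends on argument order.
print(merge_streams(a, b))
print(merge_streams(b, a))
```

Both results satisfy the ordering constraint and contain the same items, so neither is "the" merge. A definition in terms of an equivalence class of such sequences avoids having to choose one.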
