streamreasoning / rsp-ql
A home of RSP-QL syntax and semantics discussion
License: Apache License 2.0
https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md#window-functions
Many parameters are used in the window function definitions but are never introduced.
The definition of time-stamped graph in https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md does not rigorously capture a group consensus about the concept. This issue may be used to formulate a list of mandatory and optional characteristics, that can be accepted or rejected by the group. Once the concept is agreed, then a rigorous definition may be drafted.
The Functional Requirements section could explicitly mention what operators an RSP-QL language should support. This could be too specific, so it may depend on the level of granularity we want to give to this section.
Please post any comments/issues on the first proposal of a Time Vocabulary, which we will use to relate a graph to a time instant for timestamped graphs.
The Time Vocab can be found here: https://github.com/streamreasoning/RSP-QL/blob/master/TimeVocab.owl
A visually more appealing version is here: http://www.essepuntato.it/lode/owlapi/https://raw.githubusercontent.com/streamreasoning/RSP-QL/master/TimeVocab.owl
Some initial discussion already took place on the list in this thread: https://lists.w3.org/Archives/Public/public-rsp/2015Jun/0040.html
In order to assist in characterising window functions, I think it would be helpful to define the relationship of "convex substream", being a substream which doesn't leave out any elements of the original stream that fall between any two elements that are included.
This would allow us to say something like:
A time-based window function is a mapping from RDF streams to their time-bounded convex substreams such that ...
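To make the proposed notion concrete, here is a minimal sketch, assuming a finite stream modelled as a Python list of (graph name, timestamp) pairs that is already ordered by timestamp, and reading "between" as sequence position. The names `time_window` and `is_convex_substream` are hypothetical and not taken from the Semantics document.

```python
from typing import List, Sequence, Tuple

# Assumption: a stream item is simplified to (graph name, timestamp in seconds).
Item = Tuple[str, int]


def time_window(stream: Sequence[Item], start: int, end: int) -> List[Item]:
    """Select the items whose timestamp lies in the half-open interval [start, end)."""
    return [item for item in stream if start <= item[1] < end]


def is_convex_substream(sub: Sequence[Item], stream: Sequence[Item]) -> bool:
    """Return True if `sub` is a contiguous run of `stream`.

    A contiguous run leaves out no element of the original stream that
    falls between two included elements.
    """
    sub, stream = list(sub), list(stream)
    if not sub:
        return True  # the empty substream is trivially convex
    m = len(sub)
    return any(stream[i:i + m] == sub for i in range(len(stream) - m + 1))


stream = [("g1", 0), ("g2", 5), ("g3", 5), ("g4", 12)]
window = time_window(stream, 0, 10)
print(is_convex_substream(window, stream))                  # a time-bounded window
print(is_convex_substream([stream[0], stream[3]], stream))  # skips g2 and g3
```

Under these assumptions a time-bounded selection over a time-ordered stream is always convex, which is what the proposed wording relies on. For streams with only a partial order on timestamps, "between" would need a different definition.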
Currently the document says "RSPs should process streams of data actively and in-stream, without the need of storing them". However, there are some reasonable operations that would require partial storage, just not storage of the entire past history. For example, to determine if two streams are isomorphic, some storage would be required due to the possibility of different sequential ordering in the serialization.
Consider the analogy to graphs and named graph pairs.
It seems unlikely that there will be a new structure built on top of RDF streams, at least any time soon (an RDF stream bundle?). The need to identify RDF streams by IRI has already been noted, e.g. to refer to them for querying.
To avoid a proliferation of methods for identifying an RDF stream by an IRI, it seems best to include this in scope.
However, if the abstract syntax is introduced, then it is necessary to talk about semantics and queries.
Section 4 starts by stating that there have been many existing proposals. These, or at least the key ones, should be cited.
In general, the functional requirements should avoid introducing terms that are undefined or vague, and should not get too specific - they should not anticipate the design. On the other hand, they should not be so general that they have little effect.
In particular, I suggest the following:
#1. Stronger: "RDF streams should be representable in an abstract model, and the semantics of this abstract model should provide the basis of the results of RSP queries."
#2. Weaker: "The RDF stream abstract model should be serialized in concrete formats derived from standard formats, extending beyond the standard format only when necessary."
#4. Correction: "RDF streams may have timestamps based on different notions of time (time instants, intervals) with different semantics (application, validity, transactional)."
Re: " In case no timestamp is associated to an RDF stream data item, the system is responsible of managing time-based ordering of stream items."
This is something quite different than the earlier part of the requirement, and somewhat controversial. I suggest it be made a separate item and discussed at greater length.
#5. Re: "reactively" - what exactly do we mean by this requirement? How does this requirement impose constraints on the abstract model, semantics, concrete language, or query language?
#7. Greater generality: "RSP engines should be able to query a portion of the knowledge expressed in an RDF stream."
We need a common terminology for the components of an RDF stream: items, elements, members, but please not "events" or "streaming graphs".
#9. The wording is unclear, and it is not obvious whether the statement includes:
"RSP engines should support combining multiple RDF streams."
It could be put into one:
"RSP engines should support combining multiple RDF streams as well as stored RDF (aka static RDF graphs or datasets)."
#12. More general:
"RSP queries should be able to access all knowledge explicitly expressed in the stream, including names of named graphs and triples containing such names."
#13. A window is a stream, so there is no need to have "stream/windows"
Hi,
I saw some examples of RDF encodings of RDF streams. I am wondering if standardizing the vocabulary and datatypes is within the goals of this WG.
:g1 {:axel :isIn :RedRoom. :darko :isIn :RedRoom} {:g1 :atInterval 2016-03-01T13:00:00Z/2016-03-01T13:00:10Z}
:g2 {:axel :isIn :BlueRoom. } {:g2 :atInterval 2016-03-01T13:00:20Z/2016-03-01T13:00:40Z}
:g3 {:minh :isIn :RedRoom. } {:g3 :atInterval 2016-03-01T13:00:50Z/2016-03-01T13:00:60Z}
For instance, based on the above example:
:atInterval ? Do we want to define other properties like ":startAt" or ":endAt"?
2016-03-01T13:00:00Z/2016-03-01T13:00:10Z ?
There is a need to distinguish RDF streams that have only predicates whose associated order is total.
(Because there are number of important considerations that depend on this property, so it will be a great convenience to have a term for just these kinds of RDF streams.)
Some options:
Maybe this has been clarified somewhere but I feel that there is an assumption that although RDF streams consist of graphs, with some timestamp or similar, the graphs themselves may not be accessible in the windows.
Basically, we've touched upon this previously, and my understanding was that some argued against making the graphs available and instead proposed that all streaming graphs be put in "a default graph" which represents a window. But to me the arguments for this view were not well motivated, other than "in this and this example we can manage without it". So, if possible, I would like it to be clarified whether a query like the one below for filtering a stream would be valid:
# Assume that the graphs in this particular stream have more than one timestamp.
# The timestamp property used by the engine is :generatedAt. Now we wish to create
# a substream based on a specific time property (:observedAt) excluding all non-
# applicable events in the stream.
PREFIX : <http://examplel.org#>
REGISTER STREAM :filteredEventStream AS
CONSTRUCT ISTREAM {
GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
?g :observedAt ?obj2 .
?g ?prop3 ?obj3 .
}
FROM NAMED WINDOW :w ON :fullEventStream [RANGE PT10S]
WHERE {
WINDOW :w {
GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
?g :observedAt ?obj2 .
OPTIONAL { ?g ?prop3 ?obj3 . }
}
}
Are there any arguments for why this query should not be valid? How would this query be expressed in a general way if the graphs cannot be referenced?
As described in several ednotes, there is a need to reorder certain definitions.
To support the data model where a graph can be annotated with different types of timestamp, we need to agree on a vocabulary of standard predicates for representing these time relationships.
Alejandro has mentioned that there is work taking place in the W3C Spatial Data on the Web Working Group
In section 2.3 of RDF Stream Abstract Syntax and Semantics, timestamped graphs are defined. The first part of the definition begins:
A timestamped graph is defined as an RDF Dataset [...]
But referring to an RDF dataset as a graph seems to be a bit of a misnomer. I suggest that "timestamped graph" is instead replaced by "timestamped dataset".
After the current pull requests are resolved, the examples should be cleaned up.
The various theoretical concepts of isomorphism are introduced in the Abstract Syntax and Semantics document to support the definitions of window functions and entailments. The practical side of isomorphism is out of scope for this document, but is worthy of discussion somewhere. E.g.
under what circumstances is isomorphism decidable? This appears to be the case iff the stream is finite.
In a timestamped graph, the predicate in the timestamp triple cannot be just any predicate.
It should have a specified range (of temporal entities).
This can be accomplished, to some extent, with RDFS, by identifying an rdfs:range.
Also the (partial or total) order that will be used to structure a stream must be specified.
We should work out what the final namespace would be. Maybe we should consider a PURL. Another alternative was a namespace under the W3C community groups.
Check if the example is ok and if it would work as stated
The current definitions of window functions have the following issues:
Even in the case of an RDF stream using a single predicate with a totally ordered set of temporal entities (e.g. time instants), there is still the possibility that there are two timestamped graphs in the stream that have the same timestamp. Therefore these stream items could be ordered differently in the sequence while still satisfying the stream constraints.
The consensus is (to my knowledge) that these streams should always give the same query results (when the same query is applied). This suggests a kind of equivalence class.
There should be a definition of the relationship between two streams that contain the same items, differing only in their order in the sequence. This kind of relationship is usually called "isomorphism". There is already a notion of isomorphism in RDF graphs, due to bnode renaming. We could still call this some qualified kind of isomorphism (e.g. S-isomorphism, where S is for sequence). Bnode (only) isomorphism could be called B-isomorphism, and the combination of the two, simply "isomorphism".
I propose that these definitions, and discussion about them, be contained in a new section called "Isomorphism", following the "Data Model" section.
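As a rough illustration of the sequence part only, here is a minimal sketch, assuming finite streams whose items are simplified to hashable (graph name, timestamp) pairs. It ignores blank node renaming entirely, so it says nothing about B-isomorphism. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

# Assumption: an item is (graph name, timestamp), with a totally ordered timestamp.
Item = Tuple[Hashable, int]


def respects_time_order(stream: Sequence[Item]) -> bool:
    """Check the stream constraint that timestamps never decrease along the sequence."""
    return all(a[1] <= b[1] for a, b in zip(stream, stream[1:]))


def s_isomorphic(s1: Sequence[Item], s2: Sequence[Item]) -> bool:
    """Two finite streams contain the same items, differing at most in sequence order."""
    return Counter(s1) == Counter(s2)


a = [("g1", 1), ("g2", 1), ("g3", 2)]
b = [("g2", 1), ("g1", 1), ("g3", 2)]  # g1 and g2 share a timestamp and are swapped
c = [("g1", 1), ("g3", 2)]             # an item is missing

print(respects_time_order(a), respects_time_order(b))
print(s_isomorphic(a, b))
print(s_isomorphic(a, c))
```

Note that the comparison buffers every item of both streams, so it only terminates for finite streams and needs storage proportional to their length.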
Windows whose upper interval is not now, but some time in the past
It has been argued that RSP queries without windows should be possible. In fact they make sense for some cases, e.g. filtering streams. Should they be kept?
I believe the title should state that it is RSP-QL rather than SPARQLStream
The definition has the word "single" with a strike-out.
All further developments of the RDF stream have a single timestamp triple as the default graph in the RDF Dataset that is a timestamped graph.
Therefore, we need to either accept that a timestamped graph has this restricted structure, or modify RDF streams to accommodate timestamped graphs with multiple triples in their default graph.
Note that the definition of RDF stream allows the same named graph (n, g) to be used in multiple items in the stream, so it is possible to assign multiple timestamps to the same named graph within an RDF stream
(n, g) (g p1 t1)
(n, g) (g p2 t2)
If it is desired to transmit nontemporal metadata of the named graphs as part of an RDF stream, then this must be handled by another mechanism (if the default graph is restricted to just the timestamp triple).
Here is one option: timestamp the metadata and add it to the stream:
(n, g) (n p1 t1)
(n, g) (n p2 t2)
(m, h) (m p3 t3)
where h contains metadata about g, e.g. (n r x) where x is not a temporal entity.
The current RGN_Location_TempC_Minute_Merged.json includes, for example, the following observations.
"@graph": [
{ "@id": "source:Berlin_1", "observedAt": "2015-01-01T01:01:00" },
{ "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00" },
{ "@id": "source:Paris_1", "observedAt": "2015-01-01T01:01:00" },
... ]
It is important to know whether a processor located in one of those cities would interpret this to be the "same" stream as one which contained
"@graph": [
{ "@id": "source:Berlin_1", "observedAt": "2015-01-01T01:01:00" },
{ "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00+01:00" },
{ "@id": "source:Paris_1", "observedAt": "2015-01-01T00:01:00Z" },
... ]
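To show why the question matters, here is a small sketch using Python's standard `datetime` module as a stand-in for an RSP processor's timestamp handling. That is an assumption for illustration only, and `parse_timestamp` is a hypothetical helper. The three literals are the `observedAt` values from the second listing.

```python
from datetime import datetime


def parse_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing "Z" as UTC."""
    # datetime.fromisoformat only accepts "Z" from Python 3.11 on, so normalise it.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


berlin = parse_timestamp("2015-01-01T01:01:00")        # no time zone given
madrid = parse_timestamp("2015-01-01T01:01:00+01:00")
paris = parse_timestamp("2015-01-01T00:01:00Z")

print(madrid == paris)        # both denote the same instant
print(berlin.tzinfo is None)  # the Berlin value carries no zone information
print(berlin == madrid)       # a zone-less value is never equal to a zoned one here

try:
    berlin < madrid
except TypeError as err:
    # Python refuses to order a zone-less value against a zoned one.
    print("cannot order:", err)
```

In this model the Madrid and Paris values are the same instant, while the zone-less Berlin value can be neither ordered against nor equated with them unless a default time zone is assumed. Whether and how a processor makes that assumption is exactly what would need to be specified.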
We should set up a LODE page or Neologism (if it is still available) to allow us to more easily discuss the vocabulary.
Examples
Including multiple time annotation predicates, blank nodes or intervals
Currently conformance notes are entered in Section 4.11, a subsection of the RDF Stream Query Language.
In W3C specs conformance notes typically come at the start of the document so that you understand how to read the document before you read it!
This section should be moved to the top of the document. I would also suggest extending the use of RFC2119 to include the phrases MUST, SHOULD for the requirements section.
There is an ambiguity or inconsistency in the Abstract Syntax and Semantics document regarding temporal entities. The section "Temporal Entities" says the specification is neutral in regard to temporal ontologies. The section "Instants and Intervals" references the OWL Time Ontology.
S2R, R2R and R2S should be at the same structure level as S2S, not as sub-sections
Section 3.3.1 of https://github.com/streamreasoning/RSP-QL/blob/gh-pages/RSP_Requirements_Design_Document/index.html is unclear as to whether it is proposing the use of punctuations or HTTP chunking. More clarity is needed in this section.
Such as multiple streams, multiple windows on same stream or combination with named graphs, i.e. more complex dataset
In 3.3.2 Immutability and Event Derivation we state that a new unique graph should be generated for each derived event, which could then link back to other events.
In my understanding this means that events would be referenced by their named graphs (?). When I read the document this confused me a bit, since there is no indication that the named graph structure is used for anything other than transporting a set of triples from A to B. All examples seem to assume that either 1) no named graphs are being streamed, only triples, or 2) the named graph structure is lost and all triples are added to the default graph of the window upon arrival.
If 2), will we be able to use RSP-QL for CEP? Perhaps this is out of scope for this document, but if possible it would be nice if we could be a bit clearer about this.
As in the first example, it is not precise to say "The following timestamped graph :g1 contains 2 triples ..."
The timestamped graph is not named :g1, and it contains a default graph and a named graph. It is the named graph whose name is :g1.
As currently expressed, these definitions are mixed up with the concept of window function.
What is actually needed are the following concepts:
Note that these are properties that are preserved under finite merger/union, and so form closed algebras under these operations.
There is a form of RDF stream that does not give complete information about timestamps, but only provides order information. A useful form is a linked list, e.g.
:g1 :p _:t1.
:g1 {...}.
:g2 :p _:t2.
_:t2 time:after _:t1
:g2 {...}.
...
where _:t1 and _:t2 are OWL-TIME temporal entities.
It would be helpful to have a concrete use case for this profile, as well as a specification. The temporal information provided in such an RDF stream should be sufficient for detection of complex events where the pattern is based on the order of its sub-events.
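As a sketch of how a consumer might recover the order from such a stream, assume the `time:after` triples have already been extracted as (later, earlier) pairs of temporal entity labels, and that each temporal entity maps to one graph name. The function `order_from_after` and the input shapes are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Tuple


def order_from_after(after_pairs: Iterable[Tuple[str, str]],
                     graph_of: Dict[str, str]) -> List[str]:
    """Order graph names so that every 'earlier' entity precedes its 'later' one.

    after_pairs: (later, earlier) pairs, one per `later time:after earlier` triple.
    graph_of:    temporal entity label -> name of the graph it timestamps.
    """
    sorter = TopologicalSorter()
    for later, earlier in after_pairs:
        sorter.add(later, earlier)  # `earlier` is a predecessor of `later`
    for entity in graph_of:
        sorter.add(entity)          # include entities that have no order triples
    # static_order() raises graphlib.CycleError if the order triples are cyclic.
    return [graph_of[t] for t in sorter.static_order() if t in graph_of]


pairs = [("t2", "t1"), ("t3", "t2")]
graphs = {"t1": ":g1", "t2": ":g2", "t3": ":g3"}
print(order_from_after(pairs, graphs))
```

For a simple linked list the result is a single total order. If the order triples only describe a partial order, a topological sort returns one of several admissible sequences, which ties in with the discussion of streams that differ only in item order.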
Should the predicate p in a timestamped graph (g,p,t) be considered when deciding upon the order of timestamped graphs in the stream S, or when considering which timestamped graphs are included in a time-bounded substream?
From 2.3 Timestamped Graphs:
There is exactly one named graph pair <n, G> in the RDF Dataset
(where G is an RDF graph, and n is an IRI or blank node).
This means that the name of the graph in the streamed element/dataset (I'll call it an event from here) can be represented as a blank node, e.g. as Trig:
_:b { :John :isIn :Room1 } .
_:b :observedAt "2017-08-16T16:35:00Z" .
However, blank nodes are always locally scoped to the file or RDF store (or in this case the streamed element), which effectively means that a stream using blank nodes can't contain references to other events in the stream, e.g. if the intention is:
_:b0 { :John :isIn :Room1 } .
_:b0 :observedAt "2017-08-16T16:35:00Z" .
_:b1 { :John :isIn :Room2 } .
_:b1 :observedAt "2017-08-16T16:35:05Z" .
_:b1 :after _:b0 .
_:b2 { :John :isIn :Room3 } .
_:b2 :observedAt "2017-08-16T16:35:10Z" .
_:b2 :after _:b1 .
But if each element is streamed separately, the blank node labels don't carry over from one element to the next. I'm not saying that we should remove the alternative of having a blank node as the name of a graph, but I'm not sure we've covered the implications of actually doing so. For example, from the 3.3.2 Immutability and Event Derivation in the RSP Requirements Design Document:
For RSP this means: (1) create a new (unique) graph for the derived event and (2) possibly
link back to the base event(s) thus enabling drill-down or root cause / provenance analysis
of the derived event.
Is (2) possible under the assumption that the streamed event is referenced using a blank node?
The text in Section 4.4 argues for not needing to define windows for streaming operators. The examples 6&7 then go and give example queries with windows. Why was the following form of query not used for Example 6?
SELECT ?room ?person
FROM STREAM ex:social
WHERE {
?person :isIn ?room
}
The existing proposal seems to require a lot of syntax for no benefit.
Certain use cases or application domains do not need the full generality of the RDF stream definition, and so may be able to implement more efficient reasoning methods when the input is confined to some subclass of RDF streams. It is common to call such subclasses "profiles" (e.g. OWL profiles RL, EL, QL). A new section of the Abstract Syntax and Semantics document should be devoted to defining and naming some important profiles.
What is the purpose of Section 4? Is it to define a new query language? If so, then the way to define a stream should be stated, particularly as Examples 3&4 both use a stream that is presumably defined somewhere. More clarity is required here.
The document refers to the RDF Semantics WG Note (https://www.w3.org/TR/2014/NOTE-rdf11-datasets-20140225/#each-named-graph-defines-its-own-context). But there it says there are several possible formalizations, so it is necessary to state the formalization exactly in our document.
E.g. "One way is to interpret the graph name as denoting a graph, and a named graph pair is true if this graph entails the graph inside the pair." If this is the semantics we want (and also for streams), then we can adopt the formalization that follows in that document.
The definition is complicated by the current definition of RDF stream such that it is a sequence, while in general the merge or union of streams is not a deterministic sequence, but is an equivalence class of isomorphic streams.