
rsp-ql's People

Contributors

amileo, beortner, beta2k, danhlephuoc, daotranminh, dellaglio, emanueledellavalle, greentara, jpcik, keski, lisp, lpdanh, mbalduini, ocorcho, webdata


rsp-ql's Issues

comments/issues on the Time Vocabulary

Please post any comments/issues on the first proposal of a Time Vocabulary, which we will use to relate a graph to a time instant for timestamped graphs.

The Time Vocab can be found here: https://github.com/streamreasoning/RSP-QL/blob/master/TimeVocab.owl
A visually more appealing version is here: http://www.essepuntato.it/lode/owlapi/https://raw.githubusercontent.com/streamreasoning/RSP-QL/master/TimeVocab.owl

Some initial discussion already took place on the list in this thread: https://lists.w3.org/Archives/Public/public-rsp/2015Jun/0040.html

Add concept of convex substream

In order to assist in characterising window functions, I think it would be helpful to define the relationship of "convex substream", being a substream which doesn't leave out any elements of the original stream that fall between any two elements that are included.

This would allow us to say something like:
A time-based window function is a mapping from RDF streams to their time-bounded convex substreams such that ...
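
As a rough illustration of the proposed relationship, here is a minimal Python sketch. It assumes a finite stream modelled as a list of distinct items, and it reads "fall between" as "lie between in sequence position"; the function name `is_convex_substream` is hypothetical and not taken from any RSP document.

```python
from typing import Hashable, Sequence


def is_convex_substream(sub: Sequence[Hashable], stream: Sequence[Hashable]) -> bool:
    """Return True if `sub` is a contiguous run of `stream`.

    Assumption: "convex" means no element of `stream` that lies between
    two included elements (by sequence position) is left out.
    """
    if not sub:
        return True  # the empty substream is trivially convex
    n, m = len(stream), len(sub)
    # Try every possible starting position of a contiguous run of length m.
    return any(list(stream[i:i + m]) == list(sub) for i in range(n - m + 1))


stream = ["g1", "g2", "g3", "g4"]
convex = is_convex_substream(["g2", "g3"], stream)      # keeps everything between g2 and g3
not_convex = is_convex_substream(["g1", "g3"], stream)  # skips g2
```

If "between" is instead meant with respect to timestamps, the check would have to compare temporal entities rather than positions.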

Question on storage for processing

Currently the document says "RSPs should process streams of data actively and in-stream, without the need of storing them". However, there are some reasonable operations that would require partial storage, just not storage of the entire past history. For example, to determine if two streams are isomorphic, some storage would be required due to the possibility of different sequential ordering in the serialization.

Should identifying an RDF Stream by an IRI be part of the abstract syntax?

Consider the analogy to graphs and named graph pairs.

  • Neither the abstract syntax for RDF graphs nor the concrete syntaxes, for that matter, include the identification of the graph by an IRI.
  • Of course, an RDF graph can be considered an RDF resource, so an IRI can always "denote" an RDF graph.
  • But it is only when the named graph pair is introduced, leading up to the definition of RDF dataset (https://www.w3.org/TR/rdf11-concepts/#section-dataset), that the data model is extended to explicitly identify an RDF graph by an IRI.
  • Prior to the publication of RDF 1.1, named graphs were developed outside of the RDF specification by necessity.

It seems unlikely that there will be a new structure built on top of RDF streams, at least any time soon (an RDF stream bundle?). The need to identify RDF streams by IRI has already been noted, e.g. to refer to them for querying.
To avoid a proliferation of methods for identifying an RDF stream by an IRI, it seems best to include this in scope.

However, if the abstract syntax is introduced, then it is necessary to talk about semantics and queries.

  • The semantics of RDF datasets is in its current messy state because several different semantics for named graphs were developed independently. To avoid such a situation for a "named RDF stream", should the semantics be standardized at this time?
  • When a query refers to several streams by their IRIs, does this mean that the query is applied to the merger or union of the streams? Should the query be able to specify one or the other?
  • Can the stream IRI be used elsewhere in the query, e.g. in selecting from the default graphs of only one stream?

Reference existing RSP QL

Section 4 starts by stating that there have been many existing proposals. These, or at least the key ones, should be cited.

Functional Requirements

In general, the functional requirements should avoid introducing terms that are undefined or vague, and should not get too specific - they should not anticipate the design. On the other hand, they should not be so general that they have little effect.

In particular, I suggest the following:
#1. Stronger: "RDF streams should be representable in an abstract model, and the semantics of this abstract model should provide the basis of the results of RSP queries."
#2. Weaker: "The RDF stream abstract model should be serialized in concrete formats derived from standard formats, extending beyond the standard format only when necessary."
#4. Correction: "RDF streams may have timestamps based on different notions of time (time instants, intervals) with different semantics (application, validity, transactional)."

Re: " In case no timestamp is associated to an RDF stream data item, the system is responsible of managing time-based ordering of stream items."
This is something quite different than the earlier part of the requirement, and somewhat controversial. I suggest it be made a separate item and discussed at greater length.
#5. Re: "reactively" - what exactly do we mean by this requirement? How does this requirement impose constraints on the abstract model, semantics, concrete language, or query language?
#7. Greater generality: "RSP engines should be able to query a portion of the knowledge expressed in an RDF stream."

We need a common terminology for the components of an RDF stream: items, elements, members, but please not "events" or "streaming graphs".
#9. It is unclear from the wording whether the statement includes:

"RSP engines should support combining multiple RDF streams."
It could be put into one:
"RSP engines should support combining multiple RDF streams as well as stored RDF (aka static RDF graphs or datasets)."
#12. More general:

"RSP queries should be able to access all knowledge explicitly expressed in the stream, including names of named graphs and triples containing such names."
#13. A window is a stream, so there is no need to have "stream/windows"

vocabularies and datatypes for time intervals

Hi,

I saw some examples of RDF encodings of RDF streams. I am wondering if standardizing the vocabulary and datatypes is within the goals of this WG.

:g1 {:axel :isIn :RedRoom. :darko :isIn :RedRoom} {:g1 :atInterval 2016-03-01T13:00:00Z/2016-03-01T13:00:10Z}
:g2 {:axel :isIn :BlueRoom. }                     {:g2 :atInterval 2016-03-01T13:00:20Z/2016-03-01T13:00:40Z}
:g3 {:minh :isIn :RedRoom. }                      {:g3 :atInterval 2016-03-01T13:00:50Z/2016-03-01T13:01:00Z}

For instance, based on the above example

  • What is the prefix for :atInterval? Do we want to define other properties like ":startAt" or ":endAt"?
  • What is the datatype for the literal 2016-03-01T13:00:00Z/2016-03-01T13:00:10Z?
  • Do we want to introduce some built-in RDF functions to retrieve the start and end of an interval literal?
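
To make the last question concrete, here is a minimal Python sketch of what "retrieve the start and end of an interval literal" could mean. It assumes the literal follows the ISO 8601 `start/end` form used in the example above; the function name `parse_interval` is hypothetical and is not a proposed built-in.

```python
from datetime import datetime
from typing import Tuple


def parse_interval(literal: str) -> Tuple[datetime, datetime]:
    """Split a 'start/end' interval literal into two timezone-aware datetimes."""
    start_text, end_text = literal.split("/")
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    start = datetime.fromisoformat(start_text.replace("Z", "+00:00"))
    end = datetime.fromisoformat(end_text.replace("Z", "+00:00"))
    if end < start:
        raise ValueError("interval ends before it starts")
    return start, end


start, end = parse_interval("2016-03-01T13:00:00Z/2016-03-01T13:00:10Z")
duration_seconds = (end - start).total_seconds()
```

Whether such accessors belong in the query language or in a vocabulary of ":startAt" / ":endAt" properties is exactly the open question.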

Refine terminology about RDF Streams

There is a need to distinguish RDF streams in which every timestamp predicate has an associated order that is total.
(A number of important considerations depend on this property, so it will be a great convenience to have a term for just these kinds of RDF streams.)

Some options:

  1. Use a different term (e.g. RDF braid) for all things that are currently called RDF streams, and use the term "RDF stream" for only those with totally-ordered predicates.
  2. Retain the term RDF stream for the more general class, as it is now, and create a new term, e.g. "RDF totally-ordered stream" or "RDF t.o.-stream".

Querying RDF graphs in windows

Maybe this has been clarified somewhere but I feel that there is an assumption that although RDF streams consist of graphs, with some timestamp or similar, the graphs themselves may not be accessible in the windows.
Basically, we've touched upon this previously, and my understanding was that some argued against making the graphs available and instead proposed that all streaming graphs be put in "a default graph" which represents a window. But to me the arguments for this view were not well motivated, other than "in this and this example we can manage without it". So, if possible, I would like it to be clarified whether a query like the one below for filtering a stream would be valid:

# Assume that the graphs in this particular stream have more than one timestamp.
# The timestamp property used by the engine is :generatedAt. Now we wish to create
# a substream based on a specific time property (:observedAt) excluding all non-
# applicable events in the stream.

PREFIX : <http://examplel.org#>
REGISTER STREAM :filteredEventStream AS

CONSTRUCT ISTREAM {
   GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
   ?g :observedAt ?obj2 .
   ?g ?prop3 ?obj3 .
}
FROM NAMED WINDOW :w ON :fullEventStream [RANGE PT10S]
WHERE {
   WINDOW :w {
      GRAPH ?g { ?subj1 ?prop1 ?obj1 . }
      ?g :observedAt ?obj2 .
      OPTIONAL { ?g ?prop3 ?obj3 . }
   }
}

Are there any arguments for why this query should not be valid? How would this query be expressed in a general way if the graphs cannot be referenced?

rename "timestamped graph"

In section 2.3 of RDF Stream Abstract Syntax and Semantics timestamped graphs are defined. The first part of the definition begins:

A timestamped graph is defined as an RDF Dataset [...]

But referring to an RDF dataset as a graph seems to be a bit of a misnomer. I suggest that "timestamped graph" be replaced by "timestamped dataset".

Discuss the practical uses of isomorphism

The various theoretical concepts of isomorphism are introduced in the Abstract Syntax and Semantics document to support the definitions of window functions and entailments. The practical side of isomorphism is out of scope for this document, but is worthy of discussion somewhere. E.g.
under what circumstances is isomorphism decidable? This appears to be the case iff the stream is finite.

Missing Definition of Timestamp Predicate

In a timestamped graph, the predicate in the timestamp triple cannot be just any predicate.
It should have a specified range (of temporal entities).
This can be accomplished, to some extent, with RDFS, by identifying an rdfs:range.
Also the (partial or total) order that will be used to structure a stream must be specified.

Namespace for the time vocabulary

We should work out what the final namespace would be. Maybe we should consider a PURL. Another alternative was to place it under the community group's W3C namespace.

Window function definitions revised

The current definitions of window functions have the following issues:

  1. They contain some wording referring to how the function application is iterated. A window function should only be applied once, with the mechanism for iteration being an extra structure defined later.
  2. They must respect S-isomorphism. That is, the two streams resulting from the application of a window function on two S-isomorphic streams should be S-isomorphic.

Define isomorphism of RDF stream relative to sequence order

Even in the case of an RDF stream using a single predicate with a totally ordered set of temporal entities (e.g. time instants), there is still the possibility that two timestamped graphs in the stream have the same timestamp. Therefore these stream items could be ordered differently in the sequence while still satisfying the stream constraints.

The consensus is (to my knowledge) that these streams should always give the same query results (when the same query is applied). This suggests a kind of equivalence class.

There should be a definition of the relationship between two streams that contain the same items, differing only in their order in the sequence. This kind of relationship is usually called "isomorphism". There is already a notion of isomorphism in RDF graphs, due to bnode renaming. We could still call this some qualified kind of isomorphism (e.g. S-isomorphism, where S is for sequence). Bnode (only) isomorphism could be called B-isomorphism, and the combination of the two, simply "isomorphism".

I propose that these definitions, and discussion about them, be contained in a new section called "Isomorphism", following the "Data Model" section.
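
A minimal Python sketch of the S-isomorphism idea, under simplifying assumptions: streams are finite lists of `(timestamp, graph_name)` pairs, timestamps are totally ordered, and blank node renaming (B-isomorphism) is ignored. The function name `is_s_isomorphic` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Item = Tuple[int, Hashable]  # (timestamp, graph name); both are simplifications


def respects_time_order(stream: List[Item]) -> bool:
    """Check the stream constraint: timestamps never decrease along the sequence."""
    return all(stream[i][0] <= stream[i + 1][0] for i in range(len(stream) - 1))


def is_s_isomorphic(a: List[Item], b: List[Item]) -> bool:
    """Same items, possibly in a different order among equal timestamps."""
    return respects_time_order(a) and respects_time_order(b) and Counter(a) == Counter(b)


s1 = [(1, "g1"), (2, "g2"), (2, "g3")]
s2 = [(1, "g1"), (2, "g3"), (2, "g2")]  # g2 and g3 share a timestamp and are swapped
s3 = [(1, "g1"), (2, "g2")]             # one item is missing
```

Here `is_s_isomorphic(s1, s2)` holds while `is_s_isomorphic(s1, s3)` does not, which matches the intuition that s1 and s2 should give the same query results.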

Example 30

I believe the title should state that it is RSP-QL rather than SPARQLStream.

Clarify Definition of Timestamped Graph Re Multiplicity of Timestamp Triples

The definition has the word "single" with a strike-out.
All further developments of the RDF stream have a single timestamp triple as the default graph in the RDF Dataset that is a timestamped graph.

Therefore, we need to either accept that a timestamped graph has this restricted structure, or modify RDF streams to accommodate timestamped graphs with multiple triples in their default graph.

Note that the definition of RDF stream allows the same named graph (n, g) to be used in multiple items in the stream, so it is possible to assign multiple timestamps to the same named graph within an RDF stream:
(n, g) (g p1 t1)
(n, g) (g p2 t2)

If it is desired to transmit nontemporal metadata of the named graphs as part of an RDF stream, then this must be handled by another mechanism (if the default graph is restricted to just the timestamp triple).
Here is one option: timestamp the metadata and add it to the stream:
(n, g) (n p1 t1)
(n, g) (n p2 t2)
(m, h) (m p3 t3)
where h contains metadata about g, e.g. (n r x) where x is not a temporal entity.

include time zone information in timestamps in examples

The current RGN_Location_TempC_Minute_Merged.json includes, for example, the following observations:

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T01:01:00"  },
... ]

It is important to know whether a processor located in one of those cities would interpret this to be the "same" stream as one which contained

  "@graph": [
    {  "@id": "source:Berlin_1",  "observedAt": "2015-01-01T01:01:00"  },
    {  "@id": "source:Madrid_1", "observedAt": "2015-01-01T01:01:00+01:00"  },
    {  "@id": "source:Paris_1",  "observedAt": "2015-01-01T00:01:00Z"  },
... ]
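
The ambiguity can be reproduced with a short Python sketch using only the standard `datetime` module and the timestamp values from the examples above (the trailing "Z" is written as "+00:00" so that older Python versions accept it). This only illustrates how one general-purpose library treats the values, not how an RSP engine must behave.

```python
from datetime import datetime

# Timestamp without zone information, as in the current example file.
naive = datetime.fromisoformat("2015-01-01T01:01:00")
# The same wall-clock reading with explicit offsets.
madrid = datetime.fromisoformat("2015-01-01T01:01:00+01:00")
paris = datetime.fromisoformat("2015-01-01T00:01:00+00:00")

# With explicit offsets the two values denote the same instant.
same_instant = madrid == paris

# Without an offset the value cannot be ordered against zoned timestamps.
try:
    naive < madrid
    comparable = True
except TypeError:
    comparable = False
```

In this sketch `same_instant` is true and `comparable` is false: the zoned values can be placed on a single timeline, while the unzoned one cannot be ordered relative to them without an extra assumption about the local zone.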

Conformance

Currently conformance notes are entered in Section 4.11, a subsection of the RDF Stream Query Language.

In W3C specs conformance notes typically come at the start of the document so that you understand how to read the document before you read it!

This section should be moved to the top of the document. I would also suggest extending the use of RFC2119 to include the phrases MUST, SHOULD for the requirements section.

Temporal Entities and OWL Time

There is an ambiguity or inconsistency in the Abstract Syntax and Semantics document regarding temporal entities. The section "Temporal Entities" says the specification is neutral in regard to temporal ontologies. The section "Instants and Intervals" references the OWL Time Ontology.

Discuss streamed named graph

In 3.3.2 Immutability and Event Derivation we state that a new unique graph should be generated for each derived event, which could then link back to other events.
In my understanding this means that events would be referenced by their named graphs (?). When I read the document this made me a bit confused, since there is no indication that the named graph structure is used for anything other than transporting a set of triples from A to B. All examples seem to assume that either 1) no named graphs are being streamed, only triples, or 2) the named graph structure is lost and all triples are added to the default graph of the window upon arrival.
If 2), will we be able to use RSP-QL for CEP? Perhaps this is out of scope for this document, but if possible it would be nice if we could be a bit clearer about this.

Time-bounded and count-bounded substream definitions replaced

As currently expressed, these definitions are mixed up with the concept of window function.

What is actually needed are the following concepts:

  • Time-bounded RDF Stream
    An RDF Stream where for every timestamp predicate that occurs in the stream, there is a temporal entity in its range that bounds (is greater than or equal to) all timestamps of that predicate in the stream.
  • Finite RDF Stream
    An RDF stream with a finite number of items in it.

Note that these are properties that are preserved under finite merger/union, and so form closed algebras under these operations.

RDF Stream Profile: Linked List

There is a form of RDF stream that does not give complete information about timestamps, but only provides order information. A useful form is a linked list, e.g.

:g1 :p _:t1.
:g1 {...}.

:g2 :p _:t2.
_:t2 time:after _:t1
:g2 {...}.

... 

where _:t1 and _:t2 are OWL-TIME temporal entities.

It would be helpful to have a concrete usecase for this profile, as well as a specification. The temporal information provided in such an RDF stream should be sufficient for detection of complex events where the pattern is based on the order of its sub-events.
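
To show that order information alone can be enough to sequence the stream, here is a minimal Python sketch. It assumes the `time:after` statements form a single chain and represents each statement as a `(later, earlier)` pair of labels; the function name `order_from_after` is hypothetical.

```python
from typing import Iterable, List, Tuple


def order_from_after(after_pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Rebuild the sequence of temporal entities from (later, earlier) pairs.

    Assumption: the pairs describe one unbranched chain, as in a linked list.
    """
    successor = {earlier: later for later, earlier in after_pairs}
    laters = set(successor.values())
    # The first entity is the only one that is never "after" anything.
    starts = [entity for entity in successor if entity not in laters]
    if len(starts) != 1:
        raise ValueError("the time:after statements do not form a single chain")
    order = [starts[0]]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


# "_:t2 time:after _:t1" and "_:t3 time:after _:t2", given in arbitrary order.
sequence = order_from_after([("t3", "t2"), ("t2", "t1")])
```

A complex event pattern such as "A followed by B" could then be matched against `sequence` without ever knowing the actual time instants.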

Blank node as the name of a streamed graph

From 2.3 Timestamped Graphs:

There is exactly one named graph pair <n, G> in the RDF Dataset
(where G is an RDF graph, and n is an IRI or blank node).

This means that the name of the graph in the streamed element/dataset (I'll call it an event from here) can be represented as a blank node, e.g. as Trig:

_:b { :John :isIn :Room1 } .
_:b :observedAt "2017-08-16T16:35:00Z" .

However, blank nodes are always locally scoped to the file or RDF store (or in this case the streamed element), which effectively means that a stream using blank nodes can't contain references to other events in the stream. For example, if the intention is:

_:b0 { :John :isIn :Room1 } .
_:b0 :observedAt "2017-08-16T16:35:00Z" .

_:b1 { :John :isIn :Room2 } .
_:b1 :observedAt "2017-08-16T16:35:05Z" .
_:b1 :after _:b0 .

_:b2 { :John :isIn :Room3 } .
_:b2 :observedAt "2017-08-16T16:35:10Z" .
_:b2 :after _:b1 .

but each element is streamed separately, then the labels of the blank nodes don't apply across elements. I'm not saying that we should remove the alternative of having a blank node as the name of a graph, but I'm not sure we've covered the implications of actually doing so. For example, from 3.3.2 Immutability and Event Derivation in the RSP Requirements Design Document:

For RSP this means: (1) create a new (unique) graph for the derived event and (2) possibly
link back to the base event(s) thus enabling drill-down or root cause / provenance analysis
of the derived event.

Is (2) possible under the assumption that the streamed event is referenced using a blank node?

Windows in S2S operators

The text in Section 4.4 argues for not needing to define windows for streaming operators. The examples 6&7 then go and give example queries with windows. Why was the following form of query not used for Example 6?

SELECT ?room ?person
FROM STREAM ex:social
WHERE {
  ?person :isIn ?room
}

The existing proposal seems to require a lot of syntax for no benefit.

RDF Stream Profile: Time Series

Certain usecases or application domains do not need the full generality of the RDF stream definition, and so may be able to implement more efficient reasoning methods when the input is confined to be some subclass of RDF streams. It is common to call such subclasses "profiles" (e.g. OWL profiles RL, EL, QL). A new section of the Abstract Syntax and Semantics document should be devoted to defining and naming some important profiles.

Section 4.1: Input

What is the purpose of Section 4? Is it to define a new query language? If so, then the way to define a stream should be stated, particularly as Examples 3&4 both use a stream that is presumably defined somewhere. More clarity is required here.

Specify Semantics of Timestamped Graph Precisely

The document refers to the RDF Semantics WG Note (https://www.w3.org/TR/2014/NOTE-rdf11-datasets-20140225/#each-named-graph-defines-its-own-context). But there it says there are several possible formalizations, so it is necessary to state the formalization exactly in our document.

E.g. "One way is to interpret the graph name as denoting a graph, and a named graph pair is true if this graph entails the graph inside the pair." If this is the semantics we want (and also for streams), then we can adopt the formalization that follows in that document.

Define RDF Stream Merge and Union

The definition is complicated by the current definition of RDF stream such that it is a sequence, while in general the merge or union of streams is not a deterministic sequence, but is an equivalence class of isomorphic streams.
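
The non-determinism can be seen in a small Python sketch. It assumes finite streams modelled as timestamp-ordered lists of `(timestamp, graph_name)` pairs and uses the standard `heapq.merge`; the function name `merge_streams` is hypothetical and the sketch ignores blank nodes.

```python
import heapq
from collections import Counter
from typing import List, Tuple

Item = Tuple[int, str]  # (timestamp, graph name); a simplification


def merge_streams(*streams: List[Item]) -> List[Item]:
    """Interleave timestamp-ordered streams into one timestamp-ordered sequence."""
    # Ties on the timestamp are broken by the position of the input stream.
    return list(heapq.merge(*streams, key=lambda item: item[0]))


a = [(1, "a1"), (2, "a2")]
b = [(2, "b1"), (3, "b2")]

ab = merge_streams(a, b)  # (2, "a2") comes before (2, "b1")
ba = merge_streams(b, a)  # (2, "b1") comes before (2, "a2")

# Different sequences, but the same items: members of one equivalence class.
same_items = Counter(ab) == Counter(ba)
```

Both `ab` and `ba` satisfy the ordering constraint, so a definition of merge that returns "the" sequence has to pick one arbitrarily, whereas a definition in terms of the equivalence class does not.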
