opencypher / opencypher

Specification of the Cypher property graph query language

Home Page: http://www.opencypher.org

License: Apache License 2.0

Gherkin 48.11% Java 38.78% Shell 0.12% Scala 12.36% FreeMarker 0.08% ANTLR 0.22% HTML 0.12% CSS 0.01% JavaScript 0.05% Cypher 0.15%
cypher database declarative grammar graph language property query specification standard tck

opencypher's Introduction

The Cypher Property Graph Query Language

This repository holds the specification of Cypher, a declarative property graph query language. Its purpose is to be central to the process of evolving the specification and standardisation of Cypher as a graph query language.

Overview of the process

Changes to openCypher are made through consensus in the openCypher Implementers Group (oCIG). The process for proposing changes, voting on proposals and measuring consensus is described in this set of slides.

Refer to the Cypher Improvement Process document for more details on CIPs, CIRs, their structure and lifecycle.

The structure of this repository

  • Cypher Improvement Proposals (CIP), /cip

    • Contains a list of accepted CIP documents.

  • Cypher grammar, /grammar

    • Contains the Cypher grammar specification, in XML source format.

    • A more readily consumable form of the grammar is generated as output from the build and can be found here:

      • Railroad diagrams

      • EBNF

      • ANTLR4 Grammar

  • Cypher Technology Compatibility Kit (TCK), /tck

    • Contains a set of Cucumber features that define Cypher behaviour, and documentation on how to use it.

  • openCypher developer tools, /tools

    • Contains code that tests the integrity of the repository, generates release artifacts, and aids implementers of openCypher.

Building

This repository uses a Maven build and supports cross building for Scala 2.12 and Scala 2.13:

  • For Scala 2.12, use mvn -U clean install -P scala-212

  • For Scala 2.13, use mvn -U clean install -P scala-213

Contact us

There are several ways to get in touch with the openCypher project and its participants:

  • Are you interested in implementing openCypher for your platform, but you have general questions and want to reach out to other community members with similar interests? Post to our Google Groups mailing list: https://groups.google.com/forum/#!forum/opencypher

  • For specific feature requests or bug reports, please open an issue on this repository.

  • Do you have a particular contribution in mind, and concrete ideas on how to implement it? Open a pull request.

© Copyright 2015-2017 Neo Technology, Inc.

Feedback

Any feedback you provide to Neo Technology, Inc. through this repository shall be deemed to be non-confidential. You grant Neo Technology, Inc. a perpetual, irrevocable, worldwide, royalty-free license to use, reproduce, modify, publicly perform, publicly display and distribute such feedback on an unrestricted basis.

License

The openCypher project is licensed under the Apache license 2.0.

opencypher's People

Contributors

alexaverbuch, arnefischereit, aviavni, boggle, darthmax, dvirdukhan, dwitry, fickludd, gem-neo4j, hedengran, hvub, jjaderberg, jmarton, jsoref, linneaandersson, lojjs, loveleif, mats-sx, mnd999, peterfurniss, saschapeukert, sherfert, smietana, swilly22, systay, szarnyasg, technige, thobe, tobias-johansson, volkovs

opencypher's Issues

parameters not parsing

This query runs fine in the console and via BOLT, but fails to parse with the M03 grammar:
match(n:Person {name:{pname}}) return n order by n.name limit 10 ;

produces these errors:
line 1:27 mismatched input '}' expecting {':', WHITESPACE}
line 1:28 extraneous input '}' expecting {')', WHITESPACE}

Regular expressions should be marked as legacy

The Standardisation scope of Cypher states that regular expressions (i.e. the =~ operator) are to be excluded from openCypher. However, the current grammar allows them: https://github.com/opencypher/openCypher/blob/master/grammar/basic-grammar.xml#L228

I took a quick look at the grammar projects: it seems that only the Production / ProductionResolver classes are able to handle the legacy attribute, so marking a single operator legacy is not a trivial task.

Support calling aggregation functions on lists in expressions

CIR-2017-183

Cypher has a great set of aggregation functions that can be used to compute aggregate values over multiple result records.

However it currently is not possible to call these aggregation functions on lists in an expression evaluation context. Currently users that need to achieve this usually try to help themselves by using collect, UNWIND, and list comprehensions.

This CIR looks for a better way to achieve this user goal.

Requirements

Proposals should provide a syntax extension that allows calling aggregation functions over lists in an expression evaluation context.

.abnf grammar

Hi, I'm working on building a pretty deep neo4j integration with Elixir. I'm pretty new to grammars and I have only found parsers that support abnf grammars. I'm not familiar with the difference between abnf and ebnf.

Is this something that is easy to add? It looks like you are generating the .ebnf from an xml file.

Thanks!

Support for DECODE or MAP function as an alternative to simple CASE

Can Neo4j Cypher support a DECODE function in a future version? It could simply be an alias for the simple CASE:

CASE test
WHEN value THEN result
[WHEN ...]
[ELSE default]
END

translates as DECODE to:

DECODE ( test
, value , result
[, value , result
, ...]
[, default] )

The function is less verbose than simple CASE.

If the word DECODE is not the right fit, consider the word CASE itself, or something else like MAP (for mapping values to alternative values).

Moved from neo4j/neo4j#7139.
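To make the requested mapping concrete, here is a minimal Python sketch of how such a DECODE could behave if it were defined as an alias for the simple CASE. The function name `decode` and its argument layout follow the proposal above. Everything else is an assumption for illustration, in particular that a missing default yields null and that a null test value matches nothing, as with equality in a simple CASE. It is not part of Cypher or Neo4j.

```python
from typing import Any


def decode(test: Any, *args: Any) -> Any:
    """Hypothetical DECODE(test, value, result, ..., [default]) with simple-CASE semantics."""
    if len(args) < 2:
        raise ValueError("DECODE needs at least one value/result pair")
    # An odd number of trailing arguments means the last one is the default.
    pairs, default = (args[:-1], args[-1]) if len(args) % 2 else (args, None)
    for value, result in zip(pairs[0::2], pairs[1::2]):
        # Simple CASE compares by equality; a null (None) test matches nothing.
        if test is not None and test == value:
            return result
    # Like CASE without ELSE, fall back to null (None) when no default is given.
    return default
```

For example, `decode(2, 1, 'one', 2, 'two', 'other')` picks the result paired with the value 2, and any unmatched test value falls through to 'other'.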

launch.sh script may create polluted grammar file

A small annoyance: if the user does not strictly follow the steps in the grammar generation instructions, s/he may enter something like this:

cd /tmp
git clone git@github.com:opencypher/openCypher
cd openCypher/
mkdir -p grammar/generated
./tools/grammar/src/main/shell/launch.sh Antlr4 cypher.xml > grammar/generated/Cypher.g4

It seemingly runs without any output/errors, but upon closer inspection, the Cypher.g4 file looks like this:

$ head grammar/generated/Cypher.g4 
/tmp/openCypher/tools/grammar /tmp/openCypher
/tmp/openCypher
/*
 * Copyright (c) 2015-2016 "Neo Technology,"
 * Network Engine for Objects in Lund AB [http://neotechnology.com]
 * 

The reason for this is that the output of the pushd and popd commands is sent to the standard output (and redirected to the Cypher.g4 file). Fortunately, this can be fixed trivially, for example by redirecting the output of pushd and popd to /dev/null inside the script.

Grammar lacks whitespace after distinct UNION

This makes the following unparseable:

MATCH (a:A)
RETURN a AS a
UNION
MATCH (b:B)
RETURN b AS a

Error given by parser:

line 4:0 extraneous input 'MATCH' expecting {ALL, WHITESPACE}
line 4:6 extraneous input '(' expecting {ALL, WHITESPACE}
line 5:0 extraneous input 'RETURN' expecting {ALL, WHITESPACE}
line 5:7 extraneous input 'b' expecting {ALL, WHITESPACE}
line 5:9 extraneous input 'AS' expecting {ALL, WHITESPACE}
line 5:12 extraneous input 'a' expecting {ALL, WHITESPACE}
line 6:0 mismatched input '' expecting ALL

EBNF grammar problem with SP in Expression7

Instead of

Expression7 = Expression6, { (WS, '+', WS, Expression6) | (WS, '-', WS, Expression6) } ;

should it be

Expression7 = Expression6, { (WS, '+', WS, Expression6) | (WS, '-', SP, Expression6) } ;

since otherwise there is a problem with a possible construction of

--

when subtracting a negative unary expression?

Extended subqueries for Cypher

CIR-2017-181

Support for subqueries increases the expressivity of query languages by a large margin.
This CIR seeks to improve Cypher to support a full set of relevant subqueries.

Background

Cypher today supports a limited set of subqueries

  • List comprehensions
  • Pattern predicates
  • Existential subqueries

Use-Cases

Common uses for subqueries:

  • Continuing after a set operation (e.g. UNION)
  • Use of a scalar value (e.g. computed via aggregation) from deep within an expression
  • Performing multiple updates based on additional matches without changing the cardinality of the outer query

Requirements

Make a proposal for adding full subquery support to Cypher, including:

  • Correlated subqueries
  • Uncorrelated subqueries
  • Scalar subqueries in expressions
  • Update subqueries (execute subquery without changing the outer query)

Considerations

  • Interaction with existing features

EBNF grammar problem with SP in CaseAlternatives & CaseExpression

Shouldn't a terminal followed by a terminal or an expression, and an expression followed by a terminal, be separated by SP?

Hence

CaseExpression = (((C,A,S,E), { SP, CaseAlternatives }-) | ((C,A,S,E), SP, Expression, { SP, CaseAlternatives }-)), [SP, (E,L,S,E), SP, Expression], SP, (E,N,D) ;
CaseAlternatives = (W,H,E,N), SP, Expression, SP, (T,H,E,N), SP, Expression ;

instead of

CaseExpression = (((C,A,S,E), { WS, CaseAlternatives }-) | ((C,A,S,E), Expression, { WS, CaseAlternatives }-)), [WS, (E,L,S,E), WS, Expression], WS, (E,N,D) ;
CaseAlternatives = (W,H,E,N), WS, Expression, WS, (T,H,E,N), WS, Expression ;

Grammar prohibits using whitespace between pattern parts in MATCH clause

The current grammar (at least the generated ANTLR4 grammar) only allows the 1st form, and rejects the 2nd and 3rd query forms below, i.e. it prohibits whitespace around the comma between the PatternPart non-terminal symbols.

  1. MATCH (n1),(n2) RETURN n1, n2
  2. MATCH (n1), (n2) RETURN n1, n2
  3. MATCH (n1) , (n2) RETURN n1, n2

Make it easier to collect the result of a matched pattern

CIR-2017-177

Something like a shorthand syntax for a subquery that combines (OPTIONAL) MATCH, collect(...) and RETURN would be really useful, in particular for creating object hierarchy projections from the graph.

One thought would be to use something like a comprehension syntax for this. Something like this:

MATCH (user:Person{username:$username})
RETURN [
    MATCH (user)-[:FRIEND]-(friend)
    RETURN friend.username
] AS friends

Would be a shorthand for:

MATCH (user:Person{username:$username})
OPTIONAL MATCH (user)-[:FRIEND]-(friend)
WITH user, collect(friend.username) AS friends
RETURN friends

This particular example is deliberately kept simple, in order to capture only the essence of the proposal, but even if the syntactic saving isn't very substantial, there is a clear benefit to having the scope delineation take care of the grouping of friends per user that the superfluous variable in the aggregating WITH statement is otherwise needed for.
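The grouping that the aggregating WITH performs in the longer query can be sketched in Python as follows. This is only an illustration of the intended result shape, not of any Cypher implementation. The sample rows and the helper name `collect_per_user` are hypothetical, and the sketch assumes that collect(...) skips null values, so a user without friends ends up with an empty list.

```python
# Hypothetical rows after MATCH + OPTIONAL MATCH: (user, friend username or None).
rows = [("alice", "bob"), ("alice", "carol"), ("dave", None)]


def collect_per_user(rows):
    """Group friend usernames per user, the way the aggregating WITH does."""
    friends = {}
    for user, friend in rows:
        bucket = friends.setdefault(user, [])
        if friend is not None:  # assumption: collect() ignores nulls
            bucket.append(friend)
    return friends
```

With the proposed comprehension syntax, the delimiters of the subquery would imply this per-user grouping, so the grouping key would not have to be repeated in a WITH clause.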

Optional Create or Optional Merge

When I used OPTIONAL MATCH and MERGE together, I encountered a problem. The query ran and crashed with an error logged. I was told to check the log, and I found that the variable I was getting from the OPTIONAL MATCH was null, when a MERGE was being performed to create a relationship to it.

How do I make it create the relationship only when OPTIONAL MATCH returns a non-null value (i.e. a node) for it? I thought the simplest syntax (if this has to be implemented) would be to simply add OPTIONAL in front of MERGE or CREATE. Of course, Neo4j could also just ignore any MERGE or CREATE involving null value and emit a warning for such.

Support common set operations in Cypher

CIR-2017-180

Set operations between query results are a commonly available feature of declarative query languages.

Adding set operations to Cypher has frequently been requested by users.

Background

Cypher today supports computing the UNION between operators. Other set operations can be simulated to some degree using lists.

Requirements

Add a full set of set operations to the language, including

  • Computing the intersection between the results of two queries
  • Computing the symmetric difference (exclusive union) between the results of two queries
  • Computing the asymmetric difference between the results of two queries
  • Computing the cartesian product between the results of two queries
  • Computing the power set of the result of a query
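As a rough illustration of what the requested operations compute, the following Python sketch models two query results as sets of rows. The sample data and names are hypothetical. The sketch also assumes set semantics, whereas Cypher results are bags unless DISTINCT is used, so a real proposal would still have to define how duplicates are handled.

```python
from itertools import chain, combinations, product

# Two hypothetical query results, modelled as sets of rows (tuples).
left = {("Alice",), ("Bob",), ("Carol",)}
right = {("Bob",), ("Carol",), ("Dave",)}

intersection = left & right            # rows present in both results
symmetric_difference = left ^ right    # rows present in exactly one result
asymmetric_difference = left - right   # rows in left that are not in right
# Cartesian product: every left row concatenated with every right row.
cartesian_product = {a + b for a, b in product(left, right)}


def power_set(rows):
    """All subsets of a result, from the empty set up to the full set."""
    items = list(rows)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}
```

Note that the power set of a result with n rows has 2^n elements, which is worth keeping in mind when considering it as a language feature.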

Considerations

  • Consider interaction with existing list operations, esp. in subqueries (e.g. from list to subquery to set operation and back)
  • Consider adding a cross product as part of a projection (i.e. easy way to do cross product between two or more projections over the same input)

EBNF grammar problem with SP in Expression3

Instead of

Expression3 = Expression2, { (WS, '[', Expression, ']') | (WS, '[', [Expression], '..', [Expression], ']') | (((WS, '=~') | (SP, (I,N)) | (SP, (S,T,A,R,T,S), SP, (W,I,T,H)) | (SP, (E,N,D,S), SP, (W,I,T,H)) | (SP, (C,O,N,T,A,I,N,S))), WS, Expression2) | (SP, (I,S), SP, (N,U,L,L)) | (SP, (I,S), SP, (N,O,T), SP, (N,U,L,L)) } ;

should it be

Expression3 = Expression2, { (WS, '[', Expression, ']') | (WS, '[', [Expression], '..', [Expression], ']') | (((WS, '=~') | (((SP, (I,N)) | (SP, (S,T,A,R,T,S), SP, (W,I,T,H)) | (SP, (E,N,D,S), SP, (W,I,T,H)) | (SP, (C,O,N,T,A,I,N,S))), SP)), WS, Expression2) | (SP, (I,S), SP, (N,U,L,L)) | (SP, (I,S), SP, (N,O,T), SP, (N,U,L,L)) } ;

?

It should also be

(((SP, '=~')

since otherwise there is a problem with a possible construction of

!=~

Adding support for multiple graphs

CIR-2017-182

Supporting multiple graphs from within the same Cypher query would massively increase the power and expressivity of the language. This CIR asks the community to help us explore this idea at greater depth.

Background

For the purpose of this CIR, we assume an extended version of the property graph model.

  • There is a new called a graph
  • Graphs have properties
  • Graphs have labels
  • A graph can contain many nodes and relationships
  • Every node and relationship is contained in one or more graphs

Requirements

For this CIR, we're looking for a wide set of proposals. Therefore we do not ask for many requirements. However, it is expected that a full proposal would touch on a sensible set of the following topics in some form:

  • Passing graphs as input to Cypher
  • Receiving graphs as output from Cypher
  • Combining graphs (e.g. set operations)
  • Dynamically creating graphs from queries
  • Possible changes to the overall language execution model
  • Querying multiple graphs explicitly within the same query
  • Updating graphs
  • Updating graph membership (which nodes/relationships are part of a graph)
  • Representation of graphs inside a Cypher query (as value? as a context?)

Considerations

Furthermore, proposals are invited to cover the following additional facets

  • Graphs as entities (i.e. they may have an identity, like nodes and relationships)
  • Views
  • Addressing graphs
  • Federation/Cross-database operation
  • Access control

Thank you very much!

Broken ANTLR4 grammar

Hi openCypher!
I'm having some trouble using the provided ANTLR4 grammar to parse DDL statements.

First, in order to get ANTLR4 to generate Java target code, I had to make a small change and rename the "return" rule. Otherwise, I would get the following error:
error(134): Cypher.g4:71:9: symbol return conflicts with generated code in target language or runtime
I am sure I have done this small change carefully though, without touching any other rules like returnBody.

Now, when I try to parse a simple DDL statement like:

CREATE (n:Person { name : 'Andres', title : 'Developer' }) \n

and print the parse tree using ANTLR's grun tool, I get the following errors:

line 1:17 extraneous input '{' expecting {')', WHITESPACE}
line 1:19 extraneous input 'name' expecting {')', WHITESPACE}
line 1:24 extraneous input ':' expecting {')', WHITESPACE}
line 1:26 extraneous input ''Andres'' expecting {')', WHITESPACE}
line 1:35 extraneous input ' ' expecting {'(', CYPHER, EXPLAIN, PROFILE, USING, PERIODIC, COMMIT, UNION, ALL, CREATE, DROP, INDEX, ON, CONSTRAINT, ASSERT, IS, UNIQUE, EXISTS, LOAD, CSV, WITH, HEADERS, FROM, AS, FIELDTERMINATOR, OPTIONAL, MATCH, UNWIND, MERGE, SET, DELETE, DETACH, REMOVE, FOREACH, IN, DISTINCT, RETURN, ORDER, BY, L_SKIP, LIMIT, DESCENDING, DESC, ASCENDING, ASC, JOIN, SCAN, START, NODE, RELATIONSHIP, REL, WHERE, SHORTESTPATH, ALLSHORTESTPATHS, OR, XOR, AND, NOT, STARTS, ENDS, CONTAINS, NULL, TRUE, FALSE, COUNT, FILTER, EXTRACT, ANY, NONE, SINGLE, REDUCE, CASE, ELSE, END, WHEN, THEN, L_0X, UnescapedSymbolicName, EscapedSymbolicName}
line 1:42 extraneous input ':' expecting {'=', WHITESPACE}
line 1:44 extraneous input ''Developer'' expecting {'=', WHITESPACE}
line 1:56 extraneous input '}' expecting {'=', WHITESPACE}
line 2:0 mismatched input '<EOF>' expecting '='

Additionally, while queries like this work:

MATCH (node1:Label1)

WHERE node1.propertyA = {value}

RETURN node2.propertyA, node2.propertyB

queries like this:

MATCH (tom:Person)-[:ACTED_IN]->(tomHanksMovies) RETURN tom

give the following error:
line 1:18 mismatched input '-' expecting {<EOF>, ',', USING, UNION, CREATE, LOAD, WITH, OPTIONAL, MATCH, UNWIND, MERGE, SET, DELETE, DETACH, REMOVE, FOREACH, RETURN, START, WHERE, WHITESPACE}

Do you think there could be a bug in the provided grammar, or am I doing something wrong?
Thank you!

Questions on labels and tests

  1. Is it a goal to support multiple labels? TinkerPop3 does not support multiple labels:

    TinkerPop3 requires every Element to have a single, immutable string label (i.e. a Vertex, Edge, and VertexProperty).

    I can imagine the TinkerPop property graph model as a basis for potential openCypher implementations, so this seems an important design decision. The acceptance tests suggest that openCypher implementations should be able to handle multiple labels.

  2. The Create.feature and the CreateAcceptance.feature tests share some similar test cases (e.g. the first test case is identical). Is this intentional?

Strange behavior on simple query parsing using ANTLR4 grammar

Hi,
I'm using the latest stable ANTLR4 grammar for some Cypher parsing and I found some strange behavior.

Parsing the query
MATCH (e) return e
which is a pretty basic query, I get:

line 1:7 no viable alternative at input '(e'
line 1:17 no viable alternative at input 'return e'
line 1:17 extraneous input 'e' expecting {<EOF>, WHITESPACE}

If I change the nodePattern e to x it works.

MATCH (x) return x

Grammar problem: several consecutive statements

The following queries from the Neo4j Developer Manual v3.0 seem not to be covered by the grammar:

9.2.5 Using unique constraints with MERGE

CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE;
CREATE CONSTRAINT ON (n:Person) ASSERT n.role IS UNIQUE;

The grammar does not allow several consecutive statements.

Shorthand for coalesce

CIR-2016-22

Since null handling can be common within a statement, writing coalesce every time can be a bit verbose. It would be nice to have some syntactic shorthand sugar for this.

Oracle, for example, has the nvl function which is a more concise alternative to coalesce, which they also have (don't read this in any way as me being a fan of Oracle .... it's one of the only times their syntax is "concise").

This all started from a casual conversation on the neo4j-users Slack channel #help-cypher. Not a massive priority, but I raised this issue at the request of Michael Hunger ...

Kevin Turner [2:58 AM] 
makes me want something like `(x IS NULL) ? 0 : x` sometimes.

Matt Byrne [6:23 AM] 
Like `coalesce(x, 0)` ?

Kevin Turner [6:56 AM] 
oh hey that is what coalesce does, isn’t it. okay, that’d be readable enough.

Matt Byrne [7:12 AM] 
:simple_smile: ... yea comes in handy

Eve Freeman [7:13 AM] 
if you have more complex stuff `... ? ... : ...` is basically `case when x is null then 0 else x end`

Matt Byrne [7:14 AM] 
I'm not a massive fan of Oracle but they do have a more concise equivalent (as well as `coalesce`) called `nvl` ... fewer letters ... when you have a few of these in one statement it's more readable with 3 letters.

Michael Hunger [12:33 PM] 
ya, true

[12:33] 
can you raise it as issue on the openCypher repo @mattbyrne ?

Matt Byrne [1:19 PM] 
sure @michael.neo
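For readers less familiar with coalesce, the equivalence mentioned in the conversation above can be sketched in Python, with None standing in for null. Both helper names are hypothetical and only mirror the behaviour discussed here. The sketch makes no claim about how any Cypher implementation evaluates these expressions.

```python
from typing import Any, Optional


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first argument that is not null (None), or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


def case_when_null(x: Any, fallback: Any) -> Any:
    """Mirror of: CASE WHEN x IS NULL THEN fallback ELSE x END."""
    return fallback if x is None else x
```

For a single value and a fallback, the two helpers agree, which is why a shorter spelling such as nvl would be purely syntactic sugar.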

EBNF broken ?

Previous:

Cypher = WS, QueryOptions, Statement, [WS, ';'], WS ;

QueryOptions = { AnyCypherOption, WS } ;

AnyCypherOption = CypherOption
                | Explain
                | Profile
                ;

CypherOption = (C,Y,P,H,E,R), [SP, VersionNumber], { SP, ConfigurationOption } ;

VersionNumber = DigitString, '.', DigitString ;

Explain = E,X,P,L,A,I,N ;

Profile = P,R,O,F,I,L,E ;

ConfigurationOption = SymbolicName, WS, '=', WS, SymbolicName ;

Statement = Command
          | Query
          ;

Query = RegularQuery
      | BulkImportQuery
      ;

...

Current:

Cypher = WS, Statement, [WS, ';'], WS ;

Statement = Query ;

Query = RegularQuery ;
...

Grammar problem: brackets

The following queries from the Neo4j Developer Manual v3.0 seem not to be covered by the grammar because of the use of brackets:

8.3 Where

MATCH (n)
WHERE n.name = 'Peter' XOR (n.age < 30 AND n.name = "Tobias") OR NOT (n.name = "Tobias" OR
  n.name="Peter")
RETURN n

The following is replaced by #113:

9.2.5 Using unique constraints with MERGE

CREATE CONSTRAINT ON (n:Person) ASSERT n.name IS UNIQUE;
CREATE CONSTRAINT ON (n:Person) ASSERT n.role IS UNIQUE;

EBNF: RangeLiteral

The current definition is

RangeLiteral = WS, [IntegerLiteral, WS], ['..', WS, [IntegerLiteral, WS]] ;

Is WS really a valid RangeLiteral ?
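One way to check the question is to translate the rule as quoted into a regular expression. The Python sketch below does that under two simplifying assumptions that are not taken from the grammar itself: WS is treated as zero or more whitespace characters, and IntegerLiteral as a run of decimal digits.

```python
import re

# Assumptions: WS is zero or more whitespace characters and IntegerLiteral
# is a run of decimal digits; the real grammar defines both in more detail.
WS = r"\s*"
INTEGER = r"\d+"
RANGE_LITERAL = re.compile(
    rf"{WS}(?:{INTEGER}{WS})?(?:\.\.{WS}(?:{INTEGER}{WS})?)?"
)


def is_range_literal(text: str) -> bool:
    """True if the whole string is derivable from the quoted RangeLiteral rule."""
    return RANGE_LITERAL.fullmatch(text) is not None
```

Under these assumptions, every part after the leading WS is optional, so whitespace alone (and even the empty string) is accepted, alongside forms such as 3, .. and 1..4. Whether that is intended presumably depends on where RangeLiteral is referenced in the rest of the grammar.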

Wrong numberLiteral token resolution on mapLiteral instances

I found a bug on the ANTLR4 snapshot grammar, given the following query:
MATCH (n {x: 1.000}) return n

It returns:

line 1:15 mismatched input '000' expecting {HexString, UNION, ALL, OPTIONAL, MATCH, UNWIND, AS, MERGE, ON, CREATE, SET, DETACH, DELETE, REMOVE, WITH, DISTINCT, RETURN, ORDER, BY, L_SKIP, LIMIT, ASCENDING, ASC, DESCENDING, DESC, WHERE, OR, XOR, AND, NOT, IN, STARTS, ENDS, CONTAINS, IS, NULL, COUNT, FILTER, EXTRACT, ANY, NONE, SINGLE, TRUE, FALSE, UnescapedSymbolicName, EscapedSymbolicName}

Looking at the tree it seems like it takes the . as a property lookup.

It happens with other doubleLiterals like in:

MATCH (n) where n.x = 1.000 return n

The problem seems to be in the rule:

expression2 : atom ( propertyLookup | nodeLabels )* ;

Or in numberLiteral that resolves as integerLiteral instead of decimalLiteral

Add support for Regular Path Queries (RPQs)

CIR-2017-179

In order to support expressing more complex patterns over the graph, a syntax for Regular Path Queries would be desirable. Such a feature should support the following combinations of path patterns:

  • Alternatives (or unions) (either this pattern or that pattern)
  • Sequences (first this pattern, then that pattern)
  • Transitive closure (this pattern repeated multiple times)
  • Grouping of patterns
  • Predicates on node and relationship properties.

Possibly we might also want a way to express any path not matching the given pattern.

Isomorphic pattern matching and configurable uniqueness

CIR-2017-174

Cypher pattern matching assumes relationship uniqueness: A relationship can only be matched once per instance of a pattern. This has been criticized occasionally in research papers and by graph analytics practitioners as being a somewhat arbitrary decision that makes it hard to express homomorphic and (node-)isomorphic matching.

This CIR (cypher improvement request) invites proposals to address this, e.g. via a syntactic extension for overriding default uniqueness.

Background

Pattern matching in Cypher can be described as filtering a candidate set of possible matches that are formed by computing the cross product of possible values for all pattern variables.
Candidate filtering only keeps matches for which all relevant predicates evaluate to true and whose relationships are matched in accordance with relationship direction and uniqueness requirements as specified by the relevant patterns.

It would certainly be desirable to further elaborate the exact full semantics of pattern matching at some point but for the purpose of this CIR only the semantics of pattern uniqueness is covered below.

Pattern variable

A variable that is used to name a part of a pattern is called a pattern variable. Currently, there are four kinds of pattern variables:

  • Node pattern variables, e.g. n in (n)
  • Single relationship pattern variables, e.g. r in ()-[r]->()
  • Variable length relationship pattern variables, e.g. r in ()-[r*..4]-()
  • Named path variables in named path patterns, e.g. p in p=()-[*]-()<-(x)

A pattern variable may have already been bound in an outer scope or it may be newly introduced by the syntactic element that contains the pattern.

Uniqueness scope

Every MATCH clause forms a uniqueness scope for all pattern variables used to name any of the patterns in the clause. The uniqueness scope does not consider variables from predicates in the associated WHERE clause, not even if those predicates are implied by patterns (e.g. patterns with literal maps whose value expressions reference pattern variables introduced in earlier parts of the query).

Similarly, other constructs that contain patterns (like pattern comprehensions) form a uniqueness scope for all pattern variables used to name any of their patterns.

Furthermore - and without any loss of generality - it is assumed in the following that all unnamed parts of patterns are named using artificially generated pattern variables, i.e. unnamed patterns are treated as syntactic sugar and everything is considered to be named.

Entity consumption

We say that a given concrete entity (i.e. a node or a relationship) is consumed ("used up") in a candidate match by a pattern variable from the associated uniqueness scope if

  • the concrete entity is a node, the pattern variable is a node pattern variable, and the variable is bound to the concrete node,
  • or the concrete entity is a relationship, the pattern variable is a single relationship pattern variable, and the variable is bound to the concrete relationship,
  • or the concrete entity is a relationship, the pattern variable is a variable length relationship pattern variable, and the variable is bound to a list of relationships that contains the concrete relationship (each occurrence counts separately),
  • or the concrete entity is a node, the pattern variable is a variable length relationship pattern variable, the variable is bound to a list of relationships, and the concrete node is one of the interior nodes of one of the relationships in that list (each occurrence counts separately),
  • or nothing else.

Note that this definition itself does not consider uniqueness, it is merely used below to define uniqueness and therefore the same concrete entity can be consumed multiple times by the same instance of a pattern (the same match) according to this definition.

Also note that this definition does not consider named path variables. Paths are always constructed from other patterns in the same uniqueness scope and therefore do not need any additional consideration regarding the uniqueness of their contained entities.

Uniqueness

With these definitions it now becomes possible to precisely describe relationship uniqueness:

A candidate match is relationship-unique in a uniqueness scope if it does not consume a relationship more than once.

By analogy, a candidate match is node-unique if it does not consume a node more than once.

Note that node uniqueness implies relationship uniqueness and prevents matching of self-relationships (relationships with the same start and end node).

Pattern matching in Cypher by default only returns relationship-unique matches.
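The definitions above can be sketched as a small Python check. This is a simplified illustration under stated assumptions, not a reference implementation: the data layout and all names (`consumption`, `is_unique`, the kind strings) are hypothetical, entities are identified by plain string ids, and the consumption of interior nodes by variable length relationship pattern variables is deliberately left out.

```python
from collections import Counter
from typing import Dict, List, Union

# A candidate match binds pattern variables to entity ids (hypothetical layout).
# Node variables and single relationship variables bind to one id; variable
# length relationship variables bind to a list of relationship ids.
Binding = Union[str, List[str]]


def consumption(match: Dict[str, Binding], kinds: Dict[str, str]) -> Counter:
    """Count how often each entity is consumed; each occurrence counts separately.

    Simplification: interior nodes of variable length relationships are ignored.
    """
    counts: Counter = Counter()
    for variable, bound in match.items():
        if kinds[variable] == "var_length":
            counts.update(("relationship", rel) for rel in bound)
        else:  # kind is "node" or "relationship"
            counts[(kinds[variable], bound)] += 1
    return counts


def is_unique(match: Dict[str, Binding], kinds: Dict[str, str], entity_kind: str) -> bool:
    """True if no entity of the given kind is consumed more than once."""
    counts = consumption(match, kinds)
    return all(n <= 1 for (kind, _), n in counts.items() if kind == entity_kind)
```

With kinds `{"a": "node", "b": "node", "r": "relationship", "rs": "var_length"}`, the candidate match `{"a": "n1", "b": "n1", "r": "r1", "rs": ["r2", "r3"]}` is relationship-unique but not node-unique, because the node n1 is consumed by both a and b.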

Requirements

Requested changes

  • Change Cypher such that it becomes possible to configure uniqueness per uniqueness scope in order to at least enable matching without any uniqueness constraints
  • Change Cypher to support: homomorphic matching (no uniqueness), isomorphic matching (node uniqueness), and the currently provided semantics (relationship uniqueness)

Requested considerations

  • Consider the impact of the proposal on path matching semantics, especially regarding cycles
  • Consider the impact of the proposal on potential changes in possible query result cardinality (esp. regarding the possible introduction of infinite query results)
  • Consider the extensibility of the proposal regarding the introduction of further uniqueness modes in the future

Interaction with existing features

  • Do not change the semantics of existing queries without giving a very strong argument

Issue with ExponentDecimalReal

Shouldn't it be

ExponentDecimalReal = ['-'], { Digit | '.' }-, ((e) | (E)), ['-'], DigitString ;

instead of

ExponentDecimalReal = ['-'], { Digit | '.' }-, ((E) | (E)), ['-'], DigitString ; ?

Semantics of collections

Similarly to most RDBMSs, openCypher uses multiset semantics and users can enforce set semantics with the DISTINCT keyword. However, unlike most RDBMSs, Cypher supports lists as attributes. For example, the collect function can gather a specific attribute from multiple rows.

The simplest example that I can think of:

UNWIND [1,2] AS x
RETURN collect(x)

Does the specification require that the result is [1,2], or is [2,1] also acceptable? (As far as I can tell, Neo4j always returns [1,2].)

As a more general question, is there currently a specification for this, or are there any plans for standardizing when collections should be treated as a multiset or a list?

Missing 'integerLiteral' in clause 'literalIds ' in opencypher-M02-legacy antlr4 grammar

I just stumbled over the 'literalIds' clause in the opencypher-M02-legacy antlr4 grammar. In a former version of this grammar there was a clause

literalIds : unsignedIntegerLiteral ( ws ',' ws unsignedIntegerLiteral )*;

This clause has now changed to

literalIds : ( sp? ',' sp? )* ;

As far as I understand the new clause, it matches a sequence of commas, optionally separated by a single whitespace, where it should match a comma-separated list of node ids (integerLiterals).

[Cypher Grammar] Migration notes

Hi all,

I am the author of the Cypher plugin for JetBrains IDEs.
Initially I created a Cypher grammar by porting cypher-compiler into .bnf.

Currently I am trying to migrate the plugin to the official Cypher grammar.


  1. Cypher = WS, AllOptions, WS, Statements ;
    According to this rule, we can't specify options for each statement. In my implementation it's possible to define options per statement.

    It looks like the AllOptions rule should be part of the Statement rule:

    Statement = {AllOptions} (Command | Query);

  2. PRAGMA rule.
    I asked about this one and the answer was "this is for internal use".
    I also didn't find any information about this clause on the internet.
    Do we need this in the grammar?

  3. Consistency in using the 'SP' rule.
    Let's take this rule for example:

Expression3 = Expression2, {
                ('[', Expression, ']')
              | ('[', [Expression], '..', [Expression], ']')
              | ('=~', Expression2) | ('IN', Expression2)
              | ('STARTS', 'WITH', Expression2)
              | ('ENDS', 'WITH', Expression2)
              | ('CONTAINS', Expression2)
              | ('IS', 'NULL')
              | ('IS', 'NOT', 'NULL')
            } ;

The SP rule is not used here at all.
However, if we look at this one:

CreateIndex = 'CREATE', SP, 'INDEX', SP, 'ON', NodeLabel, '(', PropertyKeyName, ')';

In this rule SP is used between every keyword.


Overall, thank you for the grammar!

EBNF: mismatch of SP and WS

This one is fine:

CreateIndex = (C,R,E,A,T,E), SP, Index ;
DropIndex = (D,R,O,P), SP, Index ;

but SP and WS differ here:

CreateUniqueConstraint = (C,R,E,A,T,E), WS, UniqueConstraint ;
DropUniqueConstraint = (D,R,O,P), SP, UniqueConstraint ;

CreateNodePropertyExistenceConstraint = (C,R,E,A,T,E), WS, NodePropertyExistenceConstraint ;
DropNodePropertyExistenceConstraint = (D,R,O,P), SP, NodePropertyExistenceConstraint ;

CreateRelationshipPropertyExistenceConstraint = (C,R,E,A,T,E), WS, RelationshipPropertyExistenceConstraint ;
DropRelationshipPropertyExistenceConstraint = (D,R,O,P), SP, RelationshipPropertyExistenceConstraint ;

NOT operator requires spaces on both sides

The NOT operator requires a whitespace both before and after the operator: https://github.com/opencypher/openCypher/blob/master/grammar/basic-grammar.xml#L191

  <production name="Expression9" rr:inline="true">
    <repeat>&SP; NOT &SP;</repeat>
    <non-terminal ref="Expression8"/>
  </production>

Let's investigate this with an example query (inspired by the Neo4j Cypher documentation).

This query should be parsed (it works with Neo4j), but it isn't:

MATCH (other:Person)
WHERE NOT other.age > 25
RETURN other.name
line 2:10 mismatched input 'other' expecting {<EOF>, WHITESPACE}

This query should also be parsed (it works with Neo4j), but it isn't:

MATCH (other:Person)
WHERE (NOT other.age > 25)
RETURN other.name
line 2:11 no viable alternative at input '(NOT other'
line 2:11 mismatched input 'other' expecting {<EOF>, WHITESPACE}

In contrast, this query can be parsed as it has a space character before the NOT operator.

MATCH (other:Person)
WHERE ( NOT other.age > 25)
RETURN other.name

As an alternative, this query can also be parsed: it has a space before the NOT operator but, more importantly, the NOT ... string is parsed with the functionInvocation rule (instead of the expression9 rule).

MATCH (other:Person)
WHERE NOT (other.age > 25)
RETURN other.name
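The production quoted above puts SP on both sides of NOT inside the repeat. A small Python sketch of just that fragment shows the consequence: if no whitespace is left in front of the keyword, presumably because a preceding rule has already consumed it, the repeat cannot match. The pattern is a simplified stand-in and not the generated parser: SP is assumed to be one or more whitespace characters, and a lowercase identifier stands in for the Expression8 operand.

```python
import re

# Assumptions: SP is one or more whitespace characters, and a plain
# lowercase identifier stands in for the Expression8 operand.
EXPRESSION9 = re.compile(r"(?:\s+NOT\s+)*[a-z_][a-z0-9_.]*")

print(EXPRESSION9.fullmatch(" NOT other.age") is not None)  # True
print(EXPRESSION9.fullmatch("NOT other.age") is not None)   # False
```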

Grammar problem: relationship variable in variable length relationships

The Neo4j Developer Manual v3.0 shows in chapter 8.1.4 the following example query:

MATCH (actor { name:'Charlie Sheen' })-[r:ACTED_IN*2]-(co_actor)
RETURN r

This query seems not to be covered by the current grammar, since we have:

RelationshipDetail = '[', [Variable], ['?'], [RelationshipTypes], ['*', [RangeLiteral]], [Properties], ']' ;
...
RangeLiteral = [UnsignedIntegerLiteral, WS], '..', [WS, UnsignedIntegerLiteral] ;
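One reading of why the query is not covered: RelationshipDetail itself allows the variable r, but RangeLiteral as quoted always requires '..', so the single bound in *2 cannot be derived. The Python sketch below transcribes RangeLiteral into a regular expression to check this; it assumes UnsignedIntegerLiteral is a run of decimal digits and WS is optional whitespace, and the name RANGE_LITERAL is made up for the example.

```python
import re

# Assumed transcription: UnsignedIntegerLiteral -> [0-9]+, WS -> \s*
RANGE_LITERAL = re.compile(r"(?:[0-9]+\s*)?\.\.(?:\s*[0-9]+)?")

for text in ("1..3", "..", "2"):
    print(text, RANGE_LITERAL.fullmatch(text) is not None)
```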

Support "acyclic" constraint on relationships

CIR-2017-172

Related to #173

I want to be able to model a DAG (Directed Acyclic Graph) in Cypher, so I need to ensure that creating a relationship doesn't introduce a loop.

I guess it could be done by remembering to code all the creates in a transaction, something like

# start transaction
match (a:T {key:'val1'}), (b:T {key:'val2'}) create (a)-[:R]->(b)
match p = (a:T {key:'val1'})-[:R*]-(a) return p limit 1
# abort transaction if there's now a loop

But that feels really clumsy and prone to developer errors. NB. see also neo4j/neo4j#8667

It would be much nicer to be able to say something like

create constraint on ()-[r:R]-() assert acyclic(r)

and have match (a:T {key:'val1'}), (b:T {key:'val2'}) create (a)-[:R]->(b) throw an error automatically if it would make a loop.
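To illustrate the check such a constraint would have to perform, here is a minimal Python sketch over an in-memory adjacency map. It is not how any Cypher implementation works; the names reaches and add_edge_acyclic are hypothetical. Adding a -> b is rejected whenever a is already reachable from b, since the new edge would then close a directed cycle.

```python
def reaches(graph, start, target):
    """Return True if target is reachable from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def add_edge_acyclic(graph, a, b):
    """Add a -> b, rejecting the edge if it would close a directed cycle."""
    if reaches(graph, b, a):
        raise ValueError(f"edge {a!r} -> {b!r} would create a cycle")
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set())


graph = {}
add_edge_acyclic(graph, "x", "y")
add_edge_acyclic(graph, "y", "z")
try:
    add_edge_acyclic(graph, "z", "x")
except ValueError as error:
    print(error)
```

Doing the reachability test before the write means a rejected edge never has to be rolled back.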

Cannot parse "ExponentDecimalReal" numbers

Using the generated ANTLR4 grammar, I cannot parse exponent decimal real numbers. Parsing a string of 1E2 with the exponentDecimalReal rule returns the following error:

line 1:0 mismatched input '1E2' expecting {'.', DecimalInteger, Digit}

I am not yet very familiar with ANTLR but I think the issue is either the ordering of the rules or the lexer/parser nature of the related rules. This leads us to the shouldBeLexerRule method and the Antlr4Massager classes.

In issue #56 @Mats-SX stated that it's not a goal to provide a good ANTLR parser, so I think a good approach would be to patch the Antlr4 and the Antlr4Massager classes. Do you agree?

Also, @thobe can you please elaborate on how to proceed with the // TODO: melt these snow flakes. task?

desc keyword not allowed

This query:

match (s:Store) where s.storeFormat ='Open' return distinct s.storeFormat order by s.storeFormat desc limit 5

parses fine in the console and BOLT, but fails in the parser with the following errors:

line 1:97 extraneous input 'desc' expecting {<EOF>, WHITESPACE}
line 1:102 extraneous input 'limit' expecting {<EOF>, WHITESPACE}
line 1:108 extraneous input '5' expecting {<EOF>, WHITESPACE}

How to generate Railroad diagrams/EBNF/ANTLR grammar

The README mentions that

A more readily consumable form of the grammar is generated as output from the build

How can I generate these outputs? I tried various Maven commands (e.g. mvn compile test), but none produced the ebnf/g4/html files. I also tried searching the code, but did not find the related code.

EBNF grammar problem with SP in Expression4

Instead of

Expression4 = { ('+' | '-'), WS }, Expression3 ;

should it be

Expression4 = { ('+' | '-'), SP }, Expression3 ;

since otherwise there is a problem with a possible construction of

--

with two successive unary expressions?

Make it easier to return maps of select properties

CIR-2017-178

It is a common requirement to return only a handful of properties from a node, in cases where the actual node has many more properties than that. Currently this requires quite repetitive syntax:

MATCH (user:User{username:$username})
RETURN {
    username: user.username,
    email: user.email,
    firstName: user.firstName,
    lastName: user.lastName
} AS user

It would be nice to have a shorter syntax to allow selecting the properties to return:

MATCH (user:User{username:$username})
RETURN user{.username, .email, .firstName, .lastName}
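The intended semantics are easy to state outside Cypher. The Python sketch below selects a handful of keys from a property map; the helper name project and the sample data are made up for illustration, and mapping missing keys to None is an assumption of the sketch, not something the request specifies.

```python
def project(properties, *keys):
    """Return a new dict holding only the requested keys.

    Missing keys map to None here; that choice is an assumption of this
    sketch, not something the proposal specifies.
    """
    return {key: properties.get(key) for key in keys}


# Made-up sample node properties.
user = {
    "username": "alice",
    "email": "alice@example.com",
    "firstName": "Alice",
    "lastName": "Smith",
    "passwordHash": "...",
}
print(project(user, "username", "email", "firstName", "lastName"))
```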

Execution plan

Hi There,

I have been working on the ruruki in-memory graph DB implementation of openCypher based on the EBNF (see cypher_parser on my fork or ruruki), which is working quite nicely at this stage. Now that I have parsing results, I am trying to figure out the execution plan. Is there any information I can read on how to go about implementing the execution plan for Cypher? I have looked at the EXPLAIN and PROFILE diagrams, which gave me a simple overview, but I need more detailed information about the rules, etc.

Your assistance is appreciated.

Jenda

[Grammar] Missing CALL clause

Neo4j 3.0 brings stored procedures, which add new pieces to the grammar.
It looks like the CALL clause is currently missing from the grammar in the openCypher project.

This is a BNF sample from the IntelliJ plugin grammar that I wrote to add CALL support:

Call ::= K_CALL ProcedureNamespace ProcedureName ProcedureArguments ProcedureResults?
ProcedureNamespace ::= (SymbolicNameString ".")*
ProcedureArguments ::= "(" Expression? ("," Expression)* ")"
ProcedureResults ::= K_YIELD ProcedureResult ("," ProcedureResult)*
ProcedureResult ::= AliasedProcedureResult | SimpleProcedureResult
AliasedProcedureResult ::= ProcedureOutput K_AS Variable
SimpleProcedureResult ::= Variable
ProcedureOutput ::= SymbolicNameString
