Comments (3)
Hi Piotr,
Thank you for your question and for engaging with the openCypher artifacts.
Unfortunately, I cannot really get a handle on your question. Could be please be more precise? What is precisely the mess? In which respect you find things messy? What is the picture suppose to show? Which parser generator framework are you using Antlr, I guess? Which query are you trying to parse?
Can you try to reduce the problem to a very small instance (very short and simple query), where you start to get "the mess".
Sorry for all the questions, but your post does not provide me enough context to do anything helpful with it.
Cheers
-Hannes
from opencypher.
Hi Hannes,
The exact cypher query is:
MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
RETURN n
I am starting to get "the mess" in the Expression Evaluation.
I am using the newest Antlr version to generate my parser/lexer.
This is the correct Raiload Diagram from the OpenCypher Website (https://s3.amazonaws.com/artifacts.opencypher.org/M18/railroad/Expression.html)
As you can see AND,XOR,OR are "sibling-sibling" relationship
However, it becomes "parent-child" relationship and is also incomplete (it stops where it is supposed to have children - see the intelisense):
In the generated parser and lexer the type dependency diagram for the expression context looks like this:
Which means that there is a parent-child relationship.
As an experiment I wrote a recursive method to manually walk the parse tree and write out every lexeme (child context) that it finds. Indents indicate Child-Parent Relationships in a context.
--- Start Parse Tree ---
Type: OC_StatementContext, Value:
MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
RETURN n
Type: OC_QueryContext, Value:
MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
RETURN n
Type: OC_RegularQueryContext, Value:
MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
RETURN n
Type: OC_SingleQueryContext, Value:
MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
RETURN n
Type: OC_SinglePartQueryContext, Value:
MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
RETURN n
Type: OC_ReadingClauseContext, Value: MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
Type: OC_MatchContext, Value: MATCH(n:Person)
WHERE n.name='Adam' AND n.Age = 18
Type: TerminalNodeImpl, Value: MATCH
Type: OC_PatternContext, Value: (n:Person)
Type: OC_PatternPartContext, Value: (n:Person)
Type: OC_AnonymousPatternPartContext, Value: (n:Person)
Type: OC_PatternElementContext, Value: (n:Person)
Type: OC_NodePatternContext, Value: (n:Person)
Type: TerminalNodeImpl, Value: (
Type: OC_VariableContext, Value: n
Type: OC_SymbolicNameContext, Value: n
Type: TerminalNodeImpl, Value: n
Type: OC_NodeLabelsContext, Value: :Person
Type: OC_NodeLabelContext, Value: :Person
Type: TerminalNodeImpl, Value: :
Type: OC_LabelNameContext, Value: Person
Type: OC_SchemaNameContext, Value: Person
Type: OC_SymbolicNameContext, Value: Person
Type: TerminalNodeImpl, Value: Person
Type: TerminalNodeImpl, Value: )
Type: TerminalNodeImpl, Value:
Type: OC_WhereContext, Value: WHERE n.name='Adam' AND n.Age = 18
Type: TerminalNodeImpl, Value: WHERE
Type: TerminalNodeImpl, Value:
Type: OC_ExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_OrExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_XorExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_AndExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_NotExpressionContext, Value: n.name='Adam'
Type: OC_ComparisonExpressionContext, Value: n.name='Adam'
Type: OC_AddOrSubtractExpressionContext, Value: n.name
Type: OC_MultiplyDivideModuloExpressionContext, Value: n.name
Type: OC_PowerOfExpressionContext, Value: n.name
Type: OC_UnaryAddOrSubtractExpressionContext, Value: n.name
Type: OC_StringListNullOperatorExpressionContext, Value: n.name
Type: OC_PropertyOrLabelsExpressionContext, Value: n.name
Type: OC_AtomContext, Value: n
Type: OC_VariableContext, Value: n
Type: OC_SymbolicNameContext, Value: n
Type: TerminalNodeImpl, Value: n
Type: OC_PropertyLookupContext, Value: .name
Type: TerminalNodeImpl, Value: .
Type: OC_PropertyKeyNameContext, Value: name
Type: OC_SchemaNameContext, Value: name
Type: OC_SymbolicNameContext, Value: name
Type: TerminalNodeImpl, Value: name
Type: OC_PartialComparisonExpressionContext, Value: ='Adam'
Type: TerminalNodeImpl, Value: =
Type: OC_AddOrSubtractExpressionContext, Value: 'Adam'
Type: OC_MultiplyDivideModuloExpressionContext, Value: 'Adam'
Type: OC_PowerOfExpressionContext, Value: 'Adam'
Type: OC_UnaryAddOrSubtractExpressionContext, Value: 'Adam'
Type: OC_StringListNullOperatorExpressionContext, Value: 'Adam'
Type: OC_PropertyOrLabelsExpressionContext, Value: 'Adam'
Type: OC_AtomContext, Value: 'Adam'
Type: OC_LiteralContext, Value: 'Adam'
Type: TerminalNodeImpl, Value: 'Adam'
Type: TerminalNodeImpl, Value:
Type: TerminalNodeImpl, Value: AND
Type: TerminalNodeImpl, Value:
Type: OC_NotExpressionContext, Value: n.Age = 18
Type: OC_ComparisonExpressionContext, Value: n.Age = 18
Type: OC_AddOrSubtractExpressionContext, Value: n.Age
Type: OC_MultiplyDivideModuloExpressionContext, Value: n.Age
Type: OC_PowerOfExpressionContext, Value: n.Age
Type: OC_UnaryAddOrSubtractExpressionContext, Value: n.Age
Type: OC_StringListNullOperatorExpressionContext, Value: n.Age
Type: OC_PropertyOrLabelsExpressionContext, Value: n.Age
Type: OC_AtomContext, Value: n
Type: OC_VariableContext, Value: n
Type: OC_SymbolicNameContext, Value: n
Type: TerminalNodeImpl, Value: n
Type: OC_PropertyLookupContext, Value: .Age
Type: TerminalNodeImpl, Value: .
Type: OC_PropertyKeyNameContext, Value: Age
Type: OC_SchemaNameContext, Value: Age
Type: OC_SymbolicNameContext, Value: Age
Type: TerminalNodeImpl, Value: Age
Type: TerminalNodeImpl, Value:
Type: OC_PartialComparisonExpressionContext, Value: = 18
Type: TerminalNodeImpl, Value: =
Type: TerminalNodeImpl, Value:
Type: OC_AddOrSubtractExpressionContext, Value: 18
Type: OC_MultiplyDivideModuloExpressionContext, Value: 18
Type: OC_PowerOfExpressionContext, Value: 18
Type: OC_UnaryAddOrSubtractExpressionContext, Value: 18
Type: OC_StringListNullOperatorExpressionContext, Value: 18
Type: OC_PropertyOrLabelsExpressionContext, Value: 18
Type: OC_AtomContext, Value: 18
Type: OC_LiteralContext, Value: 18
Type: OC_NumberLiteralContext, Value: 18
Type: OC_IntegerLiteralContext, Value: 18
Type: TerminalNodeImpl, Value: 18
Type: TerminalNodeImpl, Value:
Type: OC_ReturnContext, Value: RETURN n
Type: TerminalNodeImpl, Value: RETURN
Type: OC_ProjectionBodyContext, Value: n
Type: TerminalNodeImpl, Value:
Type: OC_ProjectionItemsContext, Value: n
Type: OC_ProjectionItemContext, Value: n
Type: OC_ExpressionContext, Value: n
Type: OC_OrExpressionContext, Value: n
Type: OC_XorExpressionContext, Value: n
Type: OC_AndExpressionContext, Value: n
Type: OC_NotExpressionContext, Value: n
Type: OC_ComparisonExpressionContext, Value: n
Type: OC_AddOrSubtractExpressionContext, Value: n
Type: OC_MultiplyDivideModuloExpressionContext, Value: n
Type: OC_PowerOfExpressionContext, Value: n
Type: OC_UnaryAddOrSubtractExpressionContext, Value: n
Type: OC_StringListNullOperatorExpressionContext, Value: n
Type: OC_PropertyOrLabelsExpressionContext, Value: n
Type: OC_AtomContext, Value: n
Type: OC_VariableContext, Value: n
Type: OC_SymbolicNameContext, Value: n
Type: TerminalNodeImpl, Value: n
--- End Parse Tree ---
For convenience, I attached a graphical representation of all elements of the parse tree diagram
As you can see, current interpretation of a parse tree is incomplete and incorrect. Especially this part seems to defy all logic, how to determine if this is any of these expression? The only way I see this parsed is to write Template-like structure to see at the children of every context and somehow take into account every edge case
Type: OC_ExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_OrExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_XorExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_AndExpressionContext, Value: n.name='Adam' AND n.Age = 18
Type: OC_NotExpressionContext, Value: n.name='Adam'
Type: OC_ComparisonExpressionContext, Value: n.name='Adam'
Type: OC_AddOrSubtractExpressionContext, Value: n.name
Type: OC_MultiplyDivideModuloExpressionContext, Value: n.name
Type: OC_PowerOfExpressionContext, Value: n.name
Type: OC_UnaryAddOrSubtractExpressionContext, Value: n.name
Type: OC_StringListNullOperatorExpressionContext, Value: n.name
Type: OC_PropertyOrLabelsExpressionContext, Value: n.name
Here's a sourcecode of ExpressionAnalyzer - It analyzes the expression : (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Cypher/Analyzers/ExpressionAnalyzer.cs)
Here's a method that I used to generate text parse tree: (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Cypher/Utility/ParseTreeHelper.cs)
Here's a code that I used to generate the Graphical Representation of a parse tree: (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Explorer/Utility/DebugParseTreeVisual.xaml.cs)
Here's a working implementation of PatternPart that is working correctly (notice the similar way that I try to parse the oC_PatternElement: (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Cypher/Analyzers/PatternPartAnalyzer.cs)
I hope that the details above are sufficient.
Is this supposed to look like this or there is an error in grammar?
Kind Regards,
Piotr Mikstacki
from opencypher.
Hi Piotr,
thanks for your detailed reply.
I understand you are confused why a parser gives you for n.name='Adam' AND n.age=18
an OrExpression
containing a XorExpression
containing an AndExpression
containing a NotExpression
and so on all the way through the whole stack of expressions – while at the same time the railroad diagrams show them all neatly as alternatives of one single thing.
The answer to this is that two-fold.
Firstly, the railroad diagrams are not the ground truth of the grammar, but merely a more developer-manual-like representation of the grammar that gets generator out of the grammar xml files. The g4 antlr grammar also gets generated from the same xml files. These xml files are the ground truth of the grammar.
For (subjectively) better readability, the railroad diagrams inline certain grammar productions into a single picture. This is again encoded in the grammar xml files. Productions that have the attribute rr:inline="true"
get inlined. You will find this attribute on almost all expression productions. This is why they all end up in a single railroad diagram, although they are in fact a large number of individual grammar productions.
Secondly, let's look at the "true" grammar as encoded in the xml files. The production Expression
is the top of the stack of productions that define the syntax of expressions, followed by the productions for the logical expressions (OR, XOR, AND, etc). For these you find the following. in the xml files:
<production name="Expression">
<non-terminal ref="OrExpression"/>
</production>
<production name="OrExpression" rr:inline="true">
<non-terminal ref="XorExpression"/>
<repeat>&SP; OR &SP;<non-terminal ref="XorExpression"/></repeat>
</production>
<production name="XorExpression" rr:inline="true">
<non-terminal ref="AndExpression"/>
<repeat>&SP; XOR &SP;<non-terminal ref="AndExpression"/></repeat>
</production>
<production name="AndExpression" rr:inline="true">
<non-terminal ref="NotExpression"/>
<repeat>&SP; AND &SP;<non-terminal ref="NotExpression"/></repeat>
</production>
<production name="NotExpression" rr:inline="true">
<repeat>NOT &WS;</repeat>
<non-terminal ref="ComparisonExpression"/>
</production>
They all follow the same pattern. Every production allows the syntax to repeat the same operation zero or more time, while the respective operands are expression of operand that is next in order of increasing precedence. This allows to nest operations of higher precedence into operation of lower precedence, so that you can write a XOR b OR c XOR d
, where a XOR b
and c XOR d
are operands of _ OR _
.
Note also that each production allows zero repetitions of the operation it encodes, where zero repetition means the operation does not appear at all. The effect is that an expression can "drop through" the operations it does not need. Or more specifically, it allows that an _ OR _
does not require its operand to be _ XOR _
s but allows any expression of high precedence, so that you can also write e.g. NOT b OR c AND d
, where NOT b
and c AND d
are operands of _ OR _
.
This pattern continues through the whole stack of expressions and ends in the production called Atom
. Atoms are expressions with the highest precedence such as a literal or an expression enclosed in parentheses. The latter loops back to the top of the expression stack, which allows nesting operations of lower precedence under operations of high precedence, so that you can write a AND (b OR c)
, where a
and b OR c
are the operands of _ AND _
, something the production listed above do not allow by themselves.
This recursive pattern of "drop through" production rules is the usual way expression grammar looks like if they encode the precedence. Some grammars have recursion in each production itself (instead of repetition), to encode operator associativity in the grammar. The oC grammar does not do that. Some grammars have multiple such stacks of production with the grammar also encode value type rules (e.g. you can not AND
´d integers). The oC grammar does not encode value type rules so it has just a single stack of expression productions including all operations.
When consuming the parsing result, the effect of this grammar design is that you see also expressions in the parse tree. because the parse put a node in the parse tree for each production it uses. This is totally normal and expected.
Many implementations have some kind of normalization and simplification step that turn the parse tree into an intermediate query representation (IR) which is usually also a tree and is feed into the query processor. This IR would usually have all "drop through" productions be removed, because they are a pure parsing artifacts. It is easy to detect, for instance, if an OrExpression
only contains a single child (which is an XorExpression
) that where is no actual OR
in the query text and the IR does not need a node for the OR
operations. Note this kind of removal make sense in the expression context but is not a generally correct step. In other contexts, even the parse tree node with only a single child may convey valuable information. However, when designing an IR you are free to chose how to represent such information in your IR. Parse generator frameworks such as Antlr cannot know these things, since they only see the grammar, i.e. a syntax specification and do not know the semantics of the specified language.
Hope that helps.
Best
-H
from opencypher.
Related Issues (20)
- Deleting the relationship for the dynamic labels in UNWIND HOT 1
- Issue in UNWIND option for duplicate records HOT 2
- Performance is very poor for the create/update relationships HOT 1
- Any plan to release Cypher 10? HOT 3
- Beta version missing dependency ?
- Problem parsing oC_MultiPartQuery with ANTLR4 HOT 3
- Improve the use of lists stored as property HOT 3
- missing children from `OC_ComparisonExpressionContext` in cpp-runtime of ANTLR HOT 2
- Incomplete EBNF Grammar HOT 9
- NULL token name is a reserved word in C++ HOT 2
- MANDATORY MATCH HOT 3
- Invalid syntax HOT 3
- Commit the .ebnf output into the repo HOT 1
- User defined function in custom namespace parsed as property lookup if function name is a keyword HOT 3
- Validating ``WHERE (value in property)``
- TCK only published with features
- RedisGraph: EOL
- Question on Semantics of WITH clause HOT 1
- Grammar HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from opencypher.