
Comments (5)

maxbrunsfeld commented on May 19, 2024

I’m pretty sure this code is not valid; you would have to wrap the struct literal in parentheses. Struct literals are explicitly disallowed in that position to avoid ambiguity with the block that follows.

https://doc.rust-lang.org/reference/expressions/struct-expr.html#struct-expressions

Struct expressions can't be used directly in the head of a loop or an if, if let or match expression. But struct expressions can still be used inside parentheses, for example.
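A minimal sketch of the rule quoted above, for illustration only. The `Point` type and both function names are hypothetical and do not come from the issue; the point is just that the struct literal is accepted in the head of an `if` or `if let` once it is wrapped in parentheses.

```rust
// Hypothetical type used only for this illustration.
#[derive(PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Struct literal in the head of a plain `if`.
// Writing `if *p == Point { x: 0, y: 0 } { ... }` is rejected by rustc,
// because the `{` after `Point` could also start the `if` body.
fn describe(p: &Point) -> &'static str {
    if *p == (Point { x: 0, y: 0 }) {
        "origin"
    } else {
        "elsewhere"
    }
}

// Struct literal as the scrutinee of an `if let`; the same restriction
// applies, so the literal is parenthesized here as well.
fn starts_on_y_axis() -> bool {
    if let Point { x: 0, .. } = (Point { x: 0, y: 5 }) {
        true
    } else {
        false
    }
}
```

The parentheses are required in both heads, so the compiler should not report them as redundant.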

from tree-sitter-rust.

maxbrunsfeld commented on May 19, 2024

I'm going to close this out as I think we're doing the right thing. Let me know if I'm missing something.


p-e-w commented on May 19, 2024

Yeah, that's not actually what I meant. Obviously, this code is not valid Rust per the Rust language specification.

However, the code appears to be a string that should be recognized by this grammar as

(source_file (if_let_expression (remaining_field_pattern) (struct_expression (type_identifier) (field_initializer_list)) (block)))

because it was generated from that rule tree. Instead, the grammar tries a different approach (where A is an identifier), which results in a syntax error because of the comma: a comma is invalid by itself in a block, but not inside a field_initializer_list.

Note that if_let_expression permits any _expression after the =, which in turn permits a struct_expression, matching the AST above.

So unless I have misunderstood the rule mechanics somehow, at minimum there appears to be an ambiguity in the tree-sitter-rust grammar here, and possibly even a problem in tree-sitter itself if the runtime ambiguity resolution prefers an AST with a syntax error over an alternative AST without one.


maxbrunsfeld commented on May 19, 2024

This is intentional. There isn't exactly an ambiguity in the grammar, but there is what you might call a local ambiguity, also referred to as an LR(1) conflict. To explain the conflict, I'll show you the error message that Tree-sitter would give if we hadn't already specified how to resolve the conflict:

Error: Unresolved conflict for symbol sequence:

  'if'  identifier  •  '{'  …

Possible interpretations:

  1:  'if'  (_expression  identifier)  •  '{'  …
  2:  'if'  (struct_expression  identifier  •  field_initializer_list)

Possible resolutions:

  1:  Specify a higher precedence in `struct_expression` than in the other rules.
  2:  Specify a higher precedence in `_expression` than in the other rules.
  3:  Specify a left or right associativity in `_expression`
  4:  Add a conflict for these rules: `_expression` `struct_expression`

Normally, Tree-sitter uses the LR(1) parsing algorithm. At a high level, this means that it processes the input from left to right, and before it can advance past a given token, it must fully decide how to interpret the preceding tokens. In this case, before it can advance past the {, it must decide how to interpret the preceding identifier. Is it an expression (evaluating a variable) or is it a bare identifier that will become part of a struct_expression?

Currently, we have resolved the conflict using option 3: we have specified a left associativity in the _expression rule which instructs the parser generator, in the event of a conflict at compile time, to prefer building up subtrees to the left of the current position (i.e. build the _expression tree now rather than waiting to build the struct_expression tree later).

If the Rust language were specified differently, and struct literals were allowed in this position, this would not work. We would need to do option 4: add a whitelisted conflict between _expression and struct_expression. If we did that, then when parsing this code, Tree-sitter would split the parse stack in order to resolve the conflict at runtime. This would have a slight performance cost.
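To make the difference between the two resolutions concrete, here is a toy model in Rust. It is only a sketch under loud assumptions: the token set, the miniature "block" and "field list" grammars, and every function name are invented for illustration, and none of this is Tree-sitter's actual implementation. `parse_static` commits to reading the identifier as a variable, as the associativity-based resolution does; `parse_forking` tries both interpretations and keeps one that succeeds, loosely imitating what a declared conflict would allow at runtime.

```rust
// Toy token set (hypothetical, not Tree-sitter's).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { If, Ident, LBrace, RBrace, Colon, Comma }

// How the identifier after `if` was interpreted.
#[derive(Debug, PartialEq)]
enum Head { Variable, StructLiteral }

// Consume one expected token and return the remaining input.
fn eat(t: &[Tok], want: Tok) -> Option<&[Tok]> {
    match t.split_first() {
        Some((first, rest)) if *first == want => Some(rest),
        _ => None,
    }
}

// Toy block: `{`, zero or more identifiers, `}`.
fn block(t: &[Tok]) -> Option<&[Tok]> {
    let mut rest = eat(t, Tok::LBrace)?;
    while let Some(r) = eat(rest, Tok::Ident) {
        rest = r;
    }
    eat(rest, Tok::RBrace)
}

// Toy field list: `{`, zero or more `ident : ident` with optional `,`, `}`.
fn field_list(t: &[Tok]) -> Option<&[Tok]> {
    let mut rest = eat(t, Tok::LBrace)?;
    while let Some(r) = eat(rest, Tok::Ident) {
        rest = eat(eat(r, Tok::Colon)?, Tok::Ident)?;
        if let Some(r) = eat(rest, Tok::Comma) {
            rest = r;
        }
    }
    eat(rest, Tok::RBrace)
}

// Interpretation 1: `if` variable block.
fn as_variable(t: &[Tok]) -> Option<Head> {
    let rest = block(eat(eat(t, Tok::If)?, Tok::Ident)?)?;
    if rest.is_empty() { Some(Head::Variable) } else { None }
}

// Interpretation 2: `if` struct-literal block.
fn as_struct_literal(t: &[Tok]) -> Option<Head> {
    let rest = block(field_list(eat(eat(t, Tok::If)?, Tok::Ident)?)?)?;
    if rest.is_empty() { Some(Head::StructLiteral) } else { None }
}

// Static resolution: the choice is fixed ahead of time, so a failure
// later in the input is simply reported as an error (None here).
fn parse_static(t: &[Tok]) -> Option<Head> {
    as_variable(t)
}

// Runtime resolution: explore both interpretations and keep a survivor.
fn parse_forking(t: &[Tok]) -> Option<Head> {
    as_variable(t).or_else(|| as_struct_literal(t))
}
```

In this toy model, the token sequence for `if A { x: y, } { }` is rejected by `parse_static` at the colon, while `parse_forking` accepts it as a struct literal followed by a block. For `if a { }` both functions agree on the variable reading. The forking version does strictly more work on the conflicting input, which is the kind of cost the comment above refers to.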


p-e-w commented on May 19, 2024

Thank you for explaining this in such detail, it makes much more sense now.

I had noticed the precedence annotations before, but I never considered that they would be honored even when the resulting parse tree contains a syntax error and ignoring them would yield an error-free parse tree.

