Comments (7)

maxbrunsfeld commented on June 13, 2024

@clojj You can definitely put (ERROR) in the tests. We usually don't, because the details of how errors are parsed (like which tokens get wrapped in an ERROR) are up to the Tree-sitter library itself and are subject to change as we further optimize and improve the error recovery algorithms.

from tree-sitter-haskell.

rewinfrey commented on June 13, 2024

@clojj, thanks for your continued work here. So cool to see tree-sitter-haskell in use in your language-haskell plugin! Thanks also to @maxbrunsfeld for the remote 🍐 on the external scanner logic in 01ee480. @clojj, your concern that detecting in could create conflicts with identifiers starting with in was a great catch, but this is something we have handled in other grammars. In this case, we ensured that in would not be confused with an identifier starting with in by using iswalpha to check that the character after n is not alphabetic.
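
The keyword check described above can be sketched roughly as follows. The real logic lives in the C++ scanner (src/scanner.cc) and is not reproduced here; this is only a JavaScript illustration of the idea, and the names isAlpha and isInKeyword are hypothetical. It mirrors only the alphabetic test mentioned above and assumes pos already points at the start of a token.

```javascript
// Rough analogue of C's iswalpha: true for any Unicode letter.
// (Assumption: a regex letter class is close enough for illustration.)
function isAlpha(ch) {
  return ch !== undefined && /\p{L}/u.test(ch);
}

// True when `source` has the keyword `in` at `pos`: the characters "in"
// followed by a non-letter or by the end of the input.
// Assumes `pos` is already at a token boundary.
function isInKeyword(source, pos) {
  if (!source.startsWith('in', pos)) return false;
  return !isAlpha(source[pos + 2]);
}

const example = 'f = let x = 1 in y';
const inPos = example.indexOf(' in ') + 1;
console.log(isInKeyword(example, inPos)); // `in` followed by a space
console.log(isInKeyword('index', 0));     // `in` followed by the letter d
```

With this check, in followed by whitespace is treated as the keyword, while an identifier such as index is left for the regular identifier rule.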

@clojj, I further simplified the let production in 0fe1f72 to reuse the _declarations rule, which preserves layout rules for declarations. 👍 for pointing out that the prec.right around _declarations was no longer necessary. 🙇

@clojj, I tried to use your solution to parse the test case added in 1b4b014, but received a parse error. I might not have configured the grammar exactly as you have locally, but it's a potential edge case to be aware of in your fork.

@clojj, the parse tree produced by the problematic example you've referenced:

f = let y = x
   z = 2
        x = 1
        in y

appears to be a little misleading. It correctly produces an error, but I don't think it identifies the intended error. Specifically, the (ERROR ...) subtree includes the final function declaration, which is not the location of the error. The final function declaration also has a MISSING node in its tree, which should not be the case.

I've run the same example on the updated grammar and it produces a tree that I think more closely matches what we'd expect:

(module [0, 0] - [4, 0]
  (function_declaration [0, 0] - [3, 12]
    (function_head [0, 0] - [0, 1]
      (function_identifier [0, 0] - [0, 1]
        (variable [0, 0] - [0, 1])))
    (function_body [0, 4] - [3, 12]
      (let [0, 4] - [3, 12]
        (ERROR [0, 8] - [2, 9]
          (function_declaration [0, 8] - [0, 13]
            (function_head [0, 8] - [0, 9]
              (function_identifier [0, 8] - [0, 9]
                (variable [0, 8] - [0, 9])))
            (function_body [0, 12] - [0, 13]
              (variable [0, 12] - [0, 13])))
          (function_declaration [1, 3] - [2, 9]
            (function_head [1, 3] - [1, 4]
              (function_identifier [1, 3] - [1, 4]
                (variable [1, 3] - [1, 4])))
            (function_body [1, 7] - [2, 9]
              (function_application [1, 7] - [2, 9]
                (integer [1, 7] - [1, 8])
                (variable [1, 8] - [2, 9])))))
        (function_declaration [2, 10] - [2, 13]
          (function_head [2, 10] - [2, 10]
            (function_identifier [2, 10] - [2, 10]
              (variable_symbol [2, 10] - [2, 10])))
          (function_body [2, 12] - [2, 13]
            (integer [2, 12] - [2, 13])))
        (in_clause [2, 13] - [3, 12]
          (variable [3, 11] - [3, 12]))))))

It's a bit hard to read and see the nesting, but the (ERROR ...) portion of the tree encloses the second function declaration (z = 2), which is the source of the error because, as you've rightly pointed out, it violates Haskell's layout rules. The final function declaration is still a child of the let subtree, but otherwise represents a well-formed function declaration. This is because tree-sitter attempts to produce its best guess for trees containing errors, and in this case it assumed the final function declaration is still part of the let statement because it is followed by an in_clause.
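
To make the layout violation concrete, here is a minimal sketch of the column comparison behind Haskell's layout rule, applied to the example above. It is not the grammar's scanner; the helper names indentOf and classifyLine are hypothetical, it assumes space-only indentation, and it ignores the special handling of the in keyword.

```javascript
// Number of leading whitespace characters on a line (spaces assumed).
function indentOf(line) {
  return line.length - line.trimStart().length;
}

// Compare a line's indentation with the column that opened the layout block:
// same column starts a new declaration, a deeper column continues the
// previous one, and a shallower column implicitly closes the block.
function classifyLine(line, layoutColumn) {
  const column = indentOf(line);
  if (column === layoutColumn) return 'new declaration';
  if (column > layoutColumn) return 'continuation';
  return 'closes block';
}

const lines = [
  'f = let y = x',
  '   z = 2',
  '        x = 1',
];

// The first token after `let` fixes the layout column (column 8 here).
const layoutColumn = lines[0].indexOf('let ') + 'let '.length;
const report = lines.slice(1).map(line => classifyLine(line, layoutColumn));
console.log(layoutColumn, report);
```

Here z = 2 sits at column 3, left of the layout column 8, so it closes the implicit block before any in has been seen. That is why this declaration is the natural place for the error.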

@clojj, if you have changes that you'd like to push upstream, PRs are always welcome. For now, I will close this issue, as the original example now correctly produces an error and let productions enforce layout rules correctly, as best as we can tell at this time.

rewinfrey commented on June 13, 2024

👋 @clojj, thanks for opening this issue. This is definitely a bug and the grammar is intended to respect Haskell's layout rules.

To correctly identify and enforce the layout rules, the grammar.js file defines externals. These are special symbols used within productions like _declarations, and they are detected in the scan method of src/scanner.cc. The scanner is invoked whenever a production rule in grammar.js depends on a symbol in the externals list, to help detect indentation or special sequences of characters that aren't expressible in a context-free grammar.
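
As a rough mental model of that interaction, the toy function below shows a scanner being told which external symbols the parser could accept and answering with at most one of them. This is not tree-sitter's API or the code in src/scanner.cc: the real scanner is C++ and works on a lexer object, and the token ordering, the scan signature, and the column arguments here are all assumptions made for illustration.

```javascript
// Hypothetical indices for the external tokens, in the order they would
// appear in the `externals` list.
const LAYOUT_SEMICOLON = 0;
const LAYOUT_OPEN_BRACE = 1;
const LAYOUT_CLOSE_BRACE = 2;

// validSymbols[i] is true when the parser could accept external token i in
// its current state. `column` is the column of the next token, and
// `layoutColumn` is the column of the enclosing layout block (null if none).
function scan(validSymbols, column, layoutColumn) {
  if (validSymbols[LAYOUT_OPEN_BRACE]) {
    return LAYOUT_OPEN_BRACE; // e.g. right after `let` without an explicit {
  }
  if (layoutColumn !== null) {
    if (validSymbols[LAYOUT_SEMICOLON] && column === layoutColumn) {
      return LAYOUT_SEMICOLON; // same column: next declaration
    }
    if (validSymbols[LAYOUT_CLOSE_BRACE] && column < layoutColumn) {
      return LAYOUT_CLOSE_BRACE; // shallower column: block ends
    }
  }
  return null; // nothing recognized; regular lexing takes over
}

console.log(scan([false, true, false], 8, null)); // open a block
console.log(scan([true, false, true], 8, 8));     // separate declarations
console.log(scan([true, false, true], 3, 8));     // close the block
```

The point of the model is that the scanner never invents tokens the parser cannot use: it only answers when one of the currently valid external symbols matches what it sees.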

The bug is in the let production rule. Specifically, the let rule should be defined as:

    let: $ => seq(
      'let',
      choice(
        seq(
          '{',
          repeat(seq($._declaration, $._terminal)),
          '}'
        ),
        seq(
          $._layout_open_brace,
          repeat1(seq($._declaration, choice($._terminal, $._layout_semicolon))),
          $._layout_close_brace
        )
      ),
      $.in_clause
    )

Edit: I forgot to mention that this update to let correctly detects an error in the example you provided, but then fails to correctly parse the example below.

However, the grammar is not smart enough to know when a function declaration stops and when the in_clause portion of the let production begins. I suspect to fix this, we'll want to add a new external symbol for detecting in, so we can correctly parse this example:

f = let x = 1 in y

I'm currently on a holiday weekend, but will look at fixing this on Tuesday. In the meantime, if you're curious and want to try fixing this, PRs are welcome! I do realize there is a lot of context here that isn't well documented, and many apologies for that. Thanks for taking a look and for your interest in this grammar!

cc/ @maxbrunsfeld

clojj commented on June 13, 2024

If 'in' were to represent this special termination symbol (for let clauses), wouldn't it create conflicts with identifiers (in function bodies) named 'in'?

I also tried this..


    let: $ => seq(
      'let',
      choice(
        seq(
          '{',
          repeat(seq($._declaration, $._terminal)),
          '}'
        ),
        seq(
          $._layout_open_brace,
          repeat1(seq($._declaration, choice($._terminal, $._layout_semicolon))),
          choice(
            seq($._declaration, $.in_clause),
            seq($._layout_close_brace, $.in_clause)
          )
        ),
        seq($._declaration, $.in_clause, $._layout_close_brace)
      )
    )

...but this gives me an error inside the type-declaration in this test:

f = let y = x
        x :: Int
        x = 1 in y

Further tests showed that the problem seems to occur at the type signature "x :: Int" and in the aliased _guard_let used in "do" parsing, so I don't know what is going on here.

See here
https://github.com/clojj/tree-sitter-haskell/blob/master/grammar.js

clojj commented on June 13, 2024

Another question: how can I explicitly test (assert) for ERRORs in 'corpus' tests?

clojj commented on June 13, 2024

this works, thx
(and I'll be aware of possible changes in error-handling upstream)

===================================================
Function Declarations With wrong indentation in Let
===================================================

f = let y = x
   z = 2
        x = 1
        in y

---

(module
  (function_declaration
    (function_head
      (function_identifier (variable)))
    (function_body
      (let
        (ERROR
          (function_declaration
            (function_head
              (function_identifier (variable)))
            (function_body (variable)))
          (function_declaration
            (function_head
              (function_identifier (variable)))
            (function_body
              (function_application (integer) (variable)))))
          (function_declaration
            (function_head (function_identifier (variable_symbol (MISSING))))
          (function_body (integer)))
      (in_clause (variable))))))

Unfortunately, the grammar change that makes this (error) test green breaks other tests (see comment above).

maxbrunsfeld commented on June 13, 2024

Is correct indentation a goal for this grammar?

@clojj Successfully parsing all correct Haskell code is the goal. It's not necessarily a goal to produce an error for all incorrect code (though we will of course return an error for many types of errors); if we parse some incorrect code without error, that's ok. Personally, I wouldn't recommend adding explicit tests that assert what errors we produce for the reasons I mentioned above.
