Comments (11)

maxbrunsfeld commented on June 14, 2024

One likely reason for the behavior difference is that tree-sitter’s get_column function currently returns a byte count, not a character count. The JavaScript bindings encode strings as UTF-16, so most characters take 2 bytes instead of the 1 they take in UTF-8.

The get_column method is still a bit experimental, and we’ve talked about changing it back to returning a character count. In the meantime, scanners should make sure to not make assumptions about the absolute values that method returns, but instead just compare the values to each other.

Not sure if that’s the problem here, but I wanted to raise it, since it’s a gotcha currently, and this is one of the few grammars affected.
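To make the byte-versus-character point concrete, here is a small Python sketch. It does not call tree-sitter: `byte_column` is a hypothetical helper that simply counts the bytes preceding a character under a given encoding, on the assumption that this is what a byte-based column amounts to. It illustrates why comparing columns to each other is safer than relying on absolute values.

```python
def byte_column(line: str, char_index: int, encoding: str) -> int:
    """Count the bytes that precede line[char_index] in the given encoding."""
    return len(line[:char_index].encode(encoding))


# "utf-16-le" is used so that no byte-order mark is added to the count.
shallow = "  let x = 1"   # first token at character column 2
deep = "    in x"         # first token at character column 4

shallow_utf8 = byte_column(shallow, 2, "utf-8")
shallow_utf16 = byte_column(shallow, 2, "utf-16-le")
deep_utf8 = byte_column(deep, 4, "utf-8")
deep_utf16 = byte_column(deep, 4, "utf-16-le")

# Comparing two columns to each other gives the same answer in both encodings.
relative_agrees = (shallow_utf8 < deep_utf8) == (shallow_utf16 < deep_utf16)

# Comparing a column to a fixed number does not.
absolute_agrees = (deep_utf8 == 4) == (deep_utf16 == 4)

# With non-ASCII text the two byte counts are no longer a simple factor of 2 apart.
arrow = "a → b"           # "b" is at character column 4
arrow_utf8 = byte_column(arrow, 4, "utf-8")
arrow_utf16 = byte_column(arrow, 4, "utf-16-le")

print(relative_agrees, absolute_agrees, arrow_utf8, arrow_utf16)
```

For ASCII-only indentation the UTF-16 value is exactly twice the UTF-8 value, so ordering is preserved; the last two values show that this proportionality does not hold once multi-byte characters appear before the token.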

from tree-sitter-haskell.

ahelwer commented on June 14, 2024

It's about time I get off my butt and write good tests for my codepoint-based get_column tree-sitter fork so it can be merged upstream.

wenkokke commented on June 14, 2024

I can confirm that this is mostly fixed using the latest version; the only file that doesn't parse using the WebAssembly bindings is <examples/haskell-language-server/test/testdata/format/BrittanyCRLF.formatted_range.hs>.

414owen commented on June 14, 2024

That's interesting. Could you try running a -DDEBUG build and see what the first few tokens the scanner produces are?

wenkokke commented on June 14, 2024

Not quite sure how to run a -DDEBUG build? You should be able to run these tests yourself, though. Lemme dig up the commands.

wenkokke commented on June 14, 2024

Check out #68 and run npm run examples-wasm. You'll need Node and Emscripten.

414owen commented on June 14, 2024

I'll give it a go.

I've documented -DDEBUG under "enable scanner debug output" here.

414owen commented on June 14, 2024

Parsing examples/postgrest/test/Main.hs passes if you delete either both let statements or the spec $ do ... block.

414owen commented on June 14, 2024

I've created a list of the (smallest to largest) files that fail to parse under wasm:

{ for i in examples/**/*.hs; do out="$(./script/tree-sitter-parse.js "$i" 2>&1)"; if echo "$out" | grep -q 'Parse error'; then wc -l "$i"; fi; done; } | sort -n | tee list

output: here

The first file passes when you delete the -> in the GADT, or the type signature of test, or any two comment lines... I'll have a proper look this evening.

If you can spot anything that those files have in common and the others don't, let me know.

414owen commented on June 14, 2024

Thanks @maxbrunsfeld, that sounds very plausible.

I wonder whether those files will also break with the native parser if I double the indentation...

Edit: they didn't.
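For anyone who wants to repeat that experiment, a minimal sketch of the transformation might look like the following. `double_indentation` is a hypothetical helper, not something from the repository, and it only touches leading spaces (tabs and anything after the first non-space character are left alone).

```python
def double_indentation(source: str) -> str:
    """Return source with the run of leading spaces on every line doubled."""
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        out.append(" " * (2 * indent) + stripped)
    return "".join(out)


example = "main = do\n  let x = 1\n    in print x\n"
print(double_indentation(example), end="")
```

Note that this only doubles the column of the first token on each line. Tokens later in a line are shifted by the extra indentation but not doubled, so it is an approximation of what a UTF-16 byte count would report, not an exact reproduction.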

maxbrunsfeld commented on June 14, 2024

Ok great! If anyone gets a chance, it'd be super useful to extract from that file a minimal example of how the WASM and native parsers behave differently. At this point, if they still behave differently, that seems most likely to be a bug in Tree-sitter.
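One way to approach that extraction is to automate the delete-and-retry process described earlier in the thread. The sketch below is a generic greedy line minimizer, under the assumption that you can supply a predicate reporting whether the two parsers still disagree on a candidate input. `minimize_lines` and `still_differs` are hypothetical names; the toy predicate here stands in for actually running the native and WASM parsers, which this sketch does not do.

```python
from typing import Callable, List


def minimize_lines(
    lines: List[str], still_differs: Callable[[List[str]], bool]
) -> List[str]:
    """Greedily drop single lines for as long as still_differs keeps returning True."""
    if not still_differs(lines):
        raise ValueError("the starting input does not reproduce the difference")
    current = list(lines)
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + 1:]
            if still_differs(candidate):
                current = candidate  # line i was not needed; retry the same index
                changed = True
            else:
                i += 1  # line i is needed; keep it and move on
    return current


# Toy stand-in predicate: the "difference" needs both a let line and a do line.
def toy_predicate(candidate: List[str]) -> bool:
    has_let = any("let" in line for line in candidate)
    has_do = any("do" in line for line in candidate)
    return has_let and has_do


sample = ["a", "let x", "b", "spec $ do", "c"]
print(minimize_lines(sample, toy_predicate))
```

Removing one line at a time is slow on large files and can get stuck when lines are only removable in groups, but it is simple and may be enough to shrink a failing file to something readable.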
