Comments (11)
One likely reason for the behavior difference is that tree-sitter’s get_column function currently returns a byte count, not a character count. The JavaScript bindings encode strings as UTF-16, so most characters take 2 bytes instead of the 1 byte they take in UTF-8.
The get_column method is still a bit experimental, and we’ve talked about changing it back to returning a character count. In the meantime, scanners should make sure to not make assumptions about the absolute values that method returns, but instead just compare the values to each other.
Not sure if that’s the problem here, but I wanted to raise it, since it’s a gotcha currently, and this is one of the few grammars affected.
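To make the byte-versus-character point concrete, here is a rough Python sketch. It is not tree-sitter code: column_widths is a hypothetical helper that just measures the text before a token in three ways (code points, UTF-8 bytes, UTF-16 bytes), on the assumption that a byte-based column behaves like the byte length of that prefix.

```python
def column_widths(prefix: str) -> dict:
    """Measure the text preceding a token under three counting schemes.

    Hypothetical helper for illustration only; it does not call tree-sitter.
    """
    return {
        "codepoints": len(prefix),                       # character count
        "utf8_bytes": len(prefix.encode("utf-8")),       # native-style byte count
        "utf16_bytes": len(prefix.encode("utf-16-le")),  # JS-binding-style byte count
    }


outer = column_widths("  ")    # token indented by two spaces
inner = column_widths("    ")  # token indented by four spaces

for scheme in ("codepoints", "utf8_bytes", "utf16_bytes"):
    # Absolute values differ between schemes, but the ordering of the two
    # indentation levels is the same in each one.
    print(scheme, outer[scheme], inner[scheme], inner[scheme] > outer[scheme])
```

For plain ASCII indentation the absolute numbers differ by a factor of two between UTF-8 and UTF-16, while comparing one column against another gives the same answer in every scheme. A scanner that hard-codes an absolute column would see different values under the two encodings.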
from tree-sitter-haskell.
It's about time I get off my butt and write good tests for my codepoint-based get_column tree-sitter fork so it can be merged upstream.
I can confirm that this is mostly fixed using the latest version; the only file that doesn't parse using the WebAssembly bindings is <examples/haskell-language-server/test/testdata/format/BrittanyCRLF.formatted_range.hs>.
That's interesting. Can you try running a -DDEBUG build and see what the first few tokens the scanner produces are?
Not quite sure how to run a -DDEBUG build? You should be able to run these tests yourself, though. Lemme dig up the commands.
Check out #68 and run npm run examples-wasm. You'll need Node and Emscripten.
I'll give it a go.
I've documented -DDEBUG in enable scanner debug output here
Parsing examples/postgrest/test/Main.hs passes if you delete either both let statements or the spec $ do ... block.
I've created a list of the (smallest to largest) files that fail to parse under wasm:
{ for i in examples/**/*.hs; do out="$(2>&1 ./script/tree-sitter-parse.js $i)"; if echo "$out" | grep 'Parse error' > /dev/null; then wc -l "$i"; fi; done } | sort -n | tee list
output: here
The first file passes when you delete the -> in the GADT, or the type signature of test, or any two comment lines... I'll have a proper look this evening.
If you can spot anything that those files have in common, and others don't, let me know
Thanks @maxbrunsfeld, that sounds very plausible.
I wonder whether those files will break with the native parser if I double the indentation on them...
Edit: they didn't.
Ok great! If anyone gets a chance, it'd be super useful to extract from that file a minimal example of how the WASM and native parsers behave differently. At this point, if they still behave differently, that seems most likely to be a bug in Tree-sitter.