tree-sitter / tree-sitter-ruby

Ruby grammar for tree-sitter

License: MIT License

JavaScript 44.21% Ruby 3.80% Scheme 5.45% C 46.53%
tree-sitter ruby parser

tree-sitter-ruby's People

Contributors

ahelwer, aibaars, amaanq, aryx, daviwil, dcreager, drwpow, hendrikvanantwerpen, hvitved, joshvera, lunks, maletor, mattmassicotte, maxbrunsfeld, mjambon, nbrahms, nickrolfe, npezza93, patrickt, philipturnbull, rebornix, rewinfrey, robrix, tclem, wingrunr21

tree-sitter-ruby's Issues

Modules aren't foldable using tree sitter, even with correct indentation

Steps to reproduce

Create a new ruby file and put the following text in it

module A
  module B
    class C
      def D
        puts 'test'
      end
    end
  end
end

Try to fold using either the fold handles or the fold-at-indent-level-* functions.

Expected Result

You can fold ruby code at modules the same as at classes and methods

Actual Result

Modules don't fold (at all)

Static elements of word and symbol lists are ignored

Static elements of word and symbol lists are ignored:

$ node_modules/.bin/tree-sitter parse <(echo '%w(foo bar)')
(program [0, 0] - [1, 0]
  (array [0, 0] - [0, 11]))
$ node_modules/.bin/tree-sitter parse <(echo '%W(foo bar)')
(program [0, 0] - [1, 0]
  (array [0, 0] - [0, 11]))
$ node_modules/.bin/tree-sitter parse <(echo '%i(abc def)')
(program [0, 0] - [1, 0]
  (array [0, 0] - [0, 11]))
$ node_modules/.bin/tree-sitter parse <(echo '%I(abc def)')
(program [0, 0] - [1, 0]
  (array [0, 0] - [0, 11]))

However in literal types allowing interpolation (%W and %I), interpolated elements behave as expected:

$ node_modules/.bin/tree-sitter parse <(echo '%W(foo #{123} bar #{"456"})')
(program [0, 0] - [1, 0]
  (array [0, 0] - [0, 27]
    (integer [0, 9] - [0, 12])
    (string [0, 20] - [0, 25])))
$ node_modules/.bin/tree-sitter parse <(echo '%I(foo #{123} bar #{"456"})')
(program [0, 0] - [1, 0]
  (array [0, 0] - [0, 27]
    (integer [0, 9] - [0, 12])
    (string [0, 20] - [0, 25])))

It also appears that the AST does not distinguish between regular word lists and symbol lists.
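
For reference, a short Ruby sketch of what these literals evaluate to at runtime, which is the distinction the tree would ideally reflect. The variable names are illustrative only.

```ruby
# %w and %W build arrays of strings; %i and %I build arrays of symbols.
words   = %w(foo bar)
symbols = %i(abc def)

# Only the uppercase forms interpolate.
interpolated_words   = %W(foo #{123} bar #{"456"})
interpolated_symbols = %I(foo #{123} bar #{"456"})

# The lowercase form keeps "#{...}" as literal text.
literal_words = %w(foo #{123})

p words, symbols, interpolated_words, interpolated_symbols, literal_words
```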

Parse heredoc strings & subshells

#4 implemented all of the other literals, but not heredoc strings/subshells, because:

  1. There’s no way for us to use a matched token as the end delimiter. We would need something essentially like monadic bind to make this work.

  2. Heredoc elements apply to the next line. I don’t know a way to make that work correctly w.r.t. the remainder of the current line:

    print (<<END + ", world")
    hello
    END
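
To make the second point concrete, here is a small runnable variant of the snippet above (the `greeting` variable is added only for illustration). The heredoc body begins on the line after the one containing `<<END`, while the rest of that first line is still parsed as ordinary code.

```ruby
# The expression on the first line is completed before the heredoc body is read:
# the body ("hello\n") is substituted for <<END, then ", world" is appended.
greeting = (<<END + ", world")
hello
END

puts greeting.inspect
```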

Error on trailing comma in argument list

foo.rb:

# this works
a(b:'c',)

# this works too
a(b_:'c')

# this chokes on the trailing comma
a(b_:'c',)
$ npx tree-sitter parse foo.rb
(program [0, 0] - [8, 0]
  (comment [0, 0] - [0, 12])
  (method_call [1, 0] - [1, 9]
    method: (identifier [1, 0] - [1, 1])
    arguments: (argument_list [1, 1] - [1, 9]
      (pair [1, 2] - [1, 7]
        key: (symbol [1, 2] - [1, 3])
        value: (string [1, 4] - [1, 7]
          (string_content [1, 5] - [1, 6])))))
  (comment [3, 0] - [3, 16])
  (method_call [4, 0] - [4, 9]
    method: (identifier [4, 0] - [4, 1])
    arguments: (argument_list [4, 1] - [4, 9]
      (method_call [4, 2] - [4, 8]
        method: (identifier [4, 2] - [4, 4])
        arguments: (argument_list [4, 4] - [4, 8]
          (symbol [4, 4] - [4, 8]
            (string_content [4, 6] - [4, 7]))))))
  (comment [6, 0] - [6, 12])
  (method_call [7, 0] - [7, 10]
    method: (identifier [7, 0] - [7, 1])
    arguments: (argument_list [7, 1] - [7, 10]
      (method_call [7, 2] - [7, 8]
        method: (identifier [7, 2] - [7, 4])
        arguments: (argument_list [7, 4] - [7, 8]
          (symbol [7, 4] - [7, 8]
            (string_content [7, 6] - [7, 7]))))
      (ERROR [7, 8] - [7, 9]))))
foo.rb	0 ms	(ERROR [7, 8] - [7, 9])
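
A quick check that MRI accepts all three calls. The method `a` is a hypothetical stand-in, defined only so the calls from foo.rb can run.

```ruby
# Hypothetical method `a`: returns the keyword arguments it receives.
def a(**options)
  options
end

with_comma    = a(b:'c',)   # trailing comma, key without underscore
without_comma = a(b_:'c')   # key ending in an underscore
both          = a(b_:'c',)  # the form the grammar chokes on

p with_comma, without_comma, both
```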

Symbols are highlighted the same as Constants

Thanks for this excellent project!

I'm using nvim-treesitter and 8fd340f and seeing Symbols highlighted the same as Constants. Is there a way to distinguish between the two so I can highlight them differently?

Here are screenshots using :TSHighlightCapturesUnderCursor from nvim-treesitter/playground:

Screen Shot 2021-01-25 at 11 41 30 AM

Screen Shot 2021-01-25 at 11 41 21 AM

Parse error on uppercase int literal prefixes (hex, octal, etc)

The file in question comes from metasploit-framework: windows_registry_parser.rb.

It appears that the parser errors on uppercase int literal prefixes:

class WindowsRegistryParser
  # SK magic value: 'sk'
  SK_MAGIC = 0X7269
end

I.e. 0X7269 fails, but 0x7269 does not.

Apparently, these prefixes are case insensitive. Oddly enough, it looks like this only errors out for hex values - decimal, octal, and binary appear to work. It couldn't hurt to double check me here, though.
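
As a sanity check on the Ruby side, the following sketch shows that MRI accepts the uppercase form of every radix prefix. The variable names are illustrative.

```ruby
hex     = 0X7269  # same as 0x7269
octal   = 0O17    # same as 0o17
binary  = 0B1010  # same as 0b1010
decimal = 0D42    # same as 0d42

p hex == 0x7269, octal == 0o17, binary == 0b1010, decimal == 0d42
```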

Compound statements not supported in interpolations

"#{ foo; bar }"

is accepted by MRI, but not by tree-sitter:

(program [0, 0] - [1, 0]
  (string [0, 0] - [0, 15]
    (interpolation [0, 1] - [0, 14]
      (method_call [0, 4] - [0, 12]
        method: (identifier [0, 4] - [0, 7])
        (ERROR [0, 7] - [0, 8])
        arguments: (argument_list [0, 9] - [0, 12]
          (identifier [0, 9] - [0, 12]))))))
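
For comparison, a minimal sketch of how MRI evaluates such an interpolation. Here `foo` and `bar` are plain local variables chosen for the example.

```ruby
foo = 1
bar = 2

# Both statements run; the interpolation takes the value of the last one.
result = "#{ foo; bar }"

puts result
```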

Infinite loop when scanning symbol identifiers

An infinite loop can be triggered in the scanner when an input ends with a symbol, e.g.

$ echo -n ':foo' >/tmp/symbol
$ hexdump -C /tmp/symbol
00000000  3a 66 6f 6f                                       |:foo|
00000004
$ node_modules/.bin/tree-sitter parse /tmp/symbol
^C

I believe this is caused by this loop:

while (is_iden_char(lexer->lookahead)) {
  advance(lexer);
}

When we reach the end of the buffer, ->lookahead is zero so is not an identifier character and advance(...) is a no-op so we never make any forward progress.
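
The general remedy for this kind of loop is to make end of input an explicit stop condition. The sketch below models that idea in Ruby with a toy lexer. `ToyLexer` and `scan_identifier` are hypothetical names, not part of the real scanner or the tree-sitter API, and the sketch only assumes that advancing at end of input does nothing.

```ruby
# Hypothetical stand-in for the scanner's lexer object.
class ToyLexer
  attr_reader :pos

  def initialize(input)
    @chars = input.chars
    @pos = 0
  end

  # Current character, or nil at end of input.
  def lookahead
    @chars[@pos]
  end

  def eof?
    @pos >= @chars.length
  end

  # Like the behaviour described above: a no-op once the input is exhausted.
  def advance
    @pos += 1 unless eof?
  end
end

# Consume identifier characters, stopping explicitly at end of input so the
# loop always makes forward progress or terminates.
def scan_identifier(lexer)
  while !lexer.eof? && lexer.lookahead.match?(/\w/)
    lexer.advance
  end
  lexer.pos
end

p scan_identifier(ToyLexer.new("foo"))
```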

Distinguish protected and private methods and constants

Given the following Ruby code:

class Example
  def bar; end

  private
  def foo; end
end

Today tree-sitter-ruby produces:

(program [0, 0] - [6, 0]
  (class [0, 0] - [5, 3]
    (constant [0, 6] - [0, 13])
    (method [1, 2] - [1, 14]
      (identifier [1, 6] - [1, 9]))
    (identifier [3, 2] - [3, 9])
    (method [4, 2] - [4, 14]
      (identifier [4, 6] - [4, 9]))))

@maxbrunsfeld what do you think about changing the parse tree to communicate private or protected members? From a client perspective, it would be great if the parse tree reflected the visibility of a member, e.g. private_method or protected_constant (with the default always indicating public visibility). One advantage is that clients wouldn't need state to track the current visibility scope, nor would they have to check each identifier in the parse tree for protected or private. I'm guessing, though, that this would require an additional piece of state in the external scanner and would further complicate the grammar.
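
For context, this is the runtime behaviour a client currently has to reconstruct by tracking state. It is a small sketch using the class from the example above and Ruby's standard reflection methods.

```ruby
class Example
  def bar; end

  private
  def foo; end
end

# A bare `private` call changes the visibility of every method defined after it.
public_methods  = Example.public_instance_methods(false)
private_methods = Example.private_instance_methods(false)

p public_methods, private_methods
```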

Misparse on empty brace block passed to dot-call

tree-sitter-ruby misparses this code:

a.() {}
(program [0, 0] - [1, 0]
  (method_call [0, 0] - [0, 7]
    (call [0, 0] - [0, 4]
      (identifier [0, 0] - [0, 1])
      (argument_list_with_parens [0, 2] - [0, 4]))
    (argument_list [0, 5] - [0, 7]
      (hash [0, 5] - [0, 7]))))

I think this is related to #73, but interestingly the grammar parses do blocks and non-empty brace blocks correctly:

$ node_modules/.bin/tree-sitter parse <(echo 'a.() { 123 }')
(program [0, 0] - [1, 0]
  (method_call [0, 0] - [0, 12]
    (call [0, 0] - [0, 4]
      (identifier [0, 0] - [0, 1])
      (argument_list_with_parens [0, 2] - [0, 4]))
    (block [0, 5] - [0, 12]
      (integer [0, 7] - [0, 10]))))
$ node_modules/.bin/tree-sitter parse <(echo 'a.() do end')
(program [0, 0] - [1, 0]
  (method_call [0, 0] - [0, 11]
    (call [0, 0] - [0, 4]
      (identifier [0, 0] - [0, 1])
      (argument_list_with_parens [0, 2] - [0, 4]))
    (do_block [0, 5] - [0, 11])))
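
The following sketch shows how MRI treats the three forms. `Callable` is a hypothetical class introduced for the example; in each case the braces or do/end are passed as a block rather than parsed as a hash argument.

```ruby
# Hypothetical receiver: reports whether a block was given and what it returned.
class Callable
  def call(&block)
    block ? [:block, block.call] : [:no_block]
  end
end

a = Callable.new

empty_braces = a.() {}
with_body    = a.() { 123 }
with_do      = a.() do end

p empty_braces, with_body, with_do
```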

Unable to parse constants with unicode chars

Looks like this might also be a problem for other things like symbols.

Example:

 = 1
❯ tree-sitter parse test.rb -d
new_parse {}
process { version: '0',
  version_count: '1',
  state: '1',
  row: '0',
  col: '0' }
lex_external { state: '2', row: '0', column: '0' }
lex_internal { state: '232', row: '0', column: '0' }
   consume { character: '\'C\'' }
lexed_lookahead { sym: 'constant', size: '2' }
shift { state: '40' }
process { version: '0',
  version_count: '1',
  state: '40',
  row: '0',
  col: '2' }
lex_external { state: '9', row: '0', column: '2' }
lex_internal { state: '571', row: '0', column: '2' }
retry_in_error_mode {}
lex_external { state: '1', row: '0', column: '2' }
lex_internal { state: '0', row: '0', column: '2' }
skip_unrecognized_character {}
   consume { character: '\'�\'' }
lex_external { state: '1', row: '0', column: '4' }
   skip { character: '\' \'' }
lex_internal { state: '0', row: '0', column: '4' }
   skip { character: '\' \'' }
   consume { character: '\'=\'' }
lexed_lookahead { sym: 'ERROR', size: '2' }
handle_error {}
shift { state: '0' }
process { version: '0',
  version_count: '1',
  state: '0',
  row: '0',
  col: '4' }
lex_external { state: '1', row: '0', column: '4' }
   skip { character: '\' \'' }
lex_internal { state: '0', row: '0', column: '4' }
   skip { character: '\' \'' }
   consume { character: '\'=\'' }
lexed_lookahead { sym: '=', size: '2' }
recover { state: '13442' }
process { version: '0',
  version_count: '2',
  state: '13442',
  row: '0',
  col: '8' }
lex_external { state: '75', row: '0', column: '8' }
   skip { character: '\' \'' }
lex_internal { state: '1510', row: '0', column: '8' }
   skip { character: '\' \'' }
   consume { character: '\'1\'' }
lexed_lookahead { sym: 'integer', size: '2' }
shift { state: '13484' }
process { version: '1',
  version_count: '2',
  state: '0',
  row: '0',
  col: '8' }
lex_external { state: '1', row: '0', column: '8' }
   skip { character: '\' \'' }
lex_internal { state: '0', row: '0', column: '8' }
   skip { character: '\' \'' }
   consume { character: '\'1\'' }
lexed_lookahead { sym: 'integer', size: '2' }
recover { state: '13484' }
process { version: '0',
  version_count: '3',
  state: '13484',
  row: '0',
  col: '12' }
lex_external { state: '69', row: '0', column: '12' }
   consume { character: '\'\n\'' }
lexed_lookahead { sym: '_line_break', size: '2' }
reduce { sym: '_primary', child_count: '1' }
reduce { sym: '_arg', child_count: '1' }
reduce { sym: '_arg_or_splat_arg', child_count: '1' }
reduce { sym: 'optional_parameter', child_count: '3' }
repair_error {}
no_repair_found {}
reduce { sym: 'right_assignment_list', child_count: '1' }
reduce { sym: 'assignment', child_count: '3' }
repair_error {}
halt_other { version: '0' }
halt_other { version: '1' }
halt_other { version: '2' }
halt_other { version: '3' }
repair_found { sym: 'assignment', child_count: '3', cost: '101' }
reduce { sym: '_arg', child_count: '1' }
reduce { sym: '_statement', child_count: '1' }
reduce { sym: '_top_level_statement', child_count: '1' }
shift { state: '219' }
condense {}
process { version: '0',
  version_count: '1',
  state: '219',
  row: '1',
  col: '0' }
lex_external { state: '2', row: '1', column: '0' }
lex_internal { state: '703', row: '1', column: '0' }
lexed_lookahead { sym: 'END', size: '0' }
reduce { sym: '_terminator', child_count: '1' }
reduce { sym: '_statements', child_count: '2' }
reduce { sym: 'program', child_count: '1' }
accept {}
condense {}
done {}
test.rb	40 ms	ERROR [0, 1] - [0, 2]

Load Ruby Language using wasm failed

Hello,

I cannot load this Language through wasm.

import Parser from "web-tree-sitter"
await Parser.init()
const parser = new Parser()
const Lang = await Parser.Language.load("tree-sitter-ruby.wasm")
parser.setLanguage(Lang)

Error:

{
    "errorMessage": "e[Object.keys(...).find(...)] is not a function",
    "errorType": "TypeError",
    "stackTrace": [
        "TypeError: e[Object.keys(...).find(...)] is not a function",
        "    at /home/xxx/code/xxx/node_modules/web-tree-sitter/tree-sitter.js:1:45617",
        "    at handler (/home/xxx/code/xxx/.webpack/index/webpack:home/xxx/code/xxx/index.ts:23:16)"
    ]
}

The code fails on const Lang = await Parser.Language.load("tree-sitter-ruby.wasm").
Testing with tree-sitter-javascript.wasm works.
Both wasm files have been generated using https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web#generate-wasm-language-files

As the code works with another Language, I guess the issue is not related to web-tree-sitter but to the Ruby Language.

Let me know if you need anything else to figure out the issue.

Parse error on loop iterator

The file in question comes from metasploit-framework: enum_ie.rb.

Minimized example:

def is_86
  pid = session.sys.process.open.pid
  return session.sys.process.each_process.find { |i| i["pid"] == pid} ["arch"] == "x86"
end

It appears to be failing on the { |i| i["pid"] == pid} portion.

$ tree-sitter parse loop-iter.rb 
...
loop-iter.rb	0 ms	(ERROR [2, 47] - [2, 69])
$ ruby -c loop-iter.rb 
Syntax OK
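
A self-contained version of the same construct, with a hypothetical `processes` array standing in for session.sys.process.each_process. The `["arch"]` after the space indexes the value returned by the block call.

```ruby
# Hypothetical data in place of the Metasploit session objects.
processes = [
  { "pid" => 1, "arch" => "x64" },
  { "pid" => 2, "arch" => "x86" }
]
pid = 2

# The element reference applies to the result of `find { ... }`.
is_86 = processes.find { |i| i["pid"] == pid} ["arch"] == "x86"

p is_86
```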

Interpolations should not be parsed in quoted heredocs

The following heredoc is wrongly parsed. It looks like the single quotes have no effect.

x = <<'NO_INTERPOLATION'
This should not be interpolated #{interpolation}
NO_INTERPOLATION
(program [0, 0] - [4, 0]
  (assignment [0, 0] - [0, 24]
    left: (identifier [0, 0] - [0, 1])
    right: (heredoc_beginning [0, 4] - [0, 24]))
  (heredoc_body [0, 24] - [2, 16]
    (interpolation [1, 32] - [1, 48]
      (identifier [1, 34] - [1, 47]))
    (heredoc_end [2, 0] - [2, 16])))

I think it should be

(program [0, 0] - [4, 0]
  (assignment [0, 0] - [0, 24]
    left: (identifier [0, 0] - [0, 1])
    right: (heredoc_beginning [0, 4] - [0, 24]))
  (heredoc_body [0, 24] - [2, 16]
    (heredoc_end [2, 0] - [2, 16])))
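
The runtime difference between the quoted and unquoted forms can be seen in this sketch. The second heredoc and the `interpolation` variable are added only for contrast.

```ruby
interpolation = "VALUE"

# Single-quoted delimiter: the body is taken literally.
x = <<'NO_INTERPOLATION'
This should not be interpolated #{interpolation}
NO_INTERPOLATION

# Unquoted delimiter: the body behaves like a double-quoted string.
y = <<INTERPOLATION
This is interpolated #{interpolation}
INTERPOLATION

puts x, y
```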

Various questions

Hey

I'm sorry, but I couldn't find an IRC channel or similar.

I am trying to change the grammar output a bit for personal needs, and I am wondering about some things:
Is it possible to somehow generate a named node for an optional even if it doesn't match? I tried a choice between optional and nil_node, where nil_node accepts the empty string, but this doesn't seem nice.

Overall, is there a way to always add a named "nil_node" in some places? It should just be generated, equivalent to always matching an empty string, but I am not sure how to do it, as it seems to break other stuff.

Parse error on valid squiggly heredoc code

The title is a bit vague because I'm not sure what Ruby construct is causing this issue. I suspect it may be the squiggly heredoc, but I'm not sure. The following is a minimized example of a valid Ruby file in the Homebrew repository:

def_node_matcher :example_or_group_or_include?, <<~PATTERN
  {
    #{block_pattern(
      '{#Examples.all #ExampleGroups.all #Includes.all}'
    )}
    #{send_pattern('{#Examples.all #Includes.all}')}
  }
PATTERN
$ npx tree-sitter parse test.rb 
(program [0, 0] - [8, 0]
  (method_call [0, 0] - [0, 58]
    method: (identifier [0, 0] - [0, 16])
    arguments: (argument_list [0, 17] - [0, 58]
      (symbol [0, 17] - [0, 46])
      (heredoc_beginning [0, 48] - [0, 58])))
  (ERROR [0, 58] - [3, 56]
    (identifier [2, 6] - [2, 19])
    (string [3, 6] - [3, 56]))
  (heredoc_body [3, 56] - [7, 7]
    (interpolation [5, 4] - [5, 52]
      (method_call [5, 6] - [5, 51]
        method: (identifier [5, 6] - [5, 18])
        arguments: (argument_list [5, 18] - [5, 51]
          (string [5, 19] - [5, 50]))))
    (heredoc_end [7, 0] - [7, 7])))
test.rb	0 ms	(ERROR [0, 58] - [3, 56])
$ ruby -c test.rb 
Syntax OK

However, if we change the block_pattern portion slightly it successfully parses:

def_node_matcher :example_or_group_or_include?, <<~PATTERN
  {
    #{block_pattern('{#Examples.all #ExampleGroups.all #Includes.all}')}
    #{send_pattern('{#Examples.all #Includes.all}')}
  }
PATTERN
$ npx tree-sitter parse test.rb 
(program [0, 0] - [6, 0]
  (method_call [0, 0] - [0, 58]
    method: (identifier [0, 0] - [0, 16])
    arguments: (argument_list [0, 17] - [0, 58]
      (symbol [0, 17] - [0, 46])
      (heredoc_beginning [0, 48] - [0, 58])))
  (heredoc_body [0, 58] - [5, 7]
    (interpolation [2, 4] - [2, 72]
      (method_call [2, 6] - [2, 71]
        method: (identifier [2, 6] - [2, 19])
        arguments: (argument_list [2, 19] - [2, 71]
          (string [2, 20] - [2, 70]))))
    (interpolation [3, 4] - [3, 52]
      (method_call [3, 6] - [3, 51]
        method: (identifier [3, 6] - [3, 18])
        arguments: (argument_list [3, 18] - [3, 51]
          (string [3, 19] - [3, 50]))))
    (heredoc_end [5, 0] - [5, 7])))
$ ruby -c test.rb 
Syntax OK
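
A reduced runnable form of the failing case, to confirm that Ruby itself allows an interpolation to span several lines inside a squiggly heredoc. `block_pattern` here is a hypothetical helper, not the method from the Homebrew code.

```ruby
# Hypothetical helper standing in for the method called in the report.
def block_pattern(inner)
  "(block #{inner})"
end

# The interpolation opens on one line and closes two lines later.
pattern = <<~PATTERN
  {
    #{block_pattern(
      '{#Examples.all}'
    )}
  }
PATTERN

puts pattern
```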

Parse error on 'not pattern match' operator

The file in question comes from brew: regexp_match.rb.

The following is a reduced example:

def correct_operator(corrector, recv, arg, oper = nil)
  op_range = correction_range(recv, arg)

  replace_with_match_predicate_method(corrector, recv, arg, op_range)

  corrector.insert_after(arg.loc.expression, ')') unless op_range.source.end_with?('(')
  corrector.insert_before(recv.loc.expression, '!') if oper == :!~
end

It appears that the parse error is coming from :!~, which is the 'not pattern match' operator (!~).

$ tree-sitter parse not-pattern-match.rb 
(ERROR [0, 0] - [8, 0]
...
not-pattern-match.rb	1 ms	(ERROR [0, 0] - [8, 0])
$ ruby -c not-pattern-match.rb 
Syntax OK
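
For reference, a minimal sketch of `!~` used both as an operator and as a symbol, which is the form that trips the parser. The variable names are illustrative.

```ruby
oper = :!~

# Operator form: true when the pattern does not match.
negated_match = "abc" !~ /z/

# Symbol form: the same method dispatched by name.
via_send = "abc".public_send(oper, /b/)

p oper, negated_match, via_send
```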

Incorrectly parsing incorrect Ruby

def bar
en

should result in an error, but instead parses:

(program [0, 0] - [2, 0]
  (method [0, 0] - [2, 0]
    (identifier [0, 4] - [0, 7])
    (identifier [1, 0] - [1, 2])
    (string [2, 0] - [2, 0])))
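
One way to confirm what MRI thinks of a snippet is to compile it without running it. The `valid_ruby?` helper below is a hypothetical name; it relies on RubyVM::InstructionSequence, which is specific to MRI.

```ruby
# Hypothetical helper: true if MRI can compile the source, false on a syntax error.
def valid_ruby?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

truncated = valid_ruby?("def bar\nen\n")   # missing the final `end`
complete  = valid_ruby?("def bar\nend\n")

p truncated, complete
```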

Many node types share the same color with this grammar

I appreciate that tree-sitter is faster & more efficient than the old grammar, but the change is a bit disappointing. Symbols, constants, numbers, and boolean values (true/false) are now all the same color with all of the built-in themes I've tried (Base16 Tomorrow Dark, Atom Dark, One Dark).

Examples w/ Base16 Tomorrow Dark. Before:
screen shot 2018-10-24 at 5 40 38 pm

Tree-sitter:
screen shot 2018-10-24 at 5 40 25 pm

Was this intentional or is it just a WiP?

Cut new release with new ABI 69

My version of Node has

> process.platform
'linux'
> process.arch
'x64'
> process.versions.modules
'67'
> process.version
'v11.15.0'

Can you recompile and release for 67 (and preemptively for 69)?

Incorrect precedence for indexing with comparison

While investigating #146, I discovered that the tree for a comparison of an indexed object appears incorrect.

Consider the following program:

x [0] == 1

tree-sitter-ruby will emit the following CST:

(program [0, 0] - [1, 0]
  (call [0, 0] - [0, 10]
    method: (identifier [0, 0] - [0, 1])
    arguments: (argument_list [0, 2] - [0, 10]
      (binary [0, 2] - [0, 10]
        left: (array [0, 2] - [0, 5]
          (integer [0, 3] - [0, 4]))
        right: (integer [0, 9] - [0, 10])))))

That is, this is interpreted as equivalent to:

x([0] == 1)

whereas Ruby evaluates this as:

(x[0]) == 1

This can be confirmed by running

x = [1, 2, 3]
z = x [0] == 1
puts z

which prints true.

I believe the correct CST should be:

(program [0, 0] - [1, 0]
  (binary [0, 0] - [0, 10]
    left: (element_reference [0, 0] - [0, 5]
        object: (identifier [0, 0] - [0, 1])
        (integer [0, 3] - [0, 4]))
    right: (integer [0, 9] - [0, 10])))
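
The sketch below illustrates why this case is hard for a grammar: Ruby's reading depends on whether the name is a local variable or a method. The method `m` is a hypothetical addition for contrast.

```ruby
x = [1, 2, 3]
# x is a local variable, so `x [0]` is an element reference.
z = x [0] == 1

# Hypothetical method that returns its argument.
def m(arg)
  arg
end
# m is a method, so `[0] == 1` is evaluated first and passed as the argument.
w = m [0] == 1

p z, w
```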

Parse error on hex ASCII character code

The file in question comes from metasploit-framework: rbmysql.rb.

The following is a minimal example:

if ret[0] == ?\xff
  f, errno, marker, @sqlstate, message = ret.unpack("Cvaa5a*")
end

The parse error comes from ?\xff.

I'm not a Ruby expert, but this appears to be the ASCII character code for a hex character. See Numeric literals.

$ tree-sitter parse cond-hex.rb 
...
cond-hex.rb	0 ms	(ERROR [0, 13] - [0, 16])
$ ruby -c cond-hex.rb 
Syntax OK
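
For reference, `?` followed by a character or escape is Ruby's character literal, which yields a one-character string. A small sketch, with illustrative variable names:

```ruby
hex_char   = ?\xff  # a string holding the single byte 0xFF
plain_char = ?a     # the same as "a"

p hex_char.bytes, plain_char
```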

Slow parsing with multiple '\r' characters

Parsing multiple \r characters seems to cause very slow parse times, e.g.

wgoKDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDQ0NDWk=
00000000  c2 0a 0a 0d 0d 0d 0d 0d  0d 0d 0d 0d 0d 0d 0d 0d  |................|
00000010  0d 0d 0d 0d 0d 0d 0d 0d  0d 0d 0d 0d 69           |............i|
0000001d

I don't fully understand the code, but I believe it is related to the fall-through logic in scan_parser. Should this be something like:

case '\r':
  if (lexer->lookahead != '\n') return true;
  skip(lexer);
case '\n':

A lone \r isn't strictly whitespace but the interpreter seems to handle it anyway:

$ printf "puts\r123" | ruby
-:1: warning: encountered \r in middle of line, treated as a mere space
123

Parse error on rational 'r' suffix

test.rb:

Time.new(2002, 10, 31, 2, 2, 2.123456789r, "+02:00")
$ npx tree-sitter parse test.rb 
(program [0, 0] - [1, 0]
  (method_call [0, 0] - [0, 52]
    method: (call [0, 0] - [0, 8]
      receiver: (constant [0, 0] - [0, 4])
      method: (identifier [0, 5] - [0, 8]))
    arguments: (argument_list [0, 8] - [0, 52]
      (integer [0, 9] - [0, 13])
      (integer [0, 15] - [0, 17])
      (integer [0, 19] - [0, 21])
      (integer [0, 23] - [0, 24])
      (integer [0, 26] - [0, 27])
      (float [0, 29] - [0, 40])
      (ERROR [0, 40] - [0, 41])
      (string [0, 43] - [0, 51]))))
test.rb	0 ms	(ERROR [0, 40] - [0, 41])

This is the rational 'r' suffix for a literal: https://ruby-doc.org/core-2.5.0/Rational.html
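
A brief sketch of what the suffix produces in MRI, using the literal from the example plus two simpler ones:

```ruby
seconds = 2.123456789r  # exact rational, not a Float
half    = 1.5r
whole   = 3r

p seconds, half, whole
```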

Final `end` does not get highlighted after a block with a rescue inside a method

Description

Ruby 2.5 added the ability to do a rescue without a corresponding begin inside a do/end block. That means code like

arr.each do |thing|
  begin
    try(thing)
  rescue error
    handle_it
  end
end

can be rewritten a little shorter (omitting the begin, and fixing the indentation):

arr.each do |thing|
  try(thing)
rescue error
  handle_it
end
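
A runnable instance of the shorter form, with a hypothetical `handled` array and ZeroDivisionError used in place of the placeholder names above:

```ruby
handled = []

[1, 0, 2].each do |n|
  10 / n
rescue ZeroDivisionError
  # Reached only for n == 0; the iteration then continues with the next element.
  handled << n
end

p handled
```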

Steps to Reproduce

  1. Put a block like the above into a method

Expected behavior:
All end tags are syntax highlighted

Actual behavior:
The method's end tag loses highlighting.

Reproduces how often:
100% of the time

Versions

atom --version

Atom    : 1.40.0
Electron: 3.1.10
Chrome  : 66.0.3359.181
Node    : 10.2.0

apm --version
apm  2.4.2
npm  6.2.0
node 10.2.1 x64
atom 1.40.0
python 3.7.2
git 2.16.2.windows.1
visual studio

OS: Windows 10 Enterprise (1809)

Additional Information

I originally logged this as atom/language-ruby#274

Unexpected position of comments in else

❯ cat test.rb
if a
else
  # comment
end

Produces the tree:

❯ tree-sitter parse test.rb
(program [0, 0] - [4, 0]
  (if [0, 0] - [3, 3]
    (identifier [0, 3] - [0, 4])
    (else [1, 0] - [2, 2])
    (comment [2, 2] - [2, 11])))

Instead I would expect the comment to be a child of else like so:

(program [0, 0] - [4, 0]
  (if [0, 0] - [3, 3]
    (identifier [0, 3] - [0, 4])
    (else [1, 0] - [2, 2]
      (comment [2, 2] - [2, 11]))))

%w/%W/%i/%I children should be strings/symbols

Ruby’s %w/%W, and %i/%I syntaxes are array literals containing strings and symbols respectively. However, the parse tree won’t contain nodes with the appropriate names, even with the changes described in tree-sitter/tree-sitter#29.

I would like this source:

%w(hello world)

to end up with this parse tree:

(program (expression.literal.array (expression.literal.string) (expression.literal.string)))

However, note that the elements of the array would not be parsed via the string production; they’d be parsed via the array and named string sort of by fiat.

@maxbrunsfeld: Is there any way to assign names like that? API like this might do it:

rules: {
  array: choice(
    seq('[', commaSep($.expression), ']'),
    seq('%w(', sep({ string: /[^\s]+/ }, /\s+/), ')'),
  ),
}

Motivated by #4.

Ruby dot call syntax accepts errant extra set of arguments

The grammar accepts the following code:

x.(123)(456)

And produces the following syntax tree:

(program [0, 0] - [1, 0]
  (method_call [0, 0] - [0, 12]
    (call [0, 0] - [0, 7]
      (identifier [0, 0] - [0, 1])
      (argument_list_with_parens [0, 2] - [0, 7]
        (integer [0, 3] - [0, 6])))
    (argument_list [0, 7] - [0, 12]
      (integer [0, 8] - [0, 11]))))

This is because the call rule accepts dot-call syntax including argument list, but method_call unconditionally sequences call with another argument list:

call: $ => prec.left(PREC.BITWISE_AND + 1, seq(
  $._primary,
  choice('.', '&.'),
  repeat($.heredoc_end),
  choice($.identifier, $.operator, $.constant, $.argument_list_with_parens)
)),

method_call: $ => {
  const receiver = choice($._variable, $.scope_resolution, $.call)

  return choice(
    seq(receiver, $.argument_list),
    seq(receiver, prec(PREC.CURLY_BLOCK, seq($.argument_list, $.block))),
    seq(receiver, prec(PREC.DO_BLOCK, seq($.argument_list, $.do_block))),
    prec(PREC.CURLY_BLOCK, seq(receiver, $.block)),
    prec(PREC.DO_BLOCK, seq(receiver, $.do_block))
  )
},

By comparison, Ruby rejects the example code:

$ ruby -c -e 'x.(123)(456)'
-e:1: syntax error, unexpected '(', expecting end-of-input
x.(123)(456)
        ^

Wrong tree produced for binary minus without surrounding spaces

@ivar-1

For that ruby code, tree-sitter currently produces this incorrect tree:

(program [0, 0] - [1, 0]
  (method_call [0, 0] - [0, 7]
    (instance_variable [0, 0] - [0, 5])
    (argument_list [0, 5] - [0, 7]
      (unary [0, 5] - [0, 7]
        (integer [0, 6] - [0, 7])))))

If you add in the spaces like so, you get the right tree.

@ivar - 1
(program [0, 0] - [1, 0]
  (binary [0, 0] - [0, 9]
    (instance_variable [0, 0] - [0, 5])
    (integer [0, 8] - [0, 9])))
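
The sketch below shows the Ruby semantics involved. After an instance variable, `-1` is always a binary minus, whereas after a bare method name the same spelling is a call with a negative argument. `Counter` and `foo` are hypothetical names for the example.

```ruby
class Counter
  def initialize
    @ivar = 5
  end

  def tight
    @ivar-1    # binary minus, same as @ivar - 1
  end

  def spaced
    @ivar - 1
  end
end

# Hypothetical method with a default argument, for contrast.
def foo(n = 10)
  n
end

call_with_arg = foo -1   # parsed as foo(-1)
subtraction   = foo - 1  # parsed as foo() - 1

counter = Counter.new
p counter.tight, counter.spaced, call_with_arg, subtraction
```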

Nested heredocs are not parsed correctly

puts <<HERE
  hello #{<<HERE}
  world
HERE
HERE

In the following parse tree the ranges of the heredoc bodies are not right

(program [0, 0] - [6, 0]
  (method_call [0, 0] - [0, 11]
    method: (identifier [0, 0] - [0, 4])
    arguments: (argument_list [0, 5] - [0, 11]
      (heredoc_beginning [0, 5] - [0, 11])))
  (heredoc_body [0, 11] - [3, 4]
    (interpolation [1, 8] - [1, 17]
      (heredoc_beginning [1, 10] - [1, 16]))
    (heredoc_end [3, 0] - [3, 4]))
  (heredoc_body [3, 4] - [4, 4]
    (heredoc_end [4, 0] - [4, 4])))

No distinction between identifier and symbol hash keys

The grammar does not distinguish between identifiers used as hash keys and symbols using the 1.9 hash syntax:

$ node_modules/.bin/tree-sitter parse <(echo '{ foo: bar }')
(program [0, 0] - [1, 0]
  (hash [0, 0] - [0, 12]
    (pair [0, 2] - [0, 10]
      (identifier [0, 2] - [0, 5])
      (identifier [0, 7] - [0, 10]))))
$ node_modules/.bin/tree-sitter parse <(echo '{ foo => bar }')
(program [0, 0] - [1, 0]
  (hash [0, 0] - [0, 14]
    (pair [0, 2] - [0, 12]
      (identifier [0, 2] - [0, 5])
      (identifier [0, 9] - [0, 12]))))

The first example should parse as (pair (symbol) (identifier))
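
The two forms also differ at runtime, as this sketch shows. The values assigned to `foo` and `bar` are arbitrary.

```ruby
foo = :other_key
bar = 42

label_hash  = { foo: bar }   # key is the symbol :foo
rocket_hash = { foo => bar } # key is the value of the variable foo

p label_hash, rocket_hash
```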

Incorrect parsing of a method given a hash without brackets inside of a block

First of all, thank you for this amazing project, which has made something I'm working on much easier to implement! In the process of using tree-sitter, I noticed that it produces an unexpected result for the following syntax:

format.json { render :key => value }

This produced the following syntax tree:

(program (method_call (call (identifier) (identifier)) (argument_list (hash (pair (method_call (identifier) (argument_list (symbol))) (identifier))))))

However, the expected result was:

(program (method_call (call (identifier) (identifier)) (block (method_call (identifier) (argument_list (pair (symbol) (identifier)))))))

Basically, it looks like tree-sitter is treating the contents of the curlies as a method call that's receiving a hash whose key is itself a method call, instead of a method call with an implicit hash as its only argument. For reference, the expected output is produced from any of the following forms:

format.json { render(:key => value) }
format.json { render {:key => value} }
format.json { render key: value }

Given that, it seems like the unexpected behavior is only triggered when using hash rockets. I tried parsing the code that produced the unexpected tree using whitequark's Ruby parser, and it correctly produced the block form.
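
A self-contained sketch of the expected reading, using hypothetical stand-ins for the Rails objects (`Format` and `render` are defined here only so the line can run). The braces are a block, and the hash rocket pair is an implicit hash argument to `render`.

```ruby
# Hypothetical stand-in for the Rails format object.
class Format
  def json
    yield
  end
end

# Hypothetical stand-in for render: returns the options it receives.
def render(options)
  options
end

format = Format.new
value = 1

result = format.json { render :key => value }

p result
```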

Thanks again for this project, and sorry for being the bearer of bug news. If you need any more information, just let me know!
