tree-sitter / tree-sitter-python

Python grammar for tree-sitter

License: MIT License

Python 4.32% JavaScript 58.15% Scheme 4.86% C 32.67%
tree-sitter python parser

tree-sitter-python's People

Contributors

amaanq, aryx, aymannadeem, berchn, bm424, calixteman, dcreager, drjdn, eloitor, hellebore, jasontatton, joshvera, lukepistrol, m-novikov, maxbrunsfeld, observeroftime, p-e-w, patrickt, resolritter, rewinfrey, robrix, samanpa, sjord, stsewd, tausbn, tclem, the-mikedavis, thehamsta, yoff

tree-sitter-python's Issues

Runtime error while fuzzing

Hey, I've run libFuzzer on this, following the instructions from ikatyang/tree-sitter-markdown#14.

While it doesn't crash, it still reports these:

razze@razze:~/dev/tree-sitter$ ./out/python_fuzzer 
INFO: Seed: 1630354177
INFO: Loaded 1 modules   (3997 inline 8-bit counters): 3997 [0x677250, 0x6781ed), 
INFO: Loaded 1 PC tables (3997 PCs): 3997 [0x5deb38,0x5ee508), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED cov: 398 ft: 399 corp: 1/1b exec/s: 0 rss: 40Mb
#3      NEW    cov: 438 ft: 620 corp: 2/2b lim: 4 exec/s: 0 rss: 40Mb L: 1/1 MS: 1 ChangeBit-
#5      NEW    cov: 450 ft: 720 corp: 3/4b lim: 4 exec/s: 0 rss: 41Mb L: 2/2 MS: 2 CrossOver-InsertByte-
#6      NEW    cov: 456 ft: 772 corp: 4/6b lim: 4 exec/s: 0 rss: 41Mb L: 2/2 MS: 1 InsertByte-
#7      NEW    cov: 456 ft: 871 corp: 5/10b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 1 CrossOver-
#9      NEW    cov: 457 ft: 981 corp: 6/14b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 2 CrossOver-ShuffleBytes-
#12     NEW    cov: 457 ft: 982 corp: 7/16b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 3 ShuffleBytes-EraseBytes-EraseBytes-
#19     NEW    cov: 474 ft: 1093 corp: 8/18b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 2 ShuffleBytes-ChangeByte-
#20     NEW    cov: 483 ft: 1104 corp: 9/20b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 1 InsertByte-
#22     NEW    cov: 483 ft: 1122 corp: 10/23b lim: 4 exec/s: 0 rss: 41Mb L: 3/4 MS: 2 ShuffleBytes-InsertByte-
#24     NEW    cov: 483 ft: 1141 corp: 11/25b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 2 CrossOver-ChangeByte-
#27     NEW    cov: 483 ft: 1166 corp: 12/29b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 3 ChangeByte-EraseBytes-CopyPart-
#28     NEW    cov: 483 ft: 1167 corp: 13/30b lim: 4 exec/s: 0 rss: 41Mb L: 1/4 MS: 1 EraseBytes-
#33     NEW    cov: 490 ft: 1177 corp: 14/33b lim: 4 exec/s: 0 rss: 41Mb L: 3/4 MS: 5 ShuffleBytes-ChangeBit-CopyPart-ChangeBit-ChangeBit-
#34     NEW    cov: 490 ft: 1187 corp: 15/36b lim: 4 exec/s: 0 rss: 41Mb L: 3/4 MS: 1 InsertByte-
#35     NEW    cov: 490 ft: 1188 corp: 16/40b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 1 CrossOver-
#38     NEW    cov: 490 ft: 1193 corp: 17/42b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 3 ChangeBit-ChangeBit-CrossOver-
#39     NEW    cov: 496 ft: 1201 corp: 18/44b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 1 InsertByte-
#40     NEW    cov: 498 ft: 1203 corp: 19/46b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 1 ChangeByte-
test/fixtures/grammars/python/src/scanner.cc:106:24: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test/fixtures/grammars/python/src/scanner.cc:106:24 in 
test/fixtures/grammars/python/src/scanner.cc:130:14: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test/fixtures/grammars/python/src/scanner.cc:130:14 in 
        NEW_FUNC[1/8]: 0x55d610 in tree_sitter_python_external_scanner_serialize /home/razze/dev/tree-sitter/test/fixtures/grammars/python/src/scanner.cc:376
        NEW_FUNC[2/8]: 0x55d690 in (anonymous namespace)::Scanner::serialize(char*) /home/razze/dev/tree-sitter/test/fixtures/grammars/python/src/scanner.cc:99
[...]

async and await aren't keywords

They have been pseudo-keywords since Python 3.5 and actual keywords since Python 3.7. There's a lot of async/await code in the wild already. Handling pseudo-keywords requires a bit of magic; you can see how lib2to3's Grammar.txt does it. Look for "ASYNC" and "AWAIT" there.
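As a quick check of the keyword status described above, here is a minimal sketch using only the standard library. It assumes it runs on Python 3.7 or later; the helper name assigns_cleanly is made up for this illustration.

```python
import keyword


def assigns_cleanly(name):
    """Return True if `name` can be used as a plain variable name."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
        return True
    except SyntaxError:
        return False


# On Python 3.7+, both names are reserved words and cannot be assigned to.
for name in ("async", "await", "asynchronous"):
    print(name, keyword.iskeyword(name), assigns_cleanly(name))
```

On interpreters older than 3.7 the first two names would still be accepted as identifiers, which is the pseudo-keyword behaviour the grammar has to cope with.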

Example of misformatted code:

Fail to parse lists with binary expressions involving floats

Examples:

def a(self):
    [.1-.0]

The above example produces an ERROR:

(function_definition [0, 0] - [2, 0]
    (identifier [0, 4] - [0, 5])
    (parameters [0, 5] - [0, 11]
      (identifier [0, 6] - [0, 10]))
    (expression_statement [1, 4] - [1, 11]
      (list [1, 4] - [1, 11]
        (float [1, 5] - [1, 8])
        (ERROR [1, 8] - [1, 10]
          (integer [1, 9] - [1, 10])))))

def a(self):
    [.1-0]

The above example does not produce an error, but is not the correct tree:

(function_definition [3, 0] - [5, 0]
    (identifier [3, 4] - [3, 5])
    (parameters [3, 5] - [3, 11]
      (identifier [3, 6] - [3, 10]))
    (expression_statement [4, 4] - [4, 10]
      (list [4, 4] - [4, 10]
        (float [4, 5] - [4, 9])))))
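For comparison, this is roughly how CPython's own parser sees the two list literals. It is a sketch with the standard ast module, assuming Python 3.8+ (where literals are ast.Constant nodes); the helper name single_element is hypothetical.

```python
import ast


def single_element(source):
    """Parse a one-element list literal and return that element's node."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.List) or len(node.elts) != 1:
        raise ValueError("expected a list literal with exactly one element")
    return node.elts[0]


for source in ("[.1-.0]", "[.1-0]"):
    elt = single_element(source)
    # Each element is a subtraction whose operands are separate number literals.
    print(source, type(elt).__name__, repr(elt.left.value), repr(elt.right.value))
```

In both cases the reference parser yields a binary operation with a float on the left, rather than a single float token swallowing the minus sign.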

Error with BOM

There is a parsing error for a file beginning with a BOM (\uFEFF).
What's weird here is that it should be handled correctly because \uFEFF appears in the extras list.

Python parsing does not work well with empty blocks

In the playground, if I type in

def foo():

I get

module [0, 0] - [1, 0])
  ERROR [0, 0] - [0, 10])
    identifier [0, 4] - [0, 7])
    parameters [0, 7] - [0, 9])

If I add a statement at indent level 0:

def foo():

print "hi"

I get

module [0, 0] - [3, 0])
  function_definition [0, 0] - [2, 10])
    name: identifier [0, 4] - [0, 7])
    parameters: parameters [0, 7] - [0, 9])
    body: block [2, 0] - [2, 10])
      print_statement [2, 0] - [2, 10])
        argument: string [2, 6] - [2, 10])

I believe this reveals a flaw in the indentation sensitivity of the python parser. Ideally:

  1. print "hi" should not get parsed as a child of the function_definition.

  2. The empty block should be filled in with a MISSING node, i.e. error recovery that inserts a pass statement.

I don't know much about tree-sitter grammar construction, so I am not sure if fixing this is feasible. Similar issues occur with unterminated parentheses at the end of the block, incomplete infix operators, etc.

Customize sexp (s-expression) output

Hey forum,

How do I set the style of an s-expression? For example, to catch all the if statements, I use a query like this: (if_statement(condition_clause(_)) @condition_clause.inner). Let's say I matched on this line of C++ code:

if (x==2) {doSomething}
--> (condition clause value (binary_expression left: (identifier) right: (number_literal)))
However, I'd prefer this style:

if (x==2) {doSomething}
--> (condition clause value (binary_expression "=="( (identifier) (number_literal)))
where we drop left/right and get more detail on the actual binary_expression, i.e. what kind of expression it is, etc. Can this be done without parsing the result again?

Support for PEP 570--Positional-only Parameters

The parser appears to generate an error on function definitions with positional-only parameters:

def name(p1, p2=None, /, p_or_kw=None, *, kw):
  pass
module [0, 0] - [2, 0])
  function_definition [0, 0] - [1, 6])
    name: identifier [0, 4] - [0, 8])
    parameters: parameters [0, 8] - [0, 45])
      identifier [0, 9] - [0, 11])
      default_parameter [0, 13] - [0, 20])
        name: identifier [0, 13] - [0, 15])
        value: none [0, 16] - [0, 20])
      ERROR [0, 22] - [0, 24])
      default_parameter [0, 25] - [0, 37])
        name: identifier [0, 25] - [0, 32])
        value: none [0, 33] - [0, 37])
      list_splat_pattern [0, 39] - [0, 40])
      identifier [0, 42] - [0, 44])
    body: block [1, 2] - [1, 6])
      pass_statement [1, 2] - [1, 6])
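For reference, a sketch of how CPython's ast module (Python 3.8+, where PEP 570 landed) classifies the parameters of that same definition. The variable names are only for illustration.

```python
import ast

source = "def name(p1, p2=None, /, p_or_kw=None, *, kw):\n  pass\n"
func = ast.parse(source).body[0]

# The `/` and `*` markers are not nodes themselves; they only decide
# which list each parameter lands in.
positional_only = [a.arg for a in func.args.posonlyargs]
positional_or_keyword = [a.arg for a in func.args.args]
keyword_only = [a.arg for a in func.args.kwonlyargs]

print(positional_only, positional_or_keyword, keyword_only)
```

A grammar that accepts a bare `/` in the parameter list, as it presumably already does for the bare `*`, would avoid the ERROR node shown above.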

Comprehensions and lists of lambda expressions with final conditional expression parse tree is incorrect

Given the following valid Python2 code:

[ x() for x in lambda: True, lambda: False if x() ]

The tree we produce today:

(module [0, 0] - [1, 0]
  (expression_statement [0, 0] - [0, 51]
    (list_comprehension [0, 0] - [0, 51]
      (call [0, 2] - [0, 5]
        (identifier [0, 2] - [0, 3])
        (argument_list [0, 3] - [0, 5]))
      (variables [0, 10] - [0, 11]
        (identifier [0, 10] - [0, 11]))
      (lambda [0, 15] - [0, 27]
        (true [0, 23] - [0, 27]))
      (lambda [0, 29] - [0, 49]
        (conditional_expression [0, 37] - [0, 49]
          (false [0, 37] - [0, 42])
          (call [0, 46] - [0, 49]
            (identifier [0, 46] - [0, 47])
            (argument_list [0, 47] - [0, 49])))))))

The conditional_expression subtree is contained within the final lambda expression in the list comprehension. The conditional_expression though should be a sibling to the lambda expression. The conditional_expression's then case is the value of x() when the x() call is truthy.

The result of this in the repl:

>>> [ x() for x in lambda: True, lambda: False if x() ]
[True]

Matching method call

Hi! I am coming from Atom's language-python.

We need to be able to detect a method call (the priority now) or object property access. How can we do this using tree-sitter?

import json
obj = 1234

json.dumps( )
#         ^meta.method-call.python

I have an ongoing pull request which tries to do this.
atom/language-python#325

f-string contents aren't formatted

Formatted expressions in f-strings are regular Python code. They should be formatted as such.

Example done by MagicPython (opening f" highlighted by me):

Tuples do not support splatting

Given

foo = ["bar", "baz"]
print (("baf", *foo))

the result should be ("baf", "bar", "baz"). The parser currently chokes on the * character.
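A small Python 3 sketch of the expected behaviour, plus how the reference parser represents the splat (as a Starred element inside the Tuple). The variable names are illustrative only.

```python
import ast

foo = ["bar", "baz"]
result = ("baf", *foo)
print(result)

# In CPython's AST the `*foo` part is a Starred node among the tuple elements.
tuple_node = ast.parse('("baf", *foo)', mode="eval").body
element_kinds = [type(elt).__name__ for elt in tuple_node.elts]
print(element_kinds)
```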

`left` incorrectly appears twice in boolean_operator grammar rule

The grammar rule for boolean_operator uses the field name left twice in the first alternative (and); the second occurrence should be right:

    boolean_operator: $ => choice(
      prec.left(PREC.and, seq(
        field('left', $._expression),
        field('operator', 'and'),
        field('left', $._expression)
      )),
      prec.left(PREC.or, seq(
        field('left', $._expression),
        field('operator', 'or'),
        field('right', $._expression)
      ))
    ),

Punctuation doesn't have its own class

By punctuation I essentially mean brackets, dots, commas, colons. This makes it hard to color them differently from regular text.

Example where coloring punctuation would make text easier to read:

Debug notation with = is not supported in f-strings

Python 3.8 introduced a syntax to display both the unevaluated and evaluated value of an expression, which tree-sitter-python fails to parse.

The following file

print(f'{question=} {points=}')

generates the following AST

expression_statement [0, 0] - [0, 31]
  call [0, 0] - [0, 31]
    function: identifier [0, 0] - [0, 5]
    arguments: argument_list [0, 5] - [0, 31]
      string [0, 6] - [0, 30]
        interpolation [0, 8] - [0, 19]
          identifier [0, 9] - [0, 17]
          ERROR [0, 17] - [0, 18]
        interpolation [0, 20] - [0, 29]
          identifier [0, 21] - [0, 27]
          ERROR [0, 27] - [0, 28]

If question were "What is 1+1?" and points were 5, the above expression would print question='What is 1+1?' points=5.
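To make the expected output concrete, here is a runnable sketch for Python 3.8+. The sample values are assumptions chosen to match the output quoted in the report.

```python
question = "What is 1+1?"
points = 5

# `{name=}` expands to the expression text, an equals sign, and repr(value).
message = f"{question=} {points=}"
print(message)
```

Because the debug form uses repr by default, the string is shown with quotes while the integer is not.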

Parse `from __future__ import x` as a future import statement

According to the Python docs, future statements are a special form of import statements.

Currently parsing the following code

from __future__ import print_function

produces

(module [0, 0] - [1, 0]
  (import_from_statement [0, 0] - [0, 37]
    (dotted_name [0, 5] - [0, 15]
      (identifier [0, 5] - [0, 15]))
    (dotted_name [0, 23] - [0, 37]
      (identifier [0, 23] - [0, 37]))))

but ideally this would emit a tree like:

(module [0, 0] - [1, 0]
  (future_import_statement [0, 0] - [0, 37]
    (dotted_name [0, 23] - [0, 37]
      (identifier [0, 23] - [0, 37]))))
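Until a dedicated node exists, consumers have to recognise future imports by module name. As an illustration of that check, here is a sketch against CPython's ast module rather than tree-sitter; the function name future_features is hypothetical.

```python
import ast


def future_features(source):
    """Collect feature names imported via `from __future__ import ...`."""
    features = []
    for node in ast.parse(source).body:
        is_future = (
            isinstance(node, ast.ImportFrom)
            and node.module == "__future__"
            and node.level == 0  # absolute import, not `from .__future__`
        )
        if is_future:
            features.extend(alias.name for alias in node.names)
    return features


print(future_features("from __future__ import print_function\n"))
```

CPython also represents these as ordinary ImportFrom nodes, so the proposed future_import_statement would be a convenience specific to this grammar.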

Incorrect parse tree for comparison operator

Given the expression:

assert isinstance(copy2, variables.Variable)

The parse tree generated is:

(module [0, 0] - [1, 0]
  (expression_statement [0, 0] - [0, 44]
    (comparison_operator [0, 0] - [0, 44]
      (identifier [0, 0] - [0, 6])
      (call [0, 9] - [0, 44]
        (identifier [0, 9] - [0, 17])
        (identifier [0, 18] - [0, 23])
        (attribute [0, 25] - [0, 43]
          (identifier [0, 25] - [0, 34])
          (identifier [0, 35] - [0, 43]))))))

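The comparison_operator rule contains is, and judging by the spans (the call's identifier starts at column 9, right after the is in isinstance), it appears the isinstance call is being incorrectly labeled as a comparison_operator because the parser matches that is prefix as an operator. Is that what you think is happening @maxbrunsfeld?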
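For reference, a sketch of what CPython's parser produces for the same line: an assert statement whose test is a plain call, with no comparison involved.

```python
import ast

stmt = ast.parse("assert isinstance(copy2, variables.Variable)").body[0]

# Expected shape: Assert(test=Call(func=Name('isinstance'), args=[Name, Attribute]))
print(type(stmt).__name__, type(stmt.test).__name__, stmt.test.func.id)
print([type(arg).__name__ for arg in stmt.test.args])
```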

Tree Structure is not coming right

I am trying to get the tree for following code.
get_data_dir() + os.path.sep

Ideally, I expect a binary_expression/operator at the root, with a call on the left side and an attribute on the right side.

However, the structure comes out as follows, which I believe is wrong.
Can somebody tell me how to get the correct tree structure? (I switched the language to JavaScript and the structure came out correct, so I think there is something wrong with the Python parser.)

module [0, 0] - [1, 0])
  expression_statement [0, 0] - [0, 28])
    attribute [0, 0] - [0, 28])
      object: attribute [0, 0] - [0, 24])
        object: binary_operator [0, 0] - [0, 19])
          left: call [0, 0] - [0, 14])
            function: identifier [0, 0] - [0, 12])
            arguments: argument_list [0, 12] - [0, 14])
          right: identifier [0, 17] - [0, 19])
        attribute: identifier [0, 20] - [0, 24])
      attribute: identifier [0, 25] - [0, 28])
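The expected shape can be cross-checked against CPython's ast module. This is only a sketch of the reference behaviour, not of tree-sitter's API.

```python
import ast

expr = ast.parse("get_data_dir() + os.path.sep", mode="eval").body

# Attribute access binds more tightly than `+`, so the root is the addition.
print(type(expr).__name__)        # root node
print(type(expr.left).__name__)   # get_data_dir()
print(type(expr.right).__name__)  # os.path.sep
```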

Detect placeholders in pre-formatted strings

I would love it if Tree-sitter handled placeholders in regular strings:

'My name is %12s' % 'Slim Shady'
'My name is %(name)12s' % {'name': 'Slim Shady'}
'My name is {:>12}'.format('Slim Shady')
'My name is {name:>12}'.format(name='Slim Shady')
'My name is {names[0]:>12}'.format(names=['Slim Shady'])

More examples:
https://pyformat.info
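For the str.format-style placeholders, the standard library can already locate the fields, which may be useful as a reference when deciding what a grammar should mark. A minimal sketch follows; the helper name format_fields is made up, and the %-style placeholders are not covered.

```python
from string import Formatter


def format_fields(template):
    """Return (field_name, format_spec) pairs for `{...}` placeholders."""
    return [
        (name, spec)
        for _literal, name, spec, _conversion in Formatter().parse(template)
        if name is not None  # trailing literal text has no field
    ]


for template in (
    "My name is {:>12}",
    "My name is {name:>12}",
    "My name is {names[0]:>12}",
):
    print(template, format_fields(template))
```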

MagicPython output with Atom's One Dark theme:

except_clause no except_item

For the following code:

try:
  pass
except Exception as e:
  pass
...   except_clause )
        identifier )
        identifier )
        block ) ...

It would be helpful to have an 'except_item' similarly to 'with_item', helping distinguish between the identifiers:

    with_item )
      value: identifier )
      alias: identifier )

.. # suggestion
    except_item )
      value: identifier )
      alias: identifier )
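As a point of comparison, CPython's ast module does keep the two identifiers apart on the handler node itself, via separate type and name fields. A short sketch:

```python
import ast

source = "try:\n  pass\nexcept Exception as e:\n  pass\n"
handler = ast.parse(source).body[0].handlers[0]

# `type` is an expression node, while `name` is a plain string (or None).
print(type(handler).__name__, handler.type.id, handler.name)
```

An except_item with value and alias fields, as suggested above, would give tree-sitter consumers the same distinction.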

Incorrect parse tree for subscript_expression

For the following code:
x[0] + x[0]

module [0, 0] - [1, 0])
  expression_statement [0, 0] - [0, 11])
    subscript [0, 0] - [0, 11])
      value: binary_operator [0, 0] - [0, 8])
        left: subscript [0, 0] - [0, 4])
          value: identifier [0, 0] - [0, 1])
          subscript: integer [0, 2] - [0, 3])
        right: identifier [0, 7] - [0, 8])
      subscript: integer [0, 9] - [0, 10])

Same code for JavaScript produces the correct result

program [0, 0] - [1, 0])
  expression_statement [0, 0] - [0, 11])
    binary_expression [0, 0] - [0, 11])
      left: subscript_expression [0, 0] - [0, 4])
        object: identifier [0, 0] - [0, 1])
        index: number [0, 2] - [0, 3])
      right: subscript_expression [0, 7] - [0, 11])
        object: identifier [0, 7] - [0, 8])
        index: number [0, 9] - [0, 10])

(i.e. the subscript operation should be encapsulated within the binary operator and not the other way around)

Precedence of **

In Python, the power operator is right-associative, so:

x=2**2**3
print(x)

returns 256 and not 64. Also, it binds more tightly than unary minus, so:

x=-2**2
print(x)

returns -4 and not 4.
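Both claims can be confirmed with the values and with the tree that CPython's parser builds. This is a sketch using the standard ast module; the variable names are illustrative.

```python
import ast

right_assoc = 2 ** 2 ** 3   # evaluated as 2 ** (2 ** 3)
negated = -2 ** 2           # evaluated as -(2 ** 2)
print(right_assoc, negated)

# Right associativity: the nested power sits on the right-hand side.
chain = ast.parse("2**2**3", mode="eval").body
# Precedence: the unary minus wraps the power, not the other way around.
minus = ast.parse("-2**2", mode="eval").body
print(type(chain.right).__name__, type(minus).__name__, type(minus.operand).__name__)
```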

Best regards,
Jason

Implicit narrowing in serialize might be problematic

I created this issue tree-sitter/tree-sitter-agda#2 -- I just realized that the code there is based off the code here, so you have the same issue (uint16_t gets narrowed to char):

vector<uint16_t>::iterator
  iter = indent_length_stack.begin() + 1,
  end = indent_length_stack.end();
for (; iter != end && i < TREE_SITTER_SERIALIZATION_BUFFER_SIZE; ++iter) {
  buffer[i++] = *iter;
}

I submitted a PR there to fix the wraparound due to narrowing -- would you like me to do the same here?
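To illustrate the wraparound, here is a small simulation in Python rather than the scanner's C++. The helper narrow_to_byte is hypothetical and simply keeps the low 8 bits, as assigning a uint16_t to a single char-sized slot would. The two-byte packing at the end is only one possible lossless encoding, not necessarily what the referenced PR does.

```python
import struct


def narrow_to_byte(value):
    """Mimic storing a 16-bit value in one byte: only the low 8 bits survive."""
    return value & 0xFF


indent_lengths = [4, 255, 256, 300]
stored = [narrow_to_byte(n) for n in indent_lengths]
print(stored)  # values above 255 no longer round-trip

# Assumption for illustration: spending two bytes per entry avoids the loss.
packed = b"".join(struct.pack("<H", n) for n in indent_lengths)
restored = [n for (n,) in struct.iter_unpack("<H", packed)]
print(restored)
```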

On a related note, I noticed this commit 250cc78 where you clamp the length of delimiter stack -- isn't that a logic error (because you're throwing away delimiters)? Is it that you expect that limit to never be reached in a practical situation, so you've written it this way to keep the code simple?

atm language build failed: Python 3.6.5 not supported by gyp

Python language build error when running: apm install python-debugger language-python. Is tree-sitter incompatible with newer versions of python?

npm ERR! node v6.9.5
npm ERR! npm  v3.10.10
npm ERR! code ELIFECYCLE

npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] install script 'node-gyp rebuild'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the tree-sitter-python package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node-gyp rebuild
npm ERR! You can get information on how to open an issue for this project with:
npm ERR!     npm bugs tree-sitter-python
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!     npm owner ls tree-sitter-python
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR!     /private/var/folders/1_/3cmgrq_n4y16hz37wm7mfnkm0000gn/T/apm-install-dir-118512-42406-1tygjwq.yermqp8pvi/npm-debug.log
npm ERR! code 1

No /private/var/folders/1_/3cmgrq_n4y16hz37wm7mfnkm0000gn/T/apm-install-dir-118512-42406-1tygjwq.yermqp8pvi/npm-debug.log file available.

I got a more verbose error when attempting to build tree-sitter-cli 'manually' using npm install tree-sitter-cli:

gyp ERR! configure error 
gyp ERR! stack Error: Python executable "/anaconda3/bin/python" is v3.6.5, which is not supported by gyp.
gyp ERR! stack You can pass the --python switch to point to Python >= v2.5.0 & < 3.0.0.
gyp ERR! stack     at PythonFinder.failPythonVersion (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:492:19)
gyp ERR! stack     at PythonFinder.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:474:14)
gyp ERR! stack     at ChildProcess.exithandler (child_process.js:282:7)
gyp ERR! stack     at ChildProcess.emit (events.js:182:13)
gyp ERR! stack     at maybeClose (internal/child_process.js:961:16)
gyp ERR! stack     at Socket.stream.socket.on (internal/child_process.js:380:11)
gyp ERR! stack     at Socket.emit (events.js:182:13)
gyp ERR! stack     at Pipe._handle.close [as _onclose] (net.js:595:12)
gyp ERR! System Darwin 17.5.0
gyp ERR! command "/usr/local/Cellar/node/10.4.0/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/milk/node_modules/tree-sitter
gyp ERR! node -v v10.4.0
gyp ERR! node-gyp -v v3.6.2
gyp ERR! not ok 

Is it possible to provide a condensed/readable version of grammar.json (similar to the python reference grammar format)

I am new to tree-sitter and my understanding is that the python grammar specified in grammar.json is based on the Python reference grammar. However, it isn't exactly 1 to 1 with the reference grammar, so I've been wondering what the differences are. I realize that the differences are reflected in grammar.json , but that file is difficult to read to get a summary view.

This leads me to ask, is it possible to provide the tree-sitter python grammar in a more readable and condensed format, similar to how the Python reference is presented (https://docs.python.org/3.8/reference/grammar.html)?

If it can't be provided, do you know if it is relatively trivial to generate, or are there some theoretical reasons it couldn't be converted into something like EBNF?

Thanks

Undefined symbols for architecture x86_64 on OSX Big Sur

Hi tree-sitter-python,

I was able to develop using tree-sitter-python without any issues until I upgraded my computer to OSX Big Sur. Now I'm running into the following error:

Python 3.7.7 (default, Mar 10 2020, 16:11:21) 
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tree_sitter import Language
>>> Language.build_library('languages.so', ["tree-sitter-python"])
Undefined symbols for architecture x86_64:
  "std::__1::__vector_base_common<true>::__throw_length_error() const", referenced from:
      std::__1::vector<unsigned short, std::__1::allocator<unsigned short> >::__recommend(unsigned long) const in scanner.o
      std::__1::vector<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__recommend(unsigned long) const in scanner.o
  "std::logic_error::logic_error(char const*)", referenced from:
      std::length_error::length_error(char const*) in scanner.o
  "std::length_error::~length_error()", referenced from:
      std::__1::__throw_length_error(char const*) in scanner.o
  "std::terminate()", referenced from:
      ___clang_call_terminate in scanner.o
  "typeinfo for std::length_error", referenced from:
      std::__1::__throw_length_error(char const*) in scanner.o
  "vtable for std::length_error", referenced from:
      std::length_error::length_error(char const*) in scanner.o
  NOTE: a missing vtable usually means the first non-inline virtual member function has no definition.
  "operator delete(void*)", referenced from:
      _tree_sitter_python_external_scanner_create in scanner.o
      _tree_sitter_python_external_scanner_destroy in scanner.o
      std::__1::_DeallocateCaller::__do_call(void*) in scanner.o
  "operator new(unsigned long)", referenced from:
      _tree_sitter_python_external_scanner_create in scanner.o
      std::__1::__libcpp_allocate(unsigned long, unsigned long) in scanner.o
  "___cxa_allocate_exception", referenced from:
      std::__1::__throw_length_error(char const*) in scanner.o
  "___cxa_begin_catch", referenced from:
      ___clang_call_terminate in scanner.o
  "___cxa_call_unexpected", referenced from:
      std::__1::__vector_base<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__destruct_at_end((anonymous namespace)::Delimiter*) in scanner.o
      std::__1::allocator<(anonymous namespace)::Delimiter>::deallocate((anonymous namespace)::Delimiter*, unsigned long) in scanner.o
      std::__1::__vector_base<unsigned short, std::__1::allocator<unsigned short> >::__destruct_at_end(unsigned short*) in scanner.o
      std::__1::allocator<unsigned short>::deallocate(unsigned short*, unsigned long) in scanner.o
      std::__1::vector<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__destruct_at_end((anonymous namespace)::Delimiter*) in scanner.o
      std::__1::vector<unsigned short, std::__1::allocator<unsigned short> >::max_size() const in scanner.o
      std::__1::__split_buffer<unsigned short, std::__1::allocator<unsigned short>&>::__destruct_at_end(unsigned short*, std::__1::integral_constant<bool, false>) in scanner.o
      ...
  "___cxa_free_exception", referenced from:
      std::__1::__throw_length_error(char const*) in scanner.o
  "___cxa_throw", referenced from:
      std::__1::__throw_length_error(char const*) in scanner.o
  "___gxx_personality_v0", referenced from:
      _tree_sitter_python_external_scanner_create in scanner.o
      _tree_sitter_python_external_scanner_destroy in scanner.o
      (anonymous namespace)::Scanner::Scanner() in scanner.o
      std::__1::__vector_base<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__destruct_at_end((anonymous namespace)::Delimiter*) in scanner.o
      std::__1::allocator<(anonymous namespace)::Delimiter>::deallocate((anonymous namespace)::Delimiter*, unsigned long) in scanner.o
      std::__1::__vector_base<unsigned short, std::__1::allocator<unsigned short> >::__destruct_at_end(unsigned short*) in scanner.o
      std::__1::allocator<unsigned short>::deallocate(unsigned short*, unsigned long) in scanner.o
      ...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Traceback (most recent call last):
  File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/unixccompiler.py", line 204, in link
    self.spawn(linker + ld_args)
  File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/ccompiler.py", line 910, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
    % (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'cc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "python3.7/site-packages/tree_sitter/__init__.py", line 72, in build_library
    compiler.link_shared_object(object_paths, output_path)
  File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/ccompiler.py", line 717, in link_shared_object
    extra_preargs, extra_postargs, build_temp, target_lang)
  File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/unixccompiler.py", line 206, in link
    raise LinkError(msg)
distutils.errors.LinkError: command 'cc' failed with exit status 1

To add to this mystery, I've been able to build tree-sitter-java, tree-sitter-javascript, and tree-sitter-scala successfully, but I run into the same error when attempting tree-sitter-cpp.

I've played around with different versions of Python 3 (to see if clang was the issue) to no avail. For example, on Python 3.6:

Python 3.6.10 (default, Jan 16 2020, 13:37:48) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tree_sitter import Language
>>> Language.build_library('languages.so', ["tree-sitter-python"])
Undefined symbols for architecture x86_64:
  "std::__1::__vector_base_common<true>::__throw_length_error() const", referenced from:
      std::__1::vector<unsigned short, std::__1::allocator<unsigned short> >::__recommend(unsigned long) const in scanner.o
      std::__1::vector<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__recommend(unsigned long) const in scanner.o
...

I've also tried different versions of tree-sitter and tree-sitter-python and seen the error consistently in every variation: tree-sitter 0.19.0 and tree-sitter-python 0.19.0, as well as tree-sitter 0.2.0 and the versions of tree-sitter-python 0.13.6 - 0.17.1.

I also have a Linux machine and don't have any problems over there.

Do you know where this issue is coming from and how to get around it?

Thank you!

Lack of body does not cause error

Trying out the Python playground, I entered this code:

def foo():

And got this result:

module [0, 0] - [1, 0])
  function_definition [0, 0] - [0, 10])
    name: identifier [0, 4] - [0, 7])
    parameters: parameters [0, 7] - [0, 9])
    body: block [0, 10] - [0, 10])

However, a function without a body is a syntax error in Python 3. Perhaps this is related to the error recovery behavior, and I misunderstand how Tree-sitter works, but it seems to me like there should be an error node in there.
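The syntax-error claim is easy to confirm with CPython itself. A minimal sketch follows, with a made-up helper name is_valid_python; it says nothing about how tree-sitter's error recovery should behave.

```python
def is_valid_python(source):
    """Return True if CPython accepts `source` as a module."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(is_valid_python("def foo():\n"))            # header with no body
print(is_valid_python("def foo():\n    pass\n"))  # minimal valid body
```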

`tree-sitter-python` parses file with errors, but tree has no errors

I would expect this not to parse/parse with errors:

def foo():
foo()

But AFAICT there are no ERROR nodes in the tree, nor are there any nodes for which hasError() returns true. It looks like this was intentionally added in #65 ("As is often the case, I think the most practical fix is to allow a superset of what Python really allows, and treat empty blocks as valid blocks.")

I have a higher-level question: if I want to know whether a file parses correctly, can I do that with tree-sitter? Or is that not what tree-sitter is for?

Installation fails

Running npm install -g tree-sitter-python on macOS 10.14.6, I get this output:

> [email protected] install /usr/local/lib/node_modules/tree-sitter-python
> node-gyp rebuild

  CC(target) Release/obj.target/tree_sitter_python_binding/src/parser.o
  CXX(target) Release/obj.target/tree_sitter_python_binding/src/binding.o
../src/binding.cc:13:6: error: variable has incomplete type 'void'
void Init(Handle<Object> exports, Handle<Object> module) {
     ^
../src/binding.cc:13:11: error: use of undeclared identifier 'Handle'
void Init(Handle<Object> exports, Handle<Object> module) {
          ^
../src/binding.cc:13:18: error: 'Object' does not refer to a value
void Init(Handle<Object> exports, Handle<Object> module) {
                 ^
/Users/chbk/Library/Caches/node-gyp/12.9.1/include/node/v8.h:3369:17: note: 
      declared here
class V8_EXPORT Object : public Value {
                ^
../src/binding.cc:13:26: error: use of undeclared identifier 'exports'
void Init(Handle<Object> exports, Handle<Object> module) {
                         ^
../src/binding.cc:13:35: error: use of undeclared identifier 'Handle'
void Init(Handle<Object> exports, Handle<Object> module) {
                                  ^
../src/binding.cc:13:42: error: 'Object' does not refer to a value
void Init(Handle<Object> exports, Handle<Object> module) {
                                         ^
/Users/chbk/Library/Caches/node-gyp/12.9.1/include/node/v8.h:3369:17: note: 
      declared here
class V8_EXPORT Object : public Value {
                ^
../src/binding.cc:13:50: error: use of undeclared identifier 'module'
void Init(Handle<Object> exports, Handle<Object> module) {
                                                 ^
../src/binding.cc:13:57: error: expected ';' after top level declarator
void Init(Handle<Object> exports, Handle<Object> module) {
                                                        ^
                                                        ;
8 errors generated.
make: *** [Release/obj.target/tree_sitter_python_binding/src/binding.o] Error 1
gyp ERR! build error 
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack     at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:196:23)
gyp ERR! stack     at ChildProcess.emit (events.js:209:13)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)
gyp ERR! System Darwin 18.7.0
gyp ERR! command "/usr/local/Cellar/node/12.9.1/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /usr/local/lib/node_modules/tree-sitter-python
gyp ERR! node -v v12.9.1
gyp ERR! node-gyp -v v5.0.3
gyp ERR! not ok 
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

Related to atom/language-python#310 (comment)

Error in build with tree-sitter-cli 0.7.3: "Unknown rule type: ALIAS"

In a fresh clone, if I run npm install and then npm run build, I get the following error:

$ npm run build

> [email protected] build /tmp/tree-sitter-python
> tree-sitter generate && node-gyp build

Error: Invalid choice member: Unknown rule type: ALIAS

npm ERR! Linux 4.9.0-3-amd64
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "run" "build"
npm ERR! node v6.11.4
npm ERR! npm  v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! [email protected] build: `tree-sitter generate && node-gyp build`
npm ERR! Exit status 1
[...]

Similarly if I simply run node_modules/.bin/tree-sitter generate, I get

Error: Invalid choice member: Unknown rule type: ALIAS

If, after the npm install, I additionally run npm install [email protected], then npm run build works fine. The same holds with 0.7.2.

`print()` as expression, or statement?

In Python 2, print() prints: ().

In Python 3, print() prints an empty line.

Python 2 treats print() as a print statement for the empty tuple.

Python 3 treats print() as a call expression with no arguments.

I may be mistaken, but I believe we have to make a fundamental decision about how the Python grammar will interpret this, because we do not differentiate between Python versions.

@maxbrunsfeld, my thought is to parse print() according to the native AST produced by Python 3. My simple mind thinks, "Python 3 code will likely outlive Python 2 code." What would you suggest?

Current behavior on master parses this according to Python 2:

(print_statement (tuple))

I'd propose instead:

(expression_statement (call (identifier) (argument_list)))
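For comparison, here is a minimal sketch of how Python 3's own `ast` module sees `print()`. It assumes a Python 3 interpreter; the helper name `describe` is made up for illustration. The shape it reports (an expression statement wrapping a call with an empty argument list) matches the proposed tree above.

```python
import ast

def describe(source):
    """Return the type names of the first statement and of its value."""
    stmt = ast.parse(source).body[0]
    value = getattr(stmt, "value", None)
    return type(stmt).__name__, type(value).__name__

first = ast.parse("print()").body[0]

# Under Python 3 this is an expression statement containing a call.
print(describe("print()"))    # ('Expr', 'Call')
# The call has no positional arguments, so no tuple is involved.
print(len(first.value.args))  # 0
```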

"yield from" not recognized

The parser does not recognize "yield from" expressions and classifies "from" as an identifier.

Example:

yield from block_iteration(self.blocks, ghost_layers, self.dim, prefix)

is parsed as

program [0, 0] - [2, 0])
  yield [0, 0] - [0, 28])
    argument_list [0, 6] - [0, 28])
      method_call [0, 6] - [0, 28])
        method: identifier [0, 6] - [0, 10])
        arguments: argument_list [0, 11] - [0, 28])
          method_call [0, 11] - [0, 28])
            method: identifier [0, 11] - [0, 26])
            arguments: argument_list [0, 26] - [0, 28])

Further reference: http://simeonvisser.com/posts/python-3-using-yield-from-in-generators-part-1.html
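As a reference point for the expected result, the sketch below shows how Python 3's standard `ast` module represents `yield from`: as a single `YieldFrom` expression, with no separate identifier for `from`. The function name `walk` and its argument are hypothetical stand-ins for the example in this report.

```python
import ast

# A hypothetical generator that delegates to another iterator.
SOURCE = (
    "def walk(blocks):\n"
    "    yield from iter(blocks)\n"
)

func = ast.parse(SOURCE).body[0]
stmt = func.body[0]

# The statement is an expression statement whose value is a YieldFrom node.
print(type(stmt.value).__name__)  # YieldFrom

# Running the same source shows the delegation behaviour.
namespace = {}
exec(SOURCE, namespace)
print(list(namespace["walk"]([1, 2, 3])))  # [1, 2, 3]
```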

New patch release

Master contains rather important fixes and improvements such as:
#39
#43
#44

Would you please consider cutting a new release?

Why is C++ used rather than C

Unlike most other tree-sitter grammars, the Python grammar uses C++ (i.e. scanner.cc), which makes it much harder to use because the application must be linked against libstdc++. I haven't managed to get it working with the Rust bindings on stable. Is there a good reason for this? It contradicts one of tree-sitter's stated goals:

Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application

Docstrings not recognized

From PEP 257:

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition.

Can this be incorporated into the grammar?
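To illustrate the positional rule quoted above, here is a small sketch using Python's standard `ast.get_docstring`. It is not a proposal for how the grammar should implement this. The sample source and the names `f` and `g` are invented for the example. A string literal counts as a docstring only when it is the first statement of the body.

```python
import ast

SOURCE = '''"""Module docstring."""

def f():
    """Function docstring."""
    return 1

def g():
    x = 1
    "not a docstring"
'''

tree = ast.parse(SOURCE)
func_f, func_g = tree.body[1], tree.body[2]

print(ast.get_docstring(tree))    # Module docstring.
print(ast.get_docstring(func_f))  # Function docstring.
# The string in g() is not the first statement, so it is not a docstring.
print(ast.get_docstring(func_g))  # None
```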

Should expression be a subtype of expression_statement instead of a child?

In grammar.js, expression is a choice of expression_statement: https://github.com/tree-sitter/tree-sitter-python/blob/master/grammar.js#L183-L184

In node-types.json, expression is a child of expression_statement: https://github.com/tree-sitter/tree-sitter-python/blob/master/src/node-types.json#L1164-L1182

Is this correct? Since expression is a choice of expression_statement, I would have expected expression to be a subtype of expression_statement in node-types.json, not a child.

(I'm interested in this because I want to identify nodes that are valid (syntactically correct) Python by themselves so that I can send them to a REPL. My method for doing this is to check whether they are subtypes of _compound_statement or _simple_statement. But this breaks down for expression_statement, which has children instead of subtypes.)

Thank you for making tree-sitter-python!

EDIT: Fixed link.
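The distinction can be sketched against a node-types.json-style structure. The excerpt below is a simplified, hypothetical stand-in, not the real contents of src/node-types.json, and the helper names are made up. In that file, "subtypes" appears only on supertype entries, while a visible rule such as expression_statement lists what it may contain under "children" or "fields".

```python
import json

# Hypothetical, heavily simplified excerpt in the node-types.json format.
NODE_TYPES_JSON = """
[
  {"type": "_simple_statement", "named": true,
   "subtypes": [{"type": "expression_statement", "named": true},
                {"type": "return_statement", "named": true}]},
  {"type": "expression_statement", "named": true, "fields": {},
   "children": {"multiple": true, "required": true,
                "types": [{"type": "expression", "named": true}]}}
]
"""

def load_index(text):
    """Map each node type name to its entry."""
    return {entry["type"]: entry for entry in json.loads(text)}

def concrete_subtypes(index, type_name):
    """Recursively expand a supertype into the concrete types it covers."""
    subtypes = index.get(type_name, {}).get("subtypes")
    if not subtypes:
        return {type_name}
    result = set()
    for sub in subtypes:
        result |= concrete_subtypes(index, sub["type"])
    return result

def child_types(index, type_name):
    """Return the types a node may have as unnamed-field children."""
    children = index.get(type_name, {}).get("children", {})
    return {t["type"] for t in children.get("types", [])}

index = load_index(NODE_TYPES_JSON)
print(sorted(concrete_subtypes(index, "_simple_statement")))
print(sorted(child_types(index, "expression_statement")))
```

With a check like this, expression_statement itself is found as a subtype of _simple_statement, so a REPL-oriented filter could match on the statement node rather than on the expression inside it.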
