Python grammar for tree-sitter.
License: MIT License
Hey, I've run libfuzzer on this, with the instructions from ikatyang/tree-sitter-markdown#14
and while it doesn't crash, it still reports these:
razze@razze:~/dev/tree-sitter$ ./out/python_fuzzer
INFO: Seed: 1630354177
INFO: Loaded 1 modules (3997 inline 8-bit counters): 3997 [0x677250, 0x6781ed),
INFO: Loaded 1 PC tables (3997 PCs): 3997 [0x5deb38,0x5ee508),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2 INITED cov: 398 ft: 399 corp: 1/1b exec/s: 0 rss: 40Mb
#3 NEW cov: 438 ft: 620 corp: 2/2b lim: 4 exec/s: 0 rss: 40Mb L: 1/1 MS: 1 ChangeBit-
#5 NEW cov: 450 ft: 720 corp: 3/4b lim: 4 exec/s: 0 rss: 41Mb L: 2/2 MS: 2 CrossOver-InsertByte-
#6 NEW cov: 456 ft: 772 corp: 4/6b lim: 4 exec/s: 0 rss: 41Mb L: 2/2 MS: 1 InsertByte-
#7 NEW cov: 456 ft: 871 corp: 5/10b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 1 CrossOver-
#9 NEW cov: 457 ft: 981 corp: 6/14b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 2 CrossOver-ShuffleBytes-
#12 NEW cov: 457 ft: 982 corp: 7/16b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 3 ShuffleBytes-EraseBytes-EraseBytes-
#19 NEW cov: 474 ft: 1093 corp: 8/18b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 2 ShuffleBytes-ChangeByte-
#20 NEW cov: 483 ft: 1104 corp: 9/20b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 1 InsertByte-
#22 NEW cov: 483 ft: 1122 corp: 10/23b lim: 4 exec/s: 0 rss: 41Mb L: 3/4 MS: 2 ShuffleBytes-InsertByte-
#24 NEW cov: 483 ft: 1141 corp: 11/25b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 2 CrossOver-ChangeByte-
#27 NEW cov: 483 ft: 1166 corp: 12/29b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 3 ChangeByte-EraseBytes-CopyPart-
#28 NEW cov: 483 ft: 1167 corp: 13/30b lim: 4 exec/s: 0 rss: 41Mb L: 1/4 MS: 1 EraseBytes-
#33 NEW cov: 490 ft: 1177 corp: 14/33b lim: 4 exec/s: 0 rss: 41Mb L: 3/4 MS: 5 ShuffleBytes-ChangeBit-CopyPart-ChangeBit-ChangeBit-
#34 NEW cov: 490 ft: 1187 corp: 15/36b lim: 4 exec/s: 0 rss: 41Mb L: 3/4 MS: 1 InsertByte-
#35 NEW cov: 490 ft: 1188 corp: 16/40b lim: 4 exec/s: 0 rss: 41Mb L: 4/4 MS: 1 CrossOver-
#38 NEW cov: 490 ft: 1193 corp: 17/42b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 3 ChangeBit-ChangeBit-CrossOver-
#39 NEW cov: 496 ft: 1201 corp: 18/44b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 1 InsertByte-
#40 NEW cov: 498 ft: 1203 corp: 19/46b lim: 4 exec/s: 0 rss: 41Mb L: 2/4 MS: 1 ChangeByte-
test/fixtures/grammars/python/src/scanner.cc:106:24: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test/fixtures/grammars/python/src/scanner.cc:106:24 in
test/fixtures/grammars/python/src/scanner.cc:130:14: runtime error: null pointer passed as argument 1, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior test/fixtures/grammars/python/src/scanner.cc:130:14 in
NEW_FUNC[1/8]: 0x55d610 in tree_sitter_python_external_scanner_serialize /home/razze/dev/tree-sitter/test/fixtures/grammars/python/src/scanner.cc:376
NEW_FUNC[2/8]: 0x55d690 in (anonymous namespace)::Scanner::serialize(char*) /home/razze/dev/tree-sitter/test/fixtures/grammars/python/src/scanner.cc:99
[...]
async and await have been pseudo-keywords since Python 3.5 and actual keywords since Python 3.7. There's a lot of async/await code in the wild already. Pseudo-keyword handling requires a bit of magic; you can see how lib2to3's Grammar.txt deals with it. Look for "ASYNC" and "AWAIT" there.
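As a rough illustration of the keyword status described above, the following sketch asks the running interpreter whether async and await are reserved. It assumes CPython 3.7 or later; the helper name is_valid_assignment is made up for this example.

```python
import keyword

# On CPython 3.7+ both names are full keywords.
reserved = {name: keyword.iskeyword(name) for name in ("async", "await")}


def is_valid_assignment(source):
    """Hypothetical helper: True if `source` compiles as a module."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Using a keyword as a variable name is a syntax error.
can_assign_async = is_valid_assignment("async = 1")

print(reserved, can_assign_async)
```

On interpreters older than 3.7 the same assignment compiles, which is the pseudo-keyword behavior a version-agnostic grammar has to cope with.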
Examples:
def a(self):
    [.1-.0]
The above example produces an ERROR:
(function_definition [0, 0] - [2, 0]
  (identifier [0, 4] - [0, 5])
  (parameters [0, 5] - [0, 11]
    (identifier [0, 6] - [0, 10]))
  (expression_statement [1, 4] - [1, 11]
    (list [1, 4] - [1, 11]
      (float [1, 5] - [1, 8])
      (ERROR [1, 8] - [1, 10]
        (integer [1, 9] - [1, 10])))))
def a(self):
    [.1-0]
The above example does not produce an error, but is not the correct tree:
(function_definition [3, 0] - [5, 0]
  (identifier [3, 4] - [3, 5])
  (parameters [3, 5] - [3, 11]
    (identifier [3, 6] - [3, 10]))
  (expression_statement [4, 4] - [4, 10]
    (list [4, 4] - [4, 10]
      (float [4, 5] - [4, 9])))))
There is a parsing error for a file beginning with a bom (\uFEFF).
What's weird here is that it should be handled correctly because \uFEFF appears in the extras list.
In the playground, if I type in
def foo():
I get
module [0, 0] - [1, 0])
ERROR [0, 0] - [0, 10])
identifier [0, 4] - [0, 7])
parameters [0, 7] - [0, 9])
If I add a statement at indent level 0:
def foo():
print "hi"
I get
module [0, 0] - [3, 0])
function_definition [0, 0] - [2, 10])
name: identifier [0, 4] - [0, 7])
parameters: parameters [0, 7] - [0, 9])
body: block [2, 0] - [2, 10])
print_statement [2, 0] - [2, 10])
argument: string [2, 6] - [2, 10])
I believe this reveals a flaw in the indentation sensitivity of the Python parser. Ideally:
print "hi" should not get parsed as a child of the function_definition.
The empty block should be filled in with a MISSING recovery or an insertion of pass.
I don't know much about tree sitter grammar construction, so I am not sure if fixing this is feasible. Similar issues occur with unterminated parenthesis at the end of the block, incomplete infix operators, etc.
Hey forum,
how do I set the style of an S-expression? E.g. I want to catch all the if statements, so I use a query like this: (if_statement(condition_clause(_)) @condition_clause.inner)
to get them. So let's say I matched on this line of code in C++:
if (x==2) {doSomething}
--> (condition clause value (binary_expression left: (identifier) right: (number_literal)))
However, I'd rather have this style:
if (x==2) {doSomething}
--> (condition clause value (binary_expression "=="( (identifier) (number_literal)))
where we drop left/right and get more detail on the actual binary_expression, i.e. what kind of expression, etc. Can this be done without parsing the result again?
The parser appears to generate an error on function definitions with positional-only parameters:
def name(p1, p2=None, /, p_or_kw=None, *, kw):
  pass
module [0, 0] - [2, 0])
function_definition [0, 0] - [1, 6])
name: identifier [0, 4] - [0, 8])
parameters: parameters [0, 8] - [0, 45])
identifier [0, 9] - [0, 11])
default_parameter [0, 13] - [0, 20])
name: identifier [0, 13] - [0, 15])
value: none [0, 16] - [0, 20])
ERROR [0, 22] - [0, 24])
default_parameter [0, 25] - [0, 37])
name: identifier [0, 25] - [0, 32])
value: none [0, 33] - [0, 37])
list_splat_pattern [0, 39] - [0, 40])
identifier [0, 42] - [0, 44])
body: block [1, 2] - [1, 6])
pass_statement [1, 2] - [1, 6])
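For reference, here is a small sketch of how CPython itself classifies the parameters of the function from this report. It assumes Python 3.8 or later, where the / marker is accepted; the kinds dictionary is just an illustration variable.

```python
import inspect


def name(p1, p2=None, /, p_or_kw=None, *, kw):
    pass


# Map each parameter name to the kind CPython assigns to it.
kinds = {
    param.name: param.kind.name
    for param in inspect.signature(name).parameters.values()
}

for param_name, kind in kinds.items():
    print(f"{param_name}: {kind}")
```

The / and * markers are not parameters themselves; they only change the kind of the parameters around them, which is why a grammar would likely model them as separator tokens.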
Given the following valid Python2 code:
[ x() for x in lambda: True, lambda: False if x() ]
The tree we produce today:
(module [0, 0] - [1, 0]
  (expression_statement [0, 0] - [0, 51]
    (list_comprehension [0, 0] - [0, 51]
      (call [0, 2] - [0, 5]
        (identifier [0, 2] - [0, 3])
        (argument_list [0, 3] - [0, 5]))
      (variables [0, 10] - [0, 11]
        (identifier [0, 10] - [0, 11]))
      (lambda [0, 15] - [0, 27]
        (true [0, 23] - [0, 27]))
      (lambda [0, 29] - [0, 49]
        (conditional_expression [0, 37] - [0, 49]
          (false [0, 37] - [0, 42])
          (call [0, 46] - [0, 49]
            (identifier [0, 46] - [0, 47])
            (argument_list [0, 47] - [0, 49])))))))
The conditional_expression subtree is contained within the final lambda expression in the list comprehension. The conditional_expression though should be a sibling to the lambda expression. The conditional_expression's then case is the value of x() when the x() call is truthy.
The result of this in the repl:
>>> [ x() for x in lambda: True, lambda: False if x() ]
[True]
Hi! I am coming from Atom's language-python.
We need to be able to detect a method call (the priority now) or object property access. How can we do this using tree-sitter?
import json
obj = 1234
json.dumps( )
# ^meta.method-call.python
I have an on-going pull request which tries to do this.
atom/language-python#325
Example:
if something is None:
    yield None
With tree-sitter, is and yield are not highlighted as before.
For example:
def foo():
"""some
docstring
string""""
# comment
Will parse a function node that terminates in the docstring.
Given
foo = ['bar', 'baz']
print (('baf', *foo))
the result should be ('baf', 'bar', 'baz'). The parser currently chokes on the * character.
Just playing around on the playground (https://tree-sitter.github.io/tree-sitter/playground), the following python code says for is an identifier, and marks x as the error, instead of the other way around.
def func(x, list):
if for x in list:
print(x)
The grammar rule for boolean_operator says left twice in the first bit when it should be right:
boolean_operator: $ => choice(
  prec.left(PREC.and, seq(
    field('left', $._expression),
    field('operator', 'and'),
    field('left', $._expression)
  )),
  prec.left(PREC.or, seq(
    field('left', $._expression),
    field('operator', 'or'),
    field('right', $._expression)
  ))
),
The following valid python code fails to parse:
def test():
    s = f'{{'
    x = 'test'
    return s + x
Atom incorrectly highlights x = as a string.
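To make the expected behavior concrete, this sketch runs the snippet from the report. A doubled brace inside an f-string is an escape for a single literal brace, so the string ends at the closing quote and the following lines are ordinary code. The result variable is added only for the example.

```python
def test():
    s = f'{{'      # "{{" is an escaped literal "{", not the start of a field
    x = 'test'
    return s + x


result = test()
print(result)
```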
Python 3.8 introduced a syntax to display both the unevaluated and evaluated value of an expression, which tree-sitter-python fails to parse.
The following file
print(f'{question=} {points=}')
generates the following AST
expression_statement [0, 0] - [0, 31]
call [0, 0] - [0, 31]
function: identifier [0, 0] - [0, 5]
arguments: argument_list [0, 5] - [0, 31]
string [0, 6] - [0, 30]
interpolation [0, 8] - [0, 19]
identifier [0, 9] - [0, 17]
ERROR [0, 17] - [0, 18]
interpolation [0, 20] - [0, 29]
identifier [0, 21] - [0, 27]
ERROR [0, 27] - [0, 28]
If question is "What is 1+1?" and points is 5, the above expression prints question='What is 1+1?' points=5.
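A minimal runnable version of the example, assuming Python 3.8 or later and assuming points is the integer 5 (the values are taken from the description, the variable line is added for illustration):

```python
question = "What is 1+1?"
points = 5

# "{name=}" expands to the expression text, an equals sign,
# and the repr() of the value.
line = f'{question=} {points=}'
print(line)
```

Because the = specifier uses repr() by default, the string value is shown in quotes while the integer is not.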
Example:
tail_leaves: List[Leaf] = []
I see with Cmd+Option+P that the variable annotation receives an ERROR class.
According to the Python docs, future statements are a special form of import statements.
Currently parsing the following code
from __future__ import print_function
produces
(module [0, 0] - [1, 0]
  (import_from_statement [0, 0] - [0, 37]
    (dotted_name [0, 5] - [0, 15]
      (identifier [0, 5] - [0, 15]))
    (dotted_name [0, 23] - [0, 37]
      (identifier [0, 23] - [0, 37]))))
but ideally this would emit a tree like:
(module [0, 0] - [1, 0]
  (future_import_statement [0, 0] - [0, 37]
    (dotted_name [0, 23] - [0, 37]
      (identifier [0, 23] - [0, 37]))))
If I make a docstring like
def func(a, b):
    "This is my docstring."
    return True
it will give "Th and ng" different highlighting, as if it had been written with three quotation marks:
def func(a, b):
    """This is my docstring."""
    return True
The example doesn't follow the PEP conventions for docstrings, but it should still parse correctly.
Given the expression:
assert isinstance(copy2, variables.Variable)
The parse tree generated is:
(module [0, 0] - [1, 0]
  (expression_statement [0, 0] - [0, 44]
    (comparison_operator [0, 0] - [0, 44]
      (identifier [0, 0] - [0, 6])
      (call [0, 9] - [0, 44]
        (identifier [0, 9] - [0, 17])
        (identifier [0, 18] - [0, 23])
        (attribute [0, 25] - [0, 43]
          (identifier [0, 25] - [0, 34])
          (identifier [0, 35] - [0, 43]))))))
The comparison_operator rule contains in, and it appears the isinstance call is being incorrectly labeled as a comparison_operator because it's matching in. Is that what you think is happening @maxbrunsfeld?
I am trying to get the tree for following code.
get_data_dir() + os.path.sep
Ideally I expect a binary_expression/operator at the top, with the call on the left side and the attribute on the right side.
However, the structure comes out as follows, which I think is wrong.
Can somebody tell me how I can get the correct tree structure? (I changed the language to "Javascript" and the structure came out correct, so I think there is something wrong with the Python parser.)
module [0, 0] - [1, 0])
expression_statement [0, 0] - [0, 28])
attribute [0, 0] - [0, 28])
object: attribute [0, 0] - [0, 24])
object: binary_operator [0, 0] - [0, 19])
left: call [0, 0] - [0, 14])
function: identifier [0, 0] - [0, 12])
arguments: argument_list [0, 12] - [0, 14])
right: identifier [0, 17] - [0, 19])
attribute: identifier [0, 20] - [0, 24])
attribute: identifier [0, 25] - [0, 28])
I would love it if Tree-sitter handled placeholders in regular strings:
'My name is %12s' % 'Slim Shady'
'My name is %(name)12s' % {'name': 'Slim Shady'}
'My name is {:>12}'.format('Slim Shady')
'My name is {name:>12}'.format(name='Slim Shady')
'My name is {names[0]:>12}'.format(names=['Slim Shady'])
More examples:
https://pyformat.info
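As a sketch of what placeholder-aware parsing would need to recognize, the snippet below evaluates the five examples and then uses the standard library's string.Formatter to split a str.format template into its literal text, field name, and format spec. The variable names are chosen only for this illustration.

```python
import string

results = [
    'My name is %12s' % 'Slim Shady',
    'My name is %(name)12s' % {'name': 'Slim Shady'},
    'My name is {:>12}'.format('Slim Shady'),
    'My name is {name:>12}'.format(name='Slim Shady'),
    'My name is {names[0]:>12}'.format(names=['Slim Shady']),
]

# Each tuple is (literal_text, field_name, format_spec, conversion).
fields = list(string.Formatter().parse('My name is {names[0]:>12}'))

for text in results:
    print(repr(text))
print(fields)
```

All five templates right-align the name in a 12-character column, so they produce the same string even though the placeholder syntax differs.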
For the following code:
try:
    pass
except Exception as e:
    pass
... except_clause )
identifier )
identifier )
block ) ...
It would be helpful to have an 'except_item' similarly to 'with_item', helping distinguish between the identifiers:
with_item )
value: identifier )
alias: identifier )
.. # suggestion
except_item )
value: identifier )
alias: identifier )
For the following code:
x[0] + x[0]
module [0, 0] - [1, 0])
expression_statement [0, 0] - [0, 11])
subscript [0, 0] - [0, 11])
value: binary_operator [0, 0] - [0, 8])
left: subscript [0, 0] - [0, 4])
value: identifier [0, 0] - [0, 1])
subscript: integer [0, 2] - [0, 3])
right: identifier [0, 7] - [0, 8])
subscript: integer [0, 9] - [0, 10])
Same code for JavaScript produces the correct result
program [0, 0] - [1, 0])
expression_statement [0, 0] - [0, 11])
binary_expression [0, 0] - [0, 11])
left: subscript_expression [0, 0] - [0, 4])
object: identifier [0, 0] - [0, 1])
index: number [0, 2] - [0, 3])
right: subscript_expression [0, 7] - [0, 11])
object: identifier [0, 7] - [0, 8])
index: number [0, 9] - [0, 10])
(i.e. the subscript operation should be encapsulated within the binary operator and not the other way around)
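For comparison, CPython's own ast module can be used to check the expected nesting. This sketch only inspects the standard-library AST, not the tree-sitter tree, and the variable names are illustrative.

```python
import ast

# Parse the expression from the report with CPython's parser.
expr = ast.parse("x[0] + x[0]", mode="eval").body

# The binary operator is the outermost node and both operands are subscripts.
outer = type(expr).__name__
left = type(expr.left).__name__
right = type(expr.right).__name__

print(outer, left, right)
```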
We currently generate ann, left, and right fields in this type, which doesn't provide us any information about what operator was actually used (+=, -=, etc.)
It would be helpful to add a ReturnStatement node for lambdas
In Python the power operator is right-associative, so:
x=2**2**3
print(x)
prints 256 and not 64. It also binds more tightly than unary minus, so:
x=-2**2
print(x)
prints -4 and not 4.
Best regards,
Jason
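The two claims above can be checked directly, both by value and by looking at how CPython's ast module nests the operators. This is a sketch against the standard-library AST, not the tree-sitter tree.

```python
import ast

# Right associativity: 2**2**3 is 2**(2**3), not (2**2)**3.
chained = 2**2**3
grouped_left = (2**2)**3

# Precedence over unary minus: -2**2 is -(2**2), not (-2)**2.
negated = -2**2

chained_tree = ast.parse("2**2**3", mode="eval").body
negated_tree = ast.parse("-2**2", mode="eval").body

print(chained, grouped_left, negated)
print(type(chained_tree.right).__name__, type(negated_tree).__name__)
```

In the AST the nested power sits in the right operand of the outer power, and the unary minus wraps the power expression, which is the shape a correct grammar should reproduce.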
I created this issue tree-sitter/tree-sitter-agda#2 -- I just realized that the code there is based off the code here, so you have the same issue (uint16_t gets narrowed to char):
tree-sitter-python/src/scanner.cc
Lines 100 to 106 in a954c04
I submitted a PR there to fix the wraparound due to narrowing -- would you like me to do the same here?
On a related note, I noticed this commit 250cc78 where you clamp the length of delimiter stack -- isn't that a logic error (because you're throwing away delimiters)? Is it that you expect that limit to never be reached in a practical situation, so you've written it this way to keep the code simple?
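The referenced scanner lines are not reproduced here, so the following is only an arithmetic illustration of what narrowing a 16-bit value into one byte does. The function names are hypothetical, and whether char is signed depends on the platform.

```python
def narrow_u16_to_u8(value):
    """Keep only the low 8 bits, as storing a uint16_t into one byte would."""
    return value & 0xFF


def as_signed_char(byte):
    """Reinterpret an 8-bit pattern as a signed char (two's complement)."""
    return byte - 256 if byte >= 128 else byte


# A hypothetical indent length of 300 columns no longer round-trips.
stored = narrow_u16_to_u8(300)
signed_view = as_signed_char(narrow_u16_to_u8(200))

print(stored, signed_view)
```

Any value above 255 silently wraps around, so a serialized state containing such a value would deserialize to a different one.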
Python language build error when running: apm install python-debugger language-python. Is tree-sitter incompatible with newer versions of Python?
npm ERR! node v6.9.5
npm ERR! npm v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script 'node-gyp rebuild'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the tree-sitter-python package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node-gyp rebuild
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs tree-sitter-python
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls tree-sitter-python
npm ERR! There is likely additional logging output above.
npm ERR! Please include the following file with any support request:
npm ERR! /private/var/folders/1_/3cmgrq_n4y16hz37wm7mfnkm0000gn/T/apm-install-dir-118512-42406-1tygjwq.yermqp8pvi/npm-debug.log
npm ERR! code 1
No /private/var/folders/1_/3cmgrq_n4y16hz37wm7mfnkm0000gn/T/apm-install-dir-118512-42406-1tygjwq.yermqp8pvi/npm-debug.log
file available.
Got a more verbose error when attempting to build tree-sitter-cli 'manually' using npm install tree-sitter-cli
gyp ERR! configure error
gyp ERR! stack Error: Python executable "/anaconda3/bin/python" is v3.6.5, which is not supported by gyp.
gyp ERR! stack You can pass the --python switch to point to Python >= v2.5.0 & < 3.0.0.
gyp ERR! stack at PythonFinder.failPythonVersion (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:492:19)
gyp ERR! stack at PythonFinder.<anonymous> (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/configure.js:474:14)
gyp ERR! stack at ChildProcess.exithandler (child_process.js:282:7)
gyp ERR! stack at ChildProcess.emit (events.js:182:13)
gyp ERR! stack at maybeClose (internal/child_process.js:961:16)
gyp ERR! stack at Socket.stream.socket.on (internal/child_process.js:380:11)
gyp ERR! stack at Socket.emit (events.js:182:13)
gyp ERR! stack at Pipe._handle.close [as _onclose] (net.js:595:12)
gyp ERR! System Darwin 17.5.0
gyp ERR! command "/usr/local/Cellar/node/10.4.0/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /Users/milk/node_modules/tree-sitter
gyp ERR! node -v v10.4.0
gyp ERR! node-gyp -v v3.6.2
gyp ERR! not ok
I am new to tree-sitter and my understanding is that the python grammar specified in grammar.json is based on the Python reference grammar. However, it isn't exactly 1 to 1 with the reference grammar, so I've been wondering what the differences are. I realize that the differences are reflected in grammar.json , but that file is difficult to read to get a summary view.
This leads me to ask, is it possible to provide the tree-sitter python grammar in a more readable and condensed format, similar to how the Python reference is presented (https://docs.python.org/3.8/reference/grammar.html)?
If it can't be provided, do you know if it is relatively trivial to generate, or are there some theoretical reasons it couldn't be converted into something like EBNF?
Thanks
Hi tree-sitter-python,
I was able to develop using tree-sitter-python without any issues until I upgraded my computer to OSX Big Sur. Now I'm running into the following error:
Python 3.7.7 (default, Mar 10 2020, 16:11:21)
[Clang 11.0.0 (clang-1100.0.33.12)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tree_sitter import Language
>>> Language.build_library('languages.so', ["tree-sitter-python"])
Undefined symbols for architecture x86_64:
"std::__1::__vector_base_common<true>::__throw_length_error() const", referenced from:
std::__1::vector<unsigned short, std::__1::allocator<unsigned short> >::__recommend(unsigned long) const in scanner.o
std::__1::vector<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__recommend(unsigned long) const in scanner.o
"std::logic_error::logic_error(char const*)", referenced from:
std::length_error::length_error(char const*) in scanner.o
"std::length_error::~length_error()", referenced from:
std::__1::__throw_length_error(char const*) in scanner.o
"std::terminate()", referenced from:
___clang_call_terminate in scanner.o
"typeinfo for std::length_error", referenced from:
std::__1::__throw_length_error(char const*) in scanner.o
"vtable for std::length_error", referenced from:
std::length_error::length_error(char const*) in scanner.o
NOTE: a missing vtable usually means the first non-inline virtual member function has no definition.
"operator delete(void*)", referenced from:
_tree_sitter_python_external_scanner_create in scanner.o
_tree_sitter_python_external_scanner_destroy in scanner.o
std::__1::_DeallocateCaller::__do_call(void*) in scanner.o
"operator new(unsigned long)", referenced from:
_tree_sitter_python_external_scanner_create in scanner.o
std::__1::__libcpp_allocate(unsigned long, unsigned long) in scanner.o
"___cxa_allocate_exception", referenced from:
std::__1::__throw_length_error(char const*) in scanner.o
"___cxa_begin_catch", referenced from:
___clang_call_terminate in scanner.o
"___cxa_call_unexpected", referenced from:
std::__1::__vector_base<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__destruct_at_end((anonymous namespace)::Delimiter*) in scanner.o
std::__1::allocator<(anonymous namespace)::Delimiter>::deallocate((anonymous namespace)::Delimiter*, unsigned long) in scanner.o
std::__1::__vector_base<unsigned short, std::__1::allocator<unsigned short> >::__destruct_at_end(unsigned short*) in scanner.o
std::__1::allocator<unsigned short>::deallocate(unsigned short*, unsigned long) in scanner.o
std::__1::vector<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__destruct_at_end((anonymous namespace)::Delimiter*) in scanner.o
std::__1::vector<unsigned short, std::__1::allocator<unsigned short> >::max_size() const in scanner.o
std::__1::__split_buffer<unsigned short, std::__1::allocator<unsigned short>&>::__destruct_at_end(unsigned short*, std::__1::integral_constant<bool, false>) in scanner.o
...
"___cxa_free_exception", referenced from:
std::__1::__throw_length_error(char const*) in scanner.o
"___cxa_throw", referenced from:
std::__1::__throw_length_error(char const*) in scanner.o
"___gxx_personality_v0", referenced from:
_tree_sitter_python_external_scanner_create in scanner.o
_tree_sitter_python_external_scanner_destroy in scanner.o
(anonymous namespace)::Scanner::Scanner() in scanner.o
std::__1::__vector_base<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__destruct_at_end((anonymous namespace)::Delimiter*) in scanner.o
std::__1::allocator<(anonymous namespace)::Delimiter>::deallocate((anonymous namespace)::Delimiter*, unsigned long) in scanner.o
std::__1::__vector_base<unsigned short, std::__1::allocator<unsigned short> >::__destruct_at_end(unsigned short*) in scanner.o
std::__1::allocator<unsigned short>::deallocate(unsigned short*, unsigned long) in scanner.o
...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Traceback (most recent call last):
File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/unixccompiler.py", line 204, in link
self.spawn(linker + ld_args)
File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/ccompiler.py", line 910, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'cc' failed with exit status 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "python3.7/site-packages/tree_sitter/__init__.py", line 72, in build_library
compiler.link_shared_object(object_paths, output_path)
File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/ccompiler.py", line 717, in link_shared_object
extra_preargs, extra_postargs, build_temp, target_lang)
File "/System/Volumes/Data/export/apps/python/3.7.7/lib/python3.7/distutils/unixccompiler.py", line 206, in link
raise LinkError(msg)
distutils.errors.LinkError: command 'cc' failed with exit status 1
To add to this mystery, I've been able to build tree-sitter-java, tree-sitter-javascript, and tree-sitter-scala successfully, but I also run into the error when attempting tree-sitter-cpp.
I've played around with different versions of python 3 (to see if clang was the issue) to no avail. E.g. on python 3.6:
Python 3.6.10 (default, Jan 16 2020, 13:37:48)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from tree_sitter import Language
>>> Language.build_library('languages.so', ["tree-sitter-python"])
Undefined symbols for architecture x86_64:
"std::__1::__vector_base_common<true>::__throw_length_error() const", referenced from:
std::__1::vector<unsigned short, std::__1::allocator<unsigned short> >::__recommend(unsigned long) const in scanner.o
std::__1::vector<(anonymous namespace)::Delimiter, std::__1::allocator<(anonymous namespace)::Delimiter> >::__recommend(unsigned long) const in scanner.o
...
I've also tried different versions of tree-sitter and tree-sitter-python and seen the error consistently in every variation: tree-sitter 0.19.0 and tree-sitter-python 0.19.0, as well as tree-sitter 0.2.0 and the versions of tree-sitter-python 0.13.6 - 0.17.1.
I also have a linux machine and don't have any problems over there.
Do you know where this issue is coming from and how to get around it?
Thank you!
Trying out the Python playground, I entered this code:
def foo():
And got this result:
module [0, 0] - [1, 0])
function_definition [0, 0] - [0, 10])
name: identifier [0, 4] - [0, 7])
parameters: parameters [0, 7] - [0, 9])
body: block [0, 10] - [0, 10])
Even though not having a body in a function is a syntax error in Python 3. Perhaps this is related to the error construction behavior, and I misunderstand how Tree-Sitter works, but it seems to me like there should be an error node in there.
In a function definition, the positional-only / and keyword-only * specifiers aren't recognized.
Some documentation here.
I would expect this not to parse/parse with errors:
def foo():
foo()
But AFAICT there are no ERROR nodes in the tree, nor are there any nodes for which hasError() returns true. It looks like this was intentionally added in #65 ("As is often the case, I think the most practical fix is to allow a superset of what Python really allows, and treat empty blocks as valid blocks.")
I have a higher-level question: if I want to know whether a file parses correctly, can I do that with tree-sitter? Or is that not what tree-sitter is for?
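One possible workaround, sketched here as an assumption rather than as a tree-sitter feature: when strict validity matters, CPython's own parser can be asked directly through the standard-library ast module. The helper name parses_cleanly is made up for this example, and the answer only reflects the syntax of the interpreter running the check.

```python
import ast


def parses_cleanly(source):
    """Hypothetical helper: True if CPython accepts `source` as a module."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


empty_body = "def foo():\nfoo()\n"
with_body = "def foo():\n    foo()\n"

print(parses_cleanly(empty_body), parses_cleanly(with_body))
```

This complements an error-tolerant parser: tree-sitter keeps producing a usable tree for editors, while the strict check answers the yes/no question.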
Using the following code snippet in Syntax Tree Playground
def get_vid_from_url(url):
    """
    Extracts video ID from URL.
    """
    return match(url, r'youtu\.be/([^?/]+)')
the resulting tree fails to identify Extracts video ID from URL. as a docstring.
Running npm install -g tree-sitter-python on macOS 10.14.6, I get this output:
> [email protected] install /usr/local/lib/node_modules/tree-sitter-python
> node-gyp rebuild
CC(target) Release/obj.target/tree_sitter_python_binding/src/parser.o
CXX(target) Release/obj.target/tree_sitter_python_binding/src/binding.o
../src/binding.cc:13:6: error: variable has incomplete type 'void'
void Init(Handle<Object> exports, Handle<Object> module) {
^
../src/binding.cc:13:11: error: use of undeclared identifier 'Handle'
void Init(Handle<Object> exports, Handle<Object> module) {
^
../src/binding.cc:13:18: error: 'Object' does not refer to a value
void Init(Handle<Object> exports, Handle<Object> module) {
^
/Users/chbk/Library/Caches/node-gyp/12.9.1/include/node/v8.h:3369:17: note:
declared here
class V8_EXPORT Object : public Value {
^
../src/binding.cc:13:26: error: use of undeclared identifier 'exports'
void Init(Handle<Object> exports, Handle<Object> module) {
^
../src/binding.cc:13:35: error: use of undeclared identifier 'Handle'
void Init(Handle<Object> exports, Handle<Object> module) {
^
../src/binding.cc:13:42: error: 'Object' does not refer to a value
void Init(Handle<Object> exports, Handle<Object> module) {
^
/Users/chbk/Library/Caches/node-gyp/12.9.1/include/node/v8.h:3369:17: note:
declared here
class V8_EXPORT Object : public Value {
^
../src/binding.cc:13:50: error: use of undeclared identifier 'module'
void Init(Handle<Object> exports, Handle<Object> module) {
^
../src/binding.cc:13:57: error: expected ';' after top level declarator
void Init(Handle<Object> exports, Handle<Object> module) {
^
;
8 errors generated.
make: *** [Release/obj.target/tree_sitter_python_binding/src/binding.o] Error 1
gyp ERR! build error
gyp ERR! stack Error: `make` failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:196:23)
gyp ERR! stack at ChildProcess.emit (events.js:209:13)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12)
gyp ERR! System Darwin 18.7.0
gyp ERR! command "/usr/local/Cellar/node/12.9.1/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "rebuild"
gyp ERR! cwd /usr/local/lib/node_modules/tree-sitter-python
gyp ERR! node -v v12.9.1
gyp ERR! node-gyp -v v5.0.3
gyp ERR! not ok
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] install: `node-gyp rebuild`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
Related to atom/language-python#310 (comment)
In a fresh clone, if I run npm install and then npm run build, I get the following error:
$ npm run build
> [email protected] build /tmp/tree-sitter-python
> tree-sitter generate && node-gyp build
Error: Invalid choice member: Unknown rule type: ALIAS
npm ERR! Linux 4.9.0-3-amd64
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "run" "build"
npm ERR! node v6.11.4
npm ERR! npm v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! [email protected] build: `tree-sitter generate && node-gyp build`
npm ERR! Exit status 1
[...]
Similarly, if I simply run node_modules/.bin/tree-sitter generate, I get
Error: Invalid choice member: Unknown rule type: ALIAS
If after the npm install I add npm install [email protected], then npm run build works great. Ditto with 0.7.2.
In Python 2, print() prints: ().
In Python 3, print() prints an empty line.
Python 2 treats print() as a print statement for the empty tuple.
Python 3 treats print() as a call expression with no arguments.
I may be mistaken, but I believe we have to make a fundamental decision about how the Python grammar will interpret this, because we do not differentiate between Python versions.
@maxbrunsfeld, my thought is to parse print()
according to the native AST produced by Python 3. My simple mind thinks, "Python 3 code will likely outlive Python 2 code." What would you suggest?
Current behavior on master parses this according to Python 2:
(print_statement (tuple))
I'd propose instead:
(expression_statement (call (identifier) (argument_list)))
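For reference, this is how the Python 3 side of that comparison looks in CPython's native AST. The sketch uses only the standard-library ast module, and the variable names are illustrative.

```python
import ast

# Under Python 3, print() is an expression statement wrapping a call.
stmt = ast.parse("print()").body[0]
call = stmt.value

shape = (type(stmt).__name__, type(call).__name__, call.func.id, len(call.args))
print(shape)
```

This matches the proposed (expression_statement (call (identifier) (argument_list))) shape: a statement wrapper, a call, the callee name, and an empty argument list.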
Parser cannot handle expressions like this:
value = 1234
print(f'input={value:#06x}')
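A short sketch of what the format spec in this example means at runtime; the text variable is added for illustration. The text after the colon is a format specification, not an expression, which is what the parser has to tell apart.

```python
value = 1234

# "#06x": hexadecimal with a "0x" prefix, zero-padded to a total width of 6.
text = f'input={value:#06x}'

# The same spec works through the format() built-in.
same = 'input=' + format(value, '#06x')

print(text)
```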
Unlike Tree-sitter-java, Tree-sitter-python doesn't provide scopes for built-in types.
Built-in types are: bool, bytearray, bytes, complex, dict, float, frozenset, int, list, memoryview, object, range, set, str, tuple.
The parser does not recognize "yield from" generators and classifies "from" as an identifier
Example:
yield from block_iteration(self.blocks, ghost_layers, self.dim, prefix)
is parsed as
program [0, 0] - [2, 0])
yield [0, 0] - [0, 28])
argument_list [0, 6] - [0, 28])
method_call [0, 6] - [0, 28])
method: identifier [0, 6] - [0, 10])
arguments: argument_list [0, 11] - [0, 28])
method_call [0, 11] - [0, 28])
method: identifier [0, 11] - [0, 26])
arguments: argument_list [0, 26] - [0, 28])
Further reference: http://simeonvisser.com/posts/python-3-using-yield-from-in-generators-part-1.html
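A minimal sketch of the construct in question, assuming Python 3.3 or later. The block_iteration generator here is a simplified stand-in with a made-up single-argument signature, not the function from the report.

```python
def block_iteration(blocks):
    """Hypothetical stand-in: yield each block in turn."""
    for block in blocks:
        yield block


def iterate(blocks):
    # "yield from" delegates to the inner generator; "from" is part of
    # the yield expression here, not an identifier.
    yield from block_iteration(blocks)


collected = list(iterate(["a", "b", "c"]))
print(collected)
```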
Compared to most other tree-sitter grammars, the Python one uses C++ (i.e. scanner.cc), which makes it much harder to use (because of the requirement to link the application with libstdc++). I haven't managed to get it working using the Rust bindings on stable. Is there a good reason? It contradicts one of the stated goals of tree-sitter:
Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition.
Can this be incorporated into the grammar?
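To show the rule in action, this sketch uses CPython's ast module, which applies the same "first statement is a string literal" definition. The sample function is illustrative and its body is simplified.

```python
import ast

source = (
    "def get_vid_from_url(url):\n"
    '    """\n'
    "    Extracts video ID from URL.\n"
    '    """\n'
    "    return url\n"
)

func = ast.parse(source).body[0]

# get_docstring returns the cleaned first-statement string, or None.
doc = ast.get_docstring(func)

# A string that is not the first statement is not a docstring.
late = ast.parse("def f():\n    x = 1\n    'not a docstring'\n").body[0]
late_doc = ast.get_docstring(late)

print(repr(doc), late_doc)
```

Since the distinction depends only on position, it could in principle be expressed either in the grammar or in a query over the existing tree.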
Python v3 supports a wide variety of Unicode character classes in identifiers. The grammar definition should support the same sets as specified.
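As a quick way to probe which names Python 3 accepts, str.isidentifier applies the interpreter's own Unicode identifier rules. The sample names below are chosen only for illustration.

```python
candidates = ["name", "café", "π", "名前", "1abc", "a-b"]

# True where the interpreter would accept the text as an identifier.
accepted = {text: text.isidentifier() for text in candidates}

# Non-ASCII identifiers work in real code as well.
namespace = {}
exec("π = 3.14159", namespace)

print(accepted)
```

Note that isidentifier does not exclude keywords, so a complete check would combine it with keyword.iskeyword.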
In grammar.js, expression is a choice of expression_statement: https://github.com/tree-sitter/tree-sitter-python/blob/master/grammar.js#L183-L184
In node-types.json, expression is a child of expression_statement: https://github.com/tree-sitter/tree-sitter-python/blob/master/src/node-types.json#L1164-L1182
Is this correct? Since expression is a choice of expression_statement, I would have expected expression to be a subtype of expression_statement in node-types.json, not a child.
(I'm interested in this because I want to identify nodes that are valid (syntactically correct) Python by themselves so that I can send them to a REPL. My method for doing this is to check whether they are subtypes of _compound_statement or _simple_statement. But this breaks down for expression_statement, which has children instead of subtypes.)
Thank you for making tree-sitter-python!
EDIT: Fixed link.