
python-driver's People

Contributors

abeaumont, bzz, creachadair, dennwc, juanjux, lwsanty, mcarmonaa, mcuadros, smacker, smola, tsolakoua

python-driver's Issues

PRIMITIVE role meaning for Python

The documentation says:

    // Primitive is a language builtin.
    Primitive

Python has a lot of built-in names, but since any built-in function can be overridden, it is not possible to tell from the name alone whether it refers to a built-in.

Can you please clarify what the Primitive role means for Python?
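
To illustrate why a name alone cannot settle this, here is a minimal sketch using only the standard builtins module. The helper name looks_like_builtin is made up for this example; it checks the name only and knows nothing about scoping, which is exactly the limitation described above.

```python
import builtins


def looks_like_builtin(name):
    """Return True if `name` matches an attribute of the builtins module."""
    return hasattr(builtins, name)


def demo():
    # Rebinding a builtin name locally is legal, so the name check still
    # says "builtin" although `len` now refers to our own function.
    len = lambda _: -1
    return looks_like_builtin("len"), len([1, 2, 3])
```

A driver that only sees the syntax tree is in the same position as looks_like_builtin: it can match names, but it cannot know what they are bound to at run time.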

There is no simple identifier of module when I import submodule

I run this code:

from bblfsh.client import BblfshClient
bc = BblfshClient("0.0.0.0:9432")
res = bc.parse("example.py", language="Python")
print(res)

where example.py is

import matplotlib.pyplot

Here is full output:

uast {
  internal_type: "Module"
  children {
    internal_type: "Import"
    properties {
      key: "internalRole"
      value: "body"
    }
    children {
      internal_type: "Import.names"
      properties {
        key: "promotedPropertyList"
        value: "true"
      }
      children {
        internal_type: "alias"
        properties {
          key: "asname"
          value: "<nil>"
        }
        token: "matplotlib.pyplot"
        start_position {
          line: 1
          col: 1
        }
        roles: IMPORT_PATH
        roles: SIMPLE_IDENTIFIER
      }
      roles: 142
    }
    start_position {
      line: 1
      col: 1
    }
    roles: IMPORT_DECLARATION
    roles: STATEMENT
  }
  start_position {
    line: 1
    col: 1
  }
  roles: FILE
}

Why isn't matplotlib reported as a simple identifier on its own?
Is this correct?
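
For comparison, Python's own ast module also stores the dotted path as one string on the alias node, which is probably where the single token comes from. The sketch below (the helper name import_path_parts is hypothetical) shows how the components could be split out on the consumer side; it does not reflect how the driver is implemented.

```python
import ast


def import_path_parts(source):
    """Yield (full_dotted_name, components) for every plain `import` alias."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # alias.name is a plain string such as "matplotlib.pyplot"
                yield alias.name, alias.name.split(".")


parts = list(import_path_parts("import matplotlib.pyplot"))
```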

Wrong Role assignment for ""

If you extract UAST for

""""

You get the following tree:

#  Token  Internal Role  Roles Tree                 
                                                    
   ||     Module         FILE                       
1  ||     Expr           ┣ EXPRESSION               
1  ||     Str            ┗ ┗ BYTE, TUPLE, EXPRESSION

What I expect to see is:

#  Token  Internal Role  Roles Tree                 
                                                    
   ||     Module         FILE                       
1  ||     Expr           ┣ EXPRESSION               
1  ||     Str            ┗ ┗ ~BYTE~, ~TUPLE~, EXPRESSION, +STRING+, +VALUE+ 

Legend:

+ROLE+ -- add Role
~ROLE~ -- remove Role
?ROLE? -- maybe add/remove Role

Gist to generate UAST Roles visualization: https://gist.github.com/zurk/d314d67d9aac8843d3776c82cd738b40

Wrong Role assignment for {1:2}

If you extract UAST for

{1:2}

You get the following tree:

#  Token  Internal Role  Roles Tree                                     
                                                                        
   ||     Module         FILE                                           
1  ||     Expr           ┣ EXPRESSION                                   
1  ||     Dict           ┃ ┣ BYTE, NULL, EXPRESSION                     
1  |1|    Num            ┃ ┃ ┣ BYTE, REGEXP, EXPRESSION, NULL, PRIMITIVE
1  |2|    Num            ┗ ┗ ┗ BYTE, REGEXP, EXPRESSION, NULL, VALUE    

What I expect to see is:

#  Token  Internal Role  Roles Tree                                     
                                                                        
   ||     Module         FILE                                           
1  ||     Expr           ┣ EXPRESSION                                   
1  ||     Dict           ┃ ┣ ~BYTE~, ~NULL~, EXPRESSION, ?TYPE?                     
1  |1|    Num            ┃ ┃ ┣ ~BYTE~, ~REGEXP~, EXPRESSION, ~NULL~, ~PRIMITIVE~, +NUMBER+, +VALUE+
1  |2|    Num            ┗ ┗ ┗ ~BYTE~, ~REGEXP~, EXPRESSION, ~NULL~, +NUMBER+, +VALUE+

Legend:

+ROLE+ -- add Role
~ROLE~ -- remove Role
?ROLE? -- maybe add/remove Role

Gist to generate UAST Roles visualization: https://gist.github.com/zurk/d314d67d9aac8843d3776c82cd738b40

Snippet produces wrong offset

This code:

class Repo2nBOW(Repo2Base):
    @property
    def id2vec(self):
        return self._id2vec

Produces an offset of 0 for _id2vec.
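
As a reference for what a non-zero offset would look like, here is a small sketch that converts a 1-based (line, col) pair into a 0-based offset. The function name position_to_offset is an assumption for illustration, and it counts characters; a byte offset would need the encoded source instead.

```python
def position_to_offset(source, line, col):
    """Convert a 1-based (line, col) pair to a 0-based character offset."""
    lines = source.splitlines(keepends=True)
    return sum(len(text) for text in lines[: line - 1]) + (col - 1)


SNIPPET = (
    "class Repo2nBOW(Repo2Base):\n"
    "    @property\n"
    "    def id2vec(self):\n"
    "        return self._id2vec\n"
)

# Locate `_id2vec` on line 4 and convert its position to an offset.
col = SNIPPET.splitlines()[3].index("_id2vec") + 1
offset = position_to_offset(SNIPPET, 4, col)
```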

Parsing failed due to Exception: Could not determine Python version

Using the latest https://gist.github.com/bzz/c0c3dbcab5fecbe48e22167e2ad78595, UAST parsing fails on what seems to be https://github.com/damoeb/kalipo/blob/master/kalipo-ir/harvester/spiders/heise_spider.py

Server log

time="2017-06-21T13:44:13Z" level=debug msg="sending ParseUAST request: Filename:"kalipo-ir/harvester/spiders/heise_spider.py" Language:"python" Content:"import scrapy\nfrom scrapy.contrib.spiders import CrawlSpider, Rule\nfrom scrapy.contrib.linkextractors import LinkExtractor\nfrom scrapy.selector import Selector\n\nfrom harvester.items import Comment\nimport time\nimport calendar\nimport re\n\nclass HeiseSpider(CrawlSpider):\n    name = \"heise\"\n    allowed_domains = [\"www.heise.de\"]\n    start_urls = [\n            \"http://www.heise.de/forum/Telepolis/Kommentare/Ohne-Vorratsdatenspeicherung-sterben-vermisste-Kinder-und-Suizidale/forum-242979/\"\n    ]\n\n    rules = (\n        #Rule(LinkExtractor(allow=('/tp/foren/[^/]+/forum-[0-9]+/list'))),\n\tRule(LinkExtractor(allow=('/posting-[0-9]+/show')), callback='parse_item')\n    )\n\n    def clean_str(self, val):\n\treturn val.replace(u'\\xa0', u' ').strip()\n\n    def to_str(self, arr):\n\treturn self.clean_str(''.join(arr))\n\n    def parse_date(self, val):\n\tgrps = re.search('[0-9]+\\. ([A-Za-z]+) [0-9]{4} [0-9]{2}:[0-9]{2}', val)\n\n\tmnth = grps.group(1)\n\n   \tmonths = ['Januar', 'Februar', 'M\\u00e4rz', 'April', 'Mai', 'Juni', 'Juli', 'August', 'September', 'Oktober', 'November', 'Dezember']\n\tfor index, item in enumerate(months):\n\t   if item.lower() == mnth.lower():\n\t      val = val.replace(mnth, str(index))\n\t      break\n\n\treturn calendar.timegm(time.strptime(val, \"%d. 
%m %Y %H:%M\"))\n\n    def parse_item(self, response):\n        sel = Selector(response)\n\n\n\tisRoot = len(response.xpath(\"//ul[@class='forum_navi'][2]/li\")) == 6\n\n\tif !isRoot:\n\t   # find parent\n\t   parent = response.xpath(\"//span[@class='active_post']/../../../parent::ul[@class='nextlevel_line']/preceding-sibling::div[@class='hover_line']\")\n\t   # get link\n\t   link = parent.xpath(\".//div[@class='thread_title']/a\")\n\t   # extract parent id from href\n\n\n\titem = Comment()\n\titem['text'] = self.to_str(sel.xpath(\"//h3[@class='posting_subject']/text()\").extract()) + self.to_str(sel.xpath(\"//p[@class='posting_text']/text()\").extract())\n\titem['url'] = response.url\n\titem['parent'] = 'unknown'\n\titem['level'] = 0\n\titem['thread'] = re.search('forum-([0-9]+)', response.url).group(1)\n\titem['author'] = self.to_str(sel.xpath(\"//div[@class='user_info']/i//text()\").extract())\n\titem['date'] = self.parse_date(self.to_str(response.xpath(\"//div[@class='posting_date']/text()\").extract()))\n        return item\n\n" "
time="2017-06-21T13:35:14Z" level=error msg="driver bblfsh/python-driver:latest (01BK5BZ6N1S7MZBCSFPADDBFSW) stderr: ERROR:root:Filepath: , Errors: ['Traceback (most recent call last):\n  File "/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py", line 151, in process_request\n    raise Exception(\'Could not determine Python version\')\nException: Could not determine Python version\n']"

Client logs

Read kalipo-ir/harvester/spiders/heise_spider.py, 2247 bytes	Parsing file:'kalipo-ir/harvester/spiders/heise_spider.py'

Panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1186147]

goroutine 1 [running]:
github.com/bblfsh/sdk/uast.(*Node).ProtoSize(0x0, 0xc4201f9f50)
	/go/src/github.com/bblfsh/sdk/uast/generated.pb.go:510 +0x37
github.com/bblfsh/sdk/uast.(*Node).Marshal(0x0, 0x1c18070, 0xc4200102c0, 0xc4201f9f20, 0x0, 0x0)
	/go/src/github.com/bblfsh/sdk/uast/generated.pb.go:352 +0x2f
main.main.func1(0xc4201c4e60, 0xc4201c4e60, 0x0)
	/go/src/github.com/src-d/analysis-pipeline/juanjo/pyFromGit2ast2pb.go:69 +0x4d5
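
One thing visible in the logged Content is the line "if !isRoot:", which is not valid syntax in Python 2 or Python 3. This may be why no version could be determined, although the log itself does not confirm it. A minimal sketch for checking a source string against the running interpreter (the helper name syntax_error_of is hypothetical and is not part of the driver):

```python
def syntax_error_of(source, filename="<source>"):
    """Return the SyntaxError raised when compiling `source`, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err
    return None


# `!` is not a Python operator, so this source cannot be compiled.
err = syntax_error_of("isRoot = False\nif !isRoot:\n    pass\n")
```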

Node positions not being according with specs

Neither the start_position.offset field nor end_position is returned, although both are documented at https://doc.bblf.sh/uast/specification.html
All tokens on the same line have the same start_position.

[...] it is guaranteed that nodes in a UAST either have no position attached or they have a position with valid offset, line and col. [...]
End position, if present in a token node, is the position of the last character of the token in the original source code.


example:

import sys
sys.stdout.write("Hello world!\n")

Client: client-python
The same position is returned for the sys and stdout nodes on the second line:

start_position {
    line: 2
    col: 1
}
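
For reference, the standard tokenize module reports a distinct column for each token on a line. The sketch below is only an illustration of the positions one would expect for the example above; the helper name name_positions is an assumption, and columns are shifted to 1-based to match the UAST convention.

```python
import io
import tokenize


def name_positions(source):
    """Return (token, line, col) for every NAME token, both 1-based."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            line, col0 = tok.start  # tokenize columns are 0-based
            result.append((tok.string, line, col0 + 1))
    return result


positions = name_positions('import sys\nsys.stdout.write("Hello world!\\n")\n')
```

On the second line, sys, stdout and write each get their own column here, which is what the specification quoted above seems to require.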

A child process hangs if I import bblfsh

If you run this code

import os
import sys
import time

import bblfsh

def func(n):
    print('Sleep during {} sec...'.format(n))
    time.sleep(n)
    return n

if __name__ == '__main__':
    for n in range(1, 3):
        pid = os.fork()
        if pid == 0:
            # if you import bblfsh here instead, everything works fine
            res = func(n)
            print('Sleep in is Done!')
            sys.exit(0)
        else:
            print('os.waitpid is waiting for {}...'.format(pid))
            _, status = os.waitpid(pid, 0)
            print('os.waitpid is fine!')
    print('ok, terminate.')

You will see that the first child process hangs during sys.exit(0).
If you comment out import bblfsh, everything works fine.
You can also replace import bblfsh with

from bblfsh.github.com.bblfsh.sdk.protocol.generated_pb2 import ParseResponse

and get the same effect. And here is the main point: the problem is not exactly in bblfsh, but in grpc.

It seems that ParseResponse should be imported in the child process, because grpc starts threads during import, and if you call fork after that, the child process hangs on exit.

And in grpc/grpc#7951 (comment) you can find that fork is not supported.

So I am writing this here mostly as an FYI; I think it is important to know.

It may also not be a big deal to move the ParseRequest and ProtocolServiceStub imports in /bblfsh/client.py to the place where they are actually used. That could help avoid this kind of problem. Or we could find a more elegant way. What do you think?
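
A rough sketch of the deferred-import idea, under the assumption that delaying the import until first use is enough to keep it out of the parent process. The class name LazyModule is hypothetical, and the standard colorsys module stands in for the generated gRPC module so that the example runs anywhere; it does not by itself prove that the fork hang goes away.

```python
import importlib


class LazyModule:
    """Import a module on first use instead of at program start."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    def get(self):
        # The import (and any threads it starts) happens here, in whichever
        # process first calls get(), not when the enclosing file is imported.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module


lazy = LazyModule("colorsys")  # stand-in for the generated protocol module
```

With this pattern a child created by os.fork() would call lazy.get() itself, so the import runs after the fork rather than before it.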

P.S.: Tested on macOS and Ubuntu.

What the difference between IDENTIFIER and NAME Roles for python?

The documentation says:

// Identifier is any form of identifier, used for variable names, functions, packages, etc.
    Identifier
...
// Name is an identifier used to reference a value.
    Name

But in Python all identifiers are references.
So what is the difference between these two roles in Python?
If there is no difference, we should add the NAME role to each IDENTIFIER node.

QUALIFIED_IDENTIFIER is not SIMPLE_IDENTIFIER and duplication of CALL_CALLEE role

I found that a QUALIFIED_IDENTIFIER is not a SIMPLE_IDENTIFIER, but @vmarkovtsev says that it is supposed to be.
Also, I found that the CALL_CALLEE role is duplicated.

How to reproduce:

from bblfsh.client import BblfshClient

filepath = "./matplotlib_example.py"
bc = BblfshClient("0.0.0.0:9432")
res = bc.parse(filepath, language='Python')
print(res)

matplotlib_example.py:

from matplotlib import pyplot as plt
plt.figure()

Output (lines 76-83):

       token: "figure"
        start_position {
          line: 2
          col: 1
        }
        roles: CALL_CALLEE
        roles: CALL_CALLEE
        roles: QUALIFIED_IDENTIFIER

The problem is in the figure token. It means that function names are not taken into account in our machine learning analysis.

P.S.: Moved from bblfsh/bblfshd#82 because it is a Python-specific problem.

SIMPLE_IDENTIFIER is not assigned to import symbols and aliases

Hi,
I was playing with UASTs and found what I think is a bug.
Minimal reproducible example:

from os import path
import sys

import numpy as np

All SIMPLE_IDENTIFIERs: {'path': 1, 'numpy': 1, 'sys': 1}. As we can see, np and os are missing.

Some helper code to debug:

from collections import Counter

from ast2vec.bblfsh_roles import SIMPLE_IDENTIFIER
from ast2vec.repo2.base import Repo2Base


class Repo2IdModel:
    NAME = "Repo2IdModel"


class Repo2IdCounter(Repo2Base):
    """
    Print all SIMPLE_IDENTIFIERs (and counters) from repository
    """
    MODEL_CLASS = Repo2IdModel

    def collect_id_cnt(self, root, id_cnt):
        for ch in root.children:
            if SIMPLE_IDENTIFIER in ch.roles:
                id_cnt[ch.token] += 1
            self.collect_id_cnt(ch, id_cnt)

    def convert_uasts(self, file_uast_generator):
        for file_uast in file_uast_generator:
            print("-" * 20 + " " + str(file_uast.filepath))
            id_cnt = Counter()
            self.collect_id_cnt(file_uast.response.uast, id_cnt)
            print(id_cnt)


if __name__ == "__main__":
    repo = "test/imports/"
    c2v = Repo2IdCounter(linguist="path/to/enry", bblfsh_endpoint="0.0.0.0:9432")
    c2v.convert_repository(repo)
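
For comparison, a standalone sketch that does not need ast2vec or a running bblfsh server: it uses Python's own ast module to list every simple name that appears in import statements, including the module of a from-import and any alias. The function name import_identifiers is made up for this example, and the result is only what one might expect SIMPLE_IDENTIFIER to cover, not what the driver is specified to return.

```python
import ast
from collections import Counter


def import_identifiers(source):
    """Count every simple name that appears in import statements."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            counts.update(node.module.split("."))
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":  # skip `from x import *`
                    counts.update(alias.name.split("."))
                if alias.asname:
                    counts[alias.asname] += 1
    return counts


found = import_identifiers(
    "from os import path\nimport sys\n\nimport numpy as np\n"
)
```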

Try to automatically fix files with mixed tabs

This is proving to be a common error causing the Python AST module not to parse the source file, but it should be easily fixable using the reindent.py script/module included with the Python standard distribution.
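
As a rough illustration of the kind of fix meant here (this is not reindent.py itself), the sketch below expands tabs in leading whitespace only. It assumes tab stops of 8 columns, which is how Python 2 interpreted them, and the function name expand_leading_tabs is hypothetical. Note that it would also rewrite tabs at the start of continuation lines inside multi-line strings.

```python
def expand_leading_tabs(source, tabsize=8):
    """Expand tabs in each line's indentation, leaving the rest untouched."""
    fixed = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        fixed.append(indent.expandtabs(tabsize) + stripped)
    return "".join(fixed)


# Eight spaces on one line and a tab on the next: accepted by Python 2,
# rejected by Python 3 as inconsistent indentation.
MIXED = "def f():\n        x = 1\n\treturn x\n"
FIXED = expand_leading_tabs(MIXED)
```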

IMPORT node problems: duplication, roles, positions

Hi,
I tried the new version of python-driver and found several errors.
Code in test.py:

from collections import defaultdict

Then launch the bblfsh client:

egor@egor-sourced:~/workspace/uast_playground$ python3 -m bblfsh -f test.py 
uast {
  internal_type: "Module"
  children {
    internal_type: "ImportFrom"
    properties {
      key: "internalRole"
      value: "body"
    }
    properties {
      key: "level"
      value: "0"
    }
    children {
      internal_type: "alias"
      properties {
        key: "asname"
        value: "<nil>"
      }
      properties {
        key: "internalRole"
        value: "names"
      }
      token: "defaultdict"
      start_position {
        offset: 24
        line: 1
        col: 25
      }
      end_position {
        offset: 34
        line: 1
        col: 35
      }
      roles: IMPORT_PATH
      roles: SIMPLE_IDENTIFIER
    }
    children {
      internal_type: "ImportFrom.module"
      properties {
        key: "promotedPropertyString"
        value: "true"
      }
      token: "collections"
      roles: IMPORT_PATH
      roles: SIMPLE_IDENTIFIER
    }
    token: "collections"
    start_position {
      offset: 5
      line: 1
      col: 6
    }
    end_position {
      offset: 34
      line: 1
      col: 35
    }
    roles: IMPORT_DECLARATION
    roles: STATEMENT
  }
  start_position {
    line: 1
    col: 1
  }
  end_position {
    offset: 34
    line: 1
    col: 35
  }
  roles: FILE
}

So the token collections appears twice (I think that is wrong):

  1. Here there is no start and end position:
internal_type: "ImportFrom.module"
properties {
  key: "promotedPropertyString"
  value: "true"
}
token: "collections"
roles: IMPORT_PATH
roles: SIMPLE_IDENTIFIER
  2. Here there is no SIMPLE_IDENTIFIER role:
internal_type: "ImportFrom"
properties {
  key: "internalRole"
  value: "body"
}
properties {
  key: "level"
  value: "0"
}
children {
  internal_type: "alias"
  properties {
    key: "asname"
    value: "<nil>"
  }
  properties {
    key: "internalRole"
    value: "names"
  }
  token: "defaultdict"
  start_position {
    offset: 24
    line: 1
    col: 25
  }
  end_position {
    offset: 34
    line: 1
    col: 35
  }
  roles: IMPORT_PATH
  roles: SIMPLE_IDENTIFIER
}
children {
  internal_type: "ImportFrom.module"
  properties {
    key: "promotedPropertyString"
    value: "true"
  }
  token: "collections"
  roles: IMPORT_PATH
  roles: SIMPLE_IDENTIFIER
}
token: "collections"
start_position {
  offset: 5
  line: 1
  col: 6
}
end_position {
  offset: 34
  line: 1
  col: 35
}
roles: IMPORT_DECLARATION
roles: STATEMENT

Wrong Role assignment for t = set(); t = {}

If you extract UAST for

t = set()
t = {0,1}

You get the following tree:

#  Token  Internal Role  Roles Tree                                      
                                                                         
   ||     Module         FILE                                            
1  ||     Assign         ┣ BINARY, THIS, EXPRESSION                      
1  |t|    Name           ┃ ┣ LEFT, IDENTIFIER, EXPRESSION                
1  ||     Call           ┃ ┣ FUNCTION, CALLEE, EXPRESSION, RIGHT         
1  |set|  Name           ┃ ┗ ┗ CALLEE, POSITIONAL, IDENTIFIER, EXPRESSION
2  ||     Assign         ┣ BINARY, THIS, EXPRESSION                      
2  |t|    Name           ┃ ┣ LEFT, IDENTIFIER, EXPRESSION                
2  ||     Set            ┃ ┣ BYTE, STRING, EXPRESSION, RIGHT             
2  |0|    Num            ┃ ┃ ┣ BYTE, REGEXP, EXPRESSION                  
2  |1|    Num            ┗ ┗ ┗ BYTE, REGEXP, EXPRESSION  

What I expect to see is:

#  Token  Internal Role  Roles Tree                                      
                                                                         
   ||     Module         FILE                                            
1  ||     Assign         ┣ BINARY, ~THIS~, EXPRESSION, +Assignment+                      
1  |t|    Name           ┃ ┣ LEFT, IDENTIFIER, EXPRESSION                
1  ||     Call           ┃ ┣ FUNCTION, ~CALLEE~, EXPRESSION, RIGHT, +CALL+         
1  |set|  Name           ┃ ┗ ┗ CALLEE, ~POSITIONAL~, IDENTIFIER, EXPRESSION, +Name+
2  ||     Assign         ┣ BINARY, ~THIS~, EXPRESSION, +Assignment+                      
2  |t|    Name           ┃ ┣ LEFT, IDENTIFIER, EXPRESSION                
2  ||     Set            ┃ ┣ ~BYTE~, ~STRING~, EXPRESSION, RIGHT, +SET+, ?TYPE?              
2  |0|    Num            ┃ ┃ ┣ ~BYTE~, ~REGEXP~, EXPRESSION, +NUMBER+, +VALUE+                  
2  |1|    Num            ┗ ┗ ┗ ~BYTE~, ~REGEXP~, EXPRESSION, +NUMBER+, +VALUE+

Legend:

+ROLE+ -- add Role
~ROLE~ -- remove Role
?ROLE? -- maybe add/remove Role

Gist to generate UAST Roles visualization: https://gist.github.com/zurk/d314d67d9aac8843d3776c82cd738b40

Exception deserializing message in BblfshClient.parse()

I try to run the bblfsh Python client and it fails (change the endpoint if you need to):

from bblfsh import BblfshClient
BblfshClient("172.17.0.1:9432").parse('./TickType.py', language='Python', )

Here is an example file: TickType.py.zip

The output I get:

ERROR:root:Exception deserializing message!
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 129, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "bug_exapmle.py", line 3, in <module>
    BblfshClient("172.17.0.1:9432").parse('./temp/TickType.py', language='Python', )
  File "/usr/local/lib/python3.5/dist-packages/bblfsh/client.py", line 58, in parse
    response = self._stub.Parse(request, timeout=timeout)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>

The bblfsh server does not fail on this example if I run it directly.
It seems to be a problem in the grpc module...

status: ERROR errors: "column out of bounds: 0 [1, 1]"

Hi,
I tried to extract the UAST from a Python file and got an unexpected result: the extraction output starts with an error message.

The content of exp.py:

# Here you can find the code of /app/exp.py:
# find x in range (-2, 2) that should minimize math.sin function
import math
from hyperopt import fmin, tpe, hp
from hyperopt.mongoexp import MongoTrials

mongodb = "mongodb"
db_port = "27017"
db_name = "ml_exp"
exp_key = "exp1"

trials = MongoTrials("mongo://%s:%s/%s/jobs" % (mongodb, db_port, db_name), exp_key="exp1")
best = fmin(math.sin, hp.uniform("x", -2, 2), trials=trials, algo=tpe.suggest, max_evals=10)
print("Result:", best)

and the result of extraction:

status: ERROR
errors: "column out of bounds: 0 [1, 1]"
uast {
  internal_type: "Module"
...

Full UAST is attached:
exp.uast.txt

PS:
Is it correct that roles are listed on several lines?

      roles: STRING_LITERAL
      roles: EXPRESSION
      roles: ASSIGNMENT_VALUE

Wrong Role assignment for f()

If you extract UAST for

f()

You get the following tree:

#  Token  Internal Role  Roles Tree                                      
                                                                         
   ||     Module         FILE                                            
1  ||     Expr           ┣ EXPRESSION                                    
1  ||     Call           ┃ ┣ FUNCTION, CALLEE, EXPRESSION                
1  |f|    Name           ┗ ┗ ┗ CALLEE, POSITIONAL, IDENTIFIER, EXPRESSION

What I expect to see is:

#  Token  Internal Role  Roles Tree                                      
                                                                         
   ||     Module         FILE                                            
1  ||     Expr           ┣ EXPRESSION                                    
1  ||     Call           ┃ ┣ FUNCTION, ~CALLEE~, EXPRESSION, +CALL+                
1  |f|    Name           ┗ ┗ ┗ CALLEE, ~POSITIONAL~, IDENTIFIER, EXPRESSION,+Name+

Legend:

+ROLE+ -- add Role
~ROLE~ -- remove Role
?ROLE? -- maybe add/remove Role

Gist to generate UAST Roles visualization: https://gist.github.com/zurk/d314d67d9aac8843d3776c82cd738b40

Wrong Role assignment for (0,)

If you extract UAST for

(0,)

You get the following tree:

#  Token  Internal Role  Roles Tree                    
                                                       
   ||     Module         FILE                          
1  ||     Expr           ┣ EXPRESSION                  
1  ||     Tuple          ┃ ┣ BYTE, TYPE, EXPRESSION    
1  |0|    Num            ┗ ┗ ┗ BYTE, REGEXP, EXPRESSION

What I expect to see is:

#  Token  Internal Role  Roles Tree                    
                                                       
   ||     Module         FILE                          
1  ||     Expr           ┣ EXPRESSION                  
1  ||     Tuple          ┃ ┣ ~BYTE~, ?TYPE?, EXPRESSION, +TUPLE+    
1  |0|    Num            ┗ ┗ ┗ ~BYTE~, ~REGEXP~, EXPRESSION, +NUMBER+, +VALUE+

Legend:

+ROLE+ -- add Role
~ROLE~ -- remove Role
?ROLE? -- maybe add/remove Role

Gist to generate UAST Roles visualization: https://gist.github.com/zurk/d314d67d9aac8843d3776c82cd738b40

Some nodes have the Column location at 0

Some nodes (probably the ones originating from a property in the Python native AST, without location info) have the Column at 0, which is wrong. They should take the Col from their parent node in the case of properties.

This blocks the fixing of the offset with the TransformationUASTParser.
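
A minimal sketch of the proposed rule, using plain dictionaries as a stand-in for UAST nodes. The "col" and "children" keys and the function name inherit_columns are assumptions for illustration, not the driver's actual data model.

```python
def inherit_columns(node, parent_col=1):
    """Replace a missing column (0) with the parent's column, recursively."""
    if node.get("col", 0) == 0:
        node["col"] = parent_col
    for child in node.get("children", []):
        inherit_columns(child, node["col"])
    return node


tree = {
    "col": 6,
    "children": [
        {"col": 0, "children": [{"col": 0}]},  # property nodes without a column
        {"col": 25},
    ],
}
inherit_columns(tree)
```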

Missing QUALIFIED role

If you get the UAST for

import lib1
lib1.lib2.lib3.var = None

You will get the following list of identifiers with roles:

<lib1>:         ['IMPORT', 'PATHNAME', 'IDENTIFIER']
<var>:          ['IDENTIFIER', 'EXPRESSION', 'LEFT']
<lib3>:         ['IDENTIFIER', 'EXPRESSION']
<lib2>:         ['IDENTIFIER', 'EXPRESSION']
<lib1>:         ['IDENTIFIER', 'QUALIFIED', 'IDENTIFIER', 'EXPRESSION']

The QUALIFIED role is missing from the lib2 and lib3 identifiers.
You can use the code from issue #94 to reproduce the result.
This gist: https://gist.github.com/zurk/8f7dd974347925ae62c31d9441491613
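
To make explicit which identifiers belong to the qualified name, here is a sketch based on Python's own ast module. The helper name qualified_parts is hypothetical; it walks an Attribute chain from the outside in and returns the components in source order.

```python
import ast


def qualified_parts(node):
    """Return the dotted components of a Name/Attribute chain in source order."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return parts[::-1]


target = ast.parse("lib1.lib2.lib3.var = None").body[0].targets[0]
chain = qualified_parts(target)
```

All four components come from the same chain, so under the reading of this issue each of them would be a candidate for the QUALIFIED role.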

SIMPLE_IDENTIFIER is not assigned to argument names

Hi!
SIMPLE_IDENTIFIER is not assigned when it should be.
A better example:

def some_funct(some_arg):
  pass

some_funct(some_arg=some_val)

some_funct(some_arg=some_arg)


PS: there is also some odd behavior when you pass a variable with the same name as the argument.
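
A possible explanation, offered as a guess: in Python's native AST the keyword name is a plain string attribute (keyword.arg), not a node of its own, so it carries no position or role by default. The sketch below (helper name keyword_argument_names is made up) extracts those names with the standard ast module.

```python
import ast


def keyword_argument_names(source):
    """Return the keyword names used in every call found in `source`."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # kw.arg is a str; it is None only for `**kwargs` unpacking
            names.extend(kw.arg for kw in node.keywords if kw.arg is not None)
    return names


kw_names = keyword_argument_names(
    "some_funct(some_arg=some_val)\nsome_funct(some_arg=some_arg)\n"
)
```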

IfCondition Role not asigned in some cases

When an If condition does not have the usual form of a binary expression, the IfCondition role does not appear in the UAST. For example, for input like this:

if True:
    print(True)

or

if functionCallThatReturnsABoolean():
    print(True)

the UAST roles that we expect are something like this

If{
           IfCondition,
           IfBody
}

and we get this

If{
           IfBody
           expression/functionCall
}
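
For reference, Python's native AST always stores the condition in the test field of the If node, whatever kind of expression it is. The sketch below (helper name if_condition_types is an assumption) lists the node type of each condition; it only illustrates where the IfCondition role could be attached and is not taken from the driver.

```python
import ast


def if_condition_types(source):
    """Return the AST node type name of each `if` statement's condition."""
    return [
        type(node.test).__name__
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.If)
    ]


kinds = if_condition_types(
    "if True:\n    print(True)\n"
    "if f():\n    print(True)\n"
    "if a < b:\n    print(True)\n"
)
```

The literal, the call and the comparison all end up in the same test field, so the role does not need to depend on the condition being a binary expression.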

range(10) wrong role assignment

If you extract the UAST for range(10), you get the following tree:

line#   token       Roles

        ||          FILE                                                                                                     
1       ||          ┣ EXPRESSION                                                                                             
1       ||          ┃ ┣ FUNCTION, CALLEE, EXPRESSION                                                                         
1       |10|        ┃ ┃ ┣ EXPRESSION, FUNCTION, DECLARATION, ARGUMENT, NAME, IDENTIFIER, CALLEE, ARGUMENT, NOOP
1       |range|     ┗ ┗ ┗ CALLEE, POSITIONAL, IDENTIFIER, EXPRESSION         

What I expect to see is:

line#   token       Roles

        ||          FILE                                                                                                     
1       ||          ┣ EXPRESSION                                                                                             
1       ||          ┃ ┣ FUNCTION, CALLEE, EXPRESSION                                                                         
1       |10|        ┃ ┃ ┣ NUMBER, EXPRESSION, ARGUMENT, NAME, IDENTIFIER, ARGUMENT, POSITIONAL, VALUE
1       |range|     ┗ ┗ ┗ CALLEE, IDENTIFIER 

I am not sure about the EXPRESSION role. It is too common.

Also, I ran another experiment and found that if you parse just 10 you get:

  ||   FILE                        
1 ||   ┣ EXPRESSION                
1 |10| ┗ ┗ BYTE, REGEXP, EXPRESSION

What I expect to see is:

  ||   FILE                        
1 ||   ┣ EXPRESSION                
1 |10| ┗ ┗ NUMBER, EXPRESSION, VALUE

BTW, what about the MODULE role? Each file in Python is considered a module, or am I wrong?

Add endposition to nodes

From:

#30

Node end positions are not mandatory in the current spec if the native driver doesn't provide them, as happens with the Python driver, but it would be nice to have them in this driver.

CC @dpordomingo

Bblfsh fails to extract UAST from file

I get a strange error when trying to get the UAST from this file: oo.py.zip

I run this code

from bblfsh.client import BblfshClient
bc = BblfshClient("0.0.0.0:9432")
res = bc.parse("./oo.py", language='Python')
print(res)

Output:

status: FATAL
errors: "expected object of type map[string]interface{}, got: \"NoneLiteral\""

but the py file seems to be correct, because I can run python3 ./oo.py without any problem.

Also, the bblfsh server is running in Docker and outputs only

time="2017-08-08T20:34:55Z" level=info msg="parsing oo.py (34525 bytes)"

SIMPLE_IDENTIFIER is assigned wrongly in `with` statement

Hi,
ex:

with open(os.path.join(args.output, "row_vocab.txt"), "w") as out:
    out.write('\n'.join(chosen_words))

and the UAST contains a node with an empty token and a wrong position (0,0):

internal_type: "With.items"
properties {
  key: "promotedPropertyList"
  value: "true"
}
children {
  internal_type: "withitem"
  children {
    internal_type: "Name"
    properties {
      key: "ctx"
      value: "Load"
    }
    properties {
      key: "internalRole"
      value: "context_expr"
    }
    token: "a"
    start_position {
      offset: 5
      line: 1
      col: 6
    }
    end_position {
      offset: 5
      line: 1
      col: 6
    }
    roles: SIMPLE_IDENTIFIER
    roles: EXPRESSION
  }
  children {
    internal_type: "Name"
    properties {
      key: "ctx"
      value: "Store"
    }
    properties {
      key: "internalRole"
      value: "optional_vars"
    }
    token: "b"
    start_position {
      offset: 10
      line: 1
      col: 11
    }
    end_position {
      offset: 10
      line: 1
      col: 11
    }
    roles: SIMPLE_IDENTIFIER
    roles: EXPRESSION
  }
  start_position {
    line: 1
    col: 1
  }
  end_position {
    offset: 10
    line: 1
    col: 11
  }
  roles: SIMPLE_IDENTIFIER
  roles: INCOMPLETE
}
roles: SIMPLE_IDENTIFIER
roles: EXPRESSION
roles: INCOMPLETE

Nodes without any roles.

I wrote a small tool that collects statistics on the number of nodes with respect to the number of node roles in UASTs. It turned out that in my dataset there are some cases where no roles are assigned to a UAST node.

Repositories: /storage/timofei/repos
Extracted UASTs: /storage/timofei/uasts
Collected statistics: uasts_stat.txt
List of suspicious UASTs (csv file with columns: path to UAST, total number of nodes, number of nodes without roles): uasts_susp.txt

Parsing an empty file produces a fatal error

Sending an empty file to the Python driver produces a fatal error (which by definition stops the driver from processing more requests). This shouldn't happen, since empty files are common in Python (__init__.py); it should just produce an error with an empty UAST returned.
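
For what it is worth, Python's own parser accepts empty input and returns a module with an empty body, so a non-fatal result looks achievable. A tiny sketch with the standard ast module (the wrapper name parse_source is hypothetical):

```python
import ast


def parse_source(source):
    """Parse `source`; empty input yields a Module with an empty body."""
    return ast.parse(source)


empty = parse_source("")
```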

"column out of bounds" on minitwit

Merge bblfsh/python-client#38 to print errors

Then

git clone https://github.com/pallets/flask
python3 -m bblfsh -f flask/examples/minitwit/minitwit/minitwit.py >/dev/null

And you get an error from bblfsh:

column out of bounds: 63 [1, 51]

The file is parsed though.

SIMPLE_IDENTIFIER is not assigned when it has to be

Hi,
I tried to extract the UAST from Python code and noticed that when you define a function:
def a(b, c): ...
the node for this function has the roles 'FUNCTION_DECLARATION_BODY' and 'FUNCTION_DECLARATION_RECEIVER', but not 'SIMPLE_IDENTIFIER'.
In the documentation it's mentioned:

// SimpleIdentifier is the most basic form of identifier, used for variable
// names, functions, packages, etc.

I think that this node should have 'SIMPLE_IDENTIFIER' role.

Divide the astexport.py module between pydetector and the driver

In the planning meeting, we decided to split the functionality of the current pydetector.astexport.py module: pydetector keeps the retrieval of the unmodified native AST data structure (for the right Python version), while python-driver gets the visitor, the noop extractor and the position updater, reusing the data returned from pydetector to avoid parsing twice.

Incorrect positions for nodes: nodes have the same position (line continuation?)

Hi,
I ran some experiments and found a bug: SIMPLE_IDENTIFIER nodes have the same positions.

Reproducible example:

a += b.c["Some val"] \
    .d

And uast_playground gives us:

# New token 'b' at position (1, 6) has the same position as token 'd' at the same position. Skip new token.
a += b.c["Some val"] \
# Something wrong with token 'd' at pos (1, 6) - it's not equal to 'b' at this position in code
    .d

It looks like this happens because of the line continuation, because with this code:

a += b.c["Some val"].d

everything works well.

BTW: it looks like d is higher in the UAST than b. Is that correct? It appears earlier when traversing the UAST.
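
A sketch with Python's own ast module that may explain both observations. In the native AST, an Attribute node starts where its whole expression starts, so the node for .d reports the same start as b, and it is also the outermost node of the chain. The helper attribute_name_start is hypothetical and relies on end_lineno and end_col_offset, which exist from Python 3.8 on; columns here are 0-based.

```python
import ast

SOURCE = 'a += b.c["Some val"] \\\n    .d\n'


def attribute_name_start(node):
    """Estimate where the attribute name itself starts (end minus length)."""
    return node.end_lineno, node.end_col_offset - len(node.attr)


value = ast.parse(SOURCE).body[0].value   # outermost Attribute node (.d)
name_b = value.value.value.value          # Subscript -> Attribute (.c) -> Name b
node_start = (value.lineno, value.col_offset)
b_start = (name_b.lineno, name_b.col_offset)
d_start = attribute_name_start(value)
```

Here node_start and b_start coincide, while d_start lands on the second line, which is where the token d actually is.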


role assignment in a = b = c

If you have this code:

a = b = c 
var1 == var2 == var3
var4 < var5 < var6

And run UAST extraction, you get a strange role assignment.

Please take a look at the code. https://gist.github.com/zurk/66a3045746287bdb5002c0812b94f611
Here is output (the same gist):
https://gist.github.com/zurk/66a3045746287bdb5002c0812b94f611#file-output

Comments for output:

//*[@roleIdentifier] :
<a>:            ['LEFT', 'IDENTIFIER', 'EXPRESSION']
<b>:            ['LEFT', 'IDENTIFIER', 'EXPRESSION']
<c>:            ['RIGHT', 'IDENTIFIER', 'EXPRESSION']
<var2>:         ['IDENTIFIER', 'EXPRESSION']
<var3>:         ['IDENTIFIER', 'EXPRESSION']
<var1>:         ['IDENTIFIER', 'EXPRESSION', 'EXPRESSION', 'BINARY', 'LEFT']
<var5>:         ['IDENTIFIER', 'EXPRESSION']
<var6>:         ['IDENTIFIER', 'EXPRESSION']
<var4>:         ['IDENTIFIER', 'EXPRESSION', 'EXPRESSION', 'BINARY', 'LEFT']

I am not sure what the roles should be, but at least var3 and var6 are on the right side. :)
Why do we have 'EXPRESSION', 'BINARY' for var1 and var4? I think it should be just EXPRESSION, as for all the others. BINARY belongs to a higher level in the UAST tree.

Also, I am not sure that we can call the second and last expressions binary at all.
Maybe the first line of code can be considered as two binary expressions.
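For reference, this is how CPython's own ast module (not the driver) represents the three statements. The sketch only inspects the native AST; how these nodes should map to UAST roles is still the open question. The helper names `assignment_parts` and `comparison_parts` are made up for illustration.

```python
import ast

SOURCE = "a = b = c\nvar1 == var2 == var3\nvar4 < var5 < var6\n"


def assignment_parts(node):
    """Return (target names, value name) for an ast.Assign of plain names."""
    return [t.id for t in node.targets], node.value.id


def comparison_parts(node):
    """Return (left name, operator class names, comparator names) for an ast.Compare."""
    ops = [type(op).__name__ for op in node.ops]
    return node.left.id, ops, [c.id for c in node.comparators]


tree = ast.parse(SOURCE)
assign_stmt, eq_stmt, lt_stmt = tree.body

# A chained assignment is ONE Assign node with several targets and one value.
targets, value = assignment_parts(assign_stmt)

# A chained comparison is ONE Compare node: a single left operand,
# then parallel lists of operators and comparators.
eq_left, eq_ops, eq_comparators = comparison_parts(eq_stmt.value)
lt_left, lt_ops, lt_comparators = comparison_parts(lt_stmt.value)

if __name__ == "__main__":
    print(targets, value)
    print(eq_left, eq_ops, eq_comparators)
    print(lt_left, lt_ops, lt_comparators)
```

In the native AST only var1 and var4 are stored in a `left` field; var2, var3, var5 and var6 all sit in the `comparators` list, so nothing there marks them as a right-hand side.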

//*[@roleLeft] :
<a>:            ['LEFT', 'IDENTIFIER', 'EXPRESSION']
<b>:            ['LEFT', 'IDENTIFIER', 'EXPRESSION']
<var1>:         ['IDENTIFIER', 'EXPRESSION', 'EXPRESSION', 'BINARY', 'LEFT']
<var4>:         ['IDENTIFIER', 'EXPRESSION', 'EXPRESSION', 'BINARY', 'LEFT']

Ok, it can be true for 'a' and 'b'.

//*[@roleRight] :
<c>:            ['RIGHT', 'IDENTIFIER', 'EXPRESSION']
<>:             ['EXPRESSION', 'BINARY', 'RIGHT']
<>:             ['EXPRESSION', 'BINARY', 'RIGHT']

Are tokens var4 and var6 missing?

//*[@roleBinary] :
<>:             ['BINARY', 'THIS', 'EXPRESSION']
<>:             ['EXPRESSION', 'BINARY']
<>:             ['EXPRESSION', 'BINARY', 'RIGHT']
<var1>:         ['IDENTIFIER', 'EXPRESSION', 'EXPRESSION', 'BINARY', 'LEFT']
<>:             ['EXPRESSION', 'BINARY', 'OPERATOR']
<==>:           ['BINARY', 'OPERATOR', 'EQUAL']
<==>:           ['BINARY', 'OPERATOR', 'EQUAL']
<>:             ['EXPRESSION', 'BINARY']
<>:             ['EXPRESSION', 'BINARY', 'RIGHT']
<var4>:         ['IDENTIFIER', 'EXPRESSION', 'EXPRESSION', 'BINARY', 'LEFT']
<>:             ['EXPRESSION', 'BINARY', 'OPERATOR']
<<>:            ['BINARY', 'OPERATOR', 'LESS_THAN']
<<>:            ['BINARY', 'OPERATOR', 'LESS_THAN']

Everything is fine for the a = b = c statement, except maybe the THIS role, but I am not sure. Please take a look at the definition and check whether it is suitable here.

And there is a mess for the last two lines of code.

<>: ['EXPRESSION', 'BINARY', 'OPERATOR']
seems to be the node for the full ternary operator, because it has no token. Not sure.

Hope it helps to investigate the problem.

Server "hanging". Empty code received, returning empty UAST

Related to issue bblfsh/bblfshd#101 (the problem was actually not in the server but in python-driver). I see much the same symptoms with a new driver. At some point the server logs:

time="2017-09-22T15:41:39Z" level=debug msg="Empty code received, returning empty UAST"

And then nothing, although my program continues to send queries. After a while (about 30 seconds) the server logs:

time="2017-09-22T15:42:11Z" level=debug msg="driver exited without error"

and then you can actually continue parsing.

I couldn't find the file that breaks everything. Also, if I run the queries from a single thread, everything seems to be fine.

Code to reproduce:
https://gist.github.com/zurk/2d9e786e6577ebe60e963091c13b4ecd

files.txt
The files are on science-3. I can download and attach them if you want.
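One possible way to narrow down the offending file, sketched under assumptions: `find_suspect_files` is a hypothetical helper, and `parse` stands for whatever callable sends one file to the server and returns its result (it is not part of any bblfsh API). Running it with `workers=1` and then with several workers would show whether the failures depend on concurrency rather than on a particular file.

```python
from concurrent.futures import ThreadPoolExecutor


def find_suspect_files(parse, paths, workers=4):
    """Return the paths for which parse(path) raised or returned an empty result.

    `parse` is a stand-in for the real client call; results keep the order of `paths`.
    """
    def check(path):
        try:
            return path, bool(parse(path))
        except Exception:
            # Treat any client-side error as a suspect file too.
            return path, False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, paths))
    return [path for path, ok in results if not ok]


if __name__ == "__main__":
    # Fake parser for demonstration only: pretends one file yields an empty UAST.
    def fake_parse(path):
        return "" if path == "empty.py" else "uast"

    print(find_suspect_files(fake_parse, ["a.py", "empty.py", "b.py"]))
```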
