bblfsh / sdk
Babelfish driver SDK
License: GNU General Public License v3.0
Related to #1 (Can't commit after generating with bootstrap).
Change RUNTIME_NATIVE_VERSION in the manifest.toml
-> git add -A
git commit -m "IT SHOULD FAIL"
-> it fails, because a "managed file changed"
-> bblfsh-sdk update
-> it will generate a new README.md with the new manifest.toml specs
-> git commit -m "IT SHOULD FAIL"
-> it is COMMITTED!!! And it shouldn't be, because you didn't add the README.md changes. (The new commit is broken, as you can see if you run git co . and then bblfsh-sdk update --dry-run.)
The pre-commit must validate ONLY the staging area instead of the current working copy.
To do it, the process is:
This makes the driver Docker images need bash which they shouldn't need.
make fails on an uninitialized git repository or a repository with no commits. It should initialize it or print a meaningful error.
➜ mylang-driver git:(master) ✗ make all
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Makefile:6: warning: overriding recipe for target 'test-native'
/home/smola/dev/demos/demo-2017-03-03/1_sdk/mylang-driver/.sdk/make/rules.mk:74: warning: ignoring old recipe for target 'test-native'
Makefile:10: warning: overriding recipe for target 'build-native'
/home/smola/dev/demos/demo-2017-03-03/1_sdk/mylang-driver/.sdk/make/rules.mk:85: warning: ignoring old recipe for target 'build-native'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
+ mkdir -p /home/smola/dev/demos/demo-2017-03-03/1_sdk/mylang-driver/build
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
/bin/sh: 1: eval: cannot open /home/smola/dev/demos/demo-2017-03-03/1_sdk/mylang-driver/Dockerfile.build.tpl: No such file
/home/smola/dev/demos/demo-2017-03-03/1_sdk/mylang-driver/.sdk/make/rules.mk:67: recipe for target 'bblfsh/mylang-driver-build' failed
make: *** [bblfsh/mylang-driver-build] Error 2
Update drivers to use new transformation DSL and new ToNode helpers.
TODO:
Use new DSL to port old ObjectToNode transformations.
Requires: #241
TODO:
$ docker run --rm bblfsh/python-driver:latest /opt/driver/bin/driver --help
Usage:
/opt/driver/bin/driver [OPTIONS] <command>
Help Options:
-h, --help Show this help message
Available commands:
parse-native
parse-uast
serve
tokenize
Build information
commit:
date:
The initial code block described in the Babelfish documentation (creating a directory, a new git repo, etc.) could be done by a bblfsh-sdk init command.
Babelfish drivers may return 3 main statuses: OK, Error, Fatal.
Currently, there is no guarantee that Error responses will have a partial AST, since it's not tested in any of the drivers.
At the same time, the Error status is not treated as an error/exception in clients, as reported in bblfsh/java-driver#77, so clients might not be aware that they received a partial AST.
We should either:
a) Make a special type of error/exception that user code must explicitly handle to accept a partial AST. In this case clients will receive an error on any syntax error, which might be the desired behavior.
b) Mention the current behavior in the Babelfish docs and clarify that users should check the status.
Personally, I would prefer the first option.
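A minimal sketch of what option (a) could look like on the client side, in Go. Everything here is hypothetical and only illustrates the idea: Node, Status, PartialError and toResult are made-up names, not the SDK's or any client's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for a UAST node.
type Node struct {
	Type     string
	Children []*Node
}

// Status mirrors the three statuses named in the issue (hypothetical type).
type Status int

const (
	StatusOK Status = iota
	StatusError
	StatusFatal
)

// PartialError is a hypothetical error type. It reports syntax errors and
// carries whatever part of the tree the driver managed to build.
type PartialError struct {
	Errors []string
	UAST   *Node // may be nil if nothing could be recovered
}

func (e *PartialError) Error() string {
	return fmt.Sprintf("partial parse: %d syntax error(s)", len(e.Errors))
}

// toResult turns a driver response into a Go-style (value, error) pair.
// A tree is only returned directly when the status is OK; with StatusError
// the caller has to unwrap PartialError to reach the partial tree.
func toResult(status Status, errs []string, uast *Node) (*Node, error) {
	switch status {
	case StatusOK:
		return uast, nil
	case StatusError:
		return nil, &PartialError{Errors: errs, UAST: uast}
	default:
		return nil, fmt.Errorf("fatal driver error: %v", errs)
	}
}

func main() {
	partial := &Node{Type: "Module"}
	_, err := toResult(StatusError, []string{"unexpected token"}, partial)

	var pe *PartialError
	if errors.As(err, &pe) {
		// The caller explicitly opts in to the partial tree.
		fmt.Println("accepting partial tree rooted at", pe.UAST.Type)
	} else if err != nil {
		fmt.Println("giving up:", err)
	}
}
```

With this shape, code that ignores the error never sees a partial tree by accident, while code that wants one has to ask for it.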
It would be useful, when batch-parsing, to have the filename in the response.
A better solution for this kind of problems: #127
Currently, we have:
// Assignment represents a variable assignment or binding.
// The variable that is being assigned to is annotated with the
// AssignmentVariable role, while the value is annotated with
// AssignmentValue.
Assignment
AssignmentVariable
AssignmentValue
// AugmentedAssignment is an augmented assignment usually combining the equal operator with
// another one (e.g. +=, -=, *=, etc). It is expected that children contain an
// AugmentedAssignmentOperator with a child or additional role for the specific Bitwise or
// Arithmetic operator used. The AugmentedAssignmentVariable and AugmentedAssignmentValue roles
// have the same meaning as in Assignment.
AugmentedAssignment
AugmentedAssignmentOperator
AugmentedAssignmentVariable
AugmentedAssignmentValue
This feels quite redundant. I wonder if we can come up with a more succinct way of annotating augmented assignments.
If you pass an empty file, the TransformationParser returns this error:
column out of bounds: 1 [1, 0]
This error is produced whether the column index used is 0 or 1.
Take as an example the AST generated by babylon:
{
"type": "SomeType",
"loc": {
"start": { "column": 1, "line": 1 },
"end": { "column": 2, "line": 4 }
},
"offset": 1,
"endOffset": 34
}
It'd be nice to be able to set LineKey (and so on) to a path such as loc.start.line instead of having to transform the AST.
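As an illustration of the idea, here is a minimal Go sketch of resolving a dot-separated key such as loc.start.line against a decoded JSON object. The function name lookupPath is hypothetical and this is not how the SDK currently resolves LineKey; it only shows that a nested lookup is cheap to implement.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPath walks nested maps following a dot-separated path such as
// "loc.start.line". It reports false if any segment is missing or if an
// intermediate value is not an object.
func lookupPath(obj map[string]interface{}, path string) (interface{}, bool) {
	var cur interface{} = obj
	for _, key := range strings.Split(path, ".") {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		cur, ok = m[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

func main() {
	// Same shape as the babylon node shown above.
	node := map[string]interface{}{
		"type": "SomeType",
		"loc": map[string]interface{}{
			"start": map[string]interface{}{"column": 1, "line": 1},
			"end":   map[string]interface{}{"column": 2, "line": 4},
		},
		"offset":    1,
		"endOffset": 34,
	}
	line, ok := lookupPath(node, "loc.end.line")
	fmt.Println(line, ok) // 4 true
}
```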
While doing the integration tests for the Python driver, I noticed that the items of the function arguments list, which have the internalRole "args" and the Role MethodInvocationArgument (children of a MethodInvocation), are not kept in the right order.
For this code:
print("something1", 42, somesymbbol)
I will attach the complete JSON files for the Python AST and the generated UAST to this issue,
but to see it clearly, a (simplified) version of the arguments part of the Python AST is:
"args" : [
{
"LiteralValue" : "something1",
"ast_type" : "StringLiteral"
},
{
"ast_type" : "NumLiteral",
"NumType" : "int",
"LiteralValue" : 42
},
{
"id" : "somesymbbol",
"ast_type" : "Name"
}
],
While the (simplified) UAST generated is:
{
"Properties" : {
"NumType" : "0",
"internalRole" : "args"
},
"Roles" : [ 59, 54 ],
"Token" : "42",
"InternalType" : "NumLiteral"
},
{
"InternalType" : "Name",
"Token" : "somesymbbol",
"Roles" : [ 54, 0 ],
"Properties" : {
"internalRole" : "args"
}
},
{
"InternalType" : "StringLiteral",
"Properties" : {
"internalRole" : "args"
},
"Roles" : [ 58, 54 ],
"Token" : "something1"
}
As you can see, the arguments are not in the same order for the UAST. The Rule I'm using for them is:
On(HasInternalType(pyast.Call)).Roles(MethodInvocation).Children(
On(HasInternalRole("args")).Roles(MethodInvocationArgument),
On(HasInternalRole("func")).Self(On(HasInternalRole("id"))).Roles(MethodInvocationName),
On(HasInternalRole("func")).Self(On(HasInternalRole("attr"))).Roles(MethodInvocationName),
On(HasInternalRole("func")).Self(On(HasInternalType(pyast.Attribute))).Children(
On(HasInternalRole("id")).Roles(MethodInvocationObject),
),
),
We will move the current Node structure to the protocol package and will use maps and slices to represent AST nodes internally. Conversion to Node will happen in the protocol package to preserve backward compatibility.
All the work happens in bblfsh/sdk@transform-dsl.
Currently, the way to install the SDK's binaries into $GOPATH/bin is to run
go get -t -v ./...
Some things that I found missing (it may just be a lack of documentation):
Inside the driver, after running bblfsh-sdk prepare-build:
native directory, if it does not exist;
.gitignore with a .sdk rule; if .gitignore exists without that rule, then append it;
Makefile, when running make if the SDK has not been installed yet, just saying something like:
bblfsh-sdk prepare-build
"bblfsh-sdk prepare-build
bblfsh-sdk init
Inside the bblfsh/sdk project:
make test does not pass:
> make test
--- FAIL: github.com/bblfsh/sdk/etc/skeleton/driver/normalizer :: TestNativeBinary
Error Trace: normalizer_test.go:18
Error: Expected nil, but got: &os.PathError{Op:"fork/exec", Path:"/opt/driver/src/build/native", Err:0x2}
make all before doing anything? Anyway, the tests keep failing.
Some languages (Kotlin, Python, Nim, etc.) use async to qualify blocks (which are not always functions; sometimes they can be scoped blocks) that could run as interruptible coroutines. Usually those languages also have an await keyword for waiting for the completion of those blocks (similar to the join call/keyword for threads).
For functions this is usually done on the definition, with Go and Erlang instead having a similar keyword that is used on the call (we could have AsyncDefinition and AsyncCall for example).
We need to check several languages with built-in coroutine features and decide how to implement those roles in the AST.
When the "autodetect" feature is used, passing the filename and content in the request, it would be great to obtain the language as part of the response.
It could be something like:
message ParseResponse {
option (gogoproto.goproto_getters) = false;
option (gogoproto.goproto_stringer) = false;
option (gogoproto.typedecl) = false;
gopkg.in.bblfsh.sdk.v1.protocol.Status status = 1;
repeated string errors = 2;
google.protobuf.Duration elapsed = 3 [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
gopkg.in.bblfsh.sdk.v1.uast.Node uast = 4 [(gogoproto.customname) = "UAST"];
string language = 5;
}
instead of the current ParseResponse:
message ParseResponse {
option (gogoproto.goproto_getters) = false;
option (gogoproto.goproto_stringer) = false;
option (gogoproto.typedecl) = false;
gopkg.in.bblfsh.sdk.v1.protocol.Status status = 1;
repeated string errors = 2;
google.protobuf.Duration elapsed = 3 [(gogoproto.nullable) = false, (gogoproto.stdduration) = true];
gopkg.in.bblfsh.sdk.v1.uast.Node uast = 4 [(gogoproto.customname) = "UAST"];
}
Default GOPATH on GCE cloud console is "/home/alex/gopath:/google/gopath"
But running the build Docker container for any driver fails when mounting such a GOPATH:
+ docker run --rm -t -u bblfsh:1000 -v /home/alex/java-driver:/opt/driver/src/ -v /home/alex/gopath:/google/gopath:/go -e ENVIRONMENT=bblfsh/java-driver-build-with-go bblfsh/java-driver-build-with-go make test-driver-internal
docker: Error response from daemon: Invalid bind mount spec "/home/alex/gopath:/google/gopath:/go": invalid mode: /go.
See 'docker run --help'.
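One possible way around the multi-entry GOPATH problem is to mount only the first entry of the list. The Go sketch below shows that idea; firstGOPATHEntry is a hypothetical helper, not part of the SDK (whose build rules are Makefile-based), and the example assumes a Unix-style list separator (:).

```go
package main

import (
	"fmt"
	"path/filepath"
)

// firstGOPATHEntry returns the first directory of a GOPATH-style list,
// or "" if the list is empty. filepath.SplitList uses the OS-specific
// list separator, which is ':' on Unix systems.
func firstGOPATHEntry(gopath string) string {
	entries := filepath.SplitList(gopath)
	if len(entries) == 0 {
		return ""
	}
	return entries[0]
}

func main() {
	gopath := "/home/alex/gopath:/google/gopath"
	if first := firstGOPATHEntry(gopath); first != "" {
		// A single host path keeps the bind mount spec in host:container form.
		fmt.Println("-v", first+":/go")
	}
}
```

Go itself installs downloaded packages into the first GOPATH entry, so that entry is a reasonable one to mount, although packages living only in later entries would not be visible inside the container.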
So they just show as properties.
https://github.com/bblfsh/sdk/blob/master/protocol/native/objecttonoder.go
We should implement shape-based transformation DSL as described in BIP-5.
All the work happens in bblfsh/sdk@transform-dsl.
TODO:
AST -> UAST
{"langSpecific": {"a": 1}} -> {"Properties": {"langSpecific": "map[a:1]"}}
Such format isn't very useful.
more info here: #213 (comment)
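The "map[a:1]" output looks like Go's default formatting of a map. As a rough sketch of one alternative, nested values could be JSON-encoded before being stored as a string property. The helper name propertyValue is hypothetical and this is not the SDK's actual conversion code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// propertyValue renders a native AST value as a string property.
// Strings are kept as-is; any other value is JSON-encoded so that
// nested objects and arrays stay machine-readable.
func propertyValue(v interface{}) (string, error) {
	if s, ok := v.(string); ok {
		return s, nil
	}
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	nested := map[string]interface{}{"a": 1}

	fmt.Println(nested) // default formatting: map[a:1]

	s, err := propertyValue(nested)
	if err != nil {
		fmt.Println("cannot encode:", err)
		return
	}
	fmt.Println(s) // {"a":1}
}
```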
> make install
CGO_ENABLED=0 go get -t -v -ldflags '-extldflags "-static"' ./...
net
go install net: open /usr/lib/go/pkg/linux_amd64/net.a: permission denied
make: *** [Makefile:32: install] Error 1
This prevents drivers from building:
> make build
+ docker build -q -t bblfsh/bash-driver-build -f /home/abeaumont/go/src/github.com/bblfsh/bash-driver/.sdk/tmp/tmp.1510160514-853298827 .
+ docker run --rm -t -u bblfsh:1000 -v /home/abeaumont/go/src/github.com/bblfsh/bash-driver:/opt/driver/src/ -e ENVIRONMENT=bblfsh/bash-driver-build -e HOST_PLATFORM=Linux bblfsh/bash-driver-build make build-native-internal
+ docker build -q -t bblfsh/bash-driver-build-with-go -f /home/abeaumont/go/src/github.com/bblfsh/bash-driver/.sdk/tmp/tmp.1510160520-277170896 .
+ docker run --rm -t -u bblfsh:1000 -v /home/abeaumont/go/src/github.com/bblfsh/bash-driver:/opt/driver/src/ -v /home/abeaumont/go:/go -e ENVIRONMENT=bblfsh/bash-driver-build-with-go -e HOST_PLATFORM=Linux bblfsh/bash-driver-build-with-go make build-driver-internal
/bin/sh: /go/bin/bblfsh-sdk-tools: not found
This is to record the demand of data scientists to have *.ipynb parsed. There are quite a few already. I guess we should merge all the code cells and interpret them as a Python script.
@juanjux Do you think it should be added to Python driver? It is going to reuse 100% of the code.
When an annotation adds more than one role, only the first one is extracted in the annotation documentation. For example, the following annotations:
On(jdt.EnhancedForStatement).Roles(ForEach, Statement).Children(
On(jdt.PropertyParameter).Roles(ForInit, ForUpdate),
On(jdt.PropertyExpression).Roles(ForExpression),
On(jdt.PropertyBody).Roles(ForBody),
),
generate the following annotation documentation:
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\] | ForEach |
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\]/\*\[@internalRole\]\[@internalRole='parameter'\] | ForInit |
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\]/\*\[@internalRole\]\[@internalRole='expression'\] | ForExpression |
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\]/\*\[@internalRole\]\[@internalRole='body'\] | ForBody |
while it should generate:
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\] | ForEach, Statement |
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\]/\*\[@internalRole\]\[@internalRole='parameter'\] | ForInit, ForUpdate |
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\]/\*\[@internalRole\]\[@internalRole='expression'\] | ForExpression |
| /self::\*\[@InternalType='CompilationUnit'\]//\*\[@InternalType='EnhancedForStatement'\]/\*\[@internalRole\]\[@internalRole='body'\] | ForBody |
Currently, the server fills the Response.Language, either with the Request.Language if it was specified, or with the autodetected language if it wasn't. But when running against a driver container directly, the driver must at least do the first part. This should be implemented in the generic (golang) part of the drivers.
With the new agglutinative UAST it'll be hard to avoid some duplicated roles. They shouldn't have any effect, but they are un-classy and annoy some users.
It should be pretty easy to "uniq" the roles of a node in the SDK.
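A minimal Go sketch of such a "uniq" step, keeping the first occurrence of each role and preserving order. The Role type here is a hypothetical stand-in for the SDK's role type.

```go
package main

import "fmt"

// Role is a hypothetical stand-in for the SDK role type.
type Role int

// uniqRoles returns the roles with duplicates removed, keeping the
// first occurrence of each role and the original relative order.
func uniqRoles(roles []Role) []Role {
	seen := make(map[Role]struct{}, len(roles))
	out := make([]Role, 0, len(roles))
	for _, r := range roles {
		if _, dup := seen[r]; dup {
			continue
		}
		seen[r] = struct{}{}
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(uniqRoles([]Role{59, 54, 59, 54, 0})) // [59 54 0]
}
```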
For instance, in the case of the bash-driver, whose native driver is written in Java, the batch should say java version: 1.8.... but instead it says bash version: 1.8....
Currently, driver manifests do not mention the version of the SDK they were written for. This leads to a situation where a driver might become outdated and incompatible while still being listed as beta or even stable.
I propose adding a new (required) field to the manifest with the semantic version of the SDK. This field will be overwritten by bblfsh-sdk update and will be considered when interpreting the driver status.
As always, a minor version difference will not affect the driver status, but a major version difference will automatically drop the status to inactive.
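The proposed rule can be sketched in Go as follows. The function names and the plain "major.minor.patch" parsing are assumptions made for illustration; the manifest field itself does not exist yet.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts the major component from a semantic version
// such as "1.4.2" or "v2.0.0".
func majorVersion(v string) (int, error) {
	v = strings.TrimPrefix(v, "v")
	major := strings.SplitN(v, ".", 2)[0]
	return strconv.Atoi(major)
}

// effectiveStatus returns the declared status unless the SDK major version
// the driver was written for differs from the current SDK major version,
// in which case the status drops to "inactive".
func effectiveStatus(declared, driverSDK, currentSDK string) (string, error) {
	d, err := majorVersion(driverSDK)
	if err != nil {
		return "", fmt.Errorf("driver SDK version %q: %v", driverSDK, err)
	}
	c, err := majorVersion(currentSDK)
	if err != nil {
		return "", fmt.Errorf("current SDK version %q: %v", currentSDK, err)
	}
	if d != c {
		return "inactive", nil
	}
	return declared, nil
}

func main() {
	s, _ := effectiveStatus("beta", "1.2.0", "1.9.3")
	fmt.Println(s) // beta: only the minor version differs
	s, _ = effectiveStatus("stable", "1.2.0", "2.0.0")
	fmt.Println(s) // inactive: the major version differs
}
```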
As @juanjux commented in slack:
We probably should relax the specifications of the positions to just provide what the native AST provides or we won't make a new driver per year. Future versions could then add optionally tokenizing information in the nodes.
Port old annotation DSL predicates to a new transformation DSL.
Requires: #241
TODO:
The hook blocks the first commit of README.md after generating it with bootstrap.
That will make tests easier and errors much more informative.
As required by bblfsh/scala-client#68 and bblfsh/web#89, clients need a way to fetch the list of installed drivers.
That API is part of the bblfshd/daemon/protocol, and it is currently only exported through a unix socket for admin purposes, so it cannot be reached from the current clients.
Is there any plan to move the driver listing methods to the user network?
Here is the list of clients needing this:
These are modifiers that in general give an enclosed scope access to a variable in an outside scope. This could mean read/write or just read access.
This concept is similar to the existing "VisibleFrom" rules already on the AST but at the point of usage instead of declaration.
For example, in Python we have "global" to allow an enclosed scope (usually a function definition) to modify a globally defined variable. There is also "nonlocal" for closures to be able to write variables of the enclosing scope; this concept (for closures) is usually called "captures" and it's also in C++11 and other languages supporting closures (in C++11 the capture is needed even for read access; in Python the read access is automatic and it's used for obtaining write access).
We should investigate what other languages have for these access usage modifiers before defining how we'll do this (for example, are there languages that also support module/namespace/other access modifiers?).
Found while running the fixture regeneration with a just-built python-driver image: it didn't find the python 2 binary.
After deleting the docker image, I rebuilt it, and then when trying to install it into bblfshd it said:
"Installing python language driver from "bblfsh/python-driver:dev-21b075d-dirty"... Error, manifest unknown: manifest unknown".
I deleted everything (docker rmi (docker images -q) -f; and docker rm (docker ps -a -q) -f), redownloaded bblfshd's image and then it worked.
We should investigate this if it happens again.
Currently the drivers always link against ~master. We should add glide to the skeleton files and use it in the Go part of the driver build system.
Roles numeric values do not match in Go and Proto. It seems they start at index 1 in Go and they start at index 0 in Proto.
It might be caused by a proteus issue: src-d/proteus#86
(or it might not)
Once we fix this, we need to build new versions of published drivers.
This would make the Roles language more expressive and flexible, remove complexity, make it more extensive and better suited for language analysis.
There are various tasks that need to be done for this:
Add development section to README, explaining which files are generated, when to update them, etc.
Currently, all the fields added to Parser.Tokenkeys are evaluated for token extraction on all the nodes. The problem is that some nodes could have more than one of those fields, with only one of them being the token field, which produces an error.
Two solutions could be used:
Change the data structure to optionally allow to specify an "internal_type" where the token extraction will be done for every specified token-field.
Add a new annotation that could extract a field as the token, e. g.:
// Node of internal type "Something" as the token in the "value" field:
On(HasInternalType(pyast.Something)).Roles(SomeRoles).TokenFrom("value")
Sometimes it is important to have it at hand.
If there is no such field, you need to store all parents yourself while performing a complicated UAST traversal.
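For reference, this is roughly the bookkeeping a user has to do today when nodes carry no parent reference: a Go sketch of a depth-first walk that passes the chain of ancestors to the visitor. Node here is a hypothetical, simplified type and walkWithParents is not an SDK function.

```go
package main

import "fmt"

// Node is a hypothetical, simplified UAST node without a parent field.
type Node struct {
	InternalType string
	Children     []*Node
}

// walkWithParents visits every node depth-first and passes the visitor the
// chain of ancestors, root first. The slice may be reused between calls,
// so the visitor must copy it if it wants to keep it.
func walkWithParents(root *Node, visit func(n *Node, parents []*Node)) {
	var walk func(n *Node, parents []*Node)
	walk = func(n *Node, parents []*Node) {
		visit(n, parents)
		parents = append(parents, n)
		for _, c := range n.Children {
			walk(c, parents)
		}
	}
	walk(root, nil)
}

func main() {
	tree := &Node{InternalType: "Call", Children: []*Node{
		{InternalType: "Name"},
		{InternalType: "NumLiteral"},
	}}
	walkWithParents(tree, func(n *Node, parents []*Node) {
		if len(parents) > 0 {
			parent := parents[len(parents)-1]
			fmt.Printf("%s is a child of %s\n", n.InternalType, parent.InternalType)
		}
	})
}
```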
We have to provide up-to-date proto files and the gRPC service, which will be implemented by the bblfsh server.
The pre-commit hook installed by bblfsh-sdk has a compatibility issue with git < 2.7 because that's when the "--points-at" option to git branch was introduced (git/git@aa3bc55).
Is there a way to do something similar using older options?
This update will allow us to use type aliases, which can be used to map driver.Status to protocol.Status (and other similar cases).
(I'll take a look at this, adding the issue to add a note on the sprint task for prefix/infix/postfix that depends on this being fixed).
How to reproduce:
make integration-test so these files are regenerated again.
git commit adding the new files.
git diff