bblfsh / documentation
Babelfish documentation (GitBook)
Home Page: https://docs.sourced.tech/babelfish
License: Creative Commons Attribution Share Alike 4.0 International
https://github.com/bblfsh/documentation/blob/master/uast/semantic-uast.md does not contain info about uast:FunctionGroup, but it exists.
I think it is also a good idea to have a note about pending semantic UAST nodes.
Related and previous conversation at https://github.com/src-d/devrel/issues/77
We should replace --all by --recommended in Getting Started and explain --all too, probably in Advanced Usage.
In the clients guide:
The client API's differ to adapt to their language specific idioms, the following [codes] [shows] several simple examples with the Go, Python and Scala clients that [parsers] a file and [applies] a filter to return all the simple identifiers.
- codes: maybe it could be replaced with "code snippets", for example.
- shows should be show
- also parsers -> parse
- applies -> apply
As I'm not a native English speaker, please double-check anything non-trivial I mention, in case I am wrong.
After the empathy session, I think it is a good idea to add a link to the https://doc.bblf.sh/using-babelfish/getting-started.html page on https://doc.bblf.sh/. It exists in the menu, but I think it is better to also put it somewhere in the text.
We need to add a section on the UAST Specification clarifying what can be assumed as language independent in UAST.
A recent example: it cannot be assumed that some type of declaration (e.g. top-level type declaration) is at a specific level (e.g. level 2) in the tree.
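To make the point concrete, here is a minimal Python sketch. The dict layout (`@type`, `children`) and the node type names (`File`, `Namespace`, `TypeDecl`) are hypothetical stand-ins chosen for illustration, not the actual UAST schema. It shows the same kind of declaration sitting at different depths in two trees, so a query should search by type at any depth rather than index into a fixed level.

```python
from typing import Iterator, Tuple


def find_nodes(node: dict, node_type: str, depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, node) for every node of the given type, at any depth."""
    if node.get("@type") == node_type:
        yield depth, node
    for child in node.get("children", []):
        yield from find_nodes(child, node_type, depth + 1)


# Hypothetical tree for a language where type declarations sit directly under the file.
tree_a = {"@type": "File", "children": [{"@type": "TypeDecl", "children": []}]}

# Hypothetical tree for a language that wraps declarations in a namespace node.
tree_b = {
    "@type": "File",
    "children": [
        {"@type": "Namespace", "children": [{"@type": "TypeDecl", "children": []}]}
    ],
}

depths_a = [depth for depth, _ in find_nodes(tree_a, "TypeDecl")]
depths_b = [depth for depth, _ in find_nodes(tree_b, "TypeDecl")]
print(depths_a, depths_b)
```

Code that hard-coded "the declaration is at depth 2" would work for the second tree and silently miss the first.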
Documentation is missing references and a link in the navigation menu to the online dashboard https://dashboard.bblf.sh/
Page: https://doc.bblf.sh/driver/annotations.html
A few minor issues on this page:
From @ajnavarro yesterday (more):
- Next documentation point: Language clients
- a little bit strange name for bblfsh clients
It is true that "Language Clients" might be confusing, and it could be just "Clients".
What do you think?
By default, the bblfsh server spams debug logs. When I extract hundreds of files, my terminal explodes. I suggest adding a note (to the FAQ?) on how to set the log level to info, error, etc., and on which log levels are supported at all.
Hello! I make a living by translating articles, lectures and other documents from English to Turkish or from Turkish to English. I would like to help you translate your project into Turkish. If you check the License document and give me permission to do so, I would be really glad to help. Thank you in advance...
Previous conversation at https://github.com/src-d/devrel/issues/78.
It seems the separation between stateless and non-stateless (or stateful?) modes is not very clear in the documentation.
I hadn't understood that everything after the sentence "On macOS, you first need to create etc." refers to non-stateless mode. In my opinion, the unclear separation is a bit confusing. Maybe a sentence could be added saying something like:
On the https://doc.bblf.sh/ page (and any other), I click the first link in the menu, Babelfish (aka. bblfsh), and it leads to https://legacy.gitbook.com/book/bblfsh/documentation/details.
I expect to reach the first doc page (https://doc.bblf.sh/).
From @ajnavarro yesterday (more):
- Next documentation point: UAST querying
- I don't need that at that point, I need a way to connect to bblfsh programmatically
"Language Clients" section could go before "UAST Querying", since getting a client for your language of choice and connecting to Babelfish will usually come before querying any UAST.
I think we should keep binary compatibility for roles, i.e., even if the list of roles changes in the future, the numeric value associated with each role will not.
I propose to change:
const (
	// Invalid Role is assigned as a zero value since protobuf enum definition must start at 0.
	Invalid Role = iota
	SimpleIdentifier
	QualifiedIdentifier
	BinaryExpression
	BinaryExpressionLeft
	BinaryExpressionRight
	BinaryExpressionOp
	// ...
)
to:
const (
	// Invalid Role is assigned as a zero value since protobuf enum definition must start at 0.
	Invalid               Role = 0
	SimpleIdentifier      Role = 1
	QualifiedIdentifier   Role = 2
	BinaryExpression      Role = 3
	BinaryExpressionLeft  Role = 4
	BinaryExpressionRight Role = 5
	BinaryExpressionOp    Role = 6
	// ...
)
https://godoc.org/github.com/bblfsh/sdk/uast#Role
This is especially important since we will need to provide these roles in the C++ library too.
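The compatibility concern can be sketched in a few lines of Python (used here only for illustration; the SDK itself is Go). The role names are the ones listed above, while `NewRole` and its value 7 are hypothetical. Numbering by position shifts every role after an insertion point, whereas explicit values stay put.

```python
ROLES_V1 = [
    "Invalid", "SimpleIdentifier", "QualifiedIdentifier", "BinaryExpression",
    "BinaryExpressionLeft", "BinaryExpressionRight", "BinaryExpressionOp",
]


def number_by_position(names):
    """Assign values by position in the list, like an implicit enum."""
    return {name: value for value, name in enumerate(names)}


# Hypothetical future revision: "NewRole" is inserted in the middle of the list.
ROLES_V2 = ROLES_V1[:2] + ["NewRole"] + ROLES_V1[2:]

implicit_v1 = number_by_position(ROLES_V1)
implicit_v2 = number_by_position(ROLES_V2)
# Roles whose numeric value changed between the two revisions.
shifted = sorted(name for name in ROLES_V1 if implicit_v1[name] != implicit_v2[name])

# With explicit values, a new role simply takes the next free number.
EXPLICIT_V1 = {
    "Invalid": 0, "SimpleIdentifier": 1, "QualifiedIdentifier": 2,
    "BinaryExpression": 3, "BinaryExpressionLeft": 4,
    "BinaryExpressionRight": 5, "BinaryExpressionOp": 6,
}
EXPLICIT_V2 = dict(EXPLICIT_V1, NewRole=7)
stable = all(EXPLICIT_V1[name] == EXPLICIT_V2[name] for name in ROLES_V1)

print("shifted with implicit numbering:", shifted)
print("explicit numbering stable:", stable)
```

Any client compiled against the first revision (for example a C++ library) would misread the five shifted roles under implicit numbering, but none under explicit numbering.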
Native parsers MUST provide, at least, offset or line+col for positions
to:
Native parsers SHOULD provide, at least, offset or line+col for positions when the native parser gives any positional information for the node
And:
Nodes with defined token SHOULD have (...) when the native parser provides it.
We should keep ./uast/roles.md file updated by auto-generating it from SDK sources and by examining available drivers.
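One possible shape for such a generator, as a rough sketch only: it assumes the roles are declared one per line inside a single Go `const ( ... )` block and numbered by position, and the table layout it emits is a hypothetical format, not the current layout of roles.md. The embedded Go snippet is a shortened sample, not the full SDK source.

```python
# Shortened sample of a Go const block (an assumption about the SDK source layout).
GO_SOURCE = """\
const (
    // Invalid Role is assigned as a zero value since protobuf enum definition must start at 0.
    Invalid Role = iota
    SimpleIdentifier
    QualifiedIdentifier
    BinaryExpression
)
"""


def extract_roles(go_source):
    """Return role names from the first parenthesized const block, in order."""
    roles = []
    inside = False
    for line in go_source.splitlines():
        stripped = line.strip()
        if not inside:
            inside = stripped.startswith("const (")
            continue
        if stripped == ")":
            break
        if not stripped or stripped.startswith("//"):
            continue  # skip blank lines and comments
        roles.append(stripped.split()[0])  # first token is the identifier
    return roles


def render_markdown(roles):
    """Render a two-column markdown table (hypothetical layout)."""
    lines = ["| Value | Role |", "| --- | --- |"]
    lines += ["| {} | {} |".format(value, name) for value, name in enumerate(roles)]
    return "\n".join(lines)


print(render_markdown(extract_roles(GO_SOURCE)))
```

A real generator would more likely parse the Go source with Go tooling and would also need the per-driver usage data mentioned above; this only shows the extraction step.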
Just check the links https://doc.bblf.sh/uast/uast_v2.md and https://doc.bblf.sh/uast/representation_v2.md and you will see that they are not rendered correctly: they are served as plain text files. Also, links to these pages do not open. For example, the first link on https://doc.bblf.sh/uast/uast-specification.html ("See UASTv2 for the new version.") is broken, and so is the link on https://doc.bblf.sh/using-babelfish/advanced-usage.html in the sentence "2) Use the latest Go client, set.Mode(Semantic) and change XPath to use the new Semantic UAST types."
I think it is all the same problem, so I am creating only one issue.
To detail common problems and solutions:
Once bblfsh/bblfshd#47 is merged and published, the documentation should be updated to reflect how to override driver images, something like:
export BBLFSH_DRIVER_IMAGES="python=docker-daemon:bblfsh/python-driver:dev-96b24d3;java=docker-daemon:bblfsh/java-driver:latest"
docker run -e BBLFSH_DRIVER_IMAGES --privileged -p 9432:9432 --name bblfsh bblfsh/server bblfsh server
I noticed that in the https://doc.bblf.sh/using-babelfish/getting-started.html section we have both "Docker Image" and "docker Image". Maybe only one form should be kept, so the documentation is consistent.
Related to https://github.com/src-d/devrel/issues/72
In the documentation, under Further Reading, there is the sentence "This repo contains the project documentation, which you can also see properly rendered at https://doc.bblf.sh/." (see the bottom of the image below). Clicking "here" redirects to the same page. The content of this page is the same as the README on GitHub, which makes sense there but not on the doc pages.
The default Getting Started instructions use a stateful bblfshd server, which requires mounting /var/lib/bblfshd from the host FS.
The current instructions do not work on macOS, which could confuse new users:
docker start bblfshd
Error response from daemon: Mounts denied:
The path /var/lib/bblfshd
is not shared from OS X and is not known to Docker.
You can configure shared paths from Docker -> Preferences... -> File Sharing.
See https://docs.docker.com/docker-for-mac/osxfs/#namespaces for more info.
Error: failed to start containers: bblfshd
To avoid that, one suggestion is to make the quickstart instructions stateless and keep a section below, marked as (Optional), on how to make the server stateful.
The build example recommends these steps:
# build SDK
go get -u github.com/bblfsh/sdk/...
# build a driver + container
git clone https://github.com/bblfsh/java-driver.git
go get -v -t ./...
make build
If I follow them, I get an error in the last one (make build):
; make build
Makefile:3: *** You must install bblfsh-sdk. Stop.
I think the problem is that the .sdk directory is missing and the error reporting is misleading.
According to https://github.com/bblfsh/java-driver/blob/master/README.md, a bblfsh-sdk pre-build is needed to generate the .sdk directory.
Add a roadmap section so that people know what to expect from Babelfish in the near future.
https://github.com/bblfsh/documentation/blob/master/using-babelfish/clients.md#python-example
Should be
import bblfsh
from bblfsh import filter as filter_uast

if __name__ == "__main__":
    client = bblfsh.BblfshClient("0.0.0.0:9432")
    response = client.parse("some_file.py")
    if response.status != 0:
        raise Exception('Some error happened: ' + str(response.errors))
    query = "//*[@roleIdentifier and not(@roleQualified)]"
    nodes = filter_uast(response.uast, query)
    for n in nodes:
        print(n)
We can generate languages.md directly from driver repositories by reading manifest files and checking Docker registry.
Explain how to use --max-message-size
I was following the Getting Started instructions to install bblfshd and the drivers.
First I typed
$ docker run -d --name bblfshd --privileged -p 9432:9432 -v /tmp/bblfshd:/var/lib/bblfshd bblfsh/bblfshd
And everything seemed OK. Response for docker logs bblfshd:
time="2017-10-24T09:02:06Z" level=info msg="bblfshd version: v2.1.0 (build: 2017-10-11T14:17:00+0000)"
time="2017-10-24T09:02:06Z" level=info msg="initializing runtime at /var/lib/bblfshd"
time="2017-10-24T09:02:06Z" level=info msg="control server listening in /var/run/bblfshctl.sock (unix)"
time="2017-10-24T09:02:06Z" level=info msg="server listening in 0.0.0.0:9432 (tcp)"
However, when I typed
$ docker exec -it bblfshd bblfshctl driver install --all
I got in return:
Installing python driver language from "docker://bblfsh/python-driver:latest"... Error
Error, mkdir /var/lib/bblfshd/tmp/image773185457: permission denied
Installing java driver language from "docker://bblfsh/java-driver:latest"... Error
Error, mkdir /var/lib/bblfshd/tmp/image255471964: permission denied
I don't know if this relates to the Issue #97 opened by Alex.
https://github.com/bblfsh/documentation/blob/master/uast/semantic-uast.md contains 3 ways to deal with position information: @pos, uast:Position and uast:Positions. To be honest, I do not get the difference from the doc, nor when I should use which.
Can we clarify it?
So all drivers should provide them.
Please add me as a member of bblfsh organization, so we can gather metrics on the public repos on data intelligence.
It's the GSoC 2018 org CFP period and I thought the Bblfsh project might want to participate, so why don't we start gathering preliminary project ideas?
Some ideas off the top of my head include, e.g., adding more drivers for new languages.
What do you guys think?
Missing a link to the full documentation at https://doc.bblf.sh/ in the README (since the readme by itself is only the introduction to the project).
Previous discussion at: https://github.com/src-d/devrel/issues/75
Maybe it's not necessary, but I thought it would be nice.
From @ajnavarro yesterday (more):
- I need to know the language of the file beforehand to use bblfsh
- talk about enry in the bblfsh documentation to cover that need
Somewhere in the documentation we should clarify that when we choose language keys, we use github/linguist as reference (languages.yml) and both enry in Go and linguist in Ruby will do the job if language detection is needed before passing to Babelfish.
Compare => (comparators) .ops[list] = Eq | NotEq | Lt | LtE | Gt | GtE | Is | IsNot | In | NotIn
BoolOp => .boolop = And | Or
BinOp => .op = Add | Sub | Mult | MatMult | Div | Mod | Pow | LShift | RShift | BitOr | BitXor | BitAnd | FloorDiv
UnaryOp => .unaryop = Invert | Not | UAdd | USub
FunctionInvocation, FunctionInvocationName, FunctionInvocationArgumentList, FunctionInvocationArgument, FunctionInvocationArgumentDefaultValue (and MethodInvocationArgumentDefaultValue).
FunctionInvocationObject for languages that support emulating method calls with functions like C#, Ruby, D, Nim, etc (3.toString, etc).
LambdaExpression, LambdaArguments, LambdaBody
BoolLiteral
UnicodeString or RawString or ByteString and decide what the default String is
Compound literals (with child nodes): ComplexLiteral, TupleLiteral, ListLiteral, SetLiteral, DictLiteral (or HashLiteral), FormattedStringLiteral or ParametrizedStringLiteral.
Async / Await or Join and/or the more specific (in Python) AsyncDef, AsyncFor and AsyncWhile.
Comprehension (ListComprehension, DictComprehension, SetComprehension).
GeneratorExpression, Yield, YieldFrom.
IfExpression (a = 3 if condition else 4 in Python, or a = condition ? value : elseValue in C-derived languages).
ForEach, ForEachIter, ForEachTarget
ForElse, WhileElse, DoWhileElse: in Python and other languages, loops can have an "else" clause that runs if the loop reaches its end without a break (by iterating over everything in the case of for, or by the condition becoming false in the case of while). It could be clearer if we call them “ForComplete, WhileComplete, etc”.
TryElse: Python exception handling can have an else clause meaning "if there is no exception, run this". Could be called “TryNoException”.
BlockScopeResource, BlockScopeResourceObject, BlockScopeResourceAlias: for representing Python (and other languages) blocks with a resource that will be freed at the end (with open("file.txt") as f:).
Delete
Print / Echo is a keyword in a lot of languages.
Keyword for other very language-specific keywords with a SimpleIdentifier subnode and optionally argument lists. For example in Python we could use it for: global, nonlocal, exec, eval, ellipsis.
ExpandOperator (* in Python to expand a list).
Annotation (like argument or assignment annotations in Python).
AugmentedAssignment (+=, -=, etc).
Unary[Pre|Post]Increase / Unary[Pre|Post]Decrease (a++, --a, etc). Or UnaryOperator with an IncreaseOperator or DecreaseOperator child node (Python doesn’t have this but C-derived languages do).
IndexExpression, IndexSliceExpression, IndexCompoundExpression. In Python we have Subscript ([]) that can have an Index child node ([3]), a Slice child node with lower, upper, step ([3:2:1]), or an ExtSlice that can have any number of Index or Slice children ([3,2:4,2:7,3…]).
Comments and Whitespace: I added them as properties of other "real" nodes (nodes.leading_comments, nodes.sameline_comments). This is the approach that two other FST are using. They could certainly be converted to nodes with some effort, but we should consider whether they’re better represented like this (this structure of noops as node properties makes it pretty easy to recover the original code).
Python AST nodes that assign or read (so most of them) have a "ctx" or “expression context” field that indicates if the node is written to (“Store”), is read (“Load”), or deleted (“Del”). While it’s interesting it also can be 100% inferred so I omitted that.
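As a small illustration of the "ctx" field, the sketch below uses only the standard `ast` module on a made-up two-line snippet. It prints the context recorded for each name.

```python
import ast

# Made-up snippet: one assignment target, one read, one deletion.
source = "a = b\ndel c\n"
tree = ast.parse(source)

# Map each Name node's identifier to the class name of its ctx field.
contexts = {
    node.id: type(node.ctx).__name__
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}
print(contexts)
```

Each value follows directly from where the name sits in the tree (assignment target, plain expression, or `del` target), which is why the field can be reconstructed by a consumer instead of being exported.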
The Call node in Python is used for both methods and functions, but there are differences that will allow us to differentiate normal calls from method calls:
"func": {"ast_type": "Attribute",
"attr": "update",
"value": {"ast_type": "Name", "id": "retDict"}
}
"func": {"ast_type": "Name", "id": "export_dict"}
The differences that will allow us to identify one from the other are:
The method has the "Attribute" ast_type while the function has simply “Name”.
The method has a "value" subnode and Call.func.value.id is the name of the object on which the call is being made.
The method name is on the Call.func.attr node, while the function name is on Call.func.id.
With this it will be easy to identify one or the other, with the caveat that non-method functions used with the module name (example: ast.parse()) use the "method" form. This makes sense because in Python modules are (singleton) objects, but if the function was imported as “from ast import parse” then it uses the second form. So we could just keep them as MethodCalls, or I could add some intelligence to the AST exporter to find out if the left side of the dot is an imported module and mark those as normal functions. This could fail in some cases because in Python you can play with and modify the import system, but those cases should be rare and we could fall back to leaving them as Method in that case.
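The heuristic described above can be sketched with the standard `ast` module. This is an illustration under simplifying assumptions, not the exporter's actual code: it only treats names bound by plain `import` statements as modules and, as noted, it can be fooled by code that rebinds names or manipulates the import system. `classify_calls` is a hypothetical helper name.

```python
import ast


def classify_calls(source):
    """Map each called name to 'function' or 'method' using the Call.func shape."""
    tree = ast.parse(source)

    # Names bound by "import x" or "import x as y" anywhere in the module.
    imported_modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported_modules.add((alias.asname or alias.name).split(".")[0])

    kinds = {}
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            # Plain form: the name is on Call.func.id.
            kinds[func.id] = "function"
        elif isinstance(func, ast.Attribute):
            # Dotted form: the name is on Call.func.attr.
            receiver = func.value
            if isinstance(receiver, ast.Name) and receiver.id in imported_modules:
                kinds[func.attr] = "function"  # module.function(), e.g. ast.parse()
            else:
                kinds[func.attr] = "method"
    return kinds


# The snippet is only parsed, never executed, so undefined names are fine.
example = """
import ast
from ast import dump
retDict = {}
retDict.update({})
export_dict(retDict)
ast.parse("x")
dump(None)
"""
print(classify_calls(example))
```

Dropping the `imported_modules` check gives the simpler fallback mentioned above, where `ast.parse()` would be reported as a method call.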
PS: this site is the best description of Python AST nodes:
https://docs.python.org/3/library/ast.html
Probably on this page:
There is an inconsistency between the Java and Python drivers. EndPosition.End.Col for the Java driver returns the position of the next character, while the Python driver returns the position of the last character. For example, for the code print, the node will have EndPosition.End.Col = 6 for Java and EndPosition.End.Col = 5 for Python.
Currently, it isn't defined what driver should return, so it depends on native AST.
Position should be defined and later fixed in drivers accordingly.
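The two conventions differ by exactly one column, which a short Python sketch makes explicit. It assumes 1-based columns, a single-line token, and one column per character; the function names are hypothetical.

```python
def end_col_exclusive(start_col: int, token: str) -> int:
    """Column of the first character after the token."""
    return start_col + len(token)


def end_col_inclusive(start_col: int, token: str) -> int:
    """Column of the token's last character."""
    return start_col + len(token) - 1


def inclusive_to_exclusive(end_col: int) -> int:
    """Convert an inclusive end column to the exclusive convention."""
    return end_col + 1


token = "print"
start_col = 1  # assumption: columns are 1-based and the token starts the line

print(end_col_exclusive(start_col, token))  # the value reported for Java
print(end_col_inclusive(start_col, token))  # the value reported for Python
```

Whichever convention the specification settles on, a driver using the other one would only need a fixed offset of one, as in `inclusive_to_exclusive`.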
It would be nice to have a page where Babelfish is briefly compared with other similar efforts. Highlighting key differences would help people from other communities get interested in Babelfish as well as provide better understanding of possible solutions "landscape".
Other efforts worth comparing to:
Something like https://github.com/OpenGrok/OpenGrok/wiki/Comparison-with-Similar-Tools or even simpler - a paragraph about each tool would be great.
What would be the best place for such a page? If you guys see value in this, I'll be happy to allocate some time and contribute a first draft.
SVG diagrams generated from Markdown with mermaid are ugly and look broken. Let's fix them or switch to some nice PNGs.
$ grep -nr --exclude='node_modules/*' mermaid **/*.md
architecture.md:8:```mermaid
architecture.md:25:```mermaid
driver/protocol.md:21:```mermaid
uast/specification.md:105:```mermaid
uast/specification.md:114:```mermaid