bblfsh / bblfshd
A self-hosted server for source code parsing
Home Page: https://doc.bblf.sh
License: GNU General Public License v3.0
Add a basic Server that takes a runtime path and uses it to execute drivers with default names (bblfsh/<language>-driver) given gRPC requests.
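As a sketch of this default-name convention (the helper name is hypothetical, not part of the server API), deriving the driver image from a language could look like this:

```python
def default_driver_image(language: str) -> str:
    # Hypothetical helper: build the default driver image name
    # following the bblfsh/<language>-driver convention.
    return "bblfsh/%s-driver" % language.lower()

print(default_driver_image("python"))  # bblfsh/python-driver
print(default_driver_image("Java"))    # bblfsh/java-driver
```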
I have a strange problem, and it is hard to give you a short example of how to reproduce the bug.
I would really like to have one, but I cannot find it.
The problem is that the bblfsh server hangs after several minutes of work on science-3. The easiest way to reproduce it on science-3 is:
docker run --rm --privileged -d -p 9434:9432 --name bblfsh_test bblfsh/server:v0.7.0
docker run --rm -it -v /storage:/storage --name bblfsh_hang_client -v /data:/data -e "LD_PRELOAD=" srcd/science bash
# next in bblfsh_hang_client container
cd /storage/konstantin
./setup_docker.sh
export PYTHONPATH='./modelforge:./ast2vec:./snippet_ranger'
rm -rf ./data/sources/matplotlib # I keep my data in matplotlib_old no worries :)
python3 ./entry_pnt.py
It will convert repos using ast2vec (latest develop version, src-d/ml@973707e) to asdf model files. You need to wait 5-20 minutes, and then you will see something like this in the bblfsh logs:
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_server_pbs.py (9073 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_server_ssh.py (4199 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing test.py (254 bytes)"
time="2017-09-13T16:50:46Z" level=info msg="parsing rcm_client_tk.spec (5756 bytes)"
time="2017-09-13T16:50:47Z" level=info msg="container started bblfsh/python-driver:latest (01BSY2CV6XY2RJX079VC9PQQKD)"
time="2017-09-13T16:50:47Z" level=error msg="driver bblfsh/python-driver:latest (01BSY2C400G44J3ZW3AV2YHZ2C) stderr: ERROR:root:Filepath: , Errors: ['Traceback (most recent call last):\\n File \"/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py\", line 173, in process_request\\n self._send_response(response)\\n File \"/usr/lib/python3.6/site-packages/python_driver/requestprocessor.py\", line 220, in _send_response\\n self.outbuffer.write(json.dumps(response, ensure_ascii=False))\\n File \"/usr/lib/python3.6/json/__init__.py\", line 238, in dumps\\n **kw).encode(obj)\\n File \"/usr/lib/python3.6/json/encoder.py\", line 199, in encode\\n chunks = self.iterencode(o, _one_shot=True)\\n File \"/usr/lib/python3.6/json/encoder.py\", line 257, in iterencode\\n return _iterencode(o, 0)\\n File \"/usr/lib/python3.6/json/encoder.py\", line 180, in default\\n o.__class__.__name__)\\nTypeError: Object of type \\'complex\\' is not JSON serializable\\n']"
(Sometimes it just hangs without any error.)
and in the bblfsh_hang_client container:
WARNING:source_transformer:Failed to construct model for /storage/konstantin/data/repos/matplotlib/fish2000@imread: itemsize cannot be zero in type
This is because bblfsh hangs.
But if you rerun the last command, python3 ./entry_pnt.py, it produces the same warnings and nothing appears in the bblfsh logs. Maybe it is related to gRPC problems, but I am not 100% sure.
P.S.: @fineguy and I keep trying to find a simple example that does not use ast2vec. Something like this: https://gist.github.com/zurk/ad464aa73ad244980457dd2f09ff3abd#file-bblfsh_hang-py but it seems to work fine, at least for a short time. You can also find ./entry_pnt.py in the same gist, just in case: https://gist.github.com/zurk/ad464aa73ad244980457dd2f09ff3abd#file-entry_pnt-py
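Incidentally, the TypeError in the driver stderr above is easy to reproduce in isolation: Python's standard json encoder does not handle complex values, which is presumably what the python driver hit while serializing an AST node:

```python
import json

# Reproduces the driver's failure mode: json cannot encode complex numbers.
try:
    json.dumps({"value": complex(1, 2)})
    print("serialized")
except TypeError as exc:
    print("failed:", exc)
```

A driver-side fix might be to sanitize such values (e.g. via repr) before encoding, but that is a guess about the driver internals.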
When a bblfshd binary runs inside a container, it crashes.
Is this expected behavior?
How to reproduce:
I used the following Dockerfile to define a container with a bblfshd binary inside:
FROM ubuntu:16.04
WORKDIR bblfsh
RUN apt-get update && \
    apt-get install --assume-yes wget vim
ENV BBLFSH_VERSION 2.2.0
RUN wget "https://github.com/bblfsh/bblfshd/releases/download/v${BBLFSH_VERSION}/bblfshd_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    wget "https://github.com/bblfsh/bblfshd/releases/download/v${BBLFSH_VERSION}/bblfshctl_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    tar -xf "bblfshd_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    tar -xf "bblfshctl_v${BBLFSH_VERSION}_linux_amd64.tar.gz" && \
    mv */bblfsh* /usr/local/bin/ && \
    rm -rf bblfsh*
RUN apt-get install --assume-yes software-properties-common && \
    add-apt-repository --yes ppa:alexlarsson/flatpak && \
    apt-get update && \
    apt-get install --assume-yes libostree-1-1 tzdata
ENTRYPOINT bblfshd -log-level debug
docker build --rm --tag bblfsh-image .
docker run --detach --interactive --tty --rm --name bblfsh-container bblfsh-image
docker exec --interactive --tty bblfsh-container bash
and from inside it I installed the drivers and tried to parse a Python file:
bblfshctl driver install python bblfsh/python-driver:latest;
echo "import something" > example.py
bblfshctl parse example.py
You get:
Installing python driver language from "bblfsh/python-driver:latest"... Done
Status: Fatal
Elapsed: 10.96427ms
Errors:
- unexpected error: container_linux.go:265: starting container process caused "process_linux.go:250: running exec setns process for init caused \"exit status 34\""
[2017-11-17T21:42:48Z] INFO bblfshd version: v2.2.0 (build: 2017-11-14T09:15:01+0000)
[2017-11-17T21:42:48Z] INFO initializing runtime at /var/lib/bblfshd
[2017-11-17T21:42:48Z] INFO server listening in 0.0.0.0:9432 (tcp)
[2017-11-17T21:42:48Z] DEBUG registering grpc service
[2017-11-17T21:42:48Z] INFO control server listening in /var/run/bblfshctl.sock (unix)
...
[2017-11-17T21:43:23Z] INFO driver python installed "bblfsh/python-driver:latest"
[2017-11-17T21:43:23Z] DEBUG detected language "python", filename "example.py"
[2017-11-17T21:43:23Z] DEBUG spawning driver instance "bblfsh/python-driver:latest" ...
nsenter: failed to unshare namespaces: Operation not permitted
WARN[0034] os: process already finished
[2017-11-17T21:43:23Z] ERROR error selecting pool: unexpected error: container_linux.go:265: starting container process caused "process_linux.go:250: running exec setns process for init caused \"exit status 34\""
[2017-11-17T21:43:23Z] ERROR request processed content 17 bytes, status Fatal elapsed=48.104549ms language=
Clients should be able to query the server for installed drivers, as well as supported drivers (maybe through some metadata file published officially).
Add driver client to be used by the server.
It should implement, at least:
ParseUAST(req *protocol.ParseUASTRequest) (*protocol.ParseUASTResponse, error)
The rest of it should be something along the lines of:
type Driver struct{}
func ExecDriver(r *runtime.Runtime, desc *DriverDescriptor) (*Driver, error)
func (d *Driver) ParseUAST(req *protocol.ParseUASTRequest) (*protocol.ParseUASTResponse, error)
func (d *Driver) Close() error
You may be interested in SmaCCRefactoring/SmaCC#25.
According to the documentation, this should be enough to make both server and client work with Docker:
$ docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server
$ docker run -v $(pwd):/work --link bblfsh bblfsh/server bblfsh client --address=bblfsh:9432 /work/sample.py
The second command generates an error instead:
$ docker run -v $(pwd):/work --link bblfsh bblfsh/server bblfsh client --address=bblfsh:9432 /work/sample.py
time="2017-09-07T10:03:07Z" level=info msg="binding to bblfsh:9432"
time="2017-09-07T10:03:07Z" level=info msg="initializing runtime at /tmp/bblfsh-runtime"
listen tcp 172.17.0.2:9432: bind: cannot assign requested address
time="2017-09-07T10:03:07Z" level=error msg="exiting with error: listen tcp 172.17.0.2:9432: bind: cannot assign requested address"
Separating the linking and the client execution seems to work, though, so there may be some race condition there.
Reported by @dpordomingo
Hi,
Hope you are all well!
Goal: generate a source code summary by parsing source code and package manager manifests.
I was wondering whether the project would be suitable for parsing CMakeLists.txt or other dependency-manager manifests (NPM, Maven, ...) in order to get a more complete overview of any project, and what the best approach to doing so would be.
Example of use case:
Thanks in advance for any insights or point of view.
Cheers,
Right now, if one tries to override the default driver images while running the server in a Docker container:
BBLFSH_DRIVER_IMAGES="python=docker-daemon:bblfsh/python-driver:dev-4dd607b;java=docker-daemon:bblfsh/java-driver:dev-45a5e8f" docker run -e BBLFSH_DRIVER_IMAGES --privileged -p 9432:9432 --name bblfsh bblfsh/server:dev-45a5e8f
time="2017-07-17T09:28:02Z" level=debug msg="binding to 0.0.0.0:9432"
time="2017-07-17T09:28:02Z" level=debug msg="initializing runtime at /tmp/bblfsh-runtime"
time="2017-07-17T09:28:02Z" level=debug msg="Overriding image for python: docker-daemon:bblfsh/python-driver:dev-4dd607b"
time="2017-07-17T09:28:02Z" level=debug msg="Overriding image for java: docker-daemon:bblfsh/java-driver:dev-45a5e8f"
time="2017-07-17T09:28:02Z" level=debug msg="starting server"
time="2017-07-17T09:28:02Z" level=debug msg="registering gRPC service"
time="2017-07-17T09:28:02Z" level=info msg="starting gRPC server"
The client will fail with:
error getting driver: missing driver for language python: runtime failure: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
error getting driver: missing driver for language java: runtime failure: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
There are no attempts to fetch a driver in the server logs.
Any ideas on how to debug this?
If I start bblfshd like this:
docker run -d --name bblfshd --privileged -p 9432:9432 -v $(pwd)/bblfshd-data:/var/lib/bblfshd bblfsh/bblfshd
and then call the "parse" method, the server never responds. It does not matter whether I make the call inside or outside the container (docker exec -it bblfshd bblfshctl parse /opt/bblfsh/etc/examples/python.py) or use the dashboard.
After that I am unable to do anything with the container: docker stop and docker kill just hang forever. Only restarting Docker itself helps.
But if I run bblfshd without the volume mount:
docker run -d --name bblfshd --privileged -p 9432:9432 bblfsh/bblfshd
everything works fine.
I understand that this issue description isn't very helpful, but I don't know how to collect more information.
This is an example of a bblfsh server's logs under high load:
time="2017-09-11T10:12:06Z" level=info msg="parsing HomePageFragment2.java (28041 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing OnlineFragment.java (6132 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing PersonCenterFragment.java (134 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing RankFragment.java (7577 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SDW513VN0ZZSS3PES6N4)"
time="2017-09-11T10:12:06Z" level=info msg="parsing RelevantVideoFragment.java (135 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing SubareaFragment.java (2345 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing VideoInfoFragment.java (9443 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SE047BJAT9SPFDAWW33Y)"
time="2017-09-11T10:12:06Z" level=info msg="parsing AreaItem.java (349 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing BannerItem.java (708 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing GameItem.java (499 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing OnlineVideo.java (806 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing Page.java (494 bytes)"
time="2017-09-11T10:12:06Z" level=info msg="parsing User.java (2211 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing Video.java (3521 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing VideoItem.java (2590 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SE443RDZYB2WRGX9XZA2)"
time="2017-09-11T10:12:07Z" level=info msg="parsing ArrayUtils.java (1166 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing CompressionTools.java (2928 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing Constants.java (3866 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing DeviceUtils.java (7551 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SE8SKJ4YE9PH84NVX79P)"
time="2017-09-11T10:12:07Z" level=info msg="parsing DownUtil.java (4623 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SEDFX9589EGJZA0Y7S7C)"
time="2017-09-11T10:12:07Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SEHXZ69FWJKYBC579B59)"
time="2017-09-11T10:12:07Z" level=info msg="parsing FileUitl.java (3662 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing FileUtils.java (11263 bytes)"
time="2017-09-11T10:12:07Z" level=info msg="parsing FractionalTouchDelegate.java (5312 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing HttpDownloader.java (5836 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing HttpUtil.java (15548 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing ImageUtils.java (10043 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing IntentHelper.java (3262 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing JsoupUtil.java (833 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing Logger.java (2561 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing MediaUtils.java (9612 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing MultiMemberGZIPInputStream.java (3315 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing PreferenceUtils.java (4323 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing StringUtils.java (8705 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing ToastUtils.java (2011 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing URLUtil.java (11974 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="parsing XmlReaderHelper.java (3897 bytes)"
time="2017-09-11T10:12:08Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SG0XZ0FG1PV9DG4WKJM2)"
time="2017-09-11T10:12:08Z" level=info msg="parsing ApplicationUtils.java (3928 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="parsing CircleImageView.java (7305 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="parsing CommonGestures.java (4955 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="parsing FileUtils.java (11308 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SG51GZQQN4R9SY4B798W)"
time="2017-09-11T10:12:09Z" level=info msg="parsing LeftSliderLayout.java (13298 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="parsing MediaController.java (23956 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SG97GW4TBW4NWB7FPB94)"
time="2017-09-11T10:12:09Z" level=info msg="parsing PlayerService.java (15043 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SGDF49AZFENEC93EPWYK)"
time="2017-09-11T10:12:09Z" level=info msg="parsing PullToZoomListView.java (9210 bytes)"
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SGJ056PZ6VM9P9DB6SE3)"
time="2017-09-11T10:12:09Z" level=info msg="container started bblfsh/java-driver:latest (01BSR6SGPG5NQZN6H0C6RBSHCY)"
time="2017-09-11T10:12:10Z" level=info msg="parsing VP.java (2207 bytes)"
time="2017-09-11T10:12:10Z" level=info msg="parsing VideoView.java (4721 bytes)"
time="2017-09-11T10:12:12Z" level=info msg="parsing ScreenCaptureImageActivity.java (9608 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing NotFoundException.java (546 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing RestError.java (4171 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing NotImplementedException.java (565 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing CrudService.java (1943 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing CrudServiceImpl.java (3271 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing package-info.java (62 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing ClassUtils.java (1725 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing PostInitialize.java (811 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing package-info.java (59 bytes)"
time="2017-09-11T10:12:13Z" level=info msg="parsing PostInitializerRunner.java (7054 bytes)"
There is no message when containers stop or die. I assume those are written at the DEBUG log level, but they should be at INFO, since the "started" messages are reported at INFO. As a result, it is currently not possible to estimate the average lifetime of a container.
I see that containers are spawned like crazy. It has been 17 hours since the torture started, so I have the impression that the scaling algorithm went wild. Are there any cool-down period tunables? How can I debug the reason why this happens? Will the DEBUG log level help?
I propose adding the reason why a container is killed or restarted - e.g. a scaling decision or a panic/segfault. No need to be verbose; I want something like:
time="2017-09-11T10:12:09Z" level=info msg="container stopped bblfsh/java-driver:latest (01BSR6SG97GW4TBW4NWB7FPB94) - oops"
time="2017-09-11T10:12:09Z" level=info msg="container stopped bblfsh/java-driver:latest (01BSR6SG97GW4TBW4NWB7FPB94) - scaling"
After correctly setting GOPATH, I still had issues:
package _/Users/rporres/git/bblfsh-server: unrecognized import path "_/Users/rporres/git/bblfsh-server" (import path does not begin with hostname)
This is how it worked for me (thx to @smola)
mkdir -p $GOPATH/src/github.com/bblfsh
cd $GOPATH/src/github.com/bblfsh
git clone https://github.com/bblfsh/server.git
cd server
make dependencies
make build
Tested on Ubuntu and Mac OS X.
Processing thousands of repos results in 100+ MB of server logs, mostly because UASTs are printed to stdout.
This can be avoided by not printing UASTs to stdout.
Maybe it could be hidden behind -v, -vv, or some deeper log level, as it is still very useful for lower-level debugging.
Create a driver client pool that controls the execution of a pool of drivers with the same image and uses it to serve concurrent requests.
reproducible for https://github.com/spring/svn-spring-archive
specific file: https://github.com/spring/svn-spring-archive/blob/master/Lobby/TASServer/TASServer.java
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 129, in _transform
return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
INFO:repos2coocc:https://github.com/shaunduncan/helga.git pending tasks: 43
INFO:repos2coocc:https://github.com/ups-nlp/nlp425.git pending tasks: 17
ERROR:repos2coocc:Error while processing ('/tmp/repo2nbow-6lu1go2q/TASServer/TASServer.java', 'Java').
Traceback (most recent call last):
File "/media/root/storage/egor/ast2vec_rsync/ast2vec/repo2base.py", line 97, in thread_loop
filename, language=language, timeout=self._timeout)
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/client.py", line 62, in parse_uast
response = self._stub.ParseUAST(request, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
return _end_unary_response_blocking(state, call, False, deadline)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>
code to reproduce:
python3 -m bblfsh -e 0.0.0.0:9432 -f TASServer.java --disable-bblfsh-autorun
ERROR:root:Exception deserializing message!
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/grpc/_common.py", line 129, in _transform
return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 34, in <module>
sys.exit(main())
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 30, in main
print(client.parse_uast(args.file, args.language))
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/client.py", line 62, in parse_uast
response = self._stub.ParseUAST(request, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
return _end_unary_response_blocking(state, call, False, deadline)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.INTERNAL, Exception deserializing response!)>
Add CLI flag for log level.
This is the code used in language.go:
func GetLanguage(filename string, content []byte) string {
	lang := enry.GetLanguage(filename, content)
	if lang == "" {
		lang = enry.OtherLanguage
	}
	lang = strings.ToLower(lang)
	lang = strings.Replace(lang, " ", "-", -1)
	lang = strings.Replace(lang, "+", "p", -1)
	lang = strings.Replace(lang, "#", "sharp", -1)
	return lang
}
but since enry.OtherLanguage was changed to be the string zero value "", the if statement just reassigns the same value to lang.
What is the expected return value of this function when a language could not be detected?
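For illustration, here is the same normalization sketched in Python (a rough translation, not the project's code); as described above, the empty-string branch is now a no-op:

```python
OTHER_LANGUAGE = ""  # enry.OtherLanguage after the change

def get_language_id(lang: str) -> str:
    # Mirrors the Go normalization in language.go; the branch below
    # reassigns the same value now that OTHER_LANGUAGE is "".
    if lang == "":
        lang = OTHER_LANGUAGE
    lang = lang.lower()
    lang = lang.replace(" ", "-")
    lang = lang.replace("+", "p")
    lang = lang.replace("#", "sharp")
    return lang

print(get_language_id("C++"))         # cpp
print(get_language_id("C#"))          # csharp
print(get_language_id("Emacs Lisp"))  # emacs-lisp
```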
When I execute many (3000) concurrent queries (4 threads), Babelfish server either hangs or drops some requests without answering them. CPU load is 0%. After that, server becomes completely unresponsive and I have to restart it.
I found that QUALIFIED_IDENTIFIER is not SIMPLE_IDENTIFIER, but @vmarkovtsev says it is supposed to be.
Also, I found a duplicated CALL_CALLEE role.
How to reproduce:
from bblfsh.client import BblfshClient
filepath = "./matplotlib_example.py"
bc = BblfshClient("0.0.0.0:9432")
res = bc.parse(filepath, language='Python')
print(res)
matplotlib_example.py:
from matplotlib import pyplot as plt
plt.figure()
Output (lines 76-83):
token: "figure"
start_position {
line: 2
col: 1
}
roles: CALL_CALLEE
roles: CALL_CALLEE
roles: QUALIFIED_IDENTIFIER
The problem is with the figure token. It means that we do not take function names into account during our machine learning analysis.
When you launch the bblfsh server and send requests to extract UASTs before it has fetched all the docker images, the server returns empty responses.
How to reproduce:
docker run --privileged -p 9432:9432 --name bblfsh -e BBLFSH_MAX_INSTANCES_PER_DRIVER=1 bblfsh/server
LC_ALL=en_GB.UTF-8 python3 -m ast2vec repo2coocc --bblfsh 172.17.0.1:9432 -r https://github.com/some/repo --linguist ./enry -o test_uast_one/repo.asdf
python3 -m ast2vec dump test_uast_one/repo.asdf
{'created_at': datetime.datetime(2017, 6, 20, 14, 29, 7, 781788),
'dependencies': [],
'model': 'co-occurrences',
'uuid': 'f582e9a1-bcdb-4acd-9979-aa0ecf5d1f4f',
'version': [1, 0, 0]}
Number of words: 0
First 10 words: []
Matrix info: number of non zero elements 0 , shape: [0, 0]
but if you wait and send the request after all the images have loaded:
{'created_at': datetime.datetime(2017, 6, 20, 14, 31, 13, 988607),
'dependencies': [],
'model': 'co-occurrences',
'uuid': '6da8f27b-7769-447b-93eb-afdfc4adfa5e',
'version': [1, 0, 0]}
Number of words: 74
First 10 words: ['print', 'input', 'file', 'output', 'reader', 'writer', 'header', 'line', 'csv', 'sys']
Matrix info: number of non zero elements 2435 , shape: [74, 74]
Code & repo to reproduce
# repo to reproduce
git clone https://github.com/svn2github/icu4j/ --depth 1
# bblfsh client call to reproduce error
root@science-3:~/tmp# PYTHONPATH=/media/root/storage/egor/ast2vec_rsync:/media/root/storage/egor/ast2vec_rsync/src/bblfsh GRPC_ARG_MAX_SEND_MESSAGE_LENGTH=-1 python3 -m bblfsh --disable-bblfsh-autorun -e 172.17.0.1:9432 -f icu4j/main/tests/core/src/com/ibm/icu/dev/test/bigdec/DiagBigDecimalTest.java
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 34, in <module>
sys.exit(main())
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/__main__.py", line 30, in main
print(client.parse_uast(args.file, args.language))
File "/media/root/storage/egor/ast2vec_rsync/src/bblfsh/bblfsh/client.py", line 62, in parse_uast
response = self._stub.ParseUAST(request, timeout=timeout)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 507, in __call__
return _end_unary_response_blocking(state, call, False, deadline)
File "/usr/local/lib/python3.5/dist-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.RESOURCE_EXHAUSTED, grpc: trying to send message larger than max (4315530 vs. 4194304))>
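The two numbers in the status line match gRPC's defaults: 4194304 is the 4 MiB default maximum message size, and 4315530 is the serialized UAST that exceeds it:

```python
# gRPC's default maximum message size is 4 MiB.
DEFAULT_MAX_MESSAGE = 4 * 1024 * 1024
uast_size = 4315530  # size reported in the error above

print(DEFAULT_MAX_MESSAGE)              # 4194304
print(uast_size > DEFAULT_MAX_MESSAGE)  # True
```

On the Python client side this limit can usually be lifted with the grpc.max_send_message_length / grpc.max_receive_message_length channel options (set to -1 for unlimited), though the server has to allow larger messages too.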
While working with bblfsh on the science-3 machine, I found that too many zombie processes spawn in the system.
When I close the bblfsh container, they go away.
I work with bblfsh on the science-3 machine.
It will be easier to reproduce there (I can show you how, if needed), but in general I do something like this:
run bblfsh in the usual way
docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server
run another container; in my case it is
docker run --rm -it -v /storage:/storage --device /dev/nvidiactl --device /dev/nvidia-uvm -v /data:/data -expose=9432 --privileged -e "LD_PRELOAD=" srcd/science bash
Maybe it is also reproducible directly on the host system.
then
pip3 install git+https://github.com/bblfsh/client-python
pip3 install git+https://github.com/src-d/ast2vec@develop
and then run the attached script:
entry_point.py.zip
You can check the number of zombie processes with ps aux | grep Z | wc -l before and after.
In my case it was 4 before and 15 after.
Medium
I need to get UASTs for many repos in the near future, so this is quite important.
I tried to reproduce the same result under macOS and failed.
Server version: 54c71fc
SDK version: bblfsh/sdk@df3e0da
Build log:
docker build -f Dockerfile.build -t bblfsh-server-build .
Sending build context to Docker daemon 112.9 MB
Step 1 : FROM golang:1.8-alpine
---> e7baf3b1a3a5
Step 2 : RUN apk add --no-cache git make musl-dev musl-utils gcc lvm2-dev btrfs-progs-dev
---> Using cache
---> fd6daebf7136
Step 3 : ENV GOPATH /go
---> Using cache
---> 48d0a86e3f57
Step 4 : WORKDIR /go/src/github.com/bblfsh/server
---> Using cache
---> 4f7d3d5252c2
Successfully built 4f7d3d5252c2
docker run --rm -v /home/dennwc/Go:/go bblfsh-server-build make build-internal
mkdir -p /go/src/github.com/bblfsh/server/build; \
for cmd in bblfsh; do \
cd /go/src/github.com/bblfsh/server/cmd/${cmd}; \
go build --ldflags '-X main.version=master -X main.build=06-15-2017_19_51_11' -o /go/src/github.com/bblfsh/server/build/${cmd} .; \
done;
# github.com/bblfsh/server/cmd/bblfsh
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/tmp/go-link-688671004/000001.o: In function `vsnprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:77: undefined reference to `__vsnprintf_chk'
/tmp/go-link-688671004/000001.o: In function `child_func':
/home/dennwc/Go/src/github.com/bblfsh/server/vendor/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:212: undefined reference to `__longjmp_chk'
/tmp/go-link-688671004/000001.o: In function `fprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/tmp/go-link-688671004/000001.o:/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: more undefined references to `__fprintf_chk' follow
/tmp/go-link-688671004/000001.o: In function `snprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:64: undefined reference to `__snprintf_chk'
/tmp/go-link-688671004/000001.o: In function `fprintf':
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: undefined reference to `__fprintf_chk'
/tmp/go-link-688671004/000001.o:/usr/include/x86_64-linux-gnu/bits/stdio2.h:97: more undefined references to `__fprintf_chk' follow
collect2: error: ld returned 1 exit status
make: *** [Makefile:118: build-internal] Error 2
Makefile:115: recipe for target 'build' failed
make: *** [build] Error 2
Are there any plans to measure how fast bblfsh and the drivers work, and to improve their performance?
I wanted to use it in my project, but it works very slowly.
Currently, the difference between bblfsh and a native parser is huge.
Example of parsing itsdangerous.py 500 times:
native parser:
import ast
from datetime import datetime
st = datetime.now()
for _ in range(500):
    tree = ast.parse(open('itsdangerous.py').read())
    ast.dump(tree)
print(datetime.now() - st)
using bblfsh:
package main

import (
	"fmt"
	"io/ioutil"
	"time"

	"gopkg.in/bblfsh/client-go.v2"
)

const times = 500

func main() {
	b, err := ioutil.ReadFile("itsdangerous.py")
	if err != nil {
		panic(err)
	}
	content := string(b)
	client, err := bblfsh.NewClient("0.0.0.0:9432")
	if err != nil {
		panic(err)
	}
	st := time.Now()
	for i := 0; i < times; i++ {
		_, err = client.NewParseRequest().Language("python").Content(content).Do()
		if err != nil {
			panic(err)
		}
	}
	fmt.Println(time.Now().Sub(st))
}
Results:
$ python3 parse.py
0:00:09.097725
$ go run main.go
2m34.900917922s
9s vs 2m34s is kinda huge...
Will you be open to my help with profiling/optimization?
Currently we always use bblfsh/<lang>-driver:latest, but we should be able to select specific images for some languages with environment variables. That would be particularly useful for testing.
I would suggest something like JAVA_DRIVER_IMAGE=myimage:foo or JAVA_DRIVER_IMAGE=docker-daemon:myimage:foo, but this is open to discussion.
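Whatever naming wins, the lookup itself is simple; here is a sketch (helper name hypothetical) that honors a per-language override variable and falls back to the default image:

```python
import os

def driver_image(lang: str, tag: str = "latest") -> str:
    # Hypothetical resolution: prefer a per-language override such as
    # JAVA_DRIVER_IMAGE, else fall back to bblfsh/<lang>-driver:<tag>.
    override = os.environ.get("%s_DRIVER_IMAGE" % lang.upper())
    if override:
        return override
    return "bblfsh/%s-driver:%s" % (lang.lower(), tag)

os.environ["JAVA_DRIVER_IMAGE"] = "myimage:foo"
print(driver_image("java"))    # myimage:foo
print(driver_image("python"))  # bblfsh/python-driver:latest
```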
Right now
docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server --some-option 1
would not pass CLI arguments to the server binary.
This could be fixed by using ENTRYPOINT instead of CMD in Dockerfile#L5.
Currently we have some configuration handled by CLI options and some by environment variables; we could use a better config system.
Add a driver client that does language detection with simple-linguist and routes the request to the required driver, if present.
Doing ParseUAST requests in a loop over several files will hang on ParseUAST() if the previous request returned a FATAL error, even after reconnecting.
If we handle the error, subsequent ParseUAST requests for different files should continue, maybe after reconnecting to the server.
The server log shows the source code of the next file, but the client process hangs on the next ParseUAST request.
/usr/lib/python3.6/__future__.py
/usr/lib/python3.6/__phello__.foo.py
/usr/lib/python3.6/_bootlocale.py
/usr/lib/python3.6/_collections_abc.py
/usr/lib/python3.6/_compat_pickle.py
/usr/lib/python3.6/_compression.py
/usr/lib/python3.6/_dummy_thread.py
/usr/lib/python3.6/_markupbase.py
/usr/lib/python3.6/_osx_support.py
/usr/lib/python3.6/_pydecimal.py
FATAL error with the file [ /usr/lib/python3.6/_pydecimal.py ]: [buffer size exceeded]
/usr/lib/python3.6/_pyio.py
# HANGS HERE
If no files crash (because SDK PR #127 has been merged), copy this file to the directory with the others (it's intentionally so huge that it will crash even with the 4 MB buffer): https://gist.github.com/juanjux/b6a51cd934368c142e6ea235811d1f33
Start the server: docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server
go run py2uast2pb.go
This is related to some internal renaming. See sirupsen/logrus#570 (comment)
ubuntu@ubuntu16:~$ go version
go version go1.8.3 linux/amd64
ubuntu@ubuntu16:~$ echo $GOPATH
/home/ubuntu/go
ubuntu@ubuntu16:~$ go get -v github.com/bblfsh/server
can't load package: package github.com/bblfsh/server: case-insensitive import collision: "github.com/Sirupsen/logrus" and "github.com/sirupsen/logrus"
Apparently logrus should be imported as github.com/sirupsen/logrus (lowercase).
If I run the server like:
docker run --privileged -p 9432:9432 --rm --name bblfsh bblfsh/server --transport=docker-daemon
And then do a client request, I get this error:
python -m bblfsh --disable-bblfsh-autorun -f test.py
status: FATAL
errors: "error getting driver: missing driver for language python: runtime failure:
Error loading image from docker engine: Cannot connect to the Docker daemon at
unix:///var/run/docker.sock. Is the docker daemon running?"
But it works perfectly if I run the server directly without Docker:
sudo ./bblfsh server --transport=docker-daemon
It was added to support the dashboard, but they already have their own server, so there is no need to support it anymore.
$ wget https://github.com/bblfsh/server/releases/download/v0.6.0/bblfsh
$ ldd bblfsh
linux-vdso.so.1 => (0x00007ffc52dac000)
libc.musl-x86_64.so.1 => not found
cc @rporres
egor@science-3 ~/bblfsh-dev-image $docker run --privileged -p 9432:9432 --name bblfsh bblfsh/server
Unable to find image 'bblfsh/server:latest' locally
latest: Pulling from bblfsh/server
88286f41530e: Pull complete
878f656258fa: Pull complete
94595a4777da: Pull complete
Digest: sha256:628b09f1a669a851abfecf9231ecfbaa07cda13a5fb34f8c1eb6d71dd1dbc6bc
Status: Downloaded newer image for bblfsh/server:latest
time="2017-07-05T16:47:33Z" level=debug msg="binding to 0.0.0.0:9432"
time="2017-07-05T16:47:33Z" level=debug msg="initializing runtime at /tmp/bblfsh-runtime"
invalid image driver format
time="2017-07-05T16:47:33Z" level=error msg="exiting with error: invalid image driver format "
@abeaumont gave a workaround for now:
export BBLFSH_DRIVER_IMAGES="a=a"; docker run -e BBLFSH_DRIVER_IMAGES --privileged -p 9432:9432 --name bblfsh bblfsh/server
but it should be fixed, or the documentation should be updated.
We should avoid having versions unset, since glide's cache does weird things in that case.
A sane approach may be:
Running the babelfish server currently requires root. This makes running tests a pain, so the Makefile should be modified to run tests in Docker, and travis.yml should be updated if needed.
Add basic gRPC client to test the server.
We use the following script to reproduce the problem:
python3 -m bblfsh ast2vec/ast2vec/__init__.py ast2vec/ast2vec/__main__.py ast2vec/ast2vec/dataset.py ast2vec/ast2vec/df.py ast2vec/ast2vec/enry.py ast2vec/ast2vec/id2vec.py ast2vec/ast2vec/id_embedding.py ast2vec/ast2vec/repo2base.py ast2vec/ast2vec/repo2coocc.py ast2vec/ast2vec/repo2nbow.py ast2vec/ast2vec/swivel.py ast2vec/ast2vec/utils/__init__.py ast2vec/ast2vec/utils/ast2vec.py ast2vec/ast2vec/utils/gogo_pb2.py ast2vec/ast2vec/utils/gogo_pb2_grpc.py ast2vec/ast2vec/utils/sdk_protocol_pb2.py ast2vec/ast2vec/utils/sdk_protocol_pb2_grpc.py ast2vec/ast2vec/utils/uast_pb2.py ast2vec/ast2vec/utils/uast_pb2_grpc.py ast2vec/data/test.py ast2vec/src/bblfsh/bblfsh/__init__.py ast2vec/src/bblfsh/bblfsh/__main__.py ast2vec/src/bblfsh/bblfsh/client.py ast2vec/src/bblfsh/bblfsh/github/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2_grpc.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/generated_pb2.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py ast2vec/src/bblfsh/bblfsh/launcher.py ast2vec/src/bblfsh/bblfsh/test.py ast2vec/src/bblfsh/setup.py > 1.txt &
python3 -m bblfsh ast2vec/ast2vec/__init__.py ast2vec/ast2vec/__main__.py ast2vec/ast2vec/dataset.py ast2vec/ast2vec/df.py ast2vec/ast2vec/enry.py ast2vec/ast2vec/id2vec.py ast2vec/ast2vec/id_embedding.py ast2vec/ast2vec/repo2base.py ast2vec/ast2vec/repo2coocc.py ast2vec/ast2vec/repo2nbow.py ast2vec/ast2vec/swivel.py ast2vec/ast2vec/utils/__init__.py ast2vec/ast2vec/utils/ast2vec.py ast2vec/ast2vec/utils/gogo_pb2.py ast2vec/ast2vec/utils/gogo_pb2_grpc.py ast2vec/ast2vec/utils/sdk_protocol_pb2.py ast2vec/ast2vec/utils/sdk_protocol_pb2_grpc.py ast2vec/ast2vec/utils/uast_pb2.py ast2vec/ast2vec/utils/uast_pb2_grpc.py ast2vec/data/test.py ast2vec/src/bblfsh/bblfsh/__init__.py ast2vec/src/bblfsh/bblfsh/__main__.py ast2vec/src/bblfsh/bblfsh/client.py ast2vec/src/bblfsh/bblfsh/github/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2_grpc.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/generated_pb2.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py ast2vec/src/bblfsh/bblfsh/launcher.py ast2vec/src/bblfsh/bblfsh/test.py ast2vec/src/bblfsh/setup.py > 2.txt &
python3 -m bblfsh ast2vec/ast2vec/__init__.py ast2vec/ast2vec/__main__.py ast2vec/ast2vec/dataset.py ast2vec/ast2vec/df.py ast2vec/ast2vec/enry.py ast2vec/ast2vec/id2vec.py ast2vec/ast2vec/id_embedding.py ast2vec/ast2vec/repo2base.py ast2vec/ast2vec/repo2coocc.py ast2vec/ast2vec/repo2nbow.py ast2vec/ast2vec/swivel.py ast2vec/ast2vec/utils/__init__.py ast2vec/ast2vec/utils/ast2vec.py ast2vec/ast2vec/utils/gogo_pb2.py ast2vec/ast2vec/utils/gogo_pb2_grpc.py ast2vec/ast2vec/utils/sdk_protocol_pb2.py ast2vec/ast2vec/utils/sdk_protocol_pb2_grpc.py ast2vec/ast2vec/utils/uast_pb2.py ast2vec/ast2vec/utils/uast_pb2_grpc.py ast2vec/data/test.py ast2vec/src/bblfsh/bblfsh/__init__.py ast2vec/src/bblfsh/bblfsh/__main__.py ast2vec/src/bblfsh/bblfsh/client.py ast2vec/src/bblfsh/bblfsh/github/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/protocol/generated_pb2_grpc.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/bblfsh/sdk/uast/generated_pb2.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/__init__.py ast2vec/src/bblfsh/bblfsh/github/com/gogo/protobuf/gogoproto/gogo_pb2.py ast2vec/src/bblfsh/bblfsh/launcher.py ast2vec/src/bblfsh/bblfsh/test.py ast2vec/src/bblfsh/setup.py > 3.txt
Please note that you need the patch to client-python: https://github.com/vmarkovtsev/bblfsh.client-python/tree/main-change
Result: 1.txt, 2.txt and 3.txt are randomly different (checked with diff). Besides, some files end up in the wrong places or are empty.
I use bblfsh and often check what it is doing via
docker logs -f bblfshd
which gives messages like:
time="2017-11-24T11:38:43Z" level=warning msg="request processed content 4047 bytes, status Error" elapsed=103.048661ms language=java
If the status is Error, I'd really like to know the file path on which it failed.
Is it possible to add? You could add it as a permanent field, or only in case of error.
The ML team needs an API to ask the running server for its version.
Discussions:
https://src-d.slack.com/archives/C4NNGEVGW/p1497445078106377
https://src-d.slack.com/archives/C4NNGEVGW/p1502888509000205
Related to #28
Hi,
I was just made aware of this project and it sounds super interesting - first: I'm the founder of coala.io and we have been thinking about doing some universal AST as well but never got around to doing it properly. It's great to see that there's a dedicated project and effort around this concept!
coala is a Python-based open source code analysis framework with a dependency mechanism. The main idea is that people can write a module that generates an AST (preferably something like your UAST) and other modules that consume it; coala handles parallelization, caching, user interaction, etc.
Given that, it would be super cool if we could collaborate on this to a degree. I see you already have a Python client, and I need to read up on this stuff, but I could see us providing your UAST for researchers and programmers to write code analyses, and maybe building a query language for them (maybe you already have something like this in mind).
I just read about this 5 mins ago and those are just a few initial thoughts, what do you think?
Cheers!
CI has failed due to srcd.works not existing anymore: https://travis-ci.org/bblfsh/server/builds/273847022
It should be replaced by gopkg.in/src-d/go-errors.v0
It's not a problem for gRPC to have bigger message sizes; the default was 100 MB before 1.0.0.
In the use case of src-d/berserker it is not uncommon to have UASTs of tens of megabytes, so maybe we should try a 100 MB default instead of 4 MB?
It would also be nice to always log this parameter at runtime, for debugging purposes.
Right now resp, err := client.ParseUAST(context.TODO(), req)
may fail with err being NOT nil, which is absolutely fine; resp may also be nil (i.e. if the server is not running).
But the actual parsing may have failed while err is nil: resp.Errors is set and resp.Status is fatal.
It seems like quite unobvious behavior (at least for Go) that may either be possible to fix, or should at least be documented everywhere, including https://doc.bblf.sh/user/server-grpc-example.html#full-source-of-the-example
Otherwise API users will hit random nil-pointer panics on further response.UAST manipulations (which do not propagate any error message).
Here is an example of a client that works around the current behavior, which boils down to:
resp, err := client.ParseUAST(context.TODO(), req)
if err != nil {
	fmt.Printf("Error - ParseUAST failed, response is nil, error: %v for %s\n", err, f.Name)
} else if resp == nil {
	fmt.Printf("No error, but ParseUAST failed, response is nil\n")
} else if len(resp.Errors) != 0 || resp.Status != protocol.Ok {
	fmt.Printf("No error, but ParseUAST status: %s, errors: %v for %s\n", resp.Status, resp.Errors, f.Name)
}
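The same three checks can be folded into one helper. The types below are minimal stand-ins for the generated protocol structs (the real Status and ParseUASTResponse live in the bblfsh SDK), so this is a sketch of the pattern, not the actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for the generated protocol types.
type Status int

const (
	Ok Status = iota
	Fatal
)

type ParseUASTResponse struct {
	Status Status
	Errors []string
}

// checkResponse folds the three failure modes into a single error:
// transport error, nil response, and a response whose Status is not Ok.
func checkResponse(resp *ParseUASTResponse, err error) error {
	switch {
	case err != nil:
		return fmt.Errorf("ParseUAST failed: %w", err)
	case resp == nil:
		return errors.New("ParseUAST returned no error but a nil response")
	case resp.Status != Ok || len(resp.Errors) != 0:
		return fmt.Errorf("ParseUAST status %v, errors: %v", resp.Status, resp.Errors)
	}
	return nil
}

func main() {
	fmt.Println(checkResponse(&ParseUASTResponse{Status: Ok}, nil))
	fmt.Println(checkResponse(&ParseUASTResponse{Status: Fatal, Errors: []string{"boom"}}, nil))
}
```

Callers then only ever branch on one error value before touching resp.UAST.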
While getting UASTs and filtering for identifiers in the Python files of a single project using Engine, after 30 minutes I can see 350+ driver processes inside the bblfshd container.
Logs in detail:
ps
root 2169 0.0 0.0 18188 3188 pts/0 Ss 08:43 0:00 bash
root 2177 0.0 0.0 0 0 ? Z 08:43 0:00 [runc:[1:CHILD]] <defunct>
root 2203 0.0 0.0 0 0 ? Z 08:43 0:00 [runc:[1:CHILD]] <defunct>
root 2228 0.0 0.0 0 0 ? Z 08:43 0:00 [runc:[1:CHILD]] <defunct>
root 2249 0.0 0.0 0 0 ? Z 08:43 0:00 [runc:[1:CHILD]] <defunct>
root 2269 0.0 0.0 0 0 ? Z 08:44 0:00 [runc:[1:CHILD]] <defunct>
root 2473 0.0 0.0 0 0 ? Z 08:44 0:00 [runc:[1:CHILD]] <defunct>
root 2561 0.0 0.0 0 0 ? Z 08:44 0:00 [runc:[1:CHILD]] <defunct>
root 2562 28.4 0.7 36036 28552 ? Ssl 08:44 0:01 /opt/driver/bin/driver --log-level info --log-format text -
root 2572 30.7 0.6 81092 27848 ? S 08:44 0:02 /usr/bin/python3.6 /usr/bin/python_driver
Container log
time="2017-11-16T08:48:35Z" level=info msg="python-driver version: dev-1908ca8 (build: 2017-11-14T11:31:28Z)" id=01bz207xxmc18dppgxwgywr5zs language=python
time="2017-11-16T08:48:35Z" level=info msg="server listening in /tmp/rpc.sock (unix)" id=01bz207xxmc18dppgxwgywr5zs language=python
time="2017-11-16T08:48:36Z" level=info msg="new driver instance started bblfsh/python-driver:latest (01bz207xxmc18dppgxwgywr5zs)"
time="2017-11-16T08:49:07Z" level=info msg="python-driver version: dev-1908ca8 (build: 2017-11-14T11:31:28Z)" id=01bz208xey432evff0pga9dnxr language=python
time="2017-11-16T08:49:07Z" level=info msg="server listening in /tmp/rpc.sock (unix)" id=01bz208xey432evff0pga9dnxr language=python
time="2017-11-16T12:24:51Z" level=error msg="error re-scaling pool: container is not destroyed" language=python
apache spark thread dump
org.bblfsh.client.BblfshClient.filter(BblfshClient.scala:33)
tech.sourced.engine.udf.QueryXPathUDF$$anonfun$queryXPath$2.apply(QueryXPathUDF.scala:45)
tech.sourced.engine.udf.QueryXPathUDF$$anonfun$queryXPath$2.apply(QueryXPathUDF.scala:44)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
tech.sourced.engine.udf.QueryXPathUDF$.queryXPath(QueryXPathUDF.scala:44)
Steps to reproduce, using 30 concurrent clients:
// get Borges from https://github.com/src-d/borges/releases/tag/v0.8.3
echo -e "https://github.com/src-d/borges.git\nhttps://github.com/erizocosmico/borges.git\nhttps://github.com/jelmer/dulwich.git" > repos.txt
borges pack --loglevel=debug --workers=2 --to=./repos -f repos.txt
// get Apache Spark https://github.com/src-d/engine#quick-start
$SPARK_HOME/bin/spark-shell --driver-memory=4g --packages "tech.sourced:engine:0.1.7"
and then run :paste, paste the code below, and hit Ctrl+D:
import tech.sourced.engine._
val engine = Engine(spark, "repos")
val repos = engine.getRepositories
val refs = repos.getHEAD.withColumnRenamed("hash","commit_hash")
val langs = refs.getFiles.classifyLanguages
val pyTokens = langs
.where('lang === "Python")
.extractUASTs.queryUAST("//*[@roleIdentifier]", "uast", "result")
.extractTokens("result", "tokens")
val tokensToWrite = pyTokens
.join(refs, "commit_hash")
.select('repository_id, 'name, 'commit_hash, 'file_hash, 'path, 'lang, 'tokens)
spark.conf.set("spark.sql.shuffle.partitions", "30") //instead of default 200
tokensToWrite.show
Then, if exec'ed into the bblfshd container, one can see the number of driver processes growing:
apt-get update && apt-get install -y procps
ps aux | wc -l
When I start bblfsh and parse a file, I get
time="2017-09-27T14:53:52Z" level=info msg="binding to 0.0.0.0:9432"
time="2017-09-27T14:53:52Z" level=info msg="initializing runtime at /tmp/bblfsh-runtime"
time="2017-09-27T14:53:52Z" level=info msg="setting maximum size for sending and receiving messages to 104857600"
time="2017-09-27T14:53:52Z" level=info msg="starting gRPC server"
time="2017-09-27T14:54:17Z" level=info msg="container started bblfsh/java-driver:latest (01BV1X9KXP0HJBG4WYW2MNZZZB)"
time="2017-09-27T14:54:17Z" level=info msg="parsing FileUtils.java (2248 bytes)"
proto: no encoder for Filename string [GetProperties]
proto: no encoder for Language string [GetProperties]
proto: no encoder for Content string [GetProperties]
proto: no encoder for Encoding protocol.Encoding [GetProperties]
I guess these "proto" messages signal something, though the files are parsed successfully.