cuplv / biggroum Goto Github PK

Top-level project for the graph extraction

License: Apache License 2.0

Shell 8.16% Python 90.14% Makefile 0.23% Dockerfile 1.47%

biggroum's Issues

Change TAFPI API - rename bugType to type.

From Tom: "I believe I fixed all the issues with the run/finalize commands. There is one client-facing change and that's the JSON field of bugType should be called type. From our standpoint that keeps the fields consistent from v1 tools and v3 tools."
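
A minimal sketch of the rename on the producer side, assuming each tool note is a plain dictionary before it is serialized to JSON. Only the field names bugType and type come from the issue; the helper name rename_bug_type and the sample note are hypothetical.

```python
import json


def rename_bug_type(note):
    """Return a copy of a tool note with the 'bugType' key renamed to 'type'."""
    renamed = dict(note)
    if "bugType" in renamed:
        renamed["type"] = renamed.pop("bugType")
    return renamed


# Hypothetical note: the other fields are placeholders, not the real schema.
note = {"bugType": "Anomaly", "message": "example"}
print(json.dumps(rename_bug_type(note), sort_keys=True))
```

Returning a copy keeps the caller's dictionary untouched, which makes the helper safe to apply to notes that are still used elsewhere.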

Extend the Graph Extractor to extract a single repository

Create scripts to extract APK files into groum graphs inside of the Musedev docker container.

Sub tasks:

filter methods to extract based on files
script to run extractor on build directory
docker container setup

Import data into Solr

Import groum, cluster and pattern documents into solr.

Import groum
import patterns
import clusters

Musedev: add endpoint to the source code service

The source code service downloads a repository and then applies a patch.
When we integrate with musedev, we instead get the source code files as input (i.e., we do not have to download the git repository again).

There are two main tasks:

Add an endpoint in the source code service that takes as input the "diffs" for a single source code file and the source code to patch. This activity is a refactoring of the existing service: all the functionality is already there.
Change the search service to call the new endpoint, passing the right files (this task will require some refactoring to pass the file content).

Return the right filename when processing anonymous classes and constructors

Anonymous inner classes and constructors have special names (e.g., the compiled name of an anonymous inner class contains $). In those cases we do not construct the correct filename in the toolnote.

Both cases need to be checked.
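
A minimal sketch of how the filename could be derived for the anonymous-class case, assuming the class name is available in dotted form and that the outermost class is declared in a file named after it. The function name source_file_for_class is hypothetical and is not taken from the toolnote code.

```python
def source_file_for_class(class_name):
    """Map a JVM class name to the relative path of the .java file declaring it."""
    # Inner and anonymous classes are compiled to names such as Outer$Inner
    # or Outer$1; the source file is named after the outermost class.
    outer = class_name.split("$", 1)[0]
    return outer.replace(".", "/") + ".java"


print(source_file_for_class("com.example.MainActivity$1"))
```

Constructors need no extra handling under this assumption, since the lookup depends only on the class name and not on the method name.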

Improve protobuf efficiency

Reading from protobuf in python is inefficient when using the python implementation.

We should use the c++ bindings when generating the isomorphisms pages (we need to read several protobuf files).

This link explains how to do it: http://yz.mit.edu/wp/fast-native-c-protocol-buffers-from-python/
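
A minimal sketch of selecting the C++ backend from Python, assuming the installed protobuf package ships that backend. The environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is read by the protobuf runtime; the helper name protobuf_backend is hypothetical.

```python
import os

# Must be set before the first google.protobuf import to have any effect.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "cpp")


def protobuf_backend():
    """Return the name of the active protobuf backend, or None if protobuf is missing."""
    try:
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return api_implementation.Type()


print(protobuf_backend())
```

Printing the backend name is a quick way to confirm which implementation is actually in use before generating the isomorphism pages.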

Directory structure - make it uniform with the other fixr tools

Improve the directory structure used for graphs/provenance/isomorphism:

always use the username, repo name (not done now)
trie structure on them
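
One possible reading of the two points above, sketched under the assumption that "trie structure" means nesting directories by prefixes of the user name. The function name repo_dir, the prefix depth, and the root folder name are all hypothetical.

```python
import os


def repo_dir(root, username, repo, depth=2):
    """Build <root>/<u>/<us>/<username>/<repo>, a trie over user name prefixes."""
    prefixes = [username[:i] for i in range(1, min(depth, len(username)) + 1)]
    return os.path.join(root, *prefixes, username, repo)


print(repo_dir("graphs", "cuplv", "biggroum"))
```

The prefix levels keep any single directory from holding one entry per user, which matters once the corpus contains many repositories.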

Implement lattice-based search

Implement the new lattice-based search for patterns.

The search will return a set of suggestions that a developer can use to correct or complete their code.
The search takes a groum as input and returns a list of patterns, each with its relationship to the input groum (e.g., isomorphic, a subgraph, or a supergraph) and its "distance" from the input groum (e.g., how many changes are needed to turn the groum into the pattern).
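
A minimal sketch of what one search result could look like, assuming each suggestion carries a pattern identifier, a relation, and an integer distance. The names Relation, Suggestion, and rank are hypothetical and do not reflect the GraphIso or FixrGraphPatternSearch interfaces.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """How a pattern relates to the input groum."""
    ISOMORPHIC = "isomorphic"
    SUBGRAPH = "subgraph"
    SUPERGRAPH = "supergraph"


@dataclass(frozen=True)
class Suggestion:
    pattern_id: str
    relation: Relation
    distance: int  # number of changes separating the groum from the pattern


def rank(suggestions):
    """Order suggestions by increasing distance from the input groum."""
    return sorted(suggestions, key=lambda s: s.distance)


results = rank([
    Suggestion("p2", Relation.SUPERGRAPH, 3),
    Suggestion("p1", Relation.ISOMORPHIC, 0),
])
print([s.pattern_id for s in results])
```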

The tasks are:

on GraphIso - serialize the lattice
on GraphIso - serialize the frequent itemset index
on GraphIso - implement the new search algorithm
in the biggroum fixriso python scripts - adapt the script to the new interface
in the FixrGraphPatternSearch - change the search interface
in the fixr_groum_search_frontend - change the web interface to call the new service and display the new data

Send Deployment to Musedev

Tasks to complete before Monday meeting:

sergio: find container to run service and push to dockerhub
shawn: add sergio to dockerhub org
shawn: test that our script works in their docker container
have demo to walk through

Docker file in the python package --- to move

The files:

are in the fixrgraph python package and refer to the musedev image (i.e., are in the wrong place).

We should move those files somewhere else, outside the python package (ideally, we could create a deployment folder at the top level of the repository and organize our docker files there).

Save error log in the TAFPI api implementation

The execution of the biggroum MuseDev script does not save the error output.

We can save the error log on the image to help us debug any issue.
We should also save the logging output (I don't remember if it is redirected to stderr by default now).
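
A minimal sketch of both ideas, assuming the script launches its work as a subprocess and uses Python's standard logging module. The helper names run_and_log and log_to_file and the log paths are hypothetical; the force argument of logging.basicConfig requires Python 3.8 or later.

```python
import logging
import subprocess


def run_and_log(cmd, err_path):
    """Run cmd, appending its stderr to err_path, and return the exit code."""
    with open(err_path, "ab") as err_file:
        return subprocess.run(cmd, stderr=err_file).returncode


def log_to_file(log_path):
    """Send the root logger's output to log_path instead of the default stderr."""
    logging.basicConfig(filename=log_path, level=logging.INFO, force=True)
```

Opening the file in append mode keeps the errors from the run, finalize, and talk invocations in one place, so a single file can be inspected after a failure.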

Build the Demo

Script for the demo we are going to show DARPA
good set of test cases
what good results can we show?

Extend Biggroum corpus

Get better data for demo

Musedev: Implement the new Features in the Search Service

Create a new endpoint accepting a list of groums and source files.
Refactor implementation to be stateless with respect to previous runs.
Analysis on refactoring (note: don't remember the specifics of this, @smover could you look at the photo and update this point? it was item 5.2 on the whiteboard discussion 11/5/19)

Let run_extractor.py use the APK file directly, instead of classes

Prepare tools-deployment for the demo/hackathon

Implementation

Import patterns, clusters, groums in Solr
Implement search tool for patterns
Improve the graph extractor (APKs, ignore libraries, scale)
Improve extraction scripts (switch to python, ease the execution/debugging of the different steps)

Deployment

Deploy search tool
Import graphs in solr
Import clusters into solr

Fix source code packaging in the musedev api

The muse api is supposed to create an archive of the source code corresponding to the graphs built from the class files.

Currently, the muse api puts in the archive the content of a "source" folder that should exist in the same place where the graphs have been extracted. This logic is wrong, since the graph extractor does not create such a source folder.

The muse api must construct the archive of source code files to send to the search service differently.
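
A minimal sketch of one alternative, assuming the api can obtain the list of source files referenced by the extracted graphs and that a zip archive is acceptable to the search service. The function name archive_sources and its parameters are hypothetical.

```python
import os
import zipfile


def archive_sources(source_root, relative_paths, archive_path):
    """Zip the listed source files, keeping their paths relative to source_root.

    Returns the list of relative paths that were not found on disk.
    """
    missing = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for rel in sorted(set(relative_paths)):
            full = os.path.join(source_root, rel)
            if os.path.isfile(full):
                archive.write(full, arcname=rel)
            else:
                missing.append(rel)
    return missing
```

Building the archive from an explicit file list removes the dependency on a "source" folder, and the returned list makes missing files visible instead of silently producing an incomplete archive.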

run_mining.py fails to perform all mining steps on APKs

scripts/run_mining.py exits after extracting graphs but before performing clustering, pattern search, and HTML generation.

Assuming a test directory with the following structure

myTest
└── fdroid
    ├── app1
    ├── app2
    └── app3

The configuration files can be generated like so:

python scripts/generate_mining_files.py -p <path to myTest>/fdroid -b <path to myTest> -o <output path>

To run the mining:

python scripts/run_mining.py -c <output path>/mining_configuration/config.yaml

Currently, run_mining.py completes the graph extraction but fails to compute the clusters. The full mining does work, but only after config.yaml is edited to disable extraction and run_mining.py is run again on the same directory.

API finalize should be robust to a build that generates multiple targets

Some builds generate multiple versions of an app, producing several copies of the same class.

e.g.

root@5db2c9353c41:/fpp# find ./ -name "MainActivity.class"
./analyzing-3354dff4a11388be/MapboxAndroidWearDemo/build/intermediates/javac/debug/compileDebugJavaWithJavac/classes/com/mapbox/mapboxandroiddemo/MainActivity.class
./analyzing-3354dff4a11388be/MapboxAndroidDemo/build/intermediates/javac/globalDebug/compileGlobalDebugJavaWithJavac/classes/com/mapbox/mapboxandroiddemo/MainActivity.class
./analyzing-3354dff4a11388be/MapboxAndroidDemo/build/intermediates/javac/chinaDebug/compileChinaDebugJavaWithJavac/classes/com/mapbox/mapboxandroiddemo/MainActivity.class

The finalize command currently searches for any class file:

https://github.com/cuplv/biggroum/blob/fix_docker/python/fixrgraph/extraction/extract_single.py#L31

This should be updated to intelligently choose one version of the app.

Current workaround is to build with a command for only one release:

e.g. for mapbox:

./gradlew compileGlobalDebugSources
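
A minimal sketch of one way to choose a single variant, assuming class files sit under a classes/ directory inside each module's build/ folder, as in the find output above. The heuristic (keep the variant with the most class files, break ties by name) and the function name pick_one_variant are hypothetical and are not what extract_single.py does today.

```python
from collections import defaultdict


def pick_one_variant(class_files, marker="/classes/"):
    """Keep the class files of a single build variant for each module."""
    by_module = defaultdict(lambda: defaultdict(list))
    for path in class_files:
        root, sep, _ = path.partition(marker)
        if not sep:
            continue  # not under a classes/ output directory
        module = root.split("/build/", 1)[0]
        by_module[module][root].append(path)

    chosen = []
    for module in sorted(by_module):
        roots = by_module[module]
        # Deterministic choice: most class files first, ties broken by name.
        best = min(roots, key=lambda r: (-len(roots[r]), r))
        chosen.extend(sorted(roots[best]))
    return chosen
```

Because the choice is deterministic, repeated finalize calls on the same build pick the same variant.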

Musedev: Manage Residue

The residue is the part of the graph extractor run that persists between interactions with the developer.

Sub tasks:

Decide Residue Format
Residue in "run" command
Residue in "finalize" command
Residue in "talk" command

Web deployment of groum search

Test the script FixrGraphPatternSearch/docker_search/test.py

Musedev: Call Search Service From Biggroum Script

Write the code to send graphs and associated source files to the search service that runs the analysis.

Biggroumscript.sh breaks the tests in python/fixrgraph/musedev/test.

The changes to biggroumscript.sh break the tests in python/fixrgraph/musedev/test.

We were testing that the bash script called the python script correctly, but the bash script is now "path dependent" (e.g., biggroumsetup/biggroum and /root/biggroumsetup) and has no way to skip the setup process. For example, we can no longer test the script on a Mac, because the paths are hardcoded and the update-alternatives command is always called.

The comment "Note: Environment variable to determine run was difficult. docker does not run .bashrc on shell start, sourcing .bashrc in script also failed" does not really explain why we switched from environment variables (which are quite easy to set when invoking the script, for local testing) to files; it just motivates a workaround.

Why does sourcing the .bashrc file not work? That seems to contradict the normal behavior of bash, which should not be the case.

Here are my main complaints:

The script uses the absolute path /root, while it should use ${HOME} and save everything there (e.g., what happens if tomorrow the musedev image changes and the script runs as another user?).
Using the absolute path (e.g., /root/biggroumsetup) also makes local testing (outside the container) impossible. It would be OK to replace the absolute path with a path relative to the home directory (e.g., ${HOME}/biggroumsetup_completed).
At this point you could really just save a file with the additional environment variables you set during the setup.
Using the relative path biggroumsetup to invoke the python script is another issue for running the test locally (e.g., on a machine where we already have a setup). The lines in the script are:

cd "$(dirname "${BASH_SOURCE[0]}")"
python biggroumsetup/biggroum/python/fixrgraph/musedev/biggroumscript.py "${dir}" "${commit}" "${cmd}" "${graph_extractor_path}" "${fixr_search_endpoint}" < /dev/stdin 1> /dev/stdout 2> /dev/stderr

First, cd "$(dirname "${BASH_SOURCE[0]}")" may not work on the musedev deployment unless the biggroumcheck.sh script is always in the home directory.

You have the same issues for the other environment variables you keep setting:

export GRAPH_EXTRACTOR_PATH="${HOME}/biggroumsetup/fixrgraphextractor_2.12-0.1.0-one-jar.jar" >>setup_log 2>&1 && \
export PYTHONPATH="${HOME}/biggroumsetup/biggroum/python:$PYTHONPATH"  >>setup_log 2>&1

They were inside the setup first, under the assumption that the container was created fresh at every run.
I would just export those environment variables in the setup and also add them to the .bashrc (sourcing the .bashrc every time), so we do not lose them and we can test the script locally.

I would also not change directory before invoking biggroumscript.py, and I would use an environment variable that says where the biggroum repository is:

${BIGGROUMREPO}/python/fixrgraph/musedev/biggroumscript.py

You should move the update-alternatives command into the setup steps (the change is persistent, I think).
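
The same lookup can be sketched in Python, for example in a test helper, assuming BIGGROUMREPO is the variable proposed above and that the default location mirrors the ${HOME}/biggroumsetup/biggroum path used for PYTHONPATH. The function name biggroum_script_path is hypothetical.

```python
import os


def biggroum_script_path(env=None):
    """Locate biggroumscript.py from BIGGROUMREPO, defaulting under the home directory."""
    env = os.environ if env is None else env
    home = env.get("HOME", os.path.expanduser("~"))
    repo = env.get("BIGGROUMREPO", os.path.join(home, "biggroumsetup", "biggroum"))
    return os.path.join(repo, "python", "fixrgraph", "musedev", "biggroumscript.py")


print(biggroum_script_path({"BIGGROUMREPO": "/tmp/biggroum"}))
```

Passing the environment as a parameter lets a local test point at an existing checkout without touching /root or the real environment.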

Originally posted by @smover in #60 (comment)
