cuplv / biggroum
Top-level project for the graph extraction
License: Apache License 2.0
From Tom: "I believe I fixed all the issues with the run/finalize commands. There is one client-facing change: the JSON field bugType should be called type. From our standpoint, that keeps the fields consistent between the v1 tools and the v3 tools."
Create scripts to extract groum graphs from APK files inside the MuseDev Docker container.
Subtasks:
Import groum, cluster, and pattern documents into Solr.
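A minimal sketch of the import step, assuming the groum, cluster, and pattern documents are already plain JSON dictionaries: Solr's JSON update handler accepts an array of documents POSTed to a collection's /update endpoint. The base URL, the collection name "groums", and the document fields are hypothetical placeholders, not the project's actual schema.

```python
import json
import urllib.request
from typing import Dict, List


def solr_add_request(base_url: str, collection: str,
                     docs: List[Dict]) -> urllib.request.Request:
    """Build a POST request that adds `docs` to a Solr collection and commits."""
    # Solr's JSON update handler accepts an array of documents at /update.
    url = f"{base_url.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


# Hypothetical documents: the field names are placeholders, not the real schema.
docs = [
    {"id": "groum/1", "doc_type": "groum"},
    {"id": "cluster/1", "doc_type": "cluster"},
    {"id": "pattern/1", "doc_type": "pattern"},
]
request = solr_add_request("http://localhost:8983/solr", "groums", docs)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

Committing on every request keeps the sketch simple; for a large import, batching documents and committing once at the end is usually cheaper.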
The source code service downloads a repository and then applies a patch.
When we integrate with MuseDev, we instead get the source code files as input (i.e., we do not have to download the git repository again).
There are two main tasks:
Anonymous inner classes and constructors have special names (e.g., names that contain $). In those cases we do not construct the correct filename in the toolnote.
We need to check both cases.
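One way to derive the filename is sketched below, assuming the toolnote needs the .java file that declares a class given its JVM binary name. Nested, inner, and anonymous classes compile to names such as Outer$Inner or Outer$1, so the sketch keeps only the part before the first $. Constructors change only the method name (the JVM calls them <init>), not the file. The function name is hypothetical, and the mapping assumes the file is named after the top-level class, which holds for public top-level classes but not for every non-public one.

```python
from pathlib import PurePosixPath


def source_file_for_class(class_name: str) -> str:
    """Map a JVM binary class name to the .java file that most likely declares it."""
    # Outer$Inner, Outer$1, Outer$1$2 ... are all declared in Outer's file.
    top_level = class_name.split("$", 1)[0]
    # Package segments become directories; the simple name becomes the file.
    *package, simple_name = top_level.split(".")
    return str(PurePosixPath(*package, simple_name + ".java"))
```

A class whose own name legitimately contains $ would be mapped incorrectly, so this is a heuristic to check against the two cases above rather than a complete fix.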
Reading protobuf files in Python is slow when using the pure-Python implementation of protobuf.
We should use the C++ bindings when generating the isomorphism pages (we need to read several protobuf files).
This link explains how to do it: http://yz.mit.edu/wp/fast-native-c-protocol-buffers-from-python/
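Before relying on the C++ bindings, it helps to confirm which backend is actually loaded. A minimal sketch, assuming the page generator can call a check at startup: protobuf reports its backend through the internal module google.protobuf.internal.api_implementation ("python", "cpp", or, in recent releases, the native "upb"), and the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable selects one if it is set before protobuf is first imported. The helper names are hypothetical.

```python
import logging
from typing import Optional


def protobuf_backend() -> Optional[str]:
    """Return the active protobuf backend ("python", "cpp" or "upb"), or None.

    The backend is fixed when google.protobuf is first imported, so
    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION must be set before that import.
    """
    try:
        # Internal, but long-standing, module that reports the backend in use.
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None  # protobuf is not installed
    return api_implementation.Type()


def warn_if_slow_protobuf() -> bool:
    """Log a warning and return True if the pure-Python backend is active."""
    if protobuf_backend() == "python":
        logging.getLogger(__name__).warning(
            "protobuf uses the pure-Python backend; reading many graph files "
            "for the isomorphism pages will be slow")
        return True
    return False
```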
Improve the directory structure used for graphs/provenance/isomorphism:
Implement the new lattice-based search for patterns.
The search will return a set of suggestions that developers can use to correct or complete their code.
The search takes a groum as input and returns a list of patterns, each with its relationship to the input groum (e.g., isomorphic, a subgraph, or a supergraph) and its "distance" from the input groum (e.g., how many changes are needed to transform the input groum into the pattern).
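The shape of the search output can be sketched as follows. This is not the lattice-based algorithm: it compares the input against every pattern instead of walking a lattice, and it approximates groums as sets of labeled edges, so set equality and set inclusion stand in for isomorphism and subgraph isomorphism, and the distance is the number of edges to add or remove. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, List, Tuple

# Simplification: a groum is a set of labeled edges (source label, target label).
Edge = Tuple[str, str]
Groum = FrozenSet[Edge]


class Relation(Enum):
    EQUAL = "equal"            # stands in for "isomorphic" in this sketch
    SUBGRAPH = "subgraph"      # the pattern is contained in the input groum
    SUPERGRAPH = "supergraph"  # the pattern extends the input groum
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class Suggestion:
    pattern_id: str
    relation: Relation
    distance: int  # edges to add or remove to turn the input into the pattern


def compare(groum: Groum, pattern: Groum) -> Tuple[Relation, int]:
    distance = len(groum ^ pattern)  # size of the symmetric difference
    if pattern == groum:
        relation = Relation.EQUAL
    elif pattern < groum:
        relation = Relation.SUBGRAPH
    elif pattern > groum:
        relation = Relation.SUPERGRAPH
    else:
        relation = Relation.UNRELATED
    return relation, distance


def search(groum: Groum, patterns: Dict[str, Groum]) -> List[Suggestion]:
    """Return the related patterns, closest first."""
    suggestions = []
    for pattern_id, pattern in patterns.items():
        relation, distance = compare(groum, pattern)
        if relation is not Relation.UNRELATED:
            suggestions.append(Suggestion(pattern_id, relation, distance))
    return sorted(suggestions, key=lambda s: (s.distance, s.pattern_id))
```

A SUPERGRAPH result with a small distance is the typical "complete your code" suggestion, since the pattern contains everything the input does plus a few extra edges.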
The tasks are:
Tasks to complete before the Monday meeting:
The files:
are in the fixrgraph Python package but refer to the musedev image (i.e., they are in the wrong place).
We should move those files outside the Python package (ideally, we could create a deployment folder at the top level of the repository and organize our Docker files there).
Running the biggroum MuseDev script does not save the error output.
We can save the error log in the image to help us debug issues.
We should also save the logging output (I don't remember whether it is currently redirected to stderr by default).
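A minimal sketch of keeping both outputs, assuming the MuseDev entry point can call a setup function before doing any work: Python's logging can send the same records to stderr and to a file in the image, and replacing sys.excepthook records uncaught exceptions (with their traceback) in that file too. The logger name "biggroum" and the log path are hypothetical.

```python
import logging
import sys


def setup_logging(log_path: str, level: int = logging.INFO) -> logging.Logger:
    """Send log records to both stderr and a file that persists in the image."""
    logger = logging.getLogger("biggroum")  # hypothetical logger name
    logger.setLevel(level)
    if logger.handlers:
        return logger  # already configured; avoid duplicate handlers
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    for handler in (logging.StreamHandler(sys.stderr),
                    logging.FileHandler(log_path, encoding="utf-8")):
        handler.setFormatter(fmt)
        logger.addHandler(handler)

    def log_uncaught(exc_type, exc, tb):
        # Record crashes, with traceback, in the log file as well as on stderr.
        logger.critical("uncaught exception", exc_info=(exc_type, exc, tb))

    sys.excepthook = log_uncaught
    return logger
```

Writing to stderr as well keeps the output visible to whatever runs the script, while the file survives for later inspection.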
Get better data for the demo.
The muse api is supposed to create an archive of the source code corresponding to the graphs built from the class files.
Currently, the muse api puts in the archive the content of a "source" folder that is expected to exist where the graphs were extracted. This logic is wrong, since the graph extractor does not create such a source folder.
The muse api must build the archive of source code files sent to the search service differently.
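One possible approach is sketched below, assuming the caller already knows which source files correspond to the extracted graphs (for example, from the MuseDev input, which provides the source files directly): archive exactly those files, with paths relative to the repository root, instead of zipping a pre-existing "source" folder. The function name and its arguments are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def build_source_archive(repo_root: str, source_files: Iterable[str],
                         out_path: str) -> int:
    """Zip the given source files, storing paths relative to repo_root.

    `source_files` is the (hypothetical) list of .java files, relative to
    repo_root, that correspond to the extracted graphs. Returns how many
    files were archived.
    """
    root = Path(repo_root).resolve()
    written = 0
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(set(source_files)):
            path = (root / name).resolve()
            if not path.is_file():
                continue  # no source available for this graph; skip it
            try:
                arcname = path.relative_to(root).as_posix()
            except ValueError:
                continue  # path escapes repo_root (e.g. "../x.java"); skip it
            zf.write(path, arcname=arcname)
            written += 1
    return written
```

Returning the count lets the caller notice when graphs were extracted for classes whose sources were not found.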
scripts/run_mining.py exits after extracting graphs but before performing clustering, pattern search, and HTML generation.
Assuming a test directory with the following structure:
myTest
└── fdroid
├── app1
├── app2
└── app3
The configuration files can be generated like so:
python scripts/generate_mining_files.py -p <path to myTest>/fdroid -b <path to myTest> -o <output path>
To run the mining:
python scripts/run_mining.py -c <output path>/mining_configuration/config.yaml
Currently, run_mining.py completes the graph extraction but fails to compute the clusters. The full mining does work, but only after config.yaml is edited to disable extraction and run_mining.py is run again on the same directory.
Some builds generate multiple versions (build variants) of an app, producing several copies of the same class.
For example:
root@5db2c9353c41:/fpp# find ./ -name "MainActivity.class"
./analyzing-3354dff4a11388be/MapboxAndroidWearDemo/build/intermediates/javac/debug/compileDebugJavaWithJavac/classes/com/mapbox/mapboxandroiddemo/MainActivity.class
./analyzing-3354dff4a11388be/MapboxAndroidDemo/build/intermediates/javac/globalDebug/compileGlobalDebugJavaWithJavac/classes/com/mapbox/mapboxandroiddemo/MainActivity.class
./analyzing-3354dff4a11388be/MapboxAndroidDemo/build/intermediates/javac/chinaDebug/compileChinaDebugJavaWithJavac/classes/com/mapbox/mapboxandroiddemo/MainActivity.class
The finalize command currently searches for any class file:
https://github.com/cuplv/biggroum/blob/fix_docker/python/fixrgraph/extraction/extract_single.py#L31
This should be updated to intelligently choose one version of the app.
The current workaround is to build only one variant of the app with a single Gradle task.
For example, for mapbox:
./gradlew compileGlobalDebugSources
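A possible heuristic for choosing one version is sketched below, based only on the layout visible in the paths above: javac output lives under <module>/build/intermediates/javac/<variant>/..., so class files can be grouped by module and variant, keeping one variant per module. The sketch prefers a requested variant when present and otherwise takes the alphabetically first one; the function names and the tie-break rule are assumptions, not the behavior of extract_single.py.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional, Tuple


def variant_of(class_path: str) -> Optional[Tuple[str, str]]:
    """Return (module_dir, variant) for a javac output path, or None."""
    parts = PurePosixPath(class_path).parts
    # Assumed layout: <module>/build/intermediates/javac/<variant>/...
    for i in range(len(parts) - 3):
        if parts[i:i + 3] == ("build", "intermediates", "javac"):
            return str(PurePosixPath(*parts[:i])), parts[i + 3]
    return None


def choose_variant(class_paths: Iterable[str],
                   preferred: Optional[str] = None) -> List[str]:
    """Keep the class files of a single build variant per module."""
    by_module: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    chosen: List[str] = []
    for path in class_paths:
        key = variant_of(path)
        if key is None:
            chosen.append(path)  # unknown layout: keep the file as-is
        else:
            module, variant = key
            by_module[module][variant].append(path)
    for module in sorted(by_module):
        variants = by_module[module]
        pick = preferred if preferred in variants else sorted(variants)[0]
        chosen.extend(sorted(variants[pick]))
    return chosen
```

With the three MainActivity.class paths listed above and preferred="globalDebug", this keeps the globalDebug copy from MapboxAndroidDemo and the single debug copy from MapboxAndroidWearDemo.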
The residue is the part of the graph extractor run that persists between interactions with the developer.
Subtasks:
Test the script FixrGraphPatternSearch/docker_search/test.py.
Write the code to send graphs and associated source files to the search service that runs the analysis.
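A minimal sketch of bundling graphs and their sources into a single request, assuming the endpoint is the value passed to the script as fixr_search_endpoint. The JSON layout ("graphs" as base64 text, "sources" keyed by repository path), the function name, and the example file names are hypothetical, not the search service's actual API.

```python
import base64
import json
import urllib.request
from typing import Dict


def search_request(endpoint: str, graphs: Dict[str, bytes],
                   sources: Dict[str, str]) -> urllib.request.Request:
    """Bundle serialized graphs and their source files into one POST request.

    The payload layout is hypothetical, not the search service's real API.
    """
    payload = {
        # Graphs are binary (e.g. protobuf), so encode them as base64 text.
        "graphs": {name: base64.b64encode(data).decode("ascii")
                   for name, data in graphs.items()},
        # Source files are sent as text, keyed by their repository path.
        "sources": sources,
    }
    return urllib.request.Request(
        endpoint, data=json.dumps(payload).encode("utf-8"), method="POST",
        headers={"Content-Type": "application/json"})
```

Sending both in one request keeps each graph paired with the source it was extracted from; urllib.request.urlopen(req) would perform the call.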
The changes to biggroumscript.sh break the tests in python/fixrgraph/musedev/test.
We were testing that the bash script called the Python script correctly, but now the bash script has become "path dependent" (e.g., biggroumsetup/biggroum and /root/biggroumsetup) and there is no way to skip the setup process (for example, we can no longer test the script on a Mac, because the paths are hardcoded and the update-alternative command is always called).
The comment "Note: Environment variable to determine run was difficult. docker does not run .bashrc on shell start, sourcing .bashrc in script also failed" does not really explain why we switched from environment variables (which are easy to set when invoking the script for local testing) to files; it only motivates a workaround.
Why does sourcing the .bashrc file not work? That seems to break the normal behavior of bash, which should not happen.
Here are my main complaints:
The script uses the absolute path /root, while you should use ${HOME} and save everything there (e.g., what happens if the musedev image changes tomorrow and the script runs as another user?).
Using the absolute path (e.g. /root/biggroumsetup) also makes local testing (outside the container) impossible. It would be fine to replace the absolute path with a path relative to ${HOME} (e.g. ${HOME}/biggroumsetup_completed).
At this point you could really just save a file with the additional environment variables you set during the setup.
Using the relative path biggroumsetup to invoke the Python script is another obstacle to running the test locally (e.g., on a machine where we already have a setup). The lines in the script are:
cd "$(dirname "${BASH_SOURCE[0]}")"
python biggroumsetup/biggroum/python/fixrgraph/musedev/biggroumscript.py "${dir}" "${commit}" "${cmd}" "${graph_extractor_path}" "${fixr_search_endpoint}" < /dev/stdin 1> /dev/stdout 2> /dev/stderr
First, cd "$(dirname "${BASH_SOURCE[0]}")" may not work on the musedev deployment unless the biggroumcheck.sh script is always in the home directory.
You have the same issues for the other environment variables you keep setting:
export GRAPH_EXTRACTOR_PATH="${HOME}/biggroumsetup/fixrgraphextractor_2.12-0.1.0-one-jar.jar" >>setup_log 2>&1 && \
export PYTHONPATH="${HOME}/biggroumsetup/biggroum/python:$PYTHONPATH" >>setup_log 2>&1
They were originally inside the setup, under the assumption that the container was created fresh at every run.
I would just export those environment variables in the setup and also add the exports to the .bashrc (and execute the .bashrc every time), so we do not lose them and we can test the script locally.
I would also not change directory before invoking biggroumscript.py, and I would use an environment variable that points to the biggroum repository:
${BIGGROUMREPO}/python/fixrgraph/musedev/biggroumscript.py
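The tests in python/fixrgraph/musedev/test could locate the script from the same BIGGROUMREPO variable instead of depending on the current directory. A minimal sketch, assuming BIGGROUMREPO points at the root of the biggroum checkout; the helper name and the error message are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Location of the MuseDev entry point inside the biggroum repository.
SCRIPT_SUBPATH = Path("python", "fixrgraph", "musedev", "biggroumscript.py")


def biggroum_script(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve biggroumscript.py from BIGGROUMREPO, without relying on cd."""
    env = os.environ if env is None else env
    repo = env.get("BIGGROUMREPO")
    if not repo:
        raise RuntimeError("BIGGROUMREPO is not set; point it at the biggroum checkout")
    return Path(repo) / SCRIPT_SUBPATH
```

Passing the environment explicitly lets a test exercise both the configured and the missing-variable cases without touching the real environment; the resolved path can then be handed to subprocess.run.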
update_alternative command in the setup steps (the change is persistent, I think).
Originally posted by @smover in #60