ooiiooiioo / compy-learn

This project forked from tud-ccc/compy-learn

ComPy-Learn is a framework for exploring program representations for ML4CODE tasks.

License: Apache License 2.0

CMake 1.98% Python 62.36% C++ 34.82% Shell 0.84%

compy-learn's Introduction

ComPy-Learn

ComPy-Learn is a framework for defining and exploring program representations for machine learning on source code (ML4CODE) tasks. While the special focus is on compiler optimization tasks, ComPy-Learn can also be used in other domains such as software engineering or systems security.

Project goals

  • Exploration of the best-performing code representation and model: Depending on the task, different representations and models have proven suitable to different degrees. Finding the best-performing combination is not obvious and currently requires empirical evaluation. ComPy-Learn provides a common framework for this: evaluating different representations on a given task to find the best-performing one.
  • Design and discovery of new representations: Custom, task-specific representations of code can improve a model's performance. However, extracting representations of program code is tedious and requires low-level development with compiler tools. We aim to remove this burden by letting users define program representations through a simple, high-level programming interface. This allows easier design and faster iteration.
  • Common tools, evaluation pipeline, and datasets: Several promising representations, and models that learn embeddings from them, have been proposed recently. However, they rely on their own tools and evaluation pipelines, which makes comparisons with those methods time-consuming and difficult. ComPy-Learn provides a common framework for representations, models, and datasets, and allows their combinations to be evaluated. Implementing a novel representation and model in this framework lets researchers run a complete evaluation with little effort and, at the same time, contributes another widely applicable method to the community.
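
The first and third goals amount to scoring every representation/model pair on the same task. The sketch below illustrates that idea only; it does not use ComPy-Learn's API. The dataset, the two "representations", and the two threshold "models" are hypothetical toy stand-ins chosen so the example runs on its own.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical toy dataset of (C-like snippet, label) pairs; label 1 marks
# snippets that index into an array. This is NOT a ComPy-Learn dataset.
SAMPLES: List[Tuple[str, int]] = [
    ("for (int i = 0; i < n; i++) a[i] = b[i];", 1),
    ("return x + y;", 0),
    ("while (k < m) { s += v[k]; k++; }", 1),
    ("int z = 3;", 0),
]

# Hypothetical representations: each maps source text to a single number.
REPRESENTATIONS: Dict[str, Callable[[str], int]] = {
    "length": len,
    "brackets": lambda src: src.count("["),
}

# Hypothetical models: fixed-threshold classifiers over that number.
MODELS: Dict[str, Callable[[int], int]] = {
    "threshold_0": lambda x: int(x > 0),
    "threshold_20": lambda x: int(x > 20),
}


def accuracy(represent: Callable[[str], int], predict: Callable[[int], int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(represent(src)) == label for src, label in SAMPLES)
    return correct / len(SAMPLES)


def explore() -> Dict[Tuple[str, str], float]:
    """Score every (representation, model) combination on the same data."""
    return {
        (rep, model): accuracy(REPRESENTATIONS[rep], MODELS[model])
        for rep, model in product(REPRESENTATIONS, MODELS)
    }


if __name__ == "__main__":
    # Print combinations from best to worst accuracy.
    for (rep, model), score in sorted(explore().items(), key=lambda kv: -kv[1]):
        print(f"{rep:>8} + {model:<12} accuracy={score:.2f}")
```

Because every combination is scored by the same `accuracy` function on the same samples, the results are directly comparable, which is the property a shared evaluation pipeline is meant to provide.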

Design

ComPy-Learn's pipeline consists of three main components:

  • compy.representation allows the user to define custom representations of source code (such as the ones from published work) based on semantic, compiler-internal information, currently taken from the Clang/LLVM framework. Both linear and graph representations of code are supported.
  • compy.model contains ML models (more precisely, connectors to well-established model libraries) that embed the representations into vectors and output a prediction.
  • compy.dataset contains datasets of source code for evaluation, along with helper functions that allow integration of new datasets.
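
To make the distinction between linear and graph representations concrete, here is a toy sketch. It is an assumption-laden illustration, not ComPy-Learn code: the real framework derives its representations from Clang/LLVM, whereas this example uses a simple regular-expression tokenizer, and the function names `linear_representation`, `graph_representation`, and `embed` are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def linear_representation(source: str) -> List[str]:
    """Toy linear representation: identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def graph_representation(source: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Toy graph representation: tokens as nodes, edges between neighbours.

    A compiler-based graph would carry richer edges (data flow, control
    flow); consecutive-token edges are only a stand-in.
    """
    nodes = linear_representation(source)
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges


def embed(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Toy embedding: count how often each vocabulary entry occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    code = "x = y + 1;"
    print(linear_representation(code))
    print(graph_representation(code)[1])
    print(embed(linear_representation(code), ["x", "+", "z"]))
```

The same source yields either a token sequence or a node/edge structure, and a model then turns whichever form it receives into a fixed-length vector before predicting.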

Supported representations

Currently, the following representations and models from published work are implemented in this framework:

Installation

We supply an installation script that automates the build, test, and installation process. The script currently supports the platforms listed below. Because the process builds ComPy-Learn from its sources, other platforms can be used with a bit of manual installation effort.

Supported platforms:

  • Ubuntu 16.04
  • Ubuntu 18.04
  • Ubuntu 20.04

To get started on one of the supported platforms, we suggest first creating a virtual environment and then running:

./install_deps.sh ${CUDA}

where ${CUDA} must be one of cpu, cu92, cu100, or cu102, depending on your machine's capabilities.
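
If the installation is driven from a Python script, the argument can be checked before the installer is invoked. The following is a minimal sketch under that assumption; the helper `build_install_command` is hypothetical and not part of ComPy-Learn, while the script name and the four accepted values are taken from the text above.

```python
from typing import List

# The four values named above for the ${CUDA} argument.
SUPPORTED_CUDA_VARIANTS = ("cpu", "cu92", "cu100", "cu102")


def build_install_command(cuda: str) -> List[str]:
    """Return the argument list for install_deps.sh, rejecting unknown variants.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    if cuda not in SUPPORTED_CUDA_VARIANTS:
        allowed = ", ".join(SUPPORTED_CUDA_VARIANTS)
        raise ValueError(f"Unsupported CUDA variant {cuda!r}; expected one of: {allowed}")
    return ["./install_deps.sh", cuda]


if __name__ == "__main__":
    print(build_install_command("cpu"))
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root, so a mistyped variant fails early with a clear message instead of partway through the build.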

Once the dependencies are installed, compile and test ComPy-Learn by running:

python setup.py test

Finally, install ComPy-Learn in order to use it in your project:

python setup.py install

An example exploration is located in examples/devmap_exploration.py.

Publications

compy-learn's People

Contributors

bennofs, alexanderb14
