
This project forked from mloncode/workshop

Machine Learning for Software Engineering: modelling the source code workshop.

Machine Learning for Software Engineering: modelling the source code

Slides are online.

OSS tools covered:

Abstract

Machine Learning on Source Code (MLonCode) is an emerging research domain which stands at the intersection of deep learning, natural language processing, software engineering and programming language communities.

During this 3h30 workshop, we will review recent Software Engineering tasks that benefit from applying Machine Learning, with a focus on hands-on experience in:

  • extracting data from real source code
  • developing multiple Machine Learning models
  • applying them to one particular task: source code summarization (also known as function name suggestion).

By the end of the workshop, participants will have built 2 working models on a real dataset, producing near state-of-the-art results. They will also have acquired practical skills in extracting information from source code and in modelling different aspects of it.

Prerequisites: familiarity with the basics of Deep Learning and a laptop with Docker installed.

Prerequisites

  • Docker

Dependencies

Import Docker images (works offline):

docker load -i images/jupyter.tgz
docker load -i images/gitbase.tgz
docker load -i images/bblfshd-with-drivers.tgz

docker images

Run bblfsh

docker run \
    --detach \
    --rm \
    --name amld_bblfshd \
    --privileged \
    --publish 9432:9432 \
    bblfsh/bblfshd:v2.15.0-drivers \
    --log-level DEBUG

Run gitbase

docker run \
    --detach \
    --rm \
    --name amld_gitbase \
    --publish 3306:3306 \
    --link amld_bblfshd:amld_bblfshd \
    --env BBLFSH_ENDPOINT=amld_bblfshd:9432 \
    --env MAX_MEMORY=1024 \
    --volume "$(pwd)/repos/git-data":/opt/repos \
    srcd/gitbase:v0.24.0-rc2

Run the jupyter image

docker run \
    --rm \
    --name amld_jupyter \
    --publish 8888:8888 \
    --link amld_bblfshd:amld_bblfshd \
    --link amld_gitbase:amld_gitbase \
    --volume "$(pwd)/notebooks":/amld/notebooks \
    --volume "$(pwd)/repos":/amld/repos \
    mloncode/amld

With make

To build the workshop image and launch the 3 required containers:

make build-and-run

To launch only the 3 required containers:

make
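
Whichever way the containers were started, it can help to confirm that the three published ports answer before opening the notebooks. The following is a minimal sketch using only the Python standard library; the helper names `is_listening` and `check_services` are hypothetical and not part of the workshop code. It assumes the ports from the `docker run` commands above (9432, 3306, 8888) are published on localhost. An open port only shows that the port is being published, not that the service behind it has finished starting.

```python
import socket

# Ports published by the three `docker run` commands in this README.
SERVICES = {
    "amld_bblfshd": 9432,
    "amld_gitbase": 3306,
    "amld_jupyter": 8888,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each container name to whether its published port accepts connections."""
    return {name: is_listening(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

If a service is reported as not reachable, `docker ps` and `docker logs <container name>` are the usual next steps.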

Workflow

1. Download the data

We are going to use the top 50 repositories from the Apache Software Foundation throughout this workshop.

Notebook 1: data collection pipeline (example)

2. Project and Developer Similarities

Build a vector model for projects and developers using Topic Modelling of code identifiers.

Notebook 2: project and developer similarities (example)
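
To give an idea of the input such a model works on, here is a minimal sketch that turns the identifiers of a project into a bag of subtokens and compares two projects by cosine similarity. It is not the topic model used in the notebook: it only illustrates the representation that topic modelling would be applied to. The project names and identifiers below are made up for illustration, and the helper functions are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(identifier):
    """Split snake_case and camelCase identifiers into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", identifier)
    return [p.lower() for p in parts]


def bag_of_subtokens(identifiers):
    """Count how often each subtoken occurs across a list of identifiers."""
    bag = Counter()
    for identifier in identifiers:
        bag.update(split_identifier(identifier))
    return bag


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical projects with a handful of identifiers each.
projects = {
    "http-client": ["HttpRequest", "sendRequest", "responseBody"],
    "http-server": ["HttpServer", "handleRequest", "writeResponse"],
    "matrix-lib": ["MatrixMultiply", "transposeMatrix", "vectorNorm"],
}
bags = {name: bag_of_subtokens(ids) for name, ids in projects.items()}

if __name__ == "__main__":
    for other in ("http-server", "matrix-lib"):
        score = cosine_similarity(bags["http-client"], bags[other])
        print(f"http-client vs {other}: {score:.2f}")
```

In this toy data the two HTTP projects share subtokens such as `http` and `request`, so they score higher than the unrelated pair. A topic model replaces the raw subtoken counts with a lower-dimensional topic vector per project or developer before the comparison.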

3. Function Name Suggestion

Train an NMT (neural machine translation) seq2seq model that predicts method names from the identifiers in method bodies.

Notebook 3: function name suggestion (example)
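
The task can be framed as translation from a source sequence (subtokens of the identifiers in a method body) to a target sequence (subtokens of the method name). The sketch below builds one such training pair. It is an assumption-laden stand-in: the workshop extracts identifiers through bblfsh and gitbase, whereas this example uses a plain regular expression, which would also pick up words inside comments and string literals. The function names and the Java snippet are hypothetical.

```python
import re

# A small subset of Java keywords to filter out; not an exhaustive list.
JAVA_KEYWORDS = {
    "return", "this", "new", "if", "else", "for", "while",
    "int", "void", "null", "true", "false",
}


def split_identifier(identifier):
    """Split snake_case and camelCase identifiers into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", identifier)
    return [p.lower() for p in parts]


def make_training_pair(method_name, body):
    """Return (source_tokens, target_tokens) for one method.

    source_tokens: subtokens of the identifiers found in the body.
    target_tokens: subtokens of the method name the model should predict.
    """
    source = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", body):
        if identifier in JAVA_KEYWORDS:
            continue
        source.extend(split_identifier(identifier))
    return source, split_identifier(method_name)


if __name__ == "__main__":
    src, tgt = make_training_pair(
        "getUserName", "return this.currentUser.getName();"
    )
    print("source:", src)
    print("target:", tgt)
```

A seq2seq model is then trained on many such pairs, reading the source subtokens and generating the target subtokens one at a time.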

Contributors

bzz, m09
