This project forked from lab41/altair

Altair

Assessing Source Code Similarity with Unsupervised Learning

How do you determine what a segment of source code does?

How do you search a corpus for source code that you want to use?

Altair is Lab41's exploration of representing source code and its associated features in a vector space. We are interested in generating robust source code embeddings for Python like Word2Vec creates word embeddings for written text. You can read about our early experimentation with word embeddings for source code on the Lab41 blog.

Our primary use case for source code representation and similarity calculation is enabling meaningful recommendations of code to coders. We believe that similar techniques could be useful for code security analysis, code authorship attribution, and code plagiarism detection.
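To make the idea of similarity-based recommendation concrete, here is a minimal sketch. It is not Altair's method: instead of learned embeddings it uses plain token counts from Python's standard tokenize module, compared with cosine similarity. The function names (token_counts, cosine_similarity, recommend) and the tiny corpus are hypothetical and exist only for illustration.

```python
import io
import math
import tokenize
from collections import Counter


def token_counts(source: str) -> Counter:
    """Count identifier/keyword (NAME) and operator (OP) tokens in Python source."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NAME, tokenize.OP):
            counts[tok.string] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query: str, corpus: dict) -> list:
    """Rank corpus entries (name -> source) by similarity to the query, best first."""
    query_vec = token_counts(query)
    scored = [
        (name, cosine_similarity(query_vec, token_counts(src)))
        for name, src in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-snippet corpus, for illustration only.
    corpus = {
        "add": "def add(a, b):\n    return a + b\n",
        "greet": "print('hello')\n",
    }
    for name, score in recommend("def plus(x, y):\n    return x + y\n", corpus):
        print(name, round(score, 3))
```

Token counts ignore word order and meaning, which is exactly the limitation that learned embeddings are meant to address. The ranking step stays the same whichever vectors are used.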

Prerequisites

Local Computing Components

  • git
  • python3
  • pip
  • conda
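If you want to confirm that the tools listed above are available before installing, a small check along these lines may help. This is a sketch, not part of the Altair repository: the helper name missing_tools is hypothetical, and it only checks that each command is on your PATH, not which version is installed.

```python
import shutil

# Command names taken from the prerequisites list above.
REQUIRED_TOOLS = ("git", "python3", "pip", "conda")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The which parameter is there only so the lookup can be replaced in tests. In normal use, call missing_tools() with no arguments.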

Installation

Cloning the repository

Clone the Altair repository from the command line, then cd into the directory:

git clone https://github.com/Lab41/altair.git
cd altair

Conda

Anaconda is a free Python distribution. It includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis. Anaconda also includes conda, a cross-platform package and environment manager that some see as the successor to pip.

Before getting started, you’ll need both conda and gcc installed on your system. Download the Anaconda installer for Python 3 by entering the following on a Linux command line (the version shown was current as of Feb 2017):

wget https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh
bash Anaconda3-4.3.0-Linux-x86_64.sh

Once that’s done, you can create a new environment on your system by calling:

conda env create -f environment.yml

Note: If the conda command is not found, start a new shell to refresh your path.

After it finishes, you should have a new conda environment named altair containing all of the dependencies. Activate it by calling:

source activate altair
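To check from Python that the environment is active, one option is to read the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated. The helper names below are hypothetical, and the sketch assumes the environment is named altair as described above.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is set."""
    return environ.get("CONDA_DEFAULT_ENV")


def in_altair_env(environ=os.environ, expected="altair"):
    """True when the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if in_altair_env():
        print("The altair environment is active.")
    else:
        print("Active environment:", active_conda_env())
```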

Check out the preprocessing README.md to find out where you can obtain our training and testing data.
