
This project forked from hazyresearch/epoxy


License: Apache License 2.0


Epoxy: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings

UPDATE 07/20/20: The code now supports FAISS for fast, scalable nearest-neighbor (NN) search! The tutorial example has been updated to use FAISS.

Epoxy uses weak supervision and pre-trained embeddings to create models that train at programmatically interactive speeds (less than half a second) but can retain the performance of trained deep networks. This repository presents a simple proof-of-concept implementation of Epoxy (our implementation is around 100 LOC, including docstrings).

In weak supervision, users write noisy labeling functions that generate labels for the data. Historically, we have observed that these labeling functions often have high accuracy but low coverage (each labeling function votes on only a subset of points). In the past, the only ways to close this gap have been to write more labeling functions (which gets difficult once you start dealing with the long tail) or to use the labeling functions to train an end model (see, e.g., FlyingSquid for more details).
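
To make "high accuracy but low coverage" concrete, here is a minimal sketch of two keyword-based labeling functions and a coverage calculation. Everything in it is hypothetical and for illustration only: the function names, the example texts, and the vote convention (+1 and -1 for the two classes, 0 for abstain) are assumptions, not code from this repository.

```python
# Illustrative only: names, data, and the vote convention are assumptions.
ABSTAIN, NEG, POS = 0, -1, 1


def lf_contains_great(text):
    """Vote positive when the text mentions 'great'; otherwise abstain."""
    return POS if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    """Vote negative when the text mentions 'terrible'; otherwise abstain."""
    return NEG if "terrible" in text.lower() else ABSTAIN


texts = [
    "A great movie",
    "Terrible pacing",
    "It was fine",
    "Great cast, great script",
]
lfs = [lf_contains_great, lf_contains_terrible]

# One row per data point, one column per labeling function.
label_matrix = [[lf(t) for lf in lfs] for t in texts]


def coverage(label_matrix, lf_index):
    """Fraction of points on which the given labeling function does not abstain."""
    votes = [row[lf_index] for row in label_matrix]
    return sum(v != ABSTAIN for v in votes) / len(votes)


for i, lf in enumerate(lfs):
    print(lf.__name__, coverage(label_matrix, i))
```

Each function is precise on the points it labels but abstains on most of the data, and the third text receives no vote at all. That is the coverage gap described above.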

In Epoxy, we use pre-trained embeddings to get some of the benefits of training an end model, without having to train one. We use the embeddings to create extended labeling functions through nearest-neighbor search (improving coverage), and then use FlyingSquid to aggregate the extended labeling functions. This gives some of the benefits of training a deep network at a fraction of the cost. And if you do have time to train a deep network, Epoxy can also be used to generate labels for training a downstream end model.
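
The extension step can be sketched roughly as follows. This is not the repository's implementation or API. It is a pure-Python illustration that assumes cosine similarity over the embeddings, a single hypothetical similarity threshold, and the vote convention +1 / -1 / 0 (abstain). The function name `extend_votes` and the toy data are made up for this example, and the FlyingSquid aggregation step is omitted.

```python
import math

ABSTAIN = 0  # assumed convention: 0 means the labeling function did not vote


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_votes(votes, embeddings, threshold):
    """Extend one labeling function's votes to nearby abstaining points.

    For each point where the labeling function abstains, find the most
    similar point that did receive a vote and copy that vote if the
    similarity is at least `threshold` (a hypothetical tuning parameter).
    """
    labeled = [j for j, v in enumerate(votes) if v != ABSTAIN]
    extended = list(votes)
    for i, v in enumerate(votes):
        if v != ABSTAIN or not labeled:
            continue
        best = max(labeled, key=lambda j: cosine(embeddings[i], embeddings[j]))
        if cosine(embeddings[i], embeddings[best]) >= threshold:
            extended[i] = votes[best]
    return extended


# Toy 2-D "embeddings"; real pre-trained embeddings have far more dimensions.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]
votes = [1, 0, -1, 0, 0]

extended = extend_votes(votes, embeddings, threshold=0.8)
print(extended)  # [1, 1, -1, -1, 0]
```

The two abstaining points that sit close to a labeled neighbor inherit its vote, while the last point is too far from any labeled point and keeps abstaining. The brute-force loop here is quadratic in the number of points; a nearest-neighbor library such as FAISS plays that role at scale.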

Check out our paper on arXiv for more details!

Getting Started

  • Install Epoxy
  • Check out the example tutorial for a simple Jupyter notebook showing the proof of concept in this repo.

Installation

This repository depends on FlyingSquid. We recommend using conda to install FlyingSquid, and then you can install Epoxy:

git clone https://github.com/HazyResearch/flyingsquid.git

cd flyingsquid

conda env create -f environment.yml
conda activate flyingsquid

pip install -e .

cd ..

git clone https://github.com/HazyResearch/epoxy.git

cd epoxy

# if you are on a machine with a GPU
pip install faiss-gpu
# if you are on a machine without a GPU
pip install faiss-cpu

pip install -e .

Alternatively, you can install FlyingSquid (and its dependencies) yourself; see the FlyingSquid repo for more details.

Citation

If you use our work or find it useful, please cite our arXiv paper for now:

@article{chen2020train,
  author = {Mayee F. Chen and Daniel Y. Fu and Frederic Sala and Sen Wu and Ravi Teja Mullapudi and Fait Poms and Kayvon Fatahalian and Christopher R\'e},
  title = {Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings},
  journal = {arXiv preprint arXiv:2006.15168},
  year = {2020},
}
