denmoroz / snorkel

This project forked from snorkel-team/snorkel

A lightweight platform for developing information extraction systems using data programming

License: Apache License 2.0

Shell 0.50% Python 59.61% Jupyter Notebook 35.83% JavaScript 4.07%

snorkel's Introduction

**_v0.4.0_**

Acknowledgements

Sponsored in part by DARPA as part of the SIMPLEX program under contract number N66001-15-C-4043.

Getting Started

Motivation

Snorkel is intended to be a lightweight but powerful framework for developing structured information extraction applications for domains in which large labeled training sets are not available or easy to obtain, using the data programming paradigm.

In the data programming approach to developing a machine learning system, the developer focuses on writing a set of labeling functions, which create a large but noisy training set. Snorkel then learns a generative model of this noise—learning, essentially, which labeling functions are more accurate than others—and uses this to train a discriminative classifier.
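To make the idea of trusting some labeling functions more than others concrete, here is a minimal sketch of an accuracy-weighted vote. It is not Snorkel's generative model: the accuracies are assumed to be known rather than learned, the labeling functions are assumed to be conditionally independent with a uniform class prior, and the name `positive_probability` is hypothetical.

```python
import math


def log_odds(accuracy):
    """Weight of one labeling function: the log-odds of its assumed accuracy."""
    return math.log(accuracy / (1.0 - accuracy))


def positive_probability(votes, accuracies):
    """Combine noisy votes (1 = positive, -1 = negative, 0 = abstain).

    Assumes independent labeling functions and a uniform class prior,
    so the posterior log-odds is the accuracy-weighted sum of the votes.
    Abstentions contribute nothing to the sum.
    """
    score = sum(v * log_odds(a) for v, a in zip(votes, accuracies))
    return 1.0 / (1.0 + math.exp(-score))


# One data point labeled by three labeling functions with assumed accuracies.
probability = positive_probability([1, -1, 0], [0.9, 0.6, 0.7])
```

In this example the more accurate function outvotes the less accurate one, so `probability` comes out at 6/7 (about 0.857). Probabilistic labels of this kind are what the discriminative classifier would then be trained on.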

At a high level, the idea is that developers can focus on writing labeling functions—which are just (Python) functions that provide a label for some subset of data points—and not think about algorithms or features!
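As an illustration of what such functions can look like, here is a minimal sketch in plain Python. The task (deciding whether a sentence describes a spouse relation), the label convention (1, -1, and 0 for abstain), and all function names are assumptions made for this example and are not taken from Snorkel's API.

```python
import re

# Label convention assumed for this sketch.
POSITIVE, NEGATIVE, ABSTAIN = 1, -1, 0


def lf_mentions_marriage(sentence):
    """Vote positive when the sentence uses a marriage-related word."""
    found = re.search(r"\b(married|wife|husband)\b", sentence, re.I)
    return POSITIVE if found else ABSTAIN


def lf_mentions_sibling(sentence):
    """Vote negative when the sentence points to a sibling relation."""
    found = re.search(r"\b(brother|sister|sibling)\b", sentence, re.I)
    return NEGATIVE if found else ABSTAIN


def lf_very_long_sentence(sentence):
    """Vote negative on very long sentences, where a co-mention is weak evidence."""
    return NEGATIVE if len(sentence.split()) > 30 else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_marriage, lf_mentions_sibling, lf_very_long_sentence]


def apply_labeling_functions(sentences, lfs=LABELING_FUNCTIONS):
    """Return one row of votes per sentence and one column per labeling function."""
    return [[lf(s) for lf in lfs] for s in sentences]


label_matrix = apply_labeling_functions([
    "Alice married Bob in 2010.",
    "Alice is the sister of Bob.",
    "Alice met Bob.",
])
```

Each function labels only the data points its heuristic covers and abstains elsewhere, so the resulting matrix is large, sparse, and noisy, which is the kind of training signal the paragraph above describes.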

Snorkel is very much a work in progress, but some people have already begun developing applications with it, and initial feedback has been positive. Let us know what you think, and how we can improve it, in the Issues section!

Installation / dependencies

Snorkel uses Python 2.7 and requires a few Python packages, which can be installed using pip:

pip install --requirement python-package-requirement.txt

Note that if the above does not work, you can prepend sudo to install the dependencies system-wide, provided that is an option on your machine.

Finally, enable ipywidgets:

jupyter nbextension enable --py widgetsnbextension --sys-prefix

By default (for example, in the tutorials) we also use Stanford CoreNLP to pre-process text; you will be prompted to install it when you run run.sh.

Alternatively, to keep the dependencies isolated, create and activate a virtualenv before running the pip command above:

virtualenv -p python2.7 .virtualenv
source .virtualenv/bin/activate

Running

After installing (see above), just run:

./run.sh

Learning how to use Snorkel

There are currently two tutorials for Snorkel, an introductory tutorial and a more advanced disease tagging tutorial. The tutorials are available in the following directories:

tutorials/intro
tutorials/disease_tagger

Issues

We like issues as a place to put bugs, questions, feature requests, and so on, so don't be shy! If you submit an issue about a bug, however, please provide a pointer to a notebook (and relevant data) that reproduces it.

Note: if you have an issue with the matplotlib install related to the module freetype, see this post. If you have an issue installing ipython, try upgrading setuptools.

Jupyter Notebook Best Practices

Snorkel is built specifically with usage in Jupyter/IPython notebooks in mind. An incomplete set of best practices for the notebooks:

It's usually most convenient to write most of your code in an external .py file and load it as a module that is automatically reloaded. To do so, run the following in the notebook:

%load_ext autoreload
%autoreload 2

A more convenient option is to add these lines to your IPython config file, in ~/.ipython/profile_default/ipython_config.py:

c.InteractiveShellApp.extensions = ['autoreload']
c.InteractiveShellApp.exec_lines = ['%autoreload 2']

snorkel's People

Contributors

ajratner, alldefector, bhancock8, bryanhe, henryre, jason-fries, kuleshov, lukehsiao, mooz, netj, pmlandwehr, senwu, stephenbach, xiaoling
