GithubHelp home page GithubHelp logo

ramp's Introduction

Ramp - Rapid Machine Learning Prototyping

Ramp is a python library for rapid prototyping of machine learning solutions. It's a light-weight pandas-based machine learning framework pluggable with existing python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently.

Documentation: http://ramp.readthedocs.org

Why Ramp?

  • Clean, declarative syntax

  • Complex feature transformations

    Chain and combine features:

Normalize(Log('x'))
Interactions([Log('x1'), (F('x2') + F('x3')) / 2])
Reduce feature dimension:
DimensionReduction([F('x%d'%i) for i in range(100)], decomposer=PCA(n_components=3))
Incorporate residuals or predictions to blend with other models:
Residuals(simple_model_def) + Predictions(complex_model_def)
  • Data context awareness

    Any feature that uses the target ("y") variable will automatically respect the current training and test sets. Similarly, preparation data (a feature's mean and stdev, for example) is stored and tracked between data contexts.

  • Composability

    All features, estimators, and their fits are composable, pluggable and storable.

  • Easy extensibility

    Ramp has a simple API, allowing you to plug in estimators from scikit-learn, rpy2 and elsewhere, or easily build your own feature transformations, metrics, feature selectors, reporters, or estimators.

Quick start

Getting started with Ramp: Classifying insults

Or, the quintessential Iris example:

import pandas
from ramp import *
import urllib2
import sklearn
from sklearn import decomposition


# fetch and clean iris data from UCI
data = pandas.read_csv(urllib2.urlopen(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"))
data = data.drop([149]) # bad line
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data.columns = columns


# all features
features = [FillMissing(f, 0) for f in columns[:-1]]

# features, log transformed features, and interaction terms
expanded_features = (
    features +
    [Log(F(f) + 1) for f in features] +
    [
        F('sepal_width') ** 2,
        combo.Interactions(features),
    ]
)


# Define several models and feature sets to explore,
# run 5 fold cross-validation on each and print the results.
# We define 2 models and 4 feature sets, so this will be
# 4 * 2 = 8 models tested.
shortcuts.cv_factory(
    data=data,

    target=[AsFactor('class')],
    metrics=[
        [metrics.GeneralizedMCC()],
        ],
    # report feature importance scores from Random Forest
    reporters=[
        [reporters.RFImportance()],
        ],

    # Try out two algorithms
    model=[
        sklearn.ensemble.RandomForestClassifier(
            n_estimators=20),
        sklearn.linear_model.LogisticRegression(),
        ],

    # and 4 feature sets
    features=[
        expanded_features,

        # Feature selection
        [trained.FeatureSelector(
            expanded_features,
            # use random forest's importance to trim
            selectors.RandomForestSelector(classifier=True),
            target=AsFactor('class'), # target to use
            n_keep=5, # keep top 5 features
            )],

        # Reduce feature dimension (pointless on this dataset)
        [combo.DimensionReduction(expanded_features,
                            decomposer=decomposition.PCA(n_components=4))],

        # Normalized features
        [Normalize(f) for f in expanded_features],
    ]
)

Status

Ramp is alpha currently, so expect bugs, bug fixes and API changes.

Requirements

  • Numpy
  • Scipy
  • Pandas
  • PyTables
  • Sci-kit Learn
  • gensim

Author

Ken Van Haren. Email with feedback/questions: [email protected] @squaredloss

Contributors

ramp's People

Contributors

frgtn avatar johnmcdonnell avatar kvh avatar wrobstory avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.