stjordanis / pancake

This project forked from bhimmetoglu/pancake

Pancake is a Python package which provides a simple API to stack scikit-learn models.

License: MIT License

Python 100.00%

PanCake is a Python package that allows users to stack scikit-learn models over a number of folds and train stacker models using out-of-sample predictions of input models.

Stacks

The stacking tool builds a stacking module composed of an in-layer (the models being stacked) and an out-layer (the stacker models). Training the module produces a list or matrix of predictions, which can either be used as the final result or fed into a different module.
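The idea of training stacker models on out-of-sample predictions can be sketched with plain scikit-learn. This is not PanCake's implementation, only a minimal illustration. It assumes two in-layer regressors (Ridge and a decision tree), a linear out-layer model, and synthetic data from make_regression; all variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

splitter = KFold(n_splits=5, shuffle=True, random_state=0)

# In-layer: each sample is predicted by a fit that never saw that sample.
in_layer = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=3, random_state=0)]
oof_preds = np.column_stack(
    [cross_val_predict(model, X, y, cv=splitter) for model in in_layer]
)

# Out-layer: the stacker model is trained on the out-of-fold predictions.
stacker_model = LinearRegression().fit(oof_preds, y)
final_preds = stacker_model.predict(oof_preds)

print(oof_preds.shape)    # one column per in-layer model
print(final_preds.shape)  # one prediction per training sample
```

Using out-of-fold predictions as the stacker's inputs keeps the out-layer from being fitted on predictions that the in-layer models made for samples they were trained on.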

Installation

After cloning the repository, install the package from its root directory with

pip install .

Usage

Initiating stacker

stacker = Stacker(X, y, splitter, evalMetric, family)

where X is the data matrix (numpy array), y is the target vector (numpy array), splitter is a scikit-learn cross-validation generator (KFold or StratifiedKFold), evalMetric is the metric to be maximized during training, and family is the type of the problem (currently "regression" or "binary").
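As a sketch of how these arguments might be prepared, the snippet below builds X, y and splitter with NumPy and scikit-learn. The synthetic data is a placeholder. The Stacker call is left as a comment because this README states neither the import path of Stacker nor the exact type expected for evalMetric.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Placeholder data: 100 samples, 4 features, continuous target.
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# Cross-validation generator. StratifiedKFold is the usual choice
# for classification targets.
splitter = KFold(n_splits=5, shuffle=True, random_state=0)

family = "regression"

# With the package installed and Stacker imported, the call would be:
# stacker = Stacker(X, y, splitter, evalMetric, family)

# Each split yields index arrays for the training and validation folds.
fold_sizes = [len(val_idx) for _, val_idx in splitter.split(X)]
print(fold_sizes)  # [20, 20, 20, 20, 20]
```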

Adding models (in-layer):

Add a scikit-learn model modelObj to the in-layer with

stacker.addModelIn(modelObj, trainable, hyperParameters)

If trainable is set to True, the model is trained across folds using hyperParameters, a dictionary that defines the hyper-parameter grid for the model (check scikit-learn's documentation for the model's parameters). If it is set to False, the model is assumed to be fixed and is only fitted across folds.
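For illustration, a hyper-parameter grid in the scikit-learn convention maps each parameter name to a list of candidate values. The sketch below uses Ridge and an arbitrary grid. How PanCake searches the grid internally is not described here, so ParameterGrid appears only to show what such a dictionary expands to.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import ParameterGrid

model_obj = Ridge()

# Keys must be valid constructor parameters of the model.
hyper_parameters = {
    "alpha": [0.1, 1.0, 10.0],
    "fit_intercept": [True, False],
}

# Every combination of the listed values is one candidate setting.
candidates = list(ParameterGrid(hyper_parameters))
print(len(candidates))  # 3 * 2 = 6 combinations

# Sanity check: names in the grid that the model does not recognize.
unknown = set(hyper_parameters) - set(model_obj.get_params())
print(unknown)  # empty set when the grid is valid

# With a Stacker instance available, the call would be:
# stacker.addModelIn(model_obj, True, hyper_parameters)
```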

Adding stacker models (out-layer):

Add a scikit-learn model modelObj to the out-layer with

stacker.addModelOut(modelObj, hyperParameters)

Again, hyperParameters is a dictionary containing the grid of hyper-parameters for the model.

Training and Predictions:

To train the model and get predictions on the training data, use

predsTrain = stacker.stackTrain(matrixOut)

which yields the final predictions of each out-layer model as a list when matrixOut is set to False. When it is set to True, the predictions of each out-layer model are appended as column vectors in an array.
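The two output layouts can be pictured with NumPy. This sketch uses made-up predictions for two hypothetical out-layer models and only illustrates the list versus matrix layout; it does not call PanCake.

```python
import numpy as np

# Made-up predictions from two out-layer models on four training samples.
preds_model_a = np.array([1.0, 2.0, 3.0, 4.0])
preds_model_b = np.array([1.1, 1.9, 3.2, 3.8])

# matrixOut=False: one prediction vector per out-layer model, kept in a list.
preds_list = [preds_model_a, preds_model_b]

# matrixOut=True: the same vectors placed side by side as columns.
preds_matrix = np.column_stack(preds_list)

print(len(preds_list))     # 2
print(preds_matrix.shape)  # (4, 2)
```

The matrix layout is the convenient one when the predictions are passed on to another module as its data matrix.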

For predictions on the test set, use:

predsTest = stacker.stackTest(X_ts, matrixOut)

where X_ts is the test data and matrixOut is the same as above.

Summary, Saving and Loading:

To get a summary of the CV scores and the fit and training times for each in-layer and out-layer model, use

stacker.summary()

To save the trained stacker for later use, call

saveModel(stacker, savePath)

To load a trained model from disk, call

stacker = loadModel(savePath)

Examples

Jupyter notebooks analyzing the Boston Housing data are included in the repo:

  1. Stacking linear models
  2. Stacking Random Forest and Support Vector Regressors

TODO

  1. Multi-class classification problems
  2. Parallelization at the model and/or hyper-parameter level

Contributors

bhimmetoglu, hariharasudhanas
