
dustinstansbury / stacked_generalization

71.0 71.0 37.0 26 KB

Python implementation of a stacked generalization classifier. Plays nice with sklearn.
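Stacked generalization trains several base models, turns their out-of-fold predictions into new features, and fits a blending model on those features. The idea can be sketched with plain scikit-learn as shown here. This is a minimal sketch, not this repository's implementation. It assumes a binary target, uses `cross_val_predict` to produce the out-of-fold class-1 probabilities, and relies on the hypothetical helper names `fit_stack` and `predict_stack`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def fit_stack(base_models, blender, X, y, n_folds=5):
    """Hypothetical helper: fit a two-level stack for a binary target.

    Level 0: each base model's out-of-fold P(class 1) becomes one column of
    X_blend, so the blender never sees predictions made on a model's own
    training rows. Level 1: the blender is fit on X_blend.
    """
    X_blend = np.column_stack([
        cross_val_predict(m, X, y, cv=n_folds, method="predict_proba")[:, 1]
        for m in base_models
    ])
    # cross_val_predict fits clones, so refit each base model on all rows
    # for use at prediction time.
    for m in base_models:
        m.fit(X, y)
    blender.fit(X_blend, y)
    return X_blend


def predict_stack(base_models, blender, X):
    """Hypothetical helper: build level-1 features, then let the blender decide."""
    X_blend = np.column_stack([m.predict_proba(X)[:, 1] for m in base_models])
    return blender.predict(X_blend)


X, y = make_classification(n_samples=200, random_state=0)
base = [RandomForestClassifier(n_estimators=50, random_state=0),
        ExtraTreesClassifier(n_estimators=50, random_state=0)]
blender = LogisticRegression()
X_blend = fit_stack(base, blender, X, y)
y_pred = predict_stack(base, blender, X)
```

Using out-of-fold predictions for `X_blend` matters. Predictions made on rows a model was trained on are overconfident, and they would teach the blender to trust the base models too much.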

License: GNU General Public License v3.0

Python 100.00%

stacked_generalization's People

Contributors

8tracks-datascience


stacked_generalization's Issues

IndexError: indices are out-of-bounds.


import pandas as pd
import numpy as np
from stacked_generalizer import StackedGeneralizer
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

# Load cleaned data
train = pd.read_csv('train1.csv')
test = pd.read_csv('test1.csv')

Then I selected the variables. It's a subset of all the variables in the training data.

target = 'Y1'
ID = 'ID'
predictors1 = ['Marks_SA', 'Marks_PA',
               'Marks_CA', 'Feat2', 'Experience', 'Feat6', 'Feat1',
               'Feat5', 'Feat4']

Then I blended the models:

base_models = [RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='gini'),
               RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='entropy'),
               ExtraTreesClassifier(n_estimators=100, n_jobs=-1, criterion='gini')]


# define blending model
blending_model = LogisticRegression()
VERBOSE = True
N_FOLDS = 5

# initialize multi-stage model
sg = StackedGeneralizer(base_models, blending_model,
                        n_folds=N_FOLDS, verbose=VERBOSE)

# fit model
sg.fit(train[predictors1],train[target])

The following error is raised:

Fitting Base Models...
Fitting model 01: RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)

Fold 1

IndexError Traceback (most recent call last)
in ()
1 # fit model
2 #sg.fit(X[:n_train],y[:n_train])
----> 3 sg.fit(train[columns],train[target])

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit(self, X, y)
211
212 def fit(self, X, y):
--> 213 X_blend = self.fit_transform_base_models(X, y)
214 self.fit_blending_model(X_blend, y)
215

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_transform_base_models(self, X, y)
159
160 def fit_transform_base_models(self, X, y):
--> 161 self.fit_base_models(X, y)
162 return self.transform_base_models(X)
163

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_base_models(self, X, y)
129 print('Fold %d' % (j + 1))
130
--> 131 X_train = X[train_idx]
132 y_train = y[train_idx]
133

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1984 if isinstance(key, (Series, np.ndarray, Index, list)):
1985 # either boolean or fancy integer index
-> 1986 return self._getitem_array(key)
1987 elif isinstance(key, DataFrame):
1988 return self._getitem_frame(key)

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
2029 else:
2030 indexer = self.ix._convert_to_indexer(key, axis=1)
-> 2031 return self.take(indexer, axis=1, convert=True)
2032
2033 def _getitem_multilevel(self, key):

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in take(self, indices, axis, convert, is_copy)
1626 new_data = self._data.take(indices,
1627 axis=self._get_block_manager_axis(axis),
-> 1628 convert=True, verify=True)
1629 result = self._constructor(new_data).finalize(self)
1630

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in take(self, indexer, axis, verify, convert)
3635 n = self.shape[axis]
3636 if convert:
-> 3637 indexer = maybe_convert_indices(indexer, n)
3638
3639 if verify:

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\indexing.pyc in maybe_convert_indices(indices, n)
1808 mask = (indices >= n) | (indices < 0)
1809 if mask.any():
-> 1810 raise IndexError("indices are out-of-bounds")
1811 return indices
1812

IndexError: indices are out-of-bounds

http://stackoverflow.com/questions/39345466/indexerror-while-fitting-stacked-generalization
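The traceback shows the cause. `fit_base_models` runs `X[train_idx]`, and on a pandas DataFrame an integer array key goes through `__getitem__`, then `_getitem_array`, then `take(..., axis=1)`. In other words, it selects columns, not rows. Any row position larger than the number of predictor columns is therefore out of bounds.

Passing NumPy arrays makes `X[train_idx]` select rows: use `.values`, or `.to_numpy()` on pandas 0.24 and later. The sketch below uses a toy DataFrame in place of `train1.csv` and mimics the library's fold loop with `sklearn.model_selection.KFold`. Depending on the pandas version, the DataFrame case may raise `KeyError` instead of `IndexError`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data standing in for train1.csv.
train = pd.DataFrame({
    "Feat1": np.arange(10, dtype=float),
    "Feat2": np.arange(10, dtype=float) * 2.0,
    "Y1": [0, 1] * 5,
})
predictors, target = ["Feat1", "Feat2"], "Y1"

# First training fold: row positions 2..9.
train_idx, _ = next(KFold(n_splits=5).split(train))

# Indexing a DataFrame with an integer array targets COLUMNS, so these
# row positions fail (IndexError in old pandas, KeyError in newer pandas).
try:
    _ = train[predictors][train_idx]
    dataframe_indexing_failed = False
except (IndexError, KeyError):
    dataframe_indexing_failed = True

# NumPy arrays index ROWS, which is what the fold loop expects.
X = train[predictors].to_numpy()
y = train[target].to_numpy()
X_train, y_train = X[train_idx], y[train_idx]

# With the library itself, the equivalent call would be:
# sg.fit(train[predictors1].values, train[target].values)
```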

Add predict_proba capability

Currently the classifier supports 'predict'. Like other scikit-learn classifiers, could it also support 'predict_proba'?
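For comparison, scikit-learn 0.22 and later ship `sklearn.ensemble.StackingClassifier`. It follows the same base-models-plus-blender design and already exposes `predict_proba`. The sketch below uses that class, not this repository's `StackedGeneralizer`. It assumes the base and blending models from the IndexError issue above, with fewer trees for speed, and a synthetic binary dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf_gini", RandomForestClassifier(n_estimators=50, criterion="gini", random_state=0)),
        ("rf_entropy", RandomForestClassifier(n_estimators=50, criterion="entropy", random_state=0)),
        ("et_gini", ExtraTreesClassifier(n_estimators=50, criterion="gini", random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the blending model
    cv=5,                                  # out-of-fold predictions feed the blender
    stack_method="predict_proba",          # base models contribute class probabilities
)
stack.fit(X, y)

proba = stack.predict_proba(X[:5])  # one row per sample, one column per class
row_sums = np.sum(proba, axis=1)    # each row should sum to one
```

Because the final estimator here is a `LogisticRegression`, the probabilities come straight from its `predict_proba`, so each row is a proper distribution over the classes.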

computed probabilities don't add up to one

I am following the starter example on a binary classification problem. My goal is to obtain the probabilities for class 1, but I found that the returned probabilities don't add up to one. Is this intended? If so, does that mean this module is not appropriate for probability prediction? I ended up normalizing the result vector by its max value, but this doesn't make much sense.
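Dividing by the maximum only rescales the scores so that the largest becomes 1. It does not make them sum to one. If the returned array holds non-negative per-class scores (one row per sample, one column per class), dividing each row by its sum is the usual way to get a distribution. The sketch below is a hedged workaround under that assumption, using a hypothetical helper `normalize_rows`. It does not diagnose why the library's output is unnormalized.

```python
import numpy as np


def normalize_rows(scores):
    """Hypothetical helper: rescale non-negative per-class scores so each row sums to one.

    Assumes a 2-D array of shape (n_samples, n_classes). Rows whose scores
    sum to zero are mapped to a uniform distribution.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or (scores < 0).any():
        raise ValueError("expected a 2-D array of non-negative scores")
    totals = scores.sum(axis=1, keepdims=True)
    uniform = np.full_like(scores, 1.0 / scores.shape[1])
    # Divide by a safe denominator, then fall back to uniform for all-zero rows.
    return np.where(totals > 0, scores / np.where(totals > 0, totals, 1.0), uniform)


scores = np.array([[0.2, 0.6], [1.0, 3.0], [0.0, 0.0]])
probs = normalize_rows(scores)
p_class1 = probs[:, 1]  # approximately [0.75, 0.75, 0.5]
```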

Python 3 compatibility

The 'evaluate' function in 'stacked_generalizer.py' breaks under Python 3, and it makes even importing the module fail. The error is:

Traceback (most recent call last):
  File "stackinggeneralizationtest.py", line 3, in
    from stacked_generalizer import StackedGeneralizer
  File "{myenv}/stacked_generalizer.py", line 217
    print classification_report(y, y_pred)
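Line 217 uses the Python 2 `print` statement. Under Python 3 that is a syntax error, so the module fails at import time rather than only when `evaluate` is called. Calling `print` as a function works on Python 3. It also works on Python 2 for a single argument, or for any arguments with `from __future__ import print_function` at the top of the module.

The sketch below infers an `evaluate(y, y_pred)` signature from line 217, so the real function may differ. Returning the report string is an addition for testability.

```python
from sklearn.metrics import classification_report


def evaluate(y, y_pred):
    """Hypothetical Python 3 compatible version of the module's evaluate helper."""
    report = classification_report(y, y_pred)
    # Function-call syntax instead of the Python 2 statement:
    # print classification_report(y, y_pred)
    print(report)
    return report


report = evaluate([0, 1, 1, 0], [0, 1, 0, 0])
```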
