The stacked_generalization from dustinstansbury

IndexError: indices are out-of-bounds.

import pandas as pd
import numpy as np
from stacked_generalizer import StackedGeneralizer
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

#Load cleaned data :
train = pd.read_csv('train1.csv')
test = pd.read_csv('test1.csv')
Then I selected the variables. It's a subset of all the variables in the training data.

target = 'Y1'
ID = 'ID'
predictors1 = ['Marks_SA', 'Marks_PA',
               'Marks_CA', 'Feat2', 'Experience', 'Feat6', 'Feat1',
               'Feat5', 'Feat4']

Next, I blended the models:

base_models = [RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='gini'),
          RandomForestClassifier(n_estimators=100, n_jobs=-1, criterion='entropy'),
          ExtraTreesClassifier(n_estimators=100, n_jobs=-1, criterion='gini')]


# define blending model
blending_model = LogisticRegression()
VERBOSE = True
N_FOLDS = 5

# initialize multi-stage model
sg = StackedGeneralizer(base_models, blending_model, 
                   n_folds=N_FOLDS, verbose=VERBOSE)

# fit model
sg.fit(train[predictors1],train[target])

The following error is generated:

Fitting Base Models...
Fitting model 01: RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=-1,
oob_score=False, random_state=None, verbose=0,
warm_start=False)

Fold 1

IndexError Traceback (most recent call last)
in ()
1 # fit model
2 #sg.fit(X[:n_train],y[:n_train])
----> 3 sg.fit(train[columns],train[target])

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit(self, X, y)
211
212 def fit(self, X, y):
--> 213 X_blend = self.fit_transform_base_models(X, y)
214 self.fit_blending_model(X_blend, y)
215

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_transform_base_models(self, X, y)
159
160 def fit_transform_base_models(self, X, y):
--> 161 self.fit_base_models(X, y)
162 return self.transform_base_models(X)
163

c:\users\src\stacked-generalization\stacked_generalizer.pyc in fit_base_models(self, X, y)
129 print('Fold %d' % (j + 1))
130
--> 131 X_train = X[train_idx]
132 y_train = y[train_idx]
133

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1984 if isinstance(key, (Series, np.ndarray, Index, list)):
1985 # either boolean or fancy integer index
-> 1986 return self._getitem_array(key)
1987 elif isinstance(key, DataFrame):
1988 return self._getitem_frame(key)

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
2029 else:
2030 indexer = self.ix._convert_to_indexer(key, axis=1)
-> 2031 return self.take(indexer, axis=1, convert=True)
2032
2033 def _getitem_multilevel(self, key):

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in take(self, indices, axis, convert, is_copy)
1626 new_data = self._data.take(indices,
1627 axis=self._get_block_manager_axis(axis),
-> 1628 convert=True, verify=True)
1629 result = self._constructor(new_data).finalize(self)
1630

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in take(self, indexer, axis, verify, convert)
3635 n = self.shape[axis]
3636 if convert:
-> 3637 indexer = maybe_convert_indices(indexer, n)
3638
3639 if verify:

C:\Users\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\indexing.pyc in maybe_convert_indices(indices, n)
1808 mask = (indices >= n) | (indices < 0)
1809 if mask.any():
-> 1810 raise IndexError("indices are out-of-bounds")
1811 return indices
1812

IndexError: indices are out-of-bounds

http://stackoverflow.com/questions/39345466/indexerror-while-fitting-stacked-generalization
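A likely cause, judging from the traceback: `X[train_idx]` is evaluated on a pandas DataFrame, where indexing with an integer array selects columns (note `axis=1` in `_getitem_array`). The row positions produced by the cross-validation fold then exceed the number of columns. A minimal sketch of a workaround, assuming the library only needs array-like input, is to pass NumPy arrays instead. The toy frame and the `train_idx` values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's train1.csv
train = pd.DataFrame({
    "Marks_SA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Marks_PA": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    "Y1": [0, 1, 0, 1, 0, 1],
})
predictors = ["Marks_SA", "Marks_PA"]
target = "Y1"

# Convert to NumPy arrays so that X[train_idx] picks rows by position
X = train[predictors].values
y = train[target].values

# Row positions, similar to what a cross-validation splitter yields (made up here)
train_idx = np.array([0, 2, 3, 5])
X_train = X[train_idx]
y_train = y[train_idx]

print(X_train.shape)
print(y_train)
```

In the report above this would correspond to calling `sg.fit(train[predictors1].values, train[target].values)`; that call is untested here.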

Add predict_proba capability

Currently the classifier supports a 'predict' capability. Like other scikit classifiers, could this classifier also support predict_proba functionality?
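Until such a method exists here, one alternative is scikit-learn's own `StackingClassifier` (available in scikit-learn 0.22 and later), which exposes `predict_proba` when the final estimator does. This is a different implementation from this repository, shown only as a sketch; the synthetic data and the reduced `n_estimators` are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=9, random_state=0)

# Base models mirroring the ones used in the issue above, as (name, estimator) pairs
base_models = [
    ("rf_gini", RandomForestClassifier(n_estimators=20, criterion="gini", random_state=0)),
    ("rf_entropy", RandomForestClassifier(n_estimators=20, criterion="entropy", random_state=0)),
    ("et_gini", ExtraTreesClassifier(n_estimators=20, criterion="gini", random_state=0)),
]

# Logistic regression blends the cross-validated base-model predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)

# One row per sample, one column per class
proba = stack.predict_proba(X[:5])
print(proba)
```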

Computed probabilities don't add up to one

I am following the starter example on a binary classification problem. My goal is to obtain the probabilities for class 1, but I found that the returned probabilities don't add up to one. Is it supposed to work this way? Does that mean this module is not appropriate for probability prediction? I ended up normalizing the result vector by its max value, but this doesn't make much sense.
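If the returned values are non-negative per-class scores in an array of shape (n_samples, n_classes), which is an assumption about this module's output, dividing each row by its sum gives rows that add up to one, whereas dividing by the max does not. The helper name `normalize_rows` and the sample scores are hypothetical. The result is a valid distribution per row, not necessarily a calibrated probability.

```python
import numpy as np

def normalize_rows(scores):
    """Rescale each row of non-negative scores so that it sums to one."""
    scores = np.asarray(scores, dtype=float)
    row_sums = scores.sum(axis=1, keepdims=True)
    # Leave all-zero rows unchanged instead of dividing by zero
    row_sums[row_sums == 0] = 1.0
    return scores / row_sums

# Made-up scores for two samples and two classes
scores = np.array([[0.2, 0.6],
                   [0.9, 0.3]])
probs = normalize_rows(scores)

# Column 1 holds the rescaled score for class 1
p_class1 = probs[:, 1]
print(p_class1)
```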

Python 3 compatibility

The 'evaluate' function in 'stacked_generalizer.py' breaks under Python 3. The resulting traceback is shown below:

Traceback (most recent call last):
File "stackinggeneralizationtest.py", line 3, in <module>
from stacked_generalizer import StackedGeneralizer
File "{myenv}/stacked_generalizer.py", line 217
print classification_report(y, y_pred)
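The failing line uses the Python 2 print statement, which is a syntax error in Python 3. A minimal sketch of the fix is to call `print` as a function; with a single argument this form also runs under Python 2. The labels below are made up for illustration.

```python
from sklearn.metrics import classification_report

# Made-up true and predicted labels
y = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# print(...) with parentheses is valid in Python 3; with one argument it also works in Python 2
report = classification_report(y, y_pred)
print(report)
```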
