dclambert / pyensemble
An implementation of Caruana et al.'s Ensemble Selection algorithm in Python, based on scikit-learn
License: Other
Hey! Great work on simplifying the process of building ensembles. I was trying to use this, but I found that scikit-learn's GBM implementation is far too slow for my needs. We could include a faster implementation, such as XGBoost.
I would love to help out with this and will open a PR soon after adding XGBoost. I'm also keen to help add more scoring functions (precision, recall, Matthews correlation coefficient, and support for custom scoring functions similar to the scikit-learn API).
Cheers!
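As a rough illustration of the custom scoring idea above, here is a minimal sketch of a scorer with a `score(y_true, y_pred)` shape. The names `matthews_score`, `SCORERS`, and `get_scorer` are hypothetical and are not part of pyensemble; the sketch also assumes binary labels encoded as 0 and 1.

```python
import math


def matthews_score(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical registry: maps a metric name to a scoring callable.
SCORERS = {"mcc": matthews_score}


def get_scorer(score):
    """Accept either a registered name or any callable score(y_true, y_pred)."""
    if callable(score):
        return score
    return SCORERS[score]
```

With this shape, a user-supplied function and a built-in metric name can be treated the same way, for example `get_scorer("mcc")(y_true, y_pred)`.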
Not sure if you are still maintaining this code base, but it now fails with sklearn 0.16.0 due to:
"The min_density parameter is deprecated as of version 0.14 and will be removed in 0.16." (DeprecationWarning)
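One possible workaround, assuming the failure comes from a parameter grid that still passes `min_density`, is to drop any grid key the estimator's constructor no longer accepts. The sketch below uses Python 3's `inspect.signature`; `TreeStandIn` and `supported_params` are hypothetical names, and the stand-in class only imitates an estimator whose `min_density` argument has been removed.

```python
import inspect


class TreeStandIn:
    """Hypothetical stand-in for an estimator without a min_density argument."""

    def __init__(self, criterion="gini", max_depth=None):
        self.criterion = criterion
        self.max_depth = max_depth


def supported_params(estimator_cls, param_grid):
    """Keep only the grid keys that the constructor explicitly accepts."""
    accepted = inspect.signature(estimator_cls.__init__).parameters
    return {name: values for name, values in param_grid.items()
            if name in accepted}


grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [None, 5],
    "min_density": [0.1],  # removed parameter, filtered out below
}
clean_grid = supported_params(TreeStandIn, grid)
```

This only works for constructors that list their parameters explicitly; a constructor that takes `**kwargs` would need a different check.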
Hi:
I tested the simplest call of ensemble_train and got a ValueError for the parameter min_samples_split:
Traceback (most recent call last):
  File "pyensemble/ensemble_train.py", line 202, in
    ens.fit(X_train, y_train)
  File "/home/mourao/income_prediction/pyensemble/ensemble.py", line 290, in fit
    self.fit_models(X, y)
  File "/home/mourao/income_prediction/pyensemble/ensemble.py", line 325, in fit_models
    model.fit(X[train_inds], y[train_inds])
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 194, in fit
    % self.min_samples_split)
ValueError: min_samples_split must be an integer greater than 1 or a float in (0.0, 1.0]; got the integer 1
I solved the problem by removing 1 from the min_samples_split list in model_library.py:
def build_decisionTreeClassifiers(random_state=None):
    rs = check_random_state(random_state)
    param_grid = {
        'criterion': ['gini', 'entropy'],
        'max_features': [None, 'auto', 'sqrt', 'log2'],
        'max_depth': [None, 1, 2, 5, 10],
        'min_samples_split': [2, 5, 10],
        'random_state': [rs.random_integers(100000) for i in xrange(3)],
    }
    return build_models(DecisionTreeClassifier, param_grid)
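Instead of editing the list by hand, the grid could also be filtered while it is expanded. The sketch below is only an illustration: `valid_min_samples_split` and `expand_grid` are hypothetical helpers, not pyensemble functions, and the validity rule is copied from the error message above (an integer greater than 1, or a float in (0.0, 1.0]).

```python
from itertools import product


def valid_min_samples_split(value):
    """Rule from the ValueError text: int > 1, or float in (0.0, 1.0]."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        return False
    if isinstance(value, int):
        return value > 1
    if isinstance(value, float):
        return 0.0 < value <= 1.0
    return False


def expand_grid(param_grid):
    """Yield one dict per parameter combination, skipping invalid splits."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        # A grid without min_samples_split is treated as valid (default 2).
        if valid_min_samples_split(params.get("min_samples_split", 2)):
            yield params


grid = {"max_depth": [None, 1], "min_samples_split": [1, 2, 5]}
combos = list(expand_grid(grid))
```

Here the two combinations containing `min_samples_split=1` are skipped, so the remaining models can still be built without changing the grid itself.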