Comments (6)
The latest version of auto-sklearn features pickleable classifiers/regressors. If there is still an issue with model persistence, please open a new issue.
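Persisting a pickleable estimator is the standard pickle round trip. A minimal sketch, using a plain scikit-learn classifier as a stand-in (the same pattern would apply to a fitted auto-sklearn estimator; the toy data is made up):

```python
import pickle

from sklearn.ensemble import RandomForestClassifier

# Tiny toy dataset; a fitted auto-sklearn estimator would be pickled the same way.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 1, 0]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Serialize to bytes; pickle.dump(clf, open(path, "wb")) writes to disk instead.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

# The restored model makes the same predictions as the original.
assert list(restored.predict(X)) == list(clf.predict(X))
```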
from auto-sklearn.
Have a look at this.
from auto-sklearn.
Yes, that would be useful, but it isn't possible so far. What you can do is use show_models(). It outputs something like:
[(weight, constructor),
(weight, constructor)]
which defines the final ensemble. You can use that to retrain your model on the full data and pickle it in your own code.
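The (weight, constructor) pairs describe a weighted average of the individual models' predictions. A minimal sketch of how such an ensemble combines class probabilities (the weights and probability values below are invented for illustration):

```python
def weighted_probs(ensemble):
    """Combine per-model class probabilities using the ensemble weights."""
    n_classes = len(ensemble[0][1])
    total = [0.0] * n_classes
    for weight, probs in ensemble:
        for i, p in enumerate(probs):
            total[i] += weight * p
    return total

# Two models with hypothetical weights and probability estimates for two classes.
ensemble = [(0.6, [0.9, 0.1]), (0.4, [0.2, 0.8])]
combined = weighted_probs(ensemble)  # approximately [0.62, 0.38]
```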
from auto-sklearn.
It looks like scikit-learn uses some external libraries for this:
http://stackoverflow.com/questions/10592605/save-classifier-to-disk-in-scikit-learn
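The linked answer recommends joblib (shipped alongside scikit-learn) for persisting estimators, since it handles large numpy arrays more efficiently than plain pickle. A minimal sketch of the dump/load round trip (the payload and file path here are arbitrary stand-ins; a fitted estimator is the typical use):

```python
import os
import tempfile

import joblib

# Any picklable object works; a fitted estimator is the usual payload.
model = {"weights": [0.25, 0.75], "name": "demo"}

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)      # write to disk
restored = joblib.load(path)  # read back

assert restored == model
```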
from auto-sklearn.
Is there a simple programmatic way to convert the output of show_models() into a string that can be used to construct the classifiers in code? Currently it comes out as:
(0.040000, SimpleClassificationPipeline(configuration={
'balancing:strategy': 'weighting',
'classifier:__choice__': 'random_forest',
'classifier:random_forest:bootstrap': 'False',
'classifier:random_forest:criterion': 'entropy',
'classifier:random_forest:max_depth': 'None',
'classifier:random_forest:max_features': 1.6519823800472522,
'classifier:random_forest:max_leaf_nodes': 'None',
'classifier:random_forest:min_samples_leaf': 14,
'classifier:random_forest:min_samples_split': 13,
'classifier:random_forest:min_weight_fraction_leaf': 0.0,
'classifier:random_forest:n_estimators': 100,
'imputation:strategy': 'mean',
'one_hot_encoding:use_minimum_fraction': 'False',
'preprocessor:__choice__': 'no_preprocessing',
'rescaling:__choice__': 'min/max'})),
(0.040000, SimpleClassificationPipeline(configuration={
'balancing:strategy': 'weighting',
'classifier:__choice__': 'sgd',
'classifier:sgd:alpha': 8.157889958167601e-05,
'classifier:sgd:average': 'False',
'classifier:sgd:eta0': 0.042599381735495594,
'classifier:sgd:fit_intercept': 'True',
'classifier:sgd:learning_rate': 'optimal',
'classifier:sgd:loss': 'perceptron',
'classifier:sgd:n_iter': 25,
'classifier:sgd:penalty': 'l2',
'imputation:strategy': 'median',
'one_hot_encoding:minimum_fraction': 0.040130045634589266,
'one_hot_encoding:use_minimum_fraction': 'True',
'preprocessor:__choice__': 'no_preprocessing',
'rescaling:__choice__': 'normalize'})),
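There is no official helper for this as far as I know, but since each printed entry embeds a flat configuration={...} dict of literals, one option is to parse it back out with ast.literal_eval. A sketch under that assumption (extract_configurations and the abbreviated sample text are my own, not part of the auto-sklearn API):

```python
import ast
import re

# Abbreviated sample of what show_models() prints.
SAMPLE = """(0.040000, SimpleClassificationPipeline(configuration={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:n_estimators': 100}))"""

def extract_configurations(text):
    """Return each configuration dict embedded in the printed repr."""
    return [
        ast.literal_eval(m.group(1))
        for m in re.finditer(r"configuration=(\{.*?\})\)", text, re.DOTALL)
    ]

configs = extract_configurations(SAMPLE)
assert configs[0]["classifier:__choice__"] == "random_forest"
```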
from auto-sklearn.
Also, show_models() can be very slow and uses a lot of memory; in my case it takes tens of minutes and tens of GB. Instead I am using
ats=salted_temp_dir_of_autosklearn
for quality in $(grep obj $ats/log-run*|sed -e 's/^.*obj\ \(.*$\)/\1/'|sort|uniq|head -10); do grep final -A 1 $(grep -l "$quality" $ats/log-run*|sort|head -1); done;
to get the top 10 classifiers (the ones with the best scores) from the log files, which takes virtually no time at all.
I wonder if there is a reason show_models() does what it does.
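The same log-scraping idea can be sketched in Python. Note the "obj <value>" line format and the log-run* file naming mirror the shell pipeline above and are assumptions about auto-sklearn's temp-dir layout, not a documented interface:

```python
import os
import re
import tempfile

OBJ_RE = re.compile(r"obj\s+(-?\d+(?:\.\d+)?)")

def best_runs(log_dir, n=10):
    """Return the n lowest objective values found in log-run* files."""
    scores = []
    for name in os.listdir(log_dir):
        if not name.startswith("log-run"):
            continue
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                m = OBJ_RE.search(line)
                if m:
                    scores.append((float(m.group(1)), name))
    return sorted(scores)[:n]

# Demo with synthetic log files in the assumed format.
tmp = tempfile.mkdtemp()
for name, text in [("log-run-1", "final\nobj 0.25\n"),
                   ("log-run-2", "final\nobj 0.10\n")]:
    with open(os.path.join(tmp, name), "w") as fh:
        fh.write(text)

assert best_runs(tmp) == [(0.10, "log-run-2"), (0.25, "log-run-1")]
```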
from auto-sklearn.