Comments (8)

ajing commented on June 24, 2024

How to deal with missing values in StackNet?

from stacknet.

goldentom42 commented on June 24, 2024

Hi ajing, I don't think StackNet is expected to handle missing values unless you exclusively use XGBoost or LightGBM, which handle them out of the box.
Otherwise you need to treat them in your favorite ML language before using StackNet.
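
As an illustration of treating missing values before handing the data to StackNet, here is a minimal sketch of column-mean imputation in plain Python. The function name `impute_column_means` and the toy rows are hypothetical, and mean imputation is only one possible strategy, not something StackNet prescribes.

```python
import math


def impute_column_means(rows):
    """Replace None/NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None and not math.isnan(r[j])]
        # Assumption: fall back to 0.0 when a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if (v is None or math.isnan(v)) else v for j, v in enumerate(r)]
        for r in rows
    ]


# Toy data: None and NaN both stand for a missing value.
rows = [
    [1.0, None, 3.0],
    [2.0, 4.0, float("nan")],
    [3.0, 6.0, 9.0],
]
filled = impute_column_means(rows)
```

The imputed rows can then be written to whatever input file you pass to StackNet. In practice the means should be computed on the training set only and reused for the test set.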

As for the typical strategy, I assume you need to clean your data and perform feature extraction before using StackNet.

I am not sure what you mean by model diagnosis, but if some sort of plotting is involved you will have to do it in your favorite language. Use the output_name parameter so that StackNet saves the predictions of each model and fold.

Hope this helps, Olivier.

ajing commented on June 24, 2024

Thanks, Olivier!

kaz-Anova commented on June 24, 2024

Apologies for the late response, @ajing.

If you use the sparse format, most StackNet-native algorithms treat values that are not given as zeros. As @goldentom42 pointed out, certain algorithms will treat these differently.
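
To make the "non-given values become zeros" point concrete, here is a small sketch of a libsvm-style sparse round trip in plain Python. The helper names are hypothetical, and the zero-based feature index is an assumption made for this example; check the index base that your StackNet input actually expects.

```python
def to_sparse_line(label, features):
    """Write one row as 'label index:value ...', omitting zeros and missing (None) values."""
    parts = [str(label)]
    for idx, value in enumerate(features):
        if value is None or value == 0:
            continue  # not written, so a reader will see it as "not given"
        parts.append(f"{idx}:{value}")
    return " ".join(parts)


def from_sparse_line(line, n_features):
    """Read a sparse line back into a dense row; anything not given becomes 0.0."""
    tokens = line.split()
    dense = [0.0] * n_features
    for token in tokens[1:]:
        idx, value = token.split(":")
        dense[int(idx)] = float(value)
    return float(tokens[0]), dense


# The second feature is missing (None), the third is a genuine zero.
line = to_sparse_line(1, [0.5, None, 0.0, 2.0])
label, dense = from_sparse_line(line, 4)
```

After the round trip the missing value and the genuine zero are indistinguishable, which is why you may prefer to impute or add an indicator feature before converting to sparse format.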

For the time being, you are solely responsible for creating good features (feature engineering), but in the future there will be options inside StackNet too.

A good approach to building a strong StackNet is explained here, after the "How to use StackNet" section. You need to build it model by model: hyper-tune one model at a time and build your ensemble sequentially.

Consider preparing several different datasets and running a separate StackNet on each. For example, in one dataset you might one-hot encode your categorical variables, while in another you may just label encode them.

Then you can average all the results.
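
A minimal sketch of that averaging step, assuming each StackNet run has produced one prediction per test row in the same row order. The function name and the toy prediction lists are hypothetical.

```python
def average_predictions(*prediction_sets):
    """Element-wise mean of several equally long prediction lists."""
    if not prediction_sets:
        raise ValueError("need at least one set of predictions")
    n_rows = len(prediction_sets[0])
    if any(len(p) != n_rows for p in prediction_sets):
        raise ValueError("all prediction sets must have the same number of rows")
    return [sum(values) / len(values) for values in zip(*prediction_sets)]


# Hypothetical outputs of two runs: one on one-hot encoded data, one on label-encoded data.
onehot_preds = [0.25, 0.75, 0.5]
label_preds = [0.75, 0.25, 0.5]
blended = average_predictions(onehot_preds, label_preds)
```

A plain mean is the simplest choice; weighted averages are also possible if one run is clearly stronger on validation data.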

Hope it helps.

ajing commented on June 24, 2024

Hi Marios,

Thanks for answering my question in such detail!

You mentioned: "To tune a single model, one may choose an algorithm for the first layer and a dummy one for the second layer." How dumb should the second layer be?

Thanks,
Jing

goldentom42 commented on June 24, 2024

Hello Jing,

The goal in this step is to tune a single model, so a linear regression / logistic regression would do.
When I tune a model, I usually set several versions of its hyperparameters on the first level and a linear model or a random forest with small depth on the 2nd level. This way I can see how the model I want to tune behaves on the dataset.

Here is an example of searching for the best regularization parameter:

`LSVR Type:Liblinear threads:1 usescale:True C:1.0 maxim_Iteration:1000 seed:1 verbose:false
LSVR Type:Liblinear threads:1 usescale:True C:0.1 maxim_Iteration:1000 seed:1 verbose:false
LSVR Type:Liblinear threads:1 usescale:True C:0.01 maxim_Iteration:1000 seed:1 verbose:false
LSVR Type:Liblinear threads:1 usescale:True C:0.001 maxim_Iteration:1000 seed:1 verbose:false

RandomForestRegressor bootsrap:false estimators:100 threads:3 offset:0.00001 max_depth:5 max_features:0.3 min_leaf:1.0 min_split:5.0 Objective:RMSE row_subsample:0.8 seed:1 verbose:false`
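
If the list of candidates grows, a parameter layout like the one above could be generated rather than typed by hand. The following is only a sketch: the function name `build_params` is hypothetical, the model strings are copied verbatim from the example (including the parameter spellings used there), and the blank line separating the two levels mirrors that example.

```python
def build_params(c_values):
    """Build a two-level parameter text: one LSVR line per C value, then a shallow random forest."""
    level1 = [
        "LSVR Type:Liblinear threads:1 usescale:True "
        f"C:{c} maxim_Iteration:1000 seed:1 verbose:false"
        for c in c_values
    ]
    # Second level copied as-is from the example in this thread.
    level2 = (
        "RandomForestRegressor bootsrap:false estimators:100 threads:3 "
        "offset:0.00001 max_depth:5 max_features:0.3 min_leaf:1.0 min_split:5.0 "
        "Objective:RMSE row_subsample:0.8 seed:1 verbose:false"
    )
    # A blank line separates the first level from the second.
    return "\n".join(level1) + "\n\n" + level2 + "\n"


# C values are passed as strings so they are written exactly as given.
params_text = build_params(["1.0", "0.1", "0.01", "0.001"])
```

The resulting text can be saved to a file and passed to StackNet as the parameter file.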

Let me know if this answers your question.
Olivier

ajing commented on June 24, 2024

Thanks for the explanation, Olivier!

Could I interpret it as another way to do GridSearch in CV?

goldentom42 commented on June 24, 2024

Sure you can ;-)
