
mvmukesh / featureselection-framework-ml


How to select variables in a data set and build simpler, faster, more reliable, and more interpretable ML models

backward-elimination feature-evaluator feature-generation feature-weighting-methods filter-methods forward-backward-algo forward-selection machine-learning recursive-feature-elimination wrapper-methods


FeatureSelection-Framework-ML

A guide to selecting variables in a data set in order to build simpler, faster, more reliable, and more interpretable machine learning models


Why Do We Select Features?

  • Models with fewer features are easier for software developers to implement and put into production
  • Better generalisation, because overfitting is reduced
  • Lower risk of data errors during model use
  • Simpler models are easier to interpret
  • Shorter training time
  • Less data redundancy

Why Reduce Features for Model Deployment?

  • Smaller JSON messages sent over to the model
    • JSON messages contain only the necessary variables / inputs
  • Fewer lines of code for error handling
    • Error handlers need to be written for each variable / input
  • Less feature engineering code
  • Less information to log

How to Make Feature Selection Part of the Pipeline?

Feature selection can be a step inside the pipeline, but it is better to select the features before building the pipeline and then make the resulting list of selected features part of the pipeline we want to deploy.
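As a minimal sketch of that idea, the pipeline below receives a fixed list of features that was chosen beforehand. The column names, the toy data, and the `SELECTED_FEATURES` list are hypothetical; scikit-learn and pandas are assumed to be installed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical list produced by an earlier, offline feature selection step.
SELECTED_FEATURES = ["age", "income"]

# Toy data; "noise" stands in for a variable that was not selected.
X = pd.DataFrame({
    "age": [25, 32, 47, 51, 62, 23, 36, 58],
    "income": [30, 42, 80, 95, 60, 28, 55, 88],
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
})
y = [0, 0, 1, 1, 1, 0, 0, 1]

# ColumnTransformer keeps only the listed columns; every other column is
# dropped because remainder="drop" is the default.
pipeline = Pipeline([
    ("select_and_scale", ColumnTransformer([("scale", StandardScaler(), SELECTED_FEATURES)])),
    ("model", LogisticRegression()),
])
pipeline.fit(X, y)

# Number of inputs the deployed model actually consumes.
n_inputs = pipeline.named_steps["model"].coef_.shape[1]
```

With this layout the deployed pipeline ignores any extra columns it receives, so the serving code only has to validate the selected inputs.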


Feature Selection Methods: Nature, Pros, and Cons

Filter Methods
  • Nature: independent of the ML algorithm; based only on variable characteristics
  • Pros: quick feature removal; model agnostic; fast computation
  • Cons: do not capture redundancy; do not capture feature interaction; poor model performance

Wrapper Methods / Greedy Algorithms
  • Nature: consider the ML algorithm; evaluate subsets / groups of features
  • Pros: consider feature interaction; best performance; best feature subset for a given algorithm
  • Cons: not model agnostic (the features they find may not be the best for a different algorithm); computationally expensive; often impracticable

Embedded Methods
  • Nature: feature selection happens during training of the ML algorithm
  • Pros: good model performance; capture feature interaction; better than filter methods; faster than wrapper methods
  • Cons: not model agnostic

  1. Feature Selection Methods
  • Filter Methods
    • Variance
    • Correlation
    • Univariate Selection
  • Wrapper Methods
    • Forward Feature Selection
    • Backward Feature Elimination
    • Exhaustive Search
  • Embedded / Hybrid Methods
    • LASSO
    • Tree Importance
  • Moving Forward
Feature Selection Methods Code + Blog Link Video Link
  1. Feature Selection -- Basic Methods
  • Removing
    • Constant Features
    • Quasi-Constant Features
    • Duplicated Features
Feature Selection -- Basic Methods Code + Blog Link Video Link
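The three removals listed above can be sketched with pandas alone. The toy DataFrame and the 0.85 quasi-constant threshold are assumptions made for illustration; in practice the threshold is usually much closer to 1.

```python
import pandas as pd

# Toy data with one constant, one quasi-constant, and one duplicated column.
X = pd.DataFrame({
    "constant": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "quasi_constant": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "a_copy": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "b": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
})

# Constant features: a single distinct value.
constant = [c for c in X.columns if X[c].nunique() == 1]

# Quasi-constant features: the most frequent value covers at least
# `threshold` of the rows (0.85 is an assumed cut-off for this toy data).
threshold = 0.85
quasi_constant = [
    c for c in X.columns
    if c not in constant and X[c].value_counts(normalize=True).iloc[0] >= threshold
]

# Duplicated features: columns identical to an earlier column.
duplicated = X.columns[X.T.duplicated()].tolist()

X_reduced = X.drop(columns=constant + quasi_constant + duplicated)
```

Transposing to find duplicates is fine for small frames; on wide data sets a pairwise column comparison may be cheaper in memory.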
  1. Feature Selection -- Correlation
  • Removing Correlated Features
  • Basic Selection Methods + Correlation -> Pipeline
Feature Selection -- Correlation Code + Blog Link Video Link
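One common way to remove correlated features is to scan the upper triangle of the absolute correlation matrix and drop the later feature of every highly correlated pair. The sketch below assumes synthetic data and a 0.8 threshold; both are illustrative choices, not values from this repository.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "a_noisy": a + 0.01 * rng.normal(size=200),  # nearly a copy of "a"
    "b": rng.normal(size=200),                   # independent of "a"
})

threshold = 0.8  # assumed cut-off on the absolute Pearson correlation
corr = X.corr().abs()

# Keep only the strict upper triangle so each pair is examined once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop a column if it is highly correlated with any earlier column.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
X_reduced = X.drop(columns=to_drop)
```

This keeps the first feature of each correlated group; which member to keep is a design choice and could instead be based on a univariate score.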

Filter Methods

  1. Univariate Statistical Methods
  • Mutual Information
  • Chi-square test
  • ANOVA
  • Basic Selection Methods + Statistical Methods -> Pipeline
Univariate Statistical Methods -- Filter Method Code + Blog Link Video Link
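The three univariate tests above map onto scoring functions in scikit-learn. The following sketch, assuming the iris data set and `k=2`, ranks every feature with each score and keeps the top two; it is an illustration, not the code behind the links.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True, as_frame=True)

score_funcs = {
    # Mutual information uses a randomised estimator, so the seed is fixed.
    "mutual_information": partial(mutual_info_classif, random_state=0),
    # chi2 requires non-negative feature values, which holds for iris.
    "chi_square": chi2,
    # f_classif is the ANOVA F-test for classification targets.
    "anova": f_classif,
}

selected = {}
for name, score_func in score_funcs.items():
    selector = SelectKBest(score_func=score_func, k=2).fit(X, y)
    selected[name] = list(X.columns[selector.get_support()])
```

`SelectKBest` is itself a transformer, so it can be placed directly inside a `Pipeline` after the basic removal steps.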
  1. Other Methods and Metrics
  • Univariate ROC-AUC, MSE, etc.
  • Method used in a KDD competition - 2009
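A possible reading of the univariate ROC-AUC approach is to train a small model on each feature alone and rank the features by cross-validated ROC-AUC. The sketch below assumes synthetic data, a shallow decision tree, and a 0.65 cut-off; all three are illustrative assumptions, and the sketch does not attempt to reproduce the KDD 2009 method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = pd.DataFrame({
    "informative": y + rng.normal(scale=0.5, size=500),  # depends on the target
    "noise": rng.normal(size=500),                       # unrelated to the target
})

# One model per feature, scored with 5-fold cross-validated ROC-AUC.
roc_auc = pd.Series({
    col: cross_val_score(
        DecisionTreeClassifier(max_depth=2, random_state=0),
        X[[col]], y, cv=5, scoring="roc_auc",
    ).mean()
    for col in X.columns
}).sort_values(ascending=False)

threshold = 0.65  # assumed cut-off; 0.5 corresponds to random guessing
selected = roc_auc[roc_auc > threshold].index.tolist()
```

For a regression target the same loop could use a regressor and `scoring="neg_mean_squared_error"` instead.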

Wrapper Methods

  1. Wrapper Methods
  • Forward Feature Selection
  • Backward Feature Selection
  • Exhaustive Feature Selection
Wrapper Methods -- Feature Selection Code + Blog Link Video Link
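The three wrapper strategies can be sketched as follows, assuming scikit-learn 0.24 or later for `SequentialFeatureSelector`. The iris data set, the logistic regression model, and the target of two features are illustrative choices; the exhaustive search is written by hand with `itertools` and is only feasible because there are four features.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True, as_frame=True)
model = LogisticRegression(max_iter=1000)

# Forward selection: start empty, add the feature that helps most each round.
forward = SequentialFeatureSelector(
    model, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)
forward_features = list(X.columns[forward.get_support()])

# Backward elimination: start with all features, remove the least useful each round.
backward = SequentialFeatureSelector(
    model, n_features_to_select=2, direction="backward", cv=5
).fit(X, y)
backward_features = list(X.columns[backward.get_support()])

# Exhaustive search: score every non-empty subset (2**4 - 1 = 15 here).
def cv_score(cols):
    return cross_val_score(model, X[list(cols)], y, cv=5).mean()

all_subsets = [c for r in range(1, X.shape[1] + 1) for c in combinations(X.columns, r)]
best_subset = max(all_subsets, key=cv_score)
```

The number of subsets doubles with every extra feature, which is why the exhaustive variant is often impracticable on real data sets.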

Embedded Methods

  1. Linear Model Coefficients
  • Logistic Regression Coefficients
  • Linear Regression Coefficients
  • Effect of Regularization on Coefficients
  • Basic Selection Methods + Correlation + Embedded -> Pipeline
Linear Model Coefficients Code + Blog Link Video Link
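As a rough sketch of selection by linear model coefficients, the example below fits a logistic regression on standardised features and keeps those whose absolute coefficient is at least the mean. It also fits two models with different `C` values to show how regularisation shrinks the coefficients. The data set and all parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Coefficients are only comparable when the features share a common scale.
X_scaled = StandardScaler().fit_transform(X)

# Keep features whose |coefficient| is at least the mean |coefficient|.
selector = SelectFromModel(
    LogisticRegression(C=1.0, max_iter=1000), threshold="mean"
).fit(X_scaled, y)
selected = list(X.columns[selector.get_support()])

# Effect of regularisation: a smaller C means a stronger L2 penalty,
# which pulls the coefficient vector towards zero.
weak = LogisticRegression(C=10.0, max_iter=1000).fit(X_scaled, y)
strong = LogisticRegression(C=0.01, max_iter=1000).fit(X_scaled, y)
weak_norm = np.linalg.norm(weak.coef_)
strong_norm = np.linalg.norm(strong.coef_)
```

Because the penalty changes the coefficient magnitudes, the set of features that passes a fixed threshold can change with `C` as well.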
  1. Lasso
  • Lasso
  • Basic Selection Methods + Correlation + Lasso -> Pipeline
Lasso Code + Blog Link Video Link
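Lasso performs selection because its L1 penalty drives some coefficients to exactly zero. The sketch below assumes a synthetic regression problem with three informative features out of ten and `alpha=1.0`; these values are illustrative and would normally be tuned, for example with cross-validation.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: only 3 of the 10 features influence the target.
X, y, true_coef = make_regression(
    n_samples=200, n_features=10, n_informative=3,
    noise=1.0, coef=True, random_state=0,
)
X_scaled = StandardScaler().fit_transform(X)

# For L1-penalised estimators SelectFromModel keeps the features whose
# coefficient is not (numerically) zero.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X_scaled, y)
selected = selector.get_support(indices=True)

# Ground truth, available only because the data is synthetic.
informative = (true_coef != 0).nonzero()[0]
```

A larger `alpha` removes more features; a value that is too large can also remove informative ones.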
  1. Tree Importance
  • Random Forest derived Feature Importance
  • Tree importance + Recursive Feature Elimination
  • Basic Selection Methods + Correlation + Tree importance -> Pipeline
Tree Importance Code + Blog Link Video Link
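The two tree-based options above can be sketched with scikit-learn as follows. The breast cancer data set, the forest size, and the target of five features for recursive feature elimination are assumptions chosen to keep the example short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Importance-based selection: keep features whose importance is at least
# the mean importance across all features.
by_importance = SelectFromModel(forest, threshold="mean").fit(X, y)
importance_features = list(X.columns[by_importance.get_support()])

# Recursive feature elimination: refit the forest, drop the least
# important feature, and repeat until 5 features remain.
rfe = RFE(forest, n_features_to_select=5, step=1).fit(X, y)
rfe_features = list(X.columns[rfe.get_support()])
```

Refitting after each removal lets the importances of the remaining features adjust, which matters when several features are correlated and share importance.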

Hybrid Methods

  1. Hybrid Methods
  • Feature Shuffling
  • Recursive Feature Elimination
  • Recursive Feature Addition
Hybrid Methods Code + Blog Link Video Link
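Feature shuffling can be sketched by permuting one feature at a time in a held-out set and measuring how much the model's score drops. The synthetic data, the shallow random forest, and the 0.1 threshold below are illustrative assumptions; the sketch covers only the shuffling method, not recursive elimination or addition.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=600)
X = pd.DataFrame({
    "informative": y + rng.normal(scale=0.5, size=600),  # depends on the target
    "noise": rng.normal(size=600),                       # unrelated to the target
})
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
model.fit(X_train, y_train)
baseline = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Shuffle one column at a time and record the drop in ROC-AUC.
drops = {}
for col in X.columns:
    X_shuffled = X_test.copy()
    X_shuffled[col] = rng.permutation(X_shuffled[col].to_numpy())
    shuffled_auc = roc_auc_score(y_test, model.predict_proba(X_shuffled)[:, 1])
    drops[col] = baseline - shuffled_auc

threshold = 0.1  # assumed minimum drop for a feature to be kept
selected = [col for col, drop in drops.items() if drop > threshold]
```

The model is trained only once, which makes this cheaper than wrapper methods that refit for every candidate subset.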
