This project forked from mlr-org/mlr3filters

Filter-based feature selection for mlr3

Home Page: https://mlr3filters.mlr-org.com

License: GNU Lesser General Public License v3.0

R 100.00%

mlr3filters

Package website: release | dev

{mlr3filters} adds feature selection filters to mlr3. The implemented filters can be used stand-alone, or as part of a machine learning pipeline in combination with mlr3pipelines and the filter operator.

Wrapper methods for feature selection are implemented in mlr3fselect. Learners that support the extraction of feature importance scores can be combined with a filter from this package for embedded feature selection.

Installation

CRAN version

install.packages("mlr3filters")

Development version

remotes::install_github("mlr-org/mlr3filters")

Filters

Filter Example

set.seed(1)
library("mlr3")
library("mlr3filters")

task = tsk("sonar")
filter = flt("auc")
head(as.data.table(filter$calculate(task)))
##    feature     score
## 1:     V11 0.2811368
## 2:     V12 0.2429182
## 3:     V10 0.2327018
## 4:     V49 0.2312622
## 5:      V9 0.2308442
## 6:     V48 0.2062784
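
For stand-alone use, the calculated scores can be used to subset a task directly. The following is a minimal sketch, assuming that keeping the five highest-scoring features is a sensible cut-off (the value 5 is arbitrary and chosen only for illustration):

```r
library("mlr3")
library("mlr3filters")

task = tsk("sonar")
filter = flt("auc")
filter$calculate(task)

# `filter$scores` is a named numeric vector, sorted in decreasing order
top_features = names(head(filter$scores, 5))

# restrict the task to the selected features (modifies the task in place)
task$select(top_features)
task$feature_names
```

Because `task$select()` changes the task by reference, work on `task$clone()` if the full feature set is still needed afterwards.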

Implemented Filters

| Name | Label | Task Types | Feature Types | Package |
|---|---|---|---|---|
| anova | ANOVA F-Test | Classif | Integer, Numeric | stats |
| auc | Area Under the ROC Curve Score | Classif | Integer, Numeric | mlr3measures |
| carscore | Correlation-Adjusted coRrelation Score | Regr | Numeric | care |
| carsurvscore | Correlation-Adjusted coRrelation Survival Score | Surv | Integer, Numeric | carSurv, mlr3proba |
| cmim | Minimal Conditional Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| correlation | Correlation | Regr | Integer, Numeric | stats |
| disr | Double Input Symmetrical Relevance | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| find_correlation | Correlation-based Score | Classif & Regr | Integer, Numeric | stats |
| importance | Importance Score | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| information_gain | Information Gain | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
| jmi | Joint Mutual Information | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| jmim | Minimal Joint Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| kruskal_test | Kruskal-Wallis Test | Classif | Integer, Numeric | stats |
| mim | Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| mrmr | Minimum Redundancy Maximal Relevancy | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| njmim | Minimal Normalised Joint Mutual Information Maximization | Classif & Regr | Integer, Numeric, Factor, Ordered | praznik |
| performance | Predictive Performance | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| permutation | Permutation Score | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| relief | RELIEF | Classif & Regr | Integer, Numeric, Factor, Ordered | FSelectorRcpp |
| selected_features | Embedded Feature Selection | Universal | Logical, Integer, Numeric, Character, Factor, Ordered, POSIXct | |
| variance | Variance | Universal | Integer, Numeric | stats |
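
The installed version of the package can also be queried for its filters from within R. A minimal sketch, assuming the `mlr_filters` dictionary exported by mlr3filters; the keys returned depend on the installed version and may differ from the table:

```r
library("mlr3")
library("mlr3filters")

# keys of all filters registered in the dictionary
filter_keys = mlr_filters$keys()
print(filter_keys)

# metadata of a single filter
anova = flt("anova")
anova$feature_types  # feature types the filter can handle
anova$packages       # packages required to calculate the scores
```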

Variable Importance Filters

The following learners allow the extraction of variable importance and therefore are supported by FilterImportance:

## [1] "classif.featureless" "classif.ranger"      "classif.rpart"      
## [4] "classif.xgboost"     "regr.featureless"    "regr.ranger"        
## [7] "regr.rpart"          "regr.xgboost"

If your learner is not listed here but is capable of extracting variable importance from the fitted model, the most likely reason is that it is not yet integrated in the package mlr3learners or the extra learner organization. Please open an issue so we can add your package.
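
To check which of the learners available in your own session support variable importance, the learner dictionary can be filtered on the "importance" property. This is a sketch rather than the code that produced the list above, and the result depends on which learner packages are loaded:

```r
library("mlr3")
library("mlr3learners")

# one row per registered learner; `properties` is a list column
learners = as.data.table(mlr_learners)

# keep learners that advertise the "importance" property
has_importance = vapply(
  learners$properties,
  function(p) "importance" %in% p,
  logical(1)
)
importance_learners = learners$key[has_importance]
print(importance_learners)
```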

Some learners need to have their variable importance measure “activated” during learner creation. For example, to use the “impurity” measure of Random Forest via the {ranger} package:

library("mlr3learners") # provides classif.ranger

task = tsk("iris")
# request the impurity importance when constructing the learner
learner = lrn("classif.ranger", importance = "impurity")

filter = flt("importance", learner = learner)
filter$calculate(task)
head(as.data.table(filter), 3)
##         feature    score
## 1: Petal.Length 43.19847
## 2:  Petal.Width 43.11627
## 3: Sepal.Length 10.62848

Performance Filter

FilterPerformance is a univariate filter method that calls resample() once for each predictor variable in the dataset and ranks the features by the resulting score of the supplied measure. Any learner can be passed to this filter, with classif.rpart being the default. Regression learners can also be passed if the task is of type “regr”.
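
A minimal sketch of the performance filter on the iris task. The learner, resampling and measure are passed explicitly here and were chosen only for illustration (3-fold cross-validation and accuracy are assumptions, not recommendations):

```r
library("mlr3")
library("mlr3filters")

task = tsk("iris")

# one resampling run per feature, scored by classification accuracy
filter = flt("performance",
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc")
)
filter$calculate(task)

scores = as.data.table(filter)
print(scores)
```

Since every feature triggers its own resampling run, the runtime grows linearly with the number of features.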

Filter-based Feature Selection

In many cases filtering is only one step in the modeling pipeline. To select features based on filter values, one can use PipeOpFilter from mlr3pipelines.

library(mlr3pipelines)
task = tsk("spam")

# the `filter.frac` should be tuned
graph = po("filter", filter = flt("auc"), filter.frac = 0.5) %>>%
  po("learner", lrn("classif.rpart"))

learner = as_learner(graph)
rr = resample(task, learner, rsmp("holdout"))
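
The comment above notes that `filter.frac` should be tuned. As a rough sketch of what that means, the loop below compares a few hand-picked fractions on the same cross-validation splits. The candidate values and the 3-fold setup are assumptions made for illustration; a proper tuning setup would normally use a dedicated tuning package instead.

```r
library("mlr3")
library("mlr3filters")
library("mlr3pipelines")

task = tsk("spam")

# fix the splits so that all candidates are compared on the same folds
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

fracs = c(0.25, 0.5, 0.75)  # hypothetical candidate values
errors = vapply(fracs, function(frac) {
  graph = po("filter", filter = flt("auc"), filter.frac = frac) %>>%
    po("learner", lrn("classif.rpart"))
  rr = resample(task, as_learner(graph), resampling)
  unname(rr$aggregate(msr("classif.ce")))
}, numeric(1))

# classification error per candidate fraction
names(errors) = fracs
print(errors)
```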

