
Home Page: https://www.nature.com/articles/s41598-022-21417-8

License: GNU General Public License v3.0


Network-guided greedy decision forest for feature subset selection


Installation

The DFNET R package can be installed using devtools:

install.packages("devtools")
devtools::install_github("pievos101/DFNET")

Usage

See our examples using synthetic data sets or real-world cancer data.

Generally speaking, DFNET follows a four step process:

  1. Preparing the input data (graph and features).
  2. Training the forest.
  3. Finding useful decision trees.
  4. Using these trees for evaluation.

Preparing input data

DFNET expects an igraph::igraph and a 2D or 3D feature array, as well as a target vector with the same number of rows as the array. The vertex names of the graph should be the same as the column names of the array. When in doubt, use launder or related functions to prepare the input data.
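As a minimal sketch of what such inputs could look like (the gene names, graph shape, and sample count below are illustrative, not taken from DFNET's examples), you might build a matching graph and feature matrix like this:

```r
# Toy DFNET-style inputs: a graph whose vertex names match the
# feature columns, plus a target with one entry per sample row.
library(igraph)

set.seed(42)
genes <- c("g1", "g2", "g3", "g4")

# Ring graph over the four genes; vertex names must match column names.
graph <- make_ring(length(genes))
V(graph)$name <- genes

# 2D feature array: samples in rows, graph vertices in columns.
features <- matrix(rnorm(10 * length(genes)), nrow = 10,
                   dimnames = list(NULL, genes))

# Binary target, one label per row of the feature array.
target <- rbinom(nrow(features), 1, 0.5)

# Sanity checks DFNET relies on.
stopifnot(identical(V(graph)$name, colnames(features)),
          length(target) == nrow(features))
```

If your names do not line up this cleanly, this is where launder or related functions come in.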

Training the forest

Once you have your graph and features, you can train your forest like so:

forest <- train(,
    graph, features, target,
    ...
)

If you have a pre-trained forest, you can use that for training as well:

forest <- train(forest,
    graph, features, target,
    ...
)

Finding useful trees

Since DFNET performs greedy optimization, the last generation of trees is the best according to the provided test metric. DFNET provides overrides for the standard R methods head and tail, which return generations of trees.

# get the selected modules
last_gen <- tail(forest, 1)
tree_imp <- attr(last_gen, "last.performance")

Note that performance metrics for earlier generations are not kept. Several importance scores can be derived from these metrics.

e_imp <- edge_importance(graph, last_gen$trees, tree_imp)

f_imp <- feature_importance(last_gen, features)

m_imp <- module_importance(
    graph,
    last_gen$modules,
    e_imp,
    tree_imp
)

The module importance is particularly useful for feature selection, as it combines the importance of edges within a module with the overall accuracy of the decision tree. You can use it to order decision trees or simply extract the best one.

best <- which.max(as.numeric(m_imp[, "total"]))
best.tree <- last_gen$trees[[best]]
by_importance <- order(m_imp[, "total"], decreasing = TRUE)
last_gen$trees[by_importance]

Using these trees for evaluation

DFNET provides an override for the predict method that functions much like ranger's.

# Predict using the best DT
pred_best <- predict(best.tree, test_data)$predictions

# Predict using all detected modules
pred_all <- predict(last_gen, test_data)$predictions

You can use ModelMetrics to evaluate the accuracy, precision, recall, or other performance metrics.

ModelMetrics::auc(test_target, pred_best)
ModelMetrics::auc(test_target, pred_all)
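The same pattern extends to the other metrics mentioned above. A self-contained sketch with toy labels and scores (the data below is illustrative, not DFNET output; ModelMetrics expects the actual labels first):

```r
# Evaluating binary predictions with ModelMetrics on toy data.
library(ModelMetrics)

test_target <- c(0, 1, 1, 0, 1)      # actual 0/1 labels
pred <- c(0.2, 0.8, 0.6, 0.4, 0.9)   # predicted scores in [0, 1]

auc(test_target, pred)                        # area under the ROC curve
precision(test_target, pred, cutoff = 0.5)    # TP / (TP + FP) at cutoff 0.5
recall(test_target, pred, cutoff = 0.5)       # TP / (TP + FN) at cutoff 0.5
```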

Now, let's check the performance of that module on the independent test data set and compare the results with the performance of all selected trees.

# Prepare test data
colnames(mRNA_test)  <- paste(colnames(mRNA_test), "$", "mRNA", sep = "")
colnames(Methy_test) <- paste(colnames(Methy_test), "$", "Methy", sep = "")
DATA_test <- as.data.frame(cbind(mRNA_test, Methy_test))

# Predict using the best DT
pred_best <- predict(best.tree, DATA_test)$predictions

# Predict using all detected modules
pred_all <- predict(last_gen, DATA_test)$predictions

pred_best
pred_all

# Check the performance of the predictions
ModelMetrics::auc(target[test_ids], pred_best)
ModelMetrics::auc(target[test_ids], pred_all)

Finally, we provide an extension to compute tree-based SHAP values via treeshap.

forest_unified <- dfnet.unify(last_gen$trees, test_data)
forest_shap <- treeshap(forest_unified, test_data)

BibTeX Citation

@article{pfeifer2022multi,
  title={Multi-omics disease module detection with an explainable Greedy Decision Forest},
  author={Pfeifer, Bastian and Baniecki, Hubert and Saranti, Anna and Biecek, Przemyslaw and Holzinger, Andreas},
  journal={Scientific Reports},
  volume={12},
  number={1},
  pages={1--15},
  year={2022},
  publisher={Nature Publishing Group}
}

Contributors

pievos101
