akashjborah97 / graph-based-feature-selection-for-dimensionality-reduction




Graph-Based-Feature-Selection-for-Dimensionality-Reduction

Graph-based feature selection is an approach to reducing the dimensionality of a dataset by treating its features as a graph. The approach builds a minimum spanning tree, using Kruskal's algorithm, of a graph in which the features of the dataset are the vertices and the pairwise correlations between them are the edge weights. Edges whose weights are greater than a user-defined threshold are then removed, which reduces the dimension of the dataset.

Tools Used: scikit-learn, pandas, NumPy, Matplotlib, NetworkX

Steps performed to achieve dimensionality reduction via Graph Based Method:

  1. Importing dataset and Data Preprocessing:

    • Naming the columns.
    • Imputation of missing values using an imputer from the scikit-learn library.
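
The preprocessing step can be sketched as follows. This is a minimal illustration, not the notebook's code: the tiny DataFrame, the column names `f1` and `f2`, and the choice of mean imputation via `SimpleImputer` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical raw data without headers; NaN marks a missing value.
raw = pd.DataFrame([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])

# Name the columns (names are made up for illustration).
raw.columns = ["f1", "f2"]

# Assumed strategy: replace each missing value with the mean of its column.
imputer = SimpleImputer(strategy="mean")
clean = pd.DataFrame(imputer.fit_transform(raw), columns=raw.columns)
print(clean)
```

Other strategies such as `"median"` or `"most_frequent"` may suit a given dataset better.
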
  2. Normalization:

    • Values are rescaled so that they end up ranging between 0 and 1.
    • This is known as min-max scaling. In scikit-learn it is provided by MinMaxScaler; the Normalizer class instead rescales each sample (row) to unit norm.
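
Min-max scaling maps each value x of a feature to (x - min) / (max - min), where min and max are taken over that feature. The sketch below assumes scikit-learn's `MinMaxScaler`, which implements this formula. That is a swapped-in choice for illustration, since `Normalizer` would rescale each row to unit norm rather than each column to the range 0 to 1. The data and column names are made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical, already-imputed data.
data = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 15.0, 20.0]})

# Rescale every column to the range [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(data), columns=data.columns)

# The same result computed directly from the formula, as a cross-check.
manual = (data - data.min()) / (data.max() - data.min())
print(scaled)
```
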
  3. Correlation matrix:

    • Correlations among the features are computed using .corr() from the pandas library.
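
As a small illustration of this step, the sketch below calls `DataFrame.corr()` on made-up data. With no arguments it computes the pairwise Pearson correlation. The feature names and values are assumptions chosen so that the result is easy to check by eye.

```python
import pandas as pd

data = pd.DataFrame({
    "f1": [0.0, 0.5, 1.0, 0.25],
    "f2": [0.0, 0.5, 1.0, 0.25],  # identical to f1, so correlation +1
    "f3": [1.0, 0.5, 0.0, 0.75],  # equals 1 - f1, so correlation -1
})

# Square, symmetric matrix with one row and one column per feature.
corr = data.corr()
print(corr.round(2))
```
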
  4. Defining the Vertices and Weights:

    • The features or the columns are taken into a list to define Vertices.
    • Weights are stored in another list by extracting the upper triangle of the correlation matrix, so that each pair of features appears once.
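
One way to build the two lists is sketched below, assuming NumPy's `triu_indices` is used to walk the strictly upper triangle. The data, the names `vertices`, `edges` and `weights`, and the decision to skip the diagonal (the self-correlations, which are always 1) are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data; f3 equals 5 - f1, so their correlation is -1.
data = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 1.0, 4.0, 3.0],
    "f3": [4.0, 3.0, 2.0, 1.0],
})
corr = data.corr()

# Vertices: one per feature (column).
vertices = list(corr.columns)

# Index pairs of the strictly upper triangle (k=1 skips the diagonal).
rows, cols = np.triu_indices(len(vertices), k=1)

# One (u, v, weight) triple per unordered pair of features.
edges = [
    (vertices[i], vertices[j], float(corr.iloc[i, j]))
    for i, j in zip(rows, cols)
]
weights = [w for _, _, w in edges]
print(edges)
```

For n features this yields n(n-1)/2 edges, here 3.
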
  5. The Graph Based Approach:

    Generating the Minimum Spanning Tree (MST) of the graph using Kruskal's algorithm

    A spanning tree of a graph is a subgraph that is a tree and connects all the vertices without forming a cycle, so it has the same vertices as the original graph. The minimum spanning tree is the spanning tree with the smallest total edge weight. Method:

    a. Sort all the edges in non-decreasing order of their weight.

    b. Pick the smallest remaining edge and check whether it forms a cycle with the spanning tree built so far. If it does not, include the edge; otherwise discard it.

    c. Repeat Step b until there are (V-1) edges in the spanning tree.
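
Steps a to c can be sketched in plain Python as below. The cycle check uses a union-find (disjoint-set) structure, which is a common way to implement Kruskal's algorithm but is an assumption here, as are the function name `kruskal_mst` and the example weights.

```python
def kruskal_mst(vertices, edges):
    """Return the MST edges of a connected, undirected weighted graph.

    vertices: iterable of hashable labels
    edges: iterable of (u, v, weight) triples
    """
    # Each vertex starts as the root of its own component.
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    # Step a: visit edges in non-decreasing order of weight.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        # Step b: the edge closes a cycle exactly when both ends share a root.
        if root_u != root_v:
            parent[root_u] = root_v
            mst.append((u, v, w))
        # Step c: a spanning tree on V vertices has V - 1 edges.
        if len(mst) == len(parent) - 1:
            break
    return mst


# Hypothetical correlation weights between four features.
vertices = ["f1", "f2", "f3", "f4"]
edges = [
    ("f1", "f2", 0.9), ("f1", "f3", -0.4), ("f1", "f4", 0.1),
    ("f2", "f3", -0.2), ("f2", "f4", 0.5), ("f3", "f4", -0.7),
]
mst = kruskal_mst(vertices, edges)
print(mst)
```

Because the weights are signed correlations, the most negative correlations are picked first. Kruskal's algorithm works unchanged with negative weights.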

  6. Removing edges whose weights are greater than the threshold from the constructed MST

    • Taking the threshold to be -0.31, the vertices attached to edges whose weight is greater than this are dropped.
    • The remaining MST is printed after dropping the vertices.
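
The pruning step can be sketched as below. The text does not spell out exactly which vertices are dropped, so this example assumes one possible reading: edges heavier than the threshold are removed, and a feature is dropped when it no longer touches any remaining edge. The MST edges and all variable names are hypothetical.

```python
THRESHOLD = -0.31  # value quoted in the text

# Hypothetical MST edges as (u, v, weight) triples.
mst = [("f3", "f4", -0.7), ("f1", "f3", -0.4), ("f2", "f3", -0.2)]

# Keep only the edges whose weight does not exceed the threshold.
kept_edges = [(u, v, w) for u, v, w in mst if w <= THRESHOLD]

# Assumed rule: a feature survives if it still touches a kept edge.
all_vertices = {x for u, v, _ in mst for x in (u, v)}
kept_vertices = sorted({x for u, v, _ in kept_edges for x in (u, v)})
dropped_vertices = sorted(all_vertices - set(kept_vertices))

print("remaining MST:", kept_edges)
print("dropped:", dropped_vertices)
```

Under a stricter reading, where both endpoints of every removed edge are dropped, `f3` would be removed here as well.
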
  7. The Final Reduced Dataset is displayed
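
Putting the steps together, the sketch below goes from a small DataFrame to a reduced one using NetworkX's built-in `minimum_spanning_tree`. It is an illustration under stated assumptions rather than the notebook's code: the data is made up, imputation and scaling are skipped because the toy data is complete, and a feature is dropped only when pruning leaves it with no edges.

```python
import networkx as nx
import pandas as pd

THRESHOLD = -0.31  # value quoted in the text

# Hypothetical data: f1, f2 and f3 are strongly related, f4 is not.
data = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [4.0, 3.0, 1.0, 2.0],
    "f3": [2.0, 1.0, 4.0, 3.0],
    "f4": [1.0, -1.0, -1.0, 1.0],
})

corr = data.corr()
cols = list(corr.columns)

# Complete graph: features are vertices, correlations are edge weights.
G = nx.Graph()
G.add_nodes_from(cols)
for i, u in enumerate(cols):
    for v in cols[i + 1:]:
        G.add_edge(u, v, weight=float(corr.loc[u, v]))

# Minimum spanning tree via Kruskal's algorithm.
mst = nx.minimum_spanning_tree(G, weight="weight", algorithm="kruskal")

# Remove MST edges heavier than the threshold.
heavy = [(u, v) for u, v, w in mst.edges(data="weight") if w > THRESHOLD]
mst.remove_edges_from(heavy)

# Assumed rule: drop features left without any edge.
dropped = [n for n in mst.nodes if mst.degree(n) == 0]
reduced = data.drop(columns=dropped)
print(reduced)
```

In this toy case the only MST edge reaching `f4` has a weight close to 0, so it is removed and `f4` is dropped from the dataset.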
