kevin-murgas / ppi-hypergraph-geometry

Code for publication: Hypergraph Geometry Reflects Higher-Order Dynamics in Protein Interaction Networks; Kevin A. Murgas, Emil Saucan, Romeil Sandhu; Scientific Reports (2022)

License: GNU General Public License v3.0

R 49.60% MATLAB 50.40%
forman-curvature hypergraph network-analysis cancer discrete-geometry protein-protein-interaction-network stem-cell-differentiation

PPI-hypergraph-geometry

Code for publication:
Hypergraph Geometry Reflects Higher-Order Dynamics in Protein Interaction Networks
Kevin A. Murgas, Emil Saucan, Romeil Sandhu
Scientific Reports (2022)

Requirements:

  • MATLAB
  • R

Summary of code script files:

  1. stringdb_parse.R, convert2mat.m - These two scripts prepare the PPI topology from the STRINGdb protein-protein interaction database. The R script accesses the database and saves node-list and edge-list files; the MATLAB script then converts the node and edge lists into an adjacency matrix and a list of gene names (as HGNC gene symbols), which are saved together in a single .mat file as a standardized representation of the PPI topology.
  2. examine_topology.m - This MATLAB script is used to analyze topological aspects of the 1-dimensional (1D) graph model and 2-dimensional (2D) hypergraph model of the PPI network, including numbers of vertices, edges, and faces (in the 2D case), as well as topological characteristics including the Euler number. Subsequently, edge- and face-degree distributions are assessed.
  3. preprocess_expression.m - This MATLAB script preprocesses gene expression data (from RNA-seq datasets in public databases such as GEO, TCGA, etc.) and formats the normalized expression into a standardized .mat file containing the expression matrix along with sample (column) and gene (row) names. The normalization procedure includes quantile normalization and a log2 transform. Because datasets vary, the script prompts for user input so that the correct data columns are selected and optional processing steps can be omitted. Note that, because data formats differ between sources, users may need to adjust the code to prepare their own datasets correctly.
  4. curvature_script.m - This MATLAB function script loads the specified topology and expression .mat files (prepared using scripts 1 and 3 above): the PPI topology used to build the network model, and the gene expression dataset to be analyzed with geometric network analysis. The algorithm first combines the data by keeping only the genes present in both the gene expression data and the PPI topology, then takes the largest connected component of the resulting network. It then runs through each sample (column) of gene expression, constructing a weighted network model (see manuscript for details) and computing geometric features of the model, including Forman-Ricci curvature of the 1D graph model and of the 2D hypergraph extended model, as well as graph entropy. A scalar contraction on the vertices and a global average are also computed. Finally, the results from all samples are saved in .csv format.
    • This code is optimized with parallel processing to compute the FR curvature of the edges of the network in the 2D model. For this reason, we strongly recommend running it on a high-performance compute cluster. Because the MATLAB script is a function, it expects input arguments when called from Bash; below is an example of how to call the script:
      matlab -batch "curvature_script $BASEDIR $EXPRFILE $TOPOFILE $OUTDIR"
    • Here, $BASEDIR, $EXPRFILE, $TOPOFILE, and $OUTDIR specify, respectively, the base directory (where to run the code from), the expression file (.mat, prepared by preprocess_expression.m), the topology file (.mat, prepared by convert2mat.m), and the output directory in which to save the results files.
  5. analyze2Dresults.R - This R script is used to examine the results from the curvature_script.m analysis. The script loads the results files from a specified experiment (i.e. gene expression dataset) and presents the data in figures corresponding to the figures in our manuscript. Note that, because the results files are quite large (>1 GB), they were not uploaded to GitHub; either regenerate and save them by running the analysis, or contact the authors for a copy of the results data.
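
To make the curvature step in item 4 more concrete, here is a minimal sketch of the standard Forman-Ricci curvature of an edge in a weighted 1D graph. It is written in Python for illustration only and is not the repository's MATLAB implementation. The function name forman_ricci_1d and its arguments are hypothetical, the weights are taken as given (see the manuscript for how they are derived from expression), the graph is assumed to have no self-loops or duplicate edges, and the 2D hypergraph extension is not covered.

```python
from math import sqrt

def forman_ricci_1d(edges, edge_w=None, vertex_w=None):
    """Forman-Ricci curvature of every edge of an undirected weighted graph.

    edges    : list of (u, v) pairs (assumed: no self-loops, no duplicates)
    edge_w   : optional dict frozenset({u, v}) -> positive weight (default 1.0)
    vertex_w : optional dict vertex -> positive weight (default 1.0)
    Returns a dict mapping the sorted pair (u, v) to its curvature.
    """
    keys = [frozenset(e) for e in edges]
    we = {k: (edge_w or {}).get(k, 1.0) for k in keys}

    # Map each vertex to the edges that touch it.
    incident = {}
    for k in keys:
        for v in k:
            incident.setdefault(v, []).append(k)
    wv = {v: (vertex_w or {}).get(v, 1.0) for v in incident}

    curvature = {}
    for k in keys:
        total = 0.0
        for v in k:
            # Contribution of the endpoint itself.
            total += wv[v] / we[k]
            # Subtract one term per neighbouring edge sharing this endpoint.
            for other in incident[v]:
                if other != k:
                    total -= wv[v] / sqrt(we[k] * we[other])
        curvature[tuple(sorted(k))] = we[k] * total
    return curvature

# With unit weights the formula reduces to 4 - deg(u) - deg(v).
path = forman_ricci_1d([("a", "b"), ("b", "c")])
print(path[("a", "b")])  # 1.0
```

With unit weights, edges between high-degree vertices come out strongly negative, which is the intuition behind using curvature to flag hub-like regions of a network.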

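The normalization described for preprocess_expression.m (item 3) can be sketched as follows. This is an illustrative Python sketch, not the repository's MATLAB code: the function names, the pseudocount of 1, and the handling of ties (broken by row order rather than averaged) are all assumptions, and the source does not state the order in which the two steps are applied.

```python
from math import log2

def log2_transform(matrix, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every entry (pseudocount is an assumption)."""
    return [[log2(x + pseudocount) for x in row] for row in matrix]

def quantile_normalize(matrix):
    """Quantile-normalize a genes (rows) x samples (columns) matrix.

    Every sample is forced onto the same distribution: the k-th smallest
    value in each column is replaced by the mean of the k-th smallest
    values across all columns. Ties are broken by row order (simplification).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: mean across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(sc[k] for sc in sorted_cols) / n_cols
                  for k in range(n_rows)]

    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Row indices ordered by their value in this sample.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[i][j] = rank_means[rank]
    return out

expr = [[1.0, 4.0],
        [3.0, 2.0]]
print(quantile_normalize(expr))  # [[1.5, 3.5], [3.5, 1.5]]
```

After quantile normalization every sample (column) holds the same set of values, so differences between samples reflect only the rank of each gene within its sample.
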
Support

Questions about the implementation and reports of code bugs are welcome. Please use GitHub issues, or email the authors.
