minhyukpark / cm_pipeline

This project forked from illinois-or-research-analytics/cm_pipeline

Pipeline that uses an improved version of CM for generating well-connected clusters

License: GNU General Public License v3.0

Python 78.37% R 4.60% TeX 17.03%

CM++ Pipeline

DOI: 10.5281/zenodo.10076514

Customizable modular pipeline for testing an improved version of CM for generating well-connected clusters. The overview figure is taken from the arXiv preprint by Park et al. (2023). https://github.com/illinois-or-research-analytics/cm_pipeline/tree/main

[Figure: cm_pipeline overview]

Quick Start Guide

Features

The CM Pipeline is a modular pipeline for community detection that contains the following modules:

  • Cluster Statistics: Compute statistics such as node and edge count,
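The per-cluster node and edge counts mentioned under Cluster Statistics can be computed in a few lines of Python. This is a minimal sketch, not the module's own code. It assumes that network.tsv holds one undirected edge per line and that clustering.tsv holds one "node cluster_id" pair per line, both whitespace-separated. The function names read_tsv_pairs and cluster_stats are hypothetical.

```python
from collections import defaultdict


def read_tsv_pairs(path):
    """Yield the first two whitespace-separated fields of each non-empty line."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                yield fields[0], fields[1]


def cluster_stats(edges, membership):
    """Return {cluster_id: (node_count, edge_count)}.

    edges: iterable of undirected (u, v) pairs.
    membership: dict mapping node -> cluster_id (assumed file layout, see above).
    An edge is counted only if both endpoints are in the same cluster;
    self-loops and repeated edges are ignored.
    """
    members = defaultdict(set)
    for node, cid in membership.items():
        members[cid].add(node)

    edge_counts = defaultdict(int)
    seen = set()
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        key = frozenset((u, v))  # order-independent edge identity
        if key in seen:
            continue  # skip duplicate edges
        seen.add(key)
        cu, cv = membership.get(u), membership.get(v)
        if cu is not None and cu == cv:
            edge_counts[cu] += 1

    return {cid: (len(nodes), edge_counts[cid]) for cid, nodes in members.items()}
```

For example, cluster_stats(read_tsv_pairs("network.tsv"), dict(read_tsv_pairs("clustering.tsv"))) would give the counts for every cluster listed in the clustering file. Nodes that are absent from the clustering are ignored.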

Requirements

  • macOS or Linux operating system
  • Python 3.9 or higher
  • CMake 3.2.0 or higher
  • gcc of any version (gcc 9.2.0 was used in our analysis)

Installation and Setup

There are two ways to install the pipeline.

Installation via Cloning

  • Clone the cm_pipeline repository
  • Activate the virtual environment (venv) into which the necessary packages will be installed
  • Run pip install -r requirements.txt && pip install .
  • Verify the installation by running cd tests && pytest

Installation via pip install

Run pip install git+https://github.com/illinois-or-research-analytics/cm_pipeline. This installs CM++ only; to use the pipeline functionality, set up via cloning instead.

Input and Usage

Example Commands

CM++

  • python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c leiden -g 0.5 --threshold 1log10 --nprocs 4 --quiet
    • Runs CM++ on a Leiden clustering with resolution 0.5, using the connectivity threshold $\log_{10}(n)$: every cluster whose connectivity exceeds the base-10 logarithm of its number of nodes $n$ is considered "well-connected".
  • python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c ikc -k 10 --threshold 1log10 --nprocs 4 --quiet
    • The same, but for an IKC clustering with hyperparameter $k=10$.
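The well-connectedness test behind --threshold 1log10 can be sketched in plain Python. This is an illustration under stated assumptions, not CM++'s implementation: it takes "connectivity" to mean edge connectivity (the size of a minimum edge cut of the cluster), computes it with unit-capacity max-flow, and reads the spec 1log10 as 1 * log10(n). The names parse_threshold, edge_connectivity and is_well_connected are hypothetical.

```python
import math
import re
from collections import defaultdict, deque


def parse_threshold(spec, n):
    """Hypothetical reading of specs like '1log10' as multiplier * log10(n)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)log10", spec)
    if match is None:
        raise ValueError(f"unsupported threshold spec: {spec!r}")
    return float(match.group(1)) * math.log10(n)


def adj_from_edges(edges):
    """Build a symmetric adjacency dict (node -> set of neighbours), dropping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def max_flow_unit(adj, s, t):
    """Edmonds-Karp max flow from s to t, every undirected edge having capacity 1."""
    cap = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            cap[(u, v)] = 1  # one unit in each direction for an undirected edge
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit along the path and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def edge_connectivity(adj):
    """Minimum edge cut size: min over t != s of maxflow(s, t) for a fixed node s."""
    nodes = list(adj)
    if len(nodes) < 2:
        return 0
    s = nodes[0]
    return min(max_flow_unit(adj, s, t) for t in nodes[1:])


def is_well_connected(adj, spec="1log10"):
    """True if the cluster's edge connectivity exceeds the threshold for its size."""
    return edge_connectivity(adj) > parse_threshold(spec, len(adj))
```

Fixing one node s and taking the minimum max-flow to every other node is enough because any minimum edge cut separates s from at least one node. For example, a 20-node cycle has edge connectivity 2, which exceeds log10(20) ≈ 1.30, while a 20-node path (connectivity 1) does not. This version favours clarity over speed; large clusters would call for a dedicated min-cut algorithm.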

CM Pipeline

  • Suppose you have a pipeline configuration like the one here; call it pipeline.json.
  • Then from the root of this repository run:
    • python -m main pipeline.json

CM++ Usage

For usage instructions on CM++, see the following documentation.

Pipeline Usage

  • The input to the pipeline script is a pipeline.json file. Any other JSON file can be used as input, as long as it meets the requirements in the documentation.
  • A description of the supported key-value pairs in the config file can be found in pipeline_template.json.
  • Edit the fields of the pipeline.json file to reflect your inputs and requirements.
  • Run python -m main pipeline.json
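Before launching a run, a config can be sanity-checked from Python and then passed to the documented command. This is a hypothetical helper sketch: it only confirms that the file is a JSON object and that caller-chosen top-level keys are present. The real key set is described in pipeline_template.json and is not reproduced here. The function names load_pipeline_config, missing_keys and run_pipeline are assumptions.

```python
import json
import subprocess
import sys


def load_pipeline_config(path):
    """Read a pipeline JSON file and ensure its top level is a JSON object."""
    with open(path) as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"{path}: top level must be a JSON object")
    return config


def missing_keys(config, required):
    """Return the required top-level keys absent from config (caller-defined list)."""
    return [key for key in required if key not in config]


def run_pipeline(config_path, repo_root="."):
    """Run `python -m main <config_path>` from the repository root.

    A relative config_path is resolved against repo_root, because that is
    the working directory of the child process.
    """
    subprocess.run([sys.executable, "-m", "main", config_path], cwd=repo_root, check=True)
```

A typical sequence would be to check that missing_keys(load_pipeline_config("pipeline.json"), [...]) returns an empty list, then call run_pipeline("pipeline.json", repo_root=...). Any key names you pass are placeholders to be taken from pipeline_template.json.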

JSON Input Documentation

For Developers

Loading a Developer Environment

To quickly set up a developer environment for the CM++ Pipeline, run the following commands. (Make sure Conda is installed first.)

conda env create -f environment.yml
conda activate 

Customizing the Pipeline

  • The CM++ Pipeline also allows users to add their own pipeline stages and clustering methods.
  • Refer to the customization documentation for how to modify the code to support your own pipeline stages and clustering methods.

Output Files

  • The commands executed during the workflow are captured in {output_dir}/{run_name}-{timestamp}/commands.sh. This is the shell script generated by the pipeline that is run to generate outputs.
  • The output files generated during the workflow are stored in the folder {output_dir}/{run_name}-{timestamp}/
  • The descriptive analysis files can be found in the folder {output_dir}/{run_name}-{timestamp}/analysis, with one *.csv file per resolution value.
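The output layout above can be navigated from Python with pathlib. This is a minimal sketch that treats the timestamp as an opaque string, since its exact format is not specified here. The helpers run_directory and describe_run are hypothetical and not part of the pipeline.

```python
from pathlib import Path


def run_directory(output_dir, run_name, timestamp):
    """Return {output_dir}/{run_name}-{timestamp}; the timestamp is used verbatim."""
    return Path(output_dir) / f"{run_name}-{timestamp}"


def describe_run(run_dir):
    """Locate commands.sh and the per-resolution analysis CSVs inside a run directory."""
    run_dir = Path(run_dir)
    analysis_dir = run_dir / "analysis"
    # Only glob if the analysis folder exists (e.g. the run may still be in progress).
    csvs = sorted(analysis_dir.glob("*.csv")) if analysis_dir.is_dir() else []
    return {"commands": run_dir / "commands.sh", "analysis_csvs": csvs}
```

For a finished run, describe_run(...)["analysis_csvs"] should list one file per resolution value, and the "commands" path points at the generated shell script that can be re-run or inspected.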

Archive

Citations

@misc{cm_pipe2023,
    author = {Vikram Ramavarapu and Vidya Kamath and Minhyuk Park and Fabio Ayres and George Chacko},
    title = {Connectivity Modifier Pipeline},
    howpublished = {\url{https://github.com/illinois-or-research-analytics/cm_pipeline}},
    year={2023},
    doi={10.5281/zenodo.10076514}
}

@misc{park2023wellconnected,
    title={Well-Connected Communities in Real-World and Synthetic Networks}, 
    author={Minhyuk Park and Yasamin Tabatabaee and Vikram Ramavarapu and Baqiao Liu and Vidya Kamath Pailodi and Rajiv Ramachandran and Dmitriy Korobskiy and Fabio Ayres and George Chacko and Tandy Warnow},
    year={2023},
    eprint={2303.02813},
    archivePrefix={arXiv},
    primaryClass={cs.SI}
}

Contributors

vikramr2, vidyakamath, vidyak2uiuc, fabioayresinsper, chackoge, minhyukpark
