CM++ Pipeline

Customizable modular pipeline for testing an improved version of CM for generating well-connected clusters. Image below from arXiv preprint: Park et al. (2023). https://github.com/illinois-or-research-analytics/cm_pipeline/tree/main

Quick Start Guide

Features

The CM Pipeline is a modular pipeline for community detection that contains the following modules:

Cluster Statistics: Compute statistics such as node and edge count,

Requirements

macOS or Linux operating system
Python 3.9 or higher
CMake 3.2.0 or higher
gcc of any version (gcc 9.2.0 was used in our analysis)

Installation and Setup

There are several installation strategies.

Installation via Cloning

Clone the cm_pipeline repository
Activate the venv which has the necessary packages
Run pip install -r requirements.txt && pip install .
Make sure everything installed properly by running cd tests && pytest

Installation via pip install

Simply run pip install git+https://github.com/illinois-or-research-analytics/cm_pipeline. This will install CM++, but to use the pipeline functionality, please set it up via cloning.

Input and Usage

Example Commands

CM++

python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c leiden -g 0.5 --threshold 1log10 --nprocs 4 --quiet
- Runs CM++ on a Leiden clustering with resolution 0.5, using the connectivity threshold $\log_{10}(n)$ (every cluster whose connectivity is over the base-10 logarithm of the number of nodes is considered "well-connected").
python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c ikc -k 10 --threshold 1log10 --nprocs 4 --quiet
- Same idea, but on an IKC clustering with hyperparameter $k=10$.
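
To make the threshold notation concrete, here is a minimal Python sketch of the check described above. It is not taken from the CM++ source: the names log10_threshold and is_well_connected are hypothetical, the leading 1 in 1log10 is read as a multiplier on $\log_{10}(n)$, $n$ is assumed to be the number of nodes in the cluster being tested, and "connectivity" is assumed to be an integer such as the size of a minimum edge cut.

```python
import math


def log10_threshold(n_nodes: int, coefficient: float = 1.0) -> float:
    """Return coefficient * log10(n_nodes).

    Hypothetical helper for illustration; not part of the CM++ API.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be a positive integer")
    return coefficient * math.log10(n_nodes)


def is_well_connected(connectivity: int, n_nodes: int) -> bool:
    """True when connectivity is strictly above log10(n_nodes).

    Assumption: "over" in the description means strictly greater than.
    """
    return connectivity > log10_threshold(n_nodes)


# For a 1000-node cluster the threshold is log10(1000), i.e. about 3.
print(is_well_connected(4, 1000))
print(is_well_connected(2, 1000))
```

Under these assumptions, a 1000-node cluster needs a connectivity of more than 3 to count as well-connected, so the first call passes the check and the second does not.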

CM Pipeline

Suppose you have a pipeline like the one here. Call it pipeline.json.
Then, from the root of this repository, run:
- python -m main pipeline.json

CM++ Usage

For CM++ usage instructions, see the following documentation.

Pipeline Usage

The input to the pipeline script is a pipeline.json file. Note that any other JSON file can be used as input, as long as it meets the requirements in the documentation.
A description of the supported key-value pairs in the config file can be found in pipeline_template.json.
Edit the fields of the pipeline.json file to reflect your inputs and requirements.
Run python -m main pipeline.json
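
The steps above can also be driven from a script. The following is a minimal sketch, not part of the repository: build_pipeline_command and run_pipeline are hypothetical names, and the only check performed is that the config file parses as JSON (it does not validate any pipeline-specific keys). It assumes the working directory is the root of the cloned repository, as the instructions require.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_pipeline_command(config_path: str) -> list[str]:
    """Check that the config parses as JSON, then build the run command.

    Hypothetical helper; it mirrors `python -m main pipeline.json`.
    """
    path = Path(config_path)
    with path.open() as handle:
        json.load(handle)  # raises json.JSONDecodeError on malformed JSON
    return [sys.executable, "-m", "main", str(path)]


def run_pipeline(config_path: str) -> None:
    """Run the pipeline; must be called from the repository root."""
    subprocess.run(build_pipeline_command(config_path), check=True)
```

Calling run_pipeline("pipeline.json") fails early on a malformed config instead of partway through a run, which is the only reason to prefer it over typing the command directly.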

JSON Input Documentation

Please refer to the JSON format documentation for how to write the pipeline.json file.

For Developers

Loading a Developer Environment

To quickly set up a developer environment for the CM++ Pipeline, run the following commands (make sure you have Conda installed first).

conda env create -f environment.yml
conda activate

Customizing the Pipeline

The CM++ Pipeline also allows users to add their own pipeline stages and clustering methods.
Please refer to the customization documentation on how to modify the code to support your own pipeline stages and clustering methods.

Output Files

The commands executed during the workflow are captured in {output_dir}/{run_name}-{timestamp}/commands.sh. This is the shell script that the pipeline generates and runs to produce the outputs.
The output files generated during the workflow are stored in the folder {output_dir}/{run_name}-{timestamp}/.
The descriptive analysis files can be found in the folder {output_dir}/{run_name}-{timestamp}/analysis, with one *.csv file for each resolution value.
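
The directory layout above can be walked with a few lines of Python. This is a minimal sketch under stated assumptions: find_run_dirs and describe_run are hypothetical helpers, not part of the pipeline, and because the timestamp format is not specified here, run directories are matched with the glob {run_name}-* and sorted by name, which is not guaranteed to be chronological.

```python
from pathlib import Path


def find_run_dirs(output_dir: str, run_name: str) -> list[Path]:
    """Return directories named {run_name}-{timestamp}, sorted by name."""
    candidates = Path(output_dir).glob(f"{run_name}-*")
    return sorted(p for p in candidates if p.is_dir())


def describe_run(run_dir: Path) -> dict:
    """Collect the paths described above for one run directory."""
    return {
        # Shell script generated and run by the pipeline.
        "commands": run_dir / "commands.sh",
        # Descriptive analysis files, one CSV per resolution value.
        "analysis_csvs": sorted((run_dir / "analysis").glob("*.csv")),
    }
```

For example, describe_run(find_run_dirs("out", "my_run")[0]) returns the commands.sh path and the analysis CSV files for the first matching run, where "out" and "my_run" stand in for whatever output_dir and run_name the config specifies.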

Citations

@misc{cm_pipe2023,
    author = {Vikram Ramavarapu and Vidya Kamath and Minhyuk Park and Fabio Ayres and George Chacko},
    title = {Connectivity Modifier Pipeline},
    howpublished = {\url{https://github.com/illinois-or-research-analytics/cm_pipeline}},
    year={2023},
    doi={10.5281/zenodo.10076514}
}

@misc{park2023wellconnected,
    title={Well-Connected Communities in Real-World and Synthetic Networks}, 
    author={Minhyuk Park and Yasamin Tabatabaee and Vikram Ramavarapu and Baqiao Liu and Vidya Kamath Pailodi and Rajiv Ramachandran and Dmitriy Korobskiy and Fabio Ayres and George Chacko and Tandy Warnow},
    year={2023},
    eprint={2303.02813},
    archivePrefix={arXiv},
    primaryClass={cs.SI}
}
