yonsei-tgil / clement

Genomic decomposition and reconstruction of non-tumor diploid subclones

License: MIT License

Python 99.98% Shell 0.02%
clonality decomposition-algorithm

CLEMENT

  • Genomic decomposition and reconstruction of non-tumor diploid subclones (2022)
  • CLonal decomposition via Expectation-Maximization algorithm established in Non-Tumor setting
  • Supports multiple diploid samples
  • Biallelic variants (homozygous, 1/1) can degrade the performance of CLEMENT.

Overview of CLEMENT workflow and core algorithms


Figure1 overview

Installation

Dependencies

  • python 3.6.x
  • matplotlib 3.5.2
  • seaborn 0.11.2
  • numpy 1.21.5
  • pandas 1.3.4
  • scikit-learn 1.0.2
  • scipy 1.7.3
  • palettable 3.3.0

Install from GitHub

  1. git clone https://github.com/Yonsei-TGIL/CLEMENT.git
    cd CLEMENT
    pip3 install .

  2. pip3 install git+https://github.com/Yonsei-TGIL/CLEMENT.git

Install from PyPI

  1. pip3 install CLEMENTDNA

Version update

1.0.4 (June 14th, 2023)

Input format

As of version 1.0.4, CLEMENT supports only standardized TSV input. Example input files are provided in the "example" directory.

  • 1st column: mutation ID (CHR_POS is recommended)
  • 2nd column: label (answer), if known. If the label is unknown, set it to 0.
  • 3rd column: Depth1,Alt1,Depth2,Alt2,...,Depth_n,Alt_n (comma-separated, no spaces permitted)
  • 4th column: BQ1,BQ2,...,BQ_n (comma-separated, no spaces permitted). If absent, CLEMENT sets the default BQ to 20.
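
The column layout above can be illustrated with a small parser. This is a minimal sketch, not CLEMENT's own input reader: the names `Variant`, `parse_row`, and `DEFAULT_BQ` are hypothetical, and the sample row is made up for illustration. It assumes tab-separated columns and applies the default BQ of 20 when the 4th column is missing.

```python
from typing import List, NamedTuple

DEFAULT_BQ = 20  # default BQ stated above for rows without a 4th column


class Variant(NamedTuple):
    mutation_id: str
    label: str
    depths: List[int]
    alts: List[int]
    bqs: List[int]


def parse_row(line: str) -> Variant:
    """Parse one input row (hypothetical helper, not part of CLEMENT)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least 3 tab-separated columns")
    mutation_id, label, counts = fields[0], fields[1], fields[2]
    if " " in counts:
        raise ValueError("column 3 must not contain spaces")
    values = [int(v) for v in counts.split(",")]
    if len(values) % 2 != 0:
        raise ValueError("column 3 must hold Depth,Alt pairs")
    # Even positions are depths, odd positions are alt counts, one pair per sample.
    depths, alts = values[0::2], values[1::2]
    if len(fields) > 3 and fields[3]:
        bqs = [int(v) for v in fields[3].split(",")]
    else:
        bqs = [DEFAULT_BQ] * len(depths)
    return Variant(mutation_id, label, depths, alts, bqs)


# Made-up two-sample row: unknown label (0), no BQ column.
row = parse_row("chr1_12345\t0\t100,48,120,0")
print(row.depths, row.alts, row.bqs)  # [100, 120] [48, 0] [20, 20]
```

Running a check like this over a file before calling CLEMENT can catch stray spaces or unpaired Depth/Alt values early.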

Running

command line interface

CLEMENT [OPTIONS]   

options

(Mandatory) These options concern the user's input and output
	--INPUT_TSV		Input file in TSV format. The tool automatically detects the number of samples
	--CLEMENT_DIR 		Directory where the outputs of CLEMENT are saved

These options control whether the user's input is downsized
	--RANDOM_PICK 		Set this to the size the user wants to downsize the input to. To disable downsizing, set -1. (default : -1).

These options adjust the E-M algorithm parameters
	--NUM_CLONE_TRIAL_START 	Minimum number of expected clusters (initiation of K) 	(default: 3)
	--NUM_CLONE_TRIAL_END 		Maximum number of expected clusters (termination of K)	 (default: 5)
	--TRIAL_NO 			Number of trials for each candidate cluster number. More than 15 is NOT recommended (default: 5)
	--KMEANS_CLUSTERNO		Number of initial K-means clusters. Recommendation : 5-8 for one sample, 8-15 for more samples (default: 8)
	--MIN_CLUSTER_SIZE		The minimum acceptable cluster size. Recommendation : 1-3% of the total number of variants 	(default: 9)

Other options
	--MODE			Selection of clustering method.
				"Hard": hard clustering only,  "Both": both hard and soft (fuzzy) clustering (default: "Both")
	--MAKEONE_STRICT  	1: strict, 2: lenient (default : 1)
	--TN_CONFIDENTIALITY  	Confidence that a negative is a true negative (TN). Recommendation : > 0.99. (default : 0.995)

Miscellaneous
	--FONT_FAMILY		Font family displayed in the plots (default : "arial")
	--VERBOSE			0: no record,  1: simplified record,  2: verbose record (default: 2)
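
For scripted runs, the options above can be assembled into a command line programmatically. The following is a minimal sketch: `build_clement_command` and `suggest_min_cluster_size` are hypothetical helpers, not part of CLEMENT, and the 2% fraction is only an assumed midpoint of the 1-3% recommendation for --MIN_CLUSTER_SIZE. Only flags listed above are used, and the paths are placeholders.

```python
import shlex
from typing import List, Union


def build_clement_command(
    input_tsv: str, clement_dir: str, **options: Union[int, float, str]
) -> List[str]:
    """Assemble a CLEMENT argument list (hypothetical helper)."""
    cmd = ["CLEMENT", "--INPUT_TSV", input_tsv, "--CLEMENT_DIR", clement_dir]
    for name, value in options.items():
        # Keyword names map directly onto the upper-case flags listed above.
        cmd += ["--" + name.upper(), str(value)]
    return cmd


def suggest_min_cluster_size(n_variants: int, fraction: float = 0.02) -> int:
    """Pick a cluster size inside the recommended 1-3% band (assumed 2%)."""
    return max(1, round(n_variants * fraction))


cmd = build_clement_command(
    "input.txt",   # placeholder path
    "out_dir",     # placeholder path
    num_clone_trial_start=2,
    num_clone_trial_end=6,
    min_cluster_size=suggest_min_cluster_size(500),
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once CLEMENT is installed and on the PATH.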

output

${CLEMENT_DIR}"/result"

  • CLEMENT_decision: CLEMENT's best recommendation among hard and soft clustering.
  • CLEMENT_hard_1st: CLEMENT's best decomposition by hard clustering.
  • CLEMENT_hard.gapstatistics.txt: Selection of the optimal K in hard clustering based on gap* statistics.
  • CLEMENT_soft_1st: CLEMENT's best decomposition by soft (fuzzy) clustering.
  • membership.txt: Membership assignment of all variants to each cluster.
  • membership_count.txt: Count matrix of the membership assignment to each cluster.
  • mixture.txt: Centroid of each cluster.
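
After a run, it can be handy to confirm which of the listed outputs are present. The sketch below is a hypothetical helper, not part of CLEMENT. It assumes the names above appear verbatim directly under the "result" directory; in practice some entries may carry a file extension or be directories, so adjust the list to match what a real run produces.

```python
from pathlib import Path
from typing import Dict

# Names copied from the output list above; exact on-disk names are an assumption.
EXPECTED_OUTPUTS = [
    "CLEMENT_decision",
    "CLEMENT_hard_1st",
    "CLEMENT_hard.gapstatistics.txt",
    "CLEMENT_soft_1st",
    "membership.txt",
    "membership_count.txt",
    "mixture.txt",
]


def check_outputs(clement_dir: str) -> Dict[str, bool]:
    """Report which expected entries exist under <clement_dir>/result."""
    result_dir = Path(clement_dir) / "result"
    return {name: (result_dir / name).exists() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    # "out_dir" is a placeholder for the value passed to --CLEMENT_DIR.
    for name, present in check_outputs("out_dir").items():
        print(("found   " if present else "missing ") + name)
```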

Example

DIR=[YOUR_DIRECTORY]

# Example 1
CLEMENT \
	--INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-5_M1-6/M1-5_M1-6_input.txt" \
	--CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-5_M1-6"  \
	--NUM_CLONE_TRIAL_START 2 \
	--NUM_CLONE_TRIAL_END 6 \
	--RANDOM_PICK 500

# Example 2
CLEMENT \
	--INPUT_TSV ${DIR}"/example/2.CellData/MRS_2D/M1-5_M1-7/M1-5_M1-7_input.txt" \
	--CLEMENT_DIR ${DIR}"/example/2.CellData/MRS_2D/M1-5_M1-7"  \
	--NUM_CLONE_TRIAL_START 2 \
	--NUM_CLONE_TRIAL_END 6
