
Sub-compartment Identifier (SCI)


Authors: Haitham Ashoor, Sheng Li
Contact: [email protected], [email protected]

Description

SCI is a program that identifies sub-compartments from HiC data. It uses graph embedding followed by K-means clustering to predict the sub-compartments.
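To illustrate only the clustering half of that pipeline, here is a minimal sketch of K-means in plain Python. It is not SCI's implementation: the graph embedding step is not shown, the `embeddings` list is a made-up stand-in for per-bin embedding vectors, and the `kmeans` helper and its first-k-points initialisation are assumptions chosen to keep the example deterministic.

```python
from __future__ import print_function


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means on a list of equal-length numeric tuples."""
    # Deterministic initialisation for this sketch: the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / float(len(members)) for dim in zip(*members)
                )
    return labels


# Hypothetical 2-D "embeddings" for four bins (toy values, not real data).
embeddings = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
labels = kmeans(embeddings, k=2)
print(labels)
```

Each bin ends up with an integer cluster index, which plays the role of a sub-compartment label in this toy setting.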

SCI workflow

Dependencies

  • python 2.7

Python Libraries

C++ libraries

Installation

$ python setup.py install

Input format

SCI accepts a bedpe-like format with the following fields:

  1. chr1: the chromosome name for the first interacting HiC bin
  2. start1: the starting coordinate for the first interacting HiC bin
  3. end1: the ending coordinate for the first interacting HiC bin
  4. chr2: the chromosome name for the second interacting HiC bin
  5. start2: the starting coordinate for the second interacting HiC bin
  6. end2: the ending coordinate for the second interacting HiC bin
  7. HiC count: the number of HiC reads for the interacting HiC bins. SCI does not perform HiC normalization; if you want to use normalized HiC data, the HiC count should correspond to the normalized HiC read count.
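As a rough illustration of the seven fields above, the following sketch parses one record into a named tuple. It assumes the fields are whitespace-separated (the delimiter is not stated here) and reads the count as a float so that normalized values fit. The `Interaction` type, the `parse_interaction` helper, and the sample record are all hypothetical and not part of SCI.

```python
from __future__ import print_function
from collections import namedtuple

# Field names follow the list above; "count" stands for the HiC count column.
Interaction = namedtuple(
    "Interaction", ["chr1", "start1", "end1", "chr2", "start2", "end2", "count"]
)


def parse_interaction(line):
    """Parse one whitespace-separated record with exactly seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got {}: {!r}".format(len(fields), line))
    chr1, start1, end1, chr2, start2, end2, count = fields
    return Interaction(
        chr1, int(start1), int(end1), chr2, int(start2), int(end2), float(count)
    )


# Made-up example record, not taken from real data.
record = parse_interaction("chr1\t0\t100000\tchr1\t200000\t300000\t12")
print(record)
```

A check like this can be run over a converted file before passing it to SCI to catch rows with missing or non-numeric columns.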

SCI provides a script, scripts/hic2sci.sh, that converts the .hic format into the format SCI accepts. To convert a .hic file into SCI input, follow these instructions:

Export the path of the installed juicer-tools into the JUICERTOOLS environment variable:

$ export JUICERTOOLS=/path/to/juicer-tools

Then run the hic2sci script to produce SCI-formatted input data:

$ scripts/hic2sci.sh <input .hic file> <output file> <resolution> 

The command to start the Docker container is:

docker run -it -p 8080:8080 -v <directory of the Rao_2014.hic data file>:/data yuz12012/sci_container:latest

Inside the container, run:

export LD_LIBRARY_PATH=/sci/gsl/lib
export CPPFLAGS="-I/usr/local/zlib/1.2.8-4/include"
export JUICERTOOLS=/sci/juicer_tools_1.22.01.jar
sh scripts/hic2sci.sh /data/Rao_2014.hic sci_input 100

bash
conda activate sci

If you use Singularity to load the Docker container, change the output directory to one with write permission.

Parameters description:

Parameter           Mandatory/Optional       Description
-n, --name          Mandatory                Name of the experiment; used as a prefix for all output files
-r, --resolution    Mandatory                Resolution at which to predict compartments; the bin size of the input data should have a resolution greater than or equal to the provided value
-g, --genome_size   Mandatory                File containing the chromosome sizes of the target genome
-o, --order         Optional (default: 1)    Graph order to consider when performing graph embedding. Available options are 1, 2, or both
-s, --samples       Optional (default: 25)   Number of edges to sample from the graph, in millions
-k, --clusters      Optional (default: 2)    Number of sub-compartments to predict
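The table above can be mirrored with `argparse` to show how the mandatory flags and defaults fit together. This is only an illustrative sketch, not SCI's actual argument parser: the value types (`int` for resolution, samples, and clusters) are assumptions, and `-f` is included only because it appears in the test-run command, with `input_file` as a hypothetical destination name.

```python
from __future__ import print_function
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the SCI options table")
    parser.add_argument("-n", "--name", required=True,
                        help="experiment name, used as output file prefix")
    parser.add_argument("-r", "--resolution", required=True, type=int,
                        help="resolution at which to predict compartments")
    parser.add_argument("-g", "--genome_size", required=True,
                        help="file with chromosome sizes of the target genome")
    parser.add_argument("-o", "--order", default="1", choices=["1", "2", "both"],
                        help="graph order used for graph embedding")
    parser.add_argument("-s", "--samples", type=int, default=25,
                        help="number of edges to sample, in millions")
    parser.add_argument("-k", "--clusters", type=int, default=2,
                        help="number of sub-compartments to predict")
    # Not in the table; taken from the test-run command. Dest name is hypothetical.
    parser.add_argument("-f", dest="input_file", help="input interaction file")
    return parser


# Only the mandatory options are given, so the optional ones fall back to defaults.
args = build_parser().parse_args(
    ["-n", "test", "-r", "100000", "-g", "hg19.chrom.sizes"]
)
print(args.order, args.samples, args.clusters)
```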

Output

SCI outputs sub-compartment annotations in BED format with the following fields:

  1. chr: chromosome of the sub-compartment annotation
  2. start: genomic location where the sub-compartment bin starts
  3. end: genomic location where the sub-compartment bin ends
  4. label: unique sub-compartment label. Bins that have no sub-compartment label due to low mappability are labeled NA.
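As a small sketch of working with this output, the snippet below counts how many bins carry each label. It assumes whitespace-separated columns in the order listed above; the `count_labels` helper and the example lines are hypothetical, and the label strings SCI actually emits may differ from the `1` and `2` used here.

```python
from __future__ import print_function
from collections import Counter


def count_labels(lines):
    """Count bins per label in BED-style lines (chr, start, end, label)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[fields[3]] += 1
    return counts


# Made-up annotation lines for illustration only.
example = [
    "chr1\t0\t100000\tNA",
    "chr1\t100000\t200000\t1",
    "chr1\t200000\t300000\t2",
    "chr1\t300000\t400000\t1",
]
counts = count_labels(example)
print(sorted(counts.items()))
```

To run it on a real result, pass an open file object instead of the `example` list. A large share of NA bins would suggest that much of the genome was skipped for low mappability.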

Test run

To perform a test run of SCI, follow these steps. The sample input is available at: ftp://ftp.jax.org/zhaoyu/demo_data.txt.zip

$ python -m sci.sci -n test -f /sci-data/demo_data.txt -r 100000 -g chromosome_sizes/hg19.chrom.sizes -o both -s 1 -k 5

