crane-nature-2015

Publisher: NPG; Journal: Nature; Article Type: Biology letter DOI: 10.1038/nature14450

Condensin-driven remodelling of X chromosome topology during dosage compensation

Emily Crane1, Qian Bian1, Rachel Patton McCord2, Bryan R. Lajoie2, Bayly S. Wheeler1, Edward J. Ralston1, Satoru Uzawa1, Job Dekker2 & Barbara J. Meyer1

Code associated with paper.

scripts/
    matrix2insulation.pl - Calculate insulation vector from matrix (tsv) file (matrix.gz)

Installation

Download the project.
```
wget -O crane-nature-2015.zip https://github.com/blajoie/crane-nature-2015/archive/master.zip
```
Or clone the git project
```
[ssh] - git clone git@github.com:blajoie/crane-nature-2015.git
[https] - git clone https://github.com/blajoie/crane-nature-2015.git
```

Unzip the master:
```
unzip crane-nature-2015.zip
cd crane-nature-2015/
```

To install the module:
```
perl Build.PL
./Build
./Build install
```

After installing the module, you should be able to run the matrix2insulation.pl script:
```
$ perl scripts/matrix2insulation.pl
```

Usage


See wiki for format spec.
https://github.com/blajoie/crane-nature-2015/wiki

$ perl scripts/matrix2insulation.pl

Tool:           matrix2insulation.pl
Version:        1.0.0
Summary:        calculate insulation index (TADs) of supplied matrix

Usage: perl matrix2insulation.pl [OPTIONS] -i <inputMatrix>

Required:

        -i         []         input matrix file

Options:

        -is        []         size (bp) of the insulation square

        -v         []         FLAG, verbose mode

        -ids       []         insulation delta span (size (bp) of insulation delta window)

        -im        []         insulation mode (how to aggregate signal within insulation square), mean,sum,median

        -nt        [0.1]      noise threshold, minimum depth of valley

        -bmoe      [3]        boundary margin of error (specified in number of BINS), added to each side of the boundary

Notes:
        This script calculates the insulation index of a given matrix to identify TAD boundaries.
        Matrix can be TXT or gzipped TXT.
        See git wiki for details.

        Code associated with Crane, Bian, McCord, Lajoie et al. Nature 2015
        Publisher: NPG; Journal: Nature; Article Type: Biology letter DOI: 10.1038/nature14450
        Condensin-driven remodelling of X chromosome topology during dosage compensation 
        Emily Crane, Qian Bian, Rachel Patton McCord, Bryan R. Lajoie, Bayly S. Wheeler, Edward J. Ralston, Satoru Uzawa, Job Dekker & Barbara J. Meyer

Contact:
        Dekker Lab
        Bryan R. Lajoie
        http://my5C.umassmed.edu
        [email protected]
        https://github.com/blajoie/crane-nature-2015

Published Parameters

To re-create chrX data from paper (same options for autosomes):

    perl scripts/matrix2insulation.pl -i test/input/SRy93-DpnII__10kb__chrX.matrix.gz -is 500000 -ids 200000 -im mean -bmoe 3 -nt 0.1 -v
    perl scripts/matrix2insulation.pl -i test/input/N2-DpnII__10kb__chrX.matrix.gz -is 500000 -ids 200000 -im mean -bmoe 3 -nt 0.1 -v
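
For orientation, the sketch below illustrates the general idea of an insulation score as the paper describes it: aggregate the contacts inside a square that slides along the matrix diagonal, then express each bin as a log2 ratio to the chromosome-wide mean. It is written in Python for illustration only and is not the code in matrix2insulation.pl. The function name, the exact placement of the square and the handling of edge bins are all assumptions. The square size is given in bins, so `-is 500000` on a 10 kb matrix would correspond to roughly 50 bins.

```python
import math
from statistics import mean


def insulation_scores(matrix, square_bins):
    """Illustrative insulation profile for a square, symmetric contact matrix.

    Assumption: for bin i the square covers rows i-square_bins..i-1 and
    columns i+1..i+square_bins. The real script may place it differently.
    """
    n = len(matrix)
    raw = []
    for i in range(n):
        if i < square_bins or i + square_bins >= n:
            raw.append(None)  # the square would run off the matrix edge
            continue
        cells = [matrix[r][c]
                 for r in range(i - square_bins, i)
                 for c in range(i + 1, i + square_bins + 1)]
        raw.append(mean(cells))  # corresponds to "-im mean"

    valid = [v for v in raw if v is not None and v > 0]
    if not valid:
        return [None] * n
    baseline = mean(valid)
    # log2 ratio to the mean: valleys (strong insulation) become negative
    return [math.log2(v / baseline) if v is not None and v > 0 else None
            for v in raw]
```

On a toy matrix with two dense blocks, the lowest scores fall on the bins where the blocks meet, which is the signal the boundary caller looks for.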

Bugs and Feedback

For bugs, questions and discussions, please use GitHub Issues.

LICENSE

Licensed under the Apache License, Version 2.0 (the 'License'); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an 'AS IS' BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


crane-nature-2015's Issues

how to create the input matrix?

Sorry, I converted the HiC-Pro output matrix to a dense matrix and added row names and column names to it, but I always get this error:
Illegal division by zero at scripts/matrix2insulation.pl line 561.
I do not know what $headerSpacing is. How should I create the input file?
Thanks!

an error in the usage

The usage message displays:
Options:

    -b         []         size (bp) of the insulation square

However, after testing and checking the code, I found that '-b' is wrong; the correct option is '-is'.

How to get TADs with a matrix from HiC-Pro?

Hi, I want to analyse an ICE-normalized matrix from HiC-Pro with the script matrix2insulation.pl. Is this possible? I ask because I got the error: ERROR: Must supply headered matrix.

Input matrix header / rownames

Supplying a raw matrix of numbers throws ERROR: Must supply headered matrix!

Does this mean inputs must have headers in the format of your example inputs (something like bin#|genome|chr:bp-bp)? Would be helpful to document either way (apologies my perl is too rusty to figure this out from the source).
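
As a starting point for the header question, here is a minimal Python sketch that adds row and column headers to a dense numeric matrix. The header pattern `bin#|genome|chr:start-end` and the 1-based coordinates are inferred from the example outputs quoted in these issues (for instance `bin6000201|ce10|chrX:2010001-2020001` for bin 201 of a 10 kb matrix). The function names, the `offset` parameter and the empty top-left cell are assumptions, so check the wiki format spec before relying on it.

```python
def make_header(bin_index, genome, chrom, bin_size, offset=0):
    """Build one header such as bin6000201|ce10|chrX:2010001-2020001.

    Assumption: start = bin_index * bin_size + 1 and end = start + bin_size,
    which reproduces the coordinates seen in the example files.
    """
    start = bin_index * bin_size + 1
    end = start + bin_size
    return f"bin{offset + bin_index}|{genome}|{chrom}:{start}-{end}"


def add_headers(rows, genome, chrom, bin_size, offset=0):
    """Return tab-separated text with a header row and a header column."""
    headers = [make_header(i, genome, chrom, bin_size, offset)
               for i in range(len(rows))]
    # Assumption: the top-left corner cell is left empty.
    lines = ["\t".join([""] + headers)]
    for header, row in zip(headers, rows):
        lines.append("\t".join([header] + [str(value) for value in row]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    dense = [[5, 1], [1, 7]]  # toy 2 x 2 matrix
    print(add_headers(dense, "ce10", "chrX", 10000), end="")
```

The result can be written to a plain or gzipped text file, since the usage notes say both are accepted.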

overlap between different boundaries

Hi,

I used a 40 kb matrix to get the TAD boundaries, but I found overlaps between different boundaries. Part of the results looks like this:
chr1 17200000 17480000
chr1 17440000 17720000
chr1 17560000 17840000

I don't think this is right. Could you please tell me whether it's reasonable?
Thank you so much.
Best wishes.

min
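
One possible reading, assuming the reported intervals already include the `-bmoe` margin (3 bins on each side by default, per the usage text): each interval above is 280,000 bp wide, which is 7 bins of 40 kb, or one boundary bin plus 3 bins of padding per side. Under that assumption the padded intervals overlap whenever two boundary bins lie fewer than 7 bins apart, while the boundary bins themselves stay distinct. The Python sketch below strips the margin again; `strip_margin` is a hypothetical helper, not part of the script.

```python
def strip_margin(start, end, bin_size, bmoe=3):
    """Remove bmoe bins of padding from each side of a reported boundary.

    Assumption: the reported interval is the boundary bin widened by
    bmoe bins on both sides, as the -bmoe option text suggests.
    """
    core_start = start + bmoe * bin_size
    core_end = end - bmoe * bin_size
    if core_start >= core_end:
        raise ValueError("interval is too narrow for this margin")
    return core_start, core_end


reported = [
    (17200000, 17480000),
    (17440000, 17720000),
    (17560000, 17840000),
]
cores = [strip_margin(s, e, bin_size=40000) for s, e in reported]
for (s, e), (cs, ce) in zip(reported, cores):
    print(f"chr1 {s}-{e} -> boundary bin {cs}-{ce}")
```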

Some boundaries are local maxima of the insulation score profile instead of local minima

Hello,
I tried your method on IMR90 (Jin 2013) at 40 kb. According to your paper, once the insulation score profile has been computed, boundaries are detected at its local minima. Therefore, I used R to select all the bins where the delta value goes from positive to negative (i.e., a decreasing slope on the left and an increasing slope on the right). However, when I compared my boundary list with the one produced by your scripts, I found that some local maxima are included as boundaries. For example:

In *.insulation.boundaries:
header start end binStart binEnd binMidpoint header insulationScore
boundary.28|h19|chr1:13320001-13360001 13320001 13360001 333 334 333.5 rep1|h19|chr1:13320001-13360001 9.41871899515255

However, in *.insulation:
header start end midpoint binStart binEnd binMidpoint insulationScore delta deltaSquare
rep1|h19|chr1:13200001-13240001 13200001 13240001 13220001 330 331 330.5 -12.0651923157993 -3.25140591288299 -1
rep1|h19|chr1:13240001-13280001 13240001 13280001 13260001 331 332 331.5 -12.0651923157993 -3.25140591288299 -1
rep1|h19|chr1:13280001-13320001 13280001 13320001 13300001 332 333 332.5 -5.44620861391896 -1.59665998741291 -1
rep1|h19|chr1:13320001-13360001 13320001 13360001 13340001 333 334 333.5 -5.67855236614767 1.65474592547009 1
rep1|h19|chr1:13360001-13400001 13360001 13400001 13380001 334 335 334.5 -12.0651923157993 3.25140591288299 1
rep1|h19|chr1:13400001-13440001 13400001 13440001 13420001 335 336 335.5 -12.0651923157993 3.25140591288299 1
rep1|h19|chr1:13440001-13480001 13440001 13480001 13460001 336 337 336.5 -12.0651923157993 3.25140591288299 1

By looking at the insulationScore in *.insulation file, rep1|h19|chr1:13320001-13360001 is the local maximum. There are a few more examples like this. Therefore, I feel there must be something wrong in the scripts or I missed something. Hope it can draw your attention.

Thanks,
Ye Zheng
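
To make the check in this report reproducible, the Python sketch below tests whether a bin is a local minimum of the insulation scores within a small window. It uses the seven insulationScore values quoted above and is an independent check, not the boundary-calling logic of matrix2insulation.pl. The function name and the `flank` parameter are assumptions.

```python
def is_local_minimum(scores, i, flank=1):
    """True if scores[i] is not higher than any neighbour within flank bins."""
    lo = max(0, i - flank)
    hi = min(len(scores), i + flank + 1)
    neighbours = [scores[j] for j in range(lo, hi) if j != i]
    return bool(neighbours) and scores[i] <= min(neighbours)


# insulationScore column for bins 330..336, copied from the issue text
scores = [
    -12.0651923157993,
    -12.0651923157993,
    -5.44620861391896,
    -5.67855236614767,   # bin 333, reported as boundary.28
    -12.0651923157993,
    -12.0651923157993,
    -12.0651923157993,
]

# Bin 333 is index 3 in this excerpt; it is not a minimum in its window.
print(is_local_minimum(scores, 3, flank=2))
```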

how to use insulation score to identify TAD domains

Hi,

I have recently become interested in your "insulation score" method, and I tried to use matrix2insulation.pl to identify TAD domains. I am curious about the result of this Perl script, because its final output is a summary of boundary locations. If I want to call TAD domains, do I simply divide the genome into parts according to these boundaries? For example, if the boundary is chr1:4000000-4500000, are the TAD domains the two parts on either side (chr1:0-4000000 and chr1:4500000-...)?

Could you give me some advice?

Thank you so much!

Best,
Garen
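
The scheme described in this question, taking the regions between consecutive boundaries as domains, can be sketched in Python as below. This only illustrates that interpretation; whether it matches how domains were defined in the paper is not confirmed here. The function name and the chromosome end used in the example are hypothetical.

```python
def boundaries_to_domains(boundaries, chrom_start, chrom_end):
    """Return the intervals between boundaries as candidate domains.

    boundaries: iterable of (start, end) pairs on one chromosome.
    Overlapping or touching boundaries are treated as one merged gap.
    """
    domains = []
    cursor = chrom_start
    for b_start, b_end in sorted(boundaries):
        if b_start > cursor:
            domains.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < chrom_end:
        domains.append((cursor, chrom_end))
    return domains


# Example from the question, with a made-up chromosome end of 10 Mb
print(boundaries_to_domains([(4000000, 4500000)], 0, 10000000))
```

Domains found this way may still need filtering, for example by size, before they are interpreted as TADs.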

matrix interactions too large - cannot handle in memory

Hi - first of all, thanks for making your code available!

I'd like to use it on data from mouse, but am coming across issues with the larger chromosomes at high resolution, getting errors such as:

ERROR: matrix interactions too large - cannot handle in memory [19720 x 19720] (388,878,400 > 256,000,000 limit)

The actual memory usage of the code when running is low as far as I can tell, and I'm running it on a server with 512GB RAM, so I'm wondering if this matrix size limit can be adjusted? What would you recommend for using this code with mouse or human data?
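
The numbers in the error message can be worked through directly: 19720 x 19720 = 388,878,400 cells, and the stated limit of 256,000,000 cells corresponds to a 16,000 x 16,000 matrix. The Python sketch below estimates the smallest bin size that keeps a chromosome under that limit. It assumes the limit applies to the total cell count and that a matrix exactly at the limit is accepted. The chromosome length used is simply 19720 bins of 10 kb, taken as an illustration.

```python
import math

CELL_LIMIT = 256_000_000  # limit quoted in the error message


def n_bins(chrom_length, bin_size):
    """Number of bins needed to cover a chromosome."""
    return math.ceil(chrom_length / bin_size)


def fits(chrom_length, bin_size, limit=CELL_LIMIT):
    """True if the square matrix stays within the cell limit (assumed inclusive)."""
    return n_bins(chrom_length, bin_size) ** 2 <= limit


def min_bin_size(chrom_length, limit=CELL_LIMIT):
    """Smallest bin size (bp) whose matrix stays within the limit."""
    max_bins = math.isqrt(limit)  # 16000 for the default limit
    return math.ceil(chrom_length / max_bins)


length = 19720 * 10000  # illustrative length matching the 19720-bin matrix
print(fits(length, 10000), min_bin_size(length))
```

Whether the limit itself can be raised safely is a question about the script's internals that this arithmetic does not answer.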

different insulation scores between *.insulation file and *.insulation.boundaries file

This is the output for the test data in your package.

The output of N2-DpnII__10kb__chrX.is500001.ids200001.insulation:
bin6000201|ce10|chrX:2010001-2020001 2010001 2020001 2015001 201 202 201.5 -0.237082756600542 -0.00117094427035152 -1

The output of N2-DpnII__10kb__chrX.is500001.ids200001.insulation.boundaries:
boundary.3|ce10|chrX:2010001-2020001 2010001 2020001 201 202 201.5 bin6000201|ce10|chrX:2010001-2020001 0.484256934770599

I would like to know why the insulation score differs between the two files.

Thank you very much!
Yusen
