mutarna's Introduction

MutaRNA: mutational analysis and visualization for short and long span interactions of RNAs

Setup

conda env create -f conda-env-MutaRNA.yml -n MutaRNA
conda activate MutaRNA

Usage

usage: MutaRNA-plot.py [-h] --fasta-wildtype FASTA_WILDTYPE --SNP-tag SNP_TAG
                       [--out-dir OUT_DIR] [--no-global-fold]
                       [--no-local-fold] [--local-W LOCAL_W]
                       [--local-L LOCAL_L] [--global-maxL GLOBAL_MAXL]
                       [--no-SNP-score] [--enable-long-range]
                       [--enable-global-fold]

MutaRNA-plot predicts and plots local and global base-pair probabilities of
wildtype and mutant RNAs. Sample call: "python bin/MutaRNA-plot.py
--fasta-wildtype data/sample0.fa --SNP-tag G3C --out-dir tmp --no-global-fold"

optional arguments:
  -h, --help            show this help message and exit
  --fasta-wildtype FASTA_WILDTYPE
                        Input sequence wildtype in fasta format
  --SNP-tag SNP_TAG     SNP tag e.g. "C3G" for mutation at position 3 from C
                        to G
  --out-dir OUT_DIR     path to the output directory. The directory must
                        already exist.
  --no-global-fold      Do not run (semi-)global fold (semi: max-window
                        1000nt)
  --no-local-fold       Do not run local fold
  --local-W LOCAL_W     Window length for local fold
  --local-L LOCAL_L     Max base-pair interaction span for local fold
  --global-maxL GLOBAL_MAXL
                        Maximum base-pair interaction span for global fold.
  --no-SNP-score        Do not run SNP structure aberration scores with RNAsnp
                        and remuRNA
  --enable-long-range   predict and plot long-range interactions of wildtype
                        and mutant RNAs using IntaRNA
  --enable-global-fold  enable global fold
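
The `--SNP-tag` format (wildtype base, position, mutant base) can be checked against the input sequence before any folding is run. The following is a minimal sketch, not MutaRNA's own parser: `parse_snp_tag` and `apply_snp` are hypothetical helpers, and the sketch assumes 1-based positions (as in "C3G" = position 3) and single-nucleotide substitutions only.

```python
import re

# Hypothetical helper, not MutaRNA's parser: <wildtype base><1-based position><mutant base>.
SNP_TAG = re.compile(r"^([ACGTUN])(\d+)([ACGTUN])$", re.IGNORECASE)


def parse_snp_tag(tag):
    """Split a tag such as "C3G" into ("C", 3, "G")."""
    match = SNP_TAG.match(tag.strip())
    if match is None:
        raise ValueError(f"malformed SNP tag: {tag!r}")
    wt_base, pos, mut_base = match.groups()
    return wt_base.upper(), int(pos), mut_base.upper()


def apply_snp(wildtype, tag):
    """Return the mutant sequence after checking the tag against the wildtype."""
    wt_base, pos, mut_base = parse_snp_tag(tag)
    if not 1 <= pos <= len(wildtype):
        raise ValueError(f"position {pos} is outside a sequence of length {len(wildtype)}")
    found = wildtype[pos - 1].upper()
    if found != wt_base:
        raise ValueError(f"tag {tag} expects {wt_base} at position {pos}, found {found}")
    # Replace exactly one character; all other positions stay unchanged.
    return wildtype[: pos - 1] + mut_base + wildtype[pos:]
```

For example, `apply_snp("GGGAAAUCC", "G3C")` yields `"GGCAAAUCC"`, while a tag whose wildtype base does not match the sequence raises a `ValueError` instead of silently producing a wrong mutant.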

Example call

mkdir -p ./results/
python bin/MutaRNA-plot.py --fasta-wildtype data/sample0.fa --SNP-tag G3C --out-dir ./results/
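
Because `--out-dir` must already exist, a scripted run has to create it first. A minimal Python sketch, assuming it is started from the repository root inside the activated environment; `build_mutarna_command` and `run_mutarna` are hypothetical names, and only flags listed in the usage text are used:

```python
import os
import subprocess
import sys


def build_mutarna_command(fasta, snp_tag, out_dir, extra_flags=()):
    """Argument list for bin/MutaRNA-plot.py using the flags from the usage text."""
    return [
        sys.executable, "bin/MutaRNA-plot.py",
        "--fasta-wildtype", fasta,
        "--SNP-tag", snp_tag,
        "--out-dir", out_dir,
        *extra_flags,
    ]


def run_mutarna(fasta, snp_tag, out_dir, extra_flags=()):
    # MutaRNA-plot.py expects --out-dir to exist, so create it (no error if present).
    os.makedirs(out_dir, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(build_mutarna_command(fasta, snp_tag, out_dir, extra_flags), check=True)
```

For instance, `run_mutarna("data/sample0.fa", "G3C", "./results/", ["--no-global-fold"])` would mirror the sample call from the help text.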

Output

Outputs are stored in the current directory by default, or under the path specified via the --out-dir option.

  • Local

The results under the local/ directory contain visualizations of base-pairing probabilities as computed by RNAplfold under different conditions. The plots are generated in two file formats (.png and .svg).

- `RNA-WILD-circos`: The base pair probabilities of the wild type in circular Circos form. 
- `RNA-MUTANT-circos`: The base pair probabilities of the mutant in circular Circos form. 
- `RNA-MUTANT-removed-circos`: The weakened base-pairing potentials in the form of probability difference between WT and mutant.
- `RNA-MUTANT-introduced-circos`: The increased base-pairing potentials in the form of probability difference between mutant and WT.
- `RNA-WT-MUT-dotplot`: The base pair probabilities of the wild type (upper right) and mutant (lower left) in dotplot-style matrix format heatmap plots.
- `RNA-REMOVED-INTRODUCED-dotplot`: The weakened base-pairing potentials in the form of probability difference between WT and mutant (upper right). The increased base-pairing potentials in the form of probability difference between mutant and WT (lower left).
- `RNA_lunp-ECG`: The position-wise unpaired probability, also known as accessibility, of wild type, mutant and their difference.

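
The removed/introduced plots split the probability difference delta = Pr(WT) - Pr(MUT) by sign: positive entries are weakened pairs, negative entries are increased pairs. Below is a minimal sketch of that split, plus the textbook identity unpaired(i) = 1 - sum over j of Pr(i, j). It assumes both base-pair probability matrices are available as symmetric n x n nested lists with 0-based indices; the function names are hypothetical. The identity holds for a single (global) ensemble, whereas RNAplfold's window-averaged mode writes unpaired probabilities directly to its `_lunp` file, which the `RNA_lunp` plot name suggests is the source of the accessibility plot.

```python
# Assumption: p[i][j] = probability that positions i and j pair (0-based, symmetric n x n).


def split_difference(p_wt, p_mut):
    """Split delta = Pr(WT) - Pr(MUT) into removed (delta > 0) and introduced (delta < 0)."""
    n = len(p_wt)
    removed = [[0.0] * n for _ in range(n)]
    introduced = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            delta = p_wt[i][j] - p_mut[i][j]
            if delta > 0:
                removed[i][j] = delta  # pairing weakened by the mutation
            elif delta < 0:
                introduced[i][j] = -delta  # pairing strengthened by the mutation
    return removed, introduced


def unpaired_probability(p):
    """Per-position unpaired probability 1 - sum_j p[i][j] of a single ensemble."""
    return [1.0 - sum(row) for row in p]
```

Keeping both parts non-negative lets them share one color scale, which matches the two-panel layout of `RNA-REMOVED-INTRODUCED-dotplot`.
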
  • Global

The results under the global/ directory contain visualizations of base-pairing probabilities as computed by RNAplfold under different conditions. They are similar to the local mode but allow large base-pair spans over the whole sequence length in a single window. The plots are generated in two file formats (.png and .svg).
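
The local/global distinction comes down to RNAplfold's window size `-W` and maximal base-pair span `-L` (with `-u` requesting unpaired probabilities). A sketch of how the two settings could be assembled, assuming that "(semi-)global" means a single window over the whole sequence capped at 1000 nt, as the `--no-global-fold` help text indicates. The helper names and the mapping from `--local-W`, `--local-L` and `--global-maxL` are illustrative assumptions, not MutaRNA's code:

```python
def rnaplfold_args(window, max_span, ulength=1):
    """RNAplfold arguments: window size -W, max base-pair span -L, unpaired stretch -u."""
    span = min(max_span, window)  # keep L <= W
    return ["RNAplfold", "-W", str(window), "-L", str(span), "-u", str(ulength)]


def local_fold_args(seq_len, local_w, local_l):
    # Local mode: a sliding window of local_w nt, capped at the sequence length.
    return rnaplfold_args(min(local_w, seq_len), local_l)


def global_fold_args(seq_len, global_max_l=None, max_window=1000):
    # Assumption: "(semi-)global" = one window over the whole sequence, capped at max_window nt.
    window = min(seq_len, max_window)
    span = window if global_max_l is None else global_max_l
    return rnaplfold_args(window, span)
```

Since RNAplfold reads the sequence from standard input, such an argument list could be run with `subprocess.run(args, input=fasta_text, text=True, check=True)`.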

  • Predicted structural impact of the mutation:

    • remuRNA.csv: prediction scores by remuRNA. The remuRNA score H(WT||MUT) is the relative entropy between the structure ensembles of the wild type and the mutant RNA. The score reflects changes in the global structure of the RNA.
    • RNAsnp.csv: prediction scores by RNAsnp. RNAsnp scores are generated in two modes (-m 1, -m 2), (semi-)global and local folding, and are based on two metrics: the base-pairing distance (d_max), computed in both modes, and the correlation coefficient (r_min). RNAsnp further computes the significance of each score in terms of p-values against a pre-computed table of sequences with similar features.
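
To make the remuRNA score concrete: the relative entropy (Kullback-Leibler divergence) between the wildtype and mutant ensembles is H(WT||MUT) = sum over structures s of P_WT(s) * ln(P_WT(s) / P_MUT(s)). The toy sketch below evaluates this definition over a small, explicitly enumerated ensemble given as a dictionary from dot-bracket strings to probabilities. A real tool such as remuRNA works on the full Boltzmann ensemble rather than a hand-made list, and the natural logarithm here is an assumption about units.

```python
import math


def relative_entropy(p_wt, p_mut):
    """H(WT||MUT) = sum_s P_WT(s) * ln(P_WT(s) / P_MUT(s)) over an enumerated ensemble."""
    total = 0.0
    for structure, p in p_wt.items():
        if p == 0.0:
            continue  # terms with P_WT(s) = 0 contribute nothing
        q = p_mut.get(structure, 0.0)
        if q == 0.0:
            return math.inf  # a wildtype structure that the mutant cannot form
        total += p * math.log(p / q)
    return total
```

Identical ensembles give 0, and the score grows as the mutant shifts probability away from the structures that dominate the wildtype ensemble. Note that the measure is not symmetric: H(WT||MUT) generally differs from H(MUT||WT).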

mutarna's People

Contributors

martin-raden, mmiladi

mutarna's Issues

plot title wrong!

In the diff dot plot titles you write Pr(wt)-Pr(mt), but in the box you annotate "delta > 0: weakened" and "delta < 0: increased".

But if Pr(wt) > Pr(mt), i.e. weaker, then delta > 0 according to your formula.

So either the title is wrong or the in-box text.

@mmiladi please correct

heatmap tweaks

  • ticks only every 10
  • vertical x-tick label (at the bottom?)
  • gray lines with 50% alpha every 50 nt

acc plot tweaks

  • red mutation line = dashed (as in heatmap)
  • ticks only every 10 (not at every x)
  • gray background lines only every 10 positions (at the ticks), instead of omitting every 10th line
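
Both tweak lists map onto a few matplotlib calls on the position axis. A sketch, assuming a matplotlib `Axes` with 1-based nucleotide positions on the x axis; `style_position_axis` is a hypothetical helper, `grid_step=50` corresponds to the heatmap request, and `grid_step=10` corresponds to the accessibility plot. For a heatmap, the same calls would be repeated for the y axis.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, e.g. for batch plotting
import matplotlib.pyplot as plt  # used by callers to create the figure and axes


def style_position_axis(ax, seq_len, snp_pos, tick_step=10, grid_step=50):
    """Apply the requested tweaks to the x axis of one plot (1-based positions)."""
    ticks = list(range(tick_step, seq_len + 1, tick_step))
    ax.set_xticks(ticks)  # ticks only every tick_step positions
    ax.set_xticklabels([str(t) for t in ticks], rotation=90)  # vertical tick labels
    for x in range(grid_step, seq_len + 1, grid_step):
        ax.axvline(x, color="gray", alpha=0.5, linewidth=0.5)  # 50% alpha guide lines
    ax.axvline(snp_pos, color="red", linestyle="--")  # dashed mutation marker
```

Typical use would be `fig, ax = plt.subplots()`, drawing the data, then `style_position_axis(ax, len(seq), snp_pos)` before saving the figure.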

Report RNAsnp warnings

RNAsnp reports a warning if p-values are not available for the provided sequence length. These warnings must be reported to the user.

Example report:

['SNP', 'H(wt||mu)', 'MFE(mu)', 'MFE(wt)', 'dMFE']
-m 1 
ERROR: RNASNP returned warning for: /scratch/rna/bisge001/RNA_results/CARNA-result/MutaRNA_4348332/input_rna.fa message is:Warnings: The input sequence length 55 is less than twice the size of chosen flanking size 200. Thus, the reporting p-value is not accurate. Please refer to the README file for more details.
SNP	w	Slen	GC	interval	d_max	p-value	interval	r_min	p-value
C7G	200	55	0.2182	2-51	0.0014	0.9048	4-53	0.9929	0.8692
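
Surfacing these warnings requires separating them from RNAsnp's result table. A minimal parser sketch, assuming (from the report above) that the warning text contains `Warnings:` and that the table is tab-separated with a header line starting with `SNP`. Because `interval` and `p-value` appear twice, repeated column names get a numeric suffix. The function names are hypothetical.

```python
def unique_names(names):
    """Suffix repeated column names: interval, p-value, interval_2, p-value_2, ..."""
    counts, result = {}, []
    for name in names:
        counts[name] = counts.get(name, 0) + 1
        result.append(name if counts[name] == 1 else f"{name}_{counts[name]}")
    return result


def split_rnasnp_output(text):
    """Return (warning messages, result rows as dicts) from RNAsnp-style text output."""
    warnings, rows, header = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "Warnings:" in line:
            # Keep only the message after the marker so it can be shown to the user.
            warnings.append(line.split("Warnings:", 1)[1].strip())
        elif line.startswith("SNP\t"):
            header = unique_names(line.split("\t"))
        elif header is not None:
            rows.append(dict(zip(header, line.split("\t"))))
    return warnings, rows
```

A caller could then print every collected warning next to the corresponding RNAsnp.csv row instead of dropping it.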

indel support?

  • could be done by inserting dummy N chars at the deleted or inserted positions for plotting
  • the "extended" sequences should NOT be used as RNAplfold input, since they are causing problems with the minimal and maximal base pair span
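
The padding idea from the first bullet can be sketched for per-position values such as accessibilities: RNAplfold runs on the real (unpadded) sequence, and only afterwards are the results mapped onto a shared axis with `N` placeholders. `extend_with_dummies` is a hypothetical helper; dummy positions are given as 0-based indices in the extended coordinates.

```python
def extend_with_dummies(seq, values, dummy_positions):
    """Place a real sequence and its per-position values on an extended plotting axis.

    dummy_positions are 0-based indices in the extended coordinates. They receive an
    'N' in the sequence and None in the values, so plotting code can leave them blank.
    """
    if len(seq) != len(values):
        raise ValueError("need exactly one value per nucleotide")
    dummies = set(dummy_positions)
    total = len(seq) + len(dummies)
    if any(not 0 <= d < total for d in dummies):
        raise ValueError("dummy position outside the extended sequence")
    real = iter(zip(seq, values))
    ext_seq, ext_values = [], []
    for i in range(total):
        if i in dummies:
            ext_seq.append("N")
            ext_values.append(None)
        else:
            base, value = next(real)
            ext_seq.append(base)
            ext_values.append(value)
    return "".join(ext_seq), ext_values
```

For a deletion of `GU` from `ACGUAC`, the mutant `ACAC` would be extended with dummies at indices 2 and 3, giving `ACNNAC` with `None` values there, so wildtype and mutant share one x axis. For an insertion, the wildtype is padded instead.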

Empty impact range diff matrix throws an exception

The procedure that determines the impact range for the zoomed mode uses a fixed minimum probability threshold. This causes a problem when all probability differences are below that threshold.

In that case, the zoomed range list becomes empty:
https://github.com/mmiladi/MutaRNA/blob/master/bin/MutaRNA-plot.py#L129

Now the question is how to deal with this case:

  • Should the probability threshold be dynamically lowered (e.g. by 1/10) until the list is not empty?
  • Should we report the full figure instead?

ping @martin-raden
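
A sketch of the first option with the second as fallback: lower the threshold by a factor of 10 until at least one position passes, and return the full range if nothing passes above a floor. It assumes the differences have already been reduced to one value per position (e.g. the largest absolute difference in each row of the matrix). The function name, the 0.1 start value and the 1e-6 floor are arbitrary illustrative choices, not values taken from MutaRNA-plot.py.

```python
def impact_range(diffs, threshold=0.1, factor=10.0, min_threshold=1e-6):
    """Return the 0-based inclusive (start, end) span of positions with |diff| >= threshold.

    Lowers the threshold by `factor` until something passes; falls back to the
    full range if nothing passes before the threshold drops below `min_threshold`.
    """
    while threshold >= min_threshold:
        hits = [i for i, d in enumerate(diffs) if abs(d) >= threshold]
        if hits:
            return min(hits), max(hits)
        threshold /= factor
    return 0, len(diffs) - 1  # nothing stands out: show the full figure
```

The floor keeps the loop finite when a mutation has no effect at all (all differences exactly zero).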

new option '--idxPos0' (default=1) to guide indexing

  • defines the in/out index of the first sequence position
  • if negative, skip 0 in the indexing
  • use this information to map mutation encoding to sequence position

(similar to IntaRNA --t/qIdxPos0)

  • allows studying the same mutation (encoding) within sequence contexts of different lengths
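
A sketch of the proposed mapping from a user-facing position to a 0-based sequence index, following the convention described above (negative start values skip 0). `to_string_index` is hypothetical, since the option is only a proposal.

```python
def to_string_index(user_pos, idx_pos0=1):
    """0-based index of a user-facing position, given the index of the first nucleotide.

    Proposed --idxPos0 semantics: if idx_pos0 is negative, index 0 is skipped,
    i.e. positions run ..., -2, -1, 1, 2, ...
    """
    if idx_pos0 < 0 and user_pos == 0:
        raise ValueError("position 0 does not exist when idxPos0 is negative")
    index = user_pos - idx_pos0
    if idx_pos0 < 0 and user_pos > 0:
        index -= 1  # account for the skipped 0
    if index < 0:
        raise ValueError(f"position {user_pos} lies before the first nucleotide")
    return index
```

With `idx_pos0=1`, the tag `C3G` refers to index 2. If ten extra nucleotides are prepended and `idx_pos0=-10`, the same tag still points at the same nucleotide, now at index 12.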

Installation issue: PackagesNotFoundError

Hi. I'm trying to install MutaRNA, but I'm getting a PackagesNotFoundError.

git clone https://github.com/mmiladi/MutaRNA.git
cd MutaRNA
conda env create -f conda-env-MutaRNA.yml -n MutaRNA

Channels:
 - conda-forge
 - bioconda
 - defaults
 - r
 - conda
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - gcc_impl_linux-64==7.3.0=habb00fd_2
  - gdk-pixbuf==2.36.12=haf2c3b9_1004
  - gxx_impl_linux-64==7.3.0=hdf63c60_2

Current channels:

  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/bioconda/linux-64
  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://conda.anaconda.org/r/linux-64
  - https://conda.anaconda.org/conda/linux-64

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
