GithubHelp home page GithubHelp logo

fg6 / foract Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 2.0 11.87 MB

Pipeline to prepare alignments for visualization with ACT (goo.gl/1T28jX) and for locating possible inter-chromosomal re-arrengments/misjoints

License: GNU General Public License v3.0

Shell 34.92% Python 0.31% Makefile 4.22% C++ 56.73% R 3.83%
alignment genome assemblies visualization act contigs scaffold pipeline chromosome smalt

foract's Introduction

forACT

Pipeline to prepare alignments between a Reference fasta and a draft assembly; then uses the ACT pipeline (http://www.sanger.ac.uk/science/tools/artemis-comparison-tool-act) to visualize genome comparison interactively. Printout a summary of the assembly analysed, and possible inter-chromosomal misjoints (if comparing with a reference) For chromosome assigned scaffolds there is also the possibiliy to have circos-like plots for a quick general view. Please notice that ACT will not work for genomes ~> 2.1 Gbp; in this case you can still plot the comparison but have to select 1 to 5 chromosomes to show at a time. To decide which chromosomes to plot, you can make use of the misassembly report to see if and which chromosomes get mixed-up by the assembly.

Main Pipeline steps:

  • Shreds draft assembly contigs/scaffolds in 10K bases (can vary length of chunks by changing parameter "shred")

  • Maps the shredded draft assembly against Reference (SMALT aligner)

  • Checks orientation of each contig/scaffold with respect to Reference and change to complement the contigs/scaffolds for which majority of chunks are complements. This creates an equivalent draft assembly with only forward contigs/scaffolds

  • Shreds the forward draft assembly contigs/scaffolds in 10K bases (can vary length of chunks by changing parameter "shred")

  • Maps the shredded forward draft assembly against Reference (SMALT aligner)

  • Filters out:

    • alignments with identity < 80% (can vary minimum identity by changing parameter "minid")
    • isolated alignments shorter than 1% of the contig (can vary this limit by changing parameter "noise")
  • Re-order position of forward contigs/scaffolds to match each contig/scaffold alignment position in the Reference (when one contig/scaffold map to more than one place in the Reference, the major alignment is considered)

  • Re-define alignment positions according to ACT positioning scheme

Requirements

The new version of Artemis require maven to properly install act, please make sure it's in your PATH. You will need a g++ version >=4.7 and the zlib in your path for installation

The aligner executable (Smalt from http://www.sanger.ac.uk/science/tools/smalt-0) is defined in your forACT/utils/myscripts/settings.sh file: it automatically points to a x86_64 compiled version, more versions you might want to try are in your forACT/utils/mysrcs/mylibs/smalt-0.7.4 otherwise just pont to your compiled smalt executable in forACT/utils/myscripts/settings.sh

External packages

forACT downloads and installs the gzstream library to handle gzip input files (https://www.cs.unc.edu/Research/compgeom/gzstream/). It also uses the external software ACT, smalt and minimap2 (https://github.com/lh3/minimap2).

Instructions

Download repository and install utilities/compile tools:

$ git clone https://github.com/fg6/forACT.git
$ cd forACT
$ ./launchme.sh install
$ ./launchme.sh test

Get information about parameters and settings:

$ ./launchme.sh suggestions

Run the pipeline for a project:

Step 1: setup folder and scripts:

$ ./launchme.sh setup </full/path/to/reference> </full/path/to/draft>  </full/path/to/destdir>
    /full/path/to/reference:  Reference fasta (Please provide full path)
    /full/path/to/draft:  draft assembly fasta (Please provide full path)
    /full/path/to/destdir: folder where to run the pipeline (Please provide full path)

Step 2: run the pipeline:

$ cd /full/path/to/destdir
$ ./mypipeline.sh align

Check if alignment ran smoothly and with no errors:

$ ./mypipeline.sh check 

If everything run smoothly then prepare the files for ACT:

$ ./mypipeline.sh prepfiles

If no errors, the newly created files will be in /full/path/to/destdir/[aligner]_10000/unique/ folder.

Step 3: write a report of the assemblies and alignments, and possible misassemblies/re-arrengements:

$ ./mypipeline.sh report
In progress: some variables like "Reference Coverage" or "Structure coverage" definition 
	are still not fixed, do not take them too seriously for now

Step 4: for chromosome-assigned super-scaffolds: look for inter-chromosome rearrengments/misjoints:

$ ./mypipeline.sh misjoints

and then

$ ./mypipeline.sh circlize

for circos type plots. Be aware that in these plots only and only the alignment to the major chromosome per each scaffold and the possible inter-chromosomal misassemblies are visualized.

Step 5: Launch ACT

If Step 1 and 2 gave no errors, then launch ACT:

$ ./mypipeline.sh act  

To view this project compared to another assembly:

$ ./mypipeline.sh act_compare full/path/to/folder/other_assembly_forACT

Please note this only works if the other assembly forACT has been run with the same reference and same shred/noise/minid parameters

Step 6: Launch ACT for genome size > 2.1 GB

*** Warning *** ACT cannot handle genome size > 2.1 GB, for genome of this size run instead single chromosome at a time in ACT:

List possible chromosomes to view:

$ ./mypipeline.sh act_select list

View up to 5 chromosomes:

$ ./mypipeline.sh act_select chr1 chr2 ...

where chr1..chr2 are the name of the chromosomes to visualize.

To view this project compared to another assembly (up to 5 chromosomes):

$ ./mypipeline.sh act_select_compare full/path/to/folder/other_assembly_forACT  chr1 chr2 ...

Please note this only works if for the other assembly forACT has been run with the same reference and same shred/noise/minid parameters

foract's People

Contributors

fg6 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.