SDSC "biotools" roll

Overview

This roll bundles a collection of biology packages.

For more information about the individual packages included in the biotools roll, please visit their official web pages:

  • bamtools provides both a programmer's API and an end-user's toolkit for handling BAM files.
  • bcftools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
  • bedtools is a toolset for genome arithmetic.
  • bioperl is a set of open source Perl tools for bioinformatics, genomics and life science.
  • biopython is a set of tools for biological computation.
  • bismark is a tool to map bisulfite converted sequence reads and determine cytosine methylation states.
  • blast finds regions of similarity between biological sequences.
  • blat produces two major classes of alignments: at the DNA level between two sequences that are of 95% or greater identity, but which may include large inserts and at the protein or translated DNA level between sequences that are of 80% or greater identity and may also include large inserts.
  • bowtie is a tool for aligning sequencing reads to long reference sequences.
  • bowtie2 is a tool for aligning sequencing reads to long reference sequences and has more features than bowtie 1.
  • bwa is a software package for mapping low-divergent sequences against a large reference genome.
  • bx-python consists of tools for manipulating biological data, particularly multiple sequence alignments.
  • canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing.
  • cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
  • dendropy is a Python library for the simulation, processing, and manipulation of phylogenetic trees and character matrices, and supports the reading and writing of phylogenetic data in a range of formats.
  • diamond is a BLAST-compatible local aligner for mapping protein and translated DNA query sequences against a protein reference database.
  • edena is a de novo short reads assembler.
  • emboss is the European Molecular Biology Open Software Suite.
  • fastqc is a quality control tool for high throughput sequence data.
  • fastx is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
  • GenomeAnalysisTK is a software package developed to analyse next-generation resequencing data.
  • gmap_gsnap is a genomic mapping and alignment program for mRNA and EST Sequences, and GSNAP: Genomic Short-read Nucleotide Alignment Program.
  • hisat2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome).
  • hmmer is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
  • htseq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.
  • idba-ud is an iterative De Bruijn graph de novo assembler for short-read sequencing data with highly uneven sequencing depth.
  • matt is a multiple protein structure alignment program.
  • miRDeep2 is a tool which discovers microRNA genes by analyzing sequenced RNAs.
  • picard comprises Java-based command-line utilities that manipulate SAM files, and a Java API (HTSJDK) for creating new programs that read and write SAM files.
  • plink is a toolset for genome-wide association studies (GWAS) and research in population genetics.
  • pysam is a Python module for reading and manipulating SAM files.
  • randfold computes the probability that, for a given RNA sequence, the Minimum Free Energy (MFE) of the secondary structure is different from a distribution of MFE computed with random sequences.
  • rseqc provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data.
  • samtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
  • soapdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes.
  • soapsnp is a resequencing utility that can assemble consensus sequence for the genome of a newly sequenced individual based on the alignment of the raw sequencing reads on the known reference.
  • spades (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies.
  • squid is a library of C functions and utility programs for sequence analysis.
  • stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform.
  • stringtie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
  • trimmomatic is used to quality trim and adapter clip NGS data.
  • trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
  • vcftools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project.
  • velvet is a de novo genomic assembler specially designed for short read sequencing technologies.
  • viennaRNA consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

Requirements

To build/install this roll you must have root access to a Rocks development machine (e.g., a frontend or development appliance).

If your Rocks development machine does not have Internet access you must download the appropriate biotools source file(s) using a machine that does have Internet access and copy them into the src/<package> directories on your Rocks development machine.
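For a build machine without Internet access, the staging step described above might be scripted roughly as follows. This is only a sketch: `stage_source` is a hypothetical helper, not something the roll provides, and the only detail taken from the text above is the src/<package> layout.

```shell
# stage_source ROLL_DIR PACKAGE ARCHIVE
# Hypothetical helper (not part of the roll): copy a source archive that was
# downloaded on another machine into ROLL_DIR/src/PACKAGE.
stage_source() {
    stage_dest="$1/src/$2"
    stage_archive="$3"
    if [ ! -d "$stage_dest" ]; then
        echo "no such package directory: $stage_dest" >&2
        return 1
    fi
    if [ ! -f "$stage_archive" ]; then
        echo "archive not found: $stage_archive" >&2
        return 1
    fi
    cp "$stage_archive" "$stage_dest/"
}
```

The function refuses to create a missing src/<package> directory, so a mistyped package name fails loudly instead of leaving the archive where the build will not look for it.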

Dependencies

The sdsc-roll must be installed on the build machine, since the build process depends on make include files provided by that roll.

The roll sources assume that modulefiles provided by SDSC compiler, mpi, math, and python rolls are available, but it will build without them as long as the environment variables they provide are otherwise defined.

The build process requires the Boost, Eigen, GSL, and MKL libraries, as well as the Python NumPy and SciPy libraries, and it assumes that the modulefiles provided by the SDSC boost-roll, math-roll, mkl-roll (or intel-roll), python-roll, and scipy-roll are available. It will build without the modulefiles as long as the environment variables they provide are otherwise defined.

The build process requires cmake and assumes that the cmake modulefile provided by the SDSC cmake-roll is available. It will build without the modulefile as long as the environment variables it provides are otherwise defined.

Building

To build the biotools-roll, execute this on a Rocks development machine (e.g., a frontend or development appliance):

% make 2>&1 | tee build.log

A successful build will create the file biotools-*.disk1.iso. If you built the roll on a Rocks frontend, proceed to the installation step. If you built the roll on a Rocks development appliance, you need to copy the roll to your Rocks frontend before continuing with installation.
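One way to confirm that the build produced the ISO before copying it anywhere is sketched below. `find_roll_iso` is a hypothetical helper, and it assumes the build directory should contain exactly one file matching biotools-*.disk1.iso.

```shell
# find_roll_iso DIR
# Hypothetical helper (not part of the roll): print the path of the built ISO
# in DIR, or fail unless exactly one biotools-*.disk1.iso file exists there.
find_roll_iso() {
    # Re-use the function's own positional parameters to hold the glob matches.
    set -- "$1"/biotools-*.disk1.iso
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one biotools-*.disk1.iso" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

When building on a development appliance, the result could then be copied with something like `iso=$(find_roll_iso .) && scp "$iso" root@FRONTEND:/tmp/`, where FRONTEND stands in for your frontend's host name.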

This roll source supports building with different compilers and for different MPI flavors. The ROLLCOMPILER and ROLLMPI make variables can be used to specify the names of compiler and MPI modulefiles to use for building the software, e.g.,

% make ROLLCOMPILER=intel ROLLMPI=mvapich2_ib 2>&1 | tee build.log

The build process recognizes "gnu", "intel", or "pgi" as the value for the ROLLCOMPILER variable; any MPI modulefile name may be used as the value of the ROLLMPI variable. The default values are "gnu" and "rocks-openmpi".
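If the build is driven from a wrapper script, the compiler name can be checked before make is invoked. The sketch below is illustrative only: `valid_rollcompiler` is a hypothetical helper that encodes the three values listed above, and it does not inspect how the roll's makefiles actually interpret the variable.

```shell
# valid_rollcompiler NAME
# Hypothetical helper (not part of the roll): succeed only for the compiler
# names the README lists as recognized values of ROLLCOMPILER.
valid_rollcompiler() {
    case "$1" in
        gnu|intel|pgi) return 0 ;;
        *) return 1 ;;
    esac
}
```

A wrapper could then run `valid_rollcompiler "$compiler" && make ROLLCOMPILER="$compiler" 2>&1 | tee build.log`, with `compiler` being a variable of the wrapper's own.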

Building the picard application requires the Java 1.8 javac. The build process assumes that the JAVA_HOME environment variable refers to a Java 1.8 installation; if this is not the case, you can specify a java8home path in the ROLLOPTS make variable, e.g.,

% make ROLLOPTS='java8home=/usr/local/jvm/1.8.0' 2>&1 | tee build.log
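Whether JAVA_HOME already points at a 1.8 javac can be checked before starting a long build. The following is a sketch under stated assumptions: both function names are hypothetical, and the check assumes that `javac -version` reports a string of the form `javac 1.8.0_NNN`, as Java 8 compilers do.

```shell
# is_java8_javac VERSION_OUTPUT
# Hypothetical helper: succeed if the text printed by `javac -version`
# names a 1.8 release.
is_java8_javac() {
    case "$1" in
        *"javac 1.8."*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_java_home DIR
# Hypothetical helper: succeed if DIR/bin/javac exists, is executable, and
# reports a 1.8 version. Java 8 prints the version on stderr, hence 2>&1.
check_java_home() {
    [ -x "$1/bin/javac" ] || return 1
    is_java8_javac "$("$1/bin/javac" -version 2>&1)"
}
```

If `check_java_home "$JAVA_HOME"` fails, that is a hint to pass a java8home path through ROLLOPTS as shown above.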

Installation

To install, first execute these instructions on a Rocks frontend:

% rocks add roll *.iso
% rocks enable roll biotools
% cd /export/rocks/install
% rocks create distro

Subsequent installs of compute and login nodes will then include the contents of the biotools-roll.

To avoid cluttering the cluster frontend with unused software, the biotools-roll is configured to install only on compute and login nodes. To force installation on your frontend, run this command after adding the biotools-roll to your distro:

% rocks run roll biotools host=NAME | bash

where NAME is the DNS name of a compute or login node in your cluster.

In addition to the software itself, the roll installs individual modulefiles for each package in:

/opt/modulefiles/applications
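After a node has been installed, the modulefiles can be listed to see which packages are available. The sketch below assumes that the directory holds one entry per package; `list_roll_modules` is a hypothetical helper, and the actual modulefile names should be read from its output and not guessed.

```shell
# list_roll_modules [DIR]
# Hypothetical helper (not part of the roll): print the entries found in the
# modulefile directory, one per line. DIR defaults to the path given above.
list_roll_modules() {
    modules_dir="${1:-/opt/modulefiles/applications}"
    if [ ! -d "$modules_dir" ]; then
        echo "no modulefile directory: $modules_dir" >&2
        return 1
    fi
    ls -1 "$modules_dir"
}
```

A name taken from that listing can then be passed to `module load` in the usual way.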

Testing

The biotools-roll includes a test script that verifies proper installation of the roll documentation, binaries, and modulefiles. To run it, execute the following command:

% /root/rolltests/biotools.t

Contributors

jerrypgreenberg, jjhayes, mahidhar, tcooper


biotools-roll's Issues

Intel dependencies

Hello,

My group (bioinformatics group at Agriculture and Agri-Food Canada) are running Rocks and looking at perhaps adopting your rolls holus-bolus. But we do not have the Intel compiler suite or mkl libraries. Is this a show-stopper for us moving forward in using your rolls, especially this roll (biotools-roll), or is it (easily) possible to build your rolls without them?

Thanks,
Glen
