Burrows Wheeler Transform By Balanced Block Merging

License: GNU General Public License v3.0

bwtb3m

bwtb3m is a set of tools for indexing data.

Source

The bwtb3m source code is hosted on github:

git@github.com:gt1/bwtb3m.git

Compilation of bwtb3m

bwtb3m needs libmaus2 [https://github.com/gt1/libmaus2]. When libmaus2 is installed in ${LIBMAUSPREFIX}, bwtb3m can be compiled and installed in ${HOME}/bwtb3m using

- autoreconf -i -f
- ./configure --with-libmaus2=${LIBMAUSPREFIX} \
	--prefix=${HOME}/bwtb3m
- make install

Calling bwtb3m and options

usage: src/bwtb3m [options] <inputfile>

options:

* inputtype=[<bytestream>] (bytestream,compactstream,pac,pacterm,lz4,utf-8)
* outputfilename=[<bwtb3m_<hostname>_<pid>_<starttime>.bwt>] (name of output .bwt file)
* sasamplingrate=[32] sampling rate for sampled suffix array
* isasamplingrate=[262144] sampling rate for sampled inverse suffix array
* mem=[2147483648] memory target (suffixes k,m and g are accepted)
* numthreads=[8] number of threads
* bwtonly=[0] compute BWT only (no sampled suffix array and no sampled inverse suffix array)
* tmpprefix=[bwtb3m_<hostname>_<pid>_<starttime>] (prefix for tmp files)
* sparsetmpprefix=[tmpprefix] (prefix for sparse gap tmp files)
* copyinputtomemory=[0] (copy input file to memory)
* largelcpthres=[16384] (large LCP value threshold)
* verbose=[0] (verbosity level)
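
As an illustration of the key=value option syntax listed above, the following sketch assembles a bwtb3m command line and prints it instead of running it. The function name build_bwtb3m_cmd and the file names text.txt and text.bwt are hypothetical; only the option names are taken from the list above.

```shell
#!/bin/bash
# Hypothetical helper (not part of bwtb3m): print a bwtb3m command line
# for a plain byte input file.
# Arguments: <inputfile> <outputfile> <mem, e.g. 4g> <threads>
build_bwtb3m_cmd() {
	local input="$1" output="$2" mem="$3" threads="$4"
	# Options are given as key=value pairs, the input file comes last.
	printf '%s ' bwtb3m inputtype=bytestream "mem=${mem}" \
		"numthreads=${threads}" "outputfilename=${output}"
	printf '%s\n' "${input}"
}

build_bwtb3m_cmd text.txt text.bwt 4g 4
```

Options that are left out keep the defaults shown in brackets in the list above.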

Output

bwtb3m computes the BWT of the given input file. This is the BWT as originally defined (see M. Burrows and D. J. Wheeler: A Block-sorting Lossless Data Compression Algorithm, Digital SRC Research Report 124); in particular, no implicit or explicit terminator symbol is used, and the input string is treated as circular for the sake of comparisons. The output .bwt file can be decoded using the bwtb3mdecoderl program or read using the class libmaus2::huffman::RLDecoder in libmaus2.
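
To make the terminator-free, circular definition concrete, here is a minimal sketch that computes the same kind of BWT for a tiny string by sorting all of its rotations. It is a naive illustration only and says nothing about how bwtb3m works internally; the function name bwt_circular is hypothetical, and the input is assumed to be a short string without newline characters.

```shell
#!/bin/bash
# Naive circular BWT: write out every rotation of the input, sort the
# rotations bytewise, and emit the last character of each sorted rotation.
# Needs quadratic space, so it is only suitable for very small inputs.
bwt_circular() {
	local s="$1" n=${#1} i rot
	for ((i = 0; i < n; i++)); do
		# rotation of s starting at position i
		printf '%s\n' "${s:i}${s:0:i}"
	done | LC_ALL=C sort | while IFS= read -r rot; do
		printf '%s' "${rot: -1}"
	done
	printf '\n'
}

bwt_circular banana   # prints nnbaaa
```

Because no terminator is appended, the output has exactly the same length as the input and is a permutation of its characters.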

If bwtonly=0, bwtb3m computes a sampled suffix array by loading the final BWT into memory in the form of a Huffman-shaped wavelet tree. If bwtonly=1, the program only computes a hint file with the suffix .preisa, which can be passed to the bwtcomputessa program to compute a sampled suffix array and a sampled inverse suffix array in external memory without loading the BWT into memory.

Generating an index for BWA

The following script can be used to generate an index for BWA. It expects 5 arguments:

  • the path to the bwa program
  • the path to the bwtb3m program
  • the path to the bwtb3mtobwa program
  • the amount of memory for bwtb3m in gigabytes
  • the file name of the input FastA file to be indexed

#! /bin/bash
if [ $# -lt 5 ] ; then
	echo "usage: ${SHELL} $0 /path/to/bwa /path/to/bwtb3m /path/to/bwtb3mtobwa <mem/GB> <in.fa>"
	exit 1
fi

BWA="$1"
BWTB3M="$2"
BWTB3MTOBWA="$3"
MEM="$4"
INPUT="$5"

if [ ! -e "${BWA}" ] ; then
	echo "File ${BWA} does not exist"
	exit 1
fi
if [ ! -e "${BWTB3M}" ] ; then
	echo "File ${BWTB3M} does not exist"
	exit 1
fi
if [ ! -e "${BWTB3MTOBWA}" ] ; then
	echo "File ${BWTB3MTOBWA} does not exist"
	exit 1
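fi

# Pack the FastA file into BWA's .pac format
"${BWA}" fa2pac "${INPUT}"
# Compute the BWT and sampled suffix array of the packed sequence
"${BWTB3M}" inputtype=pacterm mem="${MEM}g" outputfilename="${INPUT}.pac.bwt" "${INPUT}.pac"
# Convert the bwtb3m output to BWA's .bwt and .sa files
"${BWTB3MTOBWA}" "${INPUT}.pac.bwt" "${INPUT}.bwt" "${INPUT}.sa"
# Update the .bwt file to the format BWA expects
"${BWA}" bwtupdate "${INPUT}.bwt"
# Remove intermediate files named ${INPUT}.pac.*
rm -f "${INPUT}".pac.*
# Regenerate the .pac file (forward strand only)
"${BWA}" fa2pac -f "${INPUT}"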
