protaxA

PROTAX for aligned sequences

c/ and scripts/ contain the C code and Perl scripts required to train a probabilistic taxonomic classifier (PROTAX). Instructions are in the file readme.txt.

c2/ contains the C code for classification. To speed up the distance calculations, sequences are represented as vectors of 64-bit integers, with each integer encoding 16 consecutive nucleotides. The speedup can be measured by running classify_v1 (sequences represented as character strings) and classify_v2 (sequences represented as 64-bit integer vectors). Both programs measure the time taken to calculate all pairwise distances between the query sequence and the reference sequences, and the time taken to convert the sequence distances into taxon probabilities.
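The packing idea can be illustrated with a minimal sketch. Sixteen nucleotides per 64-bit integer implies 4 bits per nucleotide, but the actual bit layout used in c2/ is not described here, so the one-hot codes below (A=1, C=2, G=4, T=8, anything else 0) are an assumption. The function names are also hypothetical and are not taken from the PROTAX sources. The distance shown is the fraction of mismatches among positions where both sequences have a nucleotide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUC_PER_WORD 16                    /* 64 bits / 4 bits per nucleotide */
#define LOW_BITS 0x1111111111111111ULL     /* lowest bit of every 4-bit group */

/* Assumed one-hot code; gaps and unknown characters map to 0. */
static uint64_t nuc_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 1;
    case 'C': case 'c': return 2;
    case 'G': case 'g': return 4;
    case 'T': case 't': return 8;
    default:            return 0;
    }
}

/* Number of 64-bit words needed for an aligned sequence of length len. */
static size_t words_needed(size_t len)
{
    return (len + NUC_PER_WORD - 1) / NUC_PER_WORD;
}

/* Pack len characters into out, which must hold words_needed(len) words. */
static void pack_sequence(const char *seq, size_t len, uint64_t *out)
{
    assert(seq != NULL && out != NULL);
    for (size_t w = 0; w < words_needed(len); w++)
        out[w] = 0;
    for (size_t i = 0; i < len; i++)
        out[i / NUC_PER_WORD] |= nuc_code(seq[i]) << (4 * (i % NUC_PER_WORD));
}

/* Portable population count (number of set bits). */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Set the low bit of each 4-bit group that is nonzero, clear the rest. */
static uint64_t nonzero_nibbles(uint64_t x)
{
    return (x | (x >> 1) | (x >> 2) | (x >> 3)) & LOW_BITS;
}

/* Mismatch fraction over positions where both sequences have a nucleotide.
   Returns 1.0 when no position can be compared. */
static double packed_distance(const uint64_t *a, const uint64_t *b, size_t nwords)
{
    int compared = 0, matches = 0;
    for (size_t w = 0; w < nwords; w++) {
        compared += popcount64(nonzero_nibbles(a[w]) & nonzero_nibbles(b[w]));
        matches  += popcount64(nonzero_nibbles(a[w] & b[w]));
    }
    return compared > 0 ? (double)(compared - matches) / compared : 1.0;
}
```

With this layout, each loop iteration compares 16 alignment positions using a handful of bitwise operations, instead of 16 separate character comparisons. That is the kind of saving the classify_v1 versus classify_v2 timing is meant to expose.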

In addition, there are two variants of classify_v2:

classify_rseq: classifies reference sequences without using self-similarity.
classify_info: prints the nearest and second-nearest reference sequence to the query sequence in each node, along with the predictions.
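Finding the nearest and second-nearest reference sequence can be done in a single pass over the distances. The following is a minimal sketch of that step only, assuming the distances from the query to the reference sequences of one node are already in an array. The struct and function names are hypothetical and are not taken from classify_info.

```c
#include <stddef.h>

/* Indices of the two smallest distances; -1 means "not available". */
typedef struct {
    long best;
    long second;
} nearest_two;

/* Scan dist[0..n-1] once, tracking the smallest and second-smallest entry. */
static nearest_two find_nearest_two(const double *dist, size_t n)
{
    nearest_two r = { -1, -1 };
    for (size_t i = 0; i < n; i++) {
        if (r.best < 0 || dist[i] < dist[r.best]) {
            r.second = r.best;      /* old best becomes the runner-up */
            r.best = (long)i;
        } else if (r.second < 0 || dist[i] < dist[r.second]) {
            r.second = (long)i;
        }
    }
    return r;
}
```

A node with a single reference sequence yields a valid best index and -1 for the second, so callers need to check for -1 before printing.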

There are also several utility programs that use the same fast distance calculations as classify_v2:

dist_best: for each query sequence, reports the most similar reference sequence and its distance.
dist_matrix: calculates all pairwise distances between the queries and reports those below a given threshold.
dist_bipart: calculates all pairwise distances between the queries and the references and reports those below a given threshold.
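The difference between the last two utilities is which pairs are enumerated. The sketch below shows both loops under stated assumptions: it uses a plain character-based mismatch fraction rather than the packed representation, all sequences are aligned to the same length, and the tab-separated output format and function names are hypothetical rather than those of dist_matrix or dist_bipart.

```c
#include <stddef.h>
#include <stdio.h>

/* Fraction of differing positions between two aligned sequences. */
static double mismatch_fraction(const char *a, const char *b, size_t len)
{
    size_t diff = 0;
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            diff++;
    return len > 0 ? (double)diff / (double)len : 0.0;
}

/* All-against-all within one set: each unordered pair (i < j) once.
   Returns the number of pairs reported. */
static size_t report_within(const char **seqs, size_t n, size_t len,
                            double threshold, FILE *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double d = mismatch_fraction(seqs[i], seqs[j], len);
            if (d < threshold) {
                fprintf(out, "%zu\t%zu\t%.4f\n", i, j, d);
                count++;
            }
        }
    }
    return count;
}

/* Bipartite case: every query against every reference.
   Returns the number of pairs reported. */
static size_t report_between(const char **queries, size_t nq,
                             const char **refs, size_t nr, size_t len,
                             double threshold, FILE *out)
{
    size_t count = 0;
    for (size_t i = 0; i < nq; i++) {
        for (size_t j = 0; j < nr; j++) {
            double d = mismatch_fraction(queries[i], refs[j], len);
            if (d < threshold) {
                fprintf(out, "%zu\t%zu\t%.4f\n", i, j, d);
                count++;
            }
        }
    }
    return count;
}
```

Reporting only pairs below the threshold keeps the output small when most sequences are dissimilar, since the full matrix grows with the product of the two set sizes.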
