๐Ÿ—โญ In silico Legionella pneumophila Sequence Based Typing

License: GNU General Public License v3.0

Perl 100.00%

legsta

In silico Legionella pneumophila Sequence Based Typing (SBT)

Background

SBT stands for sequence-based typing. The purpose of the Legionella pneumophila SBT scheme is to provide a rapid and easily comparable method for the epidemiological typing of clinical and environmental isolates of Legionella pneumophila in outbreak investigations.

Install

Conda

conda install -c conda-forge -c bioconda -c defaults legsta

Homebrew

brew install brewsci/bio/legsta

GitHub

cd $HOME
git clone https://github.com/tseemann/legsta.git
cd $HOME/legsta/test
../bin/legsta *.fna *.gbk

Input

The any2fasta tool is used to convert input files to FASTA for feeding to isPcr. It can accept FASTA, Genbank, EMBL, GFF, and many other formats. The files may also be compressed with gzip, bzip2 or zip.
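
As a rough illustration of how gzip, bzip2 and zip inputs can be told apart, the sketch below checks the leading "magic" bytes of a file. It is not taken from any2fasta and makes no claim about how that tool detects formats; the function name `detect_compression` is hypothetical.

```python
import bz2
import gzip

# Well-known leading byte signatures for each container format.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"PK\x03\x04": "zip",
}

def detect_compression(data):
    """Return the compression name for the leading bytes, or 'none'."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "none"

# Compress a tiny made-up FASTA record in memory and classify it.
record = b">seq\nACGT\n"
print(detect_compression(gzip.compress(record)))
print(detect_compression(bz2.compress(record)))
print(detect_compression(record))
```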

Output

Output is a TSV file (or CSV if --csv is used). Alleles with no in silico product are denoted by "-", and novel alleles by "?".

% cd legsta/test
% ../bin/legsta NC_006368.fna NC_018140.fna CR628336.1.gbk.gz missing_flaA.fna FJBS01000000.fna.bz2


FILE                   SBT     flaA    pilE    asd     mip     mompS   proA    neuA
NC_006368.fna          1       1       4       3       1       1       1       1
NC_018140.fna          734     2       6       17      1       1       8       11
CR628336.1.gbk.gz      1       1       4       3       1       1       1       1
missing_flaA.fna       -       -       14      16      25      7       13      206
FJBS01000000.fna.bz2   -       3       10      1       3       14      9       ?
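
For downstream checks, the table can be read with any TSV parser. The following is a minimal Python sketch, assuming the tab-separated layout shown above; the function name `flag_problem_loci` is hypothetical and the sample rows are copied from the example output.

```python
import csv
import io

# Rows copied from the example output, tab-separated as in the default mode.
SAMPLE = (
    "FILE\tSBT\tflaA\tpilE\tasd\tmip\tmompS\tproA\tneuA\n"
    "NC_006368.fna\t1\t1\t4\t3\t1\t1\t1\t1\n"
    "missing_flaA.fna\t-\t-\t14\t16\t25\t7\t13\t206\n"
    "FJBS01000000.fna.bz2\t-\t3\t10\t1\t3\t14\t9\t?\n"
)

def flag_problem_loci(tsv_text):
    """Map each file to its loci with no product ('-') or a novel allele ('?')."""
    problems = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        name = row.pop("FILE")
        row.pop("SBT")  # the SBT column is the profile call, not a locus
        missing = [locus for locus, allele in row.items() if allele == "-"]
        novel = [locus for locus, allele in row.items() if allele == "?"]
        if missing or novel:
            problems[name] = {"missing": missing, "novel": novel}
    return problems

problems = flag_problem_loci(SAMPLE)
for name, info in problems.items():
    print(name, info)
```

Files with a complete, known profile are left out of the result, so an empty dictionary means every locus was called.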

Options

Option       Description
--quiet      Do not print any informational messages to stderr
--csv        Comma-separated output instead of tab-separated
--noheader   Do not print the table header to output (i.e. FILE SBT flaA ...)
--version    Print legsta X.Y.Z version to stdout and exit
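
When --csv and --noheader are combined, the consumer has to supply the column names itself. A small Python sketch, assuming the CSV output keeps the same column order as the header in the Output section (the function name `read_headerless_csv` is hypothetical):

```python
import csv
import io

# Column order taken from the header shown in the Output section.
COLUMNS = ["FILE", "SBT", "flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA"]

def read_headerless_csv(text):
    """Parse output produced with --csv --noheader into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=COLUMNS)
    return [dict(row) for row in reader]

# One row from the example output, rewritten with commas.
rows = read_headerless_csv("NC_018140.fna,734,2,6,17,1,1,8,11\n")
print(rows[0]["FILE"], rows[0]["SBT"])
```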

Dependencies

Issues

Submit questions or issues to our Issue Tracker

Authors

  • Torsten Seemann
  • Anders Goncalves Da Silva
  • Andrew Buultjens
  • Jason Kwong

Licence

GPLv3

Acknowledgements

  • Natalie Groves for providing the latest sequences and profiles from the PHE database


legsta's Issues

Unwanted output saved to redirected file

At the moment, if we run:

legsta myfasta.fa > legsta_res.txt

The first line of the file turns out to be:

-minPerfect=N - Minimum size of perfect match at 3' end of primer (default 15)

Turns out, on line 30 of legsta we have this:

system("$ISPCR 2>&1 | grep -- minPerfect")==0 or err("Can not run $ISPCR");

Perhaps it should be:

system("$ISPCR 2>&1 | grep -- minPerfect > /dev/null")==0 or err("Can not run $ISPCR");
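
Until a fix like that is in place, result files that already contain the stray line can be cleaned after the fact. This is only a workaround sketch in Python, assuming the table header starts with FILE as in the Output section; the function name `strip_stray_lines` and the data row for myfasta.fa are made up for illustration.

```python
def strip_stray_lines(text):
    """Drop any lines that precede the table header starting with 'FILE'."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.split("\t")[0] == "FILE":
            return "\n".join(lines[i:]) + "\n"
    return text  # no header found: leave the text untouched

# The first line is the stray isPcr help text quoted in this issue;
# the allele values in the last row are invented for the example.
polluted = (
    "-minPerfect=N - Minimum size of perfect match at 3' end of primer (default 15)\n"
    "FILE\tSBT\tflaA\tpilE\tasd\tmip\tmompS\tproA\tneuA\n"
    "myfasta.fa\t1\t1\t4\t3\t1\t1\t1\t1\n"
)
cleaned = strip_stray_lines(polluted)
print(cleaned, end="")
```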

flaA allele 11 not detected

Hi, thanks for making this tool, it's very useful! I am running it on 70 genomes for which I also have Sanger SBT results to compare against. For most of the genomes, legsta finds the same SBT, but it does not seem to find flaA allele 11. 25 of the genomes (SBTs 154 and 574) should have flaA 11, but legsta outputs "-" for this allele. Other flaA alleles are found in the other genomes. When I run BLASTn on the assemblies using the flaA 11 sequence from https://github.com/tseemann/legsta/blob/master/db/flaA.tfa, I find flaA_11 with 100% nucleotide identity and sequence coverage.

Also, for 12 of those 25 genomes, the mompS allele was called incorrectly (as allele 7 instead of 15). The mompS allele is called correctly in all the other genomes I have looked at.

Any idea what is going on?

Thanks,
Marit
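
A comparison like the one described in this report can be scripted. The sketch below is a minimal Python example, assuming each profile is held as a locus-to-allele dictionary; the function name `discordant_loci` is hypothetical, and the pilE value is invented, while the flaA and mompS values mirror the discrepancy reported above.

```python
def discordant_loci(expected, called):
    """Return {locus: (expected, called)} for loci where the two calls differ."""
    return {
        locus: (allele, called.get(locus, "-"))
        for locus, allele in expected.items()
        if called.get(locus, "-") != allele
    }

# Sanger result versus legsta call for one genome (pilE value is made up).
sanger = {"flaA": "11", "pilE": "14", "mompS": "15"}
legsta_call = {"flaA": "-", "pilE": "14", "mompS": "7"}

diff = discordant_loci(sanger, legsta_call)
print(diff)
```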

pilE 10 allele is not found

It seems there are some mismatches in the PCR primers that cause isPCR to fail for that particular allele.
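
To see why a single mismatch can matter, recall that the isPcr help text quoted in another issue describes -minPerfect as the minimum size of the perfect match at the 3' end of the primer, and that the command line shown in a later issue uses -minPerfect=6. The Python sketch below measures that 3' run for a primer and a candidate binding site. Both sequences are invented, the function names are hypothetical, and whether isPcr applies exactly this rule is an assumption.

```python
def three_prime_perfect(primer, site):
    """Count matching bases from the 3' end of the primer until the first mismatch."""
    count = 0
    for p, s in zip(reversed(primer.upper()), reversed(site.upper())):
        if p != s:
            break
        count += 1
    return count

def mismatches(primer, site):
    """Count positions where primer and site differ (equal lengths assumed)."""
    return sum(p != s for p, s in zip(primer.upper(), site.upper()))

# Invented sequences, same strand and same length, differing at one base.
PRIMER = "ACGTTGCAAGGCTTAC"
SITE = "ACGTTGCAAGACTTAC"
MIN_PERFECT = 6  # value seen on the isPcr command line in a later issue

run = three_prime_perfect(PRIMER, SITE)
print(mismatches(PRIMER, SITE), run, run >= MIN_PERFECT)
```

Here one mismatch sits six bases in from the 3' end, so the perfect run is five bases and falls short of the threshold.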

SBT from amplicons

Hi,
Great tool for implementing L. pneumophila SBT! Thanks.
I'm trying to call SBT using nanopore sequenced multiplex PCR amplicons. However, I get "no product" results for some of the alleles. Sequences for the alleles are in the input file and are a perfect match for an allele in the database. I tested an assembled genome of the same SBT strain and get expected results.
Can I use amplicons with Legsta? Do the input sequences need to be longer?

Cheers!

neuA.fta headers

Hi @tseemann,

We found a discrepancy in the fta file for the neuA locus. Two header formats exist:

neuA_xx
neuAh_xx

Was this intentional? Could this affect the results (e.g. an unassigned allele for this locus)?

thanks,
Yair
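
A quick way to survey the header formats in an allele file is to count the prefix before the final underscore. This is a Python sketch assuming headers of the form name_number as described above; the function name `count_header_prefixes`, the allele numbers and the sequences are all made up.

```python
from collections import Counter

def count_header_prefixes(fasta_text):
    """Count FASTA headers grouped by the text before the last underscore."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        parts = line[1:].split()
        if not parts:
            continue  # skip a bare '>' line with no identifier
        prefix = parts[0].rsplit("_", 1)[0]
        counts[prefix] += 1
    return counts

# Invented records showing both header styles.
SAMPLE = ">neuA_1\nACGT\n>neuA_2\nACGA\n>neuAh_201\nACGG\n"
counts = count_header_prefixes(SAMPLE)
print(dict(counts))
```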

Add --label option

Instead of filename, print --label.
Only works for 1 sample.

Perhaps --label could be a regexp to capture a part of the filename
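
The regexp idea could look roughly like the Python sketch below. No such option exists in legsta at the time of this request, so the function name `label_from_filename` and its behaviour (use the first capture group, fall back to the file name when nothing matches) are assumptions for discussion only.

```python
import os
import re

def label_from_filename(path, pattern):
    """Derive a label from the file name using a regular expression.

    Uses the first capture group if the pattern has one, the whole match
    otherwise, and falls back to the bare file name when nothing matches.
    """
    name = os.path.basename(path)
    match = re.search(pattern, name)
    if match is None:
        return name
    return match.group(1) if match.groups() else match.group(0)

# Keep everything before the first dot.
print(label_from_filename("/data/run1/NC_006368.fna", r"^([^.]+)"))
# Keep an accession together with its version number.
print(label_from_filename("CR628336.1.gbk.gz", r"^(\w+\.\d+)"))
```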

Hi Torsten

Hi Torsten, I am wondering whether the script works with a concatenation of the seven sequences in one FASTA file instead of a whole genome sequence. We only have data from first-generation sequencing. I tried it and it doesn't work.

Thanks
Eric

Handle empty FASTA file better

./bin/legsta /dev/null
FILE    SBT     flaA    pilE    asd     mip     mompS   proA    neuA
Running: isPcr /dev/null /home/tseemann/git/legsta/bin/../db/ispcr.tab stdout -minPerfect=6 -tileSize=6 -maxSize=1200 -stepSize=5 -out=fa
Program error: trying to allocate 0 bytes in needLargeMem (limit: 1073741824)
isPcr: memalloc.c:91: needLargeMem: Assertion `FALSE' failed.
/dev/null       -       -       -       -       -       -       -
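
One way to avoid handing an empty file to isPcr is to check for at least one record first. The Python sketch below shows the idea for plain, uncompressed FASTA only; it is not how legsta behaves today, the function name `has_fasta_record` is hypothetical, and the other formats accepted via any2fasta would need a different check.

```python
import os
import tempfile

def has_fasta_record(path):
    """Return True if the plain-text file contains at least one '>' header line."""
    with open(path, "r") as handle:
        return any(line.startswith(">") for line in handle)

# Build one empty file and one file with a single made-up record.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.fna")
    good = os.path.join(tmp, "good.fna")
    open(empty, "w").close()
    with open(good, "w") as fh:
        fh.write(">seq1\nACGT\n")
    results = (has_fasta_record(empty), has_fasta_record(good))

print(results)
```

A caller could skip the isPcr step for inputs that fail this check and emit the row of "-" values directly.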
