vishnuraghuram94 / agrvate Goto Github PK

Rapid identification of Staphylococcus aureus agr locus type and agr operon variants.

License: MIT License

Shell 1.27% HTML 98.73%

agr-operon agr-group mummer aureus usearch agr staphylococcus-aureus staphylococcus frameshifts alleles

agrvate's People

Forkers

rpetit3

agrvate's Issues

option to force overwrite?

I'm all for not overwriting existing results! Good job!

I think it would be useful to add an option that lets users forcibly overwrite existing outputs. Here's an example where it would be useful:

ls -lh
total 2.9M
drwxr-xr-x 2 rpetit users 4.0K Jan  6 16:35 empty_folder
-rw-r--r-- 1 rpetit users 2.9M Jan  6 16:31 ERX204841.fna


agrvate ERX204841.fna empty_folder/
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
Unable to agr type

Error: File existence/permissions problem in trying to open query file empty_folder//agrD_hmm.hmm.
HMM file empty_folder//agrD_hmm.hmm not found (nor an .h3m binary of it)

grep: ERX204841-results/ERX204841-hmm.tab: No such file or directory
Unable to find agrD
usearch11.0.667_i86linux32 not in path, cannot perform frameshift detection
 please download usearch11.0.667_i86linux32 from https://www.drive5.com/usearch/download.html
 
# Run failed but results folder was created, which is fine
ls -lh
total 2.9M
drwxr-xr-x 2 rpetit users 4.0K Jan  6 16:35 empty_folder
-rw-r--r-- 1 rpetit users 2.9M Jan  6 16:31 ERX204841.fna
drwxr-xr-x 2 rpetit users 4.0K Jan  6 16:42 ERX204841-results

# fixed database path
agrvate ERX204841.fna
Results directory already exists, cannot overwrite

# manually remove folder
rm -rf ERX204841-results/

agrvate ERX204841.fna
agr typing successful, gp1
usearch found
agr operon extraction successful
Snippy successful
No frameshifts found

Alternative

agrvate ERX204841.fna -f 
found ERX204841-results/, but '-f' given will delete results folder
agr typing successful, gp1
usearch found
agr operon extraction successful
Snippy successful
No frameshifts found
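A minimal sketch of how such a flag might be handled. The helper name `prepare_outdir` and its `force` argument are hypothetical and are not taken from the agrvate script; the messages mirror the ones shown above.

```shell
# Hypothetical helper: create the results directory, refusing to overwrite an
# existing one unless force=1 was requested (for example via a '-f' flag).
prepare_outdir() {
    local outdir="$1" force="${2:-0}"
    if [ -d "$outdir" ]; then
        if [ "$force" -eq 1 ]; then
            echo "found ${outdir}/, but '-f' given, will delete results folder" 1>&2
            rm -rf -- "$outdir"
        else
            echo "Results directory already exists, cannot overwrite" 1>&2
            return 1
        fi
    fi
    mkdir -p -- "$outdir"
}
```

A caller would use something like `prepare_outdir "ERX204841-results" "$force" || exit 1`, so the default behaviour (never overwrite) is unchanged.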

Frameshift position is not consistent between the snippy output and the BLAST result

Thank you for this great tool for detecting mutations in the agr operon!

I recently used this tool on some MRSA genomes and found that, for some strains, the position of the frameshift differs between the snippy output (snps.tab) and the BLAST result (the reference agr sequence aligned against the agr operon extracted by agrvate).

For instance, snippy reported the frameshift in agrC of strain A as c.487delT (p.Tyr163fs), whereas the BLAST result placed the frameshift in agrC at c.481delT. I guess this might be caused by the repeated T bases?

Could you please explain this discrepancy?

Best regards,
Tonny_z
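The repeated-base hypothesis can be illustrated with a small sketch. Deleting any single base from a homopolymer run yields the same sequence, so the reported position depends on each tool's convention (HGVS nomenclature, for example, places such a deletion at the most 3' position of the run). The sequence below is hypothetical; whether positions 481 to 487 of agrC really form a run of T is an assumption that has not been checked here.

```shell
# Delete the base at 1-based position $2 from sequence $1 and print the result.
delete_base() {
    printf '%s\n' "$1" | awk -v p="$2" '{ print substr($0, 1, p - 1) substr($0, p + 1) }'
}

# Hypothetical sequence with a run of seven T at positions 3-9.
seq="GATTTTTTTCA"

delete_base "$seq" 3   # delete the first T of the run
delete_base "$seq" 9   # delete the last T of the run
```

Both calls print the same sequence, which is why two tools can report different coordinates for what is the same one-base deletion.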

check existence of individual database files

Currently only the existence of the database directory is checked. It would be useful to check that each required database file exists before any processing occurs.

Steps to repeat:

mkdir empty_folder
agrvate ERX204841.fna empty_folder/
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
Unable to agr type

Error: File existence/permissions problem in trying to open query file empty_folder//agrD_hmm.hmm.
HMM file empty_folder//agrD_hmm.hmm not found (nor an .h3m binary of it)

grep: ERX204841-results/ERX204841-hmm.tab: No such file or directory
Unable to find agrD
usearch11.0.667_i86linux32 not in path, cannot perform frameshift detection
 please download usearch11.0.667_i86linux32 from https://www.drive5.com/usearch/download.html
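One way such a check could look, as a sketch only. The helper `check_db_files` is hypothetical; `agrD_hmm.hmm` is the only file name taken from the error output above, and any other file names would have to come from the agrvate script itself.

```shell
# Hypothetical helper: verify that every required database file exists and is
# readable before any processing starts. The first argument is the database
# directory; the remaining arguments are the file names to look for.
check_db_files() {
    local db_dir="${1%/}" missing=0 f
    shift
    for f in "$@"; do
        if [ ! -r "${db_dir}/${f}" ]; then
            echo "Missing database file: ${db_dir}/${f}" 1>&2
            missing=1
        fi
    done
    return "$missing"
}
```

Called as `check_db_files empty_folder/ agrD_hmm.hmm || exit 1`, the run would stop with a single clear message instead of the cascade of `cat` and `grep` errors shown above.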

thoughts on adding column headers to tabbed outputs?

What do you think about adding the column names to the tabbed outputs? I think it would be useful to users.

Something like:

cat ERX204841-results/ERX204841-summary.tab
ERX204841       gp1     13      1       s       0

cat ERX204841-results/ERX204841-summary.tab
filename	agr_group	match_score	canonical_agr	groups_found	frameshifts
ERX204841       gp1     13      1       s       0
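A sketch of how the header could be emitted, assuming the summary is a single tab-separated row. The function name `write_summary_with_header` is hypothetical; the column names are the ones proposed above.

```shell
# Print the proposed header line, then the existing summary row(s) unchanged.
write_summary_with_header() {
    local summary="$1"
    printf 'filename\tagr_group\tmatch_score\tcanonical_agr\tgroups_found\tframeshifts\n'
    cat -- "$summary"
}
```

Using `printf` with explicit `\t` keeps the header tab-delimited, so tools such as `cut -f` or `column -t` treat the header and data rows consistently.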

why choose --minqual 1 --mincov 2?

Thank you for this great tool for detecting mutations in the agr operon!

I noticed that --minqual and --mincov are set to 1 and 2, respectively. Why did you choose these values?

Best regards,
Tonny_z

license needed

Yo! Before putting on Bioconda you will need to pick a license.

update citation to publication instead of preprint

Hi, thanks for developing this tool. I just noticed that the peer-reviewed publication is out and thought you might want to link to it in the main README.md instead of the preprint.

https://journals.asm.org/doi/10.1128/spectrum.01334-21

check for usearch before processing

The check for usearch currently happens only after the earlier processing steps have already run:

https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L188-L200

I think it would be useful to check this at the start before processing.

You could then give users the commands to download it, like you have in the README. Instead of the cp usearch /usr/bin, you could use the same path that agrvate is in. Something like:

...
script_dir=$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )
...

echo "usearch11.0.667_i86linux32 not in path, cannot perform frameshift detection" 1>&2
echo " please download usearch11.0.667_i86linux32 from https://www.drive5.com/usearch/download.html" 1>&2
echo "Example commands:" 1>&2
echo "wget usearch" 1>&2
echo "gunzip usearch" 1>&2
echo "chmod 755 usearch" 1>&2
echo "cp usearch ${script_dir}/usearch11.0.667_i86linux32" 1>&2
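The early check itself could be sketched as below. The helper `require_tool` is hypothetical; it relies only on the standard `command -v` builtin and is meant to be called before any typing or extraction step runs.

```shell
# Hypothetical helper: succeed only if the named executable can be found on PATH.
require_tool() {
    local tool="$1"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "${tool} not in path" 1>&2
        return 1
    fi
}
```

At the top of the script this might be used as `require_tool usearch11.0.667_i86linux32 || exit 1`, followed by the download hints above.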

help and version parameters don't work with a fasta file

Steps to repeat:

agrvate ERX204841.fna -h
-h does not exist

agrvate ERX204841.fna -v
-v does not exist

Expected behavior:

agrvate ERX204841.fna -h

AgrVATE: Agr Variant Assessment & Typing Engine

VERSION: agrvate v1.0

USAGE:   agrvate <fasta file>
         <fasta_file> <path/to/agrvate_databases> #Not required if installed using Conda

FLAGS:
  -h     Print this help message
  -v     Print version

SOURCE:  https://github.com/VishnuRaghuram94/AgrVATE
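A sketch of argument handling that would give this behaviour, assuming the flags should be honoured wherever they appear on the command line. The function `detect_action` is hypothetical and is not how the agrvate script currently parses its arguments.

```shell
# Hypothetical: scan every argument and report which action was requested.
# Prints "help", "version", or "run"; the first -h or -v encountered wins.
detect_action() {
    local action="run" arg
    for arg in "$@"; do
        case "$arg" in
            -h) action="help"; break ;;
            -v) action="version"; break ;;
        esac
    done
    printf '%s\n' "$action"
}
```

The main script could then branch on `$(detect_action "$@")` before treating any argument as a FASTA file or database path.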

question about agr type argument in biorxiv paper

Is it true that all isolates within a clonal complex should have the same agr type? I have seen this trend at the MLST level but not at the clonal complex level. For example, ST188 belongs to CC1 but has gp1 (not gp3 as shown in your paper).

Many thanks,

dots in sample name cause name to get truncated to first dot

I used S.201202.00885.fna as input and the results were written to S-results:

Processing S.201202.00885.fna ...
  /local/home/rpetit/miniconda3/envs/staph-typer2/bin/agrvate_databases/ is valid
  agr typing successful, gp3
  Mummer successful
  Unable to find agr operon, check S-results/S-mummer-log.txt
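This kind of truncation typically happens when a prefix is cut at the first dot rather than the last. The sketch below shows the difference using shell parameter expansion; the function name `sample_prefix` is hypothetical, and whether agrvate derives its prefix in exactly this way is an assumption.

```shell
# Derive an output prefix by removing the directory part and only the final
# extension. For comparison:
#   ${name%%.*} cuts at the FIRST dot: S.201202.00885.fna -> S
#   ${name%.*}  cuts at the LAST dot:  S.201202.00885.fna -> S.201202.00885
sample_prefix() {
    local name="${1##*/}"
    printf '%s\n' "${name%.*}"
}
```

With this approach the results for the example above would be written to S.201202.00885-results.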

snippy not running

Column 7 of the output returns "fail", while all other columns return "pass". I am running in MUMmer mode because I cannot download USEARCH on my current macOS version.
Snippy is installed correctly (version 3.1).
Could you please advise?

input for DREME

Hello,

Is it possible to adapt your tool for other staph species? If so, what would be the appropriate input to DREME?

Best wishes,

invalid inputs return exit code 0, and print to STDOUT

Very minor, but all caught errors return exit code 0 (no error). The error messages are also printed to STDOUT rather than STDERR.

agrvate g
Invalid input

AgrVATE: Agr Variant Assessment & Typing Engine

VERSION: agrvate v1.0

USAGE:   agrvate <fasta file>
         <fasta_file> <path/to/agrvate_databases> #Not required if installed using Conda

FLAGS:
  -h     Print this help message
  -v     Print version

SOURCE:  https://github.com/VishnuRaghuram94/AgrVATE

echo $?
0

Lines affected:
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L34
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L41-L72
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L84-L88
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L91-L95
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L191-L200
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L206-L225

For the exits that you think should be an error, you can change them to exit 1.

For messages that you think should go to STDERR, I think you can do something like echo "my error message" 1>&2
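Both suggestions can be combined in one small helper, sketched here under the assumption that the script is free to define its own functions. The name `fail` is hypothetical.

```shell
# Hypothetical helper: print a message to STDERR and return a non-zero status
# (1 by default, or the code given as the second argument).
fail() {
    echo "$1" 1>&2
    return "${2:-1}"
}
```

In the script, an invalid-input branch could then read `fail "Invalid input" || exit 1`, so that `echo $?` reports 1 and nothing is written to STDOUT.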

feature request - allow for user to define output prefix and/or output directory

It would be nice if the user could define, at a minimum, the output filename prefix with an option like --outprefix <string> and, even better, the output directory with an option like --outdir <directory>. This would allow for more flexibility and predictability of output filenames, which is useful when incorporating agrvate into workflows.

AFAIK agrvate currently uses the filename prefix to name the output directory and resulting file names. Looks to me like it is cutting on the period, but perhaps I'm misunderstanding the code here: https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L138

$ agrvate -i asdfasdf12345.fasta -m
Processing asdfasdf12345.fasta ...
/usr/local/bin/agrvate_databases/ is valid
agr typing successful, gp1
Mummer successful
Extracting agr operon from mummer output
Mummer alignment is contiguous
agr operon extraction successful
Snippy successful
No frameshifts found

$ tree -L 1 asdfasdf12345-results/
asdfasdf12345-results/
├── asdfasdf12345-agr_gp.tab
├── asdfasdf12345-agr_operon.fna
├── asdfasdf12345-agr_operon_frameshifts.tab
├── asdfasdf12345-blastn_log.txt
├── asdfasdf12345-mummer
├── asdfasdf12345-mummer-log.txt
├── asdfasdf12345-snippy
├── asdfasdf12345-snippy-log.txt
└── asdfasdf12345-summary.tab
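A sketch of how the two requested options might resolve to a results directory while keeping the current naming as the default. The function `resolve_results_dir` is hypothetical, as are the `--outprefix` and `--outdir` values it receives; only the `<prefix>-results` pattern is taken from the output above.

```shell
# Hypothetical: choose the results directory from the input FASTA plus optional
# --outprefix / --outdir values (an empty string means "option not given").
resolve_results_dir() {
    local fasta="$1" prefix="$2" outdir="$3" name
    name="${fasta##*/}"
    [ -n "$prefix" ] || prefix="${name%.*}"
    [ -n "$outdir" ] || outdir="${prefix}-results"
    printf '%s\n' "$outdir"
}
```

With neither option given, `resolve_results_dir asdfasdf12345.fasta "" ""` yields asdfasdf12345-results, matching the existing behaviour for this input.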

novel agr types?

Hello,

If the output of AgrVATE indicates that there are missense mutations in AgrD, does that indicate that my isolate has a novel Agr type? My understanding is that there should be no mutations in AgrD.

Many thanks for your wonderful tool,
