vishnuraghuram94 / agrvate
Rapid identification of Staphylococcus aureus agr locus type and agr operon variants.
License: MIT License
I'm all for not overwriting existing results! Good job!
I think it would be useful to add an option that lets users forcibly overwrite existing outputs. Here's an example where it would be useful:
ls -lh
total 2.9M
drwxr-xr-x 2 rpetit users 4.0K Jan 6 16:35 empty_folder
-rw-r--r-- 1 rpetit users 2.9M Jan 6 16:31 ERX204841.fna
agrvate ERX204841.fna empty_folder/
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
Unable to agr type
Error: File existence/permissions problem in trying to open query file empty_folder//agrD_hmm.hmm.
HMM file empty_folder//agrD_hmm.hmm not found (nor an .h3m binary of it)
grep: ERX204841-results/ERX204841-hmm.tab: No such file or directory
Unable to find agrD
usearch11.0.667_i86linux32 not in path, cannot perform frameshift detection
please download usearch11.0.667_i86linux32 from https://www.drive5.com/usearch/download.html
# Run failed but results folder was created, which is fine
ls -lh
total 2.9M
drwxr-xr-x 2 rpetit users 4.0K Jan 6 16:35 empty_folder
-rw-r--r-- 1 rpetit users 2.9M Jan 6 16:31 ERX204841.fna
drwxr-xr-x 2 rpetit users 4.0K Jan 6 16:42 ERX204841-results
# fixed database path
agrvate ERX204841.fna
Results directory already exists, cannot overwrite
# manually remove folder
rm -rf ERX204841-results/
agrvate ERX204841.fna
agr typing successful, gp1
usearch found
agr operon extraction successful
Snippy successful
No frameshifts found
Alternative
agrvate ERX204841.fna -f
found ERX204841-results/, but '-f' given will delete results folder
agr typing successful, gp1
usearch found
agr operon extraction successful
Snippy successful
No frameshifts found
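A minimal sketch of how the proposed '-f' behaviour could be handled. The function name prepare_results_dir and its second argument are hypothetical and not taken from the agrvate script; only the two messages are copied from the output above.

```shell
#!/usr/bin/env bash
# Sketch only: prepare_results_dir and its "force" argument are hypothetical
# names used for illustration, not part of the agrvate script.

prepare_results_dir() {
    local results_dir="$1"
    local force="${2:-false}"

    # Refuse to act on an empty path so rm -rf can never run without a target.
    [ -n "$results_dir" ] || return 1

    if [ -d "$results_dir" ]; then
        if [ "$force" = "true" ]; then
            echo "found ${results_dir}/, but '-f' given will delete results folder" 1>&2
            rm -rf -- "$results_dir"
        else
            echo "Results directory already exists, cannot overwrite" 1>&2
            return 1
        fi
    fi
    mkdir -p -- "$results_dir"
}

# Example (not executed here):
#   prepare_results_dir "ERX204841-results" true
```

Keeping the default as "refuse to overwrite" preserves the current behaviour, so only users who pass the flag lose existing results.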
Thank you for this great tool to detect the mutations in agr operon!
Recently, I used this tool on some MRSA genomes and found that, for some strains, the frameshift position reported by snippy (snps.tab) was not consistent with the blast result (reference agr sequence versus the agr operon extracted by agrvate).
For instance, snippy reported the frameshift in agrC of strain A as c.487delT (p.Tyr163fs), whereas the blast result placed the frameshift in agrC at c.481delT. I guess this might be caused by the repeated T base?
Please could you explain this phenomenon?
Best regards,
Tonny_z
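The guess about the repeated T can be checked with a small sketch. Deleting any single base inside a run of identical bases produces the same sequence, so a variant caller and a pairwise aligner may report different ends of the run purely by convention. The sequence below is made up for illustration and is not the agrC sequence; delete_base is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: the sequence is invented and delete_base is a hypothetical helper.

delete_base() {
    # Print sequence $1 with the base at 1-based position $2 removed.
    awk -v s="$1" -v p="$2" 'BEGIN { print substr(s, 1, p - 1) substr(s, p + 1) }'
}

seq="GCATTTTTTTAC"   # run of seven T's at positions 4-10

first=$(delete_base "$seq" 4)    # delete the leftmost T of the run
last=$(delete_base "$seq" 10)    # delete the rightmost T of the run

if [ "$first" = "$last" ]; then
    echo "Both deletions give the same sequence: $first"
fi
```

A difference of six positions, as between c.481 and c.487, is what a run of seven identical bases would produce, although the thread does not confirm that this is the cause here.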
Currently only the existence of the database directory is checked. It would be useful if each database file used is checked before any processing occurs.
Steps to repeat:
mkdir empty_folder
agrvate ERX204841.fna empty_folder/
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
cat: ERX204841-results/ERX204841-agr_gp.tab: No such file or directory
Unable to agr type
Error: File existence/permissions problem in trying to open query file empty_folder//agrD_hmm.hmm.
HMM file empty_folder//agrD_hmm.hmm not found (nor an .h3m binary of it)
grep: ERX204841-results/ERX204841-hmm.tab: No such file or directory
Unable to find agrD
usearch11.0.667_i86linux32 not in path, cannot perform frameshift detection
please download usearch11.0.667_i86linux32 from https://www.drive5.com/usearch/download.html
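One possible shape for such a check, as a sketch. The function check_database_files is hypothetical, and the only file name taken from the output above is agrD_hmm.hmm; the full list of required files would have to come from the agrvate script itself.

```shell
#!/usr/bin/env bash
# Sketch only: check_database_files is a hypothetical helper. The caller
# supplies the list of required file names; only agrD_hmm.hmm is known
# from the error output in this report.

check_database_files() {
    local db_dir="$1"
    shift
    local missing=0 f
    for f in "$@"; do
        # ${db_dir%/} strips one trailing slash so paths do not contain "//".
        if [ ! -r "${db_dir%/}/${f}" ]; then
            echo "Missing or unreadable database file: ${db_dir%/}/${f}" 1>&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not executed here):
#   check_database_files "empty_folder/" agrD_hmm.hmm || exit 1
```

Reporting every missing file before returning lets the user fix the database directory in one pass rather than one error at a time.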
What do you think about adding the column names to the tabbed outputs? I think it would be useful to users.
Something like:
cat ERX204841-results/ERX204841-summary.tab
ERX204841 gp1 13 1 s 0
cat ERX204841-results/ERX204841-summary.tab
filename agr_group match_score canonical_agr groups_found frameshifts
ERX204841 gp1 13 1 s 0
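A sketch of how the header could be written alongside the data row. The column names are the ones suggested above; write_summary is a hypothetical helper, not a function in agrvate.

```shell
#!/usr/bin/env bash
# Sketch only: write_summary is a hypothetical helper. It expects exactly
# six values, one per column suggested in this issue.

write_summary() {
    # Header row, tab-separated.
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
        filename agr_group match_score canonical_agr groups_found frameshifts
    # Data row, tab-separated, in the same column order.
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$@"
}

write_summary ERX204841 gp1 13 1 s 0
```

Workflows that concatenate many summary files would then need to skip the first line of each, for example with tail -n +2.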
Thank you for this great tool for detecting mutations in the agr operon!
I noticed that the values for --minqual and --mincov are 1 and 2, respectively. Why did you choose these values?
Best regards,
Tonny_z
Yo! Before putting on Bioconda you will need to pick a license.
Hi, thanks for developing this tool. I just noticed that the peer-reviewed publication is out and thought you might want to link to that in the main README.md instead of the pre-print.
The check for usearch happens after processing would have occurred
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L188-L200
I think it would be useful to check this at the start before processing.
You could then give users the download commands, like you have in the README. Instead of the cp usearch /usr/bin step, you could use the same path that agrvate is in. Something like:
...
script_dir=$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )
...
echo "usearch11.0.667_i86linux32 not in path, cannot perform frameshift detection" 1>&2
echo "please download usearch11.0.667_i86linux32 from https://www.drive5.com/usearch/download.html" 1>&2
echo "Example commands:" 1>&2
echo "wget usearch" 1>&2
echo "gunzip usearch" 1>&2
echo "chmod 755 usearch" 1>&2
echo "cp usearch ${script_dir}/usearch11.0.667_i86linux32" 1>&2
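The early check itself could look roughly like this. require_tool is a hypothetical helper; command -v is the standard shell way to test whether an executable is on the PATH, and the binary name is the one quoted in the message above.

```shell
#!/usr/bin/env bash
# Sketch only: require_tool is a hypothetical helper, not part of agrvate.

require_tool() {
    local tool="$1"
    # command -v succeeds only if the shell can resolve the name.
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "${tool} not in path" 1>&2
        return 1
    fi
}

# Near the top of the script, before any processing (not executed here):
#   require_tool usearch11.0.667_i86linux32 || exit 1
```

Running this before typing starts means a missing dependency stops the run before a partial results folder is created.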
Steps to repeat:
agrvate ERX204841.fna -h
-h does not exist
agrvate ERX204841.fna -v
-v does not exist
Expected behavior:
agrvate ERX204841.fna -h
AgrVATE: Agr Variant Assessment & Typing Engine
VERSION: agrvate v1.0
USAGE: agrvate <fasta file>
<fasta_file> <path/to/agrvate_databases> #Not required if installed using Conda
FLAGS:
-h Print this help message
-v Print version
SOURCE: https://github.com/VishnuRaghuram94/AgrVATE
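A sketch of flag handling that would give the expected behaviour even when the flag follows the FASTA file, as in the steps above. The function names are hypothetical; the help text is copied from the expected output in this issue.

```shell
#!/usr/bin/env bash
# Sketch only: print_version, print_usage and handle_info_flags are
# hypothetical names. The help text is taken from this issue.

print_version() {
    echo "agrvate v1.0"
}

print_usage() {
    cat <<'EOF'
AgrVATE: Agr Variant Assessment & Typing Engine
VERSION: agrvate v1.0
USAGE: agrvate <fasta file>
<fasta_file> <path/to/agrvate_databases> #Not required if installed using Conda
FLAGS:
-h Print this help message
-v Print version
SOURCE: https://github.com/VishnuRaghuram94/AgrVATE
EOF
}

handle_info_flags() {
    # Scan every argument so that -h or -v works in any position.
    local arg
    for arg in "$@"; do
        case "$arg" in
            -h) print_usage; return 0 ;;
            -v) print_version; return 0 ;;
        esac
    done
    return 1   # no informational flag seen; the caller continues normally
}

# In the main script (not executed here):
#   handle_info_flags "$@" && exit 0
```

Scanning all arguments is used here instead of getopts because getopts stops at the first non-option argument, which in these examples is the FASTA file.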
Is it true that the agr type should be the same within every clonal complex? I have seen this trend at the MLST level but not at the clonal complex level. For example, ST188 is CC1 but has gp1 (not gp3, as shown in your paper).
Many thanks,
I input S.201202.00885.fna and the results were written to S-results
Processing S.201202.00885.fna ...
/local/home/rpetit/miniconda3/envs/staph-typer2/bin/agrvate_databases/ is valid
agr typing successful, gp3
Mummer successful
Unable to find agr operon, check S-results/S-mummer-log.txt
Column 7 of the output returns "fail". All other outputs return "pass". I am running with mummer because I cannot download usearch on my current macOS version.
Snippy is installed correctly as version 3.1
Please could you advise?
Hello,
Is it possible to adapt your tool for other staph species? If so, what would be the appropriate input to DREME?
Best wishes,
Very minor, but errors caught all return exit code 0 (no error). The error messages are also printed to STDOUT.
agrvate g
Invalid input
AgrVATE: Agr Variant Assessment & Typing Engine
VERSION: agrvate v1.0
USAGE: agrvate <fasta file>
<fasta_file> <path/to/agrvate_databases> #Not required if installed using Conda
FLAGS:
-h Print this help message
-v Print version
SOURCE: https://github.com/VishnuRaghuram94/AgrVATE
echo $?
0
Lines affected:
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L34
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L41-L72
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L84-L88
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L91-L95
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L191-L200
https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L206-L225
For the exits that you think should be an error, you can change them to exit 1
For messages you think should go to STDERR, I think you can do something like echo "my error message" 1>&2
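Both suggestions can be combined in a small helper, sketched below. The names err and validate_input are hypothetical; "Invalid input" is the message shown in the output above.

```shell
#!/usr/bin/env bash
# Sketch only: err and validate_input are hypothetical helpers.

err() {
    # Send the message to STDERR so STDOUT stays clean for real output.
    echo "$*" 1>&2
}

validate_input() {
    local fasta="$1"
    if [ ! -f "$fasta" ]; then
        err "Invalid input"
        return 1
    fi
}

# In the main script (not executed here):
#   validate_input "$1" || exit 1
```

With a non-zero exit code, a calling workflow can detect the failure from echo $? without parsing the messages.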
It would be nice for the user to be able to define, at a minimum, the output filename prefix with an option like --outprefix <string> and, even better, the output directory with an option like --outdir <directory>. It would allow for more flexibility and predictability of output filenames, which is useful for incorporating agrvate into workflows.
AFAIK agrvate currently uses the filename prefix to name the output directory and resulting file names. Looks to me like it is cutting on the period, but perhaps I'm misunderstanding the code here: https://github.com/VishnuRaghuram94/AgrVATE/blob/main/agrvate#L138
$ agrvate -i asdfasdf12345.fasta -m
Processing asdfasdf12345.fasta ...
/usr/local/bin/agrvate_databases/ is valid
agr typing successful, gp1
Mummer successful
Extracting agr operon from mummer output
Mummer alignment is contiguous
agr operon extraction successful
Snippy successful
No frameshifts found
$ tree -L 1 asdfasdf12345-results/
asdfasdf12345-results/
├── asdfasdf12345-agr_gp.tab
├── asdfasdf12345-agr_operon.fna
├── asdfasdf12345-agr_operon_frameshifts.tab
├── asdfasdf12345-blastn_log.txt
├── asdfasdf12345-mummer
├── asdfasdf12345-mummer-log.txt
├── asdfasdf12345-snippy
├── asdfasdf12345-snippy-log.txt
└── asdfasdf12345-summary.tab
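A sketch of how a prefix could be derived with an optional override, assuming the --outprefix value has already been parsed into a variable. derive_prefix is a hypothetical helper; it strips only the final extension, so file names that contain several periods keep the rest of their name.

```shell
#!/usr/bin/env bash
# Sketch only: derive_prefix is a hypothetical helper, and the second
# argument stands in for a value parsed from the proposed --outprefix option.

derive_prefix() {
    local fasta="$1" outprefix="${2:-}"
    # A user-supplied prefix always wins.
    if [ -n "$outprefix" ]; then
        printf '%s\n' "$outprefix"
        return 0
    fi
    local base
    base=$(basename -- "$fasta")
    # ${base%.*} removes the shortest trailing ".something", i.e. the last extension.
    printf '%s\n' "${base%.*}"
}

derive_prefix asdfasdf12345.fasta          # prints asdfasdf12345
derive_prefix S.201202.00885.fna           # prints S.201202.00885
derive_prefix S.201202.00885.fna sampleA   # prints sampleA
```

By contrast, cutting at the first period (${base%%.*}) would reduce S.201202.00885.fna to S, which is the kind of collision an explicit prefix option avoids.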
Hello,
If the output of AgrVATE indicates that there are missense mutations in AgrD, does that indicate that my isolate has a novel Agr type? My understanding is that there should be no mutations in AgrD.
Many thanks for your wonderful tool,