Hello!
First of all, I would like to thank you for making AutoTax publicly available. I am trying to reproduce the output you generated with your original script, which uses USEARCH. At the moment I don’t have the 64-bit version of USEARCH available. When I ran your script and it reached the “Finding taxonomy of best hit in SILVA database” step, the 32-bit process exceeded its memory limit. So I replaced the command with its VSEARCH counterpart:
For searchTaxDB:
$ vsearch -usearch_global $input -db $database -maxaccepts 0 -maxrejects 0 -top_hits_only -strand plus -id 0 -blast6out $output -threads $MAX_THREADS
For searchTaxDB_typestrain:
$ vsearch -usearch_global $input -db $database -maxaccepts 0 -maxrejects 0 -strand plus -id 0.987 -blast6out $output -threads $MAX_THREADS
The script ran smoothly, but at the mergeTaxonomy step I got this output:
Matching unique query sequences: 6 of 85 (7.06%)
[2021-05-27 13:00:24]: Clustering FLASV's at Species level (98.7% identity)
[2021-05-27 13:00:24]: Clustering FLASV's at Genus level (94.5% identity)
[2021-05-27 13:00:24]: Clustering FLASV's at Family level (86.5% identity)
[2021-05-27 13:00:24]: Clustering FLASV's at Order level (82.0% identity)
[2021-05-27 13:00:25]: Clustering FLASV's at Class level (78.5% identity)
[2021-05-27 13:00:25]: Clustering FLASV's at Phylum level (75.0% identity)
[2021-05-27 13:00:25]: Merging and outputting taxonomy...
No replacement file found, skipping...
Error in S4Vectors:::normarg_names(value, class(x), length(x)) :
  attempt to set too many names (89) on GroupedIRanges object of length 85
Calls: names<- -> names<- -> names<- -> names<- ->
Execution halted
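The R error reports 89 names for 85 sequences, which would fit the blast6out table having more than one row for some queries. As far as I understand, VSEARCH’s `--top_hits_only` writes every hit tied for the best identity, while the script may expect exactly one row per query; I have not confirmed that this is the cause. A minimal sketch for checking this and, if needed, keeping only the first row per query (the function names and file names are my own placeholders, not part of AutoTax):

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of AutoTax.

# Print each query ID (blast6out column 1) that occurs on more than one row.
dup_queries() {
  cut -f1 "$1" | sort | uniq -d
}

# Keep only the first row seen for each query ID, preserving the original order.
first_hit_per_query() {
  awk -F'\t' '!seen[$1]++' "$1"
}
```

Usage would be something like `dup_queries output.b6` to list the affected queries, and `first_hit_per_query output.b6 > output_dedup.b6` to write a one-row-per-query table. Note that keeping the first row picks one of the tied hits arbitrarily, which could change the assigned taxonomy.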
These are the files in my /output folder (three are missing):
tax_complete.csv
tax_denovo.csv
tax_SILVA.csv
tax_slv_typestr.csv
tax_typestrains.csv
I suspected something was wrong with my own databases, so I switched to the ones you originally used (SILVA_138_SSURef_NR99) and shared here: https://figshare.com/articles/dataset/SILVA132_typestrains_in_ARB_UDB11_format/9994568?file=22790396
But when I run your script with your original databases, I get this error at the orient step:
[2021-05-27 13:56:11]: Checking for required R packages and installing if missing...
[2021-05-27 13:56:13]: - Orienting sequences...
usearch v11.0.667_i86linux32, 4.0Gb RAM (198Gb total), 128 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
00:03 2.8Gb 100.0% Rows
00:03 2.8Gb Reading pointers...done.
00:03 2.9Gb Reading db seqs...
usearch -orient test/example_data/5K_addon_FLASVs.fa -db /home/genomics/npeeters/AutoTax/refdatabases/SILVA_138_SSURef_NR99_tax_silvaK.udb -fastaout temp/fSSUs_oriented.fa -threads 1
---Fatal error---
ReadStdioFile failed, attempted 754773441 bytes, read 748130870 bytes, errno=0
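The fatal error says USEARCH tried to read 754,773,441 bytes but only got 748,130,870, i.e. 6,642,571 bytes short. That might mean the `.udb` file on disk is incomplete (for example after an interrupted download), although it could also be related to the 32-bit binary. A minimal sketch for checking the file: it prints the size in bytes and an MD5 checksum, which can be compared with the checksum on the download page if one is listed there. The function name is a placeholder, and it assumes GNU coreutils (`stat -c`, `md5sum`).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of AutoTax. Requires GNU coreutils.
# Prints "<size in bytes>  <md5>  <path>" so the file can be compared
# with the size and checksum of the original download.
check_db_file() {
  local f=$1
  if [ ! -f "$f" ]; then
    echo "missing: $f" >&2
    return 1
  fi
  local size sum
  size=$(stat -c %s "$f")
  sum=$(md5sum "$f" | cut -d' ' -f1)
  printf '%s  %s  %s\n' "$size" "$sum" "$f"
}
```

For the database in the log above, the call would be `check_db_file /home/genomics/npeeters/AutoTax/refdatabases/SILVA_138_SSURef_NR99_tax_silvaK.udb`.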
Have you encountered this problem before? Is there a way to fix this?
Thank you in advance!