joseph7e / assign-taxonomy-with-blast
Assign taxonomy with blast, can be used for qiime
Hello, thanks for this useful resource!
I am trying to run taxonomy_assignment_BLAST.py on Linux as follows:
python3 taxonomy_assignment_BLAST.py --length_percentage 70 --hits_to_consider 20 --blast_file ncbi_nt --output_dir dada_seqs_ITS2.fa expanded_ncbi_taxonomy.tsv
So the dada_seqs_ITS2.fa file is my sequence_file and the expanded_ncbi_taxonomy.tsv is my tax_file.
I get this error: the following arguments are required: tax_file
So it doesn't appear to be recognising the .tsv? Have I formatted the code wrong somehow?
Many thanks,
Amy
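The message "the following arguments are required: tax_file" is the standard argparse wording for a missing positional argument. Below is a minimal sketch of one way it can arise, assuming the script defines sequence_file and tax_file as positionals and --output_dir as an option that takes a value. The parser is a hypothetical stand-in built from the option names in the command above, not the script's real one. In the posted command, --output_dir is followed directly by the FASTA name, so argparse would take dada_seqs_ITS2.fa as the output directory and be left with only one positional.

```python
import argparse


class ParserError(Exception):
    """Raised instead of exiting, so the message can be inspected."""


class RaisingParser(argparse.ArgumentParser):
    def error(self, message):
        raise ParserError(message)


def build_parser():
    # Hypothetical stand-in for the script's parser; option names come from
    # the posted command, their types are assumptions.
    parser = RaisingParser(prog="taxonomy_assignment_BLAST.py")
    parser.add_argument("sequence_file")
    parser.add_argument("tax_file")
    parser.add_argument("--length_percentage", type=float)
    parser.add_argument("--hits_to_consider", type=int)
    parser.add_argument("--blast_file")
    parser.add_argument("--output_dir")
    return parser


def try_parse(argv):
    """Return (namespace, None) on success or (None, error message) on failure."""
    try:
        return build_parser().parse_args(argv), None
    except ParserError as exc:
        return None, str(exc)


as_posted = [
    "--length_percentage", "70", "--hits_to_consider", "20",
    "--blast_file", "ncbi_nt",
    "--output_dir", "dada_seqs_ITS2.fa", "expanded_ncbi_taxonomy.tsv",
]
# "results" is a made-up directory name used only for illustration.
with_output_dir = as_posted[:7] + ["results"] + as_posted[7:]

print(try_parse(as_posted)[1])        # error message for the posted command
print(try_parse(with_output_dir)[0])  # parsed namespace once --output_dir has a value
```

If this is what is happening, giving --output_dir its own value (or dropping the flag) should leave both positionals in place.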
Hello! I'm extremely new to bioinformatics/python/github, so I apologize if this is a super easy fix! Thank you in advance for any help/input.
I am trying to assign taxonomy to trnL (plant chloroplast) OTUs with blast results already run and output in the required format (outfmt '6 qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore staxids'). I'll provide an example of what my blast output looks like below in case that is the issue, but it's a tab-delimited file with the top 10 hits per OTU.
The code I use when running is:
python3 taxonomy_assignment_BLAST.py TLotus.fa ./ncbi_taxonomy/expanded_ncbi_taxonomy.tsv --blast_database IGNORE --blast_file TLotus_taxhits.txt
where TLotus.fa is the file with my OTU sequences (not used, since I already have a blast file),
expanded_ncbi_taxonomy.tsv is the taxonomy file, built as instructed (a preview of the file is attached), and
TLotus_taxhits.txt is my blast output with custom formatting.
#BLAST LINE : Otu4 94 NC_047481.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MN308055.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MK105463.1 100 94 1 94 52769 52862 1.52E-39 174 94 3512
ASSIGNING TAXONOMY FOR Otu4 total hits passing initial filters = 3
NC_047481.1 100 --> CAPTURED after percent sway filter
MN308055.1 100 --> CAPTURED after percent sway filter
MK105463.1 100 --> CAPTURED after percent sway filter
X100_1 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_2 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_3 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
Taxonomy Assignment for Otu4 = superkingdom:subkingdom:sub_subkingdom:kingdom:tmp1:tmp2:phylum:class:family:genus:species:tmp3:tmp4:tmp5:tmp6
It looks like it's reading my blast file correctly, as it's pulling the accession number and percent ID correctly. When I search the taxonomy file for the taxids in the blast file, they are present with normal taxonomy information.
Is this some sort of basic formatting error? I've tried looking over the python code but have little to no knowledge of python coding and cannot find the issue myself!
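One thing that can be checked without reading the script is whether every row of the BLAST table has the number of fields the stated outfmt implies. The format string in the post names 12 fields, while the logged "#BLAST LINE" rows appear to carry 13 whitespace-separated values. Whether that mismatch is related to the placeholder taxonomy is not established here; the sketch below only counts fields. The helper names are hypothetical and the sample row is copied from the log.

```python
OUTFMT = "6 qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore staxids"
EXPECTED_FIELDS = OUTFMT.split()[1:]  # drop the leading format number "6"


def count_fields(line):
    # The logged rows are space-separated and a real outfmt 6 file is
    # tab-separated, so split on any whitespace to handle both.
    return len(line.split())


def rows_with_unexpected_width(lines, expected=len(EXPECTED_FIELDS)):
    """Return (line_number, field_count) for rows whose width differs from expected."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        width = count_fields(line)
        if width != expected:
            bad.append((number, width))
    return bad


sample = "Otu4 94 NC_047481.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624"
print(len(EXPECTED_FIELDS), count_fields(sample))
```

To run the same check on a whole file, pass an open file handle, for example `rows_with_unexpected_width(open("TLotus_taxhits.txt"))`.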
Hi,
The link to the customized NCBI taxonomy database for this script (http://cobb.unh.edu/ncbi_taxonomy_expanded.tsv.gz) is broken and the file cannot be downloaded. Could you please give an example of a customized database? What format should a customized database have so that it can be used with the assign_taxonomy_with_blast.py script?
Many thanks
I've yet to try the script myself, but I'm curious how I could use it with output from a remote blast.
It looks like the input fasta and database files are compulsory inputs?
Hi Joe, I am trying to use the script to assign taxonomy to some 18S sequences. For this, I have generated my own blast database from a blast search of the first 10 hits for each of my sequences, and a tax file that connects my blast database sequences with their taxonomy.
My tax database has been curated to have 7 levels only k,p,c,o,f,g,s...
I tried to run the script and I get this message:
Traceback (most recent call last):
  File "taxonomy_assignment_BLAST_V2.py", line 340, in <module>
    best_level_taxonomy, blast_percent = Assign_Taxonomy(current_query, current_best_hits)
  File "taxonomy_assignment_BLAST_V2.py", line 288, in Assign_Taxonomy
    s.add(j[i])
IndexError: list index out of range
Could it be that the script is not recognizing the categories in my tax file?
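The failing statement, s.add(j[i]), indexes a list by position, which raises IndexError whenever i is past the end of j. Below is a sketch of how that could arise, assuming (a guess, not confirmed from the script) that the script walks a fixed number of rank positions across every hit's lineage and that a 7-level lineage is shorter than it expects. The 15-position width is taken from the rank header shown in the log output of another report on this page; the function names and the placeholder lineage are hypothetical.

```python
def collect_names_per_rank(lineages, n_ranks):
    """Gather the set of names seen at each rank position across all hits."""
    per_rank = []
    for i in range(n_ranks):
        s = set()
        for j in lineages:
            s.add(j[i])  # fails if any lineage has fewer than n_ranks entries
        per_rank.append(s)
    return per_rank


def short_lineages(lineages, n_ranks):
    """Return the lineages that have fewer than n_ranks entries."""
    return [lineage for lineage in lineages if len(lineage) < n_ranks]


# Placeholder 7-level lineage (k, p, c, o, f, g, s), for illustration only.
seven_levels = ["k1", "p1", "c1", "o1", "f1", "g1", "s1"]

try:
    collect_names_per_rank([seven_levels], n_ranks=15)
except IndexError as exc:
    print("IndexError:", exc)

# Listing the short lineages first shows which rows would trigger the error.
print(len(short_lineages([seven_levels], n_ranks=15)))
```

If the assumption holds, comparing the number of levels per row in the tax file against what the script expects would confirm or rule this out.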
  File "taxonomy_assignment_BLAST.py", line 196, in <module>
    do_blast(args.sequence_file,args.blast_database,blast_file)
  File "taxonomy_assignment_BLAST.py", line 188, in do_blast
    log_and_print(err)
  File "taxonomy_assignment_BLAST.py", line 76, in log_and_print
    log_file_name_handle.writelines(statement+'\n')
TypeError: can't concat str to bytes
No idea what to do.
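"TypeError: can't concat str to bytes" is what Python 3 raises for bytes + str, so statement is presumably a bytes object at that point, for example captured error output from a subprocess (an assumption; the traceback does not show where err comes from). Below is a minimal sketch of a logging helper that decodes bytes before writing. It reuses the function name from the traceback but is not the script's actual code, and it takes the log handle as a parameter so the example is self-contained.

```python
import io


def log_and_print(statement, log_handle):
    """Print a message and append it to the log, accepting str or bytes."""
    if isinstance(statement, bytes):
        # Undecodable bytes are replaced rather than raising another error.
        statement = statement.decode("utf-8", errors="replace")
    print(statement)
    log_handle.write(statement + "\n")


log = io.StringIO()  # stands in for the real log file handle
log_and_print(b"example stderr output", log)
log_and_print("plain text message", log)
```

Since log_and_print(err) is only reached from do_blast, the decoded message may also reveal the underlying BLAST error that the TypeError is currently hiding.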
Hello,
The taxonomy_assignment_BLAST_V2.py script works great for taxonomic assignment using default parameters. However, when I use it with the filtering option "length_percentage" set to 0.7 or higher, I get this error:
Traceback (most recent call last):
  File "taxonomy_assignment_BLAST_V2.py", line 363, in <module>
    best_level_taxonomy, blast_percent = Assign_Taxonomy(current_query, current_best_hits)
  File "taxonomy_assignment_BLAST_V2.py", line 248, in Assign_Taxonomy
    if max(top_hits) >= args.cutoff_family:
ValueError: max() arg is an empty sequence
For some reason, it works well with values < 0.7.
Once again, thanks heaps for your time in advance.
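max() raises ValueError when given an empty sequence, so top_hits was presumably empty at line 248, which would mean no hit survived the earlier filters for that query. Why 0.7 is the tipping point is not clear from the traceback. Below is a sketch of guarding the call, assuming top_hits is a list of percent identities; the function name, the cutoff value, and the None return for "no hits" are hypothetical.

```python
def passes_family_cutoff(top_hits, cutoff_family):
    """Return True/False for the cutoff test, or None when there are no hits."""
    best = max(top_hits, default=None)  # default avoids ValueError on empty input
    if best is None:
        return None
    return best >= cutoff_family


print(passes_family_cutoff([], 90.0))
print(passes_family_cutoff([97.5, 88.0], 90.0))
```

A caller could treat the None case as "unassigned" for that query rather than letting the run stop.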