joseph7e / assign-taxonomy-with-blast

Assign taxonomy with BLAST; can be used with QIIME.

Python 100.00%
taxonomy-assignment qiime blast ncbi-taxonomy ncbi-blast ncbi-database

assign-taxonomy-with-blast's Issues

taxonomy_assignment_BLAST.py: error: the following arguments are required: tax_file

Hello, thanks for this useful resource!

I am trying to run taxonomy_assignment_BLAST.py on Linux as follows:

python3 taxonomy_assignment_BLAST.py --length_percentage 70 --hits_to_consider 20 --blast_file ncbi_nt --output_dir dada_seqs_ITS2.fa expanded_ncbi_taxonomy.tsv

So the dada_seqs_ITS2.fa file is my sequence_file and the expanded_ncbi_taxonomy.tsv is my tax_file.

I get this error: the following arguments are required: tax_file

So it doesn't seem to be recognising the .tsv file. Have I formatted the command wrong somehow?

Many thanks,

Amy
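That message is produced by Python's argparse module. A likely explanation, assuming the script declares sequence_file and tax_file as positional arguments and --output_dir as an option that takes a value: --output_dir consumes dada_seqs_ITS2.fa as its value, so only expanded_ncbi_taxonomy.tsv is left over for the two positionals, and tax_file goes missing. The sketch below rebuilds a hypothetical parser from the argument names visible in this issue (it is not the script's actual parser) and compares the failing command line with one that lists the two files first.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only argument names that appear in this issue are
    # declared; the real script may define more options and other defaults.
    parser = argparse.ArgumentParser(prog="taxonomy_assignment_BLAST.py")
    parser.add_argument("sequence_file")  # positional 1: query FASTA
    parser.add_argument("tax_file")  # positional 2: expanded taxonomy TSV
    parser.add_argument("--output_dir")
    parser.add_argument("--blast_file")
    parser.add_argument("--length_percentage", type=float)
    parser.add_argument("--hits_to_consider", type=int)
    return parser


def try_parse(argv):
    """Return the parsed arguments, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage line and error to stderr, then exits.
        return None


# The command from the issue: --output_dir takes dada_seqs_ITS2.fa as its
# value, leaving a single file for the two positional arguments.
failing = ["--length_percentage", "70", "--hits_to_consider", "20",
           "--blast_file", "ncbi_nt", "--output_dir",
           "dada_seqs_ITS2.fa", "expanded_ncbi_taxonomy.tsv"]

# Same options, positionals moved first; "its2_taxonomy_out" is a
# hypothetical directory name so that --output_dir has its own value.
working = ["dada_seqs_ITS2.fa", "expanded_ncbi_taxonomy.tsv",
           "--length_percentage", "70", "--hits_to_consider", "20",
           "--blast_file", "ncbi_nt", "--output_dir", "its2_taxonomy_out"]

print(try_parse(failing))  # argparse reports missing tax_file; prints None
args = try_parse(working)
print(args.sequence_file, args.tax_file, args.output_dir)
```

In other words, under these assumptions the fix is to give --output_dir a directory name of its own and pass the sequence file and taxonomy file as the two bare arguments.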

issue with grabbing taxonomy for blast hits using blast input file

Hello! I'm extremely new to bioinformatics/Python/GitHub, so I apologize if this is a super easy fix! Thank you in advance for any help/input.

I am trying to assign taxonomy to trnL (plant chloroplast) OTUs using BLAST results I have already generated in the required format (outfmt '6 qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore staxids'). I've attached an example of my BLAST output below in case that is the issue; it's a tab-delimited file with the top 10 hits per OTU.

The command I run is:
python3 taxonomy_assignment_BLAST.py TLotus.fa ./ncbi_taxonomy/expanded_ncbi_taxonomy.tsv --blast_database IGNORE --blast_file TLotus_taxhits.txt

where TLotus.fa is my file of OTU sequences (not actually used, since I already have a BLAST file),
expanded_ncbi_taxonomy.tsv is the taxonomy file, built as instructed (a preview is attached), and
TLotus_taxhits.txt is my BLAST output with custom formatting.

The command runs without error, but the returned taxonomy is the default for when there is no match in the taxonomy file. For example, this is what the output looks like for every OTU:

#BLAST LINE : Otu4 94 NC_047481.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MN308055.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MK105463.1 100 94 1 94 52769 52862 1.52E-39 174 94 3512

ASSIGNING TAXONOMY FOR Otu4 total hits passing initial filters = 3
NC_047481.1 100 --> CAPTURED after percent sway filter
MN308055.1 100 --> CAPTURED after percent sway filter
MK105463.1 100 --> CAPTURED after percent sway filter

Providing consensus taxonomy up to level 14 : tmp6

X100_1 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_2 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_3 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6

Taxonomy Assignment for Otu4 = superkingdom:subkingdom:sub_subkingdom:kingdom:tmp1:tmp2:phylum:class:family:genus:species:tmp3:tmp4:tmp5:tmp6

It seems to be reading my BLAST file correctly, since it pulls the accession numbers and percent identities correctly. When I search the taxonomy file for the taxids in the BLAST file, they are present with normal taxonomy information.

Is this some sort of basic formatting error? I've tried looking over the Python code, but I have little to no knowledge of Python and can't find the issue myself!

ncbi_expanded_taxonomy_EXAMPLE.txt

TLotus_taxhits_EXAMPLE.txt
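A possible lead, offered as a guess rather than a diagnosis: the outfmt string above lists 12 column specifiers, but each #BLAST LINE in the log appears to contain 13 values, with an extra 94 between the bit score (174) and the taxid. If the script picks the taxid by column position, a shifted column could make every taxonomy lookup miss. This sketch, assuming a tab-delimited file, maps each row onto the declared specifiers and reports rows with the wrong column count; the function names are hypothetical.

```python
# Column specifiers copied from the outfmt string quoted in this issue.
OUTFMT_FIELDS = ("qseqid qlen sseqid pident length qstart qend "
                 "sstart send evalue bitscore staxids").split()


def parse_blast_row(line, fields=OUTFMT_FIELDS):
    """Map one tab-delimited BLAST row onto its outfmt column names.

    Raises ValueError if the row has a different number of columns than the
    outfmt string declares, since every later field would then be shifted.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} columns, found {len(values)}: {values}")
    return dict(zip(fields, values))


def check_blast_file(path, fields=OUTFMT_FIELDS):
    """Return (line_number, message) for every malformed row in a BLAST file."""
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                parse_blast_row(line, fields)
            except ValueError as err:
                problems.append((number, str(err)))
    return problems
```

For example, check_blast_file("TLotus_taxhits.txt") returns an empty list when every row matches the 12-column layout, and otherwise points at the offending lines.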

remote blast output

I've yet to try the script myself, but I'm curious how I could use it with output from a remote BLAST search.
It looks like the input FASTA and database files are compulsory inputs?
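Judging from the command in the trnL issue above, the script accepts a precomputed BLAST file through --blast_file, with IGNORE passed to --blast_database, while the FASTA and taxonomy files are still given as positional arguments. One option, then, might be to run the search remotely with BLAST+ in the same tabular format and hand the result to the script. This sketch only builds the two command lines without running them; the file names are hypothetical, and whether NCBI's remote service fills in the staxids column for your database is an assumption worth checking on a small test.

```python
import shlex

# Tabular layout used in the trnL issue above; assumed to be what the script
# expects when given a precomputed BLAST file.
OUTFMT = ("6 qseqid qlen sseqid pident length qstart qend sstart send "
          "evalue bitscore staxids")


def remote_blast_command(query_fasta, out_file, db="nt", max_hits=10):
    """BLAST+ blastn command that searches NCBI's servers via -remote."""
    return ["blastn", "-query", query_fasta, "-db", db, "-remote",
            "-outfmt", OUTFMT, "-max_target_seqs", str(max_hits),
            "-out", out_file]


def assign_taxonomy_command(query_fasta, tax_file, blast_file):
    """Run the script on an existing BLAST file, as in the trnL issue above."""
    return ["python3", "taxonomy_assignment_BLAST.py", query_fasta, tax_file,
            "--blast_database", "IGNORE", "--blast_file", blast_file]


# Hypothetical file names; print shell-quoted commands instead of running them.
blast_cmd = remote_blast_command("my_otus.fa", "remote_hits.tsv")
assign_cmd = assign_taxonomy_command("my_otus.fa", "expanded_ncbi_taxonomy.tsv",
                                     "remote_hits.tsv")
print(shlex.join(blast_cmd))
print(shlex.join(assign_cmd))
```

To execute them from Python, each list can be passed to subprocess.run(cmd, check=True), running the BLAST step first.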

using script on custom database

Hi Joe, I am trying to use the script to assign taxonomy to some 18S sequences. For this, I have generated my own BLAST database from the first 10 hits of a BLAST search for each of my sequences, plus a tax file that links the sequences in my BLAST database to their taxonomy.
My tax database has been curated to have only 7 levels: k, p, c, o, f, g, s.

I tried to run the script and got this message:
Traceback (most recent call last):
  File "taxonomy_assignment_BLAST_V2.py", line 340, in <module>
    best_level_taxonomy, blast_percent = Assign_Taxonomy(current_query, current_best_hits)
  File "taxonomy_assignment_BLAST_V2.py", line 288, in Assign_Taxonomy
    s.add(j[i])
IndexError: list index out of range

Could it be that the script is not recognizing the categories in my tax file?

blast_output_custom_format.txt
log_file.txt
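The IndexError at s.add(j[i]) would be consistent with the script walking a fixed number of taxonomy levels (the log in the trnL issue above reports consensus "up to level 14" over a 15-rank header) while each lineage from a 7-level tax file has only 7 entries; that is an inference from the traceback, not from the script's source. Here is a minimal consensus sketch (a hypothetical function, not the script's Assign_Taxonomy) that never indexes deeper than the shortest lineage.

```python
def consensus_lineage(lineages):
    """Return the ranks shared by every lineage, from the top level down.

    Each lineage is a list such as ["k", "p", "c", "o", "f", "g", "s"].
    The loop never goes deeper than the shortest lineage, which avoids the
    IndexError raised when a fixed level count exceeds a lineage's length.
    """
    if not lineages:
        return []
    consensus = []
    for level in range(min(len(lineage) for lineage in lineages)):
        names = {lineage[level] for lineage in lineages}
        if len(names) != 1:
            break  # the hits disagree at this rank, so stop here
        consensus.append(names.pop())
    return consensus


# Two hypothetical 7-level lineages that differ only at species level.
hits = [
    ["k_A", "p_B", "c_C", "o_D", "f_E", "g_F", "s_G1"],
    ["k_A", "p_B", "c_C", "o_D", "f_E", "g_F", "s_G2"],
]
print(";".join(consensus_lineage(hits)))  # k_A;p_B;c_C;o_D;f_E;g_F
```

Bounding the loop by the data rather than by a constant means the same code works for a 7-level curated file and a 15-level NCBI-derived one.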

Error when trying to run script

  File "taxonomy_assignment_BLAST.py", line 196, in <module>
    do_blast(args.sequence_file,args.blast_database,blast_file)
  File "taxonomy_assignment_BLAST.py", line 188, in do_blast
    log_and_print(err)
  File "taxonomy_assignment_BLAST.py", line 76, in log_and_print
    log_file_name_handle.writelines(statement+'\n')
TypeError: can't concat str to bytes

No idea what to do.
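The traceback suggests that err, the error output handed to log_and_print inside do_blast, is a bytes object: in Python 3, subprocess output is bytes unless text mode is requested, and bytes + str raises exactly this TypeError. This is inferred from the traceback alone. Below is a sketch of a log_and_print variant (hypothetical signature that takes the log handle as a parameter) that decodes bytes before logging, followed by the text=True alternative; a child Python process stands in for the BLAST call.

```python
import io
import subprocess
import sys


def log_and_print(statement, log_handle):
    """Print a message and append it to a log file handle.

    Hypothetical variant of the script's helper: subprocess output arrives
    as bytes unless text mode is requested, so decode it before adding "\n".
    """
    if isinstance(statement, bytes):
        statement = statement.decode("utf-8", errors="replace")
    print(statement)
    log_handle.write(statement + "\n")


# Stand-in for the BLAST call: a child process that writes to stderr.
CHILD = [sys.executable, "-c", "import sys; sys.stderr.write('warning from child')"]

log = io.StringIO()  # stands in for the real log file

# Without text mode, result.stderr is bytes and gets decoded by the helper.
result = subprocess.run(CHILD, capture_output=True)
log_and_print(result.stderr, log)

# Alternative: request str output up front with text=True.
text_result = subprocess.run(CHILD, capture_output=True, text=True)
log_and_print(text_result.stderr, log)
```

Either change on its own would avoid the concatenation error; decoding inside the helper also protects any other caller that passes bytes.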

error with length percentage cutoff using 0.7 and higher

Hello,

The taxonomy_assignment_BLAST_V2.py script works great for taxonomic assignment using default parameters. However, when I use the filtering option "length_percentage" with values of 0.7 or higher, I get this error:

Traceback (most recent call last):
  File "taxonomy_assignment_BLAST_V2.py", line 363, in <module>
    best_level_taxonomy, blast_percent = Assign_Taxonomy(current_query, current_best_hits)
  File "taxonomy_assignment_BLAST_V2.py", line 248, in Assign_Taxonomy
    if max(top_hits) >= args.cutoff_family:
ValueError: max() arg is an empty sequence

For some reason, it works well with values below 0.7.

Once again, thanks heaps for your time in advance.
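max() raises this ValueError when it receives an empty sequence, which suggests that at 0.7 or above no hit for that query survives the length filter, leaving top_hits empty. The sketch below assumes the filter compares alignment length with query length (the length and qlen columns of the outfmt string quoted in the trnL issue above); the function name, the dictionary fields, and returning None for "unassigned" are assumptions, not the script's code.

```python
def best_identity(hits, length_percentage):
    """Highest percent identity among hits that cover enough of the query.

    A hit passes when alignment length / query length >= length_percentage.
    Returns None when nothing passes, instead of letting max() raise
    "ValueError: max() arg is an empty sequence".
    """
    passing = [hit["pident"] for hit in hits
               if hit["length"] / hit["qlen"] >= length_percentage]
    return max(passing, default=None)


# Hypothetical hits for one 100 bp query.
hits = [
    {"qlen": 100, "length": 65, "pident": 99.0},
    {"qlen": 100, "length": 90, "pident": 97.5},
]
for cutoff in (0.6, 0.7, 0.95):
    best = best_identity(hits, cutoff)
    print(cutoff, "unassigned" if best is None else best)
```

With these made-up hits, raising the cutoff from 0.6 to 0.7 drops the 65 bp alignment, and at 0.95 nothing passes, so the query is reported as unassigned rather than crashing the run.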
