Comments (8)
Hi @davised
Thanks a lot for helping me out with this issue! It was indeed a problem with the assembly version I was using.
I reran the analysis with the version you suggested, and now the gyrB and atpD genes are found.
Cheers,
Pablo
from automlsa2.
I extracted the genes from K599, ran them against K599 myself, and everything looks correct.
You can look in the blast folder to see the percent identity of each hit against K599. In my test, they are 100% identical, as expected.
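To check the percent identity of the hits without opening each file by hand, a small shell function like the one below could be used. It is a sketch that assumes the .tab files in the blast folder are BLAST+ tabular output with comment lines (`-outfmt 7`), which is what the `# BLASTN`, `# Query:` and `# 0 hits found` header lines quoted later in this thread look like. The name `hit_identity` is hypothetical and not part of autoMLSA2.

```shell
# Hypothetical helper: print "<file> <query> <subject> <% identity>" (tab-separated)
# for every hit in BLAST+ tabular files written with -outfmt 7.
# Assumes query and subject are the first two columns (BLAST's default order);
# the "% identity" column is located from each file's "# Fields:" comment line.
# Files that report "# 0 hits found" have no Fields line and print nothing.
hit_identity() {
  awk -F'\t' '
    FNR == 1 { col = 0 }                    # new file: forget the previous column index
    /^# Fields: / {
      line = $0
      sub(/^# Fields: /, "", line)
      n = split(line, names, ", ")
      for (i = 1; i <= n; i++)
        if (names[i] == "% identity") col = i
      next
    }
    /^#/ { next }                           # skip the remaining comment lines
    col > 0 { print FILENAME "\t" $1 "\t" $2 "\t" $col }
  ' "$@"
}
```

Run from inside the run directory (assuming that layout), `hit_identity blast/*.tab` would list one line per hit; anything below 100 for a self-comparison would be worth a closer look.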
❯ cat K599_test.log
[02/11/22 10:22:31] DEBUG Started autoMLSA.py for run: K599_test parse_args.py:264
DEBUG {'allow_missing': None, parse_args.py:265
'checkpoint': 'filtering',
'config': None,
'configfile':
'/home/davised/code/automlsa2/issues/issue_4/K599_test/config.json',
'coverage': None,
'debug': False,
'dir': None,
'dups': False,
'evalue': None,
'external': None,
'files': ['/home/davised/code/automlsa2/issues/issue_4/Agrobacterium_rhizogenes_K599.fna'],
'identity': None,
'install_deps': None,
'iqtree': None,
'mafft': None,
'missing_check': False,
'outgroup': None,
'program': 'blastn',
'protect': False,
'query': ['/home/davised/code/automlsa2/issues/issue_4/markers.ffn'],
'quiet': False,
'rundir': '/home/davised/code/automlsa2/issues/issue_4/K599_test',
'runid': 'K599_test',
'threads': 4}
INFO Welcome to autoMLSA.py version 0.7.1 __main__.py:37
[02/11/22 10:22:32] DEBUG BLAST found in external dir. validate_requirements.py:139
DEBUG Checking tblastn: validate_requirements.py:175
/home/davised/.local/external/ncbi-blast-2.10.0+/bin/tblastn
[02/11/22 10:22:33] DEBUG Program version found: tblastn -> 2.10.0; ref -> 2.10.0 validate_requirements.py:65
DEBUG tblastn found: 2.10.0 validate_requirements.py:102
DEBUG Checking blastn: validate_requirements.py:181
/home/davised/.local/external/ncbi-blast-2.10.0+/bin/blastn
DEBUG Program version found: blastn -> 2.10.0; ref -> 2.10.0 validate_requirements.py:65
DEBUG blastn found: 2.10.0 validate_requirements.py:102
DEBUG Checking makeblastdb: validate_requirements.py:187
/home/davised/.local/external/ncbi-blast-2.10.0+/bin/makeblastdb
DEBUG Program version found: makeblastdb -> 2.10.0; ref -> 2.10.0 validate_requirements.py:65
DEBUG makeblastdb found: 2.10.0 validate_requirements.py:102
DEBUG Checking mafft: mafft validate_requirements.py:194
DEBUG Program version found: mafft -> 7.487; ref -> 7.471 validate_requirements.py:65
DEBUG mafft found: 7.487 validate_requirements.py:102
DEBUG Checking iqtree: iqtree2 validate_requirements.py:201
DEBUG Program version found: iqtree -> 2.1.1; ref -> 2.1.1 validate_requirements.py:65
DEBUG iqtree found: 2.1.1 validate_requirements.py:102
INFO Reconciling configuration settings. __main__.py:43
DEBUG Validating & reconciling arguments. configuration.py:84
DEBUG Max threads found 8 configuration.py:265
DEBUG Validated arguments: configuration.py:340
DEBUG configuration.py:341
{ 'allow_missing': 0,
'checkpoint': 'filtering',
'config': None,
'configfile':
'/home/davised/code/automlsa2/issues/issue_4/K599_test/config.json',
'coverage': 50,
'debug': False,
'dir': [],
'dups': False,
'evalue': 1e-05,
'external': '',
'fasta': [ '/home/davised/code/automlsa2/issues/issue_4/Agrobacterium_rhizogenes_K599.fna'],
'files': [ '/home/davised/code/automlsa2/issues/issue_4/Agrobacterium_rhizogenes_K599.fna'],
'identity': 30,
'install_deps': None,
'iqtree': '-m MFP -B 1000 -alrt 1000 --msub nuclear --merge rclusterf',
'mafft': '--localpair --maxiterate 1000 --reorder',
'missing_check': False,
'outgroup': '',
'program': 'blastn',
'protect': False,
'query':
['/home/davised/code/automlsa2/issues/issue_4/markers.ffn'],
'quiet': False,
'rundir':
'/home/davised/code/automlsa2/issues/issue_4/K599_test',
'runid': 'K599_test',
'threads': 4}
DEBUG Writing config file configuration.py:360
/home/davised/code/automlsa2/issues/issue_4/K599_test/config.json
INFO Converting genome FASTA files for BLAST if necessary. __main__.py:55
DEBUG Generating list of labels configuration.py:389
/home/davised/code/automlsa2/issues/issue_4/K599_test/.autoMLSA/labels.json
DEBUG Writing renamed fasta for Agrobacterium_rhizogenes_K599.fna formatting.py:67
INFO Generating and/or validating BLAST DBs. formatting.py:76
DEBUG Found new genome files formatting.py:90
INFO Extracting query FASTA files if necessary. __main__.py:61
DEBUG Reading query file formatting.py:161
(/home/davised/code/automlsa2/issues/issue_4/markers.ffn)
DEBUG Writing recA_e73466028f65c4ee8cd3b836183bb9b0.fas for seq 1 in formatting.py:257
markers.ffn
DEBUG Writing rpoB_345e7a1fe323e9dda81cc28714e31a3c.fas for seq 2 in formatting.py:257
markers.ffn
DEBUG Writing atpD_e7fbda600ef46a45adbfafdb4b2aa415.fas for seq 3 in formatting.py:257
markers.ffn
DEBUG Writing gyrB_ba2146a65bd9ab3898850322c103db70.fas for seq 4 in formatting.py:257
markers.ffn
DEBUG Writing 16S_38bf19ee874a379135ef530d49b7c17b.fas for seq 5 in formatting.py:257
markers.ffn
DEBUG Found new query sequences formatting.py:281
INFO Generating list of BLAST searches and outputs. __main__.py:67
INFO Running 5 BLAST searches using 4 CPUs. blast_functions.py:115
INFO Reading BLAST results. blast_functions.py:156
INFO Summarizing and filtering BLAST hits. blast_functions.py:213
INFO Keeping these genomes (1): blast_functions.py:284
- Agrobacterium_rhizogenes_K599.fna
DEBUG Found new filtered sequences blast_functions.py:332
DEBUG Removing downstream files, if present blast_functions.py:333
INFO Checkpoint reached after BLAST result filtering. Stopping... helper_functions.py:151
INFO Program was stopped at an intermediate stage. helper_functions.py:83
Based on the hash values, I have determined your query sequences are the same as those I used:
❯ cat test.log | awk '{ print $3 }' | tr '\n' -d | sed 's/--/\n/g'
16S_38bf19ee874a379135ef530d49b7c17b.fas
rpoB_345e7a1fe323e9dda81cc28714e31a3c.fas
recA_e73466028f65c4ee8cd3b836183bb9b0.fas
gyrB_ba2146a65bd9ab3898850322c103db70.fas
atpD_e7fbda600ef46a45adbfafdb4b2aa415.fas
❯ ls queries/ -1
16S_38bf19ee874a379135ef530d49b7c17b.fas
atpD_e7fbda600ef46a45adbfafdb4b2aa415.fas
gyrB_ba2146a65bd9ab3898850322c103db70.fas
recA_e73466028f65c4ee8cd3b836183bb9b0.fas
rpoB_345e7a1fe323e9dda81cc28714e31a3c.fas
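The two listings above contain the same names in a different order. A hypothetical helper, `same_entries`, makes that comparison explicit by sorting both lists first; it assumes each list has been saved to a file with one name per line (for example, the output of `ls -1 queries/` and of the awk pipeline).

```shell
# Hypothetical helper: succeed (exit status 0) when two one-name-per-line
# files contain exactly the same entries, ignoring their order.
same_entries() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}
```

For example, `same_entries from_log.txt from_ls.txt && echo identical` would print `identical` for the two listings above.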
I can't tell whether our K599 .fna files are the same, because the logs don't print hash values for individual contigs.
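One way to compare two copies of the K599 assembly without relying on the logs is to hash each contig sequence directly. The sketch below (the name `contig_md5` is hypothetical, not an autoMLSA2 feature) joins wrapped sequence lines, upper-cases them, and prints one MD5 digest per record, using `md5sum` where available and falling back to macOS `md5 -q`. Comparing only the sorted digests ignores header differences, which matters here because autoMLSA2 writes renamed FASTA files.

```shell
# Hypothetical helper: print "<md5>  <header>" for every record in a FASTA file.
# Sequence lines are concatenated, stripped of whitespace and upper-cased
# before hashing, so line width and soft-masking do not change the digest.
contig_md5() {
  awk '
    /^>/ { if (seq != "") print hdr "\t" seq; hdr = substr($0, 2); seq = ""; next }
         { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
    END  { if (seq != "") print hdr "\t" seq }
  ' "$1" |
  while IFS="$(printf '\t')" read -r hdr seq; do
    if command -v md5sum >/dev/null 2>&1; then
      sum=$(printf '%s' "$seq" | md5sum | cut -d' ' -f1)   # GNU coreutils (Linux)
    else
      sum=$(printf '%s' "$seq" | md5 -q)                   # macOS
    fi
    printf '%s  %s\n' "$sum" "$hdr"
  done
}
```

Each of us could run `contig_md5 K599.fna | cut -d' ' -f1 | sort > K599.md5` and then `diff` the two .md5 files; no output would mean both files contain the same set of contig sequences. Upper-casing is a deliberate choice so that a soft-masked copy still matches an unmasked one.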
Hi @davised
Thanks a lot for checking my issue!
I am surprised you found the target genes when you ran the analysis. Are you working in a Linux or a Mac environment? I am running on a Mac. Perhaps I am missing a package?
Do you have any idea what could be the issue on my side?
Cheers,
Pablo
Mac vs. Linux should not be affecting this.
I will update my BLAST binaries to see whether that changes anything, but it should not.
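To compare BLAST+ versions between two machines, each BLAST+ program can report its own version with `-version`. The function below is a hypothetical convenience wrapper (`blast_versions` is not an autoMLSA2 command); it defaults to the three programs the autoMLSA2 log checks and prints only the first line of each program's version output.

```shell
# Hypothetical helper: print the first line of "<prog> -version" for each
# BLAST+ program given (default: blastn tblastn makeblastdb), or
# "<prog>: not found" when the program is not on PATH.
blast_versions() {
  [ "$#" -gt 0 ] || set -- blastn tblastn makeblastdb
  for prog in "$@"; do
    if command -v "$prog" >/dev/null 2>&1; then
      printf '%s\n' "$("$prog" -version 2>&1 | head -n 1)"
    else
      printf '%s: not found\n' "$prog"
    fi
  done
}
```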
If you tar up the BLAST results (the blast folder) and send them to me, I can take a look.
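A sketch of packaging the results as requested: `pack_blast` is a hypothetical wrapper around `tar`, and it assumes the blast folder sits directly inside the autoMLSA2 run directory (the `rundir` shown in the log).

```shell
# Hypothetical helper: archive <rundir>/blast as a gzipped tarball.
# $1 = run directory, $2 = output file (default: blast_results.tar.gz)
pack_blast() {
  rundir=$1
  out=${2:-blast_results.tar.gz}
  # -C makes the archive contain "blast/..." rather than the full path
  tar -czf "$out" -C "$rundir" blast
}
```

For example, `pack_blast K599_mlsa_test2_K599_markers K599_blast.tar.gz`, using the run directory name that appears in the BLAST database path later in this thread.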
I updated my binaries and everything still looks good, so I don't believe it's a program version mismatch.
Hi @davised
I am sending you the blast results. The assembly I am having issues with (the K599) is GCF_002005205.
Can you download the latest assembly from here:
https://www.ncbi.nlm.nih.gov/assembly/GCF_002005205.3/
ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/005/205/GCA_002005205.3_ASM200520v3/GCA_002005205.3_ASM200520v3_genomic.fna.gz
and give it another shot?
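The FTP link above follows a regular layout: the nine accession digits are split into three groups of three under `genomes/all/GCA/`, and the directory and file names combine the accession with the assembly name. The sketch below rebuilds such a link. `ncbi_fna_url` is a hypothetical helper; it assumes you already know the assembly name (here `ASM200520v3`, from the assembly page), that other accessions follow the same layout, and it uses HTTPS on the same host instead of the bare FTP address.

```shell
# Hypothetical helper: build the genomic FASTA URL for an NCBI assembly.
# $1 = versioned accession (e.g. GCA_002005205.3), $2 = assembly name (e.g. ASM200520v3)
ncbi_fna_url() {
  acc=$1
  name=$2
  prefix=${acc%%_*}                          # GCA or GCF
  digits=${acc#*_}; digits=${digits%%.*}     # 002005205
  d1=$(printf '%s' "$digits" | cut -c1-3)
  d2=$(printf '%s' "$digits" | cut -c4-6)
  d3=$(printf '%s' "$digits" | cut -c7-9)
  printf 'https://ftp.ncbi.nlm.nih.gov/genomes/all/%s/%s/%s/%s/%s_%s/%s_%s_genomic.fna.gz\n' \
    "$prefix" "$d1" "$d2" "$d3" "$acc" "$name" "$acc" "$name"
}
```

Downloading would then be `curl -O "$(ncbi_fna_url GCA_002005205.3 ASM200520v3)"` followed by `gunzip GCA_002005205.3_ASM200520v3_genomic.fna.gz`.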
BLAST does not find gyrB or atpD in that assembly, so I assume the assembly is incomplete in some way. This doesn't appear to be an issue with automlsa2.
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# BLASTN 2.12.0+
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# Query: atpD
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# Database: /Users/pablo/Downloads/K599_mlsa_test2_K599_markers/fasta/GCF_002005205.fasta
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# 0 hits found
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# BLAST processed 1 queries
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# BLASTN 2.12.0+
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# Query: gyrB
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# Database: /Users/pablo/Downloads/K599_mlsa_test2_K599_markers/fasta/GCF_002005205.fasta
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# 0 hits found
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# BLAST processed 1 queries
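The lines above come from the comment headers of the BLAST output files. To list every query with no hits across all files at once, a hypothetical helper like `zero_hit_queries` could scan those same headers. It assumes the `-outfmt 7` comment lines shown above, where `# Query:` precedes the `# N hits found` line for each query.

```shell
# Hypothetical helper: print "<query> <file>" (tab-separated) for every
# query whose BLAST -outfmt 7 header reports "# 0 hits found".
zero_hit_queries() {
  awk '
    /^# Query: /      { query = substr($0, 10) }   # text after "# Query: "
    /^# 0 hits found/ { print query "\t" FILENAME }
  ' "$@"
}
```

Running `zero_hit_queries blast/*.tab` in the run directory would be expected to print the atpD and gyrB files for the assembly discussed here.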
Related Issues (5)
- multiple gene copies
- Final tree file
- Missing genome in final output
- np.float issue