
missing genes (automlsa2 issue, closed)

pavlo888 commented on July 2, 2024
missing genes

from automlsa2.

Comments (8)

pavlo888 commented on July 2, 2024

Hi @davised

Thanks a lot for helping me out with this issue! It was indeed an issue with the assembly version used.

I ran the analysis again with the version you suggested, and now the gyrB and atpD genes are found.

Cheers,
Pablo


davised commented on July 2, 2024

I pulled the genes out of K599 and ran them against K599 myself and everything seems correct.

You can look in the blast folder and check the percent identity of the hits against K599. In my test, they are 100% identical, as expected.

❯ cat K599_test.log
[02/11/22 10:22:31] DEBUG    Started autoMLSA.py for run: K599_test                                   parse_args.py:264
                    DEBUG    {'allow_missing': None,                                                  parse_args.py:265
                              'checkpoint': 'filtering',
                              'config': None,
                              'configfile':
                             '/home/davised/code/automlsa2/issues/issue_4/K599_test/config.json',
                              'coverage': None,
                              'debug': False,
                              'dir': None,
                              'dups': False,
                              'evalue': None,
                              'external': None,
                              'files': ['/home/davised/code/automlsa2/issues/issue_4/Agrobacterium_rh
                             izogenes_K599.fna'],
                              'identity': None,
                              'install_deps': None,
                              'iqtree': None,
                              'mafft': None,
                              'missing_check': False,
                              'outgroup': None,
                              'program': 'blastn',
                              'protect': False,
                              'query': ['/home/davised/code/automlsa2/issues/issue_4/markers.ffn'],
                              'quiet': False,
                              'rundir': '/home/davised/code/automlsa2/issues/issue_4/K599_test',
                              'runid': 'K599_test',
                              'threads': 4}
                    INFO     Welcome to autoMLSA.py version 0.7.1                                        __main__.py:37
[02/11/22 10:22:32] DEBUG    BLAST found in external dir.                                  validate_requirements.py:139
                    DEBUG    Checking tblastn:                                             validate_requirements.py:175
                             /home/davised/.local/external/ncbi-blast-2.10.0+/bin/tblastn
[02/11/22 10:22:33] DEBUG    Program version found: tblastn -> 2.10.0; ref -> 2.10.0        validate_requirements.py:65
                    DEBUG    tblastn found: 2.10.0                                         validate_requirements.py:102
                    DEBUG    Checking blastn:                                              validate_requirements.py:181
                             /home/davised/.local/external/ncbi-blast-2.10.0+/bin/blastn
                    DEBUG    Program version found: blastn -> 2.10.0; ref -> 2.10.0         validate_requirements.py:65
                    DEBUG    blastn found: 2.10.0                                          validate_requirements.py:102
                    DEBUG    Checking makeblastdb: /home/davised/.local/external/ncbi-blas validate_requirements.py:187
                             t-2.10.0+/bin/makeblastdb
                    DEBUG    Program version found: makeblastdb -> 2.10.0; ref -> 2.10.0    validate_requirements.py:65
                    DEBUG    makeblastdb found: 2.10.0                                     validate_requirements.py:102
                    DEBUG    Checking mafft: mafft                                         validate_requirements.py:194
                    DEBUG    Program version found: mafft -> 7.487; ref -> 7.471            validate_requirements.py:65
                    DEBUG    mafft found: 7.487                                            validate_requirements.py:102
                    DEBUG    Checking iqtree: iqtree2                                      validate_requirements.py:201
                    DEBUG    Program version found: iqtree -> 2.1.1; ref -> 2.1.1           validate_requirements.py:65
                    DEBUG    iqtree found: 2.1.1                                           validate_requirements.py:102
                    INFO     Reconciling configuration settings.                                         __main__.py:43
                    DEBUG    Validating & reconciling arguments.                                    configuration.py:84
                    DEBUG    Max threads found 8                                                   configuration.py:265
                    DEBUG    Validated arguments:                                                  configuration.py:340
                    DEBUG                                                                          configuration.py:341
                             {   'allow_missing': 0,
                                 'checkpoint': 'filtering',
                                 'config': None,
                                 'configfile':
                             '/home/davised/code/automlsa2/issues/issue_4/K599_test/config.json',
                                 'coverage': 50,
                                 'debug': False,
                                 'dir': [],
                                 'dups': False,
                                 'evalue': 1e-05,
                                 'external': '',
                                 'fasta': [   '/home/davised/code/automlsa2/issues/issue_4/Agrobac
                             terium_rhizogenes_K599.fna'],
                                 'files': [   '/home/davised/code/automlsa2/issues/issue_4/Agrobac
                             terium_rhizogenes_K599.fna'],
                                 'identity': 30,
                                 'install_deps': None,
                                 'iqtree': '-m MFP -B 1000 -alrt 1000 --msub nuclear --merge
                             rclusterf',
                                 'mafft': '--localpair --maxiterate 1000 --reorder',
                                 'missing_check': False,
                                 'outgroup': '',
                                 'program': 'blastn',
                                 'protect': False,
                                 'query':
                             ['/home/davised/code/automlsa2/issues/issue_4/markers.ffn'],
                                 'quiet': False,
                                 'rundir':
                             '/home/davised/code/automlsa2/issues/issue_4/K599_test',
                                 'runid': 'K599_test',
                                 'threads': 4}
                    DEBUG    Writing config file                                                   configuration.py:360
                             /home/davised/code/automlsa2/issues/issue_4/K599_test/config.json
                    INFO     Converting genome FASTA files for BLAST if necessary.                       __main__.py:55
                    DEBUG    Generating list of labels /home/davised/code/automlsa2/issues/issue_4 configuration.py:389
                             /K599_test/.autoMLSA/labels.json
                    DEBUG    Writing renamed fasta for Agrobacterium_rhizogenes_K599.fna               formatting.py:67
                    INFO     Generating and/or validating BLAST DBs.                                   formatting.py:76
                    DEBUG    Found new genome files                                                    formatting.py:90
                    INFO     Extracting query FASTA files if necessary.                                  __main__.py:61
                    DEBUG    Reading query file                                                       formatting.py:161
                             (/home/davised/code/automlsa2/issues/issue_4/markers.ffn)
                    DEBUG    Writing recA_e73466028f65c4ee8cd3b836183bb9b0.fas for seq 1 in           formatting.py:257
                             markers.ffn
                    DEBUG    Writing rpoB_345e7a1fe323e9dda81cc28714e31a3c.fas for seq 2 in           formatting.py:257
                             markers.ffn
                    DEBUG    Writing atpD_e7fbda600ef46a45adbfafdb4b2aa415.fas for seq 3 in           formatting.py:257
                             markers.ffn
                    DEBUG    Writing gyrB_ba2146a65bd9ab3898850322c103db70.fas for seq 4 in           formatting.py:257
                             markers.ffn
                    DEBUG    Writing 16S_38bf19ee874a379135ef530d49b7c17b.fas for seq 5 in            formatting.py:257
                             markers.ffn
                    DEBUG    Found new query sequences                                                formatting.py:281
                    INFO     Generating list of BLAST searches and outputs.                              __main__.py:67
                    INFO     Running 5 BLAST searches using 4 CPUs.                              blast_functions.py:115
                    INFO     Reading BLAST results.                                              blast_functions.py:156
                    INFO     Summarizing and filtering BLAST hits.                               blast_functions.py:213
                    INFO     Keeping these genomes (1):                                          blast_functions.py:284
                                      - Agrobacterium_rhizogenes_K599.fna
                    DEBUG    Found new filtered sequences                                        blast_functions.py:332
                    DEBUG    Removing downstream files, if present                               blast_functions.py:333
                    INFO     Checkpoint reached after BLAST result filtering. Stopping...       helper_functions.py:151
                    INFO     Program was stopped at an intermediate stage.                       helper_functions.py:83
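
For anyone repeating the check described at the top of this comment (the percent identity of the hits in the blast folder), here is a minimal Python sketch. It assumes the `.tab` files are BLAST tabular output with `#` comment lines and that percent identity is the third column, as in BLAST's default tabular layout; automlsa2 may request a different column order, so adjust `pident_col` if needed. The function names are hypothetical and not part of automlsa2.

```python
from pathlib import Path
from typing import Dict, Optional


def best_identity(tab_text: str, pident_col: int = 2) -> Optional[float]:
    """Return the highest percent identity among hit lines, or None if no hits.

    Assumes tab-separated hit lines with percent identity at index
    `pident_col` (index 2 in BLAST's default tabular layout).
    """
    best = None
    for line in tab_text.splitlines():
        # Skip blank lines and the '#' comment lines BLAST writes.
        if not line.strip() or line.startswith("#"):
            continue
        pident = float(line.split("\t")[pident_col])
        if best is None or pident > best:
            best = pident
    return best


def summarize(blast_dir: str) -> Dict[str, Optional[float]]:
    """Map each .tab file name in blast_dir to its best percent identity."""
    return {
        path.name: best_identity(path.read_text())
        for path in sorted(Path(blast_dir).glob("*.tab"))
    }
```

Calling `summarize("blast")` from the run directory would return one entry per result file; a value of `None` means BLAST reported no hits for that query.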


davised commented on July 2, 2024

Based on the hash values, I have determined your query sequences are the same as those I used:

❯ cat test.log  | awk '{ print $3 }' | tr '\n' -d | sed 's/--/\n/g'
16S_38bf19ee874a379135ef530d49b7c17b.fas
rpoB_345e7a1fe323e9dda81cc28714e31a3c.fas
recA_e73466028f65c4ee8cd3b836183bb9b0.fas
gyrB_ba2146a65bd9ab3898850322c103db70.fas
atpD_e7fbda600ef46a45adbfafdb4b2aa415.fas
❯ ls queries/ -1
 16S_38bf19ee874a379135ef530d49b7c17b.fas
 atpD_e7fbda600ef46a45adbfafdb4b2aa415.fas
 gyrB_ba2146a65bd9ab3898850322c103db70.fas
 recA_e73466028f65c4ee8cd3b836183bb9b0.fas
 rpoB_345e7a1fe323e9dda81cc28714e31a3c.fas

I can't tell whether the K599 fna files are the same, because I don't print the hash values of each contig in the logs.
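
Since the logs do not include per-contig hashes, one way to compare two copies of a genome FASTA is to hash each contig sequence yourself. The sketch below is only an illustration: it uses MD5 over the uppercased sequence, which is an assumption and not necessarily how automlsa2 derives the hashes in its query file names, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def _digest(chunks: List[str]) -> str:
    """MD5 hex digest of the joined, uppercased sequence lines."""
    seq = "".join(chunks).upper()
    return hashlib.md5(seq.encode("ascii")).hexdigest()


def contig_hashes(lines: Iterable[str]) -> Dict[str, str]:
    """Map each contig ID (first word of the FASTA header) to its hash."""
    hashes: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                hashes[name] = _digest(chunks)
            name = line[1:].split()[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        hashes[name] = _digest(chunks)
    return hashes


def same_sequences(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """True if both files hold the same sequences, ignoring contig names."""
    return sorted(a.values()) == sorted(b.values())
```

Usage would look like `contig_hashes(open("Agrobacterium_rhizogenes_K599.fna"))` on each side, followed by `same_sequences(mine, yours)`. Comparing hash values rather than names means renamed contigs still count as identical.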


pavlo888 commented on July 2, 2024

Hi @davised

Thanks a lot for checking my issue!

I am surprised you found the target genes when you ran the analysis. Are you working in a Linux environment or a Mac one? I am running in a Mac environment. Perhaps I am missing a package or something?

Do you have any idea what could be the issue on my side?

Cheers,
Pablo


davised commented on July 2, 2024

Mac vs. Linux should not affect this.

I will update my blast binaries to see if that would affect it, but it should not.

If you tar the blast results and send them to me I can take a look at those (the blast folder).


davised commented on July 2, 2024

I updated my binaries and everything still looks good, so I don't believe it's a program version mismatch.


pavlo888 commented on July 2, 2024

Hi @davised

I am sending you the blast results. The assembly I am having issues with (the K599) is GCF_002005205.

blast.zip


davised commented on July 2, 2024

Can you download the latest assembly from here:

https://www.ncbi.nlm.nih.gov/assembly/GCF_002005205.3/

ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/005/205/GCA_002005205.3_ASM200520v3/GCA_002005205.3_ASM200520v3_genomic.fna.gz

and give it another shot?

gyrB and atpD are not being found by blast in that assembly, so I assume that assembly is incomplete in some way. This doesn't appear to be an issue with automlsa2.

atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# BLASTN 2.12.0+
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# Query: atpD
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# Database: /Users/pablo/Downloads/K599_mlsa_test2_K599_markers/fasta/GCF_002005205.fasta
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# 0 hits found
atpD_e7fbda600ef46a45adbfafdb4b2aa415_vs_GCF_002005205.fasta.tab:# BLAST processed 1 queries
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# BLASTN 2.12.0+
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# Query: gyrB
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# Database: /Users/pablo/Downloads/K599_mlsa_test2_K599_markers/fasta/GCF_002005205.fasta
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# 0 hits found
gyrB_ba2146a65bd9ab3898850322c103db70_vs_GCF_002005205.fasta.tab:# BLAST processed 1 queries
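
The listing above can be reproduced for any run by scanning the blast folder for result files that contain the `# 0 hits found` comment line. This is a minimal sketch, assuming the results are `.tab` files in the commented tabular format shown above; the function names are hypothetical and not part of automlsa2.

```python
from pathlib import Path
from typing import List


def has_zero_hits(tab_text: str) -> bool:
    """True if the BLAST output contains the exact '# 0 hits found' line."""
    return any(
        line.strip() == "# 0 hits found" for line in tab_text.splitlines()
    )


def zero_hit_files(blast_dir: str) -> List[str]:
    """Names of .tab files in blast_dir where BLAST reported no hits."""
    return [
        path.name
        for path in sorted(Path(blast_dir).glob("*.tab"))
        if has_zero_hits(path.read_text())
    ]
```

Running `zero_hit_files("blast")` in the run directory would list the query/genome pairs with no hits, which here would be the atpD and gyrB searches against GCF_002005205.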

