
Comments (8)

meren commented on July 26, 2024

I think the real solution is to increase the version of contigs-db, and write a migration script that simply removes the SCG taxonomy results :/ it will be annoying to anyone who has been using the main branch, but it will be the most reliable way to fix all future hiccups.
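A minimal sketch of what such a migration could do, using only Python's standard `sqlite3` module. This is not anvi'o's actual migration script: the function name `drop_scg_taxonomy_and_bump_version` is hypothetical, and the layout of the `self` table (`key`/`value` columns, with the database version stored under the key `version`) is an assumption. The `scg_taxonomy` table and the `scg_taxonomy_was_run` key are the ones queried later in this thread.

```python
import sqlite3


def drop_scg_taxonomy_and_bump_version(db_path, current_version, next_version):
    """Remove all SCG taxonomy rows and bump the version stored in `self`.

    Hypothetical helper: assumes `self` is a key/value table and that the
    database version lives under the key 'version'.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            row = conn.execute(
                "SELECT value FROM self WHERE key = 'version'"
            ).fetchone()
            found = None if row is None else str(row[0])
            if found != str(current_version):
                raise ValueError(
                    f"expected version {current_version}, found {found}"
                )

            # Drop every stored SCG taxonomy hit, whichever SCG set produced it.
            conn.execute("DELETE FROM scg_taxonomy")

            # Mark the taxonomy step as not run, so it has to be repeated.
            conn.execute(
                "UPDATE self SET value = '0' WHERE key = 'scg_taxonomy_was_run'"
            )
            conn.execute(
                "UPDATE self SET value = ? WHERE key = 'version'",
                (str(next_version),),
            )
    finally:
        conn.close()
```

Clearing the table unconditionally, rather than trying to detect which SCG set produced the rows, is what makes this approach reliable at the cost of forcing everyone to re-run the taxonomy step.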

from anvio.

mschecht commented on July 26, 2024

Good call - do there need to be multiple migration tasks, or is it enough to write a new migration script?


meren commented on July 26, 2024

Your intuition was right, and a new migration script along with an update in version number was enough. I've made some changes in your version now, please test it, and feel free to migrate it when you're ready :)


mschecht commented on July 26, 2024

Thanks for the code updates @meren!

Here is the PR: #2226

Here is a successful test I ran with IGD:

cd INFANT-GUT-TUTORIAL

# migrate all external-genomes
for db in additional-files/pangenomics/external-genomes/*.db; do anvi-migrate "$db" --migrate-safely; done

# run scg-taxonomy
anvi-run-scg-taxonomy -c additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db --num-threads 1

# print table updates
query="SELECT * FROM scg_taxonomy;"
$ sqlite3 additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db "$query"
2066|Ribosomal_L1|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2464|Ribosomal_L13|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
156|Ribosomal_L14|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
153|Ribosomal_L16|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|
174|Ribosomal_L17|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1556|Ribosomal_L19|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|
633|Ribosomal_L20|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
692|Ribosomal_L21p|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
151|Ribosomal_L22|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
158|Ribosomal_L5|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
149|Ribosomal_L2|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
166|Ribosomal_L27A|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
146|Ribosomal_L3|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
147|Ribosomal_L4|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
172|Ribosomal_S11|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2321|Ribosomal_S15|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1400|Ribosomal_S16|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1818|Ribosomal_S2|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
6|Ribosomal_S6|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
140|Ribosomal_S7|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
160|Ribosomal_S8|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2463|Ribosomal_S9|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis

# Get updated values
db_file="additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db"
key_to_filter="scg_taxonomy_was_run"
query="SELECT * FROM self WHERE key = '$key_to_filter';"
$ sqlite3 "$db_file" "$query"
scg_taxonomy_was_run|1

key_to_filter="scg_taxonomy_database_version"
query="SELECT * FROM self WHERE key = '$key_to_filter';"
$ sqlite3 "$db_file" "$query"
scg_taxonomy_database_version|GTDB: v214.1; Anvi'o: v1

The self table values look like they are updating properly :)
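The same check can be scripted. Here is a small sketch using Python's standard `sqlite3` module; the helper name `read_self_values` is hypothetical, and the `value` column name in the `self` table is an assumption (the queries above only name `key`).

```python
import sqlite3


def read_self_values(db_path, keys):
    """Return {key: value} for the requested keys of the `self` table.

    Keys that are not present in the table are simply absent from the result.
    """
    conn = sqlite3.connect(db_path)
    try:
        # One "?" placeholder per requested key, so values are bound safely.
        placeholders = ", ".join("?" for _ in keys)
        rows = conn.execute(
            f"SELECT key, value FROM self WHERE key IN ({placeholders})",
            list(keys),
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

Called with `["scg_taxonomy_was_run", "scg_taxonomy_database_version"]` on the contigs-db above, it should return the same two values that the `sqlite3` commands print.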


mschecht commented on July 26, 2024

Ok I found an edge case!

Here is an example of the problem:

test_db.zip

unzip test_db.zip

cd test_db

anvi-script-gen-genomes-file --input-dir . -o external-genomes.txt

$ anvi-estimate-scg-taxonomy -M external-genomes.txt \
                           --metagenome-mode \
                           --scg-name-for-metagenome-mode Ribosomal_L19 \
                           --raw-output \
                           -O asdf

Traceback (most recent call last):
  File "/Users/mschechter/github/anvio/bin/anvi-estimate-scg-taxonomy", line 104, in <module>
    main(args)
  File "/Users/mschechter/github/anvio/anvio/terminal.py", line 915, in wrapper
    program_method(*args, **kwargs)
  File "/Users/mschechter/github/anvio/bin/anvi-estimate-scg-taxonomy", line 39, in main
    t.estimate()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 447, in estimate
    scg_taxonomy_super_dict_multi = self.get_scg_taxonomy_super_dict_for_metagenomes()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 937, in get_scg_taxonomy_super_dict_for_metagenomes
    scg_taxonomy_super_dict[metagenome_name] = SCGTaxonomyEstimatorSingle(args, progress=progress_quiet, run=run_quiet).get_items_taxonomy_super_dict()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 999, in __init__
    TaxonomyEstimatorSingle.__init__(self, skip_init=skip_init)
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 175, in __init__
    self.init()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 179, in init
    self.init_items_data()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 269, in init_items_data
    self.item_name_to_gene_caller_id_dict[item_gene_name].add(gene_callers_id)
KeyError: 'Ribosomal_L9_C'

I think the problem is that if you run anvi-run-scg-taxonomy on a contigs-db that does not contain any of the SCGs in the new list, it exits early, as shown here, and any old entries stay in the scg_taxonomy table.

Here are two possible solutions. I don't know which way to go, considering we already merged the migration script:

  1. The migration needs to clear all scg_taxonomy tables regardless of whether the new version was run.
  2. anvi-run-scg-taxonomy should also remove any contents that might still be in the scg_taxonomy table IF it doesn't find any new SCG here


meren commented on July 26, 2024

This should have never happened, @mschecht. Could this be a contigs-db you migrated with the previous version of the migration script?

It has the following SCG in the table:

image

Even though Ribosomal_L9_C is not one of the SCGs we're using:

default_scgs_for_taxonomy = ['Ribosomal_L1',
                             'Ribosomal_L13',
                             'Ribosomal_L14',
                             'Ribosomal_L16',
                             'Ribosomal_L17',
                             'Ribosomal_L19',
                             'Ribosomal_L2',
                             'Ribosomal_L20',
                             'Ribosomal_L21p',
                             'Ribosomal_L22',
                             'Ribosomal_L27A',
                             'Ribosomal_L3',
                             'Ribosomal_L4',
                             'Ribosomal_L5',
                             'Ribosomal_S11',
                             'Ribosomal_S15',
                             'Ribosomal_S16',
                             'Ribosomal_S2',
                             'Ribosomal_S6',
                             'Ribosomal_S7',
                             'Ribosomal_S8',
                             'Ribosomal_S9']
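This kind of mismatch can be spotted without opening the table by hand. The following is a sketch, not part of anvi'o: `find_unexpected_scgs` is a hypothetical helper, and it assumes the SCG name is the second column of `scg_taxonomy`, as in the `SELECT *` output earlier in this thread. The expected names are passed in, for example the `default_scgs_for_taxonomy` list above.

```python
import sqlite3


def find_unexpected_scgs(db_path, expected_scgs):
    """Return SCG names stored in `scg_taxonomy` that are not expected."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT * FROM scg_taxonomy").fetchall()
    finally:
        conn.close()
    # Assumption: column 0 is the gene caller id and column 1 is the SCG name.
    stored = {row[1] for row in rows}
    return sorted(stored - set(expected_scgs))
```

A non-empty result, such as one containing `Ribosomal_L9_C`, would point to rows left over from an older SCG set.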

AND even though the version of the DB shows that it is new:

scg_taxonomy_database_version ................: GTDB: v214.1; Anvi'o: v1

When I manually change the self table to revert this contigs-db back to v22 and set the scg_taxonomy_database_version to v214.1, then everything works out as expected:

:: anvi'o v7 dev ::  ~/Downloads/test_db >>> anvi-migrate ORAL_P-D-F_Bin_00037.db --migrate-quickly

NEW MIGRATION TASK
===============================================
Input file path ..............................: ORAL_P-D-F_Bin_00037.db
Input file type ..............................: contigs
Current Version ..............................: 22
Target Version ...............................: 23
Migration mode ...............................: Adventurous

SQLite Version ...............................: 3.41.2


* Your contigs database is now version 23. Sadly this update removed all SCG
  taxonomy data in this contigs-db due to a change in the set of SCGs anvi'o now
  uses for taxonomy estimation. As a result, you will need to re-run anvi-run-
  scg-taxonomy command on this contigs-db :/ If you would like to learn why this
  was necessary, please visit https://github.com/merenlab/anvio/issues/2211. We
  thank you for your patience!

 :: anvi'o v7 dev ::  ~/Downloads/test_db >>> anvi-estimate-scg-taxonomy -c ORAL_P-D-F_Bin_00037.db


Config Error: It seems the SCG taxonomy tables were not populated in this contigs database :/
              Luckily it is easy to fix that. Please see the program `anvi-run-scg-taxonomy`.


 :: anvi'o v7 dev ::  ~/Downloads/test_db >>> anvi-run-scg-taxonomy -c ORAL_P-D-F_Bin_00037.db

WARNING
===============================================
This contigs database contains no single-copy core gene sequences that are used
by the anvi'o taxonomy headquarters in Lausanne. Somewhat disappointing but
totally OK.

 :: anvi'o v7 dev ::  ~/Downloads/test_db >>> anvi-estimate-scg-taxonomy -c ORAL_P-D-F_Bin_00037.db
Contigs DB ...................................: ORAL_P-D-F_Bin_00037.db
Metagenome mode ..............................: False

Estimated taxonomy for "ORAL_P-D-F_Bin_00037"
===============================================
+----------------------+--------------+-------------------+------------------+
|                      |   total_scgs |   supporting_scgs | taxonomy         |
+======================+==============+===================+==================+
| ORAL_P-D-F_Bin_00037 |            0 |                 0 | /  /  /  /  /  / |
+----------------------+--------------+-------------------+------------------+


meren commented on July 26, 2024

(So this is not an edge case others will run into -- it is just a Frankenstein contigs-db that was updated with an earlier version of the migration script before it was merged to master, OR something else similar to that :))


mschecht commented on July 26, 2024

Thanks for looking into this @meren! It must have been an artifact I introduced while developing. Glad it won't impact anyone!

