Comments (8)
I think the real solution is to increase the version of contigs-db, and write a migration script that simply removes the SCG taxonomy results :/ it will be annoying to anyone who has been using the main branch, but it will be the most reliable way to fix all future hiccups.
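For anyone following along, here is a rough sketch of what such a migration could do at the SQLite level. It is not the actual anvi'o migration script. The `scg_taxonomy` table and the `self` keys come from the queries later in this thread, but the `value` column name, the `version` key, and the function name `drop_scg_taxonomy_results` are assumptions made for illustration.

```python
import sqlite3


def drop_scg_taxonomy_results(db_path, new_version):
    """Remove SCG taxonomy rows and bump the version stored in `self`.

    Assumes `self` is a key/value table with columns named `key` and `value`,
    and that the database version lives under the key 'version'.
    """
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back if any statement fails
            # Throw away every stored SCG taxonomy hit.
            con.execute("DELETE FROM scg_taxonomy")
            # Forget that SCG taxonomy was ever run on this database.
            con.execute(
                "DELETE FROM self WHERE key IN (?, ?)",
                ("scg_taxonomy_was_run", "scg_taxonomy_database_version"),
            )
            # Record the new contigs-db version.
            con.execute(
                "UPDATE self SET value = ? WHERE key = ?",
                (str(new_version), "version"),
            )
    finally:
        con.close()
```

Doing all three statements in one transaction means a failed migration leaves the database as it was, rather than half-cleared.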
from anvio.
Good call - do there need to be multiple migration tasks, or is it enough to write a new migration script?
from anvio.
Your intuition was right, and a new migration script along with an update in version number was enough. I've made some changes in your version now, please test it, and feel free to migrate it when you're ready :)
from anvio.
Thanks for the code updates @meren!
Here is the PR: #2226
Here is a successful test I ran with IGD:
cd INFANT-GUT-TUTORIAL
# migrate all external-genomes
for db in additional-files/pangenomics/external-genomes/*.db; do anvi-migrate "$db" --migrate-safely; done
# run scg-taxonomy
anvi-run-scg-taxonomy -c additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db --num-threads 1
# print table updates
query="SELECT * FROM scg_taxonomy;"
$ sqlite3 additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db "$query"
2066|Ribosomal_L1|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2464|Ribosomal_L13|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
156|Ribosomal_L14|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
153|Ribosomal_L16|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|
174|Ribosomal_L17|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1556|Ribosomal_L19|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|
633|Ribosomal_L20|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
692|Ribosomal_L21p|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
151|Ribosomal_L22|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
158|Ribosomal_L5|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
149|Ribosomal_L2|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
166|Ribosomal_L27A|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
146|Ribosomal_L3|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
147|Ribosomal_L4|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
172|Ribosomal_S11|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2321|Ribosomal_S15|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1400|Ribosomal_S16|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
1818|Ribosomal_S2|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
6|Ribosomal_S6|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
140|Ribosomal_S7|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
160|Ribosomal_S8|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
2463|Ribosomal_S9|CONSENSUS|100.0|Bacteria|Bacillota|Bacilli|Lactobacillales|Enterococcaceae|Enterococcus|Enterococcus faecalis
# Get updated values
db_file="additional-files/pangenomics/external-genomes/Enterococcus_faecalis_6512.db"
key_to_filter="scg_taxonomy_was_run"
query="SELECT * FROM self WHERE key = '$key_to_filter';"
$ sqlite3 "$db_file" "$query"
scg_taxonomy_was_run|1
key_to_filter="scg_taxonomy_database_version"
query="SELECT * FROM self WHERE key = '$key_to_filter';"
$ sqlite3 "$db_file" "$query"
scg_taxonomy_database_version|GTDB: v214.1; Anvi'o: v1
The self values look like they are updating properly :)
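The same check can be scripted if you want to run it over many databases. This is only a sketch using Python's standard `sqlite3` module. The helper name `read_self_values` is made up, and it assumes the two columns of the `self` table are called `key` and `value`.

```python
import sqlite3


def read_self_values(db_path, keys):
    """Return {key: value} for the requested keys found in the `self` table."""
    keys = list(keys)
    # One "?" placeholder per requested key, so values are bound safely.
    placeholders = ", ".join("?" for _ in keys)
    query = f"SELECT key, value FROM self WHERE key IN ({placeholders})"
    con = sqlite3.connect(db_path)
    try:
        # Each row is a (key, value) pair, which dict() accepts directly.
        return dict(con.execute(query, keys))
    finally:
        con.close()
```

Calling it with `["scg_taxonomy_was_run", "scg_taxonomy_database_version"]` on the database above should give back the same two values the `sqlite3` commands printed. Keys that are missing from the table are simply absent from the result.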
from anvio.
Ok I found an edge case!
Here is an example of the problem:
unzip test_db.zip
cd test_db
anvi-script-gen-genomes-file --input-dir . -o external-genomes.txt
$ anvi-estimate-scg-taxonomy -M external-genomes.txt \
--metagenome-mode \
--scg-name-for-metagenome-mode Ribosomal_L19 \
--raw-output \
-O asdf
Traceback (most recent call last):
  File "/Users/mschechter/github/anvio/bin/anvi-estimate-scg-taxonomy", line 104, in <module>
    main(args)
  File "/Users/mschechter/github/anvio/anvio/terminal.py", line 915, in wrapper
    program_method(*args, **kwargs)
  File "/Users/mschechter/github/anvio/bin/anvi-estimate-scg-taxonomy", line 39, in main
    t.estimate()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 447, in estimate
    scg_taxonomy_super_dict_multi = self.get_scg_taxonomy_super_dict_for_metagenomes()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 937, in get_scg_taxonomy_super_dict_for_metagenomes
    scg_taxonomy_super_dict[metagenome_name] = SCGTaxonomyEstimatorSingle(args, progress=progress_quiet, run=run_quiet).get_items_taxonomy_super_dict()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/scg.py", line 999, in __init__
    TaxonomyEstimatorSingle.__init__(self, skip_init=skip_init)
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 175, in __init__
    self.init()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 179, in init
    self.init_items_data()
  File "/Users/mschechter/github/anvio/anvio/taxonomyops/__init__.py", line 269, in init_items_data
    self.item_name_to_gene_caller_id_dict[item_gene_name].add(gene_callers_id)
KeyError: 'Ribosomal_L9_C'
I think this happens when you run anvi-run-scg-taxonomy on a contigs-db that does not contain any of the new list of SCGs: it exits out as shown here, and whatever was already in the scg_taxonomy table stays behind.
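To make the failure mode in the traceback above concrete, here is a stripped-down stand-in. It is not the anvi'o code. It assumes the lookup dict is pre-filled with one entry per currently supported SCG name, which is what the `KeyError` suggests. The SCG list is shortened and the gene caller ids are made up.

```python
# Shortened list of currently supported SCG names, for illustration only.
current_scgs = ["Ribosomal_L1", "Ribosomal_L19"]

# One empty set per SCG name the current code knows about.
name_to_gene_caller_ids = {name: set() for name in current_scgs}

# Rows as they might sit in a stale table: (gene_callers_id, gene_name).
# The ids are invented; 'Ribosomal_L9_C' stands for a leftover from an older SCG set.
stale_rows = [(1556, "Ribosomal_L19"), (42, "Ribosomal_L9_C")]


def collect(rows, lookup):
    """Group gene caller ids by SCG name, as the failing line appears to do."""
    for gene_callers_id, gene_name in rows:
        # Raises KeyError for any name that is not a key in `lookup`.
        lookup[gene_name].add(gene_callers_id)


try:
    collect(stale_rows, name_to_gene_caller_ids)
except KeyError as error:
    failed_name = error.args[0]  # the name that was not in the lookup
```

The first row is stored without trouble. The lookup only breaks on the row whose name is no longer in the supported set.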
Here are two solutions and I don't know which way to go considering we already merged the migration script:
- The migration needs to clear all scg_taxonomy tables regardless of whether the new version was run.
- anvi-run-scg-taxonomy should also remove any contents that might still be in the scg_taxonomy table IF it doesn't find any new SCG here.
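One way to read the second option, sketched with plain `sqlite3`: drop every row whose SCG name is not in the current list. This is an illustration, not a patch for anvi-run-scg-taxonomy. The column name `gene_name` and the helper name `remove_stale_scg_rows` are assumptions, since the thread only shows the table rows positionally.

```python
import sqlite3


def remove_stale_scg_rows(db_path, current_scgs):
    """Delete scg_taxonomy rows for SCGs not in `current_scgs`.

    Returns the sorted list of stale SCG names that were removed.
    Assumes the SCG name is stored in a column called `gene_name`.
    """
    con = sqlite3.connect(db_path)
    try:
        names_in_table = {
            row[0]
            for row in con.execute("SELECT DISTINCT gene_name FROM scg_taxonomy")
        }
        # Anything in the table but not in the current list is a leftover.
        stale = sorted(names_in_table - set(current_scgs))
        with con:  # single transaction for all deletions
            con.executemany(
                "DELETE FROM scg_taxonomy WHERE gene_name = ?",
                [(name,) for name in stale],
            )
        return stale
    finally:
        con.close()
```

Returning the removed names makes it easy to warn the user about what was cleaned up instead of doing it silently.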
from anvio.
This should have never happened, @mschecht. Could this be a contigs-db you migrated with the previous version of the migration script?
It has the following SCG in the table:
Even though Ribosomal_L9_C is not one of the SCGs we're using:
default_scgs_for_taxonomy = ['Ribosomal_L1',
                             'Ribosomal_L13',
                             'Ribosomal_L14',
                             'Ribosomal_L16',
                             'Ribosomal_L17',
                             'Ribosomal_L19',
                             'Ribosomal_L2',
                             'Ribosomal_L20',
                             'Ribosomal_L21p',
                             'Ribosomal_L22',
                             'Ribosomal_L27A',
                             'Ribosomal_L3',
                             'Ribosomal_L4',
                             'Ribosomal_L5',
                             'Ribosomal_S11',
                             'Ribosomal_S15',
                             'Ribosomal_S16',
                             'Ribosomal_S2',
                             'Ribosomal_S6',
                             'Ribosomal_S7',
                             'Ribosomal_S8',
                             'Ribosomal_S9']
AND even though the version of the DB shows that it is new:
scg_taxonomy_database_version ................: GTDB: v214.1; Anvi'o: v1
When I manually change the self table to revert this contigs-db back to v22 and set the scg_taxonomy_database_version to v214.1, then everything works out as expected:
:: anvi'o v7 dev :: ~/Downloads/test_db >>> anvi-migrate ORAL_P-D-F_Bin_00037.db --migrate-quickly
NEW MIGRATION TASK
===============================================
Input file path ..............................: ORAL_P-D-F_Bin_00037.db
Input file type ..............................: contigs
Current Version ..............................: 22
Target Version ...............................: 23
Migration mode ...............................: Adventurous
SQLite Version ...............................: 3.41.2
* Your contigs database is now version 23. Sadly this update removed all SCG
taxonomy data in this contigs-db due to a change in the set of SCGs anvi'o now
uses for taxonomy estimation. As a result, you will need to re-run anvi-run-
scg-taxonomy command on this contigs-db :/ If you would like to learn why this
was necessary, please visit https://github.com/merenlab/anvio/issues/2211. We
thank you for your patience!
:: anvi'o v7 dev :: ~/Downloads/test_db >>> anvi-estimate-scg-taxonomy -c ORAL_P-D-F_Bin_00037.db
Config Error: It seems the SCG taxonomy tables were not populated in this contigs database :/
Luckily it is easy to fix that. Please see the program `anvi-run-scg-taxonomy`.
:: anvi'o v7 dev :: ~/Downloads/test_db >>> anvi-run-scg-taxonomy -c ORAL_P-D-F_Bin_00037.db
WARNING
===============================================
This contigs database contains no single-copy core gene sequences that are used
by the anvi'o taxonomy headquarters in Lausanne. Somewhat disappointing but
totally OK.
:: anvi'o v7 dev :: ~/Downloads/test_db >>> anvi-estimate-scg-taxonomy -c ORAL_P-D-F_Bin_00037.db
Contigs DB ...................................: ORAL_P-D-F_Bin_00037.db
Metagenome mode ..............................: False
Estimated taxonomy for "ORAL_P-D-F_Bin_00037"
===============================================
+----------------------+--------------+-------------------+------------------+
| | total_scgs | supporting_scgs | taxonomy |
+======================+==============+===================+==================+
| ORAL_P-D-F_Bin_00037 | 0 | 0 | / / / / / / |
+----------------------+--------------+-------------------+------------------+
from anvio.
(So this is not an edge case others will run into -- it is just a Frankenstein contigs-db that was updated with an earlier version of the migration script before it was merged to master, OR something else similar to that :))
from anvio.
Thanks for looking into this @meren! It must have been an artifact I introduced while developing. Glad it won't impact anyone!
from anvio.