Comments (8)
The query is "create table..." A
-----Original Message-----
From: "A. Murat Eren" [email protected]
Sent: 2/8/2015 12:17 AM
To: "meren/PaPi" [email protected]
Subject: [PaPi] Critical DB issue (#33)
Well, I am getting this error with large files that do not happen with smaller ones.
This may be a db.commit() issue. Needs to be checked properly:
meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)
Finding ORFs in contigs
Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt
HMM Profiling for Dupont_et_al
Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945
HMM Profiling for Campbell_et_al
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364
HMM Profiling for Wu_et_al
Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in
main(args)
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
g.populate_search_tables(annotation_db, sources)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists
—
Reply to this email directly or view it on GitHub.
from anvio.
Oops, sorry. So, I was going to say, the query is "create table" and the error "table ... Exists". You can add "create table if not exists...".
-----Original Message-----
From: "A. Murat Eren" [email protected]
Sent: 2/8/2015 12:17 AM
To: "meren/PaPi" [email protected]
Subject: [PaPi] Critical DB issue (#33)
Well, I am getting this error with large files that do not happen with smaller ones.
This may be a db.commit() issue. Needs to be checked properly:
meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)
Finding ORFs in contigs
Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt
HMM Profiling for Dupont_et_al
Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945
HMM Profiling for Campbell_et_al
Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364
HMM Profiling for Wu_et_al
Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in
main(args)
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
g.populate_search_tables(annotation_db, sources)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists
—
Reply to this email directly or view it on GitHub.
from anvio.
Hi Anna,
Thank you for the comment. But the issue is a bit knottier than that.
The table is there, and the code should never branch into that line. Here where the check is being done:
https://github.com/meren/PaPi/blob/master/PaPi/annotation.py#L260
You can see in the line before, it gets the table names from the database, which is a simple operation:
https://github.com/meren/PaPi/blob/master/PaPi/db.py#L153
This error does not happen with runs that don't take too long. I think database connection times out, and it doesn't return table names properly or something. Because the table is there, it is just the code sometimes, especially during long runs can't recover that information from the database :/
from anvio.
Anna,
Would you say keeping every db operation atomic is a better practice instead of keeping the connection open for very long periods of time?
I am planning to change the database wrapper class to a much more atomic one, so every function will (1) open a connection, (2) query the db and collect results, (3) close connection. I am not sure how much overhead this will bring, but I feel like it can resolve this issue I am dealing with (if this approach is better, I will rely on aspect oriented programming to decorate db functions in https://github.com/meren/PaPi/blob/master/PaPi/db.py).
Best,
from anvio.
Meren,
Yes, you check if the table exists with python, maybe to do that within the
db is good too.
That's your error message:
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists
"OperationalError: table search_info already exists" will be gone if query
is
self._exec('''CREATE TABLE IF NOT EXISTS %s (%s)''' % (table_name,
db_fields))
I'm sure you are right and the real problem is different, but maybe you
will have more clear error message at least.
The second thing is a connection timeout.
You can do as you said and open and close a connection each time. Only
benchmark will tell you if it is much slower. It will be, probably,
especially if without it you can use "prepare". And it will not help if one
query is running too long.
There are several options you can play with.
- Increase timeout (
https://docs.python.org/2/library/sqlite3.html#sqlite3.connect)
"
sqlite3.connect(database[, timeout, detect_types, isolation_level,
check_same_thread, factory, cached_statements]) Opens a connection to the
SQLite database file database. You can use ":memory:" to open a database
connection to a database that resides in RAM instead of on disk. When a
database is accessed by multiple connections, and one of the processes
modifies the database, the SQLite database is locked until that transaction
is committed. The timeout parameter specifies how long the connection
should wait for the lock to go away until raising an exception. The default
for the timeout parameter is 5.0 (five seconds).
"
-
See how many connections Sqlite can open at the same time. Some info
here, for example:
http://stackoverflow.com/questions/3610775/how-many-connections-can-sqlite-3-handle -
Use multithreading,
https://docs.python.org/2/library/sqlite3.html#multithreading
Not sure if it help, I'd like to know more about the db configuration.
Anya
On Sun, Feb 8, 2015 at 2:13 PM, A. Murat Eren [email protected]
wrote:
Anna,
Would you say keeping every db operation atomic is a better practice
instead of keeping the connection open for very long periods of time?I am planning to change the database wrapper class to a much more atomic
one, every function will (1) open a connection, (2) query the db and
collect results, (3) close connection. I am not sure how much overhead this
will bring, but I feel like it can resolve this issue I am dealing with (if
this approach is better, I will rely on aspect oriented programming to
decorate db functions in
https://github.com/meren/PaPi/blob/master/PaPi/db.py).Best,
—
Reply to this email directly or view it on GitHub
#33 (comment).
from anvio.
If you knew about the db configuration you would have decided to come to the lab with a bucket of gasoline and set my computer on fire.
I am changing the classes a bit so the database connection is not made until the data is ready for submission. I will see if this changes anything, but it definitely makes more sense as you pointed out.
Best,
from anvio.
:-)
On Sun, Feb 8, 2015 at 3:50 PM, A. Murat Eren [email protected]
wrote:
If you knew about the db configuration you would have decided to come to
the lab with a bucket of gasoline and set my computer on fire.I am changing the classes a bit so the database connection is not made
until the data is ready for submission. I will see if this changes
anything, but it definitely makes more sense as you pointed out.Best,
—
Reply to this email directly or view it on GitHub
#33 (comment).
from anvio.
I just finished testing on the servers, commit 34d7342 worked like magic, so this is resolved! :)
from anvio.
Related Issues (20)
- [BUG] Incorrect (inflated) stepwise copy number for modules with complex parenthetical clauses HOT 1
- external gene call file created from more than two contigs HOT 5
- [BUG] anvi-estimate-scg-taxonomy crashes when external-genomes file contains contigs-db with old scg names HOT 8
- [BUG] anvi-pan-genome DIAMOND expected output files are missing
- [BUG] Memory issues with `anvi-estimate-metabolism` on large metagenomes in `--metagenome-mode` HOT 4
- [DISCUSSION] Common misconceptions and mistakes for anvi'o beginners that we should clarify in a blog post HOT 5
- [BUG] 'NumGenomesEstimator' sometimes double-counts some genomes as both Bacteria and Archaea HOT 8
- [BUG] Eggnog mapper not running in workflow, works fine manually in same conda env HOT 1
- [BUG] Key Error when running anvi-metabolism HOT 7
- [Bug] external-genomes.txt missing genomes HOT 8
- [BUG] Config Error: Drivers::Muscle: anvio-8 HOT 7
- [BUG] Parent layer bug HOT 1
- [BUG] anvi-setup-ncbi-cogs urlopen error HOT 6
- [FEATURE REQUEST] Interface option to keep contigs together when binning HOT 2
- anvi-display-functions, KOfam, sugar HOT 1
- [BUG] anvi-interactive is not working in my computer HOT 2
- [BUG] The interactive interface does not care about layer orders stored in the state HOT 1
- Config Error: There is nothing at '/mnt/d/bioinformatics/phlogytree/M_ZM43__SPLIT/ZM43_CLEAN/CONTIGS.db' :/ HOT 1
- Config Error: There is nothing at '/mnt/d/bioinformatics/phlogytree/M_ZM43__SPLIT/ZM43_CLEAN/CONTIGS.db' :/ HOT 2
- [BUG] anvi-run-kegg-kofams terminated prematurely HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from anvio.