GithubHelp home page GithubHelp logo

Critical DB issue about anvio HOT 8 CLOSED

merenlab avatar merenlab commented on July 26, 2024
Critical DB issue

from anvio.

Comments (8)

annaship avatar annaship commented on July 26, 2024

The query is "create table..." A

-----Original Message-----
From: "A. Murat Eren" [email protected]
Sent: ‎2/‎8/‎2015 12:17 AM
To: "meren/PaPi" [email protected]
Subject: [PaPi] Critical DB issue (#33)

Well, I am getting this error with large files that do not happen with smaller ones.
This may be a db.commit() issue. Needs to be checked properly:
meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)

Finding ORFs in contigs

Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt

HMM Profiling for Dupont_et_al

Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945

HMM Profiling for Campbell_et_al

Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364

HMM Profiling for Wu_et_al

Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in
main(args)
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
g.populate_search_tables(annotation_db, sources)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists

Reply to this email directly or view it on GitHub.

from anvio.

annaship avatar annaship commented on July 26, 2024

Oops, sorry. So, I was going to say, the query is "create table" and the error "table ... Exists". You can add "create table if not exists...".

-----Original Message-----
From: "A. Murat Eren" [email protected]
Sent: ‎2/‎8/‎2015 12:17 AM
To: "meren/PaPi" [email protected]
Subject: [PaPi] Critical DB issue (#33)

Well, I am getting this error with large files that do not happen with smaller ones.
This may be a db.commit() issue. Needs to be checked properly:
meren SSH://MBL /workspace/shared/tom/Infant-gut-FASTA-files $ papi-populate-search-table Infant-gut-assembly-1kb.fa Infant-gut-assembly-1kb.db -L 20000
Database .....................................: A new database, Infant-gut-assembly-1kb.db, has been created.
Split length .................................: 20000
HMM profiles .................................: 3 sources have been loaded: Dupont_et_al (111 genes), Campbell_et_al (139 genes), Wu_et_al (31 genes)

Finding ORFs in contigs

Genes ........................................: /tmp/tmpqjiEz2/contigs.genes
Proteins .....................................: /tmp/tmpqjiEz2/contigs.proteins
Log file .....................................: /tmp/tmpqjiEz2/00_log.txt

HMM Profiling for Dupont_et_al

Reference ....................................: Dupont et al, http://www.nature.com/ismej/journal/v6/n6/full/ismej2011189a.html
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Dupont_et_al/genes.hmm.gz
Number of genes ..............................: 111
Temporary work dir ...........................: /tmp/tmpYslaQa
HMM scan output ..............................: /tmp/tmpYslaQa/hmm.output
HMM scan hits ................................: /tmp/tmpYslaQa/hmm.hits
Log file .....................................: /tmp/tmpYslaQa/00_log.txt
Number of raw hits ...........................: 3,945

HMM Profiling for Campbell_et_al

Reference ....................................: Campbell et al, http://www.pnas.org/content/110/14/5540.short
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Campbell_et_al/genes.hmm.gz
Number of genes ..............................: 139
Temporary work dir ...........................: /tmp/tmpI3mhZw
HMM scan output ..............................: /tmp/tmpI3mhZw/hmm.output
HMM scan hits ................................: /tmp/tmpI3mhZw/hmm.hits
Log file .....................................: /tmp/tmpI3mhZw/00_log.txt
Number of raw hits ...........................: 2,364

HMM Profiling for Wu_et_al

Reference ....................................: Wu et al, http://genomebiology.com/2008/9/10/R151
Pfam model ...................................: /groups/merenlab/PaPi/PaPi/data/hmm/Wu_et_al/genes.hmm.gz
Number of genes ..............................: 31
Temporary work dir ...........................: /tmp/tmpz_f6JE
HMM scan output ..............................: /tmp/tmpz_f6JE/hmm.output
HMM scan hits ................................: /tmp/tmpz_f6JE/hmm.hits
Log file .....................................: /tmp/tmpz_f6JE/00_log.txt
Number of raw hits ...........................: 946
Traceback (most recent call last):
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 118, in
main(args)
File "/groups/merenlab/PaPi/bin/papi-populate-search-table", line 78, in main
g.populate_search_tables(annotation_db, sources)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 179, in populate_search_tables
search_tables.append(source, reference, kind_of_search, all_genes_searched_against, search_results_dict)
File "/groups/merenlab/PaPi/PaPi/annotation.py", line 261, in append
self.db.create_table(self.search_info_table, search_info_table_structure, search_info_table_types)
File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists

Reply to this email directly or view it on GitHub.

from anvio.

meren avatar meren commented on July 26, 2024

Hi Anna,

Thank you for the comment. But the issue is a bit knottier than that.

The table is there, and the code should never branch into that line. Here where the check is being done:

https://github.com/meren/PaPi/blob/master/PaPi/annotation.py#L260

You can see in the line before, it gets the table names from the database, which is a simple operation:

https://github.com/meren/PaPi/blob/master/PaPi/db.py#L153

This error does not happen with runs that don't take too long. I think database connection times out, and it doesn't return table names properly or something. Because the table is there, it is just the code sometimes, especially during long runs can't recover that information from the database :/

from anvio.

meren avatar meren commented on July 26, 2024

Anna,

Would you say keeping every db operation atomic is a better practice instead of keeping the connection open for very long periods of time?

I am planning to change the database wrapper class to a much more atomic one, so every function will (1) open a connection, (2) query the db and collect results, (3) close connection. I am not sure how much overhead this will bring, but I feel like it can resolve this issue I am dealing with (if this approach is better, I will rely on aspect oriented programming to decorate db functions in https://github.com/meren/PaPi/blob/master/PaPi/db.py).

Best,

from anvio.

annaship avatar annaship commented on July 26, 2024

Meren,

Yes, you check if the table exists with python, maybe to do that within the
db is good too.
That's your error message:

File "/groups/merenlab/PaPi/PaPi/db.py", line 68, in create_table
self._exec('''CREATE TABLE %s (%s)''' % (table_name, db_fields))
File "/groups/merenlab/PaPi/PaPi/db.py", line 110, in _exec
ret_val = self.cursor.execute(sql_query)
sqlite3.OperationalError: table search_info already exists

"OperationalError: table search_info already exists" will be gone if query
is

self._exec('''CREATE TABLE IF NOT EXISTS %s (%s)''' % (table_name,
db_fields))

I'm sure you are right and the real problem is different, but maybe you
will have more clear error message at least.

The second thing is a connection timeout.
You can do as you said and open and close a connection each time. Only
benchmark will tell you if it is much slower. It will be, probably,
especially if without it you can use "prepare". And it will not help if one
query is running too long.

There are several options you can play with.

  1. Increase timeout (
    https://docs.python.org/2/library/sqlite3.html#sqlite3.connect)

"
sqlite3.connect(database[, timeout, detect_types, isolation_level,
check_same_thread, factory, cached_statements]) Opens a connection to the
SQLite database file database. You can use ":memory:" to open a database
connection to a database that resides in RAM instead of on disk. When a
database is accessed by multiple connections, and one of the processes
modifies the database, the SQLite database is locked until that transaction
is committed. The timeout parameter specifies how long the connection
should wait for the lock to go away until raising an exception. The default
for the timeout parameter is 5.0 (five seconds).
"

  1. See how many connections Sqlite can open at the same time. Some info
    here, for example:
    http://stackoverflow.com/questions/3610775/how-many-connections-can-sqlite-3-handle

  2. Use multithreading,
    https://docs.python.org/2/library/sqlite3.html#multithreading

Not sure if it help, I'd like to know more about the db configuration.

Anya

On Sun, Feb 8, 2015 at 2:13 PM, A. Murat Eren [email protected]
wrote:

Anna,

Would you say keeping every db operation atomic is a better practice
instead of keeping the connection open for very long periods of time?

I am planning to change the database wrapper class to a much more atomic
one, every function will (1) open a connection, (2) query the db and
collect results, (3) close connection. I am not sure how much overhead this
will bring, but I feel like it can resolve this issue I am dealing with (if
this approach is better, I will rely on aspect oriented programming to
decorate db functions in
https://github.com/meren/PaPi/blob/master/PaPi/db.py).

Best,


Reply to this email directly or view it on GitHub
#33 (comment).

from anvio.

meren avatar meren commented on July 26, 2024

If you knew about the db configuration you would have decided to come to the lab with a bucket of gasoline and set my computer on fire.

I am changing the classes a bit so the database connection is not made until the data is ready for submission. I will see if this changes anything, but it definitely makes more sense as you pointed out.

Best,

from anvio.

annaship avatar annaship commented on July 26, 2024

:-)

On Sun, Feb 8, 2015 at 3:50 PM, A. Murat Eren [email protected]
wrote:

If you knew about the db configuration you would have decided to come to
the lab with a bucket of gasoline and set my computer on fire.

I am changing the classes a bit so the database connection is not made
until the data is ready for submission. I will see if this changes
anything, but it definitely makes more sense as you pointed out.

Best,


Reply to this email directly or view it on GitHub
#33 (comment).

from anvio.

meren avatar meren commented on July 26, 2024

I just finished testing on the servers, commit 34d7342 worked like magic, so this is resolved! :)

from anvio.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.