yhat / db.py


db.py is an easier way to interact with your databases

License: BSD 2-Clause "Simplified" License

Python 98.37% Shell 0.37% Batchfile 1.26%

db.py's Introduction

db.py

What is it
Databases Supported
Features
Quickstart - Installation - Demo
How To
Contributing
TODO

What is it?

db.py is an easier way to interact with your databases. It makes it easier to explore tables, columns, views, etc. It puts the emphasis on user interaction, information display, and providing easy to use helper functions.

db.py uses pandas to manage data, so if you're already using pandas, db.py should feel pretty natural. It's also fully compatible with the IPython Notebook, so not only is db.py extremely functional, it's also pretty.

Blog Post

Databases Supported

PostgreSQL
MySQL
SQLite
Redshift
MS SQL Server
Oracle

`db.py` lets you...

Execute queries

>>> db.query_from_file("myscript.sql")
       _id                    datetime           user_id  n
0  1290000  10/Jun/2014:18:21:27 +0000  0000015b37cd0964  1
1  9120009  23/Jun/2014:02:11:21 +0000  00006e01a6419822  1
2  1683874  23/Jun/2014:02:11:48 +0000  00006e01a6419822  2
3  2562153  23/Jun/2014:02:12:57 +0000  00006e01a6419822  3
4   393019  14/Jun/2014:16:05:18 +0000  000099d569e3a216  1
5  3542568  14/Jun/2014:16:06:02 +0000  000099d569e3a216  2

Fully compatible with predictive typing (tab completion)

>>> db.tables.
db.tables.Album          db.tables.Customer       db.tables.Genre          db.tables.InvoiceLine    db.tables.Playlist       db.tables.Track
db.tables.Artist         db.tables.Employee       db.tables.Invoice        db.tables.MediaType      db.tables.PlaylistTrack  db.tables.tables

Friendly displays

>>> db.tables.Album
+-------------------------------------------------------------+
|                            Album                            |
+----------+---------------+-----------------+----------------+
| Column   | Type          | Foreign Keys    | Reference Keys |
+----------+---------------+-----------------+----------------+
| AlbumId  | INTEGER       |                 | Track.AlbumId  |
| Title    | NVARCHAR(160) |                 |                |
| ArtistId | INTEGER       | Artist.ArtistId |                |
+----------+---------------+-----------------+----------------+

Directly integrated with pandas

>>> db.tables.Track.head()
   TrackId                                     Name  AlbumId  MediaTypeId  \
0        1  For Those About To Rock (We Salute You)        1            1
1        2                        Balls to the Wall        2            2
2        3                          Fast As a Shark        3            2
3        4                        Restless and Wild        3            2
4        5                     Princess of the Dawn        3            2
5        6                    Put The Finger On You        1            1

   GenreId                                           Composer  Milliseconds  \
0        1          Angus Young, Malcolm Young, Brian Johnson        343719
1        1                                               None        342562
2        1  F. Baltes, S. Kaufman, U. Dirkscneider & W. Ho...        230619
3        1  F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. D...        252051
4        1                         Deaffy & R.A. Smith-Diesel        375418
5        1          Angus Young, Malcolm Young, Brian Johnson        205662

      Bytes  UnitPrice
0  11170334       0.99
1   5510424       0.99
2   3990994       0.99
3   4331779       0.99
4   6290521       0.99
5   6713451       0.99

Create queries using Handlebars style templates

q = """
SELECT
    '{{ name }}' as table_name, sum(1) as cnt
FROM
    {{ name }}
GROUP BY
    table_name
"""
data = [
  {"name": "Album"},
  {"name": "Artist"},
  {"name": "Track"}
]
db.query(q, data=data)
  table_name   cnt
0      Album   347
1     Artist   275
2      Track  3503
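
As a rough illustration of the templating idea above, the sketch below renders one `{{ name }}` template per dictionary and joins the results with UNION ALL. This is an assumption about how such a feature could work, not db.py's actual implementation: `render` and `render_many` are hypothetical helpers, the substitution uses a plain regular expression rather than a Handlebars engine, and the tables live in a throwaway in-memory SQLite database.

```python
import re
import sqlite3


def render(template, values):
    """Replace each {{ key }} placeholder with str(values[key]).

    Plain text substitution: only safe for trusted values such as table names.
    """
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)


def render_many(template, data):
    """Render the template once per dict and combine the pieces with UNION ALL."""
    parts = [render(template, item).strip() for item in data]
    return "\nUNION ALL\n".join(parts)


template = """
SELECT '{{ name }}' AS table_name, count(*) AS cnt
FROM {{ name }}
"""

# Tiny stand-in tables so the rendered SQL can actually run.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Album (AlbumId INTEGER)")
con.execute("CREATE TABLE Artist (ArtistId INTEGER)")
con.executemany("INSERT INTO Album VALUES (?)", [(1,), (2,)])
con.executemany("INSERT INTO Artist VALUES (?)", [(1,)])

sql = render_many(template, [{"name": "Album"}, {"name": "Artist"}])
rows = sorted(con.execute(sql).fetchall())
print(rows)
```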

Search your schema

>>> db.find_column("*Id*")
+---------------+---------------+---------+
| Table         |  Column Name  | Type    |
+---------------+---------------+---------+
| Album         |    AlbumId    | INTEGER |
| Album         |    ArtistId   | INTEGER |
| Artist        |    ArtistId   | INTEGER |
| Customer      |  SupportRepId | INTEGER |
| Customer      |   CustomerId  | INTEGER |
| Employee      |   EmployeeId  | INTEGER |
| Genre         |    GenreId    | INTEGER |
| Invoice       |   InvoiceId   | INTEGER |
| Invoice       |   CustomerId  | INTEGER |
| InvoiceLine   |   InvoiceId   | INTEGER |
| InvoiceLine   |    TrackId    | INTEGER |
| InvoiceLine   | InvoiceLineId | INTEGER |
| MediaType     |  MediaTypeId  | INTEGER |
| Playlist      |   PlaylistId  | INTEGER |
| PlaylistTrack |    TrackId    | INTEGER |
| PlaylistTrack |   PlaylistId  | INTEGER |
| Track         |  MediaTypeId  | INTEGER |
| Track         |    TrackId    | INTEGER |
| Track         |    AlbumId    | INTEGER |
| Track         |    GenreId    | INTEGER |
+---------------+---------------+---------+

IPython Notebook friendly

Quickstart

Installation

db.py is on PyPI.

$ pip install db.py

The database libraries being used under the hood are optional dependencies (if you use mysql, you probably don't care about installing psycopg2). Based on the databases you're using, you'll need one (or many) of the following:

PostgreSQL: psycopg2. Windows
Redshift: psycopg2. Redshift is a flavor of PostgreSQL.
MySQL: MySQLdb
SQLite: sqlite3. Should be installed already.
MS SQL: TBD. Suggestions welcome! https://github.com/yhat/db.py/issues

Demo

>>> from db import DemoDB # or connect to your own using DB. see below
>>> db = DemoDB() # comes from: http://chinookdatabase.codeplex.com/
>>> db.tables
+---------------+----------------------------------------------------------------------------------+
| Table         | Columns                                                                          |
+---------------+----------------------------------------------------------------------------------+
| Album         | AlbumId, Title, ArtistId                                                         |
| Artist        | ArtistId, Name                                                                   |
| Customer      | CustomerId, FirstName, LastName, Company, Address, City, State, Country, PostalC |
|               | ode, Phone, Fax, Email, SupportRepId                                             |
| Employee      | EmployeeId, LastName, FirstName, Title, ReportsTo, BirthDate, HireDate, Address, |
|               |  City, State, Country, PostalCode, Phone, Fax, Email                             |
| Genre         | GenreId, Name                                                                    |
| Invoice       | InvoiceId, CustomerId, InvoiceDate, BillingAddress, BillingCity, BillingState, B |
|               | illingCountry, BillingPostalCode, Total                                          |
| InvoiceLine   | InvoiceLineId, InvoiceId, TrackId, UnitPrice, Quantity                           |
| MediaType     | MediaTypeId, Name                                                                |
| Playlist      | PlaylistId, Name                                                                 |
| PlaylistTrack | PlaylistId, TrackId                                                              |
| Track         | TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, Uni |
|               | tPrice                                                                           |
+---------------+----------------------------------------------------------------------------------+
>>> db.tables.Customer
+------------------------------------------------------------------------+
|                                Customer                                |
+--------------+--------------+---------------------+--------------------+
| Column       | Type         | Foreign Keys        | Reference Keys     |
+--------------+--------------+---------------------+--------------------+
| CustomerId   | INTEGER      |                     | Invoice.CustomerId |
| FirstName    | NVARCHAR(40) |                     |                    |
| LastName     | NVARCHAR(20) |                     |                    |
| Company      | NVARCHAR(80) |                     |                    |
| Address      | NVARCHAR(70) |                     |                    |
| City         | NVARCHAR(40) |                     |                    |
| State        | NVARCHAR(40) |                     |                    |
| Country      | NVARCHAR(40) |                     |                    |
| PostalCode   | NVARCHAR(10) |                     |                    |
| Phone        | NVARCHAR(24) |                     |                    |
| Fax          | NVARCHAR(24) |                     |                    |
| Email        | NVARCHAR(60) |                     |                    |
| SupportRepId | INTEGER      | Employee.EmployeeId |                    |
+--------------+--------------+---------------------+--------------------+
>>> db.tables.Customer.sample()
   CustomerId  FirstName    LastName  \
0           4      Bjørn      Hansen
1          26    Richard  Cunningham
2           1       Luís   Gonçalves
3          21      Kathy       Chase
4           6     Helena        Holý
5          14       Mark     Philips
6          49  Stanisław      Wójcik
7          19        Tim       Goyer
8          45   Ladislav      Kovács
9           8       Daan     Peeters

                                            Company  \
0                                              None
1                                              None
2  Embraer - Empresa Brasileira de Aeronáutica S.A.
3                                              None
4                                              None
5                                             Telus
6                                              None
7                                        Apple Inc.
8                                              None
9                                              None

                           Address                 City State         Country  \
0                 Ullevålsveien 14                 Oslo  None          Norway
1              2211 W Berry Street           Fort Worth    TX             USA
2  Av. Brigadeiro Faria Lima, 2170  São José dos Campos    SP          Brazil
3                 801 W 4th Street                 Reno    NV             USA
4                    Rilská 3174/6               Prague  None  Czech Republic
5                   8210 111 ST NW             Edmonton    AB          Canada
6                     Ordynacka 10               Warsaw  None          Poland
7                  1 Infinite Loop            Cupertino    CA             USA
8                Erzsébet krt. 58.             Budapest  None         Hungary
9                  Grétrystraat 63             Brussels  None         Belgium

  PostalCode               Phone                 Fax  \
0       0171     +47 22 44 22 22                None
1      76110   +1 (817) 924-7272                None
2  12227-000  +55 (12) 3923-5555  +55 (12) 3923-5566
3      89503   +1 (775) 223-7665                None
4      14300    +420 2 4177 0449                None
5    T6G 2C7   +1 (780) 434-4554   +1 (780) 434-5565
6     00-358    +48 22 828 37 39                None
7      95014   +1 (408) 996-1010   +1 (408) 996-1011
8     H-1073                None                None
9       1000    +32 02 219 03 03                None

                      Email  SupportRepId
0     bjorn.hansen@yahoo.no             4
1  ricunningham@hotmail.com             4
2      luisg@embraer.com.br             3
3       kachase@hotmail.com             5
4           hholy@gmail.com             5
5        mphilips12@shaw.ca             5
6    stanisław.wójcik@wp.pl             4
7          tgoyer@apple.com             3
8  ladislav_kovacs@apple.hu             3
9     daan_peeters@apple.be             4
>>> db.find_column("*Name*")
+-----------+-------------+---------------+
| Table     | Column Name | Type          |
+-----------+-------------+---------------+
| Artist    |     Name    | NVARCHAR(120) |
| Customer  |  FirstName  | NVARCHAR(40)  |
| Customer  |   LastName  | NVARCHAR(20)  |
| Employee  |  FirstName  | NVARCHAR(20)  |
| Employee  |   LastName  | NVARCHAR(20)  |
| Genre     |     Name    | NVARCHAR(120) |
| MediaType |     Name    | NVARCHAR(120) |
| Playlist  |     Name    | NVARCHAR(120) |
| Track     |     Name    | NVARCHAR(200) |
+-----------+-------------+---------------+
>>> db.find_table("A*")
+--------+--------------------------+
| Table  | Columns                  |
+--------+--------------------------+
| Album  | AlbumId, Title, ArtistId |
| Artist | ArtistId, Name           |
+--------+--------------------------+
>>> db.query("select * from Artist limit 10;")
   ArtistId                  Name
0         1                 AC/DC
1         2                Accept
2         3             Aerosmith
3         4     Alanis Morissette
4         5       Alice In Chains
5         6  Antônio Carlos Jobim
6         7          Apocalyptica
7         8            Audioslave
8         9              BackBeat
9        10          Billy Cobham

How To

Connecting to a Database

The `DB()` object

Arguments

username: your username
password: your password
hostname: hostname of the database (e.g. localhost, dw.mardukas.com, ec2-54-191-289-254.us-west-2.compute.amazonaws.com)
port: port the database is running on (e.g. 5432)
dbname: name of the database (e.g. hanksdb)
filename: path to sqlite database (e.g. baseball-archive-2012.sqlite, employees.db)
dbtype: type of database you're connecting to (postgres, mysql, sqlite, redshift)
profile: name of the profile you want to use to connect. Using this negates the need to specify any other arguments.
exclude_system_tables: whether or not to load schema information for internal tables. For example, postgres has a bunch of tables prefixed with pg_ that you probably don't actually care about. On the other hand, if you're administering a database, you might want to query these tables.
limit: default number of records to return in a query. This is used by the DB.query method. You can override it by adding limit={X} to the query method, or by passing an argument to DB(). None indicates that there will be no limit (That's right, you'll be limitless. Bradley Cooper style.)

>>> from db import DB
>>> db = DB(username="greg", password="secret", hostname="localhost",
            dbtype="postgres")
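
The `limit` argument described above could be applied by wrapping the user's query in an outer SELECT. The sketch below shows that idea only; `apply_limit` is a hypothetical helper, the wrapping strategy is an assumption rather than a description of db.py's internals, and the demonstration runs against an in-memory SQLite table.

```python
import sqlite3


def apply_limit(q, limit=None):
    """Return q wrapped so that at most `limit` rows come back.

    limit=None leaves the query unchanged apart from trimming the trailing ';'.
    """
    q = q.strip().rstrip(";")
    if limit is None:
        return q
    return "SELECT * FROM ({}) AS limited LIMIT {}".format(q, int(limit))


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

limited = con.execute(apply_limit("select * from t;", limit=2)).fetchall()
unlimited = con.execute(apply_limit("select * from t;")).fetchall()
print(len(limited), len(unlimited))
```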

Saving a profile

>>> from db import DB
>>> db = DB(username="greg", password="secret", hostname="localhost",
            dbtype="postgres")
>>> db.save_credentials() # this will save to "default"
>>> db.save_credentials(profile="local_pg")

Connecting from a profile

>>> from db import DB
>>> db = DB() # this loads "default" profile
>>> db = DB(profile="local_pg")

List your profiles

>>> from db import list_profiles
>>> list_profiles()
{'demo': {u'dbname': None,
  u'dbtype': u'sqlite',
  u'filename': u'/Users/glamp/repos/yhat/opensource/db.py/db/data/chinook.sqlite',
  u'hostname': u'localhost',
  u'password': None,
  u'port': 5432,
  u'username': None},
 'muppets': {u'dbname': u'muppetdb',
  u'dbtype': u'postgres',
  u'filename': None,
  u'hostname': u'muppets.yhathq.com',
  u'password': None,
  u'port': 5432,
  u'username': u'kermit'}}

Remove a profile

>>> from db import remove_profile
>>> remove_profile('demo')
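
To make the save / load / list / remove cycle above concrete, here is a minimal sketch of a profile store built on JSON files. Everything in it is an assumption made for illustration: the `.db_sketch_` file prefix, the explicit `directory` argument, and the function bodies are hypothetical and do not describe where or how db.py really stores credentials.

```python
import json
import os
import tempfile

PROFILE_PREFIX = ".db_sketch_"  # hypothetical file-name prefix for this sketch


def profile_path(directory, name):
    """Path of the file that holds one named profile."""
    return os.path.join(directory, PROFILE_PREFIX + name)


def save_credentials(directory, credentials, profile="default"):
    """Write the credentials dict as JSON (plain text, so not suitable for real secrets)."""
    with open(profile_path(directory, profile), "w") as f:
        json.dump(credentials, f)


def load_credentials(directory, profile="default"):
    """Read one profile back as a dict."""
    with open(profile_path(directory, profile)) as f:
        return json.load(f)


def list_profiles(directory):
    """Return {profile_name: credentials} for every profile file in the directory."""
    profiles = {}
    for filename in sorted(os.listdir(directory)):
        if filename.startswith(PROFILE_PREFIX):
            name = filename[len(PROFILE_PREFIX):]
            profiles[name] = load_credentials(directory, name)
    return profiles


def remove_profile(directory, name):
    """Delete the file behind one profile."""
    os.remove(profile_path(directory, name))


workdir = tempfile.mkdtemp()
creds = {"username": "kermit", "hostname": "localhost", "dbtype": "postgres"}
save_credentials(workdir, creds, profile="local_pg")
print(list_profiles(workdir))
remove_profile(workdir, "local_pg")
print(list_profiles(workdir))
```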

Executing Queries

From a string

>>> df1 = db.query("select * from Artist;")
>>> df2 = db.query("select * from Album;")

From a file

>>> db.query_from_file("myscript.sql")
>>> df = db.query_from_file("myscript.sql")

Searching for Tables and Columns

Tables

>>> db.find_table("A*")
+--------+--------------------------+
| Table  | Columns                  |
+--------+--------------------------+
| Album  | AlbumId, Title, ArtistId |
| Artist | ArtistId, Name           |
+--------+--------------------------+
>>> results = db.find_table("tmp*") # returns all tables prefixed w/ tmp
>>> results = db.find_table("prod_*") # returns all tables prefixed w/ prod_
>>> results = db.find_table("*Invoice*") # returns all tables containing Invoice
>>> results = db.find_table("*") # returns everything

Columns

>>> db.find_column("Name") # returns all columns named "Name"
+-----------+-------------+---------------+
| Table     | Column Name | Type          |
+-----------+-------------+---------------+
| Artist    |     Name    | NVARCHAR(120) |
| Genre     |     Name    | NVARCHAR(120) |
| MediaType |     Name    | NVARCHAR(120) |
| Playlist  |     Name    | NVARCHAR(120) |
| Track     |     Name    | NVARCHAR(200) |
+-----------+-------------+---------------+
>>> db.find_column("*Id") # returns all columns ending w/ Id
+---------------+---------------+---------+
| Table         |  Column Name  | Type    |
+---------------+---------------+---------+
| Album         |    AlbumId    | INTEGER |
| Album         |    ArtistId   | INTEGER |
| Artist        |    ArtistId   | INTEGER |
| Customer      |  SupportRepId | INTEGER |
| Customer      |   CustomerId  | INTEGER |
| Employee      |   EmployeeId  | INTEGER |
| Genre         |    GenreId    | INTEGER |
| Invoice       |   InvoiceId   | INTEGER |
| Invoice       |   CustomerId  | INTEGER |
| InvoiceLine   |   InvoiceId   | INTEGER |
| InvoiceLine   |    TrackId    | INTEGER |
| InvoiceLine   | InvoiceLineId | INTEGER |
| MediaType     |  MediaTypeId  | INTEGER |
| Playlist      |   PlaylistId  | INTEGER |
| PlaylistTrack |    TrackId    | INTEGER |
| PlaylistTrack |   PlaylistId  | INTEGER |
| Track         |  MediaTypeId  | INTEGER |
| Track         |    TrackId    | INTEGER |
| Track         |    AlbumId    | INTEGER |
| Track         |    GenreId    | INTEGER |
+---------------+---------------+---------+
>>> db.find_column("*Address*") # returns all columns containing Address
+----------+----------------+--------------+
| Table    |  Column Name   | Type         |
+----------+----------------+--------------+
| Customer |    Address     | NVARCHAR(70) |
| Employee |    Address     | NVARCHAR(70) |
| Invoice  | BillingAddress | NVARCHAR(70) |
+----------+----------------+--------------+
# returns all columns containing Address that are varchars
>>> db.find_column("*Address*", data_type="NVARCHAR(70)")
# returns all columns that contain an "e" and are NVARCHAR(70) or INTEGER
>>> db.find_column("*e*", data_type=["NVARCHAR(70)", "INTEGER"])
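
The wildcard searches above behave like shell-style glob matching. The following sketch reproduces that behaviour with the standard library against an in-memory SQLite schema. It is a standalone illustration under stated assumptions: this `find_column` is a hypothetical free function, not the db.py method, and it returns plain tuples instead of a formatted table.

```python
import fnmatch
import sqlite3


def find_column(con, pattern, data_type=None):
    """Return (table, column, type) tuples whose column name matches a glob pattern.

    data_type may be a single declared type or a list of accepted types.
    """
    if isinstance(data_type, str):
        data_type = [data_type]
    tables = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    matches = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for row in con.execute('PRAGMA table_info("{}")'.format(table)):
            name, col_type = row[1], row[2]
            if not fnmatch.fnmatchcase(name, pattern):
                continue
            if data_type is not None and col_type not in data_type:
                continue
            matches.append((table, name, col_type))
    return matches


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Album (AlbumId INTEGER, Title NVARCHAR(160), ArtistId INTEGER)")
con.execute("CREATE TABLE Artist (ArtistId INTEGER, Name NVARCHAR(120))")

print(find_column(con, "*Id"))
print(find_column(con, "*e*", data_type=["NVARCHAR(120)", "INTEGER"]))
```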

Tests

To run individual tests:

$ python -m unittest test_module.TestClass.test_method

To run all the tests:

$ python -m unittest discover <path_to_tests_folder> -v

Contributing

See either the TODO below or Adding a Database.

TODO

db.py's Issues

Connection Issue: AttributeError "Can't set attribute" /DataBase Error "Server closed the connection unexpectedly"

I'm getting a strange set of errors when trying to establish a database connection. Initially, it gave me:

    273                 attr = "_" + col.name
    274             #print col
--> 275             setattr(self, attr, col)
    276
    277         self._cur.execute(self._query_templates['system']['foreign_keys_for_table'].format(table=self.name))

AttributeError: can't set attribute

Since this is a pretty complicated database, I thought that it might help to exclude system tables, so I set that parameter in the DB() call to True. Then, when I ran that same call several times in succession, it began to alternate between the above error and the following error (I never seemed to get the same one twice in a row, so it does appear to be strict alternation).

    275             setattr(self, attr, col)
    276
--> 277         self._cur.execute(self._query_templates['system']['foreign_keys_for_table'].format(table=self.name))
    278         for (column_name, foreign_table, foreign_column) in self._cur:
    279             col = getattr(self, column_name)

DatabaseError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

Problem using with spatialite

It has a problem reading the properties of the SpatialIndex - VirtualSpatialIndex virtual table, even though I can run the same SQL in the CLI.

In ipython:

In [1]: from db import DB
In [2]: db = DB(username=None, password=None, hostname='localhost', filename='/home/bernardo/Dropbox/workspace/orache/db.sqlite3', dbtype='sqlite')
Indexing schema. This will take a second...---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
<ipython-input-2-a38e0de61324> in <module>()
----> 1 db = DB(username=None, password=None, hostname='localhost', filename='/home/bernardo/Dropbox/workspace/orache/db.sqlite3', dbtype='sqlite')

/home/bernardo/envs/devpy/local/lib/python2.7/site-packages/db.py-0.3.1-py2.7.egg/db/db.py in __init__(self, username, password, hostname, port, filename, dbname, dbtype, schemas, profile, exclude_system_tables, limit)
    757             self.con = sqlite.connect(self.filename)
    758             self.cur = self.con.cursor()
--> 759             self._create_sqlite_metatable()
    760         elif self.dbtype=="mysql":
    761             if not HAS_MYSQL:

/home/bernardo/envs/devpy/local/lib/python2.7/site-packages/db.py-0.3.1-py2.7.egg/db/db.py in _create_sqlite_metatable(self)
   1170         tables = [row[0] for row in self.cur.execute("select name from sqlite_master where type='table';")]
   1171         for table in tables:
-> 1172             for row in self.cur.execute("pragma table_info(%s)" % table):
   1173                 rows_to_insert.append((table, row[1], row[2]))
   1174         # find for table and column names

OperationalError: no such module: VirtualSpatialIndex

Does not work with MS SQL Server

I can't seem to get the SQL Server driver working on OSX. I've got pyodbc working (which seems to be the most accepted way to query SQL Server from Python).

Any suggestions?

Query using TRUNCATE fails

Hi,

Thanks for the very cool implementation.

When connected to an Amazon Redshift database, my attempt to TRUNCATE a table

from db import DB
db = DB() 
db.query("TRUNCATE table_name;")

obtained an error

TypeError: 'NoneType' object is not iterable

Does db.py support the TRUNCATE command?
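
One plausible cause of the error above, offered as an assumption and not as a confirmed diagnosis: under the Python DB-API, a cursor's `description` is None after a statement that returns no rows, so code that builds column names from it fails with exactly this kind of TypeError. The sketch below guards against that case. `run_statement` is a hypothetical helper, and because SQLite has no TRUNCATE, a DELETE stands in for it.

```python
import sqlite3


def run_statement(con, sql):
    """Execute sql; return a list of row dicts, or None if the statement yields no result set."""
    cur = con.execute(sql)
    if cur.description is None:
        # No result set (DDL, INSERT, DELETE, ...): commit instead of iterating.
        con.commit()
        return None
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


con = sqlite3.connect(":memory:")
run_statement(con, "CREATE TABLE t (x INTEGER)")
run_statement(con, "INSERT INTO t VALUES (1)")
before = run_statement(con, "SELECT x FROM t")
deleted = run_statement(con, "DELETE FROM t")  # stand-in for TRUNCATE
after = run_statement(con, "SELECT x FROM t")
print(before, deleted, after)
```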

Support for Presto

Are you planning on supporting Presto in the future?
That would be awesome.
http://prestodb.io/docs/current/

Name conflict between column name and Table property

The schema of the database I'm trying to connect to includes a table with column name 'count'. When I try to connect to it, db.py creates a Table object for that table and calls setattr(self, attr, col) for each column in the table. Since Table has a property called 'count', when it gets to that column, it raises an AttributeError. Obviously a similar problem exists with the other method and field names in Table. Other people have mentioned this problem in issue #33

I'm not sure of the best way to handle this. Table.init already checks to make sure we aren't creating a column attribute with name 'name' or 'con'. If so, it puts an underscore in front of the attribute name (this also seems to be a bug, as _con is the name of the field). This behavior isn't necessarily bad, but it does make it hard to find the column names since underscore attributes are kind of hidden in python. For example, in IPython if I have a Table named Foo with a column attribute named _bar, then typing Foo. does not give me _bar as a suggestion. Apart from looking at attribute names on the class, I don't see any way to determine the columns in a table other than db.find_table("Foo").

Perhaps a combination of putting underscores in front of conflicting column name attributes and adding a 'columns' property that would return a list of column names? That still isn't quite ideal, but it would be better than the current situation which completely prevents me from being able to use db.py, which is a shame because it seems pretty cool.

I'd be happy to submit a pull request, but as I said, I'm not sure of the best resolution.
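
The proposal above (underscore-prefix conflicting names and expose a `columns` property) can be sketched as follows. This is a toy model under stated assumptions, not db.py's `Table` class: `Column`, `Table`, the placeholder `count` property and the `_column_list` attribute are all hypothetical.

```python
class Column:
    def __init__(self, name):
        self.name = name


class Table:
    """Toy table whose columns become attributes, renamed when they would collide."""

    def __init__(self, name, column_names):
        self.name = name
        self._column_list = []
        for col_name in column_names:
            col = Column(col_name)
            attr = col_name
            # Prefix underscores until the attribute name is free, so properties
            # such as `count` and fields such as `name` are never overwritten.
            while hasattr(self, attr):
                attr = "_" + attr
            setattr(self, attr, col)
            self._column_list.append(col)

    @property
    def count(self):
        return 0  # placeholder for a real row count

    @property
    def columns(self):
        """Original column names, including those stored under a renamed attribute."""
        return [col.name for col in self._column_list]


# Setting an attribute that shadows a read-only property raises AttributeError.
try:
    setattr(Table("demo", []), "count", Column("count"))
    conflict_error = None
except AttributeError as exc:
    conflict_error = exc

events = Table("events", ["id", "count", "name"])
print(events.columns)
```

With this arrangement the colliding columns are still reachable (for example `events._count`), and `events.columns` lists every column under its real name.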

Suggestion: Include system tables

Rather than type in all the tables I'd like to exclude from loading in DB(), would it be possible to just select the few tables I'd like to load the schema for? I feel like this would be a ton faster for me.

utf-8 values are double-coded (at least from postgresql)

In a dataframe resulting from e.g. db.tables.table.all(), utf-8 values from postgresql were double-encoded (encoded as utf-8 twice).

When I later had to save my dataframe to an Excel sheet or a .csv file, I had to call .decode('utf-8') on all values in the dataframe before it would export, which I only found out after some troubleshooting.

Ask for SSL over Postgres/RDS

Is there a way to specify SSL over the connection using the guide here: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_PostgreSQL.html#PostgreSQL.Concepts.General.SSL

'DB' object has no attribute 'driver' when loading MSSQL profile

When trying to load a saved profile, I'm getting this AttributeError:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-3981467c7daa> in <module>()
      2 import pandas as pd
      3 
----> 4 NARDO = DB(profile='NARDO')
      5 #NARDO = DB(hostname='[REDACTED]', dbtype='mssql')

c:\Anaconda\lib\site-packages\db\db.pyc in __init__(self, username, password, hostname, port, filename, dbname, dbtype, schemas, profile, exclude_system_tables, limit, keys_per_column, driver)
    833             if HAS_ODBC:
    834                 base_con = "Driver={driver};Server={server};Database={database};".format(
--> 835                     driver=self.driver or "SQL Server",
    836                     server=self.hostname or "localhost",
    837                     database=self.dbname or ''

AttributeError: 'DB' object has no attribute 'driver'

The odd part is that I can connect to the database with no problem if I specify the hostname as if it's a new connection: NARDO = DB(hostname='[REDACTED]', dbtype='mssql'). I'm not sure if this is also a problem with other DB types; I have only tested it on MSSQL.

DB.query_from_file limit doesn't work

The call to query does not pass its arguments the right way and duplicates the _apply_handlebars function.

Creating DemoDB object in setUp under tests.py

Hello,

First of all this is an amazing project. Will try to contribute as I want to contribute to database related projects in Python :)

I wanted to know why tests.py declares db = DemoDB() as a global object. You could have done the following in setUp:

def setUp(self):
    self.db = DemoDB()

setUp() is called to prepare the test fixture and provides initialization for the tests. Since the tests.py code is still not that big, you can replace all db variables with self.db. Let me know what you think. I can do this too if you think this is a viable contribution. Also adding:

def tearDown(self):
    pass

since that is for cleaning up after all tests are done.
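
A self-contained sketch of the per-test fixture pattern suggested above. It uses an in-memory SQLite connection in place of DemoDB so it runs without db.py installed; the class name, table and test are hypothetical. Note that unittest calls setUp before, and tearDown after, every individual test method.

```python
import sqlite3
import unittest


class ArtistTableTest(unittest.TestCase):
    def setUp(self):
        # Fresh database for each test method, so tests cannot affect each other.
        self.con = sqlite3.connect(":memory:")
        self.con.execute("CREATE TABLE Artist (ArtistId INTEGER, Name TEXT)")
        self.con.execute("INSERT INTO Artist VALUES (1, 'AC/DC')")

    def tearDown(self):
        # Runs after each test method, even when the test fails.
        self.con.close()

    def test_row_count(self):
        (n,) = self.con.execute("SELECT count(*) FROM Artist").fetchone()
        self.assertEqual(n, 1)


def run_suite():
    """Load and run the test case programmatically, returning the TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArtistTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```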

feature: add 'count' property/method or len to Table to get the number of rows

Hi,

I would like to get the number of rows in a specific table. Is it possible to add a count or rows method/property to the Table class? You can also add the __len__ special method.

Moreover, this number of rows will be added to the TableSet repr.

Cheers,
Damien
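
A minimal sketch of the requested feature, assuming the table object holds a DB-API connection. The `Table` class here is hypothetical and far simpler than db.py's; it only shows how `__len__` could delegate to a COUNT(*) query.

```python
import sqlite3


class Table:
    """Toy table wrapper: len(table) issues a COUNT(*) against the database."""

    def __init__(self, con, name):
        self._con = con
        self.name = name

    def __len__(self):
        query = 'SELECT count(*) FROM "{}"'.format(self.name)
        (n,) = self._con.execute(query).fetchone()
        return n


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Artist (ArtistId INTEGER, Name TEXT)")
con.executemany("INSERT INTO Artist VALUES (?, ?)",
                [(1, "AC/DC"), (2, "Accept"), (3, "Aerosmith")])

artist = Table(con, "Artist")
print(len(artist))
```

Each call runs a query, so on very large tables it may be worth caching the result.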

Suggestion: add Oracle DB compatibility

It would be great to add Oracle DB connectivity, given their massive presence in business.

Docs

Need some docs...

Says Oracle is supported, but doesn't seem to work

My partner didn't get it working in his OSX to access an Oracle DB.

After that, while trying to figure out how to do it, we got confused. Here says "yes, we support", but there says "not yet".

So, what's the right answer?

queries against information schema at init (MySQL)

it seems that at start up this module runs a high number of queries against the information schema.
making init time exceedingly high

a possible solution might be temporarily disabling the calculation of added statistics using the following
SET innodb_stats_on_metadata=0;

thanks

db.py doesn't support multiple statements ?

example / wish:

from db import DB
db=DB(dbtype="sqlite", filename=":memory:")
db.query("""DROP TABLE IF EXISTS writer;
CREATE TABLE writer (first_name, last_name, year_of_death);
INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);
INSERT INTO writer VALUES ('Bertold', 'Brecht', 1956);
SELECT  * , sqlite_version() as sqlite_version from Writer order by Year_of_death;""")
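
Until something like this is supported, one workaround is to split the script and return only the rows of the final statement. The sketch below does that with the standard sqlite3 module. `run_script` is a hypothetical helper, and its naive split on ";" is a stated simplification that breaks if a semicolon appears inside a string literal.

```python
import sqlite3


def run_script(con, script):
    """Run every statement in order and return the rows of the last one."""
    statements = [s.strip() for s in script.split(";") if s.strip()]
    *setup, last = statements
    for statement in setup:
        con.execute(statement)
    return con.execute(last).fetchall()


con = sqlite3.connect(":memory:")
rows = run_script(con, """
DROP TABLE IF EXISTS writer;
CREATE TABLE writer (first_name, last_name, year_of_death);
INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);
INSERT INTO writer VALUES ('Bertold', 'Brecht', 1956);
SELECT first_name, last_name, year_of_death FROM writer ORDER BY year_of_death;
""")
print(rows)
```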

Generate schema based on regex

One issue I run into is having databases with large numbers of tables that are user-specific and necessary architecturally. A good solution for me has been using PyCharm/JetBrains products, as their database connector allows you to set a regex on which tables to include in the schema. This lets me filter out all of those needless tables and only access the core that I need.
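
The regex filter described above might look like the following sketch. `list_tables` and its `include` parameter are hypothetical names, not an existing db.py option, and the example lists tables from an in-memory SQLite database.

```python
import re
import sqlite3


def list_tables(con, include=None):
    """Return table names, keeping only those that match the `include` regex."""
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    if include is None:
        return names
    pattern = re.compile(include)
    return [name for name in names if pattern.search(name)]


con = sqlite3.connect(":memory:")
for table in ("orders", "customers", "user_123_tmp", "user_456_tmp"):
    con.execute("CREATE TABLE {} (id INTEGER)".format(table))

core = list_tables(con, include=r"^(orders|customers)$")
not_user = list_tables(con, include=r"^(?!user_)")  # negative lookahead excludes user_ tables
print(core, not_user)
```

Only the tables that survive the filter would then need their columns and keys indexed.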

Option to return pandas DataFrame from TableSet or ColumnSet

Currently TableSets and ColumnSets are represented as "tablified" PrettyTables. However, when working with a database with lots of tables, or a table with lots of columns (>100), it would be nice to be able to return only a slice of the table.

An easy way to do this would be to have the option to return the TableSets or ColumnSets as pandas DataFrames.

Obviously the pandas DataFrame representation in the IPython terminal is not as pretty, but at least in IPython notebook the HTML representation of DataFrames is quite nice.

Something like:

rows = []
for col in self._columns:
    rows.append({"Column": col.name, "Type": col.type, "Foreign Keys": col._str_foreign_keys(), "Reference Keys": col._str_ref_keys()})
df = pd.DataFrame(rows)

Perhaps in a _repr_df_ function.

S3 credentials

The db.to_redshift feature is great, but it does not follow the pattern of DB for database credentials. It would be great if you could put in the S3 credentials once, save them, and load them on the fly. Here is an example of what I'd expect from the package.

from db import S3
s3 = S3(credentials)
s3.save_credentials('s3_for_redshift')
s3 = S3(profile='s3_for_redshift')
...
db.to_redshift(df, s3_cred = s3)

Large Database leads to resource errors

Is there a way to limit the number of tables being indexed, possibly by schema?

Postgres/Psycopg2 DatabaseError

Whenever I initialize a DB object with my database (Postgres 9.3), and then try to conduct an operation on it (say, show_tables), I get

DatabaseError: server closed the connection unexpectedly
This probably means the server terminated abnormally
    before or while processing the request.

I know that it's some setting particular to my database configuration interacting with db.py, because I can use db.py on other databases, and I can use vanilla psycopg2 on this database without trouble, but I haven't been able to track down what the issue is.

Is boto really necessary in the requirements?

boto is heavy, so making it optional would be kind.

name 'ColumnSet' is not defined

At line 17 of the db.py file, shouldn't we also import ColumnSet? I am getting the error pasted in the title.

question about the template engine pybars

Hi,

I'm just curious. Why do you use the template engine pybars instead of a more mainstream package such as jinja2? Are there features you use in pybars that are not implemented elsewhere? Or did the author(s) already know pybars and simply choose it for that reason?

Thanks. Great job by the way, very (very) useful.
Damien

Add support for query parameters

Help on method query in module db.db:

query(self, q, limit=None) method of db.db.DB instance
    Query your database with a raw string.

    Parameters
    ----------
    q: str
        Query string to execute
    limit: int
        Number of records to return

Have you considered allowing a params tuple to be passed in that would use the SQL library's SQL escaping capabilities? (db.query('SELECT * FROM table WHERE foo="foo"') does work, but isn't very safe, even for a data exploration library)
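A minimal sketch of what parameter passing could look like, using the standard-library sqlite3 driver and pandas.read_sql, whose params argument forwards values to the driver for binding. The query_with_params helper and the sample table are hypothetical and not part of db.py. Placeholder syntax also depends on the driver: sqlite3 uses ?, while psycopg2 uses %s.

```python
import sqlite3

import pandas as pd


def query_with_params(con, q, params=None):
    """Hypothetical helper: run q and let the driver bind params."""
    return pd.read_sql(q, con, params=params)


# Throwaway in-memory database used only for this illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (foo TEXT, n INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [("foo", 1), ("bar", 2)])

# The value is bound by the driver and never interpolated into the SQL text.
df = query_with_params(con, "SELECT * FROM t WHERE foo = ?", params=("foo",))

# A hostile value is treated as a plain string, so it matches nothing.
hostile = query_with_params(
    con, "SELECT * FROM t WHERE foo = ?", params=("foo' OR '1'='1",)
)
```

Because the value never becomes part of the SQL string, quoting tricks in user input cannot change the structure of the query.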

db.tables error

I downloaded the package today. I would appreciate it if anyone could help with this error.

Python 2.7.3 (default, Jun 22 2015, 19:33:41)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from db import DemoDB
>>> db = DemoDB()
Indexing schema. This will take a second...finished!
>>> db.tables
Refreshing schema. Please wait...Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/magnolia/python/lib/python2.7/site-packages/db.py-0.4.4-py2.7.egg/db/db.py", line 959, in tables
    self.refresh_schema(self._exclude_system_tables, self._use_cache)
  File "/home/magnolia/python/lib/python2.7/site-packages/db.py-0.4.4-py2.7.egg/db/db.py", line 1510, in refresh_schema
    tables = self._gen_tables_from_col_tuples(col_meta)
  File "/home/magnolia/python/lib/python2.7/site-packages/db.py-0.4.4-py2.7.egg/db/db.py", line 1586, in _gen_tables_from_col_tuples
    for (table_schema, table_name, column_name, data_type) in cols:
ValueError: need more than 3 values to unpack

Can db.py read an in-memory SQLite db?

Hi,

Suppose, for demo or test purposes, I have an in-memory SQLite database such as:

https://www.sqlite.org/inmemorydb.html or
engine = create_engine('sqlite://') if I use SQLAlchemy

Can I read it from db.DB? I tried:

from db import DB
db = DB(filename="sqlite://", dbtype='sqlite')

without success. Is there a plan to support it?

Thanks !
Cheers,
Damien G.

Oracle Support

You mention Oracle in the list of supported databases, but I can't quite figure out how to use it. I tried dbtype="oracle" and "Oracle". I do have the Oracle package cx_Oracle installed.

Do you really need pandas-0.15?

I tried installing db.py and was alarmed to find it trying to upgrade my pandas version. Given how much of a pain pandas can be to install and how surprisingly things can break between versions, are you sure you need 0.15.0?

Having had a quick prod, it seems to work fine with 0.14.1

MSSQL / _gen_tables_from_col_tuples failed - Missing schema

The query in queries/mssql.py is missing table_schema.

db.py is not compatible with Python3 yet

Add support for MS SQL Server with pymssql or _mssql

http://pymssql.org/en/latest/

Executing set time_zone queries returns with an error

This likely happens on any SET queries that do not return a result.

postgres: AttributeError: can't set attribute.

Error appears to be due to the "_" prepended to the column name if the column name is either "name" or "con". In this specific case dim_cell.name gets prepended to form "_name".

+------------------+------+---------+--------------+----------------+
|      Table       | Name |   Type  | Foreign Keys | Reference Keys |
+------------------+------+---------+--------------+----------------+
| dim_cell         | name | varchar |              |                |
+------------------+------+---------+--------------+----------------+

----> db = DB(username="user", password="pw", hostname="hostname",port="5432",dbname="dbname", dbtype="postgres")

C:\Users\user1\AppData\Local\Continuum\Anaconda\lib\site-packages\db\db.py in __init__(self, username, password, hostname, port, filename, dbname, dbtype, schemas, profile, exclude_system_tables, limit, keys_per_column)
    833 
    834         self.tables = TableSet([])
--> 835         self.refresh_schema(exclude_system_tables)
    836         self.handlebars = pybars.Compiler()
    837 
C:\Users\user1\AppData\Local\Continuum\Anaconda\lib\site-packages\db\db.py in refresh_schema(self, exclude_system_tables)
 1339             tables[table_name].append(Column(self.con, self._query_templates, table_name, column_name, data_type, self.keys_per_column))
   1340 
-> 1341         self.tables = TableSet([Table(self.con, self._query_templates, t, tables[t], keys_per_column=self.keys_per_column) for t in sorted(tables.keys())])
   1342         sys.stderr.write("done!\n")
   1343 
C:\Users\user1\AppData\Local\Continuum\Anaconda\lib\site-packages\db\db.py in __init__(self, con, query_templates, name, cols, keys_per_column)
    272             if attr in ("name", "con"):
    273                 attr = "_" + col.name
--> 274             setattr(self, attr, col)
    275 
    276         self._cur.execute(self._query_templates['system']['foreign_keys_for_table'].format(table=self.name))

AttributeError: can't set attribute

save_credentials should at least have an option for a simple encryption key

Many analysts work in enterprises with hard and fast compliance rules against storing passwords in cleartext (yes, base64 == cleartext). Adding an option to let folks use a master "encryption key" (yes, one they would have to type in) to encode/decode the credentials would add a bit of security to this feature and prevent an audit nightmare.

"Add a database" checklist

Since we're getting a few requests of the form "Can you support database X?", it'd be really helpful to have a developer checklist of features to implement when adding support for a new database.

Add support for index_col

In query you call through to pd.io.sql.read_sql() but don't provide a way of passing in the index_col you want to use. Can this be added to the method signature?
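As a rough illustration of the request, the sketch below forwards an index_col keyword to pandas.read_sql, which accepts that argument. The query wrapper and the track table are hypothetical stand-ins and do not reflect db.py's actual method signature.

```python
import sqlite3

import pandas as pd


def query(con, q, index_col=None):
    """Hypothetical wrapper that passes index_col through to pandas."""
    return pd.read_sql(q, con, index_col=index_col)


# Small in-memory table used only for this illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE track (track_id INTEGER, name TEXT)")
con.executemany("INSERT INTO track VALUES (?, ?)", [(1, "a"), (2, "b")])

# track_id becomes the DataFrame index instead of a regular column.
df = query(con, "SELECT * FROM track", index_col="track_id")
```

Defaulting index_col to None keeps the existing behavior for callers that do not pass it.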

Getting DB/table information

Hi there, first of all thanks for writing this awesome project.

What are your thoughts on adding a way to get db/table size to this library? I was thinking something along the lines of:

db = DB(profile='default')
db.size  # Returns db size
db.tables.my_table.size  # Returns table size
db.tables.my_table.rows  # Returns number of rows

The ability to run EXPLAIN statements would also be really useful.
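For SQLite at least, these numbers can be read with plain SQL. The sketch below is only an assumption about how such attributes might be backed: db_size_bytes and table_rows are hypothetical helper names, and other databases would need their own catalog queries.

```python
import sqlite3


def db_size_bytes(con):
    """Approximate database size as page_count * page_size (SQLite only)."""
    page_count = con.execute("PRAGMA page_count").fetchone()[0]
    page_size = con.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


def table_rows(con, table):
    """Count the rows in a table; identifiers cannot be bound, so quote them."""
    quoted = '"' + table.replace('"', '""') + '"'
    return con.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]


# Small in-memory table used only for this illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (x INTEGER)")
con.executemany("INSERT INTO my_table VALUES (?)", [(i,) for i in range(5)])
con.commit()

n_rows = table_rows(con, "my_table")
size = db_size_bytes(con)
```

Note that COUNT(*) scans the table, so on very large tables a property like this would probably want to be computed lazily rather than during schema indexing.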

Handlebars partial support

Hello,

First, thanks for this amazing library.

I am trying to template complex SQL queries and would benefit from being able to register partials for use in the handlebar queries.

Is there any way to accomplish that in db.py without any string manipulation shenanigans?

example from twitter apparently doesn't work on Python3

http://blog.yhathq.com/posts/dbpy-0.4-handlebars-meets-sql.html
(trying on python3.4, windows)

D:\WinPython\basedir34\build\winpython-3.4.3.amd64\python-3.4.3.amd64\lib\site-packages\db\db.py in _apply_handlebars(self, q, data, union)
   1034 
   1035     def _apply_handlebars(self, q, data, union=True):
-> 1036         q = unicode(q)
   1037         template = self.handlebars.compile(q)
   1038         if isinstance(data, list):

NameError: name 'unicode' is not defined
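The name unicode does not exist on Python 3, where str is already a text type. A common workaround is a small compatibility alias, sketched below. The names text_type and to_text are illustrative and not taken from db.py.

```python
try:
    text_type = unicode  # Python 2: the text type is called unicode
except NameError:
    text_type = str  # Python 3: str is already the text type


def to_text(q):
    """Coerce a query to the native text type on either Python version."""
    if isinstance(q, bytes) and not isinstance(q, str):
        # Python 3 bytes: decode explicitly instead of calling str() on them.
        return q.decode("utf-8")
    return text_type(q)
```

With an alias like this, the failing line could call to_text(q) instead of unicode(q) and behave the same on both interpreters for text input.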

print "var", var

There is an error in the file __init__.py at line 69: print "var", var. Is that line necessary? I have changed it to print("var", var).
I am using Python 3.5.2.

AttributeError when trying to connect mysql

Python 2.7.5 (default, Oct 14 2013, 11:50:38)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from db import DB
>>> db = DB(username='my id', password='my password', hostname='internal ip', dbname='db name', dbtype="mysql")
Refreshing schema. Please wait...Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data1/program/anaconda/lib/python2.7/site-packages/db/db.py", line 794, in __init__
    self.refresh_schema(exclude_system_tables)
  File "/data1/program/anaconda/lib/python2.7/site-packages/db/db.py", line 1216, in refresh_schema
    self.tables = TableSet([Table(self.con, self._query_templates, t, tables[t]) for t in sorted(tables.keys())])
  File "/data1/program/anaconda/lib/python2.7/site-packages/db/db.py", line 261, in __init__
    foreign_key = Column(con, queries_templates, foreign_table, foreign_column, col.type)
AttributeError: 'str' object has no attribute 'type'

Any ideas, please?

Profiles / Load credentials not working

Raises an exception that db type can't be found.

Postgres schemas are not supported

db.py will identify all tables across all schemas, but can't query them because it only tries to access the unqualified table name, without the schema prepended.

For example, if I have a schema s and a table t, calling db.all() tries to execute SELECT * FROM t instead of SELECT * FROM s.t. Unless users have configured a postgres search_path (and have unambiguous table names) this query will fail.
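The difference between the two queries can be sketched as a small helper that prepends the schema when one is known. qualified_name and select_all are hypothetical names, not db.py functions. Be aware that double-quoting identifiers makes them case-sensitive in Postgres.

```python
def qualified_name(table, schema=None):
    """Return "schema"."table" when a schema is given, else just "table"."""

    def quote(identifier):
        # Standard SQL identifier quoting: wrap in double quotes, double any inside.
        return '"' + identifier.replace('"', '""') + '"'

    if schema:
        return quote(schema) + "." + quote(table)
    return quote(table)


def select_all(table, schema=None):
    """Build a SELECT * statement against the (optionally qualified) table."""
    return "SELECT * FROM {0}".format(qualified_name(table, schema))


print(select_all("t", "s"))
print(select_all("t"))
```

Qualifying the name removes the dependency on search_path and on table names being unique across schemas.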

db.py doesn't support DROP/CREATE/INSERT statements?

Example. This works:

from db import DB
db=DB(dbtype="sqlite", filename=":memory:")
db.query("select  sqlite_version() as sqlite_version  ;")

This does not work:

db.query("DROP TABLE IF EXISTS writer;")

not ok ?

db.query("CREATE TABLE writer (first_name, last_name, year_of_death);")

not ok ?

db.query("INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);")

UnicodeEncodeError with Unicode query parameter

I get a UnicodeEncodeError when passing a Unicode query parameter with the current pip version.

>>> db.query(u"select * from words where form == '{{ word }}' limit 10", data={'word': u'læge'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/db/db.py", line 1307, in query
    q = self._apply_handlebars(q, data, union)
  File "/usr/local/lib/python2.7/dist-packages/db/db.py", line 1144, in _apply_handlebars
    query = str(query)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 36: ordinal not in range(128)

In the present GitHub version the error occurs in line 1232.

I have no problem if I interpolate before calling the query method, i.e., avoid self._apply_handlebars.

>>> db.query(u"select * from words where form == 'læge' limit 10")
   index        id  form   pos
0   9275  11031550  læge  Noun

Issue connecting postgres database

/Users/gwulfs/anaconda/lib/python2.7/site-packages/db/db.pyc in __init__(self, username, password, hostname, port, filename, dbname, dbtype, profile, exclude_system_tables, limit)
    670 
    671         if self.dbtype=="postgres" or self.dbtype=="redshift":
--> 672             self.con = pg.connect(user=self.username, password=self.password,
    673                 host=self.hostname, port=self.port, dbname=self.dbname)
    674             self.cur = self.con.cursor()

NameError: global name 'pg' is not defined
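One plausible cause, not confirmed by this report, is that the driver import failed earlier and was silently skipped, which leaves the name pg undefined until it is first used. A guarded import that fails with a clear message might look like the sketch below. The require helper is hypothetical and not part of db.py.

```python
import importlib


def require(module_name, purpose):
    """Import module_name or raise an ImportError that says why it is needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise ImportError(
            "{0} is required for {1}; install it and try again".format(
                module_name, purpose
            )
        )


# Usage with a module that is always available:
json_module = require("json", "serializing credentials")
```

Calling something like require("psycopg2", "postgres support") at connection time would surface the missing driver by name instead of a NameError deep inside __init__.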

Silence logging messages

As an enhancement, consider silencing the logging messages, e.g., "Indexing schema. This will take a second..."