Selecting Data

Introduction

Now that you've gotten a brief introduction to SQL, it's time to get some hands-on practice connecting to a database via Python and executing some queries.

Objectives

You will be able to:

  • Connect to a SQL database using Python
  • Retrieve all information from a SQL table
  • Retrieve a subset of records from a table using a WHERE clause
  • Write SQL queries to filter and order results
  • Retrieve a subset of columns from a table

Connecting to a Database

First, let's connect to our database by importing sqlite3 and running the following cell in our notebook. You'll need a cursor object (cur) to fetch results. Cursor objects allow you to keep track of which result set is which since it's possible to run multiple queries before you're done fetching the results of the first.

import sqlite3
conn = sqlite3.connect('data.sqlite')
cur = conn.cursor()
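
To illustrate why cursors matter, here is a minimal sketch in which two cursors on one connection each track their own result set. It assumes a small in-memory database as a stand-in for data.sqlite; the table contents and the names cur_a and cur_b are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory database standing in for data.sqlite
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employeeNumber TEXT, lastName TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("1002", "Murphy"), ("1056", "Patterson"), ("1076", "Firrelli")],
)

# Two cursors on the same connection, each with its own pending result set
cur_a = conn.cursor()
cur_b = conn.cursor()
cur_a.execute("SELECT lastName FROM employees ORDER BY employeeNumber;")
cur_b.execute("SELECT employeeNumber FROM employees ORDER BY employeeNumber DESC;")

first_a = cur_a.fetchone()  # first row of query A
first_b = cur_b.fetchone()  # first row of query B, unaffected by cur_a
rest_a = cur_a.fetchall()   # cur_a resumes where it left off

conn.close()
```

Each cursor remembers its own position, so fetching from one does not disturb the other.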

Schema Overview

The database that you've just connected to is the same database from the previous introduction. Here's an overview of the database:

Querying Via the Connection

Now that you're connected to the database, let's take a look at how you can query the data within.

With your cursor object, you can execute queries:

cur.execute("""SELECT * FROM employees LIMIT 5;""")
<sqlite3.Cursor at 0x111deb420>

The execute command itself only returns the cursor object. To see the results, you must use the fetchall method afterwards.

cur.fetchall()
[('1002',
  'Murphy',
  'Diane',
  'x5800',
  '[email protected]',
  '1',
  '',
  'President'),
 ('1056',
  'Patterson',
  'Mary',
  'x4611',
  '[email protected]',
  '1',
  '1002',
  'VP Sales'),
 ('1076',
  'Firrelli',
  'Jeff',
  'x9273',
  '[email protected]',
  '1',
  '1002',
  'VP Marketing'),
 ('1088',
  'Patterson',
  'William',
  'x4871',
  '[email protected]',
  '6',
  '1056',
  'Sales Manager (APAC)'),
 ('1102',
  'Bondur',
  'Gerard',
  'x5408',
  '[email protected]',
  '4',
  '1056',
  'Sale Manager (EMEA)')]

It's also possible to combine the previous two cells into one line, like so:

## Uncomment cell to display contents:

# cur.execute("""SELECT * FROM employees LIMIT 5;""").fetchall()
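
Chaining works because execute hands back the cursor it was called on. The sketch below checks this and also shows that a result set can only be fetched once. It assumes a tiny in-memory table as a stand-in for the lesson's employees table.

```python
import sqlite3

# Hypothetical in-memory stand-in for the lesson's employees table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (employeeNumber TEXT, lastName TEXT)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("1002", "Murphy"), ("1056", "Patterson")],
)

# execute() returns the same cursor object, which is what makes chaining possible
returned = cur.execute("SELECT * FROM employees ORDER BY employeeNumber LIMIT 5;")
same_object = returned is cur

rows = cur.fetchall()        # list of tuples, one per row
rows_again = cur.fetchall()  # the result set is already consumed, so this is empty

conn.close()
```

If you need the rows more than once, store the output of fetchall in a variable rather than calling it again.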

Quick note on formatting syntax:

When working with strings, you may have previously seen a 'string', a "string", a '''string''', or a """string""". While all of these are strings, triple-quoted strings can also span multiple lines. SQL queries can get long, in which case it's helpful to break them across lines for readability. Here's a short example:

## Uncomment cell to display contents:

# cur.execute("""SELECT * 
#                FROM employees 
#                LIMIT 5;""").fetchall()

Wrapping Results Into Pandas DataFrames

Often, a more convenient output will be to turn these results into pandas DataFrames. To do this, you simply wrap the cur.fetchall() output with a pandas DataFrame constructor:

import pandas as pd
cur.execute("""SELECT * FROM employees LIMIT 5;""")
df = pd.DataFrame(cur.fetchall())
df.head()
   0     1          2        3      4                  5  6     7
0  1002  Murphy     Diane    x5800  [email protected]  1        President
1  1056  Patterson  Mary     x4611  [email protected]  1  1002  VP Sales
2  1076  Firrelli   Jeff     x9273  [email protected]  1  1002  VP Marketing
3  1088  Patterson  William  x4871  [email protected]  6  1056  Sales Manager (APAC)
4  1102  Bondur     Gerard   x5408  [email protected]  4  1056  Sale Manager (EMEA)

As you can see, this output is slightly clunky because the column names are missing.

We can access the column names by calling cur.description, like so:

cur.execute("""SELECT * FROM employees LIMIT 5;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df.head()
   employeeNumber  lastName   firstName  extension  email              officeCode  reportsTo  jobTitle
0  1002            Murphy     Diane      x5800      [email protected]  1                      President
1  1056            Patterson  Mary       x4611      [email protected]  1           1002       VP Sales
2  1076            Firrelli   Jeff       x9273      [email protected]  1           1002       VP Marketing
3  1088            Patterson  William    x4871      [email protected]  6           1056       Sales Manager (APAC)
4  1102            Bondur     Gerard     x5408      [email protected]  4           1056       Sale Manager (EMEA)

  • Check out the sqlite3 documentation for more info on cursor methods and attributes.
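
To see what cur.description actually holds, here is a small sketch that uses only the standard library. It assumes a one-row in-memory table as a stand-in for employees, and it builds a list of dictionaries instead of a DataFrame. The names columns, records, and first_entry are illustrative.

```python
import sqlite3

# Hypothetical in-memory stand-in for the lesson's employees table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (employeeNumber TEXT, lastName TEXT, jobTitle TEXT)")
cur.execute("INSERT INTO employees VALUES ('1002', 'Murphy', 'President');")

cur.execute("SELECT * FROM employees;")

# Each entry of cur.description is a 7-item tuple; sqlite3 fills in only the name
first_entry = cur.description[0]
columns = [col[0] for col in cur.description]

# Pair every row with the column names
records = [dict(zip(columns, row)) for row in cur.fetchall()]

conn.close()
```

This is why the lesson's list comprehension takes x[0]: the column name is the first item of each description entry.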

The Where Clause

In general, the WHERE clause filters query results by some condition. You can also combine multiple conditions using AND and OR.

Selecting Customers From a Specific City

cur.execute("""SELECT * FROM customers WHERE city = 'Boston';""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
customerNumber customerName contactLastName contactFirstName phone addressLine1 addressLine2 city state postalCode country salesRepEmployeeNumber creditLimit
0 362 Gifts4AllAges.com Yoshido Juri 6175559555 8616 Spinnaker Dr. Boston MA 51003 USA 1216 41900.00
1 495 Diecast Collectables Franco Valarie 6175552555 6251 Ingle Ln. Boston MA 51003 USA 1188 85100.00

Selecting Multiple Cities

cur.execute("""SELECT * FROM customers WHERE city = 'Boston' OR city = 'Madrid';""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
customerNumber customerName contactLastName contactFirstName phone addressLine1 addressLine2 city state postalCode country salesRepEmployeeNumber creditLimit
0 141 Euro+ Shopping Channel Freyre Diego (91) 555 94 44 C/ Moralzarzal, 86 Madrid 28034 Spain 1370 227600.00
1 237 ANG Resellers Camino Alejandra (91) 745 6555 Gran Vía, 1 Madrid 28001 Spain 0.00
2 344 CAF Imports Fernandez Jesus +34 913 728 555 Merchants House 27-30 Merchant's Quay Madrid 28023 Spain 1702 59600.00
3 362 Gifts4AllAges.com Yoshido Juri 6175559555 8616 Spinnaker Dr. Boston MA 51003 USA 1216 41900.00
4 458 Corrida Auto Replicas, Ltd Sommer Martín (91) 555 22 82 C/ Araquil, 67 Madrid 28023 Spain 1702 104600.00
5 465 Anton Designs, Ltd. Anton Carmen +34 913 728555 c/ Gobelas, 19-1 Urb. La Florida Madrid 28023 Spain 0.00
6 495 Diecast Collectables Franco Valarie 6175552555 6251 Ingle Ln. Boston MA 51003 USA 1188 85100.00
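
When several OR conditions test the same column, SQL's IN operator is a shorter way to write the same filter. The sketch below compares the two forms on a tiny in-memory table. The table is an assumption for illustration, and the "Hypothetical Toys" row is invented rather than taken from the lesson's database.

```python
import sqlite3

# Hypothetical in-memory stand-in for the lesson's customers table
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customerName TEXT, city TEXT)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Gifts4AllAges.com", "Boston"),
        ("CAF Imports", "Madrid"),
        ("Hypothetical Toys", "Paris"),  # invented row that should be filtered out
    ],
)

# Two conditions joined with OR
with_or = cur.execute("""SELECT customerName
                         FROM customers
                         WHERE city = 'Boston' OR city = 'Madrid'
                         ORDER BY customerName;""").fetchall()

# The same filter written with IN
with_in = cur.execute("""SELECT customerName
                         FROM customers
                         WHERE city IN ('Boston', 'Madrid')
                         ORDER BY customerName;""").fetchall()

conn.close()
```

Both queries return the same rows; IN simply scales better as the list of cities grows.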

The Order By and Limit Clauses

Two additional keywords that you can use to refine your searches are the ORDER BY and LIMIT clauses. The ORDER BY clause allows you to sort the results by a particular feature. For example, you could sort by the customerName column if you wished to get results in alphabetical order. By default, ORDER BY is ascending. If you want the opposite, as in the example below, add the DESC keyword. Finally, the LIMIT clause is typically the last clause in a SQL query and simply limits the output to a set number of results.
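
To make the sorting and limiting behavior described above concrete, here is a minimal sketch. It assumes an in-memory table in which creditLimit is stored as a real number, which is an assumption of this sketch and not a claim about the lesson's database.

```python
import sqlite3

# Hypothetical in-memory table; creditLimit is numeric here by assumption
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customerName TEXT, creditLimit REAL)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Diecast Collectables", 85100.0),
        ("CAF Imports", 59600.0),
        ("Euro+ Shopping Channel", 227600.0),
    ],
)

# ORDER BY sorts ascending unless told otherwise
ascending = cur.execute("""SELECT customerName
                           FROM customers
                           ORDER BY customerName;""").fetchall()

# DESC reverses the sort, and LIMIT keeps only the first two rows of the sorted result
top_two = cur.execute("""SELECT customerName, creditLimit
                         FROM customers
                         ORDER BY creditLimit DESC
                         LIMIT 2;""").fetchall()

conn.close()
```

Because LIMIT is applied after sorting, combining it with ORDER BY ... DESC is a common way to ask for the "top N" rows.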

Selecting Specific Columns With Complex Criteria

cur.execute("""SELECT customerNumber, customerName, city, creditLimit
               FROM customers
               WHERE (city = 'Boston' OR city = 'Madrid') AND (creditLimit >= 50000.00)
               ORDER BY creditLimit DESC
               LIMIT 15
               ;""")
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
df
   customerNumber  customerName                city    creditLimit
0  495             Diecast Collectables        Boston  85100.00
1  344             CAF Imports                 Madrid  59600.00
2  362             Gifts4AllAges.com           Boston  41900.00
3  141             Euro+ Shopping Channel      Madrid  227600.00
4  458             Corrida Auto Replicas, Ltd  Madrid  104600.00
5  237             ANG Resellers               Madrid  0.00
6  465             Anton Designs, Ltd.         Madrid  0.00

You might notice that the output of this query respects neither the credit limit criterion nor the descending sort. A little investigation shows that this is because creditLimit is actually stored as a string, so it is not compared or sorted as a number!

type(df.creditLimit.iloc[0])
str

This is an annoying problem to encounter and also underlines the importance of setting up a database in an appropriate manner at the get-go. For now, it's time to practice some of your SQL querying skills!
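
One possible workaround, sketched here under assumptions, is to convert the text to a number inside the query with SQL's CAST expression. The in-memory table below declares its columns without types and stores creditLimit as text to mimic the behavior seen above; how the lesson's actual database declares this column is not known from the output, so treat this as an illustration and not a diagnosis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Assumption: columns have no declared type, so the strings are stored as text
cur.execute("CREATE TABLE customers (customerName, city, creditLimit)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Diecast Collectables", "Boston", "85100.00"),
        ("CAF Imports", "Madrid", "59600.00"),
        ("Gifts4AllAges.com", "Boston", "41900.00"),
        ("Euro+ Shopping Channel", "Madrid", "227600.00"),
        ("Corrida Auto Replicas, Ltd", "Madrid", "104600.00"),
        ("ANG Resellers", "Madrid", "0.00"),
    ],
)

# Comparing and sorting the raw text: every row passes and the order is alphabetical
naive = cur.execute("""SELECT customerName, creditLimit
                       FROM customers
                       WHERE creditLimit >= 50000.00
                       ORDER BY creditLimit DESC;""").fetchall()

# Casting to REAL makes both the filter and the sort numeric
fixed = cur.execute("""SELECT customerName, CAST(creditLimit AS REAL) AS creditLimit
                       FROM customers
                       WHERE CAST(creditLimit AS REAL) >= 50000.00
                       ORDER BY CAST(creditLimit AS REAL) DESC;""").fetchall()

conn.close()
```

Casting inside each query is a patch; storing the column with a numeric type when the database is created avoids the problem entirely.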

Summary

In this lesson, you saw how to connect to a SQL database via Python and how to subsequently execute queries against that database. Going forward, you'll continue to learn additional keywords for specifying your query parameters!
