This project contains the activities and supporting code for a hands-on workshop on web scraping with Python for the 2019 NICAR conference.
- Pipfile: Defines Python dependencies for this project. See the Pipenv documentation.
- data: Place to store the output of your scrapers.
- data/src: Place to store your raw HTML.
- reference_cheatsheet.md: Quick technical reference.
- scrapers: Directory that contains the files you'll edit in these exercises.
- scrapers-solutions: Directory that contains versions of the scripts in scrapers with the blank sections filled in.
- scraping_site: Flask app that implements our mock scraping site.
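The layout above keeps raw downloads (data/src) separate from scraper output (data), so you can re-parse saved pages without hitting the site again. Here is a minimal sketch of that convention. It assumes a scraper that already has a page's HTML as a string and produces a list of dicts. The helper names `save_raw_html` and `write_csv` are hypothetical and are not part of this repository:

```python
import csv
from pathlib import Path


def save_raw_html(html, filename, src_dir="data/src"):
    """Cache a page's raw HTML under data/src (hypothetical helper)."""
    path = Path(src_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/src if missing
    path.write_text(html, encoding="utf-8")
    return path


def write_csv(rows, filename, out_dir="data"):
    """Write scraped records (a list of dicts) to a CSV in data/ (hypothetical helper)."""
    path = Path(out_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # Use the first record's keys as the CSV header
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Saving the raw HTML first is a design choice: if your parsing code has a bug, you can fix it and rerun against the files in data/src instead of re-downloading every page.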
- Python 3.6+. If you don't have Python, you may want to follow the instructions at Installing Python the IRE Way
- Pipenv or virtualenvwrapper
- A basic knowledge of Python programming. These are good resources:
  - Intermediate Python - Python Tips
  - How to Think like a Computer Scientist: Interactive Edition
  - Dive Into Python 3
  - Django Girls Tutorial - This is Django-specific, but it includes good introductions to the command line and to Python.
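To confirm that your interpreter meets the Python 3.6+ requirement, you can run a small check like the following inside the project's virtualenv. This is a sketch, not part of the repository. The function name `python_is_supported` is hypothetical:

```python
import sys

MIN_VERSION = (3, 6)  # minimum stated in the requirements above


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if python_is_supported():
        print("Python {}.{} is OK".format(*sys.version_info[:2]))
    else:
        sys.exit("This workshop needs Python 3.6 or newer")
```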
Clone this repository and change into its directory:
git clone https://github.com/asuozzo/nicar2019-scraping.git
cd nicar2019-scraping
Create a virtualenv for the project and install Python dependencies using Pipenv:
pipenv install
Alternatively, if you use virtualenvwrapper, create a virtualenv for the project and install the Python dependencies with pip:
mkvirtualenv nicar2019-scraping
workon nicar2019-scraping
pip install -r requirements.txt
So that this workshop can run even if conference Internet access is sketchy, we built a mock site, implemented as a Flask app, that has many features of real sites we've scraped. To run the app, activate the project's virtualenv (pipenv shell below; if you used virtualenvwrapper, run workon nicar2019-scraping instead) and start Flask:
pipenv shell
FLASK_ENV=development FLASK_APP=scraping_site flask run
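Once the server is running, a scraper can request pages from it like any other site. Here is a minimal, standard-library-only sketch that downloads a page and collects its links. It assumes Flask's default address of http://127.0.0.1:5000/. The names `LinkParser`, `extract_links`, and `fetch` are hypothetical, and the workshop's own scrapers may use different libraries (check the Pipfile):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000/"  # assumption: Flask's default host and port


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url=BASE_URL):
    """Return absolute URLs for all links in an HTML string."""
    parser = LinkParser()
    parser.feed(html)
    # Resolve relative links like "/page/2" against the site's base URL
    return [urljoin(base_url, href) for href in parser.links]


def fetch(url=BASE_URL):
    """Download a page from the mock site and decode it as UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

With the Flask server running, `print(extract_links(fetch()))` lists the links on the mock site's front page.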