ingest, clean, process and report.
Data Sources that can be used:
- Tennis tournament data
- Spotify API on music
- Housing data from a few sources (gov data mixed with MLS)
- Personal history of Amazon purchases
- Wearable data
- Fantasy football
- a dataset from your passion project domain
Create a pipeline that ingests data from a source and runs it through a process of your choice. Use Airflow to manage the running of the project:
- ingest from one source
- clean (add missing data, create a feature or two)
- process and perform some transformations
- report (charts, graphs, some kind of output which informs our understanding of the datasets)
- keep it pretty simple; it needs to be done in a day or two.
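The four steps above can be sketched as plain Python functions before any Airflow wiring is added. This is only an illustrative sketch, not the layout used in airflow-proj-src: the sample CSV, the column names (`date`, `steps`, `sleep_hours`), the 8000-step threshold, and the function names are all hypothetical, chosen to resemble a small wearable export. It uses only the standard library so it runs anywhere.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data standing in for a real source (e.g. a wearable export).
RAW_CSV = """date,steps,sleep_hours
2024-01-01,8000,7.5
2024-01-02,,6.0
2024-01-03,12000,8.0
2024-02-01,4000,
"""


def ingest(text):
    """Ingest: read CSV text from a single source into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: fill missing values with the column mean and add one feature."""
    cleaned = [dict(r) for r in rows]
    for col in ("steps", "sleep_hours"):
        # Mean of the values that are present in this column.
        fill = mean(float(r[col]) for r in rows if r[col])
        for r in cleaned:
            r[col] = float(r[col]) if r[col] else fill
    for r in cleaned:
        # New feature: was this an "active" day? (threshold is an assumption)
        r["active"] = r["steps"] >= 8000
    return cleaned


def process(rows):
    """Process: aggregate to average steps per calendar month (YYYY-MM)."""
    by_month = defaultdict(list)
    for r in rows:
        by_month[r["date"][:7]].append(r["steps"])
    return {month: mean(values) for month, values in sorted(by_month.items())}


def report(summary):
    """Report: render a tiny text bar chart, one '#' per ~1000 average steps."""
    lines = []
    for month, avg in summary.items():
        lines.append(f"{month} {'#' * round(avg / 1000)} {avg:.0f}")
    return "\n".join(lines)


def run_pipeline(text=RAW_CSV):
    """Chain the four steps in order."""
    return report(process(clean(ingest(text))))


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each step as its own function means each one can later become its own Airflow task (for example with a `PythonOperator` or the TaskFlow `@task` decorator), so Airflow handles ordering and scheduling while the logic stays testable outside of it.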
Take a look at airflow-proj-src
to see a super simple, Docker-hosted Airflow project to start with.