I've thrown a quick demo together to help me learn about dbt with a Postgres database.
With very little effort we've cleaned up this price data!
The demo:
- Starts a database (postgres)
- Creates tables and imports a dataset
- Sets up a dbt project and profile
- Creates relevant models/views
I've included `dbt-profiles` and `database.env` inside this repo to make the quickstart easy. Both contain secrets which, if they were real, should never end up in Git. Ever.
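For reference, a Postgres dbt profile has roughly this shape; the project name, credentials, and database names below are illustrative placeholders, not this repo's actual values:

```yaml
# dbt-profiles/profiles.yml — a sketch with throwaway demo values
pricing_demo:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: postgres
      password: postgres   # demo-only secret; never commit real credentials
      dbname: pricing
      schema: public
      threads: 4
```

dbt looks for `profiles.yml` in `~/.dbt/` by default; pointing `--profiles-dir` at a directory inside the repo is what makes a self-contained quickstart like this possible.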
Run `make all`
Commands have been written and tested on macOS Big Sur.
- dbt (`brew install dbt`)
- Docker Desktop & CLI tools (`docker-compose`)
- Postgres CLI tools (`psql`)
Runs all of the following commands in the correct order, leaving you with a database with the data imported, seeds loaded, models built, and everything ready to go.
Stops and completely deletes the Postgres database, letting us start from fresh each time if we so wish.
Starts the Postgres database using `docker-compose`; it'll keep running until you stop it.
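A compose service along these lines would do the job; the service name, image tag, and env-file wiring here are assumptions, not necessarily what this repo uses:

```yaml
# docker-compose.yml — a sketch, assuming database.env holds POSTGRES_* vars
version: "3.8"
services:
  postgres:
    image: postgres:13
    env_file:
      - database.env     # POSTGRES_USER / POSTGRES_PASSWORD / POSTGRES_DB
    ports:
      - "5432:5432"
```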
This will create any base ingest tables we need to give us some starter data.
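As a sketch, the ingest table for Price Paid data might look something like the following; the table and column names are my guesses at the dataset's fields, not the repo's actual schema:

```sql
-- Illustrative ingest table for the raw Price Paid CSV
CREATE TABLE IF NOT EXISTS prices_raw (
    transaction_id   text,
    price            integer,
    date_of_transfer date,
    postcode         text,
    property_type    text,   -- single alpha-chars in the source data
    town_city        text,
    county           text
);
```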
Using the tables created previously, it gunzips our pricing data and then uses `psql` to copy it into our table. There are ~650k rows, so it takes a moment.
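The import step is roughly a pipe like this; the gzip filename, connection details, and table name are assumptions for illustration:

```shell
# Stream the gzipped CSV straight into Postgres without unpacking to disk
gunzip -c setup/prices-data-2020.csv.gz \
  | psql -h localhost -U postgres -d pricing \
      -c "\copy prices_raw FROM STDIN WITH (FORMAT csv)"
```

`\copy` is a `psql` client-side meta-command, so it works even when the CSV lives on your machine rather than inside the database container.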
We have some static data that is used to enrich the pricing data. In the data source it's hardcoded to single alpha-chars, which isn't particularly useful, so the `data/` directory contains some CSVs that get pulled in as tables (dbt seeds) automatically.
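A seed is just a CSV that `dbt seed` loads as a table. As a hypothetical example (the file name and rows below are illustrative, not necessarily the repo's), a lookup expanding those single-character codes could look like:

```csv
code,description
D,Detached
S,Semi-Detached
T,Terraced
F,Flat/Maisonette
O,Other
```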
This is where the magic happens: `dbt run` is called and will create all our views/models automatically.
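A model in this setup could be as small as a select joining the raw prices to a seed lookup; the file, table, and column names below are illustrative, not the repo's actual models:

```sql
-- models/prices_enriched.sql — an illustrative dbt model
select
    p.price,
    p.date_of_transfer,
    p.postcode,
    t.description as property_type
from prices_raw as p
left join {{ ref('property_types') }} as t
    on p.property_type = t.code
```

`{{ ref(...) }}` is how dbt wires models and seeds together, so it builds everything in dependency order.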
`setup/prices-data-2020.csv` is from the GOV.UK Price Paid Data and is licensed under the Open Government Licence (OGL).