covid-19-toronto's Introduction

Project Overview

This repository is the final project for the Data Engineering Zoomcamp hosted by our good friends at DataTalks Club. For this project, I selected the COVID-19 Cases in Toronto dataset from the Toronto Open Data Catalogue. My criterion for choosing a dataset was that it refreshes on a regular schedule; this one refreshes weekly. I am also interested in the COVID-19 cases in Toronto because some of my friends and family members live there.

The final output is a dashboard that shows some summary statistics of the COVID-19 cases in Toronto. The dashboard is deployed on Looker Studio. You can access the dashboard here

Tech Stack

This project utilizes the following tech stack:

  • Python scripts and Prefect for the data pipeline and orchestration
  • Google Cloud Platform
    • Google Cloud Storage was used to store the raw data
    • Google BigQuery was used to store the transformed data
    • Google Looker Studio was used to create the dashboard
  • Terraform for infrastructure as code
  • dbt for further data transformation

Data Pipeline

The data pipeline consists of the following steps:

  1. Use Terraform to create a Google Cloud Storage bucket and a Google BigQuery dataset
  2. Extract the data from the Toronto Open Data Catalogue using their API
  3. Perform preliminary data transformation using a Python script
  4. Store the raw data in Google Cloud Storage
  5. From the raw data, create a new table in Google BigQuery
  6. Perform more data transformation using dbt
  7. Create a dashboard using Google Looker Studio
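As an illustration of step 2, here is a minimal sketch of how the dataset's CSV download link could be located through a CKAN-style API. It is not the code in flows/main.py. The base URL, the package id, and the function names are assumptions made for this example, so check them against the dataset page on the Toronto Open Data portal before relying on them.

```python
"""Sketch of step 2: finding the dataset's CSV through a CKAN-style API."""
import json
import urllib.parse
import urllib.request

# Assumed values, not taken from this repository.
BASE_URL = "https://ckan0.cf.opendata.inter.prod-toronto.ca"
PACKAGE_ID = "covid-19-cases-in-toronto"


def package_show_url(base_url: str, package_id: str) -> str:
    """Build the CKAN `package_show` URL that returns dataset metadata."""
    query = urllib.parse.urlencode({"id": package_id})
    return f"{base_url}/api/3/action/package_show?{query}"


def pick_csv_resource(package: dict) -> str:
    """Return the download URL of the first CSV resource in the metadata."""
    for resource in package["result"]["resources"]:
        if resource.get("format", "").upper() == "CSV":
            return resource["url"]
    raise ValueError("no CSV resource found in package metadata")


def fetch_package(url: str) -> dict:
    """Download and decode the metadata JSON (performs a network call)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling `pick_csv_resource(fetch_package(package_show_url(BASE_URL, PACKAGE_ID)))` would return the URL to download. Keeping the URL building and resource selection separate from the network call makes both easy to test without internet access.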

Running locally

Note: These steps assume that you have already completed the necessary Google Cloud Platform setup, such as creating the Prefect blocks and storing the credentials in a JSON file. You should also have Terraform installed on your machine.
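A small pre-flight check can confirm these prerequisites before you start. The sketch below is not part of this repository. It assumes the credentials JSON file is referenced through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable, which may differ from how your Prefect blocks are configured, and the function name is hypothetical.

```python
import os
import shutil


def missing_prerequisites(env: dict, which=shutil.which) -> list:
    """Return a list of problems; an empty list means ready to run."""
    problems = []
    # Terraform must be discoverable on PATH for scripts/terraform.sh.
    if which("terraform") is None:
        problems.append("terraform is not on PATH")
    # Assumption: credentials are located via this environment variable.
    creds = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not creds:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(creds):
        problems.append(f"credentials file not found: {creds}")
    return problems
```

For example, `missing_prerequisites(dict(os.environ))` returns an empty list when Terraform is installed and the credentials file exists.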

  1. Clone this repository

         git clone git@github.com:jplaulau14/covid-19-toronto.git

  2. Create a virtual environment

         python3 -m venv venv

  3. Activate the virtual environment

     For Mac/Linux:

         source venv/bin/activate

     For Windows:

         venv\Scripts\activate

  4. Install the requirements

         pip install -r requirements.txt

  5. Run the Bash script for Terraform

         chmod +x scripts/terraform.sh
         ./scripts/terraform.sh

  6. Run the Python script

         python flows/main.py

Further Improvements

  • Create a CI/CD pipeline to automate the deployment of the data pipeline
  • Use Prefect Cloud to schedule weekly processing
  • Add detailed instructions for running this project locally

covid-19-toronto's People

Contributors

jplaulau14, pats14
