fabriziomiano / covid-italy-etl

This repository hosts an Azure Data Factory that ingests the official COVID-19 pandemic and vaccine data into an Azure SQL Database, together with the DDL that creates the data model used to produce insightful analytics in Power BI.

TSQL 100.00%
azure data-factory azure-sql azure-data-factory azure-sql-database azure-sql-server covid-19 covid19-italy covid-19-italy pandemic

covid-italy-etl's Introduction

covid-etl-df

The scope of this project is the creation of insightful analytics on the Italian COVID pandemic and vaccine status.

[Screenshots: pandemic.png, vax.png]

Description

This repository hosts an Azure Data Factory (ADF) that ingests the official COVID-19 pandemic and vaccine data into an Azure SQL Database. It also contains the DDL needed to create the data model on that database.

Data Factory

The ADF consists of:

  • 1 tumbling-window trigger
  • 3 linked services
  • 3 datasets
  • 5 ingestion pipelines

The Trigger

The tumbling-window trigger fires every 24 hours and starts the ingestion.
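To illustrate what a 24-hour tumbling window means, here is a minimal Python sketch. It only models the idea that tumbling windows are fixed-size, contiguous, and non-overlapping; it is not the ADF trigger definition itself. The start time and the helper name `tumbling_windows` are assumptions made for the example, since this README does not state the trigger's actual start time.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, interval, count):
    """Return `count` back-to-back (window_start, window_end) pairs."""
    return [
        (start + i * interval, start + (i + 1) * interval)
        for i in range(count)
    ]


# Hypothetical start time: the real trigger's start time is not given here.
windows = tumbling_windows(datetime(2021, 1, 1), timedelta(hours=24), 3)

for window_start, window_end in windows:
    # Each window ends exactly where the next one begins.
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window would correspond to one run of the ingestion, with no gaps and no overlaps between consecutive runs.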

The linked services

Azure SQL Database

The Azure SQL Database the data is copied to

DPC Github

The HTTP request that is sent to the Civil Protection GitHub repository (PCM-DPC) to retrieve the pandemic data

Italia OD Github

The HTTP request that is sent to the Italia-OpenData GitHub repository to retrieve the vaccine data

The datasets

GithubDPC

Represents the CSV file from the civil-protection repository taken from its linked service

GithubOD

Represents the CSV file from the Italia open data repository taken from its linked service

SQLDB

Represents the dataset needed for the copy to the Azure SQL DB linked service

Pipelines

The main pipeline is the Data Ingestion pipeline. This calls two pipelines:

  • Ingest Pandemic Data
  • Ingest Vax Data

which in turn call the two parametric copy-activity pipelines:

  • OD 2 SQL
  • PCM-DPC 2 SQL

These pipelines take the directory name and the file name as parameters, retrieve the corresponding CSV files from the relevant GitHub repositories, and copy them to the matching tables in the SQL database. A provided stored procedure also updates the age ranges coming from the Italia-Open-Data administrations CSV file, so that they match the age ranges used in the population CSV file from the same source.
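The two ideas in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipelines' actual logic: the base URL, the example directory and file names, and the age-range mapping are all assumed for the example, because the README does not list the pipeline parameter values or the contents of the stored procedure.

```python
from urllib.parse import quote

# Assumed base URL: the real one is configured in the ADF linked service.
BASE_URL = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master"


def csv_url(base_url, directory, file_name):
    """Build the URL of one CSV file from the two pipeline parameters."""
    return f"{base_url.rstrip('/')}/{quote(directory)}/{quote(file_name)}"


# Hypothetical mapping: which age ranges the stored procedure merges is an
# assumption made for this example only.
AGE_RANGE_MAP = {"80-89": "80+", "90+": "80+"}


def harmonize_age_range(age_range):
    """Map an age range onto the one used by the population file."""
    return AGE_RANGE_MAP.get(age_range, age_range)


# Example directory and file names (assumed, not taken from this README).
print(csv_url(BASE_URL, "dati-regioni", "dpc-covid19-ita-regioni.csv"))
print(harmonize_age_range("90+"))
```

Keeping the directory and file name as parameters is what lets one copy pipeline per source serve every table, instead of one pipeline per CSV file.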

The DDL scripts

The tables, views, and procedure needed are defined under SQL/. The scripts create the tables needed for the ingestion, an update procedure to harmonize the data, and the views to be exposed to the data model.

Usage

The repository does not contain any script for the automated deployment of the ADF. To deploy it, in addition to clicking the button at the top of this repo, you need to:

  • fork this repo
  • provision an Azure SQL Database (connection string needed in the template)
  • provision an Azure Data Factory

Once the services have been created, run:

  • table creation in SQL/tables.sql
  • procedure creation in SQL/procedures.sql
  • views creation in SQL/views.sql
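If you run these scripts from code rather than from a SQL client, keep in mind that `GO` is a batch separator understood by tools such as sqlcmd and SSMS, not a T-SQL statement, and that `CREATE VIEW` and `CREATE PROCEDURE` must be the first statement in their batch. The Python sketch below splits a script into batches on `GO` lines. Whether the repository's scripts actually use `GO` is an assumption, and `split_batches` is a hypothetical helper, not part of the repository.

```python
import re

# A line containing only GO (any case, optional surrounding spaces or tabs).
GO_LINE = re.compile(r"^[ \t]*GO[ \t]*$", re.IGNORECASE | re.MULTILINE)

# Execution order taken from the list above.
SCRIPT_ORDER = ["SQL/tables.sql", "SQL/procedures.sql", "SQL/views.sql"]


def split_batches(script):
    """Split a T-SQL script into batches on lines that contain only GO."""
    return [batch.strip() for batch in GO_LINE.split(script) if batch.strip()]


# Made-up script text, used only to show the splitting.
sample = (
    "CREATE TABLE t (id INT);\n"
    "GO\n"
    "CREATE VIEW v AS SELECT id FROM t;\n"
    "go\n"
)

for batch in split_batches(sample):
    print(batch)
    print("-- end of batch --")
```

Each batch could then be sent to the database as a separate command, file by file in the order of `SCRIPT_ORDER`. This simple splitter would also break on a `GO` line inside a comment or a string literal, so treat it as a starting point only.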

At this stage all the components needed to create the data model are in place. The designed data model is shown in the screenshot data_model.png, and the relevant Power BI report, whose screenshot is at the top of this README, can be found inside SampleReport/COVID.pbix.

Donation

If you liked this project or if I saved you some time, feel free to buy me a beer :beer: Cheers!

paypal
