This repository contains the code and documentation for an end-to-end data engineering project focused on cricket analytics. The project ingests data from JSON files, processes it through successive layers, and builds a consumption layer for analytics. The analytics results can be visualized in a Snowsight dashboard.
The project is organized into several layers:
- Land Layer: The initial landing layer, where the raw JSON files are stored.
  - Used the JSON Crack tool to visualize the deeply nested JSON files and understand their structure.
  - Created an internal stage and a JSON file format to store the JSON files in Snowflake.
- RAW Layer: The raw layer stores the ingested data without any transformation. It includes tasks to load data from JSON files into Snowflake tables.
- Clean Layer: The clean layer is responsible for cleaning and transforming the raw data. It includes tasks to create clean tables that are more suitable for analysis.
- Consumption Layer: The consumption layer is designed for analytics and reporting. It includes the final tables and views that are used for creating dashboards.
- Dashboard: The dashboard layer uses Snowsight to visualize the analytics results and supports data analysis and reporting.
- Automate Continuous Data Flow: Automated tasks listen for change data capture (CDC) and propagate new data through all tables, keeping the pipeline continuously up to date.
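The Clean and Consumption layers essentially unnest each match file into one row per delivery and then aggregate those rows. In Snowflake this would typically be done in SQL, for example with the `FLATTEN` table function; the Python sketch below only illustrates the same idea outside Snowflake. It assumes a Cricsheet-style layout (`innings` → `overs` → `deliveries`, with `batter`, `bowler` and `runs` on each delivery). Those key names, the tiny sample match, and the helpers `flatten_deliveries` and `runs_by_batter` are hypothetical and are not taken from the project's scripts.

```python
import json
from collections import defaultdict

# Hypothetical, heavily trimmed match document in an assumed Cricsheet-like layout.
SAMPLE_MATCH = json.loads("""
{
  "info": {"teams": ["India", "Australia"]},
  "innings": [
    {
      "team": "India",
      "overs": [
        {"over": 0, "deliveries": [
          {"batter": "A", "bowler": "X", "runs": {"batter": 4, "extras": 0, "total": 4}},
          {"batter": "B", "bowler": "X", "runs": {"batter": 0, "extras": 1, "total": 1}}
        ]},
        {"over": 1, "deliveries": [
          {"batter": "A", "bowler": "Y", "runs": {"batter": 6, "extras": 0, "total": 6}}
        ]}
      ]
    }
  ]
}
""")


def flatten_deliveries(match: dict) -> list[dict]:
    """Clean-layer style step: unnest innings -> overs -> deliveries into flat rows."""
    rows = []
    for innings in match.get("innings", []):
        for over in innings.get("overs", []):
            # ball = position of the delivery within the over (extras such as wides included).
            for ball_no, delivery in enumerate(over.get("deliveries", []), start=1):
                rows.append({
                    "team": innings["team"],
                    "over": over["over"],
                    "ball": ball_no,
                    "batter": delivery["batter"],
                    "bowler": delivery["bowler"],
                    "batter_runs": delivery["runs"]["batter"],
                    "total_runs": delivery["runs"]["total"],
                })
    return rows


def runs_by_batter(rows: list[dict]) -> dict[str, int]:
    """Consumption-layer style step: sum the runs credited to each batter."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["batter"]] += row["batter_runs"]
    return dict(totals)


if __name__ == "__main__":
    delivery_rows = flatten_deliveries(SAMPLE_MATCH)
    for r in delivery_rows:
        print(r)
    print(runs_by_batter(delivery_rows))
```

Keeping the unnesting (one row per delivery) separate from the aggregation mirrors the split between a clean table and the consumption views built on top of it.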
To run the project, you need:
- A Snowflake account (a free trial account is sufficient)
- A Snowflake client (SnowSQL or the Snowflake web interface)
- Access to Snowsight for dashboard creation
To set up and run the project:
- Clone this repository:

      git clone https://github.com/mehdi-touil/End-to-End-Cricket-Analytics-Data-Engineering-Project

- Execute the SQL scripts for each layer (Land, RAW, Clean, Consumption) in order to create the tables, streams, tasks, and views.
- Load data into the Land Layer using tasks or other data loading methods.
- Run the RAW Layer tasks to move data from the Land Layer into the RAW Layer.
- Run the subsequent Clean Layer tasks to clean and transform the data.
- Finally, run the Consumption Layer tasks to create the final tables and views for analytics.
- Use Snowsight to open the dashboard and visualize the cricket analytics.
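The steps above run tasks layer by layer, and the continuous-flow setup relies on change data capture so that each run only touches new data. The Python sketch below is a conceptual, in-memory model of that pattern, not the Snowflake API: a hypothetical `AppendOnlyStream` remembers how far its consumer has read a table, and a hypothetical `Pipeline.run_tasks` runs the "clean" and "consumption" steps only when their input stream has new rows. It handles inserts only (no updates or deletes), and the field names `batter` and `runs` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyStream:
    """Conceptual append-only stream: tracks how far a consumer has read a source table."""
    source: list
    offset: int = 0

    def has_data(self) -> bool:
        return self.offset < len(self.source)

    def consume(self) -> list:
        # Return only rows appended since the last consume, then advance the offset.
        new_rows = self.source[self.offset:]
        self.offset = len(self.source)
        return new_rows


@dataclass
class Pipeline:
    raw: list = field(default_factory=list)
    clean: list = field(default_factory=list)
    consumption: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Each stream reads the same list object that the previous layer appends to.
        self.raw_stream = AppendOnlyStream(self.raw)
        self.clean_stream = AppendOnlyStream(self.clean)

    def run_tasks(self) -> None:
        # "Clean" task: runs only when the raw stream has new rows.
        if self.raw_stream.has_data():
            for row in self.raw_stream.consume():
                self.clean.append({"batter": row["batter"].strip(), "runs": int(row["runs"])})
        # "Consumption" task: runs only when the clean stream has new rows.
        if self.clean_stream.has_data():
            for row in self.clean_stream.consume():
                name = row["batter"]
                self.consumption[name] = self.consumption.get(name, 0) + row["runs"]


if __name__ == "__main__":
    p = Pipeline()
    p.raw.extend([{"batter": " A ", "runs": "4"}, {"batter": "B", "runs": "1"}])
    p.run_tasks()
    print(p.consumption)
    p.raw.append({"batter": "A", "runs": "6"})  # a newly landed record
    p.run_tasks()  # only the new row flows through the clean and consumption steps
    print(p.consumption)
```

Because each stream advances its offset when consumed, rerunning `run_tasks` with no new raw rows leaves every layer unchanged.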
The repository is organized as follows:
- Json Cricket files/: Contains the JSON match files.
- Sql Worksheet/: Contains the SQL scripts used in all layers.
- dashboard/: Includes screenshots and SQL scripts related to the dashboard built with Snowsight.
- Customize the SQL scripts based on your specific project requirements.
- Ensure proper access control and permissions are set in Snowflake.
- Refer to Snowflake documentation for detailed information on SnowSQL, Snowsight, and other Snowflake features.