sharzzdevise / data-engineer-nanodegree-projects-udacity

This project forked from kroudir/data-engineer-nanodegree-projects-udacity

Projects done in the Data Engineer Nanodegree Program by Udacity.com

Home Page: https://www.udacity.com/course/data-engineer-nanodegree--nd027

Python 28.22% Jupyter Notebook 70.09% TSQL 1.69%

Data-Engineer-Nanodegree-Projects-Udacity

Course 1: Data Modeling

Introduction to Data Modeling

  • Understand the purpose of data modeling
  • Identify the strengths and weaknesses of different types of databases and data storage techniques
  • Create a table in Postgres and Apache Cassandra

Relational Data Models

  • Understand when to use a relational database
  • Understand the difference between OLAP and OLTP databases
  • Create normalized data tables
  • Implement denormalized schemas (e.g. star, snowflake)
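
As a minimal sketch of the star schema named in the list above, the following uses Python's built-in sqlite3 module as a stand-in for Postgres. The table and column names (dim_user, dim_song, fact_songplay) and the sample rows are hypothetical and chosen only for illustration; they are not taken from the course projects.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE dim_song (song_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE fact_songplay (
            songplay_id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES dim_user (user_id),
            song_id INTEGER NOT NULL REFERENCES dim_song (song_id),
            duration_s REAL NOT NULL
        );
        """
    )


def plays_per_user(conn):
    """Aggregate the fact table, joining one dimension for readable labels."""
    rows = conn.execute(
        """
        SELECT u.name, COUNT(*) AS plays
        FROM fact_songplay AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        GROUP BY u.name
        ORDER BY u.name
        """
    )
    return rows.fetchall()


# In-memory database with made-up sample data.
conn = sqlite3.connect(":memory:")
build_star_schema(conn)
conn.executemany("INSERT INTO dim_user VALUES (?, ?)", [(1, "ana"), (2, "ben")])
conn.executemany("INSERT INTO dim_song VALUES (?, ?)", [(10, "Intro"), (11, "Outro")])
conn.executemany(
    "INSERT INTO fact_songplay VALUES (?, ?, ?, ?)",
    [(100, 1, 10, 31.5), (101, 1, 11, 42.0), (102, 2, 10, 31.5)],
)
print(plays_per_user(conn))  # [('ana', 2), ('ben', 1)]
```

The fact table holds only keys and measures, so an analytical query needs a single join per dimension it wants to report on.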

NoSQL Data Models

  • Understand when to use NoSQL databases and how they differ from relational databases
  • Select the appropriate primary key and clustering columns for a given use case
  • Create a NoSQL database in Apache Cassandra

Project 1: Data Modeling with Postgres and Apache Cassandra

Course 2: Cloud Data Warehouses

Introduction to Data Warehouses

  • Understand Data Warehousing architecture
  • Run an ETL process to denormalize a database (3NF to Star)
  • Create an OLAP cube from facts and dimensions
  • Compare columnar vs. row-oriented approaches

Introduction to the Cloud with AWS

  • Understand cloud computing
  • Create an AWS account and understand AWS services
  • Set up Amazon S3, IAM, VPC, EC2, RDS PostgreSQL

Implementing Data Warehouses on AWS

  • Identify components of the Redshift architecture
  • Run an ETL process to load data from S3 into Redshift
  • Set up AWS infrastructure using Infrastructure as Code (IaC)
  • Design an optimized table by selecting the appropriate distribution style and sorting key

Project 2: Data Infrastructure on the Cloud

Course 3: Data Lakes with Spark

The Power of Spark

  • Understand the big data ecosystem
  • Understand when to use Spark and when not to use it

Data Wrangling with Spark

  • Manipulate data with Spark SQL and Spark DataFrames
  • Use Spark for ETL purposes

Debugging and Optimization

  • Troubleshoot common errors and optimize code using the Spark web UI

Introduction to Data Lakes

  • Understand the purpose and evolution of data lakes
  • Implement data lakes on Amazon S3, EMR, Athena, and AWS Glue
  • Use Spark to run ELT processes and analytics on data of diverse sources, structures, and vintages
  • Understand the components and issues of data lakes

Project 3: Big Data with Spark

Course 4: Automate Data Pipelines

Data Pipelines

  • Create data pipelines with Apache Airflow
  • Set up task dependencies
  • Create data connections using hooks
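
The idea behind task dependencies can be sketched without Airflow itself. The example below is not Airflow code: it uses Python's standard-library graphlib (Python 3.9+) to derive a valid run order from a dependency graph, which is the same directed-acyclic-graph idea an Airflow DAG is built on. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks that must finish first.
dependencies = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_fact": {"stage_events", "stage_songs"},
    "quality_check": {"load_fact"},
}


def run_order(deps):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

The two staging tasks have no dependency on each other, so either may come first; a scheduler is free to run them in parallel.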

Data Quality

  • Track data lineage
  • Set up data pipeline schedules
  • Partition data to optimize pipelines
  • Write tests to ensure data quality
  • Backfill data
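
A data quality test of the kind listed above might look like the following sketch. It uses sqlite3 in place of a real warehouse connection, and the function names, the users table, and its rows are all hypothetical. Table and column names are interpolated into the SQL directly, so the sketch assumes they come from trusted pipeline configuration rather than user input.

```python
import sqlite3


def check_has_rows(conn, table):
    """Fail if the table is empty; return the row count otherwise."""
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    if count == 0:
        raise ValueError(f"Data quality check failed: {table} is empty")
    return count


def check_no_nulls(conn, table, column):
    """Fail if any row has NULL in the given column."""
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]
    if nulls:
        raise ValueError(
            f"Data quality check failed: {table}.{column} has {nulls} NULL value(s)"
        )


# Made-up sample data with one deliberately missing name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ana"), (2, None)])

print(check_has_rows(conn, "users"))  # 2
try:
    check_no_nulls(conn, "users", "name")
except ValueError as err:
    print(err)
```

Raising an exception rather than returning a flag lets a pipeline task fail loudly, which is usually what a scheduler needs in order to stop downstream tasks.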

Production Data Pipelines

  • Build reusable and maintainable pipelines
  • Build your own Apache Airflow plugins
  • Implement subDAGs
  • Set up task boundaries
  • Monitor data pipelines

Project 4: Data Pipelines with Airflow

./images/certification.jpg

data-engineer-nanodegree-projects-udacity's People

Contributors

kroudir
