GithubHelp home page GithubHelp logo

ny_taxi's Introduction

Syllabus

Syllabus

Note: NYC TLC changed the format of the data we use to parquet. But you can still access the csv files here.

  • Course overview
  • Introduction to GCP
  • Docker and docker-compose
  • Running Postgres locally with Docker
  • Setting up infrastructure on GCP with Terraform
  • Preparing the environment for the course
  • Data Lake
  • Workflow orchestration
  • Introduction to Prefect
  • ETL with GCP & Prefect
  • Parametrizing workflows
  • Prefect Cloud and additional resources
  • Data Warehouse
  • BigQuery
  • Partitioning and clustering
  • BigQuery best practices
  • Internals of BigQuery
  • Integrating BigQuery with Airflow
  • BigQuery Machine Learning
  • Basics of analytics engineering
  • dbt (data build tool)
  • BigQuery and dbt
  • Postgres and dbt
  • dbt models
  • Testing and documenting
  • Deployment to the cloud and locally
  • Visualizing the data with google data studio and metabase
  • Batch processing
  • What is Spark
  • Spark Dataframes
  • Spark SQL
  • Internals: GroupBy and joins
  • Introduction to Kafka
  • Schemas (avro)
  • Kafka Streams
  • Kafka Connect and KSQL

Workshop: Maximizing Confidence in Your Data Model Changes with dbt and PipeRider

More details

Overview

Architecture diagram

Technologies

  • Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
    • Google Cloud Storage (GCS): Data Lake
    • BigQuery: Data Warehouse
  • Terraform: Infrastructure-as-Code (IaC)
  • Docker: Containerization
  • SQL: Data Analysis & Exploration
  • Prefect: Workflow Orchestration
  • dbt: Data Transformation
  • Spark: Distributed Processing
  • Kafka: Streaming

Tools

For this course, you'll need to have the following software installed on your computer:

  • Docker and Docker-Compose
  • Python 3 (e.g. via Anaconda)
  • Google Cloud SDK
  • Terraform

ny_taxi's People

Contributors

antonlevashov avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.