
Shuo Tian's Projects

bigdata-2019w

CS 451/651, CS 431/631: Data-Intensive Distributed Computing (Winter 2019) at the University of Waterloo https://aroegies.github.io/bigdata-2019w/

cannabispect

Customer Segmentation Based on Cannabis Consumer Reviews

data-modeling-with-postgresql

Data modeling with PostgreSQL and an ETL pipeline in Python. Defines fact and dimension tables for a star schema around a particular analytic focus, then loads data from files in two local directories into those tables using Python and SQL.
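
As a rough illustration of the star-schema idea, here is a minimal sketch. It uses Python's built-in `sqlite3` in place of PostgreSQL so it runs without a server, and every table, column, and record name (`dim_artist`, `dim_song`, `fact_play`, `SONGS`, `PLAYS`) is a hypothetical stand-in, not taken from the project.

```python
import sqlite3

# Hypothetical in-memory records standing in for the project's input files.
SONGS = [
    {"song_id": "S1", "title": "Intro", "artist_id": "A1", "artist_name": "Ada"},
    {"song_id": "S2", "title": "Outro", "artist_id": "A1", "artist_name": "Ada"},
]
PLAYS = [
    {"user_id": 7, "song_id": "S1", "ts": 1541990217796},
    {"user_id": 7, "song_id": "S2", "ts": 1541990258796},
    {"user_id": 9, "song_id": "S1", "ts": 1541990264796},
]


def build_star_schema(conn):
    """Create two dimension tables and one fact table that references them."""
    conn.executescript(
        """
        CREATE TABLE dim_artist (
            artist_id TEXT PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE dim_song (
            song_id   TEXT PRIMARY KEY,
            title     TEXT NOT NULL,
            artist_id TEXT REFERENCES dim_artist(artist_id)
        );
        CREATE TABLE fact_play (
            play_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            song_id TEXT REFERENCES dim_song(song_id),
            ts      INTEGER NOT NULL
        );
        """
    )


def load(conn, songs, plays):
    """Load dimensions first, then facts, so foreign keys have targets."""
    for s in songs:
        # INSERT OR IGNORE is SQLite syntax; PostgreSQL would use ON CONFLICT DO NOTHING.
        conn.execute(
            "INSERT OR IGNORE INTO dim_artist VALUES (?, ?)",
            (s["artist_id"], s["artist_name"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO dim_song VALUES (?, ?, ?)",
            (s["song_id"], s["title"], s["artist_id"]),
        )
    conn.executemany(
        "INSERT INTO fact_play (user_id, song_id, ts) VALUES (?, ?, ?)",
        [(p["user_id"], p["song_id"], p["ts"]) for p in plays],
    )
    conn.commit()


def plays_per_artist(conn):
    """A typical analytic query: join the fact table out to its dimensions."""
    return conn.execute(
        """
        SELECT a.name, COUNT(*)
        FROM fact_play f
        JOIN dim_song s   ON s.song_id = f.song_id
        JOIN dim_artist a ON a.artist_id = s.artist_id
        GROUP BY a.name
        ORDER BY a.name
        """
    ).fetchall()
```

Calling `build_star_schema` and `load` on a `sqlite3.connect(":memory:")` connection, then `plays_per_artist`, shows the usual pattern: facts stay narrow and descriptive attributes live in the dimensions.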

data_lake_spark

An ETL pipeline that extracts data from S3 buckets, processes it with Spark, and transforms it into a star schema written back to S3 as efficiently partitioned Parquet files.

data_pipelines_airflow

Automating ETL pipelines with Airflow, Python, and Amazon Redshift. Transforms data from various sources into a star schema optimized for the analytics team's use cases. Custom operators handle tasks such as staging data, filling the data warehouse, and validating results through data quality checks.
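
The data quality checks mentioned here could look something like the sketch below. It is plain Python against any DB-API connection, demonstrated with the built-in `sqlite3` instead of Redshift; the function `run_quality_checks`, the `users` table, and the check queries are all hypothetical. In an Airflow custom operator, logic like this would typically sit inside the operator's `execute` method.

```python
import sqlite3


def run_quality_checks(conn, checks):
    """Run each (sql, expected) pair and return the checks that failed.

    Each query is assumed to return a single scalar in its first row.
    """
    failures = []
    cur = conn.cursor()
    for sql, expected in checks:
        cur.execute(sql)
        actual = cur.fetchone()[0]
        if actual != expected:
            failures.append((sql, expected, actual))
    return failures


def demo():
    """Build a tiny table with one bad row and run two null checks on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, level TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "free"), (None, "paid")],  # second row has a NULL user_id
    )
    checks = [
        ("SELECT COUNT(*) FROM users WHERE user_id IS NULL", 0),
        ("SELECT COUNT(*) FROM users WHERE level IS NULL", 0),
    ]
    return run_quality_checks(conn, checks)
```

Returning the list of failures, instead of raising on the first one, lets a caller report every broken check in a single run and then decide whether to fail the task.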

data_warehouse_redshift

An ETL pipeline built with the AWS SDK, Redshift, Python, and PostgreSQL. Connects to a Redshift cluster and COPYs data from S3 buckets into Redshift staging tables, then creates a database with tables designed to optimize queries for song play analysis.
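
For readers unfamiliar with loading Redshift from S3, the following sketch assembles a `COPY` statement for JSON input. The helper `build_copy_statement` is hypothetical, and the table name, bucket path, role ARN, and region in the usage note are placeholders, not values from the project. The statement would then be executed over an ordinary database connection to the cluster.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region, json_option="auto"):
    """Return a Redshift COPY statement that loads JSON files from S3.

    Values are interpolated directly into the SQL text, so they must come
    from trusted configuration, never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION '{region}';"
    )
```

For example, `build_copy_statement("staging_events", "s3://example-bucket/log_data", "arn:aws:iam::123456789012:role/example-role", "us-west-2")` produces a statement that targets a staging table; the `json_option` argument can instead be the S3 path of a JSONPaths file when the input fields do not match the column names.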

deeplearning-tutorials

Code for deep learning tutorials that I have posted on my blog: https://hareeshbahuleyan.github.io/blog/

dna-sequences-classification

Linear feature extraction using PCA and LDA. Nonlinear dimensionality reduction using LLE and Isomap. Classification with naive Bayes, kNN, and SVM.

ece608-quantitative-methods

Data wrangling, statistics, ANOVA, parametric assumptions, regression, multiple regression, logistic regression, Poisson regression, validity, non-parametric methods.

nosql-data-modeling-with-apache-cassandra

An ETL pipeline in Python together with a database schema built for the analysis. Creates an Apache Cassandra database with denormalized tables designed to optimize queries on event data, and defines robust partition keys, clustering columns, and composite primary keys.
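
To make the key terminology concrete, this sketch generates a CQL `CREATE TABLE` statement with a composite primary key: the inner parentheses hold the partition key and the remaining columns are clustering columns. The helper `create_table_cql` and the `songs_by_session` table in the usage note are hypothetical examples, not the project's actual schema.

```python
def create_table_cql(table, columns, partition_keys, clustering_columns=()):
    """Build a CQL CREATE TABLE statement.

    columns            -- dict mapping column name to CQL type, in order
    partition_keys     -- columns that decide which node stores a row
    clustering_columns -- columns that sort rows inside a partition
    """
    if not partition_keys:
        raise ValueError("at least one partition key is required")
    missing = [k for k in (*partition_keys, *clustering_columns) if k not in columns]
    if missing:
        raise ValueError(f"key columns not defined: {missing}")

    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # Partition key goes in its own parentheses; clustering columns follow it.
    key = "(" + ", ".join(partition_keys) + ")"
    if clustering_columns:
        key += ", " + ", ".join(clustering_columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({column_defs}, PRIMARY KEY ({key}))"
```

Calling it with `columns={"session_id": "int", "item_in_session": "int", "song": "text"}`, `partition_keys=["session_id"]`, and `clustering_columns=["item_in_session"]` yields a table suited to queries that filter on `session_id` and read items in order, which reflects the Cassandra habit of designing one denormalized table per query.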

system-design-primer

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
