shuotian172 Goto Github PK

followers: 2 following: 5 repos: 23 gists: 0

Name: Shuo Tian

Type: User

Company: University of Waterloo

Bio: Interest: Data Science, Data Engineering, Machine Learning, Data Analyst, Deep Learning, Software Development

Location: Toronto, Ontario, CA

Blog: [email protected]

Shuo Tian's Projects

bigdata-2019w

CS 451/651, CS 431/631: Data-Intensive Distributed Computing (Winter 2019) at the University of Waterloo https://aroegies.github.io/bigdata-2019w/

cannabispect

Customer Segmentation Based on Cannabis Consumer Reviews

chatbot-seq2seq-attention-transformer

Chatbot-Raw Reddit Comments-Data Clean-Seq2Seq-Tensorflow-Attention-Bidirectional GRU

Data modeling with PostgreSQL and building an ETL pipeline using Python. Defining fact and dimension tables for a star schema with a particular analytic focus, and writing an ETL pipeline that transfers data from files in two local directories into these tables in PostgreSQL using Python and SQL.

data_lake_spark

Building out an ETL pipeline that extracts data from S3 buckets, processes it with Spark, and transforms it into a star schema stored back in S3 as efficiently partitioned Parquet files.

data_pipelines_airflow

Automating ETL pipelines with Airflow, Python, and Amazon Redshift. Transforming data from various sources into a star schema optimized for the analytics team's use cases. Writing custom operators to perform tasks such as staging data, filling the data warehouse, and validating the results through data quality checks.

data_warehouse_redshift

Building out an ETL pipeline using the AWS SDK, Redshift, Python, and PostgreSQL. Developing a pipeline that connects to a Redshift cluster and COPYs data from S3 buckets into Redshift staging tables. Creating a database with tables designed to optimize queries for song play analysis.

deeplearning-tutorials

Code for deep learning tutorials that I have posted on my blog: https://hareeshbahuleyan.github.io/blog/

dna-sequences-classification

Linear Feature Extraction using PCA, LDA. Nonlinear Dimensionality Reduction using LLE and ISOMAP. Naive Bayes classifier, kNN, SVM

ece608-quantitative-methods

Data Wrangling-statistics-ANOVA-parametric assumption-Regression- Multiple Regression- Logistic Regression- Poisson Regression-Validity-non-parametric

ece650-traffic-management-system

Vertex Cover problem-Optimization-Multi-thread-Multi-process-MiniSAT

handwritten-digit-dataset-analysis

Handwritten digit dataset with 5 classes: digits 0, 1, 2, 3, 4. Dimensionality reduction approaches (PCA, LDA, LLE, and Isomap) were applied.

java-design-patterns

Design patterns implemented in Java

movie-recommendation-website-using-apache-spark-and-flask

nosql-data-modeling-with-apache-cassandra

Building out a database schema and an ETL pipeline using Python. Creating an Apache Cassandra database with denormalized tables designed to optimize queries on event data. Defining robust Partition Keys, Clustering Columns, and Composite Primary Keys.
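
The idea behind partition keys and clustering columns can be illustrated without a running cluster. The following is only a minimal pure-Python sketch, not code from the repository: the column names (`session_id`, `item_in_session`, `artist`, `song`) and the sample rows are hypothetical. It shows how a composite primary key groups rows by partition key and keeps each partition ordered by its clustering column, so a query that supplies the partition key reads a single partition.

```python
from bisect import insort
from collections import defaultdict

# Hypothetical event rows: (session_id, item_in_session, artist, song).
# These names and values are illustrative assumptions only.
EVENTS = [
    (338, 4, "Artist A", "Song X"),
    (338, 1, "Artist B", "Song Y"),
    (139, 2, "Artist C", "Song Z"),
]


def build_table(events):
    """Group rows by partition key (session_id) and keep each partition
    sorted by the clustering column (item_in_session)."""
    table = defaultdict(list)
    for session_id, item, artist, song in events:
        # insort keeps the partition ordered as rows arrive.
        insort(table[session_id], (item, artist, song))
    return table


def query(table, session_id, item=None):
    """Read one partition; optionally narrow to one clustering value."""
    rows = table.get(session_id, [])
    if item is None:
        return rows
    return [row for row in rows if row[0] == item]


table = build_table(EVENTS)
print(query(table, 338))
print(query(table, 338, item=4))
```

In this model the composite primary key would be `(session_id, item_in_session)`: the first part decides which partition holds a row, and the second decides the row's position inside it. A query that omits the partition key has no single partition to read, which is why such tables are usually designed around one query each.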
