T81 577 Applied Data Science for Practitioners
Washington University in St. Louis
Instructor: Asim Banskota
Spring 2020, Wednesday, 6:00 PM - 9:00 PM, Cupples II, Room L015
Organizations are rapidly transforming the way they ingest, integrate, store, and serve data and perform
analytics. In this course, students will learn the steps involved with designing and implementing data
science projects. Topics addressed include: ingesting and parsing data from various sources, dealing with
messy and missing data, transforming and engineering features, building and evaluating machine learning models, and
visualizing results. Using Python-based tools such as NumPy, Pandas, and Scikit-learn, students will
complete a practical data science project that addresses the entire design and implementation process.
Students will also become familiar with best practices and current trends in data science, including
code documentation, version control, reproducible research, pipeline automation, and cloud computing. Upon completing the course, students will have data science knowledge and skills that they can apply from day one on the job.
Week
Content
Week 1 1/15/2020
Introductions
Assignment 1.1: Install Anaconda and test Jupyter Notebook
Assignment 1.2: AWS fundamentals
Week 2 1/22/2020
Python Fundamentals
Assignment 2: Programming practice assignment
Week 3 1/29/2020
Coding Best Practices in Data Science
Assignment 3.1: Exercise on version control with Git
Assignment 3.2: Exercise on code documentation and enforcing standards
Week 4 2/5/2020
Modeling Overview
4.1. Types of models
  4.1.1. Descriptive/Prescriptive/Predictive
  4.1.2. Statistical vs. Machine learning
  4.1.3. Blackbox vs. Explainable
4.2. Model development steps
  4.2.1. Framing questions
  4.2.2. Data ingestion and wrangling
  4.2.3. Data preprocessing
  4.2.4. Model fitting and evaluation
  4.2.5. Model deployment
  4.2.6. Performance monitoring and redevelopment
Quiz: Modeling Overview
Week 5 2/12/2020
Accessing Data
5.1. Introduction to RESTful APIs
5.2. Accessing data from APIs using the requests module and Postman
5.3. Overview of JSON-formatted data
5.4. Parsing JSON data
5.5. Importing data from commonly used file formats
5.6. Reading data from a PostgreSQL database
Assignment 4: Finalization of final project topic and data set (not graded)
Week 6 2/19/2020
NumPy/Pandas for Data Munging/Wrangling
6.1. Pandas and NumPy data structures
6.2. Querying and reading data
6.3. Reshaping, indexing, slicing, and filtering data
6.4. Join, merge, and aggregation
6.5. Vectorization
6.6. Basic statistics and plotting
Assignment 5: Data wrangling with NumPy and Pandas
Week 7 2/26/2020
Exploratory Data Analysis (EDA)
7.1. Categorical vs. numeric features
7.2. Datatype conversion
7.3. Sampling
7.4. Data summary and distribution
7.5. Patterns in data
7.6. Data visualization using matplotlib, seaborn, and Bokeh
7.7. Anomaly/outlier detection
Assignment 6: Patterns in data: Visualization and data summary
Week 8 3/4/2020
Data Preprocessing
8.1. Basics (select, filter, removal of duplicates)
8.2. Data transformation
8.3. Standardization, binning, missing value treatments
8.4. Balancing datasets
Assignment 6: Data preprocessing
Week 9 3/18/2020
Feature Transformation and Engineering
9.1. Categorical encodings
9.2. Feature creation/engineering
9.3. Feature extraction
Assignment: Transformation of categorical and continuous features
Week 10 3/25/2020
Building and Evaluating Models
10.1. Tour of machine learning algorithms using Scikit-learn
10.2. Introduction to the Scikit-learn model development API
10.3. Amazon SageMaker
10.4. Training and fitting classification models
10.5. Training and fitting regression models
10.6. Performance evaluation metrics and curves
Assignment: Model building and evaluation using Scikit-learn
Week 11 4/1/2020
Best Practices in Machine Learning
11.1. Bias vs. variance tradeoff
11.2. Train/dev/test datasets
11.3. Regularization
11.4. Learning vs. validation curves
11.5. Hyperparameter tuning
11.6. Ensemble learning
11.7. Streamlining workflows with pipelines
Assignment: Regularization, cross-validation, and hyperparameter tuning
Week 12 4/8/2020
1. Guest Lecture: Data Science at Wells Fargo
2. Discussion of final project status
Quiz 2: Best practices in machine learning
Week 13 4/15/2020
Productionizing a Machine Learning Model
13.1. Dev/Stage/Prod environments
13.2. Docker, Dockerfiles, Docker containers
13.3. Deploying a machine learning model as a Flask app
13.4. Introduction to Airflow
Assignment: Build and deploy a model using Docker and a Heroku app
Week 14 4/22/2020
Final Project Demo
Short (5-minute) individual project demos