T81 577 Applied Data Science for Practitioners

Washington University in St. Louis

Instructor: Asim Banskota

Spring 2020, Wednesdays, 6:00 PM - 9:00 PM, Cupples II, Room L015

Course Description

Organizations are rapidly transforming the way they ingest, integrate, store, and serve data and the way they perform analytics. In this course, students will learn the steps involved in designing and implementing data science projects. Topics include ingesting and parsing data from various sources, dealing with messy and missing data, transforming and engineering features, building and evaluating machine learning models, and visualizing results. Using Python-based tools such as NumPy, Pandas, and Scikit-learn, students will complete a practical data science project that covers the entire design and implementation process. Students will also become familiar with best practices and current trends in data science, including code documentation, version control, reproducible research, pipeline automation, and cloud computing. By the end of the course, students will have data science knowledge and skills they can apply from day one on the job.

Syllabus

Week Content
Week 1
1/15/2020
Introductions
Assignment 1.1: Install Anaconda and test Jupyter Notebook
Assignment 1.2: AWS fundamentals
Week 2
1/22/2020
Python Fundamentals
Assignment 2: Programming practice
Week 3
1/29/2020
Coding Best Practices in Data Science
Assignment 3.1: Version control exercise with Git
Assignment 3.2: Exercise on code documentation and enforcing coding standards
Week 4
2/5/2020
Modeling Overview
  • 4.1. Types of models
    • 4.1.1. Descriptive/Prescriptive/Predictive
    • 4.1.2. Statistical vs Machine learning
    • 4.1.3. Blackbox vs Explainable
  • 4.2. Model development steps
    • 4.2.1. Framing questions
    • 4.2.2. Data ingestion and wrangling
    • 4.2.3. Data Preprocessing
    • 4.2.4. Model fitting and evaluation
    • 4.2.5. Model deployment
    • 4.2.6. Performance monitoring and redevelopment
Quiz 1: Modeling Overview
Week 5
2/12/2020
Accessing Data
  • 5.1. Introduction to RESTful APIs
  • 5.2. Accessing data from APIs using the requests module and Postman
  • 5.3. Overview of JSON-formatted data
  • 5.4. Parsing JSON data
  • 5.5. Importing data from commonly used file formats
  • 5.6. Reading data from PostgreSQL database
Assignment 4: Finalization of final project topic and data set (Not graded)
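As an illustration of topics 5.3 and 5.4, here is a minimal sketch of parsing a JSON API response with Python's standard `json` module and flattening nested records into flat rows. The payload and the `flatten` helper are hypothetical; with a live API the text would typically come from the `requests` module (for example `requests.get(url).json()`).

```python
import json

# Hypothetical API response body; a real one might come from requests.get(url).text
raw = """
{
  "results": [
    {"id": 1, "name": "Alice", "address": {"city": "St. Louis", "zip": "63130"}},
    {"id": 2, "name": "Bob", "address": {"city": "Clayton", "zip": null}}
  ]
}
"""


def flatten(record, parent_key="", sep="."):
    """Recursively flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


payload = json.loads(raw)  # JSON text -> Python dicts/lists (null -> None)
rows = [flatten(r) for r in payload["results"]]
for row in rows:
    print(row)
```

Once the data is in Pandas, `pd.json_normalize` performs a similar flattening step on lists of nested records.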
Week 6
2/19/2020
NumPy/Pandas for Data Munging/Wrangling
  • 6.1. Pandas and NumPy data structures
  • 6.2. Querying and reading data
  • 6.3. Reshaping, indexing, slicing, and filtering data
  • 6.4. Join, Merge, and Aggregation
  • 6.5. Vectorization
  • 6.6. Basic statistics and plotting
Assignment 5: Data wrangling with NumPy and Pandas
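Topics 6.4 and 6.5 can be sketched with two small, hypothetical tables: a left join on a shared key, a vectorized column computation (one array operation instead of a Python loop), and a per-group aggregation. The column names and the 8% tax rate are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example tables
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, 30],
    "amount": [25.0, 40.0, 15.0, 60.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["East", "West", "East"],
})

# Join: attach each order's region via the shared customer_id key
merged = orders.merge(customers, on="customer_id", how="left")

# Vectorization: the multiplication is applied to the whole column at once
merged["amount_with_tax"] = merged["amount"] * 1.08  # hypothetical tax rate

# Aggregation: total and mean order amount per region
summary = merged.groupby("region")["amount"].agg(["sum", "mean"])
print(summary)
```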
Week 7
2/26/2020
Exploratory Data Analysis (EDA)
  • 7.1. Categorical vs numeric features
  • 7.2. Datatype conversion
  • 7.3. Sampling
  • 7.4. Data summary and distribution
  • 7.5. Patterns in data
  • 7.6. Data visualization using matplotlib, seaborn, and Bokeh
  • 7.7. Anomaly/outlier detection
Assignment 6: Patterns in data: visualization and data summary
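For topic 7.7, one common and simple starting point is the interquartile range (IQR) rule, which flags values more than 1.5 × IQR below the first quartile or above the third quartile. This is a sketch of that one technique on made-up data, not the only approach the lecture may cover.

```python
import numpy as np

# Hypothetical measurements with one obvious anomaly
values = np.array([12, 14, 13, 15, 14, 13, 95, 12, 16, 14])

# Quartiles and interquartile range
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Tukey's fences: points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(f"bounds: ({lower:.3f}, {upper:.3f}), outliers: {outliers.tolist()}")
```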
Week 8
3/4/2020
Data Preprocessing
  • 8.1. Basics (select, filter, removal of duplicates)
  • 8.2. Data Transformation
  • 8.3. Standardization, Binning, Missing value treatments
  • 8.4. Balancing datasets
Assignment 7: Data preprocessing
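Topics 8.1 and 8.3 can be sketched in Pandas on a small, hypothetical table: drop duplicate rows, fill missing values with each column's median, standardize to zero mean and unit variance (z-scores), and bin a numeric column into labeled ranges. The bin edges and labels are arbitrary choices for illustration; Scikit-learn's `SimpleImputer` and `StandardScaler` offer the same steps in a fit/transform form.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values
df = pd.DataFrame({
    "age": [22, 35, np.nan, 58, 41],
    "income": [30e3, 52e3, 61e3, np.nan, 75e3],
})

# Basics: remove exact duplicate rows
df = df.drop_duplicates()

# Missing value treatment: fill each column with its own median
df_filled = df.fillna(df.median())

# Standardization: z-score with population std (ddof=0, as StandardScaler uses)
standardized = (df_filled - df_filled.mean()) / df_filled.std(ddof=0)

# Binning: right-inclusive intervals (0, 30], (30, 50], (50, 120]
df_filled["age_group"] = pd.cut(
    df_filled["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)
print(df_filled)
print(standardized.round(3))
```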
Week 9
3/18/2020
Feature Transformation and Engineering
  • 9.1. Categorical encodings
  • 9.2. Feature creation/engineering
  • 9.3. Feature extraction
Assignment: Transformation of categorical and continuous features
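Two common categorical encodings from topic 9.1 are sketched below on a hypothetical table: one-hot encoding for an unordered category (`color`) and ordinal encoding with an explicit, assumed order for an ordered category (`size`).

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "L", "M", "S"],
})

# One-hot encoding: one 0/1 column per category (columns come out sorted)
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Ordinal encoding: map ordered categories to integers (assumed order S < M < L)
size_order = {"S": 0, "M": 1, "L": 2}
df["size_encoded"] = df["size"].map(size_order)

# Combine the encoded features into one numeric table
encoded = pd.concat([df[["size_encoded"]], one_hot], axis=1)
print(encoded)
```

One-hot encoding avoids implying an order that does not exist, while the ordinal mapping keeps a meaningful order in a single column.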
Week 10
3/25/2020
Building and Evaluating Models
  • 10.1. Tour of machine learning algorithms using scikit-learn
  • 10.2. Introduction to Scikit-learn model development API
  • 10.3. Amazon SageMaker
  • 10.4. Training and fitting classification models
  • 10.5. Training and fitting regression models
  • 10.6. Performance evaluation metrics and curves
Assignment: Model building and evaluation using Scikit-Learn
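The Scikit-learn model development API from topics 10.2, 10.4, and 10.6 follows a consistent pattern: split the data, `fit` an estimator, `predict` on held-out data, then score the predictions. The sketch below uses the bundled Iris dataset and logistic regression purely as an example; the split ratio and random seed are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Bundled example dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Hold out 30% for testing, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit on the training split only, then predict on unseen data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation metrics
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f"accuracy: {acc:.3f}")
print(cm)
```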
Week 11
4/1/2020
Best practices in Machine Learning
  • 11.1. Bias-variance tradeoff
  • 11.2. Train/dev/test dataset
  • 11.3. Regularization
  • 11.4. Learning vs validation curves
  • 11.5. Hyperparameter tuning
  • 11.6. Ensemble learning
  • 11.7. Streamlining workflows with pipelines
Assignment: Regularization, cross validation and hyperparameter tuning
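Topics 11.3, 11.5, and 11.7 fit together naturally: a `Pipeline` chains scaling and a regularized model, and `GridSearchCV` tunes the regularization strength with k-fold cross-validation on the training data only. The sketch below uses Ridge regression on the bundled diabetes dataset; the alpha grid and the 5 folds are illustrative choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling is refit inside each CV fold, so no test-fold data leaks into it
pipe = Pipeline([("scale", StandardScaler()), ("ridge", Ridge())])

# Hyperparameters of a pipeline step are addressed as <step>__<param>
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

# Final check on the untouched test split, using the best refit pipeline
test_r2 = search.score(X_test, y_test)
print(search.best_params_, f"test R^2: {test_r2:.3f}")
```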
Week 12
4/8/2020
1. Guest Lecture: Data Science at Wells Fargo
2. Discussion on final project status
Quiz 2: Best practices in machine learning
Week 13
4/15/2020
Productionize a Machine Learning model
  • 13.1. Dev/Stage/Prod environment
  • 13.2. Docker, Dockerfiles, and Docker containers
  • 13.3. Deploy a machine learning model as a Flask app
  • 13.4. Introduction to Airflow
Assignment: Build and deploy a model using Docker and a Heroku app
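Topic 13.3 can be sketched as a small Flask app exposing a JSON prediction endpoint. The `/predict` route, the `features` field, and `predict_one` are hypothetical; `predict_one` is a stand-in for a trained model, which in practice would be loaded once at startup (for example with `joblib`) and called through its `predict` method.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_one(features):
    """Hypothetical stand-in model: returns the mean of the input features."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on malformed JSON
    payload = request.get_json(force=True, silent=True) or {}
    features = payload.get("features") if isinstance(payload, dict) else None
    if not features:
        return jsonify({"error": "'features' must be a non-empty list"}), 400
    return jsonify({"prediction": predict_one(features)})
```

Saved as `app.py`, this could be served in a container with a WSGI server such as `gunicorn app:app`; on Heroku the listening port is supplied through the `PORT` environment variable.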
Week 14
4/22/2020
Final Project Demo
Short, 5-minute individual project demo
