edwardt / msan501

This project forked from parrt/msan501-old

USF MSAN501 lecture notes and sample code

License: BSD 3-Clause "New" or "Revised" License

TeX 94.29% Python 5.71%

MSAN501 -- Computational analytics

Written by Terence Parr, prof. of computer science and analytics at the University of San Francisco, with ideas from the faculty.

This repository contains the exercises for the computational analytics (PDF) 5-week bootcamp in the MS in Analytics program at the University of San Francisco. It collects all of the labs students must complete by the end of the bootcamp in order to pass. The labs start out as very simple tasks or step-by-step recipes, then increase in difficulty, culminating in a text analysis project.

Table of contents

Part I -- Introduction

  • Audience and Summary
  • "Newbies say the darndest things"

Part II -- Python Programming and Data Structures

  • Computing Point Statistics
  • Approximating sqrt(n) with the Babylonian Method
  • Generating Uniform Random Numbers
  • Histograms Using matplotlib
  • Graph Adjacency Lists and Matrices

Part III -- A Taste of Distributed Computing

  • Launching a Virtual Machine at Amazon Web Services
  • Linux command line
  • Using the Hadoop Streaming Interface with Python

Part IV -- Empirical statistics

  • Generating Binomial Distributions
  • Generating Exponential Random Variables
  • The Central Limit Theorem in Action
  • Generating Normal Random Variables
  • Confidence Intervals for Price of Hostess Twinkies
  • Is Free Beer Good For Tips?

Part V -- Optimization and Prediction

  • Iterative Optimization Via Gradient Descent
  • Predicting Murder Rates With Gradient Descent

Part VI -- Text Analysis

  • Summarizing Reuters Articles with TFIDF

Summary

This course is specifically designed as an introduction to analytics programming for those who are not yet skilled programmers. The course also explores many concepts from math and statistics, but in an empirical fashion rather than symbolically as one would do in a math class. Consequently, this course is also useful to programmers who would like to strengthen their understanding of numerical methods.

The exercises are grouped into parts. We begin with simple programs that compute statistics, build simple data structures, and use libraries to create visualizations. We then move on to using the UNIX command line, launching virtual computers in the cloud, and writing simple Hadoop map-reduce programs. The empirical statistics part strives to give an intuitive feel for random variables, density functions, the central limit theorem, hypothesis testing, and confidence intervals. It's one thing to learn their formal definitions, but to get a solid grasp of these concepts, it really helps to observe statistics in action. All of the techniques we'll use in empirical statistics rely on the ability to generate random values from a particular distribution. We can do it all from a uniform random number generator, which is the first exercise in that part.
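As an illustration of building other distributions from a uniform generator, here is a minimal sketch that turns uniform values into exponential ones with the inverse-CDF method. This is not necessarily the approach the labs use; the function names and the rate parameter `lam` are assumptions chosen for this example, and Python's built-in `random` module stands in for the uniform generator written in the lab.

```python
import math
import random

def exponential_from_uniform(lam, u):
    """Map a uniform value u in [0, 1) to an Exponential(lam) value.

    Uses the inverse of the exponential CDF F(x) = 1 - exp(-lam * x).
    """
    return -math.log(1.0 - u) / lam

def sample_exponential(lam, n, seed=None):
    """Draw n exponential values using only a uniform generator."""
    rng = random.Random(seed)  # stand-in for a hand-written uniform generator
    return [exponential_from_uniform(lam, rng.random()) for _ in range(n)]

samples = sample_exponential(lam=2.0, n=100_000, seed=42)
mean = sum(samples) / len(samples)
# The theoretical mean of Exponential(lam) is 1 / lam.
print(f"sample mean = {mean:.3f}, theoretical mean = {1 / 2.0:.3f}")
```

Comparing the sample mean with 1 / lam is the kind of empirical check this part of the course encourages: the sample mean should get closer to the theoretical value as n grows.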

The optimization exercises deal with minimizing functions. Given a particular function, f(x), optimizing it generally means finding its minimum or maximum, which occurs where the function goes flat, that is, where the derivative is zero: f'(x) = 0. When the function's derivative cannot be derived symbolically, we're left with a general technique called gradient descent that searches for minima. It's like putting a marble on a hilly surface and letting gravity bring it to a nearby local minimum.
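The marble analogy can be sketched in a few lines. The example below is a one-dimensional illustration rather than the lab's solution: it estimates the derivative numerically with a central difference and repeatedly steps downhill. The names `derivative` and `gradient_descent`, the learning rate, the tolerance, and the test function are all assumptions made for this sketch.

```python
def derivative(f, x, h=1e-6):
    """Estimate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_descent(f, x0, rate=0.1, tolerance=1e-8, max_steps=10_000):
    """Step downhill from x0 until the steps become smaller than tolerance."""
    x = x0
    for _ in range(max_steps):
        step = rate * derivative(f, x)
        x -= step  # move against the slope
        if abs(step) < tolerance:
            break
    return x

def f(x):
    """A simple bowl-shaped function with its minimum at x = 3."""
    return (x - 3) ** 2 + 1

x_min = gradient_descent(f, x0=0.0)
print(f"minimum found near x = {x_min:.4f}")
```

On a function with several valleys, the result depends on the starting point `x0`, just as the marble's resting place depends on where it is dropped.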

Finally, we'll do an exercise that introduces text analysis. For each word in a document, we'll compute a score called TFIDF (term frequency, inverse document frequency) that indicates how well that word distinguishes the document from other documents in a corpus. That score is used broadly in text analytics, but our exercise uses it to summarize documents by listing the most important words.
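To make the score concrete, here is a minimal sketch of one common TFIDF variant: term frequency is a word's count divided by the document length, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the word. The exact weighting used in the lab may differ; the function names and the tiny corpus are made up for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: score} dict per document (each doc is a list of words)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs:
        doc_freq.update(set(words))  # count each word once per document
    scores = []
    for words in docs:
        counts = Counter(words)
        scores.append({
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return scores

def top_words(score, k=3):
    """Summarize a document by its k highest-scoring words."""
    return sorted(score, key=score.get, reverse=True)[:k]

corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell sharply".split(),
]
scores = tfidf(corpus)
for words, score in zip(corpus, scores):
    print(" ".join(words), "->", top_words(score))
```

A word such as "the" that appears in every document gets a score of zero under this variant, so it never shows up in a summary, while words unique to one document rise to the top.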
