Winter semester 22-23, University of Leipzig, 10-INF-DS201
Goals:
- Understand the definitions of standard data science terms and the associated mathematical concepts
- Understand the proofs of how commonly used techniques in data science work
- Implement the algorithms and examples with a computer program
- Investigate the math behind your favorite topic in data science
We first cover two introductory topics:
- Linear algebra
- Subspaces
- Orthogonality
- The pseudo-inverse
- The singular value decomposition
- Probability Theory
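As a small taste of the linear algebra portion, here is a minimal sketch of an orthogonal projection onto a line, using the standard formula proj_a(b) = (⟨a,b⟩/⟨a,a⟩) a. This example is illustrative only (and in Python, while the course notebooks use Julia); the vectors and function names are made up.

```python
# Illustrative sketch: orthogonal projection of b onto the line spanned by a,
# via proj_a(b) = (<a, b> / <a, a>) * a.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(b, a):
    """Return the orthogonal projection of b onto span{a}."""
    c = dot(a, b) / dot(a, a)
    return [c * x for x in a]

b = [3.0, 4.0]
a = [1.0, 0.0]
p = project(b, a)
residual = [x - y for x, y in zip(b, p)]
print(p)                 # [3.0, 0.0]
print(dot(residual, a))  # 0.0 -- the residual is orthogonal to a
```

The key property checked at the end, ⟨b − proj_a(b), a⟩ = 0, is exactly what makes the projection "orthogonal".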
We then proceed with the following four themes commonly seen in data science:
- Network analysis
- Graphs and the Laplace matrix
- The spectrum of a graph
- Markov processes in networks
- Centrality measures
- Machine learning
- Data, models, and learning
- Regression in statistical models
- Principal component analysis (method for dimension reduction)
- Support vector machines (binary classification method)
- Topological data analysis
- Simplicial complexes and homology
- Matrices and tensors
- Low rank matrices and tensors
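As a taste of the network-analysis theme, the following sketch computes the stationary distribution of a small Markov chain by repeatedly applying the transition matrix. This is an assumed example (in Python, while the course notebooks use Julia); the chain itself is invented for illustration.

```python
# Illustrative sketch: stationary distribution of a 3-state Markov chain,
# found by iterating pi <- pi P. P[i][j] = probability of moving from i to j.
P = [
    [0.5,  0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0,  0.5, 0.5],
]

def step(pi, P):
    """One step of the chain: returns the row vector pi * P."""
    n = len(P)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [1.0, 0.0, 0.0]  # start in state 0
for _ in range(1000):
    pi = step(pi, P)

print(pi)  # approximately [0.25, 0.5, 0.25], the stationary distribution
```

Because this chain is irreducible and aperiodic (every state has a self-loop), the iteration converges to the unique stationary distribution pi satisfying pi = pi P, which is the existence/uniqueness result covered in the Markov process lectures.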
- From 11 October 2022 through 1 February 2023
- Tuesdays 11:45-12:45 (Lecture)
- Wednesdays 15:15-16:45 (Seminar)
- SG 2-14
- Contact: samantha.fairchild(at)mis.mpg.de
- Office hours: Tuesdays and Wednesdays after class, and by email.
Grading scheme (subject to change pending class composition):
- Homework: assigned every other week, proofs and examples
- Project: Due 18.01 in class: Pick a data science topic and learn about the math behind it. Must include 1 proof and 1 example (~2 pages)
- Exam?: ??, written theory exam covering the entire course, mainly computations and examples
Date | Topics |
---|---|
11.10 | Orthogonal projections and the Pseudo-Inverse |
12.10 | No class (Immatrikulationsfeier) |
18.10 | (SM) (uniqueness of) singular value decompositions |
19.10 | * Probability theory introduction: Random variables and Bayes' Theorem |
25.10 | Expected Value and Variance, the normal distribution |
26.10 | Network Analysis: the Laplace matrix |
01.11 | Spectrum of a graph, and the relationship to structure of a graph |
02.11 | (SM) Eigenvectors of the Laplace matrix, Notebook 2 Example |
08.11 | (SM) Diameter of a graph, spanning trees, and definition of a Markov process |
09.11 | (SM) Transition matrices and stationary distributions |
15.11 | * Existence and uniqueness of stationary distributions, Metropolis--Hastings algorithm |
16.11 | No class (Buß- und Bettag) |
22.11 | (SM) Machine Learning: Data, models, and learning |
23.11 | (SM) Linear regression, least squares, MLE |
29.11 | Non-linear regression, MAP and Bayesian approach |
30.11 | Neural networks |
06.12 | Support vector machines (primal) |
07.12 | Dual SVMs and kernels |
13.12 | Principal component analysis (PCA) |
14.12 | PCA and SVD, PCA with Gaussian prior |
20.12 | Review, introduce project, finish lectures if behind |
21.12-03.01 | Winter Break |
04.01 | Topological data analysis: Simplices |
10.01 | Simplicial complexes, Čech and Vietoris-Rips complex |
11.01 | Comparing Čech and Vietoris-Rips complex, homology of planar complexes |
17.01 | Homology, Betti numbers, Euler characteristic |
18.01 | Persistent homology |
24.01 | Matrices of low rank |
25.01 | Tensors |
31.01 | Review |
01.02 | Last day of class, exam? |
This repository contains the Jupyter Notebooks from the class.
In order to use the notebooks:
- Download the notebooks (click the green `Code` button and download as a ZIP file, or use a Git client such as GitHub Desktop or Sublime).
- Download the newest version of Julia here.
- Start Julia.
- Enter the package manager by typing `]`, then run `add IJulia`.
- Leave the package manager with backspace.
- Run `using IJulia` and then `notebook()`.

A browser window should then open, in which the locally saved notebooks can be opened.
Other material from the Julia Academy:
The following materials are chosen to complement the course lecture notes:
- The Fundamental Theorem of Linear Algebra, Gilbert Strang.
- Basic Probability Theory, Robert B. Ash.
- Spectral Graph Theory (especially Chapter 1), Fan Chung.
- Spectral Graph Theory (especially Lecture 5), Thomas Sauerwald and He Sun.
- Graph Theory in the Information Age, Fan Chung.
- Computer Science Theory for the Information Age (especially Notes 5), Venkatesan Guruswami and Ravi Kannan.
- Mathematics for Machine Learning (especially Chapters 8-12), Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
- Neural Network Theory, Philipp Christian Petersen.
- Topological Data Analysis Spring 2020, Magnus Bakke Botnan.
- Topological Data Analysis, Ulderico Fugacci.
- Geometric Methods on Low-Rank Matrix and Tensor Manifolds, André Uschmajew and Bart Vandereycken.
- Tensor Decompositions and Applications, Tamara G. Kolda and Brett W. Bader.