Content for 2 weeks of class (10 days).

Day | Topic of the day | Things to cover |
---|---|---|
Mon | 1. Robust ML | Tree & Mult Models, Validation, ColumnTransformer, Pipelines |
Tue | 2. Data Cleaning | Missing values & Outliers (drop vars, impute vars) |
Wed | 3. Numerical encodings | MinMaxScaler, StandardScaler, Box-Cox, QuantileTransformer |
Thu | 4. Categorical encodings | Ordinal, Binary, OneHot, Mean Enc., CatBoost |
Fri | 5. Feature Selection & Dim Reduction | PCA, t-SNE, UMAP, VarianceThreshold |
Mon | 6. FE for NLP | BoW, TF-IDF, N-Grams |
Tue | 7. FE for Time Series | Lag features, tsfresh |
Wed | 8. FE for Geographic data | Lat, lon, population |
Thu | 9. FE for Several tables | Manual merge & join, featuretools |
Fri | 10. Kaggle challenge | |