This course is designed to build foundations for predictive modeling. Many available resources are either statistical/theoretical in nature or focused on data science programming. After working through such materials, some key questions remain:
- how is a model set up in real-life use cases?
- why does a particular setup such as y=f(x) have predictive power?
- how do we interpret predictive power metrics such as R-squared, partial R-squared, KS, AUC, recall/precision, etc.?
- how do we build and deploy a model?
In short, this course aims to close the gap between learning and practice.
Setting up VS Code (local dev) / Python / GitHub (code repo). Building the data set needed for modeling.
a. Definition: find signal for some 'future' outcome. The key word is future.
b. A time series example for a single entity, such as a stock ticker.
c. A multiple-time-series example (independent, identically distributed entities).
d. What does the usual y=f(x) setup entail?
e. t is everything: the difference among y_t+1 = f(x_t) vs y_t = f(x_t) vs y_t = f_t(x_t).
f. Causality vs statistical relationship: only certain statistical relationships will be considered 'predictive'.
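The distinction in (e) can be made concrete in code: for the setup to be predictive, the target must be shifted forward in time relative to the features. A minimal sketch with pandas, using an illustrative single-entity price series (the column names and values are made up for the example):

```python
import pandas as pd

# daily closing prices for one entity (e.g., a stock ticker); values illustrative
df = pd.DataFrame({"price": [100.0, 101.5, 99.8, 102.3, 103.1, 102.7]})

df["ret"] = df["price"].pct_change()   # feature x_t: today's return
df["y_next"] = df["ret"].shift(-1)     # target y_{t+1}: tomorrow's return

# y_t = f(x_t) on the same column at the same t would be description, not
# prediction; the shift(-1) is what makes this setup predictive.
df = df.dropna()
print(df)
```

Note that each row now pairs information known at time t with an outcome realized at t+1, which is exactly the "find signal for a future outcome" definition in (a).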
a. OLS
b. simple regression
c. logistic regression
d. simple tree
e. network
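As a reference point for (a), OLS has a closed-form least-squares solution that can be computed directly. A sketch on synthetic data (the true coefficients 2 and 3 are chosen for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)  # true intercept 2, slope 3

X = np.column_stack([np.ones(n), x])            # add an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # solves min ||X b - y||^2
print(beta)                                     # roughly [2, 3]
```

Seeing the fitted coefficients recover the known generating process is a useful sanity check before moving to the less transparent models in (c)-(e).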
a. design x, y, and splits (train, validation, test sets)
b. feature engineering
c. feature selection
d. model selection
e. scorecards and model objects
a. predictive power metrics
b. cross-time validation
c. bias-variance tradeoff
d. underfitting/overfitting
e. leakage
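Two of the metrics in (a), AUC and KS, can be computed from first principles rather than from a library, which makes their meaning concrete: AUC via the rank-sum identity, KS as the maximum gap between the score CDFs of positives and negatives. A sketch on a tiny made-up sample (assumes no tied scores):

```python
import numpy as np

def auc(y, score):
    """AUC via the rank-sum (Mann-Whitney) identity; assumes no tied scores."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos, n_neg = y.sum(), (1 - y).sum()
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ks(y, score):
    """KS: max gap between score CDFs of positives and negatives."""
    thresholds = np.sort(np.unique(score))
    cdf_pos = np.array([(score[y == 1] <= t).mean() for t in thresholds])
    cdf_neg = np.array([(score[y == 0] <= t).mean() for t in thresholds])
    return np.max(np.abs(cdf_pos - cdf_neg))

y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
score = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
print(auc(y, score), ks(y, score))  # 0.9375 0.75
```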
a. using scorecards with a database
b. using a model object with Python
c. an endpoint using SageMaker
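For (b), the core of model-object deployment is serializing a fitted object and scoring with the restored copy. A minimal sketch with pickle, where `LinearScorer` is a hypothetical hand-rolled model standing in for whatever the training step produced:

```python
import pickle

class LinearScorer:
    """Toy model object: score = intercept + sum(coef * feature)."""
    def __init__(self, intercept, coefs):
        self.intercept = intercept
        self.coefs = coefs

    def score(self, features):
        return self.intercept + sum(c * f for c, f in zip(self.coefs, features))

model = LinearScorer(intercept=0.5, coefs=[1.0, -2.0])

blob = pickle.dumps(model)         # in production: a file or an artifact store
restored = pickle.loads(blob)
print(restored.score([3.0, 1.0]))  # 0.5 + 3.0 - 2.0 = 1.5
```

The same pattern underlies (c): a hosted endpoint is essentially this load-and-score step wrapped behind an HTTP interface.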
a. skewed data in y
b. skewed data in x
c. boosting vs bootstrapping
d. drifting and time travel
e. incrementality, or controllable model impact
f. CI/CD
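For (a), one common remedy for a heavily right-skewed continuous y is to model its log and invert the transform at scoring time. A sketch with a synthetic lognormal target (the data and the perfect-prediction placeholder are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # heavily right-skewed target

z = np.log(y)            # train f(x) -> z instead of y
z_hat = z                # placeholder: pretend predictions were perfect
y_hat = np.exp(z_hat)    # invert the transform at scoring time

def skew(a):
    """Standardized third moment, as a quick skewness check."""
    a = (a - a.mean()) / a.std()
    return (a ** 3).mean()

print(round(skew(y), 2), round(skew(z), 2))  # skew collapses after the log
```

The model sees a roughly symmetric target, while scores are still delivered on the original scale.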