smith478 / modeltools



Python tools for model exploration

License: MIT License

Python 0.87% Jupyter Notebook 99.13%

modeltools's People


gaffney2010 gitter-badger

modeltools's Issues

Add Multivariate Linear Analysis

Add multivariate linear analysis so that we can see how correlations affect the linear model. This will give
some insight into the ensemble/non-linear models. We could also add interaction detection to the non-linear models
using Friedman's H statistic, then add those interactions to the linear model to see how close we get in performance,
since the linear model gives an intuitive interpretation.
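A minimal pure-Python sketch of the pairwise Friedman H statistic (squared form, H²), assuming `predict` is any callable that scores a single row given as a list (for a fitted scikit-learn model, something like `lambda row: model.predict([row])[0]`). The names `partial_dependence` and `friedman_h2` are hypothetical, not part of ModelTools. H² near 0 means the two features act additively; values near 1 mean the joint effect is mostly interaction. It costs O(n²) predictions, so a subsample of rows is advisable.

```python
from typing import Callable, List, Sequence, Tuple


def partial_dependence(predict: Callable[[List[float]], float],
                       X: Sequence[Sequence[float]],
                       cols: Tuple[int, ...],
                       values: Tuple[float, ...]) -> float:
    """Average prediction over X with columns `cols` forced to `values`."""
    total = 0.0
    for row in X:
        modified = list(row)
        for c, v in zip(cols, values):
            modified[c] = v
        total += predict(modified)
    return total / len(X)


def _centre(values: List[float]) -> List[float]:
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def friedman_h2(predict: Callable[[List[float]], float],
                X: Sequence[Sequence[float]], j: int, k: int) -> float:
    """Friedman's H^2 for the interaction between features j and k."""
    # Centred partial dependence functions, evaluated at every observed row.
    pd_jk = _centre([partial_dependence(predict, X, (j, k), (r[j], r[k])) for r in X])
    pd_j = _centre([partial_dependence(predict, X, (j,), (r[j],)) for r in X])
    pd_k = _centre([partial_dependence(predict, X, (k,), (r[k],)) for r in X])
    # Share of the joint effect that the two one-way effects cannot explain.
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0
```

Pairs with a large H² would be the candidates to add as interaction terms to the linear model.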

Packing/Unpacking Hyperparameters

We should make it so that the hyperparameters always get passed as dicts. Maybe build in some default values too.
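One way to do this, sketched under assumptions: a module-level dict of defaults that user-supplied dicts are merged over, with the result unpacked into the estimator with `**`. `DEFAULT_HYPERPARAMS`, its keys and values, `resolve_hyperparams`, and `build_model` are all hypothetical placeholders, not the repository's API.

```python
from typing import Any, Mapping, Optional

# Hypothetical defaults; real names and values depend on the estimator used.
DEFAULT_HYPERPARAMS = {
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
}


def resolve_hyperparams(user_params: Optional[Mapping[str, Any]] = None,
                        defaults: Mapping[str, Any] = DEFAULT_HYPERPARAMS) -> dict:
    """Return defaults overridden by user_params; unknown keys raise KeyError."""
    user_params = dict(user_params or {})
    unknown = set(user_params) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown hyperparameters: {sorted(unknown)}")
    # Later mappings win, so user values replace the defaults.
    return {**defaults, **user_params}


def build_model(estimator_cls, user_params: Optional[Mapping[str, Any]] = None):
    """Unpack the merged dict directly into the estimator constructor."""
    return estimator_cls(**resolve_hyperparams(user_params))
```

Rejecting unknown keys is a design choice that catches typos early; dropping that check would let callers pass estimator-specific extras.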

Fix Jupyter workbooks

I moved some workbooks to the top level because I couldn't figure out how to reference this library otherwise. We could consolidate the two that I have working; the others may need to be updated for the changes made.
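A possible alternative to moving notebooks, sketched under the assumption that `ModelTools.py` sits at the repository root (as the link in the last issue suggests): a notebook in a subfolder walks up to the directory containing that file and prepends it to `sys.path`, after which `import ModelTools` should work. The helper name `add_repo_root_to_path` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def add_repo_root_to_path(start: Optional[Path] = None,
                          marker: str = "ModelTools.py") -> Path:
    """Find the nearest ancestor of `start` containing `marker` and put it
    at the front of sys.path so the module can be imported."""
    current = (start or Path.cwd()).resolve()
    for directory in (current, *current.parents):
        if (directory / marker).is_file():
            if str(directory) not in sys.path:
                sys.path.insert(0, str(directory))
            return directory
    raise FileNotFoundError(f"{marker} not found in {current} or its parents")
```

In a notebook cell this would be `add_repo_root_to_path()` followed by `import ModelTools`.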

Fix AUC on Univariate Analysis

In the univariate analysis section I get an error when I use auc as my metric.
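The cause of this error isn't stated. Two common culprits are (a) passing hard class predictions instead of scores, and (b) computing AUC on a subset (for example one bin of a univariate plot) that contains only one class, for which scikit-learn's `roc_auc_score` raises a `ValueError`. A small dependency-free sketch, with the hypothetical name `roc_auc`, shows the rank-based (Mann-Whitney) definition and makes case (b) fail with an explicit message.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half). Assumes labels are 1 (positive) / 0."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        # A group with a single class has no positive/negative pairs to rank.
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

If single-class groups are expected, the univariate code could skip them or report `nan` for those groups instead of failing.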

Build Variable Importance Function

Per Dan:

Build in a variable importance function that uses:
built-in functions from scikit-learn
Shapley value based importance (run time would be 2^n models to fit, where n is the number of predictors/features in the model)
Perhaps we could use correlation to build a network so that, instead of testing all coalitions, we only test those with high correlation
The assumption would be that the contributions of independent variables are roughly additive (this seems fair)
We would still look at all possible subsets, but for uncorrelated variables we could just add up their contributions
If Shapley value importance is fit on training data and evaluated on holdout, then after calculating the Shapley values we could simply remove all variables with a negative Shapley value
This would be an alternative to forward/backward regression for variable selection
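The exact Shapley computation described above can be sketched as follows, assuming a user-supplied `value(S)` that fits a model on feature subset `S` (including the empty set, i.e. an intercept-only baseline) and returns its holdout score. Each subset is evaluated once and cached, so there are 2^n model fits. `shapley_values` is a hypothetical name; the correlation-network shortcut is not implemented here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(features: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each feature for the coalition game `value`."""
    n = len(features)
    cache: Dict[FrozenSet[str], float] = {}

    def v(subset: FrozenSet[str]) -> float:
        # One model fit per distinct subset: 2**n calls in total.
        if subset not in cache:
            cache[subset] = value(subset)
        return cache[subset]

    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S not containing f.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                total += weight * (v(s | {f}) - v(s))
        phi[f] = total
    return phi


def select_non_negative(phi: Dict[str, float]) -> list:
    """Keep features whose holdout Shapley contribution is not negative."""
    return [f for f, contribution in phi.items() if contribution >= 0]
```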

Figure out a way to evaluate variable importance when using dummy variables
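One simple option, sketched here as an assumption rather than the project's chosen method: compute importance per dummy column, then sum it back onto the original categorical variable. The sketch assumes columns are named `<variable>_<level>`, which matches the default `prefix_sep="_"` of `pandas.get_dummies`. Summing is exact for additive measures; for impurity- or permutation-based importance it is only an approximation, and permuting all of a variable's dummy columns together is a more principled alternative. `group_dummy_importance` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable


def group_dummy_importance(importances: Dict[str, float],
                           categorical: Iterable[str],
                           sep: str = "_") -> Dict[str, float]:
    """Sum column importances onto their parent categorical variable.
    Columns that match no categorical prefix are kept under their own name.
    Note: a non-dummy column that happens to start with '<variable><sep>'
    would be grouped too, so names should be checked for collisions."""
    # Try longer names first so 'color_code' is not captured by 'color'.
    parents = sorted(categorical, key=len, reverse=True)
    grouped: Dict[str, float] = defaultdict(float)
    for column, score in importances.items():
        parent = next((p for p in parents if column.startswith(p + sep)), column)
        grouped[parent] += score
    return dict(grouped)
```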

Audit seeding logic

My new Model class needs random seeds strung through it. As well, some of the existing logic could use a second look.
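A possible pattern for threading seeds, sketched with a hypothetical `SeededModel` standing in for the real Model class: one seed creates a private `random.Random` generator, every random step draws from it, and sub-components receive derived seeds (for example as the `random_state` argument that many scikit-learn estimators accept). Two objects built with the same seed then reproduce each other exactly.

```python
import random
from typing import List, Optional, Sequence, Tuple


class SeededModel:
    """Hypothetical stand-in for the Model class: one seed drives all randomness."""

    def __init__(self, seed: Optional[int] = None):
        self.seed = seed
        # Private generator: does not read or modify the global `random` state.
        self.rng = random.Random(seed)

    def child_seed(self) -> int:
        """Derive a reproducible seed for a sub-component."""
        return self.rng.randrange(2 ** 32)

    def train_test_split(self, rows: Sequence,
                         test_fraction: float = 0.2) -> Tuple[List, List]:
        """Shuffle with the model's own generator, then split off a test set."""
        shuffled = list(rows)
        self.rng.shuffle(shuffled)
        n_test = int(round(len(shuffled) * test_fraction))
        return shuffled[n_test:], shuffled[:n_test]
```

Because every call consumes the same generator, the order of random operations also matters; an audit would check that this order is deterministic.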

GLM, etc.

Per Dan:

Add multivariate GLM part to visualize predicted values, taking into account correlations

We should also look at using/testing polynomial (rather than just linear) regression for our continuous variables
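A minimal way to test polynomial rather than linear terms for one continuous variable, using NumPy's `polyfit`/`polyval`. The function name `compare_polynomial_fits` is hypothetical. Training RSS can only fall as the degree rises, so in practice the comparison should use holdout data or a complexity penalty.

```python
import numpy as np


def compare_polynomial_fits(x, y, max_degree: int = 3) -> dict:
    """Least-squares polynomial fits of degree 1..max_degree; returns the
    residual sum of squares for each degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rss = {}
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)          # highest power first
        residuals = y - np.polyval(coeffs, x)
        rss[degree] = float(np.sum(residuals ** 2))
    return rss
```

A sharp drop from degree 1 to degree 2 signals curvature that a purely linear term misses.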

Think about simplifying variables. Is there a way to automate this part? At the very least, run forward/backward regression with
a linear model to see whether anything strange or undesirable is happening.

Implement Bayesian Optimization for Parameter Tuning

In the model selection piece, use Bayesian optimization for hyperparameter tuning
Look at adding a double lift chart; this will also indicate model diversity for possible ensembling
Or, for classification, look at the confusion matrix and see whether two models do well on different segments
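To show the mechanics, here is a minimal one-dimensional Bayesian optimization sketch: a zero-mean Gaussian process with an RBF kernel fit to standardized scores, and expected improvement maximized over a grid. The kernel length scale, grid size, and all function names are assumptions for illustration. For tuning, `objective` would be something like a cross-validated score as a function of one hyperparameter; a maintained library (for example scikit-optimize's `gp_minimize`) is probably the better choice in the repository.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and std of a zero-mean, unit-variance GP at x_new."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` (maximization)."""
    z = (mean - best) / std
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * _norm_cdf(z) + std * pdf


def bayes_opt(objective, low, high, n_init=3, n_iter=15, seed=0):
    """Maximize a 1-D objective on [low, high]; the GP works on [0, 1]."""
    def to_x(u):
        return low + u * (high - low)

    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)                 # candidate points
    u_obs = rng.uniform(0.0, 1.0, n_init)
    y_obs = np.array([objective(to_x(u)) for u in u_obs])
    for _ in range(n_iter):
        scale = y_obs.std() or 1.0
        y_norm = (y_obs - y_obs.mean()) / scale        # standardize scores
        mean, std = gp_posterior(u_obs, y_norm, grid)
        u_next = grid[np.argmax(expected_improvement(mean, std, y_norm.max()))]
        u_obs = np.append(u_obs, u_next)
        y_obs = np.append(y_obs, objective(to_x(u_next)))
    best = int(np.argmax(y_obs))
    return float(to_x(u_obs[best])), float(y_obs[best])
```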

Implement a Particle Swarm Algorithm

If we rely on an existing library implementation, this is a lightweight metaheuristic to add. It will be a nice way to show off the flexibility of the Model class.
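The issue proposes wrapping an existing library; purely to show what such a wrapper would expose, here is a dependency-free global-best particle swarm minimizer. The function name `particle_swarm` and the coefficient defaults (inertia 0.7, cognitive and social weights 1.5) are illustrative assumptions, not tuned values from the project.

```python
import random
from typing import Callable, List, Sequence, Tuple


def particle_swarm(objective: Callable[[List[float]], float],
                   bounds: Sequence[Tuple[float, float]],
                   n_particles: int = 20, n_iter: int = 100,
                   inertia: float = 0.7, cognitive: float = 1.5,
                   social: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + social * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in box
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

For hyperparameter search, `objective` would be a validation loss evaluated at a candidate parameter vector.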

Use variable types to combine regression/classification

We can build variable-type functions into the new Model_data class. Then we can combine the many regression/classification functions by checking that type and picking the right one.
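A sketch of the type-based dispatch, assuming simple heuristics: two distinct values mean binary, strings or a few integer levels mean multiclass, and anything else is continuous. `infer_target_type` and `choose_by_type` are hypothetical names, and the `max_classes` cutoff is an arbitrary assumption; integer count targets may really call for regression.

```python
from typing import Any, Sequence


def infer_target_type(y: Sequence[Any], max_classes: int = 10) -> str:
    """Return 'binary', 'multiclass' or 'continuous' for a target column.
    NumPy integer scalars are not `int`, so convert arrays with .tolist() first."""
    values = set(y)
    if len(values) == 2:
        return "binary"
    if not all(isinstance(v, (int, float)) for v in values):
        return "multiclass"  # strings or other categorical labels
    if all(float(v).is_integer() for v in values) and len(values) <= max_classes:
        return "multiclass"
    return "continuous"


def choose_by_type(y: Sequence[Any], regression_fn, classification_fn):
    """Pick the regression or classification variant based on the target type."""
    return regression_fn if infer_target_type(y) == "continuous" else classification_fn
```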

df_sorted_final never set

It's possible that df_sorted has every value greater than p_threshold in the ForBack functions. In that case, the if statement at the end of the backwards regression never gets run and df_sorted_final never gets set (https://github.com/smith478/ModelTools/blob/master/ModelTools.py#L1108), and this causes an error when we try to use df_sorted_final later.
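The general fix is to define the result before any conditional branch so it exists on every path. Below is a sketch of that pattern for backward elimination, using plain lists instead of the real `df_sorted` DataFrame; `backward_eliminate` and the `fit_p_values` callback (which refits on the given features and returns their p-values) are hypothetical stand-ins, not the ForBack code.

```python
from typing import Callable, Dict, List, Sequence


def backward_eliminate(features: Sequence[str],
                       fit_p_values: Callable[[List[str]], Dict[str, float]],
                       p_threshold: float = 0.05) -> List[str]:
    """Repeatedly drop the least significant feature until every remaining
    p-value is <= p_threshold. The result is initialized before the loop,
    so it is always defined, even if every feature gets removed."""
    selected = list(features)
    while selected:
        p_values = fit_p_values(selected)   # refit on the current feature set
        worst = max(selected, key=lambda f: p_values[f])
        if p_values[worst] <= p_threshold:
            break
        selected.remove(worst)
    return selected
```

An empty result can then be handled explicitly downstream (for example by warning that no variable met the threshold) instead of raising an undefined-variable error.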
