smith478 / modeltools
Python tools for model exploration
License: MIT License
Add a multivariate linear analysis so that we can see how correlations affect the linear model; this will give
some insight into the ensemble/non-linear models. We could also add interaction detection for the non-linear models
using Friedman's H statistic, then add those interactions to the linear model to see how close we get in performance,
since the linear model gives an intuitive interpretation.
We should make it so that the hyperparameters always get passed as dicts. Maybe build in some default values too.
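One possible shape for this, as a minimal sketch: defaults live in a dict and caller overrides are merged on top. The name `resolve_hyperparameters` and the default values below are hypothetical and are not taken from ModelTools.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults; real values would depend on the estimator being wrapped.
DEFAULT_HYPERPARAMETERS: Dict[str, Any] = {
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
}


def resolve_hyperparameters(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of the defaults updated with caller-supplied overrides."""
    if overrides is not None and not isinstance(overrides, dict):
        raise TypeError("hyperparameters must be passed as a dict")
    params = dict(DEFAULT_HYPERPARAMETERS)  # copy so the defaults are never mutated
    params.update(overrides or {})
    return params
```

A caller would then write something like `resolve_hyperparameters({"max_depth": 5})` and unpack the result into the estimator's constructor.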
I moved some workbooks to the top level, because I couldn't figure out how to reference these libraries otherwise. We could consolidate the two that I have working. The others may need to be updated for the changes made.
In the univariate analysis section I get an error when I use auc as my metric.
Per Dan:
Build in a variable importance function that uses:
built-in functions from scikit-learn
Shapley-value-based importance (run time would be 2^n model fits, where n is the number of predictors/features in the model)
Perhaps we could use correlation to make a network so that, instead of testing all coalitions, we only test those with high correlation
The assumption would be that the contributions of independent variables would be roughly additive (this seems fair)
We would still look at all possible subsets, but for uncorrelated variables we could just add up their contributions
If Shapley value importance is fit on training and evaluated on holdout, then after we calculate Shapley values we could just remove all variables with a negative Shapley value
This would be an alternative to forward/backward regression for variable selection
Figure out a way to evaluate variable importance when using dummy variables
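The exact Shapley computation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: `score` is a hypothetical callable that would fit a model on a feature subset and return its holdout metric, and `shapley_importance` and `keep_positive` are illustrative names. It does not implement the correlation-network shortcut.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List, Sequence


def shapley_importance(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley value per feature; calls `score` once per coalition (2^n calls)."""
    n = len(features)
    # Score every subset once and cache it, since each is reused many times.
    cache = {}
    for size in range(n + 1):
        for subset in combinations(features, size):
            cache[frozenset(subset)] = score(frozenset(subset))

    values = {}
    for feature in features:
        others = [f for f in features if f != feature]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (cache[coalition | {feature}] - cache[coalition])
        values[feature] = total
    return values


def keep_positive(values: Dict[str, float]) -> List[str]:
    """Variable selection rule from the note: drop features with non-positive value."""
    return [name for name, value in values.items() if value > 0]


# Toy additive score (made-up numbers) standing in for "fit and evaluate on holdout".
toy_gains = {"age": 0.10, "income": 0.05, "noise": -0.02}
toy_values = shapley_importance(list(toy_gains), lambda s: 0.5 + sum(toy_gains[f] for f in s))
```

With a purely additive score like the toy one, each feature's Shapley value equals its own gain, which is the property the "add up their contributions" idea relies on for uncorrelated variables.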
My new Model class needs random seeds threaded through it. Some of the existing logic could also use a second look.
Per Dan:
Add multivariate GLM part to visualize predicted values, taking into account correlations
We should also look at using/testing polynomial (rather than just linear) regression for our continuous variables
Think about simplifications of variables. Is there a way to automate this part? At the very least, do forward/backward regression with
a linear model to see if strange/undesirable things are happening.
In the model selection piece, use Bayesian optimization to do hyperparameter tuning
Look at adding a double lift chart; this will also indicate model diversity for possible ensembling
Or for classification look at confusion matrix and see if two models are doing well on different segments
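The table behind a double lift chart could be computed roughly as below. This is a sketch under assumptions: records are sorted by the ratio of the two models' predictions, split into equal-sized buckets, and the mean actual and mean prediction of each model are reported per bucket. The function name `double_lift_table` is hypothetical, and the predictions of the second model are assumed to be non-zero.

```python
from statistics import mean
from typing import Dict, List, Sequence


def double_lift_table(
    actual: Sequence[float],
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    n_buckets: int = 5,
) -> List[Dict[str, float]]:
    """Bucket records by pred_a / pred_b and average actuals and predictions per bucket."""
    if not (len(actual) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")
    # Sort record indices by how strongly the two models disagree (assumes pred_b != 0).
    order = sorted(range(len(actual)), key=lambda i: pred_a[i] / pred_b[i])
    n = len(order)
    rows = []
    for b in range(n_buckets):
        idx = order[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not idx:
            continue  # more buckets than records
        rows.append({
            "bucket": b + 1,
            "actual": mean(actual[i] for i in idx),
            "model_a": mean(pred_a[i] for i in idx),
            "model_b": mean(pred_b[i] for i in idx),
        })
    return rows


# Toy data: model A matches the actuals, model B always predicts the overall mean.
toy_rows = double_lift_table([1, 2, 3, 4], [1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5], n_buckets=2)
```

Plotting the three columns against the bucket number gives the chart: the model whose line tracks the actuals more closely in the extreme buckets is the stronger one, and two lines that each win in different buckets suggest the models are diverse enough to ensemble.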
If we pass through an already implemented library, this is a light-weight meta-heuristic. It will be a nice way to show off the flexibility of the Model class.
We can build variable-type functions into the new Model_data class. Then we can combine the many regression/classification functions by looking at that type and picking the right thing.
It's possible that df_sorted has every value greater than p_threshold in the ForBack functions. In that case, the if statement at the end of the backwards regression never gets run and df_sorted_final never gets set (https://github.com/smith478/ModelTools/blob/master/ModelTools.py#L1108), and this causes an error when we try to use df_sorted_final later.
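The general pattern, a name bound only inside a conditional, can be guarded by assigning it before the branch. The sketch below is not the ModelTools code: it uses a plain list of `(feature, p_value)` tuples as a stand-in for the DataFrame, the function name `backward_step` is hypothetical, and falling back to the unfiltered input is an assumption (returning an empty result or raising a clear error may be the better behavior).

```python
from typing import List, Tuple


def backward_step(
    df_sorted: List[Tuple[str, float]],
    p_threshold: float,
) -> List[Tuple[str, float]]:
    """Keep rows at or below p_threshold, with a defined result when none qualify."""
    # Bind the result up front so it exists even if the branch below never runs.
    df_sorted_final = list(df_sorted)
    kept = [row for row in df_sorted if row[1] <= p_threshold]
    if kept:
        df_sorted_final = kept
    return df_sorted_final
```

With every p-value above the threshold, this returns the input unchanged instead of failing later on an unbound name.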