gurobi / gurobi-machinelearning
Formulate trained predictors in Gurobi models
Home Page: https://gurobi-machinelearning.readthedocs.io/
License: Apache License 2.0
It's now set in two places: pyproject.toml and __init__.py.
I don't really get yet how this should be done... I'll try something.
When we do classification with logistic regression, setting the output variables to be binary is part of the model. Currently gurobi_ml sets them, but we don't restore them if we remove the predictor.
I have a working version but to make it pretty we would need:
As shown by #167, there are things that are not tested for column transformers.
This is an oversight in the xgboost support.
It should be easy to do.
Every file should contain copyright and license statements
Currently we reshape every list argument to a one-dimensional object.
This is not needed, and a list of lists should work easily.
It looks trivial to do from the sklearn code.
The hardest part is writing the unit test...
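A minimal sketch of accepting a list of lists, assuming the fix is just `np.asarray` without the flattening step (the helper name `as_input_array` is hypothetical, not gurobi_ml API):

```python
import numpy as np

def as_input_array(x):
    """Accept a flat list (one sample) or a list of lists (several
    samples) without forcing everything to one dimension."""
    arr = np.asarray(x, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # single sample given as a flat list
    return arr

# A flat list becomes one row; a list of lists keeps its shape.
assert as_input_array([1, 2, 3]).shape == (1, 3)
assert as_input_array([[1, 2, 3], [4, 5, 6]]).shape == (2, 3)
```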
Most of the time is spent generating indicator constraints.
We should try to make that faster and avoid it when possible (bounds may already determine the branching direction, or even allow generating big-Ms directly).
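For context, the classic big-M alternative to an indicator constraint for a ReLU neuron y = max(0, x), given pre-activation bounds lb < 0 < ub and a binary z, is: y >= x, y >= 0, y <= x - lb(1 - z), y <= ub*z. A small illustrative checker (plain Python, not gurobi_ml code):

```python
def bigm_relu_constraints(x, y, z, lb, ub):
    """Check a point (x, y, z) against the big-M formulation of
    y = max(0, x) for pre-activation bounds lb < 0 < ub."""
    return (y >= x) and (y >= 0) and (y <= x - lb * (1 - z)) and (y <= ub * z)

# y = relu(x), with z indicating whether the neuron is active,
# always satisfies the constraints:
for x in (-1.5, 0.0, 0.7):
    y = max(0.0, x)
    z = 1 if x > 0 else 0
    assert bigm_relu_constraints(x, y, z, lb=-2.0, ub=1.0)
```

The tighter the bounds lb and ub, the smaller the big-M coefficients and the stronger the relaxation, which is why bound information can pay off here.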
In China, Baidu's open-sourced ML package PaddlePaddle (19.7k+ stars) is very widely used (China's No. 1 and among the world's top 3). Meanwhile, Gurobi also has many users in China. To maximize the synergy between Gurobi and Paddle, I want to build a Paddle-based predictor for gurobi-ml. The advantage of using Paddle is that it would improve the open-source ecosystem of gurobi-ml and attract more users. @pobonomo would you be willing to accept a PR?
We don't have one. The main issue is finding some small networks.
Like scikit-learn, we can use anaconda and staging; then, when publishing manually to PyPI, grab the wheel from staging and push it.
This is a list of things to fix in the documentation
We currently don't have one.
Need to find an appropriate small training set.
XGBoost is popular for gradient boosting trees. It should be relatively easy to support at least the basic regression model.
We have to straighten out the examples.
Currently there is:
I think I should do a slightly more complex example with parabolas and then use it as one basic example with all regression models.
Then we could keep Golden and Peak2D, but maybe drop them.
Janos is an issue, but it's also the only example we currently have that can use logistic regression, so removing it is a problem.
The price optimization example we should probably change to use different regression models; it is definitely one we will keep.
MNIST I think is fine. We should just have one for PyTorch as well. Ideally we could find some small pre-trained networks that we could use.
There are two general issues with the Janos examples:
It should really return the absolute value of the difference.
The documentation is also wrong about the output.
We need it for #8
The example in https://github.com/Gurobi/gurobi-machinelearning/blob/main/examples/adversarial/adversarial_keras.ipynb should now be OK for review.
If we add a column transformer and one of the transformations is not applied to any column, we get an error.
When we have nested models (e.g. a gradient boosting regression has 100 decision trees by default), we don't necessarily need all the information about the sub-models, since we never remove them individually (i.e. we don't want to remove the individual decision trees that compose a GBT).
All the recording can take a significant amount of time. One drawback is that we then won't be able to print any statistics about the sub-parts of the model.
In the docs, we should add a paragraph or two explaining where errors can come from and how to address them.
In the current notebooks, we actually have significant errors, so we should illustrate how they can be removed.
Currently we only do ordinary least squares (LinearRegression) but the code can also deal with Lasso and Ridge regressions (with l1 and l2 regularization respectively). It's just a matter of associating the correct object to the scikit-learn object.
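A sketch of the association idea, with stand-in names (gurobi_ml's actual registry and class names may differ): Lasso and Ridge expose the same fitted attributes (`coef_`, `intercept_`) as `LinearRegression`, so they can map to the same linear formulation object.

```python
# Hypothetical converter registry: all three linear regressors share
# one formulation class because their fitted models look identical.
CONVERTERS = {}

def register(predictor_name, converter):
    """Associate a predictor class name with its formulation class."""
    CONVERTERS[predictor_name] = converter

class LinearRegressionConstr:
    """Stand-in for gurobi_ml's linear-regression formulation class."""
    def __init__(self, coef, intercept):
        self.coef, self.intercept = coef, intercept

for name in ("LinearRegression", "Lasso", "Ridge"):
    register(name, LinearRegressionConstr)

# All three names resolve to the same formulation class.
assert CONVERTERS["Lasso"] is CONVERTERS["Ridge"] is LinearRegressionConstr
```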
Currently for logistic regression we model predict_proba; we need to model predict (i.e. classification) as well.
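For the binary case, `predict` is just a 0.5 threshold on `predict_proba` (equivalently, a sign test on the linear part), so the classification output could be modeled with one binary variable on top of the existing `predict_proba` formulation. Purely illustrative, with made-up helper names:

```python
import math

def predict_proba(x, coef, intercept):
    """Sigmoid of the linear part, as in binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-(coef * x + intercept)))

def predict(x, coef, intercept):
    """Class label: threshold the probability at 0.5."""
    return 1 if predict_proba(x, coef, intercept) >= 0.5 else 0

assert predict(10.0, 1.0, 0.0) == 1
assert predict(-10.0, 1.0, 0.0) == 0
```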
The documentation refers to a "member function" in a few places. While technically correct from an OO programming standpoint, this term is rooted deeply in the C++ world. Wouldn't a Pythonista just say "method" for the same thing?
Currently they use the neural-network base class.
Historically, because of issues with gurobipy.MVar, there were many tricks that I didn't want to repeat three times.
Now that gurobipy is fixed, it should actually be very easy to do them separately.
It would be cleaner for the output of what we added and for the naming of variables.
It would be interesting to have a way of providing bound values for neurons, similarly to what is done in this function (add_output_vars).
With correct bounds, it could for example discard many constraints or binary variables introduced to model the ReLU layers.
Computation time could improve, but not necessarily (I will try to run some experiments).
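A hypothetical helper (not gurobi_ml API) showing how bound information could simplify each ReLU neuron before any binary variable is created:

```python
def relu_formulation_kind(lb, ub):
    """Decide how a ReLU neuron with pre-activation bounds [lb, ub]
    must be modeled (illustrative sketch only)."""
    if lb >= 0:
        return "identity"   # y = x: no constraint machinery needed
    if ub <= 0:
        return "zero"       # y = 0: the neuron is always off
    return "indicator"      # genuinely nonlinear: needs big-M/indicator

# Only neurons whose bounds straddle zero need the expensive modeling.
assert relu_formulation_kind(0.5, 2.0) == "identity"
assert relu_formulation_kind(-3.0, -1.0) == "zero"
assert relu_formulation_kind(-1.0, 1.0) == "indicator"
```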
In the price optimization and Janos examples, the problem data is in pandas.
While we manage by converting to numpy, it would certainly be nicer if pandas input was accepted directly.
Also, using gurobipy-pandas, we could better handle the case where some of the input for the predictor in the optimization model is "fixed". We also have to do this in the two examples, and it could be nicer (see Gurobi/gurobipy-pandas#52).
The branch mlinexpr is attempting to start implementing this.
The optimization model should be similar to other gradient boosting models, but we need to figure out how to retrieve the regressor from LightGBM.
Originally posted to community forum
https://support.gurobi.com/hc/en-us/community/posts/15116202641425
The error is coming from these lines:
Not sure if it's a case of a missing line, wrong indentation, or something else.
I have introduced a small, subtle dependency on scikit-learn.
For the sklearn regressions we use the function check_is_fitted to check that the regression was fitted.
Another issue is that the function to register a predictor is also included in scikit-learn so that it can be used in pipelines.
It should be identical to sklearn's StandardScaler.
Needed for making #8 cute
When you currently run test_formulation::test_adversarial_activations() with sklearn != 1.0.2, you get the following warning:
UserWarning: Trying to unpickle estimator LabelBinarizer from version 1.0.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
All the other predictors are automatically rebuilt if the sklearn version differs from that of the predictors stored as .joblib files in tests/predictors. This rebuild method should be added to the class harboring test_adversarial_activations.
In the unit tests, we currently only check the validity of the models for one input type. We should check more of them. I had two bad bugs in the last week.
If we have a predictor p that we inserted in a gurobi model m, with input variables x and output variables y, a basic check when looking at the results is that the prediction obtained from x using p is indeed y.
In scikit-learn this means checking that p.predict(x.X) - y.X is small.
We should provide the function that computes this error.
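One possible shape for that helper, sketched with a plain function in place of a fitted scikit-learn predictor and numpy arrays in place of the `x.X` / `y.X` solution values:

```python
import numpy as np

def prediction_error(predict, x_values, y_values):
    """Maximal absolute difference between the trained predictor's
    output and the optimization model's output values.

    `predict` stands in for `p.predict`; `x_values` and `y_values`
    stand in for the solution values `x.X` and `y.X`."""
    diff = np.asarray(predict(x_values)) - np.asarray(y_values)
    return float(np.max(np.abs(diff)))

# Toy linear "predictor" standing in for a fitted scikit-learn model:
predict = lambda x: 2.0 * x + 1.0
assert prediction_error(predict, np.array([0.0, 1.0]), np.array([1.0, 3.0])) < 1e-9
```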
When the fixed formulation tests fail we execute this line:
It is convenient for seeing the actual error, but it only works with scikit-learn. It would be nice if it worked with every framework.
The example in https://github.com/Gurobi/gurobi-machinelearning/blob/main/examples/adversarial/adversarial_pytorch.ipynb should now be OK for review.
The only one I can think of in the literature is the one of OptiCL.
I would like to implement support for sklearn.cross_decomposition.PLSRegression. It's a linear model so it should be rather straightforward. @pobonomo would you be willing to accept a PR? Any preliminary thoughts on the implementation?
Could be prettier; in particular, for neural networks and pipelines it should have some sort of table.
Also check how many layers a neural network has (hidden layers + 1 or hidden layers + 2).
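For reference, scikit-learn's MLP counts the input and output layers: `n_layers_` equals hidden layers + 2, while the number of weight matrices (`coefs_`) is hidden layers + 1. A tiny sketch of the two conventions:

```python
def layer_counts(n_hidden):
    """Two common layer-counting conventions for a fully connected net:
    sklearn's n_layers_ includes input and output layers; the number of
    weight matrices (sklearn's coefs_) is one less."""
    return {
        "n_layers_sklearn": n_hidden + 2,
        "n_weight_matrices": n_hidden + 1,
    }

# e.g. hidden_layer_sizes=(20, 20, 20) gives n_layers_ == 5, len(coefs_) == 4
assert layer_counts(3) == {"n_layers_sklearn": 5, "n_weight_matrices": 4}
```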
To reproduce, run the Janos/XGBoost notebook with verbose=True in the call to add_predictor_constr; you get the error:
estimators.append(
> TreeEstimator(
self.gp_model,
tree,
self.input,
tree_vars[:, i, :],
self.epsilon,
timer,
self.verbose,
**kwargs,
)
)
E TypeError: __init__() got multiple values for argument 'verbose'
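The traceback pattern suggests that `verbose` reaches `TreeEstimator.__init__` twice: once as the positional `self.verbose` argument and once again inside `**kwargs`. A minimal reproduction with stand-in names (not the actual gurobi_ml signature):

```python
class TreeEstimator:
    """Stand-in class: verbose is a named parameter of __init__."""
    def __init__(self, gp_model, tree, verbose=False, **kwargs):
        self.verbose = verbose

kwargs = {"verbose": True}  # verbose leaked into kwargs upstream
try:
    # verbose is passed positionally AND again via **kwargs:
    TreeEstimator("model", "tree", True, **kwargs)
    raised = False
except TypeError as err:
    raised = "multiple values" in str(err)

assert raised  # TypeError: got multiple values for argument 'verbose'
```

The likely fix is to pop `verbose` out of `kwargs` (or stop passing it positionally) before constructing the sub-estimators.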
Matthias says no "-" or "_" in package names is preferable.