gurobi / gurobi-machinelearning
Formulate trained predictors in Gurobi models
Home Page: https://gurobi-machinelearning.readthedocs.io/
License: Apache License 2.0
It's now set in two places: pyproject.toml and __init__.py.
I don't really get yet how this should be done... I'll try something.
When we do classification with logistic regression, setting the output variables to be binary is part of the model. Currently gurobi_ml sets them, but we don't restore them if we remove the predictor.
I have a working version but to make it pretty we would need:
As shown by #167, there are things that are not tested for column transformers.
This is an oversight in the xgboost support.
It should be easy to do.
Every file should contain copyright and license statements
Currently we reshape every list argument to a one-dimensional object.
This is not needed, and a list of lists should work easily.
It looks trivial to do from the sklearn code.
The hardest part is writing the unit test...
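A minimal sketch of accepting a list of lists, assuming the fix is just `np.asarray` without the flattening step (the helper name `as_input_array` is hypothetical, not gurobi_ml API):

```python
import numpy as np

def as_input_array(x):
    """Accept a flat list (one sample) or a list of lists (several
    samples) without forcing everything to one dimension."""
    arr = np.asarray(x, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # single sample given as a flat list
    return arr

# A flat list becomes one row; a list of lists keeps its shape.
assert as_input_array([1, 2, 3]).shape == (1, 3)
assert as_input_array([[1, 2, 3], [4, 5, 6]]).shape == (2, 3)
```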
Most of the time is spent generating indicator constraints.
We should try to make that faster and avoid it when possible (bounds may already determine the branching direction, or even allow generating big-Ms directly).
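For context, the classic big-M alternative to an indicator constraint for a ReLU neuron y = max(0, x), given pre-activation bounds lb < 0 < ub and a binary z, is: y >= x, y >= 0, y <= x - lb(1 - z), y <= ub*z. A small illustrative checker (plain Python, not gurobi_ml code):

```python
def bigm_relu_constraints(x, y, z, lb, ub):
    """Check a point (x, y, z) against the big-M formulation of
    y = max(0, x) for pre-activation bounds lb < 0 < ub."""
    return (y >= x) and (y >= 0) and (y <= x - lb * (1 - z)) and (y <= ub * z)

# y = relu(x), with z indicating whether the neuron is active,
# always satisfies the constraints:
for x in (-1.5, 0.0, 0.7):
    y = max(0.0, x)
    z = 1 if x > 0 else 0
    assert bigm_relu_constraints(x, y, z, lb=-2.0, ub=1.0)
```

The tighter the bounds lb and ub, the smaller the big-M coefficients and the stronger the relaxation, which is why bound information can pay off here.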
In China, Baidu's open-sourced ML package PaddlePaddle (19.7k+ stars) is very widely used (China's No. 1 and among the world's top 3). Meanwhile, Gurobi also has many users in China. To maximize the synergy between Gurobi and Paddle, I want to build a Paddle-based predictor for gurobi-ml. The advantage of using Paddle is that it would improve the open-source ecosystem of gurobi-ml and attract more users. @pobonomo would you be willing to accept a PR?
We don't have one. The main issue is finding some small networks.
Like scikit-learn, we can use anaconda and staging; then, when publishing manually to PyPI, grab the wheel from staging and push it.
This is a list of things to fix in the documentation
We currently don't have one.
Need to find an appropriate small training set.
XGBoost is popular for gradient boosting trees. It should be relatively easy to support at least the basic regression model.
We have to straighten out the examples.
Currently there is:
I think I should do a slightly more complex example with parabolas and then use it as one basic example with all regression models.
Then we could keep Golden and Peak2D, but maybe drop them.
Janos is an issue, but it's also the only example we currently have that can use logistic regression, so removing it is a problem.
The price optimization example we should probably change to use different regression models; it is definitely one we will keep.
MNIST I think is fine. We should just have one for PyTorch as well. Ideally we could find some small pre-trained networks that we could use.
There are two general issues with the Janos examples:
It should really return the absolute value of the difference.
The documentation is also wrong about the output.
We need it for #8
The example in https://github.com/Gurobi/gurobi-machinelearning/blob/main/examples/adversarial/adversarial_keras.ipynb should now be OK for review.
If we add a column transformer and one of the transformations is not applied to any column, we get an error.
When we have nested models (e.g. a gradient boosting regression has 100 decision trees by default), we don't necessarily need all the information about the sub-models, since we never remove them individually (i.e. we don't want to remove the individual decision trees that compose a GBT).
All the recording can take a significant amount of time. One drawback is that we then won't be able to print any statistics about the sub-parts of the model.
In the docs, we should add a paragraph or two explaining where errors can come from and how to address them.
In the current notebooks, we actually have significant errors, so we should illustrate how they can be removed.
Currently we only do ordinary least squares (LinearRegression) but the code can also deal with Lasso and Ridge regressions (with l1 and l2 regularization respectively). It's just a matter of associating the correct object to the scikit-learn object.
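A sketch of the association idea, with stand-in names (gurobi_ml's actual registry and class names may differ): Lasso and Ridge expose the same fitted attributes (`coef_`, `intercept_`) as `LinearRegression`, so they can map to the same linear formulation object.

```python
# Hypothetical converter registry: all three linear regressors share
# one formulation class because their fitted models look identical.
CONVERTERS = {}

def register(predictor_name, converter):
    """Associate a predictor class name with its formulation class."""
    CONVERTERS[predictor_name] = converter

class LinearRegressionConstr:
    """Stand-in for gurobi_ml's linear-regression formulation class."""
    def __init__(self, coef, intercept):
        self.coef, self.intercept = coef, intercept

for name in ("LinearRegression", "Lasso", "Ridge"):
    register(name, LinearRegressionConstr)

# All three names resolve to the same formulation class.
assert CONVERTERS["Lasso"] is CONVERTERS["Ridge"] is LinearRegressionConstr
```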
Currently for logistic regression we model predict_proba; we need to model predict (i.e. classification) as well.
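For the binary case, `predict` is just a 0.5 threshold on `predict_proba` (equivalently, a sign test on the linear part), so the classification output could be modeled with one binary variable on top of the existing `predict_proba` formulation. Purely illustrative, with made-up helper names:

```python
import math

def predict_proba(x, coef, intercept):
    """Sigmoid of the linear part, as in binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-(coef * x + intercept)))

def predict(x, coef, intercept):
    """Class label: threshold the probability at 0.5."""
    return 1 if predict_proba(x, coef, intercept) >= 0.5 else 0

assert predict(10.0, 1.0, 0.0) == 1
assert predict(-10.0, 1.0, 0.0) == 0
```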
The documentation refers to a "member function" in a few places. While technically correct from an OO programming standpoint, this term is rooted deeply in the C++ world. Wouldn't a Pythonista just say "method" for the same thing?
Currently they use the neural-network base class.
Historically, because of issues with gurobipy.MVar, there were many tricks that I didn't want to repeat three times.
Now that gurobipy is fixed, it should actually be very easy to do them separately.
It would be cleaner for the output of what we added and for the naming of variables.
It would be interesting to have a way of providing bound values for neurons, similarly to what is done in this function (add_output_vars).
With correct bounds, it could for example discard many constraints or binary variables introduced to model the ReLU layers.
Computation time could improve, but not necessarily (I will try to run some experiments).
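A hypothetical helper (not gurobi_ml API) showing how bound information could simplify each ReLU neuron before any binary variable is created:

```python
def relu_formulation_kind(lb, ub):
    """Decide how a ReLU neuron with pre-activation bounds [lb, ub]
    must be modeled (illustrative sketch only)."""
    if lb >= 0:
        return "identity"   # y = x: no constraint machinery needed
    if ub <= 0:
        return "zero"       # y = 0: the neuron is always off
    return "indicator"      # genuinely nonlinear: needs big-M/indicator

# Only neurons whose bounds straddle zero need the expensive modeling.
assert relu_formulation_kind(0.5, 2.0) == "identity"
assert relu_formulation_kind(-3.0, -1.0) == "zero"
assert relu_formulation_kind(-1.0, 1.0) == "indicator"
```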
In the price optimization and Janos examples, the problem data is in pandas.
While we manage by converting to numpy, it would certainly be nicer if pandas input was accepted directly.
Also, using gurobipy-pandas, we could better handle the case where some of the input for the predictor in the optimization model is "fixed". We also have to do this in the two examples, and it could be nicer (see Gurobi/gurobipy-pandas#52).
The branch mlinexpr is attempting to start implementing this.
The optimization model should be similar to other gradient boosting models, but we need to figure out how to retrieve the regressor from LightGBM.
Originally posted to community forum
https://support.gurobi.com/hc/en-us/community/posts/15116202641425
The error is coming from these lines:
Not sure if it's a case of a missing line, wrong indentation, or something else.
I have introduced a small, subtle dependency on scikit-learn.
For the sklearn regressions we use the function check_is_fitted to check that the regression was fitted.
Another issue is that the function to register a predictor is also included in scikit-learn so that it can be used in pipelines.
It should be identical to sklearn's StandardScaler.
Needed for making #8 cute
When you currently run test_formulation::test_adversarial_activations() with sklearn != 1.0.2, you get the following warning:
UserWarning: Trying to unpickle estimator LabelBinarizer from version 1.0.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
All the other predictors are automatically rebuilt if the sklearn version differs from that of the predictors stored as .joblib files in tests/predictors. This rebuild method should be added to the class harboring test_adversarial_activations.
In the unit tests, we currently only check the validity of the models for one input type. We should check more of them. I had two bad bugs in the last week.
If we have a predictor p that we inserted in a gurobi model m, with input variables x and output variables y, a basic check when looking at the results is that the prediction obtained from x using p is indeed y.
In scikit-learn this means checking that p.predict(x.X) - y.X is small.
We should provide the function that computes this error.
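One possible shape for that helper, sketched with a plain function in place of a fitted scikit-learn predictor and numpy arrays in place of the `x.X` / `y.X` solution values:

```python
import numpy as np

def prediction_error(predict, x_values, y_values):
    """Maximal absolute difference between the trained predictor's
    output and the optimization model's output values.

    `predict` stands in for `p.predict`; `x_values` and `y_values`
    stand in for the solution values `x.X` and `y.X`."""
    diff = np.asarray(predict(x_values)) - np.asarray(y_values)
    return float(np.max(np.abs(diff)))

# Toy linear "predictor" standing in for a fitted scikit-learn model:
predict = lambda x: 2.0 * x + 1.0
assert prediction_error(predict, np.array([0.0, 1.0]), np.array([1.0, 3.0])) < 1e-9
```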
When the fixed formulation tests fail we execute this line:
It is convenient for seeing the actual error, but it only works with scikit-learn. It would be nice if it worked with every framework.
The example in https://github.com/Gurobi/gurobi-machinelearning/blob/main/examples/adversarial/adversarial_pytorch.ipynb should now be OK for review.
The only one I can think of in the literature is the one of OptiCL.
I would like to implement support for sklearn.cross_decomposition.PLSRegression. It's a linear model so it should be rather straightforward. @pobonomo would you be willing to accept a PR? Any preliminary thoughts on the implementation?
Could be prettier; in particular, for neural networks and pipelines it should have some sort of table.
Also check how many layers a neural network has (hidden layers + 1 or hidden layers + 2).
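For reference, scikit-learn's MLP counts the input and output layers: `n_layers_` equals hidden layers + 2, while the number of weight matrices (`coefs_`) is hidden layers + 1. A tiny sketch of the two conventions:

```python
def layer_counts(n_hidden):
    """Two common layer-counting conventions for a fully connected net:
    sklearn's n_layers_ includes input and output layers; the number of
    weight matrices (sklearn's coefs_) is one less."""
    return {
        "n_layers_sklearn": n_hidden + 2,
        "n_weight_matrices": n_hidden + 1,
    }

# e.g. hidden_layer_sizes=(20, 20, 20) gives n_layers_ == 5, len(coefs_) == 4
assert layer_counts(3) == {"n_layers_sklearn": 5, "n_weight_matrices": 4}
```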
To reproduce, run the Janos/XGBoost notebook with verbose=True in the call to add_predictor_constr; you get the error:
estimators.append(
> TreeEstimator(
self.gp_model,
tree,
self.input,
tree_vars[:, i, :],
self.epsilon,
timer,
self.verbose,
**kwargs,
)
)
E TypeError: __init__() got multiple values for argument 'verbose'
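The traceback pattern suggests that `verbose` reaches `TreeEstimator.__init__` twice: once as the positional `self.verbose` argument and once again inside `**kwargs`. A minimal reproduction with stand-in names (not the actual gurobi_ml signature):

```python
class TreeEstimator:
    """Stand-in class: verbose is a named parameter of __init__."""
    def __init__(self, gp_model, tree, verbose=False, **kwargs):
        self.verbose = verbose

kwargs = {"verbose": True}  # verbose leaked into kwargs upstream
try:
    # verbose is passed positionally AND again via **kwargs:
    TreeEstimator("model", "tree", True, **kwargs)
    raised = False
except TypeError as err:
    raised = "multiple values" in str(err)

assert raised  # TypeError: got multiple values for argument 'verbose'
```

The likely fix is to pop `verbose` out of `kwargs` (or stop passing it positionally) before constructing the sub-estimators.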
Matthias says no "-" or "_" in package names is preferable.