
gurobi / gurobi-machinelearning

Formulate trained predictors in Gurobi models

Home Page: https://gurobi-machinelearning.readthedocs.io/

License: Apache License 2.0

Python 65.46% Makefile 0.08% Jupyter Notebook 34.45%
gurobi machine-learning mathematical-optimization python

gurobi-machinelearning's Issues

Fix version of package

It's now set in two places: pyproject.toml and __init__.py.

I don't really get yet how this should be done... I'll try something.

Type of variables for classification

When we do classification with logistic regression, setting the output variables to be binary is part of the model. Currently gurobi_ml sets them, but we don't restore the original variable types if we remove the predictor.
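A minimal sketch of one way to fix this, using a stand-in Var class (not gurobipy's): record the original variable types before overwriting them, so that removing the predictor can restore them.

```python
class Var:
    """Stand-in for gurobipy.Var, holding only a variable type."""
    def __init__(self, vtype="C"):
        self.vtype = vtype

def make_binary(variables):
    """Set variables to binary, returning the state needed to undo it."""
    saved = [(v, v.vtype) for v in variables]
    for v in variables:
        v.vtype = "B"
    return saved

def restore_types(saved):
    """Put back the variable types recorded by make_binary."""
    for v, old_type in saved:
        v.vtype = old_type
```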

Make the decision tree work with more than one output

It looks trivial to do from the sklearn code.

The hardest part is writing the unit test...

  • Add the code
  • Make a unittest
  • Update documentation (remove limitations 👍 )
  • Add a note somewhere that this is tested but we don't have a use case so one has to be very careful...

Speed up model generation for decision trees

Most of the time is spent generating indicator constraints.
We should try to make that faster and avoid indicator constraints when possible (e.g. when bounds already determine the branching direction), or even generate big-M constraints directly.
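A sketch of the bound reasoning, with made-up names (not the library's API): for a tree split guarded by a binary b, the bounds on the split variable can make the indicator constraint redundant, let us fix b outright, or give a finite big-M.

```python
def split_constraint_plan(lb, ub, threshold):
    """How to model "b = 1 -> x <= threshold" given bounds lb <= x <= ub."""
    if ub <= threshold:
        return "redundant", 0.0   # bounds already force the left branch
    if lb > threshold:
        return "fix-binary", 0.0  # left branch impossible: fix b = 0, no constraint
    # A finite big-M exists: x <= threshold + (ub - threshold) * (1 - b)
    return "big-M", ub - threshold
```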

[feature request] support paddle

In China, Baidu's open-sourced ML package PaddlePaddle (19.7k+ stars) is very widely used (China's No. 1 and among the world's top 3). Meanwhile, Gurobi also has many users in China. To maximize the synergy between Gurobi and Paddle, I want to build a Paddle-based predictor for gurobi-ml. The advantage of supporting Paddle is that it would improve the open-source ecosystem of gurobi-ml and attract more users. @pobonomo would you be willing to accept a PR?

Formatting issues in the docs

This is a list of things to fix in the documentation

  • Links in the notebooks
  • Find out how the API should really be documented (modules?)
  • Some legal stuff somewhere?
  • Links to how to get a Gurobi license
  • Format links to gurobipy things and other python packages in the same way
  • Output of gurobi in the notebooks (there's an ugly trick now...)
  • Explicitly mention supported versions of sklearn/tensorflow/pytorch

Finalize Examples

We have to straighten out the examples.
Currently there is:

  • Some examples where we approximate functions and minimize them:
    • parabolas
    • GoldenStein
    • Peak2D
  • The Janos example
  • The price optimization example
  • Adversarial MNIST

I think I should do a slightly more complex example with the parabolas and then use it as one basic example covering all the regression models.
Then we could keep GoldenStein and Peak2D, but maybe drop them.
Janos is an issue, but it's also the only example we currently have that uses logistic regression, so removing it is a problem.
Price optimization we should probably change to use different regression models; it is definitely one we will keep.
MNIST I think is fine. We should just have one for PyTorch as well. Ideally we could find some small pre-trained networks that we could use.

Fixes to get_error

It should really return the absolute value of the difference.

Also, the documentation is wrong about the output.
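A minimal sketch of the fix, assuming get_error compares the predictor's output with the solution values of the output variables:

```python
import numpy as np

def get_error(predicted, y_solution):
    """Return the element-wise absolute difference, not the signed one."""
    return np.abs(np.asarray(predicted) - np.asarray(y_solution))
```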

Make model generation faster for nested formulations

When we have nested models (e.g. a gradient boosting regressor has 100 decision trees by default), we don't necessarily need all the information about the sub-models, since we never need to remove any of them individually (i.e. we don't want to remove individually the decision trees that compose a GBT).

All the recording can take a significant amount of time. One downside of skipping it is that we won't be able to print statistics about the sub-parts of the model.
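One possible shape for this, sketched with made-up names: a record flag that is switched off when formulating sub-models, trading per-sub-model statistics for speed.

```python
class SubmodelTracker:
    """Sketch: optional bookkeeping for sub-model formulations."""
    def __init__(self, record=True):
        self.record = record
        self.added = []  # variables/constraints added, per sub-model

    def track(self, item):
        # Skipped for inner trees of a GBT: they are never removed alone,
        # so the bookkeeping (and its cost) is unnecessary.
        if self.record:
            self.added.append(item)
```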

We can deal with all linear regression models of scikit-learn

Currently we only do ordinary least squares (LinearRegression) but the code can also deal with Lasso and Ridge regressions (with l1 and l2 regularization respectively). It's just a matter of associating the correct object with each scikit-learn class.
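Since Lasso and Ridge expose the same coef_/intercept_ attributes as LinearRegression, registering them could look like this sketch (function and dict names are illustrative, not the actual API):

```python
# add_linear_regression_constr stands in for the existing OLS formulation.
def add_linear_regression_constr(gp_model, regressor, input_vars, output_vars):
    ...  # y = regressor.coef_ @ x + regressor.intercept_, same for all three

CONVERTERS = {
    "LinearRegression": add_linear_regression_constr,
    "Lasso": add_linear_regression_constr,  # l1 penalty only affects fitting
    "Ridge": add_linear_regression_constr,  # l2 penalty only affects fitting
}

def get_converter(regressor):
    """Look up the formulation for a fitted scikit-learn regressor."""
    return CONVERTERS[type(regressor).__name__]
```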

Logistic Regression Output

Currently for logistic regression we model predict_proba; we need to model predict (i.e. classification) as well.

  • Add option to the LogisticRegressionConstr
  • Integrate in test (add test for classification)
  • Document it
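A sketch of what such an option could decide, with illustrative names (not the actual LogisticRegressionConstr API):

```python
def output_vartype(output_type):
    """(vtype, lb, ub) for the logistic regression output variables (sketch)."""
    if output_type == "probability":     # model predict_proba: value in [0, 1]
        return "C", 0.0, 1.0
    if output_type == "classification":  # model predict: a binary class label
        return "B", 0.0, 1.0
    raise ValueError(f"unknown output_type: {output_type!r}")
```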

Avoid the term "member function", just use "method"

The documentation refers to a "member function" in a few places. While technically correct in an OO programming model, this term is deeply rooted in the C++ world. Wouldn't a Pythonista just say "method" for the same thing?

Cleanup sklearn's Linear and Logistic Regression

Currently they use the neural network base class.

Historically, because of gurobipy.MVar issues, there were many tricks that I didn't want to repeat three times.
Now that gurobipy is fixed, it should actually be very easy to handle them separately.

It would be cleaner for reporting what we added and for the naming of variables.

Support pandas input, handle better fixed features

In the price optimization and Janos examples the problem data is in pandas.
While we manage by converting to numpy, it would certainly be nicer if pandas input was accepted directly.

Also, using gurobipy-pandas, we could better handle the case where some of the predictor's input in the optimization model is "fixed". We currently have to handle this in the two examples, and it could be nicer (see Gurobi/gurobipy-pandas#52).

The branch mlinexpr is attempting to start implementing this.

Support for LightGBM

The optimization model should be similar to other gradient boosting models, but we need to figure out how to retrieve the regressor from LightGBM.

Remove dependency on scikit-learn

I have introduced a small, subtle dependency on scikit-learn.

For the sklearn regressions we use the function check_is_fitted to check that the regression was fitted.

Another issue is that the function to register predictors is also included in scikit-learn so that it can be used in pipelines.
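A minimal local replacement for sklearn's check_is_fitted could look like this sketch, relying on the scikit-learn convention that fitted attributes end with a trailing underscore (e.g. coef_, intercept_):

```python
def check_is_fitted(estimator):
    """Raise if the estimator has no fitted attributes (sketch, not sklearn's)."""
    fitted = [
        a for a in vars(estimator)
        if a.endswith("_") and not a.startswith("__")
    ]
    if not fitted:
        raise ValueError(f"{type(estimator).__name__} instance is not fitted yet.")
```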

Create build_predictors() for test_adversarial_activations

When you currently run test_formulation::test_adversarial_activations() with sklearn != 1.0.2 you get the following warning:

UserWarning: Trying to unpickle estimator LabelBinarizer from version 1.0.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations

All the other predictors are automatically rebuilt if the sklearn version differs from the one used for the predictors stored as .joblib files in tests/predictors. This rebuild method should be added to the class harboring test_adversarial_activations.
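The rebuild decision used by the other tests could be factored as in this sketch (names are illustrative; the actual test helpers may differ):

```python
import os

def needs_rebuild(stored_version, installed_version, predictor_path):
    """Rebuild if the .joblib file is missing or was pickled by another sklearn."""
    return (
        not os.path.exists(predictor_path)
        or stored_version != installed_version
    )
```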

More tests on input

In the unit tests, we currently only check the validity of the models for one input type. We should check more of them; I had two bad bugs in the last week.

Have an error function to inspect results of optimization

If we have a predictor p that we inserted in a Gurobi model m with input variables x and output variables y, a basic check when looking at the results is that the prediction obtained from x using p is indeed y.
In scikit-learn this means checking that:

p.predict(x.X) - y.X

is small.
We should provide the function that computes this error.
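A sketch of such a function, with a stand-in predictor class (any object exposing a predict method would do):

```python
import numpy as np

class SumPredictor:
    """Stand-in for a fitted predictor: predicts the row sums of the input."""
    def predict(self, x):
        return np.sum(x, axis=1)

def solution_error(predictor, x_values, y_values):
    """Maximum absolute deviation between predictor(x) and the y solution."""
    return float(np.max(np.abs(predictor.predict(x_values) - y_values)))
```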

Support for PLSRegression from scikit-learn

I would like to implement support for sklearn.cross_decomposition.PLSRegression. It's a linear model so it should be rather straightforward. @pobonomo would you be willing to accept a PR? Any preliminary thoughts on the implementation?

Output of print_stats

The output could be prettier; in particular, neural networks and pipelines should have some sort of table.

Also check how many layers a neural network has (hidden layers + 1 or hidden layers + 2?).
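For reference, the layer count can be read off coefs_; a sketch of the arithmetic (sklearn's own n_layers_ counts the input and output layers, i.e. hidden layers + 2):

```python
def layer_counts(coefs):
    """Layer counts of an MLP from its list of weight matrices (coefs_)."""
    n_transitions = len(coefs)        # one weight matrix per layer transition
    return {
        "hidden": n_transitions - 1,  # hidden layers
        "total": n_transitions + 1,   # input + hidden + output (n_layers_)
    }
```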

XGBoost constraints don't work with verbose mode

To reproduce, run the Janos/XGBoost notebook and set verbose=True in the call to add_predictor_constr; you get the error:

            estimators.append(
>               TreeEstimator(
                    self.gp_model,
                    tree,
                    self.input,
                    tree_vars[:, i, :],
                    self.epsilon,
                    timer,
                    self.verbose,
                    **kwargs,
                )
            )
E           TypeError: __init__() got multiple values for argument 'verbose'
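A self-contained sketch of the likely cause, assuming TreeEstimator.__init__ takes verbose positionally while kwargs also carries a verbose entry:

```python
# TreeEstimator's signature is reproduced only schematically here.
def tree_estimator_init(gp_model, tree, input_vars, tree_vars, epsilon, timer,
                        verbose, **kwargs):
    return verbose

message = ""
kwargs = {"verbose": True}  # verbose also travels inside **kwargs
try:
    # verbose is passed positionally AND again through **kwargs
    tree_estimator_init(None, None, None, None, 1e-5, None, True, **kwargs)
except TypeError as exc:
    message = str(exc)  # "... got multiple values for argument 'verbose'"
```

The fix would be to stop forwarding verbose inside **kwargs (or to pop it before the call).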

Package name

Matthias says a package name with no "-" or "_" is preferable.
