
machinelearningbcam / constraint-generation-for-mrcs-uai-2023


Efficient Learning of Minimax Risk Classifiers in High Dimensions

Home Page: https://github.com/MachineLearningBCAM/MRCpy





This repository is the official implementation of Efficient Learning of Minimax Risk Classifiers in High Dimensions.

The algorithm proposed in the paper efficiently learns 0-1 minimax risk classifiers (MRCs) in high dimensions using constraint generation. In particular, the iterative algorithm uses greedy feature selection to obtain a sequence of MRCs with decreasing worst-case error probabilities. Since the worst-case error probability is available at every iteration of learning, the user can trade off training time against approximation error.
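The following is a minimal sketch of the constraint-generation idea, written with SciPy's HiGHS LP solver rather than the GUROBI-based implementation in this repository, so it should not be read as the authors' code. It assumes the MRC learning problem has already been written as the LP: minimize −τᵀμ + λᵀ|μ| + ν over (μ, ν) subject to Fμ − ν·1 ≤ b, where each row of F and entry of b correspond to one constraint of the 0-1 MRC (constant terms dropped); building F, b, τ and λ from data is omitted. In the dual of this LP every feature j contributes one constraint |F_jᵀα − τ_j| ≤ λ_j, so the loop starts with no features, solves the restricted LP, and adds up to n_max features whose dual constraint is violated by more than eps, for at most k_max iterations. The parameter names mirror the MRCpy usage example below, but their exact meaning there is assumed, not confirmed.

```python
import numpy as np
from scipy.optimize import linprog


def solve_restricted(F, b, tau, lam, J):
    """Solve min -tau_J.mu + lam_J.|mu| + nu  s.t.  F[:, J] mu - nu <= b.

    mu is split as mu_pos - mu_neg (both >= 0); nu is free.
    Returns the optimal value, mu, and the dual vector alpha >= 0.
    """
    m = F.shape[0]
    FJ = F[:, J]
    k = len(J)
    # Variable order: [mu_pos (k), mu_neg (k), nu (1)]
    c = np.concatenate([lam[J] - tau[J], lam[J] + tau[J], [1.0]])
    A_ub = np.hstack([FJ, -FJ, -np.ones((m, 1))])
    bounds = [(0, None)] * (2 * k) + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    # HiGHS marginals for <= constraints are <= 0, so alpha = -marginals >= 0
    alpha = -res.ineqlin.marginals
    mu = res.x[:k] - res.x[k:2 * k]
    return res.fun, mu, alpha


def constraint_generation(F, b, tau, lam, n_max=10, k_max=50, eps=1e-6):
    """Greedy feature generation for the LP above (illustrative sketch).

    Returns the final objective, the selected feature indices J, the
    coefficients mu for those features, and the objective at each iteration.
    """
    J = np.array([], dtype=int)
    history = []
    for _ in range(k_max):
        obj, mu, alpha = solve_restricted(F, b, tau, lam, J)
        history.append(obj)
        # Violation of the dual constraint |F_j^T alpha - tau_j| <= lam_j
        violation = np.abs(F.T @ alpha - tau) - lam
        violation[J] = -np.inf  # ignore features already selected
        candidates = np.flatnonzero(violation > eps)
        if candidates.size == 0:
            return obj, J, mu, history  # no violated constraint: optimal
        # Add the n_max most violated features
        order = np.argsort(violation[candidates])[::-1]
        J = np.concatenate([J, candidates[order[:n_max]]])
    # k_max reached: re-solve with the final feature set
    obj, mu, _ = solve_restricted(F, b, tau, lam, J)
    history.append(obj)
    return obj, J, mu, history
```

Because each restricted LP fixes the excluded coefficients at zero, its optimal value upper-bounds that of the full LP and can only decrease as features are added, which mirrors the decreasing sequence of worst-case error probabilities described above.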

[Figure: Decreasing worst-case errors]

[Figure: Scalability]

[Figure: Worst-case errors and training times using multiple high-dimensional datasets]

Requirements

The standard libraries required are listed in the file requirements.txt. They can be installed using either of the following:

  1. pip
pip install -r requirements.txt
  2. conda environment
conda create --name <environment_name> --file requirements.txt

The code also requires the MRCpy library, which can be installed from its repository (https://github.com/MachineLearningBCAM/MRCpy). In addition, the implementation of the proposed algorithm uses the GUROBI optimizer, which requires a license that can be obtained from Gurobi.

Data

The repository contains the following high-dimensional biological datasets used in the experiments:

Dataset      Variables  Samples  Classes
Arcene           10000      200        2
Colon             2000       62        2
CLL_SUB_111      11340      111        3
Dorothea        100000     1150        2
GLI_85           22283       85        2
GLIOMA            4434       50        4
Leukemia          7129       72        3
Lung             12600      203        5
MLL              12582       72        3
Ovarian          15154      253        2
Prostate_GE       5966      102        2
SMK_CAN_187      19993      187        2
TOX_171           5748      171        4

The datasets are provided as zip files in the Datasets folder of this repository, and load.py contains functions to load them as NumPy matrices.
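The helper names in load.py are not listed in this README. Purely as a hypothetical illustration, assuming each zip file holds one CSV file with no header row and the class label in the last column (the real layout may differ), a loader returning NumPy arrays might look like this; load_zipped_csv is an invented name, not a function from load.py.

```python
import io
import zipfile

import numpy as np


def load_zipped_csv(zip_path):
    """Hypothetical loader: read the first CSV in a zip archive and split it
    into a feature matrix X and an integer label vector y (last column)."""
    with zipfile.ZipFile(zip_path) as zf:
        csv_names = [n for n in zf.namelist() if n.endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        with zf.open(csv_names[0]) as f:
            text = f.read().decode("utf-8")
    # ndmin=2 keeps a 2-D array even if the file has a single row
    data = np.loadtxt(io.StringIO(text), delimiter=",", ndmin=2)
    X, y = data[:, :-1], data[:, -1].astype(int)
    return X, y
```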

Evaluation

To reproduce the results in the paper, use the following python scripts:

  • param_eps.py : Experiments studying the influence of the parameter epsilon and showing the monotonic decrease in the worst-case error probability.
  • param_n_max.py : Experiment analyzing the effect of the parameter n_max and showing the improved efficiency over MRC-LP.
  • scalability.py : Experiments showing the scalability of the different algorithms as the number of features increases.
  • comparison.py : Experiments comparing our method with state-of-the-art techniques in terms of error and training time.
  • feature_selection.py : Experiment evaluating our method as a feature selection approach. This experiment is performed on the binary classification datasets.

The implementation of the proposed algorithm, along with the LP formulation of MRCs, can be found in the Main folder. The code for SVM-CG is in the Libraries folder and is forked from the original SVM-CG repository.

To reproduce any of the experimental results for a given dataset, run the following command:

python <scriptname> <dataset>

The results of that experiment on that dataset are saved in the Results/ folder, in a subfolder named after the experiment and the dataset. For instance, the experiment comparing the scalability of the different algorithms on the Ovarian dataset can be run as follows:

python3 scalability.py Ovarian

and the corresponding results are saved as CSV files in Results/Scalability/Ovarian/.
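The README documents one script and one dataset per run. As a convenience sketch (not part of the repository), the code below builds the python <scriptname> <dataset> command for every script and dataset listed above, running feature_selection.py only on the two-class datasets, as stated in the script list. It assumes the scripts are launched from the repository root and that every script/dataset pair is a valid experiment, which the README does not confirm for each combination; build_commands and run_all are hypothetical names.

```python
import subprocess
import sys

# Script and dataset names as listed in this README.
SCRIPTS = ["param_eps.py", "param_n_max.py", "scalability.py",
           "comparison.py", "feature_selection.py"]
DATASETS = ["Arcene", "Colon", "CLL_SUB_111", "Dorothea", "GLI_85", "GLIOMA",
            "Leukemia", "Lung", "MLL", "Ovarian", "Prostate_GE",
            "SMK_CAN_187", "TOX_171"]
# Datasets with 2 classes in the dataset table.
BINARY = {"Arcene", "Colon", "Dorothea", "GLI_85", "Ovarian",
          "Prostate_GE", "SMK_CAN_187"}


def build_commands(scripts=SCRIPTS, datasets=DATASETS):
    """Build one [python, <scriptname>, <dataset>] command per run."""
    commands = []
    for script in scripts:
        for name in datasets:
            # feature_selection.py targets binary classification only.
            if script == "feature_selection.py" and name not in BINARY:
                continue
            commands.append([sys.executable, script, name])
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, run_all(build_commands(scripts=["scalability.py"])) would launch the scalability experiment once for each of the 13 datasets.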

Usage

The proposed algorithm is also implemented in the MRCpy library, which can be installed from its repository. After installation, the proposed algorithm can be used through the library as follows:

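from MRCpy import MRC
from sklearn.datasets import load_breast_cancer

# X: (n_samples, n_features) array, y: class labels
X, y = load_breast_cancer(return_X_y=True)

# fitting with the constraint generation solver ('cg')
mrc_classif = MRC(loss='0-1', solver='cg', n_max=100, k_max=20, eps=1e-4)
mrc_classif.fit(X, y)

# prediction (here on the training data, for illustration)
predicted_labels = mrc_classif.predict(X)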

For further details, please see the documentation of the MRCpy library.

Support and Authors

Kartheek Reddy Bondugula

[email protected]

Santiago Mazuelas

[email protected]

Aritz Pérez

[email protected]

Citation

If you find the code useful in your research, please explicitly mention our work in your publication and include the following entry in your bibliography:

[1] K. Bondugula, S. Mazuelas, A. Pérez "Efficient Learning of Minimax Risk Classifiers in High Dimensions." The 39th Conference on Uncertainty in Artificial Intelligence. PMLR, 2023.

The corresponding BibTeX citation is given below:

@InProceedings{BonMazPer23,
  title = {Efficient Learning of Minimax Risk Classifiers in High Dimensions},
  author = {Bondugula, Kartheek and Mazuelas, Santiago and P{\'e}rez, Aritz},
  booktitle = {The 39th Conference on Uncertainty in Artificial Intelligence},
  pages = {206--215},
  year = {2023},
  volume = {216},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR}
}
