nolan-h-hamilton / emsco

Code for "Budgeted Classification with Rejection: An Evolutionary Method with Multiple Objectives", at IEEE CEC 2022

License: MIT License

Python 98.82% Shell 1.18%
selective-classification budgeted-learning learning-with-rejection multiobjective-learning

emsco's Introduction

EMSCO - Evolutionary Multi-Stage Classifier Optimizer

EMSCO constructs budgeted, selective classification systems. It optimizes three objectives, (i) accuracy, (ii) coverage, and (iii) processing cost, over a feasible region of ordered feature set partitions. Each partition defines a sequential, budgeted classification protocol with early exits and a terminal reject option. To traverse the large $\Theta(k^n)$ search space and manage multiple objectives in a Pareto-efficient manner, EMSCO uses a problem-specific multi-objective evolutionary algorithm.
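The protocol described above can be illustrated with a minimal sketch. It is not EMSCO's implementation: the `Stage` class, the `classify` and `objectives` functions, and the per-stage model interface are all hypothetical names chosen for illustration. The sketch also assumes that accuracy is measured only on accepted records and that cost is the mean acquisition cost per record; the paper may define these differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical stage model: maps the features acquired so far to
# (predicted_label, confidence). Not part of EMSCO's actual API.
StageModel = Callable[[Dict[int, float]], Tuple[int, float]]


@dataclass
class Stage:
    features: List[int]  # indices of the features acquired at this stage
    model: StageModel    # classifier applied once they have been acquired


def classify(record: Sequence[float], stages: List[Stage],
             costs: Sequence[float], threshold: float) -> Tuple[Optional[int], float]:
    """Return (label, cost spent); label is None if the record is rejected."""
    acquired: Dict[int, float] = {}
    spent = 0.0
    for stage in stages:
        # Pay for this stage's features before its model can be used.
        for j in stage.features:
            acquired[j] = record[j]
            spent += costs[j]
        label, confidence = stage.model(acquired)
        if confidence >= threshold:
            return label, spent  # early exit: confident enough to accept
    return None, spent           # terminal reject: no stage was confident


def objectives(records, labels, stages, costs, threshold):
    """Return (accuracy on accepted records, coverage, mean cost per record)."""
    accepted = correct = 0
    total_cost = 0.0
    for x, y in zip(records, labels):
        pred, spent = classify(x, stages, costs, threshold)
        total_cost += spent
        if pred is not None:
            accepted += 1
            correct += int(pred == y)
    n = len(records)
    accuracy = correct / accepted if accepted else 0.0
    return accuracy, accepted / n, total_cost / n
```

In this sketch, raising `threshold` tends to lower coverage and raise cost, since fewer records exit early. This is the kind of trade-off among the three objectives that EMSCO searches over.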

Pipeline

emsco_sweep.py runs EMSCO over a sweep of confidence thresholds to evaluate its performance across a full range of risk-averse, budgeted contexts. Because evolutionary algorithms are stochastic, multiple EA runs (--runs) are conducted for each confidence threshold, and out-of-sample performance on the test set is averaged over them. When using EMSCO as a benchmark for comparison, we suggest running this script with a sufficient number of --runs and a reasonable range of confidence thresholds.
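The per-threshold averaging described above might look roughly like the following sketch. The `summarize` function and the `(accuracy, coverage, cost)` tuple layout are assumptions made for illustration; they do not reflect the internals or output format of emsco_sweep.py.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical per-run result: (accuracy, coverage, cost) on the test set.
Metrics = Tuple[float, float, float]


def summarize(runs_by_threshold: Dict[float, List[Metrics]]):
    """Map each threshold to (means, sample standard deviations) over its runs."""
    summary = {}
    for threshold, runs in sorted(runs_by_threshold.items()):
        # Transpose the list of runs into one column per metric.
        columns = list(zip(*runs))
        means = tuple(mean(c) for c in columns)
        # A single run has no sample standard deviation; report 0.0 instead.
        spreads = tuple(stdev(c) if len(c) > 1 else 0.0 for c in columns)
        summary[threshold] = (means, spreads)
    return summary
```

Reporting the spread alongside the mean gives a rough indication of whether --runs is large enough for the comparison at hand.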

resources/cost_acc.py accepts the output of emsco_sweep.py and produces a cost-accuracy trade-off curve color-coded by coverage.

Citation

@INPROCEEDINGS{9870382,
  author={Hamilton, Nolan H. and Fulp, Errin W.},
  booktitle={2022 IEEE Congress on Evolutionary Computation (CEC)}, 
  title={Budgeted Classification with Rejection: An Evolutionary Method with Multiple Objectives}, 
  year={2022},
  volume={},
  number={},
  pages={1-10},
  doi={10.1109/CEC55065.2022.9870382}}

Dependencies

Python3

EMSCO uses a few popular Python packages as dependencies:

numpy
pandas
scikit-learn
matplotlib

See resources/environment_details.txt for exact versions of each package and Python that produced the results in the Example Use section.

Input

  • --train_data: a training split in CSV format (see example/example_train.csv)
  • --val_data: a validation split in CSV format (see example/example_val.csv)
  • --test_data: a test split in CSV format (see example/example_test.csv)
  • --cost_data: a text file specifying acquisition costs for each feature (see example/example_costs.txt)

CSV files are expected to be formatted with header: f0,f1,f2,...,f[n],label where f0 corresponds to the first feature value for the record, and label corresponds to the label of the record.

The costs file should maintain the same feature order as in the training/validation/test files.
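Before launching a long sweep, it may help to check the inputs against the layout described above. The following is a minimal sketch, not part of EMSCO: `check_header` and `check_costs` are hypothetical helpers, and because the exact format of the costs file is not specified here, `check_costs` takes costs that have already been parsed into a list.

```python
import csv
from typing import List, TextIO


def check_header(handle: TextIO) -> int:
    """Validate a header of the form f0,f1,...,f[n],label; return the feature count."""
    header = next(csv.reader(handle), None)
    if header is None:
        raise ValueError("file is empty")
    if len(header) < 2 or header[-1] != "label":
        raise ValueError("the last column must be 'label'")
    expected = [f"f{i}" for i in range(len(header) - 1)]
    if header[:-1] != expected:
        raise ValueError(f"expected feature columns {expected}, got {header[:-1]}")
    return len(expected)


def check_costs(costs: List[float], n_features: int) -> None:
    """Check that there is one non-negative cost per feature, in feature order."""
    if len(costs) != n_features:
        raise ValueError(f"expected {n_features} costs, got {len(costs)}")
    if any(c < 0 for c in costs):
        raise ValueError("feature costs must be non-negative")
```

A typical use would be to open each CSV split with `open(path, newline="")`, pass the handle to `check_header`, and confirm that all three splits report the same feature count.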

Parameters

Use python3 emsco_sweep.py -h for a list of all parameters. We provide extended descriptions of several parameters here.

  • --pop_size: default=300, number of chromosomes comprising each generation's population
  • --iter_num: default=150, number of generations over which to optimize
  • --exp_num: default=10, number of distinct confidence thresholds to test (see --sweep)
  • --min_prob: default=0.55, minimum confidence threshold to accept prediction
  • --inc: default=1, amount by which pop_size is increased if the elite population grows to pop_size
  • --sweep: default=.05, the tested confidence thresholds are given by {min_prob + i*sweep for i=0...exp_num-1}
  • --runs: default=3, number of EMSCO runs for each confidence threshold to compute average performance. Increase as necessary to reduce variance of performance estimates.
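The set of tested thresholds follows directly from --min_prob, --sweep, and --exp_num. The small sketch below reproduces the formula given for --sweep; the function name `thresholds` is hypothetical, and the rounding is an addition here to suppress floating-point noise, not something the script is known to do.

```python
def thresholds(min_prob: float = 0.55, sweep: float = 0.05, exp_num: int = 10):
    """Return [min_prob + i*sweep for i = 0, ..., exp_num - 1]."""
    # Rounding is an assumption for readability; it avoids values such as
    # 0.6000000000000001 caused by binary floating-point arithmetic.
    return [round(min_prob + i * sweep, 10) for i in range(exp_num)]
```

With `min_prob=0.55`, `sweep=0.05`, and `exp_num=9`, the grid runs from 0.55 to 0.95 in steps of 0.05.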

Example Use

The results and plot in the example/ directory can be generated with the following commands.

  • Run EMSCO 10 times for each of the 9 confidence thresholds in {.55, .60, ..., .95}, and record average test performance.

python3 emsco_sweep.py --train_data example/example_train.csv --val_data example/example_val.csv --test_data example/example_test.csv --cost_data example/example_costs.txt --exp_num 9 --runs 10 --max_stages 5 --sweep .05 --min_prob .55 --out example/example_sweep_results.txt

  • Plot the results. The cost-accuracy tradeoff curve is an important tool for measuring and comparing performance of budgeted classifiers. Since EMSCO is also selective, coverage must be accounted for too, so each point on the curve is color-coded according to this metric ($g_1$).

python3 resources/cost_acc.py example/example_sweep_results.txt

Test Performance
