heteroeval

A Python package designed for the evaluation of machine learning models with heterogeneous test data.

Features

Suppose the observed data consists of multiple groups whose relative proportions change over time in a non-stationary way. If the expected value of a model's evaluation metric depends on the group, and on nothing other than the group, then the metric computed over the whole dataset fluctuates as the group mix shifts, even when the model itself is unchanged. The metric is stable only when viewed group by group. These fluctuations make it hard to compare evaluation metrics across models. This library helps determine an appropriate grouping automatically, so that as long as the model stays the same, its evaluation metrics within each group stay the same too.

Detailed Use Case

In health applications, metrics such as physical activity, dietary habits, and sleep patterns are monitored to forecast health risks. Because the user base is diverse, ranging from teenagers to retirees in their 60s and from active athletes to office workers, prediction difficulty can differ significantly between groups. Moreover, if user groups are not evenly represented in a dataset, the overrepresented groups can dominate the results. This highlights the need to segment predictions and evaluations by user demographics.

However, overly detailed segmentation brings its own set of challenges. Segmenting users into numerous specific groups can lead to scarce evaluation data for each segment. Evaluating based on smaller datasets can result in greater metric variance, making it harder to accurately assess machine learning models.
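
The trade-off can be made concrete with a small calculation. The sketch below assumes the per-sample errors are independent with a common standard deviation `sigma`, in which case the standard error of a mean-type metric is `sigma / sqrt(n)`. The helper name `standard_error` and the sample counts are illustrative only and are not part of heteroeval.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n independent values with std sigma."""
    return sigma / math.sqrt(n)

# Assumed numbers: 10,000 evaluation samples, split evenly into 100 segments.
se_whole = standard_error(1.0, 10_000)   # metric over all samples
se_segment = standard_error(1.0, 100)    # metric over one segment of 100

ratio = se_segment / se_whole
print(se_whole, se_segment, ratio)
```

Under these assumptions, splitting the data into 100 equal segments makes each segment's metric about ten times noisier than the metric over the whole dataset.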

To address this, it's essential to group users with the right level of granularity. heteroeval provides a solution by suggesting the best granularity for user grouping, considering evaluation metric trends and the amount of evaluation data, without depending on the actual feature values. For instance, if metrics for users in their 20s are similar to those in their 30s, heteroeval might advise clustering these age groups together.

By utilizing heteroeval, professionals can account for the unique evaluation metrics of different user groups, ensuring a more precise model evaluation.

Mathematical Formulation

1. Evaluation Metric Calculation

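For a given model $m$, regime $r$, and group $G$, we calculate the evaluation metric by applying a generic function $F$ to the true and predicted values of all data points $i$ that belong to that regime and group.

$$E_{m,r,G} = F\left(\{(y_{m,r,i}, \hat{y}_{m,r,i})\}_{i \in I_{r,G}}\right)$$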

Where:

  • $E_{m,r,G}$ represents the evaluation metric for model $m$, regime $r$, and group $G$.
  • $y_{m,r,i}$ and $\hat{y}_{m,r,i}$ denote the true value and predicted value, respectively.
  • $F$ is a general function to compute the evaluation metric. As an example, the mean squared error can be used and is represented as:
$$F(y, \hat{y}) = \frac{1}{N_{r,G}} \sum_{i \in I_{r,G}} (y_{m,r,i} - \hat{y}_{m,r,i})^2$$

Here, $I_{r,G}$ is the index set of data points in regime $r$ and group $G$, and $N_{r,G} = |I_{r,G}|$ is the number of data points in it.
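
As an illustration, the per-regime, per-group metric can be sketched with NumPy as follows. This is a minimal sketch, not heteroeval's implementation: the function name `groupwise_metric` and the toy arrays are assumptions, and mean squared error is used as $F$.

```python
import numpy as np

def groupwise_metric(y_true, y_pred, regimes, groups):
    """Return {(regime, group): mean squared error} for one model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    regimes = np.asarray(regimes)
    groups = np.asarray(groups)
    metrics = {}
    for r in np.unique(regimes):
        for g in np.unique(groups):
            mask = (regimes == r) & (groups == g)  # index set I_{r,G}
            if mask.any():  # skip empty (regime, group) cells
                sq_err = (y_true[mask] - y_pred[mask]) ** 2
                metrics[(r.item(), g.item())] = float(sq_err.mean())
    return metrics

# Toy data (assumed): four samples, two regimes, two groups.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 3.0, 3.0, 2.0]
regimes = [0, 0, 1, 1]
groups = [0, 1, 0, 1]

E = groupwise_metric(y_true, y_pred, regimes, groups)
print(E)
```

Empty cells are skipped rather than reported as NaN, which is one possible design choice when a group does not appear in a regime.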

2. Inter-regime Evaluation Metric Variation

Given a grouping rule $g$, we compute, for each group $G$ it produces, how much the evaluation metric varies across the $K$ regimes $r_1, \ldots, r_K$.

$$V_{m, g, G} = \text{Aggregate}_{\text{inter-regime}}(E_{m,r_1,G}, E_{m,r_2,G}, \ldots, E_{m,r_{K},G})$$

Where:

  • $\text{Aggregate}_{\text{inter-regime}}$ is a general function to aggregate evaluation metrics across regimes. An example implementation can be the standard deviation.
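
A minimal sketch of this step, assuming the standard deviation as the aggregate and a dictionary keyed by `(regime, group)` as input. The function name `inter_regime_variation` and the values are hypothetical, not heteroeval's API. Note that `np.std` computes the population standard deviation (`ddof=0`) by default.

```python
import numpy as np

def inter_regime_variation(E):
    """E maps (regime, group) -> metric; return {group: std across regimes}."""
    per_group = {}
    for (regime, group), value in E.items():
        per_group.setdefault(group, []).append(value)
    # Population standard deviation (ddof=0) of each group's metric values.
    return {group: float(np.std(values)) for group, values in per_group.items()}

# Assumed metric values for two regimes (0, 1) and two groups ("A", "B").
E = {(0, "A"): 1.0, (1, "A"): 3.0, (0, "B"): 2.0, (1, "B"): 2.0}

V = inter_regime_variation(E)
print(V)
```

Group "B" has the same metric in both regimes, so its variation is zero, while group "A" still fluctuates between regimes.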

3. Cost Function

For a model $m$ and a grouping rule $g$, the cost $C_{m,g}$ aggregates the evaluation metric variations $V_{m,g,G}$ over all groups $G$ produced by $g$. These costs are then aggregated into a single cost $C_m$ for the model.

$$C_{m,g} = \text{Aggregate}_{\text{group}}(V_{m,g,G_1}, V_{m,g,G_2}, \ldots, V_{m,g,G_{|G|}})$$

$$C_m = \text{Aggregate}_{\text{model}}(C_{m,g_1}, C_{m,g_2}, \ldots, C_{m,g_{|G|}})$$

Where:

  • $\text{Aggregate}_{\text{group}}$ and $\text{Aggregate}_{\text{model}}$ are general functions to aggregate evaluation metrics per group and across the model, respectively. An example implementation can be the average.
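
The two aggregation steps can be sketched as follows, using the average for both, as in the example above. The function names, the rule labels `"coarse"` and `"fine"`, and the variation values are assumptions made for illustration only.

```python
import numpy as np

def grouping_cost(V):
    """C_{m,g}: average of the per-group variations V_{m,g,G}."""
    return float(np.mean(list(V.values())))

def model_cost(costs):
    """C_m: average of the costs C_{m,g} passed in."""
    return float(np.mean(costs))

# Assumed per-group variations for one model under two grouping rules.
V_by_rule = {
    "coarse": {"all": 0.8},
    "fine": {"A": 1.0, "B": 0.0},
}

C_mg = {rule: grouping_cost(V) for rule, V in V_by_rule.items()}
C_m = model_cost(list(C_mg.values()))
print(C_mg, C_m)
```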

4. Grouping

The process of grouping involves transforming data samples, characterized by their features and possibly meta-information, into a specific group index. This mapping can be represented by a function parameterized by $\theta$:

$$G_i = g_{\theta}(x_i, m_i)$$

Where:

  • $G_i$ represents the group index for the $i$-th sample.
  • $x_i$ is the vector of features for the $i$-th sample.
  • $m_i$ denotes the meta-information associated with the $i$-th sample.
  • $g_{\theta}$ is the grouping function, parameterized by $\theta$, determining the group index based on features and meta-information.
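
One simple family of grouping functions bins a single piece of meta-information at a set of thresholds. The sketch below assumes the meta-information $m_i$ is the user's age and that $\theta$ is a sorted list of age thresholds. The function `g_theta` is hypothetical and ignores the features $x_i$; heteroeval's own grouping functions may look different.

```python
import numpy as np

def g_theta(x, meta, theta):
    """Map each sample to a group index by binning its age at thresholds theta.

    x is accepted to mirror G_i = g_theta(x_i, m_i) but is unused in this sketch.
    """
    ages = np.asarray(meta)
    # np.digitize returns k for theta[k-1] <= age < theta[k].
    return np.digitize(ages, theta)

ages = [25, 30, 45, 62]
features = np.zeros((4, 3))  # placeholder features, not used by this rule

G = g_theta(features, ages, theta=[30, 50])
print(G)  # -> [0 1 1 2]
```

Changing `theta` changes the grouping rule: `[40]` gives two groups, while `[30, 50]` gives three.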

5. Optimization

We aim to find the parameter $\theta$ that minimizes our cost function $C$ by adjusting our grouping function $g_{\theta}$. Specifically, by changing $\theta$, we generate different grouping rules and calculate the cost function for each. We select the $\theta$ that results in the lowest cost:

$$\theta^* = \arg\min_{\theta} C = \arg\min_{\theta} \text{Aggregate}_{\text{grouping}}(C_{m_1}, C_{m_2}, \ldots)$$

By finding $\theta^*$, we determine the optimal grouping rule.
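
When the candidate set for $\theta$ is small, the search can be done by brute force. The sketch below is an assumption-laden illustration rather than heteroeval's optimizer: $\theta$ is a tuple of age thresholds, the metric is mean squared error, the inter-regime aggregate is the standard deviation, and every other aggregate is the average. All names and data are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def cost(theta, ages, regimes, y_true, y_pred_for_each_model):
    """Mean over models of the mean over groups of the std of per-regime MSE."""
    # Hypothetical g_theta: bin ages at the thresholds in theta (() = one group).
    groups = np.searchsorted(np.asarray(theta, dtype=float), ages, side="right")
    model_costs = []
    for y_pred in y_pred_for_each_model:
        variations = []
        for g in np.unique(groups):
            per_regime = []
            for r in np.unique(regimes):
                mask = (groups == g) & (regimes == r)
                if mask.any():
                    per_regime.append(mse(y_true[mask], y_pred[mask]))
            variations.append(np.std(per_regime))  # V_{m,g,G}
        model_costs.append(np.mean(variations))    # aggregate over groups
    return float(np.mean(model_costs))             # aggregate over models

# Toy data (assumed): errors are larger for users aged 40 and over,
# and regime 1 contains more of those users than regime 0.
ages = np.array([20, 25, 35, 50, 20, 35, 50, 60])
regimes = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_true = np.zeros(8)
err = np.where(ages < 40, 1.0, 3.0)
y_pred_for_each_model = [err, 2 * err]

candidates = [(), (30,), (40,)]
costs = {t: cost(t, ages, regimes, y_true, y_pred_for_each_model) for t in candidates}
best_theta = min(costs, key=costs.get)
print(costs, best_theta)
```

In this toy data the split at age 40 makes every group's metric identical across regimes, so it has zero cost and is selected. The sketch does not account for the amount of evaluation data per group, so on real data a cost of this form would tend to need an additional term or constraint that discourages very small groups.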

Installation

pip install git+https://github.com/inoueakimitsu/heteroeval

Usage

Simply call find_best_grouping(), as shown below:

from heteroeval import find_best_grouping

find_best_grouping(
    n_models,
    regimes,
    X, y_true,
    y_pred_for_each_model,
    evaluation_measure,
    inter_regime_variation_measure,
    groupwise_variation_measure_aggregate_function,
    modelwise_variation_measure_aggregate_function,
    cost_function,
    optimizer)

Refer to heteroeval/discrete.py for a comprehensive working example.

License

heteroeval is licensed under the MIT License.
