awslabs / fortuna
A Library for Uncertainty Quantification.
Home Page: https://aws-fortuna.readthedocs.io/en/latest/
License: Apache License 2.0
Hi, I've run into an annoying bug when trying to run the model.
Basically, the following line of code assumes that input_shape is a tuple, but get_input_shape, which is used in ProbClassifier._check_output_dim to get the input_shape, is the result of a tree map. So when the input is Dict[str, Array], input_shape has type Dict[str, Tuple]. The problem is that Joint.init, linked below, doesn't cover this case.
This seems to be an assumption throughout the library. Maybe there should be a high-level error when making a data loader if the input is a dictionary.
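To make the shape issue concrete, here is a minimal sketch (plain JAX, not Fortuna's actual get_input_shape) showing how a tree-mapped shape ends up as a dict of tuples when the input is a dictionary of arrays:
import jax.numpy as jnp
from jax import tree_util

# A dictionary-valued input: one array per named feature
inputs = {"tokens": jnp.zeros((32, 128)), "mask": jnp.zeros((32, 128))}

# Tree-mapping the per-example shape over the dict yields a dict of tuples,
# not a single tuple, so any downstream code that assumes a tuple breaks.
input_shape = tree_util.tree_map(lambda x: x.shape[1:], inputs)
print(input_shape)  # {'mask': (128,), 'tokens': (128,)}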
This may already be possible, but I can't seem to figure out how to do it. Is there a way to use the FitConfig components to keep track of the best state_dict of a model according to validation accuracy, and then save that state dict?
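For what it's worth, here is the kind of configuration I was hoping for, pieced together from the FitMonitor / FitCheckpointer options I've seen; whether keep_top_n_checkpoints actually retains checkpoints according to the monitored validation metric is exactly what I'm unsure about, so treat this as a sketch rather than a working recipe (prob_model, train_loader and val_loader are assumed to already exist):
from fortuna.metric.classification import accuracy
from fortuna.prob_model import FitCheckpointer, FitConfig, FitMonitor

config = FitConfig(
    # Evaluate validation accuracy every epoch
    monitor=FitMonitor(metrics=(accuracy,), eval_every_n_epochs=1),
    # Hope: keep only the best checkpoint(s) by the monitored metric (assumption on my part)
    checkpointer=FitCheckpointer(
        save_checkpoint_dir="./checkpoints",  # hypothetical path
        keep_top_n_checkpoints=1,
    ),
)
status = prob_model.train(
    train_data_loader=train_loader,
    val_data_loader=val_loader,
    fit_config=config,
)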
Thanks so much!
Fortuna version: Latest
prob_model.load_state("../swag_checkpoints/2023-07-25 14:59:40.237855/checkpoint_18600/checkpoint")
state = prob_model.posterior.state.get()
# SWAGState(step=array(18600, dtype=int32), apply_fn=None, params=FrozenDict({
# model: {
# params: {
# dfe_subnet: {
# BatchNorm_0: {
# bias: array([-0.13133389, -0.14736553, -0.14047779, -0.12409671, -0.11933165,
# -0.16984864, -0.13965459, -0.07937623, -0.11898279, -0.1386996 ,
# -0.13736989, -0.11246286, -0.15424594, -0.10375523, -0.10800011,
# -0.14000903, -0.15316793, -0.13276398, -0.11146024, -0.16203304,
# -0.14830959, -0.13227627, -0.11291285, -0.11979104, -0.08990214,
# -0.13557586, -0.15480955, -0.17320064, -0.14736709, -0.12703426, ...
state.mean
# array(-0.01478862, dtype=float32)
This leads to an error when running prob_model.predictive.sample() on line 212 of fortuna/prob_model/posterior/swag/swag_posterior.py:
207 if state.mutable is not None and inputs_loader is None and inputs is None:
208 raise ValueError(
209 "The posterior state contains mutable objects. Please pass `inputs_loader` or `inputs`."
210 )
--> 212 n_params = len(state.mean) # TypeError: len() of unsized object
213 rank = state.dev.shape[-1]
214 which_params = decode_encoded_tuple_of_lists_of_strings_to_array(
215 state._encoded_which_params
216 )
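For reference, the failure at line 212 reproduces in isolation; the scalar below is the value state.mean returned above, and my (unverified) reading of the code is that it expects state.mean to be a flattened 1-D vector of parameter means:
import jax.numpy as jnp

mean = jnp.array(-0.01478862)  # what state.mean currently is: a 0-d array
len(mean)                      # TypeError: len() of unsized object -- same error as line 212

flat_mean = jnp.zeros(1000)    # hypothetical flattened mean vector of 1000 parameters
len(flat_mean)                 # 1000 -- what line 212 presumably expects to work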
Not sure if I'm doing something wrong here? Thanks!
Fortuna version: v0.1.42
Current behavior: When I run the MNIST classification tutorial, I run into a broadcasting error. I think the issue occurs during the calibration step of training for SWAG.
Here's the traceback:
Traceback (most recent call last):
File "/home/yl9959/23_09_uncertainty/src/ftest.py", line 88, in <module>
status = prob_model.train(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/classification.py", line 254, in train
return super().train(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/base.py", line 101, in train
calib_status = self.calibrate(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/classification.py", line 289, in calibrate
return super()._calibrate(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/base.py", line 204, in _calibrate
state, status = calibrator.train(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 117, in train
) = self._training_loop(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 195, in _training_loop
state, aux = self.training_step(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 650, in training_step
return super().training_step(state, batch, outputs, loss_fun, rng, n_data)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 252, in training_step
(loss, aux), grad = grad_fn(state.params)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 247, in <lambda>
lambda params: self.training_loss_step(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/prob_model_calibrator.py", line 44, in training_loss_step
loss, aux = loss_fun(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/predictive/base.py", line 297, in _batched_negative_log_joint_prob
outs = self._batched_log_joint_prob(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/predictive/base.py", line 271, in _batched_log_joint_prob
outs = lax.map(_lik_log_joint_prob, ensemble_outputs)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/predictive/base.py", line 259, in _lik_log_joint_prob
return self.likelihood._batched_log_joint_prob(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/likelihood/base.py", line 248, in _batched_log_joint_prob
self.prob_output_layer.log_prob(outputs, targets, train=train, **kwargs)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_output_layer/classification.py", line 29, in log_prob
return jnp.sum(targets * outputs, -1) - jsp.special.logsumexp(outputs, -1)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/jax/_src/numpy/array_methods.py", line 728, in op
return getattr(self.aval, f"_{name}")(self, *args)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/jax/_src/numpy/array_methods.py", line 256, in deferring_binary_op
return binary_op(*args)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/jax/_src/numpy/ufuncs.py", line 97, in fn
return lax_fn(x1, x2) if x1.dtype != np.bool_ else bool_lax_fn(x1, x2)
TypeError: mul got incompatible shapes for broadcasting: (128, 10), (3840, 10)
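The failing multiplication is easy to reproduce outside Fortuna; note that 3840 = 30 * 128, so my guess (unconfirmed) is that the calibration step is seeing outputs stacked across 30 posterior samples while the targets are for a single batch of 128:
import jax.numpy as jnp

targets = jnp.ones((128, 10))   # one-hot targets for a batch of 128
outputs = jnp.ones((3840, 10))  # 3840 = 30 * 128; the factor of 30 is my assumption
targets * outputs               # TypeError: mul got incompatible shapes for broadcasting: (128, 10), (3840, 10)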
Is this due to a versioning issue? Thanks!
Fortuna version: 0.1.21
I am trying to train a MAP posterior approximator first, then continue training with Laplace starting from the MAP checkpoint:
# // Only differ by posterior_approximator method
map_prob_model
laplace_prob_model
checkpoint = "/path/to/map/checkpoint"
# // Validation accuracy of MAP model at checkpoint is as expected...
map_prob_model.load_state("/path/to/map/checkpoint")
map_out = map_prob_model.predictive.mean(val_loader.to_inputs_loader())
(map_out.argmax(axis=-1) == val_loader.to_array_targets()).sum() / val_loader.size
# '0.67'
from fortuna.metric.classification import accuracy
from fortuna.prob_model import FitCheckpointer, FitConfig, FitMonitor, FitOptimizer
optimizer = FitOptimizer(n_epochs=main_epochs)
monitor = FitMonitor(
metrics=(accuracy,),
eval_every_n_epochs=1,
)
checkpointer = FitCheckpointer(
save_checkpoint_dir=main_save_dir,
# // Start training from the MAP checkpoint
restore_checkpoint_path="/path/to/map/checkpoint/",
keep_top_n_checkpoints=2,
)
config = FitConfig(checkpointer=checkpointer, monitor=monitor)
laplace_status = laplace_prob_model.train(
fit_config=config,
train_data_loader=train_loader,
val_data_loader=val_loader,
)
# // Validation accuracy is NOT as expected...
laplace_out = laplace_prob_model.predictive.mean(val_loader.to_inputs_loader())
(laplace_out.argmax(axis=-1) == val_loader.to_array_targets()).sum() / val_loader.size
# '0.11'
However, it seems like the Laplace model is not starting from the checkpoint I pass into restore_checkpoint_path. Is there a chance that restore_checkpoint_path is not working properly? Let me know if you need more information and I can provide more detailed code!
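In case it helps narrow things down, here is a sanity check I could run; it assumes load_state works the same way for the Laplace model, which I haven't verified:
# Hypothetical check: load the MAP checkpoint directly into the Laplace model
# and evaluate before any further training.
laplace_prob_model.load_state("/path/to/map/checkpoint")
out = laplace_prob_model.predictive.mean(val_loader.to_inputs_loader())
acc = (out.argmax(axis=-1) == val_loader.to_array_targets()).sum() / val_loader.size
print(acc)
# If this already prints ~0.11 instead of ~0.67, the problem would be in loading the MAP
# state into the Laplace model at all, rather than in restore_checkpoint_path itself.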
Thanks :)
Suggested improvement
Letting the notebooks be .py files and converting them to notebooks with Jupytext will make version control simpler.
Additional Context
Holding notebooks as .py files has the drawback that they cannot be viewed directly within GitHub. We can circumvent this, though, by simply pointing all users to the docs. In the docs, notebook styling is nicer due to the CSS injected by the Sphinx theme.
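As an illustration (filename and contents made up), a paired notebook in Jupytext's percent format is just a plain Python file; jupytext --to notebook example_notebook.py converts it back into an .ipynb for anyone who wants the notebook form:
# %% [markdown]
# # Sinusoidal regression example
# Markdown cells are plain comments prefixed with "# ".

# %%
import jax.numpy as jnp

x = jnp.linspace(0.0, 1.0, 100)
y = jnp.sin(2.0 * jnp.pi * x)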
If the request is approved, would you be willing to submit a PR?
Yes
Describe the Feature Request
Looking at the documentation, it seems like Fortuna implements Inductive Conformal Prediction. I couldn't tell whether you are using a Mondrian approach or not, i.e. whether you calculate the non-conformity measures (the alphas) on the calibration set for each class separately and compute conformal p-values per class.
Describe Preferred Solution
If Fortuna implements Mondrian ICPs, it would be good to add this to the documentation; otherwise, it would be nice to have it implemented with the Mondrian approach, which handles class imbalance better.
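For concreteness, a minimal sketch of what a Mondrian (class-conditional) ICP calibration step could look like in plain NumPy; this is not Fortuna's API, and the non-conformity score (one minus the predicted probability of the class) is just one common choice:
import numpy as np

def mondrian_icp_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Class-conditional (Mondrian) inductive conformal prediction sets."""
    n_classes = cal_probs.shape[1]
    # Non-conformity score: 1 - predicted probability of the true class.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # One quantile per class, computed only on calibration examples of that class.
    q = np.ones(n_classes)
    for c in range(n_classes):
        scores_c = cal_scores[cal_labels == c]
        n_c = len(scores_c)
        if n_c > 0:
            level = min(1.0, np.ceil((n_c + 1) * (1 - alpha)) / n_c)
            q[c] = np.quantile(scores_c, level, method="higher")
    # Include class c in the prediction set whenever its score is below that class's threshold.
    test_scores = 1.0 - test_probs
    return test_scores <= q  # boolean array of shape (n_test, n_classes)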
Related Code
n/a
Additional Context
n/a
If the feature request is approved, would you be willing to submit a PR?
Yes (if time permits, I am not sure if I have the capacity during my working hours)
Some docs have rendered output, e.g. Sinusoidal regression, whereas others, such as MNIST Classification, do not.
We should render the notebooks in a CI/CD loop. This would yield two benefits: 1) fully rendered and up-to-date notebooks, and 2) bug catching - if there's a change to the codebase that breaks a notebook, this workflow will catch it.
As a further consideration, letting the notebooks be .py files and converting them to notebooks with Jupytext would make version control simpler.
If approved, I'd be happy to open a PR for this.
System: Mac M1
Just tried to install. First installed jax (CPU version) from the link provided. My pip list is:
Package Version
---------- -------
jax 0.4.1
jaxlib 0.4.1
numpy 1.24.0
opt-einsum 3.3.0
pip 22.3.1
scipy 1.9.3
setuptools 58.1.0
Then running pip install aws-fortuna results in:
ERROR: Cannot install aws-fortuna==0.1.1, aws-fortuna==0.1.2, aws-fortuna==0.1.3, aws-fortuna==0.1.4 and aws-fortuna==0.1.5 because these package versions have conflicting dependencies.
The conflict is caused by:
aws-fortuna 0.1.5 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.4 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.3 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.2 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.1 depends on tensorflow-cpu<3.0.0 and >=2.11.0
Do I need to separately install tensorflow?
Fortuna version: 0.1.17
Current behavior:
When bringing my own model class and then trying to run prob_model.train(), I get the following error:
TypeError: model_class.__call__() got an unexpected keyword argument 'train'
Expected behavior: That the model trains.
Related code:
import flax.linen as nn
import jax.numpy as jnp
from fortuna.prob_model import ProbClassifier
from fortuna.data import DataLoader
from fortuna.prob_model import FitConfig
class CNN(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.Conv(features=64, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))  # flatten
        x = nn.Dense(features=256)(x)
        x = nn.relu(x)
        x = nn.Dense(features=10)(x)
        x = nn.log_softmax(x)
        return x
prob_model = ProbClassifier(model=CNN())
x = jnp.zeros((5, 64, 64, 10))
y = jnp.ones((5,))
train_loader = DataLoader.from_array_data(
    data=(x, y), batch_size=1
)
prob_model.train(
    train_loader,
    fit_config=FitConfig(),
)
# RAISES ERROR
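A workaround that seems consistent with the error (Fortuna apparently calls the model with a train keyword, as the AlexNet example further down also does) is to accept that argument in __call__; a simplified sketch reusing the imports above, not a confirmed fix:
class CNN(nn.Module):
    @nn.compact
    def __call__(self, x, train: bool = True):  # accept the 'train' flag Fortuna passes
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))
        x = nn.Dense(features=10)(x)
        return nn.log_softmax(x)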
I am applying the CalibRegressor to model outputs and comparing it to a simple implementation from the "A Gentle Introduction to Conformal Prediction" paper. I am getting quite different results and was therefore looking at how the CalibRegressor does conformal prediction. I am not sure where to find it in the documentation, and was wondering if you could point me to the paper on which the CalibRegressor implementation is based?
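For reference, the baseline I am comparing against is essentially the split conformal interval from that paper; a minimal NumPy version of it (my own re-implementation, not Fortuna code):
import numpy as np

def split_conformal_interval(cal_preds, cal_targets, test_preds, alpha=0.1):
    """Split conformal intervals from absolute-residual non-conformity scores."""
    scores = np.abs(cal_targets - cal_preds)              # residuals on the calibration set
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample-corrected level
    qhat = np.quantile(scores, level, method="higher")
    return test_preds - qhat, test_preds + qhat           # lower / upper interval bounds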
For more consistent and meaningful issue tracking, we should have issue templates for bugs, features, documentation, and general code improvements.
Fortuna version: 0.1.14
Current behavior: From this example, when I run
output_dim = 10
prob_model = ProbClassifier(
model=LeNet5(output_dim=output_dim),
posterior_approximator=LaplacePosteriorApproximator(),
)
status = prob_model.train(
train_data_loader=train_data_loader,
val_data_loader=val_data_loader,
calib_data_loader=val_data_loader,
fit_config=FitConfig(
optimizer=FitOptimizer(freeze_fun=lambda path, val: "trainable" if "output_subnet" in path else "frozen")
),
map_fit_config=FitConfig(
monitor=FitMonitor(early_stopping_patience=2, metrics=(accuracy,)),
optimizer=FitOptimizer()
),
calib_config=CalibConfig(monitor=CalibMonitor(early_stopping_patience=2))
)
I get
TypeError: FitOptimizer.__init__() got an unexpected keyword argument 'freeze_fun'
Expected behavior: For the example to run without error.
Steps to reproduce: I created a virtual environment with Poetry and ran poetry add aws-fortuna to install Fortuna.
Related code:
Other information:
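The error suggests that freeze_fun may simply not exist in the installed 0.1.14 release; a quick way to check what the installed version actually accepts (plain Python introspection, nothing Fortuna-specific):
import inspect
from importlib.metadata import version

from fortuna.prob_model import FitOptimizer

print(version("aws-fortuna"))                    # installed package version
print(inspect.signature(FitOptimizer.__init__))  # does 'freeze_fun' appear among the parameters?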
Hi,
Sorry to ask so many questions! And thanks again for creating such a great library.
I'd like to log the loss, accuracy, and other metrics during training, for example with TensorBoard or Weights & Biases. I've looked at Callbacks, but it appears they interact with TrainerState, which seems to only contain the parameters of the model state.
Do you know if there is any easy way to construct a Callback function to retrieve predictions and ground truth for a particular epoch? Then I could compute whatever sort of metrics I'd want.
Thank you!
Is the library suitable for time series data?
Hi! I've trained a prob_model and created checkpoints. I then run prob_model.load_state and attempt to produce predictions on the test set. However, I'm getting the following error:
...
pspec=PartitionSpec('processes',)
] b
from line /home/pscemama/bayesian-conformal-sets/.venv/lib/python3.10/site-packages/orbax/checkpoint/utils.py:63 (sync_global_devices)
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.ConcretizationTypeError
The only thing I've done that is not standard is use my own custom model, which is here:
from typing import Any
import flax.linen as nn
import jax.numpy as jnp
import jax
act = jax.nn.swish
class AlexNet(nn.Module):
    """
    An AlexNet model for Cifar10.
    """

    output_dim: int
    dtype: Any = jnp.float32

    def setup(self):
        self.hidden_layers = AlexNetHiddenLayers(dtype=self.dtype)
        self.last_layer = AlexNetLastLayer(output_dim=self.output_dim, dtype=self.dtype)

    def __call__(self, x: jnp.ndarray, train: bool = True) -> jnp.ndarray:
        x = self.hidden_layers(x, train)
        x = self.last_layer(x, train)
        return x
class AlexNetHiddenLayers(nn.Module):
    """
    Hidden convolutional layers of the AlexNet model.
    """

    dtype: Any = jnp.float32

    @nn.compact
    def __call__(self, x: jnp.ndarray, train: bool = True):
        # [32, 32, 3]
        x = nn.Conv(features=64, kernel_size=(3,))(x)
        # [32, 32, 64]
        x = act(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        # [16, 16, 64]
        x = nn.Conv(features=128, kernel_size=(3,))(x)
        # [16, 16, 128]
        x = act(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        # [8, 8, 128]
        x = nn.Conv(features=256, kernel_size=(2,))(x)
        # [8, 8, 256]
        x = act(x)
        x = nn.Conv(features=128, kernel_size=(2,))(x)
        # [8, 8, 128]
        x = act(x)
        x = nn.Conv(features=64, kernel_size=(2,))(x)
        # [8, 8, 64]
        x = act(x)
        x = x.reshape((x.shape[0], -1))
        return x
class AlexNetLastLayer(nn.Module):
    output_dim: int
    dtype: Any = jnp.float32

    @nn.compact
    def __call__(self, x: jnp.ndarray, train: bool = True):
        x = nn.Dense(features=256, dtype=self.dtype)(x)
        x = act(x)
        x = nn.Dense(features=256, dtype=self.dtype)(x)
        x = act(x)
        x = nn.Dense(features=self.output_dim, dtype=self.dtype)(x)
        return x
Steps to reproduce:
# // Model
prob_model = ProbClassifier(
model=AlexNet(output_dim=10),
posterior_approximator=LaplacePosteriorApproximator(),
prior=IsotropicGaussianPrior(log_var=jnp.log(PRIOR_VAR))
)
prob_model.load_state("../sgd_checkpoints/checkpoint_11532/")
test_log_probs = prob_model.predictive.log_prob(data_loader=test_loader)
# RAISES ERROR
Other information:
The data is coming from a torch dataloader and converted with .from_torch_dataloader(). Let me know if you need more information on the actual data.
My hunch is that maybe I'm doing something wrong here. Any guidance is appreciated :)
I want to explore Fortuna with some experiments in a Google Colab notebook, but I'm having trouble with the install. Here is a notebook just trying to install Fortuna and use the CalibRegressor.
AttributeError: module 'numpy' has no attribute '_no_nep50_warning'