awslabs / fortuna
A Library for Uncertainty Quantification.
Home Page: https://aws-fortuna.readthedocs.io/en/latest/
License: Apache License 2.0
Hi, I've run into an annoying bug when trying to run the model.
Basically, the following line of code assumes that input_shape is a tuple, but get_input_shape, which is used in ProbClassifier._check_output_dim to get the input_shape, is the result of a tree map. So when the input is Dict[str, Array], input_shape has type Dict[str, Tuple]. The problem is that Joint.init, linked below, doesn't cover this case.
This seems to be an assumption throughout the library. Maybe there should be a high-level error when making a data loader if the input is a dictionary.
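To make the shape issue concrete, here is a minimal sketch (plain JAX, not Fortuna's actual get_input_shape) showing how a tree-mapped shape ends up as a dict of tuples when the input is a dictionary of arrays:
import jax.numpy as jnp
from jax import tree_util

# A dictionary-valued input: one array per named feature
inputs = {"tokens": jnp.zeros((32, 128)), "mask": jnp.zeros((32, 128))}

# Tree-mapping the per-example shape over the dict yields a dict of tuples,
# not a single tuple, so any downstream code that assumes a tuple breaks.
input_shape = tree_util.tree_map(lambda x: x.shape[1:], inputs)
print(input_shape)  # {'mask': (128,), 'tokens': (128,)}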
This may already be possible, but I can't seem to figure out how to do it. Is there a way to use the FitConfig components to keep track of the best state_dict of a model according to validation accuracy, and then save that state dict?
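For what it's worth, here is the kind of configuration I was hoping for, pieced together from the FitMonitor / FitCheckpointer options I've seen; whether keep_top_n_checkpoints actually retains checkpoints according to the monitored validation metric is exactly what I'm unsure about, so treat this as a sketch rather than a working recipe (prob_model, train_loader and val_loader are assumed to already exist):
from fortuna.metric.classification import accuracy
from fortuna.prob_model import FitCheckpointer, FitConfig, FitMonitor

config = FitConfig(
    # Evaluate validation accuracy every epoch
    monitor=FitMonitor(metrics=(accuracy,), eval_every_n_epochs=1),
    # Hope: keep only the best checkpoint(s) by the monitored metric (assumption on my part)
    checkpointer=FitCheckpointer(
        save_checkpoint_dir="./checkpoints",  # hypothetical path
        keep_top_n_checkpoints=1,
    ),
)
status = prob_model.train(
    train_data_loader=train_loader,
    val_data_loader=val_loader,
    fit_config=config,
)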
Thanks so much!
Fortuna version: Latest
prob_model.load_state("../swag_checkpoints/2023-07-25 14:59:40.237855/checkpoint_18600/checkpoint")
state = prob_model.posterior.state.get()
# SWAGState(step=array(18600, dtype=int32), apply_fn=None, params=FrozenDict({
# model: {
# params: {
# dfe_subnet: {
# BatchNorm_0: {
# bias: array([-0.13133389, -0.14736553, -0.14047779, -0.12409671, -0.11933165,
# -0.16984864, -0.13965459, -0.07937623, -0.11898279, -0.1386996 ,
# -0.13736989, -0.11246286, -0.15424594, -0.10375523, -0.10800011,
# -0.14000903, -0.15316793, -0.13276398, -0.11146024, -0.16203304,
# -0.14830959, -0.13227627, -0.11291285, -0.11979104, -0.08990214,
# -0.13557586, -0.15480955, -0.17320064, -0.14736709, -0.12703426, ...
state.mean
# array(-0.01478862, dtype=float32)
This leads to an error when running prob_model.predictive.sample() on line 212 of fortuna/prob_model/posterior/swag/swag_posterior.py:
207 if state.mutable is not None and inputs_loader is None and inputs is None:
208 raise ValueError(
209 "The posterior state contains mutable objects. Please pass `inputs_loader` or `inputs`."
210 )
--> 212 n_params = len(state.mean) # TypeError: len() of unsized object
213 rank = state.dev.shape[-1]
214 which_params = decode_encoded_tuple_of_lists_of_strings_to_array(
215 state._encoded_which_params
216 )
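For reference, the failure at line 212 reproduces in isolation; the scalar below is the value state.mean returned above, and my (unverified) reading of the code is that it expects state.mean to be a flattened 1-D vector of parameter means:
import jax.numpy as jnp

mean = jnp.array(-0.01478862)  # what state.mean currently is: a 0-d array
len(mean)                      # TypeError: len() of unsized object -- same error as line 212

flat_mean = jnp.zeros(1000)    # hypothetical flattened mean vector of 1000 parameters
len(flat_mean)                 # 1000 -- what line 212 presumably expects to work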
Not sure if I'm doing something wrong here? Thanks!
Fortuna version: v0.1.42
Current behavior: When I run the MNIST classification tutorial, I run into a broadcasting error. I think the issue occurs during the calibration step of training for SWAG.
Here's the traceback:
Traceback (most recent call last):
File "/home/yl9959/23_09_uncertainty/src/ftest.py", line 88, in <module>
status = prob_model.train(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/classification.py", line 254, in train
return super().train(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/base.py", line 101, in train
calib_status = self.calibrate(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/classification.py", line 289, in calibrate
return super()._calibrate(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/base.py", line 204, in _calibrate
state, status = calibrator.train(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 117, in train
) = self._training_loop(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 195, in _training_loop
state, aux = self.training_step(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 650, in training_step
return super().training_step(state, batch, outputs, loss_fun, rng, n_data)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 252, in training_step
(loss, aux), grad = grad_fn(state.params)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/training/output_calibrator.py", line 247, in <lambda>
lambda params: self.training_loss_step(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/prob_model_calibrator.py", line 44, in training_loss_step
loss, aux = loss_fun(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/predictive/base.py", line 297, in _batched_negative_log_joint_prob
outs = self._batched_log_joint_prob(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/predictive/base.py", line 271, in _batched_log_joint_prob
outs = lax.map(_lik_log_joint_prob, ensemble_outputs)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_model/predictive/base.py", line 259, in _lik_log_joint_prob
return self.likelihood._batched_log_joint_prob(
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/likelihood/base.py", line 248, in _batched_log_joint_prob
self.prob_output_layer.log_prob(outputs, targets, train=train, **kwargs)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/fortuna/prob_output_layer/classification.py", line 29, in log_prob
return jnp.sum(targets * outputs, -1) - jsp.special.logsumexp(outputs, -1)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/jax/_src/numpy/array_methods.py", line 728, in op
return getattr(self.aval, f"_{name}")(self, *args)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/jax/_src/numpy/array_methods.py", line 256, in deferring_binary_op
return binary_op(*args)
File "/home/yl9959/.conda/envs/jax/lib/python3.10/site-packages/jax/_src/numpy/ufuncs.py", line 97, in fn
return lax_fn(x1, x2) if x1.dtype != np.bool_ else bool_lax_fn(x1, x2)
TypeError: mul got incompatible shapes for broadcasting: (128, 10), (3840, 10)
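The failing multiplication is easy to reproduce outside Fortuna; note that 3840 = 30 * 128, so my guess (unconfirmed) is that the calibration step is seeing outputs stacked across 30 posterior samples while the targets are for a single batch of 128:
import jax.numpy as jnp

targets = jnp.ones((128, 10))   # one-hot targets for a batch of 128
outputs = jnp.ones((3840, 10))  # 3840 = 30 * 128; the factor of 30 is my assumption
targets * outputs               # TypeError: mul got incompatible shapes for broadcasting: (128, 10), (3840, 10)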
Is this due to a versioning issue? Thanks!
Fortuna version: 0.1.21
I am trying to train a MAP posterior approximator first, then continue training with Laplace starting from the MAP checkpoint:
# // Only differ by posterior_approximator method
map_prob_model
laplace_prob_model
checkpoint = "/path/to/map/checkpoint"
# // Validation accuracy of MAP model at checkpoint is as expected...
map_prob_model.load_state("/path/to/map/checkpoint")
map_out = map_prob_model.predictive.mean(val_loader.to_inputs_loader())
(map_out.argmax(axis=-1) == val_loader.to_array_targets()).sum() / val_loader.size
# '0.67'
from fortuna.metric.classification import accuracy
from fortuna.prob_model import FitCheckpointer, FitConfig, FitMonitor, FitOptimizer
optimizer = FitOptimizer(n_epochs=main_epochs)
monitor = FitMonitor(
metrics=(accuracy,),
eval_every_n_epochs=1,
)
checkpointer = FitCheckpointer(
save_checkpoint_dir=main_save_dir,
# // Start training from the MAP checkpoint
restore_checkpoint_path="/path/to/map/checkpoint/",
keep_top_n_checkpoints=2,
)
config = FitConfig(checkpointer=checkpointer, monitor=monitor)
laplace_status = laplace_prob_model.train(
fit_config=config,
train_data_loader=train_loader,
val_data_loader=val_loader,
)
# // Validation accuracy is NOT as expected...
laplace_out = laplace_prob_model.predictive.mean(val_loader.to_inputs_loader())
(laplace_out.argmax(axis=-1) == val_loader.to_array_targets()).sum() / val_loader.size
# '0.11'
However, it seems like the Laplace model is not starting from the checkpoint I pass into restore_checkpoint_path. Is there a chance that restore_checkpoint_path is not working properly? Let me know if you need more information and I can provide more detailed code!
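In case it helps narrow things down, here is a sanity check I could run; it assumes load_state works the same way for the Laplace model, which I haven't verified:
# Hypothetical check: load the MAP checkpoint directly into the Laplace model
# and evaluate before any further training.
laplace_prob_model.load_state("/path/to/map/checkpoint")
out = laplace_prob_model.predictive.mean(val_loader.to_inputs_loader())
acc = (out.argmax(axis=-1) == val_loader.to_array_targets()).sum() / val_loader.size
print(acc)
# If this already prints ~0.11 instead of ~0.67, the problem would be in loading the MAP
# state into the Laplace model at all, rather than in restore_checkpoint_path itself.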
Thanks :)
Suggested improvement
Letting the notebooks be .py files and converting them to notebooks with Jupytext will make version control simpler.
Additional Context
Holding notebooks as .py files has the drawback that they cannot be viewed directly within GitHub. We can circumvent this, though, by simply pointing all users to the docs. In the docs, notebook styling is nicer due to the CSS injected by the Sphinx theme.
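As an illustration (filename and contents made up), a paired notebook in Jupytext's percent format is just a plain Python file; jupytext --to notebook example_notebook.py converts it back into an .ipynb for anyone who wants the notebook form:
# %% [markdown]
# # Sinusoidal regression example
# Markdown cells are plain comments prefixed with "# ".

# %%
import jax.numpy as jnp

x = jnp.linspace(0.0, 1.0, 100)
y = jnp.sin(2.0 * jnp.pi * x)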
If the request is approved, would you be willing to submit a PR?
Yes
Describe the Feature Request
Looking at the documentation, it seems like Fortuna implements Inductive Conformal Prediction. I couldn't tell whether you are using a Mondrian approach or not, i.e. whether you calculate the non-conformity measures (the alphas) on the calibration set for each class separately and compute conformal p-values per class.
Describe Preferred Solution
If Fortuna implements Mondrian ICPs, it would be good to add this to the documentation; otherwise, it would be nice to have it implemented with the Mondrian approach, which handles class imbalance better.
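For concreteness, a minimal sketch of what a Mondrian (class-conditional) ICP calibration step could look like in plain NumPy; this is not Fortuna's API, and the non-conformity score (one minus the predicted probability of the class) is just one common choice:
import numpy as np

def mondrian_icp_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Class-conditional (Mondrian) inductive conformal prediction sets."""
    n_classes = cal_probs.shape[1]
    # Non-conformity score: 1 - predicted probability of the true class.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # One quantile per class, computed only on calibration examples of that class.
    q = np.ones(n_classes)
    for c in range(n_classes):
        scores_c = cal_scores[cal_labels == c]
        n_c = len(scores_c)
        if n_c > 0:
            level = min(1.0, np.ceil((n_c + 1) * (1 - alpha)) / n_c)
            q[c] = np.quantile(scores_c, level, method="higher")
    # Include class c in the prediction set whenever its score is below that class's threshold.
    test_scores = 1.0 - test_probs
    return test_scores <= q  # boolean array of shape (n_test, n_classes)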
Related Code
n/a
Additional Context
n/a
If the feature request is approved, would you be willing to submit a PR?
Yes (if time permits, I am not sure if I have the capacity during my working hours)
Some docs have rendered output, e.g. Sinusoidal regression, whereas others, such as MNIST Classification, do not.
We should render the notebooks in a CI/CD loop. This would yield two benefits: 1) fully rendered and up-to-date notebooks, and 2) bug catching - if there's a change to the codebase that breaks a notebook, this workflow will catch it.
As a further consideration, letting the notebooks be .py files and converting them to notebooks with Jupytext would make version control simpler.
If approved, I'd be happy to open a PR for this.
System: Mac M1
Just tried to install. First installed jax (CPU version) from the link provided. My pip list is:
Package Version
---------- -------
jax 0.4.1
jaxlib 0.4.1
numpy 1.24.0
opt-einsum 3.3.0
pip 22.3.1
scipy 1.9.3
setuptools 58.1.0
Then running pip install aws-fortuna results in:
ERROR: Cannot install aws-fortuna==0.1.1, aws-fortuna==0.1.2, aws-fortuna==0.1.3, aws-fortuna==0.1.4 and aws-fortuna==0.1.5 because these package versions have conflicting dependencies.
The conflict is caused by:
aws-fortuna 0.1.5 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.4 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.3 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.2 depends on tensorflow-cpu<3.0.0 and >=2.11.0
aws-fortuna 0.1.1 depends on tensorflow-cpu<3.0.0 and >=2.11.0
Do I need to separately install tensorflow?
Fortuna version: 0.1.17
Current behavior:
When bringing my own model class and then trying to run prob_model.train(), I get the following error:
TypeError: model_class.__call__() got an unexpected keyword argument 'train'
Expected behavior: That the model trains.
Related code:
import flax.linen as nn
import jax.numpy as jnp
from fortuna.prob_model import ProbClassifier
from fortuna.data import DataLoader
from fortuna.prob_model import FitConfig
class CNN(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.Conv(features=64, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))  # flatten
        x = nn.Dense(features=256)(x)
        x = nn.relu(x)
        x = nn.Dense(features=10)(x)
        x = nn.log_softmax(x)
        return x
prob_model = ProbClassifier(model=CNN())
x = jnp.zeros((5, 64, 64, 10))
y = jnp.ones((5,))
train_loader = DataLoader.from_array_data(
    data=(x, y), batch_size=1
)
prob_model.train(
    train_loader,
    fit_config=FitConfig(),
)
# RAISES ERROR
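A workaround that seems consistent with the error (Fortuna apparently calls the model with a train keyword, as the AlexNet example further down also does) is to accept that argument in __call__; a simplified sketch reusing the imports above, not a confirmed fix:
class CNN(nn.Module):
    @nn.compact
    def __call__(self, x, train: bool = True):  # accept the 'train' flag Fortuna passes
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))
        x = nn.Dense(features=10)(x)
        return nn.log_softmax(x)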
I am applying the CalibRegressor to model outputs and comparing it to a simple implementation from the "A Gentle Introduction to Conformal Prediction" paper. I am getting quite different results and was therefore looking at how the CalibRegressor does conformal prediction. I am not sure where to find it in the documentation, and was wondering if you could point me to the paper on which the CalibRegressor implementation is based?
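For reference, the baseline I am comparing against is essentially the split conformal interval from that paper; a minimal NumPy version of it (my own re-implementation, not Fortuna code):
import numpy as np

def split_conformal_interval(cal_preds, cal_targets, test_preds, alpha=0.1):
    """Split conformal intervals from absolute-residual non-conformity scores."""
    scores = np.abs(cal_targets - cal_preds)              # residuals on the calibration set
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample-corrected level
    qhat = np.quantile(scores, level, method="higher")
    return test_preds - qhat, test_preds + qhat           # lower / upper interval bounds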
For more consistent and meaningful issue tracking, we should have issue templates for bugs, features, documentation, and general code improvements.
Fortuna version: 0.1.14
Current behavior: From this example, when I run
output_dim = 10
prob_model = ProbClassifier(
model=LeNet5(output_dim=output_dim),
posterior_approximator=LaplacePosteriorApproximator(),
)
status = prob_model.train(
train_data_loader=train_data_loader,
val_data_loader=val_data_loader,
calib_data_loader=val_data_loader,
fit_config=FitConfig(
optimizer=FitOptimizer(freeze_fun=lambda path, val: "trainable" if "output_subnet" in path else "frozen")
),
map_fit_config=FitConfig(
monitor=FitMonitor(early_stopping_patience=2, metrics=(accuracy,)),
optimizer=FitOptimizer()
),
calib_config=CalibConfig(monitor=CalibMonitor(early_stopping_patience=2))
)
I get
TypeError: FitOptimizer.__init__() got an unexpected keyword argument 'freeze_fun'
Expected behavior: For the example to run without error.
Steps to reproduce: I created a virtual environment with Poetry and ran poetry add aws-fortuna to install Fortuna.
Related code:
Other information:
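The error suggests that freeze_fun may simply not exist in the installed 0.1.14 release; a quick way to check what the installed version actually accepts (plain Python introspection, nothing Fortuna-specific):
import inspect
from importlib.metadata import version

from fortuna.prob_model import FitOptimizer

print(version("aws-fortuna"))                    # installed package version
print(inspect.signature(FitOptimizer.__init__))  # does 'freeze_fun' appear among the parameters?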
Hi,
Sorry to ask so many questions! And thanks again for creating such a great library.
I'd like to log the loss, accuracy, and other metrics during training, for example with TensorBoard or Weights & Biases. I've looked at Callbacks, but it appears they interact with TrainerState, which seems to only contain the parameters of the model state.
Do you know if there is any easy way to construct a Callback function to retrieve predictions and ground truth for a particular epoch? Then I could compute whatever sort of metrics I'd want.
Thank you!
Is the library suitable for time series data?
Hi! I've trained a prob_model and created checkpoints. I then run prob_model.load_state and attempt to produce predictions on the test set. However, I'm getting the following error:
...
pspec=PartitionSpec('processes',)
] b
from line /home/pscemama/bayesian-conformal-sets/.venv/lib/python3.10/site-packages/orbax/checkpoint/utils.py:63 (sync_global_devices)
See https://jax.readthedocs.io/en/latest/errors.html#jax.errors.ConcretizationTypeError
The only thing I've done that is not standard is use my own custom model, which is here:
from typing import Any
import flax.linen as nn
import jax.numpy as jnp
import jax
act = jax.nn.swish
class AlexNet(nn.Module):
    """
    An AlexNet model for Cifar10.
    """

    output_dim: int
    dtype: Any = jnp.float32

    def setup(self):
        self.hidden_layers = AlexNetHiddenLayers(dtype=self.dtype)
        self.last_layer = AlexNetLastLayer(output_dim=self.output_dim, dtype=self.dtype)

    def __call__(self, x: jnp.ndarray, train: bool = True) -> jnp.ndarray:
        x = self.hidden_layers(x, train)
        x = self.last_layer(x, train)
        return x
class AlexNetHiddenLayers(nn.Module):
    """
    Hidden convolutional layers of the AlexNet model.
    """

    dtype: Any = jnp.float32

    @nn.compact
    def __call__(self, x: jnp.ndarray, train: bool = True):
        # [32, 32, 3]
        x = nn.Conv(features=64, kernel_size=(3,))(x)
        # [32, 32, 64]
        x = act(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        # [16, 16, 64]
        x = nn.Conv(features=128, kernel_size=(3,))(x)
        # [16, 16, 128]
        x = act(x)
        x = nn.max_pool(x, window_shape=(2, 2), strides=(2, 2))
        # [8, 8, 128]
        x = nn.Conv(features=256, kernel_size=(2,))(x)
        # [8, 8, 256]
        x = act(x)
        x = nn.Conv(features=128, kernel_size=(2,))(x)
        # [8, 8, 128]
        x = act(x)
        x = nn.Conv(features=64, kernel_size=(2,))(x)
        # [8, 8, 64]
        x = act(x)
        x = x.reshape((x.shape[0], -1))
        return x
class AlexNetLastLayer(nn.Module):
    output_dim: int
    dtype: Any = jnp.float32

    @nn.compact
    def __call__(self, x: jnp.ndarray, train: bool = True):
        x = nn.Dense(features=256, dtype=self.dtype)(x)
        x = act(x)
        x = nn.Dense(features=256, dtype=self.dtype)(x)
        x = act(x)
        x = nn.Dense(features=self.output_dim, dtype=self.dtype)(x)
        return x
Steps to reproduce:
# // Model
prob_model = ProbClassifier(
model=AlexNet(output_dim=10),
posterior_approximator=LaplacePosteriorApproximator(),
prior=IsotropicGaussianPrior(log_var=jnp.log(PRIOR_VAR))
)
prob_model.load_state("../sgd_checkpoints/checkpoint_11532/")
test_log_probs = prob_model.predictive.log_prob(data_loader=test_loader)
# RAISES ERROR
Other information:
The data is coming from a torch dataloader and converted with .from_torch_dataloader(). Let me know if you need more information on the actual data.
My hunch is that maybe I'm doing something wrong here. Any guidance is appreciated :)
I want to explore Fortuna with some experiments in a Google Colab notebook, but I'm having trouble with the install. Here is a notebook just trying to install Fortuna and use the CalibRegressor.
AttributeError: module 'numpy' has no attribute '_no_nep50_warning'