cappelletto / bayesian-inference
License: GNU General Public License v3.0
When using aggregated labels from a classification problem, the error metric can be computed as a loss in the predicted vector space with an imposed norm constraint.
Let's assume we have K=3 classes (A, B, C), and for a given input vector x we want to predict the proportions of A, B and C as a 3x1 vector y. As the expected number of classes is known beforehand, the norm of the vector must be 1.0 (the class proportions sum to 100%).
This can be enforced by normalizing the predicted vector at the output layer, or encoded as part of the loss function (e.g. an RMSE penalty on norm(y)). The predicted proportions are then just the y_a, y_b and y_c components of the (normalized) output vector y.
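A minimal sketch of the output-layer normalization step (function and variable names are illustrative, not the repository's API):

```python
def normalize_proportions(y):
    """Project raw non-negative outputs onto the simplex so they sum to 1.0."""
    total = sum(y)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution over classes.
        return [1.0 / len(y)] * len(y)
    return [v / total for v in y]

# Example: raw network outputs for classes (A, B, C)
raw = [2.0, 1.0, 1.0]
proportions = normalize_proportions(raw)  # [0.5, 0.25, 0.25]
```

The same constraint could instead be kept soft by adding an RMSE penalty on `sum(y) - 1.0` to the loss.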
While running `pip install -U .` or `pip install -U -e .`, metadata generation fails with the following output:
Processing /home/umesh/datadrive/Software/Gitclones/Autoencoder/bnn_inference
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/home/umesh/datadrive/Software/Gitclones/Autoencoder/bnn_inference/setup.py", line 18, in
exec(ver_file.read(), main_ns)
File "", line 18, in
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/refs/symbolic.py", line 219, in _get_object
return Object.new_from_sha(self.repo, hex_to_bin(self.dereference_recursive(self.repo, self.path)))
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/objects/base.py", line 94, in new_from_sha
oinfo = repo.odb.info(sha1)
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/db.py", line 40, in info
hexsha, typename, size = self._git.get_object_header(bin_to_hex(binsha))
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1383, in get_object_header
return self.__get_object_header(cmd, ref)
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1370, in __get_object_header
return self._parse_object_header(cmd.stdout.readline())
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1329, in _parse_object_header
raise ValueError("SHA could not be resolved, git returned: %r" % (header_line.strip()))
ValueError: SHA could not be resolved, git returned: b''
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
The current decoupled train/encode implementation is available, but the network architecture definition is not shared between the two.
As pretrained networks are easily detected, we could add a --resume flag that loads such a pretrained network when available.
Caution: the actual training parameters and number of trained epochs may not be available, so this could be used either to accelerate convergence, to recover interrupted training, or to fine-tune models.
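A minimal sketch of such a flag (argparse and the helper below are assumptions; the repository's CLI may differ):

```python
import argparse


def resolve_start_mode(argv, checkpoint_exists):
    """Decide whether to resume from a pretrained checkpoint (illustrative helper).

    `checkpoint_exists` stands in for an os.path.isfile() check on the
    detected pretrained-network path.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--resume", action="store_true",
                        help="Load a pretrained network if one is found")
    args = parser.parse_args(argv)
    return "resume" if (args.resume and checkpoint_exists) else "scratch"
```

On "resume", the training loop would call `model.load_state_dict(...)` before the first epoch.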
After each training, export summary report containing:
Consider using a flag to modify the last-layer architecture to match one-hot encoding with range-limiting activation functions (tanh, swish, sigmoid).
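In PyTorch this would map the flag to nn.Sigmoid/nn.Tanh/nn.SiLU on the last layer; a framework-free sketch of the dispatch (names are illustrative):

```python
import math

# Hypothetical mapping from a CLI flag value to a range-limiting activation.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),  # output in (0, 1)
    "tanh": math.tanh,                                # output in (-1, 1)
    "swish": lambda x: x / (1.0 + math.exp(-x)),      # smooth, bounded below
}


def output_layer(logits, activation="sigmoid"):
    """Apply the selected range-limiting function element-wise to the last layer."""
    fn = ACTIVATIONS[activation]
    return [fn(v) for v in logits]
```

For one-hot-style proportion targets, sigmoid keeps each component in (0, 1) before any norm constraint is applied.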
It is possible to use the same CSV file for both the target (labels) file and the input (latents) file. However, the dataLoader object will duplicate entries during the join operation. We could default to forcing a left join when a single input file is provided (or when the file names are duplicated, which is equivalent).
This might require either:
We will always assume that the input CSV (latents) contains the relevant metadata fields we want to propagate to the exported dataframe.
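A left join keyed on a shared index column avoids duplicated rows when latents and labels come from the same file. Sketched here without pandas (with pandas this would be `DataFrame.merge(how="left")`; the key column name is hypothetical):

```python
def left_join(latents, labels, key="relative_path"):
    """Left-join label rows onto latent rows; each latent row appears exactly once."""
    label_by_key = {row[key]: row for row in labels}  # last duplicate wins
    joined = []
    for row in latents:
        merged = dict(row)
        merged.update(label_by_key.get(row[key], {}))
        joined.append(merged)
    return joined
```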
When using multi-GPU environments, allocate the GPU with enough resources to run the current instance. Criteria could be based on allocated memory or threads. This requires the PyTorch API to be bumped to 1.10 or higher; the current 1.7 does not support detailed memory-based queries.
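A sketch of the selection criterion, decoupled from the hardware query (in PyTorch >= 1.10 the per-device numbers could come from `torch.cuda.mem_get_info(device)`):

```python
def pick_gpu(free_mem_mib, required_mib):
    """Return the index of the GPU with the most free memory that fits the job,
    or None if no device has enough.

    free_mem_mib: list of free memory per device (MiB), e.g. queried at startup.
    """
    best = None
    for idx, free in enumerate(free_mem_mib):
        if free >= required_mib and (best is None or free > free_mem_mib[best]):
            best = idx
    return best

# Example: three GPUs reporting free memory in MiB
pick_gpu([1024, 8192, 4096], required_mib=2000)  # -> 1
```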
[refactor] pre-commit requires packages that are missing from requirements.txt
The new scaling-factor argument is only used during the inference (bnn_predict) phase. Include it as part of the user-defined normalization step in the training (bnn_train) module.
When using bnn_predict in a freshly installed conda environment + setup, it fails to find the bnn_inference.tools module.
The first argument of instance methods must be named self. This is considered an error since the convention is so common that you shouldn't break it.
There are 3 occurrences of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/cappelletto/bayesian-inference/issue/PYL-E0213/occurrences/
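The fix is mechanical; a generic illustration (not the repository's actual classes):

```python
class Predictor:
    # Flagged by PYL-E0213: first parameter not named `self`, e.g.
    #   def predict(this, x): return x
    # Fixed: instance methods take `self` first
    # (or use @staticmethod if no instance state is needed).
    def predict(self, x):
        return x
```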
The Iridis setup (PyTorch 1.7.0, CUDA 10.2) complains about torch.tensor(). This can be worked around with torch.Tensor().
Investigate forward compatibility further.
This is the traceback when calling the docker image.
Traceback (most recent call last):
File "/usr/local/bin/bnn_inference", line 5, in <module>
from bnn_inference.cli import main
File "/usr/local/lib/python3.10/site-packages/bnn_inference/__init__.py", line 6, in <module>
from bnn_inference.version import __version__ # noqa: F401
File "/usr/local/lib/python3.10/site-packages/bnn_inference/version.py", line 17, in <module>
repo = git.Repo(search_parent_directories=True)
File "/usr/local/lib/python3.10/site-packages/git/repo/base.py", line 265, in __init__
raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /data
The exit or quit functions don't exist at top level if Python is started with the -S flag, and will raise an error. Use sys.exit() instead.
There are 4 occurrences of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/cappelletto/bayesian-inference/issue/PYL-R1722/occurrences/
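The replacement is a one-liner wherever the issue occurs:

```python
import sys


def main():
    # ... do work ...
    # Prefer sys.exit() over the site-provided exit()/quit(),
    # which are absent when Python is started with -S.
    sys.exit(0)
```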
Target data may follow a non-normal distribution. We can provide a distribution-conversion step when loading the datasets, so we can keep using the mean as the MLE estimator for the prediction distribution by ensuring a normally distributed output.
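One simple conversion for positively skewed targets is a log transform applied at load time and inverted after prediction (a sketch; the actual transform family is an open choice, and the epsilon offset is an assumption):

```python
import math


def to_gaussian(y, eps=1e-9):
    """Forward transform: log-map positive, right-skewed targets toward normality."""
    return [math.log(v + eps) for v in y]


def from_gaussian(z, eps=1e-9):
    """Inverse transform applied to predictions (including the mean estimator)."""
    return [math.exp(v) - eps for v in z]
```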
Modern package-creation pipelines require adhering to pyproject.toml-based solutions instead of the deprecated setup.py.
Please refer to:
setup.py will be deprecated in the 0.2.X series.
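A minimal pyproject.toml sketch for the migration (metadata values are placeholders inferred from the tracebacks above, not authoritative):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "bnn_inference"
version = "0.0.0"            # static version replaces the git-derived one in setup.py
requires-python = ">=3.9"
license = { text = "GPL-3.0-only" }

[project.scripts]
bnn_inference = "bnn_inference.cli:main"   # entry point seen in the Docker traceback
```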
The next release should enforce import/export of CSV pandas dataframes for prediction, intersection, mapping and merge/clip maps.
A missing [index] first column can break part of the bash/XLS pipeline. It is required by the LGA pipeline, but has no real functionality for prediction.
Check the validity of known and newer scores at the post-processing stage (MSE of p, B-score, FX-family scores, MCC).
In order to calculate the uncertainty of the BNN predictions, the number of samples (-s, --samples) must be larger than 1. This bug affects both the training and the inference modules.
It would require either validation at the arg-parsing level, or making a decision when computing the uncertainty.
SAMPLES=1 could mean that the user wants a prediction from the frozen network, where the NN parameters are not drawn from the Gaussian distribution but the Gaussian mean is used as their expected value. This can be achieved by freezing the NN model.
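A sketch of the arg-parsing-level option (argparse is an assumption; the project's real CLI may use a different parser, and the `frozen` attribute is illustrative):

```python
import argparse


def parse_samples(argv):
    """Validate --samples at parse time: uncertainty needs more than one draw."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--samples", type=int, default=10)
    args = parser.parse_args(argv)
    if args.samples < 1:
        parser.error("--samples must be >= 1")
    # samples == 1 is allowed, but interpreted as a frozen-network
    # point prediction with no uncertainty estimate.
    args.frozen = args.samples == 1
    return args
```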
Check MWS at https://github.com/shuds13/pyexample/blob/master/setup.py
Match oplab setup approach https://github.com/ocean-perception/georef_semantics/blob/main/setup.py
For quality control and replication purposes, it is convenient to store a copy of training-time information within the network, including fields like:
Show both the KL divergence (goodness of fit) and the fitting loss as parts of the composite cost function. This can be informative when trying to fit non-normal distributions and when the cost function is heavily unbalanced.
Python version must be bumped up to 3.9 or 3.11.
Check deps
There is 1 occurrence of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/cappelletto/bayesian-inference/issue/SH-2061/occurrences/
Append to PyTorch model + state dictionary the following parameters:
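A hedged sketch of bundling such metadata with the checkpoint (with PyTorch, this dict would be passed to torch.save; field names are illustrative):

```python
import time


def build_checkpoint(state_dict, config):
    """Wrap the raw state dict with training-time metadata for replication."""
    return {
        "state_dict": state_dict,
        "meta": {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "config": config,  # e.g. epochs, lr, samples, scaling factor
            "format_version": 1,
        },
    }


ckpt = build_checkpoint({"w": [0.1]}, {"epochs": 100, "lr": 1e-3})
```

On load, the inference side can then recover the scaling factor and sampling settings without external files.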
Caution: some transformations may impose restrictions on the X input domain (e.g. non-negative numbers for log-normal).
Follow the oplab pattern for user-defined external configuration via a YAML file.
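An illustrative YAML layout (keys are assumptions, not the actual oplab schema):

```yaml
training:
  epochs: 100
  samples: 10
  scaling_factor: 1.0
inference:
  checkpoint: trained_bnn.pth
  output_csv: predictions.csv
```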
Retrieve and forward the user-defined names of the columns used as target/label during training. Replicate the naming convention, prepending uncertainty_ and predicted_ in the exported dataframe CSV.
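The prefixes come from the issue text; the helper itself is illustrative:

```python
def output_columns(target_columns):
    """Derive exported column names by prefixing each user-defined target column."""
    predicted = [f"predicted_{c}" for c in target_columns]
    uncertainty = [f"uncertainty_{c}" for c in target_columns]
    return predicted + uncertainty


output_columns(["seafloor_class"])
# -> ['predicted_seafloor_class', 'uncertainty_seafloor_class']
```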