bayesian-inference's Issues

Expand regression targets to support normalized class proportions

When using aggregated labels from a classification problem, the error metric can be computed as a loss in the predicted vector space with an imposed norm constraint on the output vector.

Let's assume we have K=3 classes (A, B, C), and for a given input vector x we want to predict the proportions of A, B and C as a 3x1 vector y. As the expected number of classes is known beforehand, the components of y must sum to 1.0 (the class proportions account for 100% of each sample).

This can be enforced by normalizing the predicted vector at the output layer, or folded into the loss function (e.g. an RMSE penalty on norm(y)). The predicted proportions are then just the y_a, y_b and y_c components of the (normalized) output vector y.
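
As a minimal sketch in PyTorch (layer sizes and names are illustrative assumptions, not the package's actual architecture), both options could look like this:

    import torch
    import torch.nn as nn

    K = 3  # number of classes (A, B, C)

    # Option 1: normalize at the output layer; softmax guarantees
    # non-negative components that sum to 1.
    head = nn.Sequential(nn.Linear(16, K), nn.Softmax(dim=-1))
    x = torch.randn(8, 16)                 # batch of 8 input vectors
    y_pred = head(x)                       # (8, 3) class proportions
    assert torch.allclose(y_pred.sum(dim=-1), torch.ones(8))

    # Option 2: keep a linear head and fold the constraint into the loss,
    # e.g. an extra penalty on the deviation of the component sum from 1.0.
    y_lin = nn.Linear(16, K)(x)
    target = torch.full((8, K), 1.0 / K)   # dummy target proportions
    loss = nn.functional.mse_loss(y_lin, target) \
        + ((y_lin.sum(dim=-1) - 1.0) ** 2).mean()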

Install error

While running pip install -U . or pip install -U -e ., metadata generation fails with the following output:

Processing /home/umesh/datadrive/Software/Gitclones/Autoencoder/bnn_inference
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<string>", line 34, in <module>
      File "/home/umesh/datadrive/Software/Gitclones/Autoencoder/bnn_inference/setup.py", line 18, in <module>
        exec(ver_file.read(), main_ns)
      File "<string>", line 18, in <module>
      File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/refs/symbolic.py", line 219, in _get_object
        return Object.new_from_sha(self.repo, hex_to_bin(self.dereference_recursive(self.repo, self.path)))
      File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/objects/base.py", line 94, in new_from_sha
        oinfo = repo.odb.info(sha1)
      File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/db.py", line 40, in info
        hexsha, typename, size = self._git.get_object_header(bin_to_hex(binsha))
      File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1383, in get_object_header
        return self.__get_object_header(cmd, ref)
      File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1370, in __get_object_header
        return self._parse_object_header(cmd.stdout.readline())
      File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1329, in _parse_object_header
        raise ValueError("SHA could not be resolved, git returned: %r" % (header_line.strip()))
    ValueError: SHA could not be resolved, git returned: b''
    [end of output]
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Allow resuming from a previously trained network

As pretrained networks are easy to detect, we could add a --resume flag that loads such a network when one is available.

Caution: the actual training parameters and number of trained epochs may not be available, so the flag could be used for accelerating convergence, recovering an interrupted training run, or fine-tuning models.
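
A minimal sketch of the flag, assuming a hypothetical checkpoint file and key names (model_state, optimizer_state, epoch) that may not match the package's actual layout:

    import argparse
    import os
    import torch

    model = torch.nn.Linear(16, 3)                 # stand-in network
    optimizer = torch.optim.Adam(model.parameters())

    parser = argparse.ArgumentParser()
    parser.add_argument("--resume", action="store_true",
                        help="load a pretrained network if one is found")
    args = parser.parse_args()

    checkpoint_path = "trained_bnn.pth"            # assumed file name
    start_epoch = 0
    if args.resume and os.path.isfile(checkpoint_path):
        ckpt = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(ckpt["model_state"])
        optimizer.load_state_dict(ckpt["optimizer_state"])
        # the epoch count may be missing, as cautioned above
        start_epoch = ckpt.get("epoch", 0)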

Export summary report

After each training run, export a summary report containing the following (see the sketch after this list):

  • Training parameters (epochs, targets, inputs)
  • Final loss (reconstruction and divergence)
  • Exported outputs
  • Network summary (layer-specific?)
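
A minimal sketch of such a report as JSON; every field name and value below is illustrative, not an actual schema:

    import json
    import time

    summary = {
        "date": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "training_parameters": {"epochs": 100,
                                "targets": "labels.csv",
                                "inputs": "latents.csv"},
        "final_loss": {"reconstruction": 0.042, "kl_divergence": 0.007},
        "exported_outputs": ["predictions.csv", "trained_bnn.pth"],
        "network_summary": "str(model) or per-layer detail could go here",
    }
    with open("training_summary.json", "w") as f:
        json.dump(summary, f, indent=2)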

Fix cosine-similarity function

Consider using a flag to modify the last-layer architecture to match one-hot encoding with range-limiting activation functions (tanh, swish, sigmoid).

Define a default policy for single input/target file scenario

It is possible to use the same CSV file as both the target (labels) and input (latents) file. However, the dataLoader object will duplicate the entries during the join operation. We can default to forcing join_left when a single input file is provided (detecting file-name duplication is equivalent).

This might require either:

  • Provide a CLI option for defining a single input/target file at invocation time
  • Detect name duplication at runtime and enable the join_left option

We will always assume that the input CSV (latents) contains the relevant metadata fields we want to propagate to the exported dataframe.
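
A sketch of the runtime detection in pandas, assuming a hypothetical uuid key column:

    import pandas as pd

    input_csv, target_csv = "latents.csv", "latents.csv"

    latents = pd.read_csv(input_csv)
    if input_csv == target_csv:
        merged = latents  # single-file case: skip the join to avoid duplicates
    else:
        targets = pd.read_csv(target_csv)
        merged = latents.merge(targets, on="uuid", how="left")  # forced left join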

Bump PyTorch version to support on-the-fly smart GPU selection

When running in a multi-GPU environment, allocate a GPU with enough free resources to run the current instance. Selection criteria could be based on allocated memory or thread count. This requires bumping the PyTorch API to 1.10 or higher; the current 1.7 does not support detailed memory-based queries.
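
A sketch of a free-memory-based selection; torch.cuda.mem_get_info wraps cudaMemGetInfo and is only available in newer PyTorch releases (around 1.10+), which is why the bump is needed:

    import torch

    def pick_gpu() -> torch.device:
        # Fall back to CPU when no GPU is visible.
        if not torch.cuda.is_available():
            return torch.device("cpu")
        # Query (free, total) memory per device and pick the freest one.
        free = [torch.cuda.mem_get_info(i)[0]
                for i in range(torch.cuda.device_count())]
        return torch.device(f"cuda:{free.index(max(free))}")

    device = pick_gpu()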

torch.Tensor vs torch.tensor

The Iridis setup (PyTorch 1.7.0, CUDA 10.2) complains about torch.tensor(). This can be solved by using torch.Tensor() instead.

Investigate forward compatibility further.
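
For reference, the two calls are not interchangeable; a short illustration of the difference:

    import torch

    # torch.tensor() copies its argument and infers the dtype from the data.
    a = torch.tensor([1.0, 2.0, 3.0])   # dtype inferred as torch.float32
    b = torch.tensor([1, 2, 3])         # dtype inferred as torch.int64

    # torch.Tensor() is the legacy constructor and always yields the default
    # (float) dtype, which is presumably why it silences the Iridis complaint.
    c = torch.Tensor([1, 2, 3])         # torch.float32 regardless of input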

GitPython issues when code is run outside of git directory

This is the traceback when calling the docker image:

Traceback (most recent call last):
  File "/usr/local/bin/bnn_inference", line 5, in <module>
    from bnn_inference.cli import main
  File "/usr/local/lib/python3.10/site-packages/bnn_inference/__init__.py", line 6, in <module>
    from bnn_inference.version import __version__  # noqa: F401
  File "/usr/local/lib/python3.10/site-packages/bnn_inference/version.py", line 17, in <module>
    repo = git.Repo(search_parent_directories=True)
  File "/usr/local/lib/python3.10/site-packages/git/repo/base.py", line 265, in __init__
    raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /data
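
A possible fallback for version.py (a sketch, not the package's actual code): consult git only when a repository is present, and otherwise fall back to a static version string.

    import git

    __version__ = "0.0.0"  # assumed static fallback value
    try:
        repo = git.Repo(search_parent_directories=True)
        __version__ = repo.git.describe("--tags", "--always")
    except git.exc.InvalidGitRepositoryError:
        pass  # e.g. running the installed package from /data in the docker image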

Add log-normal to normal conversion

Target data may follow a non-normal distribution. We can provide a distribution-conversion step when loading the datasets, so that, by ensuring a normally distributed output, we can keep using the mean as the MLE estimator for the prediction distribution.
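
A minimal sketch of the conversion, assuming strictly positive target values:

    import numpy as np

    y = np.random.lognormal(mean=0.0, sigma=1.0, size=1000)  # log-normal targets

    y_log = np.log(y)        # approximately N(0, 1): train on this instead
    # ... fit the network on y_log and predict in log space ...
    y_back = np.exp(y_log)   # invert the transform on the predictions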

Include [index] as first column of exported CSV

The next release should enforce importing/exporting the CSV pandas dataframe for the prediction, intersection, mapping and merge/clip-maps steps.
A missing [index] first column can break part of the bash/XLS pipeline. It is required by the LGA pipeline, even though it serves no real purpose for prediction.
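
A one-line fix sketch with pandas (column and file names are illustrative):

    import pandas as pd

    df = pd.DataFrame({"predicted_A": [0.2, 0.7]})
    # Write the dataframe index as an explicit first column named "index".
    df.to_csv("predictions.csv", index=True, index_label="index")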

Score module tests

Check the validity of both known and newly added scores at the post-processing stage (MSE of p, B-score, FX-family scores, MCC).
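
A minimal sketch of such tests with pytest and scikit-learn (MCC and MSE only; the B-score and FX-family metrics would need their own implementations):

    import numpy as np
    from sklearn.metrics import matthews_corrcoef, mean_squared_error

    def test_mcc_bounds():
        # Perfect agreement yields MCC = 1; fully inverted labels yield -1.
        y_true = np.array([0, 1, 1, 0, 1])
        assert np.isclose(matthews_corrcoef(y_true, y_true), 1.0)
        assert np.isclose(matthews_corrcoef(y_true, 1 - y_true), -1.0)

    def test_mse_of_p():
        # MSE of predicted probabilities against the true labels.
        y_true = np.array([0, 1, 1, 0, 1])
        p_pred = np.array([0.1, 0.9, 0.8, 0.2, 0.7])
        assert mean_squared_error(y_true, p_pred) < 0.05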

Number of Monte Carlo samples must be larger than 1

In order to calculate the uncertainty of the BNN predictions, the number of samples (-s, --samples) must be larger than 1. This bug affects both the training and the inference modules.

Fixing it would require either a validation at the arg-parsing level, or a decision when computing the uncertainty.
SAMPLES=1 could mean that the user wants a prediction from the frozen network, where the NN parameters are not drawn from the Gaussian distribution; instead, the Gaussian mean is used as the expected value of those parameters. This can be achieved by freezing the NN.model.
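
A sketch of the arg-parsing-level validation (the actual CLI wiring may differ):

    import argparse

    def mc_samples(value: str) -> int:
        n = int(value)
        if n < 2:
            raise argparse.ArgumentTypeError(
                "at least 2 Monte Carlo samples are needed to estimate "
                "uncertainty; samples=1 only makes sense for a frozen network")
        return n

    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--samples", type=mc_samples, default=10)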

Embed training time parameters into network dictionary

For quality control and replication purposes, it is convenient to have a copy of training-time information within the network, including fields like the following (see the sketch after this list):

  • hash/commit/revision of the training module
  • environment used for training (incl. host, user, time, date)
  • carbon copy of the invocation
  • config file, if provided?
  • types/names of the output fields (columns), following the same output order
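
A minimal sketch of embedding such metadata next to the state dictionary; all field names are illustrative, not the package's actual checkpoint schema:

    import getpass
    import platform
    import sys
    import time
    import torch

    model = torch.nn.Linear(16, 3)  # stand-in network

    checkpoint = {
        "state_dict": model.state_dict(),
        "meta": {
            "git_hash": "<commit hash of the training module>",
            "host": platform.node(),
            "user": getpass.getuser(),
            "date": time.strftime("%Y-%m-%d %H:%M:%S"),
            "invocation": " ".join(sys.argv),   # carbon copy of the command line
            "output_columns": ["A", "B", "C"],  # same order as the outputs
        },
    }
    torch.save(checkpoint, "trained_bnn.pth")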

Split console and log info about error

Show both the KL-divergence (goodness of fit) and the fitting loss as separate parts of the composite cost function. This can be informative when trying to fit non-normal distributions and when the cost function is heavily unbalanced.
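
A minimal sketch of the split reporting, with placeholder numbers standing in for the two cost terms:

    import logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("bnn_inference")

    fit_loss, kl_loss = 0.042, 0.007  # placeholders for the two terms
    log.info("loss=%.4f (fit=%.4f, KL=%.4f)",
             fit_loss + kl_loss, fit_loss, kl_loss)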

Support logNormal and MinMax scaling as part of network dictionary

Append to PyTorch model + state dictionary the following parameters:

  • logNormal / Normal transformation (boolean)
  • Distribution transformation can be extended to support other non-Gaussian distributions
  • MinMax scaler range to apply (if enabled) after logNormal transformation

Caution: some transformations may impose restrictions on the X input domain (e.g. non-negative numbers for logNormal).

  • Evaluate in which order we should apply these transformations

Export correct target/output labels in results CSV

Retrieve and forward the user-defined names of the columns used as targets/labels during training. Replicate the naming convention, prepending uncertainty_ and predicted_ in the exported dataframe CSV.
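
A sketch of the intended convention, assuming hypothetical user-defined target columns A, B and C:

    import pandas as pd

    target_columns = ["A", "B", "C"]  # as named by the user at training time
    columns = ([f"predicted_{c}" for c in target_columns]
               + [f"uncertainty_{c}" for c in target_columns])
    results = pd.DataFrame([[0.2, 0.5, 0.3, 0.01, 0.02, 0.01]], columns=columns)
    results.to_csv("predictions.csv", index=True, index_label="index")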
