cappelletto / bayesian-inference
License: GNU General Public License v3.0
When using aggregated labels from a classification problem, the error metric can be computed as a loss in the predicted vector space with an imposed norm constraint.
Let's assume we have K=3 classes (A, B, C), and for a given input vector x we want to predict the proportions of A, B and C as a 3x1 vector y. As the expected number of classes is known beforehand, the norm of the vector must be 1.0 (the class proportions sum to 100%).
This can be enforced by normalizing the predicted vector at the output layer, or encoded as part of the loss function (e.g. an RMSE penalty on norm(y)). The predicted proportions are then just the y_a, y_b and y_c components of the (normalized) output vector y.
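A minimal sketch of the output-layer normalization step (function and variable names are illustrative, not the repository's API):

```python
def normalize_proportions(y):
    """Project raw non-negative outputs onto the simplex so they sum to 1.0."""
    total = sum(y)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution over classes.
        return [1.0 / len(y)] * len(y)
    return [v / total for v in y]

# Example: raw network outputs for classes (A, B, C)
raw = [2.0, 1.0, 1.0]
proportions = normalize_proportions(raw)  # [0.5, 0.25, 0.25]
```

The same constraint could instead be kept soft by adding an RMSE penalty on `sum(y) - 1.0` to the loss.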
While running `pip install -U .` or `pip install -U -e .`, metadata generation fails with the following output:
Processing /home/umesh/datadrive/Software/Gitclones/Autoencoder/bnn_inference
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/home/umesh/datadrive/Software/Gitclones/Autoencoder/bnn_inference/setup.py", line 18, in
exec(ver_file.read(), main_ns)
File "", line 18, in
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/refs/symbolic.py", line 219, in _get_object
return Object.new_from_sha(self.repo, hex_to_bin(self.dereference_recursive(self.repo, self.path)))
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/objects/base.py", line 94, in new_from_sha
oinfo = repo.odb.info(sha1)
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/db.py", line 40, in info
hexsha, typename, size = self._git.get_object_header(bin_to_hex(binsha))
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1383, in get_object_header
return self.__get_object_header(cmd, ref)
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1370, in __get_object_header
return self._parse_object_header(cmd.stdout.readline())
File "/home/umesh/anaconda3/envs/blitz/lib/python3.9/site-packages/git/cmd.py", line 1329, in _parse_object_header
raise ValueError("SHA could not be resolved, git returned: %r" % (header_line.strip()))
ValueError: SHA could not be resolved, git returned: b''
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
The current decoupled train/encode implementation is available, but the network architecture definition is not shared between the two.
As pretrained networks are easily detected, we could add a --resume flag that loads such a pretrained network when available.
Caution: the actual training parameters and number of trained epochs may not be available, so this could be used either to accelerate convergence, to recover interrupted training, or to fine-tune models.
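A minimal sketch of such a flag (argparse and the helper below are assumptions; the repository's CLI may differ):

```python
import argparse


def resolve_start_mode(argv, checkpoint_exists):
    """Decide whether to resume from a pretrained checkpoint (illustrative helper).

    `checkpoint_exists` stands in for an os.path.isfile() check on the
    detected pretrained-network path.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--resume", action="store_true",
                        help="Load a pretrained network if one is found")
    args = parser.parse_args(argv)
    return "resume" if (args.resume and checkpoint_exists) else "scratch"
```

On "resume", the training loop would call `model.load_state_dict(...)` before the first epoch.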
After each training, export summary report containing:
Consider using a flag to modify the last-layer architecture to match one-hot encoding with range-limiting activation functions (tanh, swish, sigmoid).
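In PyTorch this would map the flag to nn.Sigmoid/nn.Tanh/nn.SiLU on the last layer; a framework-free sketch of the dispatch (names are illustrative):

```python
import math

# Hypothetical mapping from a CLI flag value to a range-limiting activation.
ACTIVATIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),  # output in (0, 1)
    "tanh": math.tanh,                                # output in (-1, 1)
    "swish": lambda x: x / (1.0 + math.exp(-x)),      # smooth, bounded below
}


def output_layer(logits, activation="sigmoid"):
    """Apply the selected range-limiting function element-wise to the last layer."""
    fn = ACTIVATIONS[activation]
    return [fn(v) for v in logits]
```

For one-hot-style proportion targets, sigmoid keeps each component in (0, 1) before any norm constraint is applied.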
It is possible to use the same CSV file for both the target (labels) file and the input (latents) file. However, the dataLoader object will duplicate entries during the join operation. We could default to forcing a left join when a single input file is provided (or when the file names are duplicated, which is equivalent).
This might require either:
We will always assume that the input CSV (latents) contains the relevant metadata fields we want to propagate to the exported dataframe.
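A left join keyed on a shared index column avoids duplicated rows when latents and labels come from the same file. Sketched here without pandas (with pandas this would be `DataFrame.merge(how="left")`; the key column name is hypothetical):

```python
def left_join(latents, labels, key="relative_path"):
    """Left-join label rows onto latent rows; each latent row appears exactly once."""
    label_by_key = {row[key]: row for row in labels}  # last duplicate wins
    joined = []
    for row in latents:
        merged = dict(row)
        merged.update(label_by_key.get(row[key], {}))
        joined.append(merged)
    return joined
```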
When using multi-GPU environments, allocate the GPU with enough resources to run the current instance. Criteria could be based on allocated memory or threads. This requires the PyTorch API to be bumped to 1.10 or higher; the current 1.7 does not support detailed memory-based queries.
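A sketch of the selection criterion, decoupled from the hardware query (in PyTorch >= 1.10 the per-device numbers could come from `torch.cuda.mem_get_info(device)`):

```python
def pick_gpu(free_mem_mib, required_mib):
    """Return the index of the GPU with the most free memory that fits the job,
    or None if no device has enough.

    free_mem_mib: list of free memory per device (MiB), e.g. queried at startup.
    """
    best = None
    for idx, free in enumerate(free_mem_mib):
        if free >= required_mib and (best is None or free > free_mem_mib[best]):
            best = idx
    return best

# Example: three GPUs reporting free memory in MiB
pick_gpu([1024, 8192, 4096], required_mib=2000)  # -> 1
```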
[refactor] pre-commit requires packages that are missing from requirements.txt
The new scaling-factor argument is only used during the inference (bnn_predict) phase. Include it as part of the user-defined normalization step in the training (bnn_train) module.
When using bnn_predict in a freshly installed conda environment + setup, it fails to find the bnn_inference.tools module.
The first argument of instance methods must be named self. This is considered an error since the convention is so common that you shouldn't break it.
There are 3 occurrences of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/cappelletto/bayesian-inference/issue/PYL-E0213/occurrences/
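The fix is mechanical; a generic illustration (not the repository's actual classes):

```python
class Predictor:
    # Flagged by PYL-E0213: first parameter not named `self`, e.g.
    #   def predict(this, x): return x
    # Fixed: instance methods take `self` first
    # (or use @staticmethod if no instance state is needed).
    def predict(self, x):
        return x
```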
The Iridis setup (PyTorch 1.7.0, CUDA 10.2) complains about torch.tensor(). This can be worked around with torch.Tensor().
Investigate forward compatibility further.
This is the traceback when calling the docker image.
Traceback (most recent call last):
File "/usr/local/bin/bnn_inference", line 5, in <module>
from bnn_inference.cli import main
File "/usr/local/lib/python3.10/site-packages/bnn_inference/__init__.py", line 6, in <module>
from bnn_inference.version import __version__ # noqa: F401
File "/usr/local/lib/python3.10/site-packages/bnn_inference/version.py", line 17, in <module>
repo = git.Repo(search_parent_directories=True)
File "/usr/local/lib/python3.10/site-packages/git/repo/base.py", line 265, in __init__
raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /data
The exit or quit functions don't exist at top level if Python is started with the -S flag, and will raise an error. Use sys.exit() instead.
There are 4 occurrences of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/cappelletto/bayesian-inference/issue/PYL-R1722/occurrences/
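The replacement is a one-liner wherever the issue occurs:

```python
import sys


def main():
    # ... do work ...
    # Prefer sys.exit() over the site-provided exit()/quit(),
    # which are absent when Python is started with -S.
    sys.exit(0)
```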
Target data may follow a non-normal distribution. We can provide a distribution-conversion step when loading the datasets, so we can keep using the mean as the MLE estimator for the prediction distribution by ensuring a normally distributed output.
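One simple conversion for positively skewed targets is a log transform applied at load time and inverted after prediction (a sketch; the actual transform family is an open choice, and the epsilon offset is an assumption):

```python
import math


def to_gaussian(y, eps=1e-9):
    """Forward transform: log-map positive, right-skewed targets toward normality."""
    return [math.log(v + eps) for v in y]


def from_gaussian(z, eps=1e-9):
    """Inverse transform applied to predictions (including the mean estimator)."""
    return [math.exp(v) - eps for v in z]
```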
Modern package-creation pipelines require adhering to pyproject.toml-based solutions instead of the deprecated setup.py.
Please refer to:
setup.py will be deprecated in the 0.2.X series.
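A minimal pyproject.toml sketch for the migration (metadata values are placeholders inferred from the tracebacks above, not authoritative):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "bnn_inference"
version = "0.0.0"            # static version replaces the git-derived one in setup.py
requires-python = ">=3.9"
license = { text = "GPL-3.0-only" }

[project.scripts]
bnn_inference = "bnn_inference.cli:main"   # entry point seen in the Docker traceback
```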
The next release should enforce import/export of CSV pandas dataframes for prediction, intersection, mapping and merge/clip maps.
A missing [index] first column can break part of the bash/XLS pipeline. It is required by the LGA pipeline, but has no real functionality for prediction.
Check the validity of known and newer scores at the post-processing stage (MSE of p, B-score, FX-family scores, MCC).
In order to calculate the uncertainty of the BNN predictions, the number of samples (-s, --samples) must be larger than 1. This bug affects both the training and the inference modules.
It would require either validation at the arg-parsing level, or making a decision when computing the uncertainty.
SAMPLES=1 could mean that the user wants a prediction from the frozen network, where the NN parameters are not drawn from the Gaussian distribution but the Gaussian mean is used as their expected value. This can be achieved by freezing the NN model.
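A sketch of the arg-parsing-level option (argparse is an assumption; the project's real CLI may use a different parser, and the `frozen` attribute is illustrative):

```python
import argparse


def parse_samples(argv):
    """Validate --samples at parse time: uncertainty needs more than one draw."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--samples", type=int, default=10)
    args = parser.parse_args(argv)
    if args.samples < 1:
        parser.error("--samples must be >= 1")
    # samples == 1 is allowed, but interpreted as a frozen-network
    # point prediction with no uncertainty estimate.
    args.frozen = args.samples == 1
    return args
```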
Check MWS at https://github.com/shuds13/pyexample/blob/master/setup.py
Match oplab setup approach https://github.com/ocean-perception/georef_semantics/blob/main/setup.py
For quality control and replication purposes, it is convenient to store a copy of training-time information within the network, including fields like:
Show both the KL divergence (goodness of fit) and the fitting loss as parts of the composite cost function. This can be informative when trying to fit non-normal distributions and when the cost function is heavily unbalanced.
Python version must be bumped up to 3.9 or 3.11.
Check deps
There is 1 occurrence of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/cappelletto/bayesian-inference/issue/SH-2061/occurrences/
Append to PyTorch model + state dictionary the following parameters:
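A hedged sketch of bundling such metadata with the checkpoint (with PyTorch, this dict would be passed to torch.save; field names are illustrative):

```python
import time


def build_checkpoint(state_dict, config):
    """Wrap the raw state dict with training-time metadata for replication."""
    return {
        "state_dict": state_dict,
        "meta": {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
            "config": config,  # e.g. epochs, lr, samples, scaling factor
            "format_version": 1,
        },
    }


ckpt = build_checkpoint({"w": [0.1]}, {"epochs": 100, "lr": 1e-3})
```

On load, the inference side can then recover the scaling factor and sampling settings without external files.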
Caution: some transformations may impose restrictions on the X input domain (e.g. non-negative numbers for log-normal).
Follow the oplab pattern for user-defined external configuration via a YAML file.
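An illustrative YAML layout (keys are assumptions, not the actual oplab schema):

```yaml
training:
  epochs: 100
  samples: 10
  scaling_factor: 1.0
inference:
  checkpoint: trained_bnn.pth
  output_csv: predictions.csv
```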
Retrieve and forward the user-defined names of the columns used as target/label during training. Replicate the naming convention, prepending uncertainty_ and predicted_ in the exported dataframe CSV.
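The prefixes come from the issue text; the helper itself is illustrative:

```python
def output_columns(target_columns):
    """Derive exported column names by prefixing each user-defined target column."""
    predicted = [f"predicted_{c}" for c in target_columns]
    uncertainty = [f"uncertainty_{c}" for c in target_columns]
    return predicted + uncertainty


output_columns(["seafloor_class"])
# -> ['predicted_seafloor_class', 'uncertainty_seafloor_class']
```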