
deeprob-org / deeprob-kit


A Python Library for Deep Probabilistic Modeling

Home Page: https://deeprob-kit.readthedocs.io/en/latest/

License: MIT License

Languages: Python 99.64%, Makefile 0.16%, Shell 0.20%
Topics: normalizing-flows, sum-product-networks, probabilistic-models, probabilistic-circuits

deeprob-kit's People

Contributors

fedous, gengala, loreloc



deeprob-kit's Issues

Create TreeBN class

Most of the code in BinaryCLT actually works for any tree-shaped Bayesian Network. Therefore, it would be better to create a superclass called TreeBN and make BinaryCLT a subclass of it.
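
A rough sketch of the proposed hierarchy is given below; the attribute names are only illustrative assumptions and do not reflect BinaryCLT's actual interface.

class TreeBN:
    """Generic tree-shaped Bayesian Network: a scope, a predecessor list and CPTs."""

    def __init__(self, scope, tree, params=None):
        self.scope = scope    # the variable ids
        self.tree = tree      # tree[i] = parent of variable i, or -1 for the root
        self.params = params  # the conditional probability tables

class BinaryCLT(TreeBN):
    """Chow-Liu Tree over binary variables: structure learning stays here, while
    the generic tree routines (e.g. evaluation, sampling) move up to TreeBN."""
    pass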

Example `plot_spn.py` raises an error

I was trying to run plot_spn.py, but the code raises an error. Here's the output:

Plotting the dummy SPN to spn-dummy.svg ...
Traceback (most recent call last):
  File ".../deeprob-kit/examples/spn_plot.py", line 25, in <module>
    spn.plot_spn(root, spn_filename)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/deeprob/spn/structure/io.py", line 317, in plot_spn
    pos = nx_pydot.graphviz_layout(graph, prog='dot')
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 357, in graphviz_layout
    return pydot_layout(G=G, prog=prog, root=root)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 406, in pydot_layout
    P = to_pydot(G)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 263, in to_pydot
    raise ValueError(
ValueError: Node names and attributes should not contain ":" unless they are quoted with "".                For example the string 'attribute:data1' should be written as '"attribute:data1"'.                Please refer https://github.com/pydot/pydot/issues/258
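
A possible client-side workaround, assuming the offending ":" characters come from the node labels built by plot_spn, is to quote those labels before the pydot layout is computed; the helper below is only an untested sketch (the proper fix probably belongs in deeprob/spn/structure/io.py itself).

import networkx as nx

def quote_colon_nodes(graph: nx.DiGraph) -> nx.DiGraph:
    """Return a copy of the graph where node names containing ':' are quoted,
    as pydot requires (see pydot issue #258)."""
    mapping = {
        n: f'"{n}"' if isinstance(n, str) and ':' in n and not n.startswith('"') else n
        for n in graph.nodes
    }
    return nx.relabel_nodes(graph, mapping)

The sanitized graph can then be passed to nx_pydot.graphviz_layout as usual.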

Set up a "deeprob-kit-docs" repository or a "gh-pages" branch for automatically versioned documentation

Set up a new repository deeprob-org/deeprob-kit-docs or the special gh-pages branch containing the versioned documentation.
Refer to sphinx-multiversion for building versioned documentation.
In particular, refer to a fork of sphinx-multiversion supporting sphinx-apidoc and sphinx-autodoc.

Finally, set up a GitHub Action to automatically push new documentation versions when:

  1. A push to the main branch is made.
  2. A new tag/release is pushed.

Alternatively, this could also be done using Travis CI.

Implement the FID score for normalizing flows

Implement the FID score for generative models. A suitable module for the fid_score function is deeprob.utils.statistics.

Moreover, include the FID score, alongside the BPP (bits-per-pixel) metric, in the results reported by the normalizing flow experiments.
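
For reference, given features extracted from a fixed network (e.g. Inception) for real and generated samples, the FID is the Fréchet distance between the two fitted Gaussians. A minimal sketch, assuming the feature matrices are already available, could look like this:

import numpy as np
from scipy import linalg

def fid_score(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    sigma1 = np.cov(feat_real, rowvar=False)
    sigma2 = np.cov(feat_fake, rowvar=False)
    diff = mu1 - mu2
    # Matrix square root of the product of the covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts due to numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

Feature extraction itself is out of scope here and would be handled by the experiments code.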

Refactor Unit Tests

  • Refactor tests to use pytest instead of unittest (a minimal sketch follows below)
  • Add tests for shape checking
  • Introduce Continuous Integration (CI), e.g. a GitHub Action that uploads coverage to Codecov on merges to main
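
As a minimal sketch of the proposed pytest style, the parametrized test below checks output shapes; the dummy_log_likelihood function is a placeholder for the actual deeprob inference routine.

import numpy as np
import pytest

@pytest.mark.parametrize("batch_size", [1, 16, 128])
def test_log_likelihood_shape(batch_size):
    # Placeholder for the real model call; it only illustrates the shape check.
    def dummy_log_likelihood(x: np.ndarray) -> np.ndarray:
        return np.zeros(len(x))

    x = np.random.rand(batch_size, 5)
    ll = dummy_log_likelihood(x)
    assert ll.shape == (batch_size,)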

Fully differentiable MAFs

The method "apply_forward" of the class "AutoregressiveLayer" is not differentiable because of in-place operations. This makes training in the flow's sampling direction impossible.
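
One possible way to avoid the in-place writes, sketched below under the assumption that the conditioner returns the concatenated (mu, log_sigma) outputs, is to build the result dimension by dimension with torch.cat so that autograd can track every step. This is not deeprob's actual implementation, only an illustration of the idea.

import torch
import torch.nn as nn

def autoregressive_sample(u: torch.Tensor, made: nn.Module) -> torch.Tensor:
    """Map base samples u to data x one dimension at a time, without in-place writes."""
    batch, dim = u.shape
    xs = []
    for i in range(dim):
        # Dimensions >= i do not influence mu_i and log_sigma_i thanks to the MADE masks,
        # so they can safely be padded with zeros.
        pad = torch.zeros(batch, dim - i, dtype=u.dtype, device=u.device)
        x_so_far = torch.cat(xs + [pad], dim=1)
        mu, log_sigma = made(x_so_far).chunk(2, dim=1)
        xs.append(u[:, i:i + 1] * torch.exp(log_sigma[:, i:i + 1]) + mu[:, i:i + 1])
    return torch.cat(xs, dim=1)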

Introduce multithreaded implementation of forward and backward evaluation of SPNs

  • The forward evaluation (used for EVI, MAR and MPE queries and sampling) can be parallelized by considering a layered topological ordering of the SPN graph. That is, every leaf node can be evaluated in parallel and, after that, every parent node can be computed in parallel as well, and so on.
  • The backward evaluation (used for MPE query and sampling) can be parallelized by considering a layered topological ordering of the SPN graph, as for forward evaluation.
  • Moreover, introduce unit tests ensuring the correctness of the implementation.

A suitable library for this task is joblib, which allows specifying 'threading' as a lightweight backend.
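
A minimal sketch of the layered forward pass, assuming a hypothetical list of topological layers (leaves first) and an evaluate_node callback that stores each node's value in a shared cache:

from joblib import Parallel, delayed

def forward_layered(layers, evaluate_node):
    """Evaluate the SPN bottom-up, one topological layer at a time.
    Nodes within the same layer have no mutual dependencies, so they can run concurrently."""
    with Parallel(n_jobs=-1, backend='threading') as parallel:
        for layer in layers:  # leaves first, root last
            parallel(delayed(evaluate_node)(node) for node in layer)

The backward pass would follow the same scheme, visiting the layers in reverse order.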

On flows, the mean and standard deviation of the default base distribution are not kept constant during training

When training a normalizing flow with a standard Gaussian base distribution (i.e. the default in_base=None), the mean and standard deviation are not kept constant. The expected behavior is that they remain constant during training.

This is probably due to a wrong initialization of mean and standard deviation parameters: https://github.com/deeprob-org/deeprob-kit/blob/main/deeprob/flows/models/base.py#L52-L53.
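
If the cause is indeed that the mean and standard deviation are registered as trainable parameters, one possible fix is to register them as buffers instead, so the optimizer never updates them. The module below is only an illustrative sketch, not deeprob's actual base-distribution code.

import torch
import torch.nn as nn

class StandardGaussianBase(nn.Module):
    """Standard Gaussian base distribution with fixed location and scale."""

    def __init__(self, in_features: int):
        super().__init__()
        # Buffers follow .to(device) and appear in the state_dict, but they are
        # not nn.Parameter, so they receive no gradients and stay constant.
        self.register_buffer('loc', torch.zeros(in_features))
        self.register_buffer('scale', torch.ones(in_features))

    def log_prob(self, x: torch.Tensor) -> torch.Tensor:
        return torch.distributions.Normal(self.loc, self.scale).log_prob(x).sum(dim=1)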

Write a README.md file for each sub-directory

Split the README.md file at the root directory into multiple Markdown files discussing the content (and usage) of the scripts in the following directories:

  • benchmark
  • docs
  • examples
  • experiments

Set up PyLint

Set up the PyLint static code analyser.
Also, set up a GitHub Action to automatically produce a report about code quality.

Unclear where experiments folder is

I installed deeprob-kit using pip:

$ pip install --user deeprob-kit

Now, I want to try out the experiments to see if the code works on my system. However, I do not seem to have the experiments folder, and therefore I cannot run the experiments or put the datasets in place. Namely, what I have after installation is the following tree:

~/.local/lib/python3.9 $ tree -L 3
.
└── site-packages
    ├── deeprob
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── context.py
    │   ├── flows
    │   ├── spn
    │   ├── torch
    │   └── utils
    └── deeprob_kit-1.0.0.dist-info
        ├── INSTALLER
        ├── LICENSE
        ├── METADATA
        ├── RECORD
        ├── REQUESTED
        ├── WHEEL
        └── top_level.txt

8 directories, 9 files

My impression is that the bundle on PyPI only contains deeprob-kit itself, without any of the other materials. Perhaps putting the experiments folder (and the others) under deeprob may provide a solution, but I guess you chose the current structure for a reason. Or perhaps I am looking in the wrong location.

Add a string flag "method" to SPN learning wrappers

Add a string flag method to the learn_estimator function (in the module deeprob.spn.learning.wrappers) that allows choosing between different SPN learning algorithms.

At the moment, the method flag must support two values: learnspn and learnxpc, corresponding to the LearnSPN and LearnXPC algorithms respectively.
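
A hypothetical sketch of the dispatch is shown below; the signature and the stub learners are assumptions, since the real routines live in deeprob.spn.learning.

from typing import Callable, Dict

def _learn_spn_stub(data, **kwargs):
    raise NotImplementedError  # placeholder for the real LearnSPN routine

def _learn_xpc_stub(data, **kwargs):
    raise NotImplementedError  # placeholder for the real LearnXPC routine

def learn_estimator(data, method: str = 'learnspn', **kwargs):
    """Choose an SPN structure learning algorithm by name."""
    learners: Dict[str, Callable] = {
        'learnspn': _learn_spn_stub,
        'learnxpc': _learn_xpc_stub,
    }
    try:
        learn_fn = learners[method]
    except KeyError:
        raise ValueError(f"Unknown SPN learning method: {method!r}") from None
    return learn_fn(data, **kwargs)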

Update README.md and fix implicit imports

  • Update the table of implemented models in README.md
  • Add NormalizingFlow abstract class import in flows/models/__init__.py
  • Add RatSpn abstract class import in spn/models/__init__.py
  • Fix the 'type' object is not subscriptable error when using Sphinx
  • Prepend MIT license information to every source file in deeprob/

Feedback on the example running experience

I ran all the examples. They are a nice way of testing how the code runs on one's computer and of showing its capabilities. Below, I provide some feedback and suggestions that may improve the experience people have when running the examples. Some of this feedback may also be relevant to other parts of the code base.

  1. Often, files are created as part of an example, such as the nice illustrative figures. It would be useful to alert the user to all files being created, so that they are aware of this even if they do not keep an eye on their working folder. Also, some files have an unclear purpose (such as the .pt files). Clarifying their use when alerting that they are created would therefore be useful. (If they are temporary files, delete them at the end of the example or use the tempfile module.)
  2. The console output provides useful information about the time it takes to run an example. If possible, generalize this to all examples that are not trivially short. (I think the first stage of spn_latent_mnist.py does not.)
  3. The numerical values in the console output often display a large number of digits. There is little reason to believe that many of them are actually significant. Furthermore, it makes the output more difficult to read and digest. Ideally, output only significant digits, but if you do not know how many digits are significant, 4 digits in total is a good upper bound (like 57.63 %, 1234, 1.234e6).
  4. Many of the console output numbers have units (s, it/s, batch/s). The international standard is to always have a space between a number and its unit.
  5. Sometimes, JSON output is created either as console output or in files. Try to pretty-print it a bit, to make it easier to scan. If it is not meant to be read, perhaps consider omitting it.
  6. For many of the examples, you generate images, which is great. It would add value to have every example generate some image, even if it is not a sample. Namely, the examples can also give users of the package inspiration about the type of images that they might generate.
  7. In one case, an image was generated in an interactive window (nvp1d_moons.py) and not in an image file. That is nice. Could it be generalized to all examples, with a fallback to image file generation?
  8. In two cases, the examples automatically downloaded some datasets. While convenient, some users might not expect this, may not like it, or may not have an internet connection. I think it would be more user-friendly to ask first, or to instruct the user where to download the dataset. Furthermore, I saw that MNIST was downloaded from LeCun's original website, whose author explicitly requests not to do that (“Please refrain from accessing these files from automated scripts with high frequency. Make copies!”); it would be polite to honor that request. In general, make sure to download from permanent repositories if possible instead of possibly non-permanent websites.
  9. The console output lists accuracy percentages. These generally are quite a bit closer to 100 % than to 0 %. Therefore, the initial digit (7, 8, 9) is often not very significant and therefore distracting. It is more user friendly to use error rate instead, so, e.g., [12.49, 8.66, 4.57] instead of [87.51, 91.34, 95.43].

Obviously, these are mostly cosmetic suggestions, so I'd understand that you classify (parts of) this issue as ‘wontfix’.
