
deeprob-org / deeprob-kit


A Python Library for Deep Probabilistic Modeling

Home Page: https://deeprob-kit.readthedocs.io/en/latest/

License: MIT License

Languages: Python 99.64%, Makefile 0.16%, Shell 0.20%
Topics: normalizing-flows, sum-product-networks, probabilistic-models, probabilistic-circuits

deeprob-kit's People

Contributors

fedous, gengala, loreloc



deeprob-kit's Issues

Create TreeBN class

Most of the code in BinaryCLT actually works for any tree-shaped Bayesian Network. Therefore, it would be better to create a superclass called TreeBN and make BinaryCLT a subclass of it.
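
A rough sketch of the proposed hierarchy is given below; the attribute names are only illustrative assumptions and do not reflect BinaryCLT's actual interface.

class TreeBN:
    """Generic tree-shaped Bayesian Network: a scope, a predecessor list and CPTs."""

    def __init__(self, scope, tree, params=None):
        self.scope = scope    # the variable ids
        self.tree = tree      # tree[i] = parent of variable i, or -1 for the root
        self.params = params  # the conditional probability tables

class BinaryCLT(TreeBN):
    """Chow-Liu Tree over binary variables: structure learning stays here, while
    the generic tree routines (e.g. evaluation, sampling) move up to TreeBN."""
    pass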

Example `plot_spn.py` raises an error

I was trying to run plot_spn.py, but the code raises an error. Here's the output:

Plotting the dummy SPN to spn-dummy.svg ...
Traceback (most recent call last):
  File ".../deeprob-kit/examples/spn_plot.py", line 25, in <module>
    spn.plot_spn(root, spn_filename)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/deeprob/spn/structure/io.py", line 317, in plot_spn
    pos = nx_pydot.graphviz_layout(graph, prog='dot')
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 357, in graphviz_layout
    return pydot_layout(G=G, prog=prog, root=root)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 406, in pydot_layout
    P = to_pydot(G)
  File ".../miniconda3/envs/deeprob/lib/python3.9/site-packages/networkx/drawing/nx_pydot.py", line 263, in to_pydot
    raise ValueError(
ValueError: Node names and attributes should not contain ":" unless they are quoted with "".                For example the string 'attribute:data1' should be written as '"attribute:data1"'.                Please refer https://github.com/pydot/pydot/issues/258
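
A possible client-side workaround, assuming the offending ":" characters come from the node labels built by plot_spn, is to quote those labels before the pydot layout is computed; the helper below is only an untested sketch (the proper fix probably belongs in deeprob/spn/structure/io.py itself).

import networkx as nx

def quote_colon_nodes(graph: nx.DiGraph) -> nx.DiGraph:
    """Return a copy of the graph where node names containing ':' are quoted,
    as pydot requires (see pydot issue #258)."""
    mapping = {
        n: f'"{n}"' if isinstance(n, str) and ':' in n and not n.startswith('"') else n
        for n in graph.nodes
    }
    return nx.relabel_nodes(graph, mapping)

The sanitized graph can then be passed to nx_pydot.graphviz_layout as usual.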

Set up a "deeprob-kit-docs" repository or a "gh-pages" branch for automatically versioned documentation

Set up a new repository deeprob-org/deeprob-kit-docs or the special gh-pages branch containing the versioned documentation.
Refer to sphinx-multiversion for building versioned documentation.
In particular, refer to a fork of sphinx-multiversion supporting sphinx-apidoc and sphinx-autodoc.

Finally, set up a GitHub Action to automatically push new documentation versions when:

  1. A push to the main branch is made.
  2. A new tag/release is pushed.

Alternatively, this could also be done using Travis CI.

Implement the FID score for normalizing flows

Implement the FID score for generative models. A suitable module for the fid_score function is deeprob.utils.statistics.

Moreover, include the FID score, alongside the BPP (bits-per-pixel) metric, in the results reported by the normalizing flow experiments.
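
For reference, given features extracted from a fixed network (e.g. Inception) for real and generated samples, the FID is the Fréchet distance between the two fitted Gaussians. A minimal sketch, assuming the feature matrices are already available, could look like this:

import numpy as np
from scipy import linalg

def fid_score(feat_real: np.ndarray, feat_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature arrays."""
    mu1, mu2 = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    sigma1 = np.cov(feat_real, rowvar=False)
    sigma2 = np.cov(feat_fake, rowvar=False)
    diff = mu1 - mu2
    # Matrix square root of the product of the covariances.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts due to numerical error
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

Feature extraction itself is out of scope here and would be handled by the experiments code.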

Refactor Unit Tests

  • Refactor tests to use pytest instead of unittest (a minimal sketch follows below)
  • Add tests for shape checking
  • Introduce Continuous Integration (CI), e.g. a GitHub Action that uploads coverage to Codecov on merges to main
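
As a minimal sketch of the proposed pytest style, the parametrized test below checks output shapes; the dummy_log_likelihood function is a placeholder for the actual deeprob inference routine.

import numpy as np
import pytest

@pytest.mark.parametrize("batch_size", [1, 16, 128])
def test_log_likelihood_shape(batch_size):
    # Placeholder for the real model call; it only illustrates the shape check.
    def dummy_log_likelihood(x: np.ndarray) -> np.ndarray:
        return np.zeros(len(x))

    x = np.random.rand(batch_size, 5)
    ll = dummy_log_likelihood(x)
    assert ll.shape == (batch_size,)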

Fully differentiable MAFs

The method "apply_forward" of the class "AutoregressiveLayer" is not differentiable because of in-place operations. This makes training in the flow's sampling direction impossible.
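
One possible way to avoid the in-place writes, sketched below under the assumption that the conditioner returns the concatenated (mu, log_sigma) outputs, is to build the result dimension by dimension with torch.cat so that autograd can track every step. This is not deeprob's actual implementation, only an illustration of the idea.

import torch
import torch.nn as nn

def autoregressive_sample(u: torch.Tensor, made: nn.Module) -> torch.Tensor:
    """Map base samples u to data x one dimension at a time, without in-place writes."""
    batch, dim = u.shape
    xs = []
    for i in range(dim):
        # Dimensions >= i do not influence mu_i and log_sigma_i thanks to the MADE masks,
        # so they can safely be padded with zeros.
        pad = torch.zeros(batch, dim - i, dtype=u.dtype, device=u.device)
        x_so_far = torch.cat(xs + [pad], dim=1)
        mu, log_sigma = made(x_so_far).chunk(2, dim=1)
        xs.append(u[:, i:i + 1] * torch.exp(log_sigma[:, i:i + 1]) + mu[:, i:i + 1])
    return torch.cat(xs, dim=1)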

Introduce multithreaded implementation of forward and backward evaluation of SPNs

  • The forward evaluation (used for EVI, MAR and MPE queries and sampling) can be parallelized by considering a layered topological ordering of the SPN graph. That is, every leaf node can be evaluated in parallel and, after that, every parent node can be computed in parallel as well, and so on.
  • The backward evaluation (used for MPE query and sampling) can be parallelized by considering a layered topological ordering of the SPN graph, as for forward evaluation.
  • Moreover, introduce unit tests ensuring the correctness of the implementation.

A suitable library for this task is joblib, which allows specifying 'threading' as a lightweight backend.
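
A minimal sketch of the layered forward pass, assuming a hypothetical list of topological layers (leaves first) and an evaluate_node callback that stores each node's value in a shared cache:

from joblib import Parallel, delayed

def forward_layered(layers, evaluate_node):
    """Evaluate the SPN bottom-up, one topological layer at a time.
    Nodes within the same layer have no mutual dependencies, so they can run concurrently."""
    with Parallel(n_jobs=-1, backend='threading') as parallel:
        for layer in layers:  # leaves first, root last
            parallel(delayed(evaluate_node)(node) for node in layer)

The backward pass would follow the same scheme, visiting the layers in reverse order.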

On flows, the mean and standard deviation of the default base distribution are not kept constant during training

When training a normalizing flow with a standard Gaussian base distribution (i.e. the default in_base=None), the mean and standard deviation are not kept constant. The expected behavior is that they remain constant during training.

This is probably due to a wrong initialization of mean and standard deviation parameters: https://github.com/deeprob-org/deeprob-kit/blob/main/deeprob/flows/models/base.py#L52-L53.
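
If the cause is indeed that the mean and standard deviation are registered as trainable parameters, one possible fix is to register them as buffers instead, so the optimizer never updates them. The module below is only an illustrative sketch, not deeprob's actual base-distribution code.

import torch
import torch.nn as nn

class StandardGaussianBase(nn.Module):
    """Standard Gaussian base distribution with fixed location and scale."""

    def __init__(self, in_features: int):
        super().__init__()
        # Buffers follow .to(device) and appear in the state_dict, but they are
        # not nn.Parameter, so they receive no gradients and stay constant.
        self.register_buffer('loc', torch.zeros(in_features))
        self.register_buffer('scale', torch.ones(in_features))

    def log_prob(self, x: torch.Tensor) -> torch.Tensor:
        return torch.distributions.Normal(self.loc, self.scale).log_prob(x).sum(dim=1)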

Write a README.md file for each sub-directory

Split the README.md file at the root directory into multiple Markdown files discussing the content (and usage) of the scripts in the following directories:

  • benchmark
  • docs
  • examples
  • experiments

Set up PyLint

Set up the PyLint static code analyser.
Also, set up a GitHub Action to automatically produce a report about code quality.

Unclear where experiments folder is

I installed deeprob-kit using pip:

$ pip install --user deeprob-kit

Now, I want to try out the experiments to see if the code works on my system. However, I do not seem to have the experiments folder, and therefore I cannot run the experiments or put the datasets in place. Namely, what I have after installation is the following tree:

~/.local/lib/python3.9 $ tree -L 3
.
└── site-packages
    ├── deeprob
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── context.py
    │   ├── flows
    │   ├── spn
    │   ├── torch
    │   └── utils
    └── deeprob_kit-1.0.0.dist-info
        ├── INSTALLER
        ├── LICENSE
        ├── METADATA
        ├── RECORD
        ├── REQUESTED
        ├── WHEEL
        └── top_level.txt

8 directories, 9 files

My impression is that the bundle on PyPI only contains deeprob-kit itself, without any of the other materials. Perhaps putting the experiments folder (and the others) under deeprob may provide a solution, but I guess you chose the current structure for a reason. Or perhaps I am looking in the wrong location.

Add a string flag "method" to SPN learning wrappers

Add a string flag method to the learn_estimator function (in the module deeprob.spn.learning.wrappers) that allows choosing between different SPN learning algorithms.

At the moment, the method flag must support two values: learnspn and learnxpc, corresponding to the LearnSPN and LearnXPC algorithms respectively.
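
A hypothetical sketch of the dispatch is shown below; the signature and the stub learners are assumptions, since the real routines live in deeprob.spn.learning.

from typing import Callable, Dict

def _learn_spn_stub(data, **kwargs):
    raise NotImplementedError  # placeholder for the real LearnSPN routine

def _learn_xpc_stub(data, **kwargs):
    raise NotImplementedError  # placeholder for the real LearnXPC routine

def learn_estimator(data, method: str = 'learnspn', **kwargs):
    """Choose an SPN structure learning algorithm by name."""
    learners: Dict[str, Callable] = {
        'learnspn': _learn_spn_stub,
        'learnxpc': _learn_xpc_stub,
    }
    try:
        learn_fn = learners[method]
    except KeyError:
        raise ValueError(f"Unknown SPN learning method: {method!r}") from None
    return learn_fn(data, **kwargs)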

Update README.md and fix implicit imports

  • Update the table of implemented models in README.md
  • Add NormalizingFlow abstract class import in flows/models/__init__.py
  • Add RatSpn abstract class import in spn/models/__init__.py
  • Fix the 'type' object is not subscriptable error when using Sphinx
  • Prepend MIT license information to every source file in deeprob/

Feedback on the example running experience

I ran all the examples. They are a nice way of testing how the code runs on one's computer and of showing its capabilities. Below, I provide some feedback and suggestions that may improve the experience people have when running the examples. Some of this feedback may also be relevant to other parts of the code base.

  1. Often, files are created as part of an example, such as the nice illustrative figures. It would be useful to alert the user to all files being created, so that they are aware of this even if they do not keep an eye on their working folder. Also, some files have an unclear purpose (such as the .pt files). Clarifying their use when alerting that they are created would therefore be useful. (If they are temporary files, delete them at the end of the example or use the tempfile module.)
  2. The console output provides useful information about the time it takes to run an example. If possible, generalize this to all examples that are not trivially short. (I think the first stage of spn_latent_mnist.py does not.)
  3. The numerical values in the console output often display a large number of digits. There is little reason to believe that many of them are actually significant. Furthermore, it makes the output more difficult to read and digest. Ideally, output only significant digits, but if you do not know how many digits are significant, 4 digits in total is a good upper bound (like 57.63 %, 1234, 1.234e6).
  4. Many of the console output numbers have units (s, it/s, batch/s). The international standard is to always have a space between a number and its unit.
  5. Sometimes, JSON output is created either as console output or in files. Try to pretty-print it a bit, to make it easier to scan. If it is not meant to be read, perhaps consider omitting it.
  6. For many of the examples, you generate images, which is great. It would add value to have every example generate some image, even if it is not a sample. Namely, the examples can also give users of the package inspiration about the type of images that they might generate.
  7. In one case, an image was generated in an interactive window (nvp1d_moons.py) and not in an image file. That is nice. Could it be generalized to all examples, with a fallback to image file generation?
  8. In two cases, the examples automatically downloaded some datasets. While convenient, some users might not expect this, may not like it, or may not have an internet connection. I think it would be more user-friendly to ask first, or to instruct the user where to download the dataset. Furthermore, I saw that MNIST was downloaded from LeCun's original website, whose author explicitly requests not to do that (“Please refrain from accessing these files from automated scripts with high frequency. Make copies!”); it would be polite to honor that request. In general, make sure to download from permanent repositories if possible instead of possibly non-permanent websites.
  9. The console output lists accuracy percentages. These generally are quite a bit closer to 100 % than to 0 %. Therefore, the initial digit (7, 8, 9) is often not very significant and therefore distracting. It is more user friendly to use error rate instead, so, e.g., [12.49, 8.66, 4.57] instead of [87.51, 91.34, 95.43].

Obviously, these are mostly cosmetic suggestions, so I'd understand that you classify (parts of) this issue as ‘wontfix’.
