mspronesti / baylib

High-performance library for approximate inference on discrete Bayesian networks on GPU and CPU

License: GNU General Public License v3.0

C++ 87.93% CMake 2.16% Shell 0.35% Cuda 9.56%
inference parallel multithreading gpgpu-computing bayesian-inference bayesian-network mcmc mcmc-methods boost cuda

baylib's Introduction

baylib C++ library

Baylib is a parallel inference library for discrete Bayesian networks that supports approximate inference algorithms on both CPU and GPU.

Main features

Here's a list of the main requested features:

  • Copy-On-Write semantics for the graph data structure, including the conditional probability table (CPT) of each node
  • parallel implementation of the algorithms using either C++17 threads or GPGPU optimization
  • GPGPU optimization implemented with OpenCL (using boost compute) and with CUDA
  • template-based classes for probability format
  • input compatibility with the XDSL format provided by the SMILE library
  • cmake-based deployment
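
The copy-on-write idea in the first bullet can be sketched as follows. This is a minimal, single-threaded illustration and not baylib's actual implementation: the names cow_ptr, read, write, shares_with and cpt_table are hypothetical and chosen only for this example. Copies of a handle share one payload until one of them asks for write access, at which point that handle receives its own private copy.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write handle; not baylib's actual class.
template <typename T>
class cow_ptr {
public:
    explicit cow_ptr(T value) : data_(std::make_shared<T>(std::move(value))) {}

    // Read access never copies the payload.
    const T& read() const {
        assert(data_ != nullptr);  // a moved-from handle owns nothing
        return *data_;
    }

    // Write access clones the payload first if other handles share it.
    T& write() {
        assert(data_ != nullptr);
        if (data_.use_count() > 1) {
            data_ = std::make_shared<T>(*data_);
        }
        return *data_;
    }

    // True when both handles point at the same underlying storage.
    bool shares_with(const cow_ptr& other) const { return data_ == other.data_; }

private:
    std::shared_ptr<T> data_;
};

// A CPT stored as a flat vector of probabilities (illustrative alias).
using cpt_table = std::vector<double>;
```

With this scheme, copying a handle costs one pointer copy, and the table itself is duplicated only when one of the copies is modified. A thread-safe version would need more care than the use_count() check shown here.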

Currently supported algorithms

  • Gibbs Sampling - C++11 threads
  • Likelihood Weighting - C++11 threads
  • Logic Sampling - GPGPU with boost compute
  • Rejection Sampling - C++11 threads
  • Adaptive importance sampling - C++11 threads, GPGPU with boost compute
algorithm | evidence | deterministic nodes | multi-threading | GPGPU - OpenCL | GPGPU - CUDA
gibbs sampling *
likelihood weighting
logic sampling
rejection sampling
adaptive importance sampling

*It's a very well-known limitation of the Gibbs sampling approach

Dependencies

  • cmake >= 2.8
  • boost >= 1.65
  • libtbb
  • [optional] ocl-icd-opencl
  • [optional] mesa-opencl-icd

To use the CUDA algorithms, the system must be CUDA-compatible and the corresponding CUDA toolkit must be installed.

Under Linux, you can install the required dependencies using the provided script install_dependencies.sh as follows

cd scripts/
chmod u+x install_dependencies.sh
./install_dependencies.sh

Install baylib

Using the cmake FetchContent directives, you can set up baylib directly as follows

include(FetchContent)

FetchContent_Declare(
        baylib
        GIT_REPOSITORY https://github.com/mspronesti/baylib.git
)

FetchContent_MakeAvailable(baylib)
# create your executable 
# and whatever you need for
# your project ...
target_link_libraries(<your_executable> baylib)

Alternatively, under Linux or macOS, you can run the provided script install.sh as follows

cd scripts/
chmod u+x install.sh
sudo ./install.sh

Instead of using the script, you can also run the following commands (assuming you're in the root of the project):

mkdir build
cd build
cmake ..
make
sudo make install

You can now include baylib in your projects.

In the latter two cases, make sure your CMakeLists.txt looks like this

find_package(baylib)
# create your executable 
# and whatever you need for
# your project ...
target_link_libraries(<your_executable> baylib)

Usage

Baylib allows performing approximate inference on Bayesian networks loaded from XDSL files or created by hand (using either named nodes or numeric identifiers).

Have a look at examples for more.

baylib's Issues

Convert CPT from file to algorithm format

The logic_sampling algorithm requires a specific format for the CPT. The example here is taken from https://repo.bayesfusion.com/bayesbox.html.
The CPT must be represented as a contiguous vector, filled column by column, i.e. the probabilities that belong to the same parent configuration must be contiguous. The order in which the parents are listed is very important, because it is used to interpret the CPT vector.
For this example, the derived CPT vector is:
{0.8, 0.2, 0.8, 0.2, 0.8, 0.2, 0.05, 0.95}
with the parents passed as:
{Increased_serum_calcium, Brain_Tumor}
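
The column-by-column layout can be made concrete with a small index computation. The sketch below is only an illustration: the function name cpt_offset and its parameters are hypothetical, and it assumes that the first listed parent varies slowest across the columns, which the issue does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of P(X = x_state | parents = parent_states) in a flat CPT.
// Assumed layout: one block of x_cardinality entries per parent
// configuration (one "column"); the first listed parent varies slowest.
std::size_t cpt_offset(const std::vector<std::size_t>& parent_states,
                       const std::vector<std::size_t>& parent_cardinalities,
                       std::size_t x_state,
                       std::size_t x_cardinality) {
    assert(parent_states.size() == parent_cardinalities.size());
    assert(x_state < x_cardinality);
    std::size_t column = 0;
    for (std::size_t i = 0; i < parent_states.size(); ++i) {
        assert(parent_states[i] < parent_cardinalities[i]);
        // Mixed-radix encoding of the parent configuration.
        column = column * parent_cardinalities[i] + parent_states[i];
    }
    return column * x_cardinality + x_state;
}
```

For a binary node with two binary parents, the parent configuration (1, 1) maps to column 3, so its two probabilities sit at offsets 6 and 7 of the vector. Swapping the parent order changes which column a configuration maps to, which is why the order must match the one used to build the vector.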

condition should be pruned and reasserted before discarding it

Let X be a random variable of the Bayesian network.
Let P be the set of parents of X.
Let Y be a random variable not included in P.
If Y is not a descendant of X, the local Markov property gives P(X | P ∪ {Y}) = P(X | P). Therefore, if a condition contains node-state pairs that are not actual dependencies of X, I suggest pruning the condition and re-checking it instead of discarding it outright (the current assertion discards it as soon as a non-parent is detected).
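
A possible shape for the suggested pruning step is sketched below. It is not taken from baylib: the condition alias and the functions prune_to_parents and covers_all_parents are hypothetical, and nodes are identified by name only for readability.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical representation: a condition maps node names to state indices.
using condition = std::map<std::string, int>;

// Keep only the node-state pairs whose node is a parent of X.
condition prune_to_parents(const condition& c,
                           const std::set<std::string>& parents) {
    condition pruned;
    for (const auto& entry : c) {
        if (parents.count(entry.first) != 0) {
            pruned.emplace(entry.first, entry.second);
        }
    }
    return pruned;
}

// Re-check after pruning: the condition is usable for a CPT lookup
// only if it fixes a state for every parent of X.
bool covers_all_parents(const condition& pruned,
                        const std::set<std::string>& parents) {
    for (const auto& entry : pruned) {
        assert(parents.count(entry.first) != 0);  // precondition: already pruned
    }
    return pruned.size() == parents.size();
}
```

With this split, a condition is rejected only when covers_all_parents still fails after pruning, rather than as soon as a non-parent appears in it.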

Consider using TBB to further parallelize std algorithms

To properly use TBB, install it with

sudo apt install libtbb-dev

(don't forget to add the above line to scripts/install_dependencies.sh).
Then add the following lines to baylib/CMakeLists.txt

find_package(TBB REQUIRED)
...
set(BAYLIB_REQUIRED_LIBS
     ...
     tbb
)

add documentation (Doxygen)

First install flex and bison

sudo apt install flex
sudo apt install bison

then clone the official repository

git clone https://github.com/doxygen/doxygen.git
cd doxygen

build

mkdir build
cd build
cmake -G "Unix Makefiles" ..
make

and force a clean install of doxygen

sudo make install

Create a doc directory in the root of this project and run the following command inside it to generate a template doxygen configuration file

doxygen -g doxy-config

Edit doxy-config with your favorite text editor (e.g. vi) to set the value of the INPUT variable. For instance

INPUT                  = ../baylib \
                         ../baylib/graph \
                         ../baylib/inference \
                         ../baylib/network \
                         ../baylib/parser  \
                         ../baylib/probability \
                         ../baylib/tools \
                         ../baylib/tools/cow \
                         ../baylib/tools/gpu \
                         ../baylib/tools/random 

Finally, run

doxygen doxy-config

to produce the documentation in both HTML and LaTeX formats.

OpenCL algorithms very rarely cause segfault

Very rarely, the OpenCL algorithms that use boost compute cause a segfault. I have not been able to track down the root cause, because it happens with very low probability and I have not managed to reproduce it so far.
A minimal reproducible example is needed.

GPU memory limitations might be problematic

Our logic sampling implementation is limited mainly by how much memory we can use on the GPU. A problem could arise when it tries to use more memory than the device has available. The solution is simply to rerun the algorithm in separate instances and accumulate the results.
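
The rerun-and-accumulate workaround could look roughly like the sketch below. Everything in it is an assumption made for illustration: counts, run_in_batches and the sample_batch callback are hypothetical names, and the callback stands in for one run of the sampler that returns per-state hit counts for a single variable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Per-state hit counts for one variable (illustrative result type).
using counts = std::vector<std::size_t>;

// Calls sample_batch(n) repeatedly with n <= max_batch until total_samples
// samples have been drawn, summing the per-state counts of every batch.
counts run_in_batches(std::size_t total_samples,
                      std::size_t max_batch,
                      std::size_t n_states,
                      const std::function<counts(std::size_t)>& sample_batch) {
    assert(max_batch > 0);
    counts total(n_states, 0);
    for (std::size_t done = 0; done < total_samples;) {
        // The last batch may be smaller than max_batch.
        const std::size_t n = std::min(max_batch, total_samples - done);
        const counts partial = sample_batch(n);
        assert(partial.size() == n_states);
        for (std::size_t s = 0; s < n_states; ++s) {
            total[s] += partial[s];
        }
        done += n;
    }
    return total;
}
```

Here max_batch would be chosen so that a single batch fits in device memory, for example by dividing the available memory by the memory needed per sample. Because the batches are independent, summing their counts gives the same estimator as one large run.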
