
vangj / py-bbn

Inference in Bayesian Belief Networks using Probability Propagation in Trees of Clusters (PPTC) and Gibbs sampling

Home Page: https://py-bbn.readthedocs.io/

License: Apache License 2.0

Topics: bayesian-belief-networks, inference, approximate-inference-algorithm, gibbs-sampling, probability-propagation, causal-inference, causation, bbn, python-libraries, turing-bbn

py-bbn's Introduction


End of Life

Note: This codebase is no longer being maintained. It will be sunset and archived soon. A new, rewritten library (under the same name) is available here.

PyBBN

PyBBN is a Python library for exact inference in Bayesian Belief Networks (BBNs) using the junction tree algorithm, also known as Probability Propagation in Trees of Clusters (PPTC). The implementation is taken directly from C. Huang and A. Darwiche, "Inference in Belief Networks: A Procedural Guide," International Journal of Approximate Reasoning, vol. 15, pp. 225-263, 1996. In this API, PPTC is applied to BBNs whose variables are all discrete. For a BBN whose variables are all Gaussian (a Gaussian Belief Network, or GBN), exact inference is conducted through an incremental algorithm that manipulates the means and covariance matrix. Additionally, the library can generate singly- and multi-connected graphs, following J. S. Ide and F. G. Cozman, "Random Generation of Bayesian Networks," Advances in Artificial Intelligence, Lecture Notes in Computer Science, vol. 2507. There is also the option to generate sample data from your BBN; summarizing this synthetic data approximates the posterior marginal probabilities and serves as a form of approximate inference. Lastly, we have added Pearl's do-operator for causal inference.
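As a hedged illustration of the graph generation and sampling features, here is a minimal sketch. The generator and sampler calls mirror the issue reports later in this document; generate_singly_bbn is assumed to be the singly-connected counterpart of generate_multi_bbn, so verify the names against the docs.

from pybbn.generator.bbngenerator import generate_singly_bbn, convert_for_exact_inference
from pybbn.sampling.sampling import LogicSampler

# generate a random singly-connected graph and its parameters, then convert
# the pair into a BBN usable for inference
g, p = generate_singly_bbn(5, max_iter=10)
bbn = convert_for_exact_inference(g, p)

# draw samples; summarizing them (e.g. value counts per node) approximates
# the posterior marginal probabilities
sampler = LogicSampler(bbn)
samples = sampler.get_samples(n_samples=10_000, seed=37)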

Power Up, Next Level


If you like py-bbn, please inquire about our next-generation products below! [email protected]

  • Rocket Vector is a causal learning platform in the cloud!
  • Autonosis is a GenAI + CausalAI capable platform.
  • pyspark-bbn is a scalable, massively parallel processing (MPP) framework for learning the structures and parameters of Bayesian Belief Networks (BBNs) using Apache Spark.

Exact Inference, Discrete Variables

Below is example code to create a Bayesian Belief Network, transform it into a join tree, and set observation evidence. The final loop prints the marginal probabilities for each node.

from pybbn.graph.dag import Bbn
from pybbn.graph.edge import Edge, EdgeType
from pybbn.graph.jointree import EvidenceBuilder
from pybbn.graph.node import BbnNode
from pybbn.graph.variable import Variable
from pybbn.pptc.inferencecontroller import InferenceController

# create the nodes
a = BbnNode(Variable(0, 'a', ['on', 'off']), [0.5, 0.5])
b = BbnNode(Variable(1, 'b', ['on', 'off']), [0.5, 0.5, 0.4, 0.6])
c = BbnNode(Variable(2, 'c', ['on', 'off']), [0.7, 0.3, 0.2, 0.8])
d = BbnNode(Variable(3, 'd', ['on', 'off']), [0.9, 0.1, 0.5, 0.5])
e = BbnNode(Variable(4, 'e', ['on', 'off']), [0.3, 0.7, 0.6, 0.4])
f = BbnNode(Variable(5, 'f', ['on', 'off']), [0.01, 0.99, 0.01, 0.99, 0.01, 0.99, 0.99, 0.01])
g = BbnNode(Variable(6, 'g', ['on', 'off']), [0.8, 0.2, 0.1, 0.9])
h = BbnNode(Variable(7, 'h', ['on', 'off']), [0.05, 0.95, 0.95, 0.05, 0.95, 0.05, 0.95, 0.05])

# create the network structure
bbn = Bbn() \
    .add_node(a) \
    .add_node(b) \
    .add_node(c) \
    .add_node(d) \
    .add_node(e) \
    .add_node(f) \
    .add_node(g) \
    .add_node(h) \
    .add_edge(Edge(a, b, EdgeType.DIRECTED)) \
    .add_edge(Edge(a, c, EdgeType.DIRECTED)) \
    .add_edge(Edge(b, d, EdgeType.DIRECTED)) \
    .add_edge(Edge(c, e, EdgeType.DIRECTED)) \
    .add_edge(Edge(d, f, EdgeType.DIRECTED)) \
    .add_edge(Edge(e, f, EdgeType.DIRECTED)) \
    .add_edge(Edge(c, g, EdgeType.DIRECTED)) \
    .add_edge(Edge(e, h, EdgeType.DIRECTED)) \
    .add_edge(Edge(g, h, EdgeType.DIRECTED))

# convert the BBN to a join tree
join_tree = InferenceController.apply(bbn)

# insert an observation evidence
ev = EvidenceBuilder() \
    .with_node(join_tree.get_bbn_node_by_name('a')) \
    .with_evidence('on', 1.0) \
    .build()
join_tree.set_observation(ev)

# print the marginal probabilities
for node in join_tree.get_bbn_nodes():
    potential = join_tree.get_bbn_potential(node)
    print(node)
    print(potential)
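Alternatively, the join tree can report posteriors directly via get_posteriors() (the same method used in one of the issue reports below); a minimal sketch:

# get_posteriors() yields node name -> {value: probability} mappings
for node, posteriors in join_tree.get_posteriors().items():
    p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
    print(f'{node} : {p}')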

Exact Inference, Gaussian Variables

The example below shows how to perform inference on multivariate Gaussian variables.

import numpy as np

from pybbn.gaussian.inference import GaussianInference


def get_cowell_data():
    """
    Gets Cowell data.

    :return: Data and headers.
    """
    n = 10000
    Y = np.random.normal(0, 1, n)
    X = np.random.normal(Y, 1, n)
    Z = np.random.normal(X, 1, n)

    D = np.vstack([Y, X, Z]).T
    return D, ['Y', 'X', 'Z']


# assume we have data and headers (variable names per column)
# X is the data (rows are observations, columns are variables)
# H is just a list of variable names
X, H = get_cowell_data()

# then we can compute the means and covariance matrix easily
M = X.mean(axis=0)
E = np.cov(X.T)

# the means and covariance matrix are all we need for gaussian inference
# notice how we keep `g` around?
# we'll use `g` over and over to do inference with evidence/observations
g = GaussianInference(H, M, E)
# {'Y': (0.00967, 0.98414), 'X': (0.01836, 2.02482), 'Z': (0.02373, 3.00646)}
print(g.P)

# we can make a single observation with do_inference()
g1 = g.do_inference('X', 1.5)
# {'X': (1.5, 0), 'Y': (0.76331, 0.49519), 'Z': (1.51893, 1.00406)}
print(g1.P)

# we can make multiple observations with do_inferences()
g2 = g.do_inferences([('Z', 1.5), ('X', 2.0)])
# {'Z': (1.5, 0), 'X': (2.0, 0), 'Y': (1.97926, 0.49509)}
print(g2.P)
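For reference, this kind of exact inference follows the standard formulas for conditioning a multivariate Gaussian: mu_{1|2} = mu_1 + S12 S22^{-1} (x2 - mu2) and S_{1|2} = S11 - S12 S22^{-1} S21. Below is a minimal numpy sketch of those formulas (a general fact about Gaussians, not a claim about this library's internal implementation), reusing M and E from above to condition on X = 1.5:

# condition the (Y, X, Z) Gaussian on X = 1.5 using the standard formulas;
# the results should approximately match g1.P above
i1 = [0, 2]                       # unobserved indices: Y, Z
i2 = [1]                          # observed index: X
m1, m2 = M[i1], M[i2]
S11 = E[np.ix_(i1, i1)]
S12 = E[np.ix_(i1, i2)]
S22 = E[np.ix_(i2, i2)]
x2 = np.array([1.5])

m_cond = m1 + S12 @ np.linalg.inv(S22) @ (x2 - m2)   # conditional means of Y, Z
S_cond = S11 - S12 @ np.linalg.inv(S22) @ S12.T      # conditional covariance
print(m_cond, np.diag(S_cond))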

Building

To build, you will need Python 3.7. Managing environments through Anaconda is highly recommended (though not strictly required if you know what you are doing). Assuming you have installed Anaconda, you may create an environment as follows (make sure you cd into the root of this project).

To create the environment, use the following commands.

conda env create -f environment.yml

If you want to use the environments with Jupyter, install the kernel.

conda activate pybbn37
python -m ipykernel install --user --name pybbn37 --display-name "pybbn37"

Then you may build the project as follows. (Note that in Python 3.6 you will get some warnings).

make build

To build the documents, go into the docs sub-directory and type in the following.

make html

Testing

You can do a fresh test with Docker as follows.

docker build -t pybbn-test:local -f Dockerfile.test .

Installing

From PyPI

Use pip to install the package, as it has been published to PyPI.

pip install pybbn

From Source

If you check out the source, do the following.

pip list | grep pybbn
pip uninstall pybbn
python setup.py install
pip list | grep pybbn

GraphViz issue

Make sure you install GraphViz on your system.

  • CentOS: yum install graphviz*
  • Ubuntu: sudo apt-get install graphviz libgraphviz-dev
  • Mac OSX: brew install graphviz, and when you install pygraphviz, pass the Graphviz paths: pip install pygraphviz --install-option="--include-path=/usr/local/lib/graphviz/" --install-option="--library-path=/usr/local/lib/graphviz/"
  • Windows: use the msi installer
    • For Anaconda + Windows, install pygraphviz from this channel conda install -c alubbock pygraphviz
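To sanity-check that pygraphviz installed correctly, a quick optional verification:

python -c "import pygraphviz; print(pygraphviz.__version__)"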

testpypi issue

You should NOT be doing this operation, but if you do want to install from testpypi, then add the --extra-index-url as follows.

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ pybbn

Other Python Bayesian Belief Network Inference Libraries

Here is a list of other Python libraries for inference in Bayesian Belief Networks.

Library            Algorithm                    Algorithm Type     License
BayesPy            variational message passing  approximate        MIT
pomegranate        loopy belief propagation     approximate        MIT
pgmpy              multiple                     approximate/exact  MIT
libpgm             likelihood sampling          approximate        Proprietary
bayesnetinference  variable elimination         exact              None

I found other packages in PyPI too.

Java

But I am coming from the Java mothership and I want to use Bayesian Belief Networks in Java. How do I perform probabilistic inference in Java?

This Python code base is a port of the original Java code.


Citation

@misc{vang_2017,
  title={PyBBN},
  url={https://github.com/vangj/py-bbn/},
  journal={GitHub},
  author={Vang, Jee},
  year={2017},
  month={Jan}
}

Online Articles

I found these online articles using PyBBN.

Copyright Stuff

Software

Copyright 2017-2023 Jee Vang

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Art Copyright

Copyright 2020 Daytchia Vang


py-bbn's People

Contributors

schreck61, vangj


py-bbn's Issues

Generated graphs may not be DAGs

from pybbn.generator.bbngenerator import generate_multi_bbn, convert_for_exact_inference
from pybbn.sampling.sampling import LogicSampler


bbn = convert_for_exact_inference(*generate_multi_bbn(20, max_iter=5_000))
sampler = LogicSampler(bbn)
print(sampler.nodes)

samples = sampler.get_samples(n_samples=5, seed=37)

print(samples)

This will result in the following traceback.

Traceback (most recent call last):
  File "/Users/super/git/summer-camp/web/pyodide/data/generate.py", line 9, in <module>
    samples = sampler.get_samples(n_samples=5, seed=37)
  File "/opt/anaconda3/lib/python3.9/site-packages/pybbn/sampling/sampling.py", line 138, in get_samples
    val = table.get_value(p, sample=sample)
  File "/opt/anaconda3/lib/python3.9/site-packages/pybbn/sampling/sampling.py", line 65, in get_value
    k = ','.join([f'{i}={sample[i]}' for i in self.parent_ids])
  File "/opt/anaconda3/lib/python3.9/site-packages/pybbn/sampling/sampling.py", line 65, in <listcomp>
    k = ','.join([f'{i}={sample[i]}' for i in self.parent_ids])
KeyError: 2

Divide by 0 Error

When I called from_data in pybbn.graph.factory on my dataset, it gave a divide-by-zero error. There is a line, prob = numer / denom, which is numerically unstable. I changed it to prob = numer / (denom + 0.00001) to get the code to run. I think this would be a good improvement for your project, because adding a small value to denominators is a common trick for numerical stability. Thanks
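For illustration only, the proposed guard might look like the following (safe_prob is a hypothetical helper to show the idea, not part of the library):

# add a small epsilon so empty parent configurations cannot divide by zero
EPSILON = 1e-5

def safe_prob(numer: float, denom: float, eps: float = EPSILON) -> float:
    return numer / (denom + eps)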

pybbn updating wrong result

I'm trying to use the pybbn update via ".reapply".

However, after the update, I get the wrong answer.

before update:
analysis solution >> P(clay_till=F) = 0.625*1*1*0.6 + 0.125*1*1*0.4 = 0.425
pybbn returns 0.425 >> the right answer

after update:
analysis solution >> P(clay_till=F) = 0.8125*1*1*0.6 + 0.3125*1*1*0.4 = 0.6125
pybbn returns 0.55 >> the wrong answer

I will upload the code, CPT table, and BBN structure.

P.S. What is the difference between ".with_evidence" and ".reapply"?

Thanks for reading.
PyBbn.zip
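For context, a hedged sketch of the difference: .with_evidence() builds an observation that is applied to an existing join tree, while InferenceController.reapply() appears intended to rebuild a join tree with updated conditional probability tables. Assuming reapply takes a join tree plus a dict mapping node id to a flattened CPT (verify against the docs), and reusing join_tree from the discrete example above:

from pybbn.pptc.inferencecontroller import InferenceController

# assumed signature: reapply(join_tree, {node_id: flattened_cpt, ...});
# returns a new join tree with the same structure but updated CPTs
updated_tree = InferenceController.reapply(join_tree, {0: [0.3, 0.7]})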

Mean sign is flipped in Gaussian inference

The sign of the mean is flipped when retrieving the parameters for Gaussian inference. This is done in the .P() method (https://github.com/vangj/py-bbn/blob/master/pybbn/gaussian/inference.py#L71). I used the following code from the sample docs to recreate it:

import numpy as np
import pandas as pd
from pybbn.gaussian.inference import GaussianInference

def get_cowell_data():
    """
    Gets Cowell data.

    :return: Data and headers.
    """
    n = 10000
    Y = np.random.normal(0, 1, n)
    X = np.random.normal(Y, 1, n)
    Z = np.random.normal(X, 1, n)

    D = np.vstack([Y, X, Z]).T
    return D, ['Y', 'X', 'Z']


# assume we have data and headers (variable names per column)
# X is the data (rows are observations, columns are variables)
# H is just a list of variable names
X, H = get_cowell_data()

# then we can compute the means and covariance matrix easily
M = X.mean(axis=0)
E = np.cov(X.T)
print('Means from data: ', M)

g = GaussianInference(H, M, E)
print('Means returned from g.P: ', g.P)


PPTC implementation returns incorrect probabilities

Hello! I've noticed that on a medium-sized graph, the PPTC implementation in this library returns incorrect probabilities. Here is the simplest version of the network that showcases the problem.

[
  {"id":"A","states":["T","F"],"parents":[],"cpt":[0.5,0.5]},
  {"id":"B","states":["T","F"],"parents":["A"],"cpt":[0.2,0.8,0.1,0.9]},
  {"id":"C","states":["T","F"],"parents":["A"],"cpt":[0.5,0.5,0.5,0.5]},
  {"id":"D","states":["T","F"],"parents":["B"],"cpt":[0.5,0.5,0.5,0.5]},
  {"id":"E","states":["T","F"],"parents":["B"],"cpt":[0.5,0.5,0.5,0.5]},
  {"id":"F","states":["T","F"],"parents":["C"],"cpt":[0.5,0.5,0.5,0.5]},
  {"id":"G","states":["T","F"],"parents":["E","D","F"],"cpt":[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5]},
]

The implementation calculates the following probabilities:

{
  "A": 0.5686274509803922,
  "B": 0.5000000000000001,
  "C": 0.5,
  "D": 0.5,
  "E": 0.5,
  "F": 0.5,
  "G": 0.5
}

However I expect the probabilities to be:

{
  "A": 0.5,
  "B": 0.15,
  "C": 0.5,
  "D": 0.5,
  "E": 0.5,
  "F": 0.5,
  "G": 0.5
}
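(Indeed, marginalizing out A gives P(B=T) = 0.5 * 0.2 + 0.5 * 0.1 = 0.15, so the expected value for B checks out.)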

Could you please fix this bug?
Every implementation I have tried so far has had a serious issue 😢

Python 3?

Cool package. Any plans for Python 3?

Posterior probabilities depend on node names

When constructing a BBN from structure and data, the computed posterior probabilities depend on the order of node names. This seems to happen when the columns of the dataframe with the data are not sorted.

A minimal working example is attached.

pybbn_bug_mwe.zip

Inference not finishing for bigger graph

I find this library very easy to use, with good documentation and code; I could set it up in 15 minutes, whereas pomegranate, pgmpy, and pymc took longer.

For my simple test Bayesian network, the test passed and everything was perfect. But when I test on a real-world network, join_tree = InferenceController.apply(model) just runs "forever". What are the limitations here? The graph consists of 226 nodes and 344 edges. Most nodes have 2 states (0, 1), but a few have 3-4 or at most 5 states. Is this graph too big for this algorithm, or am I doing something wrong?

My testcode:

def create_model(nodes: list[Node]):
    model = Bbn()
    for node in nodes:
        bbn_node = BbnNode(Variable(node.id, node.id, list(range(node.stateCount))), node.probabilities)
        model.add_node(bbn_node)
        if node.parentIds:
            for parent_id in node.parentIds:
                parent = model.get_node(parent_id)
                model.add_edge(Edge(parent, bbn_node, EdgeType.DIRECTED))

    join_tree = InferenceController.apply(model)
    return join_tree

(nodes are topological sorted)

pybbn .EvidenceBuilder() / .with_evidence not providing the same results

The .EvidenceBuilder() / .with_evidence combination did not provide the same results for the same observation.
I used the data from the article "BBN: Bayesian Belief Networks - How to Build Them Effectively in Python" at https://towardsdatascience.com/bbn-bayesian-belief-networks-how-to-build-them-effectively-in-python-6b7f93435bba

The predicted value based on an observation changed from the original value (no observation) to the expected value (with observation), as expected. However, if I repeat the "predict" function, it goes back to the original value.

def posterior_res(jt):
    for node, posteriors in jt.get_posteriors().items():
        p = ', '.join([f'{val}={prob:.5f}' for val, prob in posteriors.items()])
        print(f'{node} : {p}')


def predict(jt, formula, node_var, obsv, prob_obsv):
    # build the evidence (overwriting the formula argument, as in the report)
    formula = EvidenceBuilder() \
        .with_node(jt.get_bbn_node_by_name(node_var)) \
        .with_evidence(obsv, prob_obsv) \
        .build()
    jt.set_observation(formula)

predict(join_tree, 'H9am_obsv', 'Humidity9am', "High_>60", 1)
posterior_res(join_tree)

Your assistance is appreciated.
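One hedged guess at a workaround: reset the evidence before each prediction so repeated calls start from the same prior state. This assumes the join tree exposes an unobserve_all() method to clear previously set observations; verify against the docs.

# assumed: unobserve_all() clears any previously set observations
join_tree.unobserve_all()
predict(join_tree, 'H9am_obsv', 'Humidity9am', "High_>60", 1)
posterior_res(join_tree)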

Linear Gaussian Inference on Evidence

Firstly, many thanks for writing this module. It's quite helpful!

I've been playing around with LGs for the past week with this module. I wanted to know which paper/book was followed for the Linear Gaussian inference (src > lg > graph.py), particularly the predict_proba() code. I found this one similar. Also, I couldn't find any docs/examples for lg > gaussian and lg > inference.

I would be grateful if you could add reference links for the Gaussian models.

Also, I tried the do_inference() function to get inference using a conditioned variable.

I created a small network: Node0 > Node1 > Node2

# Create toy data in a pandas dataframe
# (imports added for completeness; the lg classes come from the module
# referenced in the issue text, src > lg > graph.py)
import numpy as np
import pandas as pd
from pybbn.lg.graph import Bbn, Dag, Parameters

df = pd.DataFrame()
length=99
df['a'] = np.linspace(0,50,length) + np.random.normal(0,2,length)
df['b'] = (3*df['a'])  + np.random.normal(0,2,length)
df['d'] = (2*df['b']) + np.random.normal(0,2,length)

# Calculate covariance and mean of dataframe
cov = np.array(df.cov())
mean = np.array(df.mean())

#Create Network
dag1 = Dag()
dag1.add_node(0,metadata={'name':'A'})
dag1.add_node(1,metadata={'name':'B'})
dag1.add_node(2,metadata={'name':'D'})
dag1.add_edge(0, 1)
dag1.add_edge(1, 2)

# The parameters are estimated from the samples above
params = Parameters(mean, cov)

# create the bayesian belief network
bbn1 = Bbn(dag1, params)

These are the means and covariances for the data

mean  = array([ 24.79494819,  74.44707958, 148.96799903])
cov = array([[ 220.64004318,  662.35310334, 1321.52335555],
       [ 662.35310334, 1991.98019528, 3974.13546903],
       [1321.52335555, 3974.13546903, 7933.38989638]])

Now, when I condition on Node2 (which is a child of Node1), I get the following means and covariances.

bbn1.clear_evidences()
bbn1.set_evidence(2, 100)
mean_c, cov_c = bbn1.do_inference()
# mean_c:
# array([ 16.63798689,  49.91715425, 100.])
# cov_c:
# array([[5.04136325e-01, 3.52003045e-01, 1.32152336e+03],
#        [3.52003045e-01, 1.18522207e+00, 3.97413547e+03],
#        [1.32152336e+03, 3.97413547e+03, 1.00000000e-02]])

Now, as the network has the hierarchy Node0 > Node1 > Node2, evidence on Node2 shouldn't change the means of Node0 and Node1. I tried putting evidence on Node1 as well, and it also changed the values of both Node0 and Node2. It seems like there's no d-separation.

I looked at the source code and found that the network structure is not being utilized to compute the values; rather, the conditionals are calculated directly using block matrices (the compute_means() and compute_covs() functions). Shouldn't a Linear Gaussian network also respect the network hierarchy, as in slides 11-12 of this presentation?

Thanks again!

Edit: Added example code.
