The following information has been extracted from "Constructing Deep Neural Networks by Bayesian Network Structure Learning":
""" Specifically, Bayesian networks for density estimation and causal discovery (Pearl, 2009; Spirtes et al., 2000). Two main approaches were studied: score-based and constraint-based. Score-based approaches combine a scoring function, such as BDe (Cooper & Herskovits, 1992), with a strategy for searching in the space of structures, such as greedy equivalence search (Chickering, 2002). Adams et al. (2010) introduced an algorithm for sampling deep belief networks (generative model) and demonstrated its applicability to high-dimensional image datasets. Constraint-based approaches (Pearl, 2009; Spirtes et al., 2000) find the optimal structures in the large sample limit by testing conditional independence (CI) between pairs of variables. They are generally faster than score-based approaches (Yehezkel & Lerner, 2009) and have a well-defined stopping criterion (e.g., maximal order of conditional independence). However, these methods are sensitive to errors in the independence tests, especially in the case of high-order CI tests and small training sets. """
Learning Bayesian networks by hill climbing: Efficient methods based on progressive restriction of the neighborhood https://www.researchgate.net/publication/225378901_Learning_Bayesian_networks_by_hill_climbing_Efficient_methods_based_on_progressive_restriction_of_the_neighborhood
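The reference above concerns score-based hill climbing. A toy sketch of plain greedy hill climbing with a decomposable Gaussian BIC score, omitting the paper's progressive neighborhood-restriction speedups (all names are illustrative, not taken from the paper):

```python
import numpy as np

def local_bic(data, child, parents):
    """Gaussian BIC of one node given its parents (linear-regression residuals)."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([data[:, sorted(parents)], np.ones(n)]) if parents \
        else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * np.log(rss / n) - 0.5 * X.shape[1] * np.log(n)

def creates_cycle(parents, frm, to):
    """Would adding frm -> to create a directed cycle? (walk ancestor links)"""
    stack, seen = [frm], set()
    while stack:
        v = stack.pop()
        if v == to:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def hill_climb(data):
    """Greedily add/remove single edges while the total BIC score improves."""
    d = data.shape[1]
    parents = {v: set() for v in range(d)}
    score = {v: local_bic(data, v, parents[v]) for v in range(d)}
    while True:
        best_gain, best_move = 0.0, None
        for u in range(d):
            for v in range(d):
                if u == v:
                    continue
                if u in parents[v]:                    # candidate: delete u -> v
                    cand = parents[v] - {u}
                elif not creates_cycle(parents, u, v):  # candidate: add u -> v
                    cand = parents[v] | {u}
                else:
                    continue
                gain = local_bic(data, v, cand) - score[v]
                if gain > best_gain:
                    best_gain, best_move = gain, (v, cand)
        if best_move is None:
            return parents
        v, cand = best_move
        parents[v] = cand
        score[v] = local_bic(data, v, cand)
```

Because the BIC score decomposes over nodes, each candidate move only requires rescoring the one child it touches; the neighborhood-restriction methods in the cited paper build on this to avoid re-evaluating moves that cannot have improved.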
Deep Learning: A Bayesian Perspective, by Polson and Sokolov