ccdr

ccdr implements the CCDr structure learning algorithm described in [1]. Given observational data, the algorithm estimates the structure of a Bayesian network (that is, the edges of a DAG) by penalized maximum likelihood with either L1 or concave (MCP) regularization.
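To make the two penalties concrete, here is a minimal base R sketch of the L1 penalty and the minimax concave penalty (MCP) in its standard textbook form. The function names `l1_penalty` and `mcp_penalty` and the default `gamma = 2` are illustrative assumptions, not part of the ccdr API, and the exact parameterization used in [1] may differ.

```r
# L1 (lasso) penalty: grows linearly in |t| without bound
l1_penalty <- function(t, lambda) {
  lambda * abs(t)
}

# MCP (standard form): matches the L1 penalty near zero, then flattens out.
# For |t| <= gamma * lambda the penalty is lambda * |t| - t^2 / (2 * gamma);
# beyond that point it is constant at gamma * lambda^2 / 2.
# NOTE: gamma = 2 is an arbitrary illustrative default.
mcp_penalty <- function(t, lambda, gamma = 2) {
  stopifnot(lambda >= 0, gamma > 0)
  a <- abs(t)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Compare the two penalties on a small grid of coefficient values
t.grid <- c(0, 0.5, 1, 2, 5)
l1.vals <- l1_penalty(t.grid, lambda = 1)
mcp.vals <- mcp_penalty(t.grid, lambda = 1, gamma = 2)
print(rbind(t = t.grid, l1 = l1.vals, mcp = mcp.vals))
```

The MCP never exceeds the L1 penalty and stops growing for large coefficients, which is why concave penalties tend to shrink large edge weights less than L1 does.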

Important note: This legacy package consists of a single method that implements the main algorithm and is not actively maintained. The main purpose of this repo is to serve as a reproducible snapshot for the simulations in [1].

Installation

Please note that this package is currently in a development state. In order to install the package, you will need both devtools and Rcpp. We are still working out the kinks of getting the package to compile on different systems, so please let us know if you have any issues.

If you have never installed packages using Rcpp before, we recommend checking out the following resources which should get you started:

The code is hosted on GitHub and can be installed using devtools:

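# Install devtools if it is missing or older than version 1.6
if (!requireNamespace("devtools", quietly = TRUE) || packageVersion("devtools") < "1.6") {
  install.packages("devtools")
}
# Only install if a working build toolchain is found (Rtools on Windows)
if (devtools::find_rtools()) devtools::install_github("itsrainingdata/ccdr")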

NOTE: Windows users will need to make sure that Rtools is installed in order to build this package. To check if you have Rtools installed, you can use devtools::find_rtools(). For more details: http://cran.r-project.org/bin/windows/Rtools/.

If you find any bugs, please report them here on GitHub.

Examples

The basic usage is as follows:

### Specify dimensions
nn <- 20
pp <- 100

### Generate a random Gaussian matrix
dat <- matrix(rnorm(nn * pp), nrow = nn)

### Run the ccdr algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20)

### Display the results
print(ccdr.path)

The output of ccdr.run is an S3 object of class ccdrPath, which is essentially a list of estimates, one for each value of lambda in the solution path. Each estimate is an S3 object of class ccdrFit. The DAG itself is stored as an edge list (see the documentation for ccdrFit-class and edgeList-class for more details).

This trivial example uses uncorrelated normal data, which is not very interesting. To do something more meaningful, we first need to generate data according to some pre-specified DAG structure. The ccdr package does not provide this functionality. To generate data from a given Bayesian network or to simulate random networks, we recommend R packages such as pcalg and bnlearn, both of which are used in the examples that follow.

Example using pcalg

The pcalg package provides two useful functions: randomDAG for generating a random ordered DAG and rmvDAG for generating random data according to the structural equation model implied by the DAG.

NOTE: Since randomDAG produces an ordered DAG, we should permute the columns in the simulated dataset in order to obfuscate this information. If we know this ordering in advance, there are much better methods for estimating the DAG structure using autoregressive models.
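To illustrate what "ordered" means here and why the permutation matters, the following base R sketch simulates data from a small linear Gaussian structural equation model by hand. It is only a rough stand-in for what randomDAG and rmvDAG do, not their actual implementation; the variable names, the edge probability of 0.5, and the unit error variances are all assumptions made for illustration.

```r
set.seed(1)

pp <- 5    # number of nodes (kept small for illustration)
nn <- 200  # number of samples

# Weighted adjacency matrix of an ordered DAG: B[j, k] != 0 means j -> k.
# Edges only run from a lower index to a higher one, so B is strictly
# upper triangular.
B <- matrix(0, pp, pp)
upper <- which(upper.tri(B))
is.edge <- runif(length(upper)) < 0.5
B[upper[is.edge]] <- runif(sum(is.edge), min = 0.5, max = 2)

# Linear Gaussian SEM: X_k = sum_j B[j, k] * X_j + e_k with e_k ~ N(0, 1).
# Because the nodes are ordered, the columns can be filled left to right.
X <- matrix(0, nn, pp)
for (k in seq_len(pp)) {
  X[, k] <- X %*% B[, k] + rnorm(nn)
}

# Permute the columns so the ordering is no longer visible in the data
perm <- sample(pp)
X.perm <- X[, perm]

# Applying the same permutation to B gives the true DAG in the new labels;
# in general it is no longer upper triangular.
B.perm <- B[perm, perm]
```

Without the permutation, the column order of `X` would reveal a valid topological ordering of the DAG, which a structure learning method should not be given for free. The pcalg version of this workflow follows.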

library("pcalg")

### Set up the model parameters
nn <- 20                # How many samples to draw?
pp <- 100               # How many nodes in the DAG?
num.edges <- 100        # How many *expected* edges in the DAG?
ss <- num.edges / pp    # This is the expected number of parents *per node*

### Generate a random DAG using the pcalg method randomDAG
beta.min <- 0.5
beta.max <- 2
edge.pr <- 2 * ss / (pp - 1)
g <- randomDAG(n = pp, prob = edge.pr, lB = beta.min, uB = beta.max) # Note that the edge weights are selected at random here!

### Generate random data according to this DAG using the method rmvDAG
dat <- rmvDAG(n = nn, dag = g, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering

### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)
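The choice of edge.pr above can be checked with a little arithmetic. The sketch below assumes that each of the choose(pp, 2) candidate edges in an ordered DAG is included independently with probability edge.pr, which is our reading of how randomDAG uses its prob argument. The names `num.candidates`, `expected.edges`, and `sim.edges` are introduced here for illustration only.

```r
pp <- 100
num.edges <- 100
ss <- num.edges / pp          # expected number of parents per node
edge.pr <- 2 * ss / (pp - 1)  # same formula as in the example above

# Number of candidate edges in an ordered DAG on pp nodes
num.candidates <- choose(pp, 2)

# Expected edge count: pp * (pp - 1) / 2 * 2 * ss / (pp - 1) = pp * ss
expected.edges <- num.candidates * edge.pr

# Monte Carlo check in base R: draw the edge count 1000 times
set.seed(1)
sim.edges <- rbinom(1000, size = num.candidates, prob = edge.pr)

print(expected.edges)
print(mean(sim.edges))
```

Under this assumption the expected number of edges equals num.edges exactly, while any single simulated DAG will have somewhat more or fewer edges.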

Example using bnlearn

The bnlearn package provides methods for reading in data from the Bayesian Network Repository. We can use these methods along with the pcalg function rmvDAG to generate random data from these structures:

library("bnlearn")
library("graph")
library("pcalg")

### Download the RDA file containing bnlearn-compatible data from the repository
con <- url("http://www.bnlearn.com/bnrepository/hailfinder/hailfinder.rda")
load(con)
close(con)

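### Convert the bnlearn data to a graph object
### rmvDAG expects topologically ordered nodes, so reorder the adjacency matrix first
node.order <- node.ordering(bn)
this.adj <- amat(bn)[node.order, node.order]
this.graph <- as(this.adj, "graphNEL")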

### Generate random data according to this DAG using the method rmvDAG
pp <- numNodes(this.graph)
nn <- 100
dat <- rmvDAG(n = nn, dag = this.graph, errDist = "normal")
dat <- dat[, sample(1:pp)] # permute the columns to randomize node ordering

### Run the algorithm
ccdr.path <- ccdr.run(data = dat, lambdas.length = 20, alpha = 10, verbose = FALSE)

References

[1] B. Aragam and Q. Zhou. Concave penalized estimation of sparse Gaussian Bayesian networks. The Journal of Machine Learning Research, In press, 2015.
