
Gibbs sampler for the Distance Dependent Chinese Restaurant Process

License: BSD 3-Clause "New" or "Revised" License


ddcrp-gibbs's Introduction

A Gibbs sampler for the Distance Dependent Chinese Restaurant Process (ddCRP)

This is a C++ implementation of a Gibbs sampler for the Distance Dependent Chinese Restaurant Process (ddCRP), originally introduced in: Blei, D.M., Frazier, P.I.: "Distance Dependent Chinese Restaurant Process", Journal of Machine Learning Research 12 (2011):2383-2410.

This implementation was used to obtain the results presented in: Lauri, M., Frintrop, S.: "Object Proposal Generation Applying the Distance Dependent Chinese Restaurant Process", in Proc. 20th Scandinavian Conference on Image Analysis, Tromsø, Norway, June 12-14, 2017.

For now, the code supports a multivariate normal cluster likelihood model.

Building

You need a compiler with C++11 support. This software also requires the Boost libraries and Eigen3.

mkdir build && cd build
cmake ..
make

The executable will be placed in the bin folder.

Running

Executing ./bin/ddcrp_clustering_example will yield the help message.

The feature (or data) file contains the N data points in d-dimensional space to feed into the ddCRP, as an N-by-d matrix.

The log decay file contains an N-by-N matrix. Informally, entry (i,j) quantifies the relative likelihood that the ith and jth data points will form a link (and thus be in the same cluster). More formally, entry (i,j) is equal to log( f(d(i,j)) ), where d(i,j) is a distance measure between the ith and jth data points and f is a decay function (see the papers). An entry of -Inf corresponds to an impossible link.

The prior covariance file sets the prior cluster covariance matrix, and is a d-by-d matrix. The prior mean file sets the prior cluster mean vector, a d-by-1 vector. The strengths of these priors are determined by the input parameters v and k.

The parameter n specifies how many clusterings to sample from the ddCRP, and b sets the number of burn-in samples drawn before any samples are output.

You can also draw samples from the ddCRP prior (ignoring the likelihood model) by setting the switch --p.
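To illustrate what sampling from the prior involves, here is a minimal sketch based on the ddCRP definition in Blei and Frazier's paper: each data point draws one link with probability proportional to the decay value, and clusters are the connected components of the resulting link graph. The names sample_links and links_to_labels are hypothetical and are not part of this repository's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Draw one link per data point: point i links to j with probability
// proportional to exp(log_decay[i][j]). Entries of -Inf get probability zero.
// Assumes every row has at least one finite entry (e.g. the self-link).
std::vector<std::size_t> sample_links(const Matrix& log_decay, std::mt19937& rng) {
    std::vector<std::size_t> links(log_decay.size());
    for (std::size_t i = 0; i < log_decay.size(); ++i) {
        const std::vector<double>& row = log_decay[i];
        assert(!row.empty());
        // Subtract the row maximum before exponentiating for numerical stability.
        const double row_max = *std::max_element(row.begin(), row.end());
        std::vector<double> weights(row.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            weights[j] = std::exp(row[j] - row_max);
        }
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        links[i] = static_cast<std::size_t>(pick(rng));
    }
    return links;
}

// Clusters are the connected components of the undirected link graph.
// Returns one label per data point, numbered in order of first appearance.
std::vector<std::size_t> links_to_labels(const std::vector<std::size_t>& links) {
    const std::size_t n = links.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    // Union-find root lookup with path halving.
    auto find = [&parent](std::size_t v) -> std::size_t {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    };
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t root_i = find(i);
        const std::size_t root_j = find(links[i]);
        parent[root_i] = root_j;
    }
    std::vector<std::size_t> labels(n);
    std::vector<std::size_t> root_label(n, n);  // n marks "no label assigned yet"
    std::size_t next = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t r = find(i);
        if (root_label[r] == n) root_label[r] = next++;
        labels[i] = root_label[r];
    }
    return labels;
}
```

Calling sample_links on the matrix read from the log decay file and passing the result to links_to_labels gives one clustering from the prior. The Gibbs sampler itself additionally weighs each candidate link by the cluster likelihood, which this sketch leaves out.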

Output

The output will be written to files named clustering_0000.csv, with a running number in the file name. The ith row of a file is a comma-separated list of the data point indices belonging to the ith cluster, so the number of rows equals the number of clusters. For example, for 6 data points, the output

0, 1, 2
3
4, 5

would mean that there are 3 clusters, with data points corresponding to the indices {0,1,2}, {3}, and {4,5}, respectively.
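For downstream processing, the format above can be read back with a few lines of standard C++. The following is a sketch under the assumption that the files contain only integer indices separated by commas and optional spaces; read_clustering and to_labels are hypothetical helper names, not functions provided by this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse one clustering: each non-empty line is a cluster,
// given as a comma-separated list of data point indices.
std::vector<std::vector<int>> read_clustering(std::istream& in) {
    std::vector<std::vector<int>> clusters;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<int> members;
        std::stringstream row(line);
        std::string field;
        while (std::getline(row, field, ',')) {
            // Skip fields that are empty or contain only whitespace.
            if (field.find_first_not_of(" \t\r") == std::string::npos) continue;
            members.push_back(std::stoi(field));  // stoi skips leading whitespace
        }
        if (!members.empty()) clusters.push_back(members);
    }
    return clusters;
}

// Convert the list of clusters into one cluster label per data point.
// n is the total number of data points; unassigned points keep the label -1.
std::vector<int> to_labels(const std::vector<std::vector<int>>& clusters, std::size_t n) {
    std::vector<int> labels(n, -1);
    for (std::size_t c = 0; c < clusters.size(); ++c) {
        for (int idx : clusters[c]) {
            assert(idx >= 0 && static_cast<std::size_t>(idx) < n);
            labels[static_cast<std::size_t>(idx)] = static_cast<int>(c);
        }
    }
    return labels;
}
```

Passing an std::ifstream opened on one of the output files to read_clustering yields the clusters row by row. The per-point labels from to_labels are convenient for plotting or for comparing against ground truth.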

Demo

There is some test data provided in the folder data. The file data.csv contains 100 samples drawn from two bivariate Gaussian distributions.

The file log_decay.csv contains a 100-by-100 matrix of log decay-function values, obtained as follows. We compute the Euclidean distance d(i,j) between each pair of data points, and then apply a windowed exponential decay function

f(d) = exp(-d/a)    if d <= d_max,
f(d) = 0            otherwise.

Here we have set a=0.3 and d_max=1.5. The off-diagonal entries of log_decay.csv are log(f(d(i,j))), so any pair of points farther apart than d_max gets -Inf. The diagonal entries correspond to the self-link of each data point; here each is set to the constant value -0.8/a.
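The construction just described can be sketched in standard C++ as follows. This is an illustration of the recipe, not the script that generated the provided log_decay.csv; the function names and the nested-vector matrix type are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Windowed exponential decay in log space:
// log f(d) = -d/a if d <= d_max, and -Inf otherwise.
double log_windowed_exp_decay(double d, double a, double d_max) {
    assert(a > 0.0);
    if (d > d_max) return -std::numeric_limits<double>::infinity();
    return -d / a;
}

// Build the N-by-N log-decay matrix from an N-by-d feature matrix x.
// self_link is the constant written on the diagonal (the demo uses -0.8/a).
Matrix log_decay_matrix(const Matrix& x, double a, double d_max, double self_link) {
    const std::size_t n = x.size();
    Matrix out(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) {
                out[i][j] = self_link;
                continue;
            }
            // Euclidean distance between data points i and j.
            double sq = 0.0;
            for (std::size_t k = 0; k < x[i].size(); ++k) {
                const double diff = x[i][k] - x[j][k];
                sq += diff * diff;
            }
            out[i][j] = log_windowed_exp_decay(std::sqrt(sq), a, d_max);
        }
    }
    return out;
}
```

With the demo settings, the call would be log_decay_matrix(x, 0.3, 1.5, -0.8 / 0.3), where x holds the rows of data.csv.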

We can run the clustering by

cd data
./../bin/ddcrp-gibbs-example -f data.csv -S covar.csv -m mean.csv -l log_decay.csv

The output will be written to the current folder. The figure below compares the true clusters (left) and one of the clusterings drawn from the ddCRP (right).

[Figure: clustering example]


ddcrp-gibbs's Issues

Make table member list attribute of CustomerAssignment

Currently, CustomerAssignment::get_table_members works by generating a new set for the table members every time the function is called. Based on some quick profiling, this seems to be rather time consuming. Efficiency can be improved by returning const references to table member sets stored internally in CustomerAssignment, or other similar ideas.

In the worst case, this might require messing around with the internal implementation of the tables.
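One way the idea could look is sketched below: the member sets are stored and updated in place, and the accessor returns a const reference instead of building a new set on every call. TableIndex is a hypothetical stand-in written for this illustration; it does not reproduce the actual CustomerAssignment class or its internal table representation.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical bookkeeping class, not the repository's CustomerAssignment.
class TableIndex {
public:
    // Start with every customer seated alone at its own table.
    explicit TableIndex(std::size_t n) : table_of_(n), members_(n) {
        for (std::size_t i = 0; i < n; ++i) {
            table_of_[i] = i;
            members_[i].insert(i);
        }
    }

    // Move all customers of table `from` to table `to`,
    // updating the stored member sets incrementally.
    void merge(std::size_t from, std::size_t to) {
        assert(from < members_.size() && to < members_.size());
        if (from == to) return;
        for (std::size_t customer : members_[from]) {
            table_of_[customer] = to;
            members_[to].insert(customer);
        }
        members_[from].clear();
    }

    std::size_t table_of(std::size_t customer) const { return table_of_.at(customer); }

    // No set is constructed here: the caller receives a reference to the stored set.
    const std::set<std::size_t>& get_table_members(std::size_t table) const {
        return members_.at(table);
    }

private:
    std::vector<std::size_t> table_of_;            // table index of each customer
    std::vector<std::set<std::size_t>> members_;   // members of each table
};
```

The trade-off is that every change to the seating must keep the stored sets consistent. A real implementation would also need to split a table when a link is removed, which this sketch does not cover.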
