fpetitjean / hdp

Accurate estimation of conditional categorical probability distributions using Hierarchical Dirichlet Processes

License: GNU General Public License v3.0

Java 100.00%
dirichlet-process bayesian-statistics gibbs-sampling bayesian-inference hierarchical-dirichlet-processes

HDP

This package offers an accurate parameter estimation technique for Bayesian Network classifiers. It uses a Hierarchical Dirichlet Process to estimate the parameters, with inference performed by a collapsed Gibbs sampler. The package is built in a generic way, so it can estimate any conditional probability distribution over categorical variables.

More information is available at http://www.francois-petitjean.com/Research/

Underlying research and scientific paper

This code supports our paper in Machine Learning entitled "Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes".

The paper is also available on arXiv.

When using this repository, please cite:

@ARTICLE{Petitjean2018-HDP,
  author = {Petitjean, Francois and Buntine, Wray and Webb, Geoffrey I. and Zaidi, Nayyar},
  title = {Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes},
  journal = {Machine Learning},
  year = {2018},
  volume = {107},
  number = {8},
  pages = {1303--1331},
  url = {https://doi.org/10.1007/s10994-018-5718-0}
}

Compiling and launching

After cloning the repo, run the following commands.

ant
java -Xmx1g -cp "bin:lib/*:lib/commons-math3-3.6.1/*" hdp.testing.Test2LevelsExampleHeartAttack

This runs a simple example that builds a small toy dataset and then learns the probability distribution.

Dependencies

Getting a cross-platform jar

Running ant also creates a jar file, dist/HDP.jar, that you can execute in most environments.

Memory Consumption

You may want to allow the JVM to use more memory if you are working with large models. Use the -Xmx flag to raise the maximum heap size, for example java -Xmx4g.
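To confirm that the -Xmx setting was picked up, one option is to print the maximum heap size from inside the JVM. This is a minimal sketch using the standard Runtime.maxMemory() call. The class name HeapCheck and the helper maxHeapMiB are hypothetical and not part of this package.

```java
// Hypothetical helper, not part of the HDP package.
final class HeapCheck {
    /** Returns the maximum heap size the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The reported value is usually close to, but not exactly, the -Xmx setting.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with java -Xmx4g HeapCheck.java (Java 11 or later) should report a value near 4096 MiB; the exact figure depends on the JVM and garbage collector.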

Using it for your own library

The code in src/monash/ml/hdp/testing/Test2LevelsExampleHeartAttack.java gives a good idea of how to use this library from your own code. Basically, you have to create a dataset in the form of a matrix of integers (int[N][M+1]), where N is the number of samples and M is the number of covariates (or features). The extra column is there because the first column holds the values of the target variable you want a conditional estimate over: data[i][0] is the value of the target variable for sample i, and for j >= 1, data[i][j] is the value taken by sample i for feature x_{j-1}. Values are coded as integers because this code is for categorical distributions.

String[][] data = {
    {"yes", "heavy", "tall"},
    {"no", "light", "short"},
    ...
    {"yes", "heavy", "med"}
};

ProbabilityTree hdp = new ProbabilityTree();
// learn p(target|x)
hdp.addDataset(data);
// print the tree
System.out.println(hdp.printProbabilities());
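If your raw data consists of string labels and you need the integer matrix described above, the encoding can be sketched as follows. This is a standalone illustration that does not call the library. The class name CategoricalEncoder and the method encode are hypothetical, and the sketch assumes all rows have the same length. Each column gets its own dictionary, and labels are numbered 0, 1, 2, ... in order of first appearance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the HDP package.
final class CategoricalEncoder {
    /**
     * Maps the string labels of each column to integer codes, numbered in
     * order of first appearance. Column 0 is the target variable.
     * Assumes every row has the same number of columns.
     */
    static int[][] encode(String[][] rows) {
        int nCols = rows.length == 0 ? 0 : rows[0].length;
        // One label -> code dictionary per column.
        List<Map<String, Integer>> dicts = new ArrayList<>();
        for (int j = 0; j < nCols; j++) {
            dicts.add(new LinkedHashMap<>());
        }
        int[][] data = new int[rows.length][nCols];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < nCols; j++) {
                Map<String, Integer> dict = dicts.get(j);
                Integer code = dict.get(rows[i][j]);
                if (code == null) {
                    // First time this label is seen in column j.
                    code = dict.size();
                    dict.put(rows[i][j], code);
                }
                data[i][j] = code;
            }
        }
        return data;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"yes", "heavy", "tall"},
            {"no", "light", "short"},
            {"yes", "heavy", "med"}
        };
        for (int[] row : encode(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the three rows shown, the target column becomes 0, 1, 0, and the last feature column becomes 0, 1, 2. Keeping the per-column dictionaries around would let you map the integer codes back to labels when reading the estimated probabilities.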

Contributors

Original research and code by:

Work on the Stirling Number Generator:

Support

YourKit is supporting this open-source project with its full-featured Java Profiler. YourKit is the creator of innovative and intelligent tools for profiling Java and .NET applications. http://www.yourkit.com
