
dirichlet's Introduction

Dirichlet

A Python package to estimate the parameters of a Dirichlet distribution by maximum likelihood, compute the resulting likelihood, and test for independence from a variable by fitting nested Dirichlet distribution hypotheses.

Most of this package is a port of Thomas P. Minka's wonderful Fastfit MATLAB code. Many thanks to him for that and for his clear paper "Estimating a Dirichlet distribution".

Dirichlet Test

This likelihood ratio test for independence will determine whether two Dirichlet-distributed data sets are likely to be from the same distribution or from two different ones, much like a chi-square or G-test for independence, but with Dirichlet models.
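The idea behind such a test can be sketched with the standard library alone. The block below is an illustration under stated assumptions, not this package's implementation: all function names are hypothetical, the fit uses the fixed-point update from Minka's paper with hand-written digamma and trigamma approximations, and the demo data is synthetic. Under the usual asymptotic argument, the statistic is compared against a chi-square distribution whose degrees of freedom equal the number of components; the p-value step is omitted here because the standard library has no chi-square distribution.

```python
import math
import random


def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """Trigamma via the recurrence psi'(x) = psi'(x + 1) + 1/x^2 plus an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    tail = (inv2 / x) * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return result + 1.0 / x + 0.5 * inv2 + tail


def inverse_digamma(y):
    """Solve digamma(x) = y with Newton's method, using Minka's starting point."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(8):
        x -= (digamma(x) - y) / trigamma(x)
    return x


def loglikelihood(data, alpha):
    """Total Dirichlet log-likelihood of the rows of `data` under `alpha`."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return sum(
        log_norm + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, row))
        for row in data
    )


def fit_mle(data, tol=1e-9, max_iter=5000):
    """Fixed-point iteration for the Dirichlet MLE; math.log fails on zero entries."""
    dims = len(data[0])
    mean_log = [sum(math.log(row[k]) for row in data) / len(data) for k in range(dims)]
    alpha = [1.0] * dims
    for _ in range(max_iter):
        total = digamma(sum(alpha))
        new_alpha = [inverse_digamma(total + m) for m in mean_log]
        converged = max(abs(n - a) for n, a in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


def lrt_statistic(d1, d2):
    """Twice the gain in log-likelihood from fitting d1 and d2 separately."""
    pooled = d1 + d2
    separate = loglikelihood(d1, fit_mle(d1)) + loglikelihood(d2, fit_mle(d2))
    return 2.0 * (separate - loglikelihood(pooled, fit_mle(pooled)))


def sample_dirichlet(rng, alpha):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Synthetic demo: two samples from the same distribution, one from another.
rng = random.Random(0)
same_a = [sample_dirichlet(rng, [4.0, 3.0, 2.0]) for _ in range(100)]
same_b = [sample_dirichlet(rng, [4.0, 3.0, 2.0]) for _ in range(100)]
other = [sample_dirichlet(rng, [2.0, 3.0, 8.0]) for _ in range(100)]
stat_same = lrt_statistic(same_a, same_b)
stat_diff = lrt_statistic(same_a, other)
print(stat_same, stat_diff)
```

A small statistic is consistent with both data sets sharing one distribution, while a large one favors two separate distributions.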

Simplex Plots

The dirichlet.simplex module creates scatter, contour, and filled contour 2-simplex plots.
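Any 2-simplex plot rests on projecting three-component compositions onto the plane. The sketch below shows one common projection onto an equilateral triangle. It is only an illustration: the corner layout and the function name are assumptions and are not taken from the dirichlet.simplex module.

```python
import math

# Corners of an equilateral triangle with unit side length (an assumed layout).
CORNERS = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0))


def barycentric_to_cartesian(point):
    """Map a 3-component composition onto 2-D plot coordinates."""
    total = sum(point)  # normalize so inputs need not sum exactly to 1
    weights = [p / total for p in point]
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y


# A pure first component lands on the first corner; equal parts land on the centroid.
corner = barycentric_to_cartesian((1.0, 0.0, 0.0))
center = barycentric_to_cartesian((1.0, 1.0, 1.0))
print(corner, center)
```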

Caveats

Note that this package does not currently support sparse data vectors (vectors containing zeros), because the numerical fitting algorithm relies on the gamma function. Some form of additive smoothing might make this package work in your context, but that will depend on your application.
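As a rough illustration of additive smoothing, the sketch below adds a small pseudocount to every component and renormalizes each vector so it sums to 1 again. The function name and the default pseudocount are assumptions chosen for the example. The pseudocount changes the data, so whether this is acceptable, and what value to use, depends on your application.

```python
def smooth_composition(row, pseudocount=1e-3):
    """Add a pseudocount to every component, then renormalize to sum to 1."""
    if any(value < 0 for value in row):
        raise ValueError("components must be non-negative")
    shifted = [value + pseudocount for value in row]
    total = sum(shifted)
    return [value / total for value in shifted]


# A vector with two zero components becomes strictly positive after smoothing.
smoothed = smooth_composition([0.5, 0.5, 0.0, 0.0])
print(smoothed)
```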

Installation

pip install git+https://github.com/ericsuh/dirichlet.git

This has only been tested with Python 3.6+. Other versions may work, but they haven't been tested.

Development

Note: These instructions have only been tested on Ubuntu/Debian.

Dev dependencies are listed in requirements-dev.txt. You can install them with:

pip install -r requirements-dev.txt

Code style

Please use black to format your code when contributing.

Testing

This project uses tox and pytest for testing. To run tests, generally you can just run:

tox

To test a particular version of Python, you will need to have it installed and in your $PATH ahead of time.

dirichlet's People

Contributors

ericsuh, nbraem

dirichlet's Issues

How to handle data that contains zeros

I've used an application of this Dirichlet package by user xuod for the analysis of Morris water maze data, which is a type of constant-sum probability data. Briefly, the data comes from mice swimming in a circular water-filled pool with four quadrants superimposed on it. The time in each quadrant is recorded over 1 minute, and the Dirichlet test is used to determine whether the time spent is uniform across all quadrants or non-uniform, suggesting a bias for a particular quadrant. The mice are trained to locate an escape platform hidden in the "target quadrant", which is a test of their memory. The data sometimes contains zeros, when a mouse spends no time at all in one of the quadrants, and the dirichlet.test_uniform() Python function fails to converge when zeros are present in the data.

Is there any work-around or good solution to address this issue? Can the dataset be transformed in some way to eliminate the zeros without fundamentally altering the data? It seems that adding a constant value to every data point, which gets around the problem of taking the log of zero, would change the data inappropriately.

Thanks for any feedback.

Install using github version or pypi version?

Hi, I notice that in the readme you suggest installing with

pip install git+https://github.com/ericsuh/dirichlet.git

but there is also a PyPI release at https://pypi.python.org/pypi/dirichlet

Why not update the pypi one?

Getting the value of the concentration parameter a0.

Hi!
Thanks a lot for the package! I was wondering if there is a way to get the concentration parameter a0.
The mle package gives me the vector of proportions a_K/a0. I am interested in getting a_K or a0. Is there a way I can get that from the MLE method?

Thanks

Ashish
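For reference, the relationship between the quantities named in this question is plain arithmetic and can be sketched as follows. The function names are hypothetical and are not this package's API: a0 is the sum of the components a_K, and each proportion is a_K/a0, so the parameter vector can be rebuilt only when both the proportions and a0 are known.

```python
def mean_and_concentration(alpha):
    """Split a Dirichlet parameter vector into proportions a_K/a0 and a0."""
    a0 = sum(alpha)
    return [a / a0 for a in alpha], a0


def alpha_from(mean, a0):
    """Rebuild the parameter vector a_K from the proportions and a0."""
    return [m * a0 for m in mean]


mean, a0 = mean_and_concentration([2.0, 3.0, 5.0])
rebuilt = alpha_from(mean, a0)
print(mean, a0, rebuilt)
```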

Using sparse vectors

Hi Eric,

Thanks a lot for your Dirichlet fitting code. I am using your code on sparse (i.e. lots of zeros) Bag of Words data from images. When I try Dirichlet fitting on those, the fitting process blows up and gives lots of NaNs. Sorry, I am a little bit new to Dirichlet models, so if the question is basic, merely pointing to a reference would be good as well. I tried the same features with Tom Minka's MATLAB toolbox, and it seems to blow up as well.

Thanks again for your code.

Sincerely,
Abhijit Bendale
PhD Student, University of Colorado
