
Comments (3)

bvenn commented on July 21, 2024

Digging into negative binomial distribution implementations turned out to be a rabbit hole. Many packages, such as R's dnbinom or Python's scipy.stats.nbinom, parameterize the PMF by success and failure counts.

Negative Binomial distribution

The distribution models the number of trials x needed to get the rth success in repeated independent Bernoulli trials.
Up to the (x-1)th trial, (r-1) successes must have occurred, which follows the standard binomial distribution.
Therefore, to get the rth success on the xth trial, multiply Binom(p, x-1, r-1) by p.
The PMF is therefore:

  • probability for r-1 successes in x-1 trials = binomial distribution: $\binom{x-1}{r-1}p^{r-1}(1-p)^{(x-1)-(r-1)}$
  • probability for 1 success in the last trial: $p$
  • combined probability: $p \cdot \binom{x-1}{r-1}p^{r-1}(1-p)^{(x-1)-(r-1)}$
  • simplified version: $\binom{x-1}{r-1}p^{r}(1-p)^{x-r}$

with support x ∈ {r, r+1, r+2, …}.

Example

What is the probability of the third success occurring at the 10th trial, given an independent per-trial success probability of 0.09?

  • x=10
  • r=3
  • p=0.09

NegBinom(x,r,p) = 0.01356
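The worked example can be reproduced directly from the trials-based PMF formula above. A minimal sketch (the function name is mine, not from any library):

```python
from scipy.special import comb

def negbinom_trials_pmf(x, r, p):
    """PMF of the trials-parameterized negative binomial:
    probability that the r-th success occurs on trial x."""
    if x < r:
        return 0.0  # fewer trials than successes is impossible
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Third success on the 10th trial, per-trial success probability 0.09
print(round(negbinom_trials_pmf(10, 3, 0.09), 5))  # 0.01356
```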


However, standard R and Python functions result in:
R: dnbinom(x=10, size=3, prob=0.09) = 0.01873637
Python: scipy.stats.nbinom.pmf(k=10, n=3, p=0.09) = 0.01873637

Scipy-documentation states that

The probability mass function above is defined in the “standardized” form. To shift distribution use the loc parameter. Specifically, nbinom.pmf(k, n, p, loc) is identically equivalent to nbinom.pmf(k - loc, n, p).

k is often defined as the number of failures prior to the last success (Wikipedia top right or this online calculator). By changing the function call accordingly:

scipy.stats.nbinom.pmf(k=10, n=3, p=0.09, loc=3) or
scipy.stats.nbinom.pmf(k=7, n=3, p=0.09) = 0.01356

the expected probability is returned.
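The relationship between the two conventions is just a shift by r: evaluating scipy's failures-based nbinom at k = x - r yields the trials-based probability. A small sketch of that conversion (the wrapper name is hypothetical):

```python
from scipy.stats import nbinom

def trials_pmf_via_scipy(x, r, p):
    """Trials-based PMF expressed through scipy's failures-based nbinom:
    k failures before the r-th success means x = k + r trials."""
    return nbinom.pmf(x - r, n=r, p=p)

print(round(trials_pmf_via_scipy(10, 3, 0.09), 5))  # 0.01356
```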

Conclusion

With the definition given above, probabilities greater than 0 are impossible when x < r. Therefore I would suggest parameterizing the negative binomial distribution using:

  • r: number of successes
  • x: number of trials and not number of failures
  • p: probability of each independent Bernoulli trial

and sticking to these parameters for the PMF and CDF accordingly. Switching to the number of failures for the PMF does not make sense to me. As far as I can see, this does not align with other implementations, so further research is needed to clarify the situation. However, overloads could be introduced to support both definitions; the parameter usage must be well defined.

References for X = number of trials:

References for X = number of failures:

References that support both definitions:

@muehlhaus, maybe you have time to have a look at this issue.

from fsharp.stats.

bvenn commented on July 21, 2024

It all condenses down to the question of whether the variable x of the negative binomial distribution (also called the Pascal distribution) should be defined as:

  • A: number of trials or
  • B: number of failures

Both the German and English Wikipedia articles provide both definitions with no preference; the German article lists A first, while the English article lists B first.

I would suggest to stick to the first definition and clearly state this fact in the documentation.


bvenn commented on July 21, 2024

After some more consideration, it may be beneficial to provide two implementations:

  • NegativeBinomial_trials
  • NegativeBinomial_failures

with the second delegating to the first. In rare cases, the distribution is parameterized by the number of failures instead of the number of trials. But I don't think anyone would be confused that NegativeBinomial_failures takes the number of failures as input and models the number of failures as its result.
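Since the actual library is F#, the following is only a Python illustration of the proposed layering, with hypothetical names; it shows how the failures-based variant can simply delegate to the trials-based one:

```python
from scipy.special import comb

def negative_binomial_trials_pmf(x, r, p):
    """x = number of trials needed for the r-th success."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

def negative_binomial_failures_pmf(k, r, p):
    """k = number of failures before the r-th success;
    delegates to the trials-based form with x = k + r."""
    return negative_binomial_trials_pmf(k + r, r, p)
```

With this layering, negative_binomial_failures_pmf(7, 3, 0.09) and negative_binomial_trials_pmf(10, 3, 0.09) both return the 0.01356 from the example above.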

