
Comments (3)

bvenn commented on July 21, 2024

Digging into negative binomial distribution implementations turned out to be a rabbit hole. Many packages, such as R's dnbinom or Python's scipy.stats.nbinom, parameterize the PMF by success and failure counts.

Negative Binomial distribution

The distribution models the number of trials x needed to get the rth success in repeated independent Bernoulli trials.
Up to the (x-1)th trial, (r-1) successes must have occurred, which follows the standard binomial distribution.
Therefore, to get the rth success on the xth trial, multiply Binom(p, x-1, r-1) by p.
The PMF is therefore:

  • probability for r-1 successes in x-1 trials = binomial distribution: $\binom{x-1}{r-1}p^{r-1}(1-p)^{(x-1)-(r-1)}$
  • probability for 1 success in the last trial: $p$
  • combined probability: $p \cdot \binom{x-1}{r-1}p^{r-1}(1-p)^{(x-1)-(r-1)}$
  • simplified version: $\binom{x-1}{r-1}p^{r}(1-p)^{x-r}$

with support x ∈ {r, r+1, r+2, …}.

Example

What is the probability of the third success occurring at the 10th trial, given an independent per-trial success probability of 0.09?

  • x=10
  • r=3
  • p=0.09

NegBinom(x,r,p) = 0.01356
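The worked example can be reproduced directly from the trials-based PMF formula above. A minimal sketch (the function name is mine, not from any library):

```python
from scipy.special import comb

def negbinom_trials_pmf(x, r, p):
    """PMF of the trials-parameterized negative binomial:
    probability that the r-th success occurs on trial x."""
    if x < r:
        return 0.0  # fewer trials than successes is impossible
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# Third success on the 10th trial, per-trial success probability 0.09
print(round(negbinom_trials_pmf(10, 3, 0.09), 5))  # 0.01356
```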


However, standard R and Python functions result in:
R: dnbinom(x=10, size=3, prob=0.09) = 0.01873637
Python: scipy.stats.nbinom.pmf(k=10, n=3, p=0.09) = 0.01873637

Scipy-documentation states that

The probability mass function above is defined in the “standardized” form. To shift distribution use the loc parameter. Specifically, nbinom.pmf(k, n, p, loc) is identically equivalent to nbinom.pmf(k - loc, n, p).

k is often defined as the number of failures prior to the last success (Wikipedia top right or this online calculator). By changing the function call accordingly:

scipy.stats.nbinom.pmf(k=10, n=3, p=0.09, loc=3) or
scipy.stats.nbinom.pmf(k=7, n=3, p=0.09) = 0.01356

the expected probability is returned.
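The relationship between the two conventions is just a shift by r: evaluating scipy's failures-based nbinom at k = x - r yields the trials-based probability. A small sketch of that conversion (the wrapper name is hypothetical):

```python
from scipy.stats import nbinom

def trials_pmf_via_scipy(x, r, p):
    """Trials-based PMF expressed through scipy's failures-based nbinom:
    k failures before the r-th success means x = k + r trials."""
    return nbinom.pmf(x - r, n=r, p=p)

print(round(trials_pmf_via_scipy(10, 3, 0.09), 5))  # 0.01356
```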

Conclusion

With the definition given above, probabilities greater than 0 are impossible when x < r. Therefore I would suggest parameterizing the negative binomial distribution using:

  • r: number of successes
  • x: number of trials and not number of failures
  • p: probability of each independent Bernoulli trial

and sticking to these parameters for the PMF and CDF accordingly. Switching to the number of failures for the PMF does not make sense to me. As far as I can see, this does not align with other implementations, so further research is needed to clarify the situation. However, overloads could be introduced to support both definitions; the parameter usage must be well defined.

References for X = number of trials:

References for X = number of failures:

References that support both definitions:

@muehlhaus, maybe you have time to have a look at this issue.

from fsharp.stats.

bvenn commented on July 21, 2024

It all condenses down to the question of whether the variable x of the negative binomial distribution (also called the Pascal distribution) should be defined as:

  • A: number of trials or
  • B: number of failures

Both the German and English Wikipedia articles provide both definitions with no preference; the German article lists A first, while the English article lists B first.

I would suggest to stick to the first definition and clearly state this fact in the documentation.


bvenn commented on July 21, 2024

After some more consideration, it may be beneficial to provide two implementations:

  • NegativeBinomial_trials
  • NegativeBinomial_failures

with the second delegating to the first. In rare cases, the distribution is parameterized by the number of failures instead of the number of trials. But I don't think anyone would be confused that NegativeBinomial_failures takes the number of failures as input and models the number of failures as its result.
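Since the actual library is F#, the following is only a Python illustration of the proposed layering, with hypothetical names; it shows how the failures-based variant can simply delegate to the trials-based one:

```python
from scipy.special import comb

def negative_binomial_trials_pmf(x, r, p):
    """x = number of trials needed for the r-th success."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

def negative_binomial_failures_pmf(k, r, p):
    """k = number of failures before the r-th success;
    delegates to the trials-based form with x = k + r."""
    return negative_binomial_trials_pmf(k + r, r, p)
```

With this layering, negative_binomial_failures_pmf(7, 3, 0.09) and negative_binomial_trials_pmf(10, 3, 0.09) both return the 0.01356 from the example above.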

