I've added an InvCDF member to all distributions in 3d6a220.
I noticed that the approximation of the inverse error function leads to small discrepancies when extreme values are chosen and the results are compared to R's `qnorm` procedure.
```fsharp
// Testing FSharp.Stats
open FSharp.Stats
Distributions.Continuous.Normal.InvCDF 0. 1. 0.5 //0.
```

```r
# Testing R
qnorm(0.5, 0, 1)
```

```python
# Testing Python
from scipy.stats import norm
norm.ppf(0.5, loc=0, scale=1)
```
| Mean | StDev | p | FSharp.Stats | R (`qnorm`) | Python (`norm.ppf`) |
|---|---|---|---|---|---|
| 0 | 1 | 0.5 | 1.253321755e-09 | 0 | 0 |
| 0 | 1 | 0 | -infinity | -infinity | -infinity |
| 0 | 1 | 1 | infinity | infinity | infinity |
| 3 | 0.01 | 0.01 | 2.97673652985179 | 2.97673652126 | 2.97673652125959 |
| -300000 | 5000 | 0.99 | -288368.2649258 | -288368.2606298 | -288368.2606297958 |
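The deviations are small (about 1.3e-9 at p = 0.5 and about 8.6e-7 in standard-normal units at p = 0.01 and p = 0.99) and, in these examples, largest towards the tails. Still, it would be worth checking whether the approximation presented in Wichura, “Algorithm AS 241: The Percentage Points of the Normal Distribution”, 1988, should be implemented.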
- add tests
- check accuracy
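A minimal sketch of such a test, assuming Python 3.8+ as an independent reference: CPython's `statistics.NormalDist.inv_cdf` cites Wichura's AS 241 in its source, so it yields an AS 241 result without SciPy. The cases are taken from the table above; `check_against_reference` and its tolerances are illustrative choices, not project requirements.

```python
import math
from statistics import NormalDist

# (mean, stdev, p, reference) taken from the R / Python columns of the table
CASES = [
    (0.0, 1.0, 0.5, 0.0),
    (3.0, 0.01, 0.01, 2.97673652125959),
    (-300000.0, 5000.0, 0.99, -288368.2606297958),
]


def check_against_reference(cases, rel_tol=1e-12, abs_tol=1e-12):
    """Return the cases where NormalDist.inv_cdf disagrees with the reference value."""
    failures = []
    for mu, sigma, p, expected in cases:
        x = NormalDist(mu, sigma).inv_cdf(p)
        if not math.isclose(x, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((mu, sigma, p, x, expected))
    return failures


if __name__ == "__main__":
    print(check_against_reference(CASES))
```

Unlike `qnorm`, `inv_cdf` raises `StatisticsError` for p = 0 and p = 1 instead of returning ±infinity, so those two rows are left out.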
The normal InvCDF for mean = 0 and sigma = 1 is already implemented, although at an improper location: `FSharp.Stats/src/FSharp.Stats/Signal/QQPlot.fs`, lines 91 to 92 in b74ecf2.
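A standard normal quantile is sufficient for the general case, because any mean μ and standard deviation σ follow from a location-scale transformation, where Φ⁻¹ denotes the standard normal quantile function:

$$x_p = \mu + \sigma \, \Phi^{-1}(p)$$

For example, with μ = 3, σ = 0.01 and p = 0.01, Φ⁻¹(0.01) ≈ -2.326347874 gives x ≈ 3 - 0.02326347874 = 2.97673652126, the R value in the table.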
I agree, we should add quantile functions as InvCDF for all distributions 👍
I've implemented the quantile function of the normal distribution as described in Wichura (1988). It's accurate to 15 decimal places.
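One way to check an accuracy claim like this is a round trip: CDF(InvCDF(p)) should return p. A minimal Python sketch, using `statistics.NormalDist` as a stand-in for the F# CDF/InvCDF pair (an assumption; the same loop applies to any such pair). `max_round_trip_error` is a hypothetical helper, and the grid size is an arbitrary choice.

```python
from statistics import NormalDist


def max_round_trip_error(dist, n=1000):
    """Largest |CDF(InvCDF(p)) - p| over the grid p = k/n, k = 1..n-1."""
    worst = 0.0
    for k in range(1, n):
        p = k / n
        worst = max(worst, abs(dist.cdf(dist.inv_cdf(p)) - p))
    return worst


if __name__ == "__main__":
    print(max_round_trip_error(NormalDist(0.0, 1.0)))
```

The result mixes the errors of both functions and is an absolute error in p; far in the tails, a relative comparison of x against reference values is more informative.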
Progress of adding InvCDF/quantile functions to continuous distributions
- Normal
- Gamma
- Beta
- F
- StudentT
- Studentized Range
- LogNormal
- MultivariateNormal
- Exponential
- Uniform <- should be easy
- Chi
- ChiSquared
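For some of the listed distributions the quantile function has a closed form, obtained by solving F(x) = p for x. A Python sketch for Uniform(a, b) and Exponential; the function names and the rate parameterization of the exponential are assumptions for illustration, and FSharp.Stats may parameterize differently.

```python
import math


def uniform_inv_cdf(a, b, p):
    """Quantile of Uniform(a, b): solve (x - a) / (b - a) = p for x."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return a + p * (b - a)


def exponential_inv_cdf(rate, p):
    """Quantile of Exponential(rate): solve 1 - exp(-rate * x) = p for x."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 1.0:
        return math.inf
    # log1p(-p) is more accurate than log(1 - p) when p is small
    return -math.log1p(-p) / rate


if __name__ == "__main__":
    print(uniform_inv_cdf(2.0, 6.0, 0.25))
    print(exponential_inv_cdf(2.0, 0.5))
```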
Many distributions have no closed-form quantile function. Besides published approximations, it would be beneficial to add a member to each distribution that numerically approximates the x for a given p. Since the CDF of a continuous distribution is continuous and monotonically non-decreasing, a root-finding approach on CDF(x) - p should work fine, as long as the search interval brackets the root. I propose the following:
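```fsharp
open FSharp.Stats

type MyDistribution =
    // Placeholders: each distribution supplies its own PDF and CDF
    static member PDF (a: float) (b: float) (x: float) : float = failwith "TODO"
    static member CDF (a: float) (b: float) (x: float) : float = failwith "TODO"
    // possibly no closed form exists
    static member InvCDF (a: float) (b: float) (p: float) : float = failwith "TODO"
    /// Approximates the x with CDF a b x = p by searching x in [minimum, maximum]
    static member InvCDFApprox (a: float) (b: float) (p: float) (accuracy: float) (minimum: float) (maximum: float) =
        // parameters: function (float -> float); accuracy (float); minimum (float); maximum (float); maxIterations
        // The bounds refer to x (the support), not to p in [0, 1]
        let tmp = Optimization.Bisection.tryFindRoot (fun x -> MyDistribution.CDF a b x - p) accuracy minimum maximum 1000
        match tmp with
        | Some x -> x
        | None -> failwith "no InvCDF found to satisfy the given conditions"
```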
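To illustrate the root-finding idea independently of FSharp.Stats, here is a plain bisection in Python. `inv_cdf_bisect` is a hypothetical helper and is not meant to mirror `Optimization.Bisection.tryFindRoot` exactly; the normal CDF from `statistics.NormalDist` is used only so the result can be compared with a known quantile.

```python
from statistics import NormalDist


def inv_cdf_bisect(cdf, p, lo, hi, accuracy=1e-12, max_iter=1000):
    """Approximate the x in [lo, hi] with cdf(x) = p by bisection.

    Returns None if [lo, hi] does not bracket the root (cdf(lo) <= p <= cdf(hi))
    or if the interval is not narrower than `accuracy` after `max_iter` halvings.
    """
    if cdf(lo) > p or cdf(hi) < p:
        return None  # the root is not bracketed
    for _ in range(max_iter):
        if hi - lo < accuracy:
            return 0.5 * (lo + hi)
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return None  # accuracy not reached within max_iter iterations


if __name__ == "__main__":
    std = NormalDist(0.0, 1.0)
    print(inv_cdf_bisect(std.cdf, 0.975, -10.0, 10.0))
    print(std.inv_cdf(0.975))
```

Returning `None` here plays the role of the `None` branch in the F# match; whether the public member should then fail or return `nan` is one of the open questions.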
Drawbacks
- While this should be feasible for any distribution, the optimization step may be quite slow.
- If the CDF itself is an approximation, its error propagates into the InvCDF and may inflate the InvCDF error.
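The second drawback can be quantified with a first-order argument. Suppose the CDF implementation returns F(x) + ε(x) instead of F(x), where F is the exact CDF and f the PDF. The root-finder then returns x̂ with F(x̂) + ε(x̂) = p, while the true quantile x_p satisfies F(x_p) = p. Linearizing F around x_p:

$$F(\hat{x}) - F(x_p) \approx f(x_p)\,(\hat{x} - x_p) = -\varepsilon(\hat{x}) \quad\Rightarrow\quad \hat{x} - x_p \approx -\frac{\varepsilon(x_p)}{f(x_p)}$$

The CDF error is therefore amplified by 1/f(x_p), which is largest in the tails, where the density is small.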
To discuss:
- Should `InvCDFApprox` fail or return `nan` when no root can be identified?
- Can maxIterations be determined by the accuracy? In my understanding, the range between min and max is divided into two sections during each iteration, so after maxIterations iterations the remaining interval has width (max - min) · 0.5^maxIterations. Assuming the accuracy refers to this width, the two are coupled by $$\mathrm{accuracy} = (\mathrm{max} - \mathrm{min}) \cdot 0.5^{\mathrm{maxIterations}}$$ which reduces to $$\mathrm{accuracy} = 0.5^{\mathrm{maxIterations}}$$ for the interval [0, 1]. If this is correct, maxIterations could be set to `int (ceil (System.Math.Log(accuracy / (maximum - minimum), 0.5)))`.
- Should the accuracy be given as a float, or modeled as a type like `Expecto.Accuracy`?
- Is there a better alternative to `Optimization.Bisection`? E.g., NelderMead with `(fun x -> abs (MyDistribution.CDF a b x - p))` as the objective function.
- Are there better naming options than `InvCDFApprox`?
- Are there more concerns that I missed?
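Assuming `accuracy` bounds the width of the final bracketing interval, the relation (max - min) · 0.5^n ≤ accuracy can be solved for the smallest sufficient iteration count n. `required_iterations` is a hypothetical helper for illustration:

```python
import math


def required_iterations(minimum, maximum, accuracy):
    """Smallest n with (maximum - minimum) * 0.5**n <= accuracy."""
    width = maximum - minimum
    if width <= accuracy:
        return 0
    return math.ceil(math.log2(width / accuracy))


if __name__ == "__main__":
    print(required_iterations(0.0, 1.0, 1e-15))
    print(required_iterations(-10.0, 10.0, 1e-12))
```

In floating point, halving stops shrinking the interval once it reaches the spacing of doubles near the root, so an accuracy much finer than that spacing cannot be reached regardless of maxIterations.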