Comments (4)
I will have a look at this. Maybe there is a performance advantage if you explicitly restrict it to float. If so, there should be additional "generic" functions. I'll test it and make the functions usable for "non-float" lists as well.
That you cannot look up letters that are absent from your text is hard to work around inside the module. There are many possible alphabets that could be considered (upper case, lower case, äüö, special characters, numbers). I assume you have to define your desired set of characters separately:
let myAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()
With this at hand, you can use the following as a template: it starts from your alphabet and fills in the counts of the characters that occur in your text.
#r "nuget: FSharp.Stats"
#r "nuget: Plotly.NET"
open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET

let myAlphabet =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()

let myTextMap =
    "mississippi".ToCharArray()
    |> List.ofArray
    |> Frequency.createGeneric

let myFinalMap =
    // use your own defined alphabet to include the desired set of characters
    myAlphabet
    |> Array.map (fun key ->
        // if the text contains the current character, its count is used
        if myTextMap.ContainsKey key then
            key, myTextMap.[key]
        // if the text does NOT contain the current character, set its count to 0
        else
            key, 0
    )
    |> Map.ofArray

// accessing character frequencies
myFinalMap.['z'] // 0
myFinalMap.['s'] // 4

// visualization
myFinalMap
|> Map.toArray
|> Chart.Column
|> Chart.withSize (1000.,500.) // wide enough to show all characters
|> Chart.show
I'll comment if I have any news.
from fsharp.stats.
I fixed the issue, tested the Empirical.create function, and added a convenience layer for nominal/categorical inputs.
- fix floating point error when handling floats on bin border value 0.3 when binwidth is 0.1
- make Frequency functions generic
- make Empirical functions generic
- add Empirical.create for nominal data
- add convenience layer
- add tests
still missing
- add documentation
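The floating point item in the list above can be reproduced without the library. The sketch below is only an illustration of the effect: `naiveBinIndex` and `tolerantBinIndex` are hypothetical helpers, not FSharp.Stats functions, and rounding the quotient is one possible remedy, not necessarily the one used in the fix.

```fsharp
// Hypothetical helper (not part of FSharp.Stats): bin index by plain division.
let naiveBinIndex (binWidth: float) (x: float) =
    int (floor (x / binWidth))

// Hypothetical helper: round the quotient to 9 decimals before flooring,
// so values that sit on a bin border are not pushed into the bin below.
let tolerantBinIndex (binWidth: float) (x: float) =
    int (floor (System.Math.Round(x / binWidth, 9)))

// 0.3 / 0.1 evaluates to 2.9999999999999996 in IEEE 754 double precision
let naive = naiveBinIndex 0.1 0.3       // 2
let tolerant = tolerantBinIndex 0.1 0.3 // 3
```

With a bin width of 0.1, the value 0.3 lies exactly on a bin border, and the plain division places it one bin too low.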
Usage
You can build the binaries yourself or wait for the next FSharp.Stats release.
(Update: You can use #r "nuget: FSharp.Stats, 0.4.12-preview.1")
Define the set of characters to search for:
#r @"<PathToFSharp.Stats>\FSharp.Stats\src\FSharp.Stats\bin\Release\netstandard2.0\FSharp.Stats.dll"
#r "nuget: Plotly.NET"
open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET
let letters = "Mississippi"
// Define your set of characters that should be checked for
// Any character that is not present in these sets is ignored
let myAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" |> Set.ofSeq
let mySmallAlphabet = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq
These alphabets can be used to create the probability maps.
//takes the characters and determines their probabilities without considering non-existing characters
let myFrequencies0 = EmpiricalDistribution.createNominal() letters
//takes upper and lower case characters and determines their probability
let myFrequencies1 = EmpiricalDistribution.createNominal(Template=myAlphabet) letters
//takes only lower case characters and determines their probability
let myFrequencies2 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet) letters
An additional field for transforming the input sequence is useful when it does not matter whether a character is lower case or upper case:
//converts all characters to lower case characters and determines their probability
let myFrequencies3 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet,Transform=System.Char.ToLower) letters
// check the probability of characters that are absent from the text but within the search scope (Template alphabet)
myFrequencies3.['z'] //returns 0.0
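To make the roles of Template and Transform more concrete, here is a minimal sketch in plain F# of what such a function could do. `nominalProbabilities` is a hypothetical name and this is not the FSharp.Stats implementation. In particular, it assumes that probabilities are computed relative to the characters that remain after the template filter, which may differ from the library's behavior.

```fsharp
// Hypothetical re-implementation for illustration only; not the FSharp.Stats source.
let nominalProbabilities (template: Set<char>) (transform: char -> char) (text: string) =
    // apply the transform first, then drop every character outside the template
    let kept =
        text
        |> Seq.map transform
        |> Seq.filter (fun c -> Set.contains c template)
        |> Seq.toArray
    let total = float kept.Length
    let counts = kept |> Array.countBy id |> Map.ofArray
    // every template character gets an entry; absent characters get 0.0
    template
    |> Seq.map (fun c ->
        let n = counts |> Map.tryFind c |> Option.defaultValue 0
        // assumption: normalize by the number of characters that passed the filter
        c, (if total > 0. then float n / total else 0.))
    |> Map.ofSeq

let smallAlphabet = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq
let probs = nominalProbabilities smallAlphabet System.Char.ToLower "Mississippi"

probs.['z'] // 0.0, because 'z' is in the template but not in the text
probs.['s'] // 4 of the 11 characters are 's'
```

Because the transform runs before the filter, the upper case 'M' is counted as 'm' instead of being discarded.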
Visualization
[
    Chart.Column(myFrequencies0 |> Map.toArray,"noTemplate") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies1 |> Map.toArray,"bigAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies2 |> Map.toArray,"smallAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies3 |> Map.toArray,"toLower + smallAlphabet") |> Chart.withYAxisStyle "probability"
]
|> Chart.Grid(4,1)
|> Chart.withTemplate ChartTemplates.lightMirrored
|> Chart.withTitle letters
|> Chart.withSize(1000.,900.)
|> Chart.show
from fsharp.stats.
A prerelease is published and can be used:
#r "nuget: FSharp.Stats, 0.4.12-preview.1"
The documentation that contains the same information as this thread can be found here.
from fsharp.stats.
Thanks Benedikt, nice solution!
from fsharp.stats.