Comments (4)

bvenn commented on June 20, 2024

I'll have a look at this. Restricting the functions to float may have a performance advantage. If so, there should be additional "generic" variants. I'll test it and make the functions usable for non-float lists as well.

Getting entries for letters that do not occur in your text is hard to solve inside the module, because there are many possible alphabets to consider (upper case, lower case, äüö, special characters, numbers). I assume you have to define your desired set of characters yourself:

let myAlphabet = 
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()

With this at hand, you can use the alphabet as a template and fill in the counts of the characters that occur in your text.

#r "nuget: FSharp.Stats"
#r "nuget: Plotly.NET"

open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET

let myAlphabet = 
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ".ToCharArray()

let myTextMap = 
    "mississippi".ToCharArray()
    |> List.ofArray
    |> Frequency.createGeneric

let myFinalMap = 
    // use your own defined alphabet to include the desired set of characters
    myAlphabet
    |> Array.map (fun key -> 
        // if the text contains the current character, its value is used
        if myTextMap.ContainsKey key then 
            key,myTextMap.[key] 
        // if the text does NOT contain the current character, set its count to 0
        else 
            key,0
        )
    |> Map.ofArray

// access the character counts
myFinalMap.['z'] // 0
myFinalMap.['s'] // 4

// visualization
myFinalMap
|> Map.toArray
|> Chart.Column
|> Chart.withSize (1000.,500.) // quick way to depict all characters
|> Chart.show


I'll comment if I have any news.

from fsharp.stats.

bvenn commented on June 20, 2024

I fixed the issue, tested the Empirical.create function, and added a convenience layer for nominal/categorical inputs.

32fa0c2

  • fix floating point error when a float lies exactly on a bin border (e.g. value 0.3 with a bin width of 0.1)

060f696

  • make Frequency functions generic
  • make Empirical functions generic
  • add Empirical.create for nominal data
  • add convenience layer

7c1242d

  • add tests

still missing

  • add documentation
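To illustrate the kind of bin border problem addressed in 32fa0c2, here is a minimal sketch in plain F#. It is not the code from that commit; the helper names `naiveBinIndex` and `tolerantBinIndex` and the tolerance `1e-9` are assumptions made up for this example. It only shows why 0.3 with a bin width of 0.1 can land in the wrong bin, and one possible way to guard against it.

```fsharp
// Hypothetical helpers for illustration only; not part of FSharp.Stats.

// Naive approach: divide by the bin width and round down.
// In double precision 0.3 / 0.1 evaluates to 2.9999999999999996, so floor yields 2.
let naiveBinIndex (binWidth: float) (x: float) : int =
    int (floor (x / binWidth))

// One possible guard (an assumption, not the library's fix):
// snap the quotient to the nearest integer if it is within a small tolerance.
let tolerantBinIndex (binWidth: float) (x: float) : int =
    let q = x / binWidth
    let nearest = System.Math.Round q
    if abs (q - nearest) < 1e-9 then int nearest
    else int (floor q)

printfn "%d" (naiveBinIndex 0.1 0.3)    // prints 2
printfn "%d" (tolerantBinIndex 0.1 0.3) // prints 3
```

Values that are clearly inside a bin (for example 0.25) give the same index with both helpers, so the guard only changes the result right at a border.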

Usage

You can build the binaries yourself or wait for the next FSharp.Stats release.
(Update: You can use #r "nuget: FSharp.Stats, 0.4.12-preview.1")

Define the set of characters to search for:

#r @"<PathToFSharp.Stats>\FSharp.Stats\src\FSharp.Stats\bin\Release\netstandard2.0\FSharp.Stats.dll"
#r "nuget: Plotly.NET"

open FSharp.Stats
open FSharp.Stats.Distributions
open Plotly.NET

let letters = "Mississippi"

// Define your set of characters that should be checked for
// Any character that is not present in these sets is ignored
let myAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" |> Set.ofSeq
let mySmallAlphabet = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq

These alphabets can be used to create the probability maps.

//takes the characters and determines their probabilities without considering non-existing characters
let myFrequencies0 = EmpiricalDistribution.createNominal() letters

//takes upper and lower case characters and determines their probability
let myFrequencies1 = EmpiricalDistribution.createNominal(Template=myAlphabet) letters

//takes only lower case characters and determines their probability
let myFrequencies2 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet) letters

An additional Transform argument for the input sequence can be useful when it does not matter whether a character is lower case or upper case:

//converts all characters to lower case characters and determines their probability
let myFrequencies3 = EmpiricalDistribution.createNominal(Template=mySmallAlphabet,Transform=System.Char.ToLower) letters

// check probability of non existing characters, that are within the search scope (Template alphabet)
myFrequencies3.['z'] //returns 0.0
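
For readers who want to see the idea behind Template and Transform without the library, here is a rough sketch in plain F#. It is not the FSharp.Stats implementation; `nominalProbabilities` is a hypothetical helper. It assumes that probabilities are normalized over the characters that remain after filtering by the template, which may differ from what `EmpiricalDistribution.createNominal` does.

```fsharp
// Hypothetical helper for illustration only; not the FSharp.Stats implementation.
let nominalProbabilities
        (template: Set<char>)
        (transform: char -> char)
        (text: string) : Map<char, float> =
    // apply the transform first, then drop characters outside the template
    let kept =
        text
        |> Seq.map transform
        |> Seq.filter (fun c -> Set.contains c template)
        |> Seq.toArray
    // count the characters that survived the filter
    let counts = kept |> Array.countBy id |> Map.ofArray
    // assumption: normalize over the kept characters only
    let total = float kept.Length
    // every template character gets an entry; absent ones get 0.0
    template
    |> Seq.map (fun c ->
        let n = counts |> Map.tryFind c |> Option.defaultValue 0
        let p = if total > 0.0 then float n / total else 0.0
        c, p)
    |> Map.ofSeq

let lowerCase = "abcdefghijklmnopqrstuvwxyz" |> Set.ofSeq
let probs = nominalProbabilities lowerCase (fun c -> System.Char.ToLower c) "Mississippi"

printfn "%.3f" probs.['z'] // prints 0.000
printfn "%.3f" probs.['s'] // prints 0.364 (4 of 11 characters)
```

Building the result from the template rather than from the text is what guarantees an entry for every character in the search scope, including those with probability 0.0.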

Visualization

[
    Chart.Column(myFrequencies0 |> Map.toArray,"noTemplate") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies1 |> Map.toArray,"bigAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies2 |> Map.toArray,"smallAlphabet") |> Chart.withYAxisStyle "probability"
    Chart.Column(myFrequencies3 |> Map.toArray,"toLower + smallAlphabet") |> Chart.withYAxisStyle "probability"
]
|> Chart.Grid(4,1)
|> Chart.withTemplate ChartTemplates.lightMirrored
|> Chart.withTitle letters
|> Chart.withSize(1000.,900.)
|> Chart.show


bvenn commented on June 20, 2024

A prerelease is published and can be used:

#r "nuget: FSharp.Stats, 0.4.12-preview.1"

The documentation that contains the same information as this thread can be found here.


HarryMcCarney commented on June 20, 2024

Thanks Benedikt, nice solution!

