
Comments (4)

msharp commented on June 9, 2024

Thanks @PragTob

I have based a lot of the behaviours in this package on the Python numpy and scipy libraries.

The scipy.stats.mode() function returns the lowest value when there are multiple values with the same frequency - https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html
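For illustration, the tie-breaking rule described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the scipy or elixir-statistics implementation; the name `lowest_mode` is hypothetical.

```python
from collections import Counter

def lowest_mode(values):
    """Return the most frequent value; ties go to the smallest value.

    Hypothetical helper that only illustrates the rule discussed above.
    Raises ValueError on an empty input, because max() gets no items.
    """
    counts = Counter(values)
    highest = max(counts.values())
    # Among all values that reach the highest count, pick the smallest.
    return min(v for v, c in counts.items() if c == highest)

print(lowest_mode([3, 1, 1, 3, 2]))  # 1 and 3 both occur twice -> 1
print(lowest_mode([5, 4, 9]))        # every value occurs once -> 4
```

The second call shows the edge case this thread is about: when nothing repeats, the rule simply reports the minimum.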

Can you describe how you are using this function so I can understand your requirements?

from elixir-statistics.

PragTob commented on June 9, 2024

Hi there,

Truth be told, I'm not really using it. I was looking at the library while evaluating whether it was worth releasing my own statistics library (Statistex), came across this, and thought I might as well report it, as I believe it to be a bug.

That said, I'm not the one to argue with numpy and scipy. It's just what I read up on when implementing mode (or, to be fully truthful, what someone else read up on and I questioned when reviewing it :) )

I think it's important to know when a data set has multiple modes, and to see them, as they can be very interesting. For instance, I use this mostly with benchmarking. Seeing that there are 2 modes that aren't directly "next" to each other is super interesting.

In the same vein, when no value occurs more than once and just the smallest value is reported, that seems highly unhelpful to me. In benchmarking terms, it would have me believe that the fastest run time is the one that occurs most frequently, which imo heavily skews the results.

Anyhow, that's just my perspective/what I remember from reading up on it back then. If numpy and scipy do it like this, I'm sure it's fine, so feel free to close this :)

Tobi


msharp commented on June 9, 2024

Thanks for the background @PragTob

To be honest, I probably didn't think too much about the mode implementation. It's not a statistic I use very often.

Benchmarking is probably better evaluated with percentiles anyway, especially if you have real (floating point) numbers.
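As a rough illustration of the percentile approach, here is a small Python sketch using only the standard library. The helper name `percentile` and the sample run times are made up for the example; they do not come from benchee or elixir-statistics.

```python
import statistics

def percentile(samples, p):
    """Return the p-th percentile (p in 1..99) of the samples.

    Hypothetical helper: statistics.quantiles with n=100 returns the 99
    cut points between percentiles, so the p-th one sits at index p - 1.
    The "inclusive" method treats the samples as the whole population.
    """
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[p - 1]

# Made-up run times in milliseconds; floats rarely repeat exactly,
# which is why the mode tends to say little about data like this.
run_times_ms = [12.1, 12.4, 12.2, 30.9, 12.3, 12.6, 12.2, 12.5]

print(percentile(run_times_ms, 50))
print(percentile(run_times_ms, 95))
```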

There might be scope for another mode function with a different arity which can support multimodal datasets.
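One possible shape for such a function, sketched in Python rather than Elixir. The name `modes` and the choice to return an empty list when no value repeats are assumptions made for this example, not an existing or planned API of this package.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest count, sorted ascending.

    Hypothetical sketch. Design assumption: if no value occurs more than
    once (or the input is empty), return [] rather than reporting the
    smallest value as "the" mode.
    """
    counts = Counter(values)
    highest = max(counts.values(), default=0)
    if highest < 2:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 1, 5, 5, 3]))  # two modes that are not adjacent -> [1, 5]
print(modes([4, 2, 9]))        # nothing repeats -> []
```

Returning a list keeps both cases raised earlier in the thread visible: multiple modes are all reported, and all-unique data yields no mode at all.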


PragTob commented on June 9, 2024

I find all of them useful, but yes, benchee also supports percentiles :) (It currently shows the 99th percentile by default, which might be too hardcore; maybe the 95th would be a better default, not sure.) But yeah, love me some box plots :)
