enumerable-statistics's Introduction

Enumerable::Statistics

Enumerable::Statistics provides methods for calculating statistical summaries of arrays and enumerables.

Installation

Add this line to your application's Gemfile:

gem 'enumerable-statistics'

And then execute:

$ bundle

Or install it yourself as:

$ gem install enumerable-statistics

Usage

Load the library at the top of your script with the following line:

require 'enumerable/statistics'

The following methods are supplied by this library:

  • Array#mean, Enumerable#mean
    • Calculates a mean of values in an array or an enumerable
  • Array#variance, Enumerable#variance
    • Calculates a variance of values in an array or an enumerable
  • Array#stdev, Enumerable#stdev
    • Calculates a standard deviation of values in an array or an enumerable
  • Array#mean_variance, Enumerable#mean_variance
    • Calculates a mean and a variance simultaneously
  • Array#mean_stdev, Enumerable#mean_stdev
    • Calculates a mean and a standard deviation simultaneously
  • Array#median
    • Calculates a median of values in an array
  • Array#percentile(q)
    • Calculates a percentile or percentiles of values in an array
  • Array#value_counts, Enumerable#value_counts, and Hash#value_counts
    • Counts the number of items for each value in the container
  • Array#histogram
    • Calculates a histogram of the values in the array

Moreover, for Ruby < 2.4, Array#sum and Enumerable#sum are provided.

All methods scan a collection only once to calculate statistics, and they preserve precision as far as possible.
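To illustrate what a single-pass calculation can look like, here is a minimal pure-Ruby sketch based on Welford's update. It is not taken from the gem's source: the helper name `welford_mean_variance` is hypothetical, and the use of the unbiased (n - 1) sample variance is an assumption made for this example.

```ruby
# Hypothetical helper, not part of enumerable-statistics.
# Computes the mean and the unbiased sample variance in one pass
# over the collection, using Welford's update.
def welford_mean_variance(enum)
  count = 0
  mean = 0.0
  m2 = 0.0 # running sum of squared deviations from the current mean

  enum.each do |x|
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
  end

  return [Float::NAN, Float::NAN] if count.zero?

  # Assumption: divide by (n - 1); a single sample has no defined variance.
  variance = count > 1 ? m2 / (count - 1) : Float::NAN
  [mean, variance]
end

mean, variance = welford_mean_variance([1, 2, 3, 4, 5].each)
puts mean     # 3.0
puts variance # 2.5
```

Because the update only needs the current element and three running values, it works on any Enumerable, including ones that can be traversed only once.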

Performance

$ bundle exec rake bench
# sum
Warming up --------------------------------------
              inject     1.545k i/100ms
               while     2.342k i/100ms
                 sum    11.009k i/100ms
Calculating -------------------------------------
              inject     15.016k (± 9.6%) i/s -     75.705k in   5.098723s
               while     22.238k (±16.2%) i/s -    107.732k in   5.068156s
                 sum    112.992k (± 6.9%) i/s -    572.468k in   5.091868s
# mean
Warming up --------------------------------------
              inject     1.578k i/100ms
               while     2.057k i/100ms
                mean     9.855k i/100ms
Calculating -------------------------------------
              inject     15.347k (± 8.6%) i/s -     77.322k in   5.076009s
               while     21.669k (±14.5%) i/s -    106.964k in   5.074312s
                mean    108.861k (± 8.9%) i/s -    542.025k in   5.021786s
# variance
Warming up --------------------------------------
              inject   586.000  i/100ms
               while   826.000  i/100ms
            variance     8.475k i/100ms
Calculating -------------------------------------
              inject      6.187k (± 6.7%) i/s -     31.058k in   5.043418s
               while      8.597k (± 7.4%) i/s -     42.952k in   5.024587s
            variance     84.702k (± 8.5%) i/s -    423.750k in   5.039936s

Development

After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/mrkn/enumerable-statistics.

enumerable-statistics's Issues

Histogram may contain nil in weights

>> require 'enumerable/statistics'
=> true
>> [1, 2].histogram(closed: :left)
=> #<struct EnumerableStatistics::Histogram edge=[1.0, 1.5, 2.0, 2.5], weights=[1, nil, 1], closed=:left, isdensity=false>

Support bins: option in value_counts

bins: should be an integer or an array of integers.

If an integer bins is given, the values are grouped into half-open bins.

If an array of integers is given as bins, the array elements represent the lower limits of each bin. The array must increase monotonically.

Example

>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: 4)
{ (1.0 ... 2.25) => 5,
  (2.25 ... 3.5) => 1,
  (3.5 ... 4.75) => 2,
  (4.75 ... 6.005) => 2 }

>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: [1, 2, 3, 4, 5, 6], dropna: false)
{ (1 ... 2) => 3,
  (2 ... 3) => 2,
  (3 ... 4) => 1,
  (4 ... 5) => 2,
  (5 ... 6) => 1,
  nil => 1 }

>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: [1, 3, 5, 7])
{ (1 ... 3) => 5,
  (3 ... 5) => 3,
  (5 ... 7) => 2 }
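The array form of bins: can be sketched in pure Ruby as follows. This is only an illustration of the proposed behavior, not the gem's implementation: `count_by_bins` is a hypothetical helper, and counting out-of-range values under nil when dropna: is false is inferred from the second example above.

```ruby
# Hypothetical helper, not part of enumerable-statistics.
# Counts values into half-open bins [edges[i], edges[i + 1]).
# Values that fall outside every bin are counted under nil
# unless dropna is true.
def count_by_bins(values, edges, dropna: true)
  bins = edges.each_cons(2).map { |lo, hi| (lo...hi) }
  counts = bins.to_h { |bin| [bin, 0] }

  values.each do |v|
    bin = bins.find { |b| b.cover?(v) }
    if bin
      counts[bin] += 1
    elsif !dropna
      counts[nil] = counts.fetch(nil, 0) + 1
    end
  end

  counts
end

data = [1, 2, 1, 1, 3, 4, 4, 5, 2, 6]
p count_by_bins(data, [1, 3, 5, 7])
p count_by_bins(data, [1, 2, 3, 4, 5, 6], dropna: false)
```

The linear search over bins keeps the sketch short; for many bins, a binary search over the monotonically increasing edges would be the natural replacement.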

value_counts

I want to add value_counts for Array, Hash, and Enumerable.

  • Array#value_counts
  • Enumerable#value_counts
  • Hash#value_counts

bundle exec rake bench no longer works

$ bundle exec rake bench                       
# sum
Traceback (most recent call last):
ruby: No such file or directory -- bench/sum.rb (LoadError)
# mean
Traceback (most recent call last):
ruby: No such file or directory -- bench/mean.rb (LoadError)
# variance
Traceback (most recent call last):
ruby: No such file or directory -- bench/variance.rb (LoadError)

tests fail on i386 (32-bit x86)

Hi,

When I run the enumerable-statistics tests in an i386 userspace (i.e. 32-bit x86), I get several test failures. Comparing floating point numbers for equality is not really reliable, and You Should Not Do It(TM).

  1) Hash#sum for {:a=>(100000000/1), :b=>1.0e-09, :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 100000000.00000001
            got: 100000000.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  2) Hash#sum for {:a=>1.0e-09, :b=>(100000000/1), :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09, :l=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 100000000.00000001
            got: 100000000.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  3) Hash#sum for {:a=>100000000, :b=>1.0e-09, :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 100000000.00000001
            got: 100000000.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  4) Enumerable#mean for [1.0e-09, (100000000/1), 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 8333333.333333335
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 8333333.333333335
            got: 8333333.333333333
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  5) Enumerable#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 909090909090909.4]
     Failure/Error: it { is_expected.to eq([m, var]) }
     
       expected: [9090909.090909092, 909090909090909.4]
            got: [9090909.090909092, 909090909090909.0]
     
       (compared using ==)
     # ./spec/enum_spec.rb:237:in `block (4 levels) in <top (required)>'

                                                                                                          
  6) Enumerable#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 30151134.457776368]
     Failure/Error: it { is_expected.to eq([m, sd]) }
     
       expected: [9090909.090909092, 30151134.457776368]
            got: [9090909.090909092, 30151134.45777636]
     
       (compared using ==)
     # ./spec/enum_spec.rb:422:in `block (4 levels) in <top (required)>'

                                                                                                          
  7) Enumerable#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 909090909090909.4
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 909090909090909.4
            got: 909090909090909.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  8) Enumerable#stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 30151134.457776368
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 30151134.457776368
            got: 30151134.45777636
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  9) Array#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 909090909090909.4
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 909090909090909.4
            got: 909090909090909.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  10) Array#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 30151134.457776368]
      Failure/Error: it { is_expected.to eq([m, sd]) }
      
        expected: [9090909.090909092, 30151134.457776368]
             got: [9090909.090909092, 30151134.45777636]
      
        (compared using ==)
      # ./spec/array_spec.rb:335:in `block (4 levels) in <top (required)>'

                                                                                                          
  11) Array#stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 30151134.457776368
      Failure/Error: it { is_expected.to eq(x) }
      
        expected: 30151134.457776368
             got: 30151134.45777636
      
        (compared using ==)
      # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

                                                                                                          
  12) Array#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 909090909090909.4]
      Failure/Error: it { is_expected.to eq([m, var]) }
      
        expected: [9090909.090909092, 909090909090909.4]
             got: [9090909.090909092, 909090909090909.0]
      
        (compared using ==)
      # ./spec/array_spec.rb:240:in `block (4 levels) in <top (required)>'

                                                                                                          
  13) Array#mean for [1.0e-09, (100000000/1), 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 8333333.333333335
      Failure/Error: it { is_expected.to eq(x) }
      
        expected: 8333333.333333335
             got: 8333333.333333333
      
        (compared using ==)
      # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

I tried to come up with a patch for this but wasn't able to grok all the rspec macros in there. This here fixes 7 of those 13 failures:

diff --git a/spec/support/macros.rb b/spec/support/macros.rb
index 6d3dd6e..3ae99dd 100644
--- a/spec/support/macros.rb
+++ b/spec/support/macros.rb
@@ -45,7 +45,7 @@ module Enumerable
 
         def it_equals_with_type(x, type)
           it { is_expected.to be_an(type) }
-          it { is_expected.to eq(x) }
+          it { is_expected.to be_within(0.0000001).of(x) }
         end
 
         def it_is_int_equal(n)

The remaining ones are:

Failed examples:

rspec ./spec/enum_spec.rb:237 # Enumerable#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 909090909090909.4]
rspec ./spec/enum_spec.rb[1:2:7:2] # Enumerable#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to be within 1.0e-07 of 909090909090909.4
rspec ./spec/enum_spec.rb:422 # Enumerable#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 30151134.457776368]
rspec ./spec/array_spec.rb:240 # Array#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 909090909090909.4]
rspec ./spec/array_spec.rb[1:2:7:2] # Array#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to be within 1.0e-07 of 909090909090909.4
rspec ./spec/array_spec.rb:335 # Array#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 30151134.457776368]
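One possible reason the variance examples still fail with the patch: doubles near 9.09e14 are spaced 0.125 apart, so an absolute tolerance of 1.0e-07 can only ever match the exact same value. A relative tolerance scales with the magnitude instead. The sketch below is a plain-Ruby illustration; `approx_equal?` and its default tolerances are hypothetical and are not part of the spec macros.

```ruby
# Hypothetical helper, not part of the gem or its specs.
# True when a and b agree to within a relative tolerance
# (or within an optional absolute tolerance for values near zero).
def approx_equal?(a, b, rel_tol: 1e-12, abs_tol: 0.0)
  diff = (a - b).abs
  diff <= [rel_tol * [a.abs, b.abs].max, abs_tol].max
end

expected = 909090909090909.4
got      = 909090909090909.0

within_absolute = (expected - got).abs <= 1.0e-07
within_relative = approx_equal?(expected, got)

puts within_absolute # false
puts within_relative # true
```

For the array-valued examples (mean_variance, mean_stdev), the same check would have to be applied element by element, since comparing whole arrays with eq still uses ==.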

Add pivot

pivot calls the given block for each item and collects the resulting [key, value] arrays. It groups the values by their key (the first item of each result array) into a hash, aggregates each group with the given function or functions, and returns the resulting hash.

[ 1, 2, 3, 4 ].pivot {|x| [x.odd?, x] } # aggregated by mean
#=> {true => 2.0, false => 3.0}

[ 1, 2, 3, 4 ].pivot(agg: :sum) {|x| [x.odd?, x] }
#=> {true => 4, false => 6}

[ 1, 2, 3, 4 ].pivot(agg: :itself) {|x| [x.odd?, x] }
#=> {true => [1, 3], false => [2, 4]}

[ 1, 2, 3, 4 ].pivot(agg: [:mean, :sum]) {|x| [x.odd?, x] }
#=> {true => [2.0, 4], false => [3.0, 6]}

[ 1, 2, 3, 4 ].pivot(agg: {x: :mean, y: :sum}) {|x| [x.odd?, x] }
#=> {true => { :x => 2.0, :y => 4}, false => { :x => 3.0, :y => 6}}

{ a: 1, b: 2, c: 1 }.pivot(agg: :itself) {|k, v| [v, k] }
#=> {1 => [:a, :c], 2 => [:b]}
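The grouping and aggregation steps could be sketched in pure Ruby like this. `pivot_sketch` is a hypothetical name, the sketch is written as a standalone method rather than as a method on Enumerable, and it handles only a single Symbol for agg: (:mean, :sum, or :itself), not the array or hash forms shown above.

```ruby
# Hypothetical sketch of the proposed pivot, not an implementation from the gem.
# The block must return a [key, value] pair for each item.
def pivot_sketch(enum, agg: :mean)
  groups = Hash.new { |hash, key| hash[key] = [] }

  # Group the values by the first item of each block result.
  enum.each do |item|
    key, value = yield(item)
    groups[key] << value
  end

  # Aggregate every group with the requested function.
  groups.transform_values do |values|
    case agg
    when :itself then values
    when :sum    then values.sum
    when :mean   then values.sum / values.size.to_f
    else raise ArgumentError, "unsupported agg: #{agg.inspect}"
    end
  end
end

p pivot_sketch([1, 2, 3, 4]) { |x| [x.odd?, x] }
p pivot_sketch([1, 2, 3, 4], agg: :sum) { |x| [x.odd?, x] }
p pivot_sketch({ a: 1, b: 2, c: 1 }, agg: :itself) { |k, v| [v, k] }
```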

RubyDataScience

Dear Kenta,

I've recently added your project to our RubyDataScience list: https://github.com/arbox/data-science-with-ruby

I wonder if you want to participate in the Ruby for Data Science network. You can do this in one simple step by adding the rubydatascience topic to your GitHub repository. You may also want to spread the word on Twitter or on other media :)

Thank you for the project!

histogram

I want to add histogram for Array, Hash, and Enumerable. This corresponds to the numpy.histogram function in NumPy.

This method takes the following options:

  • bins: specifies the number of bins.
  • range: is a 2-element array or a range that contains the lower and upper bounds of the bins.
  • density: is a boolean. If it is false, the result contains the number of samples in each bin. If it is true, the result contains the value of the probability density function at each bin.
  • weights: is an array or a hash. If it is an array, it should have the same number of elements as the receiver, and each element of the weights array corresponds to the element at the same index in the receiver. If it is a hash, it maps elements in the receiver to their weights.

This method returns an array with two items. The first is an array that contains the values of the histogram. The second is an array that contains the bin edges.
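A minimal pure-Ruby sketch of the proposed return shape is shown below. It covers only bins: and range: with equal-width bins; density: and weights: are left out. The name `histogram_sketch`, the default of 10 bins, and the choice to make the last bin include the upper bound are assumptions for this illustration, not the gem's behavior.

```ruby
# Hypothetical sketch, not the gem's implementation.
# Returns [counts, edges] for equal-width bins between lo and hi.
def histogram_sketch(values, bins: 10, range: nil)
  lo, hi = range ? [range.first, range.last] : values.minmax
  lo = lo.to_f
  hi = hi.to_f
  width = (hi - lo) / bins
  raise ArgumentError, "range must have a positive width" unless width > 0

  edges = Array.new(bins + 1) { |i| lo + i * width }
  counts = Array.new(bins, 0)

  values.each do |x|
    next if x < lo || x > hi # ignore values outside the range

    index = ((x - lo) / width).floor
    index = bins - 1 if index >= bins # assumption: last bin includes hi
    counts[index] += 1
  end

  [counts, edges]
end

counts, edges = histogram_sketch([1, 2, 1, 1, 3, 4, 4, 5, 2, 6], bins: 5)
p counts # [3, 2, 1, 2, 2]
p edges  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```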

Where can I get more information about the edge detection algorithm?

I really appreciate your library. I opened PR #23. While I wait, I was considering porting the histogram code to pure Ruby. I was wondering where the edge detection algorithm came from, or whether you have any more information about it:

ary_histogram_calculate_edge_lo_hi(const double lo, const double hi, const long nbins, const int left_p)

Thanks again for this library!
