
enumerable-statistics's Issues

histogram

I want to add a histogram method to Array, Hash, and Enumerable. It corresponds to the numpy.histogram function in NumPy.

This method takes the following options:

  • bins: the number of bins.
  • range: a two-element array or a Range giving the lower and upper bounds of the bins.
  • density: a boolean. If false, the result contains the number of samples in each bin. If true, the result is the value of the probability density function at each bin, normalized so that the integral over the range is 1.
  • weights: an array or a hash. If it is an array, it must have the same number of elements as the receiver, and each weight corresponds to the element at the same index in the receiver. If it is a hash, it maps elements of the receiver to their weights.

The method returns a two-element array: the first element is an array of histogram values, and the second is an array of bin edges.
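A minimal pure-Ruby sketch of the option semantics listed above, assuming numpy.histogram conventions: equal-width bins over range (defaulting to the data's min and max), every bin half-open except the last, which also includes the upper bound, and values outside the range ignored. The name histogram_sketch and the bins: 10 default (NumPy's default) are assumptions for illustration, not the proposed API.

```ruby
# Hypothetical pure-Ruby sketch of numpy.histogram-style options; the name
# histogram_sketch and the defaults are assumptions, not the library's API.
def histogram_sketch(values, bins: 10, range: nil, density: false, weights: nil)
  lo, hi = range ? [range.first, range.last] : values.minmax
  lo = lo.to_f
  hi = hi.to_f
  if lo == hi # numpy widens a degenerate range by 0.5 on each side
    lo -= 0.5
    hi += 0.5
  end
  width = (hi - lo) / bins
  # bins + 1 equally spaced edges; the last edge is set to hi exactly.
  edges = Array.new(bins + 1) { |i| i == bins ? hi : lo + i * width }
  hist = Array.new(bins, 0)

  values.each_with_index do |x, i|
    next if x < lo || x > hi # samples outside the range are ignored
    w = case weights
        when nil   then 1
        when Array then weights[i]       # weight at the same index
        when Hash  then weights.fetch(x) # weight looked up by element
        else raise ArgumentError, "weights must be an Array or a Hash"
        end
    # Bins are half-open [a, b), except the last one, which also includes hi.
    # (Plain floor; numpy additionally corrects values that round onto an edge.)
    k = [((x - lo) / width).floor, bins - 1].min
    hist[k] += w
  end

  if density
    # Normalize so the histogram integrates to 1 over the range.
    total = hist.sum
    hist = hist.map { |h| h / (total * width) }
  end
  [hist, edges]
end

histogram_sketch([1, 2, 2, 3, 4], bins: 3) # => [[1, 2, 2], [1.0, 2.0, 3.0, 4.0]]
```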

Histogram may contain nil in weights

>> require 'enumerable/statistics'
=> true
>> [1, 2].histogram(closed: :left)
=> #<struct EnumerableStatistics::Histogram edge=[1.0, 1.5, 2.0, 2.5], weights=[1, nil, 1], closed=:left, isdensity=false>
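Until empty bins are reported as 0, the nil entries break arithmetic on the result (for example, [1, nil, 1].sum raises a TypeError). A minimal workaround sketch, assuming the caller only needs zero-filled counts: the Struct below is a hypothetical stand-in that mirrors the fields printed in the irb output, so the snippet runs without the gem, and zero_filled_weights is an illustrative helper name.

```ruby
# Hypothetical stand-in mirroring the fields printed above; the real
# EnumerableStatistics::Histogram struct comes from the gem.
Histogram = Struct.new(:edge, :weights, :closed, :isdensity)

# Treat empty bins (nil) as zero so sums and other arithmetic work.
def zero_filled_weights(hist)
  hist.weights.map { |w| w || 0 }
end

hist = Histogram.new([1.0, 1.5, 2.0, 2.5], [1, nil, 1], :left, false)
zero_filled_weights(hist) # => [1, 0, 1]
```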

tests fail on i386 (32-bit x86)

Hi,

When I run the enumerable-statistics tests on an i386 userspace (i.e. 32-bit x86), I get several test failures. Comparing floating-point numbers for equality is not really reliable, and You Should Not Do It(TM).

  1) Hash#sum for {:a=>(100000000/1), :b=>1.0e-09, :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 100000000.00000001
            got: 100000000.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  2) Hash#sum for {:a=>1.0e-09, :b=>(100000000/1), :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09, :l=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 100000000.00000001
            got: 100000000.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  3) Hash#sum for {:a=>100000000, :b=>1.0e-09, :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 100000000.00000001
            got: 100000000.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  4) Enumerable#mean for [1.0e-09, (100000000/1), 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 8333333.333333335
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 8333333.333333335
            got: 8333333.333333333
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  5) Enumerable#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 909090909090909.4]
     Failure/Error: it { is_expected.to eq([m, var]) }
     
       expected: [9090909.090909092, 909090909090909.4]
            got: [9090909.090909092, 909090909090909.0]
     
       (compared using ==)
     # ./spec/enum_spec.rb:237:in `block (4 levels) in <top (required)>'

  6) Enumerable#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 30151134.457776368]
     Failure/Error: it { is_expected.to eq([m, sd]) }
     
       expected: [9090909.090909092, 30151134.457776368]
            got: [9090909.090909092, 30151134.45777636]
     
       (compared using ==)
     # ./spec/enum_spec.rb:422:in `block (4 levels) in <top (required)>'

  7) Enumerable#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 909090909090909.4
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 909090909090909.4
            got: 909090909090909.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  8) Enumerable#stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 30151134.457776368
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 30151134.457776368
            got: 30151134.45777636
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  9) Array#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 909090909090909.4
     Failure/Error: it { is_expected.to eq(x) }
     
       expected: 909090909090909.4
            got: 909090909090909.0
     
       (compared using ==)
     # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  10) Array#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 30151134.457776368]
      Failure/Error: it { is_expected.to eq([m, sd]) }
      
        expected: [9090909.090909092, 30151134.457776368]
             got: [9090909.090909092, 30151134.45777636]
      
        (compared using ==)
      # ./spec/array_spec.rb:335:in `block (4 levels) in <top (required)>'

  11) Array#stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 30151134.457776368
      Failure/Error: it { is_expected.to eq(x) }
      
        expected: 30151134.457776368
             got: 30151134.45777636
      
        (compared using ==)
      # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

  12) Array#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 909090909090909.4]
      Failure/Error: it { is_expected.to eq([m, var]) }
      
        expected: [9090909.090909092, 909090909090909.4]
             got: [9090909.090909092, 909090909090909.0]
      
        (compared using ==)
      # ./spec/array_spec.rb:240:in `block (4 levels) in <top (required)>'

  13) Array#mean for [1.0e-09, (100000000/1), 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 8333333.333333335
      Failure/Error: it { is_expected.to eq(x) }
      
        expected: 8333333.333333335
             got: 8333333.333333333
      
        (compared using ==)
      # ./spec/support/macros.rb:48:in `block in it_equals_with_type'

I tried to come up with a patch for this but wasn't able to grok all the RSpec macros in there. The following change fixes 7 of the 13 failures:

diff --git a/spec/support/macros.rb b/spec/support/macros.rb
index 6d3dd6e..3ae99dd 100644
--- a/spec/support/macros.rb
+++ b/spec/support/macros.rb
@@ -45,7 +45,7 @@ module Enumerable
 
         def it_equals_with_type(x, type)
           it { is_expected.to be_an(type) }
-          it { is_expected.to eq(x) }
+          it { is_expected.to be_within(0.0000001).of(x) }
         end
 
         def it_is_int_equal(n)

The remaining ones are:

Failed examples:

rspec ./spec/enum_spec.rb:237 # Enumerable#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 909090909090909.4]
rspec ./spec/enum_spec.rb[1:2:7:2] # Enumerable#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to be within 1.0e-07 of 909090909090909.4
rspec ./spec/enum_spec.rb:422 # Enumerable#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 30151134.457776368]
rspec ./spec/array_spec.rb:240 # Array#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 909090909090909.4]
rspec ./spec/array_spec.rb[1:2:7:2] # Array#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to be within 1.0e-07 of 909090909090909.4
rspec ./spec/array_spec.rb:335 # Array#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 30151134.457776368]
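The log suggests why these six survive the patch. The mean_variance and mean_stdev examples call eq directly instead of going through it_equals_with_type, and for the variance an absolute tolerance of 1e-7 can never pass: adjacent doubles near 9.09e14 are 0.125 apart, and the discrepancy is roughly 0.4. The relative differences in all 13 failures are on the order of 1e-16, so a tolerance scaled to the magnitude of the expected value handles every case (RSpec's be_within(delta).percent_of(expected) is one built-in way to express that). A minimal sketch, where approx_equal? is a hypothetical helper name and rel_tol: 1e-12 is an assumed, deliberately generous default:

```ruby
# Hypothetical spec helper (not part of the gem's spec/support/macros.rb):
# compares floats, or arrays of floats, with a tolerance relative to their magnitude.
def approx_equal?(actual, expected, rel_tol: 1e-12, abs_tol: 0.0)
  if expected.is_a?(Array)
    return false unless actual.is_a?(Array) && actual.size == expected.size
    return actual.zip(expected).all? { |a, e| approx_equal?(a, e, rel_tol: rel_tol, abs_tol: abs_tol) }
  end
  # Allowed error grows with the larger magnitude; abs_tol covers values near zero.
  tol = [rel_tol * [actual.abs, expected.abs].max, abs_tol].max
  (actual - expected).abs <= tol
end

approx_equal?(909090909090909.0, 909090909090909.4) # => true
(909090909090909.0 - 909090909090909.4).abs <= 1e-7  # => false (absolute tolerance too tight)
```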

RubyDataScience

Dear Kenta,

I've recently added your project to our RubyDataScience list: https://github.com/arbox/data-science-with-ruby

I wonder if you want to participate in the Ruby for Data Science network. You can do this in one simple step: add the rubydatascience topic to your GitHub repository. You may also want to spread the word on Twitter or other media :)

Thank you for the project!

Where can I get more information about the edge detection algorithm?

I really appreciate your library. I've opened PR #23. While I wait, I'm considering porting the histogram code to pure Ruby. I was wondering where the edge detection algorithm came from, or whether you have any more information about it:

ary_histogram_calculate_edge_lo_hi(const double lo, const double hi, const long nbins, const int left_p)

Thanks again for this library!
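The thread doesn't say where the C routine's algorithm comes from. As a starting point for a pure-Ruby port, here is a sketch of one well-known "nice step" scheme, loosely modeled on Julia StatsBase's histrange: snap the rough bin width (hi - lo) / nbins to 1, 2, 5 or 10 times a power of ten, then widen the range until every value falls inside a bin. nice_edges and nice_multiplier are hypothetical names, and closed: only stands in for the left_p flag. With lo = 1, hi = 2, nbins = 2 (the bin count is an assumption) and closed: :left, it yields [1.0, 1.5, 2.0, 2.5], the same edges as the irb output in the "Histogram may contain nil in weights" issue; that match does not prove the gem uses this scheme.

```ruby
# Hypothetical "nice" bin-edge sketch, loosely modeled on Julia StatsBase's
# histrange. NOT claimed to be what ary_histogram_calculate_edge_lo_hi does.

# Snap a width ratio to a "nice" multiplier: 1, 2, 5 or 10.
def nice_multiplier(ratio)
  if ratio <= 1.1 then 1
  elsif ratio <= 2.2 then 2
  elsif ratio <= 5.5 then 5
  else 10
  end
end

def nice_edges(lo, hi, nbins, closed: :left)
  lo = lo.to_f
  hi = hi.to_f
  raise ArgumentError, "need hi >= lo and nbins > 0" unless hi >= lo && nbins > 0

  if hi == lo
    start, step, divisor, len = hi, 1.0, 1.0, 1
  else
    bw = (hi - lo) / nbins # rough bin width
    lbw = Math.log10(bw)
    if lbw >= 0
      # Wide bins: step is a nice multiple of a power of ten.
      step = 10.0**lbw.floor
      step *= nice_multiplier(bw / step)
      divisor = 1.0
      start = step * (lo / step).floor
      len = ((hi - start) / step).ceil
    else
      # Narrow bins: count in integer units of 1/divisor to limit rounding error.
      divisor = 10.0**(-lbw.floor)
      divisor /= nice_multiplier(bw * divisor)
      step = 1.0
      start = (lo * divisor).floor.to_f
      len = (hi * divisor - start).ceil
    end
  end

  # Widen the range until every value in [lo, hi] falls inside a bin.
  if closed == :right # bins are (a, b]
    start -= step while lo <= start / divisor
    len += 1 while (start + (len - 1) * step) / divisor < hi
  else # bins are [a, b)
    start -= step while lo < start / divisor
    len += 1 while (start + (len - 1) * step) / divisor <= hi
  end

  Array.new(len) { |i| (start + i * step) / divisor }
end

nice_edges(1, 2, 2)                # => [1.0, 1.5, 2.0, 2.5]
nice_edges(1, 2, 2, closed: :right) # => [0.5, 1.0, 1.5, 2.0]
```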

bundle exec rake bench no longer works

$ bundle exec rake bench                       
# sum
Traceback (most recent call last):
ruby: No such file or directory -- bench/sum.rb (LoadError)
# mean
Traceback (most recent call last):
ruby: No such file or directory -- bench/mean.rb (LoadError)
# variance
Traceback (most recent call last):
ruby: No such file or directory -- bench/variance.rb (LoadError)

Support bins: option in value_counts

bins: should be an integer or an array of integers.

If bins is an integer, the values are grouped into that many half-open bins.

If bins is an array of integers, its elements are the bin edges: each pair of adjacent elements gives the inclusive lower limit and the exclusive upper limit of one bin. The array must increase monotonically.

Example

>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: 4)
{ (1.0 ... 2.25) => 5,
  (2.25 ... 3.5) => 1,
  (3.5 ... 4.75) => 2,
  (4.75 ... 6.005) => 2 }

>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: [1, 2, 3, 4, 5, 6], dropna: false)
{ (1 ... 2) => 3,
  (2 ... 3) => 2,
  (3 ... 4) => 1,
  (4 ... 5) => 2,
  (5 ... 6) => 1,
  nil => 1 }

>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: [1, 3, 5, 7])
{ (1 ... 3) => 5,
  (3 ... 5) => 3,
  (5 ... 7) => 2 }
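A minimal sketch of the array form of bins:, assuming (as the examples show) that adjacent edges define half-open bins and that values outside every bin are counted under nil only when dropna: false. value_counts_with_edges is a hypothetical name, and defaulting dropna to true follows pandas, which is an assumption here.

```ruby
# Hypothetical sketch of value_counts(bins: [edges...]); not the library's code.
def value_counts_with_edges(values, edges, dropna: true)
  unless edges.each_cons(2).all? { |a, b| a < b }
    raise ArgumentError, "bin edges must increase monotonically"
  end
  # Each adjacent pair of edges becomes a half-open Range a...b.
  bins = edges.each_cons(2).map { |a, b| (a...b) }
  counts = bins.map { |r| [r, 0] }.to_h
  missing = 0
  values.each do |v|
    bin = bins.find { |r| r.cover?(v) } # linear scan for clarity
    if bin then counts[bin] += 1 else missing += 1 end
  end
  counts[nil] = missing unless dropna # values that fell outside every bin
  counts
end
```

The linear scan keeps the sketch short; for many bins, a binary search over the edges with Array#bsearch would be the natural next step.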

value_counts

I want to add value_counts to Array, Hash, and Enumerable:

  • Array#value_counts
  • Enumerable#value_counts
  • Hash#value_counts
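A minimal sketch, assuming pandas-like semantics: count each distinct value and order the result by descending frequency. value_counts_sketch is a hypothetical name; counting a Hash's values (rather than its key/value pairs) is also an assumption, shown here by passing each_value. On Ruby 2.7+ the counting step is what Enumerable#tally does.

```ruby
# Hypothetical sketch of pandas-like value_counts; not the library's API.
def value_counts_sketch(enum)
  counts = Hash.new(0)
  enum.each { |x| counts[x] += 1 }
  # Most frequent first; order among equal counts is not guaranteed.
  counts.sort_by { |_, n| -n }.to_h
end

value_counts_sketch([1, 2, 1, 1, 3])
value_counts_sketch({ a: 1, b: 2, c: 1 }.each_value) # counts the Hash's values
```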

Add pivot

pivot calls the given block for each element, groups the resulting arrays into a hash keyed by their first items, aggregates each group's values with the given function or functions, and returns the resulting hash.

[ 1, 2, 3, 4 ].pivot {|x| [x.odd?, x] } # aggregated by mean
#=> {true => 2.0, false => 3.0}

[ 1, 2, 3, 4 ].pivot(agg: :sum) {|x| [x.odd?, x] }
#=> {true => 4, false => 6}

[ 1, 2, 3, 4 ].pivot(agg: :itself) {|x| [x.odd?, x] }
#=> {true => [1, 3], false => [2, 4]}

[ 1, 2, 3, 4 ].pivot(agg: [:mean, :sum]) {|x| [x.odd?, x] }
#=> {true => [2.0, 4], false => [3.0, 6]}

[ 1, 2, 3, 4 ].pivot(agg: {x: :mean, y: :sum}) {|x| [x.odd?, x] }
#=> {true => { :x => 2.0, :y => 4}, false => { :x => 3.0, :y => 6}}

{ a: 1, b: 2, c: 1 }.pivot(agg: :itself) {|k, v| [v, k] }
#=> {1 => [:a, :c], 2 => [:b]}
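A minimal plain-Ruby sketch of the grouping and aggregation described above, assuming the block returns a [key, value] pair and that agg: accepts a Symbol, an Array of Symbols, or a Hash of named Symbols, as in the examples. pivot_sketch, apply_agg and AGGREGATORS are hypothetical names; :mean is computed inline because Array#mean comes from the gem itself.

```ruby
# Hypothetical sketch of the proposed pivot semantics; not the library's API.
AGGREGATORS = {
  mean:   ->(xs) { xs.sum / xs.size.to_f }, # plain-Ruby mean
  sum:    ->(xs) { xs.sum },
  itself: ->(xs) { xs },
}.freeze

# Apply one aggregator (Symbol), several (Array) or named ones (Hash).
def apply_agg(agg, xs)
  case agg
  when Symbol then AGGREGATORS.fetch(agg).call(xs)
  when Array  then agg.map { |a| apply_agg(a, xs) }
  when Hash   then agg.transform_values { |a| apply_agg(a, xs) }
  else raise ArgumentError, "unsupported agg: #{agg.inspect}"
  end
end

def pivot_sketch(enum, agg: :mean, &block)
  groups = Hash.new { |h, k| h[k] = [] }
  # The block returns [key, value]; collect values under their key.
  enum.map(&block).each { |key, value| groups[key] << value }
  groups.transform_values { |xs| apply_agg(agg, xs) }
end
```

Because the block is passed through as a proc, a Hash receiver's [key, value] pairs are destructured into |k, v| just as in the last example above.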
