mrkn / enumerable-statistics
License: MIT License
I want to add histogram for Array, Hash, and Enumerable. This corresponds to the numpy.histogram function in NumPy.
This method takes the following options:
bins: specifies the number of bins.
range: a 2-element array or a range that contains the lower and upper bounds of the bins.
density: a boolean. If this is false, the result contains the frequency of samples in each bin. If this is true, the result is the value of the probability density function at each bin.
weights: an array or a hash. If this is an array, it should have the same number of elements as the receiver; each element of the weights array corresponds to the element at the same index in the receiver. If this is a hash, it maps elements in the receiver to their weights.
This method returns an array with two items. The first is an array that contains the values of the histogram. The second is an array that contains the bin edges.
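To make the proposal concrete, here is a minimal pure-Ruby sketch of the bins, range, and density options described above. The name simple_histogram is hypothetical and is not the library's API; the sketch assumes equal-width bins with the last bin closed on the right (as numpy.histogram does), skips values outside the range, and leaves out weights for brevity.

```ruby
# Hypothetical helper for illustration only; not part of enumerable-statistics.
# Returns [values, edges], following the return shape described in the proposal.
def simple_histogram(values, bins: 10, range: nil, density: false)
  lo, hi = range ? [range.first.to_f, range.last.to_f] : values.minmax.map(&:to_f)
  width = (hi - lo) / bins
  raise ArgumentError, "range must have positive width" unless width.positive?

  # bins + 1 edges, evenly spaced between lo and hi.
  edges = (0..bins).map { |i| lo + i * width }
  counts = Array.new(bins, 0)

  values.each do |v|
    next if v < lo || v > hi # values outside the range are skipped
    # A value equal to the upper bound falls into the last bin.
    idx = [((v - lo) / width).floor, bins - 1].min
    counts[idx] += 1
  end

  if density
    # Normalize so that the sum of (density * bin width) is 1.
    total = counts.sum.to_f
    counts = counts.map { |c| c / (total * width) }
  end

  [counts, edges]
end

counts, edges = simple_histogram([1, 2, 1, 1, 3, 4, 4, 5, 2, 6], bins: 5)
p counts
p edges
```

Returning the counts first and the edges second mirrors the two-item array in the proposal, so the result can be destructured in one assignment.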
>> require 'enumerable/statistics'
=> true
>> [1, 2].histogram(closed: :left)
=> #<struct EnumerableStatistics::Histogram edge=[1.0, 1.5, 2.0, 2.5], weights=[1, nil, 1], closed=:left, isdensity=false>
Hi,
When I run the enumerable-statistics tests in an i386 userspace (i.e. 32-bit x86), I get several test failures. Comparing floating point numbers for exact equality is not really reliable, and You Should Not Do It(TM).
1) Hash#sum for {:a=>(100000000/1), :b=>1.0e-09, :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
Failure/Error: it { is_expected.to eq(x) }
expected: 100000000.00000001
got: 100000000.0
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
2) Hash#sum for {:a=>1.0e-09, :b=>(100000000/1), :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09, :l=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
Failure/Error: it { is_expected.to eq(x) }
expected: 100000000.00000001
got: 100000000.0
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
3) Hash#sum for {:a=>100000000, :b=>1.0e-09, :c=>1.0e-09, :d=>1.0e-09, :e=>1.0e-09, :f=>1.0e-09, :g=>1.0e-09, :h=>1.0e-09, :i=>1.0e-09, :j=>1.0e-09, :k=>1.0e-09} with conversion `(k, v) -> v` is expected to eq 100000000.00000001
Failure/Error: it { is_expected.to eq(x) }
expected: 100000000.00000001
got: 100000000.0
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
4) Enumerable#mean for [1.0e-09, (100000000/1), 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 8333333.333333335
Failure/Error: it { is_expected.to eq(x) }
expected: 8333333.333333335
got: 8333333.333333333
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
5) Enumerable#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 909090909090909.4]
Failure/Error: it { is_expected.to eq([m, var]) }
expected: [9090909.090909092, 909090909090909.4]
got: [9090909.090909092, 909090909090909.0]
(compared using ==)
# ./spec/enum_spec.rb:237:in `block (4 levels) in <top (required)>'
6) Enumerable#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 30151134.457776368]
Failure/Error: it { is_expected.to eq([m, sd]) }
expected: [9090909.090909092, 30151134.457776368]
got: [9090909.090909092, 30151134.45777636]
(compared using ==)
# ./spec/enum_spec.rb:422:in `block (4 levels) in <top (required)>'
7) Enumerable#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 909090909090909.4
Failure/Error: it { is_expected.to eq(x) }
expected: 909090909090909.4
got: 909090909090909.0
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
8) Enumerable#stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq 30151134.457776368
Failure/Error: it { is_expected.to eq(x) }
expected: 30151134.457776368
got: 30151134.45777636
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
9) Array#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 909090909090909.4
Failure/Error: it { is_expected.to eq(x) }
expected: 909090909090909.4
got: 909090909090909.0
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
10) Array#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 30151134.457776368]
Failure/Error: it { is_expected.to eq([m, sd]) }
expected: [9090909.090909092, 30151134.457776368]
got: [9090909.090909092, 30151134.45777636]
(compared using ==)
# ./spec/array_spec.rb:335:in `block (4 levels) in <top (required)>'
11) Array#stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 30151134.457776368
Failure/Error: it { is_expected.to eq(x) }
expected: 30151134.457776368
got: 30151134.45777636
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
12) Array#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 909090909090909.4]
Failure/Error: it { is_expected.to eq([m, var]) }
expected: [9090909.090909092, 909090909090909.4]
got: [9090909.090909092, 909090909090909.0]
(compared using ==)
# ./spec/array_spec.rb:240:in `block (4 levels) in <top (required)>'
13) Array#mean for [1.0e-09, (100000000/1), 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq 8333333.333333335
Failure/Error: it { is_expected.to eq(x) }
expected: 8333333.333333335
got: 8333333.333333333
(compared using ==)
# ./spec/support/macros.rb:48:in `block in it_equals_with_type'
I tried to come up with a patch for this but wasn't able to grok all the rspec macros in there. This here fixes 7 of those 13 failures:
diff --git a/spec/support/macros.rb b/spec/support/macros.rb
index 6d3dd6e..3ae99dd 100644
--- a/spec/support/macros.rb
+++ b/spec/support/macros.rb
@@ -45,7 +45,7 @@ module Enumerable
def it_equals_with_type(x, type)
it { is_expected.to be_an(type) }
- it { is_expected.to eq(x) }
+ it { is_expected.to be_within(0.0000001).of(x) }
end
def it_is_int_equal(n)
The remaining ones are:
Failed examples:
rspec ./spec/enum_spec.rb:237 # Enumerable#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 909090909090909.4]
rspec ./spec/enum_spec.rb[1:2:7:2] # Enumerable#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to be within 1.0e-07 of 909090909090909.4
rspec ./spec/enum_spec.rb:422 # Enumerable#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09].each is expected to eq [9090909.090909092, 30151134.457776368]
rspec ./spec/array_spec.rb:240 # Array#mean_variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 909090909090909.4]
rspec ./spec/array_spec.rb[1:2:7:2] # Array#variance for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to be within 1.0e-07 of 909090909090909.4
rspec ./spec/array_spec.rb:335 # Array#mean_stdev for [100000000, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09] is expected to eq [9090909.090909092, 30151134.457776368]
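One possible reason the variance examples still fail with be_within(0.0000001) is that the tolerance is absolute: the expected and actual values near 9.09e14 differ by roughly 0.4, which is tiny relative to their magnitude but far larger than 1.0e-07. The sketch below compares with a relative tolerance instead and recurses into arrays, which would also cover the [mean, variance] pairs. The helper name close_enough? and the default rel_tol are assumptions for illustration; they are not part of RSpec or of this project's spec macros.

```ruby
# Hypothetical helper for illustration; not part of RSpec or enumerable-statistics.
# Compares numbers (or arrays of numbers) using a relative tolerance.
def close_enough?(actual, expected, rel_tol: 1e-12)
  if expected.is_a?(Array)
    return false unless actual.is_a?(Array) && actual.size == expected.size
    return actual.zip(expected).all? { |a, e| close_enough?(a, e, rel_tol: rel_tol) }
  end

  return true if actual == expected # exact matches, including 0 == 0.0

  # Scale the tolerance by the larger magnitude of the two operands.
  scale = [actual.abs, expected.abs].max
  (actual - expected).abs <= rel_tol * scale
end

# Values taken from the failure output above.
puts close_enough?(909090909090909.0, 909090909090909.4)
puts close_enough?([9090909.090909092, 30151134.45777636],
                   [9090909.090909092, 30151134.457776368])
```

In a spec this could be wrapped as expect(close_enough?(subject, x)).to be(true), or replaced by an absolute tolerance scaled to the expected value, e.g. be_within(x.abs * 1e-12).of(x) for scalar results.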
Dear Kenta,
I've recently added your project to our RubyDataScience list: https://github.com/arbox/data-science-with-ruby
I wonder if you want to participate in the Ruby for Data Science network. You could do this in one very simple step: add the rubydatascience topic to your GitHub repository. You may also want to spread the word on Twitter or other media :)
Thank you for the project!
I really appreciate your library. I opened PR #23. While I wait, I was considering porting the histogram code to pure Ruby. I was wondering where the edge detection algorithm came from, or whether you have any more information about it.
Thanks again for this library!
$ bundle exec rake bench
# sum
Traceback (most recent call last):
ruby: No such file or directory -- bench/sum.rb (LoadError)
# mean
Traceback (most recent call last):
ruby: No such file or directory -- bench/mean.rb (LoadError)
# variance
Traceback (most recent call last):
ruby: No such file or directory -- bench/variance.rb (LoadError)
bins: should be an integer or an array of integers.
If an integer is given as bins, the values are grouped into that many half-open bins.
If an array of integers is given as bins, the array elements represent the lower limit of each bin, and the last element is the upper limit of the final bin. The array must increase monotonically.
>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: 4)
{ (1.0 ... 2.25) => 5,
(2.25 ... 3.5) => 1,
(3.5 ... 4.75) => 2,
(4.75 ... 6.005) => 2 }
>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: [1, 2, 3, 4, 5, 6], dropna: false)
{ (1 ... 2) => 3,
(2 ... 3) => 2,
(3 ... 4) => 1,
(4 ... 5) => 2,
(5 ... 6) => 1,
nil => 1 }
>> [1, 2, 1, 1, 3, 4, 4, 5, 2, 6].value_counts(bins: [1, 3, 5, 7])
{ (1 ... 3) => 5,
(3 ... 5) => 3,
(5 ... 7) => 2 }
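The array form of bins shown above can be sketched in pure Ruby as follows. The name bin_counts is hypothetical and only mimics the proposed value_counts(bins: [...]) behavior; it assumes half-open ranges built from consecutive edges and, when dropna is false, counts values that fall outside every bin under a nil key.

```ruby
# Hypothetical helper for illustration; not the library's value_counts.
def bin_counts(values, edges, dropna: true)
  # Consecutive edges become half-open ranges: [1, 3, 5] -> (1...3), (3...5).
  ranges = edges.each_cons(2).map { |lo, hi| (lo...hi) }
  counts = ranges.to_h { |r| [r, 0] }

  values.each do |v|
    range = ranges.find { |r| r.cover?(v) }
    if range
      counts[range] += 1
    elsif !dropna
      # Values outside every bin are tallied under nil.
      counts[nil] = counts.fetch(nil, 0) + 1
    end
  end

  counts
end

data = [1, 2, 1, 1, 3, 4, 4, 5, 2, 6]
p bin_counts(data, [1, 3, 5, 7])
p bin_counts(data, [1, 2, 3, 4, 5, 6], dropna: false)
```

The linear search over ranges keeps the sketch short; since the edges increase monotonically, a binary search with Array#bsearch_index would be a natural choice for many bins.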
I want to add value_counts
for Array, Hash, and Enumerable.
The pivot method collects the result arrays of the given block, groups them into a hash keyed by the first item of each result array, aggregates the values of each group with the given function or functions, and finally returns the resulting hash.
[ 1, 2, 3, 4 ].pivot {|x| [x.odd?, x] } # aggregated by mean
#=> {true => 2.0, false => 3.0}
[ 1, 2, 3, 4 ].pivot(agg: :sum) {|x| [x.odd?, x] }
#=> {true => 4, false => 6}
[ 1, 2, 3, 4 ].pivot(agg: :itself) {|x| [x.odd?, x] }
#=> {true => [1, 3], false => [2, 4]}
[ 1, 2, 3, 4 ].pivot(agg: [:mean, :sum]) {|x| [x.odd?, x] }
#=> {true => [2.0, 4], false => [3.0, 6]}
[ 1, 2, 3, 4 ].pivot(agg: {x: :mean, y: :sum}) {|x| [x.odd?, x] }
#=> {true => { :x => 2.0, :y => 4}, false => { :x => 3.0, :y => 6}}
{ a: 1, b: 2, c: 1 }.pivot(agg: :itself) {|k, v| [v, k] }
#=> {1 => [:a, :c], 2 => [:b]}
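The behavior in these examples can be approximated with a short pure-Ruby sketch. Everything here is an assumption for illustration: pivot is written as a standalone method taking the collection as its first argument rather than as an Enumerable method, and only the three aggregators used above (:mean, :sum, :itself) are supported.

```ruby
# Hypothetical sketch for illustration; not the proposed implementation.
AGGREGATORS = {
  mean:   ->(vs) { vs.sum.to_f / vs.size },
  sum:    ->(vs) { vs.sum },
  itself: ->(vs) { vs }
}.freeze

# Applies a single aggregator, an array of them, or a hash of named ones.
def aggregate(values, agg)
  case agg
  when Symbol then AGGREGATORS.fetch(agg).call(values)
  when Array  then agg.map { |a| aggregate(values, a) }
  when Hash   then agg.transform_values { |a| aggregate(values, a) }
  else raise ArgumentError, "unsupported agg: #{agg.inspect}"
  end
end

def pivot(enum, agg: :mean, &block)
  groups = Hash.new { |h, k| h[k] = [] }
  # Each block result is a [key, value] pair; group the values by key.
  enum.map(&block).each { |key, value| groups[key] << value }
  groups.transform_values { |vs| aggregate(vs, agg) }
end

p pivot([1, 2, 3, 4]) { |x| [x.odd?, x] }
p pivot([1, 2, 3, 4], agg: { x: :mean, y: :sum }) { |x| [x.odd?, x] }
p pivot({ a: 1, b: 2, c: 1 }, agg: :itself) { |k, v| [v, k] }
```

Keeping the aggregators in a lookup table makes it easy to see which names are accepted; a real implementation would more likely dispatch to the statistics methods the library already defines.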