Comments (4)
It would be an overstatement to say that it completely doesn't work:
std::vector<T> output;
std::partial_sum(axis.begin(), axis.end(), std::back_inserter(output), [] (const pair &p1, const pair &p2) { return p1.second + p2.second; });
Yes, it is less convenient, and one could provide an interface for iterating only over the values.
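One caveat when accumulating over (bin, value) pairs: std::partial_sum keeps its running total in the iterator's value type, so a plain loop that only sums the values can be simpler. A minimal sketch, assuming the bins are available as a std::vector of pairs whose second member is the bin content; cumulative_values is a hypothetical helper, not part of the library:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: running sum of the second member of each
// (bin, value) pair. The input is const, so the source stays untouched.
std::vector<double> cumulative_values(
    const std::vector<std::pair<double, double>>& bins) {
  std::vector<double> out;
  out.reserve(bins.size());
  double running = 0.0;
  for (const auto& bin : bins) {
    running += bin.second;   // accumulate only the bin content
    out.push_back(running);
  }
  assert(out.size() == bins.size());  // one cumulative entry per bin
  return out;
}
```

The result lives in a separate vector, so nothing is written back into the histogram.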
I see that the real problem is that the example code inserts values back into the original histogram. This really feels like what ROOT users normally do. The ROOT developers and community have implemented every imaginable operation on histograms, which makes it very tempting to use them the way you would normally use, say, numpy arrays (I call this histogram-driven data analysis). I don't need to tell you how ugly it gets, and it applies to this case too. The cumulative distribution is not a histogram: it has a different meaning, and you won't be able to safely manipulate it with routines designed to handle histograms (e.g. think of how the bin statistics are no longer independent). So my point is that a good histogram object should provide a means to reduce it to some other kind of value (a single number, a vector, a matrix, a graph, ...), but the list of operations allowed to produce a histogram should be limited to the ones that actually produce a sensible histogram.
from histogram.
I see that the real problem is that example code inserts values back into the original histogram.
I absolutely agree with you. A histogram is not a general container of numbers; the values inside have a strong meaning. Giving write access to the values could violate the invariants of the class, notably the consistency of the variance estimates.
I will quote my answer to the boost mailing list. tl;dr: I think I should provide a conversion to Boost.MultiArray. Users can then use iterators of multi_array. The interface could look like this:
auto h3 = /* some 3d histogram */;
// return copy of internal values as 3d multi_array, using value_type of histogram
// (which derives from storage policy); throws if dimension is wrong
auto array1 = h3.values<3>();
// return copy of internal values using custom value type int; throws if dimension is wrong
auto array2 = h3.values<3, int>();
// for experts, try to return a read-only view of internal values as
// const_multi_array_ref<3, int>; throws exception if counts cannot be viewed like this;
// maybe use boost::optional?
auto array_view = h3.values_view<3, int>();
I think we need several iterators then. The current axis iterators iterate over the bin definitions, not over the bin content of the histogram itself. I chose the current iterators in this way because I believe this scheme generalises well to the multi-dimensional case. I wouldn't replace these iterators, because they seem useful to me, but additional iterators that iterate over the bin values and play well with algorithms are also clearly needed.
How should we define such iterators so that they generalise to multiple dimensions? For example, it would be easy to implement new iterators which just iterate over the bin content:
auto h = make_static_histogram(axis::regular(3, 0, 1));
// fill […], then
std::vector<double> cumulative;
std::partial_sum(h.begin(), h.end(), std::back_inserter(cumulative));
Questions:
- What should this do if h is 2d? The same? If yes, in which order should the 2d array be iterated over?
The current design keeps the internal order an implementation detail (I use column-major internally, because it is easier to compute the strides on the fly). The two choices are Fortran-like (column-major) or C-like (row-major). Perhaps provide both with clear names so it is obvious? Or just row-major, because that's the standard in C++? These questions must have already been discussed during the design of Boost.MultiArray. I think I should offer a way to get a MultiArray copy of the bin content, thus leveraging all the work that already went into defining MultiArray. Similarly, one could provide a read-only view of the variance estimates. A read-only MultiArray view of the internal bin counts of the histogram is also possible for the array_storage policy, but not for the default adaptive_storage policy.
- Writable iterators?
In your example, you use writable iterators to change the content (and the meaning of the content) of the histogram itself. In my reply here, I use read-only iterators, because changing the content of the histogram seems to go against the semantics. A histogram is more specific than a general container of values; the content has a strong meaning that cannot be arbitrarily changed. To be precise, what should the variance estimate returned by histogram::variance(…) be for a cumulative count? The variance estimate for a cumulative count is a whole matrix of covariances, since the successive entries are not independent. There are other operations where it is not clearly defined how the variance estimate should be computed. I think the best way to avoid these problems is to provide read-only iterators.
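To make the covariance remark concrete: if the bins are statistically independent with per-bin variances v_k, the cumulative counts C_i = n_0 + ... + n_i satisfy Cov(C_i, C_j) = v_0 + ... + v_min(i,j). A minimal sketch under that independence assumption; cumulative_covariance is a hypothetical helper written against plain std::vector, not part of the library:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: covariance matrix of the cumulative counts,
// assuming independent bins with variances var[k].
// Cov(C_i, C_j) = var[0] + ... + var[min(i, j)].
std::vector<std::vector<double>> cumulative_covariance(
    const std::vector<double>& var) {
  // Variances must be non-negative.
  assert(std::all_of(var.begin(), var.end(),
                     [](double v) { return v >= 0.0; }));
  const std::size_t n = var.size();
  std::vector<double> prefix(n);
  std::partial_sum(var.begin(), var.end(), prefix.begin());
  std::vector<std::vector<double>> cov(n, std::vector<double>(n));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      cov[i][j] = prefix[std::min(i, j)];  // shared bins drive the covariance
  return cov;
}
```

The off-diagonal entries are non-zero whenever the shared bins have non-zero variance, which is why a single per-bin variance cannot describe a cumulative count.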
Ok, I thought about it some more. Returning a Boost.MultiArray does not work very well for several reasons. I don't want to return a copy, because the copy could be very large. Returning a view is difficult. Manually specifying the histogram dimension in the call is also annoying.
And if you just want to iterate over the values, it should be possible, so I am thinking of adding iterators for that. The code should then look like this:
auto h = /* some histogram */;
std::vector<double> csum;
std::partial_sum(h.values().begin(), h.values().end(), std::back_inserter(csum));
The iterator just goes over all normal bins, excluding under/overflow bins, in implementation-defined order. For 1d histograms it would just work, and if you need to compute something multi-dimensional, you will be able to query the index of the current element, like so:
auto h = /* some 2d histogram */;
for (auto iter = h.values().begin(); iter != h.values().end(); ++iter) {
  std::cout << iter.idx(0) << " " << iter.idx(1) << " " << *iter << std::endl;
}
// prints:
// 0 0 3.0
// 1 0 2.0
// ...
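The printed order above has the first index changing fastest, which would correspond to a column-major traversal, although the thread leaves the order implementation-defined. As a rough illustration of how a linear position maps to a pair of bin indices under either convention, here is a sketch with hypothetical helpers (unravel_column_major and unravel_row_major are not library functions):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Map linear position k to (i, j) in an ni-by-nj grid where the first
// index varies fastest (Fortran-like, column-major).
std::array<std::size_t, 2> unravel_column_major(std::size_t k,
                                                std::size_t ni,
                                                std::size_t nj) {
  assert(k < ni * nj);
  return {k % ni, k / ni};
}

// Same mapping where the last index varies fastest (C-like, row-major).
std::array<std::size_t, 2> unravel_row_major(std::size_t k,
                                             std::size_t ni,
                                             std::size_t nj) {
  assert(k < ni * nj);
  return {k / nj, k % nj};
}
```

For a 2-by-3 grid, position 1 is (1, 0) in column-major order and (0, 1) in row-major order, so code that depends on the traversal order should query the index rather than assume it.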
This has been fixed in the meantime on master.