ltratt / multitime
Time command execution over multiple executions
Home Page: http://tratt.net/laurie/src/multitime/
License: MIT License
When a confidence interval is not provided on the command line (e.g. -c 99 is omitted), a segmentation fault occurs if the number of runs is less than 30.
The offending code can be found in format.c at line 210:
if (conf->num_runs < 30) {
z_t = tvals[conf->conf_level - 1][conf->num_runs - 1];
}
This is because the array location being accessed is out of bounds.
The code should be modified to set a default value of 0 for the confidence level, and to calculate and print the confidence interval only when that default has been overridden.
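A sketch of the proposed guard, using a hypothetical simplification of the surrounding format.c code (the tvals entries and large-sample z-value here are placeholders, not the real LUT):

```c
#include <assert.h>

/* Hypothetical t-value LUT: rows are confidence levels, columns are
   num_runs - 1. Only one placeholder entry is filled in here; the real
   table lives in format.c. */
static const double tvals[3][30] = {
    [0] = { [4] = 2.132 },
};

/* conf_level == 0 means "-c was not given": skip the CI calculation
   entirely instead of indexing tvals out of bounds. */
double ci_halfwidth(int conf_level, int num_runs, double stddev) {
    if (conf_level == 0)
        return 0.0;
    double z_t;
    if (num_runs < 30)
        z_t = tvals[conf_level - 1][num_runs - 1];
    else
        z_t = 1.96; /* placeholder large-sample z-value */
    return z_t * stddev;
}
```

With conf_level defaulting to 0, the printing code can then emit the +/- column only when a CI was actually requested.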
Thanks!
#13 and #28 deal with the confidence level. One is a query, and the other is a short-term fix.
This issue seems like a good place where perhaps @snim2, @ltratt, and I (anyone interested; I've invited myself!) could document the vision, progress, and usage of the confidence score before documenting it in the README?
Here is an example run of multitime:
===> multitime results
1: ./demosaic images/airplane_RGGB.png
            Mean            Std.Dev.    Min       Median    Max
real        1.563+/-0.3564  0.159       1.444     1.512     2.177
user        1.408+/-0.0019  0.012       1.385     1.407     1.435
sys         0.024+/-0.0012  0.009       0.012     0.024     0.052
In this example, the wall clock time is not useful, but the amount of time spent on the CPU is of interest. Unfortunately, summing the user and sys rows is no longer valid, because confidence intervals are not additive.
Therefore, I propose a new list of rows to display in the output:
Matthew Howell, one of my undergraduate students, has reported what might be a bug in multitime. The bug report from Matthew was that when running very short programs at reasonable values of n (e.g. n=30), the resulting confidence intervals were very large, larger than the mean of the wall-clock time. This was on a very short-running program.
This might not be a bug; it may be that the CI is correct, but without being able to see the times of the individual runs it is difficult to tell. On the other hand, there may be an error in the CI calculations, or it may be that my code is not using the t-value and z-value LUTs correctly.
It is difficult to see how to track this potential bug down in such a way that we can use the work done now for regression testing in the future, without incurring a huge overhead in adding many unit tests now, but here is one (slightly eccentric) idea:
1. Split void format_other(Conf) into smaller units that can be tested separately. This would create separate functions for calculating means, CIs, etc.
2. Create a python-multitime which uses cffi to expose the functions created in Step 1. to Python.
3. Compare the results from multitime to those generated by scipy or similar, as a ground truth.
4. Add a .travis.yml file which uses python-multitime from Step 2. and Python quickcheck to test the functions from Step 1. automatically.
This is a bit messy, and it would probably be neater to do everything in C...
OSX 10.13.4.
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
autoconf (GNU Autoconf) 2.69
On OSX, any output format option other than liketime segfaults after running the command the specified number of times.
#> multitime -f rusage -n 2 sleep 1
===> multitime results
Segmentation fault: 11
#> multitime -n 2 sleep 1
===> multitime results
Segmentation fault: 11
liketime works fine though:
#> multitime -f liketime -n 2 sleep 1
real 1.00
user 0.00
sys 0.00
No segfaults.
#23 fixes a segfault, so it seems to be worth a new release. What's even worse, the 1.4 release is tagged on some weird branch completely separate from master, so the patch from #23 does not apply to it.
Master has all changes from 1.4 except for an entry in HISTORY and two contributors in CREDITS.
The means reported by multitime are arithmetic means:
https://github.com/ltratt/multitime/blob/master/format.c#L224-L241
Fleming and Wallace (1986) say that geometric means should be reported.
Is this an improvement worth making? If so, how does it affect other measures which rely on the mean values (such as std. dev. and CIs)?
In batch mode multitime can produce timing measurements for a number of different commands. One obvious use of this is to compare the performance of two similar commands. In this situation, it would be useful to report the result of a Student t-test (or similar hypothesis test) which compares the sample means of the different commands.
After running multitime with -c 95, the confidence interval results seem odd:
            Mean                Std.Dev.    Min       Median    Max
real        289.409+/-988.4848  9.596       240.783   290.874   296.314
user        287.989+/-904.7777  9.181       240.263   290.186   293.038
sys           0.637+/-9.5890    0.945         0.028     0.108     3.028
Running brew install Tenzer/tap/multitime as per the download instructions gives:
==> Tapping tenzer/tap
Cloning into '/usr/local/Homebrew/Library/Taps/tenzer/homebrew-tap'...
remote: Enumerating objects: 186, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 186 (delta 0), reused 0 (delta 0), pack-reused 185
Receiving objects: 100% (186/186), 26.41 KiB | 6.60 MiB/s, done.
Resolving deltas: 100% (61/61), done.
Tapped 5 formulae (20 files, 44.9KB).
Warning: No available formula or cask with the name "tenzer/tap/multitime". Did you mean tenzer/tap/quirky?