ltratt / multitime
Time command execution over multiple executions
Home Page: http://tratt.net/laurie/src/multitime/
License: MIT License
multitime: a better time utility
================================

Unix's `time` utility is a simple and often effective way of measuring how long a command takes to run. Unfortunately, running a command once can give misleading timings: the process may create a cache on its first execution, running faster subsequently; other processes may cause the command to be starved of CPU or IO time; etc. It is common to see people run `time` several times and take whichever values they feel most comfortable with. Inevitably, this causes problems. `multitime` is, in essence, a simple extension to `time` which runs a command multiple times and, having done so, prints the timing means (with confidence intervals), standard deviations, minimums, medians, and maximums. This can give a much better understanding of the command's performance.

Why should you use multitime?
-----------------------------

If you want to do any of the following, then `multitime` is worth considering:

* You want to run a command several times to understand how its timings naturally vary.
* You want to run a command several times so that temporary blips in system activity do not distort the timings.
* You need different executions of a command being timed to have different inputs / outputs.
* You want to compare the timing of multiple commands (e.g. for benchmarking purposes).

`multitime` can also be used as a drop-in replacement for the POSIX `time` command: when invoked as `time` (e.g. via a symlink), `multitime` behaves as `time`. For most users, therefore, `multitime` can safely replace the `time` binary, even if you don't make use of its advanced features.

Example usage
-------------

The example below shows a simple benchmark of an `awk` program. In this case the program has been executed 5 times (`-n 5`).

    $ multitime -n 5 awk "function fib(n) \
    > { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
    ===> multitime results
    1: awk "function fib(n) { return n <= 1 ? 1 : fib(n - 1) + fib(n - 2) } BEGIN { fib(30) }"
                Mean            Std.Dev.    Min         Median      Max
    real        1.860+/-0.0013  0.021       1.837       1.856       1.895
    user        1.833+/-0.0005  0.013       1.812       1.836       1.846
    sys         0.002+/-0.0000  0.003       0.000       0.000       0.008

Installing
----------

Formal releases of `multitime` can be downloaded from http://tratt.net/laurie/src/multitime/releases.html. Formal releases can be built and installed with:

    $ ./configure
    $ make install

The latest source can be cloned with:

    $ git clone git://github.com/ltratt/multitime.git

and built with:

    $ make -f Makefile.bootstrap
    $ ./configure
    $ make install

Want to know more?
------------------

More details can be found at http://tratt.net/laurie/src/multitime/.
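The summary statistics in the table above are straightforward to reproduce from raw per-run timings. A stdlib sketch (the per-run wall-clock times below are hypothetical, chosen only to be consistent with the Min/Median/Max shown; this is not multitime's own C code):

```python
import statistics

# hypothetical wall-clock times for the 5 runs
real = [1.837, 1.846, 1.856, 1.866, 1.895]

mean = statistics.mean(real)     # arithmetic mean of the runs
sd = statistics.stdev(real)      # sample standard deviation
lo, med, hi = min(real), statistics.median(real), max(real)
```
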
Running `brew install Tenzer/tap/multitime` as per the download instructions gives:

    ==> Tapping tenzer/tap
    Cloning into '/usr/local/Homebrew/Library/Taps/tenzer/homebrew-tap'...
    remote: Enumerating objects: 186, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 186 (delta 0), reused 0 (delta 0), pack-reused 185
    Receiving objects: 100% (186/186), 26.41 KiB | 6.60 MiB/s, done.
    Resolving deltas: 100% (61/61), done.
    Tapped 5 formulae (20 files, 44.9KB).
    Warning: No available formula or cask with the name "tenzer/tap/multitime". Did you mean tenzer/tap/quirky?
In batch mode `multitime` can produce timing measurements for a number of different commands. One obvious use of this is to compare the performance of two similar commands. In this situation, it would be useful to report the result of a Student t-test (or similar hypothesis test) which compares the sample means of the different commands.
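For two commands, the comparison asked for here amounts to a two-sample t-test on the per-run timings. A minimal stdlib sketch of Welch's t statistic (timing samples below are hypothetical; in practice `scipy.stats.ttest_ind` with `equal_var=False` computes the same statistic and also yields the p-value):

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom for two timing samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    sa, sb = va / len(a), vb / len(b)
    t = (ma - mb) / math.sqrt(sa + sb)
    # Welch-Satterthwaite degrees of freedom
    df = (sa + sb) ** 2 / (sa ** 2 / (len(a) - 1) + sb ** 2 / (len(b) - 1))
    return t, df

# hypothetical wall-clock times for two commands
t_stat, df = welch_t([1.0, 1.1, 0.9], [2.0, 2.1, 1.9])
```
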
Matthew Howell, one of my undergraduate students, has reported what might be a bug in `multitime`. The report was that, when running a very short program at reasonable values of `n` (e.g. `n=30`), the resulting confidence intervals were very large: larger than the mean of the wall-clock time.

This might not be a bug: it may be that the CI is correct, but without being able to see the times of the individual runs it is difficult to tell. On the other hand, there may be an error in the CI calculations, or it may be that my code is not using the t-value and z-value LUTs correctly.
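For context, a CI larger than the mean can legitimately arise for a short-running program: a few slow outliers inflate the standard deviation until the t-based half-width exceeds the mean. A sketch using the standard CI formula (t critical value 2.045 for 95% at df = 29; hypothetical timings, and this is how such CIs are conventionally computed, not necessarily multitime's exact code path):

```python
import math
import statistics

# hypothetical n=30 runs of a ~1 ms program with one pathological 500 ms run
times = [0.001] * 29 + [0.5]

n = len(times)
mean = statistics.mean(times)
sd = statistics.stdev(times)
halfwidth = 2.045 * sd / math.sqrt(n)   # 95% CI half-width, t value for df=29

# the half-width exceeds the mean, matching the reported symptom
```
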
It is difficult to see how to track this potential bug down in such a way that we can use the work done now for regression testing in the future, without incurring a huge overhead in adding many unit tests now, but here is one (slightly eccentric) idea:

1. Refactor `void format_other(Conf)` into smaller units that can be tested separately. This would create separate functions for calculating means, CIs, etc.
2. Create a `python-multitime` which uses cffi to expose the functions created in Step 1 to Python.
3. Compare the results from `multitime` to those generated by `scipy` or similar, as a ground truth.
4. Add a `.travis.yml` file which uses `python-multitime` from Step 2 and a Python quickcheck to test the functions from Step 1 automatically.

This is a bit messy, and it would probably be neater to do everything in C...
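The quickcheck idea (testing the Step 1 functions against a ground truth) could be sketched as below, using stdlib `random` in place of a quickcheck library and the `statistics` module as the ground truth; `c_mean` is a hypothetical Python stand-in for a function refactored out of `format.c`:

```python
import random
import statistics

def c_mean(xs):
    # hypothetical stand-in for a mean function refactored out of format.c
    # and exposed to Python via cffi
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

# property: for random timing vectors, c_mean agrees with the ground truth
random.seed(42)
for _ in range(100):
    xs = [random.uniform(0.0, 10.0) for _ in range(random.randint(2, 50))]
    assert abs(c_mean(xs) - statistics.mean(xs)) < 1e-9
```
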
Thanks!
Here is an example run of multitime:
    ===> multitime results
    1: ./demosaic images/airplane_RGGB.png
                Mean            Std.Dev.    Min         Median      Max
    real        1.563+/-0.3564  0.159       1.444       1.512       2.177
    user        1.408+/-0.0019  0.012       1.385       1.407       1.435
    sys         0.024+/-0.0012  0.009       0.012       0.024       0.052
In this example, the wall clock time is not useful, but the amount of time spent on the CPU is of interest. Unfortunately, summing the user and sys rows is no longer valid, because confidence intervals are not additive.
Therefore, I propose a new list of rows to display on the output:
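While the CIs of the `user` and `sys` rows cannot be summed, the per-run timings can: a combined CPU row would be computed by summing `user + sys` for each run and then summarizing that series. A sketch with hypothetical per-run timings (the row name `cpu` and the t value 2.776 for df = 4 at 95% are illustrative assumptions):

```python
import math
import statistics

# hypothetical per-run timings for 5 runs
user = [1.40, 1.41, 1.39, 1.42, 1.40]
sys_ = [0.02, 0.03, 0.02, 0.01, 0.02]

cpu = [u + s for u, s in zip(user, sys_)]              # sum per run, not per summary row
cpu_mean = statistics.mean(cpu)
cpu_hw = 2.776 * statistics.stdev(cpu) / math.sqrt(5)  # 95% half-width, t for df=4
```
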
#13 and #28 deal with the confidence level. One is a query, and the other is a short-term fix.
This issue seems like a good place where perhaps @snim2, @ltratt, and I (anyone interested; I've invited myself) could document the vision, progress, and usage of the confidence score before documenting it on the README.
After running `multitime` with `-c 95`, the confidence interval results seem odd:

                Mean                Std.Dev.    Min         Median      Max
    real        289.409+/-988.4848  9.596       240.783     290.874     296.314
    user        287.989+/-904.7777  9.181       240.263     290.186     293.038
    sys         0.637+/-9.5890      0.945       0.028       0.108       3.028
#23 fixes a segfault, so it seems to be worth a new release. What's even worse, the 1.4 release is tagged on some weird branch completely separate from master, so the patch from #23 does not apply to it.
Master has all changes from 1.4 except for an entry in HISTORY and two contributors in CREDITS.
The means reported by `multitime` are arithmetic means:
https://github.com/ltratt/multitime/blob/master/format.c#L224-L241
Fleming and Wallace (1986) say that geometric means should be reported.
Is this an improvement worth making? If so, how does it affect other measures which rely on the mean values (such as std. dev. and CIs)?
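For reference, the geometric mean is the exponential of the mean of the logs, and Fleming and Wallace's argument applies chiefly when averaging normalized ratios across benchmarks. The analogue of the standard deviation would be the geometric standard deviation, exp(stdev(log times)), which makes intervals multiplicative rather than additive. A sketch of both means (hypothetical timings):

```python
import math

times = [1.2, 0.9, 4.0]   # hypothetical per-run timings

arith = sum(times) / len(times)
# geometric mean: exp of the mean of the logs, equal to (1.2 * 0.9 * 4.0) ** (1/3)
geo = math.exp(sum(math.log(t) for t in times) / len(times))

# the geometric mean damps the influence of the 4.0 outlier, so geo < arith
```
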
When a confidence interval is not provided on the command line (e.g. `-c 99`), a segmentation fault occurs if the number of runs is less than 30.

The offending code can be found in `format.c` on line 210:

    if (conf->num_runs < 30) {
        z_t = tvals[conf->conf_level - 1][conf->num_runs - 1];
    }

This is because the array location being accessed is out of bounds.

The code should be modified to set a default value of 0 for the confidence level, and the code that calculates / prints the confidence interval should be made conditional on that default value.
OSX 10.13.4.
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
autoconf (GNU Autoconf) 2.69
On OSX, any output format option other than `liketime` segfaults after running the command the specified number of times.
    #> multitime -f rusage -n 2 sleep 1
    ===> multitime results
    Segmentation fault: 11

    #> multitime -n 2 sleep 1
    ===> multitime results
    Segmentation fault: 11

`liketime` works fine though:

    #> multitime -f liketime -n 2 sleep 1
    real 1.00
    user 0.00
    sys 0.00

No segfaults.