ltratt / multitime
Time command execution over multiple executions
Home Page: http://tratt.net/laurie/src/multitime/
License: MIT License
When a confidence interval is not provided on the command line (e.g. -c 99 is omitted), a segmentation fault occurs if the number of runs is less than 30.
The offending code can be found in format.c at line 210:
if (conf->num_runs < 30) {
z_t = tvals[conf->conf_level - 1][conf->num_runs - 1];
}
This is because the array location being accessed is out of bounds.
The code should be modified to set a default value of 0 for the confidence level, and to calculate and print the confidence interval only when that default has been overridden.
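A sketch of the proposed guard, using a hypothetical simplification of the surrounding format.c code (the tvals entries and large-sample z-value here are placeholders, not the real LUT):

```c
#include <assert.h>

/* Hypothetical t-value LUT: rows are confidence levels, columns are
   num_runs - 1. Only one placeholder entry is filled in here; the real
   table lives in format.c. */
static const double tvals[3][30] = {
    [0] = { [4] = 2.132 },
};

/* conf_level == 0 means "-c was not given": skip the CI calculation
   entirely instead of indexing tvals out of bounds. */
double ci_halfwidth(int conf_level, int num_runs, double stddev) {
    if (conf_level == 0)
        return 0.0;
    double z_t;
    if (num_runs < 30)
        z_t = tvals[conf_level - 1][num_runs - 1];
    else
        z_t = 1.96; /* placeholder large-sample z-value */
    return z_t * stddev;
}
```

With conf_level defaulting to 0, the printing code can then emit the +/- column only when a CI was actually requested.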
Thanks!
#13 and #28 deal with the confidence level. One is a query, and the other is a short-term fix.
This issue seems like a good place where perhaps @snim2, @ltratt, and I (anyone interested; I've invited myself!) could document the vision, progress, and usage of the confidence score before documenting it in the README?
Here is an example run of multitime:
===> multitime results
1: ./demosaic images/airplane_RGGB.png
            Mean            Std.Dev.    Min       Median    Max
real        1.563+/-0.3564  0.159       1.444     1.512     2.177
user        1.408+/-0.0019  0.012       1.385     1.407     1.435
sys         0.024+/-0.0012  0.009       0.012     0.024     0.052
In this example, the wall clock time is not useful, but the amount of time spent on the CPU is of interest. Unfortunately, summing the user and sys rows is no longer valid, because confidence intervals are not additive.
Therefore, I propose a new list of rows to display in the output:
Matthew Howell, one of my undergraduate students, has reported what might be a bug in multitime. The bug report from Matthew was that when running very short programs at reasonable values of n (e.g. n=30), the resulting confidence intervals were very large, larger than the mean of the wall-clock time. This was on a very short-running program.
This might not be a bug; it may be that the CI is correct, but without being able to see the times of the individual runs it is difficult to tell. On the other hand, there may be an error in the CI calculations, or it may be that my code is not using the t-value and z-value LUTs correctly.
It is difficult to see how to track this potential bug down in such a way that we can use the work done now for regression testing in the future, without incurring a huge overhead in adding many unit tests now, but here is one (slightly eccentric) idea:
1. Split void format_other(Conf) into smaller units that can be tested separately. This would create separate functions for calculating means, CIs, etc.
2. Create a python-multitime which uses cffi to expose the functions created in Step 1. to Python.
3. Compare the results from multitime to those generated by scipy or similar, as a ground truth.
4. Add a .travis.yml file which uses python-multitime from Step 2. and Python quickcheck to test the functions from Step 1. automatically.
This is a bit messy, and it would probably be neater to do everything in C...
OSX 10.13.4.
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
autoconf (GNU Autoconf) 2.69
On OSX, any output format option other than liketime segfaults after running the command the specified number of times.
#> multitime -f rusage -n 2 sleep 1
===> multitime results
Segmentation fault: 11
#> multitime -n 2 sleep 1
===> multitime results
Segmentation fault: 11
liketime works fine though:
#> multitime -f liketime -n 2 sleep 1
real 1.00
user 0.00
sys 0.00
No segfaults.
#23 fixes a segfault, so it seems to be worth a new release. What's even worse, the 1.4 release is tagged on some weird branch completely separate from master, so the patch from #23 does not apply to it.
Master has all changes from 1.4 except for an entry in HISTORY and two contributors in CREDITS.
The means reported by multitime are arithmetic means:
https://github.com/ltratt/multitime/blob/master/format.c#L224-L241
Fleming and Wallace (1986) say that geometric means should be reported.
Is this an improvement worth making? If so, how does it affect other measures which rely on the mean values (such as std. dev. and CIs)?
In batch mode multitime can produce timing measurements for a number of different commands. One obvious use of this is to compare the performance of two similar commands. In this situation, it would be useful to report the result of a Student t-test (or similar hypothesis test) which compares the sample means of the different commands.
After running multitime with -c 95, the confidence interval results seem odd:
            Mean                Std.Dev.    Min       Median    Max
real        289.409+/-988.4848  9.596       240.783   290.874   296.314
user        287.989+/-904.7777  9.181       240.263   290.186   293.038
sys           0.637+/-9.5890    0.945         0.028     0.108     3.028
Running brew install Tenzer/tap/multitime as per the download instructions gives:
==> Tapping tenzer/tap
Cloning into '/usr/local/Homebrew/Library/Taps/tenzer/homebrew-tap'...
remote: Enumerating objects: 186, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 186 (delta 0), reused 0 (delta 0), pack-reused 185
Receiving objects: 100% (186/186), 26.41 KiB | 6.60 MiB/s, done.
Resolving deltas: 100% (61/61), done.
Tapped 5 formulae (20 files, 44.9KB).
Warning: No available formula or cask with the name "tenzer/tap/multitime". Did you mean tenzer/tap/quirky?