Tester is a tool for automatically running a collection of test cases and for comparing the actual and expected outputs in each case.
The following items are required to use the code in this repository:
- Java Development Kit (version >= 1.8 should be sufficient)
- Apache Ant (version >= 1.9.2 should be sufficient)
The development of tester was supported in part by funding from the National Science Foundation, Award No. CNS-1422979.
To begin, use the command ant to compile the code for tester and to create the file tester.jar. The recommended method for installation is to copy the tester shell script into a folder on your path, and then edit the copy to point to the location of the tester.jar file on your system. After these steps, you should be able to run tester (without arguments) in any directory to produce the following output:
$ tester
usage: tester options ...
options: -r run tests (dryrun without this option)
-i enable interaction to update expected results
-c do not print context for tests during interaction
-q quiet; do not print test progress messages
-f do not print test failed messages
-s do not print test set summaries
$
A set of test cases is defined by a folder containing a file called tests that defines a series of individual test cases. Each test case has an associated name (a sequence of one or more characters, each of which is a letter, a digit, a hyphen, or an underscore), and every test case in a given tests file must have a distinct name. Test case names are used in identifying individual tests to users, and also for creating files as necessary to capture test case outputs.
Every tests file is interpreted as a sequence of lines, and the start of a new test case is marked by one of the following:
- A line of the form exec: name command introduces a test called name that will be executed by running the specified command. The lines in the tests file immediately following the exec: line can be used to provide some context for the test (e.g., a short description of why the test is included); this context information can then be displayed when tester detects a change in the expected output to help the user determine whether the results are correct or not.
- A line of the form .ext: name command introduces a test called name that will be executed by running the specified command with an additional argument that will be the name of a file with a .ext suffix. The contents of the latter file are specified by the lines in the tests file that immediately follow the .ext: line, up to the start of the next test case or the end of the file, whichever comes first. Note that tester will allow any file extension ext (with the same syntax as is used for test names).
- A line of the form tests: name allows the user to nest a new set of tests, called name, within the current test set. This can be useful for organizing a large set of test cases into smaller pieces. For a test case of this form to be valid, there must be a nested folder called name that itself contains a new tests file. For example, if the file a/tests includes the line tests: b, then there should also be a folder a/b and a file a/b/tests, where the latter describes the test cases for b.
All other lines in the input are treated as plain text, either forming introductory comments before any test cases at the start of a tests file, or else as lines making up the context or contents portion of an exec: or .ext: test case, respectively. It is also permitted to include additional lines after a tests: line and before the next test case (if any), but the text in those lines will be ignored.
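The line-oriented format described above can be sketched as a small parser. This is only an illustration in Python, not the code that tester itself uses: the function name parse_tests, the dictionary layout, and the assumptions that keywords start in the first column and that fields are separated by whitespace are all hypothetical.

```python
import re

# Assumed syntax for names and extensions: letters, digits, hyphens, underscores.
NAME = r"[A-Za-z0-9_-]+"
EXEC = re.compile(rf"exec:\s+({NAME})\s+(.+)")
EXT = re.compile(rf"\.({NAME}):\s+({NAME})\s+(.+)")
NESTED = re.compile(rf"tests:\s+({NAME})\s*")


def parse_tests(text):
    """Split the text of a tests file into a list of test-case dictionaries."""
    cases = []
    current = None  # lines before the first test case are introductory comments
    for line in text.splitlines():
        if m := EXEC.fullmatch(line):
            current = {"kind": "exec", "name": m[1], "command": m[2], "body": []}
        elif m := EXT.fullmatch(line):
            current = {"kind": "ext", "ext": m[1], "name": m[2],
                       "command": m[3], "body": []}
        elif m := NESTED.fullmatch(line):
            # Lines after a tests: line are collected here but would be ignored.
            current = {"kind": "tests", "name": m[1], "body": []}
        else:
            # Plain text: context for exec:, file contents for .ext:.
            if current is not None:
                current["body"].append(line)
            continue
        if any(case["name"] == current["name"] for case in cases):
            raise ValueError(f"duplicate test name: {current['name']}")
        cases.append(current)
    return cases
```

For an exec: case the body list holds the context lines; for an .ext: case it holds the lines that would be written to the generated input file.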
The basic operation of tester is to find and run each of the test cases in a given test set; to capture any output that is produced on the standard output and standard error streams; and to compare those outputs with (previously determined) expected results. As such, tester will not be immediately applicable in settings where the programs being tested do not produce enough information on the standard output and error streams to judge whether they are working correctly. If you are testing a compiler, for example, then it should be possible to test that errors in input programs are being detected and reported correctly by checking these outputs, but it may be harder to confirm correct operation for valid input programs where the compiler quietly generates an executable and then exits. Similarly, it can be hard to test a program in this way if its outputs are expected to be different every time the program is run (for example, if the output includes current time or date information or machine-specific details).
It may still be possible, however, to use tester for programs like this by using shell scripts or other programs or options specifically designed for use in testing. For example, you might use a script that not only compiles source programs, but also runs them, or passes them through other tools to generate output and confirm that the program worked correctly. Or you might be able to filter the output to remove details, such as time information, that would otherwise cause the output to change from run to run.
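As one possible illustration of the filtering idea, the following Python sketch replaces timestamps with a fixed placeholder so that two runs produce identical text. The pattern and the names TIMESTAMP and normalize are assumptions made for this example; a real filter would need patterns that match whatever varying details your own program prints.

```python
import re

# Hypothetical pattern: dates and times such as "2018-09-19 16:14:45.000".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(\.\d+)?")


def normalize(text):
    """Replace every timestamp in text with a fixed placeholder."""
    return TIMESTAMP.sub("<TIMESTAMP>", text)


# Two runs at different times now produce the same normalized line.
print(normalize("build finished at 2018-09-19 16:14:45"))
print(normalize("build finished at 2018-09-19 16:23:52"))
```

A wrapper script could apply a function like this to the output of the program under test before printing it, and the wrapper (rather than the program itself) would then be named in the tests file.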
If you determine that tester is applicable, the next step is to create a root folder for the test cases. This folder will typically contain:
- One or more subfolders with tests files. For example, you might partition your tests across multiple folders so that you can test different parts of the system without having to run all of the tests every time. Or perhaps you will use one short set of tests for a quick sanity check during development, and a larger, more extensive set of tests (that might take much longer to run) on a less regular basis, such as prior to committing code for a new feature.
- Two subfolders called expected and actual that contain the expected and actual outputs of the individual test cases. The tester tool will automatically generate these folders, and any necessary subfolders. In principle, you could create files in the expected folder by hand; in practice, however, it will probably be easier to use tester's interactive mode (-i) to do this. The contents of the actual folder are overwritten every time you rerun the tests (using the -r option), so you should not expect to make permanent edits to the files in this folder, but you may find that those files are useful for debugging individual test cases. For example, the actual outputs that are produced by running a test called t in a set of tests represented by the folder a will be stored in the files actual/a/t.out and actual/a/t.err, capturing the standard output and standard error, respectively. If you run tester without repeating the tests, then any results from a previous run that are cached in the actual folder will be used instead. This, for example, makes it possible to get a quick reminder of any failing tests in the previous run by using -qs as a command line option without the overhead of having to rerun all of the tests.
- Custom shell scripts or other programs that you need for testing. Note that the commands that are specified in tests files are taken verbatim: there is no support for environment variable substitutions, I/O redirection, pipes, shell escapes, etc. If such features are required for the tests, then they should be packaged up in shell scripts that can, instead, be referenced in the commands used for testing.
- Data files that are needed for testing. (If you have a lot of these, it might be better to organize them in subfolders too, of course.)
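The naming scheme for output files can be summarized with a short Python sketch. The function output_paths is hypothetical and is not part of tester; it simply reproduces the layout described above, on the assumption that the expected folder mirrors the structure of the actual folder.

```python
from pathlib import PurePosixPath


def output_paths(test_set, name, root="."):
    """Return the (.out, .err) file names for a test, for both output folders."""
    paths = {}
    for kind in ("expected", "actual"):
        base = PurePosixPath(root) / kind / test_set / name
        paths[kind] = (f"{base}.out", f"{base}.err")
    return paths


# The test called t in the test set a:
print(output_paths("a", "t")["actual"])
```

When debugging a single failing test, these are the files to compare by hand.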
The tester command allows the user to specify a combination of options (by writing an initial - followed by one or more option characters) and a sequence of test folders. As indicated previously, it is possible to obtain a summary of command line options by running tester without any arguments. The following provides a slightly more detailed description of what each of the options controls.
- -r indicates that tester should attempt to run all of the specified test cases and capture new output for each one. If this flag is not specified, then tester will just use any outputs captured in the actual folder instead of repeating the test. This will likely be much faster, but it obviously won't account for any changes that have been made to the program being tested since the previous test run.
- -i indicates that tester should use an interactive mode to display details of any issues that it detects and give the user an option to update the expected outputs when appropriate. It is important, of course, for the user to review the new outputs very carefully to ensure they are correct before accepting them.
- -c indicates that tester should not display the context information (explanation or code) for test cases when tester is running in interactive mode. Use of this option is not encouraged unless the naming scheme that is used for individual test cases already provides enough information for users to identify any test where the outputs differ.
- -q indicates that tester should run in "quiet" mode, which means that it will not display progress messages about which test is being run or about which tests have passed; only summary information and details of failing tests will be displayed.
- -s indicates that tester should not display summary messages that indicate the fraction of passing tests in each test set. In combination with -q, this option indicates that tester should only display information about failing tests.
- -f indicates that tester should not display messages about failing tests. Failing tests will, however, still be included in summary lines (unless they have also been disabled using the -s option).
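To make the interaction between the options concrete, here is a rough Python sketch of how combined option characters such as -qri might be separated from the test folders, and of which kinds of messages remain visible. The names split_args and shown_messages are invented for this example, and rejecting unknown option characters is an assumption; this is not a description of how tester is implemented.

```python
KNOWN_OPTIONS = set("ricqfs")


def split_args(args):
    """Separate option characters from test folder names."""
    flags, folders = set(), []
    for arg in args:
        if arg.startswith("-") and len(arg) > 1:
            unknown = set(arg[1:]) - KNOWN_OPTIONS
            if unknown:
                raise ValueError(f"unknown options: {sorted(unknown)}")
            flags |= set(arg[1:])
        else:
            folders.append(arg)
    return flags, folders


def shown_messages(flags):
    """Report which kinds of message are displayed for a set of flags."""
    return {
        "progress": "q" not in flags,   # -q hides progress and pass messages
        "failures": "f" not in flags,   # -f hides messages about failing tests
        "summaries": "s" not in flags,  # -s hides per-set summary lines
    }


flags, folders = split_args(["-qs", "demo"])
print(folders, shown_messages(flags))
```

With -qs, only the messages about failing tests remain, which matches the description of that combination above.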
The following sequence of commands illustrates the use of tester on a simple example: testing the Unix sort utility, which takes text files as input and outputs a sorted version with the lines rearranged (by default) in increasing dictionary order. The files that are used as the starting point for this demo are included in the sort-test folder of the tester distribution; if you have installed tester correctly, then you should be able to change into that directory and then just follow along with the sequence of commands shown below.
The testing process begins by defining a set of test cases, which we will create here in a test folder called demo:
$ cat demo/tests
This test file contains some simple tests for the Unix sort utility.
.txt: animals sort
lion
tiger
panda
racoon
aardvark
zebra
bear
wolf
echidna
exec: numbers sort numbers
This variant of the sort command treats the input as text strings that will
be sorted in dictionary/lexicographic order. So anything that starts with
0 will come before anything that starts with a 1.
exec: numbers-numeric sort -n numbers
By adding the -n command line option, we specify that the input should be
treated (and hence sorted) as a list of numbers, so 007 will come before
10, but not before 1.
$
This particular test file contains three test cases. The first, called animals, attempts to sort the given list of animal names by placing them in a .txt file and passing that as an argument to the sort command. (Note that there is actually a blank line at the end of the list of animals, and this will be included with the rest of the input when the test is executed; this is not a problem, so long as we are careful to check that it is properly reflected in the output that we see when the test case is run.)
The remaining two test cases, named numbers and numbers-numeric, test the behavior of sort for sorting text representing numbers in two different ways. Each of the test cases includes some brief text to explain what is being tested and what we might expect from the results. The list of numbers is actually stored separately in a file, also called numbers, so that the same input can be used in both tests:
$ cat numbers
4
007
12
42
7
10
100
93
1
$
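The difference between the two orderings that these tests exercise can be previewed with a few lines of Python. This is only an analogy: Python's sorted function stands in for sort, and the real utility's dictionary order depends on the locale, so the sketch should be read as an approximation of what to expect rather than a specification.

```python
# The contents of the numbers file, one entry per line.
numbers = ["4", "007", "12", "42", "7", "10", "100", "93", "1"]

# Text comparison, as in plain "sort": anything starting with 0 comes first.
lexicographic = sorted(numbers)

# Numeric comparison, as in "sort -n": 007 and 7 have the same value.
numeric = sorted(numbers, key=int)

print(lexicographic)
print(numeric)
```

In the text ordering, 100 comes before 12 because the strings are compared character by character; in the numeric ordering, 007 sits next to 7.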
At this point, demo/tests and numbers are the only files we have created:
$ ls
demo/ numbers
$
But this can be changed by running tester on our demo test cases:
$ tester demo
Test set demo contains 3 tests:
....CodeTest: sort demo/animals.txt
....FAILED demo/animals: Missing expected outputs
....Exec demo/numbers: sort numbers
....FAILED demo/numbers: Missing expected outputs
....Exec demo/numbers-numeric: sort -n numbers
....FAILED demo/numbers-numeric: Missing expected outputs
demo: passed 0 of 3 tests
$ ls
actual/ demo/ expected/ numbers
$
Notice that, although all of the tests failed (because we have not yet specified any expected outputs for them), this initial run did at least succeed in creating the expected and actual folders where outputs will be stored. In fact, the tester demo command did not actually run any of the tests, so although it created the actual folder with a subfolder for demo, it has not actually captured any outputs there:
$ ls actual/demo/
animals.txt
$
The only file here is animals.txt, which was created to contain the text from the first test case. To force tester to actually run the tests, we must specify the -r command line option:
$ tester -r demo
Test set demo contains 3 tests:
....CodeTest: sort demo/animals.txt
....FAILED demo/animals: Missing expected outputs
....Exec demo/numbers: sort numbers
....FAILED demo/numbers: Missing expected outputs
....Exec demo/numbers-numeric: sort -n numbers
....FAILED demo/numbers-numeric: Missing expected outputs
demo: passed 0 of 3 tests
$
Although all the tests are failing (after all, we still have not created the expected outputs), we can see that the outputs (for all three tests, and for both standard output and standard error) have now been captured in the actual/demo folder:
$ ls actual/demo/
animals.err animals.txt numbers-numeric.out numbers.out
animals.out numbers-numeric.err numbers.err
$
It would be possible now to go through those files, checking (by hand) that the contents are what we expect, and then copying each of them to the corresponding position in the expected folder to build up our initial set of expected outputs. (Of course, it would also be possible, but likely more work too, to create each of those expected output files manually.) The interactive mode of tester, which is accessed by including i as a command line option, is designed to help in situations like this by walking the user through any changes and allowing them to update expected outputs as appropriate. Every time tester -i finds a difference between the actual and expected outputs for a test case, it (a) displays the context for the test; (b) shows either the actual output (if there is no expected output) or a diff (if both outputs exist); and (c) asks the user if they would like to update the expected output with the actual output. The user should inspect the outputs carefully and answer either y (yes) or n (no) as appropriate at each such prompt.
(In what follows, we use the r and i command line options together, which is likely to be a common pattern in practice. But because we have just run all the tests using tester -r, it would also be possible to use just tester -i here, and avoid the overhead of repeating the previous test runs.)
$ tester -ri demo
Test set demo contains 3 tests:
....CodeTest: sort demo/animals.txt
ISSUE for demo/animals: expected outputs for demo/animals are missing.
context=================================================================
lion
tiger
panda
racoon
aardvark
zebra
bear
wolf
echidna
standard output=========================================================
aardvark
bear
echidna
lion
panda
racoon
tiger
wolf
zebra
standard error==========================================================
========================================================================
Use these outputs as the expected results [yn]? y
....Exec demo/numbers: sort numbers
ISSUE for demo/numbers: expected outputs for demo/numbers are missing.
context=================================================================
This variant of the sort command treats the input as text strings that will
be sorted in dictionary/lexicographic order. So anything that starts with
0 will come before anything that starts with a 1.
standard output=========================================================
007
1
10
100
12
4
42
7
93
standard error==========================================================
========================================================================
Use these outputs as the expected results [yn]? y
....Exec demo/numbers-numeric: sort -n numbers
ISSUE for demo/numbers-numeric: expected outputs for demo/numbers-numeric are missing.
context=================================================================
By adding the -n command line option, we specify that the input should be
treated (and hence sorted) as a list of numbers, so 007 will come before
10, but not before 1.
standard output=========================================================
1
4
007
7
10
12
42
93
100
standard error==========================================================
========================================================================
Use these outputs as the expected results [yn]? y
demo: passed 3 of 3 tests
$
In each of the three cases shown here, tester detects that the expected output is missing and displays the context (code or explanation for the test case) and the generated standard output and standard error (the latter being empty in all of these examples). At each step, after carefully reviewing the details to check that they are correct, the user hits y to accept the update and the test case is marked as being passed.
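The check that drives this process can be approximated in a few lines of Python. The sketch below is a guess at the general shape of the logic, not tester's actual implementation: the function names classify and accept are hypothetical, and treating acceptance as a plain file copy from the actual folder to the expected folder is an assumption.

```python
import shutil
import tempfile
from pathlib import Path

SUFFIXES = (".out", ".err")  # standard output and standard error


def classify(root, test_set, name):
    """Return a status string for one test, based on the files on disk."""
    missing = differs = False
    for suffix in SUFFIXES:
        expected = Path(root, "expected", test_set, name + suffix)
        actual = Path(root, "actual", test_set, name + suffix)
        if not expected.is_file():
            missing = True
        elif not actual.is_file() or expected.read_bytes() != actual.read_bytes():
            differs = True
    if missing:
        return "Missing expected outputs"
    if differs:
        return "Test did not produce expected outputs"
    return "PASSED"


def accept(root, test_set, name):
    """Copy the actual outputs over the expected ones (assumed effect of y)."""
    for suffix in SUFFIXES:
        source = Path(root, "actual", test_set, name + suffix)
        target = Path(root, "expected", test_set, name + suffix)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    folder = Path(root, "actual", "demo")
    folder.mkdir(parents=True)
    (folder / "numbers.out").write_text("007\n1\n10\n")
    (folder / "numbers.err").write_text("")
    print(classify(root, "demo", "numbers"))  # no expected outputs exist yet
    accept(root, "demo", "numbers")
    print(classify(root, "demo", "numbers"))  # expected now matches actual
```

Because the comparison only reads files that are already on disk, the same check works whether or not the tests have just been rerun.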
If we rerun tester, all of the tests are now marked as PASSED:
$ tester -r demo
Test set demo contains 3 tests:
....CodeTest: sort demo/animals.txt
....PASSED demo/animals
....Exec demo/numbers: sort numbers
....PASSED demo/numbers
....Exec demo/numbers-numeric: sort -n numbers
....PASSED demo/numbers-numeric
demo: passed 3 of 3 tests
$
In situations like this, it may be useful to see only an abbreviated version of the test report. Using command line option -q, for example, omits all of the lines for passing tests and shows only the summary lines that describe the fraction of passing tests:
$ tester -q demo
demo: passed 3 of 3 tests
$
Now suppose that we want to add another test. For the sake of this example, we'll use the command line to add something to the end of the demo/tests file; in practice, it will usually be easier to make changes like this in a text editor:
$ echo "exec: sort-reverse sort -r numbers" >> demo/tests
$
The purpose of this test case is to explore what happens when the -r flag is used (it should cause the input to be sorted in reverse order). If we rerun tester with this addition, it will detect and report a "Missing expected outputs" error for the new test case:
$ tester -q demo
....FAILED demo/sort-reverse: Missing expected outputs
demo: passed 3 of 4 tests
$
To fill this gap, and skip output for the tests that already pass, we can use tester -qri:
$ tester -qri demo
ISSUE for demo/sort-reverse: expected outputs for demo/sort-reverse are missing.
standard output=========================================================
93
7
42
4
12
100
10
1
007
standard error==========================================================
========================================================================
Use these outputs as the expected results [yn]? y
demo: passed 4 of 4 tests
$
Now that we have built a set of test cases, and a corresponding set of expected outputs for each one, it is easy to run the tests again and again:
$ tester -qr demo
demo: passed 4 of 4 tests
$
In practice, of course, it only makes sense to repeat the tests if there is a possibility that something may have changed to cause the programs to fail. To simulate this, we will add an extra number to the end of the numbers file and then rerun tester:
$ echo 18 >> numbers
$ tester -qr demo
....FAILED demo/numbers: Test did not produce expected outputs
....FAILED demo/numbers-numeric: Test did not produce expected outputs
....FAILED demo/sort-reverse: Test did not produce expected outputs
demo: passed 1 of 4 tests
$
Unsurprisingly, the three tests that depend on the content of numbers are now listed as having FAILED. The new outputs, however, have been recorded in the actual folder, so it is easy to generate a quick reminder of which tests are failing without having to run everything from scratch:
$ tester -q demo
....FAILED demo/numbers: Test did not produce expected outputs
....FAILED demo/numbers-numeric: Test did not produce expected outputs
....FAILED demo/sort-reverse: Test did not produce expected outputs
demo: passed 1 of 4 tests
$
More importantly, we can go back to interactive mode using the i command line flag and use that to step through and update the broken test cases:
$ tester -qi demo
ISSUE for demo/numbers: test did not produce expected outputs.
context=================================================================
This variant of the sort command treats the input as text strings that will
be sorted in dictionary/lexicographic order. So anything that starts with
0 will come before anything that starts with a 1.
standard output=========================================================
*** expected/demo/numbers.out 2018-09-19 16:14:45.000000000 -0700
--- actual/demo/numbers.out 2018-09-19 16:23:52.000000000 -0700
***************
*** 4,9 ****
--- 4,10 ----
10
100
12
+ 18
4
42
7
========================================================================
Use new output as the expected result [yn]? y
ISSUE for demo/numbers-numeric: test did not produce expected outputs.
context=================================================================
By adding the -n command line option, we specify that the input should be
treated (and hence sorted) as a list of numbers, so 007 will come before
10, but not before 1.
standard output=========================================================
*** expected/demo/numbers-numeric.out 2018-09-19 16:14:46.000000000 -0700
--- actual/demo/numbers-numeric.out 2018-09-19 16:23:52.000000000 -0700
***************
*** 5,10 ****
--- 5,11 ----
7
10
12
+ 18
42
93
100
========================================================================
Use new output as the expected result [yn]? y
ISSUE for demo/sort-reverse: test did not produce expected outputs.
standard output=========================================================
*** expected/demo/sort-reverse.out 2018-09-19 16:15:48.000000000 -0700
--- actual/demo/sort-reverse.out 2018-09-19 16:23:52.000000000 -0700
***************
*** 2,7 ****
--- 2,8 ----
7
42
4
+ 18
12
100
10
========================================================================
Use new output as the expected result [yn]? y
demo: passed 4 of 4 tests
$
In these examples, tester recognizes that expected and actual outputs both exist, but are different, and so it displays a (context) diff for each one to highlight the changes. In each case, it is clear that the only change is the result of adding 18 to the original test set, and it is also easy to see that the new number is inserted at the correct position. As a result, the user can quickly answer each of the prompts with y, and return to the position where all of the tests pass.
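For readers who have not met context diffs before, the following Python sketch produces one for the numbers test using the standard difflib module. It is offered only as an illustration of the format; how tester generates its diffs is not stated here, and the headers printed by this sketch omit the file timestamps.

```python
import difflib

# Expected output (before adding 18) and actual output (after adding 18).
expected = ["007", "1", "10", "100", "12", "4", "42", "7", "93"]
actual = ["007", "1", "10", "100", "12", "18", "4", "42", "7", "93"]

diff = list(difflib.context_diff(
    expected,
    actual,
    fromfile="expected/demo/numbers.out",
    tofile="actual/demo/numbers.out",
    lineterm="",
))
print("\n".join(diff))
```

Lines marked with a leading + appear only in the new output, so a single + 18 line surrounded by unchanged neighbors is the pattern to look for before answering y.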