aas-benchmark

A collection of pattern matching algorithms and a tool to benchmark the algorithms against each other.

Build Instructions

If you don't want to build aas-benchmark yourself, you can download a prebuilt release version instead. Building the tool yourself is the preferred way to get the latest features, however, because the release version may not be updated regularly.

Steps

You can follow these steps to compile aas-benchmark yourself:

1. Make sure that you have the Rust compiler as well as Cargo installed. Preferably, use rustup to install the entire Rust toolchain.
2. Clone or download this repository to a local directory.
3. Open a terminal and navigate to the directory where the Cargo.toml file is located.
4. Run cargo build --release to compile aas-benchmark.

Alternatively, you can run cargo run --release to compile and run aas-benchmark in one step.
In that case, pass command-line arguments to the tool after a double dash: cargo run --release -- --arguments --here.

You will find an executable file in the target/release subdirectory.

Usage Instructions

This part of the README explains in more detail how to use aas-benchmark, with some examples. Make sure you have read the Build Instructions section first.

Specifying Algorithms

The tool requires the parameter -a, which specifies the algorithm or algorithms that you want to benchmark. You can specify either a single algorithm or several.

aas-benchmark -a naive ...
aas-benchmark -a naive horspool kmp ...

Benchmark All Algorithms at Once

There is also a shortcut to benchmark all algorithms at once:

aas-benchmark -a all ...

Specifying a Number of Executions

If you like, you can specify a number of executions for each algorithm. You could for example use

aas-benchmark naive,horspool -n 10 ...

to run the naive and the Horspool algorithm 10 times each to smooth out deviations in runtime. If you set different pattern lengths, the tool will run the set number of executions for each combination of algorithm and pattern length.
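
To illustrate why repeated executions help, here is a minimal Python sketch of timing one algorithm several times and averaging the results. It is not how aas-benchmark (a Rust tool) is implemented; the function names `naive_search` and `time_runs` and the sample text are made up for this example.

```python
import time
from statistics import mean


def naive_search(pattern: bytes, text: bytes) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    m, n = len(pattern), len(text)
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def time_runs(algorithm, pattern: bytes, text: bytes, executions: int) -> list[float]:
    """Run `algorithm` `executions` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(executions):
        start = time.perf_counter()
        algorithm(pattern, text)
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical input: a short repeated string standing in for a real text.
text = b"abracadabra" * 1000
runs = time_runs(naive_search, b"cad", text, executions=10)
print(f"mean over {len(runs)} runs: {mean(runs):.6f} s")
```

A single run can be distorted by caching or scheduling effects, so the mean (or median) over several runs is usually a more stable figure to compare.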

Specifying a Text Source

Randomly Generated Text

You can generate a random text with a length of m bytes by using the -t or --tr argument:

aas-benchmark naive -t m ...

Text From File

It is possible to load a text as a UTF-8 string from a file by using --tf:

aas-benchmark naive ... --tf text.txt

This would load the content of the file text.txt as the text.

Specifying a Pattern Source

The following table lists all arguments for specifying a pattern source.

Pattern(s) from...	Usage	Parameters	Multiple patterns?
...fixed position in text	`--pt a..b`	Range¹ `a..b` of characters in text.	No.
...random position in text	`--prt m` or `-p m`	Pattern length `m`.	Yes, supply a range¹ for `m` or use `--pmrt m1;m2;m3` with different lengths `m_i`.
...CLI argument	`--pa pattern`	Pattern as ASCII string `pattern`.	Yes, use `--pa` multiple times or enter multiple patterns separated by spaces after `--pa`.
...file	`--pf pattern.txt`	File `pattern.txt`	Yes, use `--pmf` and supply a file where each line contains one pattern.
Randomly generated	`--pr m`	Pattern length `m`.	Yes, supply a range¹ for `m`.

¹ A range is written as a..b where a is the lower bound and b is the inclusive upper bound. You can also supply a step size c as in a..b,c.
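
The range notation can be read as follows. This Python sketch expands a range string into the values it stands for; the function name `parse_range` is hypothetical, and the default step of 1 and the rejection of non-positive steps are assumptions, since the footnote does not specify them.

```python
def parse_range(spec: str) -> list[int]:
    """Expand 'a..b' or 'a..b,c' into the integers from a to b inclusive, stepping by c."""
    bounds, _, step_part = spec.partition(",")
    lower_str, sep, upper_str = bounds.partition("..")
    if not sep:
        raise ValueError(f"expected 'a..b' or 'a..b,c', got {spec!r}")
    lower, upper = int(lower_str), int(upper_str)
    # Assumption: a missing step size means a step of 1.
    step = int(step_part) if step_part else 1
    if step <= 0:
        raise ValueError("step size must be positive")
    # The upper bound is inclusive, hence upper + 1.
    return list(range(lower, upper + 1, step))


print(parse_range("3..6"))     # a..b
print(parse_range("2..10,4"))  # a..b,c
```

For example, `2..10,4` would stand for the pattern lengths 2, 6 and 10 under this reading.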

Note that the names of those arguments all follow the same naming convention:

-- + p + m (if multiple) + r (if random) + source letter (t for text, a for argument, f for file)

This may help you to remember the correct arguments.

Specifying a Seed

You can set a seed to make the generation of random text and random patterns reproducible using the -s or --seed argument:

aas-benchmark naive ... --seed 12345
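
The effect of a seed can be sketched in Python as follows. This is only an illustration of seeded generation, not the tool's actual generator: the function name `random_text` is hypothetical, and mapping symbols to the byte values 0 to n-1 is an assumption.

```python
import random


def random_text(length: int, alphabet_size: int, seed: int) -> bytes:
    """Generate `length` random bytes drawn from an alphabet of `alphabet_size` symbols."""
    if not 1 <= alphabet_size <= 256:
        raise ValueError("alphabet_size must be between 1 and 256")
    # A dedicated generator seeded explicitly: same seed, same sequence.
    rng = random.Random(seed)
    # Assumption: the symbols are simply the byte values 0 .. alphabet_size - 1.
    return bytes(rng.randrange(alphabet_size) for _ in range(length))


first = random_text(length=32, alphabet_size=4, seed=12345)
second = random_text(length=32, alphabet_size=4, seed=12345)
print(first == second)
```

Because both calls use the same seed, they produce the same text, which makes benchmark runs comparable across invocations.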

Other Arguments

Here is a list of other arguments you can set:

Argument	Description
`--noheader`	Disables the header in the CSV output
`--alphabet n`	Set the alphabet size of randomly generated text and patterns to `n`

List of Algorithms

Currently, these algorithms are supported:

Single Pattern Algorithms

Algorithm	Command-line argument name
Backward Nondeterministic DAWG Matching (BNDM)	`bndm`
Backward Oracle Matching (BOM)	`bom`
Horspool	`horspool`
Naive Approach	`naive`
Knuth-Morris-Pratt (KMP)	`kmp` or `kmp-classic`
Shift-And	`shift-and`
Double Window Algorithm	`dw`
Bit-Parallel Length Independent Matching (BLIM)	`blim`

Algorithms Using a Suffix Array

Algorithm	Command-line argument name
Pattern Matching	`sa-match`

See Suffix Array Generation Algorithms for more information on how the suffix array is generated.

Suffix Array Generation Algorithms

Algorithms that require a suffix array generate it using the SAIS algorithm by default. You can, however, choose a different suffix array generation algorithm by specifying the --suffixarray argument:

aas-benchmark sa-match ... --suffixarray sais

Currently, these algorithms are available for suffix array generation:

Algorithm	Command-line argument name
Naive approach	`naive`
SAIS	`sais`
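
For readers unfamiliar with suffix arrays, the following Python sketch shows a naive construction (sorting all suffixes) and a binary-search pattern lookup on top of it. It is a textbook illustration, not the code behind `naive` or `sa-match` in this tool; the function names are made up for the example.

```python
def naive_suffix_array(text: bytes) -> list[int]:
    """Sort all suffix start positions by the suffix that begins there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_match(pattern: bytes, text: bytes, sa: list[int]) -> list[int]:
    """Find all occurrences of pattern with two binary searches over the suffix array."""
    m = len(pattern)
    # Lower bound: first suffix whose m-byte prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-byte prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix in sa[first:lo] starts with the pattern.
    return sorted(sa[first:lo])


text = b"banana"
sa = naive_suffix_array(text)
print(sa, sa_match(b"ana", text, sa))
```

The naive construction is simple but slow on long texts because it compares whole suffixes, which is the motivation for linear-time constructions such as SAIS.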

Approximate Algorithms

Algorithm	Command-line argument name
Ukkonen's DP Algorithm	`ukkonen`
Error Tolerant Shift-And	`et-shift-and`

For approximate algorithms, you can set a maximum allowed error value using the --maxerror argument:

aas-benchmark ukkonen ... --maxerror 2

This value defaults to 0 if not set.
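
To give an idea of what a maximum error means, here is a Python sketch of the basic column-wise dynamic programming approach to approximate matching. It assumes that errors are counted as edit distance (insertions, deletions and substitutions), which this README does not state, and it omits the cutoff refinement associated with Ukkonen's algorithm. The function name `approx_match_ends` is hypothetical.

```python
def approx_match_ends(pattern: bytes, text: bytes, max_error: int = 0) -> list[int]:
    """Return text positions where a match with at most `max_error` edits ends."""
    m = len(pattern)
    # column[i] = fewest edits to align pattern[:i] with some text substring
    # ending at the current position; initially the text prefix is empty.
    column = list(range(m + 1))
    ends = []
    for j, symbol in enumerate(text):
        prev_diag = column[0]  # value for row i-1 in the previous column
        # column[0] stays 0 because a match may start at any text position.
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == symbol else 1
            new_val = min(
                prev_diag + cost,   # match or substitution
                column[i] + 1,      # extra text symbol
                column[i - 1] + 1,  # missing pattern symbol
            )
            prev_diag = column[i]
            column[i] = new_val
        if column[m] <= max_error:
            ends.append(j)
    return ends


print(approx_match_ends(b"abc", b"xxabcxx", max_error=0))
print(approx_match_ends(b"abc", b"xxabcxx", max_error=1))
```

With a maximum error of 0 only exact occurrences are reported, which matches the default described above; raising the limit also accepts positions where the pattern fits after a few edits.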

List of Command-Line Arguments

You can run aas-benchmark --help to get a list of available arguments.
