Continuous benchmarking
Add a CI-like job to run the benchmark automatically.
It will help developers, potential users, and tantivy-curious people track performance numbers continuously. Automating this also means less stress and hassle for tantivy's maintainers and developers.
Granularity
We can choose to run the benchmark either on every commit or on every release.
On every commit
Integrate the benchmarking suite into CI on the main tantivy repo. Using Travis CI's after_success
build stage, run the benchmark and append the results to results.json in the search-benchmark repo (sketched below).
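A minimal sketch of what that after_success step could invoke - assuming the benchmark can emit JSON, that results.json is a list of per-commit entries, and that a push-capable URL is available; all three are assumptions, not the real harness:

```python
#!/usr/bin/env python3
# Hypothetical after_success step. `make bench-json`, the results.json
# layout, and the BENCH_REPO_URL env var (carrying a deploy token) are
# all placeholders.
import json
import os
import subprocess

commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

# Run the benchmark suite and capture its JSON output.
bench_output = subprocess.check_output(["make", "bench-json"], text=True)
entry = {"commit": commit, "results": json.loads(bench_output)}

# Clone the results repo, append the new entry, and push it back.
subprocess.check_call(["git", "clone", "--depth=1",
                       os.environ["BENCH_REPO_URL"], "bench-repo"])
with open("bench-repo/results.json") as f:
    history = json.load(f)
history.append(entry)
with open("bench-repo/results.json", "w") as f:
    json.dump(history, f, indent=2)
subprocess.check_call(["git", "-C", "bench-repo", "commit", "-am",
                       f"Results for {commit}"])
subprocess.check_call(["git", "-C", "bench-repo", "push"])
```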
Pros:
Commit-specific perf numbers - easier to triage perf regressions.
Will build a more detailed picture of the hot paths over time.
Automated - no need to fiddle with re-running the benchmark locally.
Cons/costs:
Too much noise - some commits are WIP or harm perf for the sake of a refactor. Is it really necessary to keep that data?
Makes every CI job run longer.
Benchmarking should be done on a dedicated machine to guarantee similar conditions. CI jobs run inside uncontrolled layers of abstraction (Docker inside a VM, inside another VM). To control the environment and keep it automated, we would need to dedicate a VPS instance. That is an expense, a potential security liability, and something that needs administration.
On every release
Same as above, but use git tags to tell whether a commit corresponds to a new release (see the sketch below).
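For example, a minimal check - assuming releases are tagged like 0.9.0 or v0.9.0, which is an assumption about the tagging convention:

```python
#!/usr/bin/env python3
# Exit early unless HEAD carries a version tag. `git tag --points-at HEAD`
# lists the tags on the current commit; the tag pattern is an assumption.
import re
import subprocess
import sys

tags = subprocess.check_output(
    ["git", "tag", "--points-at", "HEAD"], text=True).split()
if not any(re.fullmatch(r"v?\d+\.\d+(\.\d+)?", t) for t in tags):
    sys.exit(0)  # not a release commit: skip the benchmark run
# ...otherwise, run the benchmark exactly as in the per-commit setup.
```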
Pros:
Fewer runs - cheaper on HW, doesn't slow builds down.
Releases are usually semantically important points in history where we are interested in perf.
Cons/costs:
Still needs dedicated HW to run consistently.
Needs push access to tantivy-benchmark repo.
Presentation
Showing data from every commit might be unnecessarily overwhelming. The current benchmark front-end is clean (imho) and makes it easy to compare results across queries and versions.
On the front-end, we can show 0.6, 0.7, 0.8, 0.9, and the latest commit or release.
Power users or admins could be given the option to extend the table to cover every commit.
Implementation
A VPS that watches the main tantivy repo, builds and runs the benchmark, and commits new results at a chosen frequency.
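A sketch of such a watcher, meant to run from cron (e.g. hourly) - the paths and the run_benchmark.py entry point are placeholders, not an existing script:

```python
#!/usr/bin/env python3
# Watcher sketch for the VPS: fetch new tags from the main tantivy repo
# and benchmark each one exactly once.
import subprocess

TANTIVY_DIR = "/srv/bench/tantivy"        # assumed local clone of tantivy
DONE_FILE = "/srv/bench/benchmarked.txt"  # tags already processed

def git(*args: str) -> str:
    return subprocess.check_output(["git", "-C", TANTIVY_DIR, *args], text=True)

git("fetch", "--tags", "origin")
try:
    done = set(open(DONE_FILE).read().split())
except FileNotFoundError:
    done = set()

for tag in git("tag", "--list").split():
    if tag in done:
        continue
    git("checkout", tag)
    # Hypothetical entry point: build, run, and publish one benchmark run.
    subprocess.check_call(["python3", "/srv/bench/run_benchmark.py"])
    with open(DONE_FILE, "a") as f:
        f.write(tag + "\n")
```

Polling from cron keeps the setup dead simple; a webhook would react faster but means exposing an endpoint on the VPS, which adds to the security surface mentioned above.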
Thoughts?