
PkgBenchmark.jl's Introduction

PkgBenchmark

Benchmarking tools for Julia packages

License: MIT

Introduction

PkgBenchmark provides an interface for Julia package developers to track performance changes of their packages.

The package contains the following features:

  • Running the benchmark suite at a specified commit, branch or tag. The path to the julia executable, the command line flags, and the environment variables can be customized.
  • Comparing performance of a package between different package commits, branches or tags.
  • Exporting results to markdown for benchmarks and comparisons, similar to how Nanosoldier reports results for the benchmarks on Base Julia.

Installation

The package is registered and can be installed with Pkg.add as

julia> Pkg.add("PkgBenchmark")

Documentation

  • STABLE: most recently tagged version of the documentation.
  • DEV: most recent development version of the documentation.

Project Status

The package is tested against Julia v1.0 and the latest v1.x on Linux, macOS, and Windows.

Contributing and Questions

Contributions are welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems.

PkgBenchmark.jl's People

Contributors

ararslan, datseris, davidanthoff, dilumaluthge, dviladrich95, fredrikekre, gdalle, github-actions[bot], goerz, ianbutterworth, johnnychen94, juliatagbot, kristofferc, milesfrain, moble, mroavi, nhdaly, nkottary, omus, one-more-fix, oyamad, ranocha, rohitvarkey, shashi, staticfloat, tkf, vchuravy

PkgBenchmark.jl's Issues

Tuple must be non-empty error?


benchmarkpkg("ChaosTools")

INFO: Running benchmarks...
ERROR: LoadError: 
ArgumentError: tuple must be non-empty
Stacktrace:
 [1] include_from_node1(::String) at .\loading.jl:576
 [2] include(::String) at .\sysimg.jl:14
 [3] runbenchmark_local(::String, ::String, ::String, ::Bool) at C:\Users\datseris\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:28
while loading C:\Users\datseris\.julia\v0.6\ChaosTools\benchmark\benchmarks.jl, in expression starting on line 3
failed process: Process(`'C:\Users\datseris\AppData\Local\Julia-0.6.2\bin\julia.exe' -Cx86-64 '-JC:\Users\datseris\AppData\Local\Julia-0.6.2\lib\julia\sys.dll' --compile=yes --depwarn=yes --color=no --compilecache=yes -e 'using PkgBenchmark
PkgBenchmark.runbenchmark_local("C:\\Users\\datseris\\.julia\\v0.6\\ChaosTools\\benchmark\\benchmarks.jl", "C:\\Users\\datseris\\AppData\\Local\\Temp\\jl_B1B0.tmp", "C:\\Users\\datseris\\.julia\\v0.6\\.benchmarks\\ChaosTools\\.tune.jld", false )
'`, ProcessExited(1)) [1]
in benchmarkpkg at PkgBenchmark\src\runbenchmark.jl:64
in benchmarkpkg at PkgBenchmark\src\runbenchmark.jl:64
in  at PkgBenchmark\src\runbenchmark.jl:67
in with_reqs at PkgBenchmark\src\util.jl:12
in withtemp at PkgBenchmark\src\runbenchmark.jl:37
in  at base\<missing>
in #runbenchmark#7 at PkgBenchmark\src\runbenchmark.jl:9 
in  at base\<missing>
in #benchmark_proc#8 at PkgBenchmark\src\runbenchmark.jl:23
in run at base\process.jl:651
in pipeline_error at base\process.jl:682

Here is my benchmarks.jl file:

using PkgBenchmark, ChaosTools

@benchgroup "discrete_lyapunov" ["Lyapunov"] begin
    henon = Systems.henon()
    @bench lyapunov($henon, 10000) 
end

Allow benchmarkpkg to commit results

Ultimately, I'd love for Travis to benchmark my package on every commit; for this, I believe benchmarkpkg would need to be able to commit the results in the .jld file it generates.

Support BenchmarkTools native way of writing benchmark suites

The macro-based interface for defining benchmark suites is a fairly small wrapper on top of the "native" one from BenchmarkTools. Originally, the old BenchmarkTools also used a macro-based interface but moved over to the current Dict-based one, which more closely matches the internal representation. From what I have read, the macro-based interface had some problems with interpolation; perhaps @jrevels can fill in some details. Issue #16 is relevant.

As a first step, I think at least supporting benchmark suites written using the native Dict method is valuable.
I think it can be done with little extra code; we just need to define a convention for the name of the root benchmark group (perhaps SUITE)?

If the macro based interface is too problematic it could be deprecated but that is a later question.
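
For illustration, a minimal sketch of such a Dict-based benchmark/benchmarks.jl, assuming the convention that the root group is a constant named SUITE:

using BenchmarkTools

# Sketch of a benchmarks.jl written directly against BenchmarkTools,
# with the root group exposed as a constant named SUITE.
const SUITE = BenchmarkGroup()
SUITE["arrays"] = BenchmarkGroup(["basic"])

x = rand(1000)
SUITE["arrays"]["sum"]  = @benchmarkable sum($x)
SUITE["arrays"]["sort"] = @benchmarkable sort($x)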

benchmarkpkg does not return on Windows 10

julia> using PkgBenchmark

julia> bresult = benchmarkpkg("DynamicalBilliards");
PkgBenchmark: Running benchmarks...
PkgBenchmark: using benchmark tuning data in C:\Users\datseris\.julia\v0.6\.pkgbenchmark\DynamicalBilliards_tune.json
Benchmarking: 100%|█████████████████████████████████████| Time: 0:00:30
    [1/1]:          "coltimes"
      [2/2]:        "magnetic"
        [2/2]:      "obstacles"
          [11/11]:  "Left wall"



My REPL is just stuck there forever, and I don't know how to progress.
I've tried also with retune=true, no difference.

Julia 0.6.2, Windows 10, PkgBenchmark 0.1.1

GitError when running benchmarkpkg with GitLab CI

I'm trying to use PkgBenchmark with one of the private packages we've developed at my company. We have a private registry. The registry and packages are stored on a self-hosted GitLab instance.

I have a script that is more or less doing the following:

using PkgBenchmark

pkg_root = abspath(@__DIR__, "..") # this script lives in the benchmark subdirectory, so I'm pointing up one level

function benchmark_config(; kw...)
    BenchmarkConfig(juliacmd=`julia -O3`,
                    env=Dict("JULIA_NUM_THREADS" => 1);
                    kw...)
end

target = benchmarkpkg(pkg_root, benchmark_config())
baseline = benchmarkpkg(pkg_root, benchmark_config(id="8d33882")) # this is line 26 in the stack trace below

I have a hard-coded id there just to test things out. Eventually, I'll use git merge-base to figure out the correct id.

I can run this locally on my Mac just fine. When I try to call this using GitLab CI, with a runner on a Linux machine, I get the error below.

818 ERROR: LoadError: GitError(Code:ERROR, Class:Index, invalid path: '.julia/registries/EG/')
819 Stacktrace:
820   [1] macro expansion
821     @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/error.jl:110 [inlined]
822   [2] add!(idx::LibGit2.GitIndex, files::String; flags::UInt32)
823     @ LibGit2 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/index.jl:107
824   [3] add!
825     @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/index.jl:106 [inlined]
826   [4] (::LibGit2.var"#157#160"{GitRepo})(idx::LibGit2.GitIndex)
827     @ LibGit2 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:908
828   [5] with(f::LibGit2.var"#157#160"{GitRepo}, obj::LibGit2.GitIndex)
829     @ LibGit2 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/types.jl:1150
830   [6] with
831     @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/types.jl:1156 [inlined]
832   [7] snapshot(repo::GitRepo)
833     @ LibGit2 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:906
834   [8] transact(f::PkgBenchmark.var"#39#40"{PkgBenchmark.var"#do_benchmark#22"{Nothing, Bool, Bool, Nothing, String, String, BenchmarkConfig, Bool, String}, String, String}, repo::GitRepo)
835     @ LibGit2 /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LibGit2/src/LibGit2.jl:953
836   [9] _withcommit(f::PkgBenchmark.var"#do_benchmark#22"{Nothing, Bool, Bool, Nothing, String, String, BenchmarkConfig, Bool, String}, repo::GitRepo, commit::String)
837     @ PkgBenchmark /scratch/00/gitlab-runner/Z_-sry_E/0/Genesis/Genesis.jl/.julia/packages/PkgBenchmark/N88Tj/src/util.jl:17
838  [10] benchmarkpkg(pkg::String, target::BenchmarkConfig; script::Nothing, postprocess::Nothing, resultfile::Nothing, retune::Bool, verbose::Bool, logger_factory::Nothing, progressoptions::Nothing, custom_loadpath::String)
839     @ PkgBenchmark /scratch/00/gitlab-runner/Z_-sry_E/0/Genesis/Genesis.jl/.julia/packages/PkgBenchmark/N88Tj/src/runbenchmark.jl:139
840  [11] benchmarkpkg(pkg::String, target::BenchmarkConfig)
841     @ PkgBenchmark /scratch/00/gitlab-runner/Z_-sry_E/0/Genesis/Genesis.jl/.julia/packages/PkgBenchmark/N88Tj/src/runbenchmark.jl:56
842  [12] top-level scope
843     @ /.scratch00/gitlab-runner/Z_-sry_E/0/Genesis/Genesis.jl/benchmark/run_ci.jl:26
844 in expression starting at /.scratch00/gitlab-runner/Z_-sry_E/0/Genesis/Genesis.jl/benchmark/run_ci.jl:26

That "EG" registry is our private registry. Line 26 of run_ci.jl is the second call to benchmarkpkg above. It's like it is trying to operate on the registry's repo instead of my package's repo.

I'm using Julia 1.6.1 on both systems, and I'm using v0.2.11 of PkgBenchmark.

I set JULIA_DEPOT_PATH to force the runner to start with a fresh depot. I've looked at the environment variables that GitLab CI is setting, but nothing jumped out at me.

I can try to reduce this down to a simpler demonstration of the problem, but I thought I'd post what I have so far, in case something jumps out to someone.

`benchmarkpkg(pathof` not working

In the documentation I find pathof(module) everywhere, but I think you mean pkgdir instead; at least that one works, and pathof doesn't.
Generally, it would be nice if benchmarkpkg(module) worked 😄
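
For illustration, the distinction on Julia 1.4+ (MyPackage is a placeholder for an installed package):

using PkgBenchmark
using MyPackage   # placeholder package

pathof(MyPackage)   # path to the package's entry source file, e.g. ".../MyPackage/src/MyPackage.jl"
pkgdir(MyPackage)   # path to the package's root directory

benchmarkpkg(pkgdir(MyPackage))   # benchmarkpkg accepts a package directory, so this works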

PkgBenchmark downgrade

Today PkgBenchmark suddenly downgraded from 0.2.6 to 0.2.1. Explicit add yields

(benchmark) pkg> add PkgBenchmark@0.2.6
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package PkgBenchmark [32113eaa]:
 PkgBenchmark [32113eaa] log:
 ├─possible versions are: [0.2.0-0.2.3, 0.2.5-0.2.6] or uninstalled
 ├─restricted to versions 0.2.6 by an explicit requirement, leaving only versions 0.2.6
 └─restricted by compatibility requirements with BenchmarkTools [6e4b80f9] to versions: 0.2.0-0.2.1 or uninstalled — no versions left
   └─BenchmarkTools [6e4b80f9] log:
     ├─possible versions are: [0.4.0-0.4.3, 0.5.0] or uninstalled
     └─restricted to versions 0.5.0 by an explicit requirement, leaving only versions 0.5.0

If it was already fixed on master, maybe it's worth a new tagged release? I do not want to add the master branch as a dependency in CI or downgrade BenchmarkTools.

Bad ETA in progress bar

I have different seconds = X settings on my @benchmarkable calls, e.g.
@benchmarkable solve_x(1) seconds=10 and @benchmarkable solve_y(2) seconds=2, but this doesn't seem to be taken into account when computing the ETA for the progress bar.
I'm sometimes a bit confused about what is done in this package vs. what is done in BenchmarkTools, so this is just an assumption that this is the correct package for the issue.
I am interested in looking into it myself; I just wanted to put it out here asap.

Use the same benchmark suite when judging

Currently, when you judge a package between two commits, the benchmarks executed could potentially differ, since the file loaded is simply the one present at each commit.
I would argue that a better approach would be to read the benchmark suite at the current commit, store it in a file, and use that same suite for both commits.

Any opinions @shashi?

Make result-table export_markdown readable without having to render markdown

Would it be possible to add a keyword argument, maybe table_format, to export_markdown that would allow modifying how the "Results" table is formatted? Instead of a markdown table, I'd love to have a text-formatted table. So, with table_format = "text", instead of

## Results

...

|ID | time | GC time | memory | allocations |
|----|-----:|--------:|-------:|------------:|
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=4 - super"]` | 21.270 μs (5%) |  | 3.25 KiB (1%) | 17 |
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=4"]` | 777.000 ns (5%) |  |  |  |
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=500 - super"]` | 204.540 ms (5%) | 1.229 ms | 26.70 MiB (1%) | 24 |
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=500"]` | 191.092 ms (5%) |  |  |  |

the output would be

## Results

...

     ID                                                                        time              GC time    memory           allocations

     ["numerical dense matrices", "T=Complex{Float64}, β=-1, N=4 - super"]     21.270 μs (5%)               3.25 KiB (1%)    17
     ["numerical dense matrices", "T=Complex{Float64}, β=-1, N=4"]             777.000 ns (5%)
     ["numerical dense matrices", "T=Complex{Float64}, β=-1, N=500 - super"]   204.540 ms (5%)   1.229 ms   26.70 MiB (1%)   24
     ["numerical dense matrices", "T=Complex{Float64}, β=-1, N=500"]           191.092 ms (5%)

This is still valid markdown, but it would make it much easier to read the resulting output in the terminal.
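
For concreteness, this is the kind of call I have in mind (table_format is the hypothetical keyword being proposed here, not an existing option; "MyPackage" is a placeholder):

using PkgBenchmark

results = benchmarkpkg("MyPackage")
export_markdown("results.md", results; table_format = "text")   # hypothetical keyword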

Actually, probably an even better solution would be to always generate the markdown-table code in such a way that the columns align, something like

## Results

...

|ID                                                                         | time            | GC time  | memory         | allocations |
|---------------------------------------------------------------------------|----------------:|---------:|---------------:|------------:|
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=4 - super"]`   | 21.270 μs (5%)  |          | 3.25 KiB (1%)  | 17          |
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=4"]`           | 777.000 ns (5%) |          |                |             |
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=500 - super"]` | 204.540 ms (5%) | 1.229 ms | 26.70 MiB (1%) | 24          |
| `["numerical dense matrices", "T=Complex{Float64}, β=-1, N=500"]`         | 191.092 ms (5%) |          |                |             |

Tag a version

The currently tagged version doesn't work on Windows, but master does. It would be great if a new version could be tagged.

Increase adoption among packages

Many packages would benefit from using PkgBenchmark.jl to catch performance regressions, and adoption would increase if the barrier for usage was lowered even further.

Wondering if any of the following ideas should be pursued:

  1. Include additional usability features. @tkf put together some helper scripts that have found use by @ericphanson and me in three packages so far. I suspect more packages will follow this trend, and there will be lots of copied boilerplate. It may be best for PkgBenchmark to absorb this functionality.
  2. Add automatic package benchmarking support to the Travis CI Julia script. It would be really nice if users could just create a valid benchmark/benchmarks.jl file and set a flag (similar to the code coverage flags) to enable package benchmarking.
  3. Curate a list of projects that implement package benchmarking effectively. Similar to how the Travis CI page has a section for example projects.
  4. Maintain a simplest possible example project that still behaves like a Julia module. I'm thinking of something like the reference SUITE, but where the functions being tested are wrapped by a module (e.g. mysin, mycos, mytan).
  5. Add a CI category in Discourse, either under Domains or Tooling.

Not working with simple example

Could you please clarify what is accepted as an argument to @bench? This is another package for which the functionality is not working:

using GeoStats
using PkgBenchmark

srand(2017)

dim = 3; nobs = 100
X = rand(dim, nobs); z = rand(nobs)
xₒ = rand(dim)
γ = GaussianVariogram(1., 1., 0.)

@benchgroup "Kriging" ["kriging"] begin
  @bench "SimpleKriging" simkrig = SimpleKriging(X, z, γ, mean(z))
  @bench "OrdinaryKriging" ordkrig = OrdinaryKriging(X, z, γ)
  @bench "UniversalKriging" unikrig = UniversalKriging(X, z, γ, 1)
  for j=1:nobs
    @bench "SimpleKrigingEstimate" estimate(simkrig, X[:,j])
    @bench "OrdinaryKrigingEstimate" estimate(ordkrig, X[:,j])
    @bench "UniversalKrigingEstimate" estimate(unikrig, X[:,j])
  end
end

Benchmark failing with 3D images

I don't know what is happening, but please consider this code:

using ImageQuilting
using GeoStatsImages
using PkgBenchmark

srand(2017)

TI = rand(50, 50, 50)
iqsim(TI, 10, 10, 10, size(TI)...) # warm up

@benchgroup "2D simulation" ["iqsim"] begin
  for TIname in ["Strebelle", "StoneWall"]
    TI = training_image(TIname)
    @bench TIname iqsim(TI, 30, 30, 1, size(TI)..., nreal=3)
  end
end

@benchgroup "3D simulation" ["iqsim"] begin
  TI = training_image("StanfordV")
  @bench "StanfordV" iqsim(TI, 30, 30, 10, size(TI)...)
end

It works if I comment out the @benchgroup "3D simulation" block. What is the reason it is failing? The command is correct and runs perfectly without the @bench macro in front.

You can try it for yourself in the package itself: https://github.com/juliohm/ImageQuilting.jl

Please let me know how I can fix this issue.

Move docstrings inline

With Documenter, the docstrings for the exported functions can be moved inline.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Is judge mixing up improvements and regressions?

I start out with this code:

function foo(n)
    res = 0
    for i=1:n
        res += i
    end
    return res
end

@benchgroup "stupid-test" begin
    @bench "basic" foo(1_000_000)
end

I commit this in my branch and then run benchmarkpkg. Next I change this code to

function foo(n)
    res = 0
    for i=1:n
        res += i^2
    end
    return res
end

@benchgroup "stupid-test" begin
    @bench "basic" foo(1_000_000_000)
end

I then run showall(judge("Packagename", "branchname")). I would expect this to show me a regression, i.e. an increase in time. But instead it shows:

"stupid-test" => 1-element BenchmarkTools.BenchmarkGroup:
        tags: []
        "basic" => TrialJudgement(-10.00% => improvement)

I would have expected exactly the opposite: a positive percentage change, because the runtime has increased, and a judgement that this is a regression. Am I missing something basic?

saving other data

First of all awesome package, it is very helpful!

I am not sure if this is appropriate, but I am benchmarking an optimization tool and I am interested in using your tool to record not only how long it takes but also the results. For my case, the only other results besides time that I would like to grab are two numbers: the error and the objective function value.

Before I started to dig in I was wondering, is this something that I can easily already do?

I have another question but I don't think it deserves its own issue:

  1. How many samples does it take for each run? Is there a way to change this, like in BenchmarkTools?
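
For reference, a sketch of how this is controlled at the BenchmarkTools level (my_solver is a placeholder function):

using BenchmarkTools

# BenchmarkTools collects up to DEFAULT_PARAMETERS.samples samples per
# benchmark (10_000 by default), limited by the time budget in `seconds`.
BenchmarkTools.DEFAULT_PARAMETERS.samples = 100          # change the global default
b = @benchmarkable my_solver() samples=50 seconds=2      # or set it per benchmark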

Thanks!

Add post-processing step?

Opening this after offline discussion of this PR with @jrevels: JuliaCI/BenchmarkTools.jl#129

I'm copy-pasting some of that discussion here:


Here's the context for this:

I'm trying to define a BenchmarkGroup for https://juliaci.github.io/PkgBenchmark.jl. But in my existing benchmark file, before using PkgBenchmark, I'm subtracting the results of two @benchmark runs (one where I do my operation, and one where I don't) in order to get the exact time for just my operation. Is there any way I can reproduce that type of logic with the @benchmarkable structure I need for the BenchmarkGroup?

Here's my existing logic that I'm trying to convert to be a BenchmarkGroup:
https://github.com/JuliaMath/FixedPointDecimals.jl/blob/3e7da851ea9caa0e267c21e0bb067ae32ee9ad77/bench/decimal-representation-comparisons.jl#L96-L99


What we want is to be able to subtract/diff the results of two benchmark runs as part of generating the final BenchmarkResult from PkgBenchmark.jl. In that PR, JuliaCI/BenchmarkTools.jl#129, I suggested a feature to allow adding this subtraction step to the "computation graph" built for the BenchmarkGroup:

        bbase = @benchmarkable $fbase()
        bbench = @benchmarkable $fbench()
        SUITE["bench"] = bbench - bbase

But after talking to @jrevels, we think that this kind of "post-processing" doesn't really belong in the BenchmarkGroup structure. Instead, it's something that you should do after you have your results, no different than a judge step, or taking a ratio would be.

However, currently PkgBenchmark doesn't support any kind of post-processing. It simply runs the BenchmarkGroup and returns the results. Could we consider accepting an additional argument that allows performing arbitrary post-processing on the results?


I'm imagining maybe taking a callback that accepts a BenchmarkResults and returns a new BenchmarkResults. Or maybe it takes a BenchmarkGroup (after it has been run) and returns a new BenchmarkGroup. This callback parameter could be added to benchmarkpkg and judge; see the sketch below.
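
For concreteness, a sketch of what such a callback could look like. A postprocess keyword does appear in the benchmarkpkg signature quoted in the GitLab CI stack trace earlier on this page, but the exact contract is the open question here; "MyPackage" is a placeholder.

using BenchmarkTools, PkgBenchmark

# Hypothetical post-processing callback: reduce every Trial in the group to
# its minimum estimate before the results are stored.
postprocess_min(group::BenchmarkGroup) = minimum(group)

results = benchmarkpkg("MyPackage"; postprocess = postprocess_min)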

What do you think? :) If you think this makes sense, I'm happy to create and send the PR!! :)

judge parameter ref is not optional

The README states that the ref is optional:

judge(pkg, [ref], baseline;
    f=(minimum, minimum),
    usesaved=(true, true),
    script=defaultscript(pkg),
    require=defaultrequire(pkg),
    resultsdir=defaultresultsdir(pkg),
    saveresults=true,
    promptsave=true,
    promptoverwrite=true)

In practice this is not true:

julia> using PkgBenchmark

julia> judge("Package", "branch")
ERROR: MethodError: no method matching judge(::String, ::String)
Closest candidates are:
  judge(::String, ::String, ::String; f, judgekwargs, kwargs...) at /Users/omus/.julia/v0.6/PkgBenchmark/src/judge.jl:26

Remove ProgressMeter.jl-based progress bar?

IIUC, PkgBenchmark.jl has its own _run function to support progress reports via ProgressMeter.jl. In JuliaCI/BenchmarkTools.jl#153 I suggested using a more flexible and open progress-logging API on the BenchmarkTools.jl side. If that PR is accepted, I suppose we can eventually use the plain run function?

My motivation is to run PkgBenchmark with the verbose option passed to run. I think this is nice because:

  • It helps package authors locate time-consuming benchmarks.
  • Some CI services don't like silent jobs; the verbose option is a nice way to work around that.

Revamp the testing strategy

Right now the tests run against the package itself. This is a bit awkward, especially when you want to test what happens when the package is dirty. Instead, it is probably better to set up a completely separate "TestPackage" in a temp folder that mimics the structure of a simple package and then run the tests on that.

Documentation: When is retune required?

Considering that PkgBenchmark seems to automatically cache tuning data, I was quite unsure under what circumstances (if ever) one would need to pass retune=true to the benchmarkpkg function.

It would be good if this was mentioned in the documentation.

Enable appveyor

I'd like to use this from Windows, and it would be good if the package was regularly tested on Windows.

Update CI for 1.0 update

I have not had, and will probably not have, time to update the CI for the 1.0 update in #68. That should probably be done before tagging a release, so help with that would be appreciated.

Use JLD2 over JLD?

https://github.com/simonster/JLD2.jl is written in pure Julia, is significantly faster than JLD, and also supports compression.
We already have an indirect dependency on JLD through BenchmarkTools, so this would add one extra dependency.

I can run some benchmarks, look at timings, and see if this would be a good idea. I am not sure if JLD2 can read files written by JLD; if it can, this should be backwards compatible.

Examples don't work

ERROR: syntax: invalid escape sequence
 in eval(::Module, ::Any) at .\boot.jl:238
 in process_options(::Base.JLOptions) at .\client.jl:245
 in _start() at .\client.jl:332
ERROR: failed process: Process(`'C:\Julia\Julia-0.6-latest\bin\julia' -Cx86-64 '-JC:\Julia\Julia-0.6-latest\lib\julia\sys.dll' --compile=yes --depwarn=yes --color=yes --compilecache=yes -e 'using PkgBenchmark
PkgBenchmark.runbenchmark_local("C:\Users\Mus\.julia\v0.6\Cephes\benchmark\benchmarks.jl", "C:\Users\Mus\AppData\Local\Temp\jl_137F.tmp", "C:\Users\Mus\.julia\v0.6\Cephes\benchmark\.tune.jld", false )
'`, ProcessExited(1)) [1]
 in pipeline_error(::Base.Process) at .\process.jl:682
 in run at .\process.jl:651 [inlined]
 in #benchmark_proc#8(::Bool, ::Function, ::String, ::String, ::String) at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:22
 in (::PkgBenchmark.#kw##benchmark_proc)(::Array{Any,1}, ::PkgBenchmark.#benchmark_proc, ::String, ::String, ::String) at .\<missing>:0
 in #runbenchmark#7 at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:9 [inlined]
 in (::PkgBenchmark.#kw##runbenchmark)(::Array{Any,1}, ::PkgBenchmark.#runbenchmark, ::String, ::String, ::String) at .\<missing>:0
 in withtemp(::PkgBenchmark.##11#15{String,String,Bool}, ::String) at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:36
 in with_reqs(::PkgBenchmark.##10#14{String,String,Bool}, ::String, ::Function) at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\util.jl:12
 in (::PkgBenchmark.#do_benchmark#13{String,String,String,String,Bool,Bool,Bool,String})() at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:66
 in #benchmarkpkg#9(::String, ::String, ::String, ::String, ::Bool, ::Bool, ::Bool, ::Bool, ::PkgBenchmark.#benchmarkpkg, ::String, ::Void) at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:106
 in benchmarkpkg(::String) at C:\Users\Mus\.julia\v0.6\PkgBenchmark\src\runbenchmark.jl:63
ERROR: LoadError: error compiling anonymous: syntax: prefix "$" in non-quoted expression
 in include_from_node1(::String) at .\loading.jl:532
 in include(::String) at .\sysimg.jl:14
while loading C:\Users\Mus\Dropbox (MIT)\code\github\Cephes.jl\benchmark\benchmarks.jl, in expression starting on line 3

Support for `writecsv`

Sometimes it would be nice to be able to plot the results of a benchmark suite, or to do some kind of numerical analysis on them. This would be easy if both BenchmarkResults and BenchmarkJudgement could be written to a CSV file. Thus, there should be an appropriate new method for the built-in writecsv routine.
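
In the meantime, a rough sketch of flattening a result set to CSV by hand, using the benchmarkgroup and leaves accessors ("MyPackage" is a placeholder):

using PkgBenchmark, BenchmarkTools

results = benchmarkpkg("MyPackage")
open("results.csv", "w") do io
    println(io, "id,time_ns,gctime_ns,memory_bytes,allocs")
    for (id, trial) in leaves(benchmarkgroup(results))
        est = minimum(trial)   # summarize each Trial by its minimum estimate
        println(io, join([join(id, "/"), time(est), gctime(est), memory(est), allocs(est)], ","))
    end
end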

Printing results

I am implementing a benchmark infrastructure for my packages (starting with CovarianceMatrices.jl). Am I missing something, or is there no way of nicely printing the output of a judge?
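
One workaround is to render the judgement as markdown yourself; a sketch, assuming judgement is the BenchmarkJudgement returned by judge:

using PkgBenchmark

export_markdown(stdout, judgement)           # print the comparison tables in the REPL
export_markdown("judgement.md", judgement)   # or write them to a file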

Rename `judge` parameters

I find that the to_ref and from_ref parameter names don't tell me much about what is being compared. I think the terminology of baseline and candidate is much clearer in letting the user know that the candidate is the new code being compared against the baseline.

If people like this idea, we should probably do the same in BenchmarkTools.jl.

Tags do not seem to work (as expected by a noob)

here is benchmarks.jl:

@benchgroup "discrete_lyapunov" ["Lyapunov", "Lyapunov2"] begin
    henon = Systems.henon()
    @bench "λ_henon" lyapunov($henon, 10000)
end

@benchgroup "discrete_lyapunov_manyD" ["Lyapunov"] begin
    M = 5;
    ds = Systems.coupledstandardmaps(5)
    @bench "λ_CSM" lyapunov($ds, 10000)
end

and

julia> benchmarkpkg("ChaosTools")                                           
INFO: Running benchmarks...                                                 
Using benchmark tuning data in C:\Users\datseris\.julia\v0.6\.benchmarks\ChaosTools\.tune.jld                                                           
File results of this run? (commit=a173cb, resultsdir=C:\Users\datseris\.julia\v0.6\.benchmarks\ChaosTools\results) (Y/n) Y                              
INFO: Results of the benchmark were written to C:\Users\datseris\.julia\v0.6\.benchmarks\ChaosTools\results\a173cb00e59eb527ae0865a8364b1dddb24ecfb4.jld                                                                            
2-element BenchmarkTools.BenchmarkGroup:                                      
tags: []                                                                    
"discrete_lyapunov_manyD" => 1-element BenchmarkGroup(["Lyapunov"])         
"discrete_lyapunov" => 1-element BenchmarkGroup(["Lyapunov", "Lyapunov2"]) 

I would expect the tags to not be empty? Or is this simply a printing issue?

Claims git repo is not a git repo when specifying target branch

It works fine when no target branch is specified.

> julia -e 'using PkgBenchmark; benchmarkpkg(".")'

PkgBenchmark: Running benchmarks...
Activating environment at `~/projects/Demo/Project.toml`
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
    Status `~/projects/Demo/Project.toml`
  [f88ad9dd] Demo v0.1.0 #master (..)
  [6e4b80f9] BenchmarkTools v0.4.3
  [864edb3b] DataStructures v0.17.0 #compare (https://github.com/milesfrain/DataStructures.jl.git)
  [32113eaa] PkgBenchmark v0.2.5
PkgBenchmark: using benchmark tuning data in /home/miles/Demo/benchmark/tune.json
Benchmarking: 100%|██████████████████████████████████████████████████████████████| Time: 0:00:12
    [1/1]:        "demo"
      [1/1]:      "small table"
        [1/1]:    "tiny query"
          [2/2]:  "new calc"

But it claims the package is not a git repo when a baseline target branch is specified.

> julia -e 'using PkgBenchmark; benchmarkpkg(".", "baseline")'
ERROR: . is not a git repo, cannot benchmark at baseline
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] #benchmarkpkg#20(::Nothing, ::Nothing, ::Nothing, ::Bool, ::NamedTuple{(),Tuple{}}, ::String, ::typeof(benchmarkpkg), ::String, ::String) at /home/miles/.julia/packages/PkgBenchmark/LueAt/src/runbenchmark.jl:118
 [3] benchmarkpkg(::String, ::String) at /home/miles/.julia/packages/PkgBenchmark/LueAt/src/runbenchmark.jl:48
 [4] top-level scope at none:1

Yet it is definitely a git repo, with an existing baseline branch:

> git diff --stat baseline
 README.md | 1 +
 1 file changed, 1 insertion(+)

Option to not tune evals/samples

Could benchmarkpkg have an option to not tune the parameters before the run?

I tend to set them manually when I'm creating the benchmark, i.e.:

@benchmarkable foo(1) evals=300

Any suggestions on a workaround? Could the retune option be expanded to include retune=:never (or something similar)?

Thanks for the package!

Alex

PS: I'd be happy to put together the pull request for this change, but I wanted to touch base first.

error showing BenchmarkConfig

It seems that BenchmarkConfig prints nothing directly?

current = BenchmarkConfig(id=nothing, env = Dict("JULIA_NUM_THREADS"=>4), juliacmd=`julia -O3`)

Printing this gives an error in the REPL.

Allow customization of the julia command and environment variables executed for benchmarking

A ref is now simply a string that gets turned into a commit id using revparseid. There are, however, more ways you might want to benchmark your package than by commit. For example, you might want to specify the julia command being run (julia version, command-line flags like the optimization level) or which environment variables are active (e.g. JULIA_NUM_THREADS).

I therefore suggest that the ref argument is extended from a String to some new type that holds the information above.
This should be possible to do in a backwards-compatible way by defining a String-only constructor.

This does, however, mean that using the commit id for the file name is no longer enough to identify the actual benchmark being run.
Adding a hash based on the new information should be sufficient?
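
For reference, this is roughly the shape such a type ended up taking; a sketch using the BenchmarkConfig seen elsewhere in these issues ("MyPackage" and "mybranch" are placeholders):

using PkgBenchmark

config = BenchmarkConfig(id = "mybranch",
                         juliacmd = `julia -O3`,
                         env = Dict("JULIA_NUM_THREADS" => 4))
benchmarkpkg("MyPackage", config)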

Allow to run certain suites

I don't know if I overlooked this in the docs, but I'd like a feature to run only certain suites when benchmarking a package.
Benchmarking all the suites in a package can take hours (literally), while one is often only interested in the results of a particular suite.
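
One partial workaround today is a sketch using the script keyword that appears in the benchmarkpkg signature quoted elsewhere on this page; "MyPackage" and benchmark/fast_suite.jl are placeholders, the latter being a file that defines a reduced SUITE:

using PkgBenchmark

benchmarkpkg("MyPackage"; script = joinpath("benchmark", "fast_suite.jl"))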

Use Documenter for documentation

The current amount of documentation is perhaps not large enough to warrant using Documenter; however, as more features get added, I think it will become valuable. It will also allow us to move docstrings inline, use doctests, etc.

Consider storing metadata in the results

Right now the stored results are a "naked" BenchmarkGroup. I suggest we instead store our own PkgBenchmarkGroup object that wraps a BenchmarkGroup but where we are free to add arbitrary metadata that can be used for, e.g., exporting the results. This could include things like the date and the hardware used for the benchmark.
