
Comments (13)

xeno-by commented on July 26, 2024

Here's the data that we have gathered so far by repeatedly running several configurations. Different variations of X/Y/Z give rise to names like HotRscTypecheck_XY_XY_Z:

HotRscTypecheck_310_310_2.run 32.955
HotRscTypecheck_310_310_2.run 33.302
HotRscTypecheck_310_310_2.run 33.318
HotRscTypecheck_310_310_2.run 33.519
HotRscTypecheck_310_310_2.run 33.640
HotRscTypecheck_310_310_2.run 33.749
HotRscTypecheck_310_310_2.run 33.753
HotRscTypecheck_310_310_2.run 33.871
HotRscTypecheck_310_310_2.run 34.083
HotRscTypecheck_310_310_2.run 34.548

HotRscTypecheck_310_310_3.run 32.956
HotRscTypecheck_310_310_3.run 33.032
HotRscTypecheck_310_310_3.run 33.319
HotRscTypecheck_310_310_3.run 33.322
HotRscTypecheck_310_310_3.run 33.517
HotRscTypecheck_310_310_3.run 33.591
HotRscTypecheck_310_310_3.run 33.695
HotRscTypecheck_310_310_3.run 33.792
HotRscTypecheck_310_310_3.run 33.801
HotRscTypecheck_310_310_3.run 34.136

HotRscTypecheck_310_310_5.run 33.252
HotRscTypecheck_310_310_5.run 33.303
HotRscTypecheck_310_310_5.run 33.387
HotRscTypecheck_310_310_5.run 33.511
HotRscTypecheck_310_310_5.run 33.546
HotRscTypecheck_310_310_5.run 33.557
HotRscTypecheck_310_310_5.run 33.573
HotRscTypecheck_310_310_5.run 33.643
HotRscTypecheck_310_310_5.run 33.865
HotRscTypecheck_310_310_5.run 34.262

HotRscTypecheck_510_510_3.run 33.205
HotRscTypecheck_510_510_3.run 33.274
HotRscTypecheck_510_510_3.run 33.282
HotRscTypecheck_510_510_3.run 33.284
HotRscTypecheck_510_510_3.run 33.328
HotRscTypecheck_510_510_3.run 33.420
HotRscTypecheck_510_510_3.run 33.440
HotRscTypecheck_510_510_3.run 33.550
HotRscTypecheck_510_510_3.run 33.720
HotRscTypecheck_510_510_3.run 34.164

Currently, I think that all those configurations are equivalent reliability-wise. There doesn't seem to be any reason to pay 300s for 5/10/3 or 3/10/5 when the 120s of 3/10/2 looks roughly as good as far as run-to-run variance is concerned.
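As a sanity check on that claim, summary statistics for a block of runs like the ones above can be computed with a short script (the numbers below are the 3/10/2 timings from this comment):

```python
import statistics

# Run timings (seconds) copied from the 310_310_2 block above.
runs = [32.955, 33.302, 33.318, 33.519, 33.640,
        33.749, 33.753, 33.871, 34.083, 34.548]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)
spread = (max(runs) - min(runs)) / mean * 100  # max-min spread as % of mean

print(f"mean={mean:.3f}s stdev={stdev:.3f}s spread={spread:.1f}%")
# → mean=33.674s stdev=0.446s spread=4.7%
```

The same calculation over the other blocks makes the "roughly as good" comparison concrete instead of eyeballed.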

This afternoon, we'll be experimenting with 510_510_5, 1010_1010_5 and 1020_1020_3.

from rsc.

xeno-by commented on July 26, 2024

Here's another batch of results that I forgot to include in the grepping session. It's not as detailed as the results above. 20/10/3 seems to be no better than 3/10/2, but the jury is still out on the others.

HotRscTypecheck_1010_1010_6.run 33.170
HotRscTypecheck_1010_1010_6.run 33.313
HotRscTypecheck_1010_1010_6.run 33.338
HotRscTypecheck_1010_1010_6.run 33.469
HotRscTypecheck_1010_1010_6.run 33.517
HotRscTypecheck_1010_1010_6.run 33.562

HotRscTypecheck_1020_1020_3.run 33.249
HotRscTypecheck_1020_1020_3.run 33.477
HotRscTypecheck_1020_1020_3.run 33.527
HotRscTypecheck_1020_1020_3.run 33.612
HotRscTypecheck_1020_1020_3.run 33.621
HotRscTypecheck_1020_1020_3.run 33.780
HotRscTypecheck_1020_1020_3.run 33.714
HotRscTypecheck_1020_1020_3.run 33.565
HotRscTypecheck_1020_1020_3.run 33.767
HotRscTypecheck_1020_1020_3.run 33.357

HotRscTypecheck_2010_2010_3.run 33.247
HotRscTypecheck_2010_2010_3.run 33.357
HotRscTypecheck_2010_2010_3.run 33.380
HotRscTypecheck_2010_2010_3.run 33.593
HotRscTypecheck_2010_2010_3.run 34.113
HotRscTypecheck_2010_2010_3.run 34.123

HotRscTypecheck_2020_2020_6.run 33.572
HotRscTypecheck_2020_2020_6.run 33.605
HotRscTypecheck_2020_2020_6.run 33.776


xeno-by commented on July 26, 2024

/cc @adriaanm @lrytz @SethTisue @szeiger @retronym


xeno-by commented on July 26, 2024

Also /cc @liufengyun


andreaTP commented on July 26, 2024

Sub-millisecond (IMO) means that you start to exercise the L3 cache, which does not look predominant (so far) in such cases...


xeno-by commented on July 26, 2024

Some early results from the current run:

HotRscTypecheck_510_510_5.run 33.183
HotRscTypecheck_510_510_5.run 33.112
HotRscTypecheck_510_510_5.run 33.591
HotRscTypecheck_510_510_5.run 33.718
HotRscTypecheck_510_510_5.run 33.438
HotRscTypecheck_510_510_5.run 33.444
HotRscTypecheck_510_510_5.run 33.428
HotRscTypecheck_510_510_5.run 33.893
HotRscTypecheck_510_510_5.run 33.665
HotRscTypecheck_510_510_5.run 33.294
HotRscTypecheck_510_510_5.run 33.594
HotRscTypecheck_510_510_5.run 33.549
HotRscTypecheck_510_510_5.run 33.830
HotRscTypecheck_510_510_5.run 33.458
HotRscTypecheck_510_510_5.run 33.416
HotRscTypecheck_510_510_5.run 33.738
HotRscTypecheck_510_510_5.run 33.309
HotRscTypecheck_510_510_5.run 33.690
HotRscTypecheck_510_510_5.run 33.524
HotRscTypecheck_510_510_5.run 33.418
HotRscTypecheck_510_510_5.run 33.455
HotRscTypecheck_510_510_5.run 33.615
HotRscTypecheck_510_510_5.run 33.865

HotRscTypecheck_1010_1010_5.run 33.603
HotRscTypecheck_1010_1010_5.run 33.292
HotRscTypecheck_1010_1010_5.run 33.612
HotRscTypecheck_1010_1010_5.run 33.338
HotRscTypecheck_1010_1010_5.run 33.581
HotRscTypecheck_1010_1010_5.run 33.277
HotRscTypecheck_1010_1010_5.run 33.584
HotRscTypecheck_1010_1010_5.run 33.529
HotRscTypecheck_1010_1010_5.run 33.809
HotRscTypecheck_1010_1010_5.run 33.526


xeno-by commented on July 26, 2024

@andreaTP Can you elaborate? I'm not sure I fully understand what you mean.


andreaTP commented on July 26, 2024

Sure. We are in any case working on a managed runtime, which means that you don't have full control over a number of variables. Sub-millisecond checks are useful (in my very personal experience) when you start looking at your architecture's performance (i.e. your processor) running on the JVM. I think someone more knowledgeable than me can chip in on this discussion @mjpt777 (just trying to summon him :-))


xeno-by commented on July 26, 2024

I'm getting really worked up about this, because our best benchmark for Rsc currently runs in slightly less than 25ms. A >1ms run-to-run variance is a 4% potential error.

As it currently stands, having to make yay/nay judgements using such imprecise information makes me quite nervous. If a 4% error stacks 10 times (roughly the number of "Benchmark XXX optimization" issues that we have in flight), it becomes a 1.5x difference in performance, and that is huge.
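For concreteness, the compounding arithmetic behind that 1.5x figure, assuming the worst case where the 4% error biases every decision in the same direction:

```python
# Worst case: a 4% measurement error accumulates in the same
# direction across ten independent yay/nay decisions.
per_decision_error = 1.04
decisions = 10
compounded = per_decision_error ** decisions
print(f"compounded error: {compounded:.2f}x")  # → compounded error: 1.48x
```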


retronym commented on July 26, 2024

It's a hard problem...

You can get some insight into VM based sources of jitter with:

> jmh:run Bench -f1 -wi 0 -i100 -prof hs_comp -prof hs_gc

Watch how long it takes for compiler.totalCompiles to reach a steady state. You can also look for patterns in the GC stats. I find it useful to oversize the heap so that full GCs are very rare.
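A sketch of what that heap advice might look like with sbt-jmh, passing extra JVM arguments through JMH's -jvmArgs option (the 8g size is a placeholder; pick one comfortably larger than the benchmark's live set, and pin -Xms to -Xmx so the heap never resizes mid-run):

```
> jmh:run Bench -f1 -wi 10 -i100 -jvmArgs "-Xms8g -Xmx8g"
```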

You could try to minimize OS jitter with our benv script.

I've found that even without super precise benchmarks, having continuous benchmark graphs around can help ensure the trend goes the right way.


mjpt777 commented on July 26, 2024

Run-to-run variance can be much greater than 4% on a general-purpose operating system and managed platform. Contributing factors include CPU clock scaling, scheduling, resource starvation, OS configuration, JVM configuration, JIT compiler races, etc. JVMs like Azul Zing can help, but you need to build a significant knowledge base to do this well. On a three-day course I just about manage to cover the major topics people need to start becoming aware of.

However progress has to start somewhere with experimenting and being curious at the core. Have fun experimenting and learning but be very careful reading too much into your discoveries as the reasons behind some results can be very surprising and often not obvious.


liufengyun commented on July 26, 2024

For development purposes, when there's a regression benchmark infrastructure, developers usually don't care much about < 1ms. If a performance change is invisible in the graph, developers will just ignore it, as they cannot do any optimisation based on such changes.

Instead, developers usually care about:

  1. visible big changes relative to previous points
  2. the trend of the curve
  3. no big intra-point variance visible in the graph: via min and avg curves
  4. no inter-point variance visible in the graph that cannot be explained by the code change in the PR

In Dotty, we have taken the following measures to address the 3rd and 4th concerns, in addition to stabilizing the machine:

  • have a test, "emptyFile", which is supposed to be always/mostly flat
  • show both min and avg curves
  • allow developers to issue a test command to get several points for the same PR
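The min-and-avg measure above can be sketched as a small aggregation step over per-point run timings (the benchmark names and numbers are illustrative, reusing figures from earlier in this thread):

```python
from statistics import mean

# One entry per (benchmark, commit) point; values are run timings in seconds.
point_runs = {
    "HotRscTypecheck": [33.247, 33.357, 33.380, 33.593, 34.113, 34.123],
    "emptyFile": [0.101, 0.099, 0.100, 0.102, 0.100],  # should stay flat
}

for bench, runs in point_runs.items():
    # Plot both series per point: avg shows the trend, min approximates
    # the noise floor, and a widening gap flags intra-point variance.
    print(f"{bench}: min={min(runs):.3f} avg={mean(runs):.3f}")
```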

We still experience visible intra-point variance for some tests and sometimes get inexplicable inter-point changes. While we are still fighting the variance, the infrastructure has been useful in confirming performance improvements and catching big regressions.


xeno-by commented on July 26, 2024

Thank you, everyone, for the valuable comments! Judging from your feedback, I think that my paranoia about run-to-run variance went a bit over the top.

We'll be settling down on 3/10/2 for quick benches and 10/10/5 for CI, accepting the current variance for the time being and relying on continuous benchmark graphs to help us detect meaningful trends in compilation performance.

Moreover, as we'll be implementing more and more features from full Scala, I expect that we'll be taking on more involved Scala projects. This will make millisecond-precision hair splitting moot, since most benchmarks will likely no longer run in double-digit millisecond timeframes.
