

petertorelli commented on June 23, 2024

If a benchmark does not run for a long enough period, especially on an operating system (vs. bare metal), noise from the system can interfere with the measurement. Think about it this way: if you try to time a single loop and the OS briefly interrupts the process, the time of that one loop could vary by 1-10x. Running more iterations and extending the benchmark runtime amortizes the noise over the measurement period, resulting in more stable measurements (less run-to-run variation). Given the number of instructions executed in a single loop of CoreMark, 10 seconds was deemed to be a sufficient order of magnitude longer than the noise in the majority of deployments. However, since you specified arg4 = 0, it should automatically determine the number of iterations (see lines 242-263 in core_main.c): it multiplies the iteration count by 10 until a run takes at least one second, then scales the count up to target roughly 10 seconds. Odd that it missed the target by ~214 ms. I'm curious why that happened; it is very unusual.
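To make the amortization argument concrete, here is a minimal sketch (not CoreMark code) that times `n` back-to-back calls of a workload with a single pair of clock reads and reports the average per call. A one-off disturbance (an interrupt, a context switch, the cost of the clock calls themselves) is added once to the total and therefore divided by `n`. The `workload` function is a hypothetical stand-in for one CoreMark iteration, and the choice of `CLOCK_MONOTONIC` is an assumption of this sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Hypothetical stand-in for one benchmark iteration; not CoreMark's iterate(). */
volatile unsigned long sink;

void workload(void)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000; i++)
        acc += i * i;
    sink = acc; /* volatile store stops the compiler from discarding the loop */
}

/* Seconds between two timespec readings. */
double elapsed_secs(const struct timespec *start, const struct timespec *stop)
{
    return (double)(stop->tv_sec - start->tv_sec)
           + (double)(stop->tv_nsec - start->tv_nsec) / 1e9;
}

/* Time n back-to-back calls of work() with one start/stop pair and return
 * the average seconds per call. A one-off disturbance of d seconds during
 * the run adds only d / n to the result. */
double seconds_per_call(void (*work)(void), unsigned long n)
{
    struct timespec start, stop;
    if (n == 0)
        return 0.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < n; i++)
        work();
    clock_gettime(CLOCK_MONOTONIC, &stop);
    return elapsed_secs(&start, &stop) / (double)n;
}
```

Calling `seconds_per_call(workload, 1)` repeatedly will typically scatter much more than `seconds_per_call(workload, 100000)`, which is the run-to-run variation described above.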

from coremark.

yuxianch commented on June 23, 2024

Thanks a lot for your comment! I tried building the binary with clang and icc, and also tried another machine. All of them show the same issue.


petertorelli commented on June 23, 2024

If I were you, I would verify how the macro GETMYTIME is implemented and that it uses the correct TIMER_RES_DIVIDER. I would also instrument the code in lines 242-263. What platform are you running on?
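A sketch of what that verification involves, assuming a POSIX port where GETMYTIME wraps clock_gettime and TIMER_RES_DIVIDER converts nanoseconds to timer ticks (the configuration reported later in this thread uses 1000000, i.e. 1 ms ticks). The helper names `ticks_between`, `ticks_to_secs` and `clock_resolution_ns` are hypothetical and not copied from CoreMark's port files.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Values as reported in this thread; names are illustrative only. */
#define NSECS_PER_SEC     1000000000LL
#define TIMER_RES_DIVIDER 1000000LL                          /* ns per tick */
#define TICKS_PER_SEC     (NSECS_PER_SEC / TIMER_RES_DIVIDER) /* = 1000 */

/* Elapsed ticks between two clock_gettime() readings. The division
 * truncates, so any interval shorter than one tick (1 ms) reads as 0. */
long long ticks_between(struct timespec start, struct timespec stop)
{
    long long ns = (long long)(stop.tv_sec - start.tv_sec) * NSECS_PER_SEC
                   + (long long)(stop.tv_nsec - start.tv_nsec);
    return ns / TIMER_RES_DIVIDER;
}

/* Convert ticks to seconds, as a time_in_secs()-style helper would. */
double ticks_to_secs(long long ticks)
{
    return (double)ticks / (double)TICKS_PER_SEC;
}

/* Resolution the OS reports for a clock, in ns, or -1 on failure.
 * On Linux, clock id 0 is CLOCK_REALTIME. */
long long clock_resolution_ns(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    return (long long)res.tv_sec * NSECS_PER_SEC + res.tv_nsec;
}
```

With 1 ms ticks, a run measured at 0.001 s sits at the limit of what the timer can resolve. Note also that clock id 0 (CLOCK_REALTIME) tracks wall-clock time and can jump if the system time is set, whereas CLOCK_MONOTONIC does not jump; whether that plays any role here is not established in this thread.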


yuxianch commented on June 23, 2024
  1. GETMYTIME(_t) expands to clock_gettime(0, _t), and TIMER_RES_DIVIDER is set to 1000000, so the time-collection part should work correctly.
  2. I printed the value of secs_passed after line 253 and found that the actual run time is not always proportional to the number of iterations.

    coremark/core_main.c

    Lines 241 to 263 in 1541482

    /* automatically determine number of iterations if not set */
    if (results[0].iterations == 0)
    {
        secs_ret secs_passed = 0;
        ee_u32 divisor;
        results[0].iterations = 1;
        while (secs_passed < (secs_ret)1)
        {
            results[0].iterations *= 10;
            start_time();
            iterate(&results[0]);
            stop_time();
            secs_passed = time_in_secs(get_time());
        }
        /* now we know it executes for at least 1 sec, set actual run time at
         * about 10 secs */
        divisor = (ee_u32)secs_passed;
        if (divisor == 0) /* some machines cast float to int as 0 since this
                             conversion is not defined by ANSI, but we know at
                             least one second passed */
            divisor = 1;
        results[0].iterations *= 1 + 10 / divisor;
    }

    For example, with iterations=10 the run time of iterate(&results[0]) is 0.001 s. With iterations=100 it is 0.017 s, about 17 times the first run time (0.001 s). With iterations=1000 it is 0.276 s, about 16 times the second run time (0.017 s). With iterations=10000 it is 3.144 s, about 11 times the third run time (0.276 s). The code then sets iterations to 40000, so we expect the run time to be 3.144 s * 4 = 12.576 s, which is more than 10 s. However, there is no guarantee that the last run time (3.144 s) is representative, so the run time of 40000 iterations could be either more or less than 10 s.
secs_passed: 0.001000
secs_passed: 0.017000
secs_passed: 0.276000
secs_passed: 3.144000
  3. I ran on an Intel Xeon(R) CPU E5-2680 v3 with Red Hat 8.2.
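The arithmetic in item 2 can be checked by lifting the final scaling step out of the core_main.c excerpt into a pure function. This is a sketch for checking the numbers, not a replacement for the CoreMark code; `scale_iterations` and `projected_secs` are hypothetical names.

```c
#include <stdint.h>

/* Final step of the auto-calibration in the excerpt above, isolated for
 * checking: once a run lasts >= 1 s, the count is multiplied by
 * 1 + 10 / (whole seconds), using C integer division. */
uint32_t scale_iterations(uint32_t iterations, double secs_passed)
{
    uint32_t divisor = (uint32_t)secs_passed; /* e.g. 3.144 -> 3 */
    if (divisor == 0)
        divisor = 1; /* same guard as core_main.c */
    return iterations * (1 + 10 / divisor);
}

/* Expected duration of the final run, assuming run time is exactly
 * proportional to the iteration count (the assumption item 2 questions). */
double projected_secs(uint32_t calib_iters, double calib_secs,
                      uint32_t final_iters)
{
    return calib_secs * (double)final_iters / (double)calib_iters;
}
```

For the reported data, `scale_iterations(10000, 3.144)` gives 40000 and `projected_secs(10000, 3.144, 40000)` gives 12.576 s. Under the linear assumption the projection never falls below 11 s for any secs_passed of at least 1 (the minimum occurs at exactly 1 s), so a final run shorter than 10 s would mean the per-iteration cost during the timed run was lower than during the last calibration run.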


petertorelli commented on June 23, 2024

@yuxianch

The first half of what you report makes sense, the second half does not.

First: yes, as the number of iterations increases from ~10 to ~10,000, the IPS will increase. On a Xeon machine running Linux, one loop of CoreMark is well within the OS noise and much shorter than the resolution of the measurement function. As you increase the number of iterations, the fraction of the measured time spent on OS activity and the clock code goes to zero. (Note: you could conceivably measure a single loop of CoreMark on a Xeon, but you would need to turn off interrupts and power management, run at Ring 0, warm the cache, and use RDTSC as the timing instruction, which reads the processor's time-stamp counter. This would be "bare metal".)
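The first point can be illustrated with a simple cost model, assumed here purely for illustration: each measurement pays a fixed overhead `c` (clock calls, OS activity) plus `t` seconds per iteration, so the reported IPS is N / (N*t + c), which rises toward 1/t as N grows. The function names and the numbers in the note below are made up; they are not CoreMark measurements.

```c
/* Assumed cost model: measured_time(N) = N * t_iter + overhead, where
 * overhead is a fixed per-measurement cost (clock calls, OS activity). */
double modeled_ips(double n_iters, double t_iter, double overhead)
{
    return n_iters / (n_iters * t_iter + overhead);
}

/* Share of the measured time taken by the fixed overhead. */
double overhead_fraction(double n_iters, double t_iter, double overhead)
{
    return overhead / (n_iters * t_iter + overhead);
}
```

With t = 0.1 ms and c = 1 ms, 10 iterations report about 5000 IPS because half of the measured time is overhead, while 10,000 iterations report about 9990 IPS, close to the 1/t = 10000 ceiling. In this model, once the overhead share is negligible the IPS stops changing, which is what the next paragraph expects.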

Second: as the number of iterations increases, the IPS should become constant. Since this is not happening, it makes me think we need to back up a bit. CoreMark does not run at Ring 0, which means the OS can interrupt it. If you are doing something on your machine while the benchmark is running, you will interfere with it and collect invalid measurements. Make sure every non-essential process is terminated, and don't move any windows in the GUI or interact with the machine in any way.

The IPS at 100k, 150k, and 200k iterations should be roughly the same. If the IPS does not stabilize, something else on your computer is running during the benchmark and interfering with it.
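One way to apply this check is to compute IPS for each run and look at the spread around the mean. This is a minimal sketch; the function names and the 5% tolerance are assumptions for illustration, not values that CoreMark prescribes.

```c
#include <stddef.h>

/* Iterations per second for one run. */
double ips(double iterations, double seconds)
{
    return iterations / seconds;
}

/* Largest relative deviation of any sample from the sample mean,
 * e.g. 0.03 means some run differs from the mean by 3%. */
double max_relative_deviation(const double *values, size_t count)
{
    double mean = 0.0, worst = 0.0;
    if (count == 0)
        return 0.0;
    for (size_t i = 0; i < count; i++)
        mean += values[i];
    mean /= (double)count;
    for (size_t i = 0; i < count; i++) {
        double dev = (values[i] - mean) / mean;
        if (dev < 0)
            dev = -dev;
        if (dev > worst)
            worst = dev;
    }
    return worst;
}

/* Hypothetical acceptance rule: IPS counts as "roughly the same" if no
 * run deviates from the mean by more than tolerance (0.05 means 5%). */
int ips_is_stable(const double *values, size_t count, double tolerance)
{
    return max_relative_deviation(values, count) <= tolerance;
}
```

For example, runs of 100k, 150k and 200k iterations taking 31.0 s, 46.5 s and 62.0 s give identical IPS and pass, while IPS values of 3000, 3000 and 3600 deviate from their mean by up to 12.5% and fail a 5% tolerance.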

Strange problem. This timing loop is simple and has been in use for 12 years; we have scores from 2 MHz to 3000 MHz on ~500 platforms, so I'm fairly sure this has something to do with your OS activity.


yuxianch commented on June 23, 2024

One thing I am sure of is that no other heavy process was running alongside CoreMark. I can only see some light processes that occupy 0% CPU and 0% MEM.


petertorelli commented on June 23, 2024

Any update, or can we close this?
