When building and running coremark.exe with gcc on redhat 8.2, sometimes I will get th

Question regarding "Must execute for at least 10 secs for a valid result" about coremark HOT 7 CLOSED

eembc commented on June 23, 2024

Question regarding "Must execute for at least 10 secs for a valid result"

from coremark.

Comments (7)

petertorelli commented on June 23, 2024

If a benchmark does not run for long enough, especially on an operating system (vs. bare metal), noise from the system can interfere with the measurement. Think about it this way: if you try to time a single loop and the OS briefly interrupts the process, the time of one loop could vary by 1-10x. Running more iterations and extending the benchmark runtime amortizes the noise over the measurement period, resulting in more stable measurements (less run-to-run variation). Given the number of instructions run in a single loop of CoreMark, 10 seconds was deemed a sufficient order of magnitude greater than the noise in the majority of deployments. However, since you specified arg4 = 0, it should automatically determine the number of iterations (see lines 242-263 in core_main.c) and pick a number that results in about 10 seconds by increasing the number of iterations 10x each time. Odd that it missed the target by ~214ms. I'm curious as to why that happened: very unusual.

yuxianch commented on June 23, 2024

Thanks a lot for your comment! I tried to build the binary with clang and icc and also tried on another machine. They have the same issue as well.

petertorelli commented on June 23, 2024

If I were you I would verify how the macro GETMYTIME is implemented and that it is using the correct TIMER_RES_DIVIDER. I would also instrument the code in 242-263. What platform are you running on?

yuxianch commented on June 23, 2024

GETMYTIME(_t) is expanded to clock_gettime(0, _t) and TIMER_RES_DIVIDER is set to 1000000. The time-collection part should work correctly.
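One way to cross-check the timer independently of the CoreMark macros is to read the clock directly and convert the difference to seconds by hand. The sketch below is an illustration, not CoreMark code: `elapsed_secs` and `clock_resolution_secs` are hypothetical helper names, and it assumes a Linux/POSIX system where `clock_gettime` and `clock_getres` are available. On Linux, clock id 0 is `CLOCK_REALTIME`, which can be adjusted while a run is in progress, so comparing it against `CLOCK_MONOTONIC` may be informative.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical helper (not part of CoreMark): elapsed seconds between two
 * timespec samples, computed at full nanosecond precision. */
static double elapsed_secs(const struct timespec *start,
                           const struct timespec *stop)
{
    double secs  = (double)(stop->tv_sec - start->tv_sec);
    double nsecs = (double)(stop->tv_nsec - start->tv_nsec);
    return secs + nsecs / 1e9;
}

/* Hypothetical helper: resolution, in seconds, that the system reports for
 * the given clock, e.g. CLOCK_REALTIME (id 0 on Linux) or CLOCK_MONOTONIC. */
static double clock_resolution_secs(clockid_t clock_id)
{
    struct timespec res;
    int rc = clock_getres(clock_id, &res);
    assert(rc == 0);
    (void)rc; /* keeps builds with NDEBUG free of unused-variable warnings */
    return (double)res.tv_sec + (double)res.tv_nsec / 1e9;
}
```

Sampling the clock before and after `iterate(&results[0])` and passing both samples to `elapsed_secs` gives a figure to compare against what `time_in_secs(get_time())` reports. If the two disagree by more than the reported clock resolution, the macro configuration would be worth a closer look.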

I printed the value of secs_passed after line 253 and found that the actual run time is not always proportional to the number of iterations.

coremark/core_main.c

Lines 241 to 263 in 1541482

    /* automatically determine number of iterations if not set */
    if (results[0].iterations == 0)
    {
        secs_ret secs_passed = 0;
        ee_u32   divisor;
        results[0].iterations = 1;
        while (secs_passed < (secs_ret)1)
        {
            results[0].iterations *= 10;
            start_time();
            iterate(&results[0]);
            stop_time();
            secs_passed = time_in_secs(get_time());
        }
        /* now we know it executes for at least 1 sec, set actual run time at
         * about 10 secs */
        divisor = (ee_u32)secs_passed;
        if (divisor == 0) /* some machines cast float to int as 0 since this
                             conversion is not defined by ANSI, but we know at
                             least one second passed */
            divisor = 1;
        results[0].iterations *= 1 + 10 / divisor;
    }

For example, when iterations=10, the run time of iterate(&results[0]) is 0.001s. When iterations=100, the run time is 0.017s, about 17 times the first run time (0.001s). When iterations=1000, the run time is 0.276s, about 16 times the second run time (0.017s). When iterations=10000, the run time is 3.144s, about 11 times the third run time (0.276s). The code then sets iterations to 40000, and we expect the run time to be 3.144s*4=12.576s, which is more than 10 secs. However, there is no guarantee that the last run time of 3.144s is representative, so the run time of 40000 iterations could be less or more than 10 secs.

secs_passed: 0.001000
secs_passed: 0.017000
secs_passed: 0.276000
secs_passed: 3.144000
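The arithmetic behind the 40000 figure can be reproduced in isolation. The following is a minimal sketch of the final scaling step from the quoted snippet, using standard C types in place of CoreMark's `ee_u32` and `secs_ret`. The function names `scale_iterations` and `projected_secs` are made up for illustration, and the projection assumes the per-iteration cost stays the same as in the calibration run, which is exactly the assumption in question here.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the last lines of the quoted calibration code: truncate the
 * measured seconds to an integer divisor, then scale the iteration count. */
static uint32_t scale_iterations(uint32_t iterations, double secs_passed)
{
    assert(secs_passed >= 0.0);
    uint32_t divisor = (uint32_t)secs_passed; /* 3.144 truncates to 3 */
    if (divisor == 0)
        divisor = 1;
    return iterations * (1 + 10 / divisor);   /* integer division: 10 / 3 == 3 */
}

/* Projected run time of the final run, assuming (hypothetically) that each
 * iteration costs the same as it did during the last calibration pass. */
static double projected_secs(uint32_t calib_iters, double calib_secs,
                             uint32_t final_iters)
{
    assert(calib_iters > 0);
    return calib_secs / (double)calib_iters * (double)final_iters;
}
```

With the numbers above, `scale_iterations(10000, 3.144)` gives 40000 and `projected_secs(10000, 3.144, 40000)` gives about 12.576 s. That is roughly 26% above the 10-second requirement, so the final run would have to be noticeably faster per iteration than the calibration pass to fall short.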

I ran on a Xeon(R) CPU E5-2680 v3 with Red Hat 8.2.

petertorelli commented on June 23, 2024

@yuxianch

The first half of what you report makes sense, the second half does not.

First: yes, as the number of iterations increases from ~10 to ~10,000 the IPS will increase. On a Xeon machine running Linux, one loop of CoreMark is well within the OS noise, and much faster than the tolerance of the measurement function. As you increase the number of iterations, the percentage of the time spent measuring the OS and the clock code goes to zero. (Note: You could conceivably measure a single loop of CoreMark on a Xeon, but you would need to turn off interrupts and power management, run at Ring0, warm the cache, and use RDTSC as the timing instruction, which measures core clock ticks. This would be "bare metal".)

Second: As the number of iterations increases, the IPS should become constant. Since this is not happening, it makes me think we need to back up a bit. CoreMark does not run at Ring-0, which means it can be interrupted by the OS. If you are doing something on your machine while the benchmark is running, you will interfere with it and collect invalid measurements. You must make sure every non-essential process is terminated. And don't move any windows in the GUI or interact with the machine in any way.

The IPS between 100k, 150k, 200k iterations should be roughly the same. If the IPS is not stabilizing, your computer is doing something else during the benchmark and interfering with it.
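To check whether the rate is stabilizing, the iterations per second can be computed for each calibration pass and compared. The sketch below is illustrative only: `iterations_per_sec` and `rate_spread` are hypothetical helpers, not CoreMark functions. Applied to the four secs_passed values reported earlier in this thread, it gives roughly 10000, 5882, 3623, and 3181 iterations/s. The first sample is a single tick of an apparently millisecond-granular timer, so it should be read with caution.

```c
#include <assert.h>

/* Hypothetical helper: iterations per second for one timed pass. */
static double iterations_per_sec(unsigned long iterations, double secs)
{
    assert(secs > 0.0);
    return (double)iterations / secs;
}

/* Hypothetical helper: ratio of the fastest to the slowest rate across n
 * passes. A value close to 1.0 would indicate a stable rate. */
static double rate_spread(const unsigned long *iters, const double *secs, int n)
{
    assert(n > 0);
    double lo = iterations_per_sec(iters[0], secs[0]);
    double hi = lo;
    for (int i = 1; i < n; i++) {
        double rate = iterations_per_sec(iters[i], secs[i]);
        if (rate < lo)
            lo = rate;
        if (rate > hi)
            hi = rate;
    }
    return hi / lo;
}
```

For the reported data the spread is about 3.1. Repeating the calculation for longer runs (for example the 100k, 150k, and 200k iteration counts mentioned above) would show whether the rate settles once the run time is well beyond the timer granularity.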

Strange problem. This timing loop is pretty simple and has been in use for 12 years; we have scores from 2 MHz to 3000 MHz on ~500 platforms, so I'm pretty sure this has something to do with your OS activity.

yuxianch commented on June 23, 2024

One thing I am sure of is that when running CoreMark, there is no other heavy process running. I can only see some light processes which occupy 0% CPU and 0% MEM.

petertorelli commented on June 23, 2024

Any update or can we close this?
