eembc / coremark

CoreMark® is an industry-standard benchmark that measures the performance of central processing units (CPU) and embedded microcontrollers (MCU).

License: Other

Makefile 11.66% C 88.34%

Introduction

CoreMark's primary goals are simplicity and providing a method for testing only a processor's core features. For more information about EEMBC's comprehensive embedded benchmark suites, please see www.eembc.org.

For a more compute-intensive version of CoreMark that uses larger datasets and execution loops taken from common applications, please check out EEMBC's CoreMark-PRO benchmark, also on GitHub.

Building and Running

To build and run the benchmark, type

> make

Full results are available in the files run1.log and run2.log. The CoreMark result can be found in run1.log.

Cross Compiling

For cross compile platforms please adjust core_portme.mak, core_portme.h (and possibly core_portme.c) according to the specific platform used. When porting to a new platform, it is recommended to copy one of the default port folders (e.g. mkdir <platform> && cp linux/* <platform>), adjust the porting files, and run:

% make PORT_DIR=<platform>

Make Targets

  • run - Default target, creates run1.log and run2.log.
  • run1.log - Run the benchmark with performance parameters, and output to run1.log
  • run2.log - Run the benchmark with validation parameters, and output to run2.log
  • run3.log - Run the benchmark with profile generation parameters, and output to run3.log
  • compile - compile the benchmark executable
  • link - link the benchmark executable
  • check - test MD5 of sources that may not be modified
  • clean - clean temporary files

Make flag: ITERATIONS

By default, the benchmark will run for between 10 and 100 seconds. To override, use ITERATIONS=N

% make ITERATIONS=10

This will run the benchmark for 10 iterations. It is recommended to set a specific number of iterations in certain situations, e.g.:

  • Running with a simulator
  • Measuring power/energy
  • Timing cannot be restarted

Minimum required run time: Results are only valid for reporting if the benchmark ran for at least 10 secs!

Make flag: XCFLAGS

To add compiler flags from the command line, use XCFLAGS e.g.:

% make XCFLAGS="-DMULTITHREAD=4 -DUSE_FORK"

Make flag: CORE_DEBUG

Define to compile for a debug run if you get incorrect CRC.

% make XCFLAGS="-DCORE_DEBUG=1"

Make flag: REBUILD

Force a rebuild of the executable.

Systems Without make

The following files need to be compiled:

  • core_list_join.c
  • core_main.c
  • core_matrix.c
  • core_state.c
  • core_util.c
  • PORT_DIR/core_portme.c

For example:

% gcc -O2 -o coremark.exe core_list_join.c core_main.c core_matrix.c core_state.c core_util.c simple/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=1000
% ./coremark.exe > run1.log

The above will compile the benchmark for a performance run and 1000 iterations. Output is redirected to run1.log.

Parallel Execution

Use XCFLAGS=-DMULTITHREAD=N where N is number of threads to run in parallel. Several implementations are available to execute in multiple contexts, or you can implement your own in core_portme.c.

% make XCFLAGS="-DMULTITHREAD=4 -DUSE_PTHREAD -pthread"

The above will compile the benchmark for execution on 4 cores, using POSIX Threads API. Forking is also supported:

% make XCFLAGS="-DMULTITHREAD=4 -DUSE_FORK"

Note: linking the POSIX Threads build may fail if your linker does not automatically add the pthread library. If you encounter undefined reference errors, modify the core_portme.mak file for your platform (e.g. linux/core_portme.mak) and add -pthread to the LFLAGS_END parameter.

Run Parameters for the Benchmark Executable

CoreMark's executable takes several parameters as follows (but only if main() accepts arguments):

  • 1st - A seed value used for initialization of data.
  • 2nd - A seed value used for initialization of data.
  • 3rd - A seed value used for initialization of data.
  • 4th - Number of iterations (0 for auto : default value).
  • 5th - Reserved for internal use.
  • 6th - Reserved for internal use.
  • 7th - For malloc users only, override the size of the input data buffer.

The run target from make will run coremark with 2 different data initialization seeds.

Alternative parameters:

If not using malloc or command line arguments are not supported, the buffer size for the algorithms must be defined via the compiler define TOTAL_DATA_SIZE. TOTAL_DATA_SIZE must be set to 2000 bytes (default) for standard runs. The default for such a target when testing different configurations could be:

% make XCFLAGS="-DTOTAL_DATA_SIZE=6000 -DMAIN_HAS_NOARGC=1"

Submitting Results

CoreMark results can be submitted on the web. Open a web browser and go to the submission page. After registering an account you may enter a score.

Run Rules

What is and is not allowed.

Required

  1. The benchmark needs to run for at least 10 seconds.
  2. All validation must succeed for seeds 0,0,0x66 and 0x3415,0x3415,0x66, buffer size of 2000 bytes total.
    • If not using command line arguments to main:
	% make XCFLAGS="-DPERFORMANCE_RUN=1" REBUILD=1 run1.log
	% make XCFLAGS="-DVALIDATION_RUN=1" REBUILD=1 run2.log
  3. If using profile guided optimization, profile must be generated using seeds of 8,8,8, and buffer size of 1200 bytes total.
    % make XCFLAGS="-DTOTAL_DATA_SIZE=1200 -DPROFILE_RUN=1" REBUILD=1 run3.log
  4. All source files must be compiled with the same flags.
  5. All data type sizes must match size in bits such that:
    • ee_u8 is an unsigned 8-bit datatype.
    • ee_s16 is a signed 16-bit datatype.
    • ee_u16 is an unsigned 16-bit datatype.
    • ee_s32 is a signed 32-bit datatype.
    • ee_u32 is an unsigned 32-bit datatype.

Allowed

  1. Changing number of iterations
  2. Changing toolchain and build/load/run options
  3. Changing method of acquiring a data memory block
  4. Changing the method of acquiring seed values
  5. Changing implementation in core_portme.c
  6. Changing configuration values in core_portme.h
  7. Changing core_portme.mak

NOT ALLOWED

  1. Changing any source file other than core_portme* (use make check to validate)

Reporting rules

Use the following syntax to report results on a data sheet:

CoreMark 1.0 : N / C [/ P] [/ M]

N - Number of iterations per second (with seeds 0,0,0x66, size=2000)

C - Compiler version and flags

P - Parameters such as data and code allocation specifics

  • This parameter may be omitted if all data was allocated on the heap in RAM.
  • This parameter may not be omitted when reporting CoreMark/MHz

M - Type of parallel execution (if used) and number of contexts

  • This parameter may be omitted if parallel execution was not used.

e.g.:

CoreMark 1.0 : 128 / GCC 4.1.2 -O2 -fprofile-use / Heap in TCRAM / FORK:2 

or

CoreMark 1.0 : 1400 / GCC 3.4 -O4 

If reporting scaling results, the results must be reported as follows:

CoreMark/MHz 1.0 : N / C / P [/ M]

P - When reporting scaling results, memory parameter must also indicate memory frequency:core frequency ratio.

  1. If the core has cache and cache frequency to core frequency ratio is configurable, that must also be included.

e.g.:

CoreMark/MHz 1.0 : 1.47 / GCC 4.1.2 -O2 / DDR3(Heap) 30:1 Memory 1:1 Cache
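
For scaling results, the reported figure is the iterations-per-second score divided by the core clock in MHz. A minimal sketch of that arithmetic follows; coremark_per_mhz is a hypothetical helper name, and the clock value is something the reporter supplies, not something the benchmark measures.

```c
/* CoreMark/MHz: iterations per second divided by the core clock in MHz.
   Returns 0.0 for a non-positive clock instead of dividing by zero. */
static double coremark_per_mhz(double iterations_per_sec, double core_mhz)
{
    return core_mhz > 0.0 ? iterations_per_sec / core_mhz : 0.0;
}
```

For instance, a score of 1470 iterations/sec on a core clocked at 1000 MHz (illustrative numbers) gives 1.47 CoreMark/MHz.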

Log File Format

The log files have the following format

2K performance run parameters for coremark.	(Run type)
CoreMark Size    	: 666					(Buffer size)
Total ticks			: 25875					(platform dependent value)
Total time (secs) 	: 25.875000				(actual time in seconds)
Iterations/Sec 		: 3864.734300			(Performance value to report)
Iterations			: 100000				(number of iterations used)
Compiler version	: GCC3.4.4				(Compiler and version)	
Compiler flags		: -O2					(Compiler and linker flags)
Memory location		: Code in flash, data in on chip RAM
seedcrc				: 0xe9f5				(identifier for the input seeds)
[0]crclist			: 0xe714				(validation for list part)
[0]crcmatrix		: 0x1fd7				(validation for matrix part)
[0]crcstate			: 0x8e3a				(validation for state part)
[0]crcfinal			: 0x33ff				(iteration dependent output)
Correct operation validated. See README.md for run and reporting rules.  (*Only when run is successful*)
CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap 						  (*Only on a successful performance run*)
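
The "Total time" and "Iterations/Sec" fields in the sample log follow from the tick and iteration counts. A minimal sketch, assuming a timer of 1000 ticks per second (inferred from the sample, where 25875 ticks correspond to 25.875 seconds; the real value is platform dependent). ticks_to_secs and iterations_per_sec are hypothetical helper names.

```c
/* Elapsed time in seconds from a raw tick count and the timer frequency. */
static double ticks_to_secs(unsigned long ticks, unsigned long ticks_per_sec)
{
    return ticks_per_sec > 0 ? (double)ticks / (double)ticks_per_sec : 0.0;
}

/* Performance value to report: iterations divided by elapsed seconds. */
static double iterations_per_sec(unsigned long iterations, double secs)
{
    return secs > 0.0 ? (double)iterations / secs : 0.0;
}
```

With the sample values, 100000 iterations over 25.875 seconds gives roughly 3864.73 iterations per second, matching the Iterations/Sec line.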

Theory of Operation

This section describes the initial goals of CoreMark and their implementation.

Small and easy to understand

  • X number of source code lines for timed portion of the benchmark.
  • Meaningful names for variables and functions.
  • Comments for each block of code more than 10 lines long.

Portability

A thin abstraction layer will be provided for I/O and timing in a separate file. All I/O and timing of the benchmark will be done through this layer.

Code / data size

  • Compile with gcc on x86 and make sure all sizes are according to requirements.
  • If dynamic memory allocation is used, take total memory allocated into account as well.
  • Avoid recursive functions and keep track of stack usage.
  • Use the same memory block as data site for all algorithms, and initialize the data before each algorithm – while this means that initialization with data happens during the timed portion, it will only happen once during the timed portion and so have negligible effect on the results.

Controlled output

This may be the most difficult goal. Compilers are constantly improving and getting better at analyzing code. To create work that cannot be computed at compile time and must be computed at run time, we will rely on two assumptions:

  • Some system functions (e.g. time, scanf) and parameters cannot be computed at compile time. In most cases, marking a variable volatile means the compiler is forced to read this variable every time it is accessed. This will be used to introduce a factor into the input that cannot be precomputed at compile time. Since the results are input dependent, that will make sure that computation has to happen at run time.

  • Either a system function or I/O (e.g. scanf) or command line parameters or volatile variables will be used before the timed portion to generate data which is not available at compile time. The specific method used is not relevant as long as it can be controlled, and that it cannot be computed or eliminated by the compiler at compile time. E.g. if the clock() function is a compiler stub, it may not be used. The derived values will be reported on the output so that verification can be done on a different machine.

  • We cannot rely on command line parameters since some embedded systems do not have the capability to provide command line parameters. All 3 methods above will be implemented (time based, scanf and command line parameters) and all 3 are valid if the compiler cannot determine the value at compile time.

  • It is important to note that the actual values that are to be supplied at run time will be standardized. The methodology is not intended to provide random data, but simply to provide controlled data that cannot be precomputed at compile time.

  • Printed results must be valid at run time. This will be used to make sure the computation has been executed.

  • Some embedded systems do not provide “printf” or other I/O functionality. All I/O will be done through a thin abstraction interface to allow execution on such systems (e.g. allow output via JTAG).
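
The volatile approach described above can be sketched as follows. This is an illustration under assumptions, not the benchmark's own source: the variable names and the get_seed helper are hypothetical, and the values are the standard performance-run seeds 0,0,0x66.

```c
#include <stdint.h>

/* volatile forces a real load at run time, so the compiler cannot
   constant-fold any computation that depends on these values. */
static volatile int32_t seed1_volatile = 0x0;
static volatile int32_t seed2_volatile = 0x0;
static volatile int32_t seed3_volatile = 0x66;

/* Hypothetical accessor: returns seed 1..3, or 0 for any other index. */
static int32_t get_seed(int index)
{
    switch (index) {
    case 1:  return seed1_volatile;
    case 2:  return seed2_volatile;
    case 3:  return seed3_volatile;
    default: return 0;
    }
}
```

The values are fixed and standardized, yet every result derived from get_seed() has to be computed at run time, which is the property the goals above ask for.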

Key Algorithms

Linked List

The following linked list structure will be used:

typedef struct list_data_s {
	ee_s16 data16;
	ee_s16 idx;
} list_data;

typedef struct list_head_s {
	struct list_head_s *next;
	struct list_data_s *info;
} list_head;

While adding a level of indirection accessing the data, this structure is realistic and used in many embedded applications for small to medium lists.

The list itself will be initialized on a block of memory that will be passed in to the initialization function. While in general linked lists use malloc for new nodes, embedded applications sometime control the memory for small data structures such as arrays and lists directly to avoid the overhead of system calls, so this approach is realistic.

The linked list will be initialized such that 1/4 of the list pointers point to sequential areas in memory, and 3/4 of the list pointers are distributed in a non sequential manner. This is done to emulate a linked list that had add/remove happen for a while disrupting the neat order, and then a series of adds that are likely to come from sequential memory locations.

For the benchmark itself:

  • Multiple find operations are going to be performed. These find operations may result in the whole list being traversed. The result of each find will become part of the output chain.
  • The list will be sorted using merge sort based on the data16 value, and then derive CRC of the data16 item in order for part of the list. The CRC will become part of the output chain.
  • The list will be sorted again using merge sort based on the idx value. This sort will guarantee that the list is returned to the primary state before leaving the function, so that multiple iterations of the function will have the same result. CRC of the data16 for part of the list will again be calculated and become part of the output chain.

The actual data16 in each cell will be pseudo random based on a single 16b input that cannot be determined at compile time. In addition, the part of the list which is used for CRC will also be passed to the function, and determined based on an input that cannot be determined at compile time.

Matrix Multiply

This very simple algorithm forms the basis of many more complex algorithms. The tight inner loop is the focus of many optimizations (compiler as well as hardware based) and is thus relevant for embedded processing.

The total available data space will be divided to 3 parts:

  1. NxN matrix A.
  2. NxN matrix B.
  3. NxN matrix C.

E.g. for 2K we will have 3 12x12 matrices (assuming a 32b data type: 12(len)*12(wid)*4(size)*3(num) = 1728 bytes).

Matrix A will be initialized with small values (upper 3/4 of the bits all zero). Matrix B will be initialized with medium values (upper half of the bits all zero). Matrix C will be used for the result.

For the benchmark itself:

  • Multiply A by a constant into C, add the upper bits of each of the values in the result matrix. The result will become part of the output chain.
  • Multiply A by column X of B into C, add the upper bits of each of the values in the result matrix. The result will become part of the output chain.
  • Multiply A by B into C, add the upper bits of each of the values in the result matrix. The result will become part of the output chain.

The actual values for A and B must be derived based on input that is not available at compile time.
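
The sizing arithmetic and two of the multiplications can be sketched as follows. This is an illustration under assumptions: matrix_dim_for is a hypothetical helper, MATDAT and MATRES are taken as 16-bit and 32-bit integers, the matrices are stored row-major, and the "upper bits" summation is left out because its exact definition is not given here.

```c
#include <stdint.h>

typedef uint32_t ee_u32;
typedef int16_t  MATDAT; /* assumed element type of A and B */
typedef int32_t  MATRES; /* assumed element type of C */

/* Hypothetical helper: largest N such that three NxN matrices with
   elem_size-byte elements fit into total_bytes. */
static ee_u32 matrix_dim_for(ee_u32 total_bytes, ee_u32 elem_size)
{
    ee_u32 n = 0;
    if (elem_size == 0)
        return 0;
    while (3u * (n + 1) * (n + 1) * elem_size <= total_bytes)
        n++;
    return n;
}

/* C = A * val, element by element. */
static void matrix_mul_const(ee_u32 N, MATRES *C, const MATDAT *A, MATDAT val)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            C[i * N + j] = (MATRES)A[i * N + j] * (MATRES)val;
}

/* C = A * B, the classic triple loop. */
static void matrix_mul_matrix(ee_u32 N, MATRES *C, const MATDAT *A, const MATDAT *B)
{
    ee_u32 i, j, k;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            C[i * N + j] = 0;
            for (k = 0; k < N; k++)
                C[i * N + j] += (MATRES)A[i * N + k] * (MATRES)B[k * N + j];
        }
    }
}
```

With a 2000-byte budget and 4-byte elements, matrix_dim_for returns 12, which agrees with the 12x12 (1728 bytes) example above.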

State Machine

This part of the code needs to exercise switch and if statements. As such, we will use a small Moore state machine. In particular, this will be a state machine that identifies string input as numbers and divides them according to format.

The state machine will parse the input string until either a “,” separator or end of input is encountered. An invalid number will cause the state machine to return invalid state and a valid number will cause the state machine to return with type of number format (int/float/scientific).

This code will perform a realistic task, be small enough to easily understand, and exercise the required functionality. The other option used in embedded systems is a Mealy-based state machine, which is driven by a table. The table then determines the number of states and complexity of transitions. This approach, however, tests mainly the load/store and function call mechanisms and less the handling of branches. If analysis of the final results shows that the load/store functionality of the processor is not exercised thoroughly, it may be a good addition to the benchmark (codesize allowing).

For input, the memory block will be initialized with comma separated values of mixed formats, as well as invalid inputs.

For the benchmark itself:

  • Invoke the state machine on all of the input and count final states and state transitions. CRC of all final states and transitions will become part of the output chain.
  • Modify the input at intervals (inject errors) and repeat the state machine operation.
  • Modify the input back to original form.

The actual input must be initialized based on data that cannot be determined at compile time. In addition the intervals for modification of the input and the actual modification must be based on input that cannot be determined at compile time.
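
A switch-driven number classifier of the kind described above might look like the sketch below. It is not the benchmark's state machine: the state names, the num_type enum, and the accepted grammar (optional sign, digits, optional fraction, and an exponent only after a fraction) are simplifying assumptions made for illustration.

```c
typedef enum { NUM_INVALID, NUM_INT, NUM_FLOAT, NUM_SCIENTIFIC } num_type;

typedef enum {
    S_START, S_SIGN, S_INT, S_DOT, S_FRAC,
    S_EXP, S_EXP_SIGN, S_EXP_DIGITS, S_BAD
} num_state;

static int is_digit(char c) { return c >= '0' && c <= '9'; }

/* Parse one token starting at *p, stopping at ',' or end of input.
   On return *p points past the separator (or at the terminating NUL).
   The result depends only on the final state. */
static num_type classify_token(const char **p)
{
    num_state s = S_START;
    const char *c = *p;

    for (; *c != '\0' && *c != ','; c++) {
        switch (s) {
        case S_START:
            if (is_digit(*c)) s = S_INT;
            else if (*c == '+' || *c == '-') s = S_SIGN;
            else if (*c == '.') s = S_DOT;
            else s = S_BAD;
            break;
        case S_SIGN:
            if (is_digit(*c)) s = S_INT;
            else if (*c == '.') s = S_DOT;
            else s = S_BAD;
            break;
        case S_INT:
            if (is_digit(*c)) s = S_INT;
            else if (*c == '.') s = S_FRAC;
            else s = S_BAD;
            break;
        case S_DOT:
            s = is_digit(*c) ? S_FRAC : S_BAD;
            break;
        case S_FRAC:
            if (is_digit(*c)) s = S_FRAC;
            else if (*c == 'e' || *c == 'E') s = S_EXP;
            else s = S_BAD;
            break;
        case S_EXP:
            if (is_digit(*c)) s = S_EXP_DIGITS;
            else if (*c == '+' || *c == '-') s = S_EXP_SIGN;
            else s = S_BAD;
            break;
        case S_EXP_SIGN:
        case S_EXP_DIGITS:
            s = is_digit(*c) ? S_EXP_DIGITS : S_BAD;
            break;
        case S_BAD:
            break; /* stay invalid until the separator */
        }
    }
    *p = (*c == ',') ? c + 1 : c;

    switch (s) {
    case S_INT:        return NUM_INT;
    case S_FRAC:       return NUM_FLOAT;
    case S_EXP_DIGITS: return NUM_SCIENTIFIC;
    default:           return NUM_INVALID;
    }
}

/* Convenience wrapper: classify the first token of a string. */
static num_type classify(const char *s) { return classify_token(&s); }
```

Calling classify_token repeatedly on a comma separated buffer walks it token by token, and counting the returned types gives the kind of per-state tally the benchmark folds into its output chain.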

Validation

This release was tested on the following platforms:

  • x86 cygwin and gcc 3.4 (Quad, dual and single core systems)
  • x86 linux (Ubuntu/Fedora) and gcc (4.2/4.1) (Quad and single core systems)
  • MIPS64 BE linux and gcc 3.4 16 cores system
  • MIPS32 BE linux with CodeSourcery compiler 4.2-177 on Malta/Linux with a 1004K 3-core system
  • PPC simulator with gcc 4.2.2 (No OS)
  • PPC 64b BE linux (yellowdog) with gcc 3.4 and 4.1 (Dual core system)
  • BF533 with VDSP50
  • Renesas R8C/H8 MCU with HEW 4.05
  • NXP LPC1700 armcc v4.0.0.524
  • NEC 78K with IAR v4.61
  • ARM simulator with armcc v4

Memory Analysis

Valgrind 3.4.0 used and no errors reported.

Balance Analysis

Number of instructions executed for each function tested with cachegrind and found balanced with gcc and -O0.

Statistics

Lines:

Lines  Blank  Cmnts  Source     AESL     
=====  =====  =====  =====  ==========  =======================================
  469     66    170    251       627.5  core_list_join.c  (C)
  330     18     54    268       670.0  core_main.c  (C)
  256     32     80    146       365.0  core_matrix.c  (C)
  240     16     51    186       465.0  core_state.c  (C)
  165     11     20    134       335.0  core_util.c  (C)
  150     23     36     98       245.0  coremark.h  (C)
 1610    166    411   1083      2707.5  ----- Benchmark -----  (6 files)
  293     15     74    212       530.0  linux/core_portme.c  (C)
  235     30    104    104       260.0  linux/core_portme.h  (C)
  528     45    178    316       790.0  ----- Porting -----  (2 files)
 
* For comparison, here are the stats for Dhrystone
Lines  Blank  Cmnts  Source     AESL     
=====  =====  =====  =====  ==========  =======================================
  311     15    242     54       135.0  dhry.h  (C)
  789    132    119    553      1382.5  dhry_1.c  (C)
  186     26     68    107       267.5  dhry_2.c  (C)
 1286    173    429    714      1785.0  ----- C -----  (3 files)

Credits

Many thanks to all of the individuals who helped with the development or testing of CoreMark including (Sorted by company name; note that company names may no longer be accurate as this was written in 2009).

  • Alan Anderson, ADI
  • Adhikary Rajiv, ADI
  • Elena Stohr, ARM
  • Ian Rickards, ARM
  • Andrew Pickard, ARM
  • Trent Parker, CAVIUM
  • Shay Gal-On, EEMBC
  • Markus Levy, EEMBC
  • Peter Torelli, EEMBC
  • Ron Olson, IBM
  • Eyal Barzilay, MIPS
  • Jens Eltze, NEC
  • Hirohiko Ono, NEC
  • Ulrich Drees, NEC
  • Frank Roscheda, NEC
  • Rob Cosaro, NXP
  • Shumpei Kawasaki, RENESAS

Legal

Please refer to LICENSE.md in this repository for a description of your rights to use this code.

Copyright

Copyright © 2009 EEMBC All rights reserved. CoreMark is a trademark of EEMBC and EEMBC is a registered trademark of the Embedded Microprocessor Benchmark Consortium.

coremark's People

Contributors

brooksdavis, emaste, heshamelmatary, hosewiejacke, jrtc27, kenta2, konrad-schwarz, lhtin, mortbopet, petertorelli, sapek, scottj97, thoughtpolice, timgates42, volodymyr-bondarchuk, zulupro

coremark's Issues

Can coremark test a multi-core CPU?

I am porting coremark to the Android platform.
The CPU has 8 cores.
The result is:
gts6lwifi:/data/local/tmp $ ./libcoremark
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 16352
Total time (secs): 16.352000
Iterations/Sec : 18346.379648
Iterations : 300000
Compiler version : Android (7211189, based on r416183) Clang 12.0.4 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)
Compiler flags : -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security -fexceptions -O2
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xcc42
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 18346.379648 / Android (7211189, based on r416183) Clang 12.0.4 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee) -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security -fexceptions -O2 / Heap

I want to know whether this result is a single-core score or a multi-core score.

Error checking potentially not computed correctly if CRC and data type fail

Reported:

If the seedcrc doesn't match a known type then total_errors=-1. However, if the call to check_data_types() then fails, we could end up with a count of zero and a pass.

	switch (seedcrc) { /* test known output for common seeds */
		case 0x8a02: /* seed1=0, seed2=0, seed3=0x66, size 2000 per algorithm */
			known_id=0;
			ee_printf("6k performance run parameters for coremark.\n");
			break;
		case 0x7b05: /*  seed1=0x3415, seed2=0x3415, seed3=0x66, size 2000 per algorithm */
			known_id=1;
			ee_printf("6k validation run parameters for coremark.\n");
			break;
		case 0x4eaf: /* seed1=0x8, seed2=0x8, seed3=0x8, size 400 per algorithm */
			known_id=2;
			ee_printf("Profile generation run parameters for coremark.\n");
			break;
		case 0xe9f5: /* seed1=0, seed2=0, seed3=0x66, size 666 per algorithm */
			known_id=3;
			ee_printf("2K performance run parameters for coremark.\n");
			break;
		case 0x18f2: /*  seed1=0x3415, seed2=0x3415, seed3=0x66, size 666 per algorithm */
			known_id=4;
			ee_printf("2K validation run parameters for coremark.\n");
			break;
		default:
			total_errors=-1;
			break;
	}
	if (known_id>=0) {
		for (i=0 ; i<default_num_contexts; i++) {
			results[i].err=0;
			if ((results[i].execs & ID_LIST) && 
				(results[i].crclist!=list_known_crc[known_id])) {
				ee_printf("[%u]ERROR! list crc 0x%04x - should be 0x%04x\n",i,results[i].crclist,list_known_crc[known_id]);
				results[i].err++;
			}
			if ((results[i].execs & ID_MATRIX) &&
				(results[i].crcmatrix!=matrix_known_crc[known_id])) {
				ee_printf("[%u]ERROR! matrix crc 0x%04x - should be 0x%04x\n",i,results[i].crcmatrix,matrix_known_crc[known_id]);
				results[i].err++;
			}
			if ((results[i].execs & ID_STATE) &&
				(results[i].crcstate!=state_known_crc[known_id])) {
				ee_printf("[%u]ERROR! state crc 0x%04x - should be 0x%04x\n",i,results[i].crcstate,state_known_crc[known_id]);
				results[i].err++;
			}
			total_errors+=results[i].err;
		}
	}
	total_errors+=check_data_types();
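
The masking effect can be reproduced in isolation. The sketch below is not the upstream fix: both function names are hypothetical, the types are assumed to map onto <stdint.h>, and the convention that 0 means pass, a positive value means errors, and a negative value means "cannot validate" is inferred from the log messages quoted elsewhere on this page.

```c
#include <stdint.h>

typedef int16_t ee_s16; /* assumed mapping for illustration */
typedef uint8_t ee_u8;

/* Pattern from the excerpt above: -1 for an unknown seedcrc, then the
   data type error count is simply added on top. */
static ee_s16 total_errors_as_reported(int seed_known, ee_s16 crc_errors, ee_u8 type_errors)
{
    ee_s16 total_errors = seed_known ? crc_errors : (ee_s16)-1;
    total_errors = (ee_s16)(total_errors + type_errors);
    return total_errors;
}

/* One possible guard (an assumption): count real errors first, and only
   fall back to -1 when nothing else went wrong. */
static ee_s16 total_errors_guarded(int seed_known, ee_s16 crc_errors, ee_u8 type_errors)
{
    ee_s16 errors = (ee_s16)(type_errors + (seed_known ? crc_errors : 0));
    if (errors > 0)
        return errors;
    return seed_known ? (ee_s16)0 : (ee_s16)-1;
}
```

With an unknown seed and exactly one data type error, the first function returns 0 (an apparent pass) while the guarded variant returns 1.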

Question regarding type `ee_f16`

Hello, I noticed that there is no type definition provided for ee_f16 inside the project (although it is used to define type MATDAT ). core_portme.h, which contains definitions for most of the ee_~ types doesn't contain definition for ee_f16. Are benchmark users expected to provide their own implementation and definition for ee_f16 to run the benchmark with ee_f16 ? Thank you 🤓

coremark/coremark.h

Lines 103 to 110 in 21d473a

#define MATDAT_INT 1
#if MATDAT_INT
typedef ee_s16 MATDAT;
typedef ee_s32 MATRES;
#else
typedef ee_f16 MATDAT;
typedef ee_f32 MATRES;
#endif

Library-function test-driver

The current test driver is based on string (usually stdout and argv) I/O and is not quite convenient to call from more restrictive environments such as iOS. A graphics environment might also want to display the results without too much string manipulation. A more "struct-ured" (pardon my pun) API taking a struct input (multithread, seed) and a struct output (anything printed) might be more adequate and portable. core_results is 90% there already.

undefined reference to pthread_*

When attempting to run all-threads on a Ryzen 7 3700 system:

user@box:~/coremark$ make XCFLAGS="-DMULTITHREAD=16 -DUSE_PTHREAD"
[...]
/usr/bin/ld: /tmp/cc1D8hXe.o: in function `core_start_parallel':
core_portme.c:(.text+0x199): undefined reference to `pthread_create'
/usr/bin/ld: /tmp/cc1D8hXe.o: in function `core_stop_parallel':
core_portme.c:(.text+0x1d0): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status

coremark for Raspberry Pi

Hi

I am trying to run the coremark benchmark on a Raspberry Pi board. I ported coremark to the Raspberry Pi, but when I run the code I get a "Segmentation fault" error. Board details are given below.

execution steps:

cd coremark
make linux
./coremark.exe

Board Details:
Linux pi64 4.19.66-v8-fc5826fb999e-p4-bis+ #2 SMP PREEMPT Fri Aug 16 13:58:31 GMT 2019 aarch64 GNU/Linux

Error :
"Segmentation fault"

I would like to know what the reason could be.
If anyone knows how to resolve the issue, please help.

Regards
Megha

typo in coremark.h

coremark.h

/* ultithread specific */

assume this is intended to mean multithread specific

Building with `-Wall and -Wextra` creates warnings

Building with -Wall and -Wextra creates the following warnings:

cc -O2 -Imacos -Iposix -I. -DFLAGS_STR=\""-O2   "\" -DITERATIONS=0 -Wall -Wextra  core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -o ./coremark.exe 
posix/core_portme.c:209:38: warning: unused parameter 'argc' [-Wunused-parameter]
portable_init(core_portable *p, int *argc, char *argv[])
                                     ^
posix/core_portme.c:209:50: warning: unused parameter 'argv' [-Wunused-parameter]
portable_init(core_portable *p, int *argc, char *argv[])
                                                 ^
lib/coremark/core_list_join.c: In function 'core_bench_list':
lib/coremark/core_list_join.c:445:60: warning: 'info.data16' may be used uninitialized in this function [-Wmaybe-uninitialized]
  445 |         while (list && ((list->info->data16 & 0xff) != info->data16))
      |                                                        ~~~~^~~~~~~~
lib/coremark/core_list_join.c:167:16: note: 'info.data16' was declared here
  167 |     list_data  info;
      |                ^~~~
4 warnings generated.

Tested with: riscv64-unknown-elf-gcc (GCC) 11.1.0

PORT_DIR error

make -j8 all
/bin/sh: 1: [[: not found
/bin/sh: 1: [[: not found
Makefile:45: *** PLEASE define PORT_DIR! (e.g. make PORT_DIR=simple). Stop.

I am using 32-bit Linux, and in the makefile it is defined as PORT_DIR=Linux, but I am still unable to build in Eclipse Neon. Please help.

core_matrix uses suboptimal index type for matrix functions on 64-bit archs

I.e. in

void matrix_add_const(ee_u32 N, MATDAT *A, MATDAT val) {
	ee_u32 i,j;
	for (i=0; i<N; i++) {
		for (j=0; j<N; j++) {
			A[i*N+j] += val;
		}
	}
}

Since unsigned types are allowed to wrap on overflow, this code is not the equivalent of some abstract A[i][j] when pointers are 64-bit (there is no guarantee that i*N + j doesn't wrap). Solution: either use a signed type or size_t.

Note, that it becomes a correctness issue, not a performance one, if N is allowed to be > 2^16.
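
The wrap-around can be shown with a small sketch. The helper names are hypothetical, and uint64_t stands in for a wider index type (the issue suggests a signed type or size_t) so that the example behaves the same on every platform.

```c
#include <stdint.h>

/* Index computed in 32-bit unsigned arithmetic: wraps modulo 2^32. */
static uint32_t index_u32(uint32_t i, uint32_t N, uint32_t j)
{
    return i * N + j;
}

/* Same index computed in 64 bits: no wrap for any 32-bit i, N, j. */
static uint64_t index_u64(uint32_t i, uint32_t N, uint32_t j)
{
    return (uint64_t)i * N + j;
}
```

For N = 65537 and i = 65536 the true offset is 2^32 + 65536, but the 32-bit version yields 65536, so the two computations address different elements. For small N, such as the 12x12 case, both agree.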

How to deploy known_id=2 in MCU's coremark test?

Dear,

I want to test the score using Coremark's test code for an MCU with RAM = 2KB.
How should I deploy it?
How to calculate the coremark score for the test results?
Currently my test results are as follows.

CoreMark Size : 170
Total ticks : 15673
Total time (secs):
Iterations/Sec :
Iterations : 5000
Compiler version : GCC10.3.1
Compiler flags : -Os
Memory location : STACK
seedcrc : 0xd967
[0]crclist : 0x11ee
[0]crcmatrix : 0x66cc
[0]crcstate : 0x5cd9
[0]crcfinal : 0x0a03
Cannot validate operation for these seed values, please compare with results on a known platform.

Thanks for your help in advance.
BR,
Cpantherk

Failure on 32bit Raspbian

On a Raspberry Pi 4B running 32bit Raspbian log1 contains:

ERROR! Please define ee_ptr_int to a type that holds a pointer!
2K performance run parameters for coremark.
ERROR: ee_ptr_int is not a datatype that holds an int pointer!
ERROR: Please modify the datatypes in core_portme.h!
CoreMark Size    : 666 
Total ticks      : 18173
Total time (secs): 18.173000
Iterations/Sec   : 11005.337589
Iterations       : 200000
Compiler version : GCC8.3.0
Compiler flags   : -O2 -DPERFORMANCE_RUN=1  -lrt
Memory location  : Please put data memory location here
                        (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x4983
Errors detected

ITERATIONS variable in core_portme.mak not being obeyed

We can easily demonstrate this bug on linux64. Modify linux64/core_portme.mak to add ITERATIONS=123456:

diff --git a/linux64/core_portme.mak b/linux64/core_portme.mak
index 5cfabee..4c7f3f5 100755
--- a/linux64/core_portme.mak
+++ b/linux64/core_portme.mak
@@ -138,3 +138,4 @@ MKDIR = mkdir -p
 # FLAG: PERL
 # Define perl executable to calculate the geomean if running separate.
 PERL=/usr/bin/perl
+ITERATIONS=123456

Then make to compile and run. The log files show that (on my system) it ran 400000 iterations, not 123456 as requested.

The problem is the dollar parenthesis around ITERATIONS in the main Makefile, line 50. After applying the patch from Pull Request #12 , the testcase works as expected, running 123456 iterations.

Ensure prototypes are valid C

ee_u8 check_data_types();

Unlike in C++, an empty parameter list in C does not form a prototype: the declaration says the function takes an unspecified number of arguments. To avoid warnings, this line should be replaced with

ee_u8 check_data_types(void);

Question regarding "Must execute for at least 10 secs for a valid result"

When building and running coremark.exe with gcc on redhat 8.2, sometimes I will get the error of "ERROR! Must execute for at least 10 secs for a valid result!" since the total run time is less than 10 secs.
What does this message mean? Why does the test need to run for at least 10 secs? Could you please help to explain, thanks a lot!

make command

make CC="gcc" PORT_CFLAGS="-O0 -g"

the output of binary

$ ./coremark.exe  0x0 0x0 0x66 0 7 1 2000
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 14733
Total time (secs): 14.733000
Iterations/Sec   : 4072.490328
Iterations       : 60000
Compiler version : GCC10.2.0
Compiler flags   : -O0 -g    -lrt
Memory location  : Please put data memory location here
                        (e.g. code in flash, data on heap etc)
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xbd59
Correct operation validated. See readme.txt for run and reporting rules.
CoreMark 1.0 : 4072.490328 / GCC10.2.0 -O0 -g    -lrt / Heap

$ ./coremark.exe  0x3415 0x3415 0x66 0 7 1 2000
2K validation run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 9786
Total time (secs): 9.786000
Iterations/Sec   : 4087.471899
ERROR! Must execute for at least 10 secs for a valid result!
Iterations       : 40000
Compiler version : GCC10.2.0
Compiler flags   : -O0 -g    -lrt
Memory location  : Please put data memory location here
                        (e.g. code in flash, data on heap etc)
seedcrc          : 0x18f2
[0]crclist       : 0xe3c1
[0]crcmatrix     : 0x0747
[0]crcstate      : 0x8d84
[0]crcfinal      : 0xbe81
Errors detected

Link failed for barebone porting

Hey guys,
Thanks for this helpful benchmark! It's much easier to read and port than Dhrystone.
Anyway, when I tried to port it to our bare-metal RISC-V core, I found the following issues that may need to be fixed:

size_t is missing

I believe size_t is defined in <stddef.h>, which is Linux only? For bare-metal cores, we need to add a definition for it. My workaround is to add the following line to barebones/core_portme.h

#define size_t long
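As a hedged alternative to the macro above: `<stddef.h>` is one of the headers the C standard requires even from a freestanding (bare-metal) implementation, so it is normally shipped with the compiler rather than with an operating system. The sketch below assumes the toolchain provides it. `port_strlen` is a hypothetical helper, not part of CoreMark.

```c
/* Required of freestanding implementations too, so usually available on
   bare-metal toolchains without any OS C library. */
#include <stddef.h>

/* Compile-time check that size_t is unsigned (a `long` macro would not be):
   the array size becomes -1, a compile error, if the condition is false. */
typedef char size_t_is_unsigned[((size_t)-1 > 0) ? 1 : -1];

/* Hypothetical helper showing size_t in use without any libc call. */
size_t port_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

If the header really is missing, GCC and Clang predefine `__SIZE_TYPE__`, so `typedef __SIZE_TYPE__ size_t;` is a compiler-specific fallback that keeps the type unsigned.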

CLOCKS_PER_SEC is missing

This definition also needs to be added to barebones/core_portme.h.

PORT_OBJS is missing

barebones/core_portme.mak defines PORT_SRCS but not PORT_OBJS, which is also referenced in the top-level Makefile.

HAS_FLOAT macros check.

In coremark/barebones/core_portme.h we have the following configuration switch:

/* Configuration : HAS_FLOAT 
	Define to 1 if the platform supports floating point.
*/
#ifndef HAS_FLOAT 
#define HAS_FLOAT 1
#endif

But the code only checks whether HAS_FLOAT is defined, which is not right because the macro is always defined, just with different values.

We may need to check the macro's value, rather than only whether it is defined, in barebones/ee_printf.c:

#if(HAS_FLOAT==1)
...
#endif
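A small sketch of why the two checks differ, assuming a port that sets HAS_FLOAT to 0. The two function names are hypothetical and exist only to expose which preprocessor branch was compiled.

```c
/* Simulate a port without floating point: the macro is defined, but as 0. */
#define HAS_FLOAT 0

/* Returns 1 if the branch guarded by #ifdef was compiled in. */
int float_code_built_with_ifdef(void)
{
#ifdef HAS_FLOAT
    return 1; /* taken: the macro exists, whatever its value */
#else
    return 0;
#endif
}

/* Returns 1 if the branch guarded by a value test was compiled in. */
int float_code_built_with_if(void)
{
#if (HAS_FLOAT == 1)
    return 1;
#else
    return 0; /* taken: the macro's value is 0 */
#endif
}
```

With HAS_FLOAT defined as 0, the first function returns 1 and the second returns 0, which is the mismatch described above.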

After the workarounds above, I was able to compile and link the barebones port with the RISC-V toolchain (https://github.com/ShawnLess/coremark).

Reliability of the result

Hey,

I have been running CoreMark on a simple 32-bit RISC-V core with a single-issue, in-order pipeline, and the results are quite impressive, as shown below. It seems to outperform most other cores, which does not make much sense given its simple microarchitecture compared with designs like Aria/CV32E40P. Where should I look to check whether the CoreMark port for this architecture is correct?

2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 667898007
Total time (secs): 13
Iterations/Sec   : 230
Iterations       : 3000
Compiler version : riscv-none-embed-gcc (xPack GNU RISC-V Embedded GCC x86_64) 10.2.0
Compiler flags   : -O0 -g -march=rv32i -mabi=ilp32 -Wall -Wno-unused -ffreestanding --specs=nano.specs -DPRINTF_DISABLE_SUPPORT_FLOAT -DPRINTF_DISABLE_SUPPORT_EXPONENTIAL -DPRINTF_DISABLE_SUPPORT_LONG_LONG -DREAL_UART -Wall -Wno-main -DPERFORMANCE_RUN=1  -O0 -g
Memory location  : STACK
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0xcc42
Correct operation validated. See README.md for run and reporting rules.

In core_portme.c I am using this:

CORETIMETYPE cpu_get_cycle(void) {

  union {
    uint64_t uint64;
    uint32_t uint32[sizeof(uint64_t)/sizeof(uint32_t)];
  } cycles;

  register uint32_t tmp1, tmp2, tmp3;
  while(1) {
    tmp1 = read_csr(0xC80);
    tmp2 = read_csr(0xC00);
    tmp3 = read_csr(0xC80);
    if (tmp1 == tmp3) {
      break;
    }
  }

  cycles.uint32[0] = tmp2;
  cycles.uint32[1] = tmp3;

  return cycles.uint64;
}
/* Define : TIMER_RES_DIVIDER
        Divider to trade off timer resolution and total time that can be
   measured.

        Use lower values to increase resolution, but make sure that overflow
   does not occur. If there are issues with the return value overflowing,
   increase this value.
        */
#define CLOCKS_PER_SEC             50000000
#define GETMYTIME(_t)              (*_t = cpu_get_cycle())
#define MYTIMEDIFF(fin, ini)       ((fin) - (ini))
#define TIMER_RES_DIVIDER          1
#define SAMPLE_TIME_IMPLEMENTATION 1
#define EE_TICKS_PER_SEC           (CLOCKS_PER_SEC / TIMER_RES_DIVIDER)
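One sanity check that follows from these definitions is to redo the arithmetic by hand: ticks are converted to seconds through EE_TICKS_PER_SEC, so the score is only as trustworthy as the 50 MHz figure. The sketch below uses the values from the log above. The function names are hypothetical, and integer math is used on the assumption that the port has no floating-point printing.

```c
#include <stdint.h>

/* Values copied from the log and the port header above. */
#define REPORTED_TICKS      667898007ULL
#define REPORTED_ITERATIONS 3000ULL
#define CLOCKS_PER_SEC_HZ   50000000ULL

/* Whole seconds, truncated the way an integer-only port would print them. */
uint64_t ticks_to_secs(uint64_t ticks, uint64_t ticks_per_sec)
{
    return ticks / ticks_per_sec;
}

/* Iterations per second scaled by 1000, keeping three decimals without
   floating point. The product fits in 64 bits for values of this size. */
uint64_t iterations_per_sec_x1000(uint64_t iterations, uint64_t ticks,
                                  uint64_t ticks_per_sec)
{
    return iterations * ticks_per_sec * 1000ULL / ticks;
}
```

With these inputs the helpers give 13 whole seconds and 224585, i.e. about 224.6 iterations per second. The logged 230 matches 3000 divided by the truncated 13 seconds. If the core is not actually clocked at 50 MHz, both figures scale accordingly.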

coremark was impacted by some extra print

I added some prints to the "main" function (after the call to "iterate" and the total_time calculation, of course), and then found that the CoreMark score changed.

Is that reasonable?
I thought it should not affect the score, since I did not change the iterate process at all.

Thanks.

can the loop index variable use `size_t` type instead of ee_u32?

In the source code, for example core_matrix.c, loop index variables always use a 32-bit type (ee_u32). I think they should use the standard size_t instead, because it can reduce displacement on 64-bit machines that do not provide 32-bit registers:

void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}
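For illustration only, here is a sketch of what the proposed variant might look like. The MATDAT and MATRES typedefs are stand-ins assumed for this example, and `matrix_mul_vect_sz` is a hypothetical name. Whether it changes the generated code depends on the compiler and target, and the benchmark sources covered by the MD5 check may not be modified for a reportable run.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in element types for this sketch (16-bit data, 32-bit results). */
typedef int16_t MATDAT;
typedef int32_t MATRES;

/* Same loop nest as above, with size_t for the dimension and both indices
   so that index arithmetic is done at pointer width. */
void matrix_mul_vect_sz(size_t N, MATRES *C, const MATDAT *A, const MATDAT *B)
{
    size_t i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}
```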

ee_u32 accessed with %d format specifier

In core_main.c, the result of time_in_secs is printed using the %d format specifier, which expects an argument of type int.
However, secs_ret is of type ee_u32, which is not int-sized on 16-bit platforms (for which I'm working on an LLVM port :) ).
I got lucky in that my platform is little-endian, so the low-order 16 bits read by %d are the ones I want, and there aren't any arguments after the problematic %d's.

Three ways off the top of my head to fix this:

  1. Cast to int before the format specifier.
  2. Use the C99 format specifier macros for uint32_t.
  3. Create an EEMBC-specific version of the above macro and add it to the port headers.
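A hedged sketch of the second option, plus a variation on the first that casts to unsigned long instead of int (the C standard guarantees unsigned long is at least 32 bits wide, so nothing is truncated). The `ee_u32` typedef is a stand-in, the function names are hypothetical, and standard `snprintf` is used here, so a port's own `ee_printf` may support fewer conversions.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the port header's 32-bit unsigned type. */
typedef uint32_t ee_u32;

/* C99 format macro: PRIu32 expands to the right conversion for uint32_t. */
int format_secs_c99(char *buf, size_t len, ee_u32 secs)
{
    return snprintf(buf, len, "Total time (secs): %" PRIu32, secs);
}

/* Cast variant: unsigned long with %lu holds any 32-bit value, even where
   int is only 16 bits. */
int format_secs_cast(char *buf, size_t len, ee_u32 secs)
{
    return snprintf(buf, len, "Total time (secs): %lu", (unsigned long)secs);
}
```

Both functions produce the same text for a value such as 70000, which would not fit in a 16-bit int.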

Amateur asking for help regarding ARM architectures (THIS IS NOT AN ISSUE)

Hi,

I am a newbie trying to run the CoreMark benchmark on a Raspberry Pi 4. I have never run a benchmark without a GUI, so I am having difficulty using the 'run' command. Can you give a small example of how to use it? I have googled it and did not find any helpful examples. I may be missing something simple, as everyone else seems to manage (there aren't many examples). This is not really an issue, but I could not find where else to ask for help.

Thank You.

CoreMark uses 2 fewer list items than it can

In core_list_init, we use

list_head *core_list_init(ee_u32 blksize, list_head *memblock, ee_s16 seed) {
	/* calculated pointers for the list */
	ee_u32 per_item=16+sizeof(struct list_data_s);
	ee_u32 size=(blksize/per_item)-2; /* to accomodate systems with 64b pointers, and make sure same code is executed, set max list elements */
	list_head *memblock_end=memblock+size;
	list_data *datablock=(list_data *)(memblock_end);
	list_data *datablock_end=datablock+size;

IIUC, per_item is calculated as 16 B for list_head plus sizeof(struct list_data_s), and size = capacity - 2 to leave space for the head and tail. However, because size is also used to compute memblock_end, we can only store capacity - 2 elements including the head and tail. This also means the last 2 loop iterations of the code below silently discard their data, since we would be past memblock_end when inserting:

	/* create a fake items for the list head and tail */
	list->next=NULL;
	list->info=datablock;
	list->info->idx=0x0000;
	list->info->data16=(ee_s16)0x8080;
	memblock++;
	datablock++;
	info.idx=0x7fff;
	info.data16=(ee_s16)0xffff;
	core_list_insert_new(list,&info,&memblock,&datablock,memblock_end,datablock_end);
	
	/* then insert size items */
	for (i=0; i<size; i++) {
		ee_u16 datpat=((ee_u16)(seed^i) & 0xf);
		ee_u16 dat=(datpat<<3) | (i&0x7); /* alternate between algorithms */
		info.data16=(dat<<8) | dat;		/* fill the data with actual data and upper bits with rebuild value */
		core_list_insert_new(list,&info,&memblock,&datablock,memblock_end,datablock_end);
	}

This isn't a functional bug since core_list_insert_new is robust to the two drops and we don't use the pointer returned by that function in list_init.

I don't understand if this is actually intentional. However, I thought I should bring it up just in case my understanding was incorrect.

The use of DMULTITHREAD

Hello:

What is the point of -DMULTITHREAD? I've tried running the benchmark on my system with this flag set to 1, 2, 4 and 16, and I get the same results each time.
However, the more complex underlying question is how CoreMark takes multiple cores into account. There is nothing about it in the README or the EEMBC whitepaper.

Thank you.

How to calculate the coremark score i.e. coremark/MHz ?

How do I calculate the CoreMark score from the logs generated by a CoreMark run? Could you elaborate on that? I am able to run CoreMark on the embedded device successfully, but I cannot work out which values I need in order to calculate the CoreMark/MHz score.

For example, what would the CoreMark score be for the following output, and how do we calculate the CoreMark/MHz value for reporting and inference purposes?
CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap
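As a rough sketch: the number after "CoreMark 1.0 :" is the score in iterations per second, and CoreMark/MHz is that score divided by the core clock frequency in MHz. The clock frequency does not appear in the log, so it has to be supplied separately. Both function names below are hypothetical, and the 1000 MHz figure in the note is an assumed example value, not something taken from the output above.

```c
#include <stdio.h>

/* Extract the score from a summary line such as
   "CoreMark 1.0 : 6508.490622 / GCC3.4.4 -O2 / Heap".
   Returns 1 on success, 0 if the line does not match. */
int parse_coremark_score(const char *line, double *score)
{
    return sscanf(line, "CoreMark 1.0 : %lf", score) == 1;
}

/* CoreMark/MHz = (iterations per second) / (core clock in MHz). */
double coremark_per_mhz(double iterations_per_sec, double clock_mhz)
{
    return iterations_per_sec / clock_mhz;
}
```

For the line above, the parsed score is 6508.490622. With an assumed 1000 MHz clock, that works out to about 6.51 CoreMark/MHz.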
