mpi-benchmarks's Introduction

Intel(R) MPI Benchmarks

3-Clause BSD License v2021.7


Contents

  • Introduction
  • Product Directories
  • What's New
  • Command-Line Control
  • Building Instructions for Linux* OS
  • Building Instructions for Windows* OS
  • Copyright and License Information
  • Notices & Disclaimers

Introduction

Intel(R) MPI Benchmarks provides a set of elementary benchmarks that conform to the MPI-1, MPI-2, and MPI-3 standards. You can run all of the supported benchmarks, or a subset specified on the command line, using a single executable file. Use command-line parameters to specify various settings, such as time measurement, message lengths, and selection of communicators. For details, see the Intel(R) MPI Benchmarks User's Guide located at: https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/overview.html

By default, Intel(R) MPI Benchmarks is installed at:

  • C:\Program Files (x86)\IntelSWTools\imb on Windows* OS
  • /opt/intel/imb on Linux* OS

Before using the Intel(R) MPI Benchmarks, please read the license agreements located in the imb/license directory.


Product Directories

After a successful installation of Intel(R) MPI Benchmarks, the following files and folders appear on your system:

+-- \imb            Intel(R) MPI Benchmarks product directory
     |
     +-- \src_c             Product source "C" code and Makefiles.
     |
     +-- \license           Product license files.
     |    |              
     |    +--license.txt    Source code license granted to you.
     |    |                             
     |    +--use-of-trademark-license.txt    License file describing the 
     |                                       use of the Intel(R) MPI 
     |                                       Benchmarks name and trademark.
     |
     +-- \src_cpp              Product source "CPP" code and Makefiles. 
     |
     +-- \WINDOWS              Microsoft* Visual Studio* project files. 
     |
     +-- Readme_IMB.txt        Readme file providing the basic information
                               about the product (this file).

What's New

New in Intel(R) MPI Benchmarks 2021.7

  • IMB-MPI1-GPU benchmarks. The CUDA or Level Zero library is loaded dynamically from LD_LIBRARY_PATH.

New in Intel(R) MPI Benchmarks 2021.6

  • Bug fixes.

New in Intel(R) MPI Benchmarks 2021.5

  • License update
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2021.4

  • Bug fixes.

New in Intel(R) MPI Benchmarks 2021.3

  • Changed the default value of mem_alloc_type to device
  • License update
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2021.2

  • New IMB-MPI1-GPU benchmarks (Technical Preview). The benchmarks implement the GPU version of IMB-MPI1.
  • Added the -msg_pause option.
  • Changed the default window_size from 64 to 256.
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2021.1

  • Added -window_size option for IMB-MPI1
  • Added copyrights for *.exe
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2019 Update 6

  • New IMB-P2P Stencil2D and Stencil3D benchmarks.
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2019 Update 5

  • Added Visual Studio projects for IMB-P2P
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2019 Update 4

  • Bug fixes.

New in Intel(R) MPI Benchmarks 2019 Update 3

  • Added the warm_up option that enables additional warm-up cycles before running the benchmark (for all sizes).
  • Added a new benchmark Reduce_local for IMB-MPI1.
  • Bug fixes.

New in Intel(R) MPI Benchmarks 2019 Update 2

  • New IMB-P2P benchmarks.
  • Added the Reduce_local benchmark for IMB-MPI1.
  • Deleted the alignment option (-alignment).
  • Bug fixes.
  • Code cleanup.

New in Intel(R) MPI Benchmarks 2019 Update 1

  • Added the Reduce_scatter_block benchmark for IMB-MPI1.
  • Added the aggregate_mode option that specifies the mode for IMB-IO, IMB-EXT and IMB-RMA.
  • Added the alignment option that controls buffer alignment.
  • Updated the following options:
    • -data_type now supports double.
    • -red_data_type now supports double.

New in Intel(R) MPI Benchmarks 2019

  • New IMB-MT benchmarks. The benchmarks implement the multi-threaded version of IMB-MPI1 benchmarks using the OpenMP* paradigm.
  • New benchmarks infrastructure for easier benchmarks extension is implemented in C++ (See the guide: https://www.intel.com/content/www/us/en/developer/articles/technical/creating-custom-benchmarks-for-imb-2019.html?wapkw=creating-custom-benchmarks-for-imb-2019). The IMB-MPI1, IMB-RMA, IMB-NBC, IMB-EXT, IMB-IO, and IMB-MT implementation is now based on the new C++ infrastructure. The legacy infrastructure is preserved in the src_c subdirectory.
  • Syntax changes for the -include and -exclude options. Benchmarks to include and exclude now must be separated by a comma rather than a space. Benchmarks to launch can be separated by a comma or a space.
  • Iteration policy can no longer be set with the -iter option. Use the -iter_policy instead.
  • Added a new benchmark BarrierMT for IMB-MT.
  • Added new options:
    • -noheader for IMB-MT disables printing of benchmark headers.
    • -data_type for IMB-MPI1 specifies the type to be used for communication.
    • -red_data_type for IMB-MPI1 specifies the type to be used for reduction.
    • -contig_type for IMB-MPI1 specifies the type to be used.
    • -zero_size for IMB-MPI1 disables runs with message size 0.
  • Bug fixes.
  • Code cleanup.

New in Intel(R) MPI Benchmarks 2018 Update 1

  • Support for the Microsoft* Visual Studio* 2017. Microsoft* Visual Studio* 2012 support is removed.

New in Intel(R) MPI Benchmarks 2018

New in Intel(R) MPI Benchmarks 2017 Update 1

  • Added a new option -imb_barrier.
  • The PingPong and PingPing benchmarks are now equivalent to PingPongSpecificSource and PingPingSpecificSource, respectively. Their old behavior (with MPI_ANY_SOURCE) is available in PingPongAnySource and PingPingAnySource.

New in Intel(R) MPI Benchmarks 2017

  • Changed default values for the -sync and -root_shift options.
  • Support for the Microsoft* Visual Studio* 2015. Microsoft* Visual Studio* 2010 support is removed.
  • Bug fixes.

New in Intel(R) MPI Benchmarks 4.1 Update 1

  • Bug fixes.

New in Intel(R) MPI Benchmarks 4.1

  • Introduced two new benchmarks: uniband and biband.
  • Introduced two new command-line options for collective benchmarks: -sync and -root_shift.

New in Intel(R) MPI Benchmarks 4.0 Update 2

  • Fix of a bug where benchmarking was failing on certain message lengths with -DCHECK.

New in Intel(R) MPI Benchmarks 4.0 Update 1

  • Fix of a bug where benchmarking could continue after the time limit is exceeded.

New in Intel(R) MPI Benchmarks 4.0

  • Introduced new components IMB-NBC and IMB-RMA that conform to the MPI-3.0 standard. Note: These components can only be built and used with MPI libraries that conform to the MPI-3 standard.
  • Added new targets to the Linux* OS Makefiles:
    • NBC for building IMB-NBC
    • RMA for building IMB-RMA
  • Updated Microsoft* Visual Studio* solutions to include the IMB-NBC and IMB-RMA targets.
  • Consolidated all first-use documents in ReadMe_IMB.txt to improve usability.
  • Introduced a new feature to set the appropriate algorithm for automatic calculation of iterations. The algorithm can be set through the -iter and -iter_policy options.
  • Support for the Microsoft* Visual Studio* 2013. Microsoft* Visual Studio* 2008 support is removed.

Command-Line Control

You can get help on the Intel(R) MPI Benchmarks from the command line using the component name and the -help parameter. For example, for the IMB-MPI1 component, run: IMB-MPI1 -help

See the Intel(R) MPI Benchmarks User's Guide for details on the command-line parameters.
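
As an illustration (a hypothetical invocation, not taken from this README), the following run restricts IMB-MPI1 to two benchmarks and a bounded message-size range using documented options:

    mpirun -n 4 IMB-MPI1 PingPong Allreduce -msglog 3:20 -iter 1000

Here -msglog 3:20 limits the power-of-two message sizes to roughly the range 2^3 through 2^20 bytes, and -iter 1000 caps the number of repetitions; see the User's Guide for the authoritative option semantics.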


Building Instructions for Linux* OS

  1. Set the CC variable to point to the appropriate compiler wrapper, mpiicc or mpicc.

  2. Run one or more Makefile commands below:

    make clean      - remove legacy binary object files and executable files
    make IMB-MPI1   - build the executable file for the IMB-MPI1 component
    make IMB-EXT    - build the executable file for one-sided communications benchmarks
    make IMB-IO     - build the executable file for I/O benchmarks
    make IMB-NBC    - build the executable file for IMB-NBC benchmarks
    make IMB-RMA    - build the executable file for IMB-RMA benchmarks
    make all        - build all executable files available

  3. Run the benchmarks as follows:

    mpirun -n <number_of_processes> IMB-<component> [arguments]

    where <component> is one of the make targets above. For details, refer to the Intel(R) MPI Benchmarks User's Guide at: https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/overview.html
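
    For example, a complete build-and-run session might look like the following (an illustrative sketch only; compiler wrappers, process count, and benchmark selection are placeholders, and CXX is set as well because the current infrastructure is built from C++ sources):

        export CC=mpiicc
        export CXX=mpiicpc
        make IMB-MPI1
        mpirun -n 4 ./IMB-MPI1 PingPong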


Building Instructions for Windows* OS

Use the enclosed solution files located in the component-specific subdirectories under the imb/WINDOWS directory. Click on the respective ".vcproj" or ".vcxproj" project file and use the Microsoft* Visual Studio* menu to run the associated benchmark application.

Building "x64" Executable Files

  1. Check that the Include, Lib, and Path environment variables are set as follows:

       %I_MPI_ROOT%\intel64\include
       %I_MPI_ROOT%\intel64\lib
       %I_MPI_ROOT%\mpi\intel64\bin

     The %I_MPI_ROOT% environment variable is set to the Intel(R) MPI Library installation directory.

  2. Open the ".vcproj" or ".vcxproj" file for the component you would like to build. From the Visual Studio Project panel:

     a) Change the "Solution Platforms" dialog box to "x64".
     b) Change the "Solution Configurations" dialog box to "Release".
     c) Check other settings as required, for example:

        General > Project Defaults
          - Set "Character Set" to "Use Multi-Byte Character Set"
        C/C++ > General
          - Set "Additional Include Directories" to "$(I_MPI_ROOT)\intel64\include"
          - Set "Warning Level" to "Level 1 (/W1)"
        C/C++ > Preprocessor
          - For the "Preprocessor definitions" within the Visual Studio projects, add the
            conditional compilation macros WIN_IMB and _CRT_SECURE_NO_DEPRECATE. Depending on
            the components you intend to use, add one or more of the following macros:
            MPI1, EXT, MPIIO, NBC, RMA (see the example after these steps).
        Linker > Input
          - Set "Additional Dependencies" to "$(I_MPI_ROOT)\intel64\lib\impi.lib". Make sure to add quotes.

  3. Use F7 or Build > Build Solution to create an executable.

    For details, refer to the Intel(R) MPI Benchmarks User's Guide at: https://www.intel.com/content/www/us/en/docs/mpi-library/user-guide-benchmarks/2021-2/overview.html
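
    As an example of the preprocessor step above, for the IMB-MPI1 project the "Preprocessor definitions" field would typically end up containing a semicolon-separated list along the lines of WIN_IMB;_CRT_SECURE_NO_DEPRECATE;MPI1 (an illustration of combining the macros listed above, not a string copied from the project files).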


Copyright and License Information

See the license files in the imb/license directory.


Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.


Security Policy

See SECURITY for more information.

mpi-benchmarks's People

Contributors

alexsoll, arthinag, atimnov, avmedvedev, ddurnov, dsolovyev, irozanova, juliars, marat-shamshetdinov, mkitez, mshiryaev, nikitaxgusev, oreunova, osudakov, rdower, sergeygubanov, vadimkutovoi, vinnitskiv

mpi-benchmarks's Issues

Compiling IMB with -DCHECK gives error

I want to compile IMB with MVAPICH2-2.1 with the -DCHECK option enabled.
I have two issues here.

  1. After exporting CPPFLAGS = -DCHECK and running make, it gives the following error.

make -j8 -C src_cpp -f Makefile TARGET=MPI1
make[1]: Entering directory '/home/jyoti/Desktop/mpi-benchmarks-master/src_cpp'
mpiicpc -DCHECK -Ihelpers -I../src_c -DMPI1 -I. -O0 -Wall -Wextra -pedantic -Wno-long-long -c -o imb.o imb.cpp
mpiicpc -DCHECK -Ihelpers -I../src_c -DMPI1 -I. -O0 -Wall -Wextra -pedantic -Wno-long-long -c -o args_parser.o args_parser.cpp
make[1]: mpiicpc: Command not found
/bin/sh: 1: [: unexpected operator
make[1]: mpiicpc: Command not found
Makefile:181: recipe for target 'imb.o' failed
make[1]: *** [imb.o] Error 127
make[1]: *** Waiting for unfinished jobs....
Makefile:181: recipe for target 'args_parser.o' failed
make[1]: *** [args_parser.o] Error 127
/bin/sh: 1: [: unexpected operator
make[1]: Leaving directory '/home/jyoti/Desktop/mpi-benchmarks-master/src_cpp'
Makefile:53: recipe for target 'IMB-MPI1' failed
make: *** [IMB-MPI1] Error 2

mpiicpc belongs to Intel MPI, but I want to compile with MVAPICH2-2.1.
How can I enable -DCHECK?

  2. I want to use the files in src_c only.
    When I export CFLAGS = -DCHECK and run make in the src_c folder, I get the following error:

make -f Makefile TARGET=MPI1
make[1]: Entering directory '/home/jyoti/Desktop/mpi-benchmarks-master/src_c'
mkdir -p build_MPI1
mpicc -DCHECK -DMPI1 -c IMB.c -o build_MPI1/IMB.o
IMB.c: In function ‘main’:
IMB.c:391:8: error: expected ‘(’ before ‘num_alloc’
if num_alloc == num_free)
^
IMB.c:391:29: error: expected statement before ‘)’ token
if num_alloc == num_free)
^
IMB.c:393:5: error: ‘else’ without a previous ‘if’
else {
^
Makefile:112: recipe for target 'build_MPI1/IMB.o' failed
make[1]: *** [build_MPI1/IMB.o] Error 1
make[1]: Leaving directory '/home/jyoti/Desktop/mpi-benchmarks-master/src_c'
Makefile:87: recipe for target 'all' failed
make: *** [all] Error 2

This seems to be an error in the code.
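
Presumably the guard was simply missing an opening parenthesis; the same fix appears as a patch in a later issue on this page:

#ifdef CHECK
    if (num_alloc == num_free)
        ierr=0;
#endif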

How can I resolve these?

Thanks

Signal 11 Seg Fault at end of run

Hello, I am trying to run tests with OpenMPI v4.0.0. I was having issues with the IMB v2019.1 release and was told by the OpenMPI devs to use this commit as a workaround: 841446d. This worked fine until the very end, where what I am guessing is a cleanup step segfaults on one or two machines. Is there any way to get more output for the end of the run? I tried using '-v' but got nothing more out of it.

Command used:

mpirun -v --mca btl_openib_warn_no_device_params_found 0 --map-by node --mca orte_base_help_aggregate 0  --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_allow_ib 1 -np 8 -hostfile /home/aleblanc/ib-mpi-hosts IMB-MPI1

Output:

#----------------------------------------------------------------
# Benchmarking Bcast
# #processes = 6 
#----------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.11         0.12         0.11
            1         1000         1.72         7.06         4.86
            2         1000         1.72         6.85         4.80
            4         1000         1.71         6.92         4.78
            8         1000         1.76         7.12         4.91
           16         1000         1.76         7.18         4.89
           32         1000         1.74         7.17         4.87
           64         1000         1.81         7.58         5.13
          128         1000         1.80         9.27         6.16
          256         1000         1.84         9.54         6.34
          512         1000         2.15        10.70         7.22
         1024         1000         2.35        11.70         7.92
         2048         1000         2.21        15.09        10.10
         4096         1000         3.62        17.32        12.54
         8192         1000         6.17        23.32        17.99
        16384         1000        11.24        37.28        28.67
        32768         1000        62.61        80.91        71.06
        65536          640       109.31       131.24       120.22
       131072          320       225.50       236.59       231.80
       262144          160       430.89       449.17       442.21
       524288           80       406.54       453.22       430.84
      1048576           40       811.17       878.36       842.89
      2097152           20      1788.67      1886.04      1824.92
      4194304           10      2899.46      3183.22      3073.55

#---------------------------------------------------
# Benchmarking Barrier 
# #processes = 2 
# ( 4 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000         2.30         2.30         2.30

#---------------------------------------------------
# Benchmarking Barrier 
# #processes = 4 
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000         4.87         4.87         4.87

#---------------------------------------------------
# Benchmarking Barrier 
# #processes = 6
#---------------------------------------------------
 #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
         1000         8.54         8.54         8.54


# All processes entering MPI_Finalize

[titan:08194] *** Process received signal ***
[titan:08194] Signal: Segmentation fault (11)
[titan:08194] Signal code: Address not mapped (1)
[titan:08194] Failing at address: 0x10
[titan:08194] [ 0] /lib64/libpthread.so.0(+0xf680)[0x7f0218104680]
[titan:08194] [ 1] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x2a865)[0x7f021777f865]
[titan:08194] [ 2] /opt/openmpi/4.0.0/lib/openmpi/mca_rcache_grdma.so(+0x1fd9)[0x7f020b9defd9]
[titan:08194] [ 3] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(mca_rcache_base_module_destroy+0x8f)[0x7f021781d55f]
[titan:08194] [ 4] /opt/openmpi/4.0.0/lib/openmpi/mca_btl_openib.so(+0xeba7)[0x7f020ac73ba7]
[titan:08194] [ 5] /opt/openmpi/4.0.0/lib/openmpi/mca_btl_openib.so(mca_btl_openib_finalize+0x601)[0x7f020ac6ef91]
[titan:08194] [ 6] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x76213)[0x7f02177cb213]
[titan:08194] [ 7] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(mca_base_framework_close+0x79)[0x7f02177b5799]
[titan:08194] [ 8] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(mca_base_framework_close+0x79)[0x7f02177b5799]
[titan:08194] [ 9] /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_mpi_finalize+0x86f)[0x7f0218367c1f]
[titan:08194] [10] IMB-MPI1[0x4025d4]
[titan:08194] [11] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0217d473d5]
[titan:08194] [12] IMB-MPI1[0x401d59]
[titan:08194] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[pandora:13903] *** Process received signal ***
[pandora:13903] Signal: Segmentation fault (11)
[pandora:13903] Signal code: Address not mapped (1)
[pandora:13903] Failing at address: 0x10
[pandora:13903] [ 0] /lib64/libpthread.so.0(+0xf680)[0x7f68ee599680]
[pandora:13903] [ 1] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x2a865)[0x7f68edc14865]
[pandora:13903] [ 2] /opt/openmpi/4.0.0/lib/openmpi/mca_rcache_grdma.so(+0x1fd9)[0x7f68e1b8bfd9]
[pandora:13903] [ 3] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(mca_rcache_base_module_destroy+0x8f)[0x7f68edcb255f]
[pandora:13903] [ 4] /opt/openmpi/4.0.0/lib/openmpi/mca_btl_openib.so(+0xeba7)[0x7f68e1548ba7]
[pandora:13903] [ 5] /opt/openmpi/4.0.0/lib/openmpi/mca_btl_openib.so(mca_btl_openib_finalize+0x601)[0x7f68e1543f91]
[pandora:13903] [ 6] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(+0x76213)[0x7f68edc60213]
[pandora:13903] [ 7] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(mca_base_framework_close+0x79)[0x7f68edc4a799]
[pandora:13903] [ 8] /opt/openmpi/4.0.0/lib/libopen-pal.so.40(mca_base_framework_close+0x79)[0x7f68edc4a799]
[pandora:13903] [ 9] /opt/openmpi/4.0.0/lib/libmpi.so.40(ompi_mpi_finalize+0x86f)[0x7f68ee7fcc1f]
[pandora:13903] [10] IMB-MPI1[0x4025d4]
[pandora:13903] [11] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f68ee1dc3d5]
[pandora:13903] [12] IMB-MPI1[0x401d59]
[pandora:13903] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 8194 on node titan-ib exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

IMB-RMA (c++ version) Truly_passive_put t_ovrl results in all zeros

In IMB-RMA (C++ version), I found a bug in the aggregation part of the benchmark results,
where the t_ovrl results for Truly_passive_put are all 0.
I will create a pull request.
Below are the results when run on OpenMPI v4.1.5.
The problem only occurs with the C++ version of IMB-RMA.

Steps to reproduce:

$ mpirun --host bnode120:1,bnode119:1 -np 2 ./IMB-RMA Truly_passive_put
#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.3, MPI-RMA part
#----------------------------------------------------------------
# Date                  : Mon Oct  2 21:18:40 2023
# Machine               : x86_64
# System                : Linux
# Release               : 5.4.0-144-generic
# Version               : #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# ./IMB-RMA Truly_passive_put

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Truly_passive_put
#     The benchmark measures execution time of MPI_Put for 2 cases:
#     1) The target is waiting in MPI_Barrier call (t_pure value)
#     2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)

#---------------------------------------------------
# Benchmarking Truly_passive_put
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t_pure[usec] t_ovrl[usec]
            0         1000         1.80         0.00
            1         1000         3.19         0.00
            2         1000         3.16         0.00
            4         1000         3.15         0.00
            8         1000         3.15         0.00
           16         1000         3.16         0.00
           32         1000         3.17         0.00
           64         1000         3.18         0.00
          128         1000         3.19         0.00
          256         1000         3.31         0.00
          512         1000         3.32         0.00
         1024         1000         3.37         0.00
         2048         1000         3.52         0.00
         4096         1000         4.40         0.00
         8192         1000         5.05         0.00
        16384         1000         4.50         0.00
        32768         1000         5.13         0.00
        65536          640         6.47         0.00
       131072          320         9.51         0.00
       262144          160        15.10         0.00
       524288           80        25.63         0.00
      1048576           40        46.72         0.00
      2097152           20       178.16         0.00
      4194304           10       174.46         0.00


# All processes entering MPI_Finalize

IMB-RMA (C version) has no problem

$ mpirun --host bnode120:1,bnode119:1 -np 2 ./IMB-RMA Truly_passive_put
#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2018, MPI-RMA part
#----------------------------------------------------------------
# Date                  : Mon Oct  2 21:20:14 2023
# Machine               : x86_64
# System                : Linux
# Release               : 5.4.0-144-generic
# Version               : #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# ./IMB-RMA Truly_passive_put

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Truly_passive_put
#     Comments on this Benchmark:
#     The benchmark measures execution time of MPI_Put for 2 cases:
#     1) The target is waiting in MPI_Barrier call (t_pure value)
#     2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)

#---------------------------------------------------
# Benchmarking Truly_passive_put
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions t_pure[usec] t_ovrl[usec]
            0         1000         1.77         2.64
            1         1000         3.13         4.62
            2         1000         3.12         4.61
            4         1000         3.12         4.61
            8         1000         3.12         4.60
           16         1000         3.12         4.61
           32         1000         3.15         4.63
           64         1000         3.13         4.62
          128         1000         3.18         4.66
          256         1000         3.30         4.90
          512         1000         3.31         4.90
         1024         1000         3.38         4.95
         2048         1000         3.50         5.13
         4096         1000         4.33         5.94
         8192         1000         5.11         6.71
        16384         1000         4.49         6.11
        32768         1000         5.18         6.84
        65536          640         6.51         8.10
       131072          320         9.52        11.07
       262144          160        15.11        16.68
       524288           80        25.70        27.18
      1048576           40        47.20        48.67
      2097152           20        89.51        90.77
      4194304           10       174.41       175.64


# All processes entering MPI_Finalize

Error observed while running IMB with -DCHECK option

I am using ‘Intel MPI Benchmarks 2019 Update 2’ with the -DCHECK option enabled, using only the C source files.
The benchmark fails with a data check error (a sample error is given below) when tried with shared memory, sockets, psm2, and dapl.

==================start of error======================================
#-----------------------------------------------------------------------------
#- Benchmarking Reduce_scatter
#- #processes = 16
#-----------------------------------------------------------------------------
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 1000 0.15 0.27 0.19 0.00
15: Error Reduce_scatter,size = 4,sample #0
Process 15: Got invalid buffer:
Buffer entry: 13.600000
pos: 0
Process 15: Expected buffer:
Buffer entry: 253.600006
4 1000 1.57 4.66 2.41 0.00
Application error code 1 occurred
application called MPI_Abort(MPI_COMM_WORLD, 16) - process 15
===================end of error=====================================

Following are the steps I used to install IMB.

  1. downloaded mpi-benchmarks-master.zip from GitHub and extracted it using unzip command.
  2. cd imb/src_c
  3. export CFLAGS = -DCHECK
  4. make

Following are the errors in detail.

  1. When running it with ‘MPICH-3.3’ over shared memory, it fails at ‘Reduce_scatter’ for sample size 4.
    When running it over TCP, it fails at the same place.
    OS version ‘CentOS Linux release 7.6.1810 (Core)’.

The same is the case with ‘Intel MPI Library 2017 Update 3 for Linux’ over shared memory (default), ofi (I_MPI_FABRICS=ofi), and dapl (I_MPI_FABRICS=dapl).
OS version ‘CentOS Linux release 7.3.1611 (Core)’.

  2. In the file ‘IMB_settings.h’, I changed ‘#define BUFFERS_FLOAT’ to ‘#define BUFFERS_INT’ to check integer-type values and compiled it again.

Keeping the environment and test cases same, it fails at ‘Allreduce’ for sample size 4.

Also, even when the benchmark fails, the ‘defects’ column entry shows 0.00, which implies the benchmark was successful, whereas it was not.

If I use it without the -DCHECK option enabled, the benchmark completes successfully.

Can someone comment on these observations?

Undeclared IMB_Barrier() warnings

Compiling the v2019.1 tarball results in many warnings about IMB_Barrier() not being declared. For example:

In file included from IMB_allreduce.c:73:
IMB_allreduce.c: In function ‘IMB_allreduce’:
IMB_declare.h:257:17: warning: implicit declaration of function ‘IMB_Barrier’; did you mean ‘IMB_barrier’? [-Wimplicit-function-declaration]
                 IMB_Barrier(comm);          \
                 ^~~~~~~~~~~
IMB_allreduce.c:150:9: note: in expansion of macro ‘IMB_do_n_barriers’
         IMB_do_n_barriers(c_info->communicator, N_BARR);
         ^~~~~~~~~~~~~~~~~

dodgy memsets

GCC issues these warnings with the 2019.6 release which look valid:

helpers/helper_IMB_functions.h:594:40: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
             memset(time, 0, MAX_TIME_ID);
                                        ^
../src_c/IMB_mem_manager.c:707:9: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
         memset(time, 0, MAX_TIME_ID);
         ^~~~~~
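
Presumably the intended calls multiply by the element size, along these lines (assuming time is an array of MAX_TIME_ID doubles, as the surrounding code suggests):

memset(time, 0, MAX_TIME_ID * sizeof(time[0]));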

MPI Sendrecv and Exchange performance

Hello,

Regarding the performance of the Sendrecv and Exchange MPI benchmarks, the Mbytes/sec figure drops starting at 1048576 bytes.
Can you tell me why this occurs?

The results with MOFED 5.4-1.0.3.0 are shown here;
similar results were seen with MOFED 5.2-1.0.4.0.

When testing with osu_mbw_mr, which is similar to IMB, there is no such issue, so the issue seems to be caused by IMB itself.

■ MPI result (excerpt from log file)

Sendrecv:
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec

262144 160 30.48 33.48 32.24 15660.37
524288 80 46.49 59.52 54.79 17616.61
1048576 40 164.35 258.68 217.49 8107.16 ★←Drops from here
2097152 20 824.89 936.35 893.57 4479.40
4194304 10 2010.12 3076.62 2165.94 2726.56

Exchange:
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec

262144 160 82.66 92.93 87.85 11283.53
524288 80 149.78 169.20 157.50 12394.29
1048576 40 581.32 797.35 711.64 5260.33 ★←drops from here
2097152 20 1880.77 2131.25 1996.31 3936.01
4194304 10 4011.95 4440.56 4226.54 3778.18

■ Setup:
  • OFED: MLNX_OFED_LINUX-5.4-1.0.3.0
  • HCA: Nvidia ConnectX-6 HDR100 (FW: 20.31.1014)
  • IBSW: QM8700 (MLNX-OS: 3.9.2400)

Best regards,
Shinto

License update

It seems that the license update wasn't done in full. As an example, source files still contain text headers which cite the CPL license. README.md also includes a CPL badge for the license file. Since there is at least one active project forked from this source tree, could you please make the licensing information consistent or explain why it is arranged like this?

Having trouble compiling the suite on either RHEL or SuSE, using OMPI 3.1.4 or 4.0.2

I'm getting a variety of link errors when I use

make CC=mpicc CXX=mpicc 2>&1 | tee make.log

I've tried building with OMPI 3.1.4 and 4.0.2 on both SLES 12.3, RHEL 7.6 and RHEL 8.0.

The exact link error varies depending on the environment. For SLES 12.3 I'm seeing:

ope.o MPI1/MPI1_suite.o MPI1/MPI1_benchmark.o benchmark_suites_collection.o MPI1/IMB_allgather.o MPI1/IMB_allgatherv.o MPI1/IMB_allreduce.o MPI1/IMB_alltoall.o MPI1/IMB_alltoallv.o MPI1/IMB_bandwidth.o MPI1/IMB_barrier.o MPI1/IMB_bcast.o MPI1/IMB_benchlist.o MPI1/IMB_chk_diff.o MPI1/IMB_cpu_exploit.o MPI1/IMB_declare.o MPI1/IMB_err_handler.o MPI1/IMB_exchange.o MPI1/IMB_gather.o MPI1/IMB_gatherv.o MPI1/IMB_g_info.o MPI1/IMB_init.o MPI1/IMB_init_transfer.o MPI1/IMB_mem_manager.o MPI1/IMB_output.o MPI1/IMB_parse_name_mpi1.o MPI1/IMB_pingping.o MPI1/IMB_pingpong.o MPI1/IMB_reduce.o MPI1/IMB_reduce_local.o MPI1/IMB_reduce_scatter.o MPI1/IMB_reduce_scatter_block.o MPI1/IMB_scatter.o MPI1/IMB_scatterv.o MPI1/IMB_sendrecv.o MPI1/IMB_strgs.o MPI1/IMB_utils.o MPI1/IMB_warm_up.o /usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: MPI1/MPI1_suite.o: undefined reference to symbol 'floor@@GLIBC_2.2.5' /lib64/libm.so.6: error adding symbols: DSO missing from command line collect2: error: ld returned 1 exit status Makefile:180: recipe for target 'IMB-MPI1' failed

On RHEL I see a variety of undefined references including
benchmark_suites_collection.cpp:(.text._ZN14BenchmarkSuiteIL17benchmark_suite_t6EE12get_instanceEv[_ZN14BenchmarkSuiteIL17benchmark_suite_t6EE12get_instanceEv]+0x1b): undefined reference to operator new(unsigned long)'
benchmark_suites_collection.o: In function BenchmarkSuite<(benchmark_suite_t)6>::do_register_elem(Benchmark const*)': benchmark_suites_collection.cpp:(.text._ZN14BenchmarkSuiteIL17benchmark_suite_t6EE16do_register_elemEPK9Benchmark[_ZN14BenchmarkSuiteIL17benchmark_suite_t6EE16do_register_elemEPK9Benchmark]+0x8b): undefined reference to operator new(unsigned long)'
benchmark_suites_collection.cpp:(.text._ZN14BenchmarkSuiteIL17benchmark_suite_t6EE16do_register_elemEPK9Benchmark[_ZN14BenchmarkSuiteIL17benchmark_suite_t6EE16do_register_elemEPK9Benchmark]+0x14b): undefined reference to std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()' benchmark_suites_collection.cpp:(.text._ZN14BenchmarkSuiteIL17benchmark_suite_t6EE16do_register_elemEPK9Benchmark[_ZN14BenchmarkSuiteIL17benchmark_suite_t6EE16do_register_elemEPK9Benchmark]+0x15c): undefined reference to std::__cxx11::basic_string<char, std::char_traits, std::allocator >::~basic_string()'
benchmark_suites_collection.o:(.rodata._ZTI14BenchmarkSuiteIL17benchmark_suite_t6EE[_ZTI14BenchmarkSuiteIL17benchmark_suite_t6EE]+0x0): undefined reference to vtable for __cxxabiv1::__si_class_type_info'

I don't see the issue when I'm using mvapich, but it's OMPI that I'm trying to test.

testing over hfi1 fails with "mca_sharedfp_lockedfile_file_open: Error during file open"

I'm attempting to run some basic tests over a pair of hfi1-equipped hosts using openmpi, and quite a few of them are failing with similar output:

[root@rdma-dev-15 ~]$ mpirun -hostfile /root/hfile_one_core -np 2 --allow-run-as-root --map-by node -mca btl_openib_if_include hfi1_0 -mca pml cm -mca mtl psm2 -x PSM2_PKEY=0x8020 mpitests-IMB-IO C_Read_Shared -time 1.5
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 1, MPI-IO part
#------------------------------------------------------------
# Date                  : Thu Dec  6 15:52:19 2018
# Machine               : x86_64
# System                : Linux
# Release               : 4.18.0-47.el8.x86_64
# Version               : #1 SMP Thu Nov 29 19:43:32 UTC 2018
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# mpitests-IMB-IO C_Read_Shared -time 1.5

# Minimum io portion in bytes:   0
# Maximum io portion in bytes:   4194304
#
#
#

# List of Benchmarks to run:

# C_Read_Shared

#-----------------------------------------------------------------------------
# Benchmarking C_Read_Shared
# #processes = 1
# ( 1 additional process waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#
#    MODE: AGGREGATE
#
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         0.13         0.13         0.13         0.00
            1         1000         0.94         0.94         0.94         1.06
            2         1000         0.94         0.94         0.94         2.14
            4         1000         0.94         0.94         0.94         4.25
            8         1000         0.95         0.95         0.95         8.46
           16         1000         0.94         0.94         0.94        17.00
           32         1000         0.94         0.94         0.94        33.98
           64         1000         0.95         0.95         0.95        67.44
          128         1000         0.96         0.96         0.96       133.60
          256         1000         0.96         0.96         0.96       267.88
          512         1000         0.97         0.97         0.97       525.62
         1024         1000         0.99         0.99         0.99      1033.96
         2048         1000         1.09         1.09         1.09      1886.29
         4096         1000         1.22         1.22         1.22      3351.03
         8192         1000         1.66         1.66         1.66      4944.17
        16384         1000         2.70         2.70         2.70      6061.39
        32768         1000         4.55         4.55         4.55      7195.70
        65536          640         8.36         8.36         8.36      7840.92
       131072          320        16.37        16.37        16.37      8008.80
       262144          160        33.23        33.23        33.23      7889.39
       524288           80        65.20        65.20        65.20      8041.71
      1048576           40       128.37       128.37       128.37      8168.12
      2097152           20       256.37       256.37       256.37      8180.12
      4194304           10       532.93       532.93       532.93      7870.25
[rdma-dev-16:20799] [1]mca_sharedfp_lockedfile_file_open: Error during file open

From a quick bit of debugging, I know this is from the second instance of this error message in mca_sharedfp_lockedfile_file_open in the openmpi code, not the initial one, but I haven't gotten any further than that. I am not sure whether the bug is in openmpi or in the tests, and not sure where to look next.

IMB-NBC Ireduce_scatter in check mode fails, while it should not

When run in check mode, IMB-NBC Ireduce_scatter fails, whether it is run with Intel MPI or OpenMPI.
The output shows 2 problems:

  1. failure during the check, when it should not fail
  2. IMB tries to allocate a huge amount of memory ("tried to alloc. 18446744071617404644 bytes").
    Traces are attached (trc_ireduce_scatter.txt).

The problem comes from a wrong initialization in the routine IMB_ireduce_scatter().
The attached patch fixes the issue.
trc_ireduce_scatter.txt

imb2018.patch.txt

IMB-MPI1 2021.3 doesn't compile

Recently, some C++ parts have been added to IMB, which introduces name mangling into the link step. The linker now complains that IMB_Barrier is declared extern "C" in IMB_prototypes.h, but not in IMB_declare.h. The declaration in IMB_declare.h, however, is not really needed, as only the string IMB_Barrier is used in a subsequent CPP macro definition. Commenting this declaration out fixes the problem. The fix in git diff format is attached below. Please consider adopting this patch in future versions.

diff --git a/src_c/IMB_declare.h b/src_c/IMB_declare.h
index 8425545..49f749d 100644
--- a/src_c/IMB_declare.h
+++ b/src_c/IMB_declare.h
@@ -249,7 +249,7 @@ extern int IMB_internal_barrier;
(B) = (type*) IMB_v_alloc(sizeof(type)*(Len), where);
}

-void IMB_Barrier(MPI_Comm comm);
+// void IMB_Barrier(MPI_Comm comm);
#define IMB_do_n_barriers(comm, iter) \
{ \
    int _ii; \

Proper process distribution for parallel transfer benchmarks?

I must be missing something here:

I'm running e.g. uniband via slurm + openmpi. I have 2x 32-core nodes, so I want to run 64 processes, with the 1st half of them on node #1 and the 2nd half on node #2, so that the pair-wise transfers go across the network.

Setting sbatch options of --ntasks=64 and --ntasks-per-node=32, running:

srun --distribution=block IMB-MPI uniband

does the right thing for the 64-process case, with ranks 0-31 on node #1 and 32-63 on node #2. However, uniband also generates results for 2, 4, 8, 16, and 32 processes, which seems helpful, except that all the communication there is within node #1, which isn't really measuring what I want.

Is this the intended usage and behavior? If so, is there a way of disabling the runs on less than all processes, so I can control placement properly?

Please give the release tarballs a more descriptive name

On https://github.com/intel/mpi-benchmarks/releases, the download tarballs are named the release number -- e.g., "v2019.1.tar.gz". This is very confusing when downloading these tarballs into a general "downloads" directory -- there's no indication in the filename what the file is.

Could you please name future release tarballs with a filename that matches the directory name to which the contents expand? This is a quite common convention.

E.g., if your next release tarball follows the same convention as previous releases, it will be named v2019.2.tar.gz, and will expand into a directory named mpi-benchmarks-2019.2 (note the lack of v). It would be great if the tarball was therefore named mpi-benchmarks-2019.2.tar.gz.

I don't have a strong preference as to whether the v precedes the 2019 or not -- just as long as the tarball filename matches the directory name to which it expands.

Please consider this for future releases. Thanks.

gcc version dependency

Is there a requirement on the gcc version? I don't see any info regarding this in the README.

I built mpi-benchmarks on a system with an older version of gcc (v4.8.5) and it compiled fine, but when running benchmarks I got the following error:

[0] IMB-MPI1: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by IMB-MPI1)
[0] IMB-MPI1: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by IMB-MPI1)

Build info:

export CC=/opt/intel/impi/2019.4.243/intel64/bin/mpicc
export CXX=/opt/intel/impi/2019.4.243/intel64/bin/mpicxx
make all

If I point LD_LIBRARY_PATH to libraries of latest gcc (v8.2.0) while running the benchmark, it works fine.

Non-aggregate Accumulate data validation error with Aggregate warm-up

Hello,

I ran into a data validation issue with IMB-EXT non-aggregate mode Accumulate with 2 processes.

IMB: IMB-v2019.6
Open MPI: v4.1.x c71e1fa1db v4.1.x: schizo/jsm: Disable binding when direct launched

[ec2-user@ip-172-31-9-184 ompi]$ mpirun --prefix /fsx/ompi/install -n 2 --mca btl ofi --mca osc rdma --mca btl_ofi_provider_include efa --hostfile /fsx/hosts -x PATH -x LD_LIBRARY_PATH /fsx/mpi-benchmarks/IMB-EXT Accumulate -npmin 2 -iter 1 -aggregate_mode non_aggregate -warm_up 1
Warning: Permanently added 'ip-172-31-13-230,172.31.13.230' (ECDSA) to the list of known hosts.
#------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2019 Update 6, MPI-2 part
#------------------------------------------------------------
# Date                  : Thu Jul 23 23:21:51 2020
# Machine               : x86_64
# System                : Linux
# Release               : 4.14.165-103.209.amzn1.x86_64
# Version               : #1 SMP Sun Feb 9 00:23:26 UTC 2020
# MPI Version           : 3.1
# MPI Thread Environment:


# Calling sequence was:

# /fsx/mpi-benchmarks/IMB-EXT Accumulate -npmin 2 -iter 1 -aggregate_mode non_aggregate -warm_up 1

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# Accumulate

#-----------------------------------------------------------------------------
# Benchmarking Accumulate
# #processes = 2
#-----------------------------------------------------------------------------
#
#    MODE: NON-AGGREGATE
#
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]      defects
            0            1         9.04         9.23         9.13         0.00
            4            1        17.77        18.18        17.97         0.00
            8            1        18.73        19.88        19.30         0.00
           16            1        20.24        21.11        20.68         0.00
           32            1        20.54        20.58        20.56         0.00
0: Error Accumulate,size = 64,sample #0
Process 0: Got invalid buffer:
Buffer entry: 0.600000
pos: 0
Process 0: Expected    buffer:
Buffer entry: 0.300000
           64            1        42.08        43.74        42.91         1.00

I found that the IMB-EXT non-aggregate Accumulate validation issue is caused by its warm-up procedure (see line), which uses aggregate mode (see line).

My theory is that rank 1 first finishes the warm-up and fetches the element values (accumulated during warm-up), which have not been reset by rank 0. Therefore, we get the value 0.6, while the expected one is 0.3.

After using non-aggregate mode for both the warm-up and the later run, the benchmark runs fine for me. Can you please take a look and let me know if it makes sense?

IMB does not compile with OpenMPI 4.0

OpenMPI 4.0 removed support for the deprecated MPI_UB and MPI_LB types.
That's why IMB cannot be compiled with OpenMPI:
IMB_init_transfer.c: In function ‘IMB_init_transfer’:
/ompi4.0/include/mpi.h:1102:53: error: ‘ompi_mpi_lb’ undeclared (first use in this function)
#define MPI_LB OMPI_PREDEFINED_GLOBAL(MPI_Datatype, ompi_mpi_lb)
IMB_init_transfer.c:177:39: note: in expansion of macro ‘MPI_LB’
bllen[0]=1; displ[0] = 0; types[0] = MPI_LB;

The same error is for MPI_UB.

Command line:
make -f make_mpich MPI_HOME=/ompi4.0

BTW: Why do you reset MPI_HOME? OpenMPI usually sets this environment variable, and it should work as simply as:
make -f make_mpich

It's here:
$ head -3 make_mpich
# Enter root directory of mpich install
MPI_HOME=

Regards!
---Dmitry

Nonblocking I/O operation and CPU exploit

It seems there is a bug in IMB-IO regarding the CPU exploitation part. Except for rank 0, all remaining ranks have target_reps equal to zero. I have added:

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf("rank %d -> Nrep: %d, target rep: %d\n",rank, Nrep,target_reps);

to the end of IMB_cpu_exploit, and executed:
LD_PRELOAD=./some_lib.so mpirun -np 2 ./IMB-IO P_IWrite_Indv -iter 5 -npmin 2 -msglog 20:20 -iter_policy off -time 500
here is the result:

#----------------------------------------------------------------
#    Intel(R) MPI Benchmarks 2021.3, MPI-IO part
#----------------------------------------------------------------
# Date                  : Tue Sep  6 15:03:02 2022
# Machine               : x86_64
# System                : Linux
# Release               : 5.15.0-47-generic
# Version               : #51-Ubuntu SMP Thu Aug 11 07:51:15 UTC 2022
# MPI Version           : 3.1
# MPI Thread Environment: 


# Calling sequence was: 

# ./IMB-IO P_IWrite_Indv -iter 5 -npmin 2 -msglog 20:20 -iter_policy off -time 500

# Minimum io portion in bytes:   0
# Maximum io portion in bytes:   1048576
#
#
#

# List of Benchmarks to run:

# P_IWrite_Indv
rank 0 -> Nrep: 1432890, target rep: 14328


# For nonblocking benchmarks:

# Function CPU_Exploit obtains an undisturbed
# performance of  286.58 MFlops
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328

#-----------------------------------------------------------------------------
# Benchmarking P_IWrite_Indv 
# #processes = 2 
#-----------------------------------------------------------------------------
#
#    MODE: AGGREGATE 
#
       #bytes #repetitions t_ovrl[usec] t_pure[usec]  t_CPU[usec]   overlap[%]
            0            5    424323.39        74.06    845648.61       100.00
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 1 -> Nrep: 1561411, target rep: 0
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
rank 0 -> Nrep: 1432890, target rep: 14328
      1048576            5    429989.80     13614.71    845648.61       100.00


# All processes entering MPI_Finalize

This bug can be fixed by adding to original_benchmark.h after line 197 (#ifdef MPIIO):

if (c_info.w_rank != 0 && do_nonblocking_)
    IMB_cpu_exploit_reworked(TARGET_CPU_SECS, 1);

As it is nice to know the progress of the nonblocking operation, I have added MPI_Testall to IMB_cpu_exploit.c.
If you want, I can create a pull request.

Livelock in case of memory allocation failure

In some cases (for example, "IMB-MPI1 Pingpong" running with 3 or more processes), some processes are put aside, waiting for the others in an MPI_Barrier. If the other processes meanwhile fail to allocate memory (message lengths that are too large in comparison to the "-mem" argument), they never reach the MPI_Barrier call.
The "inactive" processes will then hang forever in MPI_Barrier on MPI_COMM_WORLD.

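A minimal standalone sketch of the pattern (not IMB code; the rank split and the allocation size are made up for illustration):

/* Rank 0 plays an "active" process whose oversized allocation fails, so it
 * skips the barrier; rank 1 plays an "inactive" process parked in MPI_Barrier
 * on MPI_COMM_WORLD, which then never completes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        void *buf = malloc((size_t)1 << 46);   /* oversized request, expected to fail */
        if (buf != NULL) {
            MPI_Barrier(MPI_COMM_WORLD);       /* only the successful path reaches the barrier */
            free(buf);
        } else {
            fprintf(stderr, "rank 0: allocation failed, skipping the barrier\n");
        }
    } else {
        MPI_Barrier(MPI_COMM_WORLD);           /* the "inactive" rank waits here forever */
    }

    MPI_Finalize();   /* rank 0 ends up here while rank 1 is still stuck in the
                         barrier, so the job never completes */
    return 0;
}
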
explicit specialization of 'name' after instantiation errors

In file included from RMA/RMA_benchmark.cpp:61:
In file included from helpers/original_benchmark.h:58:
../src_c/IMB_prototypes.h:650:22: warning: 'register' storage class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
long IMB_compute_crc(register char* buf, register size_t size);
^~~~~~~~~
../src_c/IMB_prototypes.h:650:42: warning: 'register' storage class specifier is deprecated and incompatible with C++17 [-Wdeprecated-register]
long IMB_compute_crc(register char* buf, register size_t size);
^~~~~~~~~
When attempting to build RMA on a Cray machine at LLNL, I get the errors below. The compiler is clang. Is there a fix or workaround for this issue?

zwhamo2{dinge1}47: module list

Currently Loaded Modules:

  1) cpe-cray 2) cce/11.0.3 3) craype/2.7.5 4) craype-x86-rome 5) craype-network-infiniband 6) cray-mvapich2_nogpu/2.3.5 7) cray-libsci/20.03.1 8) perftools-base/21.02.0 9) PrgEnv-cray/1.0.0 10) rocm/4.0.1

rzwhamo2{dinge1}49: which CC
/opt/cray/pe/craype/2.7.5/bin/CC
rzwhamo2{dinge1}50: echo $CXX
CC
rzwhamo2{dinge1}51: CC --version
Cray clang version 11.0.3 (477c94a197f0fb1c961670c6e69c34a212c8f345)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/cray/pe/cce/11.0.3/cce-clang/x86_64/share/../bin

RMA/RMA_benchmark.cpp:79:1: error: explicit specialization of 'name' after instantiation
BENCHMARK(IMB_rma_single_put, Unidir_put)
^
RMA/RMA_benchmark.cpp:73:106: note: expanded from macro 'BENCHMARK'
#define BENCHMARK(BMRK_FN, BMRK_NAME) template class OriginalBenchmark<BenchmarkSuite<BS_RMA>, BMRK_FN>;
^
./benchmark.h:88:114: note: expanded from macro 'DECLARE_INHERITED_TEMPLATE'
#define DECLARE_INHERITED_TEMPLATE(CLASS, NAME) namespace { CLASS elem_ ## NAME; } template<> const char *CLASS::name = #NAME;
^
helpers/original_benchmark.h:139:34: note: implicit instantiation first required here
BMark->name = strdup(name);
^
RMA/RMA_benchmark.cpp:79:1: error: explicit specialization of 'descr' after instantiation
BENCHMARK(IMB_rma_single_put, Unidir_put)

Performance regression using SGI MPT library

I have a performance regression for some of the benchmarks between commits c3ef058 and ebb5646.

When running benchmarks between two processes on the same node but on different sockets, like this:

mpiexec_mpt -ppn 2 -n 2 omplace -nt 32 ./IMB-MPI1 biband -npmin 2

I get this performance for the old source code:

# Benchmarking Biband
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions   Mbytes/sec      Msg/sec
            0         1000         0.00      4988798
            1         1000         6.37      6373589
            2         1000        13.62      6810416
            4         1000        25.55      6388573
            8         1000        50.81      6351491
           16         1000       101.49      6342847
           32         1000       203.61      6362683
           64         1000       378.28      5910609
          128         1000       293.17      2290367
          256         1000       624.56      2439692
          512         1000      1212.22      2367611
         1024         1000      2078.85      2030125
         2048         1000      8515.61      4158012
         4096         1000     15032.28      3669991
         8192         1000     25693.00      3136352
        16384         1000     25555.42      1559779
        32768         1000     25122.75       766685
        65536          640     34725.89       529875
       131072          320     31540.43       240634
       262144          160     18311.99        69855
       524288           80     15432.22        29435
      1048576           40     14296.57        13634
      2097152           20     14836.32         7075
      4194304           10     14914.15         3556

And this performance for the newer version of IMB:

#---------------------------------------------------
# Benchmarking Biband
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions   Mbytes/sec      Msg/sec
            0         1000         0.00      5147985
            1         1000         6.29      6290787
            2         1000        14.39      7193543
            4         1000        26.45      6613563
            8         1000        53.03      6629307
           16         1000       102.32      6395217
           32         1000       207.49      6483996
           64         1000       440.25      6878981
          128         1000       312.46      2441079
          256         1000       608.54      2377097
          512         1000      1200.30      2344330
         1024         1000      1927.01      1881845
         2048         1000      8097.04      3953630
         4096         1000      4591.55      1120983
         8192         1000      5532.52       675356
        16384         1000      5822.15       355356
        32768         1000      5488.27       167489
        65536          640      5508.57        84054
       131072          320      4271.12        32586
       262144          160      4208.83        16055
       524288           80      4139.70         7896
      1048576           40      4085.87         3897
      2097152           20      4054.54         1933
      4194304           10      4042.60          964

I'm using the same compilers for both source trees, and the network setup hasn't changed.
Is this expected with the latest IMB?

thanks

Compilation with -DCHECK fails in src_c/IMB.c

make -C src_c CFLAGS=-DCHECK fails because of a missing '(' in IMB.c.

diff --git src_c/IMB.c src_c/IMB.c
index 21ded36..ae97b86 100644
--- src_c/IMB.c
+++ src_c/IMB.c
@@ -388,7 +388,7 @@ Return value          (type int)
     IMB_free_all(&C_INFO, &BList, &ITERATIONS);
 
 #ifdef CHECK
-    if num_alloc == num_free)
+    if (num_alloc == num_free)
         ierr=0;
     else {
         fprintf(stderr, "pr %d: calls to IMB_v_alloc %d / IMB_v_free %d (doesn't seem ok, are unequal!)\n", C_INFO.w_rank,num_alloc,num_free);

Integrity check failures and segmentation faults in IMB-NBC and IMB-IO

In IMB-NBC, I receive an integrity check failure in Ireduce_scatter on standard Intel Omni-Path.
In IMB-IO, P_Write_shared, P_IWrite_shared, P_Read_shared, P_IRead_shared, C_Read_shared, and C_IRead_shared fail with either a segmentation fault or an integrity check failure.
Can you please tell me why this is happening?

Options for specifying message length while running IMB-MPI1

Hello,

I would like to know whether there are options for specifying the message lengths used by the benchmarks.

Currently the default message length configuration is:

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304

It is not clear to me whether the -msglen option is the intended way to change this.
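
For reference, the User's Guide documents two options that control this; the file name and launcher below are only examples. -msglen takes a plain-text file with one message length in bytes per line, and -msglog [min:]max restricts the lengths to 0 plus powers of two from 2^min up to 2^max.

# example contents of a file named lengths.txt (one length in bytes per line)
0
1024
65536
1048576

mpirun -np 2 ./IMB-MPI1 PingPong -msglen lengths.txt
mpirun -np 2 ./IMB-MPI1 PingPong -msglog 3:22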

Fails to build with LTO

Gentoo CI tried to build with LTO and the build failed:

make[1]: Entering directory '/var/tmp/portage/sys-cluster/mpi-benchmarks-2021.3/work/mpi-benchmarks-IMB-v2021.3/src_cpp'
mpicxx -DMT -IMT -I. -O2 -pipe -march=x86-64 -frecord-gcc-switches -fno-diagnostics-color -fmessage-length=0 -flto -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -g -Wall -Wextra -pedantic -Wno-long-long -fopenmp -fPIC -c -o MT/MT_suite.o MT/MT_suite.cpp
mpicc -O2 -pipe -march=x86-64 -frecord-gcc-switches -fno-diagnostics-color -fmessage-length=0 -flto -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -g -Wall -Wno-long-long -Ihelpers -I../src_c -DMPIIO -I. -DMPIIO -c -o IO/IMB_read.o ../src_c/IMB_read.c
mpicc -O2 -pipe -march=x86-64 -frecord-gcc-switches -fno-diagnostics-color -fmessage-length=0 -flto -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -g -Wall -Wno-long-long -Ihelpers -I../src_c -DMPIIO -I. -DMPIIO -c -o IO/IMB_write.o ../src_c/IMB_write.c
mpicxx -Ihelpers -I../src_c -DMPIIO -I. -O2 -pipe -march=x86-64 -frecord-gcc-switches -fno-diagnostics-color -fmessage-length=0 -flto -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -g -Wall -Wextra -pedantic -Wno-long-long -fPIE -o IMB-IO imb.o args_parser.o args_parser_utests.o scope.o IO/IO_suite.o IO/IO_benchmark.o benchmark_suites_collection.o IO/IMB_declare.o IO/IMB_init.o IO/IMB_mem_manager.o IO/IMB_benchlist.o IO/IMB_strgs.o IO/IMB_err_handler.o IO/IMB_parse_name_io.o IO/IMB_g_info.o IO/IMB_warm_up.o IO/IMB_open_close.o IO/IMB_output.o IO/IMB_utils.o IO/IMB_init_transfer.o IO/IMB_init_file.o IO/IMB_user_set_info.o IO/IMB_chk_diff.o IO/IMB_cpu_exploit.o IO/IMB_read.o IO/IMB_write.o -Wl,-O1 -Wl,--as-needed -Wl,--defsym=__gentoo_check_ldflags__=0 
../src_c/IMB_comm_info.h:99:8: error: type ‘struct comm_info’ violates the C++ One Definition Rule [-Werror=odr]
   99 | struct comm_info {
      |        ^
../src_c/IMB_comm_info.h:99:8: note: a different type is defined in another translation unit
   99 | struct comm_info {
      |        ^
../src_c/IMB_comm_info.h:129:21: note: the first difference of corresponding definitions is field ‘s_data’
  129 |     assign_type*    s_data;         /* assign_type equivalent of s_buffer      */
      |                     ^
../src_c/IMB_comm_info.h:129:21: note: a field of same name but different type is defined in another translation unit
  129 |     assign_type*    s_data;         /* assign_type equivalent of s_buffer      */
      |                     ^
../src_c/IMB_comm_info.h:99:8: note: type ‘assign_type’ should match type ‘assign_type’
   99 | struct comm_info {
      |        ^
lto1: some warnings being treated as errors
lto-wrapper: fatal error: x86_64-pc-linux-gnu-g++ returned 1 exit status
compilation terminated.
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.1/../../../../x86_64-pc-linux-gnu/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:167: IMB-IO] Error 1
make[1]: Leaving directory '/var/tmp/portage/sys-cluster/mpi-benchmarks-2021.3/work/mpi-benchmarks-IMB-v2021.3/src_cpp'
make: *** [Makefile:69: IMB-IO] Error 2
make: *** Waiting for unfinished jobs....

see https://bugs.gentoo.org/860540
log https://860540.bugs.gentoo.org/attachment.cgi?id=793919
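
For what it's worth, the notes in the log single out the s_data field of struct comm_info as having a different type in different translation units of the same IMB-IO link. Below is a minimal standalone sketch of that kind of mismatch, assuming (an assumption, not taken from the IMB sources) that assign_type is a typedef whose definition changes with a compile-time flag such as MPIIO; the char/float choices are placeholders only.

/* odr_sketch.c - illustrative only; char/float stand in for whatever the real
 * builds use. Compiling one object with -DMPIIO and another without, then
 * linking both with -flto, is the kind of setup in which gcc's LTO type checks
 * (-Wodr / -Wlto-type-mismatch, both promoted to errors by the Gentoo CFLAGS)
 * report that struct comm_info's s_data field differs between translation
 * units, as in the log above. */
#include <stdio.h>

#ifdef MPIIO
typedef char  assign_type;   /* hypothetical variant selected by -DMPIIO */
#else
typedef float assign_type;   /* hypothetical default variant */
#endif

struct comm_info {
    assign_type* s_data;     /* the field the gcc notes point at */
};

int main(void) {
    printf("sizeof(assign_type) = %zu\n", sizeof(assign_type));
    return 0;
}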

Questions on mem_manager.c code

Hi,

I'm seeing some failures with -DCHECK enabled and I was looking at buffer value assignment logic. I have a couple of questions:

  1. When is this (L280) path taken if there's already this (L273) check ahead?
273    if (pos2 >= pos1) {                                              
274        size_t a_pos1, a_pos2, i, j;
275        a_pos1 = pos1 / asize;
276
277        if (pos2 >= pos1)                                           
278            a_pos2 = pos2 / asize;
279        else
280            a_pos2 = a_pos1 - 1;                                    
281
282        if (value)
283            for (i = a_pos1, j = 0; i <= a_pos2; i++, j++)
284                ((assign_type *)buf)[j] = BUF_VALUE(rank, i);
285        else
286            for (i = a_pos1, j = 0; i <= a_pos2; i++, j++)
287                ((assign_type *)buf)[j] = 0.;
288
289        if (a_pos1*asize != pos1) {                                 
290            void* xx = (void*)(((char*)buf) + pos1 - a_pos1*asize);
291            memmove(buf, xx, pos2 - pos1 + 1);
292        }
293    } /*if( pos2>= pos1 )*/

If I'm not mistaken, when the check on line 273 is true, the check on line 277 is always true as well, so line 280 is never taken. Correct me if I'm wrong.

  2. The definition of IMB_ass_buf seems to indicate that assignment occurs over byte positions pos1 through pos2, as documented here:
void IMB_ass_buf(void* buf, int rank, size_t pos1, 
                 size_t pos2, int value) {
/*
                      Assigns values to a buffer
Input variables:
-rank                 (type int)
                      Rank of calling process
-pos1                 (type int)
-pos2                 (type int)
                      Assignment between byte positions pos1, pos2

But lines 284 and 287 seem to write the buffer starting from offset 0 (buf[j] for j = 0 ... a_pos2 - a_pos1). Can you explain whether this is correct?

  3. Can you explain the logic behind this (L289) path? For instance, if pos1 and pos2 are 15 and 32 respectively, and sizeof(assign_type) == 4, then a_pos1 = 3 and a_pos2 = 8. In that case there is a memmove of 18 bytes (pos2 - pos1 + 1), when I would expect only 17 bytes to change.
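
To make the numbers in question 3 concrete, here is a small standalone program that only mirrors the index arithmetic quoted above for pos1 = 15 and pos2 = 32, assuming sizeof(assign_type) == 4 as in the question; it is an illustration, not the IMB code itself.

/* ass_buf_arith.c - mirrors the index arithmetic from the IMB_ass_buf
 * excerpt above (illustrative only). */
#include <stdio.h>
#include <stddef.h>

int main(void) {
    size_t pos1 = 15, pos2 = 32;   /* byte positions from question 3 */
    size_t asize = 4;              /* assumed sizeof(assign_type)    */

    size_t a_pos1 = pos1 / asize;                                /* 3 */
    size_t a_pos2 = (pos2 >= pos1) ? pos2 / asize : a_pos1 - 1;  /* 8 */

    size_t elems = a_pos2 - a_pos1 + 1;     /* 6 elements                    */
    size_t bytes = elems * asize;           /* 24 bytes written at buf[0..]  */
    size_t shift = pos1 - a_pos1 * asize;   /* 3: offset into first element  */
    size_t mlen  = pos2 - pos1 + 1;         /* 18: bytes pos1..pos2 inclusive */

    printf("a_pos1=%zu a_pos2=%zu writes %zu bytes at buf[0..%zu], "
           "then memmove(buf, buf+%zu, %zu)\n",
           a_pos1, a_pos2, bytes, bytes - 1, shift, mlen);
    return 0;
}

Note that pos2 - pos1 + 1 counts byte positions pos1 through pos2 inclusive, which is 18 bytes for this example.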

EXT and RMA accumulate aggregate mode issues

I'm working on getting all IMB tests running with MPI+OFI and am getting various errors (not limited to the two addressed in this issue). While trying to figure out why verbs;ofi_rxm and verbs;ofi_rxd are not working with the EXT Accumulate test (an assert error and a hang), I managed to get rid of the issue by changing something in the test. A similar change seems to fix a similar issue with the RMA Accumulate test (a consistent hang).
I am not very familiar with the setup of the test suite, so I'm hoping someone can explain why this fixes the problem and what the proper fix for these issues would be. I see the same behavior with both Intel MPI and MPICH.

Here's the change that I made to fix the EXT accumulate failure:

index 39b86b5..2bfb2c0 100644
--- a/src_c/IMB_ones_accu.c
+++ b/src_c/IMB_ones_accu.c
@@ -188,7 +188,8 @@ Output variables:
 #ifdef CHECK
             for (i = 0; i < ITERATIONS->r_cache_iter; i++)
 #else
-            for (i = 0; i < ITERATIONS->n_sample; i++)
+//            for (i = 0; i < ITERATIONS->n_sample; i++)
+            for (i = 0; i < ITERATIONS->r_cache_iter; i++)
 #endif
             {
                 MPI_ERRHAND(MPI_Accumulate((char*)c_info->s_buffer + i%ITERATIONS->s_cache_iter*ITERATIONS->s_offs,

and here's the one to fix the RMA accumulate failure:

index c3052a9..0c93fb5 100644
--- a/src_c/IMB_rma_atomic.c
+++ b/src_c/IMB_rma_atomic.c
@@ -103,7 +103,8 @@ void IMB_rma_accumulate(struct comm_info* c_info, int size,
         MPI_Win_lock(MPI_LOCK_SHARED, root, 0, c_info->WIN);
         if (run_mode->AGGREGATE) {
             res_time = MPI_Wtime();
-            for (i = 0; i < iterations->n_sample; i++) {
+     //       for (i = 0; i < iterations->n_sample; i++) {
+            for (i = 0; i < iterations->r_cache_iter; i++) {
                 MPI_ERRHAND(MPI_Accumulate((char*)c_info->s_buffer + i%iterations->s_cache_iter*iterations->s_offs,
                                            s_num, c_info->red_data_type, root,
                                            i%iterations->r_cache_iter*r_off, r_num,

make fails if /bin/sh != /bin/bash

make fails if /bin/sh is not /bin/bash (on Debian GNU/Linux, /bin/sh is /bin/dash by default).

Use of == inside [ ] is an extension of bash and some other shells. It should be = to be portable to all POSIX-conformant shells.

@@ -171,8 +171,8 @@ override CPPFLAGS += -DWITH_YAML_CPP
 endif
 
 announce:
-       @if [ "$(ANNOUNCE)" == "1" ]; then echo "NOTE: Building target: $(TARGET), binary name: $(BINARY)"; fi
-       @if [ "$(ANNOUNCE)" == "1" ]; then echo "NOTE: Use make TARGET=<DIR_NAME> to select a target suite"; fi
+       @if [ "$(ANNOUNCE)" = "1" ]; then echo "NOTE: Building target: $(TARGET), binary name: $(BINARY)"; fi
+       @if [ "$(ANNOUNCE)" = "1" ]; then echo "NOTE: Use make TARGET=<DIR_NAME> to select a target suite"; fi
 
 $(BINARY): $(IMB_OBJ) $(BECHMARK_SUITE_OBJ) $(ADDITIONAL_OBJ) $(YAML_CPP_LIB)
        $(CXX) $(CPPFLAGS) $(CXXFLAGS) -o $@ $^ $(LDFLAGS)

Hangs in some IMB-RMA tests when I_MPI_ROOT is set and a non-oneAPI-provided libfabric is used

The IMB-RMA tests Accumulate, Get_accumulate, Fetch_and_op, and Compare_and_swap all hang for us if we use a libfabric version other than the one provided with oneAPI.

If, however, we unset the env var I_MPI_ROOT, all the tests complete normally.

We discovered this because the script we source for oneAPI MPI sets that environment variable:

source /opt/intel/oneapi/mpi/latest/env/vars.sh release
