Comments (138)

masazzz commented on August 26, 2024

@ProjectPhysX
Thank you for your suggestion.
I'll try the A100 later as well.

System: "Flow" Type II subsystem, Information Technology Center, Nagoya University (https://icts.nagoya-u.ac.jp/en/sc/)
Compiler: gcc/10.3.0
Memory: 31800 MB per GPU
Loop iterations: 80
GPUs: 2, 4
FP: FP32, FP16S, FP16C

bench.sh

#!/bin/bash
module load gcc/10.3.0
mkdir -p bin
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
git checkout src/setup.cpp
mv src/setup.cpp src/setup.cpp.orig
GPUs=1
array=("LBM lbm(2u*L, 1u*L, 1u*L, 2u, 1u, 1u, 1.0f);" "LBM lbm(2u*L, 2u*L, 1u*L, 2u, 2u, 1u, 1.0f);")
for i in ${!array[@]}
do
  sed -e "s|for(uint i=0u; i<1000u; i++) {|for(uint i=0u; i<80u; i++) {|" -e "s|LBM lbm(256u, 256u, 256u, 1.0f);|const uint memory = 31800u;const uint L = ((uint)cbrt(fmin((float)memory*1048576.0f/(19.0f*(float)sizeof(fpxx)+17.0f), (float)max_uint))/2u)*2u;${array[$i]}|" src/setup.cpp.orig > src/setup.cpp
  GPUs=$((GPUs * 2))
  for a in FP32 FP16S FP16C
  do
    (
      echo "#define $a"
      cat src/defines.hpp.orig
    ) > src/defines.hpp
    rm -f ./bin/FluidX3D
    g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
    ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee log.memory_31800.${GPUs}GPUs.$a
  done
done
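
For reference, the grid side length L that the sed substitution injects into setup.cpp can be reproduced outside of FluidX3D. This is only a rough sketch in plain bash integer arithmetic, not code from the project: the function name grid_side and its arguments are made up here, and it takes sizeof(fpxx) as 4 bytes for FP32 and 2 bytes for FP16S/FP16C, which is an assumption that happens to match the grid resolutions in the logs.

```shell
#!/bin/bash
# Hypothetical helper: mirrors the expression
#   L = ((uint)cbrt(fmin(memory*1048576/(19*sizeof(fpxx)+17), max_uint))/2u)*2u
# with 64-bit integer arithmetic instead of floats.
grid_side() {
  local memory_mb=$1    # memory budget per GPU in MB (31800 in bench.sh)
  local fpxx_bytes=$2   # assumed sizeof(fpxx): 4 for FP32, 2 for FP16S/FP16C
  local max_uint=4294967295
  local cells=$(( memory_mb * 1048576 / (19 * fpxx_bytes + 17) ))
  if (( cells > max_uint )); then cells=$max_uint; fi
  # integer cube root: largest l with l^3 <= cells
  local l=1
  while (( (l + 1) * (l + 1) * (l + 1) <= cells )); do
    l=$(( l + 1 ))
  done
  echo $(( l / 2 * 2 ))  # round down to an even number
}

echo "FP32: L = $(grid_side 31800 4)"
echo "FP16: L = $(grid_side 31800 2)"
```

With memory = 31800 this gives L = 710 for FP32 and L = 846 for FP16, so the 2-GPU setup (2u*L, 1u*L, 1u*L) becomes 1420 x 710 x 710 and 1692 x 846 x 846 respectively.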
bash bench.sh
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                              1420 x 710 x 710 = 715822000 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 11605 MB, GPU 2x 31850 MB |
| Max Alloc Size  |                                                  25941 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 410 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7788 |   1192 GB/s |        11 |          800 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7953                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1692 x 846 x 846 = 1210991472 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 19633 MB, GPU 2x 31854 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15377 |   1184 GB/s |        13 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 15469                                                  |
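
The MLUPs and Bandwidth columns in the two runs above appear to be linked by a fixed number of bytes moved per cell update. The sketch below assumes 153 bytes per cell for FP32 and 77 bytes per cell for FP16S/FP16C. These constants are not stated in this thread; they are an assumption that reproduces the logged numbers, and the helper name bandwidth_gbs is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: estimate memory bandwidth in GB/s from MLUPs/s.
# bytes_per_cell is an assumption: 153 for FP32, 77 for FP16S/FP16C (D3Q19).
bandwidth_gbs() {
  local mlups=$1 bytes_per_cell=$2
  # MLUPs/s * bytes/cell = MB/s; add 500 before dividing to round to the nearest GB/s
  echo $(( (mlups * bytes_per_cell + 500) / 1000 ))
}

echo "FP32,   7788 MLUPs/s: $(bandwidth_gbs 7788 153) GB/s"
echo "FP16S, 15377 MLUPs/s: $(bandwidth_gbs 15377 77) GB/s"
```

These evaluate to 1192 GB/s and 1184 GB/s, the values shown in the FP32 and FP16S tables above.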
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1692 x 846 x 846 = 1210991472 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 19633 MB, GPU 2x 31854 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   12842 |    989 GB/s |        11 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 12932                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1420 x 1420 x 710 = 1431644000 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 23210 MB, GPU 4x 31940 MB |
| Max Alloc Size  |                                                  25941 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 410 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   13002 |   1989 GB/s |         9 |          800 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 13135                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1692 x 1692 x 846 = 2421982944 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 39266 MB, GPU 4x 31930 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   26044 |   2005 GB/s |        11 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 26527                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1692 x 1692 x 846 = 2421982944 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 39266 MB, GPU 4x 31930 MB |
| Max Alloc Size  |                                                  21942 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 488 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   22500 |   1733 GB/s |        19 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 22686                                                  |

masazzz commented on August 26, 2024

Here are the A100 benchmark test results.
Let me know if there is anything else I can do.

System: Wisteria/BDEC-01 Aquarius, Supercomputing Division, Information Technology Center, The University of Tokyo (https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php)

Compiler: gcc/12.2.0
Memory: 39800
Loop iterations: 80
GPUs: 2, 4, 8
FP: FP32, FP16S, FP16C

bench-Aquarius.sh

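#!/bin/bash
# Benchmark FluidX3D on 2, 4 and 8 GPUs, each in FP32, FP16S and FP16C.
module load gcc/12.2.0
mkdir -p bin
# Keep pristine copies of the two source files that are patched for every run.
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
git checkout src/setup.cpp
mv src/setup.cpp src/setup.cpp.orig
GPUs=1
# One LBM constructor per GPU count: 2x1x1, 2x2x1 and 2x2x2 domains.
array=("LBM lbm(2u*L, 1u*L, 1u*L, 2u, 1u, 1u, 1.0f);" "LBM lbm(2u*L, 2u*L, 1u*L, 2u, 2u, 1u, 1.0f);" "LBM lbm(2u*L, 2u*L, 2u*L, 2u, 2u, 2u, 1.0f);")
for i in "${!array[@]}"
do
  # Cut the benchmark loop from 1000 to 80 iterations and size the edge length L from the memory budget (in MB).
  sed -e "s|for(uint i=0u; i<1000u; i++) {|for(uint i=0u; i<80u; i++) {|" -e "s|LBM lbm(256u, 256u, 256u, 1.0f);|const uint memory = 39800u;const uint L = ((uint)cbrt(fmin((float)memory*1048576.0f/(19.0f*(float)sizeof(fpxx)+17.0f), (float)max_uint))/2u)*2u;${array[$i]}|" src/setup.cpp.orig > src/setup.cpp
  GPUs=$((GPUs * 2))
  for a in FP32 FP16S FP16C
  do
    log=log.memory_39800.${GPUs}GPUs.$a
    # Skip configurations that already have a log, so an interrupted job can be resumed.
    if [ ! -f "$log" ]
    then
        # Select the floating-point mode by prepending its #define to defines.hpp.
        (
            echo "#define $a"
            cat src/defines.hpp.orig
        ) > src/defines.hpp
        rm -f ./bin/FluidX3D
        g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
        # Strip ANSI escape sequences and backspaces from the console output before logging it.
        ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee "$log"
    fi
  done
done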
bash bench-Aquarius.sh
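
For reference, the edge length L that the script patches into setup.cpp can be checked by hand. The sketch below redoes the arithmetic of the sed replacement in Python. `grid_edge` is a hypothetical helper name, and reading the constant 17 as 4 bytes of density, 12 bytes of velocity and 1 flag byte per cell is an assumption, not something stated in this thread. Python computes in double precision where the C++ expression uses float, which should not matter for these inputs.

```python
def grid_edge(memory_mb: int, bytes_per_ddf: int) -> int:
    """Largest even edge length L whose estimated footprint fits into memory_mb.

    Mirrors the expression in the sed replacement:
    ((uint)cbrt(fmin(memory*1048576/(19*sizeof(fpxx)+17), max_uint))/2u)*2u
    """
    max_uint = 2**32 - 1
    # 19 DDFs per cell plus 17 further bytes (assumed: density, velocity, flag)
    bytes_per_cell = 19 * bytes_per_ddf + 17
    cells = min(memory_mb * 1048576 / bytes_per_cell, max_uint)
    edge = int(cells ** (1.0 / 3.0))  # truncate like the (uint) cast
    return (edge // 2) * 2            # round down to an even number


if __name__ == "__main__":
    for label, size in (("FP32", 4), ("FP16S/FP16C", 2)):
        L = grid_edge(39800, size)
        print(f"{label}: L = {L}, 2x1x1 grid = {2 * L} x {L} x {L}")
```

With memory = 39800 this gives L = 764 for FP32 and L = 912 for FP16S/FP16C, which matches the 1528 x 764 x 764 and 1824 x 912 x 912 grids reported in the 2-GPU logs.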
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                              1528 x 764 x 764 = 891887488 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 14459 MB, GPU 2x 39675 MB |
| Max Alloc Size  |                                                  32321 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 441 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   14183 |   2170 GB/s |        16 |          800 100% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 14311                                                  |
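
The Bandwidth column in the table above appears to follow directly from the MLUPs column. A minimal sketch, assuming each cell update reads and writes 19 DDFs plus one flag byte (153 bytes with FP32 storage, 77 bytes with FP16S/FP16C); that per-cell traffic is an assumption checked only against the numbers printed in these logs, and the function names are hypothetical.

```python
def bytes_per_cell_update(bytes_per_ddf: int, q: int = 19) -> int:
    """Assumed memory traffic per cell update: q DDFs in, q DDFs out, 1 flag byte."""
    return 2 * q * bytes_per_ddf + 1


def bandwidth_gb_s(mlups: float, bytes_per_ddf: int) -> int:
    """Convert million lattice updates per second into GB/s (1 GB = 1000 MB)."""
    return round(mlups * bytes_per_cell_update(bytes_per_ddf) / 1000)


if __name__ == "__main__":
    # 2x A100, FP32 storage: 14183 MLUPs from the table above
    print(bandwidth_gb_s(14183, 4), "GB/s")
```

For the FP32 run above, 14183 MLUPs x 153 bytes comes to about 2170 GB/s, the value the log reports. Because the logged MLUPs figure is itself rounded, the last digit can differ by one for other runs.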
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1824 x 912 x 912 = 1517101056 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 24595 MB, GPU 2x 39897 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   23472 |   1807 GB/s |        15 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 23707                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                             1824 x 912 x 912 = 1517101056 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 24595 MB, GPU 2x 39897 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15518 |   1195 GB/s |        10 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 15512                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1528 x 1528 x 764 = 1783774976 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 28919 MB, GPU 4x 39780 MB |
| Max Alloc Size  |                                                  32321 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 441 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   23706 |   3627 GB/s |        13 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 23411                                                  |
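
Comparing this 4-GPU FP32 run with the 2-GPU FP32 run earlier in this post gives a rough idea of multi-GPU scaling. Since the script grows the grid with the GPU count (1528 x 764 x 764 on 2 GPUs, 1528 x 1528 x 764 on 4), this is a weak-scaling comparison. The sketch uses the last MLUPs value printed in each table (14183 and 23706); `scaling` is a hypothetical helper, and a single short run per configuration is not a careful measurement.

```python
def scaling(mlups_small: float, gpus_small: int,
            mlups_large: float, gpus_large: int) -> tuple[float, float]:
    """Return (throughput speedup, efficiency relative to ideal linear scaling)."""
    speedup = mlups_large / mlups_small
    ideal = gpus_large / gpus_small
    return speedup, speedup / ideal


if __name__ == "__main__":
    speedup, efficiency = scaling(14183, 2, 23706, 4)
    print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 2 to 4 A100s in FP32 raises throughput by a factor of about 1.67, or roughly 84 % of ideal doubling.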
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1824 x 1824 x 912 = 3034202112 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 49191 MB, GPU 4x 39987 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   41877 |   3225 GB/s |        14 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 42400                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                            1824 x 1824 x 912 = 3034202112 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 49191 MB, GPU 4x 39987 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 527 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   28813 |   2219 GB/s |        19 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 29017                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                           1528 x 1528 x 1528 = 3567549952 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 57838 MB, GPU 8x 39883 MB |
| Max Alloc Size  |                                                  32321 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 882 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   36708 |   5616 GB/s |        10 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 37619                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                           1824 x 1824 x 1824 = 6068404224 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                             CPU 98383 MB, GPU 8x 40074 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                 Re < 1053 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   72707 |   5598 GB/s |        12 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 72965                                                  |
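
A quick sanity check on the result rows above: the Bandwidth column appears to be the MLUPs column multiplied by the bytes moved per cell update. The sketch below assumes 153 bytes per cell for FP32 and 77 bytes per cell for FP16S/FP16C with D3Q19 (these per-cell figures are an assumption here, not something printed in the log), and the helper names `bytes_per_cell` and `bandwidth_gbs` are made up for illustration.

```shell
#!/bin/bash
# Sanity check: reported bandwidth (GB/s) vs. MLUPs/s * bytes per cell update.
# ASSUMPTION: 153 bytes/cell for FP32, 77 bytes/cell for FP16S/FP16C (D3Q19).
bytes_per_cell() {
  case "$1" in
    FP32) echo 153 ;;
    FP16S|FP16C) echo 77 ;;
    *) echo "unknown precision: $1" >&2; return 1 ;;
  esac
}

# Integer arithmetic only: MLUPs/s * bytes gives MB/s;
# adding 500 before dividing by 1000 rounds to the nearest GB/s.
bandwidth_gbs() {
  local mlups=$1 bytes
  bytes=$(bytes_per_cell "$2") || return 1
  echo $(( (mlups * bytes + 500) / 1000 ))
}

# Columns: GPUs, precision, MLUPs/s, reported GB/s (copied from the logs above).
while read -r gpus prec mlups reported; do
  printf '%s GPUs %-5s: estimated %s GB/s, reported %s GB/s\n' \
    "$gpus" "$prec" "$(bandwidth_gbs "$mlups" "$prec")" "$reported"
done <<'EOF'
4 FP16S 41877 3225
4 FP16C 28813 2219
8 FP32 36708 5616
8 FP16S 72707 5598
EOF
```

Under these assumptions the estimate matches the reported value in all four runs, which suggests the bandwidth figure is derived from the MLUPs figure rather than measured separately.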
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                           1824 x 1824 x 1824 = 6068404224 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                             CPU 98383 MB, GPU 8x 40074 MB |
| Max Alloc Size  |                                                  27489 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                 Re < 1053 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   62451 |   4809 GB/s |        10 |          799 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 63009                                                  |

Micmac2 avatar Micmac2 commented on August 26, 2024 3

CPU : Apple M1 Max (24GPU) with 32GB
OS : macOS Monterey 12.6.5

FP32/FP32
FluidX3D FP32 FP32

FP32/FP16S
FluidX3D FP32 FP16S

FP32/FP16C
FluidX3D FP32 FP16C

trparry avatar trparry commented on August 26, 2024 2

image
image
image

Maere05 avatar Maere05 commented on August 26, 2024 2

Hi,
Go to:
C:\FluidX3D-master\bin
Then create a folder called "stl" and put your .stl files in there (binary STL only).
In setup.cpp, change the lbm.voxelize_stl argument to "stl/myFilename.stl".
Cheers
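
Since only binary STL files work here, it can help to check a file before copying it into the stl folder. A binary STL consists of an 80-byte header, a 4-byte little-endian triangle count, and 50 bytes per triangle, so its size must be exactly 84 + 50 * n bytes. The snippet below is a minimal standalone sketch of that check; it is not part of FluidX3D, and the function name is made up for illustration.

```python
import os
import struct

def looks_like_binary_stl(path):
    """Return True if the file size matches the binary STL layout.

    Hypothetical helper: 80-byte header + uint32 triangle count
    + 50 bytes per triangle.
    """
    size = os.path.getsize(path)
    if size < 84:
        return False  # too small to hold header and triangle count
    with open(path, "rb") as f:
        f.seek(80)  # skip the free-form header
        (n_triangles,) = struct.unpack("<I", f.read(4))
    return size == 84 + 50 * n_triangles
```

For example, `looks_like_binary_stl("stl/myFilename.stl")` should return True for a binary export; an ASCII STL fails the size check and would need to be re-exported as binary.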

ProjectPhysX avatar ProjectPhysX commented on August 26, 2024 2

@oscarbg someone reported 7900 XTX/XT benchmarks in Ubuntu over on openbenchmarking.org! I just added the values to the table in the Readme file.

Edit: Carsten Spille benchmarked the 7900 XTX/XT on Windows, getting slightly better numbers for the XTX which are also more consistent with the XT. There might have been some driver issues on Ubuntu initially. So I replaced the numbers in the Readme.

IvanBGR avatar IvanBGR commented on August 26, 2024 2

|----------------.------------------------------------------------------------|
| Device Name    | NVIDIA GeForce RTX 4080                                    |
| Device Driver  | 528.24                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 76 at 2850 MHz (9728 cores, 55.449 TFLOPs/s)               |
| Memory, Cache  | 16375 MB, 2128 KB global / 48 KB local                     |
| Buffer Limits  | 4093 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3883 |    594 GB/s |       231 |         9990   0% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3914                                                   |

|----------------.------------------------------------------------------------|
| Device Name    | NVIDIA GeForce RTX 4080                                    |
| Device Driver  | 528.24                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 76 at 2850 MHz (9728 cores, 55.449 TFLOPs/s)               |
| Memory, Cache  | 16375 MB, 2128 KB global / 48 KB local                     |
| Buffer Limits  | 4093 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7611 |    586 GB/s |       454 |         9991  10% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7626                                                   |

|----------------.------------------------------------------------------------|
| Device Name    | NVIDIA GeForce RTX 4080                                    |
| Device Driver  | 528.24                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 76 at 2850 MHz (9728 cores, 55.449 TFLOPs/s)               |
| Memory, Cache  | 16375 MB, 2128 KB global / 48 KB local                     |
| Buffer Limits  | 4093 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7914 |    609 GB/s |       472 |         9977  70% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7933                                                   |
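
For anyone comparing runs at different grid sizes: the Steps/s column follows directly from MLUPs (million lattice cell updates per second) divided by the number of grid cells. A minimal sketch of that relation, with a function name chosen only for illustration:

```python
def steps_per_second(mlups, nx, ny, nz):
    """Time steps per second implied by a throughput in MLUPs.

    One time step updates every cell once, so
    steps/s = (MLUPs * 1e6) / (nx * ny * nz).
    """
    return mlups * 1e6 / (nx * ny * nz)

# Values taken from the three RTX 4080 logs above (256^3 grid).
for label, mlups in [("FP32/FP32", 3883), ("FP32/FP16S", 7611), ("FP32/FP16C", 7914)]:
    print(label, round(steps_per_second(mlups, 256, 256, 256)))
```

Rounded to whole numbers this reproduces the 231, 454 and 472 steps/s shown in the tables.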

nulaft avatar nulaft commented on August 26, 2024 2
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.3 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce GTX 970                                     |
| Device ID    1 | Intel(R) HD Graphics 4600                                  |
| Device ID    2 | Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz                   |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce GTX 970                                     |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 528.02                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 13 at 1253 MHz (1664 cores, 4.170 TFLOPs/s)                |
| Memory, Cache  | 4095 MB, 624 KB global / 48 KB local                       |
| Buffer Limits  | 1023 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
1 warning generated.
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     979 |    150 GB/s |        58 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 980                                                    |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1618 |    125 GB/s |        96 |         9995  50% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1623                                                   |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1717 |    132 GB/s |       102 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1721                                                   |
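
The Memory Usage and Max Alloc Size rows in these single-GPU logs can be reproduced from the per-cell formula used in bench.sh earlier in the thread, 19 * sizeof(fpxx) + 17 bytes per cell. The sketch below assumes that the 17 bytes of non-DDF data are what the CPU figure counts and that the largest allocation is the 19 DDF values per cell; both assumptions happen to match the numbers above, but they are read off the logs rather than taken from the source code.

```python
def memory_estimate_mb(nx, ny, nz, bytes_per_ddf):
    """Estimate memory in MB (1 MB = 2**20 bytes) for a single-domain D3Q19 grid.

    bytes_per_ddf: 4 for FP32 storage, 2 for FP16S/FP16C storage.
    """
    cells = nx * ny * nz
    mb = 2 ** 20
    return {
        "cpu": cells * 17 / mb,                         # assumed: non-DDF data only
        "gpu": cells * (19 * bytes_per_ddf + 17) / mb,  # formula from bench.sh
        "max_alloc": cells * 19 * bytes_per_ddf / mb,   # assumed: the DDF buffer
    }

print(memory_estimate_mb(256, 256, 256, 4))  # FP32/FP32
print(memory_estimate_mb(256, 256, 256, 2))  # FP32/FP16S or FP32/FP16C
```

For the 256 x 256 x 256 grid this gives 272 / 1488 / 1216 MB with FP32 storage and 272 / 880 / 608 MB with FP16 storage. Multi-GPU logs report somewhat more per GPU than this estimate, so treat it as a lower bound there.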

PMunkes avatar PMunkes commented on August 26, 2024 2

Radeon RX 7900 XTX Red Devil, silent BIOS, stock settings, newest Windows 10 driver (Radeon Software 23.2.1):
image
image
image

ProjectPhysX avatar ProjectPhysX commented on August 26, 2024 2

@PMunkes thanks a lot for reporting! There seem to be rather significant performance improvements with the newer driver, so I've updated the values in the Readme table. I have now also fixed the incorrect TFLOPs reporting for 7900 series GPUs (RDNA3 is 256 ALUs per dual-CU).
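
For context on the TFLOPs figure: the values in the Compute Units line of the logs in this thread are consistent with cores * 2 FLOPs per cycle (one fused multiply-add) * clock, where cores = compute units * cores per unit. The sketch below is only a reconstruction from the printed numbers, not the actual FluidX3D code, and the function names are illustrative; the per-unit core counts for the devices in the example are read off the logs.

```python
def core_count(compute_units, cores_per_unit):
    """Total FP32 cores, e.g. 256 per reported unit for RDNA3 dual-CUs."""
    return compute_units * cores_per_unit

def peak_tflops(cores, clock_mhz, flops_per_core_per_cycle=2):
    """Peak FP32 TFLOPs/s, counting one FMA as 2 FLOPs per core per cycle."""
    return cores * flops_per_core_per_cycle * clock_mhz / 1e6

# (name, compute units, cores per unit, clock in MHz), taken from logs in this thread
devices = [
    ("NVIDIA A100-SXM4-40GB", 108, 64, 1410),
    ("Tesla V100-SXM2-32GB", 80, 64, 1530),
    ("NVIDIA GeForce GTX 970", 13, 128, 1253),
]
for name, units, per_unit, mhz in devices:
    cores = core_count(units, per_unit)
    print(name, cores, round(peak_tflops(cores, mhz), 3))
```

A wrong cores-per-unit value scales the reported TFLOPs by the same factor, which is why the per-architecture constant matters.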

rodionstepanov avatar rodionstepanov commented on August 26, 2024 2

I'm surprised to get a considerably different result for the same setup with 2x Tesla V100-SXM2-32GB:
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 450.51.05                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32510 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8127 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 450.51.05                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32510 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8127 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                              1420 x 710 x 710 = 715822000 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                             CPU 11605 MB, GPU 2x 31850 MB |
| Max Alloc Size  |                                                  25941 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 410 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8531 |   1305 GB/s |        12 |          999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8528                                                   |
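
As a side note on the Relaxation Time and Reynolds Number rows: the relaxation time follows the standard lattice Boltzmann relation tau = 3 * nu + 1/2 in lattice units, which gives 3.5 for nu = 1. The Reynolds bound in the logs appears to be the smallest grid side length times 1/sqrt(3) (the lattice speed of sound) divided by nu; that second formula is an assumption inferred from the printed values, not something confirmed from the source code.

```python
import math

def relaxation_time(nu):
    """SRT relaxation time in lattice units: nu = (tau - 1/2) / 3."""
    return 3.0 * nu + 0.5

def reynolds_bound(nx, ny, nz, nu):
    """Assumed upper bound: min side length * lattice speed of sound / nu."""
    return min(nx, ny, nz) / math.sqrt(3.0) / nu

print(relaxation_time(1.0))                      # 3.5
print(round(reynolds_bound(1420, 710, 710, 1.0)))  # compare with "Re < 410"
```

With nu = 1 the same bound gives 148 for a 256 x 256 x 256 grid and 1053 for 1824 x 1824 x 1824, matching the other logs in this thread.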

gryoung4727 avatar gryoung4727 commented on August 26, 2024 2

GTX Titan FP32/FP32
Titan-FP32-FP32

GTX Titan FP32/FP16S
Titan-FP32-FP16S

GTX Titan FP32/FP16C
Titan-FP32-FP16C

GTX 680 FP32/FP32
680-FP32-FP32

GTX 680 FP32/FP16S
680-FP32-FP16S

GTX 680 FP32/FP16C
680-FP32-FP16C

dextorious avatar dextorious commented on August 26, 2024 2

Apple M2 Max (in the 16" chassis):

| Device ID    0 | Apple M2 Max                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Max                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0                                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 38 at 1000 MHz (4864 cores, 9.728 TFLOPs/s)                |
| Memory, Cache  | 21845 MB, 0 KB global / 32 KB local                        |
| Buffer Limits  | 4096 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2398 |    367 GB/s |       143 |         9995  50% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2405                                                   |

| Device ID    0 | Apple M2 Max                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Max                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0                                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 38 at 1000 MHz (4864 cores, 9.728 TFLOPs/s)                |
| Memory, Cache  | 21845 MB, 0 KB global / 32 KB local                        |
| Buffer Limits  | 4096 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    4613 |    355 GB/s |       275 |         9985  50% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4641                                                   |

| Device ID    0 | Apple M2 Max                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Max                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0                                                    |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 38 at 1000 MHz (4864 cores, 9.728 TFLOPs/s)                |
| Memory, Cache  | 21845 MB, 0 KB global / 32 KB local                        |
| Buffer Limits  | 4096 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2422 |    187 GB/s |       144 |         9994  40% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2444                                                   |

Pretty good overall, but I found the FP16C efficiency drop rather surprising. There was no throttling during the benchmark (the fan didn't even turn on; temperatures gradually rose to 75 °C and stayed there), and I ran it twice in a different order, but this system just doesn't like FP16C. Otherwise I'm really happy to see over 90% efficiency on what is really a shared memory interface, on a system with plenty of background tasks running.
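
To make the efficiency figure concrete: the Bandwidth column is MLUPs times the bytes moved per cell update, which works out to 153 bytes for FP32 storage and 77 bytes for FP16 storage when inferred from the logs (2 * 19 DDF values read and written, plus 1 flag byte). Efficiency is then that bandwidth divided by the hardware's specified memory bandwidth. The 400 GB/s used below for the M2 Max is Apple's quoted figure and an assumption brought in from outside this thread; the function names are illustrative.

```python
# Bytes transferred per cell update, inferred from the Bandwidth/MLUPs ratio in the logs.
BYTES_PER_CELL = {"FP32": 2 * 19 * 4 + 1, "FP16": 2 * 19 * 2 + 1}  # 153 and 77

def bandwidth_gb_per_s(mlups, storage):
    """Effective memory bandwidth in GB/s for a given throughput in MLUPs."""
    return mlups * 1e6 * BYTES_PER_CELL[storage] / 1e9

def efficiency(mlups, storage, spec_gb_per_s):
    """Fraction of the specified memory bandwidth actually used."""
    return bandwidth_gb_per_s(mlups, storage) / spec_gb_per_s

M2_MAX_SPEC_GB_PER_S = 400.0  # assumption: vendor spec, not stated in this thread

runs = [("FP32/FP32", 2398, "FP32"), ("FP32/FP16S", 4613, "FP16"), ("FP32/FP16C", 2422, "FP16")]
for label, mlups, storage in runs:
    eff = efficiency(mlups, storage, M2_MAX_SPEC_GB_PER_S)
    print(f"{label}: {bandwidth_gb_per_s(mlups, storage):.0f} GB/s, {eff:.0%} of spec")
```

The recomputed bandwidth can differ from the log by about 1 GB/s, presumably because the MLUPs value shown in the table is already rounded.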

wesfdf avatar wesfdf commented on August 26, 2024 2

GeForce GTX 770:
Benchmark.txt

PMunkes avatar PMunkes commented on August 26, 2024 2

AMD Phoenix with DDR5-6400 memory in the ROG Ally:
image
image
image

marty1885 avatar marty1885 commented on August 26, 2024 2

On Orange Pi 5 Plus (RK3588/Mali G610 MP4) 16GB

❯ ./make.sh # F32
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.7 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-LODX r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-LODX r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 2.1                                                        |
| OpenCL Version | OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15708 MB, 1024 KB global / 32 KB local                     |
| Buffer Limits  | 15708 MB global, 16085876 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|      43 |      7 GB/s |         3 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 43                                                     |

❯ ./make.sh # F16S
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.7 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-LODX r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-LODX r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 2.1                                                        |
| OpenCL Version | OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15708 MB, 1024 KB global / 32 KB local                     |
| Buffer Limits  | 15708 MB global, 16085876 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|      59 |      5 GB/s |         4 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 59                                                     |

❯ ./make.sh # F16C
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.7 |
|                                      '     Copyright (c) Dr. Moritz Lehmann |
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '7'.
|----------------.------------------------------------------------------------|
| Device ID    0 | Mali-LODX r0p0                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Mali-LODX r0p0                                             |
| Device Vendor  | ARM                                                        |
| Device Driver  | 2.1                                                        |
| OpenCL Version | OpenCL C 2.0 v1.g6p0-01eac0.2819f9d4dbe0b5a2f89c835d8484f9cd |
| Compute Units  | 4 at 1000 MHz (32 cores, 0.064 TFLOPs/s)                   |
| Memory, Cache  | 15708 MB, 1024 KB global / 32 KB local                     |
| Buffer Limits  | 15708 MB global, 16085876 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|      19 |      1 GB/s |         1 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 19                                                     |


C-Dub2022 avatar C-Dub2022 commented on August 26, 2024 1

AMD Radeon RX 580:
image


C-Dub2022 avatar C-Dub2022 commented on August 26, 2024 1

Hopefully this is helpful. Let me know if there is anything else I can do.

image
image


fkay1 avatar fkay1 commented on August 26, 2024 1

AMD 5700 XT

|----------------.------------------------------------------------------------|
| Device ID    0 | gfx1010:xnack-                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | gfx1010:xnack-                                             |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3444.0 (PAL,LC)                                            |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 20 at 1905 MHz (2560 cores, 9.754 TFLOPs/s)                |
| Memory, Cache  | 8176 MB, 16 KB global / 64 KB local                        |
| Buffer Limits  | 6949 MB global, 7116390 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                   CPU 272 MB, GPU 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1366 |    209 GB/s |        81 |         9996  60% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1368                                                   |

|----------------.------------------------------------------------------------|
| Device ID    0 | gfx1010:xnack-                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | gfx1010:xnack-                                             |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3444.0 (PAL,LC)                                            |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 20 at 1905 MHz (2560 cores, 9.754 TFLOPs/s)                |
| Memory, Cache  | 8176 MB, 16 KB global / 64 KB local                        |
| Buffer Limits  | 6949 MB global, 7116390 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                    CPU 272 MB, GPU 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3253 |    250 GB/s |       194 |         9988  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3253                                                   |

|----------------.------------------------------------------------------------|
| Device ID    0 | gfx1010:xnack-                                             |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | gfx1010:xnack-                                             |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3444.0 (PAL,LC)                                            |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 20 at 1905 MHz (2560 cores, 9.754 TFLOPs/s)                |
| Memory, Cache  | 8176 MB, 16 KB global / 64 KB local                        |
| Buffer Limits  | 6949 MB global, 7116390 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                    CPU 272 MB, GPU 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3044 |    234 GB/s |       181 |         9992  20% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3049                                                   |


funlennysub avatar funlennysub commented on August 26, 2024 1

FP32/FP16C
FluidX3D-Benchmark-FP32-FP16C-Windows_GvHd6N7oB6

FP32/FP16S
FluidX3D-Benchmark-FP32-FP16S-Windows_90ejyVLfVG

FP32/FP32
FluidX3D-Benchmark-FP32-FP32-Windows_W9hOfLroLA


nicandris avatar nicandris commented on August 26, 2024 1

RTX 2080 SUPER
image
image
image


gittigittibangbang avatar gittigittibangbang commented on August 26, 2024 1

I tried a 6900XT, but the score is lower than anticipated. The max bandwidth seems to be limited to 300GB/s, although GPU-Z says it's connected via PCIe 4.0 x16 and the card should top out at 512GB/s. The GPU clock is at 2540MHz and the memory clock at 2000MHz. GPU and memory controller loads are at 100%.

image
image
image

With the 3D Taylor-Green model and FP32/FP16S, the MLUPs/s and the bandwidth go through the roof. I'll try some other models, too. FP32/FP32 goes up to 2400 MLUPs/s and 370GB/s, with FP32/FP16C it's 9000 MLUPs/s and 700GB/s.
image
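
The bandwidth figures quoted here follow directly from the MLUPs/s values. A minimal sketch of that conversion, assuming 153 bytes of memory traffic per cell update for FP32/FP32 and 77 bytes for FP32/FP16S and FP32/FP16C (values that reproduce the logs in this thread, e.g. 1366 MLUPs/s and 209 GB/s); the function name `mlups_to_gbs` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: convert MLUPs/s into memory bandwidth in GB/s.
# Assumption: 153 bytes/cell for FP32/FP32, 77 bytes/cell for FP32/FP16S and FP32/FP16C.
mlups_to_gbs() {
  local mlups=$1 precision=$2 bytes_per_cell
  case "$precision" in
    FP32) bytes_per_cell=153 ;;
    FP16S|FP16C) bytes_per_cell=77 ;;
    *) echo "unknown precision: $precision" >&2; return 1 ;;
  esac
  # MLUPs/s * bytes/cell gives MB/s; divide by 1000 for GB/s.
  awk -v m="$mlups" -v b="$bytes_per_cell" 'BEGIN { printf "%.0f\n", m * b / 1000 }'
}

mlups_to_gbs 2400 FP32    # FP32/FP32 Taylor-Green figure quoted above
mlups_to_gbs 9000 FP16C   # FP32/FP16C Taylor-Green figure quoted above
```

For the two Taylor-Green figures this prints 367 and 693, in line with the rounded 370GB/s and 700GB/s.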


HAL9000COM avatar HAL9000COM commented on August 26, 2024 1

Vega 8 in R7 4750G
|----------------.------------------------------------------------------------|
| Device ID    0 | gfx90c                                                     |
| Device ID    1 | gfx90c                                                     |
| Device ID    2 | gfx90c                                                     |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | gfx90c                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3380.6 (PAL,HSAIL)                                         |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 8 at 2100 MHz (512 cores, 2.150 TFLOPs/s)                  |
| Memory, Cache  | 26899 MB, 16 KB global / 32 KB local                       |
| Buffer Limits  | 19382 MB global, 19847731 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                   CPU 272 MB, GPU 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     246 |     38 GB/s |        15 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 263                                                    |

|----------------.------------------------------------------------------------|
| Device ID    0 | gfx90c                                                     |
| Device ID    1 | gfx90c                                                     |
| Device ID    2 | gfx90c                                                     |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | gfx90c                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3380.6 (PAL,HSAIL)                                         |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 8 at 2100 MHz (512 cores, 2.150 TFLOPs/s)                  |
| Memory, Cache  | 26899 MB, 16 KB global / 32 KB local                       |
| Buffer Limits  | 19382 MB global, 19847731 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                    CPU 272 MB, GPU 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     505 |     39 GB/s |        30 |         9998  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 511                                                    |

|----------------.------------------------------------------------------------|
| Device ID    0 | gfx90c                                                     |
| Device ID    1 | gfx90c                                                     |
| Device ID    2 | gfx90c                                                     |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | gfx90c                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3380.6 (PAL,HSAIL)                                         |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 8 at 2100 MHz (512 cores, 2.150 TFLOPs/s)                  |
| Memory, Cache  | 26899 MB, 16 KB global / 32 KB local                       |
| Buffer Limits  | 19382 MB global, 19847731 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                    CPU 272 MB, GPU 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|     466 |     36 GB/s |        28 |         9998  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 501                                                    |


edmond1992 avatar edmond1992 commented on August 26, 2024 1

RTX3060 Laptop GPU with 12700H on ASUS ROG M16 Turbo mode (120W GPU TDP) and external laptop fan
PS C:\Software\FluidX3D> .\FluidX3D-Benchmark-FP32-FP32-Windows.exe
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Laptop GPU                         |
| Device ID    1 | Intel(R) Iris(R) Xe Graphics                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Laptop GPU                         |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 512.78                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 30 at 1425 MHz (3840 cores, 10.944 TFLOPs/s)               |
| Memory, Cache  | 6143 MB, 840 KB global / 48 KB local                       |
| Buffer Limits  | 1535 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                   CPU 272 MB, GPU 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2014 |    308 GB/s |       120 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2019                                                   |

PS C:\Software\FluidX3D> .\FluidX3D-Benchmark-FP32-FP16C-Windows.exe
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Laptop GPU                         |
| Device ID    1 | Intel(R) Iris(R) Xe Graphics                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Laptop GPU                         |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 512.78                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 30 at 1425 MHz (3840 cores, 10.944 TFLOPs/s)               |
| Memory, Cache  | 6143 MB, 840 KB global / 48 KB local                       |
| Buffer Limits  | 1535 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                    CPU 272 MB, GPU 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3523 |    271 GB/s |       210 |         9996  60% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 3572                                                   |

PS C:\Software\FluidX3D> .\FluidX3D-Benchmark-FP32-FP16S-Windows.exe
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 3060 Laptop GPU                         |
| Device ID    1 | Intel(R) Iris(R) Xe Graphics                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 3060 Laptop GPU                         |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 512.78                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 30 at 1425 MHz (3840 cores, 10.944 TFLOPs/s)               |
| Memory, Cache  | 6143 MB, 840 KB global / 48 KB local                       |
| Buffer Limits  | 1535 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                    CPU 272 MB, GPU 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    3991 |    307 GB/s |       238 |         9989  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4012                                                   |

PS C:\Software\FluidX3D>


HAL9000COM avatar HAL9000COM commented on August 26, 2024 1

@HAL9000COM thanks for the Vega 8 benchmarks! Quick question: Is your RAM 2x16GB DDR4-3200MT/s? And do you have an idea why the GPU shows up 3 times?

2x32GB DDR4-3200 OC to 3533. No idea why the GPU shows up multiple times. After a reboot, it now shows up as two devices.


skoz90 avatar skoz90 commented on August 26, 2024 1

image
image
image

Nvidia Quadro RTX 5000


SLGY avatar SLGY commented on August 26, 2024 1

@ProjectPhysX I have now added the FP16 benchmarks.

RTX 3080 Ti

Updated FP32 (was concurrently baking a fluid in Blender when I ran the last one):
FP32

FP16S:
FP16S

FP16C:
FP16C


gittigittibangbang avatar gittigittibangbang commented on August 26, 2024 1

Quadro RTX 4000 below. I also tried two Xeon Gold 5218 (2x16 cores), with the FP32/FP32 benchmark they top out at 126MLUPs/s, 20GB/s and 8 steps/s. I did not have the patience to run it to the end. The speedup with GPUs is really dramatic, damn.

image
image
image


gittigittibangbang avatar gittigittibangbang commented on August 26, 2024 1

|----------------.------------------------------------------------------------|
| Device ID    0 | Quadro RTX 4000                                            |
| Device ID    1 | Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz                   |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz                   |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 6.4.0.37                                                   |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 32 at 2300 MHz (16 cores, 1.178 TFLOPs/s)                  |
| Memory, Cache  | 261766 MB, 256 KB global / 32 KB local                     |
| Buffer Limits  | 65441 MB global, 128 KB constant                           |

FP32/FP32: 132MLUPs/s, 20GB/s bandwidth, 8 steps/s
FP32/FP16C: 270MLUPs/s, 21GB/s bandwidth, 16 steps/s
FP32/FP16S: 135MLUPs/s, 10GB/s bandwidth, 8 steps/s


gittigittibangbang avatar gittigittibangbang commented on August 26, 2024 1

Yes, 8x32GB DDR4 at 2667MHz, but apparently only dual channel according to CPU-Z. Something seems amiss in the UEFI settings; it should be quad channel.
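
To put the channel count in perspective, here is a rough sketch of the theoretical peak DRAM bandwidth. It assumes each DDR4 channel is 64 bits (8 bytes) wide and treats 2667 as the transfer rate in MT/s; the function name `dram_peak_gbs` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: theoretical peak DRAM bandwidth in GB/s.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide, so
# peak = channels * 8 bytes * transfer rate in MT/s.
dram_peak_gbs() {
  local channels=$1 mts=$2
  awk -v c="$channels" -v r="$mts" 'BEGIN { printf "%.1f\n", c * 8 * r / 1000 }'
}

dram_peak_gbs 2 2667   # dual channel, as CPU-Z reports
dram_peak_gbs 4 2667   # quad channel, as expected
```

This prints 42.7 and 85.3, so the roughly 20GB/s measured in the CPU benchmark is below even the dual-channel peak.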


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024 1

Hi @Michallote, many thanks for the 2060 KO benchmarks! The GPU chip itself does not matter too much. Performance purely follows memory bandwidth. You're getting ~75% efficiency which is typical for the Nvidia cards. It's due to the Esoteric-Pull swap algorithm using some misaligned write operations which are not at full bandwidth, for the benefit of cutting memory demand in half.
The 2060 Super has 33% higher bandwidth and that reflects in performance.
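
As a rough illustration of performance following bandwidth, the sketch below estimates FP32/FP32 MLUPs/s from memory bandwidth and efficiency. It assumes 153 bytes of memory traffic per cell update for FP32/FP32, and it takes 336 GB/s (2060 KO) and 448 GB/s (2060 Super) as vendor-listed bandwidths rather than values measured in this thread; the function name `expected_mlups` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: estimate FP32/FP32 MLUPs/s from bandwidth and efficiency.
# Assumptions: 153 bytes/cell for FP32/FP32; 336 and 448 GB/s are vendor-listed
# bandwidths for the RTX 2060 KO and RTX 2060 Super, not measured values.
expected_mlups() {
  local bandwidth_gbs=$1 efficiency=$2
  # GB/s * 1000 gives MB/s; dividing by bytes/cell gives million cell updates per second.
  awk -v bw="$bandwidth_gbs" -v eff="$efficiency" \
    'BEGIN { printf "%.0f\n", bw * 1000 * eff / 153 }'
}

expected_mlups 336 0.75   # RTX 2060 KO at ~75% efficiency
expected_mlups 448 0.75   # RTX 2060 Super at the same efficiency
```

Under these assumptions the estimates are 1647 and 2196 MLUPs/s, the same 33% gap as in the bandwidth figures.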


Blightbuster avatar Blightbuster commented on August 26, 2024 1

GTX 1080 Ti

image
image
image


atesteve avatar atesteve commented on August 26, 2024 1

This is a GTX 1650 on a laptop (under Linux):
fp32
fp16c
fp16s


trparry avatar trparry commented on August 26, 2024 1

@ProjectPhysX yep! Just added them to original post.


MarcoAurelioFerrari avatar MarcoAurelioFerrari commented on August 26, 2024 1

Just to confirm what was expected: RTX2060 TU106
image
image
image


MarcoAurelioFerrari avatar MarcoAurelioFerrari commented on August 26, 2024 1

GTX 1660
image
image
image


lsvvt avatar lsvvt commented on August 26, 2024 1

RTX 3070
image
image
image


NarodGaming avatar NarodGaming commented on August 26, 2024 1

Apple M1 Pro (10 Core CPU / 16 Core GPU / 16GB RAM)

Not bad for ~200GB/s memory bandwidth, though definitely low on the FP16C.

Screenshot 2022-11-08 at 20 47 04

Screenshot 2022-11-08 at 20 48 01

Screenshot 2022-11-08 at 20 54 03


ConfusedWizard avatar ConfusedWizard commented on August 26, 2024 1

RTX 4090
FP32_FP32
FP32_FP16S
FP32_FP16C


ConfusedWizard avatar ConfusedWizard commented on August 26, 2024 1

@ProjectPhysX My first results were with a +1000 MHz memory overclock.

Here are the stock results:
FP32/FP32
FP32_FP32_stock
FP32/FP16S
FP32_FP16S_stock
FP32/FP16C
FP32_FP16C_stock

For calculating efficiency, would it not be better to also benchmark the true memory bandwidth instead of using the official numbers from Nvidia?


mcelwee1 avatar mcelwee1 commented on August 26, 2024 1

Quadro RTX 6000 results

FP32-FP16S
image

FP32-FP16C
image

FP32-FP32
image

Note: This data was collected using the released Windows .exe files. When I compile the benchmarks on my Windows machine and run them, the results are ~5% slower.


mcelwee1 avatar mcelwee1 commented on August 26, 2024 1

Quadro RTX A5000 laptop GPU results

FP32/FP32
image

FP32/FP16S
image

FP32/FP16C
image


SLGY avatar SLGY commented on August 26, 2024 1

@ProjectPhysX Thanks for pointing out the make.sh file to me, I realised it's also mentioned in the readme file too - my apologies for that. I first rad through the readme file a long time ago before I knew what all that meant but I'll sure refer back to it in future first! I can make this into a separate issue too if you'd like, to keep this benchmark issue cleaner.

For anyone else reading this later and using Google Colab: the UTILITIES_NO_CPP17 line is in src/utilities.hpp

oscarbg avatar oscarbg commented on August 26, 2024 1

Hope someone can post a 7900 XT or XTX result.

btrinos avatar btrinos commented on August 26, 2024 1

CPU: Intel Core i9 13900K
OS: Microsoft Windows 11
GPU: NVIDIA Titan RTX
Drivers: 531.61

FP32 [TFlops/s] 16.31
Mem [GB] 24
BW [GB/s] 527/571/577
FP32/FP32 [MLUPs/s] 3471
FP32/FP16S [MLUPs/s] 7456
FP32/FP16C [MLUPs/s] 7554
titanrtx-fp32-fp32
titanrtx-fp32-fp16s
titanrtx-fp32-fp16c

btrinos avatar btrinos commented on August 26, 2024 1

CPU: Intel Core i9 13900K
OS: Microsoft Windows 11
GPU: NVIDIA Titan V
Drivers: 531.61

FP32 [TFlops/s] 14.899
Mem [GB] 12
BW [GB/s] 549/558/534
FP32/FP32 [MLUPs/s] 3601
FP32/FP16S [MLUPs/s] 7253
FP32/FP16C [MLUPs/s] 6957
titanv-fp32-fp32
titanv-fp32-fp16s
titanv-fp32-fp16c

masazzz avatar masazzz commented on August 26, 2024 1

System: Wisteria/BDEC-01 Aquarius, Supercomputing Division, Information Technology Center, The University of Tokyo (https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php)

bench.sh

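# Load the compiler and keep a pristine copy of defines.hpp
module load gcc/12.2.0
mkdir -p bin
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
# Build and run the benchmark once per floating-point mode
for a in FP32 FP16S FP16C
do
  # Prepend the precision define to the original header
  (
    echo "#define $a"
    cat src/defines.hpp.orig
  ) > src/defines.hpp
  rm -f ./bin/FluidX3D
  g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
  # Strip ANSI escape sequences, drop backspaces and expand tabs before logging
  ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee log.$a
done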
bash bench.sh
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8542 |   1307 GB/s |       509 |         9979  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8543                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15909 |   1225 GB/s |       948 |         9954  40% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 15917                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8748 |    674 GB/s |       521 |         9993  30% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8748                                                   |
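
The three peak values above can be pulled out of the `log.FP32`, `log.FP16S` and `log.FP16C` files that bench.sh writes. The following is a minimal sketch, not part of the original script. The helper name `peak_mlups` is hypothetical, and it assumes the "Peak MLUPs/s = N" line format shown in the output above.

```bash
# peak_mlups: read a FluidX3D benchmark log on stdin and print the
# number from each "Peak MLUPs/s = N" line.
peak_mlups() {
  sed -n 's/.*Peak MLUPs\/s = \([0-9][0-9]*\).*/\1/p'
}

# Summarise the three precision modes, skipping logs that do not exist.
for a in FP32 FP16S FP16C; do
  if [ -f "log.$a" ]; then
    printf '%-6s %s\n' "$a" "$(peak_mlups < "log.$a")"
  fi
done
```

Run from the directory that holds the log files, this prints one line per precision mode.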

masazzz avatar masazzz commented on August 26, 2024 1

System: "Flow" Type II subsystem, Information Technology Center, Nagoya University (https://icts.nagoya-u.ac.jp/en/sc/)

bench.sh

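# Load the compiler and keep a pristine copy of defines.hpp
module load gcc/10.3.0
mkdir -p bin
git checkout src/defines.hpp
mv src/defines.hpp src/defines.hpp.orig
# Build and run the benchmark once per floating-point mode
for a in FP32 FP16S FP16C
do
  # Prepend the precision define to the original header
  (
    echo "#define $a"
    cat src/defines.hpp.orig
  ) > src/defines.hpp
  rm -f ./bin/FluidX3D
  g++ ./src/*.cpp -o ./bin/FluidX3D -std=c++17 -pthread -I./src/OpenCL/include -L./src/OpenCL/lib -lOpenCL
  # Strip ANSI escape sequences, drop backspaces and expand tabs before logging
  ./bin/FluidX3D | sed -E "s/\x1b\[([0-9]{1,3}((;[0-9]{1,3})*)?)?[mGK]//g" | col -bx | tee log.$a
done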
bash bench.sh
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    4468 |    684 GB/s |       266 |         9987  70% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 4474                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    8934 |    688 GB/s |       533 |         9993  30% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 8947                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7205 |    555 GB/s |       429 |         9982  20% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7217                                                   |
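
For comparing the three V100 runs above, the FP16 peaks can be expressed as speedups over the FP32/FP32 peak. This is only a sketch of the arithmetic. The helper name `speedup` is hypothetical, and the inputs are the "Peak MLUPs/s" values printed in the logs above.

```bash
# speedup BASE NEW
# Ratio NEW / BASE with two decimals.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", new / base }'
}

speedup 4474 8947   # FP32/FP16S vs FP32/FP32 -> 2.00
speedup 4474 7217   # FP32/FP16C vs FP32/FP32 -> 1.61
```

On this card FP16S roughly doubles throughput, while FP16C gains less.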

masazzz avatar masazzz commented on August 26, 2024 1

System: Wisteria/BDEC-01 Aquarius, Supercomputing Division, Information Technology Center, The University of Tokyo (https://www.cc.u-tokyo.ac.jp/en/supercomputer/wisteria/system.php)

Compiler: gcc/12.2.0
FP: FP32, FP16S, FP16C
GPUs: 2, 4, 8

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 256 x 256 = 33554432 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 544 MB, GPU 2x 1500 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   10516 |   1609 GB/s |       313 |         9991  10% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 10728                                                  |
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 512 x 256 = 67108864 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                               CPU 1088 MB, GPU 4x 1513 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15355 |   2349 GB/s |       229 |         9991 110% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 16116                                                  |
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               512 x 512 x 512 = 134217728 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                               CPU 2176 MB, GPU 8x 1523 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 296 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   19286 |   2951 GB/s |       144 |         9999 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 21564                                                  |
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   18372 |   1415 GB/s |       327 |         9989 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 18810                                                  |
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   20989 |   1616 GB/s |       187 |         9997 170% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 28334                                                  |
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 608 = 224755712 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                               CPU 3643 MB, GPU 8x 1503 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 351 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   39400 |   3034 GB/s |       175 |         9994 140% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 40628                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   13071 |   1006 GB/s |       233 |         9989 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 13380                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   21485 |   1654 GB/s |       191 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 21584                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    1 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    2 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    3 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    4 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    5 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    6 | NVIDIA A100-SXM4-40GB                                      |
| Device ID    7 | NVIDIA A100-SXM4-40GB                                      |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 4                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 5                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 6                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 7                                                          |
| Device Name    | NVIDIA A100-SXM4-40GB                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 470.57.02                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 108 at 1410 MHz (6912 cores, 19.492 TFLOPs/s)              |
| Memory, Cache  | 40536 MB, 3024 KB global / 48 KB local                     |
| Buffer Limits  | 10134 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 608 = 224755712 |
| Grid Domains    |                                             2 x 2 x 2 = 8 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                               CPU 3643 MB, GPU 8x 1503 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 351 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   33048 |   2545 GB/s |       147 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 33416                                                  |
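
To compare the runs above, one option is to pull the peak MLUPs/s line out of each log and compute the multi-GPU scaling efficiency relative to the smallest run. The following is only a minimal sketch: the function names `peak_mlups` and `scaling_efficiency` are made up for illustration and are not part of FluidX3D, and the numbers in the example calls are the FP16C peak values printed in the logs above (2 GPUs: 13380, 4 GPUs: 21584, 8 GPUs: 33416).

```shell
#!/bin/bash
# Hypothetical helpers for post-processing FluidX3D benchmark logs.
# They are not part of FluidX3D itself.

# Read benchmark output on stdin and print the number from the
# "Info: Peak MLUPs/s = N" line.
peak_mlups() {
  sed -n 's|.*Peak MLUPs/s = \([0-9][0-9]*\).*|\1|p'
}

# Scaling efficiency in percent relative to a baseline run:
#   100 * (mlups / base_mlups) / (gpus / base_gpus)
# Arguments: base_gpus base_mlups gpus mlups
scaling_efficiency() {
  local base_gpus=$1 base_mlups=$2 gpus=$3 mlups=$4
  awk -v bg="$base_gpus" -v bm="$base_mlups" -v g="$gpus" -v m="$mlups" \
    'BEGIN { printf "%.1f\n", 100 * (m / bm) / (g / bg) }'
}

# FP16C peak values taken from the logs above, with 2 GPUs as the baseline.
scaling_efficiency 2 13380 4 21584   # prints 80.7
scaling_efficiency 2 13380 8 33416   # prints 62.4
```

With a saved log, `peak_mlups < some_run.log` (the file name is a placeholder) yields the value to pass as the last argument. Using the peak rather than the last reported MLUPs value is a choice made here for simplicity; the final value may be the more conservative number.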


masazzz commented on August 26, 2024

System: "Flow" Type II subsystem, Information Technology Center, Nagoya University (https://icts.nagoya-u.ac.jp/en/sc/)

Compiler: gcc/10.3.0
GPUs: 2, 4
FP: FP32, FP16S, FP16C

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 256 x 256 = 33554432 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 544 MB, GPU 2x 1500 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    5465 |    836 GB/s |       163 |         9994 140% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 5776                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   10859 |    836 GB/s |       193 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 11427                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                608 x 304 x 304 = 56188928 |
| Grid Domains    |                                             2 x 1 x 1 = 2 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                CPU 910 MB, GPU 2x 1482 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    9556 |    736 GB/s |       170 |         9993 130% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 10018                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                512 x 512 x 256 = 67108864 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                               CPU 1088 MB, GPU 4x 1513 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    7630 |   1167 GB/s |       114 |         9999 190% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 7792                                                   |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   15383 |   1185 GB/s |       137 |         9995 150% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 16682                                                  |
.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Tesla V100-SXM2-32GB                                       |
| Device ID    1 | Tesla V100-SXM2-32GB                                       |
| Device ID    2 | Tesla V100-SXM2-32GB                                       |
| Device ID    3 | Tesla V100-SXM2-32GB                                       |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|----------------.------------------------------------------------------------|
| Device ID      | 3                                                          |
| Device Name    | Tesla V100-SXM2-32GB                                       |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 525.60.13                                                  |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 80 at 1530 MHz (5120 cores, 15.667 TFLOPs/s)               |
| Memory, Cache  | 32500 MB, 2560 KB global / 48 KB local                     |
| Buffer Limits  | 8125 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                               608 x 608 x 304 = 112377856 |
| Grid Domains    |                                             2 x 2 x 1 = 4 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                               CPU 1821 MB, GPU 4x 1493 MB |
| Max Alloc Size  |                                                   1018 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 176 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|   14346 |   1105 GB/s |       128 |         9998 180% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 14567                                                  |


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024 1

@rodionstepanov I've noticed this as well. It's quite surprising that there are sometimes considerable differences even between identical GPUs. Depending on the silicon lottery, some GPU/memory chips may boost higher than others. Different CPU/mainboard/PCIe-interconnect/cooling/drivers may also affect results.


illwieckz avatar illwieckz commented on August 26, 2024 1

GPU: AMD Radeon R9 390X Grenada XT (GCN 2.0), here labelled as Hawaii (series, device).
Driver: AMD Orca 21.20-1271047, APP 3224.4

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Hawaii                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3224.4                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 44 at 1080 MHz (2816 cores, 6.083 TFLOPs/s)                |
| Memory, Cache  | 7418 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4048 MB global, 4145152 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                     D3Q19 SRT (FP32/FP32) |
| Memory Usage    |                                CPU 272 MB, GPU 1x 1488 MB |
| Max Alloc Size  |                                                   1216 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1703 |    261 GB/s |       101 |         9998  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1733                                                   |


illwieckz avatar illwieckz commented on August 26, 2024 1

FP16S

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Hawaii                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3224.4                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 44 at 1080 MHz (2816 cores, 6.083 TFLOPs/s)                |
| Memory, Cache  | 7661 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4048 MB global, 4145152 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16S) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    2137 |    165 GB/s |       127 |         9998  80% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 2217                                                   |

FP16C

.-----------------------------------------------------------------------------.
|                       ______________   ______________                       |
|                       \   ________  | |  ________   /                       |
|                        \  \       | | | |       /  /                        |
|                         \  \      | | | |      /  /                         |
|                          \  \     | | | |     /  /                          |
|                           \  \_.-"  | |  "-._/  /                           |
|                            \    _.-" _ "-._    /                            |
|                             \.-" _.-" "-._ "-./                             |
|                               .-"  .-"-.  "-.                               |
|                               \  v"     "v  /                               |
|                                \  \     /  /                                |
|                                 \  \   /  /                                 |
|                                  \  \ /  /                                  |
|                                   \  '  /                                   |
|                                    \   /                                    |
|                                     \ /                FluidX3D Version 2.6 |
|                                      '         Copyright (c) Moritz Lehmann |
|----------------.------------------------------------------------------------|
| Device ID    0 | Hawaii                                                     |
| Device ID    1 | Oland                                                      |
| Device ID    2 | pthread-AMD Ryzen Threadripper PRO 3955WX 16-Cores         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Hawaii                                                     |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3224.4                                                     |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 44 at 1080 MHz (2816 cores, 6.083 TFLOPs/s)                |
| Memory, Cache  | 7656 MB, 16 KB global / 32 KB local                        |
| Buffer Limits  | 4048 MB global, 4145152 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
|-----------------.-----------------------------------------------------------|
| Grid Resolution |                                256 x 256 x 256 = 16777216 |
| Grid Domains    |                                             1 x 1 x 1 = 1 |
| LBM Type        |                                    D3Q19 SRT (FP32/FP16C) |
| Memory Usage    |                                 CPU 272 MB, GPU 1x 880 MB |
| Max Alloc Size  |                                                    608 MB |
| Time Steps      |                                                        10 |
| Kin. Viscosity  |                                                1.00000000 |
| Relaxation Time |                                                3.50000000 |
| Reynolds Number |                                                  Re < 148 |
|---------.-------'-----.-----------.-------------------.---------------------|
| MLUPs   | Bandwidth   | Steps/s   | Current Step      | Time Remaining      |
|    1668 |    128 GB/s |        99 |         9999  90% |                  0s |
|---------'-------------'-----------'-------------------'---------------------|
| Info: Peak MLUPs/s = 1722                                                   |


PMunkes avatar PMunkes commented on August 26, 2024 1

RX 7600, Stock. Theoretical memory bandwidth is 288GB/s:
FP32:
image
FP16S:
image
FP16C:
image


jpecar avatar jpecar commented on August 26, 2024 1

IMHO we should really be looking at perf/W. Here are some numbers I could generate, all with D3Q19 and FP16S:

2080Ti ... Peak MLUPs/s = 2291, power fluctuates 165W..246W
3090 ... Peak MLUPs/s = 10635, power around 345W ... 30.8 MLUPs/W
A40 ... Peak MLUPs/s = 6821, at 215W ... 31.7 MLUPs/W
Mi210 ... Peak MLUPs/s = 7199, at 185W ... 38.9 MLUPs/W
A100 ... Peak MLUPs/s = 16203, at 217W ... 74.6 MLUPs/W
H100 ... Peak MLUPs/s = 20339, at 247W ... 82.3 MLUPs/W
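
The MLUPs/W figures in the list above are peak throughput divided by the reported power draw. A minimal Python sketch of that division, using only the numbers quoted above (the name `mlups_per_watt` is made up for illustration, and the 2080 Ti is left out because its power draw was only given as a range):

```python
def mlups_per_watt(peak_mlups: float, power_w: float) -> float:
    """Energy efficiency as peak MLUPs/s divided by power draw in watts."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return peak_mlups / power_w

# (peak MLUPs/s, watts) exactly as quoted in the comment above
readings = {
    "3090": (10635, 345),
    "A40": (6821, 215),
    "Mi210": (7199, 185),
    "A100": (16203, 217),
    "H100": (20339, 247),
}

efficiency = {gpu: mlups_per_watt(m, w) for gpu, (m, w) in readings.items()}

# Print the cards from most to least efficient
for gpu, value in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gpu:6s} {value:5.1f} MLUPs/W")
```

Rounded to one decimal the A100 comes out at 74.7, so the 74.6 above appears to be truncated rather than rounded. The ranking is unaffected.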

Does anyone have L40 to test?

Perf/$ should also be interesting ... I expect Radeon VII & Mi50 to be at the top.


starfire24680 avatar starfire24680 commented on August 26, 2024 1

AMD Instinct MI100:
image
image
image

AMD Radeon PRO W6800:
image
image
image


skittles-fivem avatar skittles-fivem commented on August 26, 2024 1

https://www.techpowerup.com/gpu-specs/msi-rtx-4070-ventus-3x-oc.b11046
aaa


skittles-fivem avatar skittles-fivem commented on August 26, 2024 1

@skittles-fivem thank you for the 4070 FP16C benchmark! Could you also post the FP32 and FP16S benchmarks please? Thanks!!

Capture22 22


bochen2027 avatar bochen2027 commented on August 26, 2024 1

2023-06-25 14_43_02
2023-06-25 14_43_29
2023-06-25 14_44_13

Is there any way to use Resizable BAR to support the use of system memory?


aschillingHWL avatar aschillingHWL commented on August 26, 2024 1

All three benchmarks with the AMD Radeon PRO W7900

RadeonProW7900-FluidX3D-FP32-FP16S RadeonProW7900-FluidX3D-FP32-FP16C RadeonProW7900-FluidX3D-FP32-FP32


aschillingHWL avatar aschillingHWL commented on August 26, 2024 1

AMD Radeon PRO W7800

RadeonProW7800-FluidX3D-FP32-FP16C RadeonProW7800-FluidX3D-FP32-FP16S RadeonProW7800-FluidX3D-FP32-FP32


aschillingHWL avatar aschillingHWL commented on August 26, 2024 1

NVIDIA RTX 6000 Ada Generation

RTX6000-Ada-FluidX3D-FP32-FP16C RRTX6000-Ada-FluidX3D-FP32-FP16S RTX6000-Ada-FluidX3D-FP32-FP32


Derakoptes avatar Derakoptes commented on August 26, 2024 1

RTX 3050M, 60 W TDP

Screenshot 2023-07-07 093736

image

image


HapppyLance avatar HapppyLance commented on August 26, 2024 1

AMD RX6800M
Screenshot 2023-07-10 205458
Screenshot 2023-07-10 205652
Screenshot 2023-07-10 210005


ibonito1 avatar ibonito1 commented on August 26, 2024

I'd love to add to the benchmarks list. I've got two questions:

  1. I want to benchmark a dual Epyc system (so specifically the CPUs actually). How would I do that (under Windows, but Linux would also be fine), if I have a GPU installed? It always automatically detects the GPU when running the benchmark “releases”.
  2. How to post the benchmarks? Just copy the console output in here?

Cheers!


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

Hi ibonito1,

OpenCL support on EPYC CPUs is a bit difficult as these are not officially supported by AMD. Being x86-64, they should work with the Intel OpenCL CPU Runtime though, or alternatively with POCL. Fingers crossed!
To run on a specific device, in the console run ./FluidX3D.exe 2 (on Linux) or FluidX3D.exe 2 (on Windows), to select device with ID 2 for example.
You can just copy the console output here.

Regards,
Moritz


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

C-Dub2022 thank you very much for the RX 580 benchmark! If you can post the FP16S and FP16C benchmarks as well, I'll add them to the readme!


MarcoAurelioFerrari avatar MarcoAurelioFerrari commented on August 26, 2024

RTX 3060 12GB - v1.1

FP32-FP16C
FP32-FP16C

FP32-FP16S
FP32-FP16S

FP32-FP32
FP32-FP32


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

MarcoAurelioFerrari thank you!


dongwang22 avatar dongwang22 commented on August 26, 2024

Could you please tell me how to open the visual interface for the flow domain that you describe in the readme file? You said that pressing 2 turns on the velocity field, but this does not work in the benchmark case. How can I generate pictures like the ones you present on Twitter?
image


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

Hi dongwang22,

thanks for the benchmark! For the visual interface, uncomment #define WINDOWS_GRAPHICS and comment #define BENCHMARK in src/defines.hpp, and uncomment for example the Taylor-Green setup in src/setup.cpp. Then compile and you should see the graphical interface where you can toggle rendering modes with keys 1/2/3/4. To generate videos, see the other setups: basically make a C++ loop and repeatedly do some LBM time steps and render images with the corresponding methods of the LBM class.

Regards,
Moritz


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

Hi gittigittibangbang, thanks for the benchmarks! Efficiency is ~60%, which is typical for the AMD GPUs. Performance is limited by VRAM bandwidth only, and the RX 6800 would presumably perform exactly the same. The benchmark setup is a 256³ box that fills 1.5GB (FP32) or 0.9GB (FP16) of VRAM. The large Infinity Cache (128MB) is only a small fraction of that, so it does not significantly boost performance.
With a smaller 128³ box however, which only fills 186MB (FP32) or 76MB (FP16), almost the entire grid fits in the cache and effective bandwidth is much larger.
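
The VRAM figures above can be checked against the per-cell accounting used by the benchmark script quoted earlier in this thread, which budgets 19 density distribution values plus 17 further bytes per grid cell. A rough Python sketch under that assumption (`grid_memory_mb` is a hypothetical helper, and 1 MB is taken as 1048576 bytes, as in that script):

```python
def grid_memory_mb(edge: int, bytes_per_ddf: int) -> float:
    """Approximate VRAM in MB for a cubic grid of edge**3 cells.

    Assumes 19 DDFs of bytes_per_ddf bytes each plus 17 further bytes per cell,
    the budget used by the benchmark script quoted earlier in the thread.
    """
    bytes_per_cell = 19 * bytes_per_ddf + 17
    return edge**3 * bytes_per_cell / 1048576

for edge in (256, 128):
    fp32 = grid_memory_mb(edge, 4)  # 4 bytes per DDF
    fp16 = grid_memory_mb(edge, 2)  # 2 bytes per DDF (FP16S/FP16C)
    print(f"{edge}^3: FP32 {fp32:.0f} MB, FP16 {fp16:.0f} MB")
```

With these assumptions the 256³ box comes out at 1488 MB (FP32) and 880 MB (FP16), matching the Memory Usage lines in the logs, and the 128³ box at 186 MB for FP32. For FP16 the same accounting gives 110 MB; the 76 MB quoted above corresponds to the 19 two-byte DDFs alone (128³ × 38 bytes). Both values are below the 128 MB cache size.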


edmond1992 avatar edmond1992 commented on August 26, 2024

Is it possible to add a ready-to-run benchmark for macOS so we can get more results on Mac?
Especially since the test is bandwidth-limited and Apple silicon should be good at this.
Not to mention the relatively cheap 64GB+ of VRAM, since the GPU shares the main memory.


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@HAL9000COM thanks for the Vega 8 benchmarks! Quick question: Is your RAM 2x16GB DDR4-3200MT/s? And do you have an idea why the GPU shows up 3 times?


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

Is it possible to add a ready-to-run benchmark for macOS so we can get more results on Mac? Especially since the test is bandwidth-limited and Apple silicon should be good at this. Not to mention the relatively cheap 64GB+ of VRAM, since the GPU shares the main memory.

@edmond1992 unfortunately I don't have a Mac, so I can't compile and add the executables for macOS. But the code should work as-is; just compile it with the third line in make.sh and you'll get the FP32 benchmark. Uncomment FP16S/FP16C in src/defines.hpp and recompile to get the other 2 benchmarks.
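
Switching between the three benchmarks means toggling a line in src/defines.hpp before each recompile, which can be scripted. A small Python sketch, assuming the switches are written as `#define NAME` lines that are disabled by a leading `//`; the helper names and the sample text are hypothetical, and the real file contains more switches and comments:

```python
import re

def enable_define(source: str, name: str) -> str:
    """Uncomment the first '//#define NAME' line; leave everything else untouched."""
    pattern = re.compile(rf"^([ \t]*)//[ \t]*(#define[ \t]+{re.escape(name)}\b)", re.MULTILINE)
    return pattern.sub(r"\1\2", source, count=1)

def disable_define(source: str, name: str) -> str:
    """Comment out the first active '#define NAME' line."""
    pattern = re.compile(rf"^([ \t]*)(#define[ \t]+{re.escape(name)}\b)", re.MULTILINE)
    return pattern.sub(r"\1//\2", source, count=1)

# Hypothetical excerpt standing in for src/defines.hpp
sample = "//#define FP16S\n//#define FP16C\n#define BENCHMARK\n"

fp16s_build = enable_define(sample, "FP16S")
print(fp16s_build)
```

To apply this to the real file, read src/defines.hpp into a string, pass it through one of the helpers, write it back, and then recompile as described above.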


edmond1992 avatar edmond1992 commented on August 26, 2024


SLGY avatar SLGY commented on August 26, 2024

GTX 1050 on an old gaming laptop. It's amazing I figured out how to even run this and get a benchmark. Now I'm going to try to figure out how to run the simulation on an STL (or similar) file. I know how to use Blender quite well, but this is my first time with Visual Studio or command line stuff. I'm so out of my depth here 😟

Screenshot (103)


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

Hi @SirWixy, thank you so much for the benchmarks! Can you post the FP16S and FP16C results too?


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@gittigittibangbang thanks for the benchmarks! For the CPU you can just stop it with Ctrl+C after it has leveled at constant performance, and take the last MLUPs/s reading. Can you post the program header with the Xeon Gold for the specs, and performance values for FP16S and FP16C too for the Xeon? Thanks!


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@gittigittibangbang last question: What is the RAM configuration on the Xeon Gold 5218? 8x 32GB DDR4 2667MT/s in quad-channel?


Michallote avatar Michallote commented on August 26, 2024

Hey, thanks so much for the help setting things up. I have run the benchmarks on my GPU. I was very curious to see what the performance would turn out to be. My GPU is an NVIDIA RTX 2060 KO, a version that uses higher-quality chips which didn't pass the test to become RTX 2080s. So the actual chip is a TU104 (same as the RTX 2080 and Quadro RTX 4000), unlike most RTX 2060s, which have a TU106. As everything else is the same, it could be a decent comparison of those graphics processors:

FP32-FP32
image

FP32-FP16S
image

FP32-FP16C
image

However, these results might be a bit lower than they should be, because the max bandwidth of this GPU is 336.0 GB/s and it ran at only about 250.0 GB/s. Does anybody know if this is normal? I had a couple of apps open. I might rerun this later with the PC completely unloaded. In the meantime we can see the difference between RTX 2060 and RTX 2060 Super is huge!


rodionstepanov avatar rodionstepanov commented on August 26, 2024

RTX 3090 Ti
Doc1.pdf
As we see, bandwidth does not exceed 873GB/s. However, the specification says it should be 1008GB/s at max.
Taking your estimate that a single lattice point requires 1241 (FP32/FP16C) FLOPs per time step, we obtain only 13.3 TFLOPs/s instead of 40. Am I right?


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

Hi @rodionstepanov, thanks for the 3090 Ti benchmarks!! FluidX3D is bandwidth bound, so it uses all* the available memory bandwidth, but only a small fraction of the available TFLOPs/s. If you compare 2 GPUs with the same bandwidth, for example 3060 Ti and 2060 Super (both 448GB/s), they will perform the same in FluidX3D, despite the 3060 Ti having >double the TFLOPs/s of the 2060 Super.

*You see only ~873GB/s instead of 1008GB/s because the Esoteric-Pull streaming algorithm I use requires some misaligned write operations that cannot run at full bandwidth. You're at 87% overall efficiency, which is very good already.
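
The 87% figure is measured bandwidth divided by data sheet bandwidth, and the bandwidth itself follows from the MLUPs/s reading. A hedged Python sketch of both conversions: the bytes-per-cell values are an inference from the MLUPs and Bandwidth columns of the logs in this thread (for example 1703 MLUPs/s next to 261 GB/s for FP32), not something stated in this comment, and all function names are made up for illustration:

```python
# Bytes moved per cell update; inferred from the logs in this thread,
# not taken from any official constant.
BYTES_PER_CELL = {"FP32": 153, "FP16S": 77, "FP16C": 77}

def bandwidth_gbs(mlups: float, precision: str) -> float:
    """Memory traffic in GB/s implied by a throughput in MLUPs/s."""
    return mlups * 1e6 * BYTES_PER_CELL[precision] / 1e9

def efficiency(measured_gbs: float, datasheet_gbs: float) -> float:
    """Fraction of the data sheet bandwidth that is actually reached."""
    return measured_gbs / datasheet_gbs

def roofline_mlups(datasheet_gbs: float, precision: str) -> float:
    """Upper bound on MLUPs/s if the full data sheet bandwidth were usable."""
    return datasheet_gbs * 1e9 / BYTES_PER_CELL[precision] / 1e6

# RTX 3090 Ti numbers from this exchange: 873 GB/s measured, 1008 GB/s data sheet
print(f"efficiency: {efficiency(873, 1008):.0%}")
# Two GPUs with the same 448 GB/s share the same bandwidth ceiling
print(f"ceiling at 448 GB/s (FP32): {roofline_mlups(448, 'FP32'):.0f} MLUPs/s")
```

Because the ceiling depends only on bandwidth and bytes per cell, two cards with equal bandwidth get the same bound regardless of their TFLOPs/s, which is the point made above about the 3060 Ti and 2060 Super.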

The alternative to Esoteric-Pull would be the One-Step-Pull streaming algorithm, which avoids all misaligned write operations and can actually reach 100% efficiency on modern GPUs. However, its drawbacks are that it a) requires double the VRAM capacity for the same grid resolution and b) needs to load flags of neighboring grid points during streaming, so overall performance is actually lower than with Esoteric-Pull despite better efficiency. See this paper for details.


rodionstepanov avatar rodionstepanov commented on August 26, 2024

@ProjectPhysX that is clear. I definitely prefer higher resolutions, so Esoteric-Pull is my choice. Since the GPU is underloaded, it could be reasonable to use a more sophisticated algorithm that requires more FLOPs per lattice point per step and does not need more bandwidth. For example, increased stability would be nice.


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@rodionstepanov the (relative to FLOPs) underperforming memory is a big problem across a lot of HPC software. Chip development has progressed much faster than memory development for over a decade now; the FLOPs/Byte ratio is ever increasing. Using the "spare" FLOPs to improve model accuracy without performance loss is a common strategy. I'm already leveraging that with FP16 memory compression, for a 2-8x increase in FLOPs/Byte by cutting memory access in half and using spare FLOPs for number conversion to the more accurate FP16C format. Still, it's all bandwidth-bound.
Another possibility with LBM is a more sophisticated collision operator. So far though, the simple SRT/BGK collision has proven best for both accuracy and stability. I'll look into cumulant and central moment operators in the future.


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@trparry thank you so much! That 80GB A100 absolutely shreds! Can you please post the FP16S and FP16C benchmarks as well?


IllesHUN avatar IllesHUN commented on August 26, 2024

RTX 3050 laptop GPU

Also, I can't find the place where the .stl files for the setups have to be. I've done everything, but it's always just ... .stl does not exist

D3Q9_FP32
D3Q9_FP16S
D3Q9_FP16C


IllesHUN avatar IllesHUN commented on August 26, 2024

Thanks, I was doing that already, but I found out what the problem was. I am using the provided setup for the F1 car, and it had two extra dots in front of the directory that I had to remove (it took 3 straight hours to notice it).

And I have another question. After I got the F1 car model voxelized and showing up visually, I started the simulation and noticed that the visualization on key 4 (isosurface, I think) didn't show anything visible to me, unlike when I tested the delta wing, which is built into the code and not an .stl file. I also noticed that the simulation time is moving much slower, and I don't know if something is wrong or if I shouldn't even expect the same results.

Thanks for all the help in advance. I have a middle school level of C# knowledge, so it's pretty hard to understand what's happening, but at least I'm not clueless. Also, I started discovering CFD basically 2 days ago, so sorry if I'm asking stupid questions.


kendrickxy avatar kendrickxy commented on August 26, 2024

The RTX 3080 Ti performed a little better than expected on FP16S:

FP16S:
SRT:
MBench_FS16S_D3Q19_SRT

TRT:
MBench_FS16S_D3Q19_TRT

FP16C:
SRT:
MBench_FS16C_D3Q19_SRT

TRT:
MBench_FS16C_D3Q19_TRT

Overall the same score with 256 grid resolution


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@ConfusedWizard gotcha, thanks for clarifying and providing stock benchmarks! So as expected the 4090 performs basically the same as a 3090 Ti.
It's only fair to always use the peak data sheet bandwidth to compute efficiency, as that's really the upper limit. In a memory bandwidth benchmark you get different bandwidth numbers for coalesced/misaligned read/write access, and usually only coalesced read/write gets close to peak, see here figure 22.


SLGY avatar SLGY commented on August 26, 2024

Hi ibonito1,

OpenCL support on EPYC CPUs is a bit difficult as these are not officially supported by AMD. Being x86-64, they should work with the Intel OpenCL CPU Runtime though, or alternatively with POCL. Fingers crossed! To run on a specific device, in the console run ./FluidX3D.exe 2 (on Linux) or FluidX3D.exe 2 (on Windows), to select device with ID 2 for example. You can just copy the console output here.

Regards, Moritz

Would it be possible to run this on Google Colab? I have a lot of credit on there and it would be good to run this on their A100 GPUs... I've tried, at a basic level, to use the gcc compiler on there after mounting the FluidX3D files in my Google Drive. I have tried compiling a few of the .cpp files to no avail and get different types of errors.
Or is there a way to compile FluidX3D in Visual Studio so that it can be run on Colab directly? (As far as I know, I can't run .exe files on Colab.)


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@SirWixy I've run it in Colab already to benchmark the Tesla T4. Make sure to have WINDOWS_GRAPHICS disabled and compile with ./make.sh. You might also have to enable UTILITIES_NO_CPP17 in src/utilities.hpp line 10 in case gcc there does not support C++17; this will disable automatic folder creation for exported files, so make sure to have the bin/export/ folder set up before running the setup, or else it won't write any files.


Shmarvadon avatar Shmarvadon commented on August 26, 2024

2x Intel Arc A770

image


sachithdickwella avatar sachithdickwella commented on August 26, 2024

FP32-FP16S on RTX 2070 (Mobile)

image


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@masazzz thank you very much, this is amazing hardware! While for single-GPU the performance is mostly independent of resolution, for multi-GPU it makes a bigger difference, and especially the high resolutions are of interest. At low resolution, the domain communication overhead is more significant compared to the computation of the domains themselves. So I'd expect quite a bit of improvement at larger resolution.

It feels like too much to ask, but would you mind benchmarking the 2x/4x/8x GPU configurations with "memory=39800u;" for the A100s and "memory=31800u;" for the V100s? At this higher resolution, you can reduce the loop iterations from 1000 to 80 so it doesn't take that long.
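
The effect of the memory setting on resolution, and why larger domains suffer less from communication, can be sketched in Python. The edge-length expression mirrors the one in the benchmark script quoted earlier in the thread (19 DDFs plus 17 further bytes per cell, rounded down to an even number). The `halo_fraction` ratio is only a simple surface-to-volume illustration and an assumption on my part, not the actual communication cost model, and both function names are hypothetical:

```python
def cube_edge(memory_mb: int, bytes_per_ddf: int) -> int:
    """Largest even edge length L such that L**3 cells fit into memory_mb.

    Mirrors the expression in the benchmark script quoted earlier in the
    thread, ignoring its clamp to the maximum unsigned integer.
    """
    cells = memory_mb * 1048576 / (19 * bytes_per_ddf + 17)
    return int(cells ** (1.0 / 3.0)) // 2 * 2

def halo_fraction(edge: int) -> float:
    """Cells on one L x L interface relative to the L**3 cells of a cubic domain."""
    return edge**2 / edge**3

# 304 is the per-GPU edge of the 608 x 608 x 304 run on 2 x 2 x 1 domains
print(f"edge 304: interface/volume = {halo_fraction(304):.5f}")

for memory in (31800, 39800):
    edge = cube_edge(memory, 2)  # 2 bytes per DDF, i.e. FP16S/FP16C
    print(f"memory={memory}u: edge {edge}, interface/volume = {halo_fraction(edge):.5f}")
```

Under these assumptions each interface shrinks relative to the domain volume as 1/L, so going from an edge of 304 to one of several hundred more cells per GPU cuts the relative share of interface cells by well over half.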

Thank you so much!!


lgmnrx avatar lgmnrx commented on August 26, 2024

Hello, this benchmark was performed on an RX 6700M (130 W) with the card set to extreme performance. I still have doubts about the bandwidth used during the benchmark, as the card has 320GB/s of bandwidth. Let me know if I should redo the testing.
fp32
fp16c
fp16s


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@lgmnrx thank you very much! ~60% efficiency is typical for RDNA 1/2/3 GPUs. All good!


ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@illwieckz awesome, 512-bit memory bus FTW! Could you provide the FP16S/FP16C benchmarks as well please?

from fluidx3d.

ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@jpecar indeed, that is a good metric. Here are some useful charts of all benchmarked hardware so far:

Performance [MLUPs/s]
image

Memory efficiency (roofline model) [%]
image

Performance per Watt [MLUPs/s / W]
image

Performance per $ (launch price) [MLUPs/s / $]
image

Value [MLUPs/s * memory capacity / (W * $)]
image

from fluidx3d.

ProjectPhysX avatar ProjectPhysX commented on August 26, 2024

@skittles-fivem thank you for the 4070 FP16C benchmark! Could you also post the FP32 and FP16S benchmarks please?
Thanks!!

from fluidx3d.
