blackwer / fft_bench

More benchmarks of various fft implementations

License: GNU General Public License v3.0

CMake 26.29% C++ 30.02% Python 16.12% Shell 27.58%

fft_bench's Introduction

Benchmarking various FFT implementations

FFTW3 is run in FFTW_MEASURE mode. FFTW_PATIENT takes far too long to plan, and I wanted to give FFTW a fair shot with the default. It's compiled with

./configure --enable-openmp --enable-shared=yes --enable-sse2 --enable-avx \
    --enable-avx2 --enable-avx512 --disable-kcvi --disable-vsx \
    --disable-avx-128-fma --enable-fma

using gcc 12.2.0. All benchmarks are compiled with native instructions enabled: -march=native -O3.

Library   MKL       FFTW    KISS     Pocket    DUCC    Sleef
Version   2023.1.0  3.3.10  131.1.0  81d171a6  0.32.0  3.5.1

Single-threaded results

1D

2D

3D

Multithreaded results

All benchmarks are run on a single socket, using all available cores on that socket. I.e.

# Rome: 64 cores
# Skylake: 20 cores
# Icelake: 32 cores

export OMP_NUM_THREADS=$(( $(nproc) / 2 ))
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

taskset -c 0-$((OMP_NUM_THREADS-1)) blah_bench args
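The thread count and pinning range above follow from simple arithmetic. As a minimal sketch in Python, assuming a dual-socket node with SMT disabled (so half of the logical CPUs form one socket) and contiguous CPU numbering on the first socket; the function name `socket_pinning` is hypothetical and not part of this repo:

```python
def socket_pinning(total_cpus, sockets=2):
    """Return (thread_count, taskset_range) for pinning to the first socket.

    Assumes the first socket owns CPUs 0..N-1 contiguously. That is an
    assumption about the machine; this function does not check it.
    """
    if total_cpus < sockets:
        raise ValueError("need at least one CPU per socket")
    threads = total_cpus // sockets
    return threads, f"0-{threads - 1}"


# Example: a dual-socket Rome node reporting 128 logical CPUs.
threads, cpu_range = socket_pinning(128)
print(f"export OMP_NUM_THREADS={threads}")        # export OMP_NUM_THREADS=64
print(f"taskset -c {cpu_range} blah_bench args")  # taskset -c 0-63 blah_bench args
```

On a machine with SMT enabled or a different CPU numbering scheme, the range would need to be taken from the actual topology rather than computed this way.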

Note that KISS can be built with OpenMP, but I didn't do a separate build for it. Given its performance, I am happy to ignore it. So the following benchmarks are just for MKL and FFTW3.

1D

2D

Note that the AMD (Rome) measurements are not in error. This really happens consistently. MKL is very unhappy with more than 16 threads for these particular sizes in 2D.

3D

fft_bench's People

Contributors

blackwer, mreineck

fft_bench's Issues

ducc FFT support?

Would you be interested in adding ducc FFT to the implementations? I think I could provide a PR fairly easily.

Suggestions for benchmark plots, and some ducc news

Especially when looking at the 1D benchmark plots, it is fairly difficult to estimate the quantitative differences between the FFT implementations, because the y-axis covers so many orders of magnitude. I wonder whether it would be clearer to plot "wallclock time per grid point" instead of "wallclock time per full FFT"? The cuFinufft paper has done this, as far as I remember.
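The proposed normalization simply divides the wallclock time of one transform by its number of grid points. A minimal sketch in Python, assuming timings are available in seconds per full FFT; the function name and the example timing are hypothetical, chosen only for illustration:

```python
import math


def time_per_grid_point(wall_time_s, shape):
    """Wallclock time of one full FFT divided by its number of grid points."""
    n_points = math.prod(shape)
    if n_points <= 0:
        raise ValueError("shape must contain only positive extents")
    return wall_time_s / n_points


# Illustrative value only, not a measurement from this repo:
# a 64 x 64 x 64 transform assumed to take 0.5 s.
print(f"{time_per_grid_point(0.5, (64, 64, 64)):.3e} s per grid point")
```

Plotting this quantity keeps the curves of different implementations within a much narrower range on the y-axis than the raw time per transform, which is the point of the suggestion.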

Concerning ducc FFT, I have been working on further reducing the multithreading overhead for small transforms, as well as improving large multithreaded 1D transforms, both with fairly good success. I'm plotting the run time changes on my laptop below (using the proposed new y-axis).

The only drawback of the change is that the new ducc version would require C++20, and I'm not sure how many users will be sufficiently adventurous to switch to such a recent standard...

[Edit: updated figures]

Figures: 1d_c2c_st_marvin, 2d_c2c_st_marvin, 3d_c2c_st_marvin, 1d_c2c_mt_marvin, 2d_c2c_mt_marvin, 3d_c2c_mt_marvin
