GithubHelp home page GithubHelp logo

cjsatuforc / gpu_fftw Goto Github PK

View Code? Open in Web Editor NEW

This project forked from f5oeo/gpu_fftw

0.0 1.0 0.0 109 KB

Run FFTW3 programs with Raspberry Pi GPU - fast ffts!

License: Other

Python 8.77% Makefile 4.62% C 76.56% C++ 10.05%

gpu_fftw's Introduction

Run FFTW3 programs with Raspberry Pi GPU

๐ŸŽ What you get

Use the Raspberry Pi GPU to calculate fast FFTs without changing source code. Binary compatibility with fftw3 programs.

โš ๏ธ This is an early alpha release, it is not production quality. As always, use at your own risk.

๐Ÿ Quickstart

Make sure you have fftw3 installed first.

git clone https://github.com/gpu-fftw/gpu_fftw.git
cd gpu_fftw
make

After make finishes run the following to make sure everything is working:

sudo ./gpu_fftw -D 0 -t

You should see something like this:

gpu_fftw - Version 0.1.1-3-gcd19

GPU FFT forward/reverse error = 0.680192ppm (nrms error)
GPU_FFTW/FFTW difference = 4.82646ppm (nrms error)
GPU FFTW 7.11652 times faster (11523.3 ffts/sec, 86.781 usec/fft, fftw3: 1619.23 ffts/sec)

Override FFT3W...PASSED
Test suite passed.

Notice for this run, GPU_FFTW ran 7 times faster than fftw3, it will vary with cpu usage, fft size, etc. Then you can run your program with:

sudo ./gpu_fftw -D 1 -d myfftprogram <args...>

๐Ÿ” Why?

Thanks to the work of Andrew Holme we can now have fast GPU aided FFTs on the Raspberry Pi. They can be up to ten times faster than running fftw3 by itself. However, in order to use the GPU we have to write specialized code that makes use of the GPU_FFT api, and many programs that are already written do not use this api.

Most programs use the fftw3 library, which is one of the best available libraries for fft computation on a regular cpu. Even high-end mathematical programs like octave and matlab use fftw3.

This is where the idea of GPU_FFTW originated. The idea is to have binary compatibility with fftw3. The goal is to simply install gpu_fftw and let your programs take advantage of the GPU.

This is not easy to do, so there are some limitations, make sure you read the [limitations] section.

๐Ÿ‘€ Limitations

  • Float precission: For now, Andrew's work only supports float precision. Many users typically use fftw3 with double precision. Hopefully Andrew will add support for double precision to his work.

    To attenuate this problem, gpu_fftw supports double squashing which allows you to compute a float based fft on the GPU even if the user requested a double precision fft. This degraded precision may or may not work with your particular appplication. To enable double squashing use the -d option.

  • N has to be a power of two: N for the gpu fft is limited to powers of two, Log2(N) has to be in the range [8,21]. gpu_fftw falls back to fftw3 if N is not a power of two or if it is outside the supported range.

  • 2d, 3d and real FFTs are not supported yet

  • Fortran: A basic fortran api is provided for programs that have been compiled with GNU fortran. Other compilers may be added later if anyone asks.

  • Auto-fallback: If there is any problem starting the GPU fft (e.g. lack of memory or permissions) gpu_fftw automatically falls back to fftw3.

  • Array copying: gpu_fftw copies the data arrays back and forth. It does it in the fastest way possible, but still needs more memory than fftw3. This will be fixed later on when we provide overrides for the fftw3 malloc family of functions.

๐Ÿ’ก TODO

  • Fortran programs are not working (segfault), Fortran API needs to be added
  • fftw_malloc family can be overriden to avoid copying arrays
  • Test cases/benchmarks for double squashing
  • Support double squashing
  • r2c and c2r API missing, should we add it?
  • 2d API
  • Drop root permissions asap
  • Fortran API for compilers other than GNU fortran
  • support for fftw3_threads, so that octave works

gpu_fftw's People

Contributors

f5oeo avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.