Comments (7)
Another potential solution: https://github.com/ospray/tsimd
from fluid-engine-dev.
There is the bullet physics math library which has SIMD extensions... https://github.com/bulletphysics/bullet3/tree/master/src/LinearMath
Older versions of Bullet Physics used the "Sony Vector Math Library", but it looks like Bullet consolidated it into what is above... Similar goals though; I see lots of SSE.
Sony Vector Math lib here:
https://github.com/erwincoumans/sce_vectormath
Support for PPC (ppu/ directory), Intel (SSE/ directory), the SPU chip in the PS3 (spu/ directory), and a CPU-only version (scalar/cpp).
Thanks for the resources! I will take a look.
The Sony one is very similar to ones I've used on Xbox 360 and PlayStation 3 in AAA titles. We are fortunate that Sony open-sourced it. It's pretty rare to find a production-quality, cross-platform math lib with native vector intrinsics / SoA.
Here is one post from when Bullet Physics dropped the Sony library around 2015. I haven't compared the two math libs, but maybe the Bullet team had a reason, or maybe the Bullet math is more tailored to Bullet's needs...
Often much of the SIMD speedup doesn't come from single vector operations. True, a vector maps to a SIMD register nicely, but the operations often do not: cross and dot products are not especially SIMD-friendly. The really good speedups come from processing 4-8 particles at a time, and an SoA data layout helps a lot there. I would expect particles to map quite well onto that. If you guys know more, I would love to hear it.
He's right, you definitely have to be careful. Think of scalar floating point and vector math as running on separate units inside the CPU (the FPU and the vector unit): if you use results from one in the other, you suffer a penalty (called a Load-Hit-Store), basically some code execution latency (a CPU pipeline stall) while the CPU marshals the data over to the other unit.
The strategy with SIMD is to keep the work on the vector unit: when a scalar is needed in a calculation, use a SIMD "scalar" type, where you're only using the X component of the 4-wide vector. There are certainly cases where you can do clever things to process 4 particles at once in the X/Y/Z/W components to get 4x, as giordi91 says.
Executing math functions sequentially on large, memory-coherent arrays of vectors is the best case, as giordi91 says.
Another speedup tip is to avoid conditionals. Sure, branch prediction is fast, but even faster is no conditional. Often you can have a SIMD "boolean" (just a floating-point 0 or 1) that you multiply in a very basic equation:
result = pickMe * myBoolean + orPickMe * (1 - myBoolean);
When myBoolean is 1 you get pickMe, and when myBoolean is 0 you get orPickMe...
And my final advice is to take advantage of the CPU's parallel pipelines. Interleave non-dependent operations so that while one of the hidden pipelines in your CPU is working on one operation, you can keep the other pipelines full:
Vec4 someResult1 = a.someMath();
Vec4 someResult2 = b.someMath(); // can pipeline into the CPU 'while' the 1st one is running
Vec4 someResult3 = c.someMath(); // can pipeline into the CPU 'while' the 1st two are running
/* use someResult1 */
/* use someResult2 */
/* use someResult3 */
You'd think the compiler would be smart enough, and sometimes it is. But sometimes it isn't... I've seen gains just by rearranging the order of my code, so that tells me this is worth knowing about.
Thanks for the great input, @giordi91 and @subatomicglue!
I haven't spent much time on this topic lately, but as @giordi91 mentioned, I also think batch processing of particles (4~8 in a bundle) would be nicer. That becomes a little bit tricky when dealing with SPH operators, which are essentially for each neighbor { ... } loops, since the neighbor access is likely to be unordered and random.
Grid-based/hybrid simulations could be a bit more straightforward than the SPH solvers. The main perf bottleneck is the pressure Poisson solver, which is basically a combination of BLAS-style calls (mat x vec, axpy, and similar). So I think vectorizing the Fdm* solvers could bring some meaningful perf enhancement. Actually, it would be great to see some contributions in this area, since I'm mostly focusing on GPGPU at the moment.