Comments (5)
Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards).
I think there's some problem with the CPU threading: in htop I see very few cores busy. I don't know whether it's bad load balancing, too many memory allocations causing lots of GC pauses, or something else; I haven't looked at the code.
Also I understand that the p=8 jelly case does not fit in the GPU?
No, on the GPU it works fine; if I remember correctly it takes about 50 seconds. It's the CPU version which is unbearably slow: I think it was going to take over 20 minutes with 12 threads, and probably not much less with more threads, given the generally bad scaling on CPU, so I didn't want to spend more than 2 hours to get the results.
from waterlily.jl.
Hey! I updated the benchmark suite, and now you need to explicitly pass "case arguments" for all requested cases. For example:
bash "${WATERLILY_ROOT}/benchmark/benchmark.sh" -v "1.10.0" -t "12 24 36 48 60 72" -c "tgv jelly" -p "5,6,7,8 5,6,7,8" -s "100 100" -ft "Float32 Float32"
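To make the pairing of "case arguments" concrete, here is a minimal sketch of how the space-separated lists in the command above line up, one entry per case. It only illustrates the calling convention as it appears in that command; `pair_case_args` is a hypothetical helper and says nothing about how `benchmark.sh` actually parses its flags.

```shell
#!/bin/sh
# Illustration only: pair the i-th entry of each list with the i-th case.
# pair_case_args is a hypothetical helper, not part of benchmark.sh.

pair_case_args() {
    cases=$1; plist=$2; slist=$3; ftlist=$4
    i=1
    for c in $cases; do
        # Pick the i-th whitespace-separated entry of each list
        p=$(echo "$plist" | awk -v i="$i" '{ print $i }')
        s=$(echo "$slist" | awk -v i="$i" '{ print $i }')
        ft=$(echo "$ftlist" | awk -v i="$i" '{ print $i }')
        echo "$c: -p $p -s $s -ft $ft"
        i=$((i + 1))
    done
}

pair_case_args "tgv jelly" "5,6,7,8 5,6,7,8" "100 100" "Float32 Float32"
```

Each case gets its own comma-separated `-p` list, so the cases can be run at different sizes by changing only the corresponding entry.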
Thanks!
For the record, with #101 and running
# Get Waterlily root directory
WATERLILY_ROOT=$(julia --project=. --startup-file=no -e 'using WaterLily; print(pkgdir(WaterLily))')
# Run the benchmarks. jelly only up to log2p=7 because the case log2p=8 is superslow on CPU
"${WATERLILY_ROOT}/benchmark/benchmark.sh" -v "1.10" -t "12 24 36 48 60 72" -b "Array CuArray" -c "tgv jelly" -p "5,6,7,8 5,6,7" -s "100 100" -ft "Float32 Float32"
on Nvidia GH200 I get
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 10050707 │ 2.03 │ 1.42 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 16226723 │ 2.24 │ 1.89 │ 0.75 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 21577295 │ 2.48 │ 2.51 │ 0.56 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 25294522 │ 2.53 │ 3.07 │ 0.46 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 28977646 │ 2.69 │ 3.34 │ 0.42 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 32296306 │ 2.80 │ 4.07 │ 0.35 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 5291894 │ 1.70 │ 1.18 │ 1.20 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 11067020 │ 1.48 │ 3.65 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 17929605 │ 2.01 │ 4.03 │ 0.91 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 24063681 │ 2.34 │ 4.71 │ 0.78 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 29679847 │ 2.64 │ 5.48 │ 0.67 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 35276190 │ 2.77 │ 6.12 │ 0.60 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 40603811 │ 2.82 │ 7.30 │ 0.50 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 4838973 │ 1.60 │ 1.07 │ 3.40 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 7600713 │ 1.71 │ 9.53 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 13186184 │ 2.94 │ 8.10 │ 1.18 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 18532520 │ 3.52 │ 8.14 │ 1.17 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 23642594 │ 4.22 │ 8.50 │ 1.12 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 28747149 │ 4.64 │ 8.85 │ 1.08 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 33606789 │ 5.02 │ 9.84 │ 0.97 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 3752082 │ 1.38 │ 1.05 │ 9.04 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 8
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 7325711 │ 2.33 │ 69.45 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 13016663 │ 4.09 │ 55.60 │ 1.25 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 18530143 │ 5.54 │ 52.38 │ 1.33 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 23822965 │ 7.33 │ 52.14 │ 1.33 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 29067229 │ 8.88 │ 52.25 │ 1.33 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 34308861 │ 10.52 │ 54.07 │ 1.28 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 3649091 │ 0.46 │ 3.24 │ 21.40 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 15469230 │ 1.37 │ 3.29 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 25696182 │ 2.37 │ 3.95 │ 0.83 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 34962198 │ 2.46 │ 5.10 │ 0.64 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 42857578 │ 2.53 │ 6.32 │ 0.52 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 50564026 │ 2.70 │ 7.14 │ 0.46 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 57486610 │ 2.57 │ 8.73 │ 0.38 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 8018123 │ 2.17 │ 1.57 │ 2.10 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 20598523 │ 0.78 │ 12.15 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 35587951 │ 1.58 │ 12.01 │ 1.01 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 49779678 │ 2.31 │ 13.38 │ 0.91 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 63047703 │ 2.80 │ 14.95 │ 0.81 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 76194003 │ 3.16 │ 16.38 │ 0.74 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 88376271 │ 3.24 │ 19.03 │ 0.64 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 10491694 │ 1.87 │ 2.22 │ 5.47 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│ CPUx12 │ 43e5784 │ 1.10.0 │ Float32 │ 31778861 │ 1.15 │ 111.33 │ 1.00 │
│ CPUx24 │ 43e5784 │ 1.10.0 │ Float32 │ 55873553 │ 1.99 │ 102.58 │ 1.09 │
│ CPUx36 │ 43e5784 │ 1.10.0 │ Float32 │ 79130929 │ 1.97 │ 100.16 │ 1.11 │
│ CPUx48 │ 43e5784 │ 1.10.0 │ Float32 │ 101121788 │ 2.19 │ 102.71 │ 1.08 │
│ CPUx60 │ 43e5784 │ 1.10.0 │ Float32 │ 122910896 │ 2.31 │ 103.50 │ 1.08 │
│ CPUx72 │ 43e5784 │ 1.10.0 │ Float32 │ 144187652 │ 2.31 │ 107.25 │ 1.04 │
│ GPU │ 43e5784 │ 1.10.0 │ Float32 │ 16077560 │ 1.02 │ 6.44 │ 17.30 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
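For anyone who wants to read the scaling off the tables, the Speed-up column appears to be the CPUx12 time divided by each row's time. A rough sketch of that calculation, plus a parallel efficiency figure (speed-up divided by the thread ratio), is below. The helper names `speedup` and `efficiency` are made up for illustration, and because the inputs are the rounded times printed in the table, results can differ slightly from the reported column.

```shell
#!/bin/sh
# Sketch: speed-up and parallel efficiency relative to the 12-thread run.
# speedup and efficiency are hypothetical helper names for illustration.

# speedup REF_TIME TIME -> REF_TIME / TIME
speedup() {
    awk -v ref="$1" -v t="$2" 'BEGIN { printf "%.2f\n", ref / t }'
}

# efficiency REF_TIME REF_THREADS TIME THREADS
# -> speed-up divided by the increase in thread count
efficiency() {
    awk -v ref="$1" -v nref="$2" -v t="$3" -v n="$4" \
        'BEGIN { printf "%.2f\n", (ref / t) * (nref / n) }'
}

# tgv, log2p = 8: times in seconds copied from the table above
ref=69.45
for entry in 24:55.60 36:52.38 48:52.14 60:52.25 72:54.07; do
    n=${entry%%:*}   # thread count before the colon
    t=${entry#*:}    # time after the colon
    echo "CPUx$n speed-up=$(speedup "$ref" "$t") efficiency=$(efficiency "$ref" 12 "$t" "$n")"
done
```

An efficiency of 1.00 would mean the run time dropped in proportion to the added threads; values well below that indicate the extra threads are mostly idle or contending.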
Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards). Also I understand that the p=8 jelly case does not fit in the GPU? The size of the jelly simulation is N=(2^p)*(2^p)*(4*2^p), so for p=8, N≈67e6. The TGV is just N=(2^p)^3, which for p=8 results in N≈17e6. And thank you very much for the benchmarks, @giordano!
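The cell counts can be checked with plain shell arithmetic. This is a small sketch using the two size formulas quoted in this comment; `tgv_cells` and `jelly_cells` are hypothetical helper names, not WaterLily functions.

```shell
#!/bin/sh
# Cell counts for the two benchmark cases as a function of p.
# tgv_cells and jelly_cells are hypothetical helpers for illustration.

tgv_cells() {
    n=$((1 << $1))            # 2^p points per direction
    echo $((n * n * n))       # N = (2^p)^3
}

jelly_cells() {
    n=$((1 << $1))            # 2^p
    echo $((n * n * 4 * n))   # N = (2^p)*(2^p)*(4*2^p)
}

for p in 5 6 7 8; do
    echo "p=$p tgv=$(tgv_cells "$p") jelly=$(jelly_cells "$p")"
done
```

At every p the jelly domain has four times as many cells as the TGV domain, which is worth keeping in mind when comparing the two sets of timings.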