
Comments (5)

giordano commented on September 29, 2024

> Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards).

I think there's a problem with the CPU threading: in htop I see very few cores busy. I don't know whether it's bad load balancing, too many memory allocations causing lots of GC pauses, or something else; I haven't looked at the code.

> Also I understand that the p=8 jelly case does not fit in the GPU?

No, on the GPU it works fine; if I remember correctly it takes about 50 seconds. It's the CPU version that is unbearably slow: I think it was going to take over 20 minutes with 12 threads, and probably not much less with more threads given the generally bad scaling on CPU, so I didn't want to spend more than 2 hours to get the results.

from waterlily.jl.

b-fg commented on September 29, 2024

Hey! I updated the benchmark suite, and now you need to explicitly pass "case arguments" for all requested cases. For example:

bash "${WATERLILY_ROOT}/benchmark/benchmark.sh"  -v "1.10.0" -t "12 24 36 48 60 72" -c "tgv jelly" -p "5,6,7,8 5,6,7,8" -s "100 100" -ft "Float32 Float32"
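
A rough illustration of how the per-case lists in the command above appear to line up, assuming each flag takes one space-separated entry per case, matched by position. This is only a sketch of that reading, not the actual parsing done by benchmark.sh; `nth` is a hypothetical helper.

```shell
#!/bin/sh
# Values copied from the example command above.
cases="tgv jelly"
log2p="5,6,7,8 5,6,7,8"
steps="100 100"
ftypes="Float32 Float32"

# nth LIST N: print the N-th space-separated word of LIST (hypothetical helper).
nth() { echo "$1" | cut -d' ' -f"$2"; }

# Walk the cases and pick the entry at the same position from every list.
i=1
for c in $cases; do
  printf '%s: -p %s  -s %s  -ft %s\n' \
    "$c" "$(nth "$log2p" "$i")" "$(nth "$steps" "$i")" "$(nth "$ftypes" "$i")"
  i=$((i + 1))
done
```

Under that assumption, a list with fewer entries than there are cases would leave the later cases without arguments, which is presumably why they now have to be given for every case.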


giordano commented on September 29, 2024

Thanks!

For the record, with #101 and running

# Get Waterlily root directory
WATERLILY_ROOT=$(julia --project=. --startup-file=no -e 'using WaterLily; print(pkgdir(WaterLily))')

# Run the benchmarks.  jelly only up to log2p=7 because the case log2p=8 is superslow on CPU
"${WATERLILY_ROOT}/benchmark/benchmark.sh"  -v "1.10" -t "12 24 36 48 60 72" -b "Array CuArray" -c "tgv jelly" -p "5,6,7,8 5,6,7" -s "100 100" -ft "Float32 Float32"

on an NVIDIA GH200 I get

Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    10050707 │   2.03 │     1.42 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    16226723 │   2.24 │     1.89 │     0.75 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    21577295 │   2.48 │     2.51 │     0.56 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    25294522 │   2.53 │     3.07 │     0.46 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    28977646 │   2.69 │     3.34 │     0.42 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    32296306 │   2.80 │     4.07 │     0.35 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     5291894 │   1.70 │     1.18 │     1.20 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    11067020 │   1.48 │     3.65 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    17929605 │   2.01 │     4.03 │     0.91 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    24063681 │   2.34 │     4.71 │     0.78 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    29679847 │   2.64 │     5.48 │     0.67 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    35276190 │   2.77 │     6.12 │     0.60 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    40603811 │   2.82 │     7.30 │     0.50 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     4838973 │   1.60 │     1.07 │     3.40 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │     7600713 │   1.71 │     9.53 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    13186184 │   2.94 │     8.10 │     1.18 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    18532520 │   3.52 │     8.14 │     1.17 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    23642594 │   4.22 │     8.50 │     1.12 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    28747149 │   4.64 │     8.85 │     1.08 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    33606789 │   5.02 │     9.84 │     0.97 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     3752082 │   1.38 │     1.05 │     9.04 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 8
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │     7325711 │   2.33 │    69.45 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    13016663 │   4.09 │    55.60 │     1.25 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    18530143 │   5.54 │    52.38 │     1.33 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    23822965 │   7.33 │    52.14 │     1.33 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    29067229 │   8.88 │    52.25 │     1.33 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    34308861 │  10.52 │    54.07 │     1.28 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     3649091 │   0.46 │     3.24 │    21.40 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    15469230 │   1.37 │     3.29 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    25696182 │   2.37 │     3.95 │     0.83 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    34962198 │   2.46 │     5.10 │     0.64 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    42857578 │   2.53 │     6.32 │     0.52 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    50564026 │   2.70 │     7.14 │     0.46 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    57486610 │   2.57 │     8.73 │     0.38 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │     8018123 │   2.17 │     1.57 │     2.10 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 6
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    20598523 │   0.78 │    12.15 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    35587951 │   1.58 │    12.01 │     1.01 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    49779678 │   2.31 │    13.38 │     0.91 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │    63047703 │   2.80 │    14.95 │     0.81 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │    76194003 │   3.16 │    16.38 │     0.74 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │    88376271 │   3.24 │    19.03 │     0.64 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │    10491694 │   1.87 │     2.22 │     5.47 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
▶ log2p = 7
┌─────────┬───────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────┐
│ Backend │ WaterLily │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Speed-up │
├─────────┼───────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────┤
│  CPUx12 │   43e5784 │ 1.10.0 │   Float32 │    31778861 │   1.15 │   111.33 │     1.00 │
│  CPUx24 │   43e5784 │ 1.10.0 │   Float32 │    55873553 │   1.99 │   102.58 │     1.09 │
│  CPUx36 │   43e5784 │ 1.10.0 │   Float32 │    79130929 │   1.97 │   100.16 │     1.11 │
│  CPUx48 │   43e5784 │ 1.10.0 │   Float32 │   101121788 │   2.19 │   102.71 │     1.08 │
│  CPUx60 │   43e5784 │ 1.10.0 │   Float32 │   122910896 │   2.31 │   103.50 │     1.08 │
│  CPUx72 │   43e5784 │ 1.10.0 │   Float32 │   144187652 │   2.31 │   107.25 │     1.04 │
│     GPU │   43e5784 │ 1.10.0 │   Float32 │    16077560 │   1.02 │     6.44 │    17.30 │
└─────────┴───────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────┘
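
For reading the tables: the Speed-up column looks like the CPUx12 time divided by each row's time (an inference from the numbers, not something confirmed by the benchmark code). A minimal sketch under that assumption, which also derives a parallel efficiency relative to 12 threads; `speedup` and `efficiency` are hypothetical helper names, and the inputs are the rounded times printed above, so results can differ slightly from the table.

```shell
#!/bin/sh
# speedup REF_TIME TIME: reference time divided by measured time.
speedup() {
  awk -v ref="$1" -v t="$2" 'BEGIN { printf "%.2f\n", ref / t }'
}

# efficiency REF_TIME REF_THREADS TIME THREADS:
# speed-up divided by the ideal speed-up (THREADS / REF_THREADS).
efficiency() {
  awk -v ref="$1" -v rn="$2" -v t="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", (ref / t) / (n / rn) }'
}

# tgv, log2p = 5: GPU (1.18 s) against CPUx12 (1.42 s)
speedup 1.42 1.18            # prints 1.20

# tgv, log2p = 8: CPUx72 (54.07 s) against CPUx12 (69.45 s)
speedup 69.45 54.07          # prints 1.28
efficiency 69.45 12 54.07 72 # prints 0.21
```

With six times the threads, an efficiency of about 0.21 means the run achieves roughly a fifth of the ideal speed-up.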


b-fg commented on September 29, 2024

Interesting how CPU multi-threading seems to not scale much with current case sizes (at least from 12 threads upwards). Also I understand that the p=8 jelly case does not fit in the GPU?

The size of the jelly simulation is N=(2^p)*(2^p)*(4*2^p), so for p=8, N≈67e6. The TGV is just N=(2^p)^3, which for p=8 results in N≈17e6.

And thank you very much for the benchmarks, @giordano!
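
The cell counts quoted in this comment can be checked with a few lines of shell arithmetic. This is only a sketch of the two formulas as written here; `tgv_cells` and `jelly_cells` are hypothetical helper names, not WaterLily functions.

```shell
#!/bin/sh
# tgv_cells P: N = (2^P)^3
tgv_cells() {
  n=$((1 << $1))          # 2^P via a bit shift
  echo $((n * n * n))
}

# jelly_cells P: N = (2^P) * (2^P) * (4 * 2^P)
jelly_cells() {
  n=$((1 << $1))
  echo $((n * n * 4 * n))
}

for p in 5 6 7 8; do
  printf 'p=%d  tgv N=%d  jelly N=%d\n' "$p" "$(tgv_cells "$p")" "$(jelly_cells "$p")"
done
```

For p=8 this gives 16777216 cells for the TGV and 67108864 for the jelly, matching the N≈17e6 and N≈67e6 figures, so the jelly case is four times larger at the same p.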


weymouth commented on September 29, 2024

