
Comments (2)

jcupitt avatar jcupitt commented on June 9, 2024

Hello again @homm,

I think the problem is that user time can be very misleading. If the CPU has hyperthreading (and the library is threaded), it will include the time that paired threads spent stalled on the shared ALU. This means it can overestimate the actual computation done by the library by a factor of two.

For example, on this terrible two-core, four-thread laptop I see:

$ time ./vips-c tmp/x.tif tmp/x2.tif 
real	0m0.787s
user	0m2.800s
sys	0m0.108s

Then if I set the size of the libvips threadpool to 1:

$ export VIPS_CONCURRENCY=1
$ time ./vips-c tmp/x.tif tmp/x2.tif 
real	0m1.597s
user	0m1.592s
sys	0m0.068s

User time roughly halves, even though the same amount of calculation is happening.

My idea instead was to include a one-thread time for libvips -- this should give a more accurate measure of actual computation time. It would be interesting to add similar figures for the other libraries, but it would need some work to figure out how to turn off threading for all of them.
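To automate that comparison, one could time the child process and read its user time from the OS. This is only a sketch, not vips-bench's actual harness: `time_command` is a hypothetical helper, and the `./vips-c` invocation and `VIPS_CONCURRENCY` variable are the ones from the session above.

```python
import os
import resource
import subprocess
import time

def time_command(argv, extra_env=None):
    """Run a command and return (real, user) seconds; user time covers
    the child process and all threads it spawned."""
    env = dict(os.environ, **(extra_env or {}))
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    start = time.perf_counter()
    subprocess.run(argv, env=env, check=True)
    real = time.perf_counter() - start
    user = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime - before
    return real, user

if __name__ == "__main__" and os.path.exists("./vips-c"):
    # Default threadpool, then a single thread via VIPS_CONCURRENCY=1.
    print(time_command(["./vips-c", "tmp/x.tif", "tmp/x2.tif"]))
    print(time_command(["./vips-c", "tmp/x.tif", "tmp/x2.tif"],
                       {"VIPS_CONCURRENCY": "1"}))
```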

from vips-bench.

homm avatar homm commented on June 9, 2024

it will include the time that paired threads spent stalled on the shared ALU

I believe this number is still interesting. For example, if I have a two-core CPU with hyperthreading and I see that CPU time is 3.7x the real time, I know the whole CPU is busy and there is no room to run things faster.
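As a quick illustration, this reading can be applied to the laptop figures from the first session above (a back-of-the-envelope sketch; the four logical CPUs come from the "two-core, four-thread" description):

```python
# Rough CPU-saturation estimate from time(1) output: total CPU time
# divided by (real time x number of logical CPUs).
real, user, sys_ = 0.787, 2.800, 0.108   # threaded libvips run above
logical_cpus = 4                         # two cores, four hyperthreads
utilization = (user + sys_) / (real * logical_cpus)
print(f"{utilization:.0%}")  # ~92%: almost no headroom left
```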

On the other hand, you are perfectly right that a CPU time of 2.9s next to 1.66s looks 1.75x slower, while in practice the real time could be the same.

Why would we ever want to know execution time on a single core? Because the other cores will be free for other tasks. So the really interesting metric is: "I have a CPU and a lot of tasks (all identical, for the benchmark's simplicity). What maximum throughput can I get using this library?"

My idea instead was to include a one-thread time for libvips

This may be close to the right indication, but it is far from answering that question, because there are additional factors:

  1. Multithreaded libraries can have overhead which is not counted in single-thread mode.
  2. Some libraries can benefit from hyperthreading (for example ImageMagick), while others can't, like Pillow-SIMD, which uses heavy SSE4 and AVX2 instructions and loads the core's ALU to 100%.
  3. Even when each CPU core is separate, the cores still share resources which can be exhausted: memory bandwidth, L3 cache, OS locks.

So, in general, the formula "performance on single core × number of cores" doesn't work.

The only solution I see is to run "number of cores" tasks simultaneously. For example, on a 4-core i5-4430:

# Three sequential runs
$ time ./vips.py x.tif && time ./vips.py x.tif && time ./vips.py x.tif
real  0m0.318s
user  0m0.752s
sys   0m0.076s

# Three parallel runs
$ time sh -c "./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & wait" &&\
  time sh -c "./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & wait" &&\
  time sh -c "./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & ./vips.py x.tif & wait"
real  0m0.895s
user  0m3.184s
sys   0m0.300s

# Three sequential runs
$ time ./pillow.py x.tif && time ./pillow.py x.tif && time ./pillow.py x.tif
real  0m0.221s
user  0m0.156s
sys   0m0.060s

# Three parallel runs
$ time sh -c "./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & wait" &&\
  time sh -c "./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & wait" &&\
  time sh -c "./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & ./pillow.py x.tif & wait"
real  0m0.388s
user  0m1.112s
sys   0m0.404s
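Reading those timings as throughput (a back-of-the-envelope calculation from the representative real times shown above; each sequential run processes one image, each parallel run four):

```python
# Images per second implied by the i5-4430 timings above.
vips_seq   = 1 / 0.318   # libvips, one run at a time (already multithreaded)
vips_par   = 4 / 0.895   # libvips, four runs in parallel
pillow_seq = 1 / 0.221   # Pillow, one run at a time (single-threaded)
pillow_par = 4 / 0.388   # Pillow, four runs in parallel
for name, v in [("vips seq", vips_seq), ("vips par", vips_par),
                ("pillow seq", pillow_seq), ("pillow par", pillow_par)]:
    print(f"{name}: {v:.1f} images/s")
```

Roughly, saturating the four cores raises Pillow's throughput about 2.3x, while libvips gains only about 1.4x, since libvips was already spreading one run across all cores.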

from vips-bench.
