Comments (2)

grlee77 commented on July 23, 2024

Hi @JHancox, thanks for reporting this. Can you specify the shape and dtype of imgrid?

Unfortunately, PiecewiseAffineTransform is an outlier in cuCIM: it does not currently have a proper GPU implementation and will be faster on the CPU. We should consider printing a warning at runtime and adding a Note to this effect in the docstring, or removing it from the library. It currently has to copy data to the CPU to run scipy.spatial.Delaunay, for which CuPy has no GPU implementation.

warp should be faster on the GPU if the image is sufficiently large, but in this case inverse_map is a PiecewiseAffineTransform callable rather than a cupy.ndarray, so it will be slow because of that CPU round trip.

In general, for warp, if you are able to supply inverse_map as a cupy.ndarray instead of a callable, and the image is not too small, the GPU should be faster. A quick rule of thumb is that the CPU is expected to be faster if an image is very small, like (256, 256) (especially if it fits in the L1 cache of the CPU). For medium sizes such as (512, 512) or (1024, 1024), the GPU should start to become faster. Above several MB in size, the GPU should be much faster. For the GPU, it is also beneficial to ensure that the input is single precision, to avoid the relatively slow double-precision arithmetic on the GPU.
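To make the sizes in that rule of thumb concrete, here is a small sketch that works out the memory footprint of the image shapes mentioned above in single and double precision. It assumes a single-channel image; image_nbytes is a helper name made up for this illustration, not part of cuCIM or CuPy.

```python
def image_nbytes(shape, itemsize):
    """Return the size in bytes of a dense array with this shape and item size."""
    n = itemsize
    for dim in shape:
        n *= dim
    return n


FLOAT32, FLOAT64 = 4, 8  # bytes per element

# Shapes taken from the rule of thumb above (single channel assumed).
for shape in [(256, 256), (512, 512), (1024, 1024)]:
    kib32 = image_nbytes(shape, FLOAT32) / 1024
    kib64 = image_nbytes(shape, FLOAT64) / 1024
    print(f"{shape}: float32 = {kib32:.0f} KiB, float64 = {kib64:.0f} KiB")
```

So a (1024, 1024) single-precision image is 4 MiB, which is about where "several MB" starts, and converting a double-precision input to single precision halves the amount of data to move.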

I don't doubt that the GPU is slower here, but I wanted to mention that using a timer for the comparison has a couple of potential pitfalls to be aware of:

  • GPU times will be much slower the first time a function is called because any kernels get compiled and cached (fortunately, this .cubin cache is persistent on disk across program runs, so this is a one-time cost).
  • GPU times can be misleadingly short in some cases where synchronization may not have been performed, so it is best to explicitly call cupy.cuda.Device().synchronize() before checking the final time to make sure the kernels have completed.
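If you do want to time things by hand, both pitfalls can be handled with a few warm-up calls and an explicit synchronization before reading the clock. The following is only a minimal sketch: time_call and its sync parameter are hypothetical names for this illustration, and the example at the bottom times a plain CPU function so that it runs without a GPU.

```python
import time


def time_call(fn, *args, sync=None, n_warmup=3, n_repeat=20, **kwargs):
    """Return the mean wall-clock seconds per call of fn(*args, **kwargs).

    sync is an optional zero-argument callable that blocks until pending
    device work has finished. For CuPy it could be
    cupy.cuda.Device().synchronize; leave it as None for CPU-only code.
    """
    # Warm-up calls absorb one-time costs such as kernel compilation.
    for _ in range(n_warmup):
        fn(*args, **kwargs)
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(n_repeat):
        fn(*args, **kwargs)
    # Wait for queued work to finish before reading the clock.
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return elapsed / n_repeat


# CPU-only usage: no synchronization hook is needed.
mean_seconds = time_call(sorted, list(range(1000)))
print(f"sorted: avg time = {mean_seconds:.2e} s")
```

For the GPU case you would pass the cuCIM function and cupy.cuda.Device().synchronize as sync. This measures host wall-clock time only, with no separate GPU timings.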

To handle the above issues automatically, CuPy provides a benchmark timing utility that can be used like this:

import cupy as cp
from cupyx.profiler import benchmark

# warp, cu_warp, imgrid, tform and cu_tform are the names used in the
# original report and are not defined here.

perf_cpu = benchmark(
    warp,
    args=(imgrid, tform),
    kwargs=dict(output_shape=(255, 255)),  # the shape must be passed as a tuple
    n_warmup=10,
    n_repeat=10000,
    max_duration=5)  # cap at 5 seconds duration
print(f"warp: avg CPU time = {perf_cpu.cpu_times.mean()}")


# copy the input image to the GPU once, outside the timed region
cu_imgrid = cp.array(imgrid)

perf_gpu = benchmark(
    cu_warp,
    args=(cu_imgrid, cu_tform),
    kwargs=dict(output_shape=(255, 255)),
    n_warmup=10,
    n_repeat=10000,
    max_duration=5)  # cap at 5 seconds duration
print(f"warp: avg GPU time = {perf_gpu.gpu_times.mean()}")

JHancox commented on July 23, 2024

Thanks for the details, @grlee77. In this case the image was 256 x 256, but I will try larger images and see what happens. Thanks for the tip on the timeit; you are quite right. Often there is some implicit memory synchronization involved anyhow, but I should be explicit about it.
