Comments (2)
Hi @JHancox, thanks for reporting this. Can you specify the shape and dtype of `imgrid`?
Unfortunately, `PiecewiseAffineTransform` is an outlier in cuCIM in that it does not currently have a proper GPU implementation and will be faster on the CPU. We should consider printing a warning at runtime and adding a Note to this effect in the docstring, or removing it from the library. It currently has to copy data to the CPU to run `scipy.spatial.Delaunay`, which CuPy does not have a GPU implementation for.
`warp` should be faster on the GPU if the image is sufficiently large, but in this case, with `inverse_map` being a `PiecewiseAffineTransform` callable rather than a `cupy.ndarray`, it will be slow for the reason above.
In general, for `warp`, if you are able to supply `inverse_map` as a `cupy.ndarray` instead of a callable and the image is not too small, the GPU should be faster. A quick rule of thumb is that the CPU is expected to be faster if an image is very small, like (256, 256) (especially if it fits in the CPU's L1 cache). For medium sizes such as (512, 512) or (1024, 1024), the GPU should start to become faster. Above several MB in size, the GPU should be much faster. On the GPU, it is also beneficial to ensure that the input is single precision to avoid relatively slow double-precision arithmetic.
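To put "several MB" in context, here is a small sketch that computes the in-memory size of a dense image for the shapes mentioned above. The helper name `image_mib` is made up for illustration and is not part of cuCIM or CuPy; it only does the arithmetic and says nothing about where the actual CPU/GPU crossover lies on a given machine.

```python
from math import prod

# Bytes per element for the two floating-point dtypes discussed above.
ITEMSIZE = {"float32": 4, "float64": 8}


def image_mib(shape, dtype="float32"):
    """Return the size of a dense image with this shape and dtype, in MiB.

    Hypothetical helper for illustration only.
    """
    return prod(shape) * ITEMSIZE[dtype] / 2**20


for shape in [(256, 256), (512, 512), (1024, 1024), (4096, 4096)]:
    single = image_mib(shape, "float32")
    double = image_mib(shape, "float64")
    print(f"{shape}: float32 = {single} MiB, float64 = {double} MiB")
```

Double precision doubles the footprint at every size, on top of the slower double-precision arithmetic mentioned above.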
I don't doubt that the GPU is slower here, but wanted to mention that using `timer` for the comparison has a couple of potential pitfalls to be aware of:
- GPU times will be much slower the first time a function is called because any kernels get compiled and cached (fortunately this `.cubin` cache is persistent on disk across program runs, so this is a one-time cost).
- GPU times can be misleadingly short in some cases where synchronization may not have been performed, so it is best to explicitly call `cupy.cuda.Device().synchronize()` before checking the final time to make sure the kernels have completed.
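As a rough illustration of those two precautions done by hand, here is a minimal sketch of a timing helper. The name `time_call` and its `sync` parameter are hypothetical (not part of CuPy or cuCIM). It runs a few untimed warm-up calls first, and it calls an optional synchronization hook before each clock read. For GPU code one would presumably pass `sync=cupy.cuda.Device().synchronize`; the demo below runs on the CPU only, so it needs no hook.

```python
import time


def time_call(func, *args, sync=None, n_warmup=3, n_repeat=10, **kwargs):
    """Return the mean wall-clock seconds per call of func(*args, **kwargs).

    Hypothetical helper: `sync` is an optional zero-argument callable that
    blocks until all pending work is finished (a device synchronize on GPU).
    """
    # Warm-up calls absorb one-time costs such as kernel compilation.
    for _ in range(n_warmup):
        func(*args, **kwargs)
    # Make sure the warm-up work has finished before starting the clock.
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_repeat):
        func(*args, **kwargs)
    # Make sure the timed work has finished before stopping the clock.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / n_repeat


# CPU-only demonstration with a standard-library function.
mean_s = time_call(sorted, list(range(1000, 0, -1)), n_warmup=2, n_repeat=5)
print(f"sorted: mean time = {mean_s:.3e} s")
```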
To handle the above issues automatically, CuPy provides a `benchmark` timing utility that can be used like this:

    import cupy as cp
    from cupyx.profiler import benchmark

    # warp, cu_warp, tform, cu_tform and imgrid are the objects from the
    # script in the original report.
    perf_cpu = benchmark(
        warp,
        args=(imgrid, tform),
        kwargs=dict(output_shape=(255, 255)),
        n_warmup=10,
        n_repeat=10000,
        max_duration=5,  # cap at 5 seconds duration
    )
    print(f"warp: avg CPU time = {perf_cpu.cpu_times.mean()}")

    # Copy the image to the GPU once, outside the timed region.
    cu_imgrid = cp.array(imgrid)
    perf_gpu = benchmark(
        cu_warp,
        args=(cu_imgrid, cu_tform),
        kwargs=dict(output_shape=(255, 255)),
        n_warmup=10,
        n_repeat=10000,
        max_duration=5,  # cap at 5 seconds duration
    )
    print(f"warp: avg GPU time = {perf_gpu.gpu_times.mean()}")
Thanks for the details, @grlee77. In this case the image was 256 x 256, but I will try larger images and see what happens. Thanks for the tip on timeit - you are quite right. Often there is some implicit memory synchronization involved anyhow, but I should be explicit about it.