
Performance · imageproc · OPEN · 20 comments

image-rs commented on September 28, 2024

Performance

Comments (20)

theotherphil commented on September 28, 2024

@tafia has been making a lot of large performance improvements recently. It might eventually be nice to write a short performance tutorial/discussion working roughly through the commit history of e.g. one of the filtering functions.

tafia commented on September 28, 2024

Always working with GenericImage may not be the best strategy for performance.

Knowing that we work with a VecBuffer instead (as stated in the readme's non-goals) would allow us to:

  • use iterators, so there's no need for unsafe in many places (see the sketch below)
  • avoid copying a GenericImage into an ImageBuffer (like here)
  • eventually implement more iterators instead of allocating new VecBuffers
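
For illustration, here is a minimal sketch (written against a recent version of the image crate, not code from imageproc) of the difference: inverting a grayscale image through the GenericImage accessors versus iterating the contiguous buffer of an ImageBuffer directly.

use image::{GenericImage, GenericImageView, GrayImage, Luma};

// Works for any GenericImage, but goes through per-pixel accessor calls.
fn invert_generic<I: GenericImage<Pixel = Luma<u8>>>(image: &mut I) {
    let (width, height) = image.dimensions();
    for y in 0..height {
        for x in 0..width {
            let Luma([v]) = image.get_pixel(x, y);
            image.put_pixel(x, y, Luma([255 - v]));
        }
    }
}

// Knowing the image is an ImageBuffer lets us iterate the underlying Vec
// directly, with no per-pixel coordinate arithmetic and no unsafe.
fn invert_buffer(image: &mut GrayImage) {
    for v in image.iter_mut() {
        *v = 255 - *v;
    }
}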

@theotherphil Is this something I can try working on, or would you rather not go in that direction?

theotherphil commented on September 28, 2024

I'm conflicted on this. My initial not-really-a-plan was just to wait until specialisation landed in the compiler and have faster versions of functions specialised to images that we know consist of contiguous blocks of memory (i.e. that implemented some ReallyAChunkOfMemory trait).

I'm somewhat reluctant to just switch to requiring ImageBuffers everywhere because this precludes any possible clever future ideas involving lazy images. It also potentially stops anyone using this library with their own image types without first copying all their data. However, maybe it's reasonable to say that ImageBuffers are the only way to efficiently use the library and if you don't care that much about performance you can just implement to/from buffer for your own image types.

I guess we can always change our minds later - we need to be very explicit that we're not guaranteeing stability of any APIs for the medium term future, so that we can switch around our requirements as we find out what does or doesn't work.

For maximum performance we'd probably also want to control image strides and alignment, which I don't think ImageBuffer currently allows.

Ok, there wasn't anything very concrete there, just rambling about vague concerns.

So, more concretely:

  • I'm not convinced that standardising on ImageBuffer is a great idea. The best structure for image encoding/decoding might not be the best for image processing, and I want to keep the ability to vary our requirements somewhat independently of the concrete types in the image library.
  • However, allowing arbitrary GenericImages does impose a huge performance penalty, as you say. I'd be happy for us to start adding extra restrictions on functions for increased performance. The best approach might be to create our own image traits with additional requirements, e.g. ability to provide a row slice, or providing raw access to underlying data, etc.

Does this sound reasonable? This is more work than just switching to ImageBuffer everywhere, and it's not me offering to do the work, so it seems a little unfair to just say "no, do something harder instead".
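
For concreteness, a rough sketch of the kind of trait being described above (the RowAccess name and its methods are hypothetical, not anything that exists in imageproc or image):

use image::GrayImage;

trait RowAccess {
    type Subpixel;
    /// Row `y` as a contiguous slice of channel values.
    fn row(&self, y: u32) -> &[Self::Subpixel];
}

impl RowAccess for GrayImage {
    type Subpixel = u8;
    fn row(&self, y: u32) -> &[u8] {
        // For a tightly packed grayscale buffer the row stride is just the width.
        let w = self.width() as usize;
        let start = y as usize * w;
        &self.as_raw()[start..start + w]
    }
}

// Functions can then require RowAccess and work on whole rows at a time.
fn row_sums<I: RowAccess<Subpixel = u8>>(image: &I, height: u32) -> Vec<u64> {
    (0..height)
        .map(|y| image.row(y).iter().map(|&v| u64::from(v)).sum())
        .collect()
}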

tafia commented on September 28, 2024

> The best approach might be to create our own image traits with additional requirements, e.g. ability to provide a row slice, or providing raw access to underlying data, etc.

This shouldn't be too complicated, I like it.

theotherphil commented on September 28, 2024

Ok, great.

Anything that made functions faster on ImageBuffers would be directly useful for my hypothetical new world of custom traits, and getting the API for these traits right will involve having some experience of trying to write optimised functions for buffer-like images. So it's probably worth maintaining an ImageBuffer branch for a while to let us experiment with optimisations that are enabled by restricting everything to ImageBuffers and get a better idea of what our new traits should look like.

theotherphil commented on September 28, 2024

I've just created a "performance" branch for this.

theotherphil commented on September 28, 2024

For future reference, here are all the current benchmarks according to my laptop (ignore the "baseline" ones - they're there to let us make sure that whatever we add is faster than what's already in image::imageops).

test affine::test::bench_rotate_bilinear                                ... bench:     402,929 ns/iter (+/- 12,354)
test affine::test::bench_rotate_nearest                                 ... bench:     319,127 ns/iter (+/- 52,885)
test affine::test::bench_translate                                      ... bench:     211,600 ns/iter (+/- 12,841)
test contrast::test::bench_equalize_histogram                           ... bench:   2,236,089 ns/iter (+/- 224,182)
test corners::test::bench_is_corner_fast12_12_noncontiguous             ... bench:          21 ns/iter (+/- 2)
test corners::test::bench_is_corner_fast9_9_contiguous_lighter_pixels   ... bench:          21 ns/iter (+/- 2)
test filter::test::bench_baseline_gaussian_stdev_1                      ... bench:   1,720,996 ns/iter (+/- 147,949)
test filter::test::bench_baseline_gaussian_stdev_10                     ... bench:  12,492,048 ns/iter (+/- 518,286)
test filter::test::bench_baseline_gaussian_stdev_3                      ... bench:   4,335,076 ns/iter (+/- 317,193)
test filter::test::bench_box_filter                                     ... bench:   2,813,485 ns/iter (+/- 156,651)
test filter::test::bench_filter3x3_i32_filter                           ... bench:   1,988,750 ns/iter (+/- 87,241)
test filter::test::bench_horizontal_filter                              ... bench:   3,051,673 ns/iter (+/- 227,131)
test filter::test::bench_separable_filter                               ... bench:   2,251,522 ns/iter (+/- 538,489)
test filter::test::bench_vertical_filter                                ... bench:   3,172,381 ns/iter (+/- 142,535)
test haar::test::bench_evaluate_all_filters_10x10                       ... bench:   1,919,664 ns/iter (+/- 456,883)
test integralimage::test::bench_column_running_sum                      ... bench:         778 ns/iter (+/- 123)
test integralimage::test::bench_integral_image                          ... bench:     443,570 ns/iter (+/- 63,085)
test integralimage::test::bench_row_running_sum                         ... bench:         596 ns/iter (+/- 51)
test regionlabelling::test::bench_connected_components_eight_chessboard ... bench:     923,307 ns/iter (+/- 112,500)
test regionlabelling::test::bench_connected_components_four_chessboard  ... bench:     472,306 ns/iter (+/- 147,901)
test suppress::test::bench_local_maxima_dense                           ... bench:      27,985 ns/iter (+/- 5,060)
test suppress::test::bench_local_maxima_sparse                          ... bench:      36,700 ns/iter (+/- 13,368)
test suppress::test::bench_suppress_non_maximum_decreasing_gradient     ... bench:       1,297 ns/iter (+/- 642)
test suppress::test::bench_suppress_non_maximum_increasing_gradient     ... bench:       1,759 ns/iter (+/- 329)
test suppress::test::bench_suppress_non_maximum_noise_1                 ... bench:       5,255 ns/iter (+/- 1,382)
test suppress::test::bench_suppress_non_maximum_noise_3                 ... bench:       3,218 ns/iter (+/- 2,241)
test suppress::test::bench_suppress_non_maximum_noise_7                 ... bench:       2,284 ns/iter (+/- 311)

tafia commented on September 28, 2024

I just read about http://bluss.github.io/rust-ndarray/master/ndarray/index.html on Reddit. It might help when working with kernels, etc.
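
As a hypothetical illustration (not imageproc code), a kernel could live in an ndarray Array2 and be applied with ndarray's windows iterator:

use ndarray::Array2;

// Naive "valid" convolution of a 3x3 kernel over an image stored as an Array2.
fn convolve3x3(image: &Array2<f32>, kernel: &Array2<f32>) -> Array2<f32> {
    let (h, w) = image.dim();
    assert!(h >= 3 && w >= 3 && kernel.dim() == (3, 3));
    let mut out = Array2::<f32>::zeros((h - 2, w - 2));
    // windows yields every overlapping 3x3 view of the image in row-major order,
    // which matches the iteration order of out.iter_mut().
    for (window, o) in image.windows((3, 3)).into_iter().zip(out.iter_mut()) {
        *o = (&window * kernel).sum();
    }
    out
}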

theotherphil commented on September 28, 2024

It looks good. I've not had the time to work on this library much recently, but I've just started reading up on Halide a little and think we could nick some of their ideas. I doubt we'll ever be as fast as them, but it would be interesting to see how far we could get towards allowing configurable pipelines of functions by combining eager and static evaluation of iterators of image chunks. This is obviously ridiculously vague - I've not given serious thought to how exactly we'd do this - but I think we definitely need to move away from functions that force the creation of intermediates at every step. The n-d split_at support in ndarray looks really promising for this.
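
As a toy illustration of the "intermediates at every step" point (not a design proposal): two per-pixel passes written as separate functions each allocate an output image, whereas a fused pass touches the data once.

use image::GrayImage;

fn brighten(image: &GrayImage, delta: u8) -> GrayImage {
    let mut out = image.clone();
    for v in out.iter_mut() {
        *v = v.saturating_add(delta);
    }
    out
}

fn threshold(image: &GrayImage, t: u8) -> GrayImage {
    let mut out = image.clone();
    for v in out.iter_mut() {
        *v = if *v > t { 255 } else { 0 };
    }
    out
}

// threshold(&brighten(&img, 10), 128) allocates an intermediate image;
// the fused version does the same work in a single pass with no temporary.
fn brighten_then_threshold(image: &GrayImage, delta: u8, t: u8) -> GrayImage {
    let mut out = image.clone();
    for v in out.iter_mut() {
        *v = if v.saturating_add(delta) > t { 255 } else { 0 };
    }
    out
}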

theotherphil commented on September 28, 2024

This D library (https://github.com/libmir/dcv) is based heavily on the ndslice type, so it might provide some useful ideas.

ivandardi commented on September 28, 2024

I have a question: is this crate based on CPU computations? And if so, would it be interesting to add CUDA and OpenCL support?

theotherphil commented on September 28, 2024

Yes, everything is on the CPU at the moment. It certainly would be interesting to add GPU support - do you have any ideas for how we might go about doing this?

ivandardi commented on September 28, 2024

Yes, actually!

There are a few crates out there that offer OpenCL and CUDA support; we'd have to research them. However, the core of the idea is PIMPL: hide the concrete implementation behind an interface.

Basically, we define a bunch of traits instead of structs. For example:

trait VerticalSobel {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>>;
}

Then we'll have the different implementors:

struct CPUImpl { ... }
struct CUDAImpl { ... }

Which implement the traits:

impl VerticalSobel for CPUImpl {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>> {
        // Use CPU to perform calculations
    }
}

impl VerticalSobel for CUDAImpl {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>> {
        // Use CUDA to perform calculations
    }
}

And to use them, we'd have something like

struct Gradients<T> {
    implementation: T,
    ...
}

impl<T: VerticalSobel> VerticalSobel for Gradients<T> {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>> {
        // Delegate to whichever backend this Gradients was constructed with.
        self.implementation.vertical_sobel(image)
    }
}

That way people would have a stable API to use in their code, and we could set things up at compile time to use CUDA/OpenCL if available, or fall back to the CPU implementation if not.
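
Building on the CPUImpl and CUDAImpl sketches above, the compile-time selection could, for instance, use a Cargo feature (the feature and type alias names here are hypothetical):

// Select the default backend at compile time via a Cargo feature flag.
#[cfg(feature = "cuda")]
pub type DefaultBackend = CUDAImpl;

#[cfg(not(feature = "cuda"))]
pub type DefaultBackend = CPUImpl;

// Code that doesn't care which backend is active can then refer to DefaultBackend.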

theotherphil commented on September 28, 2024

I'm afraid I don't follow your examples. Could you elaborate on why there's an implementation of VerticalSobel for Gradients here, and on what this would look like for the eventual user?

ivandardi commented on September 28, 2024

Gradients doesn't need to have an implementation of VerticalSobel; it's just one possibility.

Actually, my example wasn't very well thought out. I think it would be better to have a Processor struct:

struct Processor<T> {
    image: GrayImage,
    implementation: T,
    ...
}

impl<T: VerticalSobel> Processor<T> {
    fn vertical_sobel(&self) -> Image<Luma<i16>> {
        // Delegate to the chosen backend, passing the image held by the Processor.
        self.implementation.vertical_sobel(&self.image)
    }
}

As for the end user, it could either look like this:

let img: GrayImage = get_gray_image();
let processor = Processor::new_with_CUDA();
let vertical_sobel = processor.vertical_sobel(&img);

Or this:

let img: GrayImage = get_gray_image();
let processor = Processor::new_with_CPU(img);
let vertical_sobel = processor.vertical_sobel();

That way all image processing would be handled via this Processor struct which has a user-chosen implementation of the algorithms.

theotherphil commented on September 28, 2024

This sounds like an interesting approach. Before making sweeping API changes we should have a few GPU implementations in place to experiment with (initially just defined as separate functions). However, I'm certainly open to switching to something like this if it works out.

ivandardi commented on September 28, 2024

Googling around for GPU code with Rust, there seem to be a few crates already:

https://github.com/arrayfire/arrayfire-rust
https://github.com/autumnai/rust-cudnn
https://github.com/cogciprocate/ocl

We just need to pick a good one to test out. That arrayfire one looks promising.

theotherphil commented on September 28, 2024

I agree! Are you planning on trying this out yourself?

ivandardi commented on September 28, 2024

I'd love to, but I can't promise anything. College is a huge time drain, unfortunately. But hey, the idea is there, so maybe someone else can get interested in it?

theotherphil commented on September 28, 2024

Ok, thanks for the suggestions!

