Comments (20)
@tafia has been making a lot of large performance improvements recently. It might eventually be nice to write a short performance tutorial/discussion working roughly through the commit history of e.g. one of the filtering functions.
from imageproc.
Always working with GenericImage may not be the best strategy for performance. Knowing we work with a VecBuffer (as stated in the readme's non-goals) instead would allow us:
- to use iterators => no need for unsafe in many places
- no need to copy a GenericImage into an ImageBuffer (like here)
- eventually, to implement more iterators instead of new VecBuffers
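The iterator point can be illustrated with a minimal sketch. The `FlatGray` type below is a hypothetical stand-in for a Vec-backed buffer, not imageproc or image-crate code:

```rust
// Hypothetical flat, row-major grayscale buffer, similar in spirit to
// ImageBuffer's backing Vec. Not part of imageproc's API.
struct FlatGray {
    width: usize,
    height: usize,
    data: Vec<u8>, // row-major, width * height bytes
}

impl FlatGray {
    // GenericImage-style access: index arithmetic (and bounds checks)
    // on every single pixel lookup.
    fn get(&self, x: usize, y: usize) -> u8 {
        self.data[y * self.width + x]
    }

    // Coordinate-based loop, as forced by a generic pixel-access API.
    fn invert_indexed(&mut self) {
        let w = self.width;
        for y in 0..self.height {
            for x in 0..w {
                let v = self.get(x, y);
                self.data[y * w + x] = 255 - v;
            }
        }
    }

    // Iterator over the contiguous buffer: no indexing, no unsafe,
    // and the compiler can vectorise the simple linear pass.
    fn invert_iter(&mut self) {
        for p in self.data.iter_mut() {
            *p = 255 - *p;
        }
    }
}
```

Both methods compute the same result; the iterator version is simply what knowing the layout buys you.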
@theotherphil Is this something I can try working on, or do you not want to go in that direction?
I'm conflicted on this. My initial not-really-a-plan was just to wait until specialisation landed in the compiler and have faster versions of functions specialised to images that we know consist of contiguous blocks of memory (i.e. that implemented some ReallyAChunkOfMemory trait).
I'm somewhat reluctant to just switch to requiring ImageBuffers everywhere because this precludes any possible clever future ideas involving lazy images. It also potentially stops anyone using this library with their own image types without first copying all their data. However, maybe it's reasonable to say that ImageBuffers are the only way to efficiently use the library and if you don't care that much about performance you can just implement to/from buffer for your own image types.
I guess we can always change our minds later - we need to be very explicit that we're not guaranteeing stability of any APIs for the medium term future, so that we can switch around our requirements as we find out what does or doesn't work.
For maximum performance we'd probably also want to control image strides and alignment, which I don't think ImageBuffer currently allows.
Ok, there wasn't anything very concrete there, just rambling about vague concerns.
So, more concretely:
- I'm not convinced that standardising on ImageBuffer is a great idea. The best structure for image encoding/decoding might not be the best for image processing, and I want to keep the ability to vary our requirements somewhat independently of the concrete types in the image library.
- However, allowing for arbitrary GenericImages does impose a huge performance restriction, as you say. I'd be happy for us to start adding extra restrictions on functions for increased performance. The best approach might be to create our own image traits with additional requirements, e.g. ability to provide a row slice, or providing raw access to underlying data, etc.
Does this sound reasonable? This is more work than just switching to ImageBuffer everywhere, and it's not me offering to do the work, so it seems a little unfair to just say "no, do something harder instead".
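A rough sketch of what such a trait might look like (all names here are invented for illustration, not a proposed imageproc API): generic pixel access plus a guarantee that each row is a contiguous slice.

```rust
// Hypothetical trait: images that can hand out each row as a slice,
// so algorithms work on &[u8] instead of per-pixel coordinate lookups.
trait RowAccess {
    fn dimensions(&self) -> (usize, usize); // (width, height)
    fn row(&self, y: usize) -> &[u8];
}

// A Vec-backed grayscale image trivially satisfies the requirement.
struct VecGray {
    width: usize,
    height: usize,
    data: Vec<u8>, // row-major
}

impl RowAccess for VecGray {
    fn dimensions(&self) -> (usize, usize) {
        (self.width, self.height)
    }
    fn row(&self, y: usize) -> &[u8] {
        &self.data[y * self.width..(y + 1) * self.width]
    }
}

// An algorithm written against the trait: per-row sums computed on
// slices, which any buffer-like image type can opt into.
fn row_sums<I: RowAccess>(image: &I) -> Vec<u32> {
    let (_, height) = image.dimensions();
    (0..height)
        .map(|y| image.row(y).iter().map(|&p| p as u32).sum::<u32>())
        .collect()
}
```

Users with their own image types would implement the trait rather than copy their data into an ImageBuffer.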
The best approach might be to create our own image traits with additional requirements, e.g. ability to provide a row slice, or providing raw access to underlying data, etc.
This shouldn't be too complicated, I like it.
Ok, great.
Anything that made functions faster on ImageBuffers would be directly useful for my hypothetical new world of custom traits, and getting the API for these traits right will involve having some experience of trying to write optimised functions for buffer-like images. So it's probably worth maintaining an ImageBuffer branch for a while to let us experiment with optimisations that are enabled by restricting everything to ImageBuffers and get a better idea of what our new traits should look like.
I've just created a "performance" branch for this.
For future reference, here are all the current benchmarks according to my laptop (ignore the "baseline" ones - they're to let us make sure that whatever we add is faster than what's already in image::imageops).
test affine::test::bench_rotate_bilinear ... bench: 402,929 ns/iter (+/- 12,354)
test affine::test::bench_rotate_nearest ... bench: 319,127 ns/iter (+/- 52,885)
test affine::test::bench_translate ... bench: 211,600 ns/iter (+/- 12,841)
test contrast::test::bench_equalize_histogram ... bench: 2,236,089 ns/iter (+/- 224,182)
test corners::test::bench_is_corner_fast12_12_noncontiguous ... bench: 21 ns/iter (+/- 2)
test corners::test::bench_is_corner_fast9_9_contiguous_lighter_pixels ... bench: 21 ns/iter (+/- 2)
test filter::test::bench_baseline_gaussian_stdev_1 ... bench: 1,720,996 ns/iter (+/- 147,949)
test filter::test::bench_baseline_gaussian_stdev_10 ... bench: 12,492,048 ns/iter (+/- 518,286)
test filter::test::bench_baseline_gaussian_stdev_3 ... bench: 4,335,076 ns/iter (+/- 317,193)
test filter::test::bench_box_filter ... bench: 2,813,485 ns/iter (+/- 156,651)
test filter::test::bench_filter3x3_i32_filter ... bench: 1,988,750 ns/iter (+/- 87,241)
test filter::test::bench_horizontal_filter ... bench: 3,051,673 ns/iter (+/- 227,131)
test filter::test::bench_separable_filter ... bench: 2,251,522 ns/iter (+/- 538,489)
test filter::test::bench_vertical_filter ... bench: 3,172,381 ns/iter (+/- 142,535)
test haar::test::bench_evaluate_all_filters_10x10 ... bench: 1,919,664 ns/iter (+/- 456,883)
test integralimage::test::bench_column_running_sum ... bench: 778 ns/iter (+/- 123)
test integralimage::test::bench_integral_image ... bench: 443,570 ns/iter (+/- 63,085)
test integralimage::test::bench_row_running_sum ... bench: 596 ns/iter (+/- 51)
test regionlabelling::test::bench_connected_components_eight_chessboard ... bench: 923,307 ns/iter (+/- 112,500)
test regionlabelling::test::bench_connected_components_four_chessboard ... bench: 472,306 ns/iter (+/- 147,901)
test suppress::test::bench_local_maxima_dense ... bench: 27,985 ns/iter (+/- 5,060)
test suppress::test::bench_local_maxima_sparse ... bench: 36,700 ns/iter (+/- 13,368)
test suppress::test::bench_suppress_non_maximum_decreasing_gradient ... bench: 1,297 ns/iter (+/- 642)
test suppress::test::bench_suppress_non_maximum_increasing_gradient ... bench: 1,759 ns/iter (+/- 329)
test suppress::test::bench_suppress_non_maximum_noise_1 ... bench: 5,255 ns/iter (+/- 1,382)
test suppress::test::bench_suppress_non_maximum_noise_3 ... bench: 3,218 ns/iter (+/- 2,241)
test suppress::test::bench_suppress_non_maximum_noise_7 ... bench: 2,284 ns/iter (+/- 311)
Just read about http://bluss.github.io/rust-ndarray/master/ndarray/index.html on reddit. It might help when working with kernels etc ...
It looks good. I've not had the time to work on this library much recently but I've just started reading up on halide a little and think we could nick some of their ideas. I doubt we'll ever be as fast as them, but it would be interesting to see how far we could get towards allowing configurable pipelines of functions by combining eager and static evaluation of iterators of image chunks. This is obviously ridiculously vague - I've not given serious thought to how exactly we'd do this, but I think we definitely need to move away from functions that force the creation of intermediates at all steps. The n-d split_at support in ndarray looks really promising for this.
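One very rough sketch of the "don't force intermediates at every step" idea, with plain closures standing in for pipeline stages (illustrative only, not a design proposal; a real version would work on image chunks rather than single bytes):

```rust
// Two per-pixel stages fused into a single pass: one traversal, one
// output allocation, instead of materialising f's result and then
// traversing it again for g.
fn pipeline<F, G>(input: &[u8], f: F, g: G) -> Vec<u8>
where
    F: Fn(u8) -> u8,
    G: Fn(u8) -> u8,
{
    input.iter().map(|&p| g(f(p))).collect()
}
```

Rust's lazy iterators already give this fusion for pointwise stages; the hard (and interesting) part is extending it to neighbourhood operations like filters, which is where something like ndarray's n-d split_at could come in.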
https://github.com/libmir/dcv This D library is based heavily on the ndslice type, so might provide some useful ideas.
I have a question: is this crate based on CPU computations? And if yes, would it be interesting to add CUDA and OpenCL support?
Yes, everything is on the CPU at the moment. It certainly would be interesting to add GPU support - do you have any ideas for how we might go about doing this?
Yes, actually!
There are a few crates out there that offer OpenCL and CUDA support. We'd have to research them. However, the core of the idea is PIMPL.
Basically we define a bunch of traits, instead of structs. Example:
trait VerticalSobel {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>>;
}
Then we'll have the different implementors:
struct CPUImpl { ... }
struct CUDAImpl { ... }
Which implement the traits:
impl VerticalSobel for CPUImpl {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>> {
        // Use the CPU to perform calculations
    }
}
impl VerticalSobel for CUDAImpl {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>> {
        // Use CUDA to perform calculations
    }
}
And to use them, we'd have something like
struct Gradients<T> {
    implementation: T,
    ...
}
impl<T: VerticalSobel> VerticalSobel for Gradients<T> {
    fn vertical_sobel(&self, image: &GrayImage) -> Image<Luma<i16>> {
        self.implementation.vertical_sobel(image)
    }
}
That way people would have a stable API to use in their code, and then we could setup at compile time to use CUDA/OpenCL if it's available, or fall back to the CPU implementation if not.
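The compile-time fallback could be sketched with Cargo features (the "cuda" feature name and the unit structs here are hypothetical placeholders):

```rust
// Placeholder backend types standing in for real implementations.
pub struct CPUImpl;
pub struct CUDAImpl;

// Selected at compile time: building with `--features cuda` would pick
// the GPU backend, otherwise we fall back to the CPU one.
#[cfg(feature = "cuda")]
pub type DefaultImpl = CUDAImpl;
#[cfg(not(feature = "cuda"))]
pub type DefaultImpl = CPUImpl;

// Runtime-visible name of the backend that was compiled in.
pub fn backend_name() -> &'static str {
    if cfg!(feature = "cuda") { "cuda" } else { "cpu" }
}
```

Callers would only ever name `DefaultImpl`, so switching backends requires no source changes on their side.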
I'm afraid I don't follow your examples. Could you elaborate on why there's an implementation of VerticalSobel for Gradients here, and on what this would look like for the eventual user?
Gradients doesn't need to have an implementation of VerticalSobel. It's just a possibility. Actually, my example was badly thought out. I think it would be better to have a Processor struct.
struct Processor<T> {
    image: GrayImage,
    implementation: T,
    ...
}
impl<T: VerticalSobel> Processor<T> {
    fn vertical_sobel(&self) -> Image<Luma<i16>> {
        self.implementation.vertical_sobel(&self.image)
    }
}
As for the end user, it could either look like this:
let img: GrayImage = get_gray_image();
let processor = Processor::new_with_CUDA();
let vertical_sobel = processor.vertical_sobel(&img);
Or this:
let img: GrayImage = get_gray_image();
let processor = Processor::new_with_CPU(img);
let vertical_sobel = processor.vertical_sobel();
That way all image processing would be handled via this Processor struct, which has a user-chosen implementation of the algorithms.
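For reference, here is a self-contained version of this pattern that compiles without the image crate, using Vec<u8>/Vec<i16> as stand-ins for GrayImage/Image<Luma<i16>> and a placeholder computation rather than a real Sobel filter:

```rust
// Backend trait: each implementor supplies its own version of the op.
trait VerticalSobel {
    fn vertical_sobel(&self, image: &[u8]) -> Vec<i16>;
}

struct CPUImpl;

impl VerticalSobel for CPUImpl {
    fn vertical_sobel(&self, image: &[u8]) -> Vec<i16> {
        // Placeholder: widen each pixel. A real backend would run the
        // actual Sobel kernel here.
        image.iter().map(|&p| p as i16).collect()
    }
}

// User-facing type: owns the image, delegates to the chosen backend.
struct Processor<T> {
    image: Vec<u8>,
    implementation: T,
}

impl<T: VerticalSobel> Processor<T> {
    fn vertical_sobel(&self) -> Vec<i16> {
        self.implementation.vertical_sobel(&self.image)
    }
}
```

Swapping in a GPU backend then means only changing the `implementation` field's type; the call sites stay identical.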
This sounds like an interesting approach. Before making sweeping API changes we should have a few GPU implementations in place to experiment with (initially just defined as separate functions). However, I'm certainly open to switching to something like this if it works out.
Googling around for GPU code with Rust, there seem to be a few crates already:
https://github.com/arrayfire/arrayfire-rust
https://github.com/autumnai/rust-cudnn
https://github.com/cogciprocate/ocl
We just need to pick a good one to test out. That arrayfire one looks promising.
I agree! Are you planning on trying this out yourself?
I'd love to, but I can't promise anything. College is a huge time drain, unfortunately. But hey, the idea is there, so maybe someone else can get interested in it?
Ok, thanks for the suggestions!