It looks like the iterators are about 3x slower than std's .zip or izip!


performance of the slice iterator about soa-derive HOT 14 CLOSED

lumol-org commented on August 31, 2024

performance of the slice iterator

from soa-derive.

Comments (14)

Luthaf commented on August 31, 2024

What code are you using for your benchmark?

I think this might be related to the missing implementation of Iterator::size_hint and other helper methods in

soa-derive/soa-derive-internal/src/iter.rs

Lines 39 to 51 in bb9d45a

    impl<'a> Iterator for Iter<'a> {
        type Item = #ref_name<'a>;

        fn next(&mut self) -> Option<#ref_name<'a>> {
            #(let #fields_names_1 = self.#fields_names_2.next();)*
            if #first_field.is_none() {
                None
            } else {
                Some(#ref_name {
                    #(#fields_names_1: #fields_names_2.unwrap(),)*
                })
            }
        }
    }

Generally speaking, I am certain there are quite a few performance optimizations that could improve this crate.
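As a rough illustration of the helper methods mentioned above, here is a minimal hand-written sketch of an SoA-style iterator that forwards size_hint to its first field iterator and opts into ExactSizeIterator. The names PointRef and PointIter and the two f64 fields are assumptions made for this example; they are not what the derive macro generates, and this sketch says nothing about whether the change helps in a benchmark.

```rust
use std::slice;

// Hypothetical stand-in for the reference struct the derive would generate.
pub struct PointRef<'a> {
    pub x: &'a f64,
    pub y: &'a f64,
}

// Hypothetical stand-in for the generated iterator: one slice iterator per field.
pub struct PointIter<'a> {
    x: slice::Iter<'a, f64>,
    y: slice::Iter<'a, f64>,
}

impl<'a> PointIter<'a> {
    pub fn new(x: &'a [f64], y: &'a [f64]) -> Self {
        // All field slices of an SoA container share the same length.
        assert_eq!(x.len(), y.len());
        PointIter { x: x.iter(), y: y.iter() }
    }
}

impl<'a> Iterator for PointIter<'a> {
    type Item = PointRef<'a>;

    fn next(&mut self) -> Option<PointRef<'a>> {
        let x = self.x.next()?;
        let y = self.y.next()?;
        Some(PointRef { x, y })
    }

    // Every field iterator has the same remaining length,
    // so the hint of the first one is exact for the whole iterator.
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.x.size_hint()
    }
}

// Allowed because size_hint returns an exact (lower == upper) bound.
impl<'a> ExactSizeIterator for PointIter<'a> {}
```

With this in place, adaptors such as collect can pre-allocate based on the hint, and len() becomes available on the iterator.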


inv2004 commented on August 31, 2024

Please find the benchmark I just created:
https://github.com/inv2004/soa_iter_bench

I used 4 fields; with more fields the difference gets bigger.

It is a bit strange to me that test_new2 is the fastest and test_new is slow, because the previous day I saw that test_new was the fastest and test_new2 was 1.5x slower, but still faster than test_old.

virtual Intel(R) Atom(TM) CPU C2750 @ 2.40GHz:

running 5 tests
test test_izip ... bench:     517,470 ns/iter (+/- 31,921)
test test_new  ... bench:   1,208,318 ns/iter (+/- 38,745)
test test_new2 ... bench:     494,725 ns/iter (+/- 29,403)
test test_old  ... bench:   1,203,817 ns/iter (+/- 25,856)
test test_zip  ... bench:     509,945 ns/iter (+/- 26,682)

Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz

running 5 tests
test test_izip ... bench:     105,344 ns/iter (+/- 3,834)
test test_new  ... bench:     197,468 ns/iter (+/- 6,442)
test test_new2 ... bench:     105,983 ns/iter (+/- 2,385)
test test_old  ... bench:     263,211 ns/iter (+/- 6,848)
test test_zip  ... bench:     105,318 ns/iter (+/- 2,628)

Added: I am not sure that the lifetimes are entirely correct.


inv2004 commented on August 31, 2024

Adding size_hint did not change anything: inv2004/soa_iter_bench@e594312

running 5 tests
test test_izip ... bench:     106,009 ns/iter (+/- 7,491)
test test_new  ... bench:     199,861 ns/iter (+/- 13,772)
test test_new2 ... bench:     106,520 ns/iter (+/- 5,779)
test test_old  ... bench:     269,461 ns/iter (+/- 16,255)
test test_zip  ... bench:     105,314 ns/iter (+/- 4,614)


Luthaf commented on August 31, 2024

Oh, OK, I think I understand the issue better! The main problem here might be that we have to create a new SRef<'a> at each step of the iteration, and then dereference it.

You should be able to use the soa_zip! macro instead of izip! and get at least the same performance (both macros expand to basically the same code).


inv2004 commented on August 31, 2024

I am not so sure that the problem is the creation of the SRef, because test_izip and test_zip create an SRef too: https://github.com/inv2004/soa_iter_bench/blob/14803a468f0a153170431d7aa7b62176aa89693d/src/main.rs#L227

New bench without creating the SRef:

test test_zip             ... bench:     523,696 ns/iter (+/- 16,403)
test test_zip_without_ref ... bench:     542,164 ns/iter (+/- 58,219)


Luthaf commented on August 31, 2024

What does this give with soa_zip!?


inv2004 commented on August 31, 2024

No. I did not use soa_zip! because it does not produce an SRef in that case, so it is not possible to build a universal solution on top of it.


inv2004 commented on August 31, 2024

Please find an update.

I created a new test_old2 from test_old, using the construction from the core::iter::Zip source code: https://github.com/inv2004/soa_iter_bench/blob/d7a30016fcee702331d05ad24a17addb43b806e7/src/main.rs#L101

It looks like this improved the situation a lot:

running 10 tests
test test_new2_rev_test ... ignored
test test_izip            ... bench:     550,991 ns/iter (+/- 34,369)
test test_new             ... bench:   1,242,180 ns/iter (+/- 31,991)
test test_new2            ... bench:     628,175 ns/iter (+/- 27,861)
test test_new2_rev        ... bench:     702,763 ns/iter (+/- 29,718)
test test_old             ... bench:   1,247,024 ns/iter (+/- 20,131)
**test test_old2            ... bench:     622,488 ns/iter (+/- 18,691)**
test test_zip             ... bench:     549,055 ns/iter (+/- 26,092)
test test_zip_rev         ... bench:     570,239 ns/iter (+/- 27,660)
test test_zip_without_ref ... bench:     554,853 ns/iter (+/- 57,848)

Added: this variant is a bit faster:

    fn next(&mut self) -> Option<Self::Item> {
        self.a.next().and_then(|a| {
            Some(SRef {
                a,
                b: self.b.next().unwrap(),
                c: self.c.next().unwrap(),
                d: self.d.next().unwrap(),
            })
        })
    }

I do not know why. It looks like Rust can optimize some loops but not others.


Luthaf commented on August 31, 2024

Wow, this looks good! And it should not be too hard to implement either.

Just to know, it sounds like you want to abstract over Slice/Vec in your use case. Why?

One possibility would be to have an AsSlice trait that converts everything to a slice, and then work from there. Both XXXVec and XXXSliceMut have a slice() method that can be used for this.
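The AsSlice idea could be sketched roughly as follows. Everything here is hypothetical: PointVec and PointSlice are hand-written stand-ins for the derive-generated XXXVec and XXXSlice types, and AsPointSlice and as_point_slice are names invented for the example, not part of soa-derive.

```rust
// Hypothetical stand-ins for the derive-generated container types.
pub struct PointVec {
    pub x: Vec<f64>,
    pub y: Vec<f64>,
}

pub struct PointSlice<'a> {
    pub x: &'a [f64],
    pub y: &'a [f64],
}

// Hypothetical trait: anything that can be viewed as a PointSlice.
pub trait AsPointSlice {
    fn as_point_slice(&self) -> PointSlice<'_>;
}

impl AsPointSlice for PointVec {
    fn as_point_slice(&self) -> PointSlice<'_> {
        PointSlice { x: &self.x, y: &self.y }
    }
}

impl<'a> AsPointSlice for PointSlice<'a> {
    fn as_point_slice(&self) -> PointSlice<'_> {
        // Re-borrow the existing slices for the shorter lifetime.
        PointSlice { x: self.x, y: self.y }
    }
}

// A function written once against the slice view works for both containers.
pub fn sum_of_products<T: AsPointSlice>(data: &T) -> f64 {
    let s = data.as_point_slice();
    s.x.iter().zip(s.y.iter()).map(|(x, y)| x * y).sum()
}
```

The design choice is that generic code only ever sees the slice type, so field access and iteration need to be implemented in one place.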


inv2004 commented on August 31, 2024

If you would like to implement it, please do not forget about std::iter::DoubleEndedIterator, which is useful in many cases.
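For reference, a DoubleEndedIterator implementation for this kind of iterator might look like the sketch below. PairRef and PairIter are hypothetical two-field types written by hand for the example; the real generated code and its performance under .rev() may differ.

```rust
use std::slice;

// Hypothetical reference struct with two fields.
pub struct PairRef<'a> {
    pub a: &'a u32,
    pub b: &'a u32,
}

// Hypothetical iterator holding one slice iterator per field.
pub struct PairIter<'a> {
    a: slice::Iter<'a, u32>,
    b: slice::Iter<'a, u32>,
}

impl<'a> PairIter<'a> {
    pub fn new(a: &'a [u32], b: &'a [u32]) -> Self {
        assert_eq!(a.len(), b.len(), "all fields must have the same length");
        PairIter { a: a.iter(), b: b.iter() }
    }
}

impl<'a> Iterator for PairIter<'a> {
    type Item = PairRef<'a>;

    fn next(&mut self) -> Option<PairRef<'a>> {
        // Only the first field decides whether iteration continues.
        self.a.next().map(|a| PairRef { a, b: self.b.next().unwrap() })
    }
}

impl<'a> DoubleEndedIterator for PairIter<'a> {
    fn next_back(&mut self) -> Option<PairRef<'a>> {
        // Mirror of next(): take one element from the back of each field.
        self.a.next_back().map(|a| PairRef { a, b: self.b.next_back().unwrap() })
    }
}
```

Because the field iterators always advance together from either end, they stay the same length and the unwrap calls cannot fail.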

I have several different structs, for example S1, S2, S3, S4, with different fields, and all of them derive SoA, because vectorized calculations on the structure are much faster and the structure of the data is more constant: a vec of numbers is easier to use than vectors of structs with numbers.

Then I implemented fn calc (on the ref type) to do some computation with S1, S2, S3, S4 and their different fields. Now, with iterators, I can easily write a universal function without working with the fields directly:

    let sl = (S1Slice or SVec)::new(...);
    for x in &sl { x.calc() }
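The "universal function" idea can be sketched with a trait bound on the iterator item. The Calc trait, the S1Ref and S2Ref structs, their fields, and the total function are all assumptions made up for this illustration; the thread does not show the real definitions.

```rust
// Hypothetical trait implemented by every generated reference type.
pub trait Calc {
    fn calc(&self) -> f64;
}

// Hypothetical reference types for two different SoA structs.
pub struct S1Ref<'a> {
    pub price: &'a f64,
    pub qty: &'a f64,
}

pub struct S2Ref<'a> {
    pub value: &'a f64,
}

impl<'a> Calc for S1Ref<'a> {
    fn calc(&self) -> f64 {
        self.price * self.qty
    }
}

impl<'a> Calc for S2Ref<'a> {
    fn calc(&self) -> f64 {
        *self.value * 2.0
    }
}

// Works for any container whose iterator yields items implementing Calc,
// without the function knowing anything about the individual fields.
pub fn total<I>(items: I) -> f64
where
    I: IntoIterator,
    I::Item: Calc,
{
    items.into_iter().map(|x| x.calc()).sum()
}
```

With a derive-generated slice or vec that implements IntoIterator for its reference, a call such as total(&sl) would then cover both container kinds.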

Unfortunately test_old2 is still ~12% slower than test_zip.


inv2004 commented on August 31, 2024

I feel like I am being a bit annoying, but please find a new update:

running 10 tests
test test_new2_rev_test ... ignored
test test_izip            ... bench:     505,946 ns/iter (+/- 9,750)
**test test_new             ... bench:     501,313 ns/iter (+/- 38,063)**
test test_new2            ... bench:     589,285 ns/iter (+/- 45,108)
test test_new2_rev        ... bench:     653,283 ns/iter (+/- 19,451)
test test_old             ... bench:   1,197,755 ns/iter (+/- 13,299)
test test_old2            ... bench:     570,729 ns/iter (+/- 36,173)
test test_zip             ... bench:     504,843 ns/iter (+/- 31,671)
test test_zip_rev         ... bench:     533,405 ns/iter (+/- 43,096)
test test_zip_without_ref ... bench:     509,646 ns/iter (+/- 35,153)

test_new (which uses Zip internally) now takes the same time as test_zip. I suppose iter::Zip applies some optimizations, which is why it is the fastest.

Unfortunately this PR will not be as easy as test_old2. I think the main difficulty is generating the "pub struct Iter<Zip<Zip<Zip<....>" type and the (((a,b),c),d) tuple string.

https://github.com/inv2004/soa_iter_bench/blob/93dd8d2403c6ca16063545c72b7ee1c34c16cdf5/src/main.rs#L145
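A hand-written version of the Zip-based approach for four fields might look like the sketch below. The SRef field types (u32 references) and the Inner type alias are assumptions for the example; the point is only to show the nested Zip type and the (((a, b), c), d) pattern that a macro would have to generate.

```rust
use std::iter::Zip;
use std::slice;

// Hypothetical reference struct with four fields of the same type.
pub struct SRef<'a> {
    pub a: &'a u32,
    pub b: &'a u32,
    pub c: &'a u32,
    pub d: &'a u32,
}

// The nested type produced by chaining .zip() three times.
type Inner<'a> = Zip<
    Zip<Zip<slice::Iter<'a, u32>, slice::Iter<'a, u32>>, slice::Iter<'a, u32>>,
    slice::Iter<'a, u32>,
>;

// Thin wrapper that delegates all iteration to std's Zip.
pub struct Iter<'a>(Inner<'a>);

impl<'a> Iter<'a> {
    pub fn new(a: &'a [u32], b: &'a [u32], c: &'a [u32], d: &'a [u32]) -> Self {
        Iter(a.iter().zip(b.iter()).zip(c.iter()).zip(d.iter()))
    }
}

impl<'a> Iterator for Iter<'a> {
    type Item = SRef<'a>;

    fn next(&mut self) -> Option<SRef<'a>> {
        // Destructure the nested tuple that chained zips yield.
        self.0.next().map(|(((a, b), c), d)| SRef { a, b, c, d })
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.0.size_hint()
    }
}
```

Delegating to Zip means the wrapper inherits whatever optimizations the standard library applies to zipped slice iterators, at the cost of a type and a tuple pattern whose nesting depth grows with the number of fields.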


Luthaf commented on August 31, 2024

Thank you very much for your investigation here!


inv2004 commented on August 31, 2024

Thank you as well.


inv2004 commented on August 31, 2024

Please do not publish yet, because it looks like DoubleEndedIterator and .rev() have performance problems. I do not know why at the moment.
