
Confusing about (un)optimization for all langs (rays, closed)


kidoman commented on August 15, 2024

Thanks for bringing this up. I debated this particular commit with myself yesterday. Let me try to explain the thought process:

1. The project started off as a good way to learn idiomatic Go.
2. I shifted my focus to performance when I saw that there was a large delta between "unoptimized" Go and C++ (that is when I created the post in golang-nuts).
3. A lot of optimizations were done (and documented in the first blog post) to see where Go was lacking and what measures could be taken to close the gap as much as possible.

The direction of the project "rays" has definitely shifted now. In my mind, "it's about seeing how well a program can perform in a given language/compiler/design (l/c/d) combination whilst still keeping the code as close to real world as possible."

So, there are two kinds of optimizations in my mind:

1. The first version (C++) of the code scanned through the entire ART and incurred a huge cost in computation time whilst accomplishing nothing; this looked like a broken algorithm design, hence I fixed it by creating an objects array.
2. In Go, replacing math.Pow(x, 99) with a hand-optimized multiplication tree to get 5% extra perf.
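The first kind of fix can be sketched roughly as follows. This is only an illustration: the `art` bitmap, the `object` type, and `makeObjects` are hypothetical names and layouts, not the actual rays code. The idea is to scan the art once up front and keep only the occupied cells, so the per-ray loop no longer visits empty ones.

```go
package main

import "fmt"

// art is a hypothetical 3x5 bitmap (not the real rays ART);
// every non-space character marks a cell that holds a sphere.
var art = []string{
	"1 1 1",
	" 1 1 ",
	"1   1",
}

// object records the grid position of one occupied cell.
type object struct{ col, row int }

// makeObjects scans the art once and returns only the occupied cells,
// so later loops can iterate over this slice instead of the whole grid.
func makeObjects(art []string) []object {
	var objs []object
	for row, line := range art {
		for col, ch := range line {
			if ch != ' ' {
				objs = append(objs, object{col, row})
			}
		}
	}
	return objs
}

func main() {
	objs := makeObjects(art)
	fmt.Println(len(objs), "objects out of", len(art)*len(art[0]), "cells")
}
```

The tracer's inner loop would then range over the returned slice rather than over every cell of the art.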

I still believe in retaining the first one. But, just as I reverted the micro-optimization with bc029c5 yesterday, I want to bring all the implementations to a stage where we do not avoid things like math.Pow(). Instead, we give the compiler scope to do the right thing for us.

Then "rays" essentially becomes a good test bed to see how much we can extract from an l/c/d combination without doing benchmark-specific optimization, by letting the compiler do its thing as much as possible.

That being said, SSE in C++ is not something we need to avoid; in fact, it's a USP of the language that it lets us go from 12.7 s to 9.4 s while still writing C++. In my mind, SSE is not the same as replacing math.Pow().

I want to know what you think about this though


t-mat commented on August 15, 2024

I would like to see 2 versions of the code for every language:

- "mainline" version
  - Standard, platform independent
  - Only algorithm/calculation level optimization is allowed
- "hacked" version
  - Non-standard, deeply platform/language dependent
  - Any kind of optimization is allowed
  - But every single line is written in the target language
    - e.g. for C++, intrinsics are allowed, but inline assembly is prohibited

"mainline" shows the idiomatic way. Good for language tourists. "hacked" shows the back streets of the language. Tourists should not walk in there, but locals enjoy the secret side of the language.

Some reasons

I think there are 4 ranks of goodness:

1. Standard, platform independent, straightforward code
   - Math.Pow(), Math.rand
2. Algorithm/calculation level optimization
   - Pseudo lazy evaluation (algorithm)
   - Replace division with reciprocal (calculation)
3. Non-standard, platform dependent, deeply language dependent
   - p33, rnd() (non-standard)
   - SSE vector (platform/runtime environment dependent)
   - PR #13 (deeply language dependent)
   - Commonly used external library (e.g. PCRE)
4. Out of the target
   - Special purpose external library
   - Another language (inline asm)

I would like to see 1. and 2. in the mainline code. But I also want to see an 'insanely optimized' version using 3.
The 'insane' version should not be allowed to merge into mainline, but as you have seen, these optimizations clearly show where there is room for improvement and where the weaknesses are.
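As an illustration of the rank 2 "replace division with reciprocal" item, here is a minimal Go sketch. The `vec` type and both method names are hypothetical and not taken from rays. It shows three divisions being replaced by one division and three multiplications.

```go
package main

import "fmt"

// vec is a hypothetical 3-component vector used only for this sketch.
type vec struct{ x, y, z float64 }

// divide performs one division per component (three divisions in total).
func (v vec) divide(d float64) vec {
	return vec{v.x / d, v.y / d, v.z / d}
}

// scaleByReciprocal performs a single division to get 1/d and then
// multiplies each component by it. The result can differ from divide
// in the last bit, because 1/d is rounded before the multiplications.
func (v vec) scaleByReciprocal(d float64) vec {
	r := 1 / d
	return vec{v.x * r, v.y * r, v.z * r}
}

func main() {
	v := vec{1, 2, 7}
	fmt.Println(v.divide(49))
	fmt.Println(v.scaleByReciprocal(49))
}
```

Because the two forms are not guaranteed to be bit-identical, a compiler following strict floating-point rules generally cannot make this substitution on its own, which is one reason it tends to be done by hand.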

More random thoughts:

- If we had an ideal compiler, auto-vectorization (e.g. SSE optimization) would be done by the compiler.
  - Also clamping, 2D-RNG
- Imperative programming languages usually allow side effects, so a compiler/interpreter could not (and should not) apply lazy evaluation without special notation.
  - e.g. some kind of "pure" function attribute.
- Process-wide GC is seriously bad.
- The RNG is not so good. LFSR variants are widely used for this purpose.
  - e.g. Xorshift, MT
  - Or use the standard library
- Converting division into multiplication by the reciprocal should be allowed.
  - The result is not always identical (e.g. on x87), but the conversion is widely used.
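For reference, a generator of the Xorshift family mentioned above can be sketched in a few lines of Go. This is a generic 64-bit xorshift using Marsaglia's 13/7/17 shift triple, not the RNG used in rays. The type name, the fallback seed constant, and the `unit` helper are assumptions made for this example.

```go
package main

import "fmt"

// xorshift64 is a minimal 64-bit xorshift generator.
// Its state must never be zero, or it would produce zeros forever.
type xorshift64 struct{ state uint64 }

// newXorshift64 seeds the generator, substituting an arbitrary
// non-zero constant when the caller passes 0.
func newXorshift64(seed uint64) *xorshift64 {
	if seed == 0 {
		seed = 0x9E3779B97F4A7C15
	}
	return &xorshift64{state: seed}
}

// next advances the state with three shift-and-xor steps.
func (r *xorshift64) next() uint64 {
	x := r.state
	x ^= x << 13
	x ^= x >> 7
	x ^= x << 17
	r.state = x
	return x
}

// unit returns a float64 in [0, 1) built from the top 53 bits.
func (r *xorshift64) unit() float64 {
	return float64(r.next()>>11) / (1 << 53)
}

func main() {
	rng := newXorshift64(42)
	for i := 0; i < 3; i++ {
		fmt.Println(rng.unit())
	}
}
```

The standard library alternative in Go would be math/rand, which is the other option the list suggests.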


tkalbitz commented on August 15, 2024

+1 for @t-mat

There should be a clean vanilla version as the basis for a "dirty" optimized version.


kidoman commented on August 15, 2024

+1 for a clean reference version; and a crazy all out optimized version

