Exponential: Dual Xeon Gold 6154 result (laser, open issue)

mratsim commented on May 15, 2024

Exponential: Dual Xeon Gold 6154 result

c-blake commented on May 15, 2024

That's pretty good for exp. For log, it is usually helpful to exploit the fact that the CPU's float representation already does most of the domain/range reduction for you, down to a single factor of two. Specifically, any IEEE float (single, double, probably even half precision) has an exponent field that gives you the base-2 logarithm up to the contribution of the mantissa field. So, if you are willing to assume IEEE float representation, you only need a good, fast approximation of log_2 on [1/2,1) or [1,2). (Log base e is, of course, just a rescaled log_2.)

In the past I've done the mantissa part with a mixture of Taylor series near 1 and Padé approximants farther away. For a single-precision log with 1e-7 relative error, that got down to about 60 cycles on Intel Nehalem and about 17 cycles on Skylake, anyway.

There may be a faster approach than Taylor/Padé, but it seemed very likely the "IEEE exponent field" trick would apply to whatever you do for the mantissa, if you're willing to assume IEEE. I also perused your references and didn't see it mentioned. They seem more exp-focused, but in my experience log is generally more poorly optimized by the various stdlibs/vendors.

(edit: clarified the timing and accuracy figures)

c-blake commented on May 15, 2024

Well, when you do get to log, I suggest as a starting point these blog posts from some eBay guy (that material didn't exist back when I was figuring this out for myself :-) ):

https://www.ebayinc.com/stories/blogs/tech/fast-approximate-logarithms-part-i-the-basics/

https://www.ebayinc.com/stories/blogs/tech/fast-approximate-logarithms-part-ii-rounding-error/

https://www.ebayinc.com/stories/blogs/tech/fast-approximate-logarithms-part-iii-the-formulas/

He never really mentions Padé approximants, though his rational approximations are similar. He argues for a [3/4, 6/4) interval, i.e. [3/4, 3/2). (As mentioned, any [t, 2t) works: whatever gets you a fast, well-approximated "shifted mantissa".) If you have all this Intel-specific SIMD code anyway, assuming IEEE (even little-endian IEEE) doesn't seem like much of a stretch.

With log in particular, speed/accuracy trade-offs can be unavoidable. So you may want a Nim call that accepts a target precision (and conceivably even returns an error estimate), defaulting to 23 bits for float32 and 51 bits for float64, so that a call site can accept a larger error when it wants more speed and lower accuracy is acceptable in context.

mratsim commented on May 15, 2024

Log focus is coming; I haven't gotten to it yet.
