lemire / pospopcnt_avx512

benchmarking positional population count

License: Apache License 2.0

Makefile 1.79% C++ 38.88% C 20.54% Shell 0.12% Assembly 38.68%

Positional population-count benchmarks
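
For orientation, a positional population count over 16-bit words tallies, for each of the 16 bit positions, how many input words have that bit set. The following is a minimal scalar sketch of that operation. The function name `pospopcnt_u16_reference` is hypothetical and this is not the code benchmarked in this repository; it only illustrates what the kernels compute.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine (not the repository's implementation).
 * For every input word, add each of its 16 bits to the matching counter,
 * so counts[b] ends up holding the number of words with bit b set.
 * The caller is expected to zero-initialize counts. */
void pospopcnt_u16_reference(const uint16_t *data, size_t n, uint32_t counts[16]) {
    for (size_t i = 0; i < n; i++) {
        for (int bit = 0; bit < 16; bit++) {
            counts[bit] += (uint32_t)((data[i] >> bit) & 1u);
        }
    }
}
```

Calling it on the three words `0x0001`, `0x0003`, `0x8001` with zeroed counters leaves `counts[0] == 3`, `counts[1] == 1`, `counts[15] == 1`, and every other counter at 0.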

requirements

  • Linux, with bare-metal access so that hardware performance counters are available (you may need root)
  • A processor running in performance mode (not powersaving)
  • An x64 processor that supports the AVX512BW extension
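
One way to confirm the last requirement before building is a small runtime check. The sketch below assumes GCC or Clang, whose `__builtin_cpu_supports` builtin accepts the `"avx512bw"` feature name; the helper name `cpu_has_avx512bw` is hypothetical and is not part of this repository.

```c
/* Hypothetical helper (not part of the repository).
 * Returns 1 if the running CPU reports AVX512BW support, 0 otherwise.
 * Assumes GCC or Clang; on other compilers or non-x86 targets it
 * conservatively returns 0. */
int cpu_has_avx512bw(void) {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  /* safe to call more than once */
    return __builtin_cpu_supports("avx512bw") ? 1 : 0;
#else
    return 0;
#endif
}
```

If the function returns 0, the AVX512BW kernels cannot be expected to run on that machine.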

instructions

make
./instrumented_benchmark -v

Sample output

On a Cannon Lake processor:

$ ./instrumented_benchmark -v
n = 10000000 m = 1 
iterations = 100 
array size: 19.073 MB
nothing                                         instructions per cycle 0.20, cycles per 16-bit word:  0.000, instructions per 16-bit word 0.000 
min:       65 cycles,       13 instructions,           1 branch mis.,        0 cache ref.,        0 cache mis.
avg:     69.4 cycles,     13.0 instructions,         1.0 branch mis.,      0.1 cache ref.,      0.1 cache mis.


pospopcnt_u16_scalar                            alignments: 16 
instructions per cycle 3.75, cycles per 16-bit word:  17.325, instructions per 16-bit word 65.000 
min: 173251245 cycles, 650000159 instructions,         3 branch mis.,   407959 cache ref.,   283659 cache mis.
avg: 173473725.4 cycles, 650000160.2 instructions,           8.4 branch mis., 409996.0 cache ref., 295584.1 cache mis.
 0.367 GB/s 
estimated clock in range 3.102 GHz to 3.183 GHz


pospopcnt_u16_avx512bw_harvey_seal_1KB          alignments: 16 
instructions per cycle 0.46, cycles per 16-bit word:  0.497, instructions per 16-bit word 0.227 
min:  4966068 cycles,  2271648 instructions,         114 branch mis.,   547590 cache ref.,   262188 cache mis.
avg: 5296987.3 cycles, 2271648.7 instructions,     137.9 branch mis., 552796.2 cache ref., 275536.8 cache mis.
 12.407 GB/s 
estimated clock in range 2.916 GHz to 3.140 GHz


pospopcnt_u16_avx512bw_harvey_seal_512B         alignments: 16 
instructions per cycle 0.60, cycles per 16-bit word:  0.538, instructions per 16-bit word 0.325 
min:  5382323 cycles,  3245605 instructions,          82 branch mis.,   487390 cache ref.,   268386 cache mis.
avg: 5697539.5 cycles, 3245605.8 instructions,     125.8 branch mis., 498338.1 cache ref., 279114.2 cache mis.
 11.658 GB/s 
estimated clock in range 2.985 GHz to 3.142 GHz


pospopcnt_u16_avx512bw_harvey_seal_256B         alignments: 16 
instructions per cycle 0.84, cycles per 16-bit word:  0.618, instructions per 16-bit word 0.518 
min:  6184692 cycles,  5181931 instructions,         114 branch mis.,   441161 cache ref.,   267157 cache mis.
avg: 6661233.2 cycles, 5181931.2 instructions,     124.0 branch mis., 446684.9 cache ref., 280685.8 cache mis.
 10.162 GB/s 
estimated clock in range 2.956 GHz to 3.148 GHz
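
The per-word figures in this output appear consistent with dividing the minimum counts by `n`, and the array size with 2 bytes per 16-bit word expressed in units of 1024 × 1024 bytes. The sketch below reproduces that arithmetic; the function names are hypothetical and the formulas are inferred from the numbers above, not taken from the benchmark source.

```c
#include <stdint.h>

/* Hypothetical helpers; formulas inferred from the sample output. */

/* Cycles spent per 16-bit input word. */
double cycles_per_word(uint64_t cycles, uint64_t n_words) {
    return (double)cycles / (double)n_words;
}

/* Input size in units of 1024*1024 bytes, at 2 bytes per 16-bit word. */
double array_size_mib(uint64_t n_words) {
    return (double)(n_words * 2) / (1024.0 * 1024.0);
}

/* How many times fewer cycles the candidate needs than the baseline. */
double speedup(uint64_t baseline_cycles, uint64_t candidate_cycles) {
    return (double)baseline_cycles / (double)candidate_cycles;
}
```

With `n = 10000000`, the minimum cycle counts 173251245 (scalar) and 4966068 (harvey_seal_1KB) give about 17.325 and 0.497 cycles per word, so by this measure the 1KB AVX-512 kernel needs roughly 35 times fewer cycles than the scalar version on this input.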

Contributors

clausecker, lemire
