edanor / umesimd

UME::SIMD A library for explicit simd vectorization.

License: Other

C++ 98.49% C 1.18% Makefile 0.28% Shell 0.04% Batchfile 0.01% CMake 0.01%

avx simd scalar-types performance-tuning vectorization benchmark vector avx2 avx512 neon

umesimd's Introduction

NOTE: UME::Vector library has been moved to github! Please see: https://github.com/edanor/umevector

Current stable release is: v0.8.1
To checkout stable release use:

git clone https://[email protected]/edanor/umesimd.git
git checkout tags/v0.8.1

UME::SIMD is an explicit vectorization library. The library defines a homogeneous interface for accessing the functionality of the SIMD registers of the AVX, AVX2, AVX512 and IMCI (KNCNI, k1om) instruction sets.

You can find the most recent documentation and tutorials here: UME::SIMD tutorials.
A link to the older, deprecated wiki is also available: wiki pages.

For citations please refer to: A high-performance portable abstract interface for explicit SIMD vectorization

This piece of code was developed as part of the ICE-DIP project at CERN:

"ICE-DIP is a European Industrial Doctorate project funded by the European Community's 7th Framework programme Marie Curie Actions under grant PITN-GA-2012-316596".

All questions should be submitted using the bug tracking system:

bug tracker

or by sending e-mail to:

[email protected]

RELEASE NOTES for v0.8.1

Interface:
-
Performance tuning:
-
Benchmarks:
- Add VS2015 solution for benchmarks.

Fixes:
- remove unnecessary include in explog.
- fix explog to use more portable reinterpret-cast

Tests:
-

Other:
- Update Readme

Donations

I am not getting paid for developing this software, so any type of help would be appreciated. If you like this project and would like to support it, please feel free to make a voluntary donation. This software will remain free regardless of any donations, but money can help keep it up to date and bug-free.

umesimd's Issues

[Tests] Fix precision constraints for FP math operations

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Some floating-point operations rely on approximation. The difficulty is testing such functions against the standard versions using automatically generated input data.

What should be done is:

design a code pattern for calculating 'ULP distance' from the reference value,
create a test fail/pass threshold value expressed in this ULP distance,
report the ulp precision (for selected tests only) together with test result.
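
The three steps above could be sketched as follows for 32-bit floats. This is only an illustration, not code from the library: the names orderedBits, ulpDistance and withinUlp are hypothetical, and the sketch assumes IEEE-754 binary32 floats.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a float as an integer whose ordering matches the float ordering.
inline std::int64_t orderedBits(float x) {
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);  // portable bit copy, no aliasing issues
    // Negative floats are stored as sign + magnitude; map them to -magnitude.
    return (i < 0) ? std::int64_t(std::numeric_limits<std::int32_t>::min()) - i
                   : std::int64_t(i);
}

// Number of representable floats between a computed value and the reference.
inline std::int64_t ulpDistance(float value, float reference) {
    if (std::isnan(value) || std::isnan(reference)) {
        return std::numeric_limits<std::int64_t>::max();
    }
    const std::int64_t d = orderedBits(value) - orderedBits(reference);
    return d < 0 ? -d : d;
}

// Pass/fail check with the threshold expressed as a ULP distance.
inline bool withinUlp(float value, float reference, std::int64_t maxUlp) {
    assert(maxUlp >= 0);
    return ulpDistance(value, reference) <= maxUlp;
}
```

A test could then report ulpDistance(result, expected) next to the pass/fail flag returned by withinUlp.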

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/38

Add copysign to interface

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

COPYSIGN and MCOPYSIGN should be added with behaviour similar to one of std::copysign.
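
As a scalar reference for the requested behaviour, the sketch below copies the sign bit with integer masks, which is how a vectorized version would typically be built. The function names are hypothetical and the masked variant only assumes that unselected lanes keep the value of the first operand.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Combine the magnitude bits of one float with the sign bit of another.
inline float copysignBits(float magnitude, float sign) {
    std::uint32_t m, s;
    std::memcpy(&m, &magnitude, sizeof m);
    std::memcpy(&s, &sign, sizeof s);
    const std::uint32_t r = (m & 0x7FFFFFFFu) | (s & 0x80000000u);
    float out;
    std::memcpy(&out, &r, sizeof out);
    return out;
}

// Masked variant: lanes with a false mask keep the value of a.
inline void mcopysign(const bool* mask, const float* a, const float* b,
                      float* out, std::size_t n) {
    assert(mask && a && b && out);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = mask[i] ? copysignBits(a[i], b[i]) : a[i];
    }
}
```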

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/34

GATHER/SCATTER with uniform stride

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Add interface functions for GATHER/SCATTER using a single scalar value to define stride for each loaded element.
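
A scalar-emulation sketch of what such functions could compute is shown below. The names gatherStrided and scatterStrided are hypothetical, and the stride is assumed to be expressed in elements rather than bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Load N elements from base, one every `stride` elements.
template <typename T, std::size_t N>
std::array<T, N> gatherStrided(const T* base, std::size_t stride) {
    assert(base != nullptr);
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = base[i * stride];
    }
    return out;
}

// Store N elements to base, one every `stride` elements.
template <typename T, std::size_t N>
void scatterStrided(const std::array<T, N>& values, T* base, std::size_t stride) {
    assert(base != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        base[i * stride] = values[i];
    }
}
```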

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/33

Missing assignment operators

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Assignment operators are not inherited from base class. For that reason, all assignment operators have to be defined in specialized classes.
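
The language rule behind this can be reproduced in a few lines. In the sketch below (hypothetical class names, not the library's), the implicitly declared copy assignment of the derived class hides the base operator=, so it has to be re-exposed with a using-declaration or redefined.

```cpp
#include <cassert>

// CRTP interface that offers assignment from a scalar (broadcast).
template <typename Derived, typename Scalar>
struct VecInterface {
    Derived& operator=(Scalar s) {
        Derived& self = static_cast<Derived&>(*this);
        self.broadcast(s);
        return self;
    }
};

struct Vec4f : VecInterface<Vec4f, float> {
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    // Without this line, `v = 2.5f` does not compile: the implicit
    // Vec4f::operator=(const Vec4f&) hides the base class operator.
    using VecInterface<Vec4f, float>::operator=;

    void broadcast(float s) {
        for (float& x : lanes) x = s;
    }
};
```

With the using-declaration in place, `Vec4f v; v = 2.5f;` sets every lane to 2.5f.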

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/26

`_mm256_permutexvar_epi32` not declared in g++

g++-6 -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512ifma -mavx512vbmi Example1.cpp

In file included from ../plugins/avx512/UMESimdVecUintAVX512.h:56:0,
                 from ../plugins/UMESimdPluginAVX512.h:113,
                 from ../UMESimd.h:133,
                 from Example1.cpp:31:
../plugins/avx512/uint/UMESimdVecUint32_8.h: In member function ‘UME::SIMD::SIMDVec_u<unsigned int, 8u> UME::SIMD::SIMDVec_u<unsigned int, 8u>::swizzle(const UME::SIMD::SIMDSwizzle<8u>&) const’:
../plugins/avx512/uint/UMESimdVecUint32_8.h:282:67: error: ‘_mm256_permutexvar_epi32’ was not declared in this scope
             __m256i t0 = _mm256_permutexvar_epi32(mVec, sMask.mVec);
                                                                   ^
../plugins/avx512/uint/UMESimdVecUint32_8.h: In member function ‘UME::SIMD::SIMDVec_u<unsigned int, 8u> UME::SIMD::SIMDVec_u<unsigned int, 8u>::swizzle()’:
../plugins/avx512/uint/UMESimdVecUint32_8.h:296:59: error: there are no arguments to ‘_mm256_permutexvar_epi32’ that depend on a template parameter, so a declaration of ‘_mm256_permutexvar_epi32’ must be available [-fpermissive]
             __m256i t1 = _mm256_permutexvar_epi32(mVec, t0);
                                                           ^
../plugins/avx512/uint/UMESimdVecUint32_8.h:296:59: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
../plugins/avx512/uint/UMESimdVecUint32_8.h: In member function ‘UME::SIMD::SIMDVec_u<unsigned int, 8u>& UME::SIMD::SIMDVec_u<unsigned int, 8u>::swizzlea(const UME::SIMD::SIMDSwizzle<8u>&)’:
../plugins/avx512/uint/UMESimdVecUint32_8.h:311:61: error: ‘_mm256_permutexvar_epi32’ was not declared in this scope
             mVec = _mm256_permutexvar_epi32(mVec, sMask.mVec);
                                                             ^

SIMD64_8[ui] not constructable

The following works without problem:

UME::SIMD::SIMD32_8u v2(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
               16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);
std::cout << "SIMD32_8u:\n";
printVector(v2);

But the same for 512bit doesn't work

UME::SIMD::SIMD64_8u v3(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
           16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
           32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
           48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63);
std::cout << "SIMD64_8u:\n";
printVector(v3);

error: no matching function for call to ‘UME::SIMD::SIMDVec_u<unsigned char, 64u>::SIMDVec_u(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)’
                48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63);
                                                                              ^

[AVX2, SIMD8_32u] Functions using AVX code

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Some functions in SIMDVecAVX2_u<uint32_t, 8> have been copied from AVX for initial testing. Now since there are generic tests available, these should be re-factored to use AVX2 intrinsics. Functions are, among others, ADD & MUL. It would be good to fix them ASAP for performance.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/1

Add benchmark for dense/sparse SIMD vector masking

When calculations are performed within an 'if-else' statement for scalar operations, only one part of the statement has to be executed ('if' block or 'else' block). When using masking, both blocks have to be executed even if the required conditions do not apply for most of the data elements.

A benchmark should be created to measure this masking overhead and, if possible, show in which cases scalar code might be a better option.
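
The overhead in question can be made visible even before timing anything by counting how often each branch body is evaluated. The sketch below is a hypothetical starting point for such a benchmark, with made-up branch bodies and no timing code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Evals {
    std::size_t thenCount = 0;
    std::size_t elseCount = 0;
};

inline float thenBranch(float x) { return x * 2.0f; }
inline float elseBranch(float x) { return x + 1.0f; }

// Scalar code: every element executes exactly one of the two blocks.
inline Evals scalarIfElse(const std::vector<float>& in, std::vector<float>& out,
                          float threshold) {
    assert(in.size() == out.size());
    Evals e;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] > threshold) { out[i] = thenBranch(in[i]); ++e.thenCount; }
        else                   { out[i] = elseBranch(in[i]); ++e.elseCount; }
    }
    return e;
}

// Masked code: both blocks run for every element, then the mask blends.
inline Evals maskedBlend(const std::vector<float>& in, std::vector<float>& out,
                         float threshold) {
    assert(in.size() == out.size());
    Evals e;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool m = in[i] > threshold;
        const float t = thenBranch(in[i]); ++e.thenCount;
        const float f = elseBranch(in[i]); ++e.elseCount;
        out[i] = m ? t : f;
    }
    return e;
}
```

A real benchmark would vary the mask density (fraction of true lanes) and compare wall-clock time of both versions.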

Mask assignable operators

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

One of the requested features is to have syntax like this:

vec1[mask] = vec2

This is possible using the same design pattern that std::mask_array uses.
Overloading the assignment operators could give users a very nice syntactic feature.

This feature should be tested for performance.
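
A minimal sketch of the proxy pattern, in the spirit of std::mask_array, is given below. The types Mask, Vec and MaskedRef are hypothetical stand-ins and use plain arrays instead of SIMD registers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t N>
struct Mask {
    std::array<bool, N> m{};
};

template <typename T, std::size_t N>
class Vec {
public:
    std::array<T, N> lanes{};

    // Proxy returned by vec[mask]; assigning to it writes selected lanes only.
    class MaskedRef {
    public:
        MaskedRef(Vec& v, const Mask<N>& k) : vec_(v), mask_(k) {}
        MaskedRef& operator=(const Vec& rhs) {
            for (std::size_t i = 0; i < N; ++i) {
                if (mask_.m[i]) vec_.lanes[i] = rhs.lanes[i];
            }
            return *this;
        }
    private:
        Vec& vec_;
        const Mask<N>& mask_;
    };

    MaskedRef operator[](const Mask<N>& k) { return MaskedRef(*this, k); }
};
```

With these definitions, `vec1[mask] = vec2;` compiles and leaves the unselected lanes of vec1 untouched.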

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/11

Replace all operators overloaded as member functions with non-member overloads.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Except operators using scalar LHS operand, all overloaded operators are implemented as members in specialized classes.

Because of that it is necessary to both define operators in abstract interface and then re-define them in every class. This additional override is exactly the same as the interface function. Defining all operators as non-member will reduce code size and code repetition.

What needs to be considered is the overhead of such operators.
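
One way this could look is sketched below: the operators are written once as non-member templates over a CRTP interface and forward to named member functions. All names here are hypothetical, and whether the forwarding is free of overhead would still have to be measured.

```cpp
#include <cassert>

template <typename Derived>
struct VecInterface {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Written once for every vector type deriving from VecInterface.
template <typename V>
V operator+(const VecInterface<V>& a, const VecInterface<V>& b) {
    return a.self().add(b.self());
}

template <typename V>
V operator*(const VecInterface<V>& a, const VecInterface<V>& b) {
    return a.self().mul(b.self());
}

// A specialized class only provides the named operations.
struct Vec2i : VecInterface<Vec2i> {
    int x = 0;
    int y = 0;
    Vec2i() = default;
    Vec2i(int x_, int y_) : x(x_), y(y_) {}
    Vec2i add(const Vec2i& b) const { return Vec2i(x + b.x, y + b.y); }
    Vec2i mul(const Vec2i& b) const { return Vec2i(x * b.x, y * b.y); }
};
```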

This issue relates directly to Issue #25

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/27

Add BANDNOT and LANDNOT to interface.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

an operation for:

(~a) & b

should be added to the bitwise interface. Similar logical operation should be added to mask interface.
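
In scalar terms the two operations could be defined as below. This is a sketch with hypothetical free functions over std::array, not the interface itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise AND-NOT per lane: (~a) & b
template <std::size_t N>
std::array<std::uint32_t, N> bandnot(const std::array<std::uint32_t, N>& a,
                                     const std::array<std::uint32_t, N>& b) {
    std::array<std::uint32_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = (~a[i]) & b[i];
    return out;
}

// Logical AND-NOT per mask lane: (!a) && b
template <std::size_t N>
std::array<bool, N> landnot(const std::array<bool, N>& a,
                            const std::array<bool, N>& b) {
    std::array<bool, N> out{};
    for (std::size_t i = 0; i < N; ++i) out[i] = !a[i] && b[i];
    return out;
}
```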

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/19

Plugin using OpenMP #pragmas simd

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

A plugin currently named autovec has been shown to provide at least partial vectorization in a controllable manner. A full implementation using OpenMP would be a nice fallback for new architectures.
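
The idea could be sketched as follows. AutoVec is a hypothetical class, not the existing autovec plugin; the pragma is guarded so that the code still compiles as a plain loop when the compiler does not define _OPENMP.

```cpp
#include <cassert>
#include <cstddef>

template <typename T, std::size_t N>
struct AutoVec {
    T lanes[N];

    // Element-wise addition; the loop is offered to the compiler's vectorizer.
    AutoVec add(const AutoVec& b) const {
        AutoVec r{};
#if defined(_OPENMP)
#pragma omp simd
#endif
        for (std::size_t i = 0; i < N; ++i) {
            r.lanes[i] = lanes[i] + b.lanes[i];
        }
        return r;
    }
};
```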

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/36

saturated addition/subtraction unnecessary on FP vectors

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Saturation in FP is already provided by overflow to +/-Inf, and thus exposing such operations in the interface is unnecessary.

Saturated operations should be moved to separate interface classes and inherited only in integer vector types.
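
For reference, the integer behaviour that would remain in the interface can be written in scalar form as below. This is a sketch for unsigned types only, with hypothetical function names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Unsigned saturated addition: clamps to the maximum instead of wrapping.
template <typename T>
T saturatedAdd(T a, T b) {
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
    const T sum = static_cast<T>(a + b);  // wraps modulo 2^bits
    return (sum < a) ? std::numeric_limits<T>::max() : sum;
}

// Unsigned saturated subtraction: clamps to zero instead of wrapping.
template <typename T>
T saturatedSub(T a, T b) {
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
    return (a > b) ? static_cast<T>(a - b) : T(0);
}
```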

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/20

Unused variable

/home/marehr/develope/seqan-src/include/umesimd/plugins/avx512/uint/UMESimdVecUint64_16.h: In member function ‘UME::SIMD::SIMDVec_u<long unsigned int, 16u>& UME::SIMD::SIMDVec_u<long unsigned int, 16u>::swizzlea(const UME::SIMD::SIMDSwizzle<16u>&)’:
/home/marehr/develope/seqan-src/include/umesimd/plugins/avx512/uint/UMESimdVecUint64_16.h:281:34: warning: variable ‘result’ set but not used [-Wunused-but-set-variable]
             alignas(64) uint64_t result[16];

Which is indeed not used:

        UME_FORCE_INLINE SIMDVec_u & swizzlea(SIMDSwizzle<16> const & sMask) {
            alignas(64) uint32_t raw_smask[16];
            alignas(64) uint64_t raw[16];
            alignas(64) uint64_t result[16];

            _mm512_store_epi32(raw_smask, sMask.mVec);
            _mm512_store_epi64(&raw[0], mVec[0]);
            _mm512_store_epi64(&raw[8], mVec[1]);

            for(unsigned int i = 0; i < 16; i++) {
                result[i] = raw[raw_smask[i]];
            }

            mVec[0] = _mm512_load_epi64(&raw[0]);
            mVec[1] = _mm512_load_epi64(&raw[1]);
            return *this;
        }
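
In scalar terms, an in-place swizzle presumably has to write the permuted values back to the vector. The sketch below shows that intended semantics with plain arrays; it is an assumption about the intent, not a patch for the header above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Permute lanes in place: lane i receives the old value of lane indices[i].
template <std::size_t N>
void swizzleInPlace(std::array<std::uint64_t, N>& lanes,
                    const std::array<std::uint32_t, N>& indices) {
    std::array<std::uint64_t, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        assert(indices[i] < N);
        result[i] = lanes[indices[i]];
    }
    lanes = result;  // the permuted copy, not the original data, is written back
}
```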

Implementation of pRNG algorithms within UME::SIMD

Originally reported by: Guilherme Amadio (Bitbucket: amadio, GitHub: amadio)

It would be nice to have at least a simple (and thread-safe, if possible) random number generator in UME::SIMD. Even a linear congruential generator should be good enough for some applications, and ideally a generator with good randomness for more sensitive workloads (e.g., particle simulation).
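
As a starting point, a lane-wise linear congruential generator might look like the sketch below. The class LaneLcg is hypothetical; it uses the well-known Numerical Recipes constants, keeps all state inside the object (so one instance per thread avoids sharing), and is not suitable for workloads that need high-quality randomness.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
class LaneLcg {
public:
    explicit LaneLcg(std::uint32_t seed) {
        // Give every lane a different starting state.
        for (std::size_t i = 0; i < N; ++i) {
            state_[i] = seed + static_cast<std::uint32_t>(i) * 0x9E3779B9u;
        }
    }

    // Advance all lanes by one step: x = a * x + c (mod 2^32).
    std::array<std::uint32_t, N> next() {
        for (std::size_t i = 0; i < N; ++i) {
            state_[i] = state_[i] * 1664525u + 1013904223u;
        }
        return state_;
    }

private:
    std::array<std::uint32_t, N> state_{};
};
```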

The links below might be helpful:

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/41

Does not compile with clang 3.9.0 on Haswell

Here's the output:

In file included from ./umesimd/plugins/avx2/UMESimdVecUintAVX2.h:57:
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:62:11: error: explicit
      specialization of 'UME::SIMD::SIMDVec_u<unsigned long, 4>' after instantiation
    class SIMDVec_u<uint64_t, 4> :
          ^~~~~~~~~~~~~~~~~~~~~~
./umesimd/plugins/avx2/uint/UMESimdVecUint32_4.h:142:27: note: implicit
      instantiation first required here
            return assign(b);
                          ^
In file included from kernel_simd_class_umesimd.cpp:6:
In file included from /home/b/bemnoack/repositories/simd_benchmarks/include/common/kernel.hpp:17:
In file included from ./umesimd/UMESimd.h:137:
In file included from ./umesimd/plugins/UMESimdPluginAVX2.h:97:
In file included from ./umesimd/plugins/avx2/UMESimdVecUintAVX2.h:57:
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:757:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u div(SIMDVecMask<4> const & mask, SIMDVec_u const & b) const {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:763:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u div(uint64_t b) const {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:787:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u & diva(SIMDVecMask<4> const & mask, SIMDVec_u const & b) {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:793:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u & diva(uint64_t b) {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:802:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u & diva(SIMDVecMask<4> const & mask, uint64_t b) {
        ^
In file included from kernel_simd_class_umesimd.cpp:6:
In file included from /home/b/bemnoack/repositories/simd_benchmarks/include/common/kernel.hpp:17:
In file included from ./umesimd/UMESimd.h:137:
In file included from ./umesimd/plugins/UMESimdPluginAVX2.h:100:
./umesimd/plugins/avx2/UMESimdCastOperatorsAVX2.h:153:36: error: template
      specialization requires 'template<>'
    inline SIMDVec_u<uint64_t, 4>::operator SIMDVec_i<int64_t, 4>() const {
           ~~~~~~~~~~~~~~~~~~~~~~  ^
    template<>
./umesimd/plugins/avx2/UMESimdCastOperatorsAVX2.h:217:36: error: template
      specialization requires 'template<>'
    inline SIMDVec_u<uint64_t, 4>::operator SIMDVec_f<double, 4>() const {
           ~~~~~~~~~~~~~~~~~~~~~~  ^
    template<>
./umesimd/plugins/avx2/UMESimdCastOperatorsAVX2.h:848:36: error: template
      specialization requires 'template<>'
    inline SIMDVec_u<uint64_t, 4>::operator SIMDVec_u<uint32_t, 4>() const {
           ~~~~~~~~~~~~~~~~~~~~~~  ^
    template<>
5 warnings and 4 errors generated.

[AVX512] SIMD4: review implementation - remove blend operations

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

SIMD 4 implementation uses blend operation to perform mask operations for non AVX512VL instruction sets. This will most likely introduce overhead. The same can be done using cast to 512b vectors and using mask operations supported by AVX512F.

The task is to review the code for SIMD4 and make sure that all blending operations are removed.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/21

[AVX512] Missing fallback for MLOAD/MSTORE for pure AVX512F

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

AVX512 seems to be using only AVX512VL operations, and not emulating such for pure AVX512F (See SIMD4_32f).

Unit tests testing specifically LOAD and STORE operations are missing. While LOAD/STORE are used extensively by other unit tests, it is not the same case with LOADA, STOREA, MLOAD, MSTORE, MLOADA, MSTOREA.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/28

Unit test clean-up required.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

There are multiple unit test fails even when running with full scalar emulation. This can be as well caused by incorrect test data sets.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/22

Mask types don't need additional parameters.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Mask types use additional parameter for scalar type. The original reason for that was to be able to specialize mask types for different scalar types, but this is unnecessary for AVX512 and other extensions already use only one mask representative.

This is unnecessary and should be removed across all plugins.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/15

AVX and AVX2 implementation of SIMD2_64x needs improvement

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Implementation of SIMD2 types for 64b scalars is not using intrinsic types. This should be fixed.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/29

Unit test - multiple test apps

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Compiling current test suite takes insane amounts of time. For development purposes only a minimal set of tests is required, usually regarding single SIMD type being specialized. Separate testing apps can be provided to compile only tests for required type. This would also allow faster parallel builds.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/40

Bug in assign() of avx512/uint/64_2 version

Hi,

the following will produce a wrong result for avx512/uint:

using TVector = UME::SIMD::SIMDVec_u<uint64_t, 2>;
TVector a{10, 0}, b{0, 10}, retval(0);

auto c = a.cmpgt(b);
retval.assign(c, ~uint64_t(0));

// retval = (0xFFFFFFFF00000000, 0x0), but should be
// retval = (0xFFFFFFFFFFFFFFFF, 0x0)

Unit tests have to be split into multiple translation units.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

The large amount of code generated from templates cannot be compiled without flags that allow big object files (/bigobj in VS). This problem can be solved easily by moving the test functions for specific vector types into separate .cpp files.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/23

Does not compile with GCC 6.2.0 on Knights Landing

Here's the rather long build log:

gcc_6.2.0_umesimd_build_log_knl.txt

Raise warning level

Hi,

I would suggest to raise the warning level of the library.

We use the following warning level

gcc + icc: -W -Wall -pedantic -fstrict-aliasing -Wstrict-aliasing
clang: -W -Wall -pedantic -fstrict-aliasing -Wstrict-aliasing -Wshorten-64-to-32
msvc: /W2, but would like to use /W3

Best regards!

Source file format incompatible with Cray Compiler (and inconsistent)

When trying to build with the Cray Compiler, I got the following message

CC-7 crayc++: ERROR File = ./umesimd/UMESimd.h, Line = 1
  The indicated token is not valid in this context.
  // The MIT License (MIT)
  ^

Looking at the files, there seems to be some inconsistency... and Windows line endings. ;-)

file UMESimd.h 
UMESimd.h: UTF-8 Unicode (with BOM) C++ program text, with CRLF line terminators

file UMESimdTraits.h 
UMESimdTraits.h: ASCII C++ program text, with CRLF line terminators

The problem seems to be the BOM (byte order mark), a magic number at the beginning of the file. The Cray compiler does not seem to be able to deal with it. If I open such a file in GNOME's gedit and set the cursor to the beginning of the file, the BOM materialises as an invisible character, i.e. I have to press the right arrow key twice to move the cursor one position to the right. If I delete the first character (no visible change), the file works and the compiler complains with the same message for the next include.

I'll report that to Cray too, but would hope for you to fix the format, or maybe put a script inside the repository, if your editor of choice enforces that format.
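
Such a normalization step is small. The sketch below is a hypothetical helper, not something from the repository: it strips a leading UTF-8 BOM (bytes EF BB BF) and converts CRLF line endings to LF in a string holding the file contents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline std::string normalizeSource(std::string text) {
    // Drop the UTF-8 byte order mark if present.
    if (text.size() >= 3 && text.compare(0, 3, "\xEF\xBB\xBF") == 0) {
        text.erase(0, 3);
    }
    // Convert CRLF to LF; lone '\r' characters are kept as they are.
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;
        }
        out.push_back(text[i]);
    }
    return out;
}
```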

[AVX512] Unsigned 64bit-integer comparison fails

#include "../UMESimd.h"

using namespace UME::SIMD;

int main()
{
    SIMD8_64u a((uint64_t)(~0));
    SIMD8_64u b(0);

    auto c = a.cmpgt(b);
    std::cout << a[0] << " = a > b = " << b[0] << " <==> " << (c[0] ? "true" : "false") << std::endl;

    return 0;
}

Should return

18446744073709551615 = a > b = 0 <==> true

but returns

18446744073709551615 = a > b = 0 <==> false
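
One common cause of exactly this symptom is comparing unsigned lanes with a signed comparison; whether that is the cause here is an assumption. The scalar sketch below reproduces the wrong answer and shows the usual remedy of flipping the sign bit of both operands before a signed compare. It relies on two's complement conversion, which mainstream compilers provide.

```cpp
#include <cassert>
#include <cstdint>

// What a signed-only compare computes when fed unsigned data.
inline bool cmpgtSignedOnly(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::int64_t>(a) > static_cast<std::int64_t>(b);
}

// Unsigned compare emulated with a signed compare: flipping the top bit
// maps the unsigned range onto the signed range while preserving order.
inline bool cmpgtUnsignedViaBias(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t bias = std::uint64_t(1) << 63;
    return static_cast<std::int64_t>(a ^ bias) > static_cast<std::int64_t>(b ^ bias);
}
```

For a = ~0 and b = 0 the first function yields false, matching the reported output, while the second yields true.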

Const correctness in IntermediateIndex

Consider:

#include "../UMESimd.h"

using namespace UME::SIMD;

template <typename T1, typename T2>
bool compare(T1 const & value1, T2 const & value2)
{
    if (value1 == value2) {
        std::cout << "true" << std::endl;
        return true;
    } else {
        std::cout << "false" << std::endl;
        return false;
    }
}

int main()
{
    SIMD4_64u a(5, 3, 8, 4), b(13, 984, 5, 0);

    // won't work
    compare(a[0], b[2]);

    // works
    if (a[0] == b[2]) {
        std::cout << "true" << std::endl;
    } else {
        std::cout << "false" << std::endl;
    }
    return 0;
}

This will produce

gcc:

index_bug.cpp: In instantiation of ‘bool compare(const T1&, const T2&) [with T1 = UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>; T2 = UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>]’:
index_bug.cpp:22:23:   required from here
index_bug.cpp:8:16: error: no match for ‘operator==’ (operand types are ‘const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>’ and ‘const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>’)
     if (value1 == value2) {
         ~~~~~~~^~~~~~~~~
index_bug.cpp:8:16: note: candidate: operator==(long unsigned int, long unsigned int) <built-in>
index_bug.cpp:8:16: note:   conversion of argument 2 would be ill-formed:
index_bug.cpp:8:16: error: passing ‘const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>’ as ‘this’ argument discards qualifiers [-fpermissive]
In file included from ../plugins/UMESimdPluginScalarEmulation.h:36:0,
                 from ../UMESimd.h:147,
                 from index_bug.cpp:1:
../plugins/../UMESimdInterface.h:654:26: note:   in call to ‘UME::SIMD::IntermediateIndex<VEC_TYPE, SCALAR_TYPE>::operator SCALAR_TYPE() [with VEC_TYPE = UME::SIMD::SIMDVec_u<long unsigned int, 4u>; SCALAR_TYPE = long unsigned int]’
         UME_FORCE_INLINE operator SCALAR_TYPE() { return mVecRef_RW.extract(mIndexRef); }
                          ^~~~~~~~

[...]

clang:

index_bug.cpp:8:16: error: invalid operands to binary expression ('const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned long, 4>, unsigned
      long>' and 'const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned long, 4>, unsigned long>')
    if (value1 == value2) {
        ~~~~~~ ^  ~~~~~~
index_bug.cpp:22:5: note: in instantiation of function template specialization 'compare<UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned
      long, 4>, unsigned long>, UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned long, 4>, unsigned long> >' requested here
    compare(a[0], b[2]);
    ^
1 error generated.

The problem is that none of the comparison operators is declared const. Furthermore, the templated function won't allow implicit type conversion of the second argument.
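
A reduced model of the situation is sketched below with hypothetical types (IndexProxy and Vec4u stand in for IntermediateIndex and the vector class). Once the conversion operator and the comparison are const-qualified, the templated compare from the example accepts the proxies.

```cpp
#include <cassert>
#include <cstdint>

template <typename Vec, typename Scalar>
class IndexProxy {
public:
    IndexProxy(Vec& v, unsigned i) : vec_(v), index_(i) {}

    // Reading through the proxy does not modify it, so this is const.
    operator Scalar() const { return vec_.extract(index_); }

    IndexProxy& operator=(Scalar s) { vec_.insert(index_, s); return *this; }

    // Comparison takes both operands by const reference.
    friend bool operator==(const IndexProxy& a, const IndexProxy& b) {
        return Scalar(a) == Scalar(b);
    }

private:
    Vec& vec_;
    unsigned index_;
};

struct Vec4u {
    std::uint64_t lanes[4];
    std::uint64_t extract(unsigned i) const { return lanes[i]; }
    void insert(unsigned i, std::uint64_t v) { lanes[i] = v; }
    IndexProxy<Vec4u, std::uint64_t> operator[](unsigned i) {
        return IndexProxy<Vec4u, std::uint64_t>(*this, i);
    }
};

template <typename T1, typename T2>
bool compare(const T1& value1, const T2& value2) {
    return value1 == value2;
}
```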

Visual studio warnings

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Visual studio 2015 is reporting a large number of warnings when building unit tests. A cleanup is required.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/39

[BENCH] Statistics refactoring

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

In file microbenchmarks/utilities/TimingStatistics.h, a class TimingStatistics is defined, together with a more general Statistics template.

The task is to remove TimingStatistics completely and replace its uses with the templated version. In addition, all benchmarks should also calculate some numeric error measure, so that it is possible to compare the precision of different implementations (see the RMS error in the matmul benchmark).
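
To make the error measure concrete, the sketch below pairs a small templated accumulator with an RMS error function. Both are hypothetical illustrations and do not mirror the actual contents of TimingStatistics.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Generic running statistics over samples of type T (e.g. timings).
template <typename T>
class Statistics {
public:
    void update(T sample) { ++count_; sum_ += sample; }
    std::size_t count() const { return count_; }
    T mean() const {
        assert(count_ > 0);
        return sum_ / static_cast<T>(count_);
    }
private:
    std::size_t count_ = 0;
    T sum_ = T(0);
};

// Root-mean-square error of computed values against reference values.
template <typename T>
T rmsError(const std::vector<T>& values, const std::vector<T>& reference) {
    assert(values.size() == reference.size() && !values.empty());
    T acc = T(0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const T d = values[i] - reference[i];
        acc += d * d;
    }
    return std::sqrt(acc / static_cast<T>(values.size()));
}
```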

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/32

How should the structure of UME::SIMD in /usr/include be?

Hi Przemyslaw,

I came across the problem of how to include UME::SIMD in my programs. As an example, in our repository we have the folder structure /include/seqan, which means that distributors can copy all our header files from there to some common /include folder (e.g. /usr/include; if you install with apt-get install seqan-dev, you will find our header files in that folder).

In the source code, you can include the headers like this:

#include <seqan/bam_io.h>
// ....

See http://seqan.readthedocs.io/en/master/Tutorial/InputOutput/FileIOOverview.html for an example.

Your repository has no obvious /include folder.

So, do you want that users include as

#include <umesimd/UMESimd.h> // so?
#include <UMESimd/UMESimd.h> // or, so?
#include <UMESimd.h> // or, so?

I looked into some projects, how they handle it (the part in [] is what will be copied over):

mysql: has /include/[**.h] in the repository, installs as /usr/include/mysql/[**.h]
protobuf: has /src/[google/protobuf/**.h] which will (partially) be copied to /usr/include/[google/protobuf/**.h]
poppler: has [/glib/**.h] -> /usr/include/poppler/[glib/**.h]
boost: Is a collection of repositories that follow the same rule
- For example hana has /include/[boost/**.h] -> /usr/include/[boost/**.h]

Thus, there are different philosophies how to address this. I'd say that purely C++ written libraries will most likely have a dedicated /include folder, but there is no common pattern if they will have the library name included in /include/[umesimd?].

Add unit test for pack/unpack operations

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/18

Can't build library with gcc and clang for knl

Hi umesimd developers,

first of all, thank you for your great work and effort.
And it is also great to see that you migrated to github.

Now to my issue, in umesimd/examples/

g++-6 -std=c++11 -march=knl Example1.cpp

and

clang++-3.9 -std=c++11 -march=knl Example1.cpp

will result in similar errors of missing functions:

In file included from Example1.cpp:31:
In file included from ./../UMESimd.h:133:
In file included from ./../plugins/UMESimdPluginAVX512.h:98:
In file included from ./../plugins/avx512/UMESimdVecUintAVX512.h:52:
./../plugins/avx512/uint/UMESimdVecUint32_4.h:977:31: error: use of undeclared identifier '_mm512_reduce_add_epi32'
            uint32_t retval = _mm512_reduce_add_epi32(t0);
                              ^
./../plugins/avx512/uint/UMESimdVecUint32_4.h:984:31: error: use of undeclared identifier '_mm512_mask_reduce_add_epi32'
            uint32_t retval = _mm512_mask_reduce_add_epi32(t1, t0);
                              ^
./../plugins/avx512/uint/UMESimdVecUint32_4.h:990:31: error: use of undeclared identifier '_mm512_reduce_add_epi32'
            uint32_t retval = _mm512_reduce_add_epi32(t0);
                              ^
[...]

Because we are also an IPCC, we asked our persons in charge, and they said that apparently not all intrinsics are an integral part of the architecture’s instruction set; some (like the reduce intrinsics) are software extensions. Since it seems that gcc and clang add those features much later, it would be nice to be able to compile the library regardless of the compiler.

Plugin system requires expanding.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Currently each plugin occupies a single file. File hierarchy should be expanded:

add subdirectory to store plugins
split plugin files into multiple files with SIMD classes occupying separate files.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/13

Sporadic failures in COPYSIGN

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Copysign fails even with scalar emulation. This might be a blocker for VecGeom.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/37

Increase reuse of specialized SIMD classes

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Some SIMD types, such as SIMD1 and SIMD2 will have similar if not the same implementation for different plugins. It would be very useful to have some code that can be reused.

One of the problems with code reuse is separate naming for classes of different plugins.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/16

KNC: SIMD8_32f segfault on histogram1

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Histogram1 microbenchmark fails to execute on KNC when using SIMD8_32f. Potential problem is caused by non-aligned memory access pattern.

For the moment SIMD8_32f native implementation has been disabled to prevent segfaults.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/17

Replace insert/extract in scalar emulation with load/store

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Because we want to explicitly specialize only operations that can be expressed in specific instruction set, all unsupported operations should be left with resolution to scalar emulation.

Most of emulated functions are using insert/extract on a per-element basis. Performing insert/extract operations on vectors is slow for non-emulated vectors. Doing load/store on emulated data types shouldn't create any slow-downs due to compiler optimizations, and even if not, it will still reduce slow-down on target vector code.

This proposal is to re-write scalar emulation functions with LOAD/STORE instead of INSERT/EXTRACT.
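
The difference between the two emulation styles can be sketched with a stand-in vector type. EmuVec4f and both add functions below are hypothetical; the point is only that the bulk version touches the vector three times (two stores, one load) instead of once per element.

```cpp
#include <cassert>
#include <cstddef>

struct EmuVec4f {
    static constexpr std::size_t length = 4;
    float lanes[4];

    void load(const float* p) { for (std::size_t i = 0; i < length; ++i) lanes[i] = p[i]; }
    void store(float* p) const { for (std::size_t i = 0; i < length; ++i) p[i] = lanes[i]; }
    float extract(std::size_t i) const { return lanes[i]; }
    void insert(std::size_t i, float v) { lanes[i] = v; }
};

// Current style: one extract/insert round trip per element.
template <typename V>
V addPerElement(const V& a, const V& b) {
    V r{};
    for (std::size_t i = 0; i < V::length; ++i) {
        r.insert(i, a.extract(i) + b.extract(i));
    }
    return r;
}

// Proposed style: store once, operate on plain arrays, load once.
template <typename V>
V addBulk(const V& a, const V& b) {
    float ra[V::length];
    float rb[V::length];
    a.store(ra);
    b.store(rb);
    for (std::size_t i = 0; i < V::length; ++i) ra[i] += rb[i];
    V r{};
    r.load(ra);
    return r;
}
```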

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/14

Implementation issues with IntermediateIndex

#include "../UMESimd.h"

using namespace UME::SIMD;

int main()
{
    SIMD4_64i a(5, 3, 8, 4), b(13, 984, 5, 0);

    // this can't be constructed, because IntermediateIndex's constructor is private
    // decltype(a[0]) = UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_i<long int, 4u>, long int>
    // auto a_0 = a[0];
    // auto b_2 = b[2];

    auto true1 = a[0] == b[2];
    std::cout << "a[0] = " << a[0] << " == " << b[2] << " = b[2]: " << true1 << std::endl;

    // auto true2 = a_0 == b_2;
    // std::cout << "a_0  = " << a_0 << " == " << b_2 << " = b_2 : " << true2 << std::endl;

    // decltype(ca) = const UME::SIMD::SIMDVec_i<long int, 4u>&
    const auto & ca = a;
    const auto & cb = b;

    auto true3 = ca[0] == cb[2];
    std::cout << "ca[0] = " << ca[0] << " == " << cb[2] << " = cb[2]: " << true3 << std::endl;

    // decltype(ca[0]) = long int
    auto ca_0 = ca[0];
    auto cb_2 = cb[2];

    auto true4 = ca_0 == cb_2;
    std::cout << "ca_0  = " << ca_0 << " == " << cb_2 << " = cb_2 : " << true4 << std::endl;

    return 0;
}

Will produce

a[0] = 5 == 5 = b[2]: 0
ca[0] = 5 == 5 = cb[2]: 1
ca_0  = 5 == 5 = cb_2 : 1

The first line is completely wrong.

I stumbled across this, as I wanted to fix the const correctness in the IntermediateIndex. I already know what the issue is and will create a pull request later. The purpose of this issue is only for documentation.

Interface issue for assign() on integers?

Hi,

for float and double the interface is like this:

assign(SIMDVecMask<4> const & mask, float b); // UMESimdVecFloat32_4
assign(SIMDVecMask<2> const & mask, double b); // UMESimdVecFloat64_2

Thus, the assignee type matches the scalar type of the vector.

For integers this isn't true:

assign(SIMDVecMask<4> const & mask, uint32_t b); // in avx512/uint/UMESimdVecUint16_4.h
assign(SIMDVecMask<4> const & mask, uint32_t b); // in avx512/uint/UMESimdVecUint32_4.h
assign(SIMDVecMask<8> const & mask, uint64_t b); // in avx512/uint/UMESimdVecUint64_8.h

I looked into the source code and apparently avx512/uint/UMESimdVecUint16_4.h was the only file where you didn't use uint16_t. I thought this was an interface inconsistency, but it seems to be only a bug.

REM operation missing in interface

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

The interface needs to be extended with a remainder operation. This should be done for both integer and floating-point numbers.

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/24

Using logical operator on arithmetic vectors should return a mask.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

For now, both | and || return the result of a bitwise OR operation. The interface should return a mask type for logical operators instead. This would be more consistent with how scalar types are treated in C++.
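The distinction being requested can be sketched with two toy types. `Vec4u` and `Mask4` are hypothetical stand-ins invented for this example and are not UME::SIMD types. The sketch assumes that a lane counts as "true" when it is nonzero, mirroring scalar C++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a mask and an unsigned integer vector.
struct Mask4 { std::array<bool, 4> m; };
struct Vec4u { std::array<std::uint32_t, 4> v; };

// Bitwise OR: the result is again a vector.
inline Vec4u operator|(Vec4u const & a, Vec4u const & b) {
    Vec4u r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] | b.v[i];
    return r;
}

// Logical OR: the result is a mask; a lane is true if either input lane is nonzero.
// Note that an overloaded || does not short-circuit.
inline Mask4 operator||(Vec4u const & a, Vec4u const & b) {
    Mask4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.m[i] = (a.v[i] != 0u) || (b.v[i] != 0u);
    return r;
}
```

With this split, `a | b` keeps the bit pattern while `a || b` can be fed directly into masked operations.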

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/30

Missing function for mask checking if mask is empty/mixed/full

I am missing an equivalent to Vc's isEmpty() function on the masks, or maybe a conversion operator to bool that is true if at least one element of the mask is true.

With Intel you could use:

_mm512_mask2int

For other compilers, I use this replacement:

#if !defined(__INTEL_COMPILER)
// only defined by the Intel compiler
inline int _mm512_mask2int(__mmask16 k1) {
    return static_cast<int>(k1);
}
#endif

Rationale: You usually have three cases, that might be beneficial to check and handle differently:

empty mask: nothing to do
mixed mask: proceed with masked operation
full mask: proceed with unmasked operations
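The three cases above could be distinguished with a helper along these lines. This is only a sketch: `MaskState` and `classify_mask` are hypothetical names, and a plain 16-bit integer stands in for the `__mmask16` value obtained through `_mm512_mask2int`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a 16-lane mask.
enum class MaskState { Empty, Mixed, Full };

// Assumes one bit per lane and that all 16 lanes are in use.
inline MaskState classify_mask(std::uint16_t bits) {
    if (bits == 0x0000u) return MaskState::Empty;  // nothing to do
    if (bits == 0xFFFFu) return MaskState::Full;   // unmasked operations suffice
    return MaskState::Mixed;                       // masked operations required
}
```

A caller could then switch on the result and skip the masked code path entirely for empty or full masks.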

Update license header dates.

Year 2016 is closing, and some headers still have 2015 in the copyright notice. This should be fixed.

Const correctness of pointers

Hi again,

we ran into some problems compiling the library, because of const correctness of pointers.

In our case, we assumed that the gather operation would accept a const pointer, since its semantics suggest that it only loads data and does not modify it.

For example:

SIMDVec_u & gather(uint64_t * baseAddr, SIMDVec_u const & indices); // current
SIMDVec_u & gather(uint64_t const * baseAddr, SIMDVec_u const & indices);  // our suggestion

//or

SIMDVec_u & gather(uint64_t * baseAddr, uint64_t* indices); // current
SIMDVec_u & gather(uint64_t const * baseAddr, uint64_t const * indices); // our suggestion

Is there a reason to not guarantee constness? If not I would open a pull request to fix at least the gather operations.
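To show that a gather needs only read access to both arrays, here is a minimal scalar sketch. `gather_lanes` is a hypothetical free function written for this example and returns a `std::array` instead of a SIMD vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar gather: both pointers are const because the
// operation only reads from the base array and the index array.
template <std::size_t N>
std::array<std::uint64_t, N> gather_lanes(std::uint64_t const * baseAddr,
                                          std::uint64_t const * indices) {
    assert(baseAddr != nullptr && indices != nullptr);
    std::array<std::uint64_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = baseAddr[indices[i]];
    }
    return out;
}
```

Because a non-const pointer converts implicitly to a pointer-to-const, existing callers that pass `uint64_t *` would keep compiling against the const-qualified signature.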

Extend interface with masking operations.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

As for other interfaces, masks should also implement MFI functions with masking operand. An example where it causes problems comes from VecGeom:

VECGEOM_FORCE_INLINE
void MaskedAssign(UmesimdBool_v const &cond, UmesimdBool_v const &thenval, UmesimdBool_v *const output)
{
    // output->assign(cond, thenval);
    UmesimdBool_v out_v;
    out_v.assign(*output);
    UmesimdBool_v t0 = cond.land(thenval);
    UmesimdBool_v t1 = (!cond).land(out_v);
    UmesimdBool_v t2 = t0 || t1;
    output->assign(t2);
}

Here the overloaded 'MaskedAssign' operation tries to blend two masks, and has to work around the missing masked assign() on mask types.

Since this usage pattern comes from actual users, the interface should provide it directly.
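The blend that the workaround computes is, per lane, `(cond AND thenval) OR (NOT cond AND output)`. A minimal sketch of that semantics follows. `Mask4` and `masked_assign` are hypothetical names for this example, not the UME::SIMD or VecGeom API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane mask type.
struct Mask4 { std::array<bool, 4> m; };

// Where cond is true, take the lane from thenval; elsewhere keep output's lane.
inline void masked_assign(Mask4 const & cond, Mask4 const & thenval, Mask4 & output) {
    for (std::size_t i = 0; i < 4; ++i) {
        output.m[i] = (cond.m[i] && thenval.m[i]) || (!cond.m[i] && output.m[i]);
    }
}
```

A masked `assign(mask, src)` on the mask type itself would let callers express this in one call instead of three logical operations and a temporary.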

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/35

BNOT: operator! used

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

operator! should be used for Logical Not (LNOT) operation. Instead it is being used for Bitwise Not (BNOT) operations. This should be fixed.
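The difference between the two operations on an arithmetic vector can be sketched as follows. `Vec4u` and `Mask4` are hypothetical toy types for this example, and the sketch assumes that logical NOT treats a lane as true only when it is zero, as scalar C++ does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a mask and an unsigned integer vector.
struct Mask4 { std::array<bool, 4> m; };
struct Vec4u { std::array<std::uint32_t, 4> v; };

// BNOT: flips every bit of every lane; the result is a vector.
inline Vec4u operator~(Vec4u const & a) {
    Vec4u r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = ~a.v[i];
    return r;
}

// LNOT: true for lanes that are zero; the result is a mask.
inline Mask4 operator!(Vec4u const & a) {
    Mask4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.m[i] = (a.v[i] == 0u);
    return r;
}
```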

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/31

Operators for mixed scalar-vector operations.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)

Currently only vec<->vec operations are permitted using overloaded C++ operators. Writing:

SIMD4_32f a, b, c;
c = a + b;

is permitted, while the mixed scalar-vector lines below are not:

SIMD4_32f a, b, c;
float d, e;
c = a + b; // OK
c = a + d; // Error: no implicit conversion between float and SIMD4_32f
c = e + b; // Error: no operator matching: 'operator+ (float, SIMD4_32f &)'

Supporting this requires two overloads for every operator. The first, for RHS scalars, can be defined as a member function of the SIMD type:

VEC_TYPE operator+(SCALAR_TYPE b) const; // member of VEC_TYPE

and a second one, for LHS scalars, defined as a friend function:

friend VEC_TYPE operator+(SCALAR_TYPE a, VEC_TYPE const & b);

Because of potential collisions with operators overloaded in std::, this cannot be done using templates, and thus requires explicit overloads for every combination of scalar and SIMD types.
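For a single concrete type, the two overloads could look like the sketch below. `Vec4f` is a hypothetical toy type backed by `std::array`, used here in place of `SIMD4_32f`. It shows only the overload structure and says nothing about how the library implements it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane float vector.
struct Vec4f {
    std::array<float, 4> v;

    // vec + vec
    Vec4f operator+(Vec4f const & b) const {
        Vec4f r{};
        for (std::size_t i = 0; i < 4; ++i) r.v[i] = v[i] + b.v[i];
        return r;
    }

    // vec + scalar: member function, the scalar is the RHS operand.
    Vec4f operator+(float b) const {
        Vec4f r{};
        for (std::size_t i = 0; i < 4; ++i) r.v[i] = v[i] + b;
        return r;
    }

    // scalar + vec: friend function, the scalar is the LHS operand.
    // Addition commutes, so this forwards to the member overload.
    friend Vec4f operator+(float a, Vec4f const & b) {
        return b + a;
    }
};
```

Non-commutative operators such as `-` and `/` cannot forward this way, so the LHS-scalar overload has to compute `a - b.v[i]` explicitly.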

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/25

UME::SIMD master (5785acb) does not compile with GCC (AVX2 plugin)

Originally reported by: Guilherme Amadio (Bitbucket: amadio, GitHub: amadio)

I encountered the following problem when compiling VecCore with UME::SIMD backend enabled:

/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘UME::SIMD::SIMDVec_u<long unsigned int, 4u>& UME::SIMD::SIMDVec_u<long unsigned int, 4u>::load(const UME::SIMD::SIMDVecMask<4u>&, const uint64_t*)’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:186:43: error: ‘__int64’ was not declared in this scope
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                           ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:186:51: error: expected ‘)’ before ‘const’
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                                   ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘UME::SIMD::SIMDVec_u<long unsigned int, 4u>& UME::SIMD::SIMDVec_u<long unsigned int, 4u>::loada(const UME::SIMD::SIMDVecMask<4u>&, const uint64_t*)’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:197:43: error: ‘__int64’ was not declared in this scope
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                           ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:197:51: error: expected ‘)’ before ‘const’
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                                   ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘uint64_t* UME::SIMD::SIMDVec_u<long unsigned int, 4u>::store(const UME::SIMD::SIMDVecMask<4u>&, uint64_t*) const’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:208:37: error: ‘__int64’ was not declared in this scope
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                     ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:208:46: error: expected primary-expression before ‘)’ token
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                              ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘uint64_t* UME::SIMD::SIMDVec_u<long unsigned int, 4u>::storea(const UME::SIMD::SIMDVecMask<4u>&, uint64_t*) const’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:219:37: error: ‘__int64’ was not declared in this scope
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                     ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:219:46: error: expected primary-expression before ‘)’ token
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                              ^
make[2]: *** [test/CMakeFiles/Math.dir/build.make:63: test/CMakeFiles/Math.dir/mathtest.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:94: test/CMakeFiles/Math.dir/all] Error 2
make: *** [Makefile:139: all] Error 2
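The errors arise because `__int64` is a Microsoft-specific type name that GCC does not declare. One possible workaround is to select the pointee type per compiler, as sketched below. The names `maskload_i64_t` and `as_maskload_ptr` are hypothetical, and the sketch assumes that GCC's headers declare the pointer parameter of these intrinsics as `long long`. It is not necessarily the fix that was applied in the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias for the pointee type expected by the 64-bit
// maskload/maskstore intrinsics on the current compiler.
#if defined(_MSC_VER)
using maskload_i64_t = __int64;
#else
using maskload_i64_t = long long;  // assumption: matches GCC's intrinsic headers
#endif

static_assert(sizeof(maskload_i64_t) == sizeof(std::uint64_t),
              "pointee type must be 64 bits wide");

// Hypothetical helper: reinterprets the address only and does not read through it.
inline maskload_i64_t const * as_maskload_ptr(std::uint64_t const * p) {
    return reinterpret_cast<maskload_i64_t const *>(p);
}
```

The failing call would then pass `as_maskload_ptr(p)` as its first argument instead of the `(__int64 const*)p` cast.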

Bitbucket: https://bitbucket.org/edanor/umesimd/issue/42
