edanor / umesimd

UME::SIMD A library for explicit simd vectorization.

License: Other

C++ 98.49% C 1.18% Makefile 0.28% Shell 0.04% Batchfile 0.01% CMake 0.01%
avx simd scalar-types performance-tuning vectorization benchmark vector avx2 avx512 neon

umesimd's Introduction

NOTE: UME::Vector library has been moved to github! Please see: https://github.com/edanor/umevector

The current stable release is v0.8.1.
To check out the stable release, use:

git clone https://[email protected]/edanor/umesimd.git
git checkout tags/v0.8.1

UME::SIMD is an explicit vectorization library. The library defines a homogeneous interface for accessing the functionality of SIMD registers of the AVX, AVX2, AVX512 and IMCI (KNCNI, k1om) instruction sets.

You can find the most recent documentation and tutorials here: UME::SIMD tutorials.
An older, deprecated wiki is also available: wiki pages.

For citations, please refer to: A high-performance portable abstract interface for explicit SIMD vectorization

This piece of code was developed as part of the ICE-DIP project at CERN:

"ICE-DIP is a European Industrial Doctorate project funded by the European Community's 7th Framework programme Marie Curie Actions under grant PITN-GA-2012-316596".

All questions should be submitted using the bug tracking system:

bug tracker

or by sending e-mail to:

[email protected]

RELEASE NOTES for v0.8.1

Interface:
-
Performance tuning:
-
Benchmarks:
- Add VS2015 solution for benchmarks.

Fixes:
- remove unnecessary include in explog.
- fix explog to use more portable reinterpret-cast

Tests:
-

Other:
- Update Readme

Donations

I am not getting paid for developing this software, so any type of help would be appreciated. If you like this project and would like to support it, please feel free to make a voluntary donation. This software will remain free regardless of any donations, but money can help keep it up to date and bug-free.

umesimd's People

Contributors

edanor, heretherebedragons, marehr, noma, pcanal, sawenzel

umesimd's Issues

[Tests] Fix precision constraints for FP math operations

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


Some floating point operations rely on approximation. The problem is how to test such functions against their standard versions using automatically generated inputs.

What should be done is:

  1. design a code pattern for calculating 'ULP distance' from the reference value,
  2. create a test fail/pass threshold value expressed in this ULP distance,
  3. report the ulp precision (for selected tests only) together with test result.
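
The three steps above could be sketched as follows. This is only an illustration for 32-bit floats, not code from the library: the names orderedBits, ulpDistance and withinUlp are hypothetical, and the threshold passed to withinUlp is whatever the test author chooses.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a float's bit pattern onto an integer line on which adjacent
// representable floats differ by exactly 1 (hypothetical helper).
inline std::int64_t orderedBits(float x) {
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    // IEEE-754 floats are sign-magnitude: mirror the negative half so the
    // mapping is monotonic and +0.0f / -0.0f both map to 0.
    return (bits < 0) ? -static_cast<std::int64_t>(bits & 0x7FFFFFFF)
                      : static_cast<std::int64_t>(bits);
}

// Step 1: ULP distance between a computed value and the reference value.
inline std::int64_t ulpDistance(float result, float reference) {
    if (std::isnan(result) || std::isnan(reference)) {
        return std::numeric_limits<std::int64_t>::max();
    }
    const std::int64_t d = orderedBits(result) - orderedBits(reference);
    return d < 0 ? -d : d;
}

// Step 2: pass/fail threshold expressed in ULPs.
inline bool withinUlp(float result, float reference, std::int64_t maxUlp) {
    return ulpDistance(result, reference) <= maxUlp;
}
```

For step 3, a test could print the value returned by ulpDistance next to its pass/fail result.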

`_mm256_permutexvar_epi32` not declared in g++

g++-6 -mavx512f -mavx512cd -mavx512bw -mavx512dq -mavx512vl -mavx512ifma -mavx512vbmi Example1.cpp

In file included from ../plugins/avx512/UMESimdVecUintAVX512.h:56:0,
                 from ../plugins/UMESimdPluginAVX512.h:113,
                 from ../UMESimd.h:133,
                 from Example1.cpp:31:
../plugins/avx512/uint/UMESimdVecUint32_8.h: In member function ‘UME::SIMD::SIMDVec_u<unsigned int, 8u> UME::SIMD::SIMDVec_u<unsigned int, 8u>::swizzle(const UME::SIMD::SIMDSwizzle<8u>&) const’:
../plugins/avx512/uint/UMESimdVecUint32_8.h:282:67: error: ‘_mm256_permutexvar_epi32’ was not declared in this scope
             __m256i t0 = _mm256_permutexvar_epi32(mVec, sMask.mVec);
                                                                   ^
../plugins/avx512/uint/UMESimdVecUint32_8.h: In member function ‘UME::SIMD::SIMDVec_u<unsigned int, 8u> UME::SIMD::SIMDVec_u<unsigned int, 8u>::swizzle()’:
../plugins/avx512/uint/UMESimdVecUint32_8.h:296:59: error: there are no arguments to ‘_mm256_permutexvar_epi32’ that depend on a template parameter, so a declaration of ‘_mm256_permutexvar_epi32’ must be available [-fpermissive]
             __m256i t1 = _mm256_permutexvar_epi32(mVec, t0);
                                                           ^
../plugins/avx512/uint/UMESimdVecUint32_8.h:296:59: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)
../plugins/avx512/uint/UMESimdVecUint32_8.h: In member function ‘UME::SIMD::SIMDVec_u<unsigned int, 8u>& UME::SIMD::SIMDVec_u<unsigned int, 8u>::swizzlea(const UME::SIMD::SIMDSwizzle<8u>&)’:
../plugins/avx512/uint/UMESimdVecUint32_8.h:311:61: error: ‘_mm256_permutexvar_epi32’ was not declared in this scope
             mVec = _mm256_permutexvar_epi32(mVec, sMask.mVec);
                                                             ^

SIMD64_8[ui] not constructable

The following works without problem:

UME::SIMD::SIMD32_8u v2(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
               16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31);
std::cout << "SIMD32_8u:\n";
printVector(v2);

But the same for 512-bit doesn't work:

UME::SIMD::SIMD64_8u v3(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
           16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
           32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
           48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63);
std::cout << "SIMD64_8u:\n";
printVector(v3);

error: no matching function for call to ‘UME::SIMD::SIMDVec_u<unsigned char, 64u>::SIMDVec_u(int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int, int)’
                48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63);
                                                                              ^

Add benchmark for dense/sparse SIMD vector masking

When calculations are performed within an 'if-else' statement for scalar operations, only one part of the statement has to be executed ('if' block or 'else' block). When using masking, both blocks have to be executed even if the required conditions do not apply for most of the data elements.

A benchmark should be created to measure this masking overhead and, if possible, show in which cases scalar code might be a better option.
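
A minimal scalar model of the two strategies might look like the sketch below. It does not use the UME::SIMD API: branchScalar, branchMasked and makeInput are hypothetical names, and the square root and square operations simply stand in for the 'if' and 'else' blocks. A benchmark could time both functions while varying how many elements take the 'if' path.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar version: every element executes exactly one of the two blocks.
inline void branchScalar(const std::vector<float>& in, std::vector<float>& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] > 0.0f) {
            out[i] = std::sqrt(in[i]);
        } else {
            out[i] = in[i] * in[i];
        }
    }
}

// Mask-style version: both blocks are evaluated for every element and the
// results are blended, which is what masked SIMD code effectively does.
inline void branchMasked(const std::vector<float>& in, std::vector<float>& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool mask = in[i] > 0.0f;
        const float thenVal = std::sqrt(std::fabs(in[i]));  // computed even if unused
        const float elseVal = in[i] * in[i];                // computed even if unused
        out[i] = mask ? thenVal : elseVal;
    }
}

// Input generator: every positiveEvery-th element takes the 'if' path,
// so the parameter controls how dense or sparse the mask is.
inline std::vector<float> makeInput(std::size_t n, std::size_t positiveEvery) {
    std::vector<float> in(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float magnitude = static_cast<float>(i + 1);
        in[i] = (positiveEvery != 0 && i % positiveEvery == 0) ? magnitude : -magnitude;
    }
    return in;
}
```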

Mask assignable operators

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


One of the requested features is to have syntax like this:

vec1[mask] = vec2

This should be possible using the same design pattern that std::mask_array uses.
Overloading the assignment operators would give users a very nice syntactic feature.

This feature should be tested for performance.
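
As a rough sketch of that pattern, indexing with a mask can return a small proxy object whose assignment operator writes only the selected lanes. The types Vec4, Mask4 and MaskedRef below are hypothetical stand-ins, not UME::SIMD classes, and the loop stands in for a masked hardware instruction.

```cpp
#include <array>
#include <cstddef>

struct Mask4 {
    std::array<bool, 4> m;
};

struct Vec4;

// Proxy returned by Vec4::operator[](Mask4): remembers the target vector
// and the mask, and performs the masked write on assignment.
struct MaskedRef {
    Vec4& target;
    const Mask4& mask;
    MaskedRef& operator=(const Vec4& src);
};

struct Vec4 {
    std::array<float, 4> v;
    MaskedRef operator[](const Mask4& mask) { return MaskedRef{*this, mask}; }
};

inline MaskedRef& MaskedRef::operator=(const Vec4& src) {
    for (std::size_t i = 0; i < 4; ++i) {
        if (mask.m[i]) {
            target.v[i] = src.v[i];  // only lanes selected by the mask change
        }
    }
    return *this;
}
```

With these definitions, vec1[mask] = vec2 leaves the unselected lanes of vec1 untouched. Whether the proxy is optimized away completely is exactly the performance question raised above.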


Replace all operators overloaded as member functions with non-member overloads.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


Except for operators with a scalar LHS operand, all overloaded operators are implemented as members in the specialized classes.

Because of that it is necessary to both define operators in abstract interface and then re-define them in every class. This additional override is exactly the same as the interface function. Defining all operators as non-member will reduce code size and code repetition.

What needs to be considered is the overhead of such operators.

This issue relates directly to Issue #25


Unused variable

/home/marehr/develope/seqan-src/include/umesimd/plugins/avx512/uint/UMESimdVecUint64_16.h: In member function ‘UME::SIMD::SIMDVec_u<long unsigned int, 16u>& UME::SIMD::SIMDVec_u<long unsigned int, 16u>::swizzlea(const UME::SIMD::SIMDSwizzle<16u>&)’:
/home/marehr/develope/seqan-src/include/umesimd/plugins/avx512/uint/UMESimdVecUint64_16.h:281:34: warning: variable ‘result’ set but not used [-Wunused-but-set-variable]
             alignas(64) uint64_t result[16];

Which is indeed not used:

        UME_FORCE_INLINE SIMDVec_u & swizzlea(SIMDSwizzle<16> const & sMask) {
            alignas(64) uint32_t raw_smask[16];
            alignas(64) uint64_t raw[16];
            alignas(64) uint64_t result[16];

            _mm512_store_epi32(raw_smask, sMask.mVec);
            _mm512_store_epi64(&raw[0], mVec[0]);
            _mm512_store_epi64(&raw[8], mVec[1]);

            for(unsigned int i = 0; i < 16; i++) {
                result[i] = raw[raw_smask[i]];
            }

            mVec[0] = _mm512_load_epi64(&raw[0]);
            mVec[1] = _mm512_load_epi64(&raw[1]);
            return *this;
        }
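
For comparison, here is a scalar sketch of what the routine presumably intends: build the permuted values in a temporary and then copy the temporary back, so that result is actually consumed. This is an assumption about the intended behaviour, not the library's fix; swizzleInPlace is a hypothetical name and plain arrays stand in for the two 512-bit registers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// In-place swizzle on 16 lanes: lane i receives the old value of lane idx[i].
inline void swizzleInPlace(std::array<std::uint64_t, 16>& raw,
                           const std::array<std::uint32_t, 16>& idx) {
    std::array<std::uint64_t, 16> result{};
    for (std::size_t i = 0; i < 16; ++i) {
        // The modulo keeps the sketch safe for out-of-range indices.
        result[i] = raw[idx[i] % 16];
    }
    // The permuted values, not the original ones, are written back.
    raw = result;
}
```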

Implementation of pRNG algorithms within UME::SIMD

Originally reported by: Guilherme Amadio (Bitbucket: amadio, GitHub: amadio)


It would be nice to have at least a simple (and thread-safe, if possible) random number generator in UME::SIMD. Even a linear congruential generator should be good enough for some applications, and ideally a generator with good randomness for more sensitive workloads (e.g., particle simulation).
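
A linear congruential generator with one independent state per lane could be sketched as below. The class LaneLcg is hypothetical and written in plain C++ rather than against the UME::SIMD interface. The multiplier and increment are the well-known Numerical Recipes constants with modulus 2^32. Thread safety in this sketch comes only from giving each thread its own generator object.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One 32-bit LCG per lane: state = state * 1664525 + 1013904223 (mod 2^32).
template <std::size_t N>
struct LaneLcg {
    std::array<std::uint32_t, N> state;

    // Seed each lane differently so that the lanes do not produce identical streams.
    explicit LaneLcg(std::uint32_t seed) {
        for (std::size_t i = 0; i < N; ++i) {
            state[i] = seed + static_cast<std::uint32_t>(i);
        }
    }

    // Advance all lanes by one step and return the new values.
    std::array<std::uint32_t, N> next() {
        for (std::size_t i = 0; i < N; ++i) {
            state[i] = state[i] * 1664525u + 1013904223u;  // wraps modulo 2^32
        }
        return state;
    }
};
```

The per-lane update is a multiply followed by an add, so it maps directly onto vector multiply and add operations. An LCG of this size is not suitable for the more sensitive workloads mentioned above.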

The links below might be helpful:


Does not compile with clang 3.9.0 on Haswell

Here's the output:

In file included from ./umesimd/plugins/avx2/UMESimdVecUintAVX2.h:57:
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:62:11: error: explicit
      specialization of 'UME::SIMD::SIMDVec_u<unsigned long, 4>' after instantiation
    class SIMDVec_u<uint64_t, 4> :
          ^~~~~~~~~~~~~~~~~~~~~~
./umesimd/plugins/avx2/uint/UMESimdVecUint32_4.h:142:27: note: implicit
      instantiation first required here
            return assign(b);
                          ^
In file included from kernel_simd_class_umesimd.cpp:6:
In file included from /home/b/bemnoack/repositories/simd_benchmarks/include/common/kernel.hpp:17:
In file included from ./umesimd/UMESimd.h:137:
In file included from ./umesimd/plugins/UMESimdPluginAVX2.h:97:
In file included from ./umesimd/plugins/avx2/UMESimdVecUintAVX2.h:57:
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:757:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u div(SIMDVecMask<4> const & mask, SIMDVec_u const & b) const {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:763:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u div(uint64_t b) const {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:787:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u & diva(SIMDVecMask<4> const & mask, SIMDVec_u const & b) {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:793:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u & diva(uint64_t b) {
        ^
./umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:802:9: warning: '/*' within
      block comment [-Wcomment]
        /*UME_FORCE_INLINE SIMDVec_u & diva(SIMDVecMask<4> const & mask, uint64_t b) {
        ^
In file included from kernel_simd_class_umesimd.cpp:6:
In file included from /home/b/bemnoack/repositories/simd_benchmarks/include/common/kernel.hpp:17:
In file included from ./umesimd/UMESimd.h:137:
In file included from ./umesimd/plugins/UMESimdPluginAVX2.h:100:
./umesimd/plugins/avx2/UMESimdCastOperatorsAVX2.h:153:36: error: template
      specialization requires 'template<>'
    inline SIMDVec_u<uint64_t, 4>::operator SIMDVec_i<int64_t, 4>() const {
           ~~~~~~~~~~~~~~~~~~~~~~  ^
    template<>
./umesimd/plugins/avx2/UMESimdCastOperatorsAVX2.h:217:36: error: template
      specialization requires 'template<>'
    inline SIMDVec_u<uint64_t, 4>::operator SIMDVec_f<double, 4>() const {
           ~~~~~~~~~~~~~~~~~~~~~~  ^
    template<>
./umesimd/plugins/avx2/UMESimdCastOperatorsAVX2.h:848:36: error: template
      specialization requires 'template<>'
    inline SIMDVec_u<uint64_t, 4>::operator SIMDVec_u<uint32_t, 4>() const {
           ~~~~~~~~~~~~~~~~~~~~~~  ^
    template<>
5 warnings and 4 errors generated.

[AVX512] SIMD4: review implementation - remove blend operations

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


The SIMD4 implementation uses blend operations to perform masked operations on instruction sets without AVX512VL. This will most likely introduce overhead. The same can be done by casting to 512-bit vectors and using the mask operations supported by AVX512F.

The task is to review the code for SIMD4 and make sure that all blending operations are removed.


Unit test - multiple test apps

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


Compiling current test suite takes insane amounts of time. For development purposes only a minimal set of tests is required, usually regarding single SIMD type being specialized. Separate testing apps can be provided to compile only tests for required type. This would also allow faster parallel builds.


Bug in assign() of avx512/uint/64_2 version

Hi,

the following will produce a wrong result for avx512/uint:

using TVector = UME::SIMD::SIMDVec_u<uint64_t, 2>;
TVector a{10, 0}, b{0, 10}, retval(0);

auto c = a.cmpgt(b);
retval.assign(c, ~uint64_t(0));

// retval = (0xFFFFFFFF00000000, 0x0), but should be
// retval = (0xFFFFFFFFFFFFFFFF, 0x0)

Raise warning level

Hi,

I would suggest raising the warning level of the library.

We use the following warning level

  • gcc + icc: -W -Wall -pedantic -fstrict-aliasing -Wstrict-aliasing
  • clang: -W -Wall -pedantic -fstrict-aliasing -Wstrict-aliasing -Wshorten-64-to-32
  • msvc: /W2, but would like to use /W3

Best regards!

Source file format incompatible with Cray Compiler (and inconsistent)

When trying to build with the Cray Compiler, I got the following message

CC-7 crayc++: ERROR File = ./umesimd/UMESimd.h, Line = 1
  The indicated token is not valid in this context.
  // The MIT License (MIT)
  ^

Looking at the files, there seems to be some inconsistency... and Windows line endings. ;-)

file UMESimd.h 
UMESimd.h: UTF-8 Unicode (with BOM) C++ program text, with CRLF line terminators

file UMESimdTraits.h 
UMESimdTraits.h: ASCII C++ program text, with CRLF line terminators

The problem seems to be the BOM (byte order mark), a magic number at the beginning of the file, which the Cray compiler apparently cannot handle. If I open such a file in GNOME's gedit and set the cursor to the beginning of the file, the BOM shows up as an invisible character, i.e. I have to press the right arrow key twice to move the cursor one position to the right. If I delete the first character (no visible change), the file works and the compiler reports the same message for the next include.

I'll report that to Cray too, but would hope for you to fix the format, or maybe put a script inside the repository, if your editor of choice enforces that format.
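
Such a clean-up step might look like the sketch below. It works on file contents already read into a string and relies on the fact that a UTF-8 BOM is the byte sequence EF BB BF. The function names hasUtf8Bom and normalizeSource are hypothetical, and converting CRLF to LF is an extra step suggested by the line-ending remark above.

```cpp
#include <cstddef>
#include <string>

// True if the text starts with the UTF-8 byte order mark (EF BB BF).
inline bool hasUtf8Bom(const std::string& text) {
    return text.size() >= 3 &&
           static_cast<unsigned char>(text[0]) == 0xEF &&
           static_cast<unsigned char>(text[1]) == 0xBB &&
           static_cast<unsigned char>(text[2]) == 0xBF;
}

// Remove a leading BOM and turn CRLF line terminators into plain LF.
inline std::string normalizeSource(const std::string& text) {
    const std::size_t start = hasUtf8Bom(text) ? 3 : 0;
    std::string out;
    out.reserve(text.size() - start);
    for (std::size_t i = start; i < text.size(); ++i) {
        const bool crBeforeLf =
            text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n';
        if (!crBeforeLf) {
            out.push_back(text[i]);
        }
    }
    return out;
}
```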

[AVX512] Unsigned 64bit-integer comparison fails

#include "../UMESimd.h"

using namespace UME::SIMD;

int main()
{
    SIMD8_64u a((uint64_t)(~0));
    SIMD8_64u b(0);

    auto c = a.cmpgt(b);
    std::cout << a[0] << " = a > b = " << b[0] << " <==> " << (c[0] ? "true" : "false") << std::endl;

    return 0;
}

Should return

18446744073709551615 = a > b = 0 <==> true

but returns

18446744073709551615 = a > b = 0 <==> false
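
The report does not state the cause. One common source of exactly this symptom is comparing unsigned values with a signed greater-than instruction, and the scalar sketch below reproduces it under that assumption. The function names are hypothetical, and the casts assume two's complement integers. Flipping the sign bit of both operands before a signed comparison is a standard way to obtain an unsigned comparison.

```cpp
#include <cstdint>

// Signed comparison applied to unsigned data: ~0 is interpreted as -1.
inline bool greaterViaSignedCompare(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::int64_t>(a) > static_cast<std::int64_t>(b);
}

// Unsigned comparison built from a signed one: XOR-ing the top bit maps
// the unsigned range onto the signed range while preserving order.
inline bool greaterViaSignFlip(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t signBit = 0x8000000000000000ull;
    return static_cast<std::int64_t>(a ^ signBit) >
           static_cast<std::int64_t>(b ^ signBit);
}
```

With a = ~0 and b = 0, the first function returns false and the second returns true, which matches the observed and the expected output respectively.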

Const correctness in IntermediateIndex

Consider:

#include "../UMESimd.h"

using namespace UME::SIMD;

template <typename T1, typename T2>
bool compare(T1 const & value1, T2 const & value2)
{
    if (value1 == value2) {
        std::cout << "true" << std::endl;
        return true;
    } else {
        std::cout << "false" << std::endl;
        return false;
    }
}

int main()
{
    SIMD4_64u a(5, 3, 8, 4), b(13, 984, 5, 0);

    // won't work
    compare(a[0], b[2]);

    // works
    if (a[0] == b[2]) {
        std::cout << "true" << std::endl;
    } else {
        std::cout << "false" << std::endl;
    }
    return 0;
}

This will produce

gcc:

index_bug.cpp: In instantiation of ‘bool compare(const T1&, const T2&) [with T1 = UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>; T2 = UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>]’:
index_bug.cpp:22:23:   required from here
index_bug.cpp:8:16: error: no match for ‘operator==’ (operand types are ‘const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>’ and ‘const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>’)
     if (value1 == value2) {
         ~~~~~~~^~~~~~~~~
index_bug.cpp:8:16: note: candidate: operator==(long unsigned int, long unsigned int) <built-in>
index_bug.cpp:8:16: note:   conversion of argument 2 would be ill-formed:
index_bug.cpp:8:16: error: passing ‘const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<long unsigned int, 4u>, long unsigned int>’ as ‘this’ argument discards qualifiers [-fpermissive]
In file included from ../plugins/UMESimdPluginScalarEmulation.h:36:0,
                 from ../UMESimd.h:147,
                 from index_bug.cpp:1:
../plugins/../UMESimdInterface.h:654:26: note:   in call to ‘UME::SIMD::IntermediateIndex<VEC_TYPE, SCALAR_TYPE>::operator SCALAR_TYPE() [with VEC_TYPE = UME::SIMD::SIMDVec_u<long unsigned int, 4u>; SCALAR_TYPE = long unsigned int]’
         UME_FORCE_INLINE operator SCALAR_TYPE() { return mVecRef_RW.extract(mIndexRef); }
                          ^~~~~~~~

[...]

clang:

index_bug.cpp:8:16: error: invalid operands to binary expression ('const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned long, 4>, unsigned
      long>' and 'const UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned long, 4>, unsigned long>')
    if (value1 == value2) {
        ~~~~~~ ^  ~~~~~~
index_bug.cpp:22:5: note: in instantiation of function template specialization 'compare<UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned
      long, 4>, unsigned long>, UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_u<unsigned long, 4>, unsigned long> >' requested here
    compare(a[0], b[2]);
    ^
1 error generated.

The problem is that none of the comparison operators are declared const. Furthermore, the templated function won't allow implicit type conversion of the second argument.
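
The effect can be reproduced with a small stand-in proxy. In the sketch below, IndexProxy and Vec4u are hypothetical simplifications of IntermediateIndex and the vector class. The only point is that marking the conversion operator const lets const references to the proxy take part in the built-in comparison.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct Vec4u {
    std::array<std::uint64_t, 4> v;
    std::uint64_t extract(std::size_t i) const { return v[i]; }
};

// Simplified index proxy: refers to one element of a vector.
class IndexProxy {
    Vec4u& vec_;
    std::size_t index_;
public:
    IndexProxy(Vec4u& vec, std::size_t index) : vec_(vec), index_(index) {}
    // Declared const, so it can be called through 'T const &' parameters.
    operator std::uint64_t() const { return vec_.extract(index_); }
};

// Same shape as the compare() template from the report.
template <typename T1, typename T2>
bool compare(T1 const& value1, T2 const& value2) {
    return value1 == value2;
}
```

If the const qualifier on the conversion operator is removed, compare() stops compiling with an error of the same kind as the gcc output above.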

[BENCH] Statistics refactoring

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


In file microbenchmarks/utilities/TimingStatistics.h, a class TimingStatistics is defined, together with a more general Statistics template.

The task is to remove TimingStatistics completely and replace its uses with the templated version. In addition, all files should also calculate some numeric error measure, so that it is possible to compare the precision of different implementations (see the RMS error in the matmul benchmark).


How should the structure of UME::SIMD in /usr/include be?

Hi Przemyslaw,

I came across the problem of how to include UME::SIMD in my programs. As an example, our repository has the folder structure /include/seqan, which means that distributors can copy all our header files from there to some common /include folder (e.g. /usr/include; if you run apt-get install seqan-dev, you will find our header files in that folder).

In the source code, you can include the headers like this:

#include <seqan/bam_io.h>
// ....

See http://seqan.readthedocs.io/en/master/Tutorial/InputOutput/FileIOOverview.html for an example.

Your repository has no obvious /include folder.

So, do you want that users include as

#include <umesimd/UMESimd.h> // so?
#include <UMESimd/UMESimd.h> // or, so?
#include <UMESimd.h> // or, so?

I looked into some projects, how they handle it (the part in [] is what will be copied over):

  • mysql: has /include/[**.h] in the repository, installs as /usr/include/mysql/[**.h]
  • protobuf: has /src/[google/protobuf/**.h]which will (partially) be copied to /usr/include/[google/protobuf/**.h]
  • poppler: has [/glib/**.h] -> /usr/include/poppler/[glib/**.h]
  • boost: Is a collection of repositories that follow the same rule
    • For example hana has /include/[boost/**.h] -> /usr/include/[boost/**.h]

Thus, there are different philosophies on how to address this. I'd say that libraries written purely in C++ will most likely have a dedicated /include folder, but there is no common pattern on whether the library name is included in /include/[umesimd?].

Can't build library with gcc and clang for knl

Hi umesimd developers,

first of all, thank you for your great work and effort.
And it is also great to see that you migrated to github.

Now to my issue, in umesimd/examples/

g++-6 -std=c++11 -march=knl Example1.cpp

and

clang++-3.9 -std=c++11 -march=knl Example1.cpp

will result in similar errors of missing functions:

In file included from Example1.cpp:31:
In file included from ./../UMESimd.h:133:
In file included from ./../plugins/UMESimdPluginAVX512.h:98:
In file included from ./../plugins/avx512/UMESimdVecUintAVX512.h:52:
./../plugins/avx512/uint/UMESimdVecUint32_4.h:977:31: error: use of undeclared identifier '_mm512_reduce_add_epi32'
            uint32_t retval = _mm512_reduce_add_epi32(t0);
                              ^
./../plugins/avx512/uint/UMESimdVecUint32_4.h:984:31: error: use of undeclared identifier '_mm512_mask_reduce_add_epi32'
            uint32_t retval = _mm512_mask_reduce_add_epi32(t1, t0);
                              ^
./../plugins/avx512/uint/UMESimdVecUint32_4.h:990:31: error: use of undeclared identifier '_mm512_reduce_add_epi32'
            uint32_t retval = _mm512_reduce_add_epi32(t0);
                              ^
[...]

Because we are also an IPCC, we asked the people in charge, and they said that apparently not all intrinsics are an integral part of the architecture's instruction set; some of them (like the reduce intrinsics) are software extensions. Since gcc and clang seem to add those features much later, it would be nice to be able to compile the library regardless of the compiler.
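
A compiler-independent fallback could, for instance, store the register to memory and reduce it in software. The sketch below shows only that scalar reduction step. The names reduceAddEmulated and maskedReduceAddEmulated are hypothetical, and the std::array stands in for the stored contents of a 16-lane register.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Software replacement for a horizontal add over 16 32-bit lanes.
inline std::uint32_t reduceAddEmulated(const std::array<std::uint32_t, 16>& lanes) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        sum += lanes[i];  // wraps modulo 2^32, like integer vector adds
    }
    return sum;
}

// Masked variant: only lanes whose mask bit is set contribute to the sum.
inline std::uint32_t maskedReduceAddEmulated(std::uint16_t mask,
                                             const std::array<std::uint32_t, 16>& lanes) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        if ((mask >> i) & 1u) {
            sum += lanes[i];
        }
    }
    return sum;
}
```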

Replace insert/extract in scalar emulation with load/store

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


Because we want to explicitly specialize only operations that can be expressed in specific instruction set, all unsupported operations should be left with resolution to scalar emulation.

Most of emulated functions are using insert/extract on a per-element basis. Performing insert/extract operations on vectors is slow for non-emulated vectors. Doing load/store on emulated data types shouldn't create any slow-downs due to compiler optimizations, and even if not, it will still reduce slow-down on target vector code.

This proposal is to re-write scalar emulation functions with LOAD/STORE instead of INSERT/EXTRACT.
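
The difference between the two styles can be sketched with a toy vector type. Vec8f below is a hypothetical stand-in in which a plain array plays the role of the hardware register. In a real plugin, insert and extract would each be a comparatively expensive per-element register access, while load and store move the whole register at once.

```cpp
#include <cstddef>

struct Vec8f {
    float lanes[8];  // stands in for a hardware register

    float extract(std::size_t i) const { return lanes[i]; }
    void insert(std::size_t i, float x) { lanes[i] = x; }
    void store(float* p) const { for (std::size_t i = 0; i < 8; ++i) p[i] = lanes[i]; }
    void load(const float* p) { for (std::size_t i = 0; i < 8; ++i) lanes[i] = p[i]; }
};

// Current style: one extract per operand and one insert per result element.
inline Vec8f addViaInsertExtract(const Vec8f& a, const Vec8f& b) {
    Vec8f r{};
    for (std::size_t i = 0; i < 8; ++i) {
        r.insert(i, a.extract(i) + b.extract(i));
    }
    return r;
}

// Proposed style: store both operands once, work on plain arrays, load once.
inline Vec8f addViaLoadStore(const Vec8f& a, const Vec8f& b) {
    float rawA[8], rawB[8], rawR[8];
    a.store(rawA);
    b.store(rawB);
    for (std::size_t i = 0; i < 8; ++i) {
        rawR[i] = rawA[i] + rawB[i];
    }
    Vec8f r{};
    r.load(rawR);
    return r;
}
```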


Implementation issues with IntermediateIndex

#include "../UMESimd.h"

using namespace UME::SIMD;

int main()
{
    SIMD4_64i a(5, 3, 8, 4), b(13, 984, 5, 0);

    // this can't be constructed, because IntermediateIndex's constructor is private
    // decltype(a[0]) = UME::SIMD::IntermediateIndex<UME::SIMD::SIMDVec_i<long int, 4u>, long int>
    // auto a_0 = a[0];
    // auto b_2 = b[2];

    auto true1 = a[0] == b[2];
    std::cout << "a[0] = " << a[0] << " == " << b[2] << " = b[2]: " << true1 << std::endl;

    // auto true2 = a_0 == b_2;
    // std::cout << "a_0  = " << a_0 << " == " << b_2 << " = b_2 : " << true2 << std::endl;

    // decltype(ca) = const UME::SIMD::SIMDVec_i<long int, 4u>&
    const auto & ca = a;
    const auto & cb = b;

    auto true3 = ca[0] == cb[2];
    std::cout << "ca[0] = " << ca[0] << " == " << cb[2] << " = cb[2]: " << true3 << std::endl;

    // decltype(ca[0]) = long int
    auto ca_0 = ca[0];
    auto cb_2 = cb[2];

    auto true4 = ca_0 == cb_2;
    std::cout << "ca_0  = " << ca_0 << " == " << cb_2 << " = cb_2 : " << true4 << std::endl;

    return 0;
}

Will produce

a[0] = 5 == 5 = b[2]: 0
ca[0] = 5 == 5 = cb[2]: 1
ca_0  = 5 == 5 = cb_2 : 1

The first line is completely wrong.


I stumbled across this, as I wanted to fix the const correctness in the IntermediateIndex. I already know what the issue is and will create a pull request later. The purpose of this issue is only for documentation.

Interface issue for assign() on integers?

Hi,

for float and double the interface is like this:

assign(SIMDVecMask<4> const & mask, float b); // UMESimdVecFloat32_4
assign(SIMDVecMask<2> const & mask, double b); // UMESimdVecFloat64_2

Thus, the assignee type matches the scalar type of the vector.

For integers this isn't true:

assign(SIMDVecMask<4> const & mask, uint32_t b); // in avx512/uint/UMESimdVecUint16_4.h
assign(SIMDVecMask<4> const & mask, uint32_t b); // in avx512/uint/UMESimdVecUint32_4.h
assign(SIMDVecMask<8> const & mask, uint64_t b); // in avx512/uint/UMESimdVecUint64_8.h

I looked into the source code and apparently avx512/uint/UMESimdVecUint16_4.h was the only file where you didn't use uint16_t. I thought this was an interface inconsistency, but it seems to be only a bug.

Missing function for mask checking if mask is empty/mixed/full

I am missing an equivalent to Vc's isEmpty() function on the masks, or maybe a conversion operator to bool that is true if at least one element of the mask is true.

With Intel you could use:

_mm512_mask2int

For other compilers, I use this replacement:

#if !defined(__INTEL_COMPILER)
// only defined by the Intel compiler
inline int _mm512_mask2int(__mmask16 k1) {
    return static_cast<int>(k1);
}
#endif

Rationale: You usually have three cases, that might be beneficial to check and handle differently:

  • empty mask: nothing to do
  • mixed mask: proceed with masked operation
  • full mask: proceed with unmasked operations
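
Once the mask is available as an integer, the three cases can be told apart with two comparisons. The sketch below assumes a mask of up to 16 lanes stored in the low bits of a 16-bit value. MaskState and classifyMask are hypothetical names, not part of UME::SIMD or Vc.

```cpp
#include <cstdint>

enum class MaskState { Empty, Mixed, Full };

// Classify a bit mask with 'lanes' valid bits (1 to 16).
inline MaskState classifyMask(std::uint16_t bits, unsigned lanes = 16) {
    const std::uint32_t all = (lanes >= 16) ? 0xFFFFu : ((1u << lanes) - 1u);
    const std::uint32_t active = bits & all;  // ignore bits beyond the vector length
    if (active == 0) {
        return MaskState::Empty;   // nothing to do
    }
    if (active == all) {
        return MaskState::Full;    // unmasked operations are sufficient
    }
    return MaskState::Mixed;       // masked operations are required
}
```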

Const correctness of pointers

Hi again,

we ran into some problems compiling the library, because of const correctness of pointers.

In our case, we assumed that the gather operation would be fine with a const pointer, since its semantics suggest that it only loads data and doesn't modify it.

For example:

SIMDVec_u & gather(uint64_t * baseAddr, SIMDVec_u const & indices); // current
SIMDVec_u & gather(uint64_t const * baseAddr, SIMDVec_u const & indices);  // our suggestion

//or

SIMDVec_u & gather(uint64_t * baseAddr, uint64_t* indices); // current
SIMDVec_u & gather(uint64_t const * baseAddr, uint64_t const * indices); // our suggestion

Is there a reason to not guarantee constness? If not I would open a pull request to fix at least the gather operations.
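
As a scalar illustration of why the const-qualified signature is sufficient, a gather only ever reads through both pointers. The function gatherEmulated below is a hypothetical four-lane stand-in, not the library's implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Emulated 4-lane gather: reads baseAddr[indices[i]] for each lane.
// Neither the data nor the index array is written, so both can be const.
inline std::array<std::uint64_t, 4> gatherEmulated(const std::uint64_t* baseAddr,
                                                   const std::uint64_t* indices) {
    std::array<std::uint64_t, 4> result{};
    for (std::size_t i = 0; i < 4; ++i) {
        result[i] = baseAddr[indices[i]];
    }
    return result;
}
```

Callers holding const data can then use the function without a const_cast.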

Extend interface with masking operations.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


As with the other interfaces, masks should also implement MFI functions that take a masking operand. An example where the lack of these causes problems comes from VecGeom:

VECGEOM_FORCE_INLINE
void MaskedAssign(UmesimdBool_v const &cond, UmesimdBool_v const &thenval, UmesimdBool_v *const output)
{
    // output->assign(cond, thenval);
    UmesimdBool_v out_v;
    out_v.assign(*output);
    UmesimdBool_v t0 = cond.land(thenval);
    UmesimdBool_v t1 = (!cond).land(out_v);
    UmesimdBool_v t2 = t0 || t1;
    output->assign(t2);
}

Here the overloaded 'MaskedAssign' operation tries to perform blending between two masks.

If a usage model comes from the users, it should be supported by the library directly.
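
Expressed lane by lane, the blend that the workaround computes is (cond AND thenval) OR (NOT cond AND output). The sketch below shows this on a hypothetical four-lane boolean mask type. Mask4 and blendMasks are illustration names only.

```cpp
#include <array>
#include <cstddef>

struct Mask4 {
    std::array<bool, 4> m;
};

// Masked assignment between masks: lanes where cond is true take thenVal,
// all other lanes keep their previous value from current.
inline Mask4 blendMasks(const Mask4& cond, const Mask4& thenVal, const Mask4& current) {
    Mask4 result{};
    for (std::size_t i = 0; i < 4; ++i) {
        result.m[i] = (cond.m[i] && thenVal.m[i]) || (!cond.m[i] && current.m[i]);
    }
    return result;
}
```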


Operators for mixed scalar-vector operations.

Originally reported by: edanor (Bitbucket: edanor, GitHub: edanor)


Currently only vec<->vec operations are permitted using overloaded C++ operators. Writing:

SIMD4_32f a, b, c;
c = a + b;

is permitted, while the mixed scalar-vector forms below are not:

SIMD4_32f a, b, c;
float d, e;
c = a + b; // OK
c = a + d; // Error: no implicit conversion between float and SIMD4_32f
c = e + b; // Error: no operator matching: 'operator+ (float, SIMD4_32f &)' 

Supporting this requires two overloads for every operator. The one for RHS scalars can be defined as a member function of the SIMD type:

VEC_TYPE operator+(SCALAR_TYPE b) const;

and the one for LHS scalars as a friend function:

friend VEC_TYPE operator+(SCALAR_TYPE a, VEC_TYPE const & b);

Because of potential collisions with operators overloaded in std::, this cannot be done using templates, and thus requires explicit specialization of all scalar and SIMD types combinations.
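
For a single concrete type, the two overloads could look like the sketch below. Vec4f is a hypothetical stand-in for a SIMD type and is written without templates, in line with the remark above.

```cpp
#include <array>
#include <cstddef>

struct Vec4f {
    std::array<float, 4> v;

    // vec + vec
    Vec4f operator+(const Vec4f& b) const {
        Vec4f r{};
        for (std::size_t i = 0; i < 4; ++i) r.v[i] = v[i] + b.v[i];
        return r;
    }

    // vec + scalar: RHS scalar handled by a member function.
    Vec4f operator+(float b) const {
        Vec4f r{};
        for (std::size_t i = 0; i < 4; ++i) r.v[i] = v[i] + b;
        return r;
    }

    // scalar + vec: LHS scalar needs a non-member (friend) overload.
    friend Vec4f operator+(float a, const Vec4f& b) {
        return b + a;  // addition is commutative, so reuse the member overload
    }
};
```

For non-commutative operators such as subtraction or division, the friend overload cannot simply forward with swapped operands and needs its own implementation.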


UME::SIMD master (5785acb) does not compile with GCC (AVX2 plugin)

Originally reported by: Guilherme Amadio (Bitbucket: amadio, GitHub: amadio)


I encountered the following problem when compiling VecCore with UME::SIMD backend enabled:

/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘UME::SIMD::SIMDVec_u<long unsigned int, 4u>& UME::SIMD::SIMDVec_u<long unsigned int, 4u>::load(const UME::SIMD::SIMDVecMask<4u>&, const uint64_t*)’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:186:43: error: ‘__int64’ was not declared in this scope
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                           ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:186:51: error: expected ‘)’ before ‘const’
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                                   ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘UME::SIMD::SIMDVec_u<long unsigned int, 4u>& UME::SIMD::SIMDVec_u<long unsigned int, 4u>::loada(const UME::SIMD::SIMDVecMask<4u>&, const uint64_t*)’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:197:43: error: ‘__int64’ was not declared in this scope
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                           ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:197:51: error: expected ‘)’ before ‘const’
             mVec = _mm256_maskload_epi64((__int64 const*)p, t0);
                                                   ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘uint64_t* UME::SIMD::SIMDVec_u<long unsigned int, 4u>::store(const UME::SIMD::SIMDVecMask<4u>&, uint64_t*) const’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:208:37: error: ‘__int64’ was not declared in this scope
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                     ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:208:46: error: expected primary-expression before ‘)’ token
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                              ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h: In member function ‘uint64_t* UME::SIMD::SIMDVec_u<long unsigned int, 4u>::storea(const UME::SIMD::SIMDVecMask<4u>&, uint64_t*) const’:
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:219:37: error: ‘__int64’ was not declared in this scope
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                     ^
/home/amadio/src/umesimd/plugins/avx2/uint/UMESimdVecUint64_4.h:219:46: error: expected primary-expression before ‘)’ token
             _mm256_maskstore_epi64((__int64 *)p, t0, mVec);
                                              ^
make[2]: *** [test/CMakeFiles/Math.dir/build.make:63: test/CMakeFiles/Math.dir/mathtest.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:94: test/CMakeFiles/Math.dir/all] Error 2
make: *** [Makefile:139: all] Error 2
