db-tu-dresden / tsl

Template SIMD Library (+Generator)

License: GNU General Public License v3.0

Python 34.02% CMake 2.29% Shell 0.21% C++ 3.85% Verilog 0.38% HTML 58.19% SystemVerilog 1.06%
abstraction-database hardware-agnostic simd-intrinsics simd-programming

tsl's People

Contributors

actions-user, alexkrausetud, cmrschwarz, dertuchi, dhabich, ericmier, jpietrzyktud, niclashedam, ratusz, tomschw, yuhta

tsl's Issues

Add support for logic operations (OR) in lscpu_flags

As it turns out, the lscpu (/proc/cpuinfo) flag that indicates ARM NEON support is either "neon" or "asimd" (Advanced SIMD). Currently, the lscpu_flags list of a definition is treated as a conjunction, i.e. all flags must be available. However, there should be a way to express alternatives (either-or).
This also applies to oneAPI FPGAs, since we can only identify the accelerator cards using lspci | grep accel (afaik). However, the Stratix 10 reports a different hex value than the Agilex while providing the same set of functionalities within the scope of the TSL.

Maybe we can implement this quite "natively" using lists of lists for the lscpu_flags:

#old: lscpu_flags: ["neon"]
#new: lscpu_flags: [["neon"], ["asimd"]]
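
A minimal Python sketch of how the generator could evaluate both formats, assuming a flat list keeps its current "all flags required" meaning and a list of lists is read as alternatives. The function name flags_satisfied is hypothetical and not part of the TSL generator:

```python
def flags_satisfied(required, available):
    """Return True if `available` satisfies the `required` lscpu_flags entry.

    A flat list of strings keeps the old meaning (all flags must be present).
    A list of lists is read as alternatives: any one inner list suffices.
    """
    available = set(available)
    if all(isinstance(entry, str) for entry in required):
        groups = [required]  # old format: a single conjunction
    else:
        # new format; a bare string inside is treated as a one-flag group
        groups = [[g] if isinstance(g, str) else g for g in required]
    return any(all(flag in available for flag in group) for group in groups)
```

With this reading, [["neon"], ["asimd"]] matches a machine that reports only "asimd", while existing definitions such as ["neon"] keep their behavior.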

Some lscpu flags need an alias

When the TSL is generated, the required compiler arguments are derived from the list of lscpu flags, obtained either through py-cpuinfo or lscpu.

However, some flags have no direct mapping to compiler flags, especially avx512 flags, e.g.

lscpu              g++/clang
avx512_fp16        -mavx512fp16
avx512_vpopcntdq   -mavx512vpopcntdq
avx512_vbmi2       -mavx512vbmi2

As a convention, we could simply remove the underscores, but this might not hold for all flags.
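
One possible alternative to a blanket convention is an explicit alias table with a pass-through default. The sketch below is only an illustration: COMPILER_FLAG_ALIASES and compiler_flag are hypothetical names. The avx512 entries come from the table above, and sse4_1 is added as an example where removing the underscore would yield the wrong option (g++/clang expect -msse4.1):

```python
# Explicit aliases for flags whose compiler spelling differs from lscpu.
# The avx512 entries are the ones listed in this issue; sse4_1 is an extra
# example where dropping the underscore would give the wrong flag.
COMPILER_FLAG_ALIASES = {
    "avx512_fp16": "avx512fp16",
    "avx512_vpopcntdq": "avx512vpopcntdq",
    "avx512_vbmi2": "avx512vbmi2",
    "sse4_1": "sse4.1",
}


def compiler_flag(lscpu_flag, aliases=COMPILER_FLAG_ALIASES):
    """Map an lscpu flag to a g++/clang -m option, preferring explicit aliases."""
    # Flags without an alias are passed through unchanged (e.g. avx2 -> -mavx2).
    return "-m" + aliases.get(lscpu_flag, lscpu_flag)
```

The table would have to be extended whenever a new flag with a diverging spelling shows up, but it avoids guessing.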

Unwanted `Primitive <..> not implemented` warnings

The test suite currently produces a significant amount of warnings related to unimplemented primitives:

Full Dump
./build/generator_output/src/test/tsl_test |& grep -A 1 warning | grep implemented | cut -d" " -f4-
shift_left<simd<double, avx2>> not implemented.
shift_left<simd<float, avx2>> not implemented.
shift_left<simd<int8_t, avx2>> not implemented.
shift_left<simd<uint8_t, avx2>> not implemented.
shift_left<simd<double, avx512>> not implemented.
shift_left<simd<float, avx512>> not implemented.
shift_left<simd<int8_t, avx512>> not implemented.
shift_left<simd<uint8_t, avx512>> not implemented.
shift_left<simd<double, scalar>> not implemented.
shift_left<simd<float, scalar>> not implemented.
shift_left<simd<double, sse>> not implemented.
shift_left<simd<float, sse>> not implemented.
shift_left<simd<int8_t, sse>> not implemented.
shift_left<simd<uint8_t, sse>> not implemented.
shift_left_vector<simd<double, avx2>> not implemented.
shift_left_vector<simd<float, avx2>> not implemented.
shift_left_vector<simd<int8_t, avx2>> not implemented.
shift_left_vector<simd<uint8_t, avx2>> not implemented.
shift_left_vector<simd<double, avx512>> not implemented.
shift_left_vector<simd<float, avx512>> not implemented.
shift_left_vector<simd<int8_t, avx512>> not implemented.
shift_left_vector<simd<uint8_t, avx512>> not implemented.
shift_left_vector<simd<double, scalar>> not implemented.
shift_left_vector<simd<float, scalar>> not implemented.
shift_left_vector<simd<double, sse>> not implemented.
shift_left_vector<simd<float, sse>> not implemented.
shift_left_vector<simd<int8_t, sse>> not implemented.
shift_left_vector<simd<uint8_t, sse>> not implemented.
shift_right<simd<double, avx2>> not implemented.
shift_right<simd<float, avx2>> not implemented.
shift_right<simd<int8_t, avx2>> not implemented.
shift_right<simd<uint8_t, avx2>> not implemented.
shift_right<simd<double, avx512>> not implemented.
shift_right<simd<float, avx512>> not implemented.
shift_right<simd<int8_t, avx512>> not implemented.
shift_right<simd<uint8_t, avx512>> not implemented.
shift_right<simd<double, scalar>> not implemented.
shift_right<simd<float, scalar>> not implemented.
shift_right<simd<double, sse>> not implemented.
shift_right<simd<float, sse>> not implemented.
shift_right<simd<int8_t, sse>> not implemented.
shift_right<simd<uint8_t, sse>> not implemented.
shift_right_logical<simd<double, avx2>> not implemented.
shift_right_logical<simd<float, avx2>> not implemented.
shift_right_logical<simd<int8_t, avx2>> not implemented.
shift_right_logical<simd<uint8_t, avx2>> not implemented.
shift_right_logical<simd<double, avx512>> not implemented.
shift_right_logical<simd<float, avx512>> not implemented.
shift_right_logical<simd<int8_t, avx512>> not implemented.
shift_right_logical<simd<uint8_t, avx512>> not implemented.
shift_right_logical<simd<double, scalar>> not implemented.
shift_right_logical<simd<float, scalar>> not implemented.
shift_right_logical<simd<double, sse>> not implemented.
shift_right_logical<simd<float, sse>> not implemented.
shift_right_logical<simd<int8_t, sse>> not implemented.
shift_right_logical<simd<uint8_t, sse>> not implemented.
shift_right_logical_vector<simd<double, avx2>> not implemented.
shift_right_logical_vector<simd<float, avx2>> not implemented.
shift_right_logical_vector<simd<int8_t, avx2>> not implemented.
shift_right_logical_vector<simd<uint8_t, avx2>> not implemented.
shift_right_logical_vector<simd<double, avx512>> not implemented.
shift_right_logical_vector<simd<float, avx512>> not implemented.
shift_right_logical_vector<simd<int8_t, avx512>> not implemented.
shift_right_logical_vector<simd<uint8_t, avx512>> not implemented.
shift_right_logical_vector<simd<double, scalar>> not implemented.
shift_right_logical_vector<simd<float, scalar>> not implemented.
shift_right_logical_vector<simd<double, sse>> not implemented.
shift_right_logical_vector<simd<float, sse>> not implemented.
shift_right_logical_vector<simd<int8_t, sse>> not implemented.
shift_right_logical_vector<simd<uint8_t, sse>> not implemented.
shift_right_vector<simd<double, avx2>> not implemented.
shift_right_vector<simd<float, avx2>> not implemented.
shift_right_vector<simd<int8_t, avx2>> not implemented.
shift_right_vector<simd<uint8_t, avx2>> not implemented.
shift_right_vector<simd<double, avx512>> not implemented.
shift_right_vector<simd<float, avx512>> not implemented.
shift_right_vector<simd<int8_t, avx512>> not implemented.
shift_right_vector<simd<uint8_t, avx512>> not implemented.
shift_right_vector<simd<double, scalar>> not implemented.
shift_right_vector<simd<float, scalar>> not implemented.
shift_right_vector<simd<double, sse>> not implemented.
shift_right_vector<simd<float, sse>> not implemented.
shift_right_vector<simd<int8_t, sse>> not implemented.
shift_right_vector<simd<uint8_t, sse>> not implemented.
equal<simd<double, sse>> not implemented.
equal<simd<float, sse>> not implemented.
equal<simd<double, sse>> not implemented.
equal<simd<float, sse>> not implemented.
mask_equal<simd<double, avx2>> not implemented.
mask_equal<simd<float, avx2>> not implemented.
mask_equal not implemented for avx512
mask_equal not implemented for scalar
mask_equal not implemented for sse
convert_down<simd<double, avx2>> not implemented.
convert_down<simd<float, avx2>> not implemented.
convert_down<simd<int16_t, avx2>> not implemented.
convert_down<simd<int8_t, avx2>> not implemented.
convert_down<simd<uint16_t, avx2>> not implemented.
convert_down<simd<uint8_t, avx2>> not implemented.
convert_down not implemented for avx512
convert_down not implemented for scalar
convert_up<simd<double, avx2>> not implemented.
convert_up<simd<float, avx2>> not implemented.
convert_up not implemented for avx512
convert_up not implemented for scalar
convert_up<simd<double, sse>> not implemented.
convert_up<simd<float, sse>> not implemented.
convert_up<simd<int64_t, sse>> not implemented.
convert_up<simd<uint64_t, sse>> not implemented.

Some of these are justified, mainly convert_up/convert_down/mask_equal/equal not being implemented for some extensions.

But others aren't:

  • convert_down doesn't make sense for uint8_t (no smaller type exists)
  • convert_up doesn't make sense for uint64_t (no larger type exists)
  • shift_*** does not make sense for float or double (neither C++ scalars nor AVX support that, and neither should we; users should cast instead)
  • there are probably more cases that just don't occur because the corresponding primitive doesn't have tests yet, e.g. cast

I see three ways to deal with this:

  1. Get rid of this warning entirely
  2. Only emit the warning if the implementation is missing because of lscpu flags, not for implementations that were never written at all
  3. Add a yaml tag (e.g. implementation_omitted) to indicate implementations that are explicitly not desired.

My personal favorite is 3., since I would rather not remove a warning that is very helpful in development.
While we're at it, we might consider emitting different warnings for the two cases of

  • missing because of lscpu flags
  • missing because no implementation was written

These mean very different things for the developer.
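
A rough Python sketch of how option 3 and the two distinct warning cases could be told apart on the generator side. The dictionary layout and the function name missing_reason are assumptions made for illustration and do not reflect the actual TSL YAML schema. lscpu_flags is treated as a plain "all flags required" list here:

```python
def missing_reason(definitions, target_flags):
    """Classify why no implementation is usable for one primitive/type/extension.

    `definitions` is a list of dicts, one per matching definition entry
    (hypothetical shape: {"lscpu_flags": [...], "implementation_omitted": bool}).
    Returns None if an implementation is usable, otherwise a reason string.
    """
    if any(d.get("implementation_omitted", False) for d in definitions):
        return "omitted"  # explicitly not desired: emit no warning
    if not definitions:
        return "not_written"  # nobody has implemented it yet
    target = set(target_flags)
    if any(set(d.get("lscpu_flags", [])) <= target for d in definitions):
        return None  # at least one definition is usable on this target
    return "flags_unavailable"  # implemented, but the target lacks the flags
```

The generator could then stay silent for "omitted" and emit differently worded warnings for "not_written" and "flags_unavailable".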

Solve ambiguity problems

As we allow function overloading through functor_name, we can end up in a situation where the same function signature is generated twice (e.g., for a scalar definition plus mask/imask primitives, since the mask type equals the imask type).

Currently, we work around that issue by leaving out the offending definitions. However, this seems to be bad practice.
Thus, we need a better solution.

[NEON] Missing parentheses in `mask_ls_neon.hpp`

I'm getting the following warning with NEON:

generate_tsl_neon-asimd/include/generated/definitions/mask_ls/mask_ls_neon.hpp:850:36: 
warning: & has lower precedence than ==; == will be evaluated first [-Wparentheses]
  850 |                   if ((imask >> i) & 0b1 == true) {
      |                                    ^~~~~~~~~~~~~

This looks like the generated code is "wrong". If == is evaluated first, the code checks (imask >> i) & true, where true is just 1, so the effect is the same in the end. You can probably just remove the == true, because any value != 0 evaluates to true anyway.

Make CXX-Standard an option

I think the generated code (if not using C++20 concepts) can be generated for C++14.
We should add an option to choose between C++14/17/20.

Usability: Add a CMakeLists.txt

As it currently stands,
this repository does not have its own CMakeLists.txt.

The current 'TVL' repository also only contains a pre-generated
version of the TVL for a single set of lscpu flags.

Adding a CMakeLists.txt to this repository that generates a customized version of the library
on demand would be a better way of distribution, and also help development.

A simple starting version is attached here:

CMakeLists.txt

Check lscpu-flags in primitives

Alex brought to my attention that there are some primitive definitions with wrong lscpu flags.
We should incorporate an lscpu flag check into the CI pipelines; this should do the trick.

[AVX512] `convert_up` missing types

The following conversions lead to compile errors with AVX512. The errors all look like the following, but with the corresponding types.

 error: no member named 'apply' in 'tsl::functors::convert_up<tsl::simd<unsigned int, tsl::avx512>, 
                                                              tsl::simd<unsigned long, tsl::avx512>, 
                                                              tsl::workaround>'
  • uint16_t --> uint32_t
  • uint16_t --> uint64_t
  • uint32_t --> uint64_t

Compressstore buggy

In the workaround version of compress store, we have this little piece of code at the end:

if(((mask>>Vec::vector_element_count())&0b1) == 0) {
   *memory = safe[memory-orig_mem];
}

This is just wrong.

Primitive Table Inconsistencies

When viewing the primitive table, it almost always shows equal availability for oneAPIfpga and oneAPIfpgaRTL. That is due to the "fallback" mechanism of making C++ HLS code available if no RTL specification is present.

However, this is rather confusing, since there is no actual indication of the fallback. The table should be enhanced to distinguish fallback solutions from actual availability.

Sort include-order for definitions

As some primitives internally use other primitives declared and defined in different files, we need to build a dependency graph and sort the includes in tsl_generated.hpp accordingly.
Example:
calc.yaml:1236ff (mod<simd<uint32_t, avx512>>):

/*...*/
__m512 vec_d = tsl::cast<Vec, typename Vec::template transform_extension<T>>(vec);
/*...*/

Cast is defined in convert.yaml:360ff.
The include order would be:

#include "extensions/scalar.hpp"
#include "extensions/simd/intel/avx2.hpp"
#include "declarations/*"
#include "definitions/compare/compare_avx2.hpp"
#include "definitions/compare/compare_sse.hpp"
#include "definitions/compare/compare_scalar.hpp"
#include "definitions/calc/calc_avx2.hpp"
#include "definitions/calc/calc_sse.hpp"
#include "definitions/calc/calc_scalar.hpp"
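
The dependency-graph idea could be sketched in Python with the standard library's graphlib module (Python 3.9+). This is only an outline under assumptions: the function name sorted_definition_groups and the shape of the dependency mapping are hypothetical, and collecting the dependencies from the YAML files is not shown:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def sorted_definition_groups(dependencies):
    """Return definition groups ordered so that dependencies come first.

    `dependencies` maps a group name to the groups whose primitives it uses.
    Raises graphlib.CycleError if groups depend on each other in a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# calc uses tsl::cast, which is defined in convert (see the example above);
# the compare entry without dependencies is only there for illustration.
order = sorted_definition_groups(
    {"calc": {"convert"}, "compare": set(), "convert": set()}
)
```

The resulting order places convert before calc, so the includes in tsl_generated.hpp could be emitted by walking the list. A cycle between groups would surface as an exception at generation time rather than as a compile error later.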

Missing Runtime with deb

When installing the TSL from the deb package and including it via
#include <tsl/tslintrin.hpp>

the CPU runtime header cannot be found:

In file included from /usr/include/tsl/tslintrin.hpp:33,
                 from main.cpp:5:
/usr/include/tsl/generated/tsl_generated.hpp:79:10: fatal error: tslCPUrt.hpp: No such file or directory
   79 | #include "tslCPUrt.hpp"
      |          ^~~~~~~~~~~~~~
compilation terminated.

The link to the tslCPUrt.hpp is originally formed through the CMake integration and has to be present through tslintrin as well.

[NEON] Unknown type name `neon` in tslCPUrt.hpp

When using tsl::runtime::cpu::max_width_extension_t on AArch64, I get the following error message:

runtime/cpu/include/tslCPUrt.hpp:15:29: error: unknown type name 'neon'
   15 |         using extension_t = neon;
      |                             ^

runtime/cpu/include/tslCPUrt.hpp:22:23: error: use of undeclared identifier 'scalar'
   22 |             (Par==1), scalar, typename details::simd_ext_helper_t<sizeof(T)*8*Par>::extension_t
      |                       ^

These types are not included, so they cannot be known.

Side note: same holds for VectorProcessingStyle and TSLArithmetic further down in the code. I'm not using them, but my IDE shows me that they are undefined. Probably just a few includes missing.

Sequence

As reported by @EricMier, the primitive sequence creates a register with the values in the wrong order.

[NEON] Shift left requires constant integer

When downloading the current tar.gz from v0.1.9-rc5 (and probably ones before that but I didn't check), I cannot compile the NEON stuff on Mac with LLVM/Clang 18. I get the following error multiple times.

cmake-build-release/_deps/tsl_gen-src/generate_tsl_neon-asimd/include/generated/definitions/binary/binary_neon.hpp:1117:104: error: argument to '__builtin_neon_vshlq_n_v' must be a constant integer
 1117 |                 return __extension__ ({ uint8x16_t __ret; uint8x16_t __s0 = data; __ret = (uint8x16_t) __builtin_neon_vshlq_n_v((int8x16_t)__s0, shift, 48); __ret; });

which is the expanded macro for:

[[nodiscard]] 
TSL_FORCE_INLINE 
static typename Vec::register_type apply(
    const typename Vec::register_type data, const unsigned int shift
) {
    return vshlq_n_u8(data, shift);  // <-- this is a macro in Clang
}

When looking at the NEON specs, this error is correct, as vshlq_n_u8 requires a const int as the second argument. I'm not sure how this should be handled, but probably this needs a vdup_n_* for the runtime value before the shift.

Allow TVLGen to generate only a certain set of ctypes

TVLGen already provides the possibility to limit the generated code to a set of extensions. However, especially for scientific paper writing, it would be great to limit the library also to a set of selected ctypes, e.g. float and uint16_t.
