springer13 / hptt

High-Performance Tensor Transpose library

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.20% C++ 84.08% Shell 4.27% Python 8.13% C 1.27% CMake 1.05%

tensor tensors transposition tensor-transposition multidimensional-arrays high-performance-computing

hptt's Introduction

High-Performance Tensor Transpose library

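HPTT is a high-performance C++ library for out-of-place tensor transpositions of the general form:

$$B_{\pi(i_0, i_1, \ldots, i_{d-1})} \gets \alpha \cdot A_{i_0, i_1, \ldots, i_{d-1}} + \beta \cdot B_{\pi(i_0, i_1, \ldots, i_{d-1})}$$

where A and B denote the input and output tensor, respectively; $\pi$ represents the user-specified transposition; and $\alpha$ and $\beta$ are scalars (i.e., setting $\beta \neq 0$ enables the user to update the output tensor B).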
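The update described above (scale the permuted input, then add a scaled copy of the existing output) can be sketched in plain Python. This is only a slow reference under stated assumptions, not HPTT's implementation: the names `strides_col_major` and `transpose_ref` are hypothetical, tensors are flat lists in column-major order, and dimension k of the output is taken to be dimension perm[k] of the input.

```python
from itertools import product


def strides_col_major(sizes):
    """Column-major strides: the first index varies fastest."""
    strides, step = [], 1
    for extent in sizes:
        strides.append(step)
        step *= extent
    return strides


def transpose_ref(perm, alpha, a, size_a, beta, b):
    """Update the flat list b in place with alpha * permuted(a) + beta * b.

    Assumed convention: output dimension k is input dimension perm[k],
    so size_b[k] == size_a[perm[k]].
    """
    size_b = [size_a[p] for p in perm]
    stride_a = strides_col_major(size_a)
    stride_b = strides_col_major(size_b)
    for idx_a in product(*(range(n) for n in size_a)):
        off_a = sum(i * s for i, s in zip(idx_a, stride_a))
        off_b = sum(idx_a[p] * s for p, s in zip(perm, stride_b))
        b[off_b] = alpha * a[off_a] + beta * b[off_b]
    return b


# 2x3 matrix stored column-major: element (i, j) sits at offset i + 2*j
a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
b = [100.0] * 6
transpose_ref([1, 0], 2.0, a, [2, 3], 0.5, b)
print(b)
```

With beta set to 0.5, every output element keeps half of its previous value (50.0) and adds twice the transposed input element, which is the "update" behaviour that a plain transpose lacks.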

Key Features

Multi-threading support
Explicit vectorization
Auto-tuning (akin to FFTW)
- Loop order
- Parallelization
Multi architecture support
- Explicitly vectorized kernels for AVX and ARM
Supports float, double, complex and double complex data types
Supports both column-major and row-major data layouts

HPTT now also offers C- and Python-interfaces (see below).

Requirements

You must have a working C++ compiler with C++11 support. I have tested HPTT with:

Intel's ICPC 15.0.3, 16.0.3, 17.0.2
GNU g++ 5.4, 6.2, 6.3
clang++ 3.8, 3.9

Install

Clone the repository into a desired directory and change to that location:

git clone https://github.com/springer13/hptt.git
cd hptt
export CXX=<desired compiler>

Now you have several options to build the desired version of the library:

make avx
make arm
make scalar

Using CMake:

mkdir build && cd build
cmake .. -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ # optionally add one of -DENABLE_ARM=ON, -DENABLE_AVX=ON, -DENABLE_IBM=ON

This should create 'libhptt.so' inside the ./lib folder.

Getting Started

Please have a look at the provided benchmark.cpp.

In general HPTT is used as follows:

#include <hptt.h>

// allocate tensors
float *A = ...
float *B = ...

// scaling factors and thread count (example values)
float alpha = 1.0f;
float beta = 0.0f;
int numThreads = 4;

// specify permutation and size
const int dim = 6;
int perm[dim] = {5,2,0,4,1,3};
int size[dim] = {48,28,48,28,28,28}; // one extent per dimension (example values)

// create a plan (shared_ptr)
auto plan = hptt::create_plan( perm, dim, 
                               alpha, A, size, NULL, 
                               beta,  B, NULL, 
                               hptt::ESTIMATE, numThreads);

// execute the transposition
plan->execute();

The example above does not use any auto-tuning, but relies solely on HPTT's performance model. To activate auto-tuning, use hptt::MEASURE or hptt::PATIENT instead of hptt::ESTIMATE.

C-Interface

HPTT also offers a C-interface. This interface is less expressive than its C++ counterpart since it does not expose control over the plan.

void sTensorTranspose( const int *perm, const int dim,
        const float alpha, const float *A, const int *sizeA, const int *outerSizeA, 
        const float beta,        float *B,                   const int *outerSizeB, 
        const int numThreads, const int useRowMajor);

void dTensorTranspose( const int *perm, const int dim,
        const double alpha, const double *A, const int *sizeA, const int *outerSizeA, 
        const double beta,        double *B,                   const int *outerSizeB, 
        const int numThreads, const int useRowMajor);
...

Python-Interface

HPTT now also offers a python-interface. The functionality offered by HPTT is comparable to numpy.transpose with the difference being that HPTT can also update the output tensor.

tensorTransposeAndUpdate( perm, alpha, A, beta, B, numThreads=-1)

tensorTranspose( perm, alpha, A, numThreads=-1)

See docstring for additional information. Based on those there are also the following drop-in replacements for numpy functions:

hptt.transpose(A, axes)
hptt.ascontiguousarray(A)
hptt.asfortranarray(A)

Installation should be straightforward via:

cd ./pythonAPI
python setup.py install

or, if you want a pip-managed install:

pip install -U .

At this point you should be able to import the 'hptt' package within your Python scripts.

The python interface also offers support for:

Single and double precision
Column-major and row-major data layouts
multi-threading support (HPTT by default utilizes all cores of a system)

Python Benchmark

You can find an elaborate example under ./pythonAPI/benchmark/benchmark.py --help

Multi-threaded 2x Intel Haswell-EP E5-2680 v3 (24 threads)
- Comparison against numpy.transpose

Documentation

You can generate the doxygen documentation via

make doc

Benchmark

The benchmark is the same as the original TTC benchmark for tensor transpositions.

You can compile the benchmark via:

cd benchmark
make

Before running the benchmark, please modify the number of threads and the thread affinity within the benchmark.sh file. To run the benchmark just use:

./benchmark.sh

This will create the file hptt_benchmark.dat, which contains all the runtime information of HPTT and the reference implementation.

Performance Results

See (pdf) for details.

TODOs

Add explicit vectorization for IBM power
Add explicit vectorization for complex types

Related Projects

Shared-Memory Tensor Contractions:
- TCL
- TBLIS
Distributed-Memory Tensor Contractions:
- CTF
- libtensor
Tensor network codes:
- ITensor
- Uni10

Citation

In case you want to refer to HPTT as part of a research paper, please cite the following article (pdf):

@inproceedings{hptt2017,
 author = {Springer, Paul and Su, Tong and Bientinesi, Paolo},
 title = {{HPTT}: {A} {H}igh-{P}erformance {T}ensor {T}ransposition {C}++ {L}ibrary},
 booktitle = {Proceedings of the 4th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming},
 series = {ARRAY 2017},
 year = {2017},
 isbn = {978-1-4503-5069-3},
 location = {Barcelona, Spain},
 pages = {56--62},
 numpages = {7},
 url = {http://doi.acm.org/10.1145/3091966.3091968},
 doi = {10.1145/3091966.3091968},
 acmid = {3091968},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {High-Performance Computing, autotuning, multidimensional transposition, tensor transposition, tensors, vectorization},
}

hptt's Issues

Any further update with new instruction sets like AVX2/AVX512?

The project was mostly finished about five years ago and only implements transposition with AVX instructions. Will there be an update that supports newer instruction sets?

Please allow the user to choose the type of library that is built: STATIC or SHARED

here

If STATIC is removed, CMake's BUILD_SHARED_LIBS variable, which the user can set, would determine the library type.

Compiling benchmark

When I compile the benchmark and reference files by running make in the /benchmark folder, I get the following error:

reference.cpp:60:30: error: cannot convert 'std::complex<float>' to 'float' in assignment
   60 |                B_[i] = alpha * std::conj(A_[i * strideAinner]);
      |                        ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                              |
      |                              std::complex<float>

alpha is a floatType, B_[i] is a floatType, A_[i * strideAinner] is a FloatComplex. Which of them should I change to match the types?
Thanks!

Fix for MSVC

Three places need to be fixed to compile with MSVC:

The C complex types in MSVC are _Fcomplex and _Dcomplex, which do not conform to C99, so xxx _Complex should be replaced with the following macros:

#ifndef HPTT_C_FLT_COMPLEX
#ifdef _MSC_VER
  #define HPTT_C_FLT_COMPLEX _Fcomplex
  #define HPTT_C_DBL_COMPLEX _Dcomplex
#else
  #define HPTT_C_FLT_COMPLEX float _Complex
  #define HPTT_C_DBL_COMPLEX double _Complex
#endif
#endif

The INLINE macro should also cover MSVC by using __forceinline:

#if defined(__ICC) || defined(__INTEL_COMPILER) || defined(_MSC_VER)
# define INLINE __forceinline
#elif .....

MSVC does not support variable-length arrays (VLAs); _alloca should be used instead:

#ifdef _MSC_VER
#define HPTT_DECL_VLA(type_, name_, len_) type_* name_ = reinterpret_cast<type_*>(_alloca(sizeof(type_)*len_));
#else 
#define HPTT_DECL_VLA(type_, name_, len_) type_ name_[len_];
#endif

Strange behaviour with numThreads>1 and execute_expert<useStream, false, betaIsNull>

The following code is OK for numThreads=1 but the result is different for numThreads>1.

   const int dim = 9;
    int sizeAx[dim];
    int sizeAy[dim];
    int sizeAz[dim];
    int perm[dim];


    perm[0] = 2;
    perm[1] = 5;
    perm[2] = 7;
    perm[3] = 8;
    perm[4] = 1;
    perm[5] = 4;
    perm[6] = 3;
    perm[7] = 0;
    perm[8] = 6;

    sizeAx[8] = 3;
    sizeAx[7] = 2;
    sizeAx[6] = 2;
    sizeAx[5] = 1;
    sizeAx[4] = 1;
    sizeAx[3] = 1;
    sizeAx[2] = 2;
    sizeAx[1] = 1;
    sizeAx[0] = 2;

    for (int d = 0; d < 9; d++) sizeAz[d] = sizeAx[perm[d]];
    for (int d = 0; d < 9; d++) sizeAy[d] = sizeAz[perm[d]];

    uint flatSize = 1;
    for (int d = 0; d < 9; d++) flatSize *= sizeAx[d];

    RealType * Ax = new RealType[flatSize];
    RealType * Ay = new RealType[flatSize];
    RealType * Az = new RealType[flatSize];

    for (uint i = 0; i < flatSize; i++) {
        Ax[i] = RealType(i);
        Ay[i] = 0.0;
        Az[i] = 0.0;
    }

    RealType alpha = 1.0;
    RealType beta = 0.0;

    const int numThreads = 4;

    auto planXZ = hptt::create_plan(perm, dim,
            alpha, Ax, sizeAx, NULL,
            beta, Az, NULL,
            hptt::ESTIMATE, numThreads);

    auto planZY = hptt::create_plan(perm, dim,
            alpha, Az, sizeAz, NULL,
            beta, Ay, NULL,
            hptt::ESTIMATE, numThreads);

    auto planYX = hptt::create_plan(perm, dim,
            alpha, Ay, sizeAy, NULL,
            beta, Ax, NULL,
            hptt::ESTIMATE, numThreads);


    const bool useStream = false;
    const bool useThreads = false;
    const bool betaIsNull = true;

    planXZ->execute_expert<useStream, useThreads, betaIsNull>();
    for (uint i = 0; i < flatSize; i++) std::cout << Az[i] << " ";
    std::cout << std::endl;
    planZY->execute_expert<useStream, useThreads, betaIsNull>();
    for (uint i = 0; i < flatSize; i++) std::cout << Ay[i] << " ";
    std::cout << std::endl;
    planYX->execute_expert<useStream, useThreads, betaIsNull>();
    for (uint i = 0; i < flatSize; i++) std::cout << Ax[i] << " ";
    std::cout << std::endl;

    delete[] Ax;
    delete[] Ay;
    delete[] Az;

Result for numThreads=1:

0 2 8 10 16 18 24 26 32 34 40 42 1 3 9 11 17 19 25 27 33 35 41 43 4 6 12 14 20 22 28 30 36 38 44 46 5 7 13 15 21 23 29 31 37 39 45 47
0 8 1 9 4 12 5 13 16 24 17 25 20 28 21 29 32 40 33 41 36 44 37 45 2 10 3 11 6 14 7 15 18 26 19 27 22 30 23 31 34 42 35 43 38 46 39 47
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

Result for numThreads=4
0 2 8 10 16 18 0 0 0 0 0 0 1 3 9 11 17 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 8 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 10 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 3 4 5 6 7 8 0 10 11 12 13 14 15 16 0 18 19 20 21 22 23 0 0 26 27 28 29 30 31 0 0 34 35 36 37 38 39 0 0 42 43 44 45 46 47
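
The numThreads=1 output above can be cross-checked without HPTT. The sketch below is a naive, single-threaded Python reference with hypothetical helper names (`transpose_flat`, `cycles`); it assumes column-major storage and that output dimension k is input dimension perm[k], as in the `sizeAz[d] = sizeAx[perm[d]]` loop of the report. It also shows why the third transposition should return the original data: the permutation consists of three 3-cycles, so applying it three times gives the identity.

```python
from itertools import product


def col_major_strides(sizes):
    """Column-major strides: the first index varies fastest."""
    strides, step = [], 1
    for extent in sizes:
        strides.append(step)
        step *= extent
    return strides


def transpose_flat(perm, a, size_a):
    """Naive transposition of a flat list; returns (output, output sizes)."""
    size_b = [size_a[p] for p in perm]
    stride_a = col_major_strides(size_a)
    stride_b = col_major_strides(size_b)
    b = [None] * len(a)
    for idx in product(*(range(n) for n in size_a)):
        off_a = sum(i * s for i, s in zip(idx, stride_a))
        off_b = sum(idx[p] * s for p, s in zip(perm, stride_b))
        b[off_b] = a[off_a]
    return b, size_b


def cycles(perm):
    """Decompose a permutation into its cycles."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, k = [], start
        while k not in seen:
            seen.add(k)
            cycle.append(k)
            k = perm[k]
        result.append(cycle)
    return result


perm = [2, 5, 7, 8, 1, 4, 3, 0, 6]
size_x = [2, 1, 2, 1, 1, 1, 2, 2, 3]   # sizeAx[0..8] from the report
ax = list(range(48))

az, size_z = transpose_flat(perm, ax, size_x)
ay, size_y = transpose_flat(perm, az, size_z)
back, size_back = transpose_flat(perm, ay, size_y)

print(cycles(perm))   # three cycles of length 3
print(az[:12])        # start of the first numThreads=1 output line
print(back == ax)     # three applications restore the input
```

Any execution mode that leaves zeros in Az or Ay, as in the numThreads=4 output, therefore disagrees with this reference.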

Transposition into sub-tensor

Hello,
I tried to do a tensor transpose from a 3x3x3 tensor into a 3x3x3 sub-tensor of a 5x5x5 tensor, but the result is unexpected. The following code snippet is what I tried to do

std::vector<double> A(125), B(27, 1);
std::iota(A.begin(), A.end(), 0);
double* aliasA = &A[0];
std::vector<int> perm = {0,1,2};
std::vector<int> size = {3,3,3};
std::vector<int> outerSize = {5,5,5};
auto plan = hptt::create_plan(&perm[0], 3,
                              1, &B[0],   &size[0], NULL,
                              10, aliasA,           &outerSize[0],
                              hptt::ESTIMATE, 1);
plan->execute();
for(int i = 0; i < 125; i++) std::cout << A[i] << std::endl;

I would expect as result the following tensor

  1,  11,  21,   3,   4,
 51,  61,  71,   8,   9,
101, 111, 121,  13,  14,
 15,  16,  17,  18,  19,
 20,  21,  22,  23,  24,

251, 261, 271,  28,  29,
301, 311, 321,  33,  34,
351, 361, 371,  38,  39,
 40,  41,  42,  43,  44,
 45,  46,  47,  48,  49,

501, 511, 521,  53,  54,
551, 561, 571,  58,  59,
601, 611, 621,  63,  64,
 65,  66,  67,  68,  69,
 70,  71,  72,  73,  74,

 75,  76,  77,  78,  79,
 80,  81,  82,  83,  84,
 85,  86,  87,  88,  89,
 90,  91,  92,  93,  94,
 95,  96,  97,  98,  99,

100, 101, 102, 103, 104,
105, 106, 107, 108, 109,
110, 111, 112, 113, 114,
115, 116, 117, 118, 119,
120, 121, 122, 123, 124
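
The expected values above can be reproduced with a small Python sketch. It is a reference for the intended arithmetic only, not HPTT code: the function name `update_subtensor` is hypothetical, and it assumes column-major storage, an identity permutation, and a sub-tensor that starts at the first element of the enclosing 5x5x5 tensor.

```python
from itertools import product


def col_major_strides(extents):
    """Column-major strides: the first index varies fastest."""
    strides, step = [], 1
    for extent in extents:
        strides.append(step)
        step *= extent
    return strides


def update_subtensor(alpha, src, size, beta, dst, outer_size):
    """dst[block] = alpha * src + beta * dst[block], identity permutation.

    src is dense with extents `size`; dst is dense with extents
    `outer_size`, and only its leading `size` block is modified.
    """
    stride_src = col_major_strides(size)
    stride_dst = col_major_strides(outer_size)
    for idx in product(*(range(n) for n in size)):
        off_src = sum(i * s for i, s in zip(idx, stride_src))
        off_dst = sum(i * s for i, s in zip(idx, stride_dst))
        dst[off_dst] = alpha * src[off_src] + beta * dst[off_dst]
    return dst


a = [float(i) for i in range(125)]   # 5x5x5 output, filled like std::iota
b = [1.0] * 27                       # 3x3x3 input of ones
update_subtensor(1.0, b, [3, 3, 3], 10.0, a, [5, 5, 5])

# Print the first 5x5 slab, one row of five values per line
for row in range(5):
    print(a[5 * row:5 * row + 5])
```

Each touched element becomes 10 times its old value plus 1, and elements outside the 3x3x3 block keep their original values.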

However, the result ends up as the tensor

  1, 111,2111, 311,  41,
 51, 611,7111, 811,  91,
101,1111,12111,1311, 141,
 15,  16,  17,  18,  19,
 20,  21,  22,  23,  24,

 25,  26,  27,  28,  29,
 30,  31,  32,  33,  34,
 35,  36,  37,  38,  39,
 40,  41,  42,  43,  44,
 45,  46,  47,  48,  49,

 50,  51,  52,  53,  54,
 55,  56,  57,  58,  59,
 60,  61,  62,  63,  64,
 65,  66,  67,  68,  69,
 70,  71,  72,  73,  74,

 75,  76,  77,  78,  79,
 80,  81,  82,  83,  84,
 85,  86,  87,  88,  89,
 90,  91,  92,  93,  94,
 95,  96,  97,  98,  99,

100, 101, 102, 103, 104,
105, 106, 107, 108, 109,
110, 111, 112, 113, 114,
115, 116, 117, 118, 119,
120, 121, 122, 123, 124

invalid read in create_plan for scalar build

Found a particularly 'challenging' transposition that's causing trouble. Building with gnu, no opts, via make scalar. Here is a minimal test

#include <hptt.h>

int main(){
  int order = 2;
  int perm[] = {1,0};
  int size[] = {1,1};
  double st_buffer[4];
  double new_buffer[4];
  int numThreads = 1;

  auto plan = hptt::create_plan( perm, order,
      1.0, ((double*)st_buffer), size, NULL,
      0.0, ((double*)new_buffer), NULL,
      hptt::ESTIMATE, numThreads );


  return 0;
}

Executing this in test.cxx in the hptt main folder as

g++ -O0 -std=c++0x test.cxx -I./src/ ./lib/libhptt.a  && valgrind ./a.out

gives

==27584== Invalid read of size 8
==27584==    at 0x41D90E: hptt::Transpose<double>::createPlans(std::vector<std::shared_ptr<hptt::Plan>, std::allocator<std::shared_ptr<hptt::Plan> > >&) const (hptt.cpp:1799)
==27584==    by 0x41CF33: hptt::Transpose<double>::createPlan() (hptt.cpp:1693)
==27584==    by 0x4047AE: hptt::create_plan(int const*, int, double, double const*, int const*, int const*, double, double*, int const*, hptt::SelectionMethod, int, int const*) (hptt.cpp:1926)
==27584==    by 0x401516: main (in /home/edgar/work/hptt-v1.0/a.out)
==27584==  Address 0x5ab5ef8 is 8 bytes before a block of size 16 alloc'd
==27584==    at 0x4C2E0EF: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27584==    by 0x498796: __gnu_cxx::new_allocator<unsigned long>::allocate(unsigned long, void const*) (in /home/edgar/work/hptt-v1.0/a.out)
==27584==    by 0x49873B: std::allocator_traits<std::allocator<unsigned long> >::allocate(std::allocator<unsigned long>&, unsigned long) (in /home/edgar/work/hptt-v1.0/a.out)
==27584==    by 0x4984D2: std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_M_allocate(unsigned long) (in /home/edgar/work/hptt-v1.0/a.out)
==27584==    by 0x49815D: std::vector<unsigned long, std::allocator<unsigned long> >::_M_default_append(unsigned long) (vector.tcc:557)
==27584==    by 0x4129C0: std::vector<unsigned long, std::allocator<unsigned long> >::resize(unsigned long) (stl_vector.h:676)
==27584==    by 0x41AE79: hptt::Transpose<double>::Transpose(int const*, int const*, int const*, int const*, int, double const*, double, double*, double, hptt::SelectionMethod, int, int const*) (hptt.h:145)
==27584==    by 0x4641AC: void __gnu_cxx::new_allocator<hptt::Transpose<double> >::construct<hptt::Transpose<double>, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&>(hptt::Transpose<double>*, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&) (in /home/edgar/work/hptt-v1.0/a.out)
==27594==    by 0x463DB3: void std::allocator_traits<std::allocator<hptt::Transpose<double> > >::construct<hptt::Transpose<double>, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&>(std::allocator<hptt::Transpose<double> >&, hptt::Transpose<double>*, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&) (in /home/edgar/work/hptt-v1.0/a.out)
==27594==    by 0x463997: std::_Sp_counted_ptr_inplace<hptt::Transpose<double>, std::allocator<hptt::Transpose<double> >, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&>(std::allocator<hptt::Transpose<double> >, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&) (shared_ptr_base.h:522)
==27594==    by 0x4635B3: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<hptt::Transpose<double>, std::allocator<hptt::Transpose<double> >, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&>(std::_Sp_make_shared_tag, hptt::Transpose<double>*, std::allocator<hptt::Transpose<double> > const&, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&) (shared_ptr_base.h:617)
==27594==    by 0x4632E5: std::__shared_ptr<hptt::Transpose<double>, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<hptt::Transpose<double> >, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&>(std::_Sp_make_shared_tag, std::allocator<hptt::Transpose<double> > const&, int const*&, int const*&, int const*&, int const*&, int const&, double const*&, double const&, double*&, double const&, hptt::SelectionMethod const&, int const&, int const*&) (shared_ptr_base.h:1096)
==27594==
==27594== Invalid read of size 8
==27594==    at 0x41D935: hptt::Transpose<double>::createPlans(std::vector<std::shared_ptr<hptt::Plan>, std::allocator<std::shared_ptr<hptt::Plan> > >&) const (hptt.cpp:1800)
==27594==    by 0x41CF33: hptt::Transpose<double>::createPlan() (hptt.cpp:1693)
...

don't ask me why I want to do this transposition :).

Python API not working for complex arrays

Test fails for the python API when complex numbers are used. The culprit seems to be a missing parameter in pythonAPI/hptt/hptt.py at line 119 (setConjA is missing, which is instead present in the C handle).

"ValueError: repeated axis in transpose" in hptt.ascontiguousarray, not in np.ascontiguousarray

Hi,

I tried to use the Python API of hptt analogously to numpy. With the following code, I get "ValueError: repeated axis in transpose" from hptt.ascontiguousarray, but not from np.ascontiguousarray. Is this expected? If so, what is the reason? Thanks

import numpy as np
import hptt
import copy

n_a = n_b = n_c = 1
n_d = n_e = n_f = 2
dim_a = (n_a, n_b, n_c, n_d, n_e, n_f)
a = np.random.random(dim_a)
b = copy.deepcopy(a)

b = np.transpose(b, (1,0,2,3,5,4))
#print(b.flags)
#b = np.ascontiguousarray(b)
b = hptt.ascontiguousarray(b)

Support for Travis CI

Add support for Travis CI

Inconsistent BSD vs LGPLv3 license text

Reading through the HPTT code, I noticed that some files (e.g., hptt.h and hptt.cpp) have LGPLv3 license headers, even though the top-level license text says that the license is 3-clause BSD.

Is this an oversight?

Thanks!

Testing Framework

Create a testing framework that tests HPTT for many (random) tensor transpositions, sizes, numbers of threads, data types, outerSizes, beta=0, and beta!=0.

One could use benchmark/reference.cpp as a reference implementation.

Missing LICENSE information

I can't see a license or copyright information for this code. Could you please add one? Thanks!

Conjugation flag for 'A'

Would it make sense to have a conjugation flag for A; you'd already need this if you would like to use this library to implement Hermitian conjugation.

Clang build OpenMP requirement

It would be nice to be able to build without OpenMP, for instance when using clang. My understanding is external modules are still necessary to do clang + OpenMP. My version of clang is

clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix

I get the following error when building with -fopenmp or without

src/hptt.cpp:29:10: fatal error: 'omp.h' file not found
#include <omp.h>

Either CMake config file or pkgconfig .pc needed

Currently neither is generated:

a ./opt/local/lib/libhptt.a
a ./opt/local/include/compute_node.h
a ./opt/local/include/hptt.h
a ./opt/local/include/hptt_types.h
a ./opt/local/include/macros.h
a ./opt/local/include/plan.h
a ./opt/local/include/transpose.h
a ./opt/local/include/utils.h

It is probably also better to install headers to ${prefix}/include/hptt and not dump them into a common folder :)

benchmark/reference.cpp fails to compile: error: cannot convert 'std::complex<float>' to 'float' in assignment

g++10 -O3 -std=c++11 -I../src/  -c ../benchmark/reference.cpp -o ../benchmark/reference.o
../benchmark/reference.cpp: In instantiation of 'void transpose_ref(uint32_t*, uint32_t*, int, const floatType*, floatType, floatType*, floatType, bool) [with floatType = float; uint32_t = unsigned int]':
../benchmark/reference.cpp:74:58:   required from here
../benchmark/reference.cpp:60:30: error: cannot convert 'std::complex<float>' to 'float' in assignment
   60 |                B_[i] = alpha * std::conj(A_[i * strideAinner]);
      |                        ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                              |
      |                              std::complex<float>
../benchmark/reference.cpp:66:64: error: cannot convert 'std::complex<float>' to 'float' in assignment
   66 |                B_[i] = alpha * std::conj(A_[i * strideAinner]) + beta * B_[i];
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
      |                                                                |
      |                                                                std::complex<float>
../benchmark/reference.cpp: In instantiation of 'void transpose_ref(uint32_t*, uint32_t*, int, const floatType*, floatType, floatType*, floatType, bool) [with floatType = double; uint32_t = unsigned int]':
../benchmark/reference.cpp:80:60:   required from here
../benchmark/reference.cpp:60:30: error: cannot convert 'std::complex<double>' to 'double' in assignment
   60 |                B_[i] = alpha * std::conj(A_[i * strideAinner]);
      |                        ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                              |
      |                              std::complex<double>
../benchmark/reference.cpp:66:64: error: cannot convert 'std::complex<double>' to 'double' in assignment
   66 |                B_[i] = alpha * std::conj(A_[i * strideAinner]) + beta * B_[i];
      |                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
      |                                                                |
      |                                                                std::complex<double>
gmake[1]: *** [Makefile:32: ../benchmark/reference.o] Error 1
gmake[1]: Leaving directory '/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/testframework'
*** Error code 2

Version: 1.0.5-18-g9425386
gcc-10
FreeBSD 13.1

throw error rather than exit directly

When something such as a dimension is incorrect, HPTT calls exit(-1) directly.
This is not good for debugging;
it would be better to throw an error instead.

Problem with executing benchmark and projects

Hello,

I compiled the benchmark using the Makefile, but I got an error when I tried to run the executable:
"Error while loading shared libraries: libhptt.so: cannot open shared object file: No such file or directory"
What could the problem be? I followed exactly the instructions during installation.

Thanks in advance!

Compilation failed with g++ 6.3.0

Hi,
the hptt compilation failed on my machine (6700K-ubuntu 17.04-g++ 6.3.0) with the following message:

/usr/lib/gcc/x86_64-linux-gnu/6/include/avxintrin.h:994:1: error: inlining failed in call to always_inline ‘void hptt::_mm256_stream_ps(float*, hptt::__m256)’: target specific option mismatch
 _mm256_stream_ps (float *__P, __m256 __A)
 ^~~~~~~~~~~~~~~~

The compilation is OK with intel icpc 2018.

API description

Hi,
Is it possible to give a little more detail on the API?

Does the size vector correspond to the sizes of A or of B?
Does i_1 correspond to the major index (i_1 contiguous to i_1+1), or is it i_N?

Thanks in advance,

Laurent
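
The C-interface shown earlier names the extent array `sizeA` and takes a `useRowMajor` flag, which suggests that the sizes describe the input tensor A and that the layout is selectable. The Python sketch below illustrates the two layouts; the helper name `linear_offset` is hypothetical, and it assumes the usual definitions, namely that column-major makes the first index contiguous and row-major makes the last index contiguous.

```python
def linear_offset(index, sizes, row_major=False):
    """Offset of a multi-index in a dense tensor with the given extents."""
    order = list(range(len(sizes)))
    if row_major:
        order.reverse()          # last index varies fastest
    offset, stride = 0, 1
    for k in order:
        offset += index[k] * stride
        stride *= sizes[k]
    return offset


sizes = [3, 4, 5]

# Column-major: stepping the first index moves by one element
print(linear_offset([1, 0, 0], sizes))
print(linear_offset([0, 0, 1], sizes))

# Row-major: stepping the last index moves by one element
print(linear_offset([1, 0, 0], sizes, row_major=True))
print(linear_offset([0, 0, 1], sizes, row_major=True))
```

Under these assumptions the contiguous index is the first one in column-major storage and the last one in row-major storage.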

error: implicit conversion from 'const _Complex float' to 'float' is not permitted in C++

Build fails with clang-13:

===>  Building for hptt-1.0.5.18
[ 20% 4/5] /usr/bin/c++  -I/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/include -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer -fopenmp -march=native -std=gnu++11 -MD -MT CMakeFiles/hptt.dir/src/utils.cpp.o -MF CMakeFiles/hptt.dir/src/utils.cpp.o.d -o CMakeFiles/hptt.dir/src/utils.cpp.o -c /disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/utils.cpp
[ 40% 4/5] /usr/bin/c++  -I/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/include -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer -fopenmp -march=native -std=gnu++11 -MD -MT CMakeFiles/hptt.dir/src/hptt.cpp.o -MF CMakeFiles/hptt.dir/src/hptt.cpp.o.d -o CMakeFiles/hptt.dir/src/hptt.cpp.o -c /disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/hptt.cpp
FAILED: CMakeFiles/hptt.dir/src/hptt.cpp.o 
/usr/bin/c++  -I/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/include -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer -O2 -pipe -fno-omit-frame-pointer -fstack-protector-strong -fno-strict-aliasing -fno-omit-frame-pointer -fopenmp -march=native -std=gnu++11 -MD -MT CMakeFiles/hptt.dir/src/hptt.cpp.o -MF CMakeFiles/hptt.dir/src/hptt.cpp.o.d -o CMakeFiles/hptt.dir/src/hptt.cpp.o -c /disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/hptt.cpp
/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/hptt.cpp:179:131: error: implicit conversion from 'const _Complex float' to 'float' is not permitted in C++
                         (const hptt::FloatComplex*) A, (hptt::FloatComplex) alpha, (hptt::FloatComplex*) B, (hptt::FloatComplex) beta, hptt::ESTIMATE, numThreads, nullptr, useRowMajor));
                                                                                                             ~                    ^~~~
/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/hptt.cpp:179:78: error: implicit conversion from 'const _Complex float' to 'float' is not permitted in C++
                         (const hptt::FloatComplex*) A, (hptt::FloatComplex) alpha, (hptt::FloatComplex*) B, (hptt::FloatComplex) beta, hptt::ESTIMATE, numThreads, nullptr, useRowMajor));
                                                        ~                    ^~~~~
/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/hptt.cpp:190:135: error: implicit conversion from 'const _Complex double' to 'double' is not permitted in C++
                         (const hptt::DoubleComplex*) A, (hptt::DoubleComplex) alpha, (hptt::DoubleComplex*) B, (hptt::DoubleComplex) beta, hptt::ESTIMATE, numThreads, nullptr, useRowMajor));
                                                                                                                ~                     ^~~~
/disk-samsung/freebsd-ports/math/hptt/work/hptt-1.0.5-18-g9425386/src/hptt.cpp:190:80: error: implicit conversion from 'const _Complex double' to 'double' is not permitted in C++
                         (const hptt::DoubleComplex*) A, (hptt::DoubleComplex) alpha, (hptt::DoubleComplex*) B, (hptt::DoubleComplex) beta, hptt::ESTIMATE, numThreads, nullptr, useRowMajor));
                                                         ~                     ^~~~~
4 errors generated.

Version: 1.0.5-18
