
onemkl's Introduction

oneAPI Math Kernel Library (oneMKL) Interfaces


oneMKL Interfaces is an open-source implementation of the oneMKL Data Parallel C++ (DPC++) interface according to the oneMKL specification. It works with multiple devices (backends) using device-specific libraries underneath.

oneMKL is part of oneAPI.

| User Application | oneMKL Layer | Third-Party Library | Hardware Backend |
|------------------|--------------|---------------------|------------------|
| oneMKL interface | oneMKL selector | Intel(R) oneAPI Math Kernel Library (oneMKL) | x86 CPU, Intel GPU |
| | | NVIDIA cuBLAS | NVIDIA GPU |
| | | NVIDIA cuSOLVER | NVIDIA GPU |
| | | NVIDIA cuRAND | NVIDIA GPU |
| | | NVIDIA cuFFT | NVIDIA GPU |
| | | NETLIB LAPACK | x86 CPU |
| | | AMD rocBLAS | AMD GPU |
| | | AMD rocSOLVER | AMD GPU |
| | | AMD rocRAND | AMD GPU |
| | | AMD rocFFT | AMD GPU |
| | | portBLAS | x86 CPU, Intel GPU, NVIDIA GPU, AMD GPU |
| | | portFFT | x86 CPU, Intel GPU, NVIDIA GPU, AMD GPU |

Support and Requirements

Supported Usage Models:

Host API

There are two oneMKL selector layer implementations:

  • Run-time dispatching: The application is linked with the oneMKL library and the required backend is loaded at run-time based on device vendor (all libraries should be dynamic).

    Example of app.cpp with run-time dispatching:

    #include "oneapi/mkl.hpp"
    
    ...
    sycl::device cpu_dev = sycl::device(sycl::cpu_selector());
    sycl::device gpu_dev = sycl::device(sycl::gpu_selector());
    
    sycl::queue cpu_queue(cpu_dev);
    sycl::queue gpu_queue(gpu_dev);
    
    oneapi::mkl::blas::column_major::gemm(cpu_queue, transA, transB, m, ...);
    oneapi::mkl::blas::column_major::gemm(gpu_queue, transA, transB, m, ...);

    How to build an application with run-time dispatching:

    On Linux, use the icpx compiler; on Windows, use the icx compiler. Linux example:

    $> icpx -fsycl -c -I$ONEMKL/include app.cpp
    $> icpx -fsycl app.o -L$ONEMKL/lib -lonemkl
  • Compile-time dispatching: The application uses a templated backend selector API where the template parameters specify the required backends and third-party libraries and the application is linked with the required oneMKL backend wrapper libraries (libraries can be static or dynamic).

    Example of app.cpp with compile-time dispatching:

    #include "oneapi/mkl.hpp"
    
    ...
    sycl::device cpu_dev = sycl::device(sycl::cpu_selector());
    sycl::device gpu_dev = sycl::device(sycl::gpu_selector());
    
    sycl::queue cpu_queue(cpu_dev);
    sycl::queue gpu_queue(gpu_dev);
    
    oneapi::mkl::backend_selector<oneapi::mkl::backend::mklcpu> cpu_selector(cpu_queue);
    
    oneapi::mkl::blas::column_major::gemm(cpu_selector, transA, transB, m, ...);
    oneapi::mkl::blas::column_major::gemm(oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas> {gpu_queue}, transA, transB, m, ...);

    How to build an application with compile-time dispatching:

    $> clang++ -fsycl -c -I$ONEMKL/include app.cpp
    $> clang++ -fsycl app.o -L$ONEMKL/lib -lonemkl_blas_mklcpu -lonemkl_blas_cublas

Refer to Selecting a Compiler for the choice between icpx/icx and clang++ compilers.

Device API

The header-based and backend-independent Device API can be called within a SYCL kernel or from host code (device-rng-usage-model-example). Currently, the following domains support the Device API:

  • RNG. To use the RNG Device API functionality, include the oneapi/mkl/rng/device.hpp header file. A sketch of the usage model follows.
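
Below is a minimal sketch of kernel-side generation with the Device API, assuming the engine and distribution names from the oneMKL specification (oneapi::mkl::rng::device::philox4x32x10, uniform, generate); treat it as an illustration rather than a verbatim excerpt from the project's examples:

    #include <sycl/sycl.hpp>
    #include "oneapi/mkl/rng/device.hpp"

    int main() {
        sycl::queue q;
        constexpr std::size_t n = 1024;
        float* r = sycl::malloc_shared<float>(n, q);

        q.parallel_for(sycl::range<1>(n), [=](sycl::item<1> item) {
            std::size_t i = item.get_id(0);
            // Each work-item constructs its own engine, offset by its global id,
            // so that work-items draw from independent subsequences.
            oneapi::mkl::rng::device::philox4x32x10<> engine(42, i);
            oneapi::mkl::rng::device::uniform<float> distr(0.0f, 1.0f);
            r[i] = oneapi::mkl::rng::device::generate(distr, engine);
        }).wait();

        sycl::free(r, q);
        return 0;
    }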

Supported Configurations:

Supported domains include: BLAS, LAPACK, RNG, DFT, SPARSE_BLAS

Supported compilers include:

  • Intel(R) oneAPI DPC++ Compiler: Intel proprietary compiler that supports CPUs and Intel GPUs. Intel(R) oneAPI DPC++ Compiler will be referred to as "Intel DPC++" in the "Supported Compiler" column of the tables below.
  • oneAPI DPC++ Compiler: Open source compiler that supports CPUs and Intel, NVIDIA, and AMD GPUs. oneAPI DPC++ Compiler will be referred to as "Open DPC++" in the "Supported Compiler" column of the tables below.
  • AdaptiveCpp Compiler (formerly known as hipSYCL): Open source compiler that supports CPUs and Intel, NVIDIA, and AMD GPUs.
    Note: The source code and some documents in this project still use the previous name hipSYCL during this transition period.

Linux*

| Domain | Backend | Library | Supported Compiler | Supported Link Type |
|--------|---------|---------|--------------------|---------------------|
| BLAS | x86 CPU | Intel(R) oneMKL | Intel DPC++, AdaptiveCpp | Dynamic, Static |
| BLAS | x86 CPU | NETLIB LAPACK | Intel DPC++, Open DPC++, AdaptiveCpp | Dynamic, Static |
| BLAS | x86 CPU | portBLAS | Intel DPC++, Open DPC++ | Dynamic, Static |
| BLAS | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| BLAS | Intel GPU | portBLAS | Intel DPC++, Open DPC++ | Dynamic, Static |
| BLAS | NVIDIA GPU | NVIDIA cuBLAS | Open DPC++, AdaptiveCpp | Dynamic, Static |
| BLAS | NVIDIA GPU | portBLAS | Open DPC++ | Dynamic, Static |
| BLAS | AMD GPU | AMD rocBLAS | Open DPC++, AdaptiveCpp | Dynamic, Static |
| BLAS | AMD GPU | portBLAS | Open DPC++ | Dynamic, Static |
| LAPACK | x86 CPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| LAPACK | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| LAPACK | NVIDIA GPU | NVIDIA cuSOLVER | Open DPC++ | Dynamic, Static |
| LAPACK | AMD GPU | AMD rocSOLVER | Open DPC++ | Dynamic, Static |
| RNG | x86 CPU | Intel(R) oneMKL | Intel DPC++, AdaptiveCpp | Dynamic, Static |
| RNG | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| RNG | NVIDIA GPU | NVIDIA cuRAND | Open DPC++, AdaptiveCpp | Dynamic, Static |
| RNG | AMD GPU | AMD rocRAND | Open DPC++, AdaptiveCpp | Dynamic, Static |
| DFT | x86 CPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| DFT | x86 CPU | portFFT (limited API support) | Intel DPC++ | Dynamic, Static |
| DFT | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| DFT | Intel GPU | portFFT (limited API support) | Intel DPC++ | Dynamic, Static |
| DFT | NVIDIA GPU | NVIDIA cuFFT | Open DPC++ | Dynamic, Static |
| DFT | NVIDIA GPU | portFFT (limited API support) | Open DPC++ | Dynamic, Static |
| DFT | AMD GPU | AMD rocFFT | Open DPC++ | Dynamic, Static |
| DFT | AMD GPU | portFFT (limited API support) | Open DPC++ | Dynamic, Static |
| SPARSE_BLAS | x86 CPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| SPARSE_BLAS | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |

Windows*

| Domain | Backend | Library | Supported Compiler | Supported Link Type |
|--------|---------|---------|--------------------|---------------------|
| BLAS | x86 CPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| BLAS | x86 CPU | NETLIB LAPACK | Intel DPC++, Open DPC++ | Dynamic, Static |
| BLAS | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| LAPACK | x86 CPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| LAPACK | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| RNG | x86 CPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |
| RNG | Intel GPU | Intel(R) oneMKL | Intel DPC++ | Dynamic, Static |

Hardware Platform Support

  • CPU
    • Intel Atom(R) Processors
    • Intel(R) Core(TM) Processor Family
    • Intel(R) Xeon(R) Processor Family
  • Accelerators
    • Intel(R) Arc(TM) A-Series Graphics
    • Intel(R) Data Center GPU Max Series
    • NVIDIA(R) A100 (Linux* only)
    • AMD(R) GPUs (see here); tested on AMD Vega 20 (gfx906)

Supported Operating Systems

Linux*

| Backend | Supported Operating System |
|---------|----------------------------|
| x86 CPU | Red Hat Enterprise Linux* 9 (RHEL* 9) |
| Intel GPU | Ubuntu 22.04 LTS |
| NVIDIA GPU | Ubuntu 22.04 LTS |

Windows*

| Backend | Supported Operating System |
|---------|----------------------------|
| x86 CPU | Microsoft Windows* Server 2022 |
| Intel GPU | Microsoft Windows* 11 |

Software Requirements

What should I download?

General:

| Functional Testing | Build Only | Documentation |
|--------------------|------------|---------------|
| CMake (version 3.13 or newer) | CMake (version 3.13 or newer) | CMake (version 3.13 or newer) |
| Linux*: GNU* GCC 5.1 or higher; Windows*: MSVS* 2017 or MSVS* 2019 (version 16.5 or newer) | Linux*: GNU* GCC 5.1 or higher; Windows*: MSVS* 2017 or MSVS* 2019 (version 16.5 or newer) | Linux*: GNU* GCC 5.1 or higher; Windows*: MSVS* 2017 or MSVS* 2019 (version 16.5 or newer) |
| Ninja (optional) | Ninja (optional) | Ninja (optional) |
| GNU* FORTRAN Compiler | - | Sphinx |
| NETLIB LAPACK | - | - |

Hardware and OS Specific:

| Operating System | Device | Package |
|------------------|--------|---------|
| Linux*/Windows* | x86 CPU | Intel(R) oneAPI DPC++ Compiler or oneAPI DPC++ Compiler; Intel(R) oneAPI Math Kernel Library |
| Linux*/Windows* | Intel GPU | Intel(R) oneAPI DPC++ Compiler; Intel GPU driver; Intel(R) oneAPI Math Kernel Library |
| Linux* only | NVIDIA GPU | oneAPI DPC++ Compiler or AdaptiveCpp with CUDA backend and dependencies |
| Linux* only | AMD GPU | oneAPI DPC++ Compiler or AdaptiveCpp with ROCm backend and dependencies |

Product and Version Information:

| Product | Supported Version | License |
|---------|-------------------|---------|
| CMake | 3.13 or higher | The OSI-approved BSD 3-clause License |
| Ninja | 1.10.0 | Apache License v2.0 |
| GNU* FORTRAN Compiler | 7.4.0 or higher | GNU General Public License, version 3 |
| Intel(R) oneAPI DPC++ Compiler | Latest | End User License Agreement for the Intel(R) Software Development Products |
| AdaptiveCpp | Later than 2cfa530 | BSD-2-Clause License |
| oneAPI DPC++ Compiler binary for x86 CPU | Daily builds | Apache License v2 |
| oneAPI DPC++ Compiler source for NVIDIA and AMD GPUs | Daily source releases | Apache License v2 |
| Intel(R) oneAPI Math Kernel Library | Latest | Intel Simplified Software License |
| NVIDIA CUDA SDK | 12.0 | End User License Agreement |
| AMD rocBLAS | 4.5 | AMD License |
| AMD rocRAND | 5.1.0 | AMD License |
| AMD rocSOLVER | 5.0.0 | AMD License |
| AMD rocFFT | rocm-5.4.3 | AMD License |
| NETLIB LAPACK | 5d4180c | BSD like license |
| portBLAS | 0.1 | Apache License v2.0 |
| portFFT | 0.1 | Apache License v2.0 |

Documentation


Contributing

See CONTRIBUTING for more information.


License

Distributed under the Apache License 2.0. See LICENSE for more information.


FAQs

oneMKL

Q: What is the difference between the following oneMKL items?

A:

  • The oneAPI Specification for oneMKL defines the DPC++ interfaces for performance math library functions. The oneMKL specification can evolve faster and more frequently than implementations of the specification.

  • The oneAPI Math Kernel Library (oneMKL) Interfaces Project is an open source implementation of the specification. The project goal is to demonstrate how the DPC++ interfaces documented in the oneMKL specification can be implemented for any math library and work for any target hardware. While the implementation provided here may not yet be the full implementation of the specification, the goal is to build it out over time. We encourage the community to contribute to this project and help to extend support to multiple hardware targets and other math libraries.

  • The Intel(R) oneAPI Math Kernel Library (oneMKL) product is the Intel product implementation of the specification (with DPC++ interfaces) as well as similar functionality with C and Fortran interfaces, and is provided as part of Intel® oneAPI Base Toolkit. It is highly optimized for Intel CPU and Intel GPU hardware.

Q: I'm trying to use oneMKL Interfaces in my project using FetchContent, but I keep running into an "ONEMKL::SYCL::SYCL target was not found" problem when I try to build the project. What should I do?

A: Make sure you set the compiler when you configure your project. E.g. cmake -Bbuild . -DCMAKE_CXX_COMPILER=icpx.

Q: I'm trying to use oneMKL Interfaces in my project using find_package(oneMKL). I set up the oneMKL/oneTBB and compiler environment first, then built and installed oneMKL Interfaces, and finally tried to build my project against the installed oneMKL Interfaces (e.g. cmake -Bbuild -GNinja -DCMAKE_CXX_COMPILER=icpx -DoneMKL_ROOT=<path_to_installed_oneMKL_interfaces> .). I noticed that CMake adds the installed oneMKL Interfaces headers as system includes, which end up with lower priority than the oneMKL package includes I set earlier when building oneMKL Interfaces. As a result, I get conflicts between the oneMKL headers and the installed oneMKL Interfaces headers. What should I do?

A: Adding the installed oneMKL Interfaces headers with -I instead of as system includes (with -isystem) resolves this problem. We use INTERFACE_INCLUDE_DIRECTORIES to add the paths to the installed oneMKL Interfaces headers (check oneMKLTargets.cmake in lib/cmake to find it). It is a known limitation that INTERFACE_INCLUDE_DIRECTORIES adds header paths as system includes. To avoid that:

  • Option 1: Use CMake >= 3.25. In this case, oneMKL Interfaces is built with the EXPORT_NO_SYSTEM property set to true and you won't see the issue.
  • Option 2: If you use CMake < 3.25, set PROPERTIES NO_SYSTEM_FROM_IMPORTED true for your target, e.g. set_target_properties(test PROPERTIES NO_SYSTEM_FROM_IMPORTED true).

onemkl's People

Contributors

aacostadiaz, aelizaro, aidanbeltons, akabalov, andrewtbarker, andreyfe1, dnhsieh-intel, ericlars, fitchbe, fmarno, hdelan, hjabird, jasukhar, jle-quel, mcao59, mkrainiuk, mmeterel, muhammad-tanvir-1211, nadyaten, npmiller, ouadielfarouki, pasaulais, pgorlani, rbiessy, s-nick, sbalint98, sknepper, tejax-alaghari, vmalia, vrpascuzzi


onemkl's Issues

Overriding CMAKE_BUILD_TYPE

The following code:

oneMKL/CMakeLists.txt

Lines 26 to 32 in 7f40c12

option(BUILD_DEBUG "" OFF)
if(BUILD_DEBUG)
  set(CMAKE_BUILD_TYPE "Debug")
else()
  set(CMAKE_BUILD_TYPE "Release")
endif()

overrides the CMAKE_BUILD_TYPE passed on the command line and expects users to use the BUILD_DEBUG flag instead. This is not the way CMake users typically expect to build programs; some might say it is not idiomatic. In practice, it causes conflicts with Jenkins scripts and with user expectations about how the project is built.

Is there any particular reason for this? Otherwise, CMAKE_BUILD_TYPE should be enough to cover this use case...

Unit tests on Cuda device deplete the device memory

Summary

When running the unit tests on a Cuda device the tests fail since the GPU runs out of memory.

I am trying to run the tests on a gtx1080Ti with 11178MiB of global memory, but after executing the first few tests, a runtime exception is thrown because of insufficient device memory (CUDA_ERROR_OUT_OF_MEMORY) (see log below)

Version

The current oneMKL develop head is used, e.g. 1ed12c7

Environment

  • HW you use
    Intel Gold 6130 CPU with Nvidia gtx1080 GPUs
  • Backend library version
    Cuda 10.0
    MKL, and TBB obtained via intel installer version 2021.2.0
  • OS name and version
    Ubuntu 20.04 (fakeroot singularity container)
  • Compiler version
    dpc++ compiler cloned from develop with hash: 4e26734cb87c451e0562559d5d6f83b7eabcaea3
    compiled with:
    buildbot/configure.py --cuda
    and buildbot/compile.py
  • CMake
    cmake.md

Steps to reproduce

Let the CUDA-enabled DPC++ compiler be installed in <cuda-DPC++-dir>.
Configure and build oneMKL:

LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/ \
CXX=<cuda-DPC++-dir>/bin/clang++ \
CC=<cuda-DPC++-dir>/bin/clang cmake  \
-DCMAKE_BUILD_TYPE=Debug \
-DTBB_ROOT=/opt/intel/oneapi/tbb/2021.2.0/ \
-DMKL_ROOT=/opt/intel/oneapi/mkl/2021.2.0/ \
-DENABLE_CUBLAS_BACKEND=ON \
-DENABLE_CURAND_BACKEND=OFF \
-DENABLE_MKLGPU_BACKEND=OFF \
-DCMAKE_INSTALL_PREFIX=/home/sbalint/hipSYCL-main/oneMKL-install/ \
..
LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/ make -j 64
LD_LIBRARY_PATH=<cuda-DPC++-dir>/lib/:$LD_LIBRARY_PATH bin/test_main_blas_ct

Observed behavior

After the first few tests, all GPU test fail because of CUDA_ERROR_OUT_OF_MEMORY. Checking nvidia-smi while running the tests confirms that the allocated memory is continuously increasing over time. Possible memory leak?
cuda_test_out.log

Expected behavior

GPU tests shouldn't fail because of a lack of device memory

os.errno deprecated as of Python 3.7

Summary

The helper scripts for template generation -- e.g. scripts/generate_wrappers.py -- use os.errno which has been deprecated as of Python 3.7. Moreover, according to this Python bug tracker discussion, os.errno should not be used; rather, the errno module is recommended.

Version

0ecbf93f6c

Environment

Python 3.8.5

Steps to reproduce

python3.8 \
scripts/generate_wrappers.py \
include/oneapi/mkl/rng/detail/curand/onemkl_rng_newbackend.hpp \
src/rng/function_table.hpp \
src/rng/backends/newbackend/mkl_rng_newbackend_wrappers.cpp \
newbackend

Observed behavior

Running the above step results in:

<...>
Generate src/rng/backends/curand/wrappers.cpp
Formatting with clang-format src/rng/backends/curand/wrappers.cpp
Generate src/rng/backends/curand/curand_wrappers_table_dyn.cpp
Traceback (most recent call last):
  File "/devel/src/intel/onemkl/scripts/generate_wrappers.py", line 127, in <module>
    os.makedirs(os.path.dirname(table_file))
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'src/rng/backends/curand'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/devel/src/intel/onemkl/scripts/generate_wrappers.py", line 129, in <module>
    if exc.errno != os.errno.EEXIST:
AttributeError: module 'os' has no attribute 'errno'

Expected behavior

After importing errno and replacing os.errno with errno, running the above "Steps to reproduce" should output:

<...>
Generate src/rng/backends/curand/wrappers.cpp
Formatting with clang-format src/rng/backends/curand/wrappers.cpp
Generate src/rng/backends/curand/curand_wrappers_table_dyn.cpp
Formatting with clang-format src/rng/backends/curand/curand_wrappers_table_dyn.cpp

Proposed solution

Use errno rather than os.errno. This should not require a change in the Python version (3.6) requirement.

Compilation error when building cuRAND or cuBLAS tests

Summary

When compiling oneMKL with tests and either the cuRAND or cuBLAS backend enabled, a compilation error occurs.

Version

The current oneMKL develop head is used, e.g. 1ed12c7

Environment

  • HW you use
    Intel Gold 6130 CPU with Nvidia gtx1080 GPUs
  • Backend library version
    Cuda 10.0
    MKL, and TBB obtained via intel installer version 2021.1.1
  • OS name and version
    Ubuntu 20.04
  • Compiler version
    dpc++ compiler cloned from develop with hash: 4e26734cb87c451e0562559d5d6f83b7eabcaea3
    compiled with buildbot/configure.py --cuda and buildbot/compile.py

Steps to reproduce

git clone https://github.com/oneapi-src/oneMKL.git
mkdir build && cd build

LD_LIBRARY_PATH=/root/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:$LD_LIBRARY_PATH \
CXX=/root/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang++ \
CC=/root/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang \
cmake -G Ninja \
-DCMAKE_BUILD_TYPE=Debug \
-DTBB_ROOT=/root/hipSYCL-main/dpc++/tbb/latest \
-DMKL_ROOT=/root/hipSYCL-main/dpc++/mkl/latest \
-DREF_BLAS_ROOT=/root/spack/opt/spack/linux-centos7-skylake_avx512/gcc-9.3.1/openblas-0.3.14-npb5lv7dhfygc3lgh6zx3x6chlyt4kth/ \
-DENABLE_CUBLAS_BACKEND=OFF \
-DENABLE_CURAND_BACKEND=ON \
-DENABLE_MKLGPU_BACKEND=OFF ..

LD_LIBRARY_PATH=/root/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:$LD_LIBRARY_PATH ninja

Observed behavior

When either of ENABLE_CURAND_BACKEND and ENABLE_CUBLAS_BACKEND is enabled on its own, the compilation fails. I believe this can be traced back to the following issues:

  • In case ENABLE_CURAND_BACKEND=ON ENABLE_CUBLAS_BACKEND=OFF
    The compilation terminates with an error. I suspect this is caused by the code in test_helper.hpp lines 70-81. When ENABLE_CURAND_BACKEND is defined, the compilation fails since TEST_RUN_NVIDIAGPU_CURAND_SELECT will be defined with the backend selector oneapi::mkl::backend::curand, and no such blas functions are defined in blas_ct_backends.hpp.
    compile_error_curand.log

  • In case ENABLE_CURAND_BACKEND=OFF ENABLE_CUBLAS_BACKEND=ON
    The compilation fails since the cuRAND tests are compiled with the cuBLAS backend selector.
    compile_error_cublas.log
    A possible workaround, in case only cuBLAS is of interest, is to comment out adding the rng domain in the root-level CMakeLists.txt. In that case the compilation is successful; only a few SYCL 2020 deprecation warnings are displayed.

Expected behavior

All combinations should compile successfully. I believe a possible fix might be to use a single CUDA backend selector instead of separate cuBLAS and cuRAND ones?

oneapi::mkl::lapack::getrf pivot vectors are wrong

Summary

The oneapi::mkl::lapack::getrf routine takes a std::int64_t pointer for the pivots, but the pivot output seems wrong: it looks like the pivots are generated as 32-bit numbers. The corresponding getrs code works with the output from oneapi::mkl::lapack::getrf.

Version

oneapi/2020.12.15.005

Environment

  • Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz

Steps to reproduce

#include <iostream>
#include <CL/sycl.hpp>
#include "oneapi/mkl.hpp"

int main(int argc, char* argv[]) {
  cl::sycl::queue q(cl::sycl::host_selector{});
  int n = 10;
  std::vector<double> A(n*n);
  for (int i=0; i<n; i++) A[i+i*n] = 1.;
  auto iwork = oneapi::mkl::lapack::getrf_scratchpad_size<double>(q, n, n, n);
  std::vector<double> work(iwork);
  std::vector<std::int64_t> piv(n);
  oneapi::mkl::lapack::getrf(q, n, n, A.data(), n, piv.data(), work.data(), iwork);
  for (auto p : piv) std::cout << p << " ";
  std::cout << std::endl;
  for (int i=0; i<n; i++) std::cout << reinterpret_cast<int*>(piv.data())[i] << " ";
  std::cout << std::endl;
}

Observed behavior

Output of the above code:
8589934593 17179869187 25769803781 34359738375 42949672969 0 0 0 0 0
1 2 3 4 5 6 7 8 9 10

Note that 8589934593 = 0x0000000200000001, i.e. the 32-bit pivots 1 and 2 packed into a single 64-bit element; that is why reinterpreting the array as int recovers 1 through 10.

Expected behavior

1 2 3 4 5 6 7 8 9 10

Investigate switching to external oneMKL DPC++ APIs for mklcpu/mklgpu backend

Summary

This issue relates to the Intel oneMKL symbols used in the mklgpu backend. Currently, internal symbols are used to call oneMKL routines. However, these symbols can change between releases, which will cause build failures in the oneMKL open-source interfaces.

Problem statement

The internal symbols can change between releases and cause build failures.

Details

Switching to external Intel oneMKL APIs will improve stability of the repo between releases.

Compilation error for `mklgpu/mklgpu_batch.cpp`

Summary

Using the name "MAJOR" here

cgh.single_task<class MAJOR>([]() {});

causes compilation problems with the version of DPC++ I have (Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.x.0.20211025)). It looks like the issue is that there's a class named MAJOR, which becomes MKL_COL_MAJOR via #define MAJOR MKL_COL_MAJOR, colliding with an enum member defined as MKL_COL_MAJOR:

typedef enum { MKL_ROW_MAJOR = 101, MKL_COL_MAJOR = 102 } MKL_LAYOUT;

As a workaround I changed class MAJOR to "class test" and it compiled fine. If there's a better workaround or I messed something else up, let me know!

Version

oneMKL git commit d06919ca0f1c675b170df94259f319eb1d020d5c

Environment

  • HW you use: Intel GPU
  • Backend library version: MKL
  • OS name and version: openSUSE 15.3
  • Compiler version: DPC++ Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.x.0.20211025)
  • CMake output log: See output below

Steps to reproduce


    git clone https://github.com/oneapi-src/oneMKL.git
    cd oneMKL
    rm -rf build
    mkdir build
    cd build
    CXX=`which dpcpp` cmake ../ -DMKL_ROOT=$MKLROOT -DREF_BLAS_ROOT=${blas_root} -DREF_LAPACK_ROOT=${blas_root} -DSYCL_LIBRARY=${SDK_ROOT}/compiler/latest/linux/lib/libsycl.so
    cmake --build . -j1

Observed behavior

When I try to build it fails with:

[  4%] Building CXX object bin/blas/backends/mklgpu/CMakeFiles/onemkl_blas_mklgpu_obj.dir/mklgpu_batch.cpp.o
In file included from /gpfs/jlse-fs0/users/bertoni/oneMKL/oneMKL/src/blas/backends/mklgpu/mklgpu_batch.cpp:34:
/gpfs/jlse-fs0/users/bertoni/oneMKL/oneMKL/src/blas/backends/mklgpu/mklgpu_batch.cxx:84:50: error: 'MKL_COL_MAJOR' does not refer to a value
    ::oneapi::mkl::gpu::sgemv_batch_sycl(&queue, MAJOR, mkl_convert(transa), m, n, alpha, &a, lda,
                                                 ^
/gpfs/jlse-fs0/users/bertoni/oneMKL/oneMKL/src/blas/backends/mklgpu/mklgpu_batch.cpp:33:15: note: expanded from macro 'MAJOR'
#define MAJOR MKL_COL_MAJOR
              ^
/tmp/mklgpu_batch-header-7d3491.h:8:7: note: declared here
class MKL_COL_MAJOR;
      ^

Expected behavior

I expect it to compile.

oneapi-src/oneMKL and RNG

[Possibly related to #8.]

I am having a troubling experience with oneAPI and oneMKL. To build my application against oneMKL from oneAPI/beta07 using intel/llvm (1b762a8), it seems that I need

set(LINK_LIBS
  mkl_sycl mkl_intel_ilp64 mkl_tbb_thread
  mkl_core tbb sycl OpenCL pthread m dl)

[UPDATE]
I can actually build with just:

mkl_sycl mkl_intel_ilp64 mkl_tbb_thread mkl_core tbb

At first, I thought I had found the perfect configuration, but it turns out that when running a simple unit test, top shows ~14GB of virtual memory being used, since I'm pulling in all these additional libraries.

I thought I'd build oneMKL from this repo against the oneAPI/beta07 oneMKL -- i.e. setting MKL_ROOT to the oneAPI/<...>/oneMKL install directory -- to see if that'd help. However, I can't even compile against the built library:

$ cmake --build .
[1/12] Linking CXX shared library x86_64-centos7-clang110-opt/lib/libSyclRng.so
FAILED: x86_64-centos7-clang110-opt/lib/libSyclRng.so 
: && /opt/modulefiles/dpcpp/dpcpp_wrapper/clang++ -fPIC -g -O2 -fsycl -std=c++17 -Wno-unknown-cuda-version -O2 -g -DNDEBUG  -Wl,--as-needed -Wl,--no-undefined -Wl,-z,max-page-size=0x1000 -Wl,--hash-style=both -shared -Wl,-soname,libSyclRng.so -o x86_64-centos7-clang110-opt/lib/libSyclRng.so FastCaloSycl/SyclRng/CMakeFiles/SyclRng.dir/src/SimHitRng.cxx.o -L/bld4/atlas/root/v6-14-08_gcc93/lib -Wl,-rpath,/bld4/atlas/root/v6-14-08_gcc93/lib:/home/vrpascuzzi/atlas/dev/build.FastCaloSim-GPU.dpcpp/x86_64-centos7-clang110-opt/lib:  x86_64-centos7-clang110-opt/lib/libSyclCommon.so  -lonemkl  -lonemkl_blas_mklcpu  -lonemkl_blas_mklgpu && :
/tmp/SimHitRng-00fcad.o: In function `SimHitRng::Dealloc()':
/home/vrpascuzzi/atlas/dev/source/FastCaloSim-GPU/FastCaloSimAnalyzer/FastCaloSycl/SyclRng/src/SimHitRng.cxx:105: undefined reference to `mkl::rng::philox4x32x10::~philox4x32x10()'
/tmp/SimHitRng-00fcad.o: In function `SimHitRng::Init(unsigned int, unsigned short, unsigned long, unsigned long long)':
/home/vrpascuzzi/atlas/dev/source/FastCaloSim-GPU/FastCaloSimAnalyzer/FastCaloSycl/SyclRng/src/SimHitRng.cxx:47: undefined reference to `mkl::rng::philox4x32x10::philox4x32x10(cl::sycl::queue&, unsigned long)'
/tmp/SimHitRng-00fcad.o: In function `SimHitRng::Generate(unsigned int)':
/home/vrpascuzzi/atlas/dev/source/FastCaloSim-GPU/FastCaloSimAnalyzer/FastCaloSycl/SyclRng/src/SimHitRng.cxx:58: undefined reference to `cl::sycl::event mkl::rng::generate<mkl::rng::uniform<float, mkl::rng::uniform_method::standard>, mkl::rng::philox4x32x10>(mkl::rng::uniform<float, mkl::rng::uniform_method::standard> const&, mkl::rng::philox4x32x10&, long, mkl::rng::uniform<float, mkl::rng::uniform_method::standard>::result_type*, std::vector<cl::sycl::event, std::allocator<cl::sycl::event> > const&)'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

using now:

set(LINK_LIBS onemkl onemkl_blas_mklcpu onemkl_blas_mklgpu)

Can someone shed some light on this for me? How should I be compiling against oneMKL to get access to the RNGs?

is MKL now open?

It appears you have open-sourced the MKL library for sgemm. Thanks.

PyPI package mkl==2021.1.1 takes up too much space

Summary

The MKL library installed via pip3 install mkl==2021.1.1 has duplicate shared library (.so) entries and takes up too much space on disk, which, when built into Docker images, significantly increases the final image size.

Version

mkl == 2021.1.1 release

Environment

Ubuntu Linux 18.04/20.04
Python3.6/Python3.8
pip3 20.3.1

Steps to reproduce

pip3 install mkl==2021.1.1
pip3 show -f mkl 
ll /usr/local/lib/libmkl_*  -lh

Observed and Expected behavior

ls -lh shows the following files

-rwxr-xr-x 1 root root  47M Dec 10 20:19 /usr/local/lib/libmkl_avx2.so.1*
-rwxr-xr-x 1 root root  64M Dec 10 20:19 /usr/local/lib/libmkl_avx512_mic.so.1*
-rwxr-xr-x 1 root root  61M Dec 10 20:19 /usr/local/lib/libmkl_avx512.so.1*
-rwxr-xr-x 1 root root  50M Dec 10 20:19 /usr/local/lib/libmkl_avx.so.1*
-rwxr-xr-x 1 root root 513K Dec 10 20:19 /usr/local/lib/libmkl_blacs_intelmpi_ilp64.so*
-rwxr-xr-x 1 root root 513K Dec 10 20:19 /usr/local/lib/libmkl_blacs_intelmpi_ilp64.so.1*
-rwxr-xr-x 1 root root 310K Dec 10 20:19 /usr/local/lib/libmkl_blacs_intelmpi_lp64.so*
-rwxr-xr-x 1 root root 310K Dec 10 20:19 /usr/local/lib/libmkl_blacs_intelmpi_lp64.so.1*
-rwxr-xr-x 1 root root 514K Dec 10 20:19 /usr/local/lib/libmkl_blacs_openmpi_ilp64.so*
-rwxr-xr-x 1 root root 514K Dec 10 20:19 /usr/local/lib/libmkl_blacs_openmpi_ilp64.so.1*
-rwxr-xr-x 1 root root 315K Dec 10 20:19 /usr/local/lib/libmkl_blacs_openmpi_lp64.so*
-rwxr-xr-x 1 root root 315K Dec 10 20:19 /usr/local/lib/libmkl_blacs_openmpi_lp64.so.1*
-rwxr-xr-x 1 root root 513K Dec 10 20:19 /usr/local/lib/libmkl_blacs_sgimpt_ilp64.so*
-rwxr-xr-x 1 root root 513K Dec 10 20:19 /usr/local/lib/libmkl_blacs_sgimpt_ilp64.so.1*
-rwxr-xr-x 1 root root 310K Dec 10 20:19 /usr/local/lib/libmkl_blacs_sgimpt_lp64.so*
-rwxr-xr-x 1 root root 310K Dec 10 20:19 /usr/local/lib/libmkl_blacs_sgimpt_lp64.so.1*
-rwxr-xr-x 1 root root 166K Dec 10 20:19 /usr/local/lib/libmkl_cdft_core.so*
-rwxr-xr-x 1 root root 166K Dec 10 20:19 /usr/local/lib/libmkl_cdft_core.so.1*
-rwxr-xr-x 1 root root 129M Dec 10 20:19 /usr/local/lib/libmkl_core.so*
-rwxr-xr-x 1 root root 129M Dec 10 20:19 /usr/local/lib/libmkl_core.so.1*
-rwxr-xr-x 1 root root  40M Dec 10 20:19 /usr/local/lib/libmkl_def.so.1*
-rwxr-xr-x 1 root root  12M Dec 10 20:19 /usr/local/lib/libmkl_gf_ilp64.so*
-rwxr-xr-x 1 root root  12M Dec 10 20:19 /usr/local/lib/libmkl_gf_ilp64.so.1*
-rwxr-xr-x 1 root root  13M Dec 10 20:19 /usr/local/lib/libmkl_gf_lp64.so*
-rwxr-xr-x 1 root root  13M Dec 10 20:19 /usr/local/lib/libmkl_gf_lp64.so.1*
-rwxr-xr-x 1 root root  30M Dec 10 20:19 /usr/local/lib/libmkl_gnu_thread.so*
-rwxr-xr-x 1 root root  30M Dec 10 20:19 /usr/local/lib/libmkl_gnu_thread.so.1*
-rwxr-xr-x 1 root root  12M Dec 10 20:19 /usr/local/lib/libmkl_intel_ilp64.so*
-rwxr-xr-x 1 root root  12M Dec 10 20:19 /usr/local/lib/libmkl_intel_ilp64.so.1*
-rwxr-xr-x 1 root root  13M Dec 10 20:19 /usr/local/lib/libmkl_intel_lp64.so*
-rwxr-xr-x 1 root root  13M Dec 10 20:19 /usr/local/lib/libmkl_intel_lp64.so.1*
-rwxr-xr-x 1 root root  62M Dec 10 20:19 /usr/local/lib/libmkl_intel_thread.so*
-rwxr-xr-x 1 root root  62M Dec 10 20:19 /usr/local/lib/libmkl_intel_thread.so.1*
-rwxr-xr-x 1 root root  48M Dec 10 20:19 /usr/local/lib/libmkl_mc3.so.1*
-rwxr-xr-x 1 root root  46M Dec 10 20:19 /usr/local/lib/libmkl_mc.so.1*
-rwxr-xr-x 1 root root  40M Dec 10 20:19 /usr/local/lib/libmkl_pgi_thread.so*
-rwxr-xr-x 1 root root  40M Dec 10 20:19 /usr/local/lib/libmkl_pgi_thread.so.1*
-rwxr-xr-x 1 root root 6.8M Dec 10 20:19 /usr/local/lib/libmkl_rt.so*
-rwxr-xr-x 1 root root 6.8M Dec 10 20:19 /usr/local/lib/libmkl_rt.so.1*
-rwxr-xr-x 1 root root 7.4M Dec 10 20:19 /usr/local/lib/libmkl_scalapack_ilp64.so*
-rwxr-xr-x 1 root root 7.4M Dec 10 20:19 /usr/local/lib/libmkl_scalapack_ilp64.so.1*
-rwxr-xr-x 1 root root 7.4M Dec 10 20:19 /usr/local/lib/libmkl_scalapack_lp64.so*
-rwxr-xr-x 1 root root 7.4M Dec 10 20:19 /usr/local/lib/libmkl_scalapack_lp64.so.1*
-rwxr-xr-x 1 root root  28M Dec 10 20:19 /usr/local/lib/libmkl_sequential.so*
-rwxr-xr-x 1 root root  28M Dec 10 20:19 /usr/local/lib/libmkl_sequential.so.1*
-rwxr-xr-x 1 root root 617M Dec 10 20:19 /usr/local/lib/libmkl_sycl.so*
-rwxr-xr-x 1 root root 617M Dec 10 20:19 /usr/local/lib/libmkl_sycl.so.1*
-rwxr-xr-x 1 root root  40M Dec 10 20:19 /usr/local/lib/libmkl_tbb_thread.so*
-rwxr-xr-x 1 root root  40M Dec 10 20:19 /usr/local/lib/libmkl_tbb_thread.so.1*
-rwxr-xr-x 1 root root  15M Dec 10 20:19 /usr/local/lib/libmkl_vml_avx2.so.1*
-rwxr-xr-x 1 root root  15M Dec 10 20:19 /usr/local/lib/libmkl_vml_avx512_mic.so.1*
-rwxr-xr-x 1 root root  14M Dec 10 20:19 /usr/local/lib/libmkl_vml_avx512.so.1*
-rwxr-xr-x 1 root root  15M Dec 10 20:19 /usr/local/lib/libmkl_vml_avx.so.1*
-rwxr-xr-x 1 root root 7.4M Dec 10 20:19 /usr/local/lib/libmkl_vml_cmpt.so.1*
-rwxr-xr-x 1 root root 8.3M Dec 10 20:19 /usr/local/lib/libmkl_vml_def.so.1*
-rwxr-xr-x 1 root root  14M Dec 10 20:19 /usr/local/lib/libmkl_vml_mc2.so.1*
-rwxr-xr-x 1 root root  14M Dec 10 20:19 /usr/local/lib/libmkl_vml_mc3.so.1*
-rwxr-xr-x 1 root root  14M Dec 10 20:19 /usr/local/lib/libmkl_vml_mc.so.1*

Many of the libmkl_XXX.so and libmkl_XXX.so.1 pairs share the same checksum; each libmkl_XXX.so should instead be a symbolic link to its corresponding libmkl_XXX.so.1.

And the same problem exists with tbb==2021.1.1.

USM with sycl::half support

Hello,
I built oneMKL with cuBLAS and everything works fine; all the tests passed. But I'm still unable to pass matrices of sycl::half to oneapi::mkl::blas::column_major::gemm when using the USM syntax; the overload is not found. I get the following:

/home/michel/Documents/oneAPI_build/sample/mkl_matmult_usm.cpp:86:13: error: no matching function for call to 'gemm'
[build]             gemm(my_queue, transpose::nontrans, transpose::nontrans, m, n, k, alpha, A.get(), ldA, B.get(), ldB, beta, C.get(), ldC);
[build]             ^~~~
[build] /home/michel/sycl_workspace/deploy/include/oneapi/mkl/blas.hxx:265:20: note: candidate function not viable: no known conversion from 'sycl::detail::half_impl::half *' to 'cl::sycl::buffer<half, 1> &' (aka 'buffer<sycl::detail::half_impl::half, 1> &') for 8th argument
[build] static inline void gemm(cl::sycl::queue &queue, transpose transa, transpose transb, std::int64_t m,
[build]                    ^     ^
[build] /home/michel/sycl_workspace/deploy/include/oneapi/mkl/blas/detail/blas_ct_backends.hxx:417:20: note: candidate function not viable: no known conversion from 'sycl::queue' to 'backend_selector<backend::mklgpu>' for 1st argument
[build] static inline void gemm(backend_selector<backend::BACKEND> selector, transpose transa,
[build]                    ^
...

Am I doing something wrong? sycl::half works with the buffer syntax.

Compile-time error trying to build code and run tests

Summary

I'm seeing a compile-time error when I try to build and run the tests. Any insight or tip about what I'm doing wrong is appreciated!

Version

This is with oneMKL pulled from github, git commit 596ba0a7ce75547698a311f607629bbd4fac03ac

Environment

  • HW you use: Iris Gen9
  • Backend library version: OpenCL
  • OS name and version: linux, openSUSE, 15.2
  • Compiler version:
> dpcpp -v
Intel(R) oneAPI DPC++ Compiler 2021.1 (2020.10.0.1113)

Steps to reproduce

    blas_root=$PWD
    # first get lapack                                                                                                                                                                                                              
    git clone https://github.com/Reference-LAPACK/lapack.git
    cd lapack/
    mkdir build
    cd build/
    cmake -DBUILD_SHARED_LIBS=ON -DCBLAS=ON -DCMAKE_INSTALL_LIBDIR=${blas_root} -DCMAKE_INSTALL_PREFIX=${blas_root} ..
    cmake --build . -j4 --target install
    cd ../..

    git clone https://github.com/oneapi-src/oneMKL.git
    cd oneMKL
    rm -rf build
    mkdir build
    cd build
    CXX=`which dpcpp` cmake ../ -DMKL_ROOT=$MKLROOT -DREF_BLAS_ROOT=${blas_root} -DSYCL_LIBRARY=${SDK_ROOT}/compiler/latest/linux/lib/libsycl.so
    cmake --build . -j4
    ctest

Observed behavior

The code isn't compiling for me. Am I doing something obviously wrong? The build output ends with:

...
Scanning dependencies of target gtest_main
[ 96%] Building CXX object deps/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[ 96%] Building CXX object tests/unit_tests/blas/level2/CMakeFiles/blas_level2_rt.dir/spr_usm.cpp.o
[ 96%] Linking CXX shared library ../../lib/libgtest_main.so
[ 96%] Built target gtest_main
[ 96%] Building CXX object tests/unit_tests/blas/level2/CMakeFiles/blas_level2_rt.dir/spr2_usm.cpp.o
Scanning dependencies of target test_main_rng_ct
[ 96%] Building CXX object tests/unit_tests/CMakeFiles/test_main_rng_ct.dir/main_test.cpp.o
Scanning dependencies of target test_main_blas_ct
[ 96%] Building CXX object tests/unit_tests/CMakeFiles/test_main_blas_ct.dir/main_test.cpp.o
Scanning dependencies of target test_main_rng_rt
[ 96%] Building CXX object tests/unit_tests/CMakeFiles/test_main_rng_rt.dir/main_test.cpp.o
[ 97%] Building CXX object tests/unit_tests/blas/level2/CMakeFiles/blas_level2_rt.dir/spmv_usm.cpp.o
[ 98%] Linking CXX executable ../../bin/test_main_rng_ct

[ FATAL ] /home/bertoni/gpu_tests/testing/source/benchmarks/conformance/r.oneMKL/oneMKL/deps/googletest/include/gtest/internal/gtest-param-util.h:562:: Condition test_param_names.count(param_name) == 0 failed. Duplicate parameterized test name 'Intel_R__Xeon_R__CPU_E3_1585_v5___3_50GHz', in /home/bertoni/gpu_tests/testing/source/benchmarks/conformance/r.oneMKL/oneMKL/tests/unit_tests/rng/statistics_check/uniform.cpp line 98

CMake Error at /soft/packaging/spack-builds/linux-rhel7-x86_64/gcc-9.3.0/cmake-3.18.2-mwdhwbhynfd7dcpegq4iq6xkzvavcmsh/share/cmake-3.18/Modules/GoogleTestAddTests.cmake:77 (message):
  Error running test executable.

    Path: '/home/bertoni/gpu_tests/testing/source/benchmarks/conformance/r.oneMKL/oneMKL/build/bin/test_main_rng_ct'
    Result: Child aborted
    Output:


Call Stack (most recent call first):
  /soft/packaging/spack-builds/linux-rhel7-x86_64/gcc-9.3.0/cmake-3.18.2-mwdhwbhynfd7dcpegq4iq6xkzvavcmsh/share/cmake-3.18/Modules/GoogleTestAddTests.cmake:173 (gtest_discover_tests_impl)


gmake[2]: *** [tests/unit_tests/CMakeFiles/test_main_rng_ct.dir/build.make:133: bin/test_main_rng_ct] Error 1
gmake[2]: *** Deleting file 'bin/test_main_rng_ct'
gmake[1]: *** [CMakeFiles/Makefile2:922: tests/unit_tests/CMakeFiles/test_main_rng_ct.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....

Expected behavior

I expect it to compile and run.

Update sycl:: functionality deprecated for 2020 specification

Summary

Align all sycl:: APIs with the SYCL 2020 spec version.

Problem statement

The llvm compiler has already added the new APIs and deprecated the old ones (example: get_count() for sycl::buffer), but the dpcpp compiler doesn't support the new versions yet (see the sketch after the list below).

Preferred solution

  • Update functionality marked as deprecated in sycl 2020 to the new versions
  • Remove SYCL2020_DISABLE_DEPRECATION_WARNINGS macro
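
As a hedged illustration of the kind of rename involved (not a patch to this repository), here is the deprecated buffer query from the example above next to its SYCL 2020 replacement:

#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::buffer<int, 1> buf{sycl::range<1>(8)};

    // SYCL 1.2.1 spelling, deprecated by SYCL 2020:
    //   auto n = buf.get_count();
    // SYCL 2020 replacement:
    auto n = buf.size();

    std::cout << "buffer holds " << n << " elements\n";
    return 0;
}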

Intel oneAPI Installation Not Working

I have been trying to register myself on Intel's Developer Zone to get access to the oneMKL library, but the download page is sending me back to the registration form every time. Is there any workaround to downloading oneMKL without registering first on Intel?

Couldn't load selected backend

I've compiled intel/llvm (a7ad8b8) for CUDA:

python3 ${LLVM_SRC}/buildbot/configure.py --shared-libs --cmake-gen "Unix Makefiles" --cuda -o .

and using it built oneapi-src/oneMKL (f805087) against oneAPI/mkl/beta08 (beta09 wasn't supported in this revision):

export CXX=`which clang++`
source <path_to>/mkl/2021.1-beta08/env/vars.sh
cmake -DBUILD_FUNCTIONAL_TESTS=OFF  ${ONEMKL_SRC}

I'm using CUDA 10.2 (> 10.0), so it should be compatible. To compile the test program (below):

$ clang++ -fsycl -std=c++17 -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
  -Wno-unknown-cuda-version -DSYCL_TARGET_CUDA test_mkl.cc -lonemkl

the test compiles cleanly, but when running I get:

 $ SYCL_DEVICE_FILTER=cuda:* ./a.out 
Running on: GeForce RTX 2080 SUPER
terminate called after throwing an instance of 'oneapi::mkl::backend_not_found'
  what():  oneMKL: Couldn't load selected backend
Aborted (core dumped)

When adding -lonemkl_rng_mklgpu to the compile line:

$ clang++ -fsycl -std=c++17 -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -Wno-unknown-cuda-version -DSYCL_TARGET_CUDA test_mkl.cc -lonemkl -lonemkl_rng_mklgpu
//bld4/opt/intel/inteloneapi/mkl/2021.1-beta08/lib/intel64/libmkl_sycl.so: undefined reference to `cl::sycl::level0::make_platform(unsigned long)'
//bld4/opt/intel/inteloneapi/mkl/2021.1-beta08/lib/intel64/libmkl_sycl.so: undefined reference to `cl::sycl::context::context(cl::sycl::device const&, std::function<void (cl::sycl::exception_list)>, bool)'
//bld4/opt/intel/inteloneapi/mkl/2021.1-beta08/lib/intel64/libmkl_sycl.so: undefined reference to `cl::sycl::level0::make_device(cl::sycl::platform const&, unsigned long)'
//bld4/opt/intel/inteloneapi/mkl/2021.1-beta08/lib/intel64/libmkl_sycl.so: undefined reference to `clCreateProgramWithIL'
//bld4/opt/intel/inteloneapi/mkl/2021.1-beta08/lib/intel64/libmkl_sycl.so: undefined reference to `cl::sycl::level0::make_program(cl::sycl::context const&, unsigned long)'
//bld4/opt/intel/inteloneapi/mkl/2021.1-beta08/lib/intel64/libmkl_sycl.so: undefined reference to `cl::sycl::level0::make_queue(cl::sycl::context const&, unsigned long)'
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)

N.B. In Intel GPU tests I also see these linker errors and so do not use -lonemkl_rng_mklgpu during compilation.

Another thought was to enable CUBLAS (a long shot) in the oneMKL build, but this doesn't compile at all:

$ cmake -DBUILD_FUNCTIONAL_TESTS=OFF -DENABLE_CUBLAS_BACKEND=on ${ONEMKL_SRC}
<...>
$ cmake --build . -- -j70
<...>
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/detail/common.hpp:11:
/opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/cl_ext_intel.h:431:9: error: unknown type name 'cl_properties'
typedef cl_properties cl_mem_properties_intel;
        ^
In file included from /home/vrpascuzzi/sw/intel/oneMKL/src/blas/backends/cublas/cublas_scope_handle.cpp:19:
In file included from /home/vrpascuzzi/sw/intel/oneMKL/src/blas/backends/cublas/cublas_scope_handle.hpp:21:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl.hpp:11:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/ONEAPI/atomic.hpp:11:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/ONEAPI/atomic_accessor.hpp:11:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/ONEAPI/atomic_enums.hpp:12:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/access/access.hpp:10:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/detail/common.hpp:121:
In file included from /opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/exception.hpp:15:
/opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/detail/pi.h:228:7: error: use of undeclared identifier 'CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES'
      CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES,
      ^
/opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/detail/pi.h:229:45: error: use of undeclared identifier 'CL_DEVICE_QUEUE_ON_HOST_PROPERTIES'
  PI_DEVICE_INFO_QUEUE_ON_HOST_PROPERTIES = CL_DEVICE_QUEUE_ON_HOST_PROPERTIES,
                                            ^
/opt/dpcpp/2020.10.02-a7ad8b8-cuda/bin/../include/sycl/CL/sycl/detail/pi.h:233:31: error: use of undeclared identifier 'CL_DEVICE_IL_VERSION_KHR'
  PI_DEVICE_INFO_IL_VERSION = CL_DEVICE_IL_VERSION_KHR,
<...>

plus a load of other related errors.

Thanks,
Vince

##############
# test_mkl.cc
##############

#include <math.h>

#include <CL/sycl.hpp>
#include <iostream>
#include <oneapi/mkl.hpp>
#include <vector>

// Value to initialize random number generator
#define SEED 7777777

// Value of Pi with many exact digits to compare with estimated value of Pi
#define PI 3.1415926535897932384626433832795

#ifdef SYCL_TARGET_CUDA
class CUDASelector : public cl::sycl::device_selector {
 public:
  int operator()(const cl::sycl::device& device) const override {
    const std::string device_vendor = device.get_info<cl::sycl::info::device::vendor>();
    const std::string device_driver =
        device.get_info<cl::sycl::info::device::driver_version>();
    const std::string device_name = device.get_info<cl::sycl::info::device::name>();

    if (device.is_gpu() &&
        (device_vendor.find("NVIDIA") != std::string::npos) &&
        (device_driver.find("CUDA") != std::string::npos) &&
        (device_name.find("2080") != std::string::npos)) {
      return 1;
    };
    return -1;
  }
};
#endif

// Gets the target device, as defined by the build configuration.
static inline cl::sycl::device GetTargetDevice() {
  cl::sycl::device dev;
#if defined SYCL_TARGET_CUDA
  CUDASelector cuda_selector;
  try {
    dev = cl::sycl::device(cuda_selector);
  } catch (...) {
  }
#elif defined SYCL_TARGET_DEFAULT
  dev = cl::sycl::device(cl::sycl::default_selector());
#elif defined SYCL_TARGET_CPU
  dev = cl::sycl::device(cl::sycl::cpu_selector());
#elif defined SYCL_TARGET_GPU
  dev = cl::sycl::device(cl::sycl::gpu_selector());
#else
  dev = cl::sycl::device(cl::sycl::host_selector());
#endif

  return dev;
}

void test_rng(size_t n_points) {
  auto exception_handler = [](cl::sycl::exception_list exceptions) {
    for (std::exception_ptr const& e : exceptions) {
      try {
        std::rethrow_exception(e);
      } catch (cl::sycl::exception const& e) {
        std::cout << "Caught asynchronous SYCL exception:\n"
                  << e.what() << std::endl;
      }
    }
  };

  // Choose device to run on and create queue
  cl::sycl::device dev = GetTargetDevice();
  cl::sycl::queue queue(dev, exception_handler);

  // Create usm allocator
  cl::sycl::usm_allocator<float, cl::sycl::usm::alloc::shared> allocator(
      queue.get_context(), queue.get_device());

  // Allocate storage for random numbers
  std::vector<float, decltype(allocator)> x(n_points, allocator);

  std::cout << "Running on: "
            << queue.get_device().get_info<cl::sycl::info::device::name>()
            << std::endl;

  try {
    // Generator initialization
    oneapi::mkl::rng::philox4x32x10 engine(queue, SEED);
    oneapi::mkl::rng::uniform<float> distr(0.0f, 1.0f);

    oneapi::mkl::rng::generate(distr, engine, n_points, x.data());
    // wait to finish generation
    queue.wait_and_throw();
  } catch (cl::sycl::exception const& e) {
    std::cout << "\t\tSYCL exception \n" << e.what() << std::endl;
  }
}

int main() {
  size_t n_points = 120000000;

  test_rng(n_points);

  return 0;
}

Rng mklcpu backend uses ambiguous kernel names

Summary

In the rng mklcpu backend, in philox4x32x10.cpp, the same kernel names are generated for both the USM and buffer APIs. This does not cause an error when using dpc++, but according to the SYCL standard it is illegal, and it causes problems with some SYCL implementations (tested with hipSYCL).

Observed behavior

For example, the kernels at lines 328 and 68 in philox4x32x10.cpp appear to have the same name. This is a result of the combination of the type of distr and philox4x32x10_impl being used as the kernel name. Since kernels are declared for both the USM and buffer interfaces with the same distr type, this results in ambiguous kernel names.

Expected behavior

All kernels should have a unique name.
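
A minimal sketch of one way to achieve this, using hypothetical tag types to fold the API path into the kernel name (an illustration of the technique, not the actual oneMKL fix):

#include <sycl/sycl.hpp>

// Hypothetical tag types distinguishing the two API paths.
struct buffer_api;
struct usm_api;

// The kernel name encodes the API path in addition to the distribution and
// engine types, so the two submissions below get distinct names even with
// identical distribution/engine template arguments.
template <typename Distr, typename Engine, typename ApiTag>
class rng_kernel;

int main() {
    sycl::queue q;
    q.single_task<rng_kernel<float, int, buffer_api>>([] {});
    q.single_task<rng_kernel<float, int, usm_api>>([] {});
    q.wait();
    return 0;
}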

Building with CMake fails in installation

Hi there,

I am trying to install oneMKL using CMake on Ubuntu 18.04.
I have successfully finished the build step, and ctest passes all the tests; however, when I attempt to install using "cmake --install . --prefix ../install", it fails with the following error:

-- Up-to-date: /data4/salar/oneMKL/install/include
-- Up-to-date: /data4/salar/oneMKL/install/include/oneapi
-- Up-to-date: /data4/salar/oneMKL/install/include/oneapi/mkl.hpp
<... many more "-- Up-to-date" lines for the installed headers and CMake config files ...>
-- Up-to-date: /data4/salar/oneMKL/install/include/oneapi/mkl/detail/config.hpp
CMake Error at bin/cmake_install.cmake:45 (file):
  file INSTALL cannot find
  "/data4/salar/oneMKL/build/bin/CMakeFiles/CMakeRelink.dir/libonemkl.so.0":
  No such file or directory.
Call Stack (most recent call first):
  cmake_install.cmake:73 (include)

I would really appreciate it if you could help me figure out what the issue might be.

CMake issue, -fsycl-unnamed-lambda passed to C/Fortran compiler

If I add

find_package(MKL REQUIRED)
target_link_libraries(mylib PUBLIC MKL::MKL_DPCPP)

in CMakeLists.txt and run CMake with -DCMAKE_CXX_COMPILER=dpcpp,
then I get these errors:

gfortran: error: unrecognized command-line option '-fsycl-unnamed-lambda'
gcc: error: unrecognized command-line option '-fsycl-unnamed-lambda'

It does work when I add this -DCMAKE_C_COMPILER=icx -DCMAKE_Fortran_COMPILER=ifx when running CMake
but it still gives these warnings:

clang-13: warning: argument unused during compilation: '-fsycl-unnamed-lambda' [-Wunused-command-line-argument]
ifx: command line warning #10006: ignoring unknown option '-fsycl-unnamed-lambda'

Error compiling a sample example for LLVM compiler and NVIDIA

Summary

I am trying to use oneMKL and the open-source LLVM compiler on an NVIDIA GPU; the build and testing processes went fine. However, compiling a program that calls the "oneapi::mkl::blas::column_major::gemm" function fails at the linking step.

Version

Environment

Steps to reproduce

I also include the build process to confirm that everything is OK:

mkdir build && cd build
export CXX=~/sycl_workspace/llvm/build/bin/clang++ # path to the LLVM compiler
cmake .. -DENABLE_CUBLAS_BACKEND=True -DENABLE_MKLCPU_BACKEND=False -DENABLE_MKLGPU_BACKEND=False
cmake --build .
ctest # 100% tests pass
mkdir ~/sycl_workspace/llvm/build/include/oneMKL
cmake --install . --prefix ~/sycl_workspace/llvm/build/include/oneMKL

Now I have tried to compile the following example:

#include <CL/sycl.hpp>
#include <iostream>
#include "oneapi/mkl.hpp"

using namespace std;
using namespace cl::sycl;

// Matrix size constants
#define SIZE 4800  // Must be a multiple of 8.
#define M SIZE / 8
#define N SIZE / 4
#define P SIZE / 2

class CUDASelector : public cl::sycl::device_selector {
  public:
    int operator()(const cl::sycl::device &Device) const override {
      //using namespace cl::sycl::info;
      const std::string DriverVersion = Device.get_info<info::device::driver_version>();

      if (Device.is_gpu() && (DriverVersion.find("CUDA") != std::string::npos)) {
       // std::cout << " CUDA device found " << std::endl;
        return 1;
      };
      return 0;
    }
};


int main() {
  oneapi::mkl::transpose transA = oneapi::mkl::transpose::nontrans;
  oneapi::mkl::transpose transB = oneapi::mkl::transpose::nontrans;

  // matrix data sizes
  int m = M;
  int n = P;
  int k = N;

  // leading dimensions of data
  int ldA = m;
  int ldB = k;
  int ldC = m;

  // set scalar fp values
  float alpha = 1.0;
  float beta = 0.0;

  CUDASelector Selector;
  cl::sycl::queue device_queue(Selector);
  std::cout << "Running on " << device_queue.get_device().get_info<sycl::info::device::name>() << std::endl;

  // 1D arrays on host side

  float* A = malloc_shared<float>(M*N, device_queue);
  float* B = malloc_shared<float>(N*P, device_queue);
  float* C = malloc_shared<float>(M*P, device_queue);
  
  // prepare matrix data with column-major style
  int i, j;
  // A(M, N) is a matrix whose values are column number plus one
  for (i = 0; i < N; i++)
    for (j = 0; j < M; j++) A[i * M + j] = i + 1.0;

  // B(N, P) is matrix whose values are row number plus one
  for (i = 0; i < P; i++)
    for (j = 0; j < N; j++) B[i * N + j] = j + 1.0;

  cout << "Problem size: c(" << M << "," << P << ") = a(" << M << "," << N
       << ") * b(" << N << "," << P << ")" << std::endl;

  oneapi::mkl::blas::column_major::gemm(device_queue, transA, transB, m, n, k, alpha, A, ldA, B,
                    ldB, beta, C, ldC);

  free(A, device_queue);
  free(B, device_queue);
  free(C, device_queue);

  return 0;
}

Observed behavior

I compiled with:
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -I$ONEMKL/include mkl.cpp

Where "$ONEMKL" is an env var that stores the library path (in my case: "~/sycl_workspace/llvm/build/include/oneMKL"). That gets the following error:

warning: linking module '/home/user/sycl_workspace/llvm/build/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc': Linking two modules of different target triples: '/home/user/sycl_workspace/llvm/build/lib/clang/14.0.0/../../clc/remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc' is 'nvptx64-unknown-nvidiacl' whereas 'mkl.cpp' is 'nvptx64-nvidia-cuda'
 [-Wlinker-warnings]
1 warning generated.
/usr/bin/ld: /tmp/mkl-4ee840.o: in function `oneapi::mkl::blas::column_major::gemm(cl::sycl::queue&, oneapi::mkl::transpose, oneapi::mkl::transpose, long, long, long, float, float const*, long, float const*, long, float, float*, long, std::vector<cl::sycl::event, std::allocator<cl::sycl::event> > const&)':
mkl-a9d4b7.cpp:(.text+0x960): undefined reference to `oneapi::mkl::blas::column_major::detail::gemm(oneapi::mkl::device, cl::sycl::queue&, oneapi::mkl::transpose, oneapi::mkl::transpose, long, long, long, float, float const*, long, float const*, long, float, float*, long, std::vector<cl::sycl::event, std::allocator<cl::sycl::event> > const&)'
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)

Do you have any idea why "oneapi::mkl::blas::column_major::detail::gemm" is not defined?

Expected behavior

Compile without errors.

Refactor build rules for unit tests

Summary

Enabling BUILD_FUNCTIONAL_TESTS compiles unit tests in both rng and blas domains.

Problem statement

If working in a single domain with BUILD_FUNCTIONAL_TESTS enabled, irrelevant unit tests are compiled, adding a non-negligible overhead in build times. For example, if one is working exclusively in the rng domain and enables BUILD_FUNCTIONAL_TESTS, blas unit tests are compiled and the total build time is substantially greater than if only rng domain unit tests (the relevant tests) were compiled.

# `blas' + `rng' unit tests
$ time ( cmake --build <...> -- -j16 )
<...>
real	10m59.050s
user	158m41.756s
sys	2m28.292s

vs.

# `rng' unit tests
$ time ( cmake --build <...> -- -j16 )
<...>
real	1m24.986s
user	9m7.986s
sys	0m11.170s

Of course, going the other direction -- i.e. working in the blas domain and compiling rng domain tests -- does not add much more compilation time.

Details

The top-level CMakeLists.txt adds the tests subdirectory if BUILD_FUNCTIONAL_TESTS is enabled, then tests/CMakeLists.txt builds the GoogleTest infrastructure and adds the unit_tests subdirectory.

Furthermore, tests/unit_tests/CMakeLists.txt currently handles all the condition checking for both domains. The checks can quickly become difficult to manage when adding additional back-ends. For example, rng domain tests aren't built if either the cublas or netlib back-end is enabled. Should one wish to add an additional CUDA-based back-end (say, for cuRAND), this could be an issue; it is conceivable one would want to build tests for both cuBLAS and cuRAND.

Proposed solution(s)

Some possibilities:

  1. Clean up tests/unit_tests/CMakeLists.txt
    Use conditionals in a cleaner way to separate domain-specific rules more generally.
  2. Split tests by domain, keeping existing cmake options
    Refactor the existing CMakeLists.txt files to build unit tests based on whether blas or rng options are specified.
  3. Split tests by domain, modifying existing cmake options
    Similar to the previous solution, but also introducing domain-specific options BUILD_FUNCTIONAL_TESTS_BLAS and BUILD_FUNCTIONAL_TESTS_RNG (or some variation of the names), and keeping BUILD_FUNCTIONAL_TESTS as a catch-all.

Other ideas are of course welcome.

[CUDA] MKL and RNG

Hi,

After spending some time getting oneMKL to build with cuBLAS support [1], I can now use oneMKL with CUDA devices. However, I overlooked the fact that this brings in only the cuBLAS backend, and not the random-number generators available for Intel hardware.

Is there any plan to integrate RNGs for CUDA devices, e.g. with a cuRAND backend? While I understand CUDA support is in general experimental, I test my codes on both Intel and NVIDIA GPUs to ensure the software runs on both platforms, i.e. we are aiming for heterogeneous solutions.
I'd be very interested to help in such an effort, but don't want to reinvent the wheel if something is already in the works.

Thanks.

[1] intel/llvm#1548

Could NOT find cuBLAS (missing CUBLAS_INCLUDE_DIR)

Summary

Building with the cublas back-end fails to find cuBLAS library:

$ cmake -DBUILD_FUNCTIONAL_TESTS=ON -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=ON -DENABLE_CURAND_BACKEND=OFF -DOPENCL_INCLUDE_DIR=/opt/khronos/ocl-headers/include -DREF_BLAS_ROOT=/opt/netlib/lapack/3.9.0 $SRCDIR
-- CMAKE_BUILD_TYPE: None, set to Release by default
-- The CXX compiler identification is Clang 12.0.0
-- Check for working CXX compiler: /opt/intel/llvm/2020.12.27-6ca33e2df283-cuda/bin/clang++
-- Check for working CXX compiler: /opt/intel/llvm/2020.12.27-6ca33e2df283-cuda/bin/clang++ -- works
<...>
-- Found CUDA: /opt/nvidia/cuda/10.2 (found suitable version "10.2", minimum required is "10.0") 
CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
  Could NOT find cuBLAS (missing: CUBLAS_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE)
  cmake/FindcuBLAS.cmake:39 (find_package_handle_standard_args)
  src/blas/backends/cublas/CMakeLists.txt:22 (find_package)
<...>

causing the CMake configuration to fail. This is because CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES is not defined by default.

Version

0.1.0, a99dde8

Environment

CUDA: cuda/10.2
Compiler: intel/llvm@6ca33e2 (compiled with CUDA support)

Steps to reproduce

Setup environment and use the command in "Summary".

Observed behavior

cmake configuration fails. (See "Summary".)

Expected behavior

$ cmake <...> -DENABLE_CUBLAS_BACKEND=ON  <...>
<...>
-- Found cuBLAS: /opt/nvidia/cuda/10.2/targets/x86_64-linux/include
<...>

Solution

The preferred method is to add enable_language(CUDA) to cmake/FindcuBLAS.cmake:

diff --git a/cmake/FindcuBLAS.cmake b/cmake/FindcuBLAS.cmake
index 06fe6fe..f79d571 100644
--- a/cmake/FindcuBLAS.cmake
+++ b/cmake/FindcuBLAS.cmake
@@ -18,6 +18,7 @@
 #=========================================================================
 
 find_package(CUDA 10.0 REQUIRED)
+enable_language(CUDA)
 find_path(CUBLAS_INCLUDE_DIR "cublas_v2.h" HINTS ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
 get_filename_component(SYCL_BINARY_DIR ${CMAKE_CXX_COMPILER} DIRECTORY)
 # the OpenCL include file from cuda is opencl 1.1 and it is not compatible with DPC++

Killed by signal 9 Problem about running HPL test of MKL in intel/oneapi-hpckit docker container

I want to run the HPL test script in a Docker container. I start the container with the command below:

docker run -d --privileged -it intel/oneapi-hpckit

Then I enter the container and run the following commands:

$ cd /opt/intel/oneapi
$ source setvars.sh --force
$ cd /opt/intel/oneapi/mkl/latest/benchmarks/mp_linpack
$ ./runme_intel64_dynamic

When I run the HPL test of MKL in the intel/oneapi-hpckit Docker container, an error like the one below occurs:

[error output screenshot attached]

My docker info:

[screenshot attached]

My HPL.dat file:

[screenshot attached]

Building problems

Hello.

I'm getting the following error in the building process, more specifically in the first cmake command.

CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message): Could NOT find CBLAS (missing: CBLAS_file)

I have built lapack from the link listed in the dependencies section of the instructions.

Does anyone know how I can fix this?

Enable context caching for cuBLAS backend to improve performance

Summary

As noted in #106, performance with the cuBLAS backend is low and can be improved if the CUDA context is cached.
Opening this issue as a tracker.

Version

Appears in latest.

Environment

cuBLAS backend

Steps to reproduce

See #106

Observed behavior

Running the cuBLAS backend through oneMKL is slower than running cuBLAS directly.

Expected behavior

Running the cuBLAS backend through oneMKL should match or come very close to pure cuBLAS performance.

[CUDA] Add support for cuSparse

oneMKL provides support for sparse BLAS operations, and NVIDIA provides an interface to sparse BLAS operations optimized for their hardware in the cuSparse library. Much as oneMKL provides support for cuBLAS, I propose it add support for cuSparse.

The unified interface provided by oneMKL will be much more convenient for users interested in performance portability across platform sets that include NVIDIA GPUs.

The mechanisms and design work that are relevant for cuBLAS are likely to be reusable for this work. Because sparse interfaces are generally younger and more flexible, it may be that this is best done incrementally.

Enable using oneMKL with hipSYCL

Summary

With a set of Pull requests, we would like to upstream changes to enable the use of oneMKL BLAS with hipSYCL. The changes and added features encompass the following:

  • PR #101: in case of not compiling with DPC++, use the add_sycl_to_target CMake integration
  • PR #102: add the option to disable the functions using half data types
  • PR #103: use an additional layer of abstraction when invoking host-tasks for the cuBLAS backend
  • Add a hipSYCL-specific cuBLAS context handler and host-task invocation
  • PR #100, PR #104, PR #105: replace non-standard types and member function calls, e.g. half -> cl::sycl::half and .get_cl_code() -> .what()

Problem statement

Our intention is to make oneMKL more available to the general SYCL community. These changes will allow easier integration with other SYCL implementations and enable adding ROCm libraries to oneMKL.

If the PRs #100, #101, #102, #104, and #105 are merged, adding the actual hipSYCL support could look like this: https://github.com/sbalint98/oneMKL/tree/ustream-hipsycl-specific-changes .

When #100-#103 are merged into the current develop (e8e3dab), all tests pass locally: int_test_oneapi.log

Static library version of libonemkl.so

Will a static library version of the dispatcher libonemkl.so (with the same PIC versions of the objects as in the shared library) be provided? I'm not sure if linking with a static library makes sense for oneMKL, but even if it doesn't, this static library may be useful for building custom oneMKL shared libraries, similar to this for regular Intel MKL.

https://software.intel.com/content/www/us/en/develop/documentation/mkl-linux-developer-guide/top/linking-your-application-with-the-intel-math-kernel-library/building-custom-shared-objects.html

oneMKL full code examples

Hello.

Does anyone know where I can find full oneMKL code examples?

Intel MKL (not oneMKL) comes with some SYCL examples, however there are some discrepancies between their syntax and the oneMKL syntax as given in the oneAPI spec (https://spec.oneapi.com/versions/latest/index.html). Just to give an example, in the Intel MKL SYCL examples the mkl::sparse::init_matrix_handle function is used for initializing sparse matrix handlers while in the oneAPI specs the same is done with the onemkl::sparse::matrixInit function.

I'm a little bit confused by this. What exactly is the difference between Intel MKL SYCL and oneMKL?

Add DFT support

Summary

Add support for discrete Fourier transform.

Problem statement

The oneMKL specification shows the API for the DFT. However, I'm unable to find the necessary mkl_dfti_sycl.hpp header referenced there.

Preferred solution

An example of using oneMKL with DFT.

CUDA_ERROR_ILLEGAL_ADDRESS when using level 1 and higher-level routines in the same queue

Summary

When level 1 and higher-level kernels are submitted in the same queue, a CUDA_ERROR_ILLEGAL_ADDRESS runtime error is thrown for the cuBLAS backend.

I believe this is because, for some of the level 1 functions, the pointer mode is set to CUBLAS_POINTER_MODE_DEVICE but never set back to the default value, CUBLAS_POINTER_MODE_HOST; the device setting therefore remains active for all subsequent calls with that cuBLAS handle, which seems to cause problems. Adding the line cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST); to the respective functions resolves the issue.

The tests create a queue for every BLAS function, therefore this issue hasn't surfaced there, but it can be triggered with a simple test program.
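For illustration, here is a minimal sketch of the fix pattern described above (assuming a raw cuBLAS handle; this is not the actual oneMKL source):

#include <cublas_v2.h>

// Sketch: a level 1 routine that needs a device-side result sets
// CUBLAS_POINTER_MODE_DEVICE, then restores the default
// CUBLAS_POINTER_MODE_HOST so that subsequent calls sharing the same
// handle (e.g. gemv with host-side alpha/beta) are unaffected.
cublasStatus_t dot_with_device_result(cublasHandle_t handle, int n,
                                      const float *x, int incx,
                                      const float *y, int incy,
                                      float *device_result) {
    cublasStatus_t status = cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
    if (status != CUBLAS_STATUS_SUCCESS)
        return status;
    status = cublasSdot(handle, n, x, incx, y, incy, device_result);
    // Restore the default pointer mode before the handle is used again.
    cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST);
    return status;
}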

Version

The current oneMKL develop head is used, e.g. 1ed12c7.

Environment

  • HW you use
    Intel Gold 6130 CPU with Nvidia gtx1080 GPUs
  • Backend library version
    Cuda 10.0
    MKL, and TBB obtained via intel installer version 2021.2.0
  • OS name and version
    Ubuntu 20.04 (fakeroot singularity container)
  • Compiler version
    dpc++ compiler cloned from develop with hash: 4e26734cb87c451e0562559d5d6f83b7eabcaea3
    compiled with:
    buildbot/configure.py --cuda
    and buildbot/compile.py
  • CMake
    cmake.md

Steps to reproduce

Use the following simple test program:

#include "oneapi/mkl.hpp"
#include <iostream>
#include <CL/sycl.hpp>

int main(){
  std::vector<double> M = {1, 1, 1, 1};
  std::vector<double> y = {3, 4};
  std::vector<double> x = {1, 1};
  
  std::vector<double> x1 = {1,1};
  std::vector<double> x2 = {2,2};

  double result = -1;

  cl::sycl::buffer<double, 1> M_buffer = cl::sycl::buffer(M.data(), cl::sycl::range<1>(M.size()));
  cl::sycl::buffer<double, 1> y_buffer = cl::sycl::buffer(y.data(), cl::sycl::range<1>(y.size()));
  cl::sycl::buffer<double, 1> x_buffer = cl::sycl::buffer(x.data(), cl::sycl::range<1>(x.size())); 

  cl::sycl::buffer<double, 1> x1_buffer = cl::sycl::buffer(x1.data(), cl::sycl::range<1>(x1.size())); 
  cl::sycl::buffer<double, 1> x2_buffer = cl::sycl::buffer(x2.data(), cl::sycl::range<1>(x2.size())); 
  
  cl::sycl::buffer<double, 1> result_buffer = cl::sycl::buffer(&result, cl::sycl::range<1>(1)); 

 auto gpu_dev = sycl::device(sycl::gpu_selector());
 sycl::queue gpu_queue(gpu_dev);
 
 oneapi::mkl::backend_selector<oneapi::mkl::backend::cublas> gpu_selector(gpu_queue);
  
 oneapi::mkl::blas::column_major::dot(gpu_selector, 2, x1_buffer, 1, x2_buffer, 1, result_buffer);
 oneapi::mkl::blas::column_major::gemv(gpu_selector, oneapi::mkl::transpose::nontrans, 2, 2,
                                   1.0, M_buffer, 2, x_buffer, 1, 1.0, y_buffer, 1);
}

compile:
LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:$LD_LIBRARY_PATH /home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/bin/clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice -I /home/sbalint/hipSYCL-main/oneMKL-install/include/ -L/home/sbalint/hipSYCL-main/oneMKL-install/lib/ -lonemkl_blas_cublas test.cpp
and run:
LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:/home/sbalint/hipSYCL-main/oneMKL-install/lib/:$LD_LIBRARY_PATH ./a.out

Observed behavior

The following runtime error is displayed:

Singularity> LD_LIBRARY_PATH=/home/sbalint/hipSYCL-main/dpc++-hand/llvm/build/install/lib/:/opt/hipSYCL/cuda/lib64:/home/sbalint/hipSYCL-main/oneMKL-install/lib/:$LD_LIBRARY_PATH ./a.out 
Hello

PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        cuda_piEnqueueMemBufferRead
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:2199


PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447


PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447


PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447


PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        enqueueEventWait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:473


PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        _pi_event
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:331


PI CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        wait
        Source Location: /root/hipSYCL-main/dpc++-hand/llvm/sycl/plugins/cuda/pi_cuda.cpp:447

Expected behavior

The program executes without errors

Wrong compiler lib directory inferred in VS16 toolset v142 integration .props/.targets files

Summary

Part of the (relative) directory path for the compiler libraries hardcoded into MSBuild\Microsoft\VC\v160\Platforms\x64\PlatformToolsets\v142\ImportAfter\Intel.Libs.oneMKL.v142.props does not match the actual installation directory of the compiler libraries.

Version

w_onemkl_p_2021.1.1.52_offline.exe

Environment

VS16.8.3
Building x64 target

More details

Right out of the box, after installing the oneMKL package and enabling MKL in the project options, the file Intel.Libs.oneMKL.v142.targets:42 complains about being unable to locate the compiler library directory during a build attempt:

    <ICMessage Code="WRN001" Type="Warning" Arguments="oneMKLOmpLibDir;oneMKL" Condition="'$(oneMKLOmpLibDir)'==''" />

Is this the right repo for reporting issues with the toolchain glue files? There is an obvious typo in the .props file, easy to fix, but I cannot find the VS .props/.targets files anywhere in the open. Since I'm already reporting it: the originally resolved directory is compiler/latest/windows/compiler/lib/intel64_win, while the actual directory where the compiler DLLs are installed is compiler/latest/windows/lib/x64.

The machine has never had any Intel MKL or related performance library product installed--maybe that's the reason? Also, I did not install any other packages except mentioned above.

Here's the fix to match the correct directory:

$ diff -U2 Intel.Libs.oneMKL.v142.props.orig Intel.Libs.oneMKL.v142.props
--- Intel.Libs.oneMKL.v142.props.orig   2021-01-06 23:40:58.705937900 -0800
+++ Intel.Libs.oneMKL.v142.props   2021-01-07 12:32:41.599610400 -0800
@@ -45,5 +45,5 @@
     <oneMKLIncludeDir>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Intel\PerfLibSuites\$(_oneMKLSubKey)\oneMKL\$(ICPlatform)', 'IncludeDir', null, RegistryView.Registry32))</oneMKLIncludeDir>
     <oneMKLLibDir>$([MSBuild]::GetRegistryValueFromView('HKEY_LOCAL_MACHINE\SOFTWARE\Intel\PerfLibSuites\$(_oneMKLSubKey)\oneMKL\$(ICPlatform)', 'LibDir', null, RegistryView.Registry32))</oneMKLLibDir>
-    <_oneMKLOmpLibDir>$([System.IO.Path]::Combine($(oneMKLProductDir), ..\..\compiler\latest\windows\compiler\lib\$(IntelPlatform)_win))</_oneMKLOmpLibDir>
+    <_oneMKLOmpLibDir>$([System.IO.Path]::Combine($(oneMKLProductDir), ..\..\compiler\latest\windows\lib\$(PlatformTarget)))</_oneMKLOmpLibDir>
     <_oneMKLOmpLibDir>$([System.IO.Path]::GetFullPath($(_oneMKLOmpLibDir)))</_oneMKLOmpLibDir>
     <oneMKLOmpLibDir Condition="Exists('$(_oneMKLOmpLibDir)')">$(_oneMKLOmpLibDir)</oneMKLOmpLibDir>

I suspect it's not the right place to report the issue, but hope you might forward this internally--this VS integration script does not seem to be part of the OSS project. Thanks.

ROCm/HIP backend support for oneMKL BLAS domain

The current CUDA backend for the BLAS domain seems to utilize "CL/sycl/backends/cuda.hpp" (cublas_scope_handle.hpp) to communicate with the CUDA runtime. But there doesn't seem to be any equivalent header file in the "CL/sycl/backends" folder for supporting a ROCm/HIP backend.

Given the context, what is the recommended way to add ROCm/HIP backend support for oneMKL domain libs?

Be able to setup oneMKL with the open source intel compiler ( https://github.com/intel/llvm )

Summary

Presently the oneMKL setup revolves around a full oneAPI installation. It would be nice to be able to easily set up and build oneMKL with just the open-source version of the Intel compiler for SYCL ( https://github.com/intel/llvm ).

The Intel compiler for SYCL is built with CMake, Ninja and Python 3, and it has the user configure the various dependencies and backends themselves (LevelZero, OpenCL, TBB) with instructions. Ideally the matching oneMKL setup could build atop that (rather than assuming Conan or sudo access).

Please provide an open source licensed backend option

While oneMKL is licensed under the Apache2 license, which is an open source license (thanks for that!), to the best of my knowledge both of the currently available backends are currently licensed under proprietary, non-open-source licenses: Intel MKL is licensed under the Intel Simplified Software License, while NVIDIA cuBLAS is licensed under NVIDIA's proprietary EULA.

It'd be great if there was an alternative backend option that was open source, like e.g. OpenBLAS. Thanks for the consideration and for your efforts in this project!

A better way to measure performance of SGEMM using cuBLAS backend

Summary

oneMKL shows underwhelming SGEMM performance for small matrices.

Version

oneMKL v0.2

Environment

  • HW you use: NVIDIA V100
  • Backend library version: CUDA/10.2
  • OS name and version: CentOS 7.4.1708
  • Compiler: Intel-llvm 2021-06-08

Steps to reproduce

I attached the source files required to measure the performance of SGEMM (GFLOPS).
Timing is obtained using std::chrono.

Observed behavior

For m = n = k = 4096, the observed performance is ~500 GFlops. In comparison, native cuBLAS achieves up to 10 TFlops.
I think one possible reason is the high cost of creating the cuBLAS context under the hood.
For native cuBLAS, the creation of the context can be effectively excluded from the region timed with cudaEventRecord().
Is there a better way to measure the performance of SGEMM with the cuBLAS backend?
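For reference, this is the kind of measurement I have in mind (a sketch only; it assumes USM pointers already allocated, and that the backend caches its cuBLAS handle after the first call, so an untimed warm-up call can absorb the one-time setup cost):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <CL/sycl.hpp>
#include "oneapi/mkl.hpp"

// Sketch: time one gemm after an untimed warm-up call, so that one-time
// setup (context/handle creation) is excluded from the measured region.
void time_gemm(sycl::queue &q, std::int64_t m, std::int64_t n, std::int64_t k,
               const float *A, const float *B, float *C) {
    using oneapi::mkl::transpose;
    float alpha = 1.0f, beta = 0.0f;

    // Warm-up: the first call pays any context-creation cost.
    oneapi::mkl::blas::column_major::gemm(q, transpose::nontrans, transpose::nontrans,
                                          m, n, k, alpha, A, m, B, k, beta, C, m);
    q.wait();

    auto t0 = std::chrono::steady_clock::now();
    oneapi::mkl::blas::column_major::gemm(q, transpose::nontrans, transpose::nontrans,
                                          m, n, k, alpha, A, m, B, k, beta, C, m);
    q.wait();
    auto t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count();
    std::cout << (2.0 * m * n * k) / sec / 1e9 << " GFLOP/s" << std::endl;
}

If the handle is instead re-created on every call, the warm-up will not hide that cost, which could itself explain the low numbers.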

Thanks
sgemm.zip

Building tests with both cuBLAS and cuRAND enabled hits compilation issues

Summary

Trying to build the tests with both the cuBLAS and cuRAND backends enabled hits compilation issues.

Version

Using the latest version: 19c43b0

Environment

This should be pretty straightforward to reproduce in any environment but let me know if you need more details for this section.

Steps to reproduce

The issue happens if all of these are provided:

  • -DENABLE_CUBLAS_BACKEND=True
  • -DENABLE_CURAND_BACKEND=True
  • -DBUILD_FUNCTIONAL_TESTS=True

Observed behavior

This was discussed in #91 and #126 (comment), however it looks like that ticket fixed the issue for building one of them but not both of them at the same time.

TARGET_DOMAINS also doesn't seem to work properly with cuBLAS and cuRAND; it always gets reset, because it's checking the specified target domains against a domain list that is only populated when using the MKLCPU or MKLGPU backends.

I tried hacking that part of the CMake to let a blas;rng target domain list go through; however, this still seemed to fail when building the tests.

Expected behavior

I see two main possible outcomes for this:

  • Mark this as unsupported and make CMake throw out an error if someone is trying to build this configuration
  • Fix the build issue/CMake so that this combination is supported

Use of sycl::half, with CL/sycl.hpp SYCL header included

In #143, the half data types were replaced with sycl::half, while including the CL/sycl.hpp SYCL header. According to the SYCL specification section 4.3, when that header is included all SYCL types should exist inside the ::cl::sycl namespace.

Unfortunately, this change also breaks compilation with hipSYCL, since sycl::half is not defined. In order to solve this problem I see three possible solutions:

  1. Use, like for all other SYCL types, the ::cl::sycl namespace for half as well
  2. Move to the ::sycl namespace in case of all SYCL types, and include the sycl/sycl.hpp header instead of CL/sycl.hpp
  3. Add a hipSYCL specific workaround similar to #122 (comment)

I think the most consistent and least error-prone solution would be the first one. Can you give some feedback on what would be the preferred solution?

Compiling oneMKL (CUDA backends) on Windows parsing commands incorrectly

Summary

While the build README says that the CUDA backend is only supported on Linux, I wanted to naively try building it for Windows anyway - the oneAPI SYCL compiler on llvm/clang builds on Windows while only being tested on Linux, so I expected this to be fairly trivial as well. However, it looks like at the build step, some MSVC-style (cl) compile options are passed to clang++, which parses them incorrectly as files/directories.

Version

Latest develop commit.

Environment


  • HW you use: AMD 5900x, NVIDIA RTX 3090
  • Backend library version: CUDA Toolkit 11.5
  • OS name and version: Windows 11 Pro
  • Compiler version: oneAPI's DPC++ 2021-09 (latest on https://github.com/intel/llvm)
  • CMake output log: in Observed behavior section.

Steps to reproduce

Following the build guide on the README, but on Windows.

Observed behavior

CMake was able to configure the project (without the functional tests):

C:\Users\valen\Documents\GitHub\oneMKL\build>cmake .. -DCMAKE_C_COMPILER="C:\Users\valen\Documents\GitHub\sycl_workspace\llvm\build\bin\clang.exe" -DENABLE_CUBLAS_BACKEND=True -DENABLE_CURAND_BACKEND=True -DENABLE_MKLCPU_BACKEND=False -DENABLE_MKLGPU_BACKEND=False -DBUILD_FUNCTIONAL_TESTS=False
-- Building for: Visual Studio 17 2022
-- CMAKE_BUILD_TYPE: None, set to Release by default
-- The CXX compiler identification is MSVC 19.30.30706.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.30.30705/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- TARGET_DOMAINS: blas;rng
-- Configuring done
-- Generating done
-- Build files have been written to: C:/Users/valen/Documents/GitHub/oneMKL/build

Upon trying to build it however, I get the following output:

[1/3] Building CXX object bin/blas/CMakeFiles/onemkl_blas.dir/blas_loader.cpp.obj
FAILED: bin/blas/CMakeFiles/onemkl_blas.dir/blas_loader.cpp.obj
C:\Users\valen\Documents\GitHub\sycl_workspace\llvm\build\bin\clang++.exe -fsycl /nologo  -IC:/Users/valen/Documents/GitHub/oneMKL/include -IC:/Users/valen/Documents/GitHub/oneMKL/src -IC:/Users/valen/Documents/GitHub/oneMKL/src/include -IC:/Users/valen/Documents/GitHub/oneMKL/build/bin /EHsc -Wno-unused-function -w -DSYCL2020_DISABLE_DEPRECATION_WARNINGS -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Donemkl_EXPORTS -MD -MT bin/blas/CMakeFiles/onemkl_blas.dir/blas_loader.cpp.obj -MF bin\blas\CMakeFiles\onemkl_blas.dir\blas_loader.cpp.obj.d /Fobin/blas/CMakeFiles/onemkl_blas.dir/blas_loader.cpp.obj -c C:/Users/valen/Documents/GitHub/oneMKL/src/blas/blas_loader.cpp
clang++: error: no such file or directory: '/nologo'
clang++: error: no such file or directory: '/EHsc'
clang++: error: no such file or directory: '/Fobin/blas/CMakeFiles/onemkl_blas.dir/blas_loader.cpp.obj'
[2/3] Building CXX object bin/rng/CMakeFiles/onemkl_rng.dir/rng_loader.cpp.obj
FAILED: bin/rng/CMakeFiles/onemkl_rng.dir/rng_loader.cpp.obj
C:\Users\valen\Documents\GitHub\sycl_workspace\llvm\build\bin\clang++.exe -fsycl /nologo  -IC:/Users/valen/Documents/GitHub/oneMKL/include -IC:/Users/valen/Documents/GitHub/oneMKL/src -IC:/Users/valen/Documents/GitHub/oneMKL/src/include -IC:/Users/valen/Documents/GitHub/oneMKL/build/bin /EHsc -Wno-unused-function -w -DSYCL2020_DISABLE_DEPRECATION_WARNINGS -O3 -DNDEBUG -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Donemkl_EXPORTS -MD -MT bin/rng/CMakeFiles/onemkl_rng.dir/rng_loader.cpp.obj -MF bin\rng\CMakeFiles\onemkl_rng.dir\rng_loader.cpp.obj.d /Fobin/rng/CMakeFiles/onemkl_rng.dir/rng_loader.cpp.obj -c C:/Users/valen/Documents/GitHub/oneMKL/src/rng/rng_loader.cpp
clang++: error: no such file or directory: '/nologo'
clang++: error: no such file or directory: '/EHsc'
clang++: error: no such file or directory: '/Fobin/rng/CMakeFiles/onemkl_rng.dir/rng_loader.cpp.obj'
ninja: build stopped: subcommand failed.

Expected behavior

For the backend to build cleanly, without any errors.

API to return oneMKL version number?

Summary

Should there be an API, callable from both the dispatcher (libonemkl.so) and the underlying backend libraries, to report the oneMKL version supported by each library?

Problem statement

Using oneMKL could involve libraries from multiple vendors (or other sources) providing backends for different devices. How can a user easily detect whether the versions involved are inconsistent with one another? This could occur if one backend has not been updated in line with the dispatcher and other backends, or because of a misconfiguration of LD_LIBRARY_PATH, etc., when multiple versions are installed on the system.

Preferred solution

One possible solution would be a onemkl::utility namespace with a routine to return the MAJOR, MINOR and PATCH numbers, e.g. version_number(major,minor,patch). With auto backend selection, the dispatcher would need to have its own version of the routine too.

I'm assuming all header files would be at the top level. If not, inconsistent versions would need to be considered there too. Providing a simple program to call these routines to check consistency would also be helpful.
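A minimal sketch of what such a routine and check program could look like (all names here are hypothetical; this is not an existing oneMKL API):

#include <iostream>

// Hypothetical sketch only: a version-reporting routine that the dispatcher
// and each backend library would compile with their own version numbers
// (hard-coded to 0.2.0 here purely for illustration).
namespace onemkl {
namespace utility {
inline void version_number(int &major, int &minor, int &patch) {
    major = 0; // would come from the library's own version macros
    minor = 2;
    patch = 0;
}
} // namespace utility
} // namespace onemkl

// Simple consistency-check program; each library involved would be queried
// the same way and the results compared.
int main() {
    int major = 0, minor = 0, patch = 0;
    onemkl::utility::version_number(major, minor, patch);
    std::cout << "oneMKL version " << major << '.' << minor << '.' << patch << '\n';
    return 0;
}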

/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:224:18: error: unknown type name 'CBLAS_LAYOUT'

Summary

Everything fails like this:

In file included from /opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/gemm_bias.cpp:32:
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/onemkl_blas_helper.hpp:64:8: error: unknown type name 'CBLAS_LAYOUT'
inline CBLAS_LAYOUT convert_to_cblas_layout(oneapi::mkl::layout is_column) {
       ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/onemkl_blas_helper.hpp:65:61: error: use of undeclared identifier 'CBLAS_LAYOUT'
    return is_column == oneapi::mkl::layout::column_major ? CBLAS_LAYOUT::CblasColMajor
                                                            ^

Version

jehammond@dgx-a100-math:/opt/intel/onemkl-cublas/build$ git log -n1
commit 3cb60dd57e5606b71c9fd7e53d42a6999ceb9082 (HEAD -> develop, origin/develop)
Author: Andrew T. Barker <[email protected]>
Date:   Tue Dec 14 00:22:55 2021 +0000

    Fix empty kernel name issue for old and new compilers. (#150)

Environment


  • HW you use
    DGX A100 station: AMD 7742 and NVIDIA A100
  • Backend library version
    ?
  • OS name and version
jehammond@dgx-a100-math:/opt/intel/onemkl-cublas$ uname -a 
Linux dgx-a100-math 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
  • Compiler version

DPC++

jehammond@dgx-a100-math:/opt/intel/onemkl-cublas/build$ /opt/intel/dpcpp-cuda/build/install/bin/clang++  --version
clang version 14.0.0 (https://github.com/intel/llvm.git 7b7e044cc73977f5a6a3d434487252bcd45ae3da)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/dpcpp-cuda/build/install/bin

GCC

jehammond@dgx-a100-math:/opt/intel/onemkl-cublas$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-9 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 
  • CMake output log
jehammond@dgx-a100-math:/opt/intel/onemkl-cublas/build$ cmake .. -DCMAKE_CXX_COMPILER=/opt/intel/dpcpp-cuda/build/install/bin/clang++ \
>          -DCMAKE_C_COMPILER=/opt/intel/dpcpp-cuda/build/install/bin/clang \
>          -DENABLE_CUBLAS_BACKEND=True  \
>          -DENABLE_MKLCPU_BACKEND=False \
>          -DENABLE_MKLGPU_BACKEND=False \
>          -DREF_BLAS_ROOT=/opt/intel/onemkl-cublas/lapack/build/lib
-- CMAKE_BUILD_TYPE: None, set to Release by default
-- The CXX compiler identification is Clang 14.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/dpcpp-cuda/build/install/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- TARGET_DOMAINS: blas
-- Looking for dpc++
-- Performing Test is_dpcpp
-- Performing Test is_dpcpp - Success
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-11.4 (found suitable version "11.4", minimum required is "10.0") 
-- Found cuBLAS: /usr/local/cuda-11.4/include  
-- The C compiler identification is Clang 14.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/intel/dpcpp-cuda/build/install/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
CMake Deprecation Warning at deps/googletest/CMakeLists.txt:53 (cmake_minimum_required):
  Compatibility with CMake < 2.8.12 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Found PythonInterp: /usr/bin/python (found version "3.8.10") 
-- Found CBLAS: /opt/intel/onemkl-cublas/lapack/build/lib/libcblas.so  
-- Found CBLAS: /opt/intel/onemkl-cublas/lapack/build/lib/libblas.so  
-- Found CBLAS: /usr/include/x86_64-linux-gnu  
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/intel/onemkl-cublas/build
jehammond@dgx-a100-math:/opt/intel/onemkl-cublas/build$ cmake --build .
[  0%] Building CXX object bin/blas/CMakeFiles/onemkl_blas.dir/blas_loader.cpp.o

[  0%] Built target onemkl_blas
[  1%] Linking CXX shared library ../lib/libonemkl.so
[  1%] Built target onemkl
[  2%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level1.cpp.o
[  2%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level2.cpp.o

[  3%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_level3.cpp.o
[  3%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_batch.cpp.o
[  3%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_extensions.cpp.o
[  4%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_scope_handle.cpp.o
[  4%] Building CXX object bin/blas/backends/cublas/CMakeFiles/onemkl_blas_cublas_obj.dir/cublas_wrappers.cpp.o
[  4%] Built target onemkl_blas_cublas_obj
[  4%] Linking CXX shared library ../../../../lib/libonemkl_blas_cublas.so
[  4%] Built target onemkl_blas_cublas
[  4%] Building CXX object deps/googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[  5%] Linking CXX shared library ../../lib/libgtest.so
[  5%] Built target gtest
[  5%] Building CXX object deps/googletest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
[  5%] Linking CXX shared library ../../lib/libgtest_main.so
[  5%] Built target gtest_main
[  5%] Building CXX object tests/unit_tests/blas/extensions/CMakeFiles/blas_extensions_ct.dir/gemm_bias.cpp.o
In file included from /opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/gemm_bias.cpp:32:
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/onemkl_blas_helper.hpp:64:8: error: unknown type name 'CBLAS_LAYOUT'
inline CBLAS_LAYOUT convert_to_cblas_layout(oneapi::mkl::layout is_column) {
       ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/onemkl_blas_helper.hpp:65:61: error: use of undeclared identifier 'CBLAS_LAYOUT'
    return is_column == oneapi::mkl::layout::column_major ? CBLAS_LAYOUT::CblasColMajor
                                                            ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/onemkl_blas_helper.hpp:66:61: error: use of undeclared identifier 'CBLAS_LAYOUT'
                                                          : CBLAS_LAYOUT::CblasRowMajor;
                                                            ^
In file included from /opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/gemm_bias.cpp:33:
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:118:41: error: unknown type name 'CBLAS_LAYOUT'
static inline void copy_mat(T_src &src, CBLAS_LAYOUT layout, CBLAS_TRANSPOSE trans, int row,
                                        ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:138:41: error: unknown type name 'CBLAS_LAYOUT'
static inline void copy_mat(T_src &src, CBLAS_LAYOUT layout, CBLAS_TRANSPOSE trans, int row,
                                        ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:158:41: error: unknown type name 'CBLAS_LAYOUT'
static inline void copy_mat(T_src &src, CBLAS_LAYOUT layout, int row, int col, int ld,
                                        ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:195:41: error: unknown type name 'CBLAS_LAYOUT'
static inline void update_c(T_src &src, CBLAS_LAYOUT layout, CBLAS_UPLO upper_lower, int row,
                                        ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:224:18: error: unknown type name 'CBLAS_LAYOUT'
static void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
                 ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:229:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:261:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:268:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:275:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:283:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:292:18: error: unknown type name 'CBLAS_LAYOUT'
static void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
                 ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:297:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:320:11: error: unknown type name 'CBLAS_LAYOUT'
void gemm(CBLAS_LAYOUT layout, CBLAS_TRANSPOSE transa, CBLAS_TRANSPOSE transb, const int *m,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:344:18: error: unknown type name 'CBLAS_LAYOUT'
static void symm(CBLAS_LAYOUT layout, CBLAS_SIDE left_right, CBLAS_UPLO uplo, const int *m,
                 ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:349:11: error: unknown type name 'CBLAS_LAYOUT'
void symm(CBLAS_LAYOUT layout, CBLAS_SIDE left_right, CBLAS_UPLO uplo, const int *m, const int *n,
          ^
/opt/intel/onemkl-cublas/tests/unit_tests/blas/extensions/../include/reference_blas_templates.hpp:356:11: error: unknown type name 'CBLAS_LAYOUT'
void symm(CBLAS_LAYOUT layout, CBLAS_SIDE left_right, CBLAS_UPLO uplo, const int *m, const int *n,
          ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make[2]: *** [tests/unit_tests/blas/extensions/CMakeFiles/blas_extensions_ct.dir/build.make:76: tests/unit_tests/blas/extensions/CMakeFiles/blas_extensions_ct.dir/gemm_bias.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:794: tests/unit_tests/blas/extensions/CMakeFiles/blas_extensions_ct.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Steps to reproduce


cd /opt/intel/onemkl-cublas
git clone https://github.com/Reference-LAPACK/lapack
mkdir -p build && cd build && git clean -dfx
cmake .. -G Ninja -DBUILD_SHARED_LIBS=ON -DCBLAS=ON -DCMAKE_Fortran_COMPILER=gfortran -DCMAKE_C_COMPILER=gcc 
cmake --build .

# oneMKL CUBLAS

cd /opt/intel/onemkl-cublas
mkdir -p build && cd build && git clean -dfx
cmake .. -DCMAKE_CXX_COMPILER=/opt/intel/dpcpp-cuda/build/install/bin/clang++ \
         -DCMAKE_C_COMPILER=/opt/intel/dpcpp-cuda/build/install/bin/clang \
         -DENABLE_CUBLAS_BACKEND=True  \
         -DENABLE_MKLCPU_BACKEND=False \
         -DENABLE_MKLGPU_BACKEND=False \
         -DREF_BLAS_ROOT=/opt/intel/onemkl-cublas/lapack/build/lib
cmake --build .
#ctest
#cmake --install . --prefix /opt/intel/onemkl-cublas/install

Reference i?amin produces unexpected results with NaN input

Summary

When compiled with recent dpcpp compilers, the cblas_i?amin routines in src/blas/backends/netlib/netlib_level1.cpp and the iamin routines in tests/unit_tests/blas/include/reference_blas_templates.hpp produce incorrect results when there are NaNs in the input, because std::isnan always returns false under dpcpp's default fast floating-point mode unless you compile with -fp-model=precise.

Version

Current develop branch.

Steps to reproduce

Using the cblas_isamin function from netlib_level1.cpp, compile and run the following test:

#include <cstdlib>
#include <iostream>
#include <limits>

// Declaration assumed for this standalone test; the definition is the
// cblas_isamin implementation from netlib_level1.cpp.
int cblas_isamin(int n, const float *x, int incx);

// fnan was left undefined in the original snippet; define it here.
static const float fnan = std::numeric_limits<float>::quiet_NaN();

int main(int argc, char *argv[]) {
    float * x = (float*) malloc(3 * sizeof(float));
    int idx = 0;

    auto check_three = [&](float a, float b, float c, int expected) -> int {
        x[0] = a; x[1] = b; x[2] = c;
        idx = cblas_isamin(3, x, 1);
        std::cout << "    idx: " << idx << ", expected: " << expected << std::endl;
        if (idx != expected) return 1;
        return 0;
    };

    int result = 0;
    result += check_three(fnan, -0.5, 1.0, 0);
    result += check_three(-0.5, fnan, 1.0, 1);
    result += check_three(-0.5, 1.0, fnan, 2);
    result += check_three(0.0, fnan, 1.0, 1);
    result += check_three(1.0, fnan, 0.0, 1);
    result += check_three(-0.3, 2.1, fnan, 2);
    result += check_three(2.1, -0.3, fnan, 2);
    result += check_three(fnan, -0.3, 2.1, 0);
    result += check_three(fnan, fnan, 1.0, 0);
    result += check_three(fnan, 1.0, fnan, 0);
    result += check_three(1.0, fnan, fnan, 1);
    if (result) std::cout << "  FAILED!\n"; 

    free(x);
    return (result != 0);
}

Observed/expected behavior

Output of build and run:

issue.cpp:17:29: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare]
        bool is_first_nan = std::isnan(curr_val) && !std::isnan(min_val);
                            ^~~~~~~~~~~~~~~~~~~~
issue.cpp:17:54: warning: comparison with NaN always evaluates to false in fast floating point modes [-Wtautological-constant-compare]
        bool is_first_nan = std::isnan(curr_val) && !std::isnan(min_val);
                                                     ^~~~~~~~~~~~~~~~~~~
2 warnings generated.
    idx: 0, expected: 0
    idx: 0, expected: 1
    idx: 0, expected: 2
    idx: 0, expected: 1
    idx: 2, expected: 1
    idx: 0, expected: 2
    idx: 1, expected: 2
    idx: 0, expected: 0
    idx: 0, expected: 0
    idx: 0, expected: 0
    idx: 0, expected: 1
  FAILED!

oneMKL runtime API tests fail with seg fault on GPU with Level0 driver

Summary

oneMKL runtime API tests fail with seg fault on GPU when DPC++ compiler uses Level0 as a backend (default behavior)

Version

The problem appeared with Intel oneMKL and DPC++ compiler update to 2021.1.

Environment

The issue can be reproduced with:

  • HW: Intel GPU (Gen9)
  • OS: Ubuntu 18.04
  • Intel oneMKL version: 2021.1
  • Intel DPC++ Compiler version: 2021.1

Steps to reproduce

On the machine with Intel GPU and Level0 driver run oneMKL tests:

$> mkdir build && cd build
$> cmake .. && cmake --build . -j 24 && ctest
...
        Start   1: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Column_Major_Intel_R__Core_TM__i7_6770HQ_CPU___2_60GHz
  1/664 Test   #1: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Column_Major_Intel_R__Core_TM__i7_6770HQ_CPU___2_60GHz ........................   Passed    0.44 sec
        Start   2: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Row_Major_Intel_R__Core_TM__i7_6770HQ_CPU___2_60GHz
  2/664 Test   #2: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Row_Major_Intel_R__Core_TM__i7_6770HQ_CPU___2_60GHz ...........................   Passed    0.43 sec
        Start   3: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Column_Major_Intel_R__Graphics_Gen9__0x193b_
  3/664 Test   #3: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Column_Major_Intel_R__Graphics_Gen9__0x193b_ ..................................***Exception: SegFault  2.72 sec
        Start   4: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Row_Major_Intel_R__Graphics_Gen9__0x193b_
  4/664 Test   #4: BLAS/RT/GemmTestSuite/GemmTests.HalfHalfFloatPrecision/Row_Major_Intel_R__Graphics_Gen9__0x193b_ .....................................***Exception: SegFault  2.10 sec

Observed behavior

Tests with runtime dispatching report correct results, but they fail at the final step when libraries are unloaded. It looks like Level0 is unloaded before Intel oneMKL, which still uses it.

Expected behavior

All tests pass.

Dense linear algebra functions need encapsulations for matrices and vectors

Summary

oneMKL's dense linear algebra functions should provide or use classes that encapsulate matrices and vectors, instead of taking vectors and matrices as 1-D sycl::buffer or raw pointers. This would improve memory safety and usability, and make the interface more idiomatically C++. It would also better align with the various linear algebra proposals currently being considered for the C++ Standard Library.

Problem statement

1-D sycl::buffer does not correctly encapsulate matrices or vectors

The current oneMKL dense BLAS interface takes vectors and matrices in two different ways:

  1. as 1-D sycl::buffer (e.g., sycl::buffer<double, 1>); and,
  2. as raw pointers (e.g., double*).

Both overloads take the dimensions and strides as separate, integer arguments. Using raw pointers has the same memory safety issues as the C BLAS interface; for discussion, see P1674. Using 1-D sycl::buffer, with separate integer dimension and stride arguments, has the following issues:

  1. It's as memory unsafe as the C BLAS interface, but more verbose.
  2. It discards dimension(s) that sycl::buffer already stores, in favor of extra integer arguments that might be incorrect.
  3. Matrices are 2-D objects, and sycl::buffer<T, 2> exists, yet the interface takes matrices as sycl::buffer<T, 1>.
  4. "Batched" interfaces like gemm_batch compound the issue by adding another dimension.

Using sycl::buffer directly as a matrix or vector interface would offer less functionality than the current oneMKL interface. This is because sycl::buffer always expresses contiguous memory, but (C and Fortran) BLAS functions can work with strided, possibly noncontiguous memory. There is a way to create a "sub-buffer" of an existing sycl::buffer, but the sub-buffer must also be contiguous.
(See Chapter 4.7.2 of the SYCL 1.2.1 spec.) The proposed oneMKL interface already accepts strided, possibly noncontiguous memory, just like the C or Fortran BLAS.

basic_mdspan could replace the raw pointers interface, but not the sycl::buffer interface

It would be tempting to use basic_mdspan in place of sycl::buffer. That would let callers express all the different strided or contiguous matrix or vector layouts that the C BLAS can already express. It would also make this proposal nearly a subset of a pending C++ Standard Library proposal.

This would work perfectly well for the raw pointers interface, but it would not work for the sycl::buffer interface. The problem is that basic_mdspan is a "view" in the C++ Standard Library sense. Views do not own their storage. This means that views can't (or shouldn't really) do the things that SYCL needs to do with buffers, such as possibly allocate temporary storage on host or device, or track data dependencies.

Preferred solution

Replace the raw pointer interface with a basic_mdspan interface

Using basic_mdspan instead of raw pointers would be more expressive, easier to use, and less error prone. It would be no less accessible from other programming languages than oneMKL's current interface. (The current interface is not an extern "C" interface; it uses namespaces and class references.)
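To make the idea concrete, here is an illustrative sketch (not a proposed oneMKL signature) of a gemm-like routine taking mdspan views, written against C++23 std::mdspan; on older toolchains the reference mdspan implementation could be substituted. The extents travel with the view, so they cannot disagree with the data, and std::layout_stride can express BLAS-style leading dimensions:

#include <cstddef>
#include <iostream>
#include <mdspan> // C++23
#include <vector>

template <class T, class LA, class LB, class LC>
void gemm(T alpha,
          std::mdspan<const T, std::dextents<std::size_t, 2>, LA> A,
          std::mdspan<const T, std::dextents<std::size_t, 2>, LB> B,
          T beta,
          std::mdspan<T, std::dextents<std::size_t, 2>, LC> C) {
    // Naive reference loops; a real backend would dispatch to an optimized kernel.
    for (std::size_t i = 0; i < C.extent(0); ++i)
        for (std::size_t j = 0; j < C.extent(1); ++j) {
            T acc{};
            for (std::size_t k = 0; k < A.extent(1); ++k)
                acc += A[i, k] * B[k, j]; // C++23 multidimensional subscript
            C[i, j] = alpha * acc + beta * C[i, j];
        }
}

int main() {
    std::vector<double> a(4, 1.0), b(4, 2.0), c(4, 0.0);
    std::mdspan<const double, std::dextents<std::size_t, 2>> A(a.data(), 2, 2);
    std::mdspan<const double, std::dextents<std::size_t, 2>> B(b.data(), 2, 2);
    std::mdspan<double, std::dextents<std::size_t, 2>> C(c.data(), 2, 2);
    gemm(1.0, A, B, 0.0, C);
    std::cout << C[0, 0] << '\n'; // prints 4
}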

Consider basic_mdarray as a sycl::buffer wrapper

Please consider whether basic_mdarray, the container variant of basic_mdspan, could correctly represent matrices or vectors backed by sycl::buffer storage. (Note how oneMKL's functions all take sycl::buffer by reference. On the other hand, sycl::buffer is reference counted, while basic_mdarray's copy construction and copy assignment deep-copy just like std::vector and the other C++ Standard Library containers.) If not, please consider whether basic_mdarray could be changed to fix this, or whether some other container or buffer type would be appropriate. If the latter, perhaps this type could benefit from standardization, in SYCL and/or in the C++ Standard Library.

hipRAND backend

Summary

We have Intel CPU, GPU and cuRAND backends for the RNG domain. Let's do one more and support AMD GPUs as well. Note that this backend would require:

  • ROCm >= 3.5
  • HIP >= 3.5
  • llvm >= 11.0.0 (with AMDGPU target)
  • hipSYCL >= 0.9.0

Problem statement

oneMKL is missing RNG (and BLAS) for AMD GPUs.

Preferred solution

Add support for hipRAND. Given the uncanny resemblance to cuRAND, I expect this will be a trivial amount of work.

Use LLVM libomp as the threading layer

Hi,

I am using Intel oneAPI 2021.3 and it comes with a MKLConfig.cmake file that allows customized importing of MKL::MKL target.

However, on macOS with brew-installed llvm and libomp, MKLConfig.cmake does not recognize libomp as an option for the threading layer:

-- MKL_ARCH: intel64
-- MKL_LINK: static
-- MKL_INTERFACE_FULL: intel_lp64
CMake Error at /usr/local/lib/cmake/mkl-2021.3.0/MKLConfig.cmake:154 (message):
  Invalid MKL_THREADING `gnu_thread`, options are: sequential intel_thread
  tbb_thread
Call Stack (most recent call first):
  /usr/local/lib/cmake/mkl-2021.3.0/MKLConfig.cmake:326 (define_param)
  CMakeLists.txt:31 (find_package)

Inspecting the relevant section of code in MKLConfig.cmake, I found it checks only the GNU compiler, not LLVM:

<omitted>
if(DPCPP_COMPILER)
  set(DEFAULT_MKL_THREADING tbb_thread)
  list(REMOVE_ITEM MKL_THREADING_LIST intel_thread)
# C, Fortran API
elseif(PGI_COMPILER)
  # PGI compiler supports PGI OpenMP threading, additionally
  list(APPEND MKL_THREADING_LIST pgi_thread)
  # PGI compiler does not support TBB threading
  list(REMOVE_ITEM MKL_THREADING_LIST tbb_thread)
  if(WIN32)
    # PGI 19.10 and 20.1 on Windows, do not support Intel OpenMP threading
    list(REMOVE_ITEM MKL_THREADING_LIST intel_thread)
    set(DEFAULT_MKL_THREADING pgi_thread)
  endif()
elseif(GNU_C_COMPILER OR GNU_Fortran_COMPILER)
  list(APPEND MKL_THREADING_LIST gnu_thread)
else()
  # Intel and Microsoft compilers
  # Nothing to do, only for completeness
endif()
define_param(MKL_THREADING DEFAULT_MKL_THREADING MKL_THREADING_LIST)
<omitted>

Also, I found this online link line advisor https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl/link-line-advisor.html
which does not provide the option of using LLVM Clang either.

So, my question is: is it supported to use LLVM libomp + LLVM Clang + Intel MKL? If so, what would be the link line?

Add several working examples

Summary

Provide several examples that can be built and run.

Problem statement

As mentioned in #142, the README example section doesn't cover all link lines, and it doesn't reflect the latest changes in compiler behavior. At least one or two full examples that users can build and execute would improve the experience with the project. Regular testing could also show whether the examples need updating when something changes in the latest compilers.

Preferred solution

Add an examples folder with several examples and include them in the main build system.

Improper namespace for BLAS domain

Summary

Namespace scopes in oneAPI are of the form oneapi::mkl::domain, e.g. oneapi::mkl::blas and oneapi::mkl::rng. In oneMKL, the RNG domain libraries are of the form oneapi::mkl::rng::library, e.g. oneapi::mkl::rng::mklgpu.

Problem statement

The namespace convention in oneMKL differs between the BLAS and RNG domain libraries: where BLAS uses oneapi::mkl::library for library namespaces, RNG uses the more appropriate oneapi::mkl::rng::library. This observation was made while reading through Integrating a Third-party Library to oneAPI Math Kernel Library (oneMKL) Interfaces:

python scripts/generate_backend_api.py include/oneapi/mkl/blas.hpp \                                  # Base header file
                                       include/oneapi/mkl/blas/detail/newlib/onemkl_blas_newlib.hpp \ # Output header file
                                       oneapi::mkl::newlib                                            # Wrappers namespace

where I believe it should be:

python scripts/generate_backend_api.py include/oneapi/mkl/blas.hpp \                                  # Base header file
                                       include/oneapi/mkl/blas/detail/newlib/onemkl_blas_newlib.hpp \ # Output header file
                                       oneapi::mkl::domain::newlib                                    # Wrappers namespace

With the former instruction, adding a new third-party library to the RNG domain generates an incorrect namespace, e.g. onemkl_rng_curand.hpp:

...
namespace oneapi {
namespace mkl {
namespace curand {} // namespace curand
} // namespace mkl
} // namespace oneapi

while it should be:

...
namespace oneapi {
namespace mkl {
namespace rng {
namespace curand {} // namespace curand
} // namespace rng
} // namespace mkl
} // namespace oneapi

Preferred solution

The BLAS domain should be namespaced consistently with the RNG domain. That is, the more suitable naming would be:

oneapi::mkl::mklcpu --> oneapi::mkl::blas::mklcpu
oneapi::mkl::mklgpu --> oneapi::mkl::blas::mklgpu
oneapi::mkl::netlib --> oneapi::mkl::blas::netlib
oneapi::mkl::cublas --> oneapi::mkl::blas::cublas

which matches the RNG domain:

oneapi::mkl::rng::mklcpu
oneapi::mkl::rng::mklgpu

This also better reflects the header file naming convention, e.g. onemkl_blas_netlib.hpp.
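
For illustration, backend wrappers in both domains would then nest under their domain namespace:

namespace oneapi {
namespace mkl {
namespace blas {
namespace cublas {} // BLAS wrappers for the cuBLAS backend
} // namespace blas
namespace rng {
namespace curand {} // RNG wrappers for the cuRAND backend
} // namespace rng
} // namespace mkl
} // namespace oneapi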

If this is an acceptable change, I will make a new PR.
