rocm / hipblas
ROCm BLAS marshalling library
Home Page: https://rocm.docs.amd.com/projects/hipBLAS/en/latest/index.html
License: Other
Would "axpby" (https://oneapi-src.github.io/oneMKL/domains/blas/axpby.html) be added to hip/roc-BLAS?
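For reference, the requested operation (as defined in the oneMKL doc linked above) computes y := alpha*x + beta*y. A minimal host-side sketch of the semantics; the function name here is illustrative, not an actual hipBLAS/rocBLAS symbol:

```cpp
#include <cassert>
#include <vector>

// y := alpha*x + beta*y -- the semantics of the requested axpby routine.
// axpby_ref is an illustrative name, not a real hipBLAS/rocBLAS API.
void axpby_ref(int n, float alpha, const float* x, float beta, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = alpha * x[i] + beta * y[i];
}
```

Unlike axpy (y := alpha*x + y), the extra beta scaling means axpby cannot be expressed as a single existing BLAS-1 call.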
The minimal testing case has been attached as igemm_all_in_one.cc.gz.
It can be compiled with:
g++ -I/usr/include/eigen3 -I/opt/rocm/include igemm_all_in_one.cc -Wl,-rpath,/opt/rocm/lib/ -L/opt/rocm/lib/ -lrocblas -lhipblas -lamdhip64 -o igemm_aiw
Hardware | description |
---|---|
GPU | gfx90a |
CPU | AMD EPYC 7542 32-Core Processor |
Software | version |
---|---|
ROCK | modinfo amdgpu \| grep version shows version: 5.13.20.22.10 |
ROCR | v5.1.3 |
HCC | HIP version: 5.1.20532-f592a741 |
Library | v5.1.3 |
I tried to run an example that calls the hipblasHgemm function on a gfx908 GPU. Comparing the half- and single-precision runs below, the half-precision result looks incorrect (the C matrix comes back all zeros). Thanks.
https://github.com/zjin-lcf/HeCBench/tree/master/src/mkl-sgemm-hip
HIP version: 6.0.32831-204d35d16
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.0.2 24012 af27734ed982b52a9f1be0f035ac91726fc697e4)
hipcc -std=c++17 -Wall -O3 gemm.o -o main -lhipblas
./main 79 91 83 1000
Running with half precision real data type:
Average GEMM execution time: 542.991028 (us)
Outputting 2x2 block of A,B,C matrices:
A = [ 0, 1, ...
[ 1, 1, ...
[ ...
B = [ 1, 0, ...
[ 1, 1, ...
[ ...
C = [ 0, 0, ...
[ 0, 0, ...
[ ...
Running with single precision real data type:
Average GEMM execution time: 288.084442 (us)
Outputting 2x2 block of A,B,C matrices:
A = [ 0, 1, ...
[ 1, 1, ...
[ ...
B = [ 1, 0, ...
[ 1, 1, ...
[ ...
C = [ 92, 96, ...
[ 116, 112, ...
[ ...
The read-only matrix arguments of the TRSM functions below should be const-qualified, as they are in cuBLAS. For example, float *AP may be changed to const float *AP.
HIPBLAS_EXPORT hipblasStatus_t hipblasStrsm(hipblasHandle_t handle,
hipblasSideMode_t side,
hipblasFillMode_t uplo,
hipblasOperation_t transA,
hipblasDiagType_t diag,
int m,
int n,
const float* alpha,
float* AP,
int lda,
float* BP,
int ldb);
HIPBLAS_EXPORT hipblasStatus_t hipblasDtrsm(hipblasHandle_t handle,
hipblasSideMode_t side,
hipblasFillMode_t uplo,
hipblasOperation_t transA,
hipblasDiagType_t diag,
int m,
int n,
const double* alpha,
double* AP,
int lda,
double* BP,
int ldb);
cublasStatus_t cublasStrsm(cublasHandle_t handle,
cublasSideMode_t side, cublasFillMode_t uplo,
cublasOperation_t trans, cublasDiagType_t diag,
int m, int n,
const float *alpha,
const float *A, int lda,
float *B, int ldb)
cublasStatus_t cublasDtrsm(cublasHandle_t handle,
cublasSideMode_t side, cublasFillMode_t uplo,
cublasOperation_t trans, cublasDiagType_t diag,
int m, int n,
const double *alpha,
const double *A, int lda,
double *B, int ldb)
cublasStatus_t cublasCtrsm(cublasHandle_t handle,
cublasSideMode_t side, cublasFillMode_t uplo,
cublasOperation_t trans, cublasDiagType_t diag,
int m, int n,
const cuComplex *alpha,
const cuComplex *A, int lda,
cuComplex *B, int ldb)
cublasStatus_t cublasZtrsm(cublasHandle_t handle,
cublasSideMode_t side, cublasFillMode_t uplo,
cublasOperation_t trans, cublasDiagType_t diag,
int m, int n,
const cuDoubleComplex *alpha,
const cuDoubleComplex *A, int lda,
cuDoubleComplex *B, int ldb)
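A small illustration of why the const qualifier matters for callers (a mock, not the real API): with a non-const AP parameter, code that holds the triangular factor as const data must const_cast to call TRSM at all.

```cpp
#include <cassert>
#include <vector>

// Mock with the current hipBLAS-style TRSM signature: the triangular
// factor AP is declared non-const even though it is only read.
// (Toy diagonal "solve"; names are illustrative, not library code.)
void trsm_like(int n, float* AP, float* BP) {
    for (int i = 0; i < n; ++i)
        BP[i] /= AP[i];
}

// A caller holding its matrix as const data is forced to const_cast,
// which is exactly what a const float* AP parameter would avoid.
float caller_with_const_matrix() {
    const std::vector<float> A{2.0f, 4.0f};
    std::vector<float> B{6.0f, 8.0f};
    trsm_like(2, const_cast<float*>(A.data()), B.data());
    return B[0];  // 6 / 2
}
```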
Background
“Error: There is no device can be used to do the computation” while executing MXNet tests on the HIP/CUDA (NVCC) path after integrating hipBLAS
Issue
Runtime issues when integrating hipBLAS into the MXNet library: “error: There is no device can be used to do the computation”. (Test cases terminate after throwing this error.)
A sample application that reproduces the above error is attached:
hipblas_test.cpp.zip
Steps to reproduce
Analysis
Although the error indicates the absence of a device, the GPU is in fact detected. An image of the nvidia-smi log is attached.
The hipBLAS TRMM functions hipblasStrmm, hipblasDtrmm, hipblasCtrmm, and hipblasZtrmm match neither the cuBLAS TRMM functions nor the cuBLAS TRMM _v2 functions.
For instance:
cublasStrmm:
void CUBLASWINAPI cublasStrmm(char side,
char uplo,
char transa,
char diag,
int m,
int n,
float alpha,
const float* A,
int lda,
float* B,
int ldb);
cublasStrmm_v2:
CUBLASAPI cublasStatus_t CUBLASWINAPI cublasStrmm_v2(cublasHandle_t handle,
cublasSideMode_t side,
cublasFillMode_t uplo,
cublasOperation_t trans,
cublasDiagType_t diag,
int m,
int n,
const float* alpha, /* host or device pointer */
const float* A,
int lda,
const float* B,
int ldb,
float* C,
int ldc);
hipblasStrmm:
HIPBLAS_EXPORT hipblasStatus_t hipblasStrmm(hipblasHandle_t handle,
hipblasSideMode_t side,
hipblasFillMode_t uplo,
hipblasOperation_t transA,
hipblasDiagType_t diag,
int m,
int n,
const float* alpha,
const float* AP,
int lda,
float* BP,
int ldb);
The same goes for the rocBLAS analogues rocblas_strmm, rocblas_dtrmm, rocblas_ctrmm, and rocblas_ztrmm (ROCm/rocBLAS#1265).
So, the above 4 hipBLAS and 4 rocBLAS functions are marked as HIP UNSUPPORTED.
[Solution]
Since hipBLAS doesn't support v1 BLAS functions, extend the hipBLAS TRMM functions with the two missing arguments, float* C and int ldc, and revise the functions' logic accordingly.
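To illustrate the semantic difference at stake (a toy 2x2 model, not the library implementation): the legacy-style TRMM overwrites B in place, while the _v2-style interface writes alpha*op(A)*B into a separate output matrix C.

```cpp
#include <array>
#include <cassert>

using Mat2 = std::array<float, 4>;  // row-major 2x2 matrix

// Out-of-place TRMM semantics (_v2 style): C = alpha * A * B,
// with A lower triangular. Toy model, not the library implementation.
Mat2 trmm_out_of_place(float alpha, const Mat2& A, const Mat2& B) {
    Mat2 C{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            float s = 0.0f;
            for (int k = 0; k <= i; ++k)  // only the lower triangle of A
                s += A[i * 2 + k] * B[k * 2 + j];
            C[i * 2 + j] = alpha * s;
        }
    return C;
}

// Legacy (v1) semantics: the result replaces B.
void trmm_in_place(float alpha, const Mat2& A, Mat2& B) {
    B = trmm_out_of_place(alpha, A, B);
}
```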
hipblas.h and hip/hip_complex.h (which provides hipFloatComplex & hipDoubleComplex) cannot be included together in any project:
/// test.cpp
#include <hipblas.h>
#include <hip/hip_complex.h>
Compiling with hipcc test.cpp errors out.
Software | version |
---|---|
HCC | 3.0.19493-75ea952e-40756364719e |
Why does the hipBLAS library define complex numbers in a way which is incompatible with the rest of HIP?
There are far too many errors for even an example program. The first error is:
/opt/rocm/include/hipblas.h:53:36: error: typedef redefinition with different types ('hip_complex_number<float>' vs 'hipFloatComplex' (aka 'HIP_vector_type<float, 2>'))
typedef hip_complex_number<float> hipComplex;
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:269:25: note: previous definition is here
typedef hipFloatComplex hipComplex;
^
It seems the lines are: https://github.com/ROCmSoftwarePlatform/hipBLAS/tree/develop/library/include#L53
Is there a reason why hipBLAS could not use the HIP standard complex number definition? Or is there some workaround to be able to use both at the same time?
This is analogous to the already-implemented hipblasIdamax.
The user should have control over which platform to build for, since you can't build for both at once.
There is no way to specify that the build is meant for HCC: if you have CUDA installed at all, for whatever reason, the build tries to enable it, and if you were trying to build with HCC, the build fails.
To reproduce: try to build using HCC, for ROC, while also having CUDA installed on your system.
Observed on Arch Linux whilst converting the scripts from Experimental ROC.
I have CUDA installed and bumped into this.
I'm making a PR with what seems to be the simplest way to do this: I added a TRY_CUDA option. If it is OFF, the find_package call is skipped. By default it is ON, so the legacy behaviour is maintained.
But really, I feel that making this choice on the back of a find_package call (and a find module that has already been deprecated anyway) is the wrong way to do it. The correct fix would be a somewhat larger change:
set(HIPBLAS_BACKEND "hcc" CACHE STRING "Backend the hipBLAS build should use")
set_property(CACHE HIPBLAS_BACKEND PROPERTY STRINGS "hcc;cuda")
And then derive all the choices in the inner branches of the build tree by querying this variable.
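A sketch of how that cache variable could drive the rest of the build (illustrative CMake, assuming the variable name proposed above):

```cmake
set(HIPBLAS_BACKEND "hcc" CACHE STRING "Backend the hipBLAS build should use")
set_property(CACHE HIPBLAS_BACKEND PROPERTY STRINGS "hcc;cuda")

if(HIPBLAS_BACKEND STREQUAL "cuda")
  find_package(CUDA REQUIRED)   # only probed when explicitly requested
else()
  # HCC/ROCm-specific configuration; CUDA is never searched for,
  # so a stray CUDA install can no longer derail an HCC build.
endif()
```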
Does the BLAS function (e.g. hipblasIdamax) support a 64-bit integer interface? Thanks.
https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2023-0/iamax.html
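For context on why a 64-bit interface matters (a sketch of the constraint, not library code): with the classic 32-bit int interface, vector lengths, strides, and the index returned by i?amax are all capped at INT32_MAX elements.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// With a 32-bit int interface, any problem dimension beyond
// INT32_MAX (2^31 - 1) cannot be expressed at all.
bool fits_in_int32(std::int64_t n) {
    return n <= std::numeric_limits<std::int32_t>::max();
}
```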
The very last link of the README is broken.
I measured the performance of hipblasHgemm using _Float16 on an MI250. Here is what I got:
N=K=M=8192: MI250: ~38 TFLOP/s. Is that right, or did I miss anything?
https://www.amd.com/en/products/server-accelerators/instinct-mi250x says "Peak Half Precision (FP16) Performance is 383 TFLOPs".
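As a sanity check on the arithmetic (a back-of-the-envelope helper, not a library routine): a GEMM performs roughly 2*M*N*K flops, so the achieved rate follows directly from the timing. Note also that the quoted 383 TFLOPs figure is a theoretical peak, and the linked page is for the MI250X rather than the MI250, so some gap to measured hipblasHgemm throughput is expected.

```cpp
#include <cassert>
#include <cmath>

// Achieved GEMM rate: ~2*M*N*K flops per call, so
// TFLOP/s = 2*M*N*K / (seconds * 1e12).
double gemm_tflops(double m, double n, double k, double seconds) {
    return 2.0 * m * n * k / seconds / 1e12;
}
```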
Since I can't see a better place to report this issue:
In rocm-4.3.0, the hipblas package contained the library, headers, and CMake config files. Example:
# dpkg-query -L hipblas
/opt
/opt/rocm-4.3.0
/opt/rocm-4.3.0/hipblas
/opt/rocm-4.3.0/hipblas/include
/opt/rocm-4.3.0/hipblas/include/exceptions.hpp
/opt/rocm-4.3.0/hipblas/include/hipblas-export.h
/opt/rocm-4.3.0/hipblas/include/hipblas-version.h
/opt/rocm-4.3.0/hipblas/include/hipblas.h
/opt/rocm-4.3.0/hipblas/include/hipblas_module.f90
/opt/rocm-4.3.0/hipblas/lib
/opt/rocm-4.3.0/hipblas/lib/cmake
/opt/rocm-4.3.0/hipblas/lib/cmake/hipblas
/opt/rocm-4.3.0/hipblas/lib/cmake/hipblas/hipblas-config-version.cmake
/opt/rocm-4.3.0/hipblas/lib/cmake/hipblas/hipblas-config.cmake
/opt/rocm-4.3.0/hipblas/lib/cmake/hipblas/hipblas-targets-release.cmake
/opt/rocm-4.3.0/hipblas/lib/cmake/hipblas/hipblas-targets.cmake
/opt/rocm-4.3.0/hipblas/lib/libhipblas.so.0.1.40300
/opt/rocm-4.3.0/include
/opt/rocm-4.3.0/lib
/opt/rocm-4.3.0/lib/cmake
/opt/rocm-4.3.0/lib/cmake/hipblas
/opt/rocm-4.3.0/hipblas/lib/libhipblas.so
/opt/rocm-4.3.0/hipblas/lib/libhipblas.so.0
/opt/rocm-4.3.0/include/exceptions.hpp
/opt/rocm-4.3.0/include/hipblas-export.h
/opt/rocm-4.3.0/include/hipblas-version.h
/opt/rocm-4.3.0/include/hipblas.h
/opt/rocm-4.3.0/include/hipblas_module.f90
/opt/rocm-4.3.0/lib/cmake/hipblas/hipblas-config-version.cmake
/opt/rocm-4.3.0/lib/cmake/hipblas/hipblas-config.cmake
/opt/rocm-4.3.0/lib/cmake/hipblas/hipblas-targets-release.cmake
/opt/rocm-4.3.0/lib/cmake/hipblas/hipblas-targets.cmake
/opt/rocm-4.3.0/lib/libhipblas.so
/opt/rocm-4.3.0/lib/libhipblas.so.0
/opt/rocm-4.3.0/lib/libhipblas.so.0.1.40300
In rocm-4.5.0, the hipblas package contains only the library itself, without headers or CMake config files:
# dpkg-query -L hipblas
/opt
/opt/rocm-4.5.0
/opt/rocm-4.5.0/hipblas
/opt/rocm-4.5.0/hipblas/lib
/opt/rocm-4.5.0/hipblas/lib/libhipblas.so.0.1.40500
/opt/rocm-4.5.0/lib
/opt/rocm-4.5.0/hipblas/lib/libhipblas.so.0
/opt/rocm-4.5.0/lib/libhipblas.so.0
/opt/rocm-4.5.0/lib/libhipblas.so.0.1.40500
Install the latest hipblas package from http://repo.radeon.com/rocm/apt/debian/ ubuntu main
Issue: While downstreaming MXNet code we have come across certain additional cuBLAS APIs. What are the equivalent HIP APIs for these?
The list of cuBLAS APIs:
cublasStrmm
cublasDtrmm
cublasSsyrk
cublasDsyrk
cublasGetMathMode
cublasSetMathMode
cublasSgemmEx
cublasGemmEx
The list of cuBLAS variables:
cudaDataType:
CUDA_R_32F
CUDA_R_64F
CUDA_R_16F
CUDA_R_8U
CUDA_R_8I
CUDA_R_32I
cublasMath_t:
CUBLAS_DEFAULT_MATH
CUBLAS_TENSOR_OP_MATH
cublasGemmAlgo_t:
CUBLAS_GEMM_DFALT
#include "rocsolver.h" in hipblas.cpp results in a failing build when configured with -D BUILD_WITH_SOLVER=OFF.
Hardware | description |
---|---|
GPU | device string |
CPU | device string |
Software | version |
---|---|
ROCK | v3.3-19 |
ROCR | v3.3-19 |
HCC | v3.1.20114-6776c83f-1 |
Library | master @ 29b3fb0 |
I suggest adding topics such as rocm, blas, and cuda in the About section at https://github.com/ROCm/hipBLAS.
While installing on NVIDIA, hipBLAS searches for hipConfig.cmake, but hipConfig.cmake is generated only on the HCC platform when installing HIP.
This is not really a bug, but it seems inconsistent: the hipblas .deb package is installed into /usr/include and /usr/lib, whereas most other rocm packages install into /opt/rocm.
The function should work without error; the same function call with the same inputs works on an Nvidia system using cuBLAS. Instead, a rocblas_status_invalid_pointer error is returned.
This code fails on AMD platforms, but runs fine when using any Nvidia card (the HIP version and original CUDA/cuBLAS versions both work).
#include <iostream>
#include <vector>
#include <assert.h>
#include <hip/hip_runtime.h>
#include <hipblas.h>
void getri(hipblasHandle_t *handle, int *n, double *A, int *lda, int *ipiv, double *lwork, int *info);
int main(int argc, char **argv)
{
std::vector<double> const test{0.767135868133925, -0.641484652834663,
0.641484652834663, 0.767135868133926};
std::vector<int> ipiv(2);
std::vector<int> info(10);
std::vector<double> work(4);
int n = 2;
int lda = 4;
double *test_d, *work_d;
int *ipiv_d, *info_d;
// Create device copies of input matrix, ipiv, info, and work buffers
auto success = hipMalloc((void **)&ipiv_d, ipiv.size() * sizeof(int *));
assert(success == hipSuccess);
success = hipMemcpy(ipiv_d, ipiv.data(), ipiv.size() * sizeof(int), hipMemcpyHostToDevice);
assert(success == 0);
success = hipMalloc((void **)&info_d, info.size() * sizeof(int *));
assert(success == hipSuccess);
success = hipMemcpy(info_d, info.data(), info.size() * sizeof(int), hipMemcpyHostToDevice);
assert(success == 0);
success = hipMalloc((void **)&test_d, test.size() * sizeof(double));
assert(success == hipSuccess);
success = hipMemcpy(test_d, test.data(), test.size() * sizeof(double), hipMemcpyHostToDevice);
assert(success == 0);
success = hipMalloc((void **)&work_d, work.size() * sizeof(double));
assert(success == hipSuccess);
success = hipMemcpy(work_d, work.data(), work.size() * sizeof(double), hipMemcpyHostToDevice);
assert(success == 0);
// Make a hipblas handle
hipblasHandle_t handle;
auto hipblas_success = hipblasCreate(&handle);
assert(hipblas_success == HIPBLAS_STATUS_SUCCESS);
// Call hipBlasDgetriBatched
std::cout << "Running getri..\n";
getri(&handle, &n, test_d, &lda, ipiv_d, work_d, info_d);
success = hipDeviceSynchronize();
assert(success == 0);
std::cout << " -- after getri\n";
hipblas_success = hipblasDestroy(handle);
assert(hipblas_success == HIPBLAS_STATUS_SUCCESS);
std::cout << "Done\n";
return 0;
}
void getri(hipblasHandle_t *handle, int *n, double *A, int *lda, int *ipiv, double *lwork, int *info)
{
double **A_d;
double **work_d;
auto stat = hipMalloc((void **)&A_d, sizeof(double *));
assert(stat == 0);
stat = hipMemcpy(A_d, &A, sizeof(double *), hipMemcpyHostToDevice);
assert(stat == 0);
stat = hipMalloc((void **)&work_d, sizeof(double *));
assert(stat == 0);
stat = hipMemcpy(work_d, &lwork, sizeof(double *), hipMemcpyHostToDevice);
assert(stat == 0);
auto const success = hipblasDgetriBatched(*handle, *n, A_d, *lda, nullptr, work_d, *n, info, 1);
assert(success == 0);
}
Hardware | description |
---|---|
GPU | gfx906 (Vega 20) |
CPU | AMD EPYC 7452 |
System is using ROCM 4.2. See attached rocminfo.txt.
rocm_smi --showdriverversion
gives Driver version: 5.9.25
Running a CUDA program shows that cublasGemmEx supports the compute type CUBLAS_COMPUTE_32F_FAST_TF32 with the algorithm CUBLAS_GEMM_DEFAULT_TENSOR_OP. This compute type is not available in hipBLAS. Thank you for your discussion.
status = cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, B_cols, A_rows, A_cols, &alpha, gpu_B, CUDA_R_32F, A_rows, gpu_A, CUDA_R_32F,
A_cols, &beta, gpu_C, CUDA_R_32F, A_rows, CUBLAS_COMPUTE_32F_FAST_TF32, CUBLAS_GEMM_DEFAULT_TENSOR_OP);
$ sudo apt update && sudo apt install hipblas
Hit:2 http://repo.radeon.com/rocm/apt/debian xenial InRelease
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading package lists... Done
Building dependency tree
Reading state information... Done
hipblas is already the newest version (0.36.0.0-rocm-rel-3.9-17-e4d9e7b).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
$ apt search rocsolver
Sorting... Done
Full Text Search... Done
rocsolver/Ubuntu 16.04 3.9.0.0-c2cd214 amd64
Radeon Open Compute SOLVER library
rocsolver3.9.0/Ubuntu 16.04 3.9.0.0-c2cd214 amd64
Radeon Open Compute SOLVER library
/usr/bin/ld: warning: librocsolver.so.0, needed by /opt/rocm/lib/libhipblas.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetrs'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetrs'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgeqrf_ptr_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetrs_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetrs_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgeqrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetrs'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgeqrf_ptr_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetrf_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetrs_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgeqrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgeqrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgeqrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgeqrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetrs'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetrf_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetri_outofplace_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgeqrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgeqrf_ptr_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgeqrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetrf_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetri_outofplace_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetrs_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetrs_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetri_outofplace_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetrs_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetrs_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgeqrf_ptr_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_cgetrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetrf'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgeqrf_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_dgetri_outofplace_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_sgetrs_strided_batched'
/usr/bin/ld: /opt/rocm/lib/libhipblas.so: undefined reference to `rocsolver_zgetrf_batched'
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
#include <hipblas.h>
int main()
{
hipblasHandle_t h;
int order{1};
double alpha{0};
double A{1}, B{1}, C{1};
hipblasDgemm(h, HIPBLAS_OP_N, HIPBLAS_OP_N, order, order, order, &alpha, &A, order, &B, order, &alpha, &C, order);
return 0;
}
$ hipcc bug.cc -L/opt/rocm/lib -lrocblas -L/opt/rocm/lib -lhipblas
/usr/bin/ld: warning: librocsolver.so.0, needed by /opt/rocm/lib/libhipblas.so, not found (try using -rpath or -rpath-link)
(… the same rocsolver_* undefined references as listed above …)
clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
jrhammon@jrhammon-nuc:~/PRK/Cxx11$ rocminfo
ROCk module is loaded
Able to open /dev/kfd read-write
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Core(TM) i7-8809G CPU @ 3.10GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 8300
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32803976(0x1f48c88) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32803976(0x1f48c88) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A
*******
Agent 2
*******
Name: gfx803
Uuid: GPU-XX
Marketing Name: Polaris 22 XT [Radeon RX Vega M GH]
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26956(0x694c)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1190
BDFID: 256
Internal Node ID: 1
Compute Unit: 24
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304(0x400000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx803
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
$ hipcc --version
HIP version: 3.9.20412-6d111f85
clang version 12.0.0 (/src/external/llvm-project/clang 60f39e2924d51c1e8606f2135f95e9047fb1da5d)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-3.9.0/llvm/bin
The following packages will be upgraded:
hipblas
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
4 not fully installed or removed.
Need to get 0 B/62.1 kB of archives.
After this operation, 270 kB of additional disk space will be used.
Do you want to continue? [Y/n]
(Reading database ... 1092339 files and directories currently installed.)
Preparing to unpack .../hipblas_0.6.0.8_amd64.deb ...
Unpacking hipblas (0.6.0.8) over (0.5.2.0) ...
dpkg: error processing archive /var/cache/apt/archives/hipblas_0.6.0.8_amd64.deb (--unpack):
unable to open '/opt/rocm/hipblas/lib/cmake/hipblas/hipblas-config-version.cmake.dpkg-new': No such file or directory
abort-upgrade
Errors were encountered while processing:
/var/cache/apt/archives/hipblas_0.6.0.8_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
Linux mint 18.2
find_package(hipblas) should work. Instead, hipblas can no longer be found by CMake on an NVIDIA platform:
CMake Error at /opt/cmake-3.21.3-linux-x86_64/share/cmake-3.21/Modules/CMakeFindDependencyMacro.cmake:47 (find_package):
By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "hip", but
CMake did not find one.
Could not find a package configuration file provided by "hip" with any of
the following names:
hipConfig.cmake
hip-config.cmake
Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
to a directory containing one of the above files. If "hip" provides a
separate development package or SDK, be sure it has been installed.
Call Stack (most recent call first):
/opt/rocm/hipblas/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
CMakeLists.txt:239 (find_package)
Workaround: change DEPENDS PACKAGE hip to DEPENDS PACKAGE HIP. Then, after recompiling and installing hipBLAS (./install.sh -i --cuda), the following works as expected:
find_package(HIP REQUIRED)
find_package(hipblas REQUIRED)
/opt/rocm/hipblas/lib/cmake/hipblas/hipblas-config.cmake:90 shows find_dependency(hip), whereas find_dependency(HIP) works fine (see the workaround above).
Environment: Ubuntu 20.04 with CUDA 11.5, CMake 3.21.3. HIP 4.5 installed using debian packages; hipBLAS 4.5 compiled with --cuda.
Would a mixed-precision "dot" be added to the library? In the mixed-precision version (inputs are float while the result is double), the dot product is computed in double precision.
Reference:
https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2023-0/dot.html
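The reference semantics of the requested routine (per the oneMKL "dot" doc linked above) can be sketched on the host; the function name is illustrative, not a proposed hipBLAS symbol:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mixed-precision dot product: float inputs, double result,
// with accumulation carried out in double precision throughout.
double dot_mixed_ref(const std::vector<float>& x, const std::vector<float>& y) {
    double acc = 0.0;  // double-precision accumulator
    for (std::size_t i = 0; i < x.size(); ++i)
        acc += static_cast<double>(x[i]) * static_cast<double>(y[i]);
    return acc;
}
```

The point of the mixed version is the accumulator: summing many float products in a double avoids most of the cancellation and rounding error of a pure-float reduction.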
Hi, first I built HIP from source and changed HIP_PATH; then, when I run cmake and make for hipBLAS, it shows the following error:
CMake Error: INSTALL(EXPORT) given unknown export "hipblas-targets"
I also tried to correct the CMakeLists.txt, but that didn't work either.
So, is there a simple description of how to build hipBLAS on the nvcc platform without installing ROCm?
hipBLAS has no equivalents for the integer datatypes, although they are supported in both rocBLAS and cuBLAS.
List of rocBLAS datatypes not supported in hipBLAS:
rocblas_datatype_i8_c
rocblas_datatype_u8_r
rocblas_datatype_u8_c
rocblas_datatype_i32_r
rocblas_datatype_i32_c
rocblas_datatype_u32_r
rocblas_datatype_u32_c
List of cuBLAS datatypes:
CUDA_R_8I = 3,
CUDA_C_8I = 7,
CUDA_R_8U = 8,
CUDA_C_8U = 9,
CUDA_R_32I= 10,
CUDA_C_32I= 11,
CUDA_R_32U= 12,
CUDA_C_32U= 13
whereas hipBLAS only supports floating-point datatypes, as below:
enum hipblasDatatype_t
{
HIPBLAS_R_16F = 150,
HIPBLAS_R_32F = 151,
HIPBLAS_R_64F = 152,
HIPBLAS_C_16F = 153,
HIPBLAS_C_32F = 154,
HIPBLAS_C_64F = 155,
};
running develop
running egg_info
writing hipmat.egg-info/PKG-INFO
writing top-level names to hipmat.egg-info/top_level.txt
writing dependency_links to hipmat.egg-info/dependency_links.txt
reading manifest file 'hipmat.egg-info/SOURCES.txt'
writing manifest file 'hipmat.egg-info/SOURCES.txt'
running build_ext
building 'hipmat.libhipmat' extension
/opt/rocm/hip/bin/hipcc -I/opt/rocm/hipblas/include -I/opt/rocm/hip/include/hip/ -I/usr/include/python2.7 -c hipmat/hipmat.cpp -o build/temp.linux-x86_64-2.7/hipmat/hipmat.o -O -fPIC
In file included from hipmat/hipmat.cpp:5:
In file included from /opt/rocm/hipblas/include/hipblas.h:17:
In file included from /opt/rocm/hip/include/hip/hip_complex.h:29:
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:107:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, signed short)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:108:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, unsigned int)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:109:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, signed int)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:110:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, double)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:111:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, unsigned long)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:112:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, signed long)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:113:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, unsigned long long)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:114:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, signed long long)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:106:5: error: C++ requires a type specifier for all declarations
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, unsigned short)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:114:80: error: expected ';' at end of declaration list
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipFloatComplex, signed long long)
^
;
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:103:45: error: member initializer 'x' does not name a non-static data member or base class
__device__ __host__ hipFloatComplex() : x(0.0f), y(0.0f) {}
^~~~~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:103:54: error: member initializer 'y' does not name a non-static data member or base class
__device__ __host__ hipFloatComplex() : x(0.0f), y(0.0f) {}
^~~~~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:104:52: error: member initializer 'x' does not name a non-static data member or base class
__device__ __host__ hipFloatComplex(float x) : x(x), y(0.0f) {}
^~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:104:58: error: member initializer 'y' does not name a non-static data member or base class
__device__ __host__ hipFloatComplex(float x) : x(x), y(0.0f) {}
^~~~~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:105:61: error: member initializer 'x' does not name a non-static data member or base class
__device__ __host__ hipFloatComplex(float x, float y) : x(x), y(y) {}
^~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:105:67: error: member initializer 'y' does not name a non-static data member or base class
__device__ __host__ hipFloatComplex(float x, float y) : x(x), y(y) {}
^~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:126:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipDoubleComplex, signed short)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:127:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipDoubleComplex, unsigned int)
^
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:128:5: error: expected 'restrict' specifier
MAKE_COMPONENT_CONSTRUCTOR_TWO_COMPONENT(hipDoubleComplex, signed int)
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Died at /opt/rocm/hip/bin/hipcc line 565.
error: command '/opt/rocm/hip/bin/hipcc' failed with exit status 1
Hardware | description |
---|---|
GPU | Radeon R9 Fury Nano |
CPU | Intel i5-4440 |
There is an extra space between the last @ and the string in this hipblas-version.h.in.
I think this prevents CMake from substituting the version correctly.
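For context, CMake's `configure_file()` only substitutes tokens of the exact form `@VAR@`; the variable names below are illustrative, not the actual ones in the template. A stray space inside the marker means the token is no longer recognized and is left verbatim:

```c
/* hipblas-version.h.in (variable names illustrative) */
#define hipblasVersionMajor @hipblas_VERSION_MAJOR@   /* substituted by configure_file() */
#define hipblasVersionMinor @hipblas_VERSION_MINOR @  /* trailing space: not a valid @VAR@
                                                         token, so CMake leaves it as-is */
```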
I'm building using CMake directly[1] for CUDA backend:
# On branch release/rocm-rel-5.5
mkdir build && cd build
cmake -DUSE_CUDA=ON -DHIP_ROOT_DIR=/opt/rocm ..
make
Scanning dependencies of target hipblas_fortran
[ 20%] Building Fortran object library/src/CMakeFiles/hipblas_fortran.dir/hipblas_module.f90.o
[ 40%] Linking Fortran shared library libhipblas_fortran.so
[ 40%] Built target hipblas_fortran
[ 60%] Building CXX object library/src/CMakeFiles/hipblas.dir/nvidia_detail/hipblas.cpp.o
In file included from /home/torrance/hipBLAS/library/src/nvidia_detail/hipblas.cpp:27:
/usr/local/cuda/include/cublas_v2.h:59:2: error: #error "It is an error to include both cublas.h and cublas_v2.h"
59 | #error "It is an error to include both cublas.h and cublas_v2.h"
| ^~~~~
I can compile with the following changes, though I don't know if this is properly functional or not:
diff --git a/library/src/nvidia_detail/hipblas.cpp b/library/src/nvidia_detail/hipblas.cpp
index b60a31d..08a6257 100644
--- a/library/src/nvidia_detail/hipblas.cpp
+++ b/library/src/nvidia_detail/hipblas.cpp
@@ -23,7 +23,7 @@
#include "hipblas.h"
#include "exceptions.hpp"
-#include <cublas.h>
+//#include <cublas.h>
#include <cublas_v2.h>
#include <cuda_runtime_api.h>
#include <hip/hip_runtime.h>
@@ -350,7 +350,7 @@ hipblasStatus_t hipblasGetPointerMode(hipblasHandle_t handle, hipblasPointerMode
try
{
cublasPointerMode_t cublasMode;
- cublasStatus status = cublasGetPointerMode((cublasHandle_t)handle, &cublasMode);
+ cublasStatus_t status = cublasGetPointerMode((cublasHandle_t)handle, &cublasMode);
*mode = CudaPointerModeToHIPPointerMode(cublasMode);
return hipCUBLASStatusToHIPStatus(status);
}
[1] i.e. not using the install.sh script, though the error is obviously present in both. Honestly, though: why this install script, when the rest of the HIP ecosystem uses CMake? And why default to building a .deb package?
Hardware | description |
---|---|
GPU | Tesla T4 |
CPU | Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz |
Software | version |
---|---|
HIP | v5.5 |
CUDA | 12.1 |
I would expect the performance of GEMM operations to be comparable to cuBLAS GEMM operations.
GEMM on floats is extremely slow.
See the attached test case. Compile it with -D CUDA and run on an NVIDIA GPU, then recompile with -D HIP and run on an AMD GPU, and compare the results.
A Linux desktop with Radeon PRO W6800 and a Linux desktop with GeForce RTX 2060 SUPER.
Hardware | description |
---|---|
GPU | Radeon PRO W6800 (19:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 GL-XL) |
CPU | device string |
Software | version |
---|---|
OS | Ubuntu 20.04.5 LTS (Focal Fossa) |
ROCK | Runtime Version: 1.1 |
ROCR | v0.0 |
HCC | HIP version: 5.2.21153-02187ecf |
Library | v0.0 |
I ran it on my two desktops -- one with Radeon W6800 and the other with GeForce RTX 2060. See attached GemmTest.cpp and here are the results:
GeForce RTX 2060:
size 512 average 8.85824e-05 s
size 1024 average 0.000308755 s
size 2048 average 0.00214913 s
size 4096 average 0.0187068 s
size 8192 average 0.153399 s
size 16384 average 1.2847 s
Radeon W6800:
size 512 average 0.262805 s
size 1024 average 0.0126241 s
size 2048 average 0.118601 s
size 4096 average 0.794954 s
size 8192 average 3.916 s
size 16384 average 26.0589 s
#include <unistd.h>
#include <iostream>
#include <stdlib.h>
#include <assert.h>
#if defined(CUDA)
#include <cuda_runtime.h>
#include <cublas_v2.h>
#define GPUBLAS_OP_N CUBLAS_OP_N
typedef cudaError_t gpuError_t;
typedef cudaEvent_t gpuEvent_t;
typedef cudaStream_t gpuStream_t;
typedef cublasHandle_t gpuBlasHandle_t;
typedef cublasStatus_t gpuBlasStatus_t;
const gpuBlasStatus_t gpuBlasSuccess = CUBLAS_STATUS_SUCCESS;
const gpuError_t gpuSuccess = cudaSuccess;
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
gpuError_t gpuMemcpy(void* dest, void* src, size_t size, cudaMemcpyKind flags) {
return cudaMemcpy(dest, src, size, flags);
}
gpuError_t gpuMallocManaged(void** ret, size_t size) {
return cudaMallocManaged(ret, size);
}
gpuError_t gpuEventCreate(gpuEvent_t* pevent) {
return cudaEventCreate(pevent);
}
gpuError_t gpuEventRecord(gpuEvent_t event, gpuStream_t stream) {
return cudaEventRecord(event, stream);
}
gpuError_t gpuEventSynchronize(gpuEvent_t event) {
return cudaEventSynchronize(event);
}
gpuError_t gpuGetLastError() {
return cudaGetLastError();
}
gpuError_t gpuEventElapsedTime(float* t, gpuEvent_t start, gpuEvent_t stop) {
return cudaEventElapsedTime(t, start, stop);
}
gpuError_t gpuFree(void* p) {
return cudaFree(p);
}
gpuBlasStatus_t gpuBlasCreate(gpuBlasHandle_t* phandle) {
return cublasCreate(phandle);
}
gpuBlasStatus_t gpuBlasSgemm(gpuBlasHandle_t handle, cublasOperation_t ta, cublasOperation_t tb, int m, int n, int k, const float* alpha, const float* a, int lda, const float* b, int ldb, const float* beta, float* c, int ldc) {
return cublasSgemm(handle, ta, tb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
}
#elif defined(HIP)
#include <hip/hip_runtime.h>
#include <hip/hip_runtime_api.h>
#include <hipblas/hipblas.h>
#include <hiprand/hiprand.h>
#define GPUBLAS_OP_N HIPBLAS_OP_N
typedef hipError_t gpuError_t;
typedef hipEvent_t gpuEvent_t;
typedef hipStream_t gpuStream_t;
typedef hipblasHandle_t gpuBlasHandle_t;
typedef hipblasStatus_t gpuBlasStatus_t;
const gpuBlasStatus_t gpuBlasSuccess = HIPBLAS_STATUS_SUCCESS;
const gpuError_t gpuSuccess = hipSuccess;
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
gpuError_t gpuMemcpy(void* dest, void* src, size_t size, hipMemcpyKind flags) {
return hipMemcpy(dest, src, size, flags);
}
gpuError_t gpuMallocManaged(void** ret, size_t size) {
return hipMallocManaged(ret, size);
}
gpuError_t gpuEventCreate(gpuEvent_t* pevent) {
return hipEventCreate(pevent);
}
gpuError_t gpuEventRecord(gpuEvent_t event, hipStream_t stream) {
return hipEventRecord(event, stream);
}
gpuError_t gpuEventSynchronize(gpuEvent_t event) {
return hipEventSynchronize(event);
}
gpuError_t gpuGetLastError() {
return hipGetLastError();
}
gpuError_t gpuEventElapsedTime(float* t, gpuEvent_t start, gpuEvent_t stop) {
return hipEventElapsedTime(t, start, stop);
}
gpuError_t gpuFree(void* p) {
return hipFree(p);
}
gpuBlasStatus_t gpuBlasCreate(gpuBlasHandle_t* phandle) {
return hipblasCreate(phandle);
}
gpuBlasStatus_t gpuBlasSgemm(gpuBlasHandle_t handle, hipblasOperation_t ta, hipblasOperation_t tb, int m, int n, int k, const float* alpha, const float* a, int lda, const float* b, int ldb, const float* beta, float* c, int ldc) {
return hipblasSgemm(handle, ta, tb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
}
#else
#error "Specify GPU type"
#endif
void initRand(float* p, int rows, int cols) {
int a = 1;
for (int i = 0; i < rows * cols; i++) {
p[i] = (float) rand() / (float) (RAND_MAX / a);
}
}
int main(int argc, char* argv[]) {
int status, lower, upper, num, reps, verbose;
lower = 256;
upper = 8192;
num = 25000;
reps = 5;
verbose = 0;
while ((status = getopt(argc, argv, "l:u:n:r:v")) != -1) {
switch (status) {
case 'l':
lower = strtoul(optarg, 0, 0);
break;
case 'u':
upper = strtoul(optarg, 0, 0);
break;
case 'n':
num = strtoul(optarg, 0, 0);
break;
case 'r':
reps = strtoul(optarg, 0, 0);
break;
case 'v':
verbose = 1; // 'v' takes no argument in the optstring, so optarg would be NULL here
break;
default:
std::cerr << "invalid argument: " << status << std::endl;
exit(1);
}
}
if (verbose) {
std::cout << "Running with" << " lower: " << lower << " upper: " << upper << " num: " << num << " reps: " << reps << std::endl;
}
if (verbose) {
std::cout << "initializing inputs" << std::endl;
}
gpuBlasHandle_t handle;
gpuBlasStatus_t blasStatus;
blasStatus = gpuBlasCreate(&handle);
if (blasStatus != gpuBlasSuccess) {
std::cerr << "Could not create blas handle " << blasStatus << std::endl;
exit(1);
}
float* A = (float*) calloc(1, upper * upper * sizeof(float));
float* B = (float*) calloc(1, upper * upper * sizeof(float));
float* C = (float*) calloc(1, upper * upper * sizeof(float));
initRand(A, upper, upper);
initRand(B, upper, upper);
initRand(C, upper, upper);
float* dA, *dB, *dC;
gpuError_t err;
int lda, ldb, ldc, m, n, k;
float alpha = 1.0f, beta = 0.0f;
err = gpuMallocManaged((void**) &dA, upper * upper * sizeof(float));
if (err != gpuSuccess) {
std::cerr << "Could not allocate GPU memory; size: " << upper * upper * sizeof(float) << std::endl;
}
err = gpuMallocManaged((void**) &dB, upper * upper * sizeof(float));
if (err != gpuSuccess) {
std::cerr << "Could not allocate GPU memory; size: " << upper * upper * sizeof(float) << std::endl;
}
err = gpuMallocManaged((void**) &dC, upper * upper * sizeof(float));
if (err != gpuSuccess) {
std::cerr << "Could not allocate GPU memory; size: " << upper * upper * sizeof(float) << std::endl;
}
err = gpuMemcpy(dA, A, upper * upper * sizeof(float), gpuMemcpyHostToDevice);
if (err != gpuSuccess) {
std::cerr << "Could not copy to GPU memory; size: " << upper * upper * sizeof(float) << std::endl;
}
err = gpuMemcpy(dB, B, upper * upper * sizeof(float), gpuMemcpyHostToDevice);
if (err != gpuSuccess) {
std::cerr << "Could not copy to GPU memory; size: " << upper * upper * sizeof(float) << std::endl;
}
err = gpuMemcpy(dC, C, upper * upper * sizeof(float), gpuMemcpyHostToDevice);
if (err != gpuSuccess) {
std::cerr << "Could not copy to GPU memory; size: " << upper * upper * sizeof(float) << std::endl;
}
gpuEvent_t start, stop;
gpuEventCreate(&start);
gpuEventCreate(&stop);
for (int s = lower; s <= upper; s = s * 2) {
double sum = 0.0;
for (int r = 0; r < reps; r++) {
gpuEventRecord(start, 0);
m = n = k = s;
lda = m; ldb = k; ldc = m;
blasStatus = gpuBlasSgemm(handle, GPUBLAS_OP_N, GPUBLAS_OP_N, m, n, k, &alpha, dA, lda, dB, ldb, &beta, dC, ldc);
gpuEventRecord(stop, 0);
gpuEventSynchronize(stop);
if (blasStatus != gpuBlasSuccess) {
std::cerr << "gpuBlasSgemm failed: " << blasStatus << std::endl;
exit(1);
}
err = gpuGetLastError();
if (err != gpuSuccess) {
std::cerr << "gpu error: " << err << std::endl;
exit(1);
}
float elapsed;
gpuEventElapsedTime(&elapsed, start, stop);
elapsed /= 1000.0f;
sum += elapsed;
}
std::cout << "size " << s << " average " << sum / reps << " s " << std::endl;
}
gpuFree(dA); gpuFree(dB); gpuFree(dC);
free(A); free(B); free(C);
}
hipDataType from the HIP API should be used instead of hipblasDatatype_t. hipDataType should be populated with the corresponding hipblasDatatype_t elements, e.g. HIPBLAS_R_8I -> HIP_R_8I. hipblasDatatype_t should then be deleted from the hipBLAS API, or might be left as a typedef for hipDataType, as is done for cublasDataType_t in the cuBLAS API: typedef cudaDataType cublasDataType_t;
[Reasons]
hipblasDatatype_t, being actually an analogue of cudaDataType, is a type common to all HIP libraries, not only hipBLAS, and it might be used in any HIP application. Hipifying cudaDataType to hipblasDatatype_t is not an option, because it would require adding a corresponding #include to hipblas.h, which is definitely incorrect.
Software
hipcc --version HIP version: 4.3.0
compile command: hipcc fp16gemm.hip -o fp16_gemm -lhipblas -L/opt/rocm-4.3.0/hipblas/lib
execute: ./fp16_gemm
Output is like this:
The error analysis may have some problems (I am still working on it), but the reported run time of the fp16 GEMM (Hgemm) is too short to be real.
Hardware | description |
---|---|
GPU | MI100 |
CPU | AMD EPYC 7302 16-Core Processor |
Software | version |
---|---|
ROCm | v4.3.0 |
HipCC | v4.3 |
RocBlas | v4.3 |
The problem is with the penultimate argument, hipblasDatatype_t computeType
, which doesn't match cublasComputeType_t computeType
. cublasComputeType_t
appeared with CUDA 11.0. cublasGemmEx
used cudaDataType
instead of cublasComputeType_t
for its penultimate argument from CUDA 8.0 until CUDA 11.0.
HIPBLAS_EXPORT hipblasStatus_t hipblasGemmEx(hipblasHandle_t handle,
hipblasOperation_t transA,
hipblasOperation_t transB,
int m,
int n,
int k,
const void* alpha,
const void* A,
hipblasDatatype_t aType,
int lda,
const void* B,
hipblasDatatype_t bType,
int ldb,
const void* beta,
void* C,
hipblasDatatype_t cType,
int ldc,
hipblasDatatype_t computeType,
hipblasGemmAlgo_t algo);
CUBLASAPI cublasStatus_t CUBLASWINAPI cublasGemmEx(cublasHandle_t handle,
cublasOperation_t transa,
cublasOperation_t transb,
int m,
int n,
int k,
const void* alpha, /* host or device pointer */
const void* A,
cudaDataType Atype,
int lda,
const void* B,
cudaDataType Btype,
int ldb,
const void* beta, /* host or device pointer */
void* C,
cudaDataType Ctype,
int ldc,
cublasComputeType_t computeType,
cublasGemmAlgo_t algo);
typedef enum
{
HIPBLAS_R_16F = 150, /**< 16 bit floating point, real */
HIPBLAS_R_32F = 151, /**< 32 bit floating point, real */
HIPBLAS_R_64F = 152, /**< 64 bit floating point, real */
HIPBLAS_C_16F = 153, /**< 16 bit floating point, complex */
HIPBLAS_C_32F = 154, /**< 32 bit floating point, complex */
HIPBLAS_C_64F = 155, /**< 64 bit floating point, complex */
HIPBLAS_R_8I = 160, /**< 8 bit signed integer, real */
HIPBLAS_R_8U = 161, /**< 8 bit unsigned integer, real */
HIPBLAS_R_32I = 162, /**< 32 bit signed integer, real */
HIPBLAS_R_32U = 163, /**< 32 bit unsigned integer, real */
HIPBLAS_C_8I = 164, /**< 8 bit signed integer, complex */
HIPBLAS_C_8U = 165, /**< 8 bit unsigned integer, complex */
HIPBLAS_C_32I = 166, /**< 32 bit signed integer, complex */
HIPBLAS_C_32U = 167, /**< 32 bit unsigned integer, complex */
HIPBLAS_R_16B = 168, /**< 16 bit bfloat, real */
HIPBLAS_C_16B = 169, /**< 16 bit bfloat, complex */
} hipblasDatatype_t;
typedef enum {
CUBLAS_COMPUTE_16F = 64, /* half - default */
CUBLAS_COMPUTE_16F_PEDANTIC = 65, /* half - pedantic */
CUBLAS_COMPUTE_32F = 68, /* float - default */
CUBLAS_COMPUTE_32F_PEDANTIC = 69, /* float - pedantic */
CUBLAS_COMPUTE_32F_FAST_16F = 74, /* float - fast, allows down-converting inputs to half or TF32 */
CUBLAS_COMPUTE_32F_FAST_16BF = 75, /* float - fast, allows down-converting inputs to bfloat16 or TF32 */
CUBLAS_COMPUTE_32F_FAST_TF32 = 77, /* float - fast, allows down-converting inputs to TF32 */
CUBLAS_COMPUTE_64F = 70, /* double - default */
CUBLAS_COMPUTE_64F_PEDANTIC = 71, /* double - pedantic */
CUBLAS_COMPUTE_32I = 72, /* signed 32-bit int - default */
CUBLAS_COMPUTE_32I_PEDANTIC = 73, /* signed 32-bit int - pedantic */
} cublasComputeType_t;
Running the HIP program produces an error. I migrated the program from CUDA to HIP. Thanks.
To reproduce:
go to https://github.com/zjin-lcf/HeCBench/tree/master/src/gels-hip
type 'make'
type './main 1'
ERROR: hipblasXgelsBatched() failed with error HIPBLAS_STATUS_INVALID_VALUE.