rocm / rocrand

RAND library for HIP programming language

Home Page: https://rocmdocs.amd.com/projects/rocRAND/en/latest/

License: MIT License

CMake 0.12% C++ 95.15% C 4.45% Shell 0.02% Fortran 0.14% Python 0.09% Groovy 0.01% Assembly 0.01%
rocm hip random rng cuda gpu

rocrand's Introduction

rocRAND

The rocRAND project provides functions that generate pseudorandom and quasirandom numbers. The rocRAND library is implemented in the HIP programming language and optimized for AMD's latest discrete GPUs. It is designed to run on top of AMD's ROCm runtime, but it also works on CUDA-enabled GPUs.

Prior to ROCm version 5.0, this project included the hipRAND wrapper. As of version 5.0, it was split into a separate library. As of version 6.0, hipRAND can no longer be built from rocRAND.

Supported random number generators

  • XORWOW
  • MRG31k3p
  • MRG32k3a
  • Mersenne Twister (MT19937)
  • Mersenne Twister for Graphic Processors (MTGP32)
  • Philox (4x32, 10 rounds)
  • LFSR113
  • Sobol32
  • Scrambled Sobol32
  • Sobol64
  • Scrambled Sobol64
  • ThreeFry
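
For intuition about what the simplest generator in this list computes, the following is a host-only sketch of Marsaglia's xorwow recurrence (five xorshift words plus a Weyl counter). It is not rocRAND's implementation: the struct name `xorwow_sketch` and the helper `xorwow_demo` are made up for illustration, the initial state is the one from Marsaglia's "Xorshift RNGs" paper, and seeding, offsets, and per-thread subsequences are left out.

```cpp
#include <cassert>
#include <cstdint>

// Host-only sketch of Marsaglia's xorwow recurrence (hypothetical type,
// not part of the rocRAND API).
struct xorwow_sketch
{
    // Initial state taken from Marsaglia's paper; x..v must not all be zero.
    std::uint32_t x = 123456789u, y = 362436069u, z = 521288629u,
                  w = 88675123u, v = 5783321u, d = 6615241u;

    std::uint32_t next()
    {
        const std::uint32_t t = x ^ (x >> 2);
        // Shift the five xorshift words down by one position.
        x = y; y = z; z = w; w = v;
        v = (v ^ (v << 4)) ^ (t ^ (t << 1));
        d += 362437u; // Weyl sequence, wraps modulo 2^32
        return d + v;
    }
};

// Two generators with identical state produce identical output.
inline void xorwow_demo()
{
    xorwow_sketch a, b;
    for(int i = 0; i < 8; ++i)
        assert(a.next() == b.next());
}
```

The generator is deterministic: the same state always yields the same sequence, which is what makes seeded runs reproducible.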

Documentation

Documentation for rocRAND is available at https://rocm.docs.amd.com/projects/rocRAND/en/latest/

To build the documentation locally, run the following commands:

# Go to the docs directory
cd docs

# Install Python dependencies
python3 -m pip install -r sphinx/requirements.txt

# Build the documentation
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

# E.g. serve the HTML docs locally
cd _build/html
python3 -m http.server

Requirements

  • CMake (3.16 or later)
  • C++ compiler with C++17 support to build the library.
    • Recommended to use at least gcc 9
    • clang uses the development headers and libraries from gcc, so a recent version of it must still be installed when compiling with clang
  • C++ compiler with C++11 support to consume the library.
  • For AMD platforms:
    • ROCm (1.7 or later)
    • HIP-clang compiler, which must be set as C++ compiler on ROCm platform.
  • For CUDA platforms:
    • HIP
    • Latest CUDA SDK
  • Python 3.6 or higher (HIP on Windows only, only required for install script)
  • Visual Studio 2019 with clang support (HIP on Windows only)
  • Strawberry Perl (HIP on Windows only)

Optional:

  • GoogleTest (required only for tests; building tests is off by default and is enabled with BUILD_TEST=ON)
    • Use GTEST_ROOT to specify the GoogleTest location (see also FindGTest)
    • Note: If GoogleTest is not already installed, it will be automatically downloaded and built
  • Fortran compiler (required only for Fortran wrapper)
    • gfortran is recommended
  • Python 3.5+ (required only for Python wrapper)
  • doxygen to build the documentation

If some dependencies are missing, the CMake script automatically downloads, builds, and installs them. Setting the DEPENDENCIES_FORCE_DOWNLOAD option to ON forces the script to download all dependencies, rather than using the system-installed libraries.

Build and install

git clone https://github.com/ROCm/rocRAND.git

# Go to rocRAND directory, create and go to build directory
cd rocRAND; mkdir build; cd build

# Configure rocRAND, setup options for your system
# Build options: BUILD_TEST (off by default), BUILD_BENCHMARK (off by default), BUILD_SHARED_LIBS (on by default)
# Additionally, the ROCm installation prefix should be passed using CMAKE_PREFIX_PATH or by setting the ROCM_PATH environment variable.
#
# ! IMPORTANT !
# Set C++ compiler to HIP-clang. You can do it by adding 'CXX=<path-to-compiler>'
# before 'cmake' or setting cmake option 'CMAKE_CXX_COMPILER' to path to the compiler.
#
# The Python interface does not work with a static library.
#
[CXX=hipcc] cmake -DBUILD_BENCHMARK=ON ../. -DCMAKE_PREFIX_PATH=/opt/rocm # or cmake-gui ../.

# To configure rocRAND for NVIDIA platforms, the CXX compiler must be set to a host compiler. The CUDA compiler can
# be set explicitly using `-DCMAKE_CUDA_COMPILER=<path-to-nvcc>`.
# Additionally, the path to FindHIP.cmake should be passed via CMAKE_MODULE_PATH. By default, this module is
# installed in /opt/rocm/hip/cmake.
cmake -DBUILD_BENCHMARK=ON ../. -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_MODULE_PATH=/opt/rocm/hip/cmake # or cmake-gui ../.
# or
[CXX=g++] cmake -DBUILD_BENCHMARK=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_MODULE_PATH=/opt/rocm/hip/cmake ../. # or cmake-gui ../.

# Build
make -j4

# Optionally, run tests if they're enabled
ctest --output-on-failure

# Install
[sudo] make install

HIP on Windows

We've added initial support for HIP on Windows. You can build and install rocRAND there using the rmake.py Python script:

git clone https://github.com/ROCm/rocRAND.git
cd rocRAND

# the -i option will install rocRAND to C:\hipSDK by default
python rmake.py -i

# the -c option will build all clients including unit tests
python rmake.py -c

An existing GoogleTest library on the system (especially a static GoogleTest library built with a different compiler) may cause a build failure. If you encounter errors with the existing GoogleTest library or other dependencies, pass the DEPENDENCIES_FORCE_DOWNLOAD flag to CMake, which may resolve the problem.

To disable inline assembly optimizations in rocRAND (for both the host library and the device functions provided in rocrand_kernel.h), set the CMake option ENABLE_INLINE_ASM to OFF.

Running unit tests

# Go to rocRAND build directory
cd rocRAND; cd build

# To run all tests
ctest

# To run a single unit test
./test/<unit-test-name>

Running benchmarks

# Go to rocRAND build directory
cd rocRAND; cd build

# To run benchmark for the host generate functions:
# The benchmarks are registered with Google Benchmark as `device_generate<engine,distribution>`, where
# engine -> xorwow, mrg31k3p, mrg32k3a, mtgp32, philox, lfsr113, mt19937,
#           threefry2x32, threefry2x64, threefry4x32, threefry4x64,
#           sobol32, scrambled_sobol32, sobol64, scrambled_sobol64
# distribution -> uniform-uint, uniform-uchar, uniform-ushort,
#                 uniform-half, uniform-float, uniform-double,
#                 normal-half, normal-float, normal-double,
#                 log-normal-half, log-normal-float, log-normal-double, poisson
# Further options can be found using --help
./benchmark/benchmark_rocrand_host_api
# To run specific benchmarks:
./benchmark/benchmark_rocrand_host_api --benchmark_filter=<regex>
# For example to run benchmarks with engine sobol64:
./benchmark/benchmark_rocrand_host_api --benchmark_filter="device_generate<sobol64*"
# To view all registered benchmarks:
./benchmark/benchmark_rocrand_host_api --benchmark_list_tests=true
# The benchmark also supports user input:
./benchmark/benchmark_rocrand_host_api --size <number> --trials <number> --offset <number> --dimensions <number> --lambda <float float float ...>
# And can print output in different formats:
./benchmark/benchmark_rocrand_host_api --benchmark_format=<console|json|csv>

# To run benchmark for device kernel functions:
# The benchmarks are registered with Google Benchmark as `device_kernel<engine,distribution>`, where
# engine -> xorwow, mrg31k3p, mrg32k3a, mtgp32, philox, lfsr113,
#           threefry2x32, threefry2x64, threefry4x32, threefry4x64,
#           sobol32, scrambled_sobol32, sobol64, scrambled_sobol64
# distribution -> uniform-uint or uniform-ullong, uniform-float, uniform-double, normal-float, normal-double,
#                 log-normal-float, log-normal-double, poisson, discrete-poisson, discrete-custom
# Further options can be found using --help
./benchmark/benchmark_rocrand_device_api
# To run specific benchmarks:
./benchmark/benchmark_rocrand_device_api --benchmark_filter=<regex>
# For example to run benchmarks with engine sobol64:
./benchmark/benchmark_rocrand_device_api --benchmark_filter="device_kernel<sobol64*"
# To view all registered benchmarks:
./benchmark/benchmark_rocrand_device_api --benchmark_list_tests=true
# The benchmark also supports user input:
./benchmark/benchmark_rocrand_device_api --size <number> --trials <number> --dimensions <number> --lambda <float float float ...>
# And can print output in different formats:
./benchmark/benchmark_rocrand_device_api --benchmark_format=<console|json|csv>

# To compare against cuRAND (cuRAND must be supported):
./benchmark/benchmark_curand_host_api [google benchmark options]
./benchmark/benchmark_curand_device_api [google benchmark options]
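
The --benchmark_filter value above is a regular expression, not a shell glob, so in "device_generate<sobol64*" the trailing * repeats the preceding 4 rather than acting as a wildcard. The sketch below mimics that selection with std::regex. It assumes unanchored search semantics (a name is selected when the pattern matches anywhere inside it), and the benchmark names are hypothetical instances of the device_generate<engine,distribution> pattern described above. The helper names `filter_selects` and `filter_demo` are illustrative only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when `filter` matches anywhere inside `name`
// (unanchored search; assumed to mirror how the filter is applied).
inline bool filter_selects(const std::string& filter, const std::string& name)
{
    return std::regex_search(name, std::regex(filter));
}

inline void filter_demo()
{
    const std::string filter = "device_generate<sobol64*";
    // Selected: the name contains "device_generate<sobol6" followed by a '4'.
    assert(filter_selects(filter, "device_generate<sobol64,uniform-uint>"));
    // Not selected: "scrambled_" sits between '<' and "sobol64".
    assert(!filter_selects(filter, "device_generate<scrambled_sobol64,uniform-uint>"));
}
```

Because the search is unanchored, the pattern works without a trailing ".*". A pattern such as "sobol64" alone would also select the scrambled variant.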

Legacy benchmarks

You can disable legacy benchmarks (those used prior to Google Benchmark) by setting the CMake option BUILD_LEGACY_BENCHMARK to OFF. For compatibility, the default setting is ON when BUILD_BENCHMARK is set.

Legacy benchmarks are deprecated and will be removed in a future version once all benchmarks have been migrated to the new framework.

Wrappers

Support

Bugs and feature requests can be reported through the issue tracker.

Contributions and license

Contributions of any kind are most welcome! You can find more information at CONTRIBUTING.

Licensing information is located at LICENSE.

rocrand's People

Contributors

aaronenyeshi, ajcodes, alexbrownamd, amdkila, arvindcheru, bragadeesh, cgmb, dependabot[bot], doctorcolinsmith, eidenyoshida, ex-rzr, lawruble13, mathiasmagnus, mfep, mkknorr, naraenda, nb4444, neon60, nguyennhudi, nolmoonen, parbenc, pruthvistony, rmalavally, saadrahim, samjwu, snektron, stanleytsang-amd, swraw, umfranzw, vincentsc

rocrand's Issues

Use of undefined member m_engine_size in library/src/rng/xorwow.hpp

Describe the bug
xorwow_generator_template::operator=(xorwow_generator_template&& other) has the line:
m_engines_size = other.m_engine_size;
but m_engine_size appears nowhere else in the source. It should probably be "other.m_engines_size".

A recent clang change (llvm/llvm-project#90152) fixes a bug that previously caused this code to be erroneously accepted.
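
A minimal sketch of the corrected pattern follows. The class `engine_pool` is a hypothetical stand-in, not the real xorwow_generator_template; only the member name m_engines_size is taken from the report. The point is that the move assignment reads the member that is actually declared.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a generator that owns an array of engine states.
class engine_pool
{
public:
    explicit engine_pool(unsigned int engines_size)
        : m_engines_size(engines_size), m_engines(new unsigned int[engines_size]())
    {}

    engine_pool(const engine_pool&)            = delete;
    engine_pool& operator=(const engine_pool&) = delete;

    engine_pool(engine_pool&& other) noexcept
        : m_engines_size(std::exchange(other.m_engines_size, 0u)),
          m_engines(std::exchange(other.m_engines, nullptr))
    {}

    engine_pool& operator=(engine_pool&& other) noexcept
    {
        if(this != &other)
        {
            delete[] m_engines;
            // Use the declared member m_engines_size, not m_engine_size.
            m_engines_size = std::exchange(other.m_engines_size, 0u);
            m_engines      = std::exchange(other.m_engines, nullptr);
        }
        return *this;
    }

    ~engine_pool() { delete[] m_engines; }

    unsigned int size() const { return m_engines_size; }

private:
    unsigned int  m_engines_size = 0;
    unsigned int* m_engines      = nullptr;
};

inline void move_assign_demo()
{
    engine_pool a(4), b(1);
    b = std::move(a);
    assert(b.size() == 4u); // size travelled with the storage
    assert(a.size() == 0u); // moved-from object is left empty
}
```

Exercising the move assignment in a unit test, as move_assign_demo does, also forces the member function template to be instantiated, so a misspelled member is caught even by compilers that defer the check.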

To Reproduce
Build rocRAND with clang that includes the above fix.

Expected behavior
Build is successful.

Log-files
In file included from /work/anjenner/reland/rocRAND/library/src/rng/generator_type_xorwow.cpp:23:
/work/anjenner/reland/rocRAND/library/src/rng/xorwow.hpp:204:48: error: no member named 'm_engine_size' in 'xorwow_generator_template<System, ConfigProvider>'; did you mean 'm_engines_size'?
204 | m_engines_size = other.m_engine_size;
| ^~~~~~~~~~~~~
| m_engines_size
/work/anjenner/reland/rocRAND/library/src/rng/xorwow.hpp:426:18: note: 'm_engines_size' declared here
426 | unsigned int m_engines_size = 0;
| ^
1 error generated when compiling for gfx1030.
make[2]: *** [library/CMakeFiles/rocrand.dir/build.make:272: library/CMakeFiles/rocrand.dir/src/rng/generator_type_xorwow.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:880: library/CMakeFiles/rocrand.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

Request: Better handling of sign when doing comparisons.

While reading through rocRAND I saw a comparison between unsigned and signed integers. An example is shown here, where variable i is signed but variable v is unsigned:
https://github.com/ROCmSoftwarePlatform/rocRAND/blob/master/library/include/rocrand_xorwow.h#L220

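I'd prefer to have matching types on both sides of a comparison throughout the codebase. Signed and unsigned integers interpret the most significant bit differently, and in a mixed comparison the signed operand is implicitly converted to unsigned, so a negative value compares as a very large one.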
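
The pitfall can be sketched in a few lines. The function names `naive_less` and `value_less` are made up for illustration and do not appear in rocRAND. The first function spells out the conversion that a plain `i < v` performs implicitly when i is an int and v is an unsigned int; the second compares by mathematical value.

```cpp
#include <cassert>

// What `i < v` does implicitly for int i and unsigned int v:
// i is converted to unsigned before the comparison.
constexpr bool naive_less(int i, unsigned int v)
{
    return static_cast<unsigned int>(i) < v;
}

// Compares by mathematical value: any negative i is less than any v.
constexpr bool value_less(int i, unsigned int v)
{
    return i < 0 || static_cast<unsigned int>(i) < v;
}

inline void sign_compare_demo()
{
    assert(!naive_less(-1, 1u)); // -1 wraps to the largest unsigned value
    assert(value_less(-1, 1u));  // mathematically, -1 < 1
    assert(naive_less(2, 3u) == value_less(2, 3u)); // non-negative values agree
}
```

When the loop index is never negative, as is typical for a counter, simply declaring it with the same unsigned type as the bound avoids the mixed comparison altogether.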

While this isn't a major issue at the moment, it'd be nice to have it handled in the future.

[cmake error] Could not find a package configuration file provided by "hip"

Describe the bug
Running cmake fails with the following error:

(base) loong@home:~/Downloads/rocRAND/build$ CXX=hipcc cmake -DBUILD_BENCHMARK=ON ../.
-- The CXX compiler identification is Clang 15.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test COMPILER_HAS_TARGET_ID_gfx803
-- Performing Test COMPILER_HAS_TARGET_ID_gfx803 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx900_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx900_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx906_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx906_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102 - Success
Looking in ...
CMake Error at cmake/VerifyCompiler.cmake:35 (find_package):
  Could not find a package configuration file provided by "hip" with any of
  the following names:

    hipConfig.cmake
    hip-config.cmake

  Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
  to a directory containing one of the above files.  If "hip" provides a
  separate development package or SDK, be sure it has been installed.
Call Stack (most recent call first):
  CMakeLists.txt:111 (include)


-- Configuring incomplete, errors occurred!
See also "/home/loong/Downloads/rocRAND/build/CMakeFiles/CMakeOutput.log".

To Reproduce

  1. cd rocRAND; mkdir build; cd build
  2. CXX=hipcc cmake -DBUILD_BENCHMARK=ON ../.

Build issues (module path, CUDA 9 and sm_20)

  • CMAKE_MODULE_PATH=/path/to/hip/ did not work (this was my fault)
    • I tried various path combinations with CMAKE_PREFIX_PATH, CMAKE_MODULE_PATH, and HIP_DIR, as suggested by the cmake output. However, only passing the path directly with cmake -DHIP_PATH=/path/to/hip/ .. works
  • When building with CUDA 9, the lowest compute capability, sm_20, yields an error (CUDA 9 no longer supports Fermi)
    • This can be fixed in cmake/SetupNVCC.cmake
    • It would be nice to have a cmake option to propagate NVGPU_TARGETS
      • Edit: ... make NVGPU_TARGETS a cached variable like AMDGPU_TARGETS

Building NVCC (Device) object library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_generated_hiprand_nvcc.cpp.o
nvcc fatal   : Value 'sm_20' is not defined for option 'gpu-architecture'

After fixing this, the build succeeded.
The tests ran successfully on P100 and GV100.

Tools: gcc/5.3.0, CUDA9.1, HIP 1.5.18315

rocrand-config.cmake directory

rocrand is (as far as I know) the only library that installs its rocrand-config.cmake and related files in <rocm-dir>/lib/cmake/rocrand/rocrand/
All other libraries install to <rocm-dir>/lib/cmake/<roclib>/
Is there a special reason for that?

rocRAND/hipRAND 5.4.0 packages are missing symlinks

Describe the bug
The packages for hipRAND and rocRAND provided for ROCm 5.4.0 at https://repo.radeon.com/ are missing symlinks for hiprand-fortran-config.cmake and rocrand-fortran-config.cmake

To Reproduce
Install rocrand, try to find rocrand using CMake find_package(rocrand) with ROCRAND_PATH set to /opt/rocm-5.4.0/rocrand
For the installation, I used

wget https://repo.radeon.com/amdgpu-install/5.4/ubuntu/jammy/amdgpu-install_5.4.50400-1_all.deb \
    && apt-get install -y --no-install-recommends ./amdgpu-install_5.4.50400-1_all.deb \
    && apt-get update && apt-get install -y --no-install-recommends \
    rocm-dev hipblas-dev hipfft-dev hipsparse-dev rocfft-dev rocrand-dev rocsolver-dev rocthrust-dev roctracer-dev

Expected behavior
The find module succeeds.

Log-files

CMake Error at /opt/rocm-5.4.0/rocrand/lib/cmake/rocrand-config.cmake:92 (include):
  include could not find requested file:

    /opt/rocm-5.4.0/rocrand/lib/cmake/rocrand-fortran-config.cmake
Call Stack (most recent call first):
  cmake/hip.cmake:176 (find_package)
  CMakeLists.txt:101 (include)

Workaround
Add the symlinks via

cd /opt/rocm-5.4.0/hiprand/lib/cmake/ && ln -s ../../../lib/cmake/hiprand/hiprand-fortran-config.cmake
cd /opt/rocm-5.4.0/rocrand/lib/cmake/ && ln -s ../../../lib/cmake/rocrand/rocrand-fortran-config.cmake

Environment

=== environment


=== date
Sat Feb  4 23:41:26 UTC 2023


=== Linux Kernel
Linux 2e88cbd9c48a 4.18.0-408.el8.x86_64 #1 SMP Mon Jul 18 17:42:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux


=== rocm-smi

======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%  
0    30.0c           22.0W   808Mhz  350Mhz  20.78%  auto  250.0W    0%   0%    
================================================================================
============================= End of ROCm SMI Log ==============================


HIP version  : 5.4.22801-aaa1e3d8

== hipconfig
HIP_PATH     : /opt/rocm-5.4.0/hip
ROCM_PATH    : /opt/rocm-5.4.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.4.0/hip/include -I/opt/rocm-5.4.0/llvm/bin/../lib/clang/15.0.0 -I/opt/rocm-5.4.0/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.4.0/hsa
HIP_CLANG_PATH   : /opt/rocm-5.4.0/llvm/bin
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.4.0 22465 d6f0fe8b22e3d8ce0f2cbd657ea14b16043018a5)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.4.0/llvm/bin
AMD LLVM version 15.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -isystem "/opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/.." -isystem /opt/rocm-5.4.0/hsa/include -isystem "/opt/rocm-5.4.0/hip/include" -O3 --rocm-path=/opt/rocm-5.4.0
hip-clang-ldflags  :  -L"/opt/rocm-5.4.0/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/opt/intel/oneapi/dpcpp-ct/2023.0.0/bin:/opt/intel/oneapi/vtune/2023.0.0/bin64:/opt/intel/oneapi/mkl/2023.0.0/bin/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2023.0.0/linux/bin/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/bin:/opt/rocm-5.4.0/bin:/root/spack-env/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/spack/bin
CUDACXX=/root/spack/opt/spack/linux-ubuntu22.04-x86_64/gcc-11.3.0/cuda-11.8.0-bf5ocuyzznq57tji4taz2m3a6yuuvgoa/bin/nvcc
HIPCC_COMPILE_FLAGS_APPEND=--rocm-path=/opt/rocm-5.4.0
HIP_CLANG_PATH=/opt/rocm-5.4.0/llvm/bin
LD_LIBRARY_PATH=/opt/intel/oneapi/tbb/2021.8.0/lib/intel64/gcc4.8:/opt/intel/oneapi/mkl/2023.0.0/lib/intel64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/x64:/opt/intel/oneapi/compiler/2023.0.0/linux/lib/oclfpga/host/linux64/lib:/opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin
HIP_COMPILER=clang
HSA_PATH=/opt/rocm-5.4.0/hsa
HIP_PATH=/opt/rocm-5.4.0/hip
HIP_PLATFORM=amd
HIP_DEVICE_LIB_PATH=/opt/rocm-5.4.0/amdgcn/bitcode

== Linux Kernel
Hostname     : 2e88cbd9c48a
Linux 2e88cbd9c48a 4.18.0-408.el8.x86_64 #1 SMP Mon Jul 18 17:42:52 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy



=== rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen Threadripper 1920X 12-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen Threadripper 1920X 12-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3500                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            24                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65556416(0x3e84fc0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65556416(0x3e84fc0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65556416(0x3e84fc0) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx906                             
  Uuid:                    GPU-38ce408172dc76e5               
  Marketing Name:          AMD Radeon VII                     
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      8192(0x2000) KB                    
  Chip ID:                 26287(0x66af)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1801                               
  BDFID:                   17664                              
  Internal Node ID:        1                                  
  Compute Unit:            60                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             


=== lspci VGA
08:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN X] (rev a1)
45:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] (rev c1)

Tests fail on AMD MI25, ROCm 1.6.4

rocRAND's tests test_hiprand_kernel, test_hiprand_api, and test_rocrand_kernel_philox4x32_10 randomly fail on AMD MI25 on ROCm 1.6.4. These tests don't fail on ROCm 1.6.3 or on CUDA 8/9. As far as we know right now, they also don't fail on any other device on ROCm 1.6.4. Currently, we suspect the problem is in ROCm, not in rocRAND.

After investigation we think it's some kind of synchronisation bug which shows itself only in very specific situations. Until it's fixed you can use temporary workarounds from branch rocm_164_mi25_workarounds.

Most of the features (including the most popular ones) are not / should not be affected by this bug.

Environment

Hardware:

  • AMD Radeon Instinct MI25

Software version:

  • ROCm 1.6.4
  • HIP 1.3.17385
  • HCC clang version 6.0.0 (based on HCC 1.0.17412-f590a25-821e6d8-64e7fc7)
  • rocRAND master (452ef66)

Workarounds

The possible workarounds for this bug are:

  • adding additional synchronization after kernels and before copying the memory (as presented in branch rocm_164_mi25_workarounds; you can try using hipStreamWaitEvent() or hipStreamSynchronize() which should have less impact on performance),
  • setting environment variable HCC_OPT_FLUSH to 0, or
  • setting HIP_LAUNCH_BLOCKING to 1.

Please comment if you have problems applying the workarounds, or if you experience a similar bug in a different place or on a different device.

hip_hcc.so dependency issue when installing *.rpm

Environment: CentOS 7.4 + gcc 7.4 + Vega10 + ROCm 1.7.x

Installing the rocrand *.rpm fails because the libhip_hcc.so()(64bit) dependency is unresolved.
There is no such issue with rocFFT, which does not depend on libhip_hcc.so but on hip_hcc >= 1.3.

Is hip_hcc.so a CUDA-compatible library? See
" I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libhip_hcc.so locally" in https://github.com/ROCmSoftwarePlatform/hiptensorflow/issues/28

=================================
[root@af5c7d5e5ccb build]# yum install rocrand-1.7.1-1.x86_64.rpm
Loaded plugins: fastestmirror, ovl
Examining rocrand-1.7.1-Linux.rpm: rocrand-1.7.1-1.x86_64
Marking rocrand-1.7.1-Linux.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package rocrand.x86_64 0:1.7.1-1 will be installed
--> Processing Dependency: libhip_hcc.so()(64bit) for package: rocrand-1.7.1-1.x86_64
Loading mirror speeds from cached hostfile
 * base: pubmirrors.dal.corespace.com
 * epel: mirror.compevo.com
 * extras: centos.mirror.lstn.net
 * updates: mirror.hackingand.coffee
--> Finished Dependency Resolution
Error: Package: rocrand-1.7.1-1.x86_64 (/rocrand-1.7.1-Linux)
           Requires: libhip_hcc.so()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

=================================

[root@af5c7d5e5ccb build]# yum deplist rocrand-1.7.1-1.x86_64.rpm
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile
 * base: pubmirrors.dal.corespace.com
 * epel: mirror.compevo.com
 * extras: centos.mirror.lstn.net
 * updates: mirror.hackingand.coffee
package: rocrand.x86_64 1.7.1-1
  dependency: /bin/sh
   provider: bash.x86_64 4.2.46-29.el7_4
  dependency: libc.so.6()(64bit)
   provider: glibc.x86_64 2.17-196.el7_4.2
  dependency: libc.so.6(GLIBC_2.14)(64bit)
   provider: glibc.x86_64 2.17-196.el7_4.2
  dependency: libc.so.6(GLIBC_2.2.5)(64bit)
   provider: glibc.x86_64 2.17-196.el7_4.2
  dependency: libc.so.6(GLIBC_2.4)(64bit)
   provider: glibc.x86_64 2.17-196.el7_4.2
  dependency: libdl.so.2()(64bit)
   provider: glibc.x86_64 2.17-196.el7_4.2
  dependency: libdl.so.2(GLIBC_2.2.5)(64bit)
   provider: glibc.x86_64 2.17-196.el7_4.2
  dependency: libgcc_s.so.1()(64bit)
   provider: libgcc.x86_64 4.8.5-16.el7_4.2
  dependency: libgcc_s.so.1(GCC_3.0)(64bit)
   provider: libgcc.x86_64 4.8.5-16.el7_4.2
  dependency: libhc_am.so()(64bit)
   provider: hcc.x86_64 1.2.18054-1
  dependency: libhip_hcc.so()(64bit)
   Unsatisfied dependency

[root@af5c7d5e5ccb rocFFT_BUILD_TOOLSET7]# yum deplist rocfft-0.8.1.0-Linux.rpm
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile
 * base: pubmirrors.dal.corespace.com
 * epel: mirror.compevo.com
 * extras: centos.mirror.lstn.net
 * updates: mirror.hackingand.coffee
package: rocfft.x86_64 0.8.1.0-1
  dependency: /bin/sh
   provider: bash.x86_64 4.2.46-29.el7_4
  dependency: hip_hcc >= 1.3
   provider: hip_hcc.x86_64 1.5.18092-1

Cannot compile on NVIDIA platform

I tried to compile rocRAND on an NVIDIA platform, but I get the following error with make:

 CMake Error at cmake/VerifyCompiler.cmake:34 (message):
   On ROCm platform 'hcc' or 'clang' must be used as C++ compiler.
 Call Stack (most recent call first):
   CMakeLists.txt:50 (include)

Should I always use clang? Can I use the GNU compiler instead?

Invalid use of inline assembly for GFX1010 target

Describe the bug

When building PyTorch the following error is observed:

[ 86%] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/THHUNN/torch_hip_generated_RReLU.hip.o
<inline asm>:1:24: error: invalid operand for instruction
        v_mad_u64_u32 v[2:3], s[10:11], s1, v42, v[8:9]
                              ^
note: !srcloc = 13700931
<inline asm>:1:24: error: invalid operand for instruction
        v_mad_u64_u32 v[4:5], s[10:11], s46, v42, v[8:9]

I traced this down to use of inline assembly in rocRAND:

/opt/rocm-3.7.0/rocrand/include/rocrand_common.h
61:    asm volatile("v_mad_u64_u32 %0, %1, %2, %3, %4"

If I comment out the above line, the PyTorch build proceeds.
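
For context, the instruction named in the error performs a 32-bit by 32-bit multiply followed by a 64-bit add. A minimal sketch of a portable replacement is shown below, assuming the inline assembly is meant to compute x * y + z in 64 bits. The name mad_u64_u32_portable is hypothetical and is not taken from rocRAND.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-in for the inline-asm multiply-add.
// Assumption: the intended result is (x * y) + z, evaluated in 64 bits,
// with the usual unsigned wraparound if the sum overflows.
constexpr std::uint64_t mad_u64_u32_portable(std::uint32_t x,
                                             std::uint32_t y,
                                             std::uint64_t z)
{
    // Widen both factors first so the 32x32 product keeps all 64 bits.
    return static_cast<std::uint64_t>(x) * static_cast<std::uint64_t>(y) + z;
}

// Compile-time check: (2^32 - 1)^2 + 1 still fits in 64 bits.
static_assert(mad_u64_u32_portable(0xFFFFFFFFu, 0xFFFFFFFFu, 1u)
                  == 0xFFFFFFFE00000002ull,
              "64-bit product must not be truncated");
```

A compiler is free to lower such an expression to whatever instruction the target supports, which avoids hard-coding operands that a given GPU architecture may reject.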

To Reproduce
Build PyTorch with ROCm, see ROCm/pytorch#718

Expected behavior
Build succeeds.

rocrand_set_offset unexpected behaviour

Describe the bug
rocrand_set_offset seems to change the seed of the generator. The numbers produced with the same seed are completely different depending on whether an offset is set.

To Reproduce
Steps to reproduce the behavior:

  1. Install rocrand-dev4.5.2 and hip-dev4.5.2 version 4.5.2 using the package repositories

  2. Compile the attached reproducer (reprod.txt) with the following line:
    /opt/rocm-4.5.2/hip/bin/hipcc -I/opt/rocm-4.5.2/rocrand/include/ -L/opt/rocm-4.5.2/rocrand/lib/ -lrocrand test.cpp

  3. See that no offset is achieved in the output; instead, the two lists of random numbers seem to be completely independent:

0.0225561:0.750519
0.129137:0.628226
0.805372:0.446292
0.974561:0.55909
0.109374:0.951059
0.471769:0.231839
0.920535:0.930325
0.731697:0.620792
0.33033:0.228607
0.2921:0.888079

Expected behavior
Based on the description here, I would expect that the second list of random numbers generated is the same as the original one, but with a certain offset. For example in this case I would expect something like:

0.0225561:0.00121
0.129137:0.0225561
0.805372:0.129137
0.974561:0.805372
0.109374:0.974561
0.471769:0.109374
0.920535:0.471769
0.731697:0.920535
0.33033:0.731697
0.2921:0.33033
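
The expectation can be illustrated without rocRAND by using the C++ standard library, where std::mt19937::discard skips ahead in a sequence. The sketch below only demonstrates the "same sequence, shifted" semantics described above. It does not call rocRAND, and the helper names draw_with_offset and offset_is_a_shift are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using value_type = std::mt19937::result_type;

// Draw `count` raw values from a std::mt19937 seeded with `seed`,
// after skipping the first `offset` values of its sequence.
std::vector<value_type> draw_with_offset(value_type seed,
                                         unsigned long long offset,
                                         std::size_t count)
{
    std::mt19937 engine(seed);
    engine.discard(offset);
    std::vector<value_type> out(count);
    for (auto& v : out)
    {
        v = engine();
    }
    return out;
}

// True when the offset stream equals the un-offset stream shifted by `offset`.
bool offset_is_a_shift(value_type seed, std::size_t offset, std::size_t count)
{
    const auto base    = draw_with_offset(seed, 0, offset + count);
    const auto shifted = draw_with_offset(seed, offset, count);
    for (std::size_t i = 0; i < count; ++i)
    {
        if (shifted[i] != base[i + offset])
        {
            return false;
        }
    }
    return true;
}
```

A check of this shape, applied to the two lists printed by the reproducer, would distinguish a shifted sequence from an unrelated one.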

Log-files
out.txt

Environment
environment.txt

Additional context
This issue was encountered while working on the rocrand backend for oneMKL.

Create symbolic links under /opt/rocm in the installer

Symbolic links for rocrand & hiprand libraries and headers are missing under /opt/rocm/lib and /opt/rocm/include . Like other roc libraries such as rocBLAS, we need to create symbolic links for hiprand and rocrand under /opt/rocm during installation from the .deb package etc.

make -j24 problem

collect2: error: ld returned 1 exit status
test/CMakeFiles/test_hiprand_cpp_wrapper.dir/build.make:467: recipe for target 'test/test_hiprand_cpp_wrapper' failed
make[2]: *** [test/test_hiprand_cpp_wrapper] Error 1
CMakeFiles/Makefile2:1174: recipe for target 'test/CMakeFiles/test_hiprand_cpp_wrapper.dir/all' failed
make[1]: *** [test/CMakeFiles/test_hiprand_cpp_wrapper.dir/all] Error 2

Test 4 hangs

  • ubuntu 16.04
  • cmake 3.9.4
  • hcc 7.0.0

I cloned the rocRAND repo and tried to build it as per the instructions.
For some reason it was trying to include CUDA. E.g. various options in library/CMakeLists.txt [lines 35 and 50]

So I used:
CXX=/opt/rocm/bin/hcc cmake -DBUILD_BENCHMARK=ON -DHIP_PLATFORM=hcc ../.

Then I tried the tests:

Test project /home/derek/project/rocRAND/build
      Start  1: test_log_normal_distribution
 1/26 Test  #1: test_log_normal_distribution ........   Passed    0.05 sec
      Start  2: test_normal_distribution
 2/26 Test  #2: test_normal_distribution ............   Passed    0.05 sec
      Start  3: test_poisson_distribution
 3/26 Test  #3: test_poisson_distribution ...........   Passed    6.29 sec
      Start  4: test_rocrand_basic

... and it never returns.

Makefile:149: recipe for target 'all' failed

Describe the bug
make failed while installing rocRAND. I tried installing the package following instructions from here.
The following error has been encountered while running make

[ 85%] Linking CXX shared library libhiprand.so
/opt/rocm-3.3.0/hcc/bin/llc: error: /opt/rocm-3.3.0/hcc/bin/llc: /tmp/tmp.NIbV50p6jg/hiprand_hcc.cpp.e7318d0902a31816a057156d3b2e6fdd.kernel.bc-gfx906.isabin.opt.bc: error: Could not open input file: No such file or directory
Generating AMD GCN kernel failed in llc for target: gfx906
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
library/CMakeFiles/hiprand.dir/build.make:99: recipe for target 'library/libhiprand.so.1.1' failed
make[2]: *** [library/libhiprand.so.1.1] Error 1
CMakeFiles/Makefile2:257: recipe for target 'library/CMakeFiles/hiprand.dir/all' failed
make[1]: *** [library/CMakeFiles/hiprand.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 92%] Linking CXX executable benchmark_rocrand_generate
/opt/rocm-3.3.0/hcc/bin/clamp-device: line 219: /tmp/tmp.eytZhCk9Ww/benchmark_rocrand_generate.cpp.219832756f3388cfc5fc0906ea98c5aa.kernel.bc-gfx906.isabin.linked.bc: No such file or directory
Generating AMD GCN kernel failed in HCC-specific opt passes for target: gfx906
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
benchmark/CMakeFiles/benchmark_rocrand_generate.dir/build.make:99: recipe for target 'benchmark/benchmark_rocrand_generate' failed
make[2]: *** [benchmark/benchmark_rocrand_generate] Error 1
CMakeFiles/Makefile2:351: recipe for target 'benchmark/CMakeFiles/benchmark_rocrand_generate.dir/all' failed
make[1]: *** [benchmark/CMakeFiles/benchmark_rocrand_generate.dir/all] Error 2
[100%] Linking CXX executable benchmark_rocrand_kernel
[100%] Built target benchmark_rocrand_kernel
Makefile:149: recipe for target 'all' failed

To Reproduce
The instructions used to install

git clone https://github.com/ROCmSoftwarePlatform/rocRAND.git
cd rocRAND; mkdir build; cd build
CMAKE_CXX_COMPILER=/opt/rocm/hcc/bin/hcc cmake -DBUILD_BENCHMARK=ON ../. 
HCC_AMDGPU_TARGET=gfx906 make -j4

Environment

=== environment


=== date
Mon Apr 13 07:44:19 UTC 2020


=== Linux Kernel
Linux ******************** 4.15.0-96-generic #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


=== rocm-smi

========================ROCm System Management Interface========================
================================================================================
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
0    32.0c  18.0W   808Mhz  350Mhz  21.96%  auto  250.0W    0%   0%
================================================================================
==============================End of ROCm SMI Log ==============================


HIP version  : 3.3.20126-2dbba46b

== hipconfig
HIP_PATH     : /opt/rocm/hip
HIP_PLATFORM : hcc
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__=   -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include

== hcc
HSA_PATH     : /opt/rocm/hsa
HCC_HOME     : /opt/rocm/hcc
HCC clang version 10.0.0 (/data/jenkins-workspace/compute-rocm-rel-3.3/external/hcc-tot/llvm-project/clang 1ce0fe5e88b2124494b9500817b4c2c66bdfa5aa) (based on HCC 3.1.20114-6776c83f-1ce0fe5e88b )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
  LLVM version 10.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags :  -hc -std=c++amp -I/opt/rocm/hcc/include
HCC-ldflags  :  -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/usr/local/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/bin:/opt/rocm/opencl/bin/x86_64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin/x86_64
LD_LIBRARY_PATH=:/opt/rocm/opencl/lib/x86_64
HIP_PLATFORM=hcc
HIP_PATH=/opt/rocm/hip
HCC_AMDGPU_TARGET=gfx906
HIP_VISIBLE_DEVICES=0
HCC_HOME=/opt/rocm/hcc

== Linux Kernel
Hostname    : **************
Linux ******************** 4.15.0-96-generic #97~16.04.1-Ubuntu SMP Wed Apr 1 03:03:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
LSB Version:	core-9.20160110ubuntu0.2-amd64:core-9.20160110ubuntu0.2-noarch:security-9.20160110ubuntu0.2-amd64:security-9.20160110ubuntu0.2-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.6 LTS
Release:	16.04
Codename:	xenial



=== rocminfo
ROCk module is loaded
root is member of video group
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  Marketing Name:          Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   4200
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            8
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32877628(0x1f5ac3c) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Acessible by all:        TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    32877628(0x1f5ac3c) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Acessible by all:        TRUE
  ISA Info:
    N/A
*******
Agent 2
*******
  Name:                    gfx906
  Marketing Name:          Vega 20
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          4096(0x1000)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
  Chip ID:                 26287(0x66af)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1801
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            60
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      FALSE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Acessible by all:        FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Acessible by all:        FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx906
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***


=== lspci VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 (rev c1)

Investigating performance issues in philox

copying @gargrahul

@jszuppe could you or someone else help here? One of the teams is investigating performance regressions in rocRAND caused by compiler changes (hcc/hip in ROCm). When running (for example): benchmark_rocrand_kernel --engine philox --dis uniform-float
we are seeing slowdowns. Could you point to the source of the kernels that are launched with this command? Is it somewhere here:
https://github.com/ROCmSoftwarePlatform/rocRAND/blob/master/library/src/rng/philox4x32_10.hpp
Could you also explain some top-level details about the kernel and its launch parameters?

It would also help if you could give some tips/ways to simplify the kernel as we track down the perf issues to compiler changes. Thanks

Please enable two factor authentication in your github account

@VincentSC;@sbalint98;@Maetveis;@AJcodes;@neon60;@nolmoonen

We are going to enforce two factor authentication in (https://github.com/ROCmSoftwarePlatform/) organization on 29th April, 2022 .
Since we identified you as an outside collaborator of the ROCmSoftwarePlatform organization, you need to enable two factor authentication in your GitHub account; otherwise you will be removed from the organization after the enforcement.
Please skip if already done.

To set up two factor authentication, please go through the steps in below link:

https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/configuring-two-factor-authentication

Please email "[email protected]" for queries

Setting up ROCM issue; ImportError: libhiprand.so.1 ; Specify CUDA_TOOLKIT_ROOT_DIR

I have an AMD GPU and am trying to install ROCm in order to use TensorFlow.

After installing ROCm, the Python IDE reports the following:

ImportError: libhiprand.so.1: cannot open shared object file: No such file or directory

I read that I need to install rocRAND to solve this, but when I try I get the following error. I do not use an NVIDIA GPU.

CMake Error at /usr/share/cmake-3.10/Modules/FindCUDA.cmake:682 (message):
Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
cmake/SetupNVCC.cmake:72 (find_package)
cmake/VerifyCompiler.cmake:28 (include)
CMakeLists.txt:49 (include)

-- Configuring incomplete, errors occurred!
See also "/home/c/Downloads/rocRAND-master/build/CMakeFiles/CMakeOutput.log".

When I run "hipconfig": bash: hipconfig: command not found

StackGuardSlot compile errors independent of hcc version

Several people have reported (#40 #41 ROCm/hcc#999 )
that rocRAND does not compile under ArchLinux. This error occurred for @baerbock , @Palmitoxico , ArchLinux packager Okoński and me. We have tried different versions of hip+hcc (2.0 / 2.1 and master) multiple times, but all failed with the same error.

I'm using latest hcc & hip (HCC clang version 9.0.0 (https://github.com/RadeonOpenCompute/hcc-clang-upgrade.git 2d7f3a3c8f385c0aba115a6eed3ca96dd0b289e9) (https://github.com/RadeonOpenCompute/llvm.git 5f38a9683361416cfecc3e9c55f8c48dc5d5a041) (based on HCC 1.3.19064-46916709-2d7f3a3c8f-5f38a968336 )
HIP version: 1.5.19064), which @ex-rzr confirms as working, but it won't compile.

There must be a cause for this OUTSIDE of hip & hcc/clang.

Performance regression in mtgp32 with uniform-double

Merging pull request #71 (Vega20 changes) has caused a performance regression in one of the benchmarks. The drop occurs for the mtgp32 generator with uniform-double.

Benchmark before:
mtgp32:
uniform-double:
Throughput = 555.562 GB/s, Samples = 69.445 GSample/s, AvgTime (1 trial) = 1.800 ms, Time (all) = 36.000 ms, Size = 134217728

Benchmark after:
mtgp32:
uniform-double:
Throughput = 397.233 GB/s, Samples = 49.654 GSample/s, AvgTime (1 trial) = 2.517 ms, Time (all) = 50.348 ms, Size = 134217728
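
The size of the regression can be worked out from the figures above. The sketch below assumes that Size counts 8-byte double samples and that the benchmark's "GB/s" divides by 2^30. Both are assumptions, but they reproduce the reported throughput to within the rounding of the average time. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Throughput in units of 2^30 bytes per second, given the sample count,
// the bytes per sample, and the average time of one trial in milliseconds.
double throughput_gib_per_s(std::size_t samples,
                            std::size_t bytes_per_sample,
                            double avg_ms)
{
    const double bytes = static_cast<double>(samples)
                         * static_cast<double>(bytes_per_sample);
    return bytes / (avg_ms * 1e-3) / (1024.0 * 1024.0 * 1024.0);
}

// Relative drop from `before` to `after`, as a fraction of `before`.
double relative_drop(double before, double after)
{
    return (before - after) / before;
}
```

With the reported throughputs, relative_drop(555.562, 397.233) is about 0.285, that is, roughly a 28.5% loss.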

Build error : no member named 'data' in 'uint4'

When I try to build rocRAND with the command:

cmake -DBUILD_BENCHMARK=OFF -DBUILD_TEST=OFF ..

the configuration summary is:

-- ******** Summary ********
-- General:
-- System : Linux
-- HIP ROOT : /opt/rocm/hip
-- C++ compiler : /opt/rocm/bin/hcc
-- C++ compiler version : 7.0.0
-- CXX flags : -Wno-unused-command-line-argument -Wall -Wextra
-- Build type : Release
-- Install prefix : /opt/rocm

-- BUILD_SHARED_LIBS : ON
-- BUILD_FORTRAN_WRAPPER : OFF
-- BUILD_TEST : OFF
-- BUILD_BENCHMARK : OFF

and then I run into this problem:

[ 70%] Building CXX object library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o
In file included from /work/home/wangzh/rocRAND/library/src/rocrand.cpp:23:
In file included from /work/home/wangzh/rocRAND/library/src/rng/generators.hpp:24:
In file included from /work/home/wangzh/rocRAND/library/src/rng/philox4x32_10.hpp:62:
In file included from /work/home/wangzh/rocRAND/library/src/rng/device_engines.hpp:32:
In file included from /work/home/wangzh/rocRAND/library/include/rocrand_kernel.h:29:
/work/home/wangzh/rocRAND/library/include/rocrand_philox4x32_10.h:199:47: error: no member named 'data' in 'uint4'
unsigned int ret = m_state.result.data[m_state.substate];

Do I have to enable BUILD_BENCHMARK and BUILD_TEST? I turned them off because I cannot connect to the Internet.
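
The error indicates that the uint4 type seen by the compiler has no data member. As an illustration only, the sketch below selects a component by index through the named members instead. It assumes the type still exposes x, y, z and w. The names uint4_like and component are hypothetical, and this is not presented as the fix used in rocRAND.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4-component vector type that only exposes
// named members (x, y, z, w) and has no `data` array.
struct uint4_like
{
    unsigned int x, y, z, w;
};

// Select a component by index without relying on a `data` member.
// Any index other than 0, 1 or 2 returns w.
inline unsigned int component(const uint4_like& v, unsigned int index)
{
    switch (index)
    {
        case 0: return v.x;
        case 1: return v.y;
        case 2: return v.z;
        default: return v.w;
    }
}
```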

rocRAND build with rocm 1.7?

Hi,

On the same machine, when rocm 1.6 is installed, rocRAND can be built. When rocm 1.7 is installed, the rocRAND build keeps failing.

Are there any extra steps other than the instructions on the github page that I should take to build rocRAND with rocm 1.7?

Thanks,
Qiyu

Unknown option `--amdgpu-target`

Describe the bug
The option --amdgpu-target introduced in #156 is unsupported by clang-13 from llvm-amdgpu 4.3.0.

To Reproduce
Steps to reproduce the behavior:

  1. git clone -b rocm-4.3.0 [email protected]:ROCmSoftwarePlatform/rocRAND.git
  2. CXX=/opt/rocm/hip/bin/hipcc cmake -B build
  3. make -C build -j

Expected behavior
Successful linking of librocrand.so.

Log-files

-- The CXX compiler identification is ROCMClang 4.3.21314
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Found Git: /usr/bin/git (found version "2.32.0") 
-- Performing Test HAVE_gfx803
-- Performing Test HAVE_gfx803 - Success
-- Performing Test HAVE_gfx900:xnack-
-- Performing Test HAVE_gfx900:xnack- - Success
-- Performing Test HAVE_gfx906:xnack-
-- Performing Test HAVE_gfx906:xnack- - Success
-- Performing Test HAVE_gfx908:xnack-
-- Performing Test HAVE_gfx908:xnack- - Success
-- Performing Test HAVE_gfx90a:xnack-
-- Performing Test HAVE_gfx90a:xnack- - Success
-- Performing Test HAVE_gfx90a:xnack+
-- Performing Test HAVE_gfx90a:xnack+ - Success
-- Performing Test HAVE_gfx1030
-- Performing Test HAVE_gfx1030 - Success
-- ROCclr at /opt/rocm/lib/cmake/rocclr
-- hip::amdhip64 is SHARED_LIBRARY

*******************************************************************************
*----------------------------------- ERROR -----------------------------------*
* The variable 'CMAKE_CXX_FLAGS' should only be set by the cmake toolchain,
* either by calling 'cmake -DCMAKE_CXX_FLAGS="-Wall -g -march=native -O2 -Wall -Wextra"' or
* set in a toolchain file and added with
* 'cmake -DCMAKE_TOOLCHAIN_FILE=<toolchain-file>'.
*-----------------------------------------------------------------------------*
*******************************************************************************

CMake Warning at /opt/rocm/share/rocm/cmake/ROCMChecks.cmake:41 (message):
  The toolchain variable 'CMAKE_CXX_FLAGS' is modified in the CMakeLists.txt.
Call Stack (most recent call first):
  CMakeLists.txt:9223372036854775807 (rocm_check_toolchain_var)
  CMakeLists.txt:96 (set)


-- 
-- ******** Summary ********
-- General:
--   System                     : Linux
--   HIP ROOT                   : 
--   C++ compiler               : /opt/rocm/llvm/bin/clang++
--   C++ compiler version       : 
--   CXX flags                  : -Wall -g -march=native -O2 -Wall -Wextra
--   Build type                 : Release
--   Install prefix             : /opt/rocm
--   Device targets             : gfx803;gfx900:xnack-;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1030
-- 
--   BUILD_SHARED_LIBS          : ON
--   BUILD_FORTRAN_WRAPPER      : OFF
--   BUILD_TEST                 : OFF
--   BUILD_BENCHMARK            : OFF
--   DEPENDENCIES_FORCE_DOWNLOAD: 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/rocRAND/build
make: Entering directory '/tmp/rocRAND/build'
make[1]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Leaving directory '/tmp/rocRAND/build'
make[2]: Leaving directory '/tmp/rocRAND/build'
make[2]: Leaving directory '/tmp/rocRAND/build'
make[2]: Leaving directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
make[2]: Entering directory '/tmp/rocRAND/build'
[ 10%] Building CXX object tools/CMakeFiles/sobol_direction_vector_generator.dir/sobol_direction_vector_generator.cpp.o
[ 20%] Building CXX object tools/CMakeFiles/mrg32k3a_precomputed_generator.dir/mrg32k3a_precomputed_generator.cpp.o
[ 30%] Building CXX object library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o
[ 40%] Building CXX object tools/CMakeFiles/xorwow_precomputed_generator.dir/xorwow_precomputed_generator.cpp.o
clang-13: warning: argument unused during compilation: '-amdgpu-function-calls=false' [-Wunused-command-line-argument]
[ 50%] Linking CXX executable mrg32k3a_precomputed_generator
[ 60%] Linking CXX executable sobol_direction_vector_generator
[ 70%] Linking CXX executable xorwow_precomputed_generator
make[2]: Leaving directory '/tmp/rocRAND/build'
[ 70%] Built target mrg32k3a_precomputed_generator
make[2]: Leaving directory '/tmp/rocRAND/build'
make[2]: Leaving directory '/tmp/rocRAND/build'
[ 70%] Built target sobol_direction_vector_generator
[ 70%] Built target xorwow_precomputed_generator
In file included from /tmp/rocRAND/library/src/rocrand.cpp:23:
In file included from /tmp/rocRAND/library/src/rng/generators.hpp:24:
In file included from /tmp/rocRAND/library/src/rng/philox4x32_10.hpp:61:
In file included from /tmp/rocRAND/library/src/rng/common.hpp:28:
/tmp/rocRAND/library/include/rocrand_common.h:65:13: warning: unknown pragma ignored [-Wunknown-pragmas]
    #pragma warning "Disabled inline asm, because the build target does not support it."
            ^
1 warning generated when compiling for gfx1030.
In file included from /tmp/rocRAND/library/src/rocrand.cpp:23:
In file included from /tmp/rocRAND/library/src/rng/generators.hpp:24:
In file included from /tmp/rocRAND/library/src/rng/philox4x32_10.hpp:61:
In file included from /tmp/rocRAND/library/src/rng/common.hpp:28:
/tmp/rocRAND/library/include/rocrand_common.h:65:13: warning: unknown pragma ignored [-Wunknown-pragmas]
    #pragma warning "Disabled inline asm, because the build target does not support it."
            ^
1 warning generated when compiling for gfx90a.
In file included from /tmp/rocRAND/library/src/rocrand.cpp:23:
In file included from /tmp/rocRAND/library/src/rng/generators.hpp:24:
In file included from /tmp/rocRAND/library/src/rng/philox4x32_10.hpp:61:
In file included from /tmp/rocRAND/library/src/rng/common.hpp:28:
/tmp/rocRAND/library/include/rocrand_common.h:65:13: warning: unknown pragma ignored [-Wunknown-pragmas]
    #pragma warning "Disabled inline asm, because the build target does not support it."
            ^
1 warning generated when compiling for gfx90a.
In file included from /tmp/rocRAND/library/src/rocrand.cpp:23:
In file included from /tmp/rocRAND/library/src/rng/generators.hpp:24:
In file included from /tmp/rocRAND/library/src/rng/philox4x32_10.hpp:61:
In file included from /tmp/rocRAND/library/src/rng/common.hpp:28:
/tmp/rocRAND/library/include/rocrand_common.h:65:13: warning: unknown pragma ignored [-Wunknown-pragmas]
    #pragma warning "Disabled inline asm, because the build target does not support it."
            ^
1 warning generated when compiling for host.
[ 80%] Linking CXX shared library librocrand.so
clang-13: error: unsupported option '--amdgpu-target=gfx803'
clang-13: error: unsupported option '--amdgpu-target=gfx900:xnack-'
clang-13: error: unsupported option '--amdgpu-target=gfx906:xnack-'
clang-13: error: unsupported option '--amdgpu-target=gfx908:xnack-'
clang-13: error: unsupported option '--amdgpu-target=gfx90a:xnack-'
clang-13: error: unsupported option '--amdgpu-target=gfx90a:xnack+'
clang-13: error: unsupported option '--amdgpu-target=gfx1030'
make[2]: *** [library/CMakeFiles/rocrand.dir/build.make:100: library/librocrand.so.1.1] Error 1
make[2]: Leaving directory '/tmp/rocRAND/build'
make[1]: *** [CMakeFiles/Makefile2:226: library/CMakeFiles/rocrand.dir/all] Error 2
make[1]: Leaving directory '/tmp/rocRAND/build'
make: *** [Makefile:156: all] Error 2
make: Leaving directory '/tmp/rocRAND/build'

Environment
See attachment.
environment.txt

Additional context
ROCm 4.3.0 was compiled from source on Arch Linux with cmake 3.21.1.

The different compile result using rocRAND and hipRAND

While porting a program to HIP, I ran into a strange problem:

#include <iostream>
using namespace std;
#include <hip/hip_runtime.h>
#include <hiprand.h>
#include <hiprand_kernel.h>
#include <rocrand_kernel.h>

__global__ void hip_kernel_randtest()
{
    hiprandState States;
    float2 data1;
    rocrand_init(1234, 100, 0, &States);
    data1= rocrand_normal2(&States);
}

__global__ void hip_kernel_randtest2()
{
    hiprandState States;
    float2 data1;
    hiprand_init(1234, 100, 0, &States);
    data1= hiprand_normal2(&States);
}





int main(int argc, char *argv[])
{

    hipLaunchKernelGGL(hip_kernel_randtest,128,128,0,0);
    hipLaunchKernelGGL(hip_kernel_randtest2,128,128,0,0);
    return 0;
}

When I compile the above program, the rocRAND API compiles fine, but the hipRAND API produces the following error:

In file included from test_hiprand.cpp:6:
In file included from /opt/rocm/hiprand/include/hiprand_kernel.h:58:
/opt/rocm/hiprand/include/hiprand_kernel_hcc.h:471:5: error: static_assert failed due to requirement 'detail::is_any_of<hiprandState,
      hiprandStateXORWOW_t, hiprandStatePhilox4_32_10_t, hiprandStateMRG32k3a_t>::value' "Used StateType is not supported"
    static_assert(
    ^
test_hiprand.cpp:22:9: note: in instantiation of function template specialization 'hiprand_normal2<hiprandState>' requested here
        data1= hiprand_normal2(&States);
               ^
1 error generated.

According to the introduction, the hipRAND API can use the rocRAND API, so why does this error appear?
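
Judging by the error message, the static_assert checks the state type against a fixed list of supported types, and hiprandState is not in that list. The mechanism can be sketched in plain C++ as follows. The state types and the use_state function are hypothetical stand-ins, and the trait is a generic reimplementation, not the library's own code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical state types standing in for the hipRAND ones.
struct state_xorwow {};
struct state_philox {};
struct state_mrg32k3a {};
struct state_other {};

// True when T is the same type as at least one of Ts.
template<class T, class... Ts>
struct is_any_of : std::false_type {};

template<class T, class First, class... Rest>
struct is_any_of<T, First, Rest...>
    : std::conditional<std::is_same<T, First>::value,
                       std::true_type,
                       is_any_of<T, Rest...>>::type {};

// A function template that accepts only the listed state types.
// Instantiating it with any other type fails at compile time.
template<class StateType>
int use_state(StateType*)
{
    static_assert(is_any_of<StateType,
                            state_xorwow,
                            state_philox,
                            state_mrg32k3a>::value,
                  "Used StateType is not supported");
    return 0;
}
```

In this sketch, calling use_state with a state_other pointer triggers the same kind of diagnostic as in the log above, while any of the three listed types compiles.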

Not able to build on ROCm 1.7.x

The build error on Vega10 looks like this:

In file included from /home/rocRAND/library/src/rocrand.cpp:23:
In file included from /home/rocRAND/library/src/rng/generators.hpp:27:
/home/rocRAND/library/src/rng/sobol32.hpp:159:9: error: no matching function for call to 'hipLaunchKernelGGL'
hipLaunchKernelGGL(
^~~~~~~~~~~~~~~~~~
/home/rocRAND/library/src/rng/sobol32.hpp:186:16: note: in instantiation of function template specialization
'rocrand_sobol32::generate<double, normal_distribution >' requested here
return generate(data, data_size, distribution);

=======================================
[master u=] $ uname -a
Linux tim-hsa 4.16.0-rc1-kfd-compute-roc-master-7620 #1 SMP Mon Mar 5 13:33:13 CST 2018 x86_64 x86_64 x86_64 GNU/Linux

[master u=] $ ccmake ../
AMDGPU_TARGETS gfx803;gfx900
BUILD_BENCHMARK OFF
BUILD_CRUSH_TEST OFF
BUILD_FORTRAN_WRAPPER OFF
BUILD_SHARED_LIBS ON
BUILD_TEST OFF
CMAKE_BUILD_TYPE Release
CMAKE_INSTALL_PREFIX /opt/rocm
CMAKE_INSTALL_RPATH_USE_LINK_P TRUE
DPKG_EXE /usr/bin/dpkg
ENABLE_INLINE_ASM ON
HIP_ROOT_DIR /opt/rocm/hip
HSA_HEADER /opt/rocm/include
HSA_LIBRARY /opt/rocm/lib/libhsa-runtime64.so
RPMBUILD_EXE RPMBUILD_EXE-NOTFOUND
UNWIND_LIBRARY /usr/lib/x86_64-linux-gnu/libunwind.so
hcc_DIR /opt/rocm/hcc/lib/cmake/hcc
hip_DIR /opt/rocm/hip/lib/cmake/hip

Can rocRAND v1.7.1 run on Carrizo?

Hello,

I can build rocRAND v1.7.1 on my Carrizo machine. However, if I run
./benchmark_rocrand_generate --engine all --dis all

it crashes:
xorwow:
uniform-uint:
terminate called after throwing an instance of 'std::runtime_error'
what(): No device code available for function: _ZN12rocrand_host6detail19init_engines_kernelEPN14rocrand_device13xorwow_engineEyy
Aborted (core dumped)

My system info is:
Ubuntu 16.04.4 LTS
rocm-dkms 1.7.137
HIP version : 1.5.0
GPU: Carrizo

If I run the same built executable on a Vega machine, there's no crash.

Thanks,
Qiyu

rocRAND is compiled without gfx803 support

Loading PyTorch failed because of a missing code object for gfx803. All other ROCm libraries are fine; only this one fails. It works with a local build.

Version:
rocrand4.2.0_2.10.9.40200-21_amd64.deb

Error:
:1:hip_code_object.cpp :456 : 15814696557 us: hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :458 : 15814696572 us: Devices:
:1:hip_code_object.cpp :460 : 15814696575 us: amdgcn-amd-amdhsa--gfx803 - [Not Found]
:1:hip_code_object.cpp :465 : 15814696578 us: Bundled Code Objects:
:1:hip_code_object.cpp :482 : 15814696581 us: host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :479 : 15814696585 us: hipv4-amdgcn-amd-amdhsa--gfx900:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx900:xnack-]
:1:hip_code_object.cpp :479 : 15814696589 us: hipv4-amdgcn-amd-amdhsa--gfx906:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx906:xnack-]
:1:hip_code_object.cpp :479 : 15814696592 us: hipv4-amdgcn-amd-amdhsa--gfx908:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx908:xnack-]
HIP/rocclr/hip_code_object.cpp:486: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
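
The log lists one device target (gfx803) and the targets bundled in the library (gfx900, gfx906, gfx908), and none of the bundled entries matches the device. The sketch below reproduces that comparison in plain C++ as a rough illustration. It is a simplification and not how the HIP runtime is implemented: it compares processor names only and ignores feature flags such as `xnack-`. The helper names `processor_name` and `has_code_object` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reduce an entry such as "hipv4-amdgcn-amd-amdhsa--gfx900:xnack-" to the
// processor name "gfx900": drop everything up to the last "--" and any
// ":feature" suffix. Entries without "--" are returned unchanged.
inline std::string processor_name(const std::string& entry)
{
    const std::string marker = "--";
    const std::size_t found = entry.rfind(marker);
    const std::size_t begin = (found == std::string::npos) ? 0 : found + marker.size();
    const std::size_t end = entry.find(':', begin);
    if (end == std::string::npos) {
        return entry.substr(begin);
    }
    return entry.substr(begin, end - begin);
}

// True when some bundled entry targets the same processor as the device.
inline bool has_code_object(const std::string& device,
                            const std::vector<std::string>& bundled)
{
    assert(!device.empty());
    const std::string wanted = processor_name(device);
    for (const std::string& entry : bundled) {
        if (processor_name(entry) == wanted) {
            return true;
        }
    }
    return false;
}
```

Fed with the entries from the log above, `has_code_object("amdgcn-amd-amdhsa--gfx803", bundled)` returns false, which corresponds to the `[Not Found]` line for the device.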

rocRAND build issue with rocm-dev

Hi,

When I try to build rocRAND on an AMD platform with rocm-dev (v1.6) installed instead of rocm, an error is reported:
command:
cmake -DBUILD_BENCHMARK=ON ../.

error:
CMake Error at /usr/share/cmake-3.5/Modules/FindCUDA.cmake:617 (message):
Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
cmake/NVCC.cmake:54 (find_package)
cmake/Dependencies.cmake:12 (include)
CMakeLists.txt:44 (include)

Do I need rocm instead of rocm-dev to build rocRAND?

Thanks,
Qiyu

Issue installing rocRAND (HIP version : 1.5.18205)

Hi! It appears that the rocRAND build does not add /opt/rocm/hcc/include/ as a header search path (-I/opt/rocm/hcc/include/), which leads to the error below.

/opt/rocm/hip/include/hip/hcc_detail/hip_runtime.h:61:10: fatal error: 'grid_launch.h' file not found
#include <grid_launch.h>

Could a spack package be provided for hiprand?

Is your feature request related to a problem? Please describe.
hiprand is not available as a yum package for CentOS 8. Here (https://repo.radeon.com/rocm/centos8/5.2/main), I see that there are yum packages for hipfft, hipblas and others. Could a CentOS package be added for hiprand?

Describe the solution you'd like
Could a centos package be added for hiprand?

Alternatively, the DOE favors software distribution through spack. There are already spack packages for ROCm components.
How about adding a spack package for hiprand?

Compilation issue

I get a compilation error (with ROCm-2.0):

[ 54%] Linking CXX executable test_log_normal_distribution
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z19hiprand_init_kernelI20hiprandStateMRG32k3aEvPT_myy
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx900
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z19hiprand_init_kernelI20hiprandStateMRG32k3aEvPT_myy
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx803
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z19hiprand_init_kernelI20hiprandStateMRG32k3aEvPT_myy
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx906
clang-8: error: linker command failed with exit code 7 (use -v to see invocation)
make[2]: *** [test/CMakeFiles/test_hiprand_kernel.dir/build.make:92: test/test_hiprand_kernel] Error 7
make[1]: *** [CMakeFiles/Makefile2:776: test/CMakeFiles/test_hiprand_kernel.dir/all] Error 2
[ 54%] Built target test_log_normal_distribution
207 warnings generated.
[ 55%] Linking CXX executable test_rocrand_cpp_wrapper
[ 55%] Built target test_rocrand_cpp_wrapper

rocRAND frees an object that was not malloc'ed

It seems librocrand is trying to free an object that was not malloc'ed. Our ASan build caught it (note that we removed the ASan flag for the hcc compiler but kept it for the other files, which are built with clang/gcc).
This happens on our clang 8 + glibc 2.26 runs, but not on our clang 7 + glibc 2.23 runs.

=================================================================
==1914160==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffda0245c98 in thread T0
SCARINESS: 40 (bad-free)
#0 0x7f6a7ce486d8 in operator delete(void*)
#1 0x7f6a38566299 in hip_impl::functions()::'lambda'()::operator()() const (librocrand.so.1+0x32299)
#2 0x7f6a7c7c9cd8 in __pthread_once_slow glibc-2.26/nptl/pthread_once.c:116
#3 0x7f6a3855c86c in hip_impl::hipLaunchKernelGGLImpl(unsigned long, dim3 const&, dim3 const&, unsigned int, ihipStream_t*, void**) (librocrand.so.1+0x2886c)
#4 0x7f6a3855a58b in rocrand_xorwow::init() (librocrand.so.1+0x2658b)
#5 0x7f6a38558b6f in rocrand_status rocrand_xorwow::generate_normal(float*, unsigned long, float, float) (librocrand.so.1+0x24b6f)
#6 0x7f6a3efbe555 in hiprandGenerateNormal (libhiprand.so.1+0x3555)
#7 0x7f6a525a0112 in void caffe2::math::RandGaussian<float, caffe2::HIPContext>(unsigned long, float, float, float*, caffe2::HIPContext*) caffe2/utils/hip/math_gpu.hip:1569
#8 0x7f6a4ff696af in caffe2::GaussianFillOp<float, caffe2::HIPContext>::Fill(caffe2::Tensor*) caffe2/caffe2/operators/filler_op.h:433
#9 0x7f6a4ff3897b in caffe2::FillerOp<caffe2::HIPContext>::RunOnDevice() caffe2/caffe2/operators/filler_op.h:93
#10 0x7f6a4ef1281a in caffe2::Operator<caffe2::HIPContext>::Run(int) caffe2/caffe2/core/operator.h:834
#11 0x7f6a4a7d35c4 in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:63
#12 0x7f6a4a9369d9 in caffe2::Workspace::RunNetOnce(caffe2::NetDef const&) caffe2/caffe2/core/workspace.cc:292

Address 0x7ffda0245c98 is located in stack of thread T0
SUMMARY: AddressSanitizer: bad-free in operator delete(void*)

Compilation issues from rocRAND while building MXNet

While building the HIP port of MXNet with the latest rocRAND, we observe compilation errors from rocRAND. Please find the log attached.
mxnet_log_rocrand_issue.txt
We need some support with this issue.

ROCM version:
dpkg -s rocm-dkms
Package: rocm-dkms
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 13
Maintainer: Advanced Micro Devices Inc.
Architecture: amd64
Version: 2.7.22
Depends: rocm-dev, rock-dkms
Description: Radeon Open Compute (ROCm) Runtime software stack
Homepage: https://github.com/RadeonOpenCompute/ROCm

rocRAND Version:
dpkg -s rocrand
Package: rocrand
Status: install ok installed
Priority: optional
Section: devel
Installed-Size: 25033
Maintainer: Saad Rahim [email protected]
Architecture: amd64
Version: 2.7.0.641-rocm-rel-2.7-22-dd953aa
Depends: hip_hcc (>= 1.5.19055)
Description: The rocRAND library provides functions that generate pseudo-random and quasi-random numbers.

can't build rocRAND on nvidia platform, requires amd_comgr

Describe the bug
When building rocRAND on the NVIDIA platform, CMake finishes with an error: it could not find the package amd_comgr.
I don't have comgr installed, but should I? I am on the NVIDIA platform.

To Reproduce
HIP_PLATFORM=nvidia
cuda installed in standard location (/usr/local/cuda)
installed HIP to ${HOME}/apps/HIP/installation
HIP_PATH=${HOME}/apps/HIP/installation
git clone https://github.com/ROCmSoftwarePlatform/rocRAND.git (also tried the release https://github.com/ROCmSoftwarePlatform/rocRAND/archive/refs/tags/rocm-4.2.0.tar.gz)
cd rocRAND; mkdir build; cd build
CXX=nvcc cmake ..
fails, the output is:

-- The CXX compiler identification is GNU 9.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Found Git: /usr/bin/git (found version "2.17.1")
-- Performing Test HAVE_gfx803
-- Performing Test HAVE_gfx803 - Failed
-- Performing Test HAVE_gfx900:xnack-
-- Performing Test HAVE_gfx900:xnack- - Failed
-- Performing Test HAVE_gfx906:xnack-
-- Performing Test HAVE_gfx906:xnack- - Failed
-- Performing Test HAVE_gfx908:xnack-
-- Performing Test HAVE_gfx908:xnack- - Failed
-- Performing Test HAVE_gfx90a:xnack-
-- Performing Test HAVE_gfx90a:xnack- - Failed
-- Performing Test HAVE_gfx90a:xnack+
-- Performing Test HAVE_gfx90a:xnack+ - Failed
-- Performing Test HAVE_gfx1030
-- Performing Test HAVE_gfx1030 - Failed
CMake Error at /home/jakub/apps/cmake-3.20.5/share/cmake-3.20/Modules/CMakeFindDependencyMacro.cmake:47 (find_package):
  By not providing "Findamd_comgr.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "amd_comgr", but CMake did not find one.

  Could not find a package configuration file provided by "amd_comgr" with
  any of the following names:

    amd_comgrConfig.cmake
    amd_comgr-config.cmake

  Add the installation prefix of "amd_comgr" to CMAKE_PREFIX_PATH or set
  "amd_comgr_DIR" to a directory containing one of the above files.  If
  "amd_comgr" provides a separate development package or SDK, be sure it has
  been installed.
Call Stack (most recent call first):
  /home/jakub/apps/HIP/installation/lib/cmake/hip/hip-config.cmake:155 (find_dependency)
  cmake/VerifyCompiler.cmake:27 (find_package)
  CMakeLists.txt:110 (include)


-- Configuring incomplete, errors occurred!
See also "/home/jakub/apps/rocRAND/build/CMakeFiles/CMakeOutput.log".
See also "/home/jakub/apps/rocRAND/build/CMakeFiles/CMakeError.log".

Expected behavior
I expected rocRAND to not require amd_comgr when building for nvidia platform and to build successfully without it.

Environment
Ubuntu-18.04 in WSL2 with working cuda
cmake 3.20.5
hipcc --version:

HIP version: 4.2.21155-37cb3a34
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0

In conclusion:
Is this really the HIP equivalent of cuRAND, or is it just the ROCm equivalent that works on the AMD platform? The README says the HIP wrapper is here, so I am confused.
I found hipBLAS, hipSPARSE, and hipFFT, but no hipRAND, only this rocRAND. Is this the one?
How should I get it working on the NVIDIA platform?
Do I really need amd_comgr and its own requirements, like AMDDeviceLibs, even on the NVIDIA platform?
Am I missing something fundamental here?

I am trying to build it from source; using apt-get is the last resort for me.

Thanks for any help.
Jakub

PS: The installation and build guides around HIP should be more thorough, IMO. I am really struggling a lot.

Use mtgp32_kernel_params field inside of mtgp_engine for better OO design.

I don't see why you're diverging from CUDA for no good reason. I noticed a redundancy with the design of the mtgp_engine class.

This is how it's already done in CUDA.

struct curandStateMtgp32 {
    unsigned int s[MTGP32_STATE_SIZE];
    int offset;
    int pIdx;
    mtgp32_kernel_params_t * k;
    int precise_double_flag;
};

Let's take a look at rocRAND.
Fields of the mtgp_engine class:

public:
    // State
    mtgp32_state m_state;
    // Parameters
    unsigned int pos_tbl;
    unsigned int param_tbl[MTGP_TS];
    unsigned int temper_tbl[MTGP_TS];
    unsigned int single_temper_tbl[MTGP_TS];
    unsigned int sh1_tbl;
    unsigned int sh2_tbl;
    unsigned int mask;

Fields of mtgp32_kernel_params_t:

    unsigned int pos_tbl[MTGP_BN_MAX];
    unsigned int param_tbl[MTGP_BN_MAX][MTGP_TS];
    unsigned int temper_tbl[MTGP_BN_MAX][MTGP_TS];
    unsigned int single_temper_tbl[MTGP_BN_MAX][MTGP_TS];
    unsigned int sh1_tbl[MTGP_BN_MAX];
    unsigned int sh2_tbl[MTGP_BN_MAX];
    unsigned int mask[1];

Why not simply define the mtgp_engine class to have a pointer to mtgp32_kernel_params_t?

Example:

public:
    // State
    mtgp32_state m_state;
    // Parameters
    mtgp32_kernel_params_t * k;

Reason One
Better object-oriented design. I should be able to create a single mtgp32_kernel_params and reuse that object in other hiprandStateMtgp32_t states. Otherwise, you'll have to do bloated things such as:

my_state.pos_tbl = mtgp32_kernel_params.pos_tbl;
my_state.param_tbl = mtgp32_kernel_params.param_tbl;
my_state.temper_tbl = mtgp32_kernel_params.temper_tbl;
my_state.single_temper_tbl = mtgp32_kernel_params.single_temper_tbl;
my_state.sh1_tbl = mtgp32_kernel_params.sh1_tbl;
my_state.sh2_tbl = mtgp32_kernel_params.sh2_tbl;
my_state.mask = mtgp32_kernel_params.mask;

Reason Two
Unnecessary divergence from the CUDA API leading to more work to have new frameworks run seamlessly.
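
The pointer-based layout proposed above can be sketched in plain C++ as follows. This is only an illustration of the design idea, not rocRAND or cuRAND code: `KernelParams`, `EngineState`, `attach_params`, `shared_mask` and the array sizes are hypothetical names and values chosen for the example.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sizes only; the real MTGP constants may differ.
constexpr std::size_t kStateSize = 8;
constexpr std::size_t kTableSize = 16;

// Hypothetical parameter block that many generator states can share.
struct KernelParams {
    unsigned int pos;
    unsigned int sh1;
    unsigned int sh2;
    unsigned int mask;
    unsigned int param_tbl[kTableSize];
};

// Hypothetical engine state that refers to its parameters through a pointer
// instead of holding a private copy of every table.
struct EngineState {
    unsigned int s[kStateSize];
    int offset;
    const KernelParams* k;
};

// Attach a shared parameter block to a state with a single assignment.
inline void attach_params(EngineState& state, const KernelParams& params)
{
    state.k = &params;
}

// Read a parameter through the shared block.
inline unsigned int shared_mask(const EngineState& state)
{
    assert(state.k != nullptr);
    return state.k->mask;
}
```

With this layout, attaching parameters to a state is one pointer assignment rather than a field-by-field copy, and every state attached to the same block sees later changes to it. Whether the extra indirection is a good trade-off on a GPU is a separate question that this sketch does not address.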

rocRAND does not build with hcc

Trying to build with hcc, like this:

CXX=hcc cmake -DCMAKE_PREFIX_PATH="/opt/rocm/hcc;/opt/rocm/hip" ..

Still uses hipcc to build:

-- Found HIP: /opt/rocm/hip (found version "1.5.18141") 
-- The CXX compiler identification is Clang 7.0.0
-- Check for working CXX compiler: /opt/rocm/hip/bin/hipcc
-- Check for working CXX compiler: /opt/rocm/hip/bin/hipcc -- broken

I use hcc to build all the library components because hipcc has many problems (including in the above scenario).

All the other libraries, such as miopen, rocblas, hipblas, and rocfft, do not have this problem and can be built with hcc.

It seems a major problem is here, where CMAKE_CXX_COMPILER is set. A CMake build script should NEVER override the compiler the user has selected.

Instead, it should check if the compiler is hcc and then call the appropriate find_package like rocBLAS does here:

# Find HCC/HIP dependencies
if( CMAKE_CXX_COMPILER MATCHES ".*/hcc$" )
  find_package( hcc REQUIRED CONFIG PATHS /opt/rocm )
  find_package( hip REQUIRED CONFIG PATHS /opt/rocm )
endif( )

Then link with the hcc/hip targets like here:

target_link_libraries( rocblas PRIVATE hip::hip_hcc hip::hip_device hcc::hccshared )

from tensorflow.python.keras.metrics import Metric ImportError: cannot import name 'Metric'

import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

/usr/bin/python3.5 /home/c/.PyCharmCE2018.2/config/scratches/scratch.py
Traceback (most recent call last):
  File "/home/c/.PyCharmCE2018.2/config/scratches/scratch.py", line 1, in <module>
    import tensorflow as tf
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 81, in <module>
    from tensorflow.python import keras
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/__init__.py", line 24, in <module>
    from tensorflow.python.keras import activations
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/activations/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.activations import elu
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/_impl/keras/__init__.py", line 21, in <module>
    from tensorflow.python.keras._impl.keras import activations
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/_impl/keras/activations.py", line 24, in <module>
    from tensorflow.python.keras._impl.keras.utils.generic_utils import deserialize_keras_object
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/_impl/keras/utils/__init__.py", line 34, in <module>
    from tensorflow.python.keras._impl.keras.utils.multi_gpu_utils import multi_gpu_model
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/_impl/keras/utils/multi_gpu_utils.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.engine.training import Model
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/_impl/keras/engine/__init__.py", line 21, in <module>
    from tensorflow.python.keras._impl.keras.engine.base_layer import InputSpec
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/keras/_impl/keras/engine/base_layer.py", line 28, in <module>
    from tensorflow.python.estimator import util as estimator_util
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/estimator/__init__.py", line 25, in <module>
    import tensorflow.python.estimator.estimator_lib
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator_lib.py", line 22, in <module>
    from tensorflow.python.estimator.canned.baseline import BaselineClassifier
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/estimator/canned/baseline.py", line 50, in <module>
    from tensorflow.python.estimator import estimator
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/estimator/estimator.py", line 34, in <module>
    from tensorflow.python.estimator import model_fn as model_fn_lib
  File "/home/c/.local/lib/python3.5/site-packages/tensorflow/python/estimator/model_fn.py", line 29, in <module>
    from tensorflow.python.keras.metrics import Metric
ImportError: cannot import name 'Metric'

Not really sure how to fix this. Any input on where to start would be great.

Call parameter type does not match function signature

Hi,

I'm having problems building the rocRAND library on my Arch Linux machine:

Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z14rocrand_kernelIN14rocrand_device14sobol32_engineILb0EEEEvPjS3_m
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx906
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z14rocrand_kernelIN14rocrand_device14sobol32_engineILb0EEEEvPjS3_m
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx900
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z14rocrand_kernelIN14rocrand_device14sobol32_engineILb0EEEEvPjS3_m
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx803
clang-8: error: linker command failed with exit code 7 (use -v to see invocation)
make[2]: *** [test/CMakeFiles/test_rocrand_kernel_sobol32.dir/build.make:91: test/test_rocrand_kernel_sobol32] Error 7
make[1]: *** [CMakeFiles/Makefile2:738: test/CMakeFiles/test_rocrand_kernel_sobol32.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z19rocrand_init_kernelIN14rocrand_device13xorwow_engineEEvPT_myy
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx906
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z19rocrand_init_kernelIN14rocrand_device13xorwow_engineEEvPT_myy
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx900
Call parameter type does not match function signature!
  %StackGuardSlot = alloca i8*, addrspace(5)
 i8**  call void @llvm.stackprotector(i8* %0, i8* addrspace(5)* %StackGuardSlot)
in function _Z19rocrand_init_kernelIN14rocrand_device13xorwow_engineEEvPT_myy
LLVM ERROR: Broken function found, compilation aborted!
Generating AMD GCN kernel failed in llc for target: gfx803

I've built rocm, hcc and hip from scratch (master branch), but I keep getting the same error.

$ /opt/rocm/hcc/bin/hcc --version
HCC clang version 8.0.0 (https://github.com/RadeonOpenCompute/hcc-clang-upgrade.git 6ec3c61e09fbb60373eaf5a40021eb862363ba2c) (https://github.com/RadeonOpenCompute/llvm.git d5938b6c383ee68cf93d4508e48836d6118517e2) (based on HCC 1.3.18505-6cf476c2-6ec3c61e09-d5938b6c383 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
$ /opt/rocm/hip/bin/hipcc --version
HIP version: 1.5.18494
HCC clang version 8.0.0 (https://github.com/RadeonOpenCompute/hcc-clang-upgrade.git 6ec3c61e09fbb60373eaf5a40021eb862363ba2c) (https://github.com/RadeonOpenCompute/llvm.git d5938b6c383ee68cf93d4508e48836d6118517e2) (based on HCC 1.3.18505-6cf476c2-6ec3c61e09-d5938b6c383 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin

Any hints?

Thanks,
Augusto.

Cannot build 2.7.0

Using the Gentoo overlay, I cannot get rocRAND to install with the 2.7.0 versions of everything.

>>> Emerging (1 of 1) sci-libs/rocRAND-2.7.0-r1::rocm
 * rocRAND-2.7.0.tar.gz BLAKE2B SHA512 size ;-) ...                                                                                                                                                                                    [ ok ]
>>> Unpacking source...
>>> Unpacking rocRAND-2.7.0.tar.gz to /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work
>>> Source unpacked in /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work
>>> Preparing source in /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7 ...
 * Applying master-disable2ndfindhcc.patch ...                                                                                                                                                                                         [ ok ]
 * Hardcoded definition(s) removed in CMakeLists.txt:
 *  set(CMAKE_INSTALL_PREFIX "/opt/rocm" CACHE PATH "Install path prefix, prepend
 *    set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Choose the type of build." FOR
>>> Source prepared.
>>> Configuring source in /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7 ...
>>> Working in BUILD_DIR: "/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build"
cmake -C /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build/gentoo_common_config.cmake -G Ninja -DCMAKE_INSTALL_PREFIX=/usr -DHIP_PLATFORM=hcc -DHIP_ROOT_DIR=/usr/lib/hip -DBUILD_TEST=OFF -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_CXX_FLAGS:STRING=-I/usr/lib/hcc/2.7/include -DCMAKE_BUILD_TYPE=Gentoo -DCMAKE_TOOLCHAIN_FILE=/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build/gentoo_toolchain.cmake  /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7
loading initial cache file /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build/gentoo_common_config.cmake
CMake Warning (dev) at gentoo_common_config.cmake:8 (SET):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- The CXX compiler identification is Clang 9.0.0
-- Check for working CXX compiler: /usr/lib/hcc/2.7/bin/hcc
-- Check for working CXX compiler: /usr/lib/hcc/2.7/bin/hcc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) at CMakeLists.txt:47 (set):
  implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found Git: /usr/bin/git (found version "2.23.0") 
-- 
-- ******** Summary ********
-- General:
--   System                : Linux
--   HIP ROOT              : /usr/lib/hip
--   C++ compiler          : /usr/lib/hcc/2.7/bin/hcc
--   C++ compiler version  : 9.0.0
--   CXX flags             : -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra
--   Build type            : Gentoo
--   Install prefix        : /usr
--   Device targets        : gfx803;gfx900;gfx906
-- 
--   BUILD_SHARED_LIBS     : ON
--   BUILD_FORTRAN_WRAPPER : OFF
--   BUILD_TEST            : OFF
--   BUILD_BENCHMARK       : OFF
-- <<< Gentoo configuration >>>
Build type      Gentoo
Install path    /usr
Compiler flags:
C               
C++             -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra
Linker flags:
Executable      -Wl,-O1 -Wl,--as-needed
Module          -Wl,-O1 -Wl,--as-needed
Shared          -Wl,-O1 -Wl,--as-needed

-- Configuring done
-- Generating done
-- Build files have been written to: /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build
>>> Source configured.
>>> Compiling source in /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7 ...
>>> Working in BUILD_DIR: "/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build"
ninja -v -j6 -l0
[1/12] /usr/lib/hcc/2.7/bin/hcc    -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra   -std=c++11 -MD -MT tools/CMakeFiles/xorwow_precomputed_generator.dir/xorwow_precomputed_generator.cpp.o -MF tools/CMakeFiles/xorwow_precomputed_generator.dir/xorwow_precomputed_generator.cpp.o.d -o tools/CMakeFiles/xorwow_precomputed_generator.dir/xorwow_precomputed_generator.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/tools/xorwow_precomputed_generator.cpp
[2/12] : && /usr/lib/hcc/2.7/bin/hcc  -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra  -Wl,-O1 -Wl,--as-needed tools/CMakeFiles/xorwow_precomputed_generator.dir/xorwow_precomputed_generator.cpp.o  -o tools/xorwow_precomputed_generator   && :
[3/12] /usr/lib/hcc/2.7/bin/hcc    -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra   -std=c++11 -MD -MT tools/CMakeFiles/sobol_direction_vector_generator.dir/sobol_direction_vector_generator.cpp.o -MF tools/CMakeFiles/sobol_direction_vector_generator.dir/sobol_direction_vector_generator.cpp.o.d -o tools/CMakeFiles/sobol_direction_vector_generator.dir/sobol_direction_vector_generator.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/tools/sobol_direction_vector_generator.cpp
[4/12] /usr/lib/hcc/2.7/bin/hcc    -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra   -std=c++11 -MD -MT tools/CMakeFiles/mrg32k3a_precomputed_generator.dir/mrg32k3a_precomputed_generator.cpp.o -MF tools/CMakeFiles/mrg32k3a_precomputed_generator.dir/mrg32k3a_precomputed_generator.cpp.o.d -o tools/CMakeFiles/mrg32k3a_precomputed_generator.dir/mrg32k3a_precomputed_generator.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/tools/mrg32k3a_precomputed_generator.cpp
[5/12] : && /usr/lib/hcc/2.7/bin/hcc  -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra  -Wl,-O1 -Wl,--as-needed tools/CMakeFiles/sobol_direction_vector_generator.dir/sobol_direction_vector_generator.cpp.o  -o tools/sobol_direction_vector_generator   && :
[6/12] : && /usr/lib/hcc/2.7/bin/hcc  -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra  -Wl,-O1 -Wl,--as-needed tools/CMakeFiles/mrg32k3a_precomputed_generator.dir/mrg32k3a_precomputed_generator.cpp.o  -o tools/mrg32k3a_precomputed_generator   && :
[7/12] /usr/lib/hcc/2.7/bin/hcc -Dhiprand_EXPORTS -Ilibrary/include -I/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/include -isystem /usr/lib/hip/include  -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra -fPIC   -std=c++11 -MD -MT library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o -MF library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o.d -o library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/src/hiprand/hiprand_hcc.cpp
FAILED: library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o 
/usr/lib/hcc/2.7/bin/hcc -Dhiprand_EXPORTS -Ilibrary/include -I/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/include -isystem /usr/lib/hip/include  -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra -fPIC   -std=c++11 -MD -MT library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o -MF library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o.d -o library/CMakeFiles/hiprand.dir/src/hiprand/hiprand_hcc.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/src/hiprand/hiprand_hcc.cpp
In file included from /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/src/hiprand/hiprand_hcc.cpp:21:
In file included from /usr/lib/hip/include/hip/hip_runtime.h:56:
In file included from /usr/lib/hip/include/hip/hcc_detail/hip_runtime.h:69:
In file included from /usr/lib/hcc/2.7/include/hc_printf.hpp:13:
In file included from /usr/lib/hcc/2.7/include/hc_am_internal.hpp:3:
In file included from /usr/lib/hcc/2.7/include/hc_am.hpp:3:
In file included from /usr/lib/hcc/2.7/include/hc.hpp:17:
/usr/lib/hcc/2.7/include/kalmar_index.h:42:35: error: expected ';' at end of declaration list
    explicit __index_leaf(int __t) restrict(amp,cpu) : __idx(__t) {}
                                  ^
/usr/lib/hcc/2.7/include/kalmar_index.h:76:17: error: expected ';' at end of declaration list
    index_impl() restrict(amp,cpu) : __index_leaf<N>(0)... {}
                ^
/usr/lib/hcc/2.7/include/kalmar_index.h:150:37: error: expected ';' at end of declaration list
    static inline void set(_Tp& now) restrict(amp,cpu) {
                                    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:166:37: error: expected ';' at end of declaration list
    static inline void set(_Tp& now) restrict(amp,cpu) {
                                    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:180:66: error: expected ';' at end of declaration list
    static bool inline contains(const _Tp1& idx, const _Tp2& ext) restrict(amp,cpu) {
                                                                 ^
/usr/lib/hcc/2.7/include/kalmar_index.h:201:66: error: expected ';' at end of declaration list
    static bool inline contains(const _Tp1& idx, const _Tp2& ext) restrict(amp,cpu) {
                                                                 ^
/usr/lib/hcc/2.7/include/kalmar_index.h:242:12: error: expected ';' at end of declaration list
    index() restrict(amp,cpu) : base_() {
           ^
/usr/lib/hcc/2.7/include/kalmar_index.h:253:30: error: expected ';' at end of declaration list
    index(const index& other) restrict(amp,cpu)
                             ^
/usr/lib/hcc/2.7/include/kalmar_index.h:446:5: error: unknown type name 'base'
    base base_;
    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:454:33: error: expected ';' at end of declaration list
    void __cxxamp_opencl_index() restrict(amp,cpu)
                                ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:61: error: expected ';' at end of declaration
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                            ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:71: error: unknown type name 'amp'
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                                      ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:75: error: unknown type name 'cpu'
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                                          ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:62: error: C++ requires a type specifier for all declarations
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                             ^
/usr/lib/hcc/2.7/include/kalmar_index.h:493:11: error: use of undeclared identifier 'N'
    index<N> __r = lhs;
          ^
/usr/lib/hcc/2.7/include/kalmar_index.h:494:5: error: use of undeclared identifier '__r'
    __r += rhs;
    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:494:12: error: use of undeclared identifier 'rhs'
    __r += rhs;
           ^
/usr/lib/hcc/2.7/include/kalmar_index.h:495:12: error: use of undeclared identifier '__r'
    return __r;
           ^
/usr/lib/hcc/2.7/include/kalmar_index.h:498:61: error: expected ';' at end of declaration
index<N> operator-(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                            ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
[8/12] /usr/lib/hcc/2.7/bin/hcc -Drocrand_EXPORTS -Ilibrary/include -I/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/include -isystem /usr/lib/hip/include  -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra -fPIC   -std=c++11 -MD -MT library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o -MF library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o.d -o library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/src/rocrand.cpp
FAILED: library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o 
/usr/lib/hcc/2.7/bin/hcc -Drocrand_EXPORTS -Ilibrary/include -I/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/include -isystem /usr/lib/hip/include  -DNDEBUG -I/usr/lib/hcc/2.7/include -Wno-unused-command-line-argument -Wall -Wextra -fPIC   -std=c++11 -MD -MT library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o -MF library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o.d -o library/CMakeFiles/rocrand.dir/src/rocrand.cpp.o -c /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/src/rocrand.cpp
In file included from /var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7/library/src/rocrand.cpp:21:
In file included from /usr/lib/hip/include/hip/hip_runtime.h:56:
In file included from /usr/lib/hip/include/hip/hcc_detail/hip_runtime.h:69:
In file included from /usr/lib/hcc/2.7/include/hc_printf.hpp:13:
In file included from /usr/lib/hcc/2.7/include/hc_am_internal.hpp:3:
In file included from /usr/lib/hcc/2.7/include/hc_am.hpp:3:
In file included from /usr/lib/hcc/2.7/include/hc.hpp:17:
/usr/lib/hcc/2.7/include/kalmar_index.h:42:35: error: expected ';' at end of declaration list
    explicit __index_leaf(int __t) restrict(amp,cpu) : __idx(__t) {}
                                  ^
/usr/lib/hcc/2.7/include/kalmar_index.h:76:17: error: expected ';' at end of declaration list
    index_impl() restrict(amp,cpu) : __index_leaf<N>(0)... {}
                ^
/usr/lib/hcc/2.7/include/kalmar_index.h:150:37: error: expected ';' at end of declaration list
    static inline void set(_Tp& now) restrict(amp,cpu) {
                                    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:166:37: error: expected ';' at end of declaration list
    static inline void set(_Tp& now) restrict(amp,cpu) {
                                    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:180:66: error: expected ';' at end of declaration list
    static bool inline contains(const _Tp1& idx, const _Tp2& ext) restrict(amp,cpu) {
                                                                 ^
/usr/lib/hcc/2.7/include/kalmar_index.h:201:66: error: expected ';' at end of declaration list
    static bool inline contains(const _Tp1& idx, const _Tp2& ext) restrict(amp,cpu) {
                                                                 ^
/usr/lib/hcc/2.7/include/kalmar_index.h:242:12: error: expected ';' at end of declaration list
    index() restrict(amp,cpu) : base_() {
           ^
/usr/lib/hcc/2.7/include/kalmar_index.h:253:30: error: expected ';' at end of declaration list
    index(const index& other) restrict(amp,cpu)
                             ^
/usr/lib/hcc/2.7/include/kalmar_index.h:446:5: error: unknown type name 'base'
    base base_;
    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:454:33: error: expected ';' at end of declaration list
    void __cxxamp_opencl_index() restrict(amp,cpu)
                                ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:61: error: expected ';' at end of declaration
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                            ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:71: error: unknown type name 'amp'
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                                      ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:75: error: unknown type name 'cpu'
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                                          ^
/usr/lib/hcc/2.7/include/kalmar_index.h:492:62: error: C++ requires a type specifier for all declarations
index<N> operator+(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                             ^
/usr/lib/hcc/2.7/include/kalmar_index.h:493:11: error: use of undeclared identifier 'N'
    index<N> __r = lhs;
          ^
/usr/lib/hcc/2.7/include/kalmar_index.h:494:5: error: use of undeclared identifier '__r'
    __r += rhs;
    ^
/usr/lib/hcc/2.7/include/kalmar_index.h:494:12: error: use of undeclared identifier 'rhs'
    __r += rhs;
           ^
/usr/lib/hcc/2.7/include/kalmar_index.h:495:12: error: use of undeclared identifier '__r'
    return __r;
           ^
/usr/lib/hcc/2.7/include/kalmar_index.h:498:61: error: expected ';' at end of declaration
index<N> operator-(const index<N>& lhs, const index<N>& rhs) restrict(amp,cpu) {
                                                            ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
ninja: build stopped: subcommand failed.
 * ERROR: sci-libs/rocRAND-2.7.0-r1::rocm failed (compile phase):
 *   ninja -v -j6 -l0 failed
 * 
 * Call stack:
 *     ebuild.sh, line  125:  Called src_compile
 *   environment, line 2025:  Called cmake-utils_src_compile
 *   environment, line  714:  Called cmake-utils_src_make
 *   environment, line  895:  Called _cmake_ninja_src_make
 *   environment, line  449:  Called eninja
 *   environment, line 1194:  Called die
 * The specific snippet of code:
 *       "$@" || die "${nonfatal_args[@]}" "${*} failed"
 * 
 * If you need support, post the output of `emerge --info '=sci-libs/rocRAND-2.7.0-r1::rocm'`,
 * the complete build log and the output of `emerge -pqv '=sci-libs/rocRAND-2.7.0-r1::rocm'`.
 * The complete build log is located at '/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/temp/environment'.
 * Working directory: '/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-2.7.0_build'
 * S: '/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/work/rocRAND-rocm-2.7'

>>> Failed to emerge sci-libs/rocRAND-2.7.0-r1, Log file:

>>>  '/var/tmp/portage/sci-libs/rocRAND-2.7.0-r1/temp/build.log'

I have submitted a bug there too.

Make the state structs accessible directly from C.

If you look at the CUDA headers, you'll see that the state is designed as a plain struct: https://github.com/Geof23/Gklee/blob/master/Gklee/include/cuda/curand_mtgp32.h#L194

rocRAND, however, defines its state as a class with member functions. C does not support this, so the state cannot be exposed through an interface with C linkage.

Why is this important? Many deep learning frameworks interface with Python and C via Python C Extensions.

I see two possible solutions:

  1. Add a constructor to the mtgp32_engine class that takes a hiprandStateMtgp32 object, allowing conversion from hiprandStateMtgp32 to mtgp32_engine.

OR

  2. Change the mtgp32_engine class to a C-style struct by removing the constructors and member functions, and provide them as free functions (i.e. at global scope) instead.
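
To make the two suggestions concrete, here is a minimal C++ sketch under stated assumptions. The names demo_rng_state, demo_rng_init, demo_rng_next and demo_engine are hypothetical and are not part of rocRAND or hipRAND, and xorshift32 stands in for MTGP32 only to keep the example short. The state is a plain struct driven by free functions with C linkage (the second suggestion), and a thin C++ wrapper is constructible from that struct (the first suggestion).

```cpp
#include <assert.h>
#include <stdint.h>
#include <type_traits>

// Hypothetical names throughout: this is not rocRAND's or hipRAND's API.
extern "C" {

// Plain data only: no constructors and no member functions, so a C
// compiler could parse the same declaration.
typedef struct demo_rng_state {
    uint32_t s;  // current generator state
} demo_rng_state;

// Free functions replace the constructor and the member functions.
void demo_rng_init(demo_rng_state* st, uint32_t seed);
uint32_t demo_rng_next(demo_rng_state* st);

}  // extern "C"

void demo_rng_init(demo_rng_state* st, uint32_t seed) {
    assert(st != nullptr);
    // xorshift must not start from an all-zero state
    st->s = seed ? seed : 1u;
}

uint32_t demo_rng_next(demo_rng_state* st) {
    assert(st != nullptr);
    // xorshift32, used here only as a stand-in generator
    uint32_t x = st->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st->s = x;
    return x;
}

// C++ convenience wrapper that can be built from the C state.
class demo_engine {
public:
    explicit demo_engine(const demo_rng_state& st) : state_(st) {}
    uint32_t operator()() { return demo_rng_next(&state_); }

private:
    demo_rng_state state_;
};

// Guard against someone adding a constructor or virtual function later.
static_assert(std::is_standard_layout<demo_rng_state>::value &&
                  std::is_trivial<demo_rng_state>::value,
              "state must stay a C-compatible plain struct");
```

A Python C extension would only need the struct and the two free functions; C++ callers can keep using the wrapper class.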

rocRAND crashes when compiled with AVX2 support

Describe the bug
rocRAND crashes when compiled with AVX2 support. Structs that contain __attribute__((ext_vector_type(4))) members (from amd_hip_vector_types.h and other places) are allocated on the heap with new, and those allocations are not sufficiently aligned.

To Reproduce
Steps to reproduce the behavior:

  1. Install rocRAND 5.7.1
  2. Build with -march=znver4 (optimization level does not matter)
  3. Run tests, e.g. test_rocrand_generate
  4. It crashes at line https://github.com/ROCmSoftwarePlatform/rocRAND/blob/rocm-5.7.1/library/src/rng/threefry4x64_20.hpp#L219

Tech details

rocRAND 5.7.1 allocates generators in rocrand_create_generator with plain new, which generally guarantees only 128-bit alignment on 64-bit systems. However, because of ext_vector_type, clang 17 generates code for the generator's methods that expects 256-bit-aligned structures. The layout inside the structure is fine, but the heap allocation may leave the structure as a whole only 128-bit aligned, which causes the crash.

vmovdqu %xmm1,0x48(%rdi)
vmovdqa %ymm0,0x60(%rdi) <- attempt to copy aligned structures, SIGSEGV
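
The mismatch described above can be checked with a few lines of C++. The sketch below is an illustration only: demo_generator is a hypothetical stand-in whose alignas(32) models the 256-bit requirement that the vector members impose, and is_aligned and is_overaligned are helper names invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a generator that holds 32-byte vector members;
// alignas(32) models the 256-bit alignment the compiler assumes.
struct alignas(32) demo_generator {
    unsigned long long counter[4];
    unsigned long long key[4];
};

// True when p is a multiple of `alignment` bytes.
// `alignment` must be a non-zero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// True when a plain operator new(size_t) is not guaranteed to return
// storage that satisfies alignof(T).
template <class T>
constexpr bool is_overaligned() {
#if defined(__STDCPP_DEFAULT_NEW_ALIGNMENT__)
    return alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
    return alignof(T) > alignof(std::max_align_t);
#endif
}
```

If is_overaligned<T>() is true, a plain new T is only guaranteed to be suitably aligned when the compiler implements C++17 aligned allocation; is_aligned(ptr, 32) can confirm what a particular allocation actually returned.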

Environment
HIP version: 5.7.31921
clang version 17.0.2
custom build for Gentoo

Additional context
Eigen solved a similar problem in two ways: with C++17-compatible compilers nothing needs to be done, while older compilers require overloaded new/delete.
https://github.com/search?q=repo%3Aeigen-mirror%2Feigen%20EIGEN_HAS_CXX17_OVERALIGN&type=code

Patch in Gentoo: gentoo/gentoo@54e4e7d#diff-e3383e414e6cc5ffd2e0222a1d4152298699139ea2dc379238423f31971149ae
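
For compilers without C++17 aligned allocation, the overloaded new/delete route mentioned above could look roughly like the following sketch. It is neither the Gentoo patch nor Eigen's implementation: demo_aligned_generator is a hypothetical type, and over-allocating with malloc and rounding the pointer up is just one common way to obtain aligned storage.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical over-aligned type; not a rocRAND class.
struct alignas(32) demo_aligned_generator {
    unsigned long long counter[4];
    unsigned long long key[4];

    // Class-specific allocation: over-allocate, round the address up to the
    // required alignment, and store the original pointer just below it.
    static void* operator new(std::size_t size) {
        const std::size_t alignment = alignof(demo_aligned_generator);
        void* raw = std::malloc(size + alignment + sizeof(void*));
        if (raw == nullptr) {
            throw std::bad_alloc();
        }
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t aligned =
            (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        // Remember what malloc returned so operator delete can free it.
        void** slot = reinterpret_cast<void**>(aligned) - 1;
        *slot = raw;
        return reinterpret_cast<void*>(aligned);
    }

    // Recover the pointer stored by operator new and release it.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) {
            return;
        }
        std::free(*(reinterpret_cast<void**>(p) - 1));
    }
};
```

With these overloads, new demo_aligned_generator() returns 32-byte-aligned storage regardless of the language standard in use. Array forms (new[] and delete[]) are not covered by this sketch and would need the same treatment.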
