radeonopencompute / rocm Goto Github PK


AMD ROCm™ Software - GitHub Home

Home Page: https://rocm.docs.amd.com

License: MIT License


rocm's Introduction

AMD ROCm Software

ROCm is an open-source stack, composed primarily of open-source software, designed for graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and APIs that enable GPU programming from low-level kernel to end-user applications.

With ROCm, you can customize your GPU software to meet your specific needs. You can develop, collaborate, test, and deploy your applications in a free, open source, integrated, and secure software ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC), artificial intelligence (AI), scientific computing, and computer aided design (CAD).

ROCm is powered by AMD’s Heterogeneous-computing Interface for Portability (HIP), an open-source C++ GPU programming environment and its corresponding runtime. HIP allows ROCm developers to create portable applications by deploying code on a range of platforms, from dedicated gaming GPUs to exascale HPC clusters.

ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks, such as PyTorch and TensorFlow.

Getting the ROCm Source Code

AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains a repo manifest file called default.xml. You can use this manifest file to download the source code for ROCm software.

Installing the repo tool

The repo tool from Google allows you to manage multiple git repositories simultaneously. Run the following commands to install the repo tool:

mkdir -p ~/bin/
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
chmod a+x ~/bin/repo

Note: The ~/bin/ folder is used as an example. You can specify a different folder to install the repo tool into if you desire.

Installing git-lfs

Some ROCm projects use the Git Large File Storage (LFS) format that may require you to install git-lfs. Refer to Git Large File Storage for more information. For example, to install git-lfs for Ubuntu, use the following command:

sudo apt-get install git-lfs
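
Before syncing, it can help to confirm that git-lfs is actually on the PATH. The following is a minimal sketch; the helper name have_cmd is made up for illustration and is not part of git or repo:

```shell
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd git-lfs; then
    echo "git-lfs found: $(command -v git-lfs)"
else
    echo "git-lfs not found; install it before running repo sync"
fi
```

Once git-lfs is installed, running git lfs install once per user account sets up the Git hooks it relies on.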

Downloading the ROCm source code

The following example shows how to use the repo tool to download the ROCm source code. If you choose a directory other than ~/bin/ to install the repo tool, you must use that chosen directory in the code as shown below:

mkdir -p ~/ROCm/
cd ~/ROCm/
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.0.x
~/bin/repo sync

Note: Using this sample code will cause the repo tool to download the open source code associated with the specified ROCm release. Ensure that you have ssh-keys configured on your machine for your GitHub ID prior to the download as explained at Connecting to GitHub with SSH.
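
If the repo tool lives somewhere other than ~/bin/, keeping the directory and branch in variables avoids editing each command by hand. This is a sketch only: REPO_DIR, ROCM_BRANCH, and print_sync_cmds are illustrative names, and the function prints the commands rather than running them so they can be reviewed first.

```shell
# REPO_DIR and ROCM_BRANCH are illustrative variable names, not repo tool settings.
REPO_DIR="${REPO_DIR:-$HOME/bin}"
ROCM_BRANCH="${ROCM_BRANCH:-roc-6.0.x}"

# Print the init and sync commands for the chosen directory and branch (dry run).
print_sync_cmds() {
    printf '%s/repo init -u http://github.com/ROCm/ROCm.git -b %s\n' "$REPO_DIR" "$ROCM_BRANCH"
    printf '%s/repo sync\n' "$REPO_DIR"
}

print_sync_cmds
```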

Building the ROCm source code

Each ROCm component repository contains directions for building that component, such as the rocSPARSE documentation Installation and Building for Linux. Refer to the specific component documentation for instructions on building the repository.

Each release of the ROCm software supports specific hardware and software configurations. Refer to System requirements (Linux) for the current supported hardware and OS.

Build ROCm from source

The build uses as many processors as it can find to build in parallel. Some compile steps can consume as much as 10 GB of RAM, so make sure you have plenty of swap space.
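
One way to act on the memory warning is to cap the number of parallel jobs. The sketch below assumes a rough budget of 10 GB per job, taken from the worst case mentioned above; the helper name safe_jobs and the budget itself are assumptions, and the result is deliberately conservative.

```shell
# safe_jobs CORES MEM_GB: hypothetical helper that caps parallel jobs so each
# job has roughly 10 GB of memory (RAM plus swap) available.
safe_jobs() {
    cores=$1
    mem_gb=$2
    by_mem=$(( mem_gb / 10 ))
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Example: 64 cores with 128 GB of RAM plus swap
safe_jobs 64 128   # prints 12
```

The result can be exported as NPROC, which the make commands in Step 3 read through ${NPROC:-$(nproc)}.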

By default, the ROCm build compiles for all supported GPU architectures and takes approximately 500 CPU hours. The build time drops significantly if you limit the GPU architectures to build for by setting the GPU_ARCHS environment variable, as shown in Step 3 below.
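
GPU_ARCHS takes a semicolon-separated list, and the semicolon must be quoted so the shell does not treat it as a command separator. A small sketch for building the value from individual targets; join_archs is a hypothetical helper name:

```shell
# join_archs is a hypothetical helper: joins its arguments with ';',
# the separator used in the GPU_ARCHS examples in this README.
join_archs() (
    IFS=';'
    printf '%s\n' "$*"
)

GPU_ARCHS="$(join_archs gfx940 gfx941 gfx942)"
export GPU_ARCHS
echo "GPU_ARCHS=$GPU_ARCHS"
```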

# --------------------------------------
# Step 1: Clone source code
# --------------------------------------

mkdir -p ~/WORKSPACE/      # Or any folder name other than WORKSPACE
cd ~/WORKSPACE/
export ROCM_VERSION=6.1.0   # or 6.1.1 6.1.2
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.1.x -m tools/rocm-build/rocm-${ROCM_VERSION}.xml
~/bin/repo sync

# --------------------------------------
# Step 2: Prepare build environment
# --------------------------------------

# Option 1: Start a docker container
# Pulling required base docker images:
# Ubuntu20.04 built from ROCm/tools/rocm-build/docker/ubuntu20/Dockerfile
docker pull rocm/rocm-build-ubuntu-20.04:6.1
# Ubuntu22.04 built from ROCm/tools/rocm-build/docker/ubuntu22/Dockerfile
docker pull rocm/rocm-build-ubuntu-22.04:6.1

# Start docker container and mount the source code folder:
docker run -ti \
    -e ROCM_VERSION=${ROCM_VERSION} \
    -e CCACHE_DIR=$HOME/.ccache \
    -e CCACHE_ENABLED=true \
    -e DOCK_WORK_FOLD=/src \
    -w /src \
    -v $PWD:/src \
    -v /etc/passwd:/etc/passwd \
    -v /etc/shadow:/etc/shadow \
    -v ${HOME}/.ccache:${HOME}/.ccache \
    -u $(id -u):$(id -g) \
    <replace_with_required_ubuntu_base_docker_image> bash

# Option 2: Install required packages into the host machine
# For ubuntu20.04 system
cd ROCm/tools/rocm-build/docker/ubuntu20
bash install-prerequisites.sh
# For ubuntu22.04 system
cd ROCm/tools/rocm-build/docker/ubuntu22
bash install-prerequisites.sh

# --------------------------------------
# Step 3: Run build command line
# --------------------------------------

# Select GPU targets before building:
# When GPU_ARCHS is not set, default GPU targets supported by ROCm6.1 will be used.
# To build against a subset of GFX architectures you can use the below env variable.
# Support MI300 (gfx940, gfx941, gfx942).
export GPU_ARCHS="gfx942"               # Example
export GPU_ARCHS="gfx940;gfx941;gfx942" # Example

# Pick and run build commands in the docker container:
# Build rocm-dev packages
make -f ROCm/tools/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} rocm-dev
# Build all ROCm packages
make -f ROCm/tools/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} all
# List all ROCm components to find required components
make -f ROCm/tools/rocm-build/ROCm.mk list_components
# Build a single ROCm package
make -f ROCm/tools/rocm-build/ROCm.mk T_rocblas

# Find built packages in ubuntu20.04:
out/ubuntu-20.04/20.04/deb/
# Find built packages in ubuntu22.04:
out/ubuntu-22.04/22.04/deb/

# Find built logs in ubuntu20.04:
out/ubuntu-20.04/20.04/logs/
# Find built logs in ubuntu22.04:
out/ubuntu-22.04/22.04/logs/
# All logs pertaining to failed components end with the .errors extension.
out/ubuntu-22.04/22.04/logs/rocblas.errors          # Example
# All logs pertaining to components still building end with the .inprogress extension.
out/ubuntu-22.04/22.04/logs/rocblas.inprogress  # Example
# All logs pertaining to passed components use the component name with no extension.
out/ubuntu-22.04/22.04/logs/rocblas             # Example

Note: Overview for ROCm.mk
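
The log naming convention above lends itself to a quick status check per component. The following is a sketch under that convention only; component_status is a hypothetical helper, not something ROCm.mk provides.

```shell
# component_status LOG_DIR COMPONENT: hypothetical helper that maps the log
# file naming convention described above to a status word.
component_status() {
    log_dir=$1
    name=$2
    if [ -e "$log_dir/$name.errors" ]; then
        echo failed
    elif [ -e "$log_dir/$name.inprogress" ]; then
        echo building
    elif [ -e "$log_dir/$name" ]; then
        echo passed
    else
        echo unknown
    fi
}

# Example, assuming an Ubuntu 22.04 build tree under the current directory:
component_status out/ubuntu-22.04/22.04/logs rocblas
```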

ROCm documentation

This repository contains the manifest file for ROCm releases, changelogs, and release information.

The default.xml file contains information for all repositories and the associated commit used to build the current ROCm release; default.xml uses the Manifest Format repository.

Source code for our documentation is located in the /docs folder of most ROCm repositories. The develop branch of our repositories contains content for the next ROCm release.

The ROCm documentation homepage is rocm.docs.amd.com.

Building the documentation

For a quick-start build, use the following code. For more options and detail, refer to Building documentation.

cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

Alternatively, a CMake build is supported.

cmake -B build
cmake --build build --target=doc

Older ROCm releases

For release information for older ROCm releases, refer to the CHANGELOG.


rocm's Issues

ROCm vector_copy sample hangs after "Loading the code object succeeded"

I followed the directions at ROCm Install on a machine running a fresh Ubuntu server 16.04 64-bit install (headless, no graphical desktop). I have a R9 Nano in the machine. I rebooted into the 4.4.0-kfd-compute-rocm-rel-1.2-31 kernel, then tried to compile and run the vector_copy sample:

$ cd /opt/rocm/hsa/sample
$ make
$ ./vector_copy
Initializing the hsa runtime succeeded.
Checking finalizer 1.0 extension support succeeded.
Generating function table for finalizer succeeded.
Getting a gpu agent succeeded.
Querying the agent name succeeded.
The agent name is Fiji.
Querying the agent maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
"Obtaining machine model" succeeded.
"Getting agent profile" succeeded.
Create the program succeeded.
Adding the brig module to the program succeeded.
Query the agents isa succeeded.
Finalizing the program succeeded.
Destroying the program succeeded.
Create the executable succeeded.
Loading the code object succeeded.

It hangs there for multiple minutes using 100% CPU. After I press Control-C multiple times, the process is eventually killed, and I see this error in dmesg:

[  253.260002] kfd: qcm fence wait loop timeout expired 
[  253.260025] kfd: unmapping queues failed. 
[  253.260040] kfd: the cp might be in an unrecoverable state due to an unsuccessful queues preemption

Why does vector_copy not run? I should point out this machine has a CPU without PCIe Gen3 atomics. It's a 2008-era Core 2 Duo E8400, on a mobo with Intel P31 chipset. (Don't ask why it is so old; it's just a spare CPU & mobo I had lying around for a quick test.) Is the lack of atomics causing this problem?
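
One way to look for PCIe atomics support is the AtomicOps fields that recent lspci versions print in verbose mode, where a "+" after a flag means the capability is present. This is a rough sketch: atomics_lines is a hypothetical helper, older pciutils may not print these fields at all, and whether missing atomics explains this particular hang is not established here.

```shell
# atomics_lines: hypothetical helper; reads `lspci -vvv` style text on stdin
# and prints only the lines that mention PCIe AtomicOps capabilities.
atomics_lines() {
    grep 'AtomicOps'
}

# Full capability output usually needs root privileges.
if command -v lspci >/dev/null 2>&1; then
    lspci -vvv 2>/dev/null | atomics_lines || echo "no AtomicOps fields found"
fi
```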

How to run from Docker container?

Is it possible to install and run ROCm from a docker container provided that the correct kernel is loaded?

I am running Ubuntu 16.04 and would like to install and use ROCm from a 14.04 container. I've tried using kernel 4.8.0rc8 (built from yakkety master/next) and starting docker container with docker run -it --device=/dev/kfd <image id>. /opt/rocm/bin/rocm-smi -a sees the GPU. However, vector_copy fails with Getting a gpu agent failed.. Debugging shows that it doesn't find any agents.
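
For comparison, AMD's container instructions for later ROCm releases pass both /dev/kfd and /dev/dri into the container. Whether that applies to the kernel and release described in this post is an open question; the sketch below only shows which of those device nodes exist and what flags they would produce. device_args is a hypothetical helper.

```shell
# device_args ROOT: hypothetical helper; prints a --device flag for each
# ROCm-related device node that exists under ROOT (use "" for the real /dev).
device_args() {
    root=$1
    args=""
    for node in /dev/kfd /dev/dri; do
        if [ -e "$root$node" ]; then
            args="${args:+$args }--device=$node"
        fi
    done
    printf '%s\n' "$args"
}

# Show the command that would be run; <image id> stays a placeholder as in the post.
echo "docker run -it $(device_args "") <image id>"
```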

OpenCL alongside ROCm?

Hi - I was wondering how to have OpenCL available on a rocm enabled kernel with ubuntu 14.04.4. I followed the installation instructions in this repo and installed the opencl icd through ocl-icd-opencl-dev but clinfo just prints:

$ clinfo
I: ICD loader reports no usable platforms

whereas lspci correctly reports the GPU(s):

$ lspci -v |grep "VGA controller"
02:00.0 VGA compatible controller: NVIDIA Corporation GF119 [NVS 310] (rev a1) (prog-if 00 [VGA controller])
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ca) (prog-if 00 [VGA controller])

but rocm-smi also chokes a bit:

$ sudo /opt/rocm/bin/rocm-smi -a


===================   ROCm System Management Interface   ===================
============================================================================
GPU[0]          : Temperature: 38.0c
GPU[1]          : Unable to display temperature
============================================================================
============================================================================
GPU[0]          : GPU Clock Level: 0 (300Mhz)
GPU[0]          : GPU Memory Clock Level: 0 (500Mhz)
GPU[1]          : PowerPlay not enabled - Cannot display clocks
============================================================================
============================================================================
GPU[0]          : Fan Level: 48 (18.82)%
GPU[1]          : PowerPlay not enabled - Cannot display fan speed
============================================================================
============================================================================
GPU[0]          : Current PowerPlay Level: auto
GPU[1]          : PowerPlay not enabled - Cannot display Performance Level
============================================================================
============================================================================
GPU[0]          : Supported GPU clock frequencies on GPU0
GPU[0]          : 0: 300Mhz *
GPU[0]          : 1: 508Mhz 
GPU[0]          : 2: 717Mhz 
GPU[0]          : 3: 874Mhz 
GPU[0]          : 4: 911Mhz 
GPU[0]          : 5: 944Mhz 
GPU[0]          : 6: 974Mhz 
GPU[0]          : 7: 1000Mhz 
GPU[0]          : 
GPU[0]          : Supported GPU Memory clock frequencies on GPU0
GPU[0]          : 0: 500Mhz *
GPU[0]          : 
GPU[1]          : PowerPlay not enabled - Cannot display clocks
============================================================================
===================          End of ROCm SMI Log         ===================

Any hint would be appreciated.
P
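
The "no usable platforms" message comes from the ICD loader, which on Linux discovers vendor implementations through .icd files, conventionally in /etc/OpenCL/vendors. A quick sketch for listing what is registered there; list_icds is a hypothetical helper, and an empty result would only suggest (not prove) that no vendor runtime is installed.

```shell
# list_icds DIR: hypothetical helper; prints each .icd file in DIR together
# with the library name on its first line.
list_icds() {
    dir=${1:-/etc/OpenCL/vendors}
    found=0
    for f in "$dir"/*.icd; do
        [ -e "$f" ] || continue
        found=1
        printf '%s -> %s\n' "${f##*/}" "$(head -n 1 "$f")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no .icd files in $dir"
    fi
}

list_icds
```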

slow download speed

It's downloading slowly on my 60Mb network, while everything other than this repo is fast in apt-get.

kernel with debug symbols?

Hi, I wanted to do some performance analysis on my machine that comes with the ROCm kernel. For systemtap and perf, kernel-side debug symbols would be nice. Are you guys considering shipping a kernel-devel package as well?

Is ROCm-1.1.1 released?

I see that the documentation here and the tags are updated. However, apt repository still does not have 1.1.1 files.
Also, since this release is not 1.2, I suppose Hawaii support is still not released?

Thanks!

FuryX, dual CPU, Ubuntu 16.04, ROCm 1.2, can't get GPU agent.

I installed ROCm 1.2 according to the instructions. Compiling vector_copy went OK, but running it outputs:

Initializing the hsa runtime succeeded.
Checking finalizer 1.0 extension support succeeded.
Generating function table for finalizer succeeded.
Getting a gpu agent failed.

I looked into it a bit, and hsa_iterate_agents() reports only 2 CPU agents. (I'm on a dual Xeon E5-2630v3 system.) No GPU agent is reported, although I do have a FuryX running correctly.

I can use the FuryX through OpenCL.

I also tried hcc with saxpy.cpp:
./saxpy
There is no device can be used to do the computation

(by the way, there's an error in that error message as well).

hcc --version:
HCC clang version 3.5.0 (based on HCC 0.10.16313-d90738a-10704f4 LLVM 3.5.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix

uname -a
Linux big 4.4.0-kfd-compute-rocm-rel-1.2-31 #1 SMP Fri Jul 22 06:06:24 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux

I suspect this may have something to do with my dual-CPU setup. Maybe hsa_iterate_agents() stops early at two agents before reaching the GPU?

Here is output from clinfo:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2117.7)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name:
Device Topology: PCI[ B#129, D#0, F#0 ]
Max compute units: 14
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 555Mhz
Address bits: 64
Max memory allocation: 2699563008
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3784101888
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f9a2e4868f8
Name: Fiji
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2117.7 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2117.7)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 32
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 2356Mhz
Address bits: 64
Max memory allocation: 33766751232
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 135067004928
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 3701980160
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 0x7f9a2e4868f8
Name: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 2117.7 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2117.7)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

missing GPU list, can't find any OpenCL devices

Good day! Please help me solve this problem.
I am building a platform with 16 AMD GPUs and am currently testing ROCm with 1 card.
I use two programs, Ethereum (ETHMINER) and Claymore's AMD; neither sees any card. Both work normally with FGLRX 15.12 (without ROCm).
How do I tell the programs where the list of devices is?
Thanks!

uname -a
Linux KFD 4.4.0-kfd-compute-rocm-rel-1.2-31 #1 SMP Fri Jul 22 06:06:24 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux

dmesg | grep kfd
[    0.000000] Linux version 4.4.0-kfd-compute-rocm-rel-1.2-31 (jenkins@sm15k-37) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.1) ) #1 SMP Fri Jul 22 06:06:24 CDT 2016
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.0-kfd-compute-rocm-rel-1.2-31 root=/dev/mapper/KFD--vg-root ro
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.4.0-kfd-compute-rocm-rel-1.2-31 root=/dev/mapper/KFD--vg-root ro
[    1.078764] usb usb1: Manufacturer: Linux 4.4.0-kfd-compute-rocm-rel-1.2-31 ehci_hcd
[    1.243041] usb usb2: Manufacturer: Linux 4.4.0-kfd-compute-rocm-rel-1.2-31 ehci_hcd
[    1.438467] usb usb3: Manufacturer: Linux 4.4.0-kfd-compute-rocm-rel-1.2-31 xhci-hcd
[    1.641858] usb usb4: Manufacturer: Linux 4.4.0-kfd-compute-rocm-rel-1.2-31 xhci-hcd
[   12.097085] CPU: 4 PID: 476 Comm: systemd-udevd Not tainted 4.4.0-kfd-compute-rocm-rel-1.2-31 #1
[   12.149212] amdkfd: PeerDirect interface was not detected
[   12.149215] kfd kfd: Initialized module
[   22.757640] kfd kfd: Allocated 3944480 bytes on gart for device(1002:67b1)
[   22.757788] kfd kfd: added device (1002:67b1)

 lspci | grep -i VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290] (rev 80)

lsmod | grep amd
amdkfd                184320  1
amd_iommu_v2           20480  1 amdkfd
amdgpu               1449984  1
ttm                    94208  1 amdgpu
drm_kms_helper        139264  1 amdgpu
drm                   356352  4 ttm,drm_kms_helper,amdgpu
i2c_algo_bit           16384  2 igb,amdgpu

# ===================   ROCm System Management Interface   ===================
# GPU[0]          : GPU ID: 0x67b1
# 
# GPU[0]          : Temperature: 41.0c
# 
# GPU[0]          : PowerPlay not enabled - Cannot display clocks
# 
# GPU[0]          : Fan Level: 51 (20.0)%
# 
# GPU[0]          : Current PowerPlay Level: auto
# 
# GPU[0]          : Current OverDrive value: 0%
# 
# GPU[0]          : PowerPlay not enabled - Cannot display clocks

===================          End of ROCm SMI Log         ===================


clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 1.2 AMD-APP (1445.5)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_amd_hsa

  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_CPU
  Vendor ID:                                     1002h
  Board name:
  Max compute units:                             10
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           1024
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  8
  Preferred vector width double:                 4
  Native vector width char:                      16
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      2
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           2175Mhz
  Address bits:                                  64
  Max memory allocation:                         8406649856
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   4096
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    32768
  Global memory size:                            33626599424
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             32768
  Kernel Preferred work group size multiple:     1
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   0x00007fc553302de0
  Name:                                          Intel(R) Xeon(R) CPU E5-2663 v3 @ 2.80GHz
  Vendor:                                        GenuineIntel
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                1445.5 (sse2,avx)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (1445.5)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_amd_svm cl_khr_gl_event


./vector_copy
Initializing the hsa runtime succeeded.
Checking finalizer 1.0 extension support succeeded.
Generating function table for finalizer succeeded.
Getting a gpu agent succeeded.
Querying the agent name succeeded.
The agent name is Hawaii.
Querying the agent maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
"Obtaining machine model" succeeded.
"Getting agent profile" succeeded.
Create the program succeeded.
Adding the brig module to the program succeeded.
Query the agents isa succeeded.
Finalizing the program succeeded.
Destroying the program succeeded.
Create the executable succeeded.
Loading the code object succeeded.
Freeze the executable succeeded.
Extract the symbol from the executable succeeded.
Extracting the symbol from the executable succeeded.
Extracting the kernarg segment size from the executable succeeded.
Extracting the group segment size from the executable succeeded.
Extracting the private segment from the executable succeeded.
Creating a HSA signal succeeded.
Finding a fine grained memory region succeeded.
Allocating argument memory for input parameter succeeded.
Allocating argument memory for output parameter succeeded.
Finding a kernarg memory region succeeded.
Allocating kernel argument memory buffer succeeded.
Dispatching the kernel succeeded.
Passed validation.
Freeing kernel argument memory buffer succeeded.
Destroying the signal succeeded.
Destroying the executable succeeded.
Destroying the code object succeeded.
Destroying the queue succeeded.
Freeing in argument memory buffer succeeded.
Freeing out argument memory buffer succeeded.
Shutting down the runtime succeeded.

Multi GPU scaling

Hello!
First of all, very cool project!

My question is not specific to ROCm, but it is related, and I thought you folks may have some advice on the following:

I have two RX 470 cards. I am running a series of OpenCL kernels which are fairly memory intensive: this is a video compression application, so a lot of data passes from host to GPU and back. There is also high CPU usage.

When I run my kernels on a single 470, total frame rate is 40 FPS. When I use two 470s, frame rate equals 60 FPS. There is no dependency in the code between the two devices.

So, it looks like scaling is sub-optimal. I was hoping/expecting to get around 80 FPS for two cards. What factors may be affecting compute scaling on multiple cards? How can I troubleshoot this issue?

Any advice would be greatly appreciated.

Thanks!
Aaron
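
The figures in this post can be reduced to a single scaling-efficiency number, which makes it easier to compare runs while troubleshooting. A minimal sketch, assuming ideal scaling means N cards give N times the single-card frame rate; the function name `scaling_efficiency` is hypothetical:

```shell
# Scaling efficiency (%) = multi-GPU rate / (single-GPU rate * GPU count) * 100.
# Arguments: single-GPU FPS, multi-GPU FPS, number of GPUs.
scaling_efficiency() {
    awk -v one="$1" -v many="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", 100 * many / (one * n) }'
}

# Figures from the post: 40 FPS on one card, 60 FPS on two.
scaling_efficiency 40 60 2    # prints 75.0
```

A result well below 100 with no dependency between the devices suggests a shared resource is saturated; the post itself mentions heavy host-to-GPU traffic and high CPU usage.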

Strip HSA BRIG section

Sorry, off topic, but does anyone here know how to strip a .brig section from an ELF binary?
I am converting my OpenCL kernel into a binary, and I don't need this section.

Thanks!

hcc and friends not in path

Hi -

I just installed ROCm 1.1 on an Ubuntu 14.04.04 box successfully, but the paths to hcc/clang++/hipify are not in PATH/LD_LIBRARY_PATH after boot. I am not sure what the rationale behind this is, but I suggest adding /opt/rocm/* to the respective PATHs so that after a fresh boot, I can use these tools right away.

Also, I couldn't find any manpages or other terminal-based documentation bundled under /opt/rocm. I only saw /opt/rocm/hip/docs/html. Are there any plans to provide further documentation?

Just 2 suggestions/questions -
P
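
Until the packages set these variables themselves, the paths can be added from a shell startup file. A minimal sketch, assuming the tools live in /opt/rocm/bin and the libraries in /opt/rocm/lib (both directories appear in other reports on this page); the helper name `append_path` and the idea of placing it in a file such as /etc/profile.d/rocm.sh are assumptions, not something the ROCm packages provide:

```shell
# Append a directory to a colon-separated path variable unless it is already listed.
# $1 = variable name (for example PATH), $2 = directory to add.
append_path() {
    eval "_ap_cur=\${$1:-}"                     # read the current value of the named variable
    case ":$_ap_cur:" in
        *":$2:"*) ;;                            # already present, nothing to do
        *) eval "export $1=\"\${_ap_cur:+\$_ap_cur:}\$2\"" ;;
    esac
}

append_path PATH /opt/rocm/bin
append_path LD_LIBRARY_PATH /opt/rocm/lib
```

Checking for an existing entry first keeps the variables from growing each time the file is sourced.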

OpenCL support in ROCm

I see #12 - but this topic did not lay out OpenCL support - Ben Sander in comments @ http://gpuopen.com/rocm-do-you-speaka-my-language/ says OpenCL is soon, but this was May 5th (4 months ago from this issue).

I think it's pretty ironic/funny/sad OpenCL is the last thing to come out of the treasure trove of HSA that's been going on. I see CLOC but it's useless against complex applications that use the OpenCL runtime.

Where are we with OpenCL kernel + runtime support in ROCm?

RHEL/Fedora binaries?

As long as the RHEL/Fedora rpms and contained binaries are not around, where do I find documentation on how to build ROCm from source?

What is ROC-smi?

The README.md lists ROC-smi as one of the projects released as part of the 1.1 release, but the link is not working. What is it? Inquiring minds want to know! :)

IRC channel for ROCm?

It would be great to have a place to discuss performance computing, multi-GPU design and hardware, GPU black arts etc.

can't repo sync

The repo configuration is set to use ssh, which makes it unusable for people who are not GitHub users (at the least) and, I believe, for GitHub users who don't have commit access (like myself).

Perhaps I missed something that lets ssh cloning work in these cases?

ps. Please support and build binaries for OpenSUSE!

Question about ROCm

I have a GPU-accelerated video compression application - very memory intensive.

I use OpenCL 1.2. There is no need for device side enqueue, or to have different GPUs communicate with each other.

Is there an advantage to using ROCm over the regular OpenCL runtime for my application?

I have heard that performance is better due to lower host-side overhead.

Thanks so much,
Aaron

compute-firmware fails to install

When I tried to install rocm through apt-get, compute-firmware fails to install properly:

Errors were encountered while processing:
 /var/cache/apt/archives/compute-firmware_1.0-fdd910a_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

When I tried to force the installation through sudo apt-get -f install, I got the following:

The following extra packages will be installed:
  compute-firmware
The following NEW packages will be installed:
  compute-firmware
0 upgraded, 1 newly installed, 0 to remove and 92 not upgraded.
21 not fully installed or removed.
Need to get 0 B/1,349 kB of archives.
After this operation, 21.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
(Reading database ... 250969 files and directories currently installed.)
Preparing to unpack .../compute-firmware_1.0-fdd910a_all.deb ...
Unpacking compute-firmware (1.0-fdd910a) ...
Replacing files in old package linux-firmware (1.127.20) ...
dpkg: error processing archive /var/cache/apt/archives/compute-firmware_1.0-fdd910a_all.deb (--unpack):
 trying to overwrite '/lib/firmware/radeon/tonga_sdma1.bin', which is also in package radeon-firmware 410-604
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/compute-firmware_1.0-fdd910a_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Official RPMs not available

The path for the yum repositories (http://packages.amd.com/rocm/yum/rpm/) does not exist.
Are the official packages not yet uploaded or have they been removed? Or did the link perhaps change and has not yet been corrected?

hsa_signal_wait_acquire() doesn't return on /opt/rocm/hsa/sample/vector_copy

cont. ROCm/hcc#71

"verify-installation" step doesn't succeed because hsa_signal_wait_acquire() doesn't return.

The environment is as follows:

Ubuntu 14.04.4
Core i7-3770K
AMD Radeon R9 FuryX
DDR3-1600 2GBx4

The steps to reproduce are as follows:

Clean-install ubuntu 14.04.4 ("Erase disk and install ubuntu")
Boot normally, and I got "low level graphic mode" (this is expected for FuryX, right?).
Enter console mode with "Ctrl+Alt+F1".
Install hcc; following https://github.com/RadeonOpenCompute/ROCm#add-the-rocm-apt-repository
reboot
Enter console mode again
export PATH=${PATH}:/opt/rocm/bin
cp -r /opt/rocm/hsa/sample/ ~/sample
cd ~/sample
make
./vector_copy prints the following output but doesn't finish.

Initializing the hsa runtime succeeded.
Checking finalizer 1.0 extension support succeeded.
Generating function table for finalizer succeeded.
Getting a gpu agent succeeded.
Querying the agent name succeeded.
The agent name is Fiji.
Querying the agent maximum queue size succeeded.
The maximum queue size is 131072.
Creating the queue succeeded.
"Obtaining machine model" succeeded.
"Getting agent profile" succeeded.
Create the program succeeded.
Adding the brig module to the program succeeded.
Query the agents isa succeeded.
Finalizing the program succeeded.
Destroying the program succeeded.
Create the executable succeeded.
Loading the code object succeeded.
Freeze the executable succeeded.
Extract the symbol from the executable succeeded.
Extracting the symbol from the executable succeeded.
Extracting the kernarg segment size from the executable succeeded.
Extracting the group segment size from the executable succeeded.
Extracting the private segment from the executable succeeded.
Creating a HSA signal succeeded.
Finding a fine grained memory region succeeded.
Allocating argument memory for input parameter succeeded.
Allocating argument memory for output parameter succeeded.
Finding a kernarg memory region succeeded.
Allocating kernel argument memory buffer succeeded.
Dispatching the kernel succeeded.

It seems that hsa_signal_wait_acquire() doesn't return.

What's the problem and how can I fix this?

Note that I use an Ivy Bridge CPU, which ROCm doesn't support according to @whchung.
Could you point me to a wiki page or other document that lists the supported CPUs?

Thanks.

Please help a poor windows user

First of all, congratulations on the latest release.

My situation is:

C++ application with OpenCL 1.2 kernels, running on windows but is portable to Linux.
(video compression app, makes heavy use of OpenCL images, events )
Windows 10 Crimson with 2x 470 4GB cards
i7 6600 CPU, ASUS mb

I am looking for a simple, tested, step-by-step guide to getting my app running on ROCm.

i.e.

which operating system (hopefully xubuntu, since I am familiar with that)
which graphics drivers
how to do the ROCm install

I tried installing xubuntu 16.04 a few months ago when I got my system, but I wasn't able to get the graphics drivers to recognize the cards, so had to wipe and install windows 10. But, I would like to use Linux as my main target OS, so ROCm will be perfect if I can get it running.

Any guidance here would be greatly appreciated.

Thanks!
Aaron

404s for some links to repos in Readme.md

I'm getting 404s for the links to the github repositories for HCC, the Assembler stuff, HIP, and HIP examples.

Carrizo APU, Ubuntu 14.04, ROCm 1.2, stack trace on boot

Hello,

I updated ROCm from 1.1.1 to version 1.2 and upon reboot I get a stack trace which hangs the boot sequence.
A fresh install of Ubuntu did nothing to solve this, so it should be easily reproducible, at least on an Aspire E15 with an FX-8800P processor.

I would love to be able to paste the stack trace in this thread, but kern.log doesn't show anything related to the 4.4.0-kfd-compute-rocm-rel-1.2-31 kernel. I would gladly follow any directions on how to provide additional information. I do have a picture of it below which hopefully will get the idea across.

Would it be possible, as a fast recovery option, to provide ROCm version 1.1 through the apt-get server?

Missing information for 1.2 source?

Using the given command to download the ROCm 1.2 source does not work:

$ repo init -u https://github.com/RadeonOpenCompute/ROCm.git -b roc-1.2.0

curl: (22) The requested URL returned error: 404 Not Found
Server does not provide clone.bundle; ignoring.
fatal: Couldn't find remote ref refs/heads/roc-1.2.0
Unexpected end of command stream

Use ROCm on Ubuntu VM hosted on Windows?

Is this possible? Or does Ubuntu need to be on bare metal?

rocm 1.1.1 removed rocm-smi

I recently updated to rocm 1.1.1 under Ubuntu 14.04.x. I saw that rocm-smi was removed during the update. I saw that I can install it via aptitude, but I get a version that carries 1.0.0 in its name.

I was wondering how to proceed? Further, I love nvidia-smi in order to see if a process really runs on the dGPU, AFAIK I cannot do this with rocm-smi. What other methods are there to monitor app execution on the dGPU?

how to build from source

Maybe I missed something out of the documentation spread across the repositories - but how does one build this hydra's toolchain?

CLOC still requires components to be in the old path

I installed ROCm using apt-get a few days ago. I saw that cloc is integrated into the package. However, in cloc.sh and snack.sh, the default paths still point to /opt/amd/llvm and /opt/amd/hsa. Should these default paths be updated? Is there a way to configure the tools so that they remain usable? I did not see a clang in the llvm path, so which clang can we use for now? Can the one in /opt/rocm/hcc-hsail/bin work together with cloc?

Polaris Support?

Does ROCm 1.2 support Polaris yet?

supporting other distributions

Hi - I wanted to inquire on the availability of packages for other distributions - from RHEL/SLES to desktops such as OpenSUSE.

I understand this project on the whole is FOSS, but it is a big, complex toolchain, and the FOSS path seems unused (take #27 as a kernel of truth here). Ubuntu is a good start, but builds need to be available for other distributions. It would probably work best to use the upstream build systems, such as Novell/SUSE's Open Build Service (OBS), whether you still host a local repository (as for Ubuntu) or use build.opensuse.org for hosting. Fedora has something similar, as do many other distros. I know several projects and corporations have used OBS, such as Intel for Tizen, and it also supports building against many other distros, if that's of interest. My main point, though, is that distribution support should be increased in a manner similar to ZFSonLinux.

Surely you have more resources than a LLNL project porting an existing software? :-)

Problem using ROCm

Hi,
I have been following the development of hcc and ROCm for a while. I have used most of the previous versions and had them running on my Kaveri APU system. However, since I updated to the ROCm-1.0 release, none of the executables I compile run.
When I try to run executable compiled with hcc, I get

### Error: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS (4109) at line:1757

Funny thing is, my code is not very long. These are toy examples, well under a thousand lines of code.

I can run /opt/rocm/hsa/sample/vector_copy and it prints out success messages. But that's the only thing I can run right now.

Also, configuring hcc compilation environment does not find hsa installation any more. Before installing ROCm, cmake would find everything without any issues. Now, I get

=============================================
HCC version: 0.10.16163-caab0f1-7e4cd9e
=============================================
CMake Error at CMakeLists.txt:261 (MESSAGE):
Neither OpenCL nor HSA is available on the system!

What did I do wrong?

Missing aufs kernel module in ROCm kernel makes running docker daemon much harder

The current ROCm distribution of the Linux kernel (4.6.0-kfd-compute-rocm-rel-1.3-74) doesn't provide a kernel module for the aufs file system. This makes the docker daemon fail on startup, and it leaves the instructions for installing docker containers on the ROCm kernel (https://github.com/RadeonOpenCompute/ROCm-docker) incomplete.

systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since pią 2016-12-09 08:22:57 CET; 5s ago
Docs: https://docs.docker.com
Process: 2911 ExecStart=/usr/bin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 2911 (code=exited, status=1/FAILURE)

gru 09 08:22:56 computer systemd[1]: Starting Docker Application Container Engine...
gru 09 08:22:56 computer dockerd[2911]: time="2016-12-09T08:22:56.334816244+01:00" level=info msg="libcontainerd: new containerd process, pid: 2923"
gru 09 08:22:57 computer dockerd[2911]: time="2016-12-09T08:22:57.394559569+01:00" level=error msg="[graphdriver] prior storage driver "aufs" failed: driver not supported"
gru 09 08:22:57 computer dockerd[2911]: time="2016-12-09T08:22:57.394752192+01:00" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
gru 09 08:22:57 computer systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
gru 09 08:22:57 computer systemd[1]: Failed to start Docker Application Container Engine.
gru 09 08:22:57 computer systemd[1]: docker.service: Unit entered failed state.
gru 09 08:22:57 computer systemd[1]: docker.service: Failed with result 'exit-code'.

ROCm-1.3 problem

Hi,
I just installed ROCm-1.3 and am having problems. Even simple examples (vector_copy) no longer work. dmesg shows

[ 834.418133] amdkfd: PeerDirect interface was not detected
[ 843.452878] kfd: qcm fence wait loop timeout expired
[ 843.452880] kfd: unmapping queues failed.
[ 843.452881] kfd: the cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 852.457134] kfd: qcm fence wait loop timeout expired
[ 852.457136] kfd: unmapping queues failed.
[ 852.457137] kfd: the cp might be in an unrecoverable state due to an unsuccessful queues preemption
[ 852.457139] amdkfd: Resetting wave fronts on dev ffff88043c0db000

My setup is an i7-6700 on an ASUS Z170M-E D3 with 2 Fury Nano cards. I did not have any issues when I was using ROCm-1.2.

I upgraded from a ROCm-1.2 setup. Maybe that caused some issue? When I tried

sudo apt-get upgrade

it said the packages were held back. So, I ran

sudo apt-get install rocm

and it upgraded. Afterward, I ran

sudo apt-get autoremove

to remove the packages that apt-get said were no longer needed.

Any ideas?

ROCm OpenCL support/plans

Please clarify plans for and status of the OpenCL compiler/runtime support in ROCm!

Not being able to use OpenCL applications on the compute-oriented stack, and not having any official plan for when or if this will happen, creates an awkward situation: many compute projects with compatible, tuned OpenCL code (whose owners some may call "loyal") are left behind without a clear path forward. Possibly even worse, these are the people most likely to be interested in testing, using, and helping develop the compute stack, and not involving them seems like a bad idea.

symlinks to /opt/rocm/bin missing

I expected the installer to have put symbolic links in /usr/bin to the binary files in /opt/rocm/bin
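
As a stopgap, the links can be created by hand. A minimal sketch, assuming the binaries are in /opt/rocm/bin as stated above; the helper name `link_bins` is hypothetical, and /usr/local/bin is used as the destination here (an assumption) to avoid clashing with files owned by distribution packages in /usr/bin:

```shell
# Link every executable regular file in a source directory into a destination directory.
# $1 = source directory, $2 = destination directory.
link_bins() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -x "$f" ] || continue   # skip anything that is not an executable file
        ln -sf "$f" "$2/$(basename "$f")"        # -f replaces a stale link of the same name
    done
}

# Example invocation, to be run from a root shell:
#   link_bins /opt/rocm/bin /usr/local/bin
```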

ROCm-1.3 linker problem.

I tried to build my code for cs344 Problem Set 1 and I get errors.

$ make
hipcc -o HW1 main.o student_func.o compare.o reference_calc.o -L /usr/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -g -hc -std=c++amp
/opt/rocm/hcc-lc/compiler/bin/clamp-link: line 302: cd: /home/briansp/git/cs344/Problem: No such file or directory
objdump: 'main.o': No such file
objdump: 'student_func.o': No such file
objdump: 'compare.o': No such file
objdump: 'reference_calc.o': No such file
ld: cannot find main.o: No such file or directory
ld: cannot find student_func.o: No such file or directory
ld: cannot find compare.o: No such file or directory
ld: cannot find reference_calc.o: No such file or directory
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)
Died at /opt/rocm/bin/hipcc line 365.
Makefile:37: recipe for target 'student' failed
make: *** [student] Error 1

Then I copied the code to ~/dev/cs344 and it builds fine. It looks like the linker is not handling spaces in the path properly. I'm reporting it here since I'm not sure whether the linker is part of llvm or hcc or clang or lld.
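
The truncated path in the clamp-link error (cd: .../Problem) is what an unquoted variable expansion in a shell script typically produces. A minimal sketch of that failure mode in a POSIX shell such as bash or dash, using a temporary directory with a hypothetical name modeled on the one in this report; it does not reproduce the actual clamp-link script:

```shell
# Create a directory whose name contains spaces.
work=$(mktemp -d)
mkdir "$work/Problem Set 1"
dir="$work/Problem Set 1"

# Unquoted: the shell splits $dir at the spaces, so cd never sees the full path.
( cd $dir 2>/dev/null ) && unquoted=ok || unquoted=failed

# Quoted: the whole path is passed to cd as a single argument.
( cd "$dir" ) && quoted=ok || quoted=failed

echo "unquoted: $unquoted, quoted: $quoted"
rm -rf "$work"
```

If the script is the cause, quoting the variable at the cd call would be the usual fix; building from a path without spaces, as done above, is a workaround in the meantime.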

Windows support?

Hello,

Are you planning to support the Windows platform for the upcoming 1.3 release?

Segmentation Faults when running hsa/hip code samples

I've installed the rocm package on Ubuntu 16.04 in accordance with the Readme on a laptop with an AMD PRO A12-8800B R7 processor. When running any code sample compiled from the /opt/rocm/hsa/sample or /opt/rocm/hip/sample directories, I get a segmentation fault.

uname -r
gives:
4.6.0-kfd-compute-rocm-rel-1.3-74

Kernel `4.4.0-kfd-compute-rocm-rel-1.2-31` doesn't have ZFS modules.

Please build your Ubuntu kernels from Ubuntu kernel sources using Ubuntu configs to be compatible with Ubuntu.

Installing ROCm in Ubuntu 14.04.5 on Kaveri 7850K APU

Hi,

I'm having trouble installing ROCm on Ubuntu 14.04.5. I did a fresh installation of Ubuntu. I turned off lightdm to make sure there was no conflict with the driver to be installed.

I got the error message 'The system is running in low-graphics mode. Your screen, graphics card and input device settings could not be detected correctly'. It seems the driver is not detected.

I also got a warning "[AMD VI] unable to write to IOMMU perf counter" although I turned on the IOMMU in the BIOS of my Asus A88X-Pro board.

Has anyone had the same problem before?

Thank you!

ROCm 1.3

I have two RX 470s on my system, and I am really looking forward to the 1.3 release. Currently, using both 470s causes a race condition on Windows 10, and the system often freezes. So, I can't use both in production and have to stick with a single card for now.

When you do release 1.3, could you please give detailed install instructions for an Xubuntu system (Xubuntu 16.04)? Last time I tried, I was not able to get the AMDGPU-PRO driver working: no cards were recognized.

Thanks!!

Debian repository slow to access from Europe

Download speeds in Europe are 100 KB/s. I've mentioned this to Greg already. The issue is most likely a misconfiguration of TCP stack parameters. Basically, the servers are allowing too few packets in flight and don't work well with high-latency networks. There's no need to add additional European servers or set up a CDN.
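
The packets-in-flight argument follows from the bandwidth-delay product: the data in flight needed to sustain a rate equals that rate multiplied by the round-trip time. A minimal sketch of the arithmetic, assuming a 150 ms round trip between Europe and the servers (an assumed figure, not a measurement from this report); the function name `inflight_kb` is hypothetical:

```shell
# Data in flight (KB) needed to sustain a given rate over a given round-trip time.
# $1 = throughput in KB/s, $2 = round-trip time in milliseconds.
inflight_kb() {
    awk -v rate="$1" -v rtt="$2" 'BEGIN { printf "%.1f\n", rate * rtt / 1000 }'
}

inflight_kb 100 150      # observed 100 KB/s at the assumed RTT: prints 15.0
inflight_kb 10000 150    # a 10 MB/s download at the same RTT: prints 1500.0
```

Under that assumption, the observed rate corresponds to only about 15 KB in flight, which is consistent with a small send window on the server side.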

Which Debian flavors are supported?

Which Ubuntu/Debian versions are supported? The README.md doesn't say.

User 1001

On Ubuntu 14.04.5, /opt/rocm itself was owned by root, but everything under it was owned by a nonexistent user and group, 1001:1001. As I prefer not to build as root, I'd prefer that you create a group "rocm" to which I can add the users who should get access to ROCm.

Dead links in README.md

I'm getting 404s for four links in https://github.com/RadeonOpenCompute/ROCm/blob/master/README.md, under "The Latest ROCm Platform - ROCm 1.3". Tags: HCC compiler, LLVM-AMDGPU-Assembler-Extra, llvm, clang. Possibly permission issues.

Some pointers on how to use OpenCL + ROCm

I just installed the latest version of ROCm and realised that there is no GPU device recognized by the "AMD Accelerated Parallel Processing" OpenCL platform.

I'm aware that there are some changes in relation to the way to use OpenCL with the ROCm driver.

For instance, the following Makefile works perfectly and a binary 'app' is produced.

Makefile:
EXECUTABLE := app
CFILES := app.c
OpenCL_SDK=/opt/AMDAPPSDK-3.0
INCLUDE=-I${OpenCL_SDK}/include
LIBPATH=-L${OpenCL_SDK}/lib/x86_64
LIB=-lOpenCL -lm

all:
	gcc -O3 ${INCLUDE} ${LIBPATH} ${CFILES} ${LIB} -o ${EXECUTABLE}

clean:
	rm -f *~ app

Portion of code from 'app.c' that initializes the OpenCL platform:
/* Request the first platform only; num_platforms reports how many exist. */
errcode = clGetPlatformIDs(1, &platform_id, &num_platforms);
if (errcode == CL_SUCCESS) printf("number of platforms is %u\n", num_platforms);
else printf("Error getting platform IDs\n");
/* Query the name and version of that platform. */
errcode = clGetPlatformInfo(platform_id, CL_PLATFORM_NAME, sizeof(str_temp), str_temp, NULL);
if (errcode == CL_SUCCESS) printf("platform name is %s\n", str_temp);
else printf("Error getting platform name\n");
errcode = clGetPlatformInfo(platform_id, CL_PLATFORM_VERSION, sizeof(str_temp), str_temp, NULL);
if (errcode == CL_SUCCESS) printf("platform version is %s\n", str_temp);
else printf("Error getting platform version\n");
/* Ask the platform for one GPU device; this is the call that fails. */
errcode = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &num_devices);
if (errcode == CL_SUCCESS) printf("number of devices is %u\n", num_devices);
else printf("Error getting device IDs\n");

The call to clGetDeviceIDs() results in an error, which does not surprise me given that the "AMD Accelerated Parallel Processing" OpenCL platform only recognizes the CPU (the GPU is not listed when executing 'clinfo').

The GPU is an R9 Nano (same Fiji core as the Fury X), which to my understanding is supported by ROCm.

My question is, how can I compile my OpenCL apps on a machine with ROCm installed?

Thanks in advance!

Problem running simplest program

I have a test program that worked with ROCm 1.2 and now fails with ROCm 1.3.

I upgraded to ROCm 1.3 using the instructions at https://github.com/RadeonOpenCompute/ROCm, being careful to uninstall the previous version. I even rebooted after each kernel change.

Just for the heck of it I also installed hcc_hsail. The problem is identical whether the symlink /opt/rocm/hcc points to /opt/rocm/hcc-lc or to /opt/rocm/hcc-hsail.

It'd be great to have v1.3 work as I have been waiting to try out an RX470 that's still sitting in its retail box...

Basically it looks like something fails in the program setup phase before main() is executed.

Minimal code that demonstrates problem:

#include <iostream>
#include <hc.hpp>
using namespace hc;
using namespace std;

int main(int argc, char **argv) {
    return 0;
}

$ hcc `hcc-config --cxxflags` -c detect.cpp
$ hcc `hcc-config --ldflags` detect.o
$ ./a.out

### HCC STATUS_CHECK Error: HSA_STATUS_ERROR_INCOMPATIBLE_ARGUMENTS (0x100d) at file:/home/scchan/code/github/radeonopencompute/hcc.1.3/hcc/lib/hsa/mcwamp_hsa.cpp line:2504
Aborted (core dumped)

System/environment is:

Ubuntu 16.04, AMD A8-7600 APU, no AIB video card.

$ uname -a
Linux quad 4.6.0-kfd-compute-rocm-rel-1.3-63 #1 SMP Fri Oct 28 13:14:45 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux

$ echo $PATH
/opt/rocm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

$ echo $LD_LIBRARY_PATH
/opt/rocm/lib

$ head /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 48
model name : AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
stepping : 1
microcode : 0x6003104
cpu MHz : 1400.000
cache size : 2048 KB
physical id : 0

PCIe Bus Error when running vector_copy sample

I followed the README to install the ROCm driver. However, when I try to run the vector_copy sample, the program hangs, and I see a report of a PCIe error in dmesg.

Configuration:

Intel Core i7 6700K on GIGABYTE GA-Z170-HD3 (rev. 1.0) motherboard with Intel Z170 chipset.
AMD Radeon Fury Nano GPU in the PCIe x16 slot
Ubuntu 16.04 on AMD64 architecture
uname -r: 4.4.0-kfd-compute-rocm-rel-1.2-31

Kernel log (dmesg) messages after running the vector_copy sample:

[  942.180230] pcieport 0000:00:1c.4: AER: Uncorrected (Fatal) error received: id=00e4
[  942.180239] pcieport 0000:00:1c.4: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=00e4(Receiver ID)
[  942.180327] pcieport 0000:00:1c.4:   device [8086:a114] error status/mask=00040000/00000000
[  942.180392] pcieport 0000:00:1c.4:    [18] Malformed TLP          (First)
[  942.180444] pcieport 0000:00:1c.4:   TLP Header: 6c000002 06000000 0000000f b76e6008
[  942.180511] pcieport 0000:00:1c.4: broadcast error_detected message
[  942.180513] amdgpu 0000:06:00.0: device has no AER-aware driver
[  942.180514] snd_hda_intel 0000:06:00.1: device has no AER-aware driver
[  943.188998] pcieport 0000:00:1c.4: Root Port link has been reset
[  943.189001] pcieport 0000:00:1c.4: AER: Device recovery failed

/opt/rocm/rocm-smi -a output:

===================   ROCm System Management Interface   ===================
============================================================================
GPU[0]      : GPU ID: 0x7300
============================================================================
============================================================================
GPU[0]      : Temperature: 511.0c
============================================================================
============================================================================
GPU[0]      : Unable to determine current clocks. Check dmesg or GPU temperature
============================================================================
GPU[0]      : Fan Level: 255 (100.0)%
============================================================================
============================================================================
GPU[0]      : Current PowerPlay Level: auto
============================================================================
============================================================================
GPU[0]      : Current OverDrive value: 0%
============================================================================
============================================================================
GPU[0]      : Supported GPU clock frequencies on GPU0
GPU[0]      : 0: 300Mhz
GPU[0]      : 1: 508Mhz
GPU[0]      : 2: 717Mhz
GPU[0]      : 3: 874Mhz
GPU[0]      : 4: 911Mhz
GPU[0]      : 5: 944Mhz
GPU[0]      : 6: 974Mhz
GPU[0]      : 7: 1000Mhz
GPU[0]      :
GPU[0]      : Supported GPU Memory clock frequencies on GPU0
GPU[0]      : 0: 500Mhz
GPU[0]      :
============================================================================
===================          End of ROCm SMI Log         ===================

Windows support (driver, runtime etc)

See subj.

Wondering why AMD always releases their frameworks on Linux / Unix platforms only.

Guys, do you keep in mind that GPGPU developers can't build their products for Windows users (98% of them in mass markets)?

This problem relates to the problem of compiling C++11-compatible code for AMD GPUs as well. I see that AMD puts a lot of effort into using the HIP infrastructure to build nvcc-compatible code (with its own C++ implementation), but all of this effort is useless without Windows support. Yet.

Is the Git version of LLVM fully compatible with ROCm?

Are all ROC-related changes merged into LLVM master? Is LLVM master compatible with ROCm 1.2? I have to use LLVM master because it contains a bug fix I need. It seems to work, but I thought I would ask.