Comments (4)
from hip.
@bensander Thank you!
Yes, GCN3 adds new instructions such as Data-Parallel Primitives (DPP), ds_bpermute and ds_permute, which can provide functionality such as __shfl() and even more: http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
And yes, these instructions use only the routing hardware of local memory (LDS, 8.6 TB/s) but don't actually use local memory itself, which allows them to achieve ~51.6 TB/s:
They use LDS hardware to route data between the 64 lanes of a wavefront, but they don’t actually write to an LDS location.
But I can't find anything about ds_shuffle, neither on gpuopen.com nor in the GCN3 Instruction Set Architecture PDF:
https://github.com/olvaffe/gpu-docs/raw/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf
What did you mean by ds_shuffle?
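To make the cross-lane semantics discussed above concrete, here is a minimal host-side sketch in plain C++ that emulates how ds_bpermute ("pull") and ds_permute ("push") move data between the 64 lanes of a wavefront, and how a __shfl()-style read can be expressed as a backward permute. This is only an illustration, not HIP's implementation: the names emulate_ds_bpermute, emulate_ds_permute, emulate_shfl and Wave are hypothetical, the lane is assumed to be selected by a byte address (lane index times 4), and the handling of unwritten lanes and write collisions in the forward permute is an assumption of this sketch that should be checked against the ISA document.

```cpp
#include <array>
#include <cstdint>

constexpr int kWavefrontSize = 64;  // lanes per GCN wavefront
using Wave = std::array<std::uint32_t, kWavefrontSize>;  // one 32-bit value per lane

// Assumption: the address operand is in bytes, so lane index = (addr / 4) mod 64.
inline int lane_from_byte_addr(std::uint32_t byte_addr) {
    return static_cast<int>((byte_addr / 4) % kWavefrontSize);
}

// Backward permute ("pull"): every lane reads the value held by the lane
// that its own address selects.
Wave emulate_ds_bpermute(const Wave& byte_addr, const Wave& src) {
    Wave dst{};
    for (int lane = 0; lane < kWavefrontSize; ++lane) {
        dst[lane] = src[lane_from_byte_addr(byte_addr[lane])];
    }
    return dst;
}

// Forward permute ("push"): every lane writes its value into the lane that
// its own address selects. Sketch assumptions: lanes nobody writes to stay 0,
// and on a collision the higher-numbered writer simply overwrites.
Wave emulate_ds_permute(const Wave& byte_addr, const Wave& src) {
    Wave dst{};
    for (int lane = 0; lane < kWavefrontSize; ++lane) {
        dst[lane_from_byte_addr(byte_addr[lane])] = src[lane];
    }
    return dst;
}

// A __shfl()-style read: lane i asks for the value of lane src_lane[i].
// Expressed here as a backward permute with the lane index scaled to bytes.
Wave emulate_shfl(const Wave& src, const Wave& src_lane) {
    Wave byte_addr{};
    for (int lane = 0; lane < kWavefrontSize; ++lane) {
        byte_addr[lane] = (src_lane[lane] % kWavefrontSize) * 4u;
    }
    return emulate_ds_bpermute(byte_addr, src);
}
```

With src_lane[i] = (i + 1) % 64, emulate_shfl rotates the values one lane to the left (each lane pulls from its right neighbour), while emulate_ds_permute with the same lane selection rotates them one lane to the right (each lane pushes to its right neighbour).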
Hi @AlexeyAB
HIP provides AMD-specific functionality for shfl and swzl