Comments (7)
Hi @rwbfd, the post does not use CUB, so I'm not sure how your compilation errors with CUB are relevant to it.
from code-samples.
@rwbfd nvcc -O3 main.cu -o reduce -arch=sm_35
and the CUB version is cub-93696c4bce447b71c4bd0b25d1e26f1247341c04
https://github.com/NVLabs/cub/tree/93696c4bce447b71c4bd0b25d1e26f1247341c04
Note the About section on that page: it indicates that you are looking at a very old version of CUB. CUB is now part of the CUDA Toolkit, and lives here: https://github.com/NVIDIA/cub
@harrism I would like to confirm the following:
Question 1:
If two threads from two different warps access two different addresses in the same shared memory bank, does that cause a bank conflict?
Question 2:
If bank conflicts between different warps do exist, is it correct that they do not cause serious latency and can be ignored?
In ncu's Memory Workload Analysis section, the total number of bank conflicts reported for the following kernel is 6:
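#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CheckCudaErrors is not defined in the snippet; a minimal version:
#define CheckCudaErrors(call)                                        \
  do {                                                               \
    cudaError_t e_ = (call);                                         \
    if (e_ != cudaSuccess) {                                         \
      std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                   cudaGetErrorString(e_), __FILE__, __LINE__);      \
      std::exit(EXIT_FAILURE);                                       \
    }                                                                \
  } while (0)

const int BLOCK_DIM{32};
// Launched as grid(2, 1), block(32, 1): two blocks with one warp each.
template <typename T = float>
__global__ void kernel2(const T* in, T* out) {
  __shared__ T shm[BLOCK_DIM * 2 * 4];  // 256 elements per block
  // Global linear thread index, 0..63 across both blocks.
  auto tid = (blockIdx.y * gridDim.x + blockIdx.x) * (blockDim.x * blockDim.y) + threadIdx.y * blockDim.x + threadIdx.x;
  // printf("blockid %d tid %d\n", blockIdx.x, tid);
  shm[tid] = in[tid];  // contiguous store: consecutive words, one per bank
  __syncthreads();
  // Stride-4 load. Most of these elements were never written by this block,
  // so the values are uninitialized; only the access pattern matters here.
  out[tid] = shm[tid * 4];
}

template <typename T = float>
int transpose(const T* in, T* out) {
  dim3 grid(2, 1);
  dim3 block(BLOCK_DIM, 1);
  kernel2<T><<<grid, block>>>(in, out);
  CheckCudaErrors(cudaPeekAtLastError());
  CheckCudaErrors(cudaDeviceSynchronize());
  return 0;
}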
| tid | shm word index (bank 0) |
|---|---|
| 0 | 0 |
| 8 | 32 |
| 16 | 64 |
| 24 | 96 |
tid0, tid8, tid16, and tid24 access different addresses in the same bank (bank 0), so there is a bank conflict; I think the count is 3.
The same happens for the other groups of threads in the warp:
tid1, tid9, tid17, tid25; tid2, tid10, tid18, tid26; ...; tid7, tid15, tid23, tid31.
Why does ncu's Memory Workload Analysis report 6 total bank conflicts? How is this number computed? @harrism
@lix19937 GitHub issues are not a help forum. Please ask your questions on Stack Overflow or https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-programming-and-performance/7
@harrism Thanks very much!