Comments (7)

harrism commented on June 3, 2024

Hi @rwbfd, the post does not use CUB, so I'm not sure how your compilation errors with CUB are relevant to the post?

from code-samples.

lix19937 commented on June 3, 2024

@rwbfd The compile command is: nvcc -O3 main.cu -o reduce -arch=sm_35
The CUB version is cub-93696c4bce447b71c4bd0b25d1e26f1247341c04: https://github.com/NVLabs/cub/tree/93696c4bce447b71c4bd0b25d1e26f1247341c04

harrism commented on June 3, 2024

Note the About notice on that page, which indicates that you are looking at a very old version of CUB. CUB is now part of the CUDA Toolkit and lives here: https://github.com/NVIDIA/cub

lix19937 commented on June 3, 2024

@harrism I would like to confirm two things:
Question 1:
If two threads from two different warps access two different addresses in the same shared memory bank, does that cause a bank conflict?

Question 2:
If bank conflicts between different warps do exist, is it correct that they do not cause serious latency and can be ignored?

lix19937 commented on June 3, 2024

In ncu, the Memory Workload Analysis section reports a total of 6 bank conflicts for the following kernel:

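const int BLOCK_DIM{32};

// Launched as <<<grid, block>>> with grid(2, 1) and block(32, 1).
template <typename T = float>
__global__ void kernel2(const T* in, T* out) {
  __shared__ T shm[BLOCK_DIM * 2 * 4];

  // Global thread index across the whole grid (0..63 for this launch).
  auto tid = (blockIdx.y * gridDim.x + blockIdx.x) * (blockDim.x * blockDim.y) + threadIdx.y * blockDim.x + threadIdx.x;
  // printf("blockid %d  tid %d\n", blockIdx.x, tid);
  shm[tid] = in[tid];  // stride-1 store

  __syncthreads();
  // Stride-4 load: the access pattern under test. Most of the elements read
  // here were never written, so only the access pattern is meaningful.
  out[tid] = shm[tid * 4];
}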

template <typename T= float>
int transpose(const T* in, T* out) {
  dim3 grid(2, 1);
  dim3 block(BLOCK_DIM, 1);

  kernel2<T><<<grid, block>>>(in, out);

  CheckCudaErrors(cudaPeekAtLastError());
  CheckCudaErrors(cudaDeviceSynchronize());
  return 0;
}

tid    addr (all in bank 0)
0      0
8      32
16     64
24     96

tid0, tid8, tid16, and tid24 land on different memory addresses in the same bank (bank 0), so there is a bank conflict. I think the count should be 3.

The same happens for the other groups of threads in the warp:
tid1, tid9, tid17, tid25; tid2, tid10, tid18, tid26; ... tid7, tid15, tid23, tid31

Why does ncu's Memory Workload Analysis report a total of 6 bank conflicts?
How is this number derived? @harrism
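The per-warp pattern described above can be checked on the host with a small sketch. It assumes 32 shared memory banks that are each 4 bytes wide and a 4-byte element type (float); none of this is stated in the thread, and the helper names bank_of and max_conflict_degree are hypothetical. Under those assumptions the strided read shm[tid * 4] is a 4-way conflict in each warp, which means 3 extra passes per warp. With two blocks of one warp each, 3 x 2 = 6 would be consistent with the ncu figure, although the thread does not confirm that this is how ncu computes it.

```cpp
#include <array>
#include <cassert>

// Assumptions (not from the thread): 32 banks, 4 bytes per bank,
// and one 4-byte element (float) per shared-memory index.
constexpr int kNumBanks = 32;
constexpr int kWarpSize = 32;

// Bank that holds element `index` of a 4-byte-element shared array.
int bank_of(int index) { return index % kNumBanks; }

// For the access shm[tid * stride] issued by one warp whose first thread is
// `first_tid`, return the largest number of accesses that hit a single bank.
// Requires stride >= 1 so that every thread touches a distinct address;
// a result of 1 means conflict-free, N means an N-way conflict.
int max_conflict_degree(int first_tid, int stride) {
  assert(stride >= 1);
  std::array<int, kNumBanks> hits{};  // zero-initialized hit count per bank
  int worst = 0;
  for (int lane = 0; lane < kWarpSize; ++lane) {
    const int index = (first_tid + lane) * stride;
    const int bank = bank_of(index);
    ++hits[bank];
    if (hits[bank] > worst) worst = hits[bank];
  }
  return worst;
}
```

For the launch in this post, max_conflict_degree(0, 4) and max_conflict_degree(32, 4) both evaluate to 4 (one call per block), while the stride-1 store, max_conflict_degree(0, 1), evaluates to 1.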

harrism commented on June 3, 2024

@lix19937 GitHub issues are not a help forum. Please ask your questions on Stack Overflow or https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-programming-and-performance/7

lix19937 commented on June 3, 2024

@harrism Many thanks!
