Comments (4)
Hey everyone,
I recently ran into the same problem with CUDA 11, and for me it was an issue with the device code that got generated.
If you want to run this sample on Turing, you have to make sure that you pass the -gencode arch=compute_75,code=sm_75 flags during compilation.
Trying to run a binary compiled for a Volta target (sm_70) on Turing will produce the error above. I'm guessing the wmma instructions are so low-level that they are not compatible between architectures.
I'm just leaving this here for future reference, hoping I'll save somebody a lot of head-scratching.
from code-samples.
Which version of nvcc are you using, and did you solve it?
from code-samples.
Device: RTX3090
In CMakeLists: set(CMAKE_CUDA_ARCHITECTURES 86)
NVCC version: 11.1
I get the same issue. Can anyone help?
from code-samples.
Hi, I have the same issue.
Device: A100
NVCC version: 11.1
I tried -arch=sm_80, but it does not work for me.
The results seem correct after reducing MATRIX_M, MATRIX_N, and MATRIX_K from 16384 to 1024.
I think the 0.01% relative tolerance and 1e-5 absolute tolerance in the code are too tight for large matrices such as 16384x16384.
However, I did not get a speedup with 1024x1024 matrices:
wmma took 0.300032ms
cublas took 0.041984ms
I guess we should just use cuBLAS, or refer to the faster implementation here.
// Use tensor cores
cublasErrCheck(cublasSetMathMode(cublasHandle, CUBLAS_TENSOR_OP_MATH));
from code-samples.