Comments (10)
I got the same error (i.e., cudaCheckError() failed : invalid device function) with my Tesla K40. When I changed the -arch parameter in lib/make.sh to sm_35 and reran make.sh, it worked.
from faster-rcnn_tf.
For other GPUs:
# Which CUDA capabilities do we want to pre-build for?
# https://developer.nvidia.com/cuda-gpus
# Compute/shader model Cards
# 6.1 P4, P40, Titan X so CUDA_MODEL = 61
# 6.0 P100 so CUDA_MODEL = 60
# 5.2 M40
# 3.7 K80
# 3.5 K40, K20
# 3.0 K10, Grid K520 (AWS G2)
# Other Nvidia shader models should work, but they will require extra startup
# time as the code is not pre-optimized for them.
CUDA_MODELS=30 35 37 52 60 61
Credit to https://github.com/mldbai/mldb/blob/master/ext/tensorflow.mk
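If you want one build to cover several of the cards in that table, nvcc accepts one -gencode flag per architecture. Here is a minimal Python sketch that turns a CUDA_MODELS-style list into those flags. The function name gencode_flags is made up for illustration, and whether your CUDA toolkit still accepts every architecture in the list depends on its version, so treat this as a starting point rather than the build recipe used by this repo.

```python
def gencode_flags(cuda_models):
    """Return one nvcc -gencode flag per compute capability (e.g. "35")."""
    flags = []
    for model in cuda_models:
        # compute_XX selects the virtual architecture, sm_XX the real one.
        flags.append(f"-gencode arch=compute_{model},code=sm_{model}")
    return flags


# Same list as the CUDA_MODELS line quoted above.
CUDA_MODELS = "30 35 37 52 60 61".split()

print(" ".join(gencode_flags(CUDA_MODELS)))
```

The printed string could be pasted in place of the single -arch=sm_XX flag in an nvcc command, at the cost of a longer compile.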
Just adding to this since it was useful to me.
I hit this same problem when testing on AWS EC2 instances with a GPU. I had to use sm_20 in two places, as mentioned above:
lib/make.sh
lib/setup.py
and force the rebuild of the python modules:
cd lib
python setup.py build_ext --inplace
@ahmedammar you could even use sm_30 for AWS
When I ran $ python ./tools/demo.py --model ./VGGnet_fast_rcnn_iter_70000.ckpt, I had exactly the same error: cudaCheckError() failed : invalid device function
I tried following the advice above by setting sm_37 in lib/make.sh and lib/setup.py. I think the setting is almost there. What can I do? I am using an AWS EC2 g2.2xlarge. The messages below say I am using an NVIDIA GRID K520.
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rpn_cls_score/rpn_cls_score:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
Tensor("rpn_cls_prob_reshape:0", shape=(?, ?, ?, 18), dtype=float32)
Tensor("rpn_bbox_pred/rpn_bbox_pred:0", shape=(?, ?, ?, 36), dtype=float32)
Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
Tensor("rois:0", shape=(?, 5), dtype=float32)
[<tf.Tensor 'conv5_3/conv5_3:0' shape=(?, ?, ?, 512) dtype=float32>, <tf.Tensor 'rois:0' shape=(?, 5) dtype=float32>]
Tensor("fc7/fc7:0", shape=(?, 4096), dtype=float32)
Loaded network ./VGGnet_fast_rcnn_iter_70000.ckpt
cudaCheckError() failed : invalid device function
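For what it's worth, the TensorFlow log above already reports the card's compute capability on the "major: 3 minor: 0" line, which would correspond to sm_30 rather than sm_37. A small Python sketch that pulls the value out of such a log is below. The function name arch_from_tf_log is hypothetical, and the regular expression assumes the log keeps the "major: N minor: M" wording shown here, which other TensorFlow versions may not.

```python
import re


def arch_from_tf_log(log_text):
    """Return an nvcc arch name such as "sm_30" parsed from a TensorFlow
    device log, or None if no "major: N minor: M" pair is present."""
    match = re.search(r"major:\s*(\d+)\s+minor:\s*(\d+)", log_text)
    if match is None:
        return None
    major, minor = match.groups()
    return f"sm_{major}{minor}"


# Line copied from the log above.
sample = "major: 3 minor: 0 memoryClockRate (GHz) 0.797"
print(arch_from_tf_log(sample))  # prints sm_30
```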
@eakbas Thanks!
@VitaliKaiser Hello, I am using an AWS EC2 GPU instance to run demo.py and I am getting 'cudaCheckError() failed : invalid device function'.
Since the GPU on my instance is a GRID K520, I followed your post and changed the -arch parameter in setup.py and make.sh to sm_30, then reran make.sh, but the error is still there when I run './tools/demo.py --model ./VGG.....ckpt'. Could you please give me some help?
@fangyan93 It's been quite a while since I last looked into it, but I lost a lot of time figuring out that things were not being rebuilt!
You have to delete every binary that is built by the make.sh script, and then build it one more time.
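To see which files are left over from an earlier build, something like the following Python sketch may help. It only lists candidates and deletes nothing. The function name find_build_artifacts, the ".so" and ".o" suffixes, and the file names in the demo are all assumptions for illustration; check what make.sh actually produces in your checkout before removing anything.

```python
import tempfile
from pathlib import Path


def find_build_artifacts(root, suffixes=(".so", ".o")):
    """List files under root whose suffix marks them as compiled output."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix in suffixes
    )


# Demo in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "lib"
    lib.mkdir()
    for name in ("example_kernel.cu.o", "example_module.so", "setup.py"):
        (lib / name).write_text("")
    for artifact in find_build_artifacts(lib):
        print(artifact.name)
```

Once the list looks right, each path can be removed with its unlink() method before rerunning make.sh.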
@VitaliKaiser Thanks for the reply. Yes, I removed the previous build files and rebuilt from the very beginning, and it works!
So, for those who are still lost, here are a few clean steps to resolve the issue (you need to recompile the CUDA code for your GPU):
- Go to https://developer.nvidia.com/cuda-gpus and find your GPU
- Find the number (the "Compute Capability") next to your GPU name, e.g. for the GTX 680 it is 3.0
- Remove the dot from it so it becomes 30
- In the make_cuda.sh file required for compiling, change the number after the "arch" flag in the nvcc command to the one you found above. Example, before:
nvcc -c -o corr_cuda_kernel.cu.o corr_cuda_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52
and after:
nvcc -c -o corr_cuda_kernel.cu.o corr_cuda_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_30
- Delete the folder of built files if you have already compiled it before
- Run make_cuda.sh and continue as usual
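The "remove the dot and change the arch flag" steps can also be sketched in Python, in case you would rather script the edit than do it by hand. The helper name set_arch is made up here, and it assumes the command contains exactly the -arch=sm_NN form shown in the example.

```python
import re


def set_arch(command, compute_capability):
    """Replace the -arch=sm_NN flag in an nvcc command line.

    compute_capability is the value from the NVIDIA table, e.g. "3.0".
    """
    model = compute_capability.replace(".", "")
    if not model.isdigit():
        raise ValueError(f"unexpected compute capability: {compute_capability!r}")
    new_command, count = re.subn(r"-arch=sm_\d+", f"-arch=sm_{model}", command)
    if count == 0:
        raise ValueError("no -arch=sm_NN flag found in command")
    return new_command


old = ("nvcc -c -o corr_cuda_kernel.cu.o corr_cuda_kernel.cu "
       "-x cu -Xcompiler -fPIC -arch=sm_52")
print(set_arch(old, "3.0"))
```

Raising an error when no flag is found is a deliberate choice, so a silent no-op does not leave you rebuilding for the wrong architecture.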