Comments (14)
Hi,
There was a major API change in PyTorch from v0.3.x to v0.4.x; I'm migrating the code to support those changes. In the meantime I recommend staying on PyTorch 0.3.1.
Your GPU needs CUDA >9.0, so please install PyTorch 0.3.1 built for CUDA 9.1 using:
pip uninstall torch torchvision
pip install https://download.pytorch.org/whl/cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
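After installing, it's worth confirming that the environment actually picks up the 0.3.x wheel and can see the GPU. A minimal sketch (the torch import is guarded, and the version helpers are mine, not part of P2PaLA):

```python
def parse_version(version_string):
    """Turn a torch version string like '0.3.1' into an int tuple."""
    return tuple(int(part) for part in version_string.split(".")[:3])

def is_supported(version_string):
    """The pre-1.0 P2PaLA code expects the 0.3.x series."""
    major, minor, _ = parse_version(version_string)
    return (major, minor) == (0, 3)

if __name__ == "__main__":
    try:
        import torch
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
        print("supported by this branch:", is_supported(torch.__version__))
    except ImportError:
        print("torch is not installed in this environment")
```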
More info about previous PyTorch versions is available on the PyTorch website.
from p2pala.
pip uninstall torch torchvision
pip install https://download.pytorch.org/whl/cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
2019-01-21 15:37:56,527 - optparse - INFO - Reading configuration from config_BL_only.txt
2019-01-21 15:37:56,529 - P2PaLA - INFO - Working on training stage...
2019-01-21 15:37:56,529 - P2PaLA - WARNING - tensorboardX is not installed, display logger set to OFF.
2019-01-21 15:37:56,529 - P2PaLA - INFO - Preprocessing data from ./data/train
Traceback (most recent call last):
File "P2PaLA.py", line 1262, in <module>
main()
File "P2PaLA.py", line 528, in main
y_gen = nnG(x)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/Desktop/programs/P2PaLA/nn_models/models.py", line 94, in forward
return self.model(input_x)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/Desktop/programs/P2PaLA/nn_models/models.py", line 184, in forward
return F.log_softmax(self.model(input_x), dim=1)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
I don't think the issue is related to your Ubuntu version, but you do need to install a matching combination of CUDA and PyTorch.
If you have CUDA 9.1 and Python 3.6 installed, the commands I posted before should work; for another combination, such as CUDA 9.0 or Python 2.7, you need to find the right PyTorch wheel for it (on the PyTorch website).
I just tested it using Python 3.5 and CUDA 9.1 on a GTX 1080 and a TITAN X, and it works (I don't have an RTX to test with).
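For other combinations, the wheel URL follows the same pattern as the one posted above: a CUDA tag (e.g. cu80, cu90, cu91) and a CPython tag (cp27, cp35, cp36). A small helper to build it, as a sketch; the tag names and the cp27 ABI quirk are assumptions based on the URL shown earlier, so check the PyTorch previous-versions page for the definitive list:

```python
BASE = "https://download.pytorch.org/whl"

def wheel_url(cuda_tag, python_tag, version="0.3.1"):
    """Build a Linux x86_64 wheel URL following the pattern of the
    cu91/cp36 wheel above. Assumes cp27 wheels used the 'cp27mu'
    ABI tag while cp35/cp36 used the plain 'm' variant."""
    abi = python_tag + ("mu" if python_tag == "cp27" else "m")
    return "{}/{}/torch-{}-{}-{}-linux_x86_64.whl".format(
        BASE, cuda_tag, version, python_tag, abi)
```

For example, `wheel_url("cu91", "cp36")` reproduces the URL used in the install command above.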
Same error, even after installing CUDA 9.1:
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.48 Thu Sep 6 06:36:33 CDT 2018
GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
Hmmm...
It seems that RTX cards don't support CUDA 9.1; that's odd.
Will you consider supporting CUDA 10 via PyTorch 1.0?
> I'm migrating the code to support those changes.
Yes, my goal is to migrate all the code to the latest version of PyTorch, but right now I'm a bit short of time and I don't expect to release a new version in the next couple of weeks.
Thanks for spotting the issue with the new GPUs. I will try to migrate the code as soon as possible.
In the meantime, you can use the tool for inference with the pre-trained models on CPU (just add the option --gpu -1).
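The --gpu -1 convention boils down to the usual device-selection pattern. A hypothetical sketch in PyTorch 1.0-style device notation (the function name `select_device` is mine, not P2PaLA's; the actual 0.3.x code would call `.cpu()`/`.cuda()` instead):

```python
def select_device(gpu_id):
    """Map a --gpu style argument onto a device string:
    any negative id means CPU, otherwise the given CUDA device index."""
    if gpu_id < 0:
        return "cpu"
    return "cuda:{}".format(gpu_id)
```

With this pattern, `--gpu -1` selects `"cpu"` and `--gpu 0` selects `"cuda:0"`, so inference can run even on machines whose GPU/toolkit combination is unsupported.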
Hoping that you support CUDA 10 soon. Thank you!
home@home-lnx:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce RTX 2070"
CUDA Driver Version / Runtime Version 10.0 / 9.1
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7951 MBytes (8337227776 bytes)
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
(36) Multiprocessors, ( 64) CUDA Cores/MP: 2304 CUDA Cores
GPU Max Clock rate: 1815 MHz (1.81 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 46 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS
@lquirosd Can you share which versions you are using?
- cuDNN
- CUDA
- PyTorch
- Compute capability
Yes, the software has been tested on several configurations:
cuDNN: 5, 6, 7
CUDA: 8, 9
PyTorch: 0.3.x
Python: 2.7, 3.5, 3.6
OS: Ubuntu 16.04 for training; Ubuntu 16.04 and macOS 10.13 for testing
My current set-up is:
>>> sys.version
'3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n[GCC 7.3.0]'
>>> torch.__version__
'0.3.1'
>>> torch.version.cuda
'8.0.61'
>>> torch.backends.cudnn.version()
7005
It seems the problem is that RTX cards (compute capability 7.5) are only supported from CUDA 10 onwards, which the NVIDIA forums confirmed to me.
@lquirosd Will you consider upgrading to PyTorch 1.0?
Note: CUDA 10 supports compute capabilities 3.0–7.5 (Kepler, Maxwell, Pascal, Volta, Turing).
Hi,
Did you change the batch_size parameter to fit your card? The default is 8 images per mini-batch, but the RTX 2070 has only 8 GB of memory; I think it will support a maximum mini-batch of around 4 images.
Can you please run an experiment using a smaller mini-batch?
This is not a memory issue: RTX (Turing) cards are first supported in CUDA 10, while PyTorch 1.0 supports CUDA 8, 9, and 10.
So the only solution is to upgrade the code to PyTorch 1.0.
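The constraint can be stated as a simple check: Turing's compute capability 7.5 was first targeted by CUDA 10, so any older toolkit is ruled out regardless of memory or batch size. A sketch of that rule (the minimum-CUDA table only covers capabilities relevant to this thread, and the exact minima for the older architectures are my assumption):

```python
# Minimum CUDA toolkit major version needed to target a given
# compute capability (only the capabilities discussed in this thread).
MIN_CUDA_FOR_CC = {
    (3, 0): 5,   # Kepler
    (6, 1): 8,   # Pascal (GTX 1080)
    (7, 0): 9,   # Volta
    (7, 5): 10,  # Turing (RTX 2070)
}

def toolkit_can_target(cuda_major, capability):
    """True if a CUDA toolkit of the given major version can
    generate code for the given compute capability."""
    needed = MIN_CUDA_FOR_CC.get(capability)
    return needed is not None and cuda_major >= needed
```

Under this rule, CUDA 9.x cannot target the RTX 2070's (7, 5) capability, which is consistent with the CUDNN_STATUS_EXECUTION_FAILED error seen above, while CUDA 10 can.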
I just released a new branch for PyTorch 1.0:
git clone --single-branch --branch PyTorch-v1.0 https://github.com/lquirosd/P2PaLA.git
Please note this branch is not fully tested, so some bugs may remain.
I ran some tests on PyTorch 1.0.0, CUDA 9.0, and cuDNN 7401, but CUDA 10 is untested.
May you find peace in your life.
Thank you