hughperkins / tf-coriander
OpenCL 1.2 implementation for Tensorflow
License: Apache License 2.0
tf.reduce_sum seems to fail sometimes on Mac. Sometimes it will sum to e.g. 1e-23.
I think this is because of my removing the guards in Eigen's TensorCudaReduction.h. I'm going to try re-adding the guards, and see if that fixes it. https://bitbucket.org/hughperkins/eigen/commits/4e47de64dcc4a106407893069409ab6ba95509d5
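Because the failure is intermittent, one way to judge whether re-adding the guards helps is to repeat the reduction many times and compare each result with a host-side reference. The following is a minimal sketch, not part of tf-coriander: `device_sum` is a hypothetical callable that, in a real check, would wrap something like `sess.run(tf.reduce_sum(x), feed_dict={x: values})`. Python's built-in `sum` stands in for it here so that the harness itself can be run anywhere.

```python
import math
import random


def count_sum_mismatches(device_sum, values, trials=100, rel_tol=1e-4):
    """Call device_sum(values) repeatedly and count results that disagree
    with a host-side reference sum.

    device_sum is a hypothetical callable; in a real check it would wrap
    the TensorFlow reduction running on the OpenCL device.
    """
    reference = math.fsum(values)  # accurate host-side reference
    mismatches = 0
    for _ in range(trials):
        result = device_sum(values)
        if not math.isclose(result, reference, rel_tol=rel_tol, abs_tol=1e-6):
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    random.seed(0)
    data = [random.uniform(0.0, 1.0) for _ in range(1024)]
    # Stand-in for the device call, so the harness can be exercised anywhere.
    print("mismatches:", count_sum_mismatches(sum, data))
```

A non-zero count before the change and zero after it would be consistent with the guards being the cause, although it would not prove it.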
Consider this very low-priority, at least from me! I mostly use tf-coriander from my Desktop with dedicated graphics, not from my laptop. And, my integrated graphics may well be slower than my laptop CPU, I'm not sure.
Anyways, I just loaded the latest release (0.17.3) to try it out, and it segfaults on Session initialisation:
cathal@europa:~$ source venvs/tf-cl/bin/activate
(tf-cl) cathal@europa:~$ ipython
Python 3.5.2+ (default, Sep 22 2016, 12:18:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import tensorflow as tf
s
In [2]: sess = tf.Session()
OpenCL platform: Intel Gen OCL Driver
OpenCL device: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 2.00GiB
Free memory: 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile, pci bus id: 0000.0000)
cl_driver DeviceAllocate 864026624
Segmentation fault (core dumped)
(tf-cl) cathal@europa:~$
If it helps, here's my clinfo output:
(tf-cl) cathal@europa:~$ clinfo
Number of platforms 1
Platform Name Intel Gen OCL Driver
Platform Vendor Intel
Platform Version OpenCL 1.2 beignet 1.1.2
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
Platform Extensions function suffix Intel
Platform Name Intel Gen OCL Driver
Number of devices 1
Device Name Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
Device Vendor Intel
Device Vendor ID 0x8086
Device Version OpenCL 1.2 beignet 1.1.2
Driver Version 1.1.2
Device OpenCL C Version OpenCL C 1.2 beignet 1.1.2
Device Type GPU
Device Profile FULL_PROFILE
Max compute units 20
Max clock frequency 1000MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None, None, None
Max work item dimensions 3
Max work item sizes 512x512x512
Max work group size 512
Preferred work group size multiple 16
Preferred / native vector sizes
char 16 / 8
short 8 / 8
int 4 / 4
long 2 / 2
half 0 / 8 (n/a)
float 4 / 4
double 0 / 2 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 2147483648 (2GiB)
Error Correction support No
Max memory allocation 1073741824 (1024MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 8192
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 8192x8192x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Global
Local memory size 65536 (64KiB)
Max constant buffer size 134217728 (128MiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 80ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) Intel Gen OCL Driver
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [Intel]
clCreateContext(NULL, ...) [default] Success [Intel]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name Intel Gen OCL Driver
Device Name Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.9
ICD loader Profile OpenCL 2.1
Presented only in the spirit of compatibility and correctness, I don't personally need this to work right now. Though, it might improve the odds of using this for tutorials or workshops if it worked on Intel / Beignet.
Can't load python module:
/usr/local/lib/python3.5/dist-packages/tensorflow/python/../third_party/cuda-on-cl/libcocl.so: undefined symbol: _ZN7clblast13CacheClearAllEv
No symbol similar to _ZN7clblast13CacheClearAllEv was found with:
nm /usr/local/lib/python3.5/dist-packages/tensorflow/python/../third_party/cuda-on-cl/libclblast.so|grep -i cache|grep -i clear
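The missing symbol is a C++ name in Itanium ABI mangled form, so it can help to decode it before searching. The sketch below handles only the simplest nested-name form and is an illustration; `c++filt` is the complete tool for this, and the function name `demangle_simple` is made up for the example.

```python
def demangle_simple(symbol):
    """Decode a simple Itanium-ABI nested name such as _ZN3foo3barEv.

    Only the form _ZN<len><name>...E followed by an optional 'v'
    (no parameters) is handled. Templates, substitutions and argument
    types are not supported.
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError("not a nested mangled name: %r" % symbol)
    pos = len(prefix)
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        # Each component is a decimal length followed by that many characters.
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported mangling at offset %d" % pos)
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing terminating 'E'")
    suffix = symbol[pos + 1:]
    args = "()" if suffix == "v" else ""
    return "::".join(parts) + args


if __name__ == "__main__":
    print(demangle_simple("_ZN7clblast13CacheClearAllEv"))
```

The decoded name is `clblast::CacheClearAll()`. Listing the exported symbols with `nm -D --defined-only` on the same libclblast.so and piping the result through `c++filt` shows whether that function is present in readable form. If it is absent, one possibility (an assumption, not confirmed in this report) is that libcocl.so was built against a different CLBlast version than the one installed.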
I wasn't kidding in the Tensorflow discussion; I use a bunch of software that you maintain, including the CL ports of Torch7. Soon, I hope to be able to use Tensorflow thanks to your efforts. I bought an AMD GPU rather than an NVidia because I believe in open standards, and also because I could see a handful of dedicated people were working on OpenCL ports of the big frameworks; chiefly yourself.
Well, buying AMD saved me literally hundreds of Euro for the same performance. Please give me a chance to contribute some of that saving towards your efforts, as a way to say thank-you. :)
Hi Hugh,
Thanks for your great work!
I was wondering if you would be able to add a step-by-step tutorial on how to set up the code for using a Radeon Pro GPU on a MacBook with Tensorflow? It'd be very helpful!
Best,
Jacob
cocl on branch origin/working-on-incremental-function-building
tensorflow-cl on branch tensorflow-cl
INFO: From Compiling tensorflow/core/kernels/gather_functor_gpu.cu.cc:
Please set CLANG_HOME
cocl built with CLANG_HOME=/usr/lib/llvm-3.8
and installed into /usr/local/
I've managed to install tf-coriander on OS X Mavericks using the wheel file.
Thanks a lot!
I don't know where to put this. Should I put it in the wiki?
It's the least I can do to document my experience for anyone who comes to this package.
Looking forward to keras v2.0 compatibility.
Keras ver 1.1.1
TF v0.11.0rc0
I realized that the documentation on keras.io is already written for the new version, so following the examples there may not work with this older release.
Example:
keras.io documentation (new version):
keras.utils.to_categorical(y, num_classes)
model.fit - uses epochs as a parameter
Keras v1.1.1:
keras.utils.np_utils.to_categorical(y, nb_classes)
model.fit - uses nb_epoch as a parameter
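The two differences listed above can be hidden behind a small compatibility helper. This is a sketch only: the helper names are hypothetical, and it assumes the renames took effect at Keras major version 2. It takes the version as a string (in real use, `keras.__version__`) so that it does not import Keras itself.

```python
def keras_major(version):
    """Return the major version number from a string such as '1.1.1'."""
    return int(version.split(".")[0])


def fit_kwargs(version, epochs):
    """Build the epoch keyword argument for model.fit for a Keras version."""
    if keras_major(version) >= 2:
        return {"epochs": epochs}
    return {"nb_epoch": epochs}


def to_categorical_path(version):
    """Return the dotted path of to_categorical for a Keras version."""
    if keras_major(version) >= 2:
        return "keras.utils.to_categorical"
    return "keras.utils.np_utils.to_categorical"


if __name__ == "__main__":
    print(fit_kwargs("1.1.1", 10))
    print(to_categorical_path("1.1.1"))
```

With this in place, a call such as `model.fit(X, y, **fit_kwargs(keras.__version__, 10))` passes `nb_epoch` on Keras 1.1.1 and `epochs` on the newer API.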
Hi,
My system is Ubuntu 16.04 64-bit. I tried both pip and pip3, and I also tried symlinking python to both python2 and python3 under /usr/bin (ln -s).
$ pip install --upgrade tensorflow-0.11.0rc0-py3-none-any.whl
tensorflow-0.11.0rc0-py3-none-any.whl is not a supported wheel on this platform.
I'm new to Python and not sure whether this is an issue with my Python environment. Do you happen to know how to debug or fix it? Thanks.
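A wheel's filename ends with three tags (Python, ABI, platform), and pip rejects the file when the interpreter it belongs to does not match them. The sketch below splits those tags out and does a rough check of the Python tag. The helper names are hypothetical, and pip's real compatibility logic covers far more cases than this.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel name: %r" % filename)
    return tuple(parts[-3:])


def python_tag_matches(python_tag, major=None):
    """Rough check of a generic 'pyN' tag against an interpreter major version.

    Only tags like 'py2', 'py3' or 'py2.py3' are understood here.
    """
    if major is None:
        major = sys.version_info[0]
    return any(tag == "py%d" % major for tag in python_tag.split("."))


if __name__ == "__main__":
    tags = wheel_tags("tensorflow-0.11.0rc0-py3-none-any.whl")
    print(tags, "matches this interpreter:", python_tag_matches(tags[0]))
```

For this wheel the tags are `py3`, `none` and `any`, so it needs a pip that belongs to a Python 3 interpreter. One thing worth ruling out is that the `pip` being run is bound to Python 2: `pip --version` prints the interpreter it uses, and `python3 -m pip install ...` forces the Python 3 one. This may not be the cause here, since pip3 was reportedly tried as well.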
I'm playing with midi-generative LSTMs, in code that's supposedly built for TF 0.10.x. I had to make some small modifications to get it to run, but I don't think those should have any bearing on the below error:
(Again, like #42, this isn't a high-priority bug for me)
(tf-cl) cathal@thinkum:~/Downloads/MusicGenerator$ python3 main.py --dataset_tag satie --model_tag satie
Welcome to DeepMusic v0.1 !
TensorFlow detected: v0.11.0rc0
Current parameters:
glob_step: 0
keep_all: False
dataset_tag: satie
sample_length: 40
hidden_size: 512
num_layers: 2
target_weights: linear
scheduled_sampling: none
batch_size: 64
save_every: 1000
ratio_dataset: 0.9
testing_curve: 10
batch_builder: relative
learning_rate: cst
enco_cell: identity
deco_cell: lstm
loop_processing: sample_softmax
Restoring dataset from /home/cathal/Downloads/MusicGenerator/data/samples/satie-relative.pkl...
Loaded: 18 songs (16 train/2 test)
Model creation...
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: gfx803
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: gfx803
major: -1 minor: -1 memoryClockRate (GHz) 1266
pciBusID 0000.0000
Total memory: 8.00GiB
Free memory: 6.00GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: gfx803, pci bus id: 0000.0000)
cl_driver DeviceAllocate 6120328192
Initialize variables...
fabs is called, but not defined
This is probalby a bug in Coriander. Please file an issue at https://github.com/hughperkins/coriander/issues/new
basicblockdumper.runGeneration got exception whilst processing:
%373 = call double @fabs(double %372) #8
generateOpenCL failed to generate opencl sourcecode
kernel name orig=_ZN10tensorflow7functor28FillPhiloxRandomKernelLaunchINS_6random27TruncatedNormalDistributionINS2_19SingleSampleAdapterINS2_12PhiloxRandomEEEfEEEEvS5_PNT_17ResultElementTypeExS8_
kernel name short=_ZN10tensorflow7func
kernel name unique=_ZN10tensorflow7functor28FillPhiloxRandomKernelLaunchINS_6random27TruncatedNormalDistributionINS2_19SingleSampleAdapterINS2_12PhiloxRandomEEEfEEEEvS5_PNT_17ResultElementTypeExS8__0_1_2
writing ll to /tmp/failed-kernel.ll
caught runtime error fabs is called, but not defined => cannot continue. Sorry :-(
terminate called after throwing an instance of 'std::runtime_error'
what(): fabs is called, but not defined => cannot continue. Sorry :-(
Aborted (core dumped)
So, just decided to pull https://github.com/hughperkins/TensorFlow-Examples and run a few of the examples, to see how things are going since the fix to #34 and the addition of working ADAM.
The examples that specify a device always crash. Here's an example for 3_NeuralNetworks/dynamic_rnn.py:
cathal@thinkum:~/TensorFlow-Examples/examples/3_NeuralNetworks$ python3 dynamic_rnn.py
/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 7.57GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
cl_driver DeviceAllocate 3930062848
Traceback (most recent call last):
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 972, in _do_call
return fn(*args)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in _run_fn
self._extend_graph()
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 999, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'split': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
Split: CPU
[[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dynamic_rnn.py", line 170, in <module>
sess.run(init)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 717, in run
run_metadata_ptr)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 915, in _run
feed_dict_string, options, run_metadata)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
target_list, options, run_metadata)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'split': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
Split: CPU
[[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]
Caused by op 'split', defined at:
File "dynamic_rnn.py", line 155, in <module>
pred = dynamicRNN(x, seqlen, weights, biases)
File "dynamic_rnn.py", line 123, in dynamicRNN
x = tf.split(0, seq_max_len, x)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1036, in split
name=name)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2621, in _split
num_split=num_split, name=name)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in apply_op
op_def=op_def)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2388, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1300, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'split': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
Split: CPU
[[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]
If I change the specification to :1 instead of :0 (because ???) I get this instead:
cathal@thinkum:~/TensorFlow-Examples/examples/3_NeuralNetworks$ python3 dynamic_rnn.py
/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 7.57GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
cl_driver DeviceAllocate 3930062848
Traceback (most recent call last):
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 972, in _do_call
return fn(*args)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in _run_fn
self._extend_graph()
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 999, in _extend_graph
self._session, graph_def.SerializeToString(), status)
File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
[[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:1"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dynamic_rnn.py", line 170, in <module>
sess.run(init)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 717, in run
run_metadata_ptr)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 915, in _run
feed_dict_string, options, run_metadata)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
target_list, options, run_metadata)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
[[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:1"]()]]
Caused by op 'GradientDescent/learning_rate', defined at:
File "dynamic_rnn.py", line 159, in <module>
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 198, in minimize
name=name)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 314, in apply_gradients
self._prepare()
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/training/gradient_descent.py", line 62, in _prepare
name="learning_rate")
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 657, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 180, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 167, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2388, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1300, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
[[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:1"]()]]
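Both tracebacks above come from ops being pinned explicitly to a device: in the first case Split has no GPU kernel, and in the second no device named GPU:1 is registered. In stock TensorFlow, the `allow_soft_placement` field of `tf.ConfigProto` lets the runtime fall back to another device in such cases. The sketch below shows how a session could be created that way. Whether tf-coriander honors this option in the same way is an assumption that has not been checked here, and the helper names are hypothetical.

```python
def soft_placement_options(log_placement=True):
    """Session options that let TensorFlow fall back to another device when
    an op has no kernel for the device it was pinned to."""
    return {
        "allow_soft_placement": True,
        "log_device_placement": log_placement,
    }


def make_session():
    """Create a session with soft placement enabled (TF 0.x/1.x graph API)."""
    # Imported lazily so the options helper above has no TensorFlow dependency.
    import tensorflow as tf
    return tf.Session(config=tf.ConfigProto(**soft_placement_options()))
```

In an example script, `sess = make_session()` would replace the plain `tf.Session()` call. With `log_device_placement` enabled, the log shows which ops ended up on the CPU.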
My clinfo output:
cathal@thinkum:~/TensorFlow-Examples/examples/3_NeuralNetworks$ clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (2348.3)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Extensions function suffix AMD
Platform Name AMD Accelerated Parallel Processing
Number of devices 2
Device Name Hawaii
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2348.3)
Driver Version 2348.3
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Board Name (AMD) AMD Radeon (TM) R9 390 Series
Device Topology (AMD) PCI-E, 01:00.0
Max compute units 40
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1040MHz
Graphics IP (AMD) 7.2
Device Partition (core)
Max number of sub-devices 40
Supported partition types none specified
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 8131137536 (7.573GiB)
Global free memory (AMD) 7920852 (7.554GiB)
Global memory channels (AMD) 16
Global memory banks per channel (AMD) 16
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 4244635648 (3.953GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max constant buffer size 4244635648 (3.953GiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1496133664638937392ns (Tue May 30 09:41:04 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Name Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Device Vendor GenuineIntel
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2348.3)
Driver Version 2348.3 (sse2,avx)
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Device Board Name (AMD)
Device Topology (AMD) (n/a)
Max compute units 4
Max clock frequency 799MHz
Device Partition (core, cl_ext_device_fission)
Max number of sub-devices 4
Supported partition types equally, by counts, by affinity domain
Supported affinity domains L3 cache, L2 cache, L1 cache, next partitionable
Supported partition types (ext) equally, by counts, by affinity domain
Supported affinity domains (ext) L3 cache, L2 cache, L1 cache, next fissionable
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple 1
Preferred / native vector sizes
char 16 / 16
short 8 / 8
int 4 / 4
long 2 / 2
half 4 / 4 (n/a)
float 8 / 8
double 4 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 16788918272 (15.64GiB)
Error Correction support No
Max memory allocation 4197229568 (3.909GiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Global Memory cache type Read/Write
Global Memory cache size 32768
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 65536 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 8192x8192 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 64
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 65536 (64KiB)
Max number of constant args 8
Max size of kernel argument 4096 (4KiB)
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1496133664638937392ns (Tue May 30 09:41:04 2017)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
SPIR versions 1.2
printf() buffer size 65536 (64KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
The error comes from
File "/home/bixian/.virtualenvs/tensorflow2/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
when I tried to import tensorflow as tf, and it says
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by /home/bixian/.virtualenvs/tensorflow2/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow.so)
I tried
objdump -p _pywrap_tensorflow.so
and it says
...blablabla...
Version References:
  required from libm.so.6:
    0x09691a75 0x00 07 GLIBC_2.2.5
    0x06969183 0x00 05 GLIBC_2.23
...blablabla...
I also tried
nm _pywrap_tensorflow.so|grep GLIBC_2.23
and it says
U lgammaf@@GLIBC_2.23
U lgamma@@GLIBC_2.23
I believe this problem comes from my system, which is Ubuntu 14.04.5 LTS with a custom Linux kernel 3.16.36-031636-generic, because when I tried sudo apt-cache policy libc6, it says libc6: Installed: 2.19-0ubuntu6.9 Candidate: 2.19-0ubuntu6.9 ...blablabla. That is, the installed glibc (2.19) is older than the 2.23 that the module requires.
Also, I use Python 3.4.3 in a virtualenv as the interpreter.
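The mismatch in this report can be checked programmatically by comparing the C library version Python reports with the version named in the ImportError. This is a minimal sketch with hypothetical helper names. `platform.libc_ver()` is part of the standard library, but it may return empty strings on some systems, so the script guards for that.

```python
import platform


def parse_version(text):
    """Turn '2.19', 'GLIBC_2.23' or '2.19-0ubuntu6.9' into a tuple of ints."""
    text = text.replace("GLIBC_", "").split("-")[0]
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(installed, required):
    """Return True if the installed glibc is at least the required version."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    print(lib, version)
    if lib == "glibc" and version:
        print("meets GLIBC_2.23:", glibc_satisfies(version, "GLIBC_2.23"))
```

For the versions quoted above, `glibc_satisfies("2.19-0ubuntu6.9", "GLIBC_2.23")` is false, which matches the import failure: the binary was linked against symbols (`lgamma`, `lgammaf`) versioned at GLIBC_2.23.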
I have a Radeon RX 470 GPU installed on Windows 10. Do I have any option for running Tensorflow on this GPU?
After I finished installing the wheel using pip3, I ran the test setup and got the following error:
~$ pip install -r tensorflow/stream_executor/cl/test/requirements.txt
Could not open requirements file: [Errno 2] No such file or directory: 'tensorflow/stream_executor/cl/test/requirements.txt'
I spent all day trying to find the file, and it doesn't exist.
On Mac (Sierra/Radeon), the training operation is broken and causes a segfault.
That is, the forward direction of a linear regression works OK:
'''
A linear regression learning algorithm example using TensorFlow library.
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''
from __future__ import print_function
import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random
# Parameters
learning_rate = 0.01
training_epochs = 1000
training_epochs = 5
display_step = 50
with tf.device('/gpu:0'):
    # Training Data
    train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                             7.042,10.791,5.313,7.997,5.654,9.27,3.1])
    train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                             2.827,3.465,1.65,2.904,2.42,2.94,1.3])
    n_samples = train_X.shape[0]
    # tf Graph Input
    X = tf.placeholder("float")
    Y = tf.placeholder("float")
    # Set model weights
    W = tf.Variable(rng.randn(), name="weight")
    b = tf.Variable(rng.randn(), name="bias")
    # Construct a linear model
    pred = tf.add(tf.mul(X, W), b)
    # Mean squared error
    cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
    # Gradient descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    # Initializing the variables
    init = tf.initialize_all_variables()
    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)
        # Fit all training data
        for epoch in range(training_epochs):
            batch_num = 0
            for (x, y) in zip(train_X, train_Y):
                x = sess.run(X, feed_dict={X: x})
                if batch_num == 0:
                    print('epoch %s' % epoch)
                    X_val, Y_val, W_val, b_val = sess.run((X, Y, W, b), feed_dict={X: x, Y: y})
                    print(X_val, Y_val, W_val, b_val)
                    print('pred', sess.run(pred, feed_dict={X: x, Y: y}))
                    print('cost', sess.run(cost, feed_dict={X: x, Y: y}))
                batch_num += 1
... but adding the optimizer operation causes segfault:
# Fit all training data
for epoch in range(training_epochs):
    batch_num = 0
    for (x, y) in zip(train_X, train_Y):
        x = sess.run(X, feed_dict={X: x})
        if batch_num == 0:
            print('epoch %s' % epoch)
            X_val, Y_val, W_val, b_val = sess.run((X, Y, W, b), feed_dict={X: x, Y: y})
            print(X_val, Y_val, W_val, b_val)
            print('pred', sess.run(pred, feed_dict={X: x, Y: y}))
            print('cost', sess.run(cost, feed_dict={X: x, Y: y}))
        sess.run(optimizer, feed_dict={X: x, Y: y})
        batch_num += 1
F name _ZN5Eigen8internal15EigenMetaKe
running generation on _ZN5Eigen8internal15EigenMetaKe
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_20TensorCwiseNullaryOpINS0_15scalar_const_opIfEEKS8_EEEENS_9GpuDeviceEEEiEEvT_T0_
... built
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_19TensorCwiseBinaryOpINS0_20scalar_difference_opIffEEKS8_KNS9_INS0_17scalar_product_opIKfSE_EEKNS_20TensorBroadcastingOpIKNS_5arrayIiLm1EEEKNS_17TensorReshapingOpIKNS_5SizesIJLl1EEEEKNS4_INS_15TensorFixedSizeISE_NSL_IJEEELi1EiS7_EELi16ES7_EEEEEEKNS4_INS5_ISE_Li1ELi1EiEELi16ES7_EEEEEEEENS_9GpuDeviceEEEiEEvT_T0_
F name _ZN5Eigen8internal15EigenMetaKe
running generation on _ZN5Eigen8internal15EigenMetaKe
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_19TensorCwiseBinaryOpINS0_20scalar_difference_opIffEEKS8_KNS9_INS0_17scalar_product_opIKfSE_EEKNS_20TensorBroadcastingOpIKNS_5arrayIiLm1EEEKNS_17TensorReshapingOpIKNS_5SizesIJLl1EEEEKNS4_INS_15TensorFixedSizeISE_NSL_IJEEELi1EiS7_EELi16ES7_EEEEEEKNS4_INS5_ISE_Li1ELi1EiEELi16ES7_EEEEEEEENS_9GpuDeviceEEEiEEvT_T0_
... built
Segmentation fault: 11
I'm taking a look at this issue.
Edit: seems something to do with event handling:
* thread #13, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0)
* frame #0: 0x00007fffe6908a59 libc++abi.dylib`__dynamic_cast + 38
frame #1: 0x00007fffd6966baa OpenCL`___lldb_unnamed_symbol306$$OpenCL + 37
frame #2: 0x00007fffd6978ef3 OpenCL`clReleaseEvent + 15
frame #3: 0x000000011ac46667 libcocl.dylib`::cuEventRecord(event=0x000000011b7d7ad0, _queue=<unavailable>) at cocl_events.cpp:92 [opt]
frame #4: 0x00000001091dac75 _pywrap_tensorflow.so`perftools::gputools::cl::CLDriver::RecordEvent(context=0x000000012120db20, event=0x000000011b7d7ad0, stream="0W\x86\x1b\x01") at cl_driver.cc:1121
frame #5: 0x00000001091eae95 _pywrap_tensorflow.so`perftools::gputools::cl::CLExecutor::CreateStreamDependency(this=0x000000011e55baf0, dependent=0x000000011b7de0a0, other=0x000000011b7a4f00) at cl_gpu_executor.cc:730
frame #6: 0x0000000109272b64 _pywrap_tensorflow.so`perftools::gputools::StreamExecutor::CreateStreamDependency(this=0x000000011e55b190, dependent=0x000000011b7de0a0, other=0x000000011b7a4f00) at stream_executor_pimpl.cc:635
frame #7: 0x0000000109246bc8 _pywrap_tensorflow.so`perftools::gputools::Stream::ThenWaitFor(this=0x000000011b7de0a0, other=0x000000011b7a4f00) at stream.cc:1335
frame #8: 0x00000001091a236f _pywrap_tensorflow.so`tensorflow::GPUUtil::CopyCPUTensorToGPU(cpu_tensor=0x000000011b946e78, device_context=0x000000011e58b130, gpu_device=0x000000011b7a5ea0, gpu_tensor=0x000000011d9cc850, done=0x000000010060cf70)>) at gpu_util.cc:326
frame #9: 0x00000001091ac613 _pywrap_tensorflow.so`tensorflow::GPUDeviceContext::CopyCPUTensorToDevice(this=0x000000011e58b130, cpu_tensor=0x000000011b946e78, device=0x000000011b7a5ea0, device_tensor=0x000000011d9cc850, done=<unavailable>)>) const at gpu_util_platform_specific.cc:29
frame #10: 0x00000001096f8fda _pywrap_tensorflow.so`tensorflow::CopyTensor::ViaDMA(edge_name=(data_ = "edge_185__recv_Placeholder_0;0:0", size_ = 28), send_dev_context=0x0000000000000000, recv_dev_context=0x000000011e58b130, src=0x000000011b7ac770, dst=0x000000011b7a5ea0, src_alloc_attr=(value = 4), dst_alloc_attr=(value = 0), input=0x000000011b946e78, output=0x000000011d9cc850, done=0x000000010060bc50)>) at copy_tensor.cc:99
frame #11: 0x00000001097a5cc9 _pywrap_tensorflow.so`tensorflow::IntraProcessRendezvous::SameWorkerRecvDone(this=0x000000011b946140, parsed=0x0000000101b19968, send_args=0x00007000023bba50, recv_args=0x00007000023baf30, in=0x000000011b946e78, out=0x000000011d9cc850, done=0x000000013d006800)>) at rendezvous_mgr.cc:106
OpenCL platform: Apple
OpenCL device: Intel(R) Iris(TM) Graphics 6100
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Intel(R) Iris(TM) Graphics 6100
major: -1 minor: -1 memoryClockRate (GHz) 1050
pciBusID 0000.0000
Total memory: 1.50GiB
Free memory: 384.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) Iris(TM) Graphics 6100, pci bus id: 0000.0000)
cl_driver DeviceAllocate 192937984
Traceback (most recent call last):
File "/Users/user/PycharmProjects/tensorflow-test/cnn.py", line 105, in
sess.run(tf.global_variables_initializer())
AttributeError: module 'tensorflow' has no attribute 'global_variables_initializer'
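A note on the AttributeError above: `tf.global_variables_initializer` only appeared in TensorFlow 0.12, while older releases such as the 0.11-based build here call the same thing `tf.initialize_all_variables`. Below is a minimal sketch of a fallback lookup that a script could use to run on both. The helper name `first_available` is made up for illustration, and the example uses a stand-in object instead of the real `tensorflow` module so it runs anywhere.

```python
import types


def first_available(module, *names):
    """Return the first attribute in names that exists on module."""
    for name in names:
        attr = getattr(module, name, None)
        if attr is not None:
            return attr
    raise AttributeError("none of %r found on %r" % (names, module))


# Stand-in for an old tensorflow module that only has the pre-0.12 name.
old_tf = types.SimpleNamespace(initialize_all_variables=lambda: "init-op")

make_init = first_available(
    old_tf, "global_variables_initializer", "initialize_all_variables")
print(make_init())
```

In a real script one would pass the imported `tf` module in place of `old_tf` and hand the result to `sess.run(...)`.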
Using latest github code version, on Ubuntu 16.04 / nvidia, a bunch of tests pass, but every so often (quite often, unusably often), it segfaults:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000.0000)
cl_driver DeviceAllocate 848478208
Segmentation fault (core dumped)
Example backtrace, from gdb: https://gist.github.com/hughperkins/10855efd242b0786c7dfc2aa4075e59a
This looks annoyingly hard to diagnose/debug... :-(
Edit: backtrace with debug build: https://gist.github.com/hughperkins/68f636beb90fa9c8cb6d4687acce9f05
Now that `tf-coriander` has reached a state of relative usefulness, it might be helpful to have a wiki to collect information that others have gathered on how to put it to use.
E.g., I identified a version of Keras that I believe to be ~compatible with the version of Tensorflow that tf-coriander represents, and that's information I'm happy to share with people.
Common pitfalls and how to solve them fall into the same category of "tidbits worth sharing".
Basically, anything that applies specifically to using tf-coriander, either due to the older Tensorflow base, or to the OpenCL nature, or to peculiarities of Coriander, could be thrown on a Wiki. Just a thought. :)
On Mac Sierra + Radeon, autoencoder.py hangs and cannot be interrupted with Ctrl-C. The computer continues to work OK (e.g. I can browse the web), but I have to restart the Mac before running any tensorflow script on the GPU again.
Following the readme instructions for a standard install gives the following error on Ubuntu 16.04.
$ pip install --upgrade tensorflow-0.11.0rc0-py3-none-any.whl
tensorflow-0.11.0rc0-py3-none-any.whl is not a supported wheel on this platform.
I fixed it by using pip3 instead of pip.
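A possible explanation, stated as an assumption: the wheel's filename carries the `py3` Python tag, so a `pip` that belongs to a Python 2 interpreter would reject it while `pip3` accepts it. The sketch below pulls the Python tag out of a wheel filename and does a crude major-version check. The helper names are invented for illustration, and the check is far simpler than pip's real tag matching.

```python
import sys


def wheel_python_tags(filename):
    """Return the Python tags of a wheel filename, e.g. ['py3']."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout: name-version[-build]-python-abi-platform,
    # so the Python tag is the third field from the end.
    return parts[-3].split(".")


def matches_major(filename, major=None):
    """Crude check: does any Python tag target the given major version?"""
    if major is None:
        major = sys.version_info[0]
    prefixes = ("py%d" % major, "cp%d" % major)
    return any(tag.startswith(prefixes) for tag in wheel_python_tags(filename))


wheel = "tensorflow-0.11.0rc0-py3-none-any.whl"
print(wheel_python_tags(wheel))
print(matches_major(wheel, major=2), matches_major(wheel, major=3))
```

Running `pip --version` shows which interpreter a given `pip` command is bound to, which is the quickest way to confirm the guess.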
Please provide a single post stating:
This is kind of an experimental approach to getting feedback :-) . But being starred or not doesn't give me much information on what people are looking for, whether they are finding it useful etc, so I'm going to try this approach :-)
Edit: note that I seem to have started adding 👍 to items to indicate I've read them. I probably won't reply into this thread. If you do want a reply, please consider raising a new issue, which I still might not reply to, but I might...
I am using "anaconda" python 3.5 on Mac OSX Sierra. I have downloaded and installed the binary wheel file: "pip install tensorflow-0.11.0rc0-py3-none-any.whl" (version v0.17.2)
When I try to run the tensorflow tests or try to run any of the examples from: https://github.com/aymericdamien/TensorFlow-Examples I get an error:
File "", line 919, in create_module
File "", line 222, in _call_with_frames_removed
ImportError: dlopen(/Users/tomas/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: /Users/hugh2/git-local/tensorflow-llvm40-addingconv/third_party/coriander/build/libclew.dylib
Referenced from: /Users/tomas/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found
The provided binary WHL file has a fixed path dependency to the "libclew.dylib" file in the following directory: "/Users/hugh2/git-local/tensorflow-llvm40-addingconv/third_party/coriander/build/"
Thanks for fixing this problem.
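To see which hard-coded library paths a binary like this depends on, macOS ships `otool -L`. Here is a small sketch that parses such output and lists absolute dependencies that are missing on the local machine. The sample text is only modelled on the error above: its version numbers and the second dependency line are illustrative rather than taken from the real wheel, and the helper names are hypothetical.

```python
import os


def dylib_dependencies(otool_output):
    """Parse `otool -L` text into a list of dependency paths."""
    deps = []
    # The first line names the inspected file; dependencies follow.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            deps.append(line.split(" (compatibility")[0])
    return deps


def missing_dependencies(otool_output, exists=os.path.exists):
    """Return absolute dependency paths for which exists() is false."""
    return [path for path in dylib_dependencies(otool_output)
            if path.startswith("/") and not exists(path)]


# Illustrative sample, shaped like `otool -L _pywrap_tensorflow.so` output.
sample = (
    "_pywrap_tensorflow.so:\n"
    "\t/Users/hugh2/git-local/tensorflow-llvm40-addingconv/third_party/"
    "coriander/build/libclew.dylib "
    "(compatibility version 0.0.0, current version 0.0.0)\n"
    "\t/usr/lib/libSystem.B.dylib "
    "(compatibility version 1.0.0, current version 1238.0.0)\n"
)

# Pretend only system libraries exist, as on a user's machine.
print(missing_dependencies(sample, exists=lambda p: p.startswith("/usr/lib/")))
```

As a local workaround, `install_name_tool -change <old path> <new path> <binary>` can repoint such a reference, though the proper fix is in how the wheel is built.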
/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user/PycharmProjects/tensorflow-test/cnn.py
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
OpenCL platform: Apple
OpenCL device: Intel(R) Iris(TM) Graphics 6100
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Intel(R) Iris(TM) Graphics 6100
major: -1 minor: -1 memoryClockRate (GHz) 1050
pciBusID 0000.0000
Total memory: 1.50GiB
Free memory: 384.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) Iris(TM) Graphics 6100, pci bus id: 0000.0000)
cl_driver DeviceAllocate 192937984
__internal__ build log:
<program source>:31:36: warning: unused variable 'pGlobalVars'
const struct GlobalVars* const pGlobalVars = &globalVars;
^
Cannot select: 0x7fed088a1310: i32 = any_extend 0x7fed08909010 [ID=43]
0x7fed08909010: i32 = IGILISD::IGILSETCC 0x7fed088a1b10, 0x7fed088a1510, 0x7fed0890d010 [ID=42]
0x7fed088a1b10: i64 = bitcast 0x7fed08918b10 [ID=41]
0x7fed08918b10: v2i32 = IGILISD::MOVSWZ 0x7fed0890cf10, 0x7fed088a1e10, 0x7fed08918510, 0x7fed08918510 [ID=38]
0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
0x7fed0890c110: i32,i32 = sdivrem 0x7fed08919010, 0x7fed0890ce10 [ID=27]
0x7fed0890c610: i32 = sra 0x7fed0890c110, 0x7fed08918910 [ID=29]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
0x7fed0890a410: i32 = Constant<2> [ID=16]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1e10: i32 = sra 0x7fed0890cf10, 0x7fed08918910 [ID=35]
0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918910: i32 = Constant<31> [ID=15]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1510: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890cb10 [ORD=28] [ID=20]
0x7fed0890cb10: i64 = Register %vreg60 [ORD=28] [ID=7]
In function: _ZN10tensorflow14Gat
kernel build error:
Something went wrong with clCreateKernel, OpenCL error code -45
__internal__ build log:
<program source>:31:36: warning: unused variable 'pGlobalVars'
const struct GlobalVars* const pGlobalVars = &globalVars;
^
Cannot select: 0x7fed088a1310: i32 = any_extend 0x7fed08909010 [ID=43]
0x7fed08909010: i32 = IGILISD::IGILSETCC 0x7fed088a1b10, 0x7fed088a1510, 0x7fed0890d010 [ID=42]
0x7fed088a1b10: i64 = bitcast 0x7fed08918b10 [ID=41]
0x7fed08918b10: v2i32 = IGILISD::MOVSWZ 0x7fed0890cf10, 0x7fed088a1e10, 0x7fed08918510, 0x7fed08918510 [ID=38]
0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
0x7fed0890c110: i32,i32 = sdivrem 0x7fed08919010, 0x7fed0890ce10 [ID=27]
0x7fed0890c610: i32 = sra 0x7fed0890c110, 0x7fed08918910 [ID=29]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
0x7fed0890a410: i32 = Constant<2> [ID=16]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1e10: i32 = sra 0x7fed0890cf10, 0x7fed08918910 [ID=35]
0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918910: i32 = Constant<31> [ID=15]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1510: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890cb10 [ORD=28] [ID=20]
0x7fed0890cb10: i64 = Register %vreg60 [ORD=28] [ID=7]
In function: _ZN10tensorflow14Gat
storing failed kernel into: easycl-failedkernel.cl
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Something went wrong with clCreateKernel, OpenCL error code -45
__internal__ build log:
<program source>:31:36: warning: unused variable 'pGlobalVars'
const struct GlobalVars* const pGlobalVars = &globalVars;
^
Cannot select: 0x7fed088a1310: i32 = any_extend 0x7fed08909010 [ID=43]
0x7fed08909010: i32 = IGILISD::IGILSETCC 0x7fed088a1b10, 0x7fed088a1510, 0x7fed0890d010 [ID=42]
0x7fed088a1b10: i64 = bitcast 0x7fed08918b10 [ID=41]
0x7fed08918b10: v2i32 = IGILISD::MOVSWZ 0x7fed0890cf10, 0x7fed088a1e10, 0x7fed08918510, 0x7fed08918510 [ID=38]
0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
0x7fed0890c110: i32,i32 = sdivrem 0x7fed08919010, 0x7fed0890ce10 [ID=27]
0x7fed0890c610: i32 = sra 0x7fed0890c110, 0x7fed08918910 [ID=29]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
0x7fed0890a410: i32 = Constant<2> [ID=16]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1e10: i32 = sra 0x7fed0890cf10, 0x7fed08918910 [ID=35]
0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918910: i32 = Constant<31> [ID=15]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
0x7fed088a1510: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890cb10 [ORD=28] [ID=20]
0x7fed0890cb10: i64 = Register %vreg60 [ORD=28] [ID=7]
In function: _ZN10tensorflow14Gatstoring failed kernel into: easycl-failedkernel.cl
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
When I run the logistic regression example, each epoch takes about 5 seconds on the GPU (RADEON RX460), while it takes about 0.3 seconds on the CPU (i7 4770). My operating system is Ubuntu 16.04 LTS. Note that I'm running the code on the CPU using python2.7, while I use python3 when running on GPU since it doesn't work any other way. But what could be the reason making the GPU run significantly slower?
bazel build --jobs 4 //tensorflow/tools/pip_package:build_pip_package
./tensorflow/stream_executor/dso_loader.h:25:30: fatal error: cuda/cuda_config.h: No such file or directory
And, it's just my opinion, but it seems crazy to start multi-GPU support right now. A lot of things need to be fixed first.
Hello. On Ubuntu 16.04, I installed tensorflow-cl with pip3 as per the instructions. Keras is version 2.0.5, and it outputs this error:
/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in _initialize_variables()
298 """Utility to initialize uninitialized variables on the fly.
299 """
--> 300 variables = tf.global_variables()
301 uninitialized_variables = []
302 for v in variables:
AttributeError: module 'tensorflow' has no attribute 'global_variables'
I have seen somewhat similar issues online, where the fix was to revert to tf version 0.10 or upgrade to 0.12.
Has anyone seen this, or successfully used tf-cl with Keras (and which version)? A simple test importing tensorflow in Python (no Keras) seems to function okay.
/usr/lib/python3/dist-packages/logilab/common/decorators.py:40: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
if len(getargspec(callableobj).args) == 1 or self.keyarg == 0:
going into keras/tests
======================= test_loss_masking.py =======================
Using TensorFlow backend.
====================== test_loss_weighting.py ======================
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Pitcairn
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Pitcairn
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 1.97GiB
Free memory: 1.31GiB
W tensorflow/stream_executor/cl/cl_driver.cc:587] creating context when one is currently active; existing: 0�p�
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Pitcairn
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 1 with properties:
name: Pitcairn
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 1.97GiB
Free memory: 1.31GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 1: N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Pitcairn, pci bus id: 0000.0000)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Pitcairn, pci bus id: 0000.0000)
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
cl_driver DeviceAllocate 1192542208
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
cl_driver DeviceAllocate 1192542208
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_18TensorCwiseUnaryOpINS0_12scalar_rightIffNS0_17scalar_product_opIffEEEEKNS4_INS5_IKfLi1ELi1EiEELi16ES7_EEEEEENS_9GpuDeviceEEEiEEvT_T0_
Segmentation fault (core dumped)
It would help to add support for some debug parameters/env-vars (if there isn't any yet). For example, if I had access to the failed kernel's source, I would try to compile it with CodeXL and compose a more reliable bug report.
The instructions have a step as follows:
pushd third_party/cuda-on-cl
make -j 4
sudo make install
popd
But it should be
pushd third_party/cuda-on-cl
mkdir build
cd build
cmake ..
make -j 4
sudo make install
popd
It seems that gcc 5.4.0 can't handle a raw string like this inside EXPECT_EQ; it needs escaping, or to be put all on one line.
EXPECT_EQ(R"( v2 = v1[0];
v3 = (&(v1[0].f1));
v6 = v5[0];
)", oss.str());
Got
tensorflow-cl/third_party/cuda-on-cl/test/gtest/test_block_dumper.cpp:428:23: error: expected ‘)’ before ‘;’ token
and a lot of similar errors
It seems bazel can't find the gcc headers for libraries like farmhash, jpeg and png, yet these headers are indeed in gcc's search path.
This is an example for building the jpeg lib.
http://paste.ubuntu.com/24038831/
I'm using Ubuntu 14.04.2 and gcc 4.8.4.
cuda-on-cl installed with debug (spam) and test options enabled.
simplify _ZN5Eigen9half_impl9half_baseC2ERKS1_
instructions processed before crash 341
/home/inferno/.cache/bazel/_bazel_inferno/5213170a00a40926f3a8ece61425e0a5/execroot/tensorflow-cl/third_party/cuda-on-cl/bin/../share/cocl/cocl.Makefile:25: recipe for target 'bazel-out/local_linux-py3-fastbuild/bin/tensorflow/core/kernels/_objs/constant_op_gpu/tensorflow/core/kernels/constant_op_gpu-device.cl' failed
@blockIdx = extern_weak addrspace(1) global %struct.__cuda_builtin_blockIdx_t, align 1
@blockDim = extern_weak addrspace(1) global %struct.__cuda_builtin_blockDim_t, align 1
@threadIdx = extern_weak addrspace(1) global %struct.__cuda_builtin_threadIdx_t, align 1
@gridDim = extern_weak addrspace(1) global %struct.__cuda_builtin_gridDim_t, align 1
terminate called after throwing an instance of 'std::runtime_error'
what(): not implemented dumpmemcpy for align 2
make: *** [bazel-out/local_linux-py3-fastbuild/bin/tensorflow/core/kernels/_objs/constant_op_gpu/tensorflow/core/kernels/constant_op_gpu-device.cl] Aborted (core dumped)
ERROR: tensorflow-cl/tensorflow/core/kernels/BUILD:433:1: output 'tensorflow/core/kernels/_objs/constant_op_gpu/tensorflow/core/kernels/constant_op_gpu.cu.pic.o' was not created.
ERROR: tensorflow-cl/tensorflow/core/kernels/BUILD:433:1: not all outputs were created.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 555.318s, Critical Path: 44.49s
Mac Sierra link doesn't work.
Another thing, I don't need to have any installation of Tensorflow to get Tensorflow-cl installed, correct?
Hello Hugh,
I installed tensorflow-cl as instructed, but now I can't run any of the tests... I can't even import tensorflow in any file. Every time, I get the same error:
Traceback (most recent call last):
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /home/tucker/anaconda3/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 60, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: /home/tucker/anaconda3/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so)
Error importing tensorflow. Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there.
How do I fix this?
Thanks in advance
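The final ImportError says that the libstdc++.so.6 under /home/tucker/anaconda3/lib does not carry the CXXABI_1.3.8 version tag that _pywrap_tensorflow.so was linked against. One rough way to see which copy of the library is too old is to scan candidate libstdc++.so.6 files for the tags they contain. The sketch below is only a diagnostic: the helper names cxxabi_versions and provides_cxxabi are made up for illustration, and it does a plain byte search rather than parsing the ELF file properly.

```python
import re
import sys
from pathlib import Path


def cxxabi_versions(lib_path):
    """Return the numeric CXXABI_* version tags found in a shared library."""
    data = Path(lib_path).read_bytes()
    tags = set(re.findall(rb"CXXABI_(\d+(?:\.\d+)*)", data))
    # Sort numerically so that 1.3.10 comes after 1.3.9.
    return sorted((t.decode() for t in tags),
                  key=lambda v: tuple(int(p) for p in v.split(".")))


def provides_cxxabi(lib_path, wanted="1.3.8"):
    """True if the library contains the given CXXABI version tag."""
    return wanted in cxxabi_versions(lib_path)


if __name__ == "__main__":
    # Pass one or more libstdc++.so.6 paths on the command line.
    for arg in sys.argv[1:]:
        if Path(arg).is_file():
            print(arg, cxxabi_versions(arg), provides_cxxabi(arg))
```

Running it against both the Anaconda copy and the system copy (whose location varies by distribution) shows whether only the system one lists 1.3.8, which would mean the older Anaconda copy is being loaded first.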
Following your instruction: https://github.com/hughperkins/tensorflow-cl/blob/tensorflow-cl/doc/build-from-source.md
install bazel 0.4.0 by apt-get (bazel.io)
git clone --recursive https://github.com/hughperkins/cuda-on-cl
cd cuda-on-cl
git checkout runtime-compile
mkdir build
cd build
cmake ..
make -j8
sudo make install
cd ../test/cocl
cocl -fPIC cuda-sample.cu
./cuda-sample  # make sure that cocl works
2) git clone --recursive https://github.com/hughperkins/tensorflow-cl
./configure
bazel run --verbose_failures --logging 6 //tensorflow/tools/pip_package:build_pip_package
ERROR: /Work1/OpenCL/tensorflow-cl/tensorflow/python/BUILD:1773:1: in cc_library rule //tensorflow/python:tf_session_helper: non-test target '//tensorflow/python:tf_session_helper' depends on testonly target '//tensorflow/python:construction_fails_op' and doesn't have testonly attribute set.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted.
INFO: Elapsed time: 0.240s
ERROR: Build failed. Not running target
I followed the build instructions step by step on my Ubuntu 16.04 x64 system, but ran into an error during the build.
The end of the log follows.
...
INFO: From Compiling tensorflow/core/lib/io/inputbuffer.cc:
tensorflow/core/lib/io/inputbuffer.cc: In member function 'tensorflow::Status tensorflow::io::InputBuffer::ReadNBytes(tensorflow::int64, std::__cxx11::string*)':
tensorflow/core/lib/io/inputbuffer.cc:81:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_read < bytes_to_read) result->resize(bytes_read);
^
At global scope:
cc1plus: warning: unrecognized command line option '-Wno-unused-local-typedef'
cc1plus: warning: unrecognized command line option '-Wno-c++11-narrowing'
INFO: From Compiling tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc: In member function 'tensorflow::Status tensorflow::io::SnappyOutputBuffer::Write(tensorflow::StringPiece)':
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:42:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_to_write <= AvailableInputSpace()) {
^
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:53:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_to_write <= AvailableInputSpace()) {
^
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc: In member function 'void tensorflow::io::SnappyOutputBuffer::AddToInputBuffer(tensorflow::StringPiece)':
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:110:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_to_write > free_tail_bytes) {
^
At global scope:
cc1plus: warning: unrecognized command line option '-Wno-unused-local-typedef'
cc1plus: warning: unrecognized command line option '-Wno-c++11-narrowing'
ERROR: /work/ml/tensorflow-cl/tensorflow/core/BUILD:956:1: undeclared inclusion(s) in rule '//tensorflow/core:lib_internal':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/platform/profile_utils/cpu_utils.cc':
'/work/ml/tensorflow-cl/tensorflow/core/platform/profile_utils/android_armv7a_cpu_utils_helper.h'.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 110.980s, Critical Path: 102.10s
ERROR: Build failed. Not running target.
Actually, I do not understand why android_armv7a is involved; my system is Ubuntu 16.04 with Intel Skylake.
Is there anything I can try? Thanks.
'tf.random_normal' broken on Ubuntu 16.04/NVIDIA
(env3)~/tf-coriander/tensorflow/models/image/mnist$ python convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
OpenCL platform: Intel Gen OCL Driver
OpenCL device: Intel(R) HD Graphics Haswell CRW GT3 Desktop
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Intel(R) HD Graphics Haswell CRW GT3 Desktop
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 2.00GiB
Free memory: 1.50GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) HD Graphics Haswell CRW GT3 Desktop, pci bus id: 0000.0000)
cl_driver DeviceAllocate 1400897536
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer
kernel build error:
Something went wrong with clCreateKernel, OpenCL error code -45
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer
storing failed kernel into: easycl-failedkernel.cl
compileOpenCLKernel failed to compile opencl sourcecode
unique kernel name _ZN10tensorflow7functor28FillPhiloxRandomKernelLaunchINS_6random27TruncatedNormalDistributionINS2_19SingleSampleAdapterINS2_12PhiloxRandomEEEfEEEEvS5_PNT_17ResultElementTypeExS8__1_2_3
short kernel name _ZN10tensorflow7func
writing ll to /tmp/failed-kernel.ll
writing cl to /tmp/failed-kernel.cl
caught runtime error Something went wrong with clCreateKernel, OpenCL error code -45
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer
storing failed kernel into: easycl-failedkernel.cl
terminate called after throwing an instance of 'std::runtime_error'
what(): Something went wrong with clCreateKernel, OpenCL error code -45
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer
storing failed kernel into: easycl-failedkernel.cl
Aborted (core dumped)
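For reading logs like the one above: in the OpenCL headers, error code -45 is CL_INVALID_PROGRAM_EXECUTABLE, which clCreateKernel returns when the program has no successfully built executable. That matches the address-space compile error in the build log, so the kernel creation failure is a follow-on symptom rather than a separate bug. Below is a small sketch for pulling the codes out of a captured log; the table covers only a handful of codes, and explain_opencl_errors is an illustrative name, not part of tf-coriander.

```python
import re

# A few error codes from CL/cl.h; not an exhaustive table.
OPENCL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -44: "CL_INVALID_PROGRAM",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -46: "CL_INVALID_KERNEL_NAME",
}

ERROR_RE = re.compile(r"OpenCL error code (-?\d+)")


def explain_opencl_errors(log_text):
    """Map each distinct 'OpenCL error code N' in the log to its symbolic name."""
    codes = sorted({int(c) for c in ERROR_RE.findall(log_text)})
    return {c: OPENCL_ERRORS.get(c, "unknown (see CL/cl.h)") for c in codes}


if __name__ == "__main__":
    line = "Something went wrong with clCreateKernel, OpenCL error code -45"
    print(explain_opencl_errors(line))
```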
python3 recurrent_network.py
crashes, dragging down Xorg with it.
I can't copy/paste the error from Ubuntu's bugchecker, for some reason (???), so I screenshotted the information, which included stack and traceback information. But, the basic error appears to land within the AMD GPU Pro driver code, with it trying to execute at 0x00... so perhaps a function pointer somewhere is being passed as null instead of pointing to an OpenCL function as intended?
Please enjoy a wall of images..
The install from source instructions say:
# build tensorflow
source ~/env3/bin/activate
But when running that I get:
$ source ~/env3/bin/activate
bash: /home/ben/env3/bin/activate: No such file or directory
And then all steps after that fail.
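The activate step presupposes that a Python 3 virtual environment already exists at ~/env3. If it was never created, something along the following lines should produce one. This is a sketch using the standard-library venv module; the original instructions may well have used a different tool (for example virtualenv), and ensure_venv is a made-up helper name.

```python
import os
import venv
from pathlib import Path


def ensure_venv(env_dir, with_pip=True):
    """Create a virtual environment at env_dir if it is missing.

    Returns the path of the activate script inside the environment.
    """
    env_dir = Path(env_dir).expanduser()
    scripts = "Scripts" if os.name == "nt" else "bin"
    activate = env_dir / scripts / "activate"
    if not activate.exists():
        venv.create(str(env_dir), with_pip=with_pip)
    return activate
```

After calling ensure_venv("~/env3") once, source ~/env3/bin/activate should find the script.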
Hugh, a simple HowTo for building this repo would be helpful. I was unable to follow the one on the official TensorFlow webpage. Here is what I did:
a.) Used the CMake build from tensorflow
b.) Had to modify your bzl files to correctly point it to the eigen file (apparently it is a .zip and not .tar.gz)
c.) Started the build. It went well until it reached Eigen
d.) It was complaining about a missing file "cuda_runtime.h". Now I presume your code should handle that. Is there any #ifdef that needs to be specified?
Relevant test:
import numpy as np
import tensorflow as tf


def test_split():
    shape = (12, 1)
    graph = tf.Graph()
    with graph.as_default():
        with tf.device('/gpu:0'):
            a_tf = tf.placeholder(tf.float32, shape)
            # Split along dimension 0 into 4 equal parts
            c_tf = tf.split(0, 4, a_tf)
        sess = tf.Session()
        with sess.as_default():
            a = np.random.randn(*shape).astype(np.float32)
            c = sess.run(c_tf, feed_dict={a_tf: a})
            if np.prod(shape) < 20:
                print('a', a)
                print('c', c)
Result:
a [[ 0.039643 ]
[ 1.02737081]
[-1.39692032]
[-0.08065519]
[ 0.77159059]
[ 1.21571183]
[ 0.12854558]
[ 3.13103628]
[-0.31965023]
[-0.41063583]
[-1.0400176 ]
[ 0.10558813]]
c [array([[ nan],
[ nan],
[ nan]], dtype=float32), array([[ nan],
[ nan],
[ nan]], dtype=float32), array([[ nan],
[ nan],
[ nan]], dtype=float32), array([[ nan],
[ nan],
[ nan]], dtype=float32)]
(should not be nans...)
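For comparison, the expected result is simply the input cut into four consecutive (3, 1) blocks. A CPU-side reference check can make that explicit. This is a sketch using NumPy only; reference_split and check_split are illustrative names and are not part of the repository's test suite.

```python
import numpy as np


def reference_split(a, num_splits, axis=0):
    """CPU reference: split `a` into `num_splits` equal parts along `axis`."""
    return np.split(a, num_splits, axis=axis)


def check_split(a, parts, num_splits, axis=0):
    """True if `parts` matches the NumPy reference and contains no NaNs."""
    expected = reference_split(a, num_splits, axis)
    if len(parts) != len(expected):
        return False
    return all(
        np.asarray(p).shape == e.shape
        and not np.isnan(p).any()
        and np.allclose(p, e)
        for p, e in zip(parts, expected)
    )


if __name__ == "__main__":
    a = np.random.randn(12, 1).astype(np.float32)
    good = reference_split(a, 4)
    bad = [np.full((3, 1), np.nan, dtype=np.float32) for _ in range(4)]
    print(check_split(a, good, 4))
    print(check_split(a, bad, 4))
```

Feeding the `c` returned by sess.run into check_split alongside `a` would turn the visual inspection above into an assertion.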
Update: it looks like this involves passing a float ** into the kernel :-P . This is one of the kernel parameters:
struct tensorflow__CudaDeviceArrayStruct {
    int f0;
    float* f1[8];
    global float** f2;
};
(in bytecode:
%"struct.tensorflow::CudaDeviceArrayStruct" = type { i32, [8 x float*], float** }
)
Is there a quick way to test my installation to ensure that tensorflow is using the GPU and OpenCL as desired?
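One rough check is to look at what the session prints on startup. The logs quoted elsewhere in this thread show an "OpenCL platform:" line, an "OpenCL device:" line, and a "Creating TensorFlow device (/gpu:0)" line when the GPU is picked up. The sketch below summarizes those lines from captured output; it assumes the log format stays as quoted here, and summarize_startup_log is a made-up helper, not a tf-coriander API. Creating the session with tf.ConfigProto(log_device_placement=True), as in a later post in this thread, is the more direct route for seeing where individual ops are placed.

```python
import re

DEVICE_RE = re.compile(
    r"Creating TensorFlow device \((?P<tf_device>/gpu:\d+)\) -> "
    r"\(device: (?P<ordinal>\d+), name: (?P<name>.*?), "
    r"pci bus id: (?P<pci>[^)]*)\)"
)


def summarize_startup_log(log_text):
    """Extract the OpenCL platform/device and the TensorFlow GPU devices."""
    summary = {"platform": None, "opencl_device": None, "tf_devices": []}
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("OpenCL platform:"):
            summary["platform"] = line.split(":", 1)[1].strip()
        elif line.startswith("OpenCL device:"):
            summary["opencl_device"] = line.split(":", 1)[1].strip()
        else:
            match = DEVICE_RE.search(line)
            if match:
                summary["tf_devices"].append(
                    (match.group("tf_device"), match.group("name"))
                )
    return summary
```

An empty tf_devices list in the summary suggests that no /gpu device was created and the work is running on the CPU.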
InlinedVector doesn't work on Mac for InUse structs, so it was replaced with std::vector (which was done in 8a02ae2...e97b994 to fix #34).
There is an opportunity for someone to look into why InlinedVector doesn't work for InUse objects, and specifically for objects containing std::function, I think, so we can switch back to InlinedVector.
To start work on this issue:
In tensorflow/core/common_runtime/gpu/gpu_event_mgr.h, look for the line typedef std::vector<InUse> ToFreeVector;
Change it to typedef gtl::InlinedVector<InUse, 4> ToFreeVector;
Update PollEvents, in tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc, to reflect this change.
Run ./configure, and do a build.
Benefits of fixing this:
Anything to check before working on this?
std::vector, in the gpu_event_mgr, even in the presence of the segfault, and see if it actually changes anything
std::vector
By the way, how to fix this?
std::function to the alignment union in tensorflow/core/lib/gtl/inlined_vector.h, for the u_ member
std::function requires larger alignment than a single pointer?
CL_GPUOFFSET=1 to choose eg gpu 2
with tf.device('/gpu:1'): seem to work
Ref: try doing a Mac build, and log an issue...
tensorflow/tensorflow#22 (comment)
Operating System: Mac
tensorflow/tensorflow#22 (comment)
Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
Cuda is not supported on Mac Intel
If installed from binary pip package, provide:
The output from python -c "import tensorflow; print(tensorflow.__version__)".
derek$ python -c "import tensorflow; print(tensorflow.__version__)"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "tensorflow/python/__init__.py", line 60, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
ImportError: cannot import name pywrap_tensorflow
Error importing tensorflow. Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there.
If installed from source, provide
The commit hash (git rev-parse HEAD)
The output of bazel version
-bash: bazel: command not found
Hi,
The build on Ubuntu 16.04 isn't quite working. I broke it by "upgrading" the instructions to use bazel 0.4.5, instead of 0.3.2. It fails because it cannot find protoc. If someone has a moment to take a look, it would be much appreciated :-) . (I'm busy fixing Mac/Radeon stuff at the moment, personally)
can I ask a stupid question:
I don't see any .cu cuda files in tensorflow source code, how to use cocl to transform them into cl files?
Mac build doesn't run yet.
Please feel free to post/subscribe to this issue/thread, to receive updates on this point. (Or you can simply 'watch' the repository).
Edit: note that I'm targeting Mac Sierra, with Radeon Pro 450, for now.
If I run a command to show me the gpus, I get the following error:
Commands:
python3
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Error:
CommandLine Error:
Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
It's on Ubuntu 16.04 using the pip3 install of the downloaded wheel.