hughperkins / tf-coriander

OpenCL 1.2 implementation for Tensorflow

License: Apache License 2.0

Python 41.57% C++ 45.98% C 0.32% Java 0.17% CMake 0.23% Objective-C 0.02% Objective-C++ 0.22% Makefile 0.09% Shell 0.94% Jupyter Notebook 6.20% HTML 1.71% Go 0.16% JavaScript 0.04% TypeScript 2.35% CSS 0.01%

opencl tensorflow gpu mac radeon intel nvidia ubuntu

tf-coriander's Issues

`tf.reduce_sum` seems to fail sometimes, on Mac

Sometimes the result sums to e.g. 1e-23.

I think this is because of my removing the guards in Eigen's TensorCudaReduction.h. I'm going to try re-adding the guards, and see if that fixes it. https://bitbucket.org/hughperkins/eigen/commits/4e47de64dcc4a106407893069409ab6ba95509d5

Segfault on Session Initialisation, Beignet, Intel Laptop Integrated Graphics

Consider this very low-priority, at least from me! I mostly use tf-coriander from my Desktop with dedicated graphics, not from my laptop. And, my integrated graphics may well be slower than my laptop CPU, I'm not sure.

Anyways, I just loaded the latest release (0.17.3) to try it out, and it segfaults on Session initialisation:

cathal@europa:~$ source venvs/tf-cl/bin/activate
(tf-cl) cathal@europa:~$ ipython
Python 3.5.2+ (default, Sep 22 2016, 12:18:14) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf
In [2]: sess = tf.Session()
OpenCL platform: Intel Gen OCL Driver
OpenCL device: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 2.00GiB
Free memory: 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile, pci bus id: 0000.0000)
cl_driver DeviceAllocate 864026624
Segmentation fault (core dumped)
(tf-cl) cathal@europa:~$

If it helps, here's my clinfo output:

(tf-cl) cathal@europa:~$ clinfo
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 1.2 beignet 1.1.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
  Platform Extensions function suffix             Intel

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.1.2
  Driver Version                                  1.1.2
  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               20
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None, None, None
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Preferred work group size multiple              16
  Preferred / native vector sizes                 
    char                                                16 / 8       
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 8        (n/a)
    float                                                4 / 4       
    double                                               0 / 2        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Error Correction support                        No
  Max memory allocation                           1073741824 (1024MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8192
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             8192x8192x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Global
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel Gen OCL Driver
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [Intel]
  clCreateContext(NULL, ...) [default]            Success [Intel]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.9
  ICD loader Profile                              OpenCL 2.1

Presented only in the spirit of compatibility and correctness, I don't personally need this to work right now. Though, it might improve the odds of using this for tutorials or workshops if it worked on Intel / Beignet.

undefined symbol: _ZN7clblast13CacheClearAllEv

Can't load python module:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/../third_party/cuda-on-cl/libcocl.so: undefined symbol: _ZN7clblast13CacheClearAllEv

No symbol similar to _ZN7clblast13CacheClearAllEv was found with

nm /usr/local/lib/python3.5/dist-packages/tensorflow/python/../third_party/cuda-on-cl/libclblast.so|grep -i cache|grep -i clear
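
For anyone searching for the missing symbol, it may help to know its readable name. Below is a minimal sketch of a demangler for this one simple shape of Itanium C++ ABI name (a nested name made of plain identifiers, taking no arguments). The function name `demangle_simple` is hypothetical and only illustrative; it is not a replacement for `c++filt`.

```python
import re


def demangle_simple(symbol):
    """Demangle symbols of the form _ZN<len><name><len><name>...Ev.

    Handles only nested names built from plain identifiers with a void
    parameter list. Returns None for anything more complicated.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos].isdigit():
        digits = re.match(r"\d+", symbol[pos:]).group()
        length = int(digits)
        pos += len(digits)
        parts.append(symbol[pos:pos + length])
        pos += length
    # 'E' closes the nested name, 'v' means the function takes no arguments.
    if not parts or symbol[pos:] != "Ev":
        return None
    return "::".join(parts) + "()"


print(demangle_simple("_ZN7clblast13CacheClearAllEv"))
```

So libcocl.so expects a function `CacheClearAll` in the `clblast` namespace. On Linux, `nm -D` lists the dynamic symbol table of a shared library and `c++filt` demangles names, which may make the grep easier to read.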

Enhancement: Patreon

I wasn't kidding in the Tensorflow discussion; I use a bunch of software that you maintain, including the CL ports of Torch7. Soon, I hope to be able to use Tensorflow thanks to your efforts. I bought an AMD GPU rather than an NVidia one because I believe in open standards, and also because I could see a handful of dedicated people were working on OpenCL ports of the big frameworks; chiefly yourself.

Well, buying AMD saved me literally hundreds of Euro for the same performance. Please give me a chance to contribute some of that saving towards your efforts, as a way to say thank-you. :)

Radeon GPU (macbook pro) Setup Question

Hi Hugh,

Thanks for your great work!
I was wondering if you would be able to add a step-by-step tutorial on how to set up the code for using a Radeon Pro GPU on a MacBook with Tensorflow? It'd be very helpful!

Best,
Jacob

"Please set CLANG_HOME"

cocl on branch origin/working-on-incremental-function-building
tensorflow-cl on branch tensorflow-cl

INFO: From Compiling tensorflow/core/kernels/gather_functor_gpu.cu.cc:

Please set CLANG_HOME

cocl built with CLANG_HOME=/usr/lib/llvm-3.8 and installed into /usr/local/

Notes on Macbook Air -- Using Intel HD 5000

I've managed to install tf-coriander on OS X Mavericks using the wheel file.
Thanks a lot!

I don't know where to put this. Should I put it in the wiki?
The least I can do is document my experience for anyone who comes to this package.
Looking forward to Keras v2.0 compatibility.

Keras ver 1.1.1
TF v0.11.0rc0

I realized the documentation on keras.io is already written for the new version, so examples copied from there may not work with the older Keras used here.

example:
documentation:
keras.utils.to_categorical(y, num_classes)
model.fit - uses epochs as a parameter

keras v1.1.1
keras.utils.np_utils.to_categorical(y, nb_classes)
model.fit - uses nb_epoch as a parameter
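
The renames above can be captured in a small lookup table. This is only a sketch: `KERAS2_TO_KERAS1` and `to_keras1_kwargs` are hypothetical names, and the table contains just the two renames listed in these notes, not a complete list of differences between the versions.

```python
# Keras 2 keyword name -> Keras 1.1.1 keyword name (only the two noted above).
KERAS2_TO_KERAS1 = {
    "epochs": "nb_epoch",
    "num_classes": "nb_classes",
}


def to_keras1_kwargs(kwargs):
    """Return a copy of kwargs with Keras 2 keyword names renamed for Keras 1.1.1."""
    return {KERAS2_TO_KERAS1.get(name, name): value
            for name, value in kwargs.items()}


# Arguments copied from a current keras.io example...
fit_kwargs = {"epochs": 10, "batch_size": 32}
# ...renamed so they could be passed as model.fit(x, y, **renamed) on Keras 1.1.1.
renamed = to_keras1_kwargs(fit_kwargs)
print(renamed)
```

Keywords that are not in the table pass through unchanged.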

unable to install .whl directly

hi,

my system is Ubuntu 16.04 64-bit. i tried both pip and pip3, and also tried symlinking (ln -s) python to python2 and to python3 under /usr/bin

$ pip install --upgrade tensorflow-0.11.0rc0-py3-none-any.whl
tensorflow-0.11.0rc0-py3-none-any.whl is not a supported wheel on this platform.

i'm new to python, and not sure if this is an issue with my python environment. do you happen to know how to debug/fix it? thanks.
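
One thing worth ruling out is that the pip being run belongs to a Python 2 interpreter, since the wheel filename carries a `py3` tag. The sketch below splits a wheel filename into its tags following the PEP 427 naming convention and checks the Python tag against an interpreter's major version. The helper names `wheel_tags` and `python_tag_matches` are hypothetical, and the check only understands generic tags such as `py3` or `py2.py3`; it is not the full compatibility logic pip uses, so a mismatch here is only one possible cause of the error.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into its PEP 427 fields.

    Layout: name-version[-build]-python_tag-abi_tag-platform_tag.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1],
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


def python_tag_matches(python_tag, major=None):
    """True if a generic tag like 'py3' or 'py2.py3' covers this major version."""
    if major is None:
        major = sys.version_info[0]
    return ("py%d" % major) in python_tag.split(".")


tags = wheel_tags("tensorflow-0.11.0rc0-py3-none-any.whl")
print(tags)
print("matches running interpreter:", python_tag_matches(tags["python"]))
```

Running `pip --version` shows which Python interpreter a given pip command is attached to, which can be compared against the `python` field printed above.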

`tf.truncated_normal` fails to run

I'm playing with MIDI-generative LSTMs, in code that's supposedly built for TF 0.10.x. I had to make some small modifications to get it to run, but I don't think those should have any bearing on the error below:

(Again, like #42, this isn't a high-priority bug for me)

(tf-cl) cathal@thinkum:~/Downloads/MusicGenerator$ python3 main.py --dataset_tag satie --model_tag satie
Welcome to DeepMusic v0.1 !

TensorFlow detected: v0.11.0rc0

Current parameters:
glob_step: 0
keep_all: False
dataset_tag: satie
sample_length: 40
hidden_size: 512
num_layers: 2
target_weights: linear
scheduled_sampling: none
batch_size: 64
save_every: 1000
ratio_dataset: 0.9
testing_curve: 10
batch_builder: relative
learning_rate: cst
enco_cell: identity
deco_cell: lstm
loop_processing: sample_softmax

Restoring dataset from /home/cathal/Downloads/MusicGenerator/data/samples/satie-relative.pkl...
Loaded: 18 songs (16 train/2 test)
Model creation...
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: gfx803
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: gfx803
major: -1 minor: -1 memoryClockRate (GHz) 1266
pciBusID 0000.0000
Total memory: 8.00GiB
Free memory: 6.00GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: gfx803, pci bus id: 0000.0000)
cl_driver DeviceAllocate 6120328192
Initialize variables...
fabs is called, but not defined
This is probalby a bug in Coriander. Please file an issue at https://github.com/hughperkins/coriander/issues/new
basicblockdumper.runGeneration got exception whilst processing:
  %373 = call double @fabs(double %372) #8

generateOpenCL failed to generate opencl sourcecode
kernel name orig=_ZN10tensorflow7functor28FillPhiloxRandomKernelLaunchINS_6random27TruncatedNormalDistributionINS2_19SingleSampleAdapterINS2_12PhiloxRandomEEEfEEEEvS5_PNT_17ResultElementTypeExS8_
kernel name short=_ZN10tensorflow7func
kernel name unique=_ZN10tensorflow7functor28FillPhiloxRandomKernelLaunchINS_6random27TruncatedNormalDistributionINS2_19SingleSampleAdapterINS2_12PhiloxRandomEEEfEEEEvS5_PNT_17ResultElementTypeExS8__0_1_2
writing ll to /tmp/failed-kernel.ll
caught runtime error fabs is called, but not defined => cannot continue.  Sorry :-(
terminate called after throwing an instance of 'std::runtime_error'
  what():  fabs is called, but not defined => cannot continue.  Sorry :-(
Aborted (core dumped)

Explicit Device Specification doesn't work?

So, just decided to pull https://github.com/hughperkins/TensorFlow-Examples and run a few of the examples, to see how things are going since the fix to #34 and the addition of working ADAM.

The examples that specify a device always crash. Here's an example for 3_NeuralNetworks/dynamic_rnn.py:

cathal@thinkum:~/TensorFlow-Examples/examples/3_NeuralNetworks$ python3 dynamic_rnn.py 
/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 7.57GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
cl_driver DeviceAllocate 3930062848
Traceback (most recent call last):
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 972, in _do_call
    return fn(*args)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in _run_fn
    self._extend_graph()
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 999, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'split': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices: 
Switch: GPU CPU 
Split: CPU 
	 [[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dynamic_rnn.py", line 170, in <module>
    sess.run(init)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'split': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices: 
Switch: GPU CPU 
Split: CPU 
	 [[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]

Caused by op 'split', defined at:
  File "dynamic_rnn.py", line 155, in <module>
    pred = dynamicRNN(x, seqlen, weights, biases)
  File "dynamic_rnn.py", line 123, in dynamicRNN
    x = tf.split(0, seq_max_len, x)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1036, in split
    name=name)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2621, in _split
    num_split=num_split, name=name)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in apply_op
    op_def=op_def)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2388, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1300, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'split': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices: 
Switch: GPU CPU 
Split: CPU 
	 [[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]
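
The "Colocation Debug Info" block in the error above lists, for each op type in the group, the device types it has kernels for. Here `Split` lists only CPU, which would explain why pinning it to `/device:GPU:0` fails. A minimal sketch for pulling those op types out of a pasted error message follows; the function name `ops_without_gpu_kernel` is hypothetical and the parsing assumes the message layout shown in this log.

```python
SAMPLE_ERROR = """Colocation Debug Info:
Colocation group had the following types and devices:
Switch: GPU CPU
Split: CPU
[[Node: split = Split[T=DT_FLOAT, num_split=20, _device="/device:GPU:0"](split/split_dim, Reshape)]]
"""


def ops_without_gpu_kernel(error_text):
    """Return op types in a colocation group that list no GPU device."""
    missing = []
    in_group = False
    for raw_line in error_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Colocation group had the following types and devices"):
            in_group = True
            continue
        if not in_group:
            continue
        # The group ends at the first line that is not "OpType: devices".
        if ":" not in line or line.startswith("[["):
            in_group = False
            continue
        op_type, devices = line.split(":", 1)
        if "GPU" not in devices.split() and op_type not in missing:
            missing.append(op_type)
    return missing


print(ops_without_gpu_kernel(SAMPLE_ERROR))
```

In stock TensorFlow, creating the session with `tf.ConfigProto(allow_soft_placement=True)` lets the placer fall back to the CPU for ops without a GPU kernel; whether that is enough for these examples on tf-coriander is not confirmed in this issue.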

If I change the specification to :1 instead of :0 (because ???) I get this instead:

cathal@thinkum:~/TensorFlow-Examples/examples/3_NeuralNetworks$ python3 dynamic_rnn.py 
/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
OpenCL platform: AMD Accelerated Parallel Processing
OpenCL device: Hawaii
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Hawaii
major: -1 minor: -1 memoryClockRate (GHz) 1040
pciBusID 0000.0000
Total memory: 7.57GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
cl_driver DeviceAllocate 3930062848
Traceback (most recent call last):
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 972, in _do_call
    return fn(*args)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in _run_fn
    self._extend_graph()
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 999, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
	 [[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:1"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dynamic_rnn.py", line 170, in <module>
    sess.run(init)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
	 [[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:1"]()]]

Caused by op 'GradientDescent/learning_rate', defined at:
  File "dynamic_rnn.py", line 159, in <module>
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 198, in minimize
    name=name)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 314, in apply_gradients
    self._prepare()
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/training/gradient_descent.py", line 62, in _prepare
    name="learning_rate")
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 657, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 180, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 167, in constant
    attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2388, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/cathal/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1300, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'GradientDescent/learning_rate': Could not satisfy explicit device specification '/device:GPU:1' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0, /job:localhost/replica:0/task:0/gpu:0
	 [[Node: GradientDescent/learning_rate = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [] values: 0.01>, _device="/device:GPU:1"]()]]

My clinfo output:

cathal@thinkum:~/TensorFlow-Examples/examples/3_NeuralNetworks$ clinfo
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (2348.3)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 2
  Device Name                                     Hawaii
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         AMD Radeon (TM) R9 390 Series
  Device Topology (AMD)                           PCI-E, 01:00.0
  Max compute units                               40
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1040MHz
  Graphics IP (AMD)                               7.2
  Device Partition                                (core)
    Max number of sub-devices                     40
    Supported partition types                     none specified
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              8131137536 (7.573GiB)
  Global free memory (AMD)                        7920852 (7.554GiB)
  Global memory channels (AMD)                    16
  Global memory banks per channel (AMD)           16
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           4244635648 (3.953GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max constant buffer size                        4244635648 (3.953GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1496133664638937392ns (Tue May 30 09:41:04 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

  Device Name                                     Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
  Device Vendor                                   GenuineIntel
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2348.3)
  Driver Version                                  2348.3 (sse2,avx)
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         
  Device Topology (AMD)                           (n/a)
  Max compute units                               4
  Max clock frequency                             799MHz
  Device Partition                                (core, cl_ext_device_fission)
    Max number of sub-devices                     4
    Supported partition types                     equally, by counts, by affinity domain
    Supported affinity domains                    L3 cache, L2 cache, L1 cache, next partitionable
    Supported partition types (ext)               equally, by counts, by affinity domain
    Supported affinity domains (ext)              L3 cache, L2 cache, L1 cache, next fissionable
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple              1
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 4 / 4        (n/a)
    float                                                8 / 8       
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              16788918272 (15.64GiB)
  Error Correction support                        No
  Max memory allocation                           4197229568 (3.909GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        32768
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     4096 (4KiB)
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1496133664638937392ns (Tue May 30 09:41:04 2017)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            65536 (64KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found

The error comes from
File "/home/bixian/.virtualenvs/tensorflow2/lib/python3.4/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
when I tried to import tensorflow as tf, and it says
ImportError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by /home/bixian/.virtualenvs/tensorflow2/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow.so)

I tried
objdump -p _pywrap_tensorflow.so
and it says
...blablabla...
Version References:
  required from libm.so.6:
    0x09691a75 0x00 07 GLIBC_2.2.5
    0x06969183 0x00 05 GLIBC_2.23
...blablabla...

I also tried
nm _pywrap_tensorflow.so|grep GLIBC_2.23
and it says
U lgammaf@@GLIBC_2.23
U lgamma@@GLIBC_2.23

I believe this problem comes from my Ubuntu installation, which is Ubuntu 14.04.5 LTS with a custom Linux kernel, 3.16.36-031636-generic. When I ran sudo apt-cache policy libc6, it reported libc6: Installed: 2.19-0ubuntu6.9 Candidate: 2.19-0ubuntu6.9 ...blablabla.

Besides, I use Python 3.4.3 in a virtualenv as the interpreter.
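
The report shows a wheel that needs GLIBC_2.23 running against an installed libc6 2.19. A minimal sketch of that comparison follows; the helper names `parse_glibc_version` and `glibc_satisfies` are made up for illustration, and the required version `(2, 23)` is taken from the error message above. It only checks versions and does not fix anything.

```python
import os


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.19' into a tuple such as (2, 19)."""
    version = text.split()[-1]
    return tuple(int(part) for part in version.split("."))


def glibc_satisfies(installed_text, required=(2, 23)):
    """Return True when the installed glibc is at least the required version."""
    return parse_glibc_version(installed_text) >= required


if __name__ == "__main__":
    # os.confstr("CS_GNU_LIBC_VERSION") is only meaningful on glibc-based Linux.
    try:
        installed = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.19"
    except (ValueError, OSError, AttributeError):
        installed = None
    if installed:
        verdict = "ok" if glibc_satisfies(installed) else "too old for this wheel"
        print(installed, "->", verdict)
    else:
        print("glibc version not available on this platform")
```

Versions are compared as integer tuples rather than strings, so that 2.3 is correctly treated as older than 2.23.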

How can I install on Windows?

I have a Radeon RX 470 GPU installed on Windows 10. Do I have any option for running TensorFlow on this GPU?

Testing setup fails, directory not found

After installing the wheel using pip3, I ran the test setup and got the following error:
~$ pip install -r tensorflow/stream_executor/cl/test/requirements.txt
Could not open requirements file: [Errno 2] No such file or directory: 'tensorflow/stream_executor/cl/test/requirements.txt'
I spent all day trying to find the file, and it doesn't exist.

On Mac, training operation broken caused seg fault, using Sierra/Radeon

On Mac, using Sierra/Radeon, the training operation is broken and causes a segfault.

I.e., the forward direction of a linear regression works OK:

'''
A linear regression learning algorithm example using TensorFlow library.

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 1000
training_epochs = 5
display_step = 50

with tf.device('/gpu:0'):
    # Training Data
    train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                             7.042,10.791,5.313,7.997,5.654,9.27,3.1])
    train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                             2.827,3.465,1.65,2.904,2.42,2.94,1.3])
    n_samples = train_X.shape[0]

    # tf Graph Input
    X = tf.placeholder("float")
    Y = tf.placeholder("float")

    # Set model weights
    W = tf.Variable(rng.randn(), name="weight")
    b = tf.Variable(rng.randn(), name="bias")

    # Construct a linear model
    pred = tf.add(tf.mul(X, W), b)

    # Mean squared error
    cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
    # Gradient descent
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    # Initializing the variables
    init = tf.initialize_all_variables()

    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)


        # Fit all training data
        for epoch in range(training_epochs):
            batch_num = 0
            for (x, y) in zip(train_X, train_Y):
                x = sess.run(X, feed_dict={X: x})
                if batch_num == 0:
                    print('epoch %s' % epoch)
                    X_val, Y_val, W_val, b_val = sess.run((X, Y, W, b), feed_dict={X: x, Y: y})
                    print(X_val, Y_val, W_val, b_val)
                    print('pred', sess.run(pred, feed_dict={X: x, Y: y}))
                    print('cost', sess.run(cost, feed_dict={X: x, Y: y}))
                batch_num += 1

... but adding the optimizer operation causes a segfault:

        # Fit all training data
        for epoch in range(training_epochs):
            batch_num = 0
            for (x, y) in zip(train_X, train_Y):
                x = sess.run(X, feed_dict={X: x})
                if batch_num == 0:
                    print('epoch %s' % epoch)
                    X_val, Y_val, W_val, b_val = sess.run((X, Y, W, b), feed_dict={X: x, Y: y})
                    print(X_val, Y_val, W_val, b_val)
                    print('pred', sess.run(pred, feed_dict={X: x, Y: y}))
                    print('cost', sess.run(cost, feed_dict={X: x, Y: y}))
                sess.run(optimizer, feed_dict={X: x, Y: y})
                batch_num += 1

F name _ZN5Eigen8internal15EigenMetaKe
 running generation on _ZN5Eigen8internal15EigenMetaKe
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_20TensorCwiseNullaryOpINS0_15scalar_const_opIfEEKS8_EEEENS_9GpuDeviceEEEiEEvT_T0_
 ... built
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_19TensorCwiseBinaryOpINS0_20scalar_difference_opIffEEKS8_KNS9_INS0_17scalar_product_opIKfSE_EEKNS_20TensorBroadcastingOpIKNS_5arrayIiLm1EEEKNS_17TensorReshapingOpIKNS_5SizesIJLl1EEEEKNS4_INS_15TensorFixedSizeISE_NSL_IJEEELi1EiS7_EELi16ES7_EEEEEEKNS4_INS5_ISE_Li1ELi1EiEELi16ES7_EEEEEEEENS_9GpuDeviceEEEiEEvT_T0_
F name _ZN5Eigen8internal15EigenMetaKe
 running generation on _ZN5Eigen8internal15EigenMetaKe
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_19TensorCwiseBinaryOpINS0_20scalar_difference_opIffEEKS8_KNS9_INS0_17scalar_product_opIKfSE_EEKNS_20TensorBroadcastingOpIKNS_5arrayIiLm1EEEKNS_17TensorReshapingOpIKNS_5SizesIJLl1EEEEKNS4_INS_15TensorFixedSizeISE_NSL_IJEEELi1EiS7_EELi16ES7_EEEEEEKNS4_INS5_ISE_Li1ELi1EiEELi16ES7_EEEEEEEENS_9GpuDeviceEEEiEEvT_T0_
 ... built
Segmentation fault: 11
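
For checking values on the CPU, here is a rough TensorFlow-free sketch of what the script computes for one fed `(x, y)` pair: the same `pred` and `cost` expressions, plus the update that a plain gradient-descent step would apply to `W` and `b`. The function names are hypothetical, and the gradients are derived by hand from the `cost` formula in the script, so treat this as a reference under those assumptions rather than as what the optimizer kernel does internally.

```python
def forward(x, y, w, b, n_samples):
    """Prediction and cost for one (x, y) pair, mirroring pred and cost in the script."""
    pred = x * w + b
    cost = (pred - y) ** 2 / (2.0 * n_samples)
    return pred, cost


def sgd_step(x, y, w, b, n_samples, learning_rate):
    """One plain gradient-descent update of w and b for a single (x, y) pair."""
    pred, _ = forward(x, y, w, b, n_samples)
    # d(cost)/dw = (pred - y) * x / n_samples, d(cost)/db = (pred - y) / n_samples
    grad_w = (pred - y) * x / n_samples
    grad_b = (pred - y) / n_samples
    return w - learning_rate * grad_w, b - learning_rate * grad_b
```

Feeding the values printed by the forward-only loop into `forward` gives numbers to compare against the GPU output.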

I'm taking a look at this issue.

Edit: this seems to have something to do with event handling:

* thread #13, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff0)
  * frame #0: 0x00007fffe6908a59 libc++abi.dylib`__dynamic_cast + 38
    frame #1: 0x00007fffd6966baa OpenCL`___lldb_unnamed_symbol306$$OpenCL + 37
    frame #2: 0x00007fffd6978ef3 OpenCL`clReleaseEvent + 15
    frame #3: 0x000000011ac46667 libcocl.dylib`::cuEventRecord(event=0x000000011b7d7ad0, _queue=<unavailable>) at cocl_events.cpp:92 [opt]
    frame #4: 0x00000001091dac75 _pywrap_tensorflow.so`perftools::gputools::cl::CLDriver::RecordEvent(context=0x000000012120db20, event=0x000000011b7d7ad0, stream="0W\x86\x1b\x01") at cl_driver.cc:1121
    frame #5: 0x00000001091eae95 _pywrap_tensorflow.so`perftools::gputools::cl::CLExecutor::CreateStreamDependency(this=0x000000011e55baf0, dependent=0x000000011b7de0a0, other=0x000000011b7a4f00) at cl_gpu_executor.cc:730
    frame #6: 0x0000000109272b64 _pywrap_tensorflow.so`perftools::gputools::StreamExecutor::CreateStreamDependency(this=0x000000011e55b190, dependent=0x000000011b7de0a0, other=0x000000011b7a4f00) at stream_executor_pimpl.cc:635
    frame #7: 0x0000000109246bc8 _pywrap_tensorflow.so`perftools::gputools::Stream::ThenWaitFor(this=0x000000011b7de0a0, other=0x000000011b7a4f00) at stream.cc:1335
    frame #8: 0x00000001091a236f _pywrap_tensorflow.so`tensorflow::GPUUtil::CopyCPUTensorToGPU(cpu_tensor=0x000000011b946e78, device_context=0x000000011e58b130, gpu_device=0x000000011b7a5ea0, gpu_tensor=0x000000011d9cc850, done=0x000000010060cf70)>) at gpu_util.cc:326
    frame #9: 0x00000001091ac613 _pywrap_tensorflow.so`tensorflow::GPUDeviceContext::CopyCPUTensorToDevice(this=0x000000011e58b130, cpu_tensor=0x000000011b946e78, device=0x000000011b7a5ea0, device_tensor=0x000000011d9cc850, done=<unavailable>)>) const at gpu_util_platform_specific.cc:29
    frame #10: 0x00000001096f8fda _pywrap_tensorflow.so`tensorflow::CopyTensor::ViaDMA(edge_name=(data_ = "edge_185__recv_Placeholder_0;0:0", size_ = 28), send_dev_context=0x0000000000000000, recv_dev_context=0x000000011e58b130, src=0x000000011b7ac770, dst=0x000000011b7a5ea0, src_alloc_attr=(value = 4), dst_alloc_attr=(value = 0), input=0x000000011b946e78, output=0x000000011d9cc850, done=0x000000010060bc50)>) at copy_tensor.cc:99
    frame #11: 0x00000001097a5cc9 _pywrap_tensorflow.so`tensorflow::IntraProcessRendezvous::SameWorkerRecvDone(this=0x000000011b946140, parsed=0x0000000101b19968, send_args=0x00007000023bba50, recv_args=0x00007000023baf30, in=0x000000011b946e78, out=0x000000011d9cc850, done=0x000000013d006800)>) at rendezvous_mgr.cc:106

'global_variables_initializer' does not work on 'Intel(R) Iris(TM) Graphics 6100'

OpenCL platform: Apple
OpenCL device: Intel(R) Iris(TM) Graphics 6100
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Intel(R) Iris(TM) Graphics 6100
major: -1 minor: -1 memoryClockRate (GHz) 1050
pciBusID 0000.0000
Total memory: 1.50GiB
Free memory: 384.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) Iris(TM) Graphics 6100, pci bus id: 0000.0000)
cl_driver DeviceAllocate 192937984
Traceback (most recent call last):
File "/Users/user/PycharmProjects/tensorflow-test/cnn.py", line 105, in
sess.run(tf.global_variables_initializer())
AttributeError: module 'tensorflow' has no attribute 'global_variables_initializer'
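
The AttributeError looks like an API-version mismatch rather than something specific to the GPU: the wheel discussed on this page is tensorflow-0.11.0rc0, and the linear regression script earlier on this page calls `tf.initialize_all_variables()`. As far as I know, `tf.global_variables_initializer` only exists in later TensorFlow releases. A small sketch of a fallback follows; `make_init_op` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins so the snippet runs without TensorFlow installed.

```python
from types import SimpleNamespace


def make_init_op(tf_module):
    """Return the variable-initialisation op using whichever name the module exposes."""
    if hasattr(tf_module, "global_variables_initializer"):
        return tf_module.global_variables_initializer()
    # Older name, used by the linear regression script earlier on this page.
    return tf_module.initialize_all_variables()


# Stand-ins for an older and a newer TensorFlow module (not the real library).
old_tf = SimpleNamespace(initialize_all_variables=lambda: "old-style init op")
new_tf = SimpleNamespace(
    global_variables_initializer=lambda: "new-style init op",
    initialize_all_variables=lambda: "old-style init op",
)

print(make_init_op(old_tf))
print(make_init_op(new_tf))
```

In a real script the call would be `sess.run(make_init_op(tf))` after `import tensorflow as tf`.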

Latest github code segfaults on Ubuntu 16.04 / NVIDIA

Using the latest GitHub code on Ubuntu 16.04 / NVIDIA, a bunch of tests pass, but every so often (quite often, unusably often), it segfaults:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000.0000)
cl_driver DeviceAllocate 848478208
Segmentation fault (core dumped)

example backtrace, from gdb, https://gist.github.com/hughperkins/10855efd242b0786c7dfc2aa4075e59a

This looks annoyingly hard to diagnose/debug... :-(

Edit: backtrace with debug build: https://gist.github.com/hughperkins/68f636beb90fa9c8cb6d4687acce9f05

Enhancement: Wiki

Now that `tf-coriander` has reached a state of relative usefulness, it might be helpful to have a wiki to collect information that others have gathered on how to put it to use.

E.g., I identified a version of Keras that I believe to be ~compatible with the version of Tensorflow that tf-coriander represents, and that's information I'm happy to share with people.

Common pitfalls and how to solve them fall into the same category of "tidbits worth sharing".

Basically, anything that applies specifically to using tf-coriander, either due to the older Tensorflow base, or to the OpenCL nature, or to peculiarities of Coriander, could be thrown on a Wiki. Just a thought. :)

`autoencoder.py` hangs, on Mac Sierra/Radeon

On Mac Sierra + Radeon, autoencoder.py hangs and cannot be interrupted with Ctrl-C. The computer continues to work OK (e.g. I can browse the web), but I have to restart the Mac before running any TensorFlow script on the GPU again.

Install fails with wheel error

Following the readme instructions for a standard install gives the following error on Ubuntu 16.04.

$ pip install --upgrade tensorflow-0.11.0rc0-py3-none-any.whl
tensorflow-0.11.0rc0-py3-none-any.whl is not a supported wheel on this platform.

I fixed it by using pip3 instead of pip.
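
A likely reason pip3 works where pip does not: the wheel's filename carries the Python tag `py3`, and a `pip` that belongs to a Python 2 interpreter will not accept it. The sketch below is a simplified check of that tag against an interpreter's major version; `wheel_python_tag` and `matches_interpreter` are hypothetical helpers and much cruder than pip's real compatibility logic.

```python
import sys


def wheel_python_tag(filename):
    """Extract the Python tag from a wheel filename.

    Wheel names end in <python tag>-<abi tag>-<platform tag>.whl,
    so the Python tag is the third field from the end.
    """
    stem = filename[: -len(".whl")]
    return stem.split("-")[-3]


def matches_interpreter(filename, major=None):
    """Return True if the wheel's Python tag names the given major version."""
    if major is None:
        major = sys.version_info[0]
    tags = wheel_python_tag(filename).split(".")
    prefixes = ("py%d" % major, "cp%d" % major)
    return any(tag.startswith(prefixes) for tag in tags)


wheel = "tensorflow-0.11.0rc0-py3-none-any.whl"
print(wheel_python_tag(wheel), matches_interpreter(wheel))
```

Running `pip --version` shows which interpreter a given `pip` command is bound to.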

Reviews (positive/negative)

Please provide a single post stating:

- What were/are you looking for that tempted you to at least click into the tf-coriander GitHub page?
- To what extent did you find it?
  - If not, what is missing? What needs to change to provide the thing(s) that you need?
- If you use it, say that you do, and why.
- If you decided not to use it, say that you are not using it, and why.

This is kind of an experimental approach to getting feedback :-) . But being starred or not doesn't give me much information on what people are looking for, whether they are finding it useful, etc., so I'm going to try this approach :-)

Edit: note that I seem to have started adding 👍 to items to indicate I've read them. I probably won't reply in this thread. If you do want a reply, please consider raising a new issue, which I still might not reply to, but I might...

tensorflow-WHL file has a fixed path dependency to "/Users/hugh2/.../libclew.dylib"

I am using "anaconda" python 3.5 on Mac OSX Sierra. I have downloaded and installed the binary wheel file: "pip install tensorflow-0.11.0rc0-py3-none-any.whl" (version v0.17.2)

When I try to run the tensorflow tests or try to run any of the examples from: https://github.com/aymericdamien/TensorFlow-Examples I get an error:

File "", line 919, in create_module
File "", line 222, in _call_with_frames_removed
ImportError: dlopen(/Users/tomas/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: /Users/hugh2/git-local/tensorflow-llvm40-addingconv/third_party/coriander/build/libclew.dylib
Referenced from: /Users/tomas/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found

The provided binary WHL file has a fixed path dependency to the "libclew.dylib" file in the following directory: "/Users/hugh2/git-local/tensorflow-llvm40-addingconv/third_party/coriander/build/"

Thanks for fixing this problem.
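
Until the wheel is rebuilt, one possible local workaround on macOS is to rewrite the recorded library path with `install_name_tool -change`. This is an untested suggestion, not a confirmed fix. The sketch below only builds the command line; `relink_command` is a hypothetical helper, and `/usr/local/lib/libclew.dylib` is an assumed location for a locally available copy of the library that may not exist on your machine.

```python
def relink_command(binary_path, old_dylib, new_dylib):
    """Build the argv for: install_name_tool -change <old> <new> <binary>."""
    return ["install_name_tool", "-change", old_dylib, new_dylib, binary_path]


# Paths copied from the error message in this report.
binary = ("/Users/tomas/anaconda/envs/py35/lib/python3.5/site-packages/"
          "tensorflow/python/_pywrap_tensorflow.so")
old = ("/Users/hugh2/git-local/tensorflow-llvm40-addingconv/"
       "third_party/coriander/build/libclew.dylib")
new = "/usr/local/lib/libclew.dylib"  # assumption: where a local copy lives

print(" ".join(relink_command(binary, old, new)))
```

`otool -L` on the `.so` file lists the library paths currently recorded in it, which is a way to confirm the old path before and the new path after.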

dynamic_rnn.py crashes in tensorflow-cl

https://github.com/hughperkins/TensorFlow-Examples/blob/as-unit-tests/examples/3_NeuralNetworks/dynamic_rnn.py

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user/PycharmProjects/tensorflow-test/cnn.py
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
OpenCL platform: Apple
OpenCL device: Intel(R) Iris(TM) Graphics 6100
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Intel(R) Iris(TM) Graphics 6100
major: -1 minor: -1 memoryClockRate (GHz) 1050
pciBusID 0000.0000
Total memory: 1.50GiB
Free memory: 384.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) Iris(TM) Graphics 6100, pci bus id: 0000.0000)
cl_driver DeviceAllocate 192937984
__internal__ build log: 
<program source>:31:36: warning: unused variable 'pGlobalVars'
    const struct GlobalVars* const pGlobalVars = &globalVars;
                                   ^
Cannot select: 0x7fed088a1310: i32 = any_extend 0x7fed08909010 [ID=43]
  0x7fed08909010: i32 = IGILISD::IGILSETCC 0x7fed088a1b10, 0x7fed088a1510, 0x7fed0890d010 [ID=42]
    0x7fed088a1b10: i64 = bitcast 0x7fed08918b10 [ID=41]
      0x7fed08918b10: v2i32 = IGILISD::MOVSWZ 0x7fed0890cf10, 0x7fed088a1e10, 0x7fed08918510, 0x7fed08918510 [ID=38]
        0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
          0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
            0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
              0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
            0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
              0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
                0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
                  0x7fed0890c110: i32,i32 = sdivrem 0x7fed08919010, 0x7fed0890ce10 [ID=27]


                  0x7fed0890c610: i32 = sra 0x7fed0890c110, 0x7fed08918910 [ID=29]


                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
                0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
                  0x7fed0890a410: i32 = Constant<2> [ID=16]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
          0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
            0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
        0x7fed088a1e10: i32 = sra 0x7fed0890cf10, 0x7fed08918910 [ID=35]
          0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
            0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
              0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
                0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
              0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
                0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
                  0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]




                0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
                  0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]




            0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
              0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
          0x7fed08918910: i32 = Constant<31> [ID=15]
        0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
        0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
    0x7fed088a1510: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890cb10 [ORD=28] [ID=20]
      0x7fed0890cb10: i64 = Register %vreg60 [ORD=28] [ID=7]
In function: _ZN10tensorflow14Gat
kernel build error:
Something went wrong with clCreateKernel, OpenCL error code -45
__internal__ build log: 
<program source>:31:36: warning: unused variable 'pGlobalVars'
    const struct GlobalVars* const pGlobalVars = &globalVars;
                                   ^
Cannot select: 0x7fed088a1310: i32 = any_extend 0x7fed08909010 [ID=43]
  0x7fed08909010: i32 = IGILISD::IGILSETCC 0x7fed088a1b10, 0x7fed088a1510, 0x7fed0890d010 [ID=42]
    0x7fed088a1b10: i64 = bitcast 0x7fed08918b10 [ID=41]
      0x7fed08918b10: v2i32 = IGILISD::MOVSWZ 0x7fed0890cf10, 0x7fed088a1e10, 0x7fed08918510, 0x7fed08918510 [ID=38]
        0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
          0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
            0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
              0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
            0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
              0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
                0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
                  0x7fed0890c110: i32,i32 = sdivrem 0x7fed08919010, 0x7fed0890ce10 [ID=27]


                  0x7fed0890c610: i32 = sra 0x7fed0890c110, 0x7fed08918910 [ID=29]


                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
                0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
                  0x7fed0890a410: i32 = Constant<2> [ID=16]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
          0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
            0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
        0x7fed088a1e10: i32 = sra 0x7fed0890cf10, 0x7fed08918910 [ID=35]
          0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
            0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
              0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
                0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
              0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
                0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
                  0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]




                0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
                  0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]




            0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
              0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
          0x7fed08918910: i32 = Constant<31> [ID=15]
        0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
        0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
    0x7fed088a1510: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890cb10 [ORD=28] [ID=20]
      0x7fed0890cb10: i64 = Register %vreg60 [ORD=28] [ID=7]
In function: _ZN10tensorflow14Gat
storing failed kernel into: easycl-failedkernel.cl
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Something went wrong with clCreateKernel, OpenCL error code -45
__internal__ build log: 
<program source>:31:36: warning: unused variable 'pGlobalVars'
    const struct GlobalVars* const pGlobalVars = &globalVars;
                                   ^
Cannot select: 0x7fed088a1310: i32 = any_extend 0x7fed08909010 [ID=43]
  0x7fed08909010: i32 = IGILISD::IGILSETCC 0x7fed088a1b10, 0x7fed088a1510, 0x7fed0890d010 [ID=42]
    0x7fed088a1b10: i64 = bitcast 0x7fed08918b10 [ID=41]
      0x7fed08918b10: v2i32 = IGILISD::MOVSWZ 0x7fed0890cf10, 0x7fed088a1e10, 0x7fed08918510, 0x7fed08918510 [ID=38]
        0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
          0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
            0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
              0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
            0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
              0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
                0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]
                  0x7fed0890c110: i32,i32 = sdivrem 0x7fed08919010, 0x7fed0890ce10 [ID=27]


                  0x7fed0890c610: i32 = sra 0x7fed0890c110, 0x7fed08918910 [ID=29]


                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
                0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]
                  0x7fed0890a410: i32 = Constant<2> [ID=16]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                  0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
          0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
            0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
              0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
        0x7fed088a1e10: i32 = sra 0x7fed0890cf10, 0x7fed08918910 [ID=35]
          0x7fed0890cf10: i32,ch = load 0x7fed09a09970, 0x7fed0890c410, 0x7fed08908810<LD4[%28]> [ORD=24] [ID=34]
            0x7fed0890c410: i64 = add 0x7fed0890c210, 0x7fed0890d710 [ORD=23] [ID=33]
              0x7fed0890c210: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890d810 [ORD=19] [ID=17]
                0x7fed0890d810: i64 = Register %vreg1 [ORD=19] [ID=1]
              0x7fed0890d710: i64 = shl 0x7fed08908a10, 0x7fed088a1710 [ORD=23] [ID=32]
                0x7fed08908a10: i64 = bitcast 0x7fed088a1f10 [ID=31]
                  0x7fed088a1f10: v2i32 = IGILISD::MOVSWZ 0x7fed0890c110, 0x7fed0890c610, 0x7fed08918510, 0x7fed08918510 [ID=30]




                0x7fed088a1710: i64 = bitcast 0x7fed088a1810 [ID=26]
                  0x7fed088a1810: v2i32 = IGILISD::MOVSWZ 0x7fed0890a410, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=23]




            0x7fed08908810: i64 = bitcast 0x7fed0890a210 [ID=25]
              0x7fed0890a210: v2i32 = IGILISD::MOVSWZ 0x7fed08918510, 0x7fed08918510, 0x7fed08918510, 0x7fed08918510 [ID=21]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
                0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
          0x7fed08918910: i32 = Constant<31> [ID=15]
        0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
        0x7fed08918510: i32 = Constant<0> [ORD=31] [ID=9]
    0x7fed088a1510: i64,ch = CopyFromReg 0x7fed09a09970, 0x7fed0890cb10 [ORD=28] [ID=20]
      0x7fed0890cb10: i64 = Register %vreg60 [ORD=28] [ID=7]
In function: _ZN10tensorflow14Gat
storing failed kernel into: easycl-failedkernel.cl


Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

GPU runs slower than CPU

When I run the logistic regression example, each epoch takes about 5 seconds on the GPU (RADEON RX460), while it takes about 0.3 seconds on the CPU (i7 4770). My operating system is Ubuntu 16.04 LTS. Note that I'm running the code on the CPU using python2.7, while I use python3 when running on GPU since it doesn't work any other way. But what could be the reason making the GPU run significantly slower?
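One way to make this comparison more precise is to time each epoch separately and report the first epoch apart from the rest, since logs elsewhere on this page show kernels being built at run time. The sketch below is only an illustration: `time_epochs` and `fake_epoch` are hypothetical names, and `fake_epoch` stands in for whatever runs one epoch of the logistic regression example.

```python
import time


def time_epochs(run_epoch, num_epochs, warmup=1):
    """Time each call to run_epoch; return (warm-up times, remaining times)."""
    durations = []
    for _ in range(num_epochs):
        start = time.perf_counter()
        run_epoch()
        durations.append(time.perf_counter() - start)
    return durations[:warmup], durations[warmup:]


def fake_epoch():
    # Hypothetical stand-in workload; replace with one real training epoch.
    sum(i * i for i in range(10000))


warm, steady = time_epochs(fake_epoch, num_epochs=5, warmup=1)
print("warm-up epoch(s):", warm)
print("steady-state mean: %.6f s" % (sum(steady) / len(steady)))
```

If the steady-state GPU epochs are still much slower than the CPU ones, the slowdown is not just one-off setup cost.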

Failed to build python module

bazel build --jobs 4 //tensorflow/tools/pip_package:build_pip_package
./tensorflow/stream_executor/dso_loader.h:25:30: fatal error: cuda/cuda_config.h: No such file or directory

And, it's just my opinion, but it seems crazy to start on multi-GPU support right now. There are a lot of things that need to be fixed first.

Upgrade to latest Tensorflow version

Hello, on Ubuntu 16.04, I installed tensorflow-cl via pip3 as per the instructions. Keras is version 2.0.5. The error output is:

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in _initialize_variables()
    298     """Utility to initialize uninitialized variables on the fly.
    299     """
--> 300     variables = tf.global_variables()
    301     uninitialized_variables = []
    302     for v in variables:

AttributeError: module 'tensorflow' has no attribute 'global_variables'

I have seen somewhat similar issues online, where the fix was to revert to tf version 0.10 or upgrade to 0.12.

Has anyone seen this, or successfully used tf-cl with Keras (and if so, which version)? A simple test importing tensorflow in Python (without Keras) seems to work fine.
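For context, `tf.global_variables` replaced the older `tf.all_variables` around TensorFlow 0.12, which matches the version hint above. A fallback lookup along the following lines is one way to bridge a single renamed function. It is a sketch only: `first_available` is a hypothetical helper, `old_tf` is a stand-in object rather than a real TensorFlow module, and Keras 2.0.5 may well depend on other newer APIs that this does not cover.

```python
from types import SimpleNamespace


def first_available(module, names):
    """Return the first attribute from `names` that `module` actually defines."""
    for name in names:
        attr = getattr(module, name, None)
        if attr is not None:
            return attr
    raise AttributeError("none of %r found on %r" % (names, module))


# Stand-in for an older TensorFlow module that only exposes `all_variables`.
old_tf = SimpleNamespace(all_variables=lambda: ["w", "b"])

get_variables = first_available(old_tf, ["global_variables", "all_variables"])
print(get_variables())
```

With a real module, the call would be `first_available(tf, ["global_variables", "all_variables"])`.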

Another failed pytest with keras

/usr/lib/python3/dist-packages/logilab/common/decorators.py:40: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
  if len(getargspec(callableobj).args) == 1 or self.keyarg == 0:
going into keras/tests
=======================  test_loss_masking.py  =======================
Using TensorFlow backend.

======================  test_loss_weighting.py  ======================
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Pitcairn
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties: 
name: Pitcairn
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 1.97GiB
Free memory: 1.31GiB
W tensorflow/stream_executor/cl/cl_driver.cc:587] creating context when one is currently active; existing: 0�p�
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Pitcairn
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 1 with properties: 
name: Pitcairn
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 1.97GiB
Free memory: 1.31GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 1 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0 1 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0:   N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 1:   N N 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Pitcairn, pci bus id: 0000.0000)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Pitcairn, pci bus id: 0000.0000)
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
cl_driver DeviceAllocate 1192542208
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
cl_driver DeviceAllocate 1192542208
num platforms 1
checking platform id 0x7f9397ed6a18
num devices 2
building kernel _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_9TensorMapINS_6TensorIfLi1ELi1EiEELi16ENS_11MakePointerEEEKNS_18TensorCwiseUnaryOpINS0_12scalar_rightIffNS0_17scalar_product_opIffEEEEKNS4_INS5_IKfLi1ELi1EiEELi16ES7_EEEEEENS_9GpuDeviceEEEiEEvT_T0_
Segmentation fault (core dumped)

It would help to add support for some debug parameters or environment variables (if it doesn't have them already). For example, if I had access to the failed kernel source, I would try to compile it with CodeXL and put together a more reliable bug report.

Install from source fix

The instructions have a step as follows:

build cuda-on-cl

pushd third_party/cuda-on-cl
make -j 4
sudo make install
popd

But it should be

build cuda-on-cl

pushd third_party/cuda-on-cl
mkdir build
cd build
cmake ..
make -j 4
sudo make install
popd

Can't compile gtests with gcc

It seems that gcc 5.4.0 can't handle a string like this; it needs escaping, or putting all on one line.

    EXPECT_EQ(R"(    v2 = v1[0];
    v3 = (&(v1[0].f1));
    v6 = v5[0];
)", oss.str());

Got

tensorflow-cl/third_party/cuda-on-cl/test/gtest/test_block_dumper.cpp:428:23: error: expected ‘)’ before ‘;’ token

and a lot of similar errors

can't build from source

It seems bazel can't find the gcc headers for libraries like farmhash, jpeg, and png, yet these headers are indeed in gcc's search path.
This is an example for building the jpeg lib.
http://paste.ubuntu.com/24038831/

I'm using Ubuntu14.04.2 and gcc4.8.4.

Failed to build python module: "not implemented dumpmemcpy for align 2"

cuda-on-cl installed with debug (spam) and test options enabled.

simplify _ZN5Eigen9half_impl9half_baseC2ERKS1_
instructions processed before crash 341
/home/inferno/.cache/bazel/_bazel_inferno/5213170a00a40926f3a8ece61425e0a5/execroot/tensorflow-cl/third_party/cuda-on-cl/bin/../share/cocl/cocl.Makefile:25: recipe for target 'bazel-out/local_linux-py3-fastbuild/bin/tensorflow/core/kernels/_objs/constant_op_gpu/tensorflow/core/kernels/constant_op_gpu-device.cl' failed
@blockIdx = extern_weak addrspace(1) global %struct.__cuda_builtin_blockIdx_t, align 1
@blockDim = extern_weak addrspace(1) global %struct.__cuda_builtin_blockDim_t, align 1
@threadIdx = extern_weak addrspace(1) global %struct.__cuda_builtin_threadIdx_t, align 1
@gridDim = extern_weak addrspace(1) global %struct.__cuda_builtin_gridDim_t, align 1
terminate called after throwing an instance of 'std::runtime_error'
  what():  not implemented dumpmemcpy for align 2
make: *** [bazel-out/local_linux-py3-fastbuild/bin/tensorflow/core/kernels/_objs/constant_op_gpu/tensorflow/core/kernels/constant_op_gpu-device.cl] Aborted (core dumped)
ERROR: tensorflow-cl/tensorflow/core/kernels/BUILD:433:1: output 'tensorflow/core/kernels/_objs/constant_op_gpu/tensorflow/core/kernels/constant_op_gpu.cu.pic.o' was not created.
ERROR: tensorflow-cl/tensorflow/core/kernels/BUILD:433:1: not all outputs were created.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 555.318s, Critical Path: 44.49s

The link to Mac Sierra package is broken

Mac Sierra link doesn't work.

Another thing, I don't need to have any installation of Tensorflow to get Tensorflow-cl installed, correct?

Cannot run any tensorflow files after install [URGENT]

Hello Hugh,
I installed tensorflow-cl as instructed, but now I can't run any of the tests... I can't even import tensorflow in any file. Every time, I get the same error:

Traceback (most recent call last):
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /home/tucker/anaconda3/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 60, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
  File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/home/tucker/anaconda3/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: /home/tucker/anaconda3/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/tucker/anaconda3/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so)


Error importing tensorflow.  Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there.

How do I fix this?
Thanks in advance
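The `ImportError` above says the `libstdc++.so.6` under the Anaconda directory does not provide the `CXXABI_1.3.8` version tag. One way to check which tags a given copy of the library carries is to scan the file for them, roughly what `strings libstdc++.so.6 | grep CXXABI` does. This is a minimal sketch; `cxxabi_versions` is a hypothetical helper, and the candidate path is simply the one from the traceback.

```python
import os
import re


def cxxabi_versions(lib_path):
    """Return the CXXABI_x.y[.z] tags found in a shared library, oldest first."""
    with open(lib_path, "rb") as f:
        data = f.read()
    tags = {m.decode("ascii") for m in re.findall(rb"CXXABI_\d+(?:\.\d+)+", data)}
    # Sort numerically so that 1.3.10 comes after 1.3.9.
    return sorted(tags, key=lambda t: [int(p) for p in t.split("_")[1].split(".")])


# Path taken from the traceback above; adjust for your own system.
candidate = "/home/tucker/anaconda3/lib/libstdc++.so.6"
if os.path.exists(candidate):
    print(cxxabi_versions(candidate))
```

Running the same check against the system copy of the library shows whether a newer libstdc++ is already installed elsewhere on the machine.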

Building issue on x86

Following your instruction: https://github.com/hughperkins/tensorflow-cl/blob/tensorflow-cl/doc/build-from-source.md

install bazel

install bazel 0.4.0 by apt-get (bazel.io)

install cuda-on-cl

git clone --recursive https://github.com/hughperkins/cuda-on-cl
cd cuda-on-cl
git checkout runtime-compile
mkdir build
cd build
cmake ..
make -j8
sudo make install

cd ../test/cocl
cocl -fPIC cuda-sample.cu
./cuda-sample  # make sure that cocl works

2)git clone --recursive https://github.com/hughperkins/tensorflow-cl
./configure

put python path: /usr/bin/python3

'no' for hadoop, gpu, cloud, etc

bazel run --verbose_failures --logging 6 //tensorflow/tools/pip_package:build_pip_package

ERROR: /Work1/OpenCL/tensorflow-cl/tensorflow/python/BUILD:1773:1: in cc_library rule //tensorflow/python:tf_session_helper: non-test target '//tensorflow/python:tf_session_helper' depends on testonly target '//tensorflow/python:construction_fails_op' and doesn't have testonly attribute set.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted.
INFO: Elapsed time: 0.240s
ERROR: Build failed. Not running target

unable to build from source

I followed the build instructions step by step on my Ubuntu 16.04 x64 system, but ran into an issue during the build.

The end of the log is as follows.
...
INFO: From Compiling tensorflow/core/lib/io/inputbuffer.cc:
tensorflow/core/lib/io/inputbuffer.cc: In member function 'tensorflow::Status tensorflow::io::InputBuffer::ReadNBytes(tensorflow::int64, std::__cxx11::string*)':
tensorflow/core/lib/io/inputbuffer.cc:81:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_read < bytes_to_read) result->resize(bytes_read);
^
At global scope:
cc1plus: warning: unrecognized command line option '-Wno-unused-local-typedef'
cc1plus: warning: unrecognized command line option '-Wno-c++11-narrowing'
INFO: From Compiling tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc: In member function 'tensorflow::Status tensorflow::io::SnappyOutputBuffer::Write(tensorflow::StringPiece)':
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:42:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_to_write <= AvailableInputSpace()) {
^
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:53:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_to_write <= AvailableInputSpace()) {
^
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc: In member function 'void tensorflow::io::SnappyOutputBuffer::AddToInputBuffer(tensorflow::StringPiece)':
tensorflow/core/lib/io/snappy/snappy_outputbuffer.cc:110:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (bytes_to_write > free_tail_bytes) {
^
At global scope:
cc1plus: warning: unrecognized command line option '-Wno-unused-local-typedef'
cc1plus: warning: unrecognized command line option '-Wno-c++11-narrowing'
ERROR: /work/ml/tensorflow-cl/tensorflow/core/BUILD:956:1: undeclared inclusion(s) in rule '//tensorflow/core:lib_internal':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/platform/profile_utils/cpu_utils.cc':
'/work/ml/tensorflow-cl/tensorflow/core/platform/profile_utils/android_armv7a_cpu_utils_helper.h'.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 110.980s, Critical Path: 102.10s
ERROR: Build failed. Not running target.

Actually, I do not understand why android_armv7a is involved; my system is Ubuntu 16.04 on Intel Skylake.

Is there anything I can try? Thanks.

'tf.random_normal' broken on Ubuntu 16.04

'tf.random_normal' broken on Ubuntu 16.04/NVIDIA

runs ok, but puts zeros for everything

Does it support multi-GPU setup?

mnist/convolutional.py failed

(env3)~/tf-coriander/tensorflow/models/image/mnist$ python convolutional.py
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
OpenCL platform: Intel Gen OCL Driver
OpenCL device: Intel(R) HD Graphics Haswell CRW GT3 Desktop
I tensorflow/core/common_runtime/gpu/gpu_device.cc:989] Found device 0 with properties:
name: Intel(R) HD Graphics Haswell CRW GT3 Desktop
major: -1 minor: -1 memoryClockRate (GHz) 1000
pciBusID 0000.0000
Total memory: 2.00GiB
Free memory: 1.50GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:877] cannot enable peer access from device ordinal 0 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1011] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0: N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Intel(R) HD Graphics Haswell CRW GT3 Desktop, pci bus id: 0000.0000)
cl_driver DeviceAllocate 1400897536
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer

kernel build error:
Something went wrong with clCreateKernel, OpenCL error code -45
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer

storing failed kernel into: easycl-failedkernel.cl
compileOpenCLKernel failed to compile opencl sourcecode
unique kernel name _ZN10tensorflow7functor28FillPhiloxRandomKernelLaunchINS_6random27TruncatedNormalDistributionINS2_19SingleSampleAdapterINS2_12PhiloxRandomEEEfEEEEvS5_PNT_17ResultElementTypeExS8__1_2_3
short kernel name _ZN10tensorflow7func
writing ll to /tmp/failed-kernel.ll
writing cl to /tmp/failed-kernel.cl
caught runtime error Something went wrong with clCreateKernel, OpenCL error code -45
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer
storing failed kernel into: easycl-failedkernel.cl

terminate called after throwing an instance of 'std::runtime_error'
what(): Something went wrong with clCreateKernel, OpenCL error code -45
internal build log:
stringInput.cl:440:12: error: assigning 'struct class_tensorflow__random__PhiloxRandom *' to '__global struct class_tensorflow__random__PhiloxRandom *' changes address space of pointer
storing failed kernel into: easycl-failedkernel.cl

Aborted (core dumped)
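For reference, error code `-45` in the Khronos `CL/cl.h` header is `CL_INVALID_PROGRAM_EXECUTABLE`, which fits the build log above: the program did not compile, so there was no executable to create a kernel from. A small lookup like the following can translate the code in such messages. It is a sketch; the table is only a subset of the header's codes, and `describe_opencl_error` is a hypothetical helper.

```python
import re

# Subset of the error codes defined in the Khronos OpenCL header (CL/cl.h).
OPENCL_ERRORS = {
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -44: "CL_INVALID_PROGRAM",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -46: "CL_INVALID_KERNEL_NAME",
    -47: "CL_INVALID_KERNEL_DEFINITION",
    -48: "CL_INVALID_KERNEL",
}


def describe_opencl_error(message):
    """Find 'OpenCL error code N' in a log line and return the symbolic name."""
    match = re.search(r"OpenCL error code (-?\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return OPENCL_ERRORS.get(code, "unknown OpenCL error %d" % code)


line = "Something went wrong with clCreateKernel, OpenCL error code -45"
print(describe_opencl_error(line))
```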

Crashed Xorg on AMD GPUPro driver

python3 recurrent_network.py crashes, dragging down Xorg with it.

I can't copy/paste the error from Ubuntu's bugchecker, for some reason (???), so I screenshotted the information, which included stack and traceback information. But, the basic error appears to land within the AMD GPU Pro driver code, with it trying to execute at 0x00... so perhaps a function pointer somewhere is being passed as null instead of pointing to an OpenCL function as intended?

Please enjoy a wall of images..

No env3

The install from source instructions say:

# build tensorflow
source ~/env3/bin/activate

But when running that I get:
$ source ~/env3/bin/activate
bash: /home/ben/env3/bin/activate: No such file or directory

And then all steps after that fail.
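The error suggests that the `~/env3` virtual environment was never created. Assuming `env3` is meant to be an ordinary Python 3 virtual environment (the original instructions may create it with a different tool, such as `virtualenv`), a sketch using the standard-library `venv` module would be the following. `ensure_virtualenv` is a hypothetical helper, and the `bin/activate` layout applies to Linux and macOS.

```python
import os
import venv


def ensure_virtualenv(path, with_pip=True):
    """Create a virtual environment at `path` if missing; return its activate script."""
    path = os.path.expanduser(path)
    activate = os.path.join(path, "bin", "activate")  # POSIX layout
    if not os.path.exists(activate):
        venv.create(path, with_pip=with_pip)
    return activate


# Example (not run here, since it would create ~/env3):
#   print(ensure_virtualenv("~/env3"))
```

After that, `source ~/env3/bin/activate` should find the script.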

Build instructions

Hugh, a simple HowTo for building this repo would be helpful. I was unable to build it by following the official TensorFlow webpage. Here is what I did:
a.) Used the CMake build from tensorflow
b.) Had to modify your bzl files to correctly point it to the eigen file (apparently it is a .zip and not .tar.gz)
c.) Started the build. It went well until it reached Eigen
d.) It was complaining about a missing file "cuda_runtime.h". Now I presume your code should handle that. Is there any #ifdef that needs to be specified?

`split` failing

Relevant test:

https://github.com/hughperkins/tensorflow-cl/blob/dcdb5a64385c72aafa19e5fb9e16e64ec42ad751/tensorflow/stream_executor/cl/test/test_misc.py#L158-L180

def test_split():
    shape = (12, 1)
    graph = tf.Graph()
    with graph.as_default():
        with tf.device('/gpu:0'):
            a_tf = tf.placeholder(tf.float32, shape)
            c_tf = tf.split(0, 4, a_tf)
            sess = tf.Session()
            with sess.as_default():
                a = np.random.randn(*shape).astype(np.float32)
                c = sess.run(c_tf, feed_dict={a_tf: a})
                if(np.prod(shape)) < 20:
                    print('a', a)
                    print('c', c)

Result:

a [[ 0.039643  ]
 [ 1.02737081]
 [-1.39692032]
 [-0.08065519]
 [ 0.77159059]
 [ 1.21571183]
 [ 0.12854558]
 [ 3.13103628]
 [-0.31965023]
 [-0.41063583]
 [-1.0400176 ]
 [ 0.10558813]]
c [array([[ nan],
       [ nan],
       [ nan]], dtype=float32), array([[ nan],
       [ nan],
       [ nan]], dtype=float32), array([[ nan],
       [ nan],
       [ nan]], dtype=float32), array([[ nan],
       [ nan],
       [ nan]], dtype=float32)]

(should not be nans...)
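For comparison, the expected result for a `(12, 1)` input split four ways along axis 0 is four chunks of three rows each, with no NaNs. A pure-Python reference, sketched below with hypothetical helper names, could be used to check the GPU output; NumPy's `np.split` would serve the same purpose.

```python
import math


def reference_split(rows, num_splits):
    """Split a list of rows into `num_splits` equal consecutive chunks."""
    if len(rows) % num_splits != 0:
        raise ValueError("row count must divide evenly by num_splits")
    size = len(rows) // num_splits
    return [rows[i * size:(i + 1) * size] for i in range(num_splits)]


def contains_nan(chunks):
    """Return True if any value in any chunk is NaN."""
    return any(math.isnan(v) for chunk in chunks for row in chunk for v in row)


a = [[float(i)] for i in range(12)]  # same shape as the test input: 12 rows, 1 column
expected = reference_split(a, 4)
print(len(expected), len(expected[0]), contains_nan(expected))
```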

Update: looks like this involves passing a float ** into the kernel :-P . This is one of the kernel parameters:

struct tensorflow__CudaDeviceArrayStruct {
    int f0;
    float* f1[8];
    global float** f2;
};

(in bytecode:

%"struct.tensorflow::CudaDeviceArrayStruct" = type { i32, [8 x float*], float** }

)

Testing instructions

Is there a quick way to test my installation to ensure that tensorflow is using the GPU and OpenCL as desired?
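One rough check, assuming the startup log format seen in other reports on this page, is to look for the `Creating TensorFlow device (/gpu:N)` lines that are printed when a session starts. The sketch below parses such lines. `find_gpu_devices` is a hypothetical helper, and the sample text is copied from a log above. This only confirms that a device was created, not that individual ops ran on it; a session created with `log_device_placement=True`, as in another report here, would show placement.

```python
import re

# Matches lines such as:
#   ... Creating TensorFlow device (/gpu:0) -> (device: 0, name: X, pci bus id: 0000.0000)
DEVICE_LINE = re.compile(
    r"Creating TensorFlow device \((/gpu:\d+)\) -> "
    r"\(device: \d+, name: (.*), pci bus id: [^)]*\)"
)


def find_gpu_devices(log_text):
    """Return (tensorflow device, OpenCL device name) pairs found in a log."""
    return [(m.group(1), m.group(2)) for m in DEVICE_LINE.finditer(log_text)]


sample_log = (
    "I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] "
    "Creating TensorFlow device (/gpu:0) -> (device: 0, "
    "name: Intel(R) HD Graphics Haswell CRW GT3 Desktop, pci bus id: 0000.0000)\n"
    "cl_driver DeviceAllocate 1400897536\n"
)
print(find_gpu_devices(sample_log))
```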

Ensure InlinedVector works on Mac, for `InUse` structs

InlinedVector doesn't work on Mac, for InUse structs

This means the eventmgr doesn't work on Mac, unless changed to use a normal std::vector (which was done in 8a02ae2...e97b994 to fix #34).
However, it would be nice to fix the InlinedVector for Mac.

There is an opportunity for someone to look into why the InlinedVector doesn't work for InUse objects, and specifically for objects containing std::function, I think, so we can switch back to InlinedVector.

To start work on this issue:

download latest tf-coriander
in tensorflow/core/common_runtime/gpu/gpu_event_mgr.h, look for the line typedef std::vector<InUse> ToFreeVector;
- change this line to be typedef gtl::InlinedVector<InUse, 4> ToFreeVector;
also change the signature of PollEvents, in tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc, to reflect this change
./configure, and do a build
try running something
segfault :-(

Benefits of fixing this:

maybe faster

Anything to check before working on this?

you could compare the performance with using InlinedVector vs std::vector, in the gpu_event_mgr, even in the presence of the segfault, and see if it actually changes anything
if the performance is the same, then you could report that here, and we'll close this out, leave it as std::vector

By the way, how to fix this?

I reckon it might be sufficient to add a std::function to the alignment union in tensorflow/core/lib/gtl/inlined_vector.h, for the u_ member.
In other words, I think the issue is an alignment issue, since maybe std::function requires larger alignment than a single pointer?

multiple gpus not really working

cannot seem to use CL_GPUOFFSET=1 to choose eg gpu 2
nor does using with tf.device('/gpu:1'): seem to work

Mac build: `libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument`

created a draft travis build at https://travis-ci.org/hughperkins/tensorflow-cl and https://github.com/hughperkins/tensorflow-cl/blob/dev/.travis.yml
please feel free to dabble in trying to improve the mac travis build for tensorflow-cl
I imagine it might take a bunch of iterations to get this working well :-)

Mac build instructions

Ref: try doing a Mac build, and log an issue...

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

tensorflow/tensorflow#22 (comment)

Environment info

Operating System: Mac
tensorflow/tensorflow#22 (comment)

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

Cuda is not supported on Mac Intel

If installed from binary pip package, provide:

A link to the pip package you installed:
The output from python -c "import tensorflow; print(tensorflow.__version__)".

derek$ python -c "import tensorflow; print(tensorflow.__version__)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "tensorflow/python/__init__.py", line 60, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
ImportError: cannot import name pywrap_tensorflow

Error importing tensorflow. Unless you are using bazel,
you should not try to import tensorflow from its source directory;
please exit the tensorflow source tree, and relaunch your python interpreter
from there.

If installed from source, provide

The commit hash (git rev-parse HEAD)

86e474d

The output of bazel version

-bash: bazel: command not found

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

`tensorflow-cl` branch build on ubuntu 16.04 not quite working, if someone has a moment to take a look?

Hi,

The build on Ubuntu 16.04 isn't quite working. I broke it by "upgrading" the instructions to use bazel 0.4.5, instead of 0.3.2. It fails because it cannot find protoc. If someone has a moment to take a look, it would be much appreciated :-) . (I'm busy fixing Mac/Radeon stuff at the moment, personally)

Where are the CUDA (.cu) files?

Can I ask a stupid question?
I don't see any .cu CUDA files in the tensorflow source code, so how do I use cocl to transform them into .cl files?

Mac build doesn't run yet

Mac build doesn't run yet.

Please feel free to post/subscribe to this issue/thread, to receive updates on this point. (Or you can simply 'watch' the repository).

Edit: note that I'm targeting Mac Sierra, with Radeon Pro 450, for now.

Tensorflow can't read GPU

If I run a command to show me the gpus, I get the following error:
Commands:

python3
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Error:

CommandLine Error:
Option 'enable-value-profiling' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

It's on Ubuntu 16.04 using the pip3 install of the downloaded wheel.
