Comments (34)
Honestly, not sure at this stage. I'll probably be working mostly with text-based models, RNNs/LSTMs etcetera, for the moment. I'd love to start playing with convnets but I'm not at that yet. :)
So for me, even being able to correctly initialise nets is
from tf-coriander.
Looking good for the py.test output. :)
I'm sorry to say I don't know what you're referring to re: "RPATH". Happy to help test, if you give me context and/or suggested shell commands to interrogate this.
Going to start running a few TensorFlow-Examples scripts now, also.
So far, all examples I've tried from TensorFlow-Examples have worked perfectly, after I shim out the GPU-specifying context managers. I'm training convolutional_network.py now and it's learning... I don't know how well it's learning, but the loss is generally going down and the accuracy is generally going up, so I'll mark that down as "success" for now. :)
Thanks again!
Would be important to learn whether NVidia or Ubuntu are to blame, because NVidia have been accused of deliberately leaving their OpenCL drivers in a buggy/bad/slow state.
I ought to have a new AMD GPU later today, will try to test this out.
It would be interesting to consider a "software" fallback, though; e.g. an OpenCL kernel that, seeded by system entropy from /dev/urandom, could generate ~CSPRNG output without relying on hardware entropy sources on the card. I wouldn't use it for crypto, but it would be fine for seeding random distributions? Some key-streams are very minimal and might be easy to implement, although most generate random ints, rather than floats. I'm not sure how irritating it is to type-cast from within OpenCL kernels.
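The host-side half of that idea is simple enough to sketch. This is just an illustration of drawing a seed from the OS entropy pool and handing it to a kernel as a key; urandom_seed is a made-up helper name, not anything from the repo:

```python
import os
import struct

def urandom_seed():
    """Draw a 64-bit seed from the OS entropy pool (/dev/urandom on Linux)."""
    return struct.unpack('<Q', os.urandom(8))[0]

# The host would pass this value into the OpenCL kernel as the generator's
# key/counter, so the card itself never needs a hardware entropy source.
seed = urandom_seed()
```

The kernel then only needs a deterministic keystream function of (seed, thread id), which is exactly the counter-based-RNG shape discussed below.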
By default, if we choose not to register a GPU kernel, it will use a CPU kernel in its place.
I'd be very interested to know if this is Ubuntu-specific or NVIDIA-specific. This is key information for deciding the future of this issue.
Just booted into my AMDGPU-pro driver environment for the AMD R9 390, Ubuntu 16.04, Intel i5, and ran tests:
cathal@thinkum:~/tf-coriander$ py.test
================================================================= test session starts =================================================================
platform linux -- Python 3.5.2, pytest-3.1.0, py-1.4.33, pluggy-0.4.0
rootdir: /home/cathal/tf-coriander, inifile: pytest.ini
plugins: pep8-1.0.6
collected 99 items
tensorflow/stream_executor/cl/test/conftest.py .
tensorflow/stream_executor/cl/test/measure_binary_ops_perf.py .
tensorflow/stream_executor/cl/test/measure_reduction_ops_perf_bybatchsize.py .
tensorflow/stream_executor/cl/test/measure_reductions_perf.py .
tensorflow/stream_executor/cl/test/measure_unary_ops_perf.py .
tensorflow/stream_executor/cl/test/measure_unary_ops_perf_bybatchsize.py .
tensorflow/stream_executor/cl/test/run_unary_op.py .
tensorflow/stream_executor/cl/test/test_binary_ops.py .xx................
tensorflow/stream_executor/cl/test/test_blas.py ..
tensorflow/stream_executor/cl/test/test_common.py .
tensorflow/stream_executor/cl/test/test_gradients.py ..
tensorflow/stream_executor/cl/test/test_loss.py ..
tensorflow/stream_executor/cl/test/test_misc.py .........s
tensorflow/stream_executor/cl/test/test_nn.py ..
tensorflow/stream_executor/cl/test/test_random.py .FF..
tensorflow/stream_executor/cl/test/test_reductions.py ...........................
tensorflow/stream_executor/cl/test/test_simple.py ..
tensorflow/stream_executor/cl/test/test_softmax.py ....
tensorflow/stream_executor/cl/test/test_unary_ops.py ................
------------------------------------- generated xml file: /home/cathal/tf-coriander/test/junit-pytest-report.xml --------------------------------------
=============================================================== short test summary info ===============================================================
FAIL tensorflow/stream_executor/cl/test/test_random.py::test_random_normal[shape0]
FAIL tensorflow/stream_executor/cl/test/test_random.py::test_random_normal[shape1]
SKIP [1] tensorflow/stream_executor/cl/test/test_misc.py:158: Need to fix passing float** to kernel for this to work
XFAIL tensorflow/stream_executor/cl/test/test_binary_ops.py::test[uint8-div-a / b]
XFAIL tensorflow/stream_executor/cl/test/test_binary_ops.py::test[uint8-mul-a * b]
====================================================================== FAILURES =======================================================================
_____________________________________________________________ test_random_normal[shape0] ______________________________________________________________
shape = (3, 4)

    @pytest.mark.parametrize(
        'shape',
        shapes)
    def test_random_normal(shape):
        with tf.Graph().as_default():
            with tf.device('/gpu:0'):
                W_t = tf.Variable(tf.random_normal(shape))
                mu_t = tf.reduce_mean(W_t)
                var_t = tf.reduce_mean(W_t * W_t)
            with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
                sess.run(tf.initialize_all_variables())
                W, mu, var = sess.run((W_t, mu_t, var_t))
                if np.prod(W.shape) < 20:
                    print('W', W)
                else:
                    print('W.reshape(-1)[:20]', W.reshape(-1)[:20])
                print('mu', mu, 'var', var)
                assert abs(mu) < 1.0
>               assert var > 0.05
E               assert 0.0 > 0.05

tensorflow/stream_executor/cl/test/test_random.py:34: AssertionError
---------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------
W [[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
mu 0.0 var 0.0
---------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
_____________________________________________________________ test_random_normal[shape1] ______________________________________________________________
shape = (50, 70, 12)

    @pytest.mark.parametrize(
        'shape',
        shapes)
    def test_random_normal(shape):
        with tf.Graph().as_default():
            with tf.device('/gpu:0'):
                W_t = tf.Variable(tf.random_normal(shape))
                mu_t = tf.reduce_mean(W_t)
                var_t = tf.reduce_mean(W_t * W_t)
            with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
                sess.run(tf.initialize_all_variables())
                W, mu, var = sess.run((W_t, mu_t, var_t))
                if np.prod(W.shape) < 20:
                    print('W', W)
                else:
                    print('W.reshape(-1)[:20]', W.reshape(-1)[:20])
                print('mu', mu, 'var', var)
                assert abs(mu) < 1.0
>               assert var > 0.05
E               assert 0.0 > 0.05
tensorflow/stream_executor/cl/test/test_random.py:34: AssertionError
---------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------
W.reshape(-1)[:20] [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]
mu 0.0 var 0.0
---------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
============================================== 2 failed, 94 passed, 1 skipped, 2 xfailed in 5.64 seconds ==============================================
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Hawaii, pci bus id: 0000.0000
c: /job:localhost/replica:0/task:0/gpu:0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Hawaii, pci bus id: 0000.0000
c: /job:localhost/replica:0/task:0/gpu:0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
Consider this the same output for #13 and #34 also; no segfaults when I manually ran the MNIST example, and the tests didn't all segfault either. :)
So aside from the RNG, things are looking very good for the most recent release! How much is the RNG relied-upon for the other tests, though? If the RNG is always returning 0, then how do the other tests pass? Is it simply not used for seeding the weights of networks, generally?
PS, I'll shortly be wiping and switching to the ROCm driver on a different card, so I don't know if I can continue providing useful test output for the AMDGPU-pro driver + R9 390, at least for a while (until I can afford another MoBo to plug it in! :) ).
Updates on this:
- on Mac, turns out that the first 4 random numbers match the cpu results, but the next numbers don't, so I'm going to fix that first, since it's easier than spinning up aws instances
- Turns out the random numbers are an implementation of this paper http://www.thesalmons.org/john/random123/papers/random123sc11.pdf . Looks quite cool: it can generate pseudo-random numbers in parallel, unlike Mersenne Twister and similar.
I'm going to run the algorithm on the cpu, and compare it to the gpu result, and see how that goes. The implementation in tensorflow is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/random/philox_random.h
Created a simple cpu-side, easy-to-run, easy-to-introspect philoxrandom script at https://github.com/hughperkins/pub-prototyping/blob/master/cpp/testphilox.cpp , which calls into a copied/modified version of the tensorflow PhiloxRandom generator at https://github.com/hughperkins/pub-prototyping/blob/master/cpp/from_tf/philox_random.h
This then gives the same outputs as calling tf.random_uniform with seed 123:
tensorflow output:
W_cpu [[ 0.04080963 0.20842123 0.09180295 0.70220065]
[ 0.7073133 0.39646494 0.06650937 0.29188633]
[ 0.02963269 0.95492315 0.00610673 0.3169049 ]]
test script output:
0: 1887779136 0.0408096
1: 1293593996 0.208421
2: 3473653811 0.091803
3: 257548726 0.702201
4: 727353662 0.707313
5: 2159198045 0.396465
6: 3498607457 0.0665094
7: 3324337288 0.291886
8: 1115933441 0.0296327
9: 1274690284 0.954923
10: 3053504539 0.00610673
11: 3861418071 0.316905
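As a side note, the float column above is just the uint32 column pushed through the bit trick TensorFlow's random distributions code uses for uint32-to-float conversion: keep the low 23 bits as a float32 mantissa, force the exponent to 127 to land in [1.0, 2.0), and subtract 1.0. A quick Python check against the first row:

```python
import struct

def uint32_to_float(u):
    # Low 23 bits become the mantissa; exponent 127 gives a float in [1.0, 2.0),
    # and subtracting 1.0 maps it to [0.0, 1.0).
    bits = (127 << 23) | (u & 0x7FFFFF)
    return struct.unpack('<f', struct.pack('<I', bits))[0] - 1.0

print(uint32_to_float(1887779136))  # -> ~0.0408096, matching the first row above
```

The same check reproduces the rest of the table, which is a handy way to confirm the uint32 stream (and not the conversion) is where any divergence lives.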
I'm then going to work my way through the opencl kernel, comparing the values in the kernel with those on the cpu, and finding where they start to differ. I'm going to use COCL_DUMP_CL, COCL_LOAD_CL, and COCL_DUMP_CONFIG to inject values I'm interested in into an output buffer, and inspect them. https://github.com/hughperkins/coriander/blob/master/doc/advanced_usage.md#runtime-options
Sounds really promising! Can't wait to see this one fixed, I have a feeling a lot of the test cases are "passing", but aren't really behaving as intended, because of poor initialisation. In the meantime I was planning to just monkey-patch this with the numpy-derived random hack, above, and see how that improves things. :P
The test cases don't need initialization: they test specific operations. Please check them, and let me know any test cases you feel are weak, and how we can improve them.
I mean, when you do py.test -v, those test specific operations. The end-to-end tests are the scripts we are using from Aymeric Damien's Tensorflow-Examples.
But yeah, monkey-patching will work, as long as you don't use AdamOptimizer, which initializes itself from tf.random. It'll initialize itself really poorly currently, and learning will suck. If you use the SGD optimizer, it should work ok-ish.
Sorry, I meant the Tensorflow-Examples repo, yes. :) When I was running those previously I noticed that some of the examples weren't learning at any appreciable rate, which was probably down to the randn initialisation being poor. Or worse, if things were being set to 0, then perhaps a lot of neurons were simply dead. I'm running it all again after monkey-patching Numpy's randn in, and they seem to be learning again.
Good to know about Adam! Possibly the monkey-patch would have to be applied further up the chain to work on Adam, then.. This is all running in a virtualenv, so I don't mind doing radical surgery on the Random lib if it lets me use TF. :P
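A rough sketch of the kind of monkey-patch being described, assuming the draw happens host-side in NumPy and is fed into the graph as a constant (np_random_normal is an illustrative name, not code from either repo):

```python
import numpy as np

def np_random_normal(shape, mean=0.0, stddev=1.0, dtype=np.float32, seed=None):
    """CPU stand-in for tf.random_normal: sample with NumPy instead of the GPU RNG."""
    rng = np.random.RandomState(seed)
    return rng.normal(loc=mean, scale=stddev, size=shape).astype(dtype)

# In a TF script one would then shadow the broken op before building the graph, e.g.:
#   tf.random_normal = lambda shape, **kw: tf.constant(np_random_normal(shape, **kw))
```

The trade-off is that every draw is baked into the graph as a constant, so re-running an op yields the same values; fine for weight initialisation, not for ops that need fresh randomness each step.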
...I'm perusing the code for AdamOptimizer, but I don't actually see any calls to tf.random in it or its superclass. Is this instead happening somewhere else, like in tf.initialize_all_variables?
Hmmm, just noticed my earlier reply failed to send:
Sorry, I meant the Tensorflow-Examples repo, yes. :) When I was running those previously I noticed that some of the examples weren't learning at any appreciable rate, which was probably down to the randn initialisation being poor
Yes, that's true. You are right.
Possibly the monkey-patch would have to be applied further up the chain to work on Adam, then.
Possibly, but it sounds like a lot of work.
And thence onto the new reply :-)
...I'm perusing the code for AdamOptimizer, but I don't actually see any calls to tf.random in it or its superclass. Is this instead happening somewhere else, like in tf.initialize_all_variables?
So, what I would probably do in the first instance is view the graph in tensorboard. tensorboard rocks :-) . https://www.tensorflow.org/get_started/graph_viz
Great advice, thanks!
Update: the tf.random_uniform enhanced tests work on Mac now :-) . Means tf.random_uniform gives the same results as the cpu now, not like before. Fixed in 4c1c544 , but specifically in hughperkins/coriander@6452be7
I'm hoping that the experience with PhiloxRandom on Mac will help me to fix the Ubuntu version.
(well... that's odd... the opencl generated on Mac is different from that generated on Ubuntu. That's quite unexpected... since the opencl generation is my own code, therefore invariant, and it's based on compilation using llvm-4.0, which is also invariant. Oh... different standard libraries. Maybe that's it)
Mac sample:
struct class_tensorflow__random__Array {
    int f0[4];
};
struct class_tensorflow__random__Array_0 {
    int f0[2];
};
struct class_tensorflow__random__NormalDistribution {
    char f0;
};
struct class_tensorflow__random__PhiloxRandom {
    struct class_tensorflow__random__Array f0;
    struct class_tensorflow__random__Array_0 f1;
};
Ubuntu sample:
struct class_tensorflow__random__Array {
    int f0[4];
};
struct class_tensorflow__random__Array_0 {
    int f0[2];
};
struct class_tensorflow__random__NormalDistribution {
    char f0;
};
struct class_tensorflow__random__Array_1 {
    float f0[4];
};
struct class_tensorflow__random__PhiloxRandom {
    struct class_tensorflow__random__Array f0;
    struct class_tensorflow__random__Array_0 f1;
};
An entire extra struct. Exact same kernel being compiled...
Oh, I just found out (as I'm writing this) why random_normal is broken, I reckon. There is a shim for sincosf, but the mangled names are different, so on Mac it is correctly shimmed, but not on Ubuntu. That should be easy to fix...
End of OpenCL on Mac:
/* int v119 = phi v304 */
v119 = v304;
goto v3;
} else {
goto v10;
}
v10:;
goto v11;
v11:;
return;
}
End of OpenCL on Ubuntu:
goto v3;
} else {
goto v10;
}
v10:;
goto v11;
v11:;
return;
}
void _Z7sincosffPfS_(float v1, float* v2, float* v3, local int *scratch) {
}
That sincos bit has a different mangled name than on Mac, so it wasn't detected, but it's easy to tell Coriander the Ubuntu name, and thus detect it.
Oh... I've found the difference... in Tensorflow, in the random distributions code, they have the following:
#if defined(__linux__)
    sincosf(v1, f0, f1);
#else
    *f0 = sinf(v1);
    *f1 = cosf(v1);
#endif
Different code, depending on linux or not :-O
Urgh, so this is an upstream issue: it assumes that Linux's version of the available libs (CUDA?) has an extra function defined? Is that something fixable within the coriander framework, just by delegating to sin/cos?
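The #else branch above is the whole "delegation" already: sincosf(x, s, c) just computes both values in one call. A trivial Python rendering, only to make the intended semantics of the shim concrete:

```python
import math

def sincosf(x):
    """Portable equivalent of the sincosf extension: compute sin and cos together."""
    return math.sin(x), math.cos(x)
```

Any shim that stores those two results through the caller's pointers is behaviourally equivalent to the Linux-only function.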
It's easy to fix, in Coriander, by writing an appropriate shim. There are some shims already https://github.com/hughperkins/coriander/blob/master/src/shims.cpp
Well... easy-ish, since it will be the first shim that needs to handle pointers, and pointers in opencl need their address spaces statically declared. That means I'll need a shim for every combination of address spaces used by the code, i.e., in opencl:
float angle = 0.1f;
global float *globala;
global float *globalb;
float *a;
float *b;
sincos(angle, a, b); // both are private
sincos(angle, globala, globalb); // both are global
sincos(angle, a, globalb); // one private, one global...
local float localb[30];
sincos(angle, a, localb); // one private, one local...
Since OpenCL 1.2 is C99, you can only use the name sincos once, for one pair of address spaces, so there'll need to be a unique function name for every pair of address spaces used, like e.g. sincos_p_p, sincos_g_g, ... So I'll need to write code to handle that. It's ok, I have various ideas on how to handle this; not a challenging issue, but it might take a few man-hours or so.
Fixed in 7d2deb9 :-) . Turned out it didn't need any shims, or address space stuff. Yay :-)
This is great! So, aside from Splitting, this is almost fully-functioning Tensorflow on CL now? :o Congratulations!
split and conv. but yeah :-) Thanks! :-)
(well, "fully-functioning"... more like "has enough functionality to run some basic conv nets". It'd still be nice to have e.g. batch normalization, and some other things, ideally.)
Question: what model(s) do you need to run in priority? I can't do everything at once, so if I know which model(s) you are targeting in priority, I can look at those first.
Cool :-)
Created a new v0.17.3 wheel. Don't suppose... do you mind double-checking that tf.random_normal is working ok for you now? Also, maybe run all the unit tests too, i.e. py.test -v ?
Already on it! :)
:-) . By the way, I'm not sure if the RPATH stuff is correct on ubuntu. If it still fails, it might be because it's using the libcocl.so in /usr/local/lib. So, if it doesn't work after installing the wheel:
- let me know, and show me the output
- and then, after doing that, you might try just downloading the latest coriander, building and installing that, and see if that changes anything (note: coriander != tf-coriander; coriander is the underlying compiler)
I'm sorry to say I don't know what you're referring to re: "RPATH". Happy to help test, if you give me context and/or suggested shell commands to interrogate this.
As long as the py.test -v is passing, it's all good. Good news that that is passing now :-)
Cool! :-)