Comments (34)
Honestly, not sure at this stage. I'll probably be working mostly with text-based models, RNNs/LSTMs etcetera, for the moment. I'd love to start playing with convnets but I'm not at that yet. :)
So for me, even being able to correctly initialise nets is
from tf-coriander.
Looking good for the py.test output. :)
I'm sorry to say I don't know what you're referring to re: "RPATH". Happy to help test, if you give me context and/or suggested shell commands to interrogate this.
Going to start running a few TensorFlow-Examples scripts now, also.
So far, all examples I've tried from TensorFlow-Examples have worked perfectly, after I shim out the GPU-specifying context managers. I'm training convolutional_network.py now and it's learning... I don't know how well it's learning, but the loss is generally going down and the accuracy is generally going up, so I'll mark that down as "success" for now. :)
Thanks again!
Would be important to learn whether NVidia or Ubuntu are to blame, because NVidia have been accused of deliberately leaving their OpenCL drivers in a buggy/bad/slow state.
I ought to have a new AMD GPU later today, will try to test this out.
It would be interesting to consider a "software" fallback, though; e.g. an OpenCL kernel that, seeded by system entropy from /dev/urandom, could generate ~CSPRNG output without relying on hardware entropy sources on the card. I wouldn't use it for crypto, but it would be fine for seeding random distributions? Some key-streams are very minimal and might be easy to implement, although most generate random ints, rather than floats. I'm not sure how irritating it is to type-cast from within OpenCL kernels.
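The host-side half of that idea is simple enough to sketch. This is just an illustration of drawing a seed from the OS entropy pool and handing it to a kernel as a key; urandom_seed is a made-up helper name, not anything from the repo:

```python
import os
import struct

def urandom_seed():
    """Draw a 64-bit seed from the OS entropy pool (/dev/urandom on Linux)."""
    return struct.unpack('<Q', os.urandom(8))[0]

# The host would pass this value into the OpenCL kernel as the generator's
# key/counter, so the card itself never needs a hardware entropy source.
seed = urandom_seed()
```

The kernel then only needs a deterministic keystream function of (seed, thread id), which is exactly the counter-based-RNG shape discussed below.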
By default, if we choose not to register a GPU kernel, it will use a CPU kernel in its place.
I'd be very interested to know if this is Ubuntu-specific or NVIDIA-specific. This is key information for deciding the future of this issue.
Just booted into my AMDGPU-pro driver environment for the AMD R9 390, Ubuntu 16.04, Intel i5, and ran tests:
cathal@thinkum:~/tf-coriander$ py.test
================================================================= test session starts =================================================================
platform linux -- Python 3.5.2, pytest-3.1.0, py-1.4.33, pluggy-0.4.0
rootdir: /home/cathal/tf-coriander, inifile: pytest.ini
plugins: pep8-1.0.6
collected 99 items
tensorflow/stream_executor/cl/test/conftest.py .
tensorflow/stream_executor/cl/test/measure_binary_ops_perf.py .
tensorflow/stream_executor/cl/test/measure_reduction_ops_perf_bybatchsize.py .
tensorflow/stream_executor/cl/test/measure_reductions_perf.py .
tensorflow/stream_executor/cl/test/measure_unary_ops_perf.py .
tensorflow/stream_executor/cl/test/measure_unary_ops_perf_bybatchsize.py .
tensorflow/stream_executor/cl/test/run_unary_op.py .
tensorflow/stream_executor/cl/test/test_binary_ops.py .xx................
tensorflow/stream_executor/cl/test/test_blas.py ..
tensorflow/stream_executor/cl/test/test_common.py .
tensorflow/stream_executor/cl/test/test_gradients.py ..
tensorflow/stream_executor/cl/test/test_loss.py ..
tensorflow/stream_executor/cl/test/test_misc.py .........s
tensorflow/stream_executor/cl/test/test_nn.py ..
tensorflow/stream_executor/cl/test/test_random.py .FF..
tensorflow/stream_executor/cl/test/test_reductions.py ...........................
tensorflow/stream_executor/cl/test/test_simple.py ..
tensorflow/stream_executor/cl/test/test_softmax.py ....
tensorflow/stream_executor/cl/test/test_unary_ops.py ................
------------------------------------- generated xml file: /home/cathal/tf-coriander/test/junit-pytest-report.xml --------------------------------------
=============================================================== short test summary info ===============================================================
FAIL tensorflow/stream_executor/cl/test/test_random.py::test_random_normal[shape0]
FAIL tensorflow/stream_executor/cl/test/test_random.py::test_random_normal[shape1]
SKIP [1] tensorflow/stream_executor/cl/test/test_misc.py:158: Need to fix passing float** to kernel for this to work
XFAIL tensorflow/stream_executor/cl/test/test_binary_ops.py::test[uint8-div-a / b]
XFAIL tensorflow/stream_executor/cl/test/test_binary_ops.py::test[uint8-mul-a * b]
====================================================================== FAILURES =======================================================================
_____________________________________________________________ test_random_normal[shape0] ______________________________________________________________
shape = (3, 4)

    @pytest.mark.parametrize(
        'shape',
        shapes)
    def test_random_normal(shape):
        with tf.Graph().as_default():
            with tf.device('/gpu:0'):
                W_t = tf.Variable(tf.random_normal(shape))
                mu_t = tf.reduce_mean(W_t)
                var_t = tf.reduce_mean(W_t * W_t)
            with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
                sess.run(tf.initialize_all_variables())
                W, mu, var = sess.run((W_t, mu_t, var_t))
                if np.prod(W.shape) < 20:
                    print('W', W)
                else:
                    print('W.reshape(-1)[:20]', W.reshape(-1)[:20])
                print('mu', mu, 'var', var)
                assert abs(mu) < 1.0
>               assert var > 0.05
E               assert 0.0 > 0.05

tensorflow/stream_executor/cl/test/test_random.py:34: AssertionError
---------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------
W [[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
mu 0.0 var 0.0
---------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
_____________________________________________________________ test_random_normal[shape1] ______________________________________________________________
shape = (50, 70, 12)

    @pytest.mark.parametrize(
        'shape',
        shapes)
    def test_random_normal(shape):
        with tf.Graph().as_default():
            with tf.device('/gpu:0'):
                W_t = tf.Variable(tf.random_normal(shape))
                mu_t = tf.reduce_mean(W_t)
                var_t = tf.reduce_mean(W_t * W_t)
            with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
                sess.run(tf.initialize_all_variables())
                W, mu, var = sess.run((W_t, mu_t, var_t))
                if np.prod(W.shape) < 20:
                    print('W', W)
                else:
                    print('W.reshape(-1)[:20]', W.reshape(-1)[:20])
                print('mu', mu, 'var', var)
                assert abs(mu) < 1.0
>               assert var > 0.05
E               assert 0.0 > 0.05
tensorflow/stream_executor/cl/test/test_random.py:34: AssertionError
---------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------
W.reshape(-1)[:20] [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0.]
mu 0.0 var 0.0
---------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1083] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Hawaii, pci bus id: 0000.0000)
============================================== 2 failed, 94 passed, 1 skipped, 2 xfailed in 5.64 seconds ==============================================
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Hawaii, pci bus id: 0000.0000
c: /job:localhost/replica:0/task:0/gpu:0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Hawaii, pci bus id: 0000.0000
c: /job:localhost/replica:0/task:0/gpu:0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
Consider this the same output for #13 and #34 also; no segfaults when I manually ran the MNIST example, and the tests didn't all segfault either. :)
So aside from the RNG, things are looking very good for the most recent release! How much is the RNG relied-upon for the other tests, though? If the RNG is always returning 0, then how do the other tests pass? Is it simply not used for seeding the weights of networks, generally?
PS, I'll shortly be wiping and switching to the ROCm driver on a different card, so I don't know if I can continue providing useful test output for the AMDGPU-pro driver + R9 390, at least for a while (until I can afford another MoBo to plug it in! :) ).
Updates on this:
- on Mac, turns out that the first 4 random numbers match the cpu results, but the next numbers don't, so I'm going to fix that first, since it's easier than spinning up aws instances
- Turns out the random numbers are an implementation of this paper http://www.thesalmons.org/john/random123/papers/random123sc11.pdf . Looks quite cool: it can generate pseudo-random numbers in parallel, unlike Mersenne Twister and similar.
I'm going to run the algorithm on the cpu, and compare it to the gpu result, and see how that goes. The implementation in tensorflow is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/lib/random/philox_random.h
Created a simple cpu-side, easy-to-run, easy-to-introspect philoxrandom script at https://github.com/hughperkins/pub-prototyping/blob/master/cpp/testphilox.cpp , which calls into a copied/modified version of the tensorflow PhiloxRandom generator at https://github.com/hughperkins/pub-prototyping/blob/master/cpp/from_tf/philox_random.h
This then gives the same outputs as calling tf.random_uniform with seed 123:
tensorflow output:
W_cpu [[ 0.04080963 0.20842123 0.09180295 0.70220065]
[ 0.7073133 0.39646494 0.06650937 0.29188633]
[ 0.02963269 0.95492315 0.00610673 0.3169049 ]]
test script output:
0: 1887779136 0.0408096
1: 1293593996 0.208421
2: 3473653811 0.091803
3: 257548726 0.702201
4: 727353662 0.707313
5: 2159198045 0.396465
6: 3498607457 0.0665094
7: 3324337288 0.291886
8: 1115933441 0.0296327
9: 1274690284 0.954923
10: 3053504539 0.00610673
11: 3861418071 0.316905
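As a side note, the float column above is just the uint32 column pushed through the bit trick TensorFlow's random distributions code uses for uint32-to-float conversion: keep the low 23 bits as a float32 mantissa, force the exponent to 127 to land in [1.0, 2.0), and subtract 1.0. A quick Python check against the first row:

```python
import struct

def uint32_to_float(u):
    # Low 23 bits become the mantissa; exponent 127 gives a float in [1.0, 2.0),
    # and subtracting 1.0 maps it to [0.0, 1.0).
    bits = (127 << 23) | (u & 0x7FFFFF)
    return struct.unpack('<f', struct.pack('<I', bits))[0] - 1.0

print(uint32_to_float(1887779136))  # -> ~0.0408096, matching the first row above
```

The same check reproduces the rest of the table, which is a handy way to confirm the uint32 stream (and not the conversion) is where any divergence lives.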
I'm then going to work my way through the opencl kernel, comparing the values in the kernel with those on the cpu, and finding where they start to differ. I'm going to use COCL_DUMP_CL, COCL_LOAD_CL, and COCL_DUMP_CONFIG to inject values I'm interested in into an output buffer, and inspect them. https://github.com/hughperkins/coriander/blob/master/doc/advanced_usage.md#runtime-options
Sounds really promising! Can't wait to see this one fixed, I have a feeling a lot of the test cases are "passing", but aren't really behaving as intended, because of poor initialisation. In the meantime I was planning to just monkey-patch this with the numpy-derived random hack, above, and see how that improves things. :P
The test cases don't need initialization: they test specific operations. Please check them, and let me know any test cases you feel are weak, and how we can improve them.
I mean, when you do py.test -v, those test specific operations. The end-to-end tests are the scripts we are using from Aymeric Damien's Tensorflow-Examples.
But yeah, monkey-patching will work, as long as you don't use AdamOptimizer, which initializes itself from tf.random. It'll initialize itself really poorly currently, and learning will suck. If you use the SGD optimizer, it should work ok-ish.
Sorry, I meant the Tensorflow-Examples repo, yes. :) When I was running those previously I noticed that some of the examples weren't learning at any appreciable rate, which was probably down to the randn initialisation being poor. Or worse, if things were being set to 0, then perhaps a lot of neurons were simply dead. I'm running it all again after monkey-patching Numpy's randn in, and they seem to be learning again.
Good to know about Adam! Possibly the monkey-patch would have to be applied further up the chain to work on Adam, then.. This is all running in a virtualenv, so I don't mind doing radical surgery on the Random lib if it lets me use TF. :P
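A rough sketch of the kind of monkey-patch being described, assuming the draw happens host-side in NumPy and is fed into the graph as a constant (np_random_normal is an illustrative name, not code from either repo):

```python
import numpy as np

def np_random_normal(shape, mean=0.0, stddev=1.0, dtype=np.float32, seed=None):
    """CPU stand-in for tf.random_normal: sample with NumPy instead of the GPU RNG."""
    rng = np.random.RandomState(seed)
    return rng.normal(loc=mean, scale=stddev, size=shape).astype(dtype)

# In a TF script one would then shadow the broken op before building the graph, e.g.:
#   tf.random_normal = lambda shape, **kw: tf.constant(np_random_normal(shape, **kw))
```

The trade-off is that every draw is baked into the graph as a constant, so re-running an op yields the same values; fine for weight initialisation, not for ops that need fresh randomness each step.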
...I'm perusing the code for AdamOptimizer, but I don't actually see any calls to tf.random in it or its superclass. Is this instead happening somewhere else, like in tf.initialize_all_variables?
Hmmm, just noticed my earlier reply failed to send:
Sorry, I meant the Tensorflow-Examples repo, yes. :) When I was running those previously I noticed that some of the examples weren't learning at any appreciable rate, which was probably down to the randn initialisation being poor
Yes, that's true. You are right.
Possibly the monkey-patch would have to be applied further up the chain to work on Adam, then.
Possibly, but it sounds like a lot of work.
And thence onto the new reply :-)
...I'm perusing the code for AdamOptimizer, but I don't actually see any calls to tf.random in it or its superclass. Is this instead happening somewhere else, like in tf.initialize_all_variables?
So, what I would probably do in the first instance is view the graph in tensorboard. tensorboard rocks :-) . https://www.tensorflow.org/get_started/graph_viz
Great advice, thanks!
Update: the tf.random_uniform enhanced tests work on Mac now :-) . Means tf.random_uniform gives the same results as the cpu now, not like before. Fixed in 4c1c544 , but specifically in hughperkins/coriander@6452be7
I'm hoping that the experience with PhiloxRandom on Mac will help me to fix the Ubuntu version.
(well... that's odd... the opencl generated on Mac is different from that generated on Ubuntu. That's quite unexpected... since the opencl generation is my own code, therefore invariant, and it's based on compilation using llvm-4.0, which is also invariant. Oh... different standard libraries. Maybe that's it)
Mac sample:
struct class_tensorflow__random__Array {
    int f0[4];
};
struct class_tensorflow__random__Array_0 {
    int f0[2];
};
struct class_tensorflow__random__NormalDistribution {
    char f0;
};
struct class_tensorflow__random__PhiloxRandom {
    struct class_tensorflow__random__Array f0;
    struct class_tensorflow__random__Array_0 f1;
};
Ubuntu sample:
struct class_tensorflow__random__Array {
    int f0[4];
};
struct class_tensorflow__random__Array_0 {
    int f0[2];
};
struct class_tensorflow__random__NormalDistribution {
    char f0;
};
struct class_tensorflow__random__Array_1 {
    float f0[4];
};
struct class_tensorflow__random__PhiloxRandom {
    struct class_tensorflow__random__Array f0;
    struct class_tensorflow__random__Array_0 f1;
};
An entire extra struct. Exact same kernel being compiled...
Oh, I just found out (as I'm writing this) why random_normal is broken, I reckon. There is a shim for sincosf, but the mangled names are different, so on Mac it is correctly shimmed, but not on Ubuntu. That should be easy to fix...
End of OpenCL on Mac:
/* int v119 = phi v304 */
v119 = v304;
goto v3;
} else {
goto v10;
}
v10:;
goto v11;
v11:;
return;
}
End of OpenCL on Ubuntu:
goto v3;
} else {
goto v10;
}
v10:;
goto v11;
v11:;
return;
}
void _Z7sincosffPfS_(float v1, float* v2, float* v3, local int *scratch) {
}
That sincos bit has a different mangled name than on Mac, so it wasn't detected, but it's easy to tell Coriander the Ubuntu name, and thus detect it.
Oh... I've found the difference... in Tensorflow, in the random distributions code, they have the following:
#if defined(__linux__)
    sincosf(v1, f0, f1);
#else
    *f0 = sinf(v1);
    *f1 = cosf(v1);
#endif
Different code, depending on linux or not :-O
Urgh, so this is an upstream issue: it assumes that Linux's version of the available libs (CUDA?) has an extra function defined? Is that something fixable within the coriander framework, just by delegating to sin/cos?
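The #else branch above is the whole "delegation" already: sincosf(x, s, c) just computes both values in one call. A trivial Python rendering, only to make the intended semantics of the shim concrete:

```python
import math

def sincosf(x):
    """Portable equivalent of the sincosf extension: compute sin and cos together."""
    return math.sin(x), math.cos(x)
```

Any shim that stores those two results through the caller's pointers is behaviourally equivalent to the Linux-only function.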
It's easy to fix, in Coriander, by writing an appropriate shim. There are some shims already https://github.com/hughperkins/coriander/blob/master/src/shims.cpp
Well... easy-ish, since it will be the first shim that needs to handle pointers, and pointers in opencl need their address spaces statically declared. That means I'll need a shim for every combination of address spaces used by the code, i.e., in opencl:
float angle = 0.1f;
global float *globala;
global float *globalb;
float *a;
float *b;
sincos(angle, a, b); // both are private
sincos(angle, globala, globalb); // both are global
sincos(angle, a, globalb); // one private, one global...
local float localb[30];
sincos(angle, a, localb); // one private, one local...
Since OpenCL 1.2 is C99, you can only use the name sincos once, for one pair of address spaces, so there'll need to be a unique function name for every pair of address spaces used, like e.g. sincos_p_p, sincos_g_g, ... So I'll need to write code to handle that. It's ok, I have various ideas on how to handle this; not a challenging issue, but it might take a few man-hours or so.
Fixed in 7d2deb9 :-) . Turned out it didn't need any shims, or address space stuff. Yay :-)
This is great! So, aside from Splitting, this is almost fully-functioning Tensorflow on CL now? :o Congratulations!
split and conv. but yeah :-) Thanks! :-)
(well, "fully-functioning"... more like "has enough functionality to run some basic conv nets". It'd still be nice to have e.g. batch normalization, and some other things, ideally.)
Question: what model(s) do you need to run in priority? I can't do everything at once, so if I know which model(s) you are targeting in priority, I can look at those first.
Cool :-)
Created a new v0.17.3 wheel. Don't suppose... do you mind double-checking that tf.random_normal is working ok for you now? Also, maybe run all the unit tests too, i.e. py.test -v ?
Already on it! :)
:-) . By the way, I'm not sure if the RPATH stuff is correct on ubuntu. If it still fails, it might be because it's using the libcocl.so in /usr/local/lib. So, if it doesn't work after installing the wheel:
- let me know, and show me the output
- and then, after doing that, you might try just downloading the latest coriander, building and installing that, and see if that changes anything (note: coriander != tf-coriander; coriander is the underlying compiler)
I'm sorry to say I don't know what you're referring to re: "RPATH". Happy to help test, if you give me context and/or suggested shell commands to interrogate this.
As long as the py.test -v is passing, it's all good. Good news that that is passing now :-)
Cool! :-)