
OpenCL support · tensorflow · 542 comments · CLOSED

tensorflow avatar tensorflow commented on April 19, 2024 913
OpenCL support

from tensorflow.

Comments (542)

nmabhinandan avatar nmabhinandan commented on April 19, 2024 958

It's strange that Google ditched the open OpenCL standard for proprietary CUDA.
I'm just saying.


gujunli avatar gujunli commented on April 19, 2024 160

I would be interested in expanding TensorFlow with OpenCL, as we have already released OpenCL Caffe: https://github.com/amd/OpenCL-caffe. Hopefully it can be integrated in a lightweight way. Is anyone interested in working together on this?


bhack avatar bhack commented on April 19, 2024 62

👍


benoitsteiner avatar benoitsteiner commented on April 19, 2024 32

My apologies for not contributing more to this discussion recently, my plate has been more than full these past 2 weeks.

I'll be coordinating the OpenCL effort on the TensorFlow side. Our current thinking is:

  • TensorFlow relies on C++11 and has taken a "single source" approach, so SYCL seems like a great fit.
  • We don't have a lot of OpenCL experience in house, so we're collaborating closely with Codeplay to bridge this gap. In particular, Codeplay is currently leading the effort to add SYCL support to the Eigen tensor library.
  • TensorFlow relies on the cuDNN library to compute convolutions on NVIDIA GPUs. If somebody is interested in contributing an OpenCL equivalent, we'd be happy to help.

In order to help structure the effort, I created a mailing list: [email protected].


ebrevdo avatar ebrevdo commented on April 19, 2024 24

At the very least, the Eigen library would have to support OpenCL.


lukeiwanski avatar lukeiwanski commented on April 19, 2024 13

Hi all,

Just to keep you posted, we are still investigating how we can change the Eigen interface to better fit the SYCL/OpenCL 1.2 programming model.
Once we come up with a reasonable approach that targets heterogeneous programming models (not only OpenCL/SYCL), we will create a proposal.

Thanks,
Luke


gujunli avatar gujunli commented on April 19, 2024 13

Please keep me updated. I developed OpenCL Caffe for AMD. I am also looking at TensorFlow.

Thanks.
Junli


dhess avatar dhess commented on April 19, 2024 10

thumbs up and all that.


VincentSC avatar VincentSC commented on April 19, 2024 7

The website http://opencl.org was created to support open-source porting projects just like this one! We're currently installing all the necessary tools on the website and have space for repositories at https://github.com/OpenCL/. Later on we're adding build servers to test on several types of hardware, and we can provide our expertise on how to write code that runs at full speed on a wide range of hardware.

We're launching a porting initiative for GEGL next week, but we're happy to also support you.


lukeiwanski avatar lukeiwanski commented on April 19, 2024 6

Hi all,

Here at Codeplay we are looking into running Eigen's tensors on the GPU using SYCL (a modern C++ layer on top of OpenCL). From what we have gathered so far, the GPU tensor design is very closely coupled with CUDA, and supporting another programming model, particularly a SYCL and OpenCL 1.2 version, will require interface changes.

If anyone is interested in digging deeper / helping out, we are most certainly interested in contributing.

Thanks,
Luke


jszuppe avatar jszuppe commented on April 19, 2024 6

πŸ‘ I can help code some OpenCL/SYCL if someone makes a plan, divides work into tasks etc. I recommend using Boost.Compute as a wrapper for OpenCL (it makes running kernels, testing, templating easier).


commented on April 19, 2024 6

+1.
I have an AMD GPU and an Intel GPU in my laptop. I think both have OpenCL drivers, and AMD's support seems to be much better. With two OpenCL devices I'd get higher performance, so I hope you make it scale across OpenCL devices.


bhack avatar bhack commented on April 19, 2024 5

/cc @ptillet @gongzg Is there any interest in this from Intel? I really hope that we don't fragment OpenCL here like in Caffe, where we have an AMD fork, unmerged Intel PRs, another semi-unofficial AMD PR, and a long-staging user PR (plus two old abandoned OpenCL efforts). Anyone interested in the history can take a look at the BVLC/caffe#2610 comments.


jamesliu96 avatar jamesliu96 commented on April 19, 2024 4

👍


lukeiwanski avatar lukeiwanski commented on April 19, 2024 4

Hi all,

We will coordinate the effort of porting Eigen's tensor module to SYCL for OpenCL, as we already have something mostly working, but it's not ready for review yet.

We are in favour of this approach as it is less invasive to the code base: SYCL supports the single-source templated C++ model that Eigen already uses.

The road-map design is in progress, so it shouldn't be too long now.

Thanks,
Luke


lukeiwanski avatar lukeiwanski commented on April 19, 2024 4

Hi all,

Thanks for the interest!
At this point we are getting our testing infrastructure set up to make sure that nothing we do introduces regressions.
We are in touch with @benoitsteiner to make sure we are in sync with what he's done so far.

We are still compiling a road map for the integration process; it should be done in a couple of weeks, as there are a couple of business details to clarify.

Our goal is to bring OpenCL to TensorFlow via Eigen by the end of this year.

Thanks,


ZirconCode avatar ZirconCode commented on April 19, 2024 3

👍


alexatknit avatar alexatknit commented on April 19, 2024 2

👍


ieee8023 avatar ieee8023 commented on April 19, 2024 2

+1


gongzg avatar gongzg commented on April 19, 2024 2

@bhack We do have interest in this. Thanks for letting me know. If there is a proposal for Eigen's OpenCL/SYCL implementation, we will see what we can do from Intel side.


Iolaum avatar Iolaum commented on April 19, 2024 2

+1

I hope the people working on it manage to solve the cuDNN-alternative problem by the time TensorFlow gets close to 1.0.


hsaputra avatar hsaputra commented on April 19, 2024 2

@martinwicke why is this issue closed?

I don't think your commit fixes this.


vrv avatar vrv commented on April 19, 2024 2

Oh GitHub


nmabhinandan avatar nmabhinandan commented on April 19, 2024 1

would be great.


sasadep avatar sasadep commented on April 19, 2024 1

👍


armish avatar armish commented on April 19, 2024 1

👍


lukeiwanski avatar lukeiwanski commented on April 19, 2024 1

@bhack We are in contact with @benoitsteiner, but we will discuss our proposal with the upstream maintainers before we invest too much effort.

@DanMcLaughlin , @ville-k We are developing our implementation of SYCL, ComputeCpp (https://www.codeplay.com/products/computecpp). For more information, can you please contact me off-list via the email address on my profile?


bhack avatar bhack commented on April 19, 2024 1

@benoitsteiner Thank you for the update. It would be wonderful if all the partners involved in @KhronosGroup (Google, NVIDIA, AMD, Intel, Codeplay, Xilinx, etc.) would promote a cuDNN-like API in a standardized way: a sort of Khronos OpenVX computer-vision standardization effort, but for deep learning.


karlrupp avatar karlrupp commented on April 19, 2024 1

@bhack Which new Google group?

Other than that, OpenCL and CUDA are two very different programming approaches. CUDA works the way it does because one company has full control over everything, so it can embed binary blobs and who knows what in the final executable. This cannot be done with OpenCL, unless one goes down the SYCL path (I have my concerns...) and the SYCL compiler vendor has full control over all possible target architectures (unlikely or impossible in practice). Overall, my opinion is that a good OpenCL-enabled library needs more than just a few tweaks here and there. Probably not what you wanted to hear, but you asked for my opinion :-)


bhack avatar bhack commented on April 19, 2024

@gujunli Nice to see AMD here. /cc @naibaf7 @lunochod


bhack avatar bhack commented on April 19, 2024

/cc @lukeiwanski for Eigen/OpenCL/SYCL


ankdesh avatar ankdesh commented on April 19, 2024

@gujunli Certainly would be interested in contributing. Please let me know when you plan to start.


bhack avatar bhack commented on April 19, 2024

@lukeiwanski Thank you for the feedback. I think that @benoitsteiner worked at the tensor extension part of eigen.


bhack avatar bhack commented on April 19, 2024

There is an interesting initiative at https://github.com/ptillet/isaac, even if here we rely on the Eigen tensor extension.


DanMcLaughlin avatar DanMcLaughlin commented on April 19, 2024

I also would like to contribute. @benoitsteiner can you organize it?


bhack avatar bhack commented on April 19, 2024

This was included in the roadmap but also tagged as a contribution, so some direction/bootstrapping could be really useful.


gujunli avatar gujunli commented on April 19, 2024

I can help organize it. Who is responsible for OpenCL support in TensorFlow now?

Thanks a lot.
Junli



Junli Gu--谷俊丽
Coordinated Science Lab
University of Illinois at Urbana-Champaign



DanMcLaughlin avatar DanMcLaughlin commented on April 19, 2024

I just assumed Benoit because he self-assigned the feature, but I think you've got it, Junli! Maybe start with an email or forum thread of interested parties?


martinwicke avatar martinwicke commented on April 19, 2024

@benoitsteiner knows more about interested parties that may not have shown up in this thread (or this issue). I'd wait for him to coordinate, to make sure we avoid duplicating work.



MikalaiDrabovich avatar MikalaiDrabovich commented on April 19, 2024

I'm interested. Is there any roadmap?



hsaputra avatar hsaputra commented on April 19, 2024

Is there a list of the CUDA libraries that TensorFlow relies on?

That would help us see whether there are immediate OpenCL alternatives.


naibaf7 avatar naibaf7 commented on April 19, 2024

@hsaputra
There are clFFT and clBLAS (or alternatively ViennaCL). The random number generator is a bit trickier (there is no cuRAND equivalent): either use a CPU generator and transfer the results to the GPU, or use another existing kernel for RNG.

The biggest pitfall will again be efficient convolution implementations (something like cuDNN).

There is experience about such issues here:
BVLC/caffe#2610
BVLC/caffe#2195
https://github.com/amd/OpenCL-caffe


bhack avatar bhack commented on April 19, 2024

TensorFlow uses the tensor extension upstreamed to Eigen, so I think OpenCL/SYCL support in Eigen is needed. See this thread.


hsaputra avatar hsaputra commented on April 19, 2024

Thanks @naibaf7. Yeah, I don't think there is a viable cuDNN alternative for OpenCL right now.


DanMcLaughlin avatar DanMcLaughlin commented on April 19, 2024

@bhack From that thread and here it seems like @lukeiwanski is looking into it. I think we have enough willing people to work on it; we just need @benoitsteiner, @lukeiwanski, or @gujunli to coordinate. Benoit has been quiet; maybe he's on holiday.


hsaputra avatar hsaputra commented on April 19, 2024

I would love to help contribute with this initiative.


bhack avatar bhack commented on April 19, 2024

@lukeiwanski Are you working with, or in contact with, upstream? Do you think this will be accepted upstream in Eigen?


Konard avatar Konard commented on April 19, 2024

+1


DanMcLaughlin avatar DanMcLaughlin commented on April 19, 2024

Great news @lukeiwanski, let us know of any help you need.

I guess you are using your own implementation of SYCL. Will it be available to developers/researchers? On what platforms?


ville-k avatar ville-k commented on April 19, 2024

@lukeiwanski SYCL seems like the right way to go, given the amount of template metaprogramming involved in Eigen. I'm an experienced C++ developer with OpenCL experience gained from developing my own neural-net and linear-algebra library. I'd love to help with this effort and get started developing with SYCL.


MikalaiDrabovich avatar MikalaiDrabovich commented on April 19, 2024

@lukeiwanski is there any update/estimate regarding plans?


strin avatar strin commented on April 19, 2024

interested. would love to contribute.


bhack avatar bhack commented on April 19, 2024

OK, so it actually seems this is a Codeplay effort with some kind of sync with Google internally. What is the role of the AMD and Intel subscribers here?


bhack avatar bhack commented on April 19, 2024

/cc @keryell if you have any interest on this from SYCL/FPGA universe


keryell avatar keryell commented on April 19, 2024

@bhack Sure, I have some interest in high-end C++ on FPGA :-)
TensorFlow sounds like a good validation use case for triSYCL too.
By the way, if some people here are looking for internships on this subject, I have some positions. It looks like Codeplay is looking for people too, if I trust their website.


bhack avatar bhack commented on April 19, 2024

I'm really interested in @karlrupp's and @hughperkins's opinions. I hope they want to join the discussion in the new Google group.


bhack avatar bhack commented on April 19, 2024

@karlrupp See #22 (comment) at the end for the Google group.
I asked for your opinion because you have great experience with ViennaCL, interfacing an algebra library with multiple backends (CPU, GPU, MIC). TensorFlow relies on the Eigen library and its new tensor extension, contributed upstream by Google (but only with a CUDA backend). I think they haven't yet run into all the pitfalls you have already encountered over the years of ViennaCL development.


keryell avatar keryell commented on April 19, 2024

@bhack We are currently at the face-to-face meeting in Seattle this week but of course I cannot say whether we are talking about DNN libraries or not... :-)


bhack avatar bhack commented on April 19, 2024

@keryell Try to push the cause in Seattle ;)


keryell avatar keryell commented on April 19, 2024

@karlrupp You are right, OpenCL and CUDA are two very different programming approaches. The single-source aspect found, for example, in CUDA and OpenMP 4.5 is extremely powerful from a software-engineering perspective. This is why there is the SYCL standard for real C++ programmers. SYCL can be seen as CUDA on steroids, without any language extension and with some OpenMP aspects (the tasks). A typical SYCL device compiler is expected to generate SPIR-V kernels.

Your concerns about portability are less of an issue with the SPIR-V standard (a kind of portable equivalent of NVIDIA PTX/AMDIL/... in the Vulkan & OpenCL world), which is mandatory to accept in OpenCL 2.1 and Vulkan. The beauty is that if you have a front end that generates SPIR-V, you do not need special knowledge of the very details of the hardware you run on. There is a Khronos open-source bidirectional translator between LLVM IR and SPIR-V, so it opens up quite new territories.


karlrupp avatar karlrupp commented on April 19, 2024

@keryell I agree that SPIR-V is a step forward. However, it does not address all issues of exhaustive jitting.

you do not need special knowledge of the very details of the hardware to run on

Is this a copy&paste from OpenCL 1.0 marketing, which claimed exactly the same? You will always need to go down to the details of the underlying hardware if you aim for maximum performance. This is especially the case in the context of fast tensor contractions.


bhack avatar bhack commented on April 19, 2024

...as @scott-gray demonstrated with neon


keryell avatar keryell commented on April 19, 2024

@karlrupp

Is this a copy&paste from OpenCL 1.0 marketing, which claimed exactly the same?

Haha. :-)

You will always need to go down to the details of the underlying hardware if you aim for maximum performance. This is especially the case in the context of fast tensor contractions.

Of course, but before playing with second-order optimizations, it is useful to have the bulk of the whole templated C++ code running in some accelerated way.

For the optimization, either you stitch in your optimized binary kernels à la NervanaSys or, since SYCL is pure C++, you can use asm("...") in it with a lot of #ifdef to test for the target architecture. :-) That said, SPIR-V is itself extensible, and I cannot see why we could not put inline VHDL or Verilog in it at some point. :-)

More concretely, the recent introduction of sub-group operations should help to achieve good performance in a portable way, and using simple built-in ad-hoc functions may help too.

C++ adds interesting metaprogramming features that allow replacing most of the code generators, such as those used in clBLAS and other frameworks, to generate code better adapted to this or that hardware.


bhack avatar bhack commented on April 19, 2024

N4355 could also enter the game in C++17 sooner or later.


benoitsteiner avatar benoitsteiner commented on April 19, 2024

@karlrupp, @bhack The TensorFlow approach is to rely on a hardware abstraction (the tensor module) for the majority of the operations needed by a typical neural network, while relying on specialized libraries (such as cuDNN) for the few operations that are really performance-critical. The hardware abstraction enables us to implement most TensorFlow operations once and have them run on an accelerator with more than good enough performance.


keryell avatar keryell commented on April 19, 2024

@bhack Yes, I love multidimensional arrays. Also, in our domain of interest there is SG14 in the C++ committee, which tries to get all the people interested in these issues to converge in the standard.
https://groups.google.com/a/isocpp.org/forum/#!forum/sg14
Of course SYCL is in the discussions. :-)


bhack avatar bhack commented on April 19, 2024

@benoitsteiner Mainly cuDNN, for pooling and convolution. If every vendor produces its own API for these operations, with its own hardware and its own binary assembly, that will not be a very scalable approach. That is why I think some performance-critical API calls had better be standardized in some way.


bhack avatar bhack commented on April 19, 2024

@keryell There are really interesting topics for matrix/tensor support in the new SG14 C++ group, especially on the vector/SIMD agenda. But it seems that nobody has talked about convolution, pooling, and other useful "stabilized" deep-learning interfaces. It also seems that this standardization subgroup has people from NVIDIA, Intel, AMD, Codeplay, etc., but not from Google, even though Google is in other groups.


Andyccs avatar Andyccs commented on April 19, 2024

👍


keryell avatar keryell commented on April 19, 2024

@bhack Yes, there is no machine-learning-style proposal in SG14 yet. But participation is open, so you can send some proposals. :-) Perhaps SG6 (numerics topics) is more relevant, though; I do not think they have their own mailing list/forum yet.


krikru avatar krikru commented on April 19, 2024

@gujunli Does OpenCL Caffe run on Android? Sorry for asking this here, but I didn't find anywhere else to ask it :) It would be great to have a deep learning library that runs on Android devices and can use the GPU, but there seem to be none at the moment. (Correct me if I'm wrong!)


naibaf7 avatar naibaf7 commented on April 19, 2024

@krikru
The official (but experimental) OpenCL Caffe branch can be made to run on Android GPUs, however the performance at the moment is far from optimal. See sh1r0/caffe-android-lib#23 and https://github.com/BVLC/caffe/tree/opencl.


bhack avatar bhack commented on April 19, 2024

A real alternative to cuDNN could be extending the OpenVX standard objects with support for Tensor, NdConvolution, and NdPooling operators, and probably some other operators that could be considered standardizable.
The cuDNN team also has to choose which new APIs and operators to introduce in every release. Of course a standard cannot move as fast as cuDNN releases, but I think some operations and objects have enough "citation history" to be standardized.


krikru avatar krikru commented on April 19, 2024

@hughperkins At the moment I haven't tried any deep learning library; I'm just doing some scouting to see which library I could potentially use. Have you tried cltorch and DeepCL on Android? I just assumed cltorch worked on Android, since there is an implementation of Torch dedicated specifically to Android. And why would you have such an implementation if there already was one that both worked on Android and used OpenCL, right? But maybe I should have known better.


krikru avatar krikru commented on April 19, 2024

@hughperkins For some reason I imagined that torch-android was an official Torch implementation for Android, meaning that no other Torch implementation (at least no official one) was likely to run smoothly on Android, including cltorch. I don't know why I thought that; of course it doesn't make any sense.


hughperkins avatar hughperkins commented on April 19, 2024

Well... Soumith kind of coordinates Torch development. He works at Facebook AI Research, and since the torch-android repo belongs to Soumith, I would say it's fairly close to official, though it maybe is not part of the core for some reason. You could ask the question as an issue in that repo, or in https://groups.google.com/forum/#!forum/torch7. Actually, since Soumith is kind of the main person handling requests in that forum, I reckon you probably want to post your question there.


hughperkins avatar hughperkins commented on April 19, 2024

meaning that no other Torch implementation (at least not official) was likely to run smoothly on Android, including cltorch

Note that cltorch is not an implementation of Torch. It's a plugin that provides OpenCL. You need both.


krikru avatar krikru commented on April 19, 2024

Note that cltorch is not an implementation of Torch. It's a plugin that provides OpenCL. You need both.

Ah, thanks for the clarification.


krikru avatar krikru commented on April 19, 2024

@naibaf7 Do the OpenCL Caffe branch and AMD's OpenCL Caffe implementation have anything more in common besides the name? Have you compared the two, or do you know if there is any difference in performance? You write that the OpenCL branch is far from optimal performance; what does that mean, and what would be necessary to improve it? It would be interesting to try it on Android.


bhack avatar bhack commented on April 19, 2024

We are going off topic


krikru avatar krikru commented on April 19, 2024

@bhack Yeah, sorry for hijacking this thread. I just didn't know where else to ask the question.


naibaf7 avatar naibaf7 commented on April 19, 2024

@krikru
please raise an issue about it on the Caffe branch, flag it with Android and OpenCL. Then we can discuss this further. Thanks.


bhack avatar bhack commented on April 19, 2024

@keryell It seems that the next f2f SG14 meeting in March will be hosted by Google. Will any TensorFlow people be there?


bhack avatar bhack commented on April 19, 2024

/cc @jfbastien


keryell avatar keryell commented on April 19, 2024

Perhaps @benoitsteiner could drop by, since he is local.
But before this event there is the full C++ F2F at the end of the month in Jacksonville, Florida.
https://isocpp.org/files/papers/N4568.pdf
Unfortunately I will not be able to attend any of them.


bhack avatar bhack commented on April 19, 2024

I don't know if the CppCon 2015 talk "C++ Multi-dimensional Arrays for Computational Physics and Applied Mathematics" generated any paper follow-up.


dimchansky avatar dimchansky commented on April 19, 2024

+1


keryell avatar keryell commented on April 19, 2024

@bhack Thank you for pointing out the talk on multi-dimensional arrays. It is interesting and addresses the real issues, but it looks too ad hoc to be ratified into C++ as is. Personally I use Boost.MultiArray, and I would have more confidence in a polished version of Boost.MultiArray.


bhack avatar bhack commented on April 19, 2024

There are also some papers at WG21. As you can see, @jfbastien at Google has some activity at WG21 and also helped to host the SG14 f2f meeting at Google in March.


jfbastien avatar jfbastien commented on April 19, 2024

@bhack @keryell I think it would be worth taking this discussion to the SG14 mailing list as the details aren't related to OpenCL / tensorflow.


bhack avatar bhack commented on April 19, 2024

Yes, it's probably no longer so strictly on-topic here with all the details. Apart from Eigen/SYCL support, is there a plan for the cuDNN calls?


andyyehoo avatar andyyehoo commented on April 19, 2024

+1, very interesting topic. I hope it's coming soon.


strin avatar strin commented on April 19, 2024

This thread is very interesting. I've been trying to get Caffe to work on Android. The results seem surprising: Caffe running on the Mali GPU seems to be 2-3x slower than the CPU, but about 4-5x more energy efficient. The test was run on a Galaxy S6 (Mali T760, peak performance 200 GFLOPS).

Since GEMM is the core of convolution in Caffe, I decided to profile its performance on Android. It seems that ViennaCL is not as efficient as some simple hand-written kernels. I am now able to get the GPU to run as fast as the CPU for large matrices (2k x 2k). This is still counter-intuitive, since normally we expect GPUs to be much faster.

See:
https://github.com/strin/mocha-profile

The kernel implementations can be found here:

OpenCL kernels for GEMM: https://github.com/strin/gemm-android

Any thoughts?


bhack avatar bhack commented on April 19, 2024

@strin Have you already followed this thread https://community.arm.com/thread/4935?


strin avatar strin commented on April 19, 2024

@bhack Thanks for sharing. That thread looks very interesting. I tried turning off DVFS as suggested, but no significant performance gain was seen for SGEMM in ViennaCL.


iaroslav-ai avatar iaroslav-ai commented on April 19, 2024

+1


bhack avatar bhack commented on April 19, 2024

@strin Have you tried the latest SGEMM version in the Mali SDK?


edmondja avatar edmondja commented on April 19, 2024

TensorFlow is late! Haha.
https://gist.github.com/jarutis/ff28bca8cfb9ce0c8b1a


bhack avatar bhack commented on April 19, 2024

This will have an impact on the strategy: http://lists.llvm.org/pipermail/llvm-dev/2016-March/096576.html?
EDIT:
"StreamExecutor is currently used as the runtime for the vast majority of Google's internal GPGPU applications, and a snapshot of it is included in the open-source TensorFlow_ project, where it serves as the GPGPU runtime."


bhack avatar bhack commented on April 19, 2024

You can't always use the same commit comments in different repositories ;) tensorflow/skflow#22

