hughperkins / deepcl

OpenCL library to train deep convolutional neural networks

License: Mozilla Public License 2.0

C 7.41% Python 7.40% JavaScript 0.91% C++ 81.67% Shell 0.35% CMake 1.89% Batchfile 0.37%

deepcl's Introduction

DeepCL

OpenCL library to train deep convolutional networks

  • C++
  • OpenCL
  • Deep convolutional
  • Python wrappers
  • Lua wrappers
  • Q-learning

APIs:

Layer types:

  • convolutional
  • max-pooling
  • normalization
  • activation
  • dropout
  • random translations
  • random patches
  • loss

Loss layer types:

  • softmax
  • cross-entropy (synonymous with multinomial logistic, etc)
  • square loss

Trainers:

  • SGD
  • Anneal
  • Nesterov
  • Adagrad
  • Rmsprop
  • Adadelta

Activations:

  • tanh
  • scaled tanh (1.7159 * tanh((2/3) * x))
  • linear
  • sigmoid
  • relu
  • elu (new!)
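
As a rough illustration of the activation formulas listed above, the following Python sketch evaluates each one for a scalar input. The function names are illustrative only and are not DeepCL identifiers. The scaled tanh uses the constant 1.7159 that is commonly quoted for this variant, and ELU is shown with alpha = 1, which is an assumption of this sketch.

```python
import math

def tanh(x):
    return math.tanh(x)

def scaled_tanh(x):
    # 1.7159 * tanh((2/3) * x); with this constant, scaled_tanh(1) is close to 1
    return 1.7159 * math.tanh((2.0 / 3.0) * x)

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def elu(x, alpha=1.0):
    # alpha = 1.0 is an assumption made for this sketch
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

# Print each activation at a few sample points
for fn in (tanh, scaled_tanh, linear, sigmoid, relu, elu):
    print(fn.__name__, [round(fn(x), 4) for x in (-1.0, 0.0, 1.0)])
```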

Loader formats:

  • jpegs
  • mnist
  • kgsv2
  • norb

Weight initializers:

  • original
  • uniform
  • more possible...

Multicolumn net also possible, as in McDnn

Example usages

  • obtained 37.2% test accuracy, on next move prediction task, using 33.6 million training examples from kgsgo v2 dataset
    • commandline used ./deepcl_train dataset=kgsgoall netdef=12*(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001
    • 2 epochs, 2 days per epoch, on an Amazon GPU instance, comprising half an NVidia GRID K520 GPU (about half as powerful as a GTX780)
  • obtained 99.5% test accuracy on MNIST, using netdef=rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-10n numepochs=20 multinet=6 learningrate=0.002
    • epoch time 99.8 seconds, using an Amazon GPU instance, ie half an NVidia GRID K520 GPU (since we are learning 6 nets in parallel, so 16.6seconds per epoch per net)
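
The `12*(32c5z-relu)` prefix in the kgsgo command above is a repeat shorthand: the parenthesised group is repeated 12 times before the remaining layers. The sketch below shows one way such an expansion could be done. `expand_netdef` is a hypothetical helper written for illustration; it is not DeepCL's own parser and only handles non-nested groups.

```python
import re

def expand_netdef(netdef):
    """Expand every 'N*(inner)' group into N dash-joined copies of inner."""
    pattern = re.compile(r"(\d+)\*\(([^()]*)\)")
    while True:
        match = pattern.search(netdef)
        if match is None:
            return netdef
        count, inner = int(match.group(1)), match.group(2)
        expanded = "-".join([inner] * count)
        netdef = netdef[:match.start()] + expanded + netdef[match.end():]

expanded = expand_netdef("12*(32c5z-relu)-500n-tanh-361n")
layers = expanded.split("-")
print(expanded)
print(len(layers), "layer tokens")
```

A netdef without a repeat group, such as the MNIST one above, passes through unchanged.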

Installation

Native library installation

This section installs the native libraries, and the command-line tools. You always need to do this part, even if you will use the Python wrappers.

Windows

Pre-requisites:

  • OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed
  • Tested using Windows 2012 RC2 and (New!) Visual Studio 2015; this is how the CI builds run

Procedure:

  • Download latest binary zip file from http://deepcl.hughperkins.com/Downloads/ (eg from v8.0.0rc8)
  • unzip it, which creates the dist folder
  • To test it:
    • open a cmd
    • run call dist\bin\activate.bat (adjusting the path appropriately for wherever you downloaded deepcl binaries to)
    • now, eg try deepcl_unittests
    • (New!), you can choose which gpu to run tests on now, eg: deepcl_unittests gpuindex=1

Note that you need to "activate" the installation each time you open a new cmd prompt (or you could add appropriate environment variables permanently, using Control Panel | System | Advanced System Settings | Environment Variables)

Linux

Pre-requisites:

  • OpenCL-enabled GPU or APU, along with appropriate OpenCL driver installed (can check by running clinfo, which should show your desired GPU device)
  • Tested using Ubuntu 14.04 32-bit/64-bit

Procedure:

  • Download latest tar file from http://deepcl.hughperkins.com/Downloads/ (eg from v8.0.0rc8)
  • untar it, which creates the dist sub-folder
  • in a bash prompt, run source dist/bin/activate.sh (adjust the path appropriately for wherever you untarred the binaries tar file to)
  • test by doing, from the same bash prompt, eg deepcl_unittests
    • (New!), you can choose which gpu to run tests on now, eg: deepcl_unittests gpuindex=1

Note that you need to "activate" the installation each time you open a new bash prompt (or you can call activate.sh from your .bashrc file)

Python wrappers

  • make sure you already installed the native library, and "activate"d it, by doing call dist\bin\activate.bat, or source dist/bin/activate.sh
  • run pip install --pre DeepCL
  • test by doing python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"

To build from source

Building from source is only needed if installing from binaries doesn't work for your configuration, or if you want to modify DeepCL.

See Build.md

What if it doesn't run?

  • Check if you have an OpenCL-enabled device on your system
    • ideally a GPU, or accelerator, since there is no attempt to optimize DeepCL for CPUs (at least, not currently, could change, feel free to submit a pull request :-) )
  • Try running gpuinfo (from EasyCL, but built as part of this project too, for ease of use )
    • it should output at least one OpenCL-enabled device
    • if it doesn't, then you need to make sure you have an OpenCL-enabled device, and that appropriate drivers are installed, and that the ICD is configured appropriately (registry in Windows, and /etc/OpenCL/vendors in linux)

What if I need a new feature?

Please raise an issue, let me know you're interested.

  • If it's on my list of things I was going to do sooner or later anyway (see below), I might do it sooner rather than later.
  • If it's to do with usability, I will try to make that a priority

What if I want to contribute myself?

  • please feel free to fork this repository, tweak things, send a pull request. Or get in contact. Or both :-)

Third-party libraries

Hardware/driver specific issues

Related projects

License

Mozilla Public License 2.0

Recent changes

  • 2017 May 2nd:
    • branch update-easycl-mac updated to latest EasyCL, and unit-tests tested on Mac Sierra against:
      • Intel HD Graphics 530 GPU
      • Radeon Pro 450 GPU
    • This latest EasyCL lets you use environment variable CL_GPUOFFSET to select gpus, eg set to 1 for second GPU, or 2 for third
    • Thank you to my employer ASAPP for providing me use of said Mac Sierra :-)
  • 7th August 2016:
    • "standard" version of windows compiler changed from msvc2010 to msvc2015 update 3 (no change to linux/mac)
    • "standard" version of python 3.x on windows changed from 3.4 to 3.5 (no change to linux/mac)
    • (note: python2.7 continues to work as before on all of Windows 32/64, linux, Mac)
    • standard c++ version on linux/mac changed from c++0x to c++11
  • 29th July 2016:
    • python fixes:
      • CHANGE: must use numpy tensors now, array.array no longer accepted
      • New feature: can provide numpy tensors as 4d tensors now, no longer have to be 1d tensors
      • Bug fix: q-learning working again now (hopefully)
  • 26th July 2016:
    • fixed some bugs in manifest loader
    • no longer need to specify the number of images in the first line of the manifest file
    • added gpuindex= option to deepcl_unittests (quite beta for now...)
  • 4th January 2016:
    • fixed a number of build warnings on Mac, both in OpenCL build, and C++ build
  • 3rd January 2016:
  • 27th November:
  • Week of 26th October:
    • created branch clblas-2.8.0, which works with Visual Studio 2015. It uses the latest 2.8.x release of clBLAS. Thank you to jakakonda for helping to test this and get it working.
  • Aug 28th:
    • merged 8.x branch to master, will release first version of 8.x shortly
    • installation of 8.x from binaries on Windows works now, by doing, eg on 32-bit Windows 7, and assuming you already activated an appropriate python environment (assumes 7-zip is installed, in default location, otherwise do the unzip by hand):
powershell Set-ExecutionPolicy unrestricted
rem following command is like `wget` in linux:
powershell.exe -Command (new-object System.Net.WebClient).DownloadFile('http://deepcl.hughperkins.com/Downloads/deepcl-win32-v8.0.0rc8.zip', 'deepcl-win32-v8.0.0rc8.zip')
rem following command is like `tar -xf` in linux:
"c:\program files\7-Zip\7z.exe" x deepcl-win32-v8.0.0rc8.zip
call dist\bin\activate.bat
pip install --pre DeepCL
python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"
# (last line is just to check works ok)
  • Aug 26th: installation of 8.x from binaries on linux works now, by doing, eg on 64-bit Ubuntu 14.04:
mkdir 8.0.0rc4
cd 8.0.0rc4
wget http://deepcl.hughperkins.com/Downloads/deepcl-linux64-v8.0.0rc4.tar.bz2
tar -xf deepcl-linux64-v8.0.0rc4.tar.bz2
virtualenv env
source env/bin/activate
source dist/bin/activate.sh
pip install --pre DeepCL
python -c "import PyDeepCL; cl = PyDeepCL.DeepCL()"

(last line is just to check works ok)

  • Aug 21st-24th:
    • 8.x finally builds again on all CI tested configurations!
      • ubuntu 14.04 32-bit Python 2.7
      • ubuntu 14.04 32-bit Python 3.4
      • ubuntu 14.04 64-bit Python 2.7
      • ubuntu 14.04 64-bit Python 3.4
      • visual studio 2010 32-bit python 2.7
      • visual studio 2010 32-bit python 3.4
      • visual studio 2010 64-bit python 2.7
      • visual studio 2010 64-bit python 3.4
  • Aug 19th-20th:
    • Python wrappers now built using a very thin setup.py layer, on top of the standard native DeepCL build
  • Aug 18th:
    • added BackwardIm2Col layer, which uses im2col for backward propagation
    • added BackpropWeightsIm2Col layer, which uses im2col for weight update
    • added BackwardAuto layer, which automatically selects fastest Backward layer
    • added BackpropWeightsAuto layer, which automatically selects faster weight update layer
    • under the covers:
      • created ClBlasHelper, to handle Gemm and Gemv
      • factorized im2col into Im2Col class
  • week up to Aug 17th:
    • added forward and backward im2col layer
    • forward im2col automatically used during forward propagation, where appropriate
    • backwards has yet to be integrated
    • under the covers:
      • added clBLAS
      • migrated the Python build process to use cmake, rather than setup.py (whether this turns out to be good or bad is a bit up in the air for now)
  • June 22nd:
    • removed lua wrappers
    • if you want to use lua with OpenCL, please consider using cltorch and clnn

To get in contact

Just create an issue on GitHub, using the link in the top right of this page. Don't worry about whether you think the issue sounds silly or anything. The more feedback the better!

Note that I'm currently focused 100.000% on cuda-on-cl, so please be patient during this period.

deepcl's People

Contributors

0stackoverflow0, hughperkins, ignotus, jmoudrik, kennethban, kikaxa, m-smith, maggeych, marty1885, merceyz, minilight, numerio, slimeq

deepcl's Issues

Factorize kernels

Factorize kernels:

  • eg move activation to separate layers
  • need to check this doesn't affect performance too much, hence the end-to-end benchmarking needs to be in place first.

Get swig python wrappers working with numpy.i

  • the cython python wrappers, in python directory, can accept numpy arrays (I think...)
  • however, the swig-based ones, in python_swig directory, cannot yet
  • numpy provides numpy.i, in the tools\swig directory of their .tar.gz distribution
  • could be good to integrate numpy.i into the python swig wrappers, so can directly pass numpy arrays into the python swig wrappers

elu activation

Hi,
can it be that your derivative of the elu activation function is wrong?

elu (alpha = 1 is left out of the equation):
forward: x >= 0 ? x : exp(x) - 1 (correct)
backward: x >= 0 ? 1 : exp(x), instead of x >= 0 ? 1 : x + 1

thanks,
filip
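
One way to compare the two backward expressions is a numerical gradient check. With alpha = 1, the derivative of exp(x) - 1 is exp(x); because the forward output is y = exp(x) - 1, the same value can also be written as y + 1. So an expression of the form "value + 1" matches the true derivative only when "value" is the layer's output rather than its input. Whether DeepCL's kernel receives the output or the input is not stated in this issue, so treat that as something to verify against the source. The function names below are illustrative.

```python
import math

def elu(x):
    # Forward pass with alpha = 1
    return x if x >= 0 else math.exp(x) - 1.0

def elu_grad_from_input(x):
    # Derivative written in terms of the input x
    return 1.0 if x >= 0 else math.exp(x)

def elu_grad_from_output(y):
    # For x < 0, y = exp(x) - 1, so exp(x) = y + 1
    return 1.0 if y >= 0 else y + 1.0

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for x in (-2.0, -0.5, 0.5, 2.0):
    print(x,
          elu_grad_from_input(x),
          elu_grad_from_output(elu(x)),
          numeric_grad(elu, x))
```

Feeding the raw input into the "+ 1" form instead (for example x = -2 giving -1) disagrees with the numerical gradient, which is the discrepancy the issue describes.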

Unit tests fail

Hi, I ran unit tests and ran into some errors.
Compiled with Visual C++ 2015 x64 on Windows 7 with Radeon 7970.

Different errors:
"unknown file: error: C++ exception with description "memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size" thrown in the test body.
[ FAILED ] testforward.compare_1_n_biased_nopad (2230 ms)"

"error: Expected: (0.1f) >= (loss), actual: 0.1 vs 2.72727
clblas teardown
[ FAILED ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18 (11310 ms)"

"ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical"

"Something went wrong, code -55" thrown in the test body.
[ FAILED ] testbackward.compare_1_n_kgsgo_32c5 (843 ms)"

"ForwardAuto: kernel 2: this instance cant be used: cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize"

Full log http://pastebin.com/v9d7ZMFu

Gpuinfo output:

platform index: 0:
platform id: 000007FEDD2AF180
platform vendor: Advanced Micro Devices, Inc.
platform name: AMD Accelerated Parallel Processing
platform num devices: 2

device index: 0
device id: 0000000000379A20
device type: 4
global memory size: 3072MB
local memory size: 32KB
global cache size: 16KB
global cacheline size: 64
max memory alloc size: 2112MB
max compute units: 32
max workgroup size: 256
max workitem dimensions: 3
max workitem sizes: 256 256 256
device name: Tahiti
opencl c version: OpenCL C 1.2
opencl device version: OpenCL 1.2 AMD-APP (1800.8)
frequency MHz: 925

Install on mac (el capitan)

Hello,
Is there any tutorial on how to install it on mac and run with XCode? It could be very helpful.
If I manage to install it correctly, I can write a tutorial, but for now I can't help very much.
Thank you.

Where/How to start reading csv data?

Hi,
My problem data is available in CSV format. I have found no documentation on how to load this kind of data, and I am surprised that no one else has requested this feature.
As a developer, I would simply add a custom Loader class.
It may be useful to others too, so it should follow your guidelines. Once it is done, I would send it as a pull request.

Get ctrl-c working in python swig wrappers

  • the cython wrappers, in the python subdirectory, currently abort correctly when ctrl-c is pressed (tested on linux at least)
  • the swig ones, in the python_swig subdirectory, do not
  • would be good to make them do so :-)

DeepCL and optical flow computation

Hi there,

I'm sorry if this is not the best way to contact you. I'm not really reporting an issue; rather, I'd like to ask a question. Do you believe that DeepCL could be suitable for estimating optical flow from video frames? An example of what I mean is available at http://damienteney.info/cnnFlow.htm. That project is very good, but it is based on MatConvNet, which only supports CUDA computations, while I need an OpenCL-based solution.

Thank you for your time.

Best regards,

Davide

VS2015 Build Errors

Hi!

I'm trying to build the DeepCL libraries from scratch with Visual Studio 2015, using the MSVC 14.0 compiler. I know that this is not officially supported, but I love the way this library is written (and that it supports OpenCL), so I decided to give it a shot.

The error below is shown ~4500 times on a couple of different lines, and I have no idea where to start dealing with it.

I also tried building clBLAS as a standalone cloned from clMathLibraries/clBlas git repository and it works without any problems.
Same problem occurs on 32 and 64bit compiler settings, tested under Win 10 x64.

Severity Code Description Project File Line
Error C2719 'alpha': formal parameter with requested alignment of 16 won't be aligned DeepCL D:\DeepCL-Source\clMathLibraries\clBLAS\src\clBLAS.h 4149
Error C2719 'beta': formal parameter with requested alignment of 16 won't be aligned DeepCL D:\DeepCL-Source\clMathLibraries\clBLAS\src\clBLAS.h 5663

Hook for model serialization

I've modified your qlearning example to play tetris and all is fine and dandy, but it would be nice to save the net somehow so that I can resume training or use the trained net in another application. Are there any entry points in the current API for doing this, or do I need to write something myself?

Predicting use kgsgo filter

Hello,
I'm a noob in this domain. After training a 'weights.dat' filter with the kgsgov2 dataset, I'm wondering how to predict (generate) a move from a given board state (maybe I'm missing some document that explains how it works). I tried using testLoadSgf.py to make a binary file as the input file for prediction, and this appears:

ManifestLoaderv1 checking format for move1.dat
matched: 0
GenericLoader::getDimensions
trainFilepath: move1.dat
headstringGO
Something went wrong: Filetype of move1.dat not recognised

please help, thanks!
Deryk.

PyDeepCL: NetLearner not callable with new trainer types

PyDeepCL.NetLearner(adadelta, net, N, images, labels, testN, testImages, testLabels, batchSize)
results in
TypeError: Argument 'sgd' has incorrect type (expected PyDeepCL.SGD, got PyDeepCL.Adadelta)

Same error with Anneal, Nesterov, Adagrad and Rmsprop.

Doc from NeuralNetAPI gives trainer:

NetLearner netLearner(
    trainer, net,   // <<
    Ntrain, trainData, trainLabels,
    Ntest, testData, testLabels );

Float values as output

Is there a way to get float values as "labels" data?
The idea would be to use the network as a regressor, of course.

Replacing InputMaker with InputLayerMaker

This line from the NN API:

net->addLayer(InputMaker::instance()->numPlanes(1)->imageSize(28));

Causes the compilation error:

nn.cpp: In function 'int main()':
nn.cpp:11:19: error: incomplete type 'InputMaker' used in nested name specifier
  net->addLayer(InputMaker::instance()->numPlanes(1)->imageSize(28));
               ^

Following the code in the test files though (testsimpleconvolvenet.cpp#L236), and using InputLayerMaker, compilation succeeds

net->addLayer(InputLayerMaker::instance()->numPlanes(1)->imageSize(28));

Should the docs be changed to use InputLayerMaker instead of InputMaker, or is my library linking incorrect?

Thanks

Error when run "deepcl_unittests.exe" on Win7 x64

error code:-1

C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.

It seems that it can't find my GPU, but other tests can find it.
You can see the last part of test log as follows:

[ RUN      ] testsgd.basic
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce GTX 760
layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilter
s=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
layer 2:SquareLossLayer{}

inputtotalsize=50 outputTotalSize=18
forward try kernel 0
  ... not plausibly optimal, skipping
forward try kernel 1
   ... seems valid
ForwardAuto: kernel 1 0ms
calcGradWeights try kernel 0
  ... not plausibly optimal, skipping
calcGradWeights try kernel 1
   ... seems valid
BackpropWeightsAuto: kernel 1 0ms
[       OK ] testsgd.basic (234 ms)
[----------] 1 test from testsgd (234 ms total)

[----------] 9 tests from testCLMathWrapper
[ RUN      ] testCLMathWrapper.assign
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.assign (0 ms)
[ RUN      ] testCLMathWrapper.assignScalar
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.assignScalar (0 ms)
[ RUN      ] testCLMathWrapper.addinplace
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.addinplace (0 ms)
[ RUN      ] testCLMathWrapper.multiplyinplace
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.multiplyinplace (0 ms)
[ RUN      ] testCLMathWrapper.addscalar
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.addscalar (0 ms)
[ RUN      ] testCLMathWrapper.sqrt
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.sqrt (0 ms)
[ RUN      ] testCLMathWrapper.squared
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.squared (0 ms)
[ RUN      ] testCLMathWrapper.inverse
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.inverse (0 ms)
[ RUN      ] testCLMathWrapper.perelementmult
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testCLMathWrapper.perelementmult (0 ms)
[----------] 9 tests from testCLMathWrapper (15 ms total)

[----------] 1 test from testreducesegments
[ RUN      ] testreducesegments.basic
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce GTX 760
[       OK ] testreducesegments.basic (94 ms)
[----------] 1 test from testreducesegments (94 ms total)

[----------] 4 tests from testGpuOp
[ RUN      ] testGpuOp.addinplace
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testGpuOp.addinplace (0 ms)
[ RUN      ] testGpuOp.addoutofplace
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testGpuOp.addoutofplace (0 ms)
[ RUN      ] testGpuOp.inverse
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testGpuOp.inverse (0 ms)
[ RUN      ] testGpuOp.addscalarinplace
unknown file: error: C++ exception with description "Error getting OpenCL device
 ids: -1" thrown in the test body.
[  FAILED  ] testGpuOp.addscalarinplace (0 ms)
[----------] 4 tests from testGpuOp (15 ms total)

[----------] 1 test from testjpeghelper
[ RUN      ] testjpeghelper.writeread
[       OK ] testjpeghelper.writeread (0 ms)
[----------] 1 test from testjpeghelper (0 ms total)

[----------] Global test environment tear-down
[==========] 158 tests from 29 test cases ran. (63525 ms total)
[  PASSED  ] 145 tests.
[  FAILED  ] 13 tests, listed below:
[  FAILED  ] testCLMathWrapper.assign
[  FAILED  ] testCLMathWrapper.assignScalar
[  FAILED  ] testCLMathWrapper.addinplace
[  FAILED  ] testCLMathWrapper.multiplyinplace
[  FAILED  ] testCLMathWrapper.addscalar
[  FAILED  ] testCLMathWrapper.sqrt
[  FAILED  ] testCLMathWrapper.squared
[  FAILED  ] testCLMathWrapper.inverse
[  FAILED  ] testCLMathWrapper.perelementmult
[  FAILED  ] testGpuOp.addinplace
[  FAILED  ] testGpuOp.addoutofplace
[  FAILED  ] testGpuOp.inverse
[  FAILED  ] testGpuOp.addscalarinplace

13 FAILED TESTS
  YOU HAVE 2 DISABLED TESTS


E:\dist\bin>

Output from network

In a multi class problem like MNIST, how can I get the output of a softmax layer for each class (plane) using the python wrapper? getOutput returns a list of what seems to be probabilities, but what these probabilities represent is unclear to me!
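
For orientation, a softmax layer produces one probability per class for each input example, and the probabilities for one example sum to 1. The sketch below does not call PyDeepCL. It builds a synthetic flat list and assumes, without confirmation from the source, that a flat output is laid out example by example with one value per class. `split_rows` is a hypothetical helper.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def split_rows(flat, num_classes):
    """Split a flat list into one row of num_classes values per example."""
    return [flat[i:i + num_classes] for i in range(0, len(flat), num_classes)]

# Synthetic flat output for a batch of 2 examples and 3 classes (assumed layout)
flat = softmax([1.0, 2.0, 0.5]) + softmax([0.1, 0.1, 3.0])
rows = split_rows(flat, 3)
predicted = [row.index(max(row)) for row in rows]

for row, label in zip(rows, predicted):
    print([round(p, 3) for p in row], "-> class", label)
```

If the real layout differs, only `split_rows` would need to change; the per-example argmax stays the same.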

huge RAM used

Maybe this is more a question than an issue, but is it normal that the RAM used keeps increasing during learning?

With the following network, I have ~70 GB (!) of used RAM after 14 epochs (input and output are float arrays):

N = 12312
batchSize = 171
cl = PyDeepCL.DeepCL()
net = PyDeepCL.NeuralNet(cl)
net.addLayer(PyDeepCL.InputLayerMaker().numPlanes(49).imageSize(3))
net.addLayer(PyDeepCL.ConvolutionalMaker().numFilters(20).filterSize(3).biased())
net.addLayer(PyDeepCL.ActivationMaker().relu())
net.addLayer(PyDeepCL.ConvolutionalMaker().numFilters(49).filterSize(1).biased())
net.addLayer(PyDeepCL.SquareLossMaker())
sgd = PyDeepCL.SGD(cl, 0.00002, 0.0001)

fatal error: 'png++/png.hpp' file not found

/Users/userone/Documents/workspace/DeepCL/src/util/ImagePng.h:10:10: fatal error: 'png++/png.hpp' file not found
In file included from /Users/userone/Documents/workspace/DeepCL/test/testPatchExtractor.cpp:5:
/Users/userone/Documents/workspace/DeepCL/src/util/ImagePng.h:10:10: fatal error: 'png++/png.hpp' file not found
#include "png++/png.hpp"#include "png++/png.hpp"

         ^         ^

1 error generated.
1 error generated.

Run time error report

Hi Hugh,

I have successfully compiled DeepCL on my PC (Windows 7 64Bit, Visual Studio 2010 x86). But when I run the mnist example, it reports an error:

ForwardAuto: kernel 1 6ms
clblas teardown
Something went wrong: Label 28 exceeds number of softmax planes 10

The script I used to run this demo is:

deepcl_train.exe datadir=. trainfile=train-images.idx3-ubyte validatefile=t10k-images.idx3-ubyte

When I traced the error, I found it is in the learning loop while(!netLearner->isLearningDone()) {...}.

Could you give me some clue about how to fix this problem? Thanks.

Windows 7 installation failed

I ran into an error during installation on Windows 7 with Python 2.7 (from Anaconda).
Both pip install deepcl and python setup.py install result in the following error:
(screenshot: deepcl_error)
Is it a bug in the installation script?

Replace the #defines in some of the opencl cl files with inline method calls

  • currently, some opencl cl implementations use #define macros for speed
  • it is plausible that all opencl implementations will inline function calls
  • therefore, we can plausibly replace these #define macros with standard function calls

Whoever does this would ideally need to benchmark before and after, on a fairly standard gpu, to check that the change doesn't in fact cause a speed reduction.

MNIST test fails to run on Radeon 8750M

Radeon 8750M on HP Probook 470 G1

clinfo:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 11.0.4
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD OLAND (DRM 2.42.0, LLVM 3.7.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 MESA 11.0.4
  Driver Version                                  11.0.4
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE

Error:

initializing clblas
cl/activate.cl build log: 
input.cl:28:23: warning: implicit declaration of function 'tanh' is invalid in C99
input.cl:11:42: note: expanded from macro 'ACTIVATION_FUNCTION'
unsupported call to function tanh in activate

Add dropout

  • dropout is a vital function, not currently implemented. pretty vital for any self-respecting convolutional neural network really ;-)
  • it doesn't need to be added to all propagate and backprop implementations, but probably at least to:
    • Propagate1: generic gpu-based forward propagation layer
    • BackpropErrorsv2Naive: generic gpu-based backprop of error gradient to previous layer
    • BackpropWeights2Naive: generic gpu-based backprop of error gradient onto weights of same layer
  • you'd need to also add the Dropout option to ConvolutionalMaker class
  • and you'd need to make all other implementations (Propagate2 etc...) throw a runtime_error, in their constructor, if dropout is requested in the passed-in maker object
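
As a point of reference for what the forward and backward passes have to do, here is a minimal CPU sketch of dropout. It uses the "inverted dropout" variant, which rescales the kept values at training time; that choice, and the names `dropout_forward`, `dropout_backward`, and `drop_ratio`, are assumptions of this sketch and are not taken from DeepCL.

```python
import random

def dropout_forward(inputs, drop_ratio, rng):
    """Zero each input with probability drop_ratio and rescale the rest."""
    keep = 1.0 - drop_ratio  # drop_ratio must be < 1
    mask = [1.0 if rng.random() < keep else 0.0 for _ in inputs]
    outputs = [x * m / keep for x, m in zip(inputs, mask)]
    return outputs, mask

def dropout_backward(grad_outputs, mask, drop_ratio):
    """Pass gradients only through the units kept in the forward pass."""
    keep = 1.0 - drop_ratio
    return [g * m / keep for g, m in zip(grad_outputs, mask)]

rng = random.Random(0)  # fixed seed so the run is repeatable
outputs, mask = dropout_forward([1.0] * 8, 0.5, rng)
grads = dropout_backward([1.0] * 8, mask, 0.5)
print(mask)
print(outputs)
print(grads)
```

The key point for the backward implementations listed above is that the same mask drawn during the forward pass has to be reused when propagating gradients.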

Try using unroll+clblas GEMM

Following this article, http://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/ http://www.reddit.com/r/MachineLearning/comments/338lfs/why_gemm_is_at_the_heart_of_deep_learning/ , I decided to try this, in case it gives an easy way to speed up DeepCL for large image sizes.

My verdict? Not useful :-(

Tried on my laptop, and on a K520, and the results were:

  • unroll + matmult on cpu is a bit faster than direct cpu convolution. I suppose this is because the memory access patterns are better
  • unroll + clblas was faster again
  • the most naive convolutional opencl kernel, ie not using any type of unroll or gemm, was the fastest

For batchsize=128, inputplanes=32, inputsize=128, numfilters=32, filtersize=5, on a K520 got:

  • convolution + cpu: 318s
  • unrolled + cpu: 218s
  • unrolled +clblas: invalid command queue
  • no unrolling, propagate1: 2s

Matrices are apparently a bit too big for unroll + clblas, so tried using a smaller batchsize:
batchsize=16, inputplanes=32, inputsize=128, numfilters=32, filtersize=5:

  • convolution + cpu: 39s
  • unroll + cpu: 26s
  • unroll + clblas GEMM: 2.2s
  • propagate1: 0.27s

Note that propagate1 is DeepCL's most generic, least optimized kernel. It doesn't use local memory (which is why it's generic, and works on anything really, unless it runs out of gpu global memory). Kernels using local memory are around 3-10 times faster than propagate1.

Overall: current conclusion: unroll + clblas GEMM doesn't seem promising?

=> closing issue.
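
For readers unfamiliar with the "unroll" approach compared above, the sketch below shows the idea on the CPU in plain Python: each receptive field is unrolled into a row (often called im2col), after which the convolution becomes a single matrix product against the flattened filters. It assumes no padding and a stride of 1, and all function names are illustrative rather than taken from DeepCL or clBLAS.

```python
def conv_direct(image, filters):
    """image[plane][row][col], filters[f][plane][u][v]; no padding, stride 1."""
    planes, size = len(image), len(image[0])
    k = len(filters[0][0])
    out = size - k + 1
    return [[[sum(image[p][r + u][c + v] * filt[p][u][v]
                  for p in range(planes) for u in range(k) for v in range(k))
              for c in range(out)] for r in range(out)] for filt in filters]

def im2col(image, k):
    """One row per output position, one column per (plane, u, v) filter tap."""
    planes, size = len(image), len(image[0])
    out = size - k + 1
    return [[image[p][r + u][c + v]
             for p in range(planes) for u in range(k) for v in range(k)]
            for r in range(out) for c in range(out)]

def flatten_filter(filt):
    # Same (plane, u, v) ordering as the columns produced by im2col
    return [w for plane in filt for row in plane for w in row]

def conv_gemm(image, filters):
    k = len(filters[0][0])
    out = len(image[0]) - k + 1
    cols = im2col(image, k)                        # (out*out) x (planes*k*k)
    weights = [flatten_filter(f) for f in filters]  # numFilters x (planes*k*k)
    # Matrix product cols * weights^T, written as explicit dot products
    product = [[sum(a * b for a, b in zip(row, w)) for w in weights]
               for row in cols]
    return [[[product[r * out + c][f] for c in range(out)]
             for r in range(out)] for f in range(len(filters))]

# 1 input plane of size 5, 1 filter of size 3, giving an output of size 3
image = [[[r * 5 + c for c in range(5)] for r in range(5)]]
filters = [[[[1, 1, 1], [1, 1, 1], [1, 1, 1]]]]
print(conv_direct(image, filters) == conv_gemm(image, filters))
```

The unrolled matrix holds each input value up to filtersize * filtersize times, which is one reason the approach needs much more memory for large images and batches.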

Stuck, but GPU still running

While running the example, I have been stuck here for almost 24 hours,

and I checked that the GPU is still working!

ubgpu@ubgpu:~/github/DeepCL_Kgsgo/DeepCL/build$ ./deepclrun dataset=kgsgoall netdef=12_(32c5z-relu)-500n-tanh-361n numepochs=15 learningrate=0.0001
Using dataset kgsgoall:
datadir: ../data/kgsgo:
trainfile: kgsgo-trainall-v2.dat:
validatefile: kgsgo-test-v2.dat:
Ntrain 33630595 numPlanes 7 imageSize 19
Ntest 18860 Ntest
after load images 759 ms
image stats mean 12.3638 stdDev 54.7709
image norm translate -12.3638 scale 0.00912893
after getting stats 96 ms
Using NVIDIA Corporation platform: NVIDIA CUDA
Using device: GeForce GTX 970
netDefLower [12_(32c5z-relu)-500n-tanh-361n]
nnString: [12]
repeatNum 12
remainderString [(32c5z-relu)-500n-tanh-361n]
inner [32c5z-relu]
newRemainder [-500n-tanh-361n]
postfix [500n-tanh-361n]
multiplied string: 32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-32c5z-relu-500n-tanh-361n
GpuAdd: building kernel
CopyBuffer: building kernel
Using trainer SGD{ learningRate=0.0001, momentum=0 }
layer 0:InputLayer{ outputPlanes=7 outputImageSize=19 }
layer 1:NormalizationLayer{ outputPlanes=7 outputImageSize=19 translate=-12.3638 scale=0.00912893 }
layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=7 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 3:ActivationLayer{ RELU }
layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 5:ActivationLayer{ RELU }
layer 6:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 7:ActivationLayer{ RELU }
layer 8:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 9:ActivationLayer{ RELU }
layer 10:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 11:ActivationLayer{ RELU }
layer 12:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 13:ActivationLayer{ RELU }
layer 14:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 15:ActivationLayer{ RELU }
layer 16:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 17:ActivationLayer{ RELU }
layer 18:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 19:ActivationLayer{ RELU }
layer 20:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 21:ActivationLayer{ RELU }
layer 22:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 23:ActivationLayer{ RELU }
layer 24:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputImageSize=19 numFilters=32 filterSize=5 outputImageSize=19 padZeros=1 biased=1 skip=0} }
layer 25:ActivationLayer{ RELU }
layer 26:FullyConnectedLayer{ numPlanes=500 imageSize=1 }
layer 27:ActivationLayer{ TANH }
layer 28:FullyConnectedLayer{ numPlanes=361 imageSize=1 }
layer 29:SoftMaxLayer{ perPlane=0 numPlanes=361 imageSize=1 }
Parameters overview: (skipping 16 layers with 0 params)
layer 2: params=5632 0.1%
layer 4: params=25632 0.4%
layer 6: params=25632 0.4%
layer 8: params=25632 0.4%
layer 10: params=25632 0.4%
layer 12: params=25632 0.4%
layer 14: params=25632 0.4%
layer 16: params=25632 0.4%
layer 18: params=25632 0.4%
layer 20: params=25632 0.4%
layer 22: params=25632 0.4%
layer 24: params=25632 0.4%
layer 26: params=5776500 92.5%
layer 28: params=180861 2.9%
TOTAL : params=6244945
before learning start 46587 ms
MultiplyInPlace: building kernel
sqrt: building kernel
squared: building kernel
PerElementMultInPlace: building kernel
kernelAddScalar: building kernel
kernelInv: building kernel
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=361 -D gPixelsPerThread=1
options -D gWorkgroupSize=32 -D gPixelsPerThread=1
options -D gWorkgroupSize=32 -D gPixelsPerThread=1
layer2 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer4 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer6 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer8 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer10 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer12 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer14 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer16 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer18 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer20 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer22 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer24 ForwardAuto: instance 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
layer2 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 6ms
instance 2: 3ms
instance 3: 2ms
instance 4: 2ms
instance 5: cannot be used
instance 6: 4ms
selected: instance 3
layer4 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 29ms
selected: instance 4
layer6 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 29ms
selected: instance 4
layer8 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 29ms
selected: instance 4
layer10 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 29ms
selected: instance 4
layer12 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 23ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 29ms
selected: instance 4
layer14 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 29ms
selected: instance 4
layer16 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 32ms
selected: instance 4
layer18 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 31ms
selected: instance 4
layer20 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 30ms
selected: instance 4
layer22 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 23ms
instance 2: 13ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 30ms
selected: instance 4
layer24 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 22ms
instance 2: 14ms
instance 3: 10ms
instance 4: 6ms
instance 5: cannot be used
instance 6: 31ms
selected: instance 4
layer26 ForwardAuto: instance 6 this instance cant be used: Out of resources, code -5
layer26 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 153ms
instance 2: 378ms
instance 3: 767ms
instance 4: 93ms
instance 5: 27ms
instance 6: cannot be used
selected: instance 5
layer28 ForwardAuto::forward choosing best instance:
instance 0: cannot be used
instance 1: 2ms
instance 2: 16ms
instance 3: 15ms
instance 4: 15ms
instance 5: 13ms
instance 6: 11ms
selected: instance 1

my GPU info:

ubgpu@ubgpu:~/big_data$ nvidia-smi
Wed May 20 21:33:18 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.46 Driver Version: 346.46 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 Off | 0000:01:00.0 N/A | N/A |
| 58% 67C P2 N/A / N/A | 664MiB / 4095MiB | N/A Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 C+G Not Supported |
+-----------------------------------------------------------------------------+
ubgpu@ubgpu:~/big_data$
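
The "Parameters overview" figures in the training log above can be checked by hand. The numbers are consistent with a biased convolutional layer holding inputPlanes x numFilters x filterSize^2 weights plus one bias per filter, and a fully connected layer holding one weight per input value per neuron plus one bias per neuron. A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not part of DeepCL.

```python
# Hypothetical helpers (not DeepCL API) that recompute the
# "Parameters overview" figures from the layer dimensions in the log.

def conv_params(input_planes, num_filters, filter_size, biased=True):
    # One filter_size x filter_size kernel per (input plane, filter) pair
    weights = input_planes * num_filters * filter_size * filter_size
    return weights + (num_filters if biased else 0)

def fc_params(input_planes, input_image_size, num_neurons, biased=True):
    # Every neuron sees every value of every input plane
    weights = input_planes * input_image_size * input_image_size * num_neurons
    return weights + (num_neurons if biased else 0)

counts = {2: conv_params(7, 32, 5)}            # layer 2: 7 planes -> 32 filters
for layer in range(4, 26, 2):                  # layers 4..24: 32 -> 32 filters
    counts[layer] = conv_params(32, 32, 5)
counts[26] = fc_params(32, 19, 500)            # 32 planes of 19x19 -> 500 neurons
counts[28] = fc_params(500, 1, 361)            # 500 values -> 361 neurons
total = sum(counts.values())

for layer, n in counts.items():
    print(f"layer {layer}: params={n} {100.0 * n / total:.1f}%")
print(f"TOTAL : params={total}")
```

Layer 26 alone accounts for 5776500 of the 6244945 parameters, which matches the 92.5% reported in the log.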

Add momentum to SGD trainer

  • currently SGD training is implemented, with a learning rate and an annealed learning rate
  • it could be good to add momentum
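
For reference, classical momentum keeps a per-weight velocity that accumulates past gradients. Below is a minimal sketch in plain Python, assuming the common formulation v = momentum * v - learningRate * gradient, followed by w = w + v. The function and parameter names are illustrative only and are not DeepCL's trainer API.

```python
def sgd_momentum_step(weights, grads, velocity, learning_rate, momentum):
    """One in-place update of weights and velocity for a flat list of weights."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] - learning_rate * g
        weights[i] += velocity[i]

def minimise_quadratic(momentum, steps=50, learning_rate=0.1):
    # Toy objective f(w) = 0.5 * w^2, whose gradient is simply w
    weights, velocity = [1.0], [0.0]
    for _ in range(steps):
        sgd_momentum_step(weights, [weights[0]], velocity,
                          learning_rate, momentum)
    return weights[0]

plain = minimise_quadratic(momentum=0.0)          # reduces to ordinary SGD
with_momentum = minimise_quadratic(momentum=0.5)
print("plain SGD:", plain)
print("with momentum:", with_momentum)
```

With momentum set to 0 the update is identical to plain SGD, so such a trainer could stay backward compatible with existing learning-rate settings.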

OpenCL build error on Activation kernel

I'm getting an OpenCL kernel build error when compiling activate.cl. I'm using PyDeepCL's NetdefToNet.createNetFromNetdef with the architecture: rt2-8c5z-relu-mp2-16c5z-relu-mp3-150n-tanh-7n.

Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
initializing clblas
cl/activate.cl build log: 
<built-in>:13:9: error: macro names must be identifiers
#define <C8><U+000F><EB><U+0003> 1
        ^
<built-in>:23:9: error: macro names must be identifiers
#define <C8><U+000F><EB><U+0003> 1
        ^

kernel build error:

kernel source:
// Copyright Hugh Perkins 2015 hughperkins at gmail
//
// This Source Code Form is subject to the terms of the Mozilla Public License,
// v. 2.0. If a copy of the MPL was not distributed with this file, You can
// obtain one at http://mozilla.org/MPL/2.0/.

// expected defines:
// one of: [ TANH | RELU | LINEAR | SIGMOID | SCALEDTANH | ELU ]

#ifdef TANH
    #define ACTIVATION_FUNCTION(output) (tanh(output))
#elif defined SCALEDTANH
    #define ACTIVATION_FUNCTION(output) (1.7159f * tanh(0.66667f * output))
#elif defined SIGMOID
    #define ACTIVATION_FUNCTION(output) (1.0f / (1 + exp(-output)))
#elif defined RELU
    #define ACTIVATION_FUNCTION(output) (output> 0 ? output : 0)
#elif defined ELU
    #define ACTIVATION_FUNCTION(output) (output> 0 ? output : exp(output) - 1)
#elif defined LINEAR
    #define ACTIVATION_FUNCTION(output) (output)
#endif

#ifdef ACTIVATION_FUNCTION // protect against not defined
kernel void activate(const int N, global float *inout) {
    const int globalId = get_global_id(0);
    if (globalId >= N) {
        return;
    }
    inout[globalId] = ACTIVATION_FUNCTION(inout[globalId]);
}
#endif

#ifdef ACTIVATION_FUNCTION // protect against not defined
kernel void forwardNaive(const int N, global float *out, global const float *in) {
    const int globalId = get_global_id(0);
    if (globalId >= N) {
        return;
    }
    out[globalId] = ACTIVATION_FUNCTION(in[globalId]);
}
#endif

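Debugging this, it looks like the error is caused by a corrupted options string (" -DgOutputSize=32 -DgOutputSizeSquared=1024 -DgInputSize=32 -DgInputSizeSquared=1024 -DgNumPlanes=8 -D \230zt\002"): the final -D is followed by garbage bytes rather than one of the expected activation names, which matches the invalid #define in the build log. This in turn seems to be caused by an ActivationFunction that has been optimized out (according to gdb).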

Fix luarocks install, so it can download the source

Currently, installing with luarocks from the rock file works, but just typing luarocks install luadeepcl fails, because it doesn't download the source file or the rock, just the rockspec, which points to a non-existent source file.

Create opencl kernels for large image sizes, using local memory

Currently, for large images, the only working kernel is propagate1, which is generic but doesn't use local memory. A dedicated kernel that uses local memory, e.g. by blocking the input images, should run faster on large images (e.g. around 128x128).
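
The blocking idea can be illustrated on the CPU. In the sketch below, each output tile first copies the input patch it needs (the tile plus a halo of filterSize - 1 pixels) into a small buffer and then reads only from that buffer, which is the role local memory would play in an OpenCL kernel. This is a plain-Python illustration with assumed names (`conv_direct`, `conv_tiled`, `tile`), not a DeepCL kernel, and it handles a single plane without zero padding for brevity.

```python
def conv_direct(image, filt):
    """'Valid' 2D cross-correlation, reading every value from the full image."""
    n, k = len(image), len(filt)
    out_size = n - k + 1
    return [[sum(image[r + i][c + j] * filt[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_size)]
            for r in range(out_size)]

def conv_tiled(image, filt, tile):
    """Same result, computed one output tile at a time from a small local copy."""
    n, k = len(image), len(filt)
    out_size = n - k + 1
    out = [[0.0] * out_size for _ in range(out_size)]
    for r0 in range(0, out_size, tile):
        for c0 in range(0, out_size, tile):
            rows = min(tile, out_size - r0)   # edge tiles may be smaller
            cols = min(tile, out_size - c0)
            # Stage the tile plus its halo; stands in for a local-memory load.
            local = [image[r0 + i][c0:c0 + cols + k - 1]
                     for i in range(rows + k - 1)]
            for r in range(rows):
                for c in range(cols):
                    out[r0 + r][c0 + c] = sum(
                        local[r + i][c + j] * filt[i][j]
                        for i in range(k) for j in range(k))
    return out

# Small deterministic input and a 3x3 filter for a self-check
image = [[float((3 * r + 5 * c) % 7) for c in range(12)] for r in range(12)]
filt = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
print(conv_tiled(image, filt, tile=4) == conv_direct(image, filt))
```

On a GPU, each tile would typically map to one workgroup, and the tile size would be bounded by the local memory available per workgroup.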
