Comments (11)
I'm not so familiar with OpenCL (though that's why I came to DeepCL). I'll read through the related code segments to get a better understanding of DeepCL. I can't promise that I can solve this, but I'll try to help.
Ok. The error message is something like:
- there is a CLFloatWrapper object, which is a wrapper around an underlying `cl_mem` object: https://github.com/hughperkins/EasyCL/blob/master/CLFloatWrapper.h
- the corresponding `cl_mem` object has not been created yet
- here's a method that creates the `cl_mem` object, for the GPU, without copying from main memory: https://github.com/hughperkins/EasyCL/blob/master/CLWrapper.cpp#L58
- here's a method that copies from the main-memory buffer to the GPU `cl_mem` buffer: https://github.com/hughperkins/EasyCL/blob/master/CLWrapper.cpp#L87
- this buffer is being passed into an OpenCL kernel, via the wrapper class CLKernel: https://github.com/hughperkins/EasyCL/blob/master/CLKernel.cpp
- here is the check and error message:
from deepcl.
It's definitely happening during backprop, but I still can't quite figure out why. I have a feeling it has to do with the limited set of experiences in the beginning.
I did notice that while `gradInputWrapper` and `outputWrapper` use `createOnDevice()` in https://github.com/hughperkins/DeepCL/blob/master/src/dropout/DropoutLayer.cpp#L86, `weightsWrapper` does not.
However, in `DropoutLayer::backward`, `copyToDevice()` is called on `weightsWrapper`, which should satisfy all input requirements, because that function will `createOnDevice()` if the buffer isn't already on the device. (And `gradOutputWrapper` is sourced from the next layer.)
I don't really see anything different from conv/backward or activate/backward. The best I can think to do, with my limited troubleshooting knowledge, is to add some logging and recompile.
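To make the lifecycle described above concrete, here is a toy Python model of the wrapper behavior (the method names mirror `CLWrapper`, but the buffer itself is just a Python list — this is an illustrative sketch, not DeepCL's actual code). The point is that `copyToDevice()` allocates on demand, which is why calling it in `backward` should satisfy the "buffer not created" check:

```python
# Toy model of the EasyCL wrapper lifecycle discussed above.
# Names are illustrative; `device` stands in for the cl_mem buffer.
class WrapperSketch:
    def __init__(self, host):
        self.host = host
        self.device = None                 # no cl_mem allocated yet

    def is_on_device(self):
        return self.device is not None

    def create_on_device(self):
        # allocate on the device without copying from main memory
        self.device = [0.0] * len(self.host)

    def copy_to_device(self):
        # allocates first if needed, then copies host -> device
        if not self.is_on_device():
            self.create_on_device()
        self.device = list(self.host)
```

So a wrapper that only ever sees `copy_to_device()` should still end up with a valid device buffer, matching the reasoning above.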
As an aside, I was testing the following network in your Q-Learning example (Gridsize = 15, Random = True)
18C9z-8C3z-128N-4N
LR = 0.0025, Reg = 0.0000001
qLambda = 0.945
maxSamples = 128
epsilon = 0.1
rewards:
move: -0.03
wall: -0.1
win: 1.0
I let it run overnight (900+ rounds now), and it seems quite adept at the game for the most part; then one game (roughly one in 50) will take quite a while (10k+ actions), and then it'll handle the next ones with no problem.
I'm interested to see how it's doing by the end of today, but I'm curious if you have some suggestions for experimentation.
(After 1800 games, it seems to have mastered the 15x15wRand)
Also, this may not be the best place to ask, but if I wanted to create a QNetwork that takes complex actions (simultaneous actions, and ones with variability), would I really need to create an action set combining all possible action states (i.e. hundreds)?
Do you think it would be possible to have a set of outputs where a portion is treated as action states and the rest as action variation (e.g. 5 outputs: [right, left, down, up, distance])? If distance were bound to -1..1, it could be scaled by gridSize. I could try customizing Qlearner and modifying the example scenario to test this out.
I've read that a sigmoid activation with multinomial cross-entropy loss can allow multi-label selection in classification problems. I was thinking that if an action set contained items that should be selected simultaneously, there could be action groupings. Each group would have its own high-Q-value selection, and each chosen index would be passed forward (e.g. action groupings: direction [right, left, down, up], distance [1, 2, 3 ... N]; a valid action would be [left, 5]). Depending on the granularity of variability, this could still lead to many output nodes.
What do you think?
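The action-grouping idea above can be sketched in a few lines (this is a hypothetical illustration, not anything DeepCL provides): instead of one output per combined action, split the flat Q-value vector into groups and take the argmax within each group independently.

```python
# Hypothetical factored action selection: one argmax per action group,
# rather than one output node per combined (direction x distance) action.
def select_action(q_values, group_sizes):
    """Split a flat Q-value list into groups and argmax each group."""
    action, start = [], 0
    for size in group_sizes:
        group = q_values[start:start + size]
        action.append(group.index(max(group)))
        start += size
    return action

# direction group [right, left, down, up] + distance group [1, 2, 3]
q = [0.1, 0.9, 0.2, 0.3,
     0.0, 0.5, 0.4]
print(select_action(q, [4, 3]))  # -> [1, 1], i.e. "left", distance 2
```

With 4 directions and N distances this needs 4 + N outputs instead of 4 * N, which is the saving the comment is after, though the Q-values of the groups are no longer jointly estimated.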
All I can say further for now is that it happens during the second step of Qlearner.run(), when it is training on the first experience, during backprop of the dropout layer.
I figured out a hack: if I call net.setTraining(True), then it works just fine. (Edit: I later used setTraining(False) in the first act(self, index); this way the training bit was only set once. The network continued to work, but I think, for all intents and purposes, training should be set only when actually training (I believe this affects RandomPatch and Dropout only). That would give optimal action choice and experience replay while keeping the true benefits of a dropout layer.)
https://github.com/hughperkins/DeepCL/blob/master/src/qlearning/QLearner.cpp#L99
I believe there should be a flag change before and after this line.
https://github.com/hughperkins/DeepCL/blob/master/src/dropout/DropoutLayer.cpp#L208
I believe this is the culprit (though not necessarily the root issue), but I can't confirm yet.
Ok. Thinking about it, without looking at the code: the dropout layer needs to keep the dropout tensor around for backprop. If turning training off fixes a problem during backprop, it sounds like that tensor is somehow being recreated? You might want to check carefully which `Wrapper` object holds the dropout tensor, i.e. the 1s and 0s choosing which inputs to forward to the output, and what happens to it, its lifecycle, etc.
Your analysis sounds excellent! Very much appreciate your looking into this :)
From what I see, masks is declared as a new unsigned char[]. This is the wrapped variable.
However, its contents are not assigned until generateMasks() is called, which only happens during forward prop when training is set to true. This declared but uninitialized array is what backprop attempts to copy.
This is probably the issue.
Also, from my understanding (and I may be wrong), training shouldn't be set during predictions, because you want the full network. However, in your alternate 'forward' path (not training), you multiply the inputs by the dropRatio (depressing every node), but I believed it was supposed to be a 1-to-1 copy.
https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf page 1931, Figure 2
(edit: I read the paper further and found your implementation to be correct... my bad)
I was having issues compiling, but I figured it out now. I'll make a pull request after I can test the changes. I'm running a couple experiments, so I'll wait for those first.
My reruns failed >.>
My dropout run got twice as far as the non-dropout one, but the weights vanished (after 150 games).
With some modification testing, I can say that it doesn't crash. It's up to you on the non-training forward implementation.
However, my idea of enabling dropout only during the actual training step, rather than across the whole of learnFromPast(), seems not to work: I get vanishing weights fairly early (Python reports nan for every weight). (I've been outputting the weights to file now, to review their dispersion and to see whether a partially trained network that has lost its experiences can build better ones.)
I'm now testing setting the training flag for the whole of learnFromPast() while allowing the action prediction to use the full network. This seems to be working as intended, but I'll wait till tomorrow to confirm.
I think there might be some conceptual issues with the experience replay. (edit: nope again, your implementation is standard)
I'll try digging into the white papers to see if I'm right. If so, I'll code up a proposal. (edit: the ideas I was having seem to be referred to as ranked/prioritized experience replay (possibly not easy to implement, though alternatively I could experiment with removing uninteresting old experiences and preventing loops from doping the experience pool) and enhanced experience replay (tracking valuable sequences of actions and replaying them backwards).)
The last piece seems pretty important for a complex environment. Notably, I've been bothered by the standard experience replay implementation because it doesn't take into account the full future reward:
Q(s,a) = r + decay * max_a' Q(s',a')
The consequence of this definition is that you don't know the second term until later, but an experience's reward value isn't updated when subsequent reward is found. So during replay, the estimate of future reward is made poorly, using our own (still-current) predictions.
By replaying a rewardful sequence backwards, you could effectively compensate for this disparity, giving a real decay to the reward and boosting the state/actions that led to it.
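The backward-replay idea above amounts to computing discounted returns in a single reverse pass (a sketch; the function name is mine, and the reward values reuse the move/win rewards from the experiment earlier in the thread):

```python
# Walking an episode in reverse propagates the terminal reward back
# through the states that led to it in one pass, instead of waiting
# many replay sweeps for it to trickle backwards one step at a time.
def backward_targets(rewards, decay):
    targets, future = [], 0.0
    for r in reversed(rewards):
        future = r + decay * future
        targets.append(future)
    return list(reversed(targets))

# move, move, win -- every earlier step sees a decayed share of the win
print(backward_targets([-0.03, -0.03, 1.0], 0.945))
```

Standard one-step replay would instead bootstrap each target from the network's own current Q-estimate of the next state, which is exactly the disparity the comment describes.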
Sorry for making this a microblog.
I found that, given an environment this large, a distance-based reward was helpful.
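A distance-based reward like the one mentioned above could look something like this (a hypothetical sketch: the base move reward matches the experiment earlier in the thread, while the shaping bonus and the Manhattan metric are my own illustrative choices):

```python
# Reward shaping for the grid world: keep the usual move penalty, but
# nudge the agent toward moves that reduce distance to the goal.
def shaped_reward(old_pos, new_pos, goal, base=-0.03, bonus=0.01):
    d_old = abs(old_pos[0] - goal[0]) + abs(old_pos[1] - goal[1])
    d_new = abs(new_pos[0] - goal[0]) + abs(new_pos[1] - goal[1])
    return base + (bonus if d_new < d_old else -bonus)

print(shaped_reward((0, 0), (0, 1), (7, 7)))  # moved closer: about -0.02
```

The shaping signal gives the network a gradient toward the goal long before it ever experiences a win, which helps in a sparse-reward 15x15 grid.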
I attempted to reload a network (wipe all experience) after having completed 50 and 500 rounds.
2x15x15-8C5z-drop(0.5)-relu-8C7-relu-128N-tanh-mse-4x1x1
Each of these had two reloads, one with the full network and one with a perceptron (static conv network) and the loaded FC.
2x15x15-8C5z-drop(0.5)-relu-8C7-relu-2x8x8 --> 128N-sgmd-sftmx-4x1x1
I found that I kept getting trapped in a very common local minimum (the entire Q field becoming homogeneous), until I changed the last activations to sigmoid/softmax for the dual network, or scaledTanh/softmax for the full one.
It's not quite there yet, but the perceptron (trained 500 rounds) with changed final activations seems to behave about as well as the first successful network did after 1200+ rounds (though that one had no distance reward, was a larger network, and was incredibly lucky to converge, after 1800+ rounds no less).
edit:
LOL, I realized my network had blurry vision but nice performance, so I looked at my second layer.
2x15x15-8C5z-drop(0.5)-elu-18C5-elu-128N-scaledtanh-mse-4x1x1
I changed the second conv layer's relu to elu and the tanh to scaledtanh, and reduced decay to 0.8. It very nearly converged after only 85 rounds, in less than 30 minutes.
After about 350 rounds, the weights disappeared again...
2x15x15-8C5z-drop(0.5)-relu-18C5-linear-128N-tanh-mse-4x1x1
That seems to do the job nicely.
For a network without a distance-based reward, a negative action reward for a non-winning move seems to hold the network back. Without a prioritized replay (doping it with terminal moves) it'll take a long time to converge, but at least it shouldn't get caught in a local minimum now (and should thus be reproducible).