Comments (11)
Thanks! I took the liberty of editing your comment and turning it into an actual code snippet.
Will check it out.
from arraymancer.
Could you provide a small reproducible example?
import std/strformat
# By default `softmax` resolves to the tensor-level overload
# `proc softmax(input: Tensor[T]): Tensor[T]`, so exclude it here ...
import arraymancer except softmax
# ... and import the autograd-aware one we need:
# `proc softmax*[TT](a: Variable[TT]): Variable[TT]`
import arraymancer/nn/activation/softmax

let (N, D_in, H, D_out) = (64, 1000, 100, 10)
let ctx = newContext Tensor[float32]

let
  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))
  y = randomTensor[float32](N, D_out, 1'f32)

network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2.softmax

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x)
    loss = y_pred.mse_loss(y)
  echo &"Epoch {t}: loss {loss.value[0]}"
  loss.backprop()
  optim.update()
@Vindaar the above is the "simple 2 layer" example, modified only to add softmax in the forward pass. It produces the same error as above.
Thanks, Vindaar, for your fast response and the comment edit, lol. I'm still getting used to GitHub markdown.
I've modified the softmax_backward_ag[TT] procedure to pass in self rather than Gate (see below).
reference: https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/nn/activation/softmax.nim
proc softmax_backward_ag[TT](self: Gate[TT], payload: Payload[TT]): SmallDiffs[TT] =
  let self = SoftmaxActivation[TT](self) # downcast; was `Gate` before
  let gradient = payload.variable.grad
  result = newDiffs[TT](1)
  result[0] = gradient.softmax_backward(self.cache)
This matches how relu is implemented. It took care of this error that was raised:
type mismatch: got <type Gate> but expected 'SoftmaxActivation[Tensor[system.float32]]'
However, now I get this error:
attempting to call undeclared routine: 'softmax_backward'
After a search through the docs, I see we don't have the softmax_backward procedure, as mentioned in issue #472.
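For anyone picking this up: the math such a softmax_backward would have to implement reduces to a cheap vector expression. With y = softmax(x) cached from the forward pass and g the incoming gradient, the Jacobian is diag(y) − y·yᵀ (symmetric), so the input gradient is y ⊙ (g − ⟨g, y⟩). A minimal sketch in Python/NumPy (illustration of the math only, not Arraymancer's API; the function name is hypothetical), checked against finite differences:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())          # shift for numerical stability
    return e / e.sum()

def softmax_backward(grad_out, y):
    # y is the cached softmax output. Computes dx = y * (g - <g, y>),
    # i.e. the vector-Jacobian product with J = diag(y) - y y^T
    # (J is symmetric, so J^T g == J g).
    return y * (grad_out - np.dot(grad_out, y))

x = np.array([0.3, -1.2, 2.0, 0.5])
g = np.array([1.0, -0.5, 0.25, 2.0])  # arbitrary upstream gradient
y = softmax(x)
analytic = softmax_backward(g, y)

# central finite-difference check of g . d softmax / d x_i
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[i] += eps
    xm[i] -= eps
    numeric[i] = np.dot(g, softmax(xp) - softmax(xm)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```

This is O(n) per row, so a dedicated softmax_backward would not need to materialize the Jacobian at all.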
@Vindaar please review when you can
@Vindaar my good sir can we please get this implemented lol
Can you please ping me about this on Matrix/Discord on the weekend, if I haven't looked into it by then?
Ok, I just had a look at it.
As you've mentioned yourself, the practical problem is that the backward pass for softmax is not implemented. After looking into it now, I realize that the (likely) reason for that is that the backward pass of a pure softmax is rather ugly, because the softmax itself is defined via a sum over all inputs. In essence the gradient then depends on the specific indices (you have a δ_ij in the derivative):

∂ sm(x_i) / ∂ x_j = sm(x_j) · (δ_ij − sm(x_i))

(sorry for the somewhat sloppy notation)
See for example:
https://en.wikipedia.org/wiki/Softmax_function
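The formula can be sanity-checked numerically; a quick sketch in Python/NumPy (illustration only, independent of Arraymancer), building the full Jacobian from the expression above and comparing it to finite differences:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

x = np.array([0.1, 0.7, -0.4])
y = softmax(x)
n = len(x)

# Jacobian from the formula: d sm(x_i) / d x_j = sm(x_j) * (delta_ij - sm(x_i))
J = np.array([[y[j] * ((i == j) - y[i]) for j in range(n)]
              for i in range(n)])

# finite-difference Jacobian, column by column
eps = 1e-6
J_num = np.zeros((n, n))
for j in range(n):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    J_num[:, j] = (softmax(xp) - softmax(xm)) / (2 * eps)

assert np.allclose(J, J_num, atol=1e-5)
```

Note the Jacobian is symmetric and each row sums to zero, which is what makes the δ_ij coupling between indices unavoidable in a standalone backward pass.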
That's why typically one combines the softmax on the last layer directly with a cross entropy loss, for which the gradient is easy to compute.
I don't have the time & mental space atm to figure out how to efficiently implement this (if that's even possible). If someone is willing to do so, feel free. Otherwise I'd just recommend doing what one normally does, i.e. use softmax_cross_entropy. :)
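For completeness: the reason the fused softmax_cross_entropy route is so much nicer is that the combined gradient collapses to p − onehot(target), with no Jacobian in sight. A small Python/NumPy check of that closed form (illustration only, not Arraymancer code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

x = np.array([0.3, -1.2, 2.0, 0.5])
t = 2                         # true class index
p = softmax(x)
loss = -np.log(p[t])          # cross-entropy on top of softmax

# closed-form gradient of the fused op: p - onehot(t)
grad = p.copy()
grad[t] -= 1.0

# central finite-difference check of d loss / d x_i
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[i] += eps
    xm[i] -= eps
    numeric[i] = (-np.log(softmax(xp)[t]) + np.log(softmax(xm)[t])) / (2 * eps)

assert np.allclose(grad, numeric, atol=1e-5)
```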
Shit, thanks for looking into it, Vindaar. I will take a look when I finally get the time and mental space as well, lol.