Issues w/ softmax procedure · arraymancer (open, 11 comments)

Niminem commented on May 23, 2024
Issues w/ softmax procedure

Comments (11)

Vindaar commented on May 23, 2024

Thanks! I took the liberty of updating your comment and turning it into an actual code snippet.

Will check it out.

Vindaar commented on May 23, 2024

Could you provide a small reproducible example?

Niminem commented on May 23, 2024

import std/strformat
import arraymancer except softmax # excludes the tensor-level softmax: proc softmax*[T](input: Tensor[T]): Tensor[T]
import arraymancer/nn/activation/softmax # the autograd softmax we need: proc softmax*[TT](a: Variable[TT]): Variable[TT]

let (N, D_in, H, D_out) = (64, 1000, 100, 10)
let ctx = newContext Tensor[float32]

let
  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))
  y = randomTensor[float32](N, D_out, 1'f32)

network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2.softmax

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x)
    loss = y_pred.mse_loss(y)

  echo &"Epoch {t}: loss {loss.value[0]}"

  loss.backprop()
  optim.update()

Niminem commented on May 23, 2024

@Vindaar the above is the "simple 2 layer" example, modified only to add softmax in the forward pass. It produces the same error as above.

Niminem commented on May 23, 2024

Thanks Vindaar for your fast response and the comment edit, lol. I'm still getting used to GitHub markdown.

Niminem commented on May 23, 2024

I've modified the softmax_backward_ag[TT] procedure to pass in self rather than Gate (see below).
reference: https://github.com/mratsim/Arraymancer/blob/master/src/arraymancer/nn/activation/softmax.nim

proc softmax_backward_ag[TT](self: Gate[TT], payload: Payload[TT]): SmallDiffs[TT] =
  let self = SoftmaxActivation[TT](self) # was: SoftmaxActivation[TT](Gate)
  let gradient = payload.variable.grad
  result = newDiffs[TT](1)
  result[0] = gradient.softmax_backward(self.cache)

This matches what I've found while looking at how relu is implemented. It took care of this error that was raised:
type mismatch: got <type Gate> but expected 'SoftmaxActivation[Tensor[system.float32]]'

However, now I get this error:
attempting to call undeclared routine: 'softmax_backward'

After a search through the docs I see we don't have the softmax_backward procedure as mentioned in this issue:
#472
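
For reference, the tensor-level backward would be the Jacobian-vector product of the softmax: for the cached softmax output s and the incoming gradient g, dx = s * (g - <g, s> per row). Below is a rough, hypothetical sketch of what such a routine could look like; it is not part of Arraymancer, the name softmax_backward_sketch is made up, and it assumes a [batch, classes] layout plus the broadcasting operators *. and -.:

import arraymancer

# Hypothetical sketch of the missing tensor-level softmax backward.
# `cached` is the softmax output saved during the forward pass, shape [batch, classes];
# `gradient` is the incoming gradient of the same shape.
proc softmax_backward_sketch[T](gradient, cached: Tensor[T]): Tensor[T] =
  # row-wise dot product <gradient, cached>, kept as a [batch, 1] tensor so it broadcasts
  let dot = sum(gradient *. cached, axis = 1)
  # dx = s * (g - <g, s>)
  result = cached *. (gradient -. dot)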

Niminem commented on May 23, 2024

@Vindaar please review when you can

Niminem commented on May 23, 2024

@Vindaar my good sir can we please get this implemented lol

Vindaar commented on May 23, 2024

Can you please ping me about this on matrix/discord on the weekend, if I haven't looked into this by then?

Vindaar commented on May 23, 2024

Ok, I just had a look at it.

As you've mentioned yourself, the practical problem is that the backward pass for softmax is not implemented. After looking into it now, I realize that the (likely) reason for that is that the backward pass of a pure softmax is rather ugly, because the softmax itself is normalized by a sum over all inputs. The gradient therefore depends on the specific pair of indices (there is a Kronecker delta δ_ij in the derivative):

∂sm(x_i) / ∂x_j = sm(x_i) · (δ_ij - sm(x_j))

(sorry for somewhat sloppy notation)

See for example:
https://en.wikipedia.org/wiki/Softmax_function

That's why typically one combines the softmax on the last layer directly with a cross entropy loss, for which the gradient is easy to compute.
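
(Concretely: for a one-hot target y, the combined loss L = -Σ_i y_i · ln sm(x_i) has the gradient ∂L/∂x_j = sm(x_j) - y_j, so the δ_ij drops out entirely.)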

I don't have the time & mental space atm to figure out how to efficiently implement this (if it's even possible). If someone is willing to do so, feel free. Otherwise I'd just recommend doing what one normally does, i.e. use softmax_cross_entropy.
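
For completeness, here is the repro from above rewritten along those lines: a minimal sketch that drops softmax from the forward pass and feeds the raw logits to softmax_cross_entropy instead of mse_loss. Note that the random y is only a stand-in; a cross-entropy loss expects one-hot or probability targets.

import std/strformat
import arraymancer

let (N, D_in, H, D_out) = (64, 1000, 100, 10)
let ctx = newContext Tensor[float32]

let
  x = ctx.variable(randomTensor[float32](N, D_in, 1'f32))
  y = randomTensor[float32](N, D_out, 1'f32) # placeholder targets; use one-hot/probabilities in practice

network ctx, TwoLayersNet:
  layers:
    fc1: Linear(D_in, H)
    fc2: Linear(H, D_out)
  forward x:
    x.fc1.relu.fc2 # raw logits, no softmax in the forward pass

let
  model = ctx.init(TwoLayersNet)
  optim = model.optimizerSGD(learning_rate = 1e-4'f32)

for t in 0 ..< 500:
  let
    y_pred = model.forward(x) # logits
    loss = y_pred.softmax_cross_entropy(y) # softmax fused into the loss

  echo &"Epoch {t}: loss {loss.value[0]}"
  loss.backprop()
  optim.update()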

Niminem commented on May 23, 2024

Shit, thanks for looking into it Vindaar. I will take a look when I finally get the time and mental space as well lol
