Hi, it's a great implementation, but I noted the loss function in the code is the

Why apply ridge regularization on the output layer? (neural-gc issue, closed)

iancovert commented on July 23, 2024

why apply ridge regularization on output layer?

from neural-gc.

Comments (1)

iancovert commented on July 23, 2024

Hi - that's a good question. We didn't discuss this additional penalty in the paper.

The problem is that with the MSE loss plus the non-smooth penalty, the first layer's weights are encouraged to be small, while the next layer's weights aren't penalized at all. That's bad, because you can end up with a model where the first layer's weights are very small but not necessarily sparse (containing many zeros), while the next layer's weights are very large to compensate. So we use the ridge penalty to ensure that the other layer's weights don't get too large.
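To make the rescaling argument concrete, here is a minimal sketch in plain Python. It is not the neural-gc code: the two-layer ReLU network, the weight values, and the helper names (`forward`, `l1_first_layer`, `ridge`, `rescale`) are all made up for illustration, and a plain L1 norm stands in for the non-smooth penalty.

```python
# Minimal sketch (NOT the neural-gc implementation): a tiny two-layer ReLU
# network y = w2 . relu(W1 x) with hypothetical weights.

def relu(v):
    return [max(0.0, a) for a in v]

def forward(W1, w2, x):
    # Hidden layer followed by a linear output layer.
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) for row in W1])
    return sum(w * h for w, h in zip(w2, hidden))

def l1_first_layer(W1):
    # Assumption: L1 norm as a stand-in for the non-smooth first-layer penalty.
    return sum(abs(w) for row in W1 for w in row)

def ridge(w2):
    # Squared L2 (ridge) penalty on the next layer's weights.
    return sum(w * w for w in w2)

def rescale(W1, w2, c):
    # Shrink the first layer by c > 0 and grow the next layer by 1/c.
    return [[c * w for w in row] for row in W1], [w / c for w in w2]

W1 = [[1.0, -2.0], [0.5, 3.0]]  # hypothetical first-layer weights
w2 = [2.0, -1.0]                # hypothetical output-layer weights
x = [1.0, 1.0]                  # hypothetical input

W1_s, w2_s = rescale(W1, w2, 0.01)

print("prediction:", forward(W1, w2, x), "vs rescaled:", forward(W1_s, w2_s, x))
print("first-layer penalty:", l1_first_layer(W1), "vs rescaled:", l1_first_layer(W1_s))
print("ridge on next layer:", ridge(w2), "vs rescaled:", ridge(w2_s))
```

Shrinking the first layer by a positive factor and growing the next layer by its inverse leaves the prediction (and so the MSE) unchanged, while the first-layer penalty drops without any weight becoming exactly zero. The ridge term grows under the same rescaling, so once it is in the loss that trade is no longer free.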

In our experiments, we tend to fix the ridge penalty to a small value, and only change the non-smooth penalty to find different levels of sparsity. Let me know if this makes sense.
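Written out, the setup described here might look like the objective below. This is a sketch under assumptions rather than the exact formula from the paper or the code: the symbols $W^{1}$ (first layer), $W^{l}$ (later layers), $\Omega$ (the non-smooth penalty), $\lambda$, and $\lambda_{\text{ridge}}$ are notation chosen for illustration.

```latex
\min_{W^{1},\dots,W^{L}} \;
  \mathrm{MSE}(W^{1},\dots,W^{L})
  \;+\; \lambda \, \Omega(W^{1})
  \;+\; \lambda_{\text{ridge}} \sum_{l=2}^{L} \lVert W^{l} \rVert_F^{2}
```

In this notation, $\lambda_{\text{ridge}}$ stays fixed at a small value and only $\lambda$ is varied to reach different levels of sparsity.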

Ian
