Comments (3)
Sorry, I have another problem here. I think what NCE does is compute the loss only on the selected target classes, which should involve only part of the weights/biases of the last linear layer. Say that layer's bias has bias.shape = (V,): the indices (Line 60 in 862afc6) should be the correct targets plus some sampled noise classes, and index_select should touch only part of the classes (the total should be len(set(indices.view(-1).numpy()))). But when I look at the gradient after back-propagation (Line 125 in 862afc6), the number of non-zero gradient entries for the bias, i.e. (list(model.parameters())[-1].grad != 0).sum().item(), is always smaller than the number of selected classes.
Do you have any idea why this happens? I am checking this because I think we should do a sparse parameter update in advanced optimizers like Adam; otherwise the zero gradients of the parameters for un-selected classes may destroy the momentum. Handled that way, it may also help with the speed issue.
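For reference, a minimal self-contained sketch of the kind of check being described; the model, loss and sampling below (V, emb, hidden, indices) are toy stand-ins, not the repo's actual NCE code:

```python
import torch

V, emb = 50, 8
bias = torch.zeros(V, requires_grad=True)
weight = torch.randn(V, emb, requires_grad=True)
hidden = torch.randn(4, emb)                   # toy batch of hidden states
indices = torch.randint(0, V, (4, 6))          # targets + sampled noise classes

# NCE-style partial scoring: index_select touches only the selected rows.
w = weight.index_select(0, indices.view(-1)).view(4, 6, emb)
b = bias.index_select(0, indices.view(-1)).view(4, 6)
scores = torch.bmm(w, hidden.unsqueeze(2)).squeeze(2) + b
scores.sigmoid().log().sum().neg().backward()

touched = len(set(indices.view(-1).tolist()))  # classes that should receive gradient
nonzero = (bias.grad != 0).long().sum().item() # count of non-zero bias gradients
print(touched, nonzero)                        # the comment above reports the second count being smaller
```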
@chaoqing squeeze(0) is definitely the better choice: as you said, squeeze() removes every dim with size 1, which is unexpected for N=1. A PR is appreciated.
As for the non-zero element mismatch, I suspect that .grad != 0 returns a uint8 tensor (https://pytorch.org/docs/master/torch.html#torch.eq), and it is quite possible to overflow when you sum it. Can you try (list(model.parameters())[-1].grad != 0).long().sum().item()?
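For illustration, a small sketch of both points, using toy tensors rather than the repo's code:

```python
import torch

# squeeze() drops every size-1 dim, which breaks the N=1 case;
# squeeze(0) removes only the leading batch dim.
x = torch.zeros(1, 5, 1)
print(x.squeeze().shape)    # torch.Size([5])
print(x.squeeze(0).shape)   # torch.Size([5, 1])

# The element-wise comparison yields a uint8/bool mask; casting to long
# before summing rules out any overflow in the count.
grad = torch.randn(300)
print((grad != 0).long().sum().item())
```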
@Stonesjtu the non-zero element mismatch still exists after I tried your check. Actually, I had first checked the non-zero positions and found that the non-zero gradients always lie within the true or sampled classes, which is expected. We may need to look into how the gradient is formulated to find out why some of the touched classes have zero gradient.
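A quick sketch of that subset check, assuming bias.grad and indices from the toy setup after the first comment:

```python
import torch

# Assumes `bias.grad` and `indices` from the earlier toy sketch.
nonzero_pos = set(torch.nonzero(bias.grad).view(-1).tolist())
touched_pos = set(indices.view(-1).tolist())
print(nonzero_pos.issubset(touched_pos))   # non-zero grads appear only at touched classes
print(touched_pos - nonzero_pos)           # touched classes whose gradient is zero
```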