Comments (7)
while <pad> in array:
remove <pad> from array
remove <eos> from array
Thanks for your quick reply!
But if I remove eos from the array, how can the model learn to stop generating a sentence if it never encounters the eos token?
But we should let trg[:-1] contain the eos token when we calculate the loss, right?
like this:
trg[:-1] = [x_1, x_2, x_3, eos, pad, pad]
or
trg[:-1] = [x_1, x_2, x_3, eos]
Thanks.
In short, I just want to create a dataset with sequences of different lengths. In such a dataset, I insert sos and eos at the beginning and end of each sequence as the ground truth, like this:
caps = [sos, x_1, x_2, x_3, eos]
In such a case,
caps[:, :-1] = [sos, x_1, x_2, x_3]
caps[:, 1:] = [x_1, x_2, x_3, eos]
This is what we want for the loss calculation.
outputs = model(samples, caps[:, :-1], cap_masks[:, :-1])
loss = criterion(outputs.permute(0, 2, 1), caps[:, 1:])
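The shift described above can be sketched in plain Python. The token ids (`PAD`, `SOS`, `EOS`) and the helper name `shift_for_teacher_forcing` are assumptions made up for illustration; they are not taken from catr.

```python
# Hypothetical token ids, chosen only for illustration.
PAD, SOS, EOS = 0, 1, 2


def shift_for_teacher_forcing(caps):
    """Return (decoder inputs, targets) for a batch of captions.

    Inputs drop the last token of each caption (like caps[:, :-1]);
    targets drop the first token (like caps[:, 1:]).
    """
    inputs = [seq[:-1] for seq in caps]
    targets = [seq[1:] for seq in caps]
    return inputs, targets


# One caption without padding: [sos, x_1, x_2, x_3, eos]
caps = [[SOS, 11, 12, 13, EOS]]
inputs, targets = shift_for_teacher_forcing(caps)
print(inputs)   # [[1, 11, 12, 13]]  -> sos, x_1, x_2, x_3
print(targets)  # [[11, 12, 13, 2]]  -> x_1, x_2, x_3, eos
```

At every position the target is the token that follows the corresponding input token, so the last prediction the model is trained on is eos.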
However, since the sequences have different lengths, I have to further insert pad tokens to make their lengths consistent, such as:
caps = [sos, x_1, x_2, x_3, eos, pad, pad, pad]
In such a case,
caps[:, :-1] = [sos, x_1, x_2, x_3, eos, pad, pad]
caps[:, 1:] = [x_1, x_2, x_3, eos, pad, pad, pad]
The input of the model (caps[:, :-1]) will contain the eos token, which we want to remove.
Considering this, I further replace the eos token in the input with the pad token, since pad tokens are not included in the loss calculation, like this:
caps[:, :-1] = [sos, x_1, x_2, x_3, pad, pad, pad]
And I keep caps[:, 1:] as
caps[:, 1:] = [x_1, x_2, x_3, eos, pad, pad, pad].
May I ask whether this makes sense?
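For reference, here is a minimal plain-Python sketch of the scheme described in this comment, under a few assumptions: the token ids and the helper names (`pad_to`, `make_input_and_target`, `loss_positions`) are hypothetical, and the boolean mask only stands in for whatever mechanism skips pad targets in the real loss (in PyTorch, `nn.CrossEntropyLoss` offers `ignore_index` for this).

```python
# Hypothetical token ids, chosen only for illustration.
PAD, SOS, EOS = 0, 1, 2


def pad_to(seq, length):
    """Right-pad a caption with PAD up to the given length."""
    return seq + [PAD] * (length - len(seq))


def make_input_and_target(caps):
    """Build decoder inputs with eos replaced by pad, and unchanged targets."""
    inputs = [[PAD if tok == EOS else tok for tok in seq[:-1]] for seq in caps]
    targets = [seq[1:] for seq in caps]  # targets still contain eos
    return inputs, targets


def loss_positions(targets):
    """True where a target position would count toward the loss (non-pad)."""
    return [[tok != PAD for tok in seq] for seq in targets]


caps = [
    pad_to([SOS, 11, 12, 13, EOS], 8),           # short caption, padded
    [SOS, 21, 22, 23, 24, 25, 26, EOS],          # caption already at full length
]
inputs, targets = make_input_and_target(caps)
mask = loss_positions(targets)

print(inputs[0])   # [1, 11, 12, 13, 0, 0, 0]
print(targets[0])  # [11, 12, 13, 2, 0, 0, 0]
print(mask[0])     # [True, True, True, True, False, False, False]
```

In this sketch eos never appears in the input but still appears in the target, so the position after x_3 is still trained to predict eos. If a padding mask such as cap_masks is also passed to the model, the replaced position would presumably need to be marked as padding there as well; that is an assumption, not something confirmed in this thread.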