Comments (8)
@JACKHAHA363
Thank you for your report.
After carefully comparing the published (cleaned) version and our internal version of the source code, we found that the internal version trains the pretraining losses jointly, whereas the cleaned version trains them alternately.
I patched the code to do joint training (98a51e6); please try this version.
Sorry for our mistake; alternating training will need more iterations to converge.
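A minimal sketch of the difference between the two schemes, assuming a standard PyTorch training loop (the model and loss names here are hypothetical, not the actual ViLT code):

```python
import torch

def joint_step(model, batch, optimizer):
    # Joint training: both pretraining losses are summed in every step,
    # so a single optimizer update sees gradients from MLM and ITM together.
    loss = model.mlm_loss(batch) + model.itm_loss(batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def alternating_step(model, batch, optimizer, step):
    # Alternating training: each step optimizes only one of the losses,
    # so roughly twice as many iterations are needed to give each loss
    # the same number of updates.
    loss = model.mlm_loss(batch) if step % 2 == 0 else model.itm_loss(batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```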
from vilt.
@JACKHAHA363 Those two need different inputs. For ITM, we use unmasked inputs (and also a misaligned image-text pair). So an iteration requires running the transformer three times: aligned masked text + image for MLM, and aligned unmasked text + image plus misaligned unmasked text + image for ITM.
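The three forward passes per iteration described above can be sketched as follows (function and argument names are illustrative, not ViLT's actual API):

```python
# One pretraining iteration: the transformer runs three times because
# MLM and ITM need different text/image pairings.
def pretrain_iteration(transformer, masked_text, text, image, wrong_image):
    # 1) aligned masked text + image -> MLM logits
    mlm_out = transformer(masked_text, image)
    # 2) aligned unmasked text + image -> ITM positive score
    itm_pos = transformer(text, image)
    # 3) misaligned unmasked text + image -> ITM negative score
    itm_neg = transformer(text, wrong_image)
    return mlm_out, itm_pos, itm_neg
```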
These are the hparams:
config:
batch_size: 4096
data_root: vilt_dataset/
datasets:
- coco
- vg
- sbu
- gcc
decay_power: 1
draw_false_image: 1
draw_false_text: 0
drop_rate: 0.1
end_lr: 0
exp_name: pretrain
fast_dev_run: false
get_recall_metric: false
hidden_size: 768
image_only: false
image_size: 384
learning_rate: 0.0001
load_path: ''
log_dir: result
loss_names:
  irtr: 0
  itm: 1
  mlm: 1
  mpp: 0
  nlvr2: 0
  vqa: 0
lr_mult: 1
max_epoch: 100
max_image_len: 200
max_steps: 100000
max_text_len: 40
mlm_prob: 0.15
mlp_ratio: 4
num_gpus: 8
num_heads: 12
num_layers: 12
num_nodes: 8
num_workers: 8
optim_type: adamw
patch_size: 32
per_gpu_batchsize: 64
precision: 16
resume_from: null
seed: 0
test_only: false
tokenizer: bert-base-uncased
train_transform_keys:
- pixelbert
val_check_interval: 1.0
val_transform_keys:
- pixelbert
vit: vit_base_patch32_384
vocab_size: 30522
vqav2_label_size: 3129
warmup_steps: 2500
weight_decay: 0.01
whole_word_masking: true
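As a sanity check on the hparams above, the global batch_size matches the product of the per-GPU batch size and the device count. (If batch_size exceeded this product, the gap would presumably be closed by gradient accumulation, but that is an assumption about the training setup, not something stated here.)

```python
# Values taken from the config above.
per_gpu_batchsize = 64
num_gpus = 8
num_nodes = 8

# Effective batch per optimizer step across all devices.
global_batch = per_gpu_batchsize * num_gpus * num_nodes
print(global_batch)  # 4096, equal to batch_size in the config
```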
Using the official VQA checkpoint, I am able to get the suggested result of 71.
@JACKHAHA363
https://tensorboard.dev/experiment/mNHxDM08R6eHKeU0JHn5vg
FYI, I have uploaded the pre-training log of ViLT (mlm_itm + WWM, 100K steps) to the above tensorboard.dev link.
@dandelin Thanks for the swift response! It also seems that in the current implementation, computing the ITM loss and computing the MLM loss involve the same forward procedure. Do you think removing that redundancy could speed up training?
@dandelin I see. Thanks for your help, and really impressive work!
The loss curves are normal now, and my VQA test-dev score is close to the paper's. I will close this issue.