
Oscar and VinVL

License: MIT License

Python 100.00%
vision-and-language pre-training image-captioning vqa image-text-search oscar vinvl

oscar's Introduction

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

VinVL: Revisiting Visual Representations in Vision-Language Models

Updates

04/17/2023: Visual instruction tuning with GPT-4 is released! Please check out the multimodal model LLaVA: [Project Page] [Paper] [Demo] [Data] [Model]

04/13/2021: Our Scene Graph Benchmark repo has been released. You are welcome to use its code to extract image features with the VinVL pretrained models.
03/08/2021: Oscar+ pretraining code released; please check the last section of VinVL_MODEL_ZOO.md. All image features and model checkpoints in VinVL have also been released; please check VinVL for details.
01/13/2021: Our new work VinVL proposed Oscar+, an improved version of Oscar, and provides a better object-attribute detection model to extract features for V+L tasks. VinVL achieved SOTA performance on all seven V+L tasks listed here. Please stay tuned for the model and code release.
05/28/2020: Released finetuned models on downstream tasks; please check MODEL_ZOO.md.
05/15/2020: Released pretrained models, datasets, and code for fine-tuning on downstream tasks.

Introduction

This repository contains the source code necessary to reproduce the results presented in the paper Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. We propose a new cross-modal pre-training method, Oscar (Object-Semantics Aligned Pre-training), which leverages object tags detected in images as anchor points to significantly ease the learning of image-text alignments. We pre-train Oscar on a public corpus of 6.5 million text-image pairs and fine-tune it on downstream tasks, setting new state-of-the-art results on six well-established vision-language understanding and generation tasks. For more on this project, see the Microsoft Research Blog post.
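To make the idea concrete, here is a minimal sketch of the Word-Tag-Image input triple that Oscar feeds to a BERT-style encoder. This is an illustration only, not the repository's actual preprocessing; the tokenizer choice and feature shapes are assumptions.

```python
# Sketch of Oscar's (word tokens, object tags, region features) input triple.
# Illustrative only; see the oscar/ package in this repo for the real preprocessing.
import numpy as np
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

caption = "a dog sitting on a couch"
object_tags = ["dog", "couch"]          # detected tags serve as anchor points
region_feats = np.random.rand(2, 2054)  # placeholder: one 2054-d vector per detected region

# Text side: [CLS] caption [SEP] object tags [SEP], encoded as a single sequence.
encoding = tokenizer(caption, " ".join(object_tags), return_tensors="pt")

# The image side (region_feats) is projected and appended after the text tokens
# inside the model, so self-attention can align words, tags, and regions.
print(encoding["input_ids"].shape, region_feats.shape)
```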

Performance

| Task    | t2i  | t2i  | i2t  | i2t  | IC   | IC   | IC    | IC   | NoCaps | NoCaps | VQA      | NLVR2  | GQA      |
|---------|------|------|------|------|------|------|-------|------|--------|--------|----------|--------|----------|
| Metric  | R@1  | R@5  | R@1  | R@5  | B@4  | M    | C     | S    | C      | S      | test-std | test-P | test-std |
| SoTA_S  | 39.2 | 68.0 | 56.6 | 84.5 | 38.9 | 29.2 | 129.8 | 22.4 | 61.5   | 9.2    | 70.92    | 58.80  | 63.17    |
| SoTA_B  | 54.0 | 80.8 | 70.0 | 91.1 | 40.5 | 29.7 | 137.6 | 22.8 | 86.58  | 12.38  | 73.67    | 79.30  | -        |
| SoTA_L  | 57.5 | 82.8 | 73.5 | 92.2 | 41.7 | 30.6 | 140.0 | 24.5 | -      | -      | 74.93    | 81.47  | -        |
| Oscar_B | 54.0 | 80.8 | 70.0 | 91.1 | 40.5 | 29.7 | 137.6 | 22.8 | 78.8   | 11.7   | 73.44    | 78.36  | 61.62    |
| Oscar_L | 57.5 | 82.8 | 73.5 | 92.2 | 41.7 | 30.6 | 140.0 | 24.5 | 80.9   | 11.3   | 73.82    | 80.05  | -        |
| VinVL_B | 58.1 | 83.2 | 74.6 | 92.6 | 40.9 | 30.9 | 140.6 | 25.1 | 92.46  | 13.07  | 76.12    | 83.08  | 64.65    |
| VinVL_L | 58.8 | 83.5 | 75.4 | 92.9 | 41.0 | 31.1 | 140.9 | 25.2 | -      | -      | 76.62    | 83.98  | -        |
| gain    | 1.3  | 0.7  | 1.9  | 0.6  | -0.7 | 0.5  | 0.9   | 0.7  | 5.9    | 0.7    | 1.69     | 2.51   | 1.48     |

t2i: text-to-image retrieval; i2t: image-to-text retrieval; IC: image captioning on COCO.

Download

We have released pre-trained models, datasets, VinVL image features, and the Oscar+ pretraining corpus for downstream tasks. Please check VinVL_DOWNLOAD.md for details.

To download checkpoints for vanilla Oscar, please check DOWNLOAD.md for details.

Installation

Check INSTALL.md for installation instructions.

Model Zoo

Check MODEL_ZOO.md for scripts to run Oscar downstream fine-tuning.

Check VinVL_MODEL_ZOO.md for scripts to run Oscar+ pretraining and downstream fine-tuning.

Citations

Please consider citing the following papers if you use the code:

@article{li2020oscar,
  title={Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks},
  author={Li, Xiujun and Yin, Xi and Li, Chunyuan and Hu, Xiaowei and Zhang, Pengchuan and Zhang, Lei and Wang, Lijuan and Hu, Houdong and Dong, Li and Wei, Furu and Choi, Yejin and Gao, Jianfeng},
  journal={ECCV 2020},
  year={2020}
}

@article{zhang2021vinvl,
  title={VinVL: Making Visual Representations Matter in Vision-Language Models},
  author={Zhang, Pengchuan and Li, Xiujun and Hu, Xiaowei and Yang, Jianwei and Zhang, Lei and Wang, Lijuan and Choi, Yejin and Gao, Jianfeng},
  journal={CVPR 2021},
  year={2021}
}

License

Oscar is released under the MIT license. See LICENSE for details.

oscar's People

Contributors

chunyuanli, eaidova, pzzhang, xiyinmsu, xjli


oscar's Issues

Question about t2i retrieval task

Hi, thank you very much for open-sourcing the project! I tried to reproduce the text-to-image retrieval task. However, it appears that only the image-to-text retrieval code has been released. May I ask if it is possible to release the text-to-image retrieval code and model as well? Thank you very much for your help!

The result of IR/TR from BERT base without pre-training

Hi there, nice work!

I tried to reproduce the result you provided in Table 3 of the paper, i.e., IR and TR on COCO 1K with the model initialized from BERT base without pre-training.
My results (default setting with all attentions) are far below what you reported:
TR: 0.6820 @ R1, 0.9180 @ R5, 0.9620 @ R10
IR: 0.5676 @ R1, 0.8748 @ R5, 0.9466 @ R10

I followed the script, but only changed --model_name_or_path to 'bert-base-uncased'.

Did I miss something important, or is another set of hyper-parameters needed for fine-tuning without pre-training?

Thank you!

The image of 2D visualization using t-SNE

Hello, I tried to reduce the dimension of the text and image features with t-SNE, but the resulting text and image points do not fall in the same range, and matching text and images do not cluster together. Did you process the features, or apply another dimensionality reduction, before visualization?
Would you mind sharing the code for the 2D t-SNE visualization? Thanks!
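Not an official answer, but one common pitfall is running t-SNE separately on the two modalities, which puts them in unrelated coordinate systems. A minimal sketch that fits a single t-SNE on the concatenation of both feature sets (the feature arrays here are placeholders):

```python
# Joint 2-D t-SNE of text and image features (generic sketch, not the authors' code).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

text_feats = np.random.rand(200, 768)   # placeholder: pooled text features
image_feats = np.random.rand(200, 768)  # placeholder: pooled image features

# Fit ONE t-SNE on both modalities together so they share the embedding space.
joint = np.concatenate([text_feats, image_feats], axis=0)
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(joint)

n = len(text_feats)
plt.scatter(emb[:n, 0], emb[:n, 1], s=8, label="text")
plt.scatter(emb[n:, 0], emb[n:, 1], s=8, label="image")
plt.legend()
plt.savefig("tsne_text_image.png", dpi=150)
```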

Release the fine-tuned model for Image-Text Retrieval?

May I ask how much time fine-tuning takes for the Image-Text Retrieval task? And is it possible to release the fine-tuned model so we can run inference directly on the COCO dataset? Training with 4 V100s (16 GB) or 8 V100s (32 GB) is rather expensive...

I have a "cannot allocate memory" error!

I got this error:

(oscar) ailab@ailab:~/oscar/Oscar/oscar$ python run_vqa.py -j 4 --img_feature_dim 2054 --max_img_seq_length 50 --data_label_type mask --img_feature_type faster_r-cnn --data_dir /media/ailab/jaeyun/oscar/datasets/vqa/2k/ --model_type bert --model_name_or_path /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/ --task_name vqa_text --do_train --do_lower_case --max_seq_length 128 --per_gpu_eval_batch_size 1 --per_gpu_train_batch_size 1 --learning_rate 5e-05 --num_train_epochs 25 --output_dir results --label_file /media/ailab/jaeyun/oscar/datasets/vqa/cache/trainval_ans2label.pkl --save_epoch 1 --seed 88 --evaluate_during_training --logging_steps 4000 --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss_type bce --img_feat_format pt --classifier linear --cls_hidden_scale 3 --txt_data_dir /media/ailab/jaeyun/oscar/datasets/vqa/2k/
07/06/2020 12:17:14 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 2, distributed training: False, 16-bits training: False
07/06/2020 12:17:14 - INFO - __main__ - Task Name: vqa_text, #Labels: 3129
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.modeling_utils - loading configuration file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/config.json
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.modeling_utils - Model config {
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": "vqa_text",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "img_feature_dim": 2054,
  "img_feature_type": "faster_r-cnn",
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 3129,
  "output_attentions": false,
  "output_hidden_states": false,
  "torchscript": false,
  "type_vocab_size": 2,
  "vocab_size": 30522
}

07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - Model name '/media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). Assuming '/media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/' is a path or url to a directory containing tokenizer files.
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - loading file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/added_tokens.json
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - loading file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/special_tokens_map.json
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - loading file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/vocab.txt
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.modeling_utils - loading weights file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/pytorch_model.bin
07/06/2020 12:17:15 - INFO - oscar.modeling.modeling_bert - BertImgModel Image Dimension: 2054
07/06/2020 12:17:16 - INFO - transformers.pytorch_transformers.modeling_utils - Weights of ImageBertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
07/06/2020 12:17:16 - INFO - transformers.pytorch_transformers.modeling_utils - Weights from pretrained model not used in ImageBertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/06/2020 12:17:17 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, adjust_dp=False, adjust_loss=False, adjust_loss_epoch=-1, cache_dir='', classifier='linear', cls_hidden_scale=3, code_level='top', code_voc=512, config_name='', data_dir='/media/ailab/jaeyun/oscar/datasets/vqa/2k/', data_label_type='mask', device=device(type='cuda'), do_eval=False, do_lower_case=True, do_test=False, do_test_dev=False, do_train=True, do_train_val=False, drop_out=0.3, eval_all_checkpoints=False, evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, hard_label=False, img_feat_dir=None, img_feat_format='pt', img_feature_dim=2054, img_feature_type='faster_r-cnn', label2ans_file=None, label_file='/media/ailab/jaeyun/oscar/datasets/vqa/cache/trainval_ans2label.pkl', learning_rate=5e-05, load_fast=False, local_rank=-1, logging_steps=4000, loss_type='bce', max_grad_norm=1.0, max_img_seq_length=50, max_seq_length=128, max_steps=-1, model_name_or_path='/media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/', model_type='bert', n_gpu=2, no_cuda=False, num_train_epochs=25.0, output_dir='results', output_mode='classification', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=1, per_gpu_train_batch_size=1, philly=False, save_after_epoch=-1, save_epoch=1, save_steps=-1, scheduler='linear', seed=88, server_ip='', server_port='', task_name='vqa_text', tokenizer_name='', txt_data_dir='/media/ailab/jaeyun/oscar/datasets/vqa/2k/', use_vg=False, use_vg_dev=False, warmup_steps=0, weight_decay=0.05, workers=4)
07/06/2020 12:17:18 - INFO - __main__ - Info: loading val features using 0.13 secs
07/06/2020 12:17:18 - INFO - __main__ - val Data Examples: 10402
07/06/2020 12:17:33 - INFO - __main__ - Info: loading train features using 15.48 secs
07/06/2020 12:17:37 - INFO - __main__ - train Data Examples: 634516
07/06/2020 12:17:37 - INFO - __main__ - ***** Running training *****
07/06/2020 12:17:37 - INFO - __main__ -   Num examples = 634516
07/06/2020 12:17:37 - INFO - __main__ -   Num Epochs = 25
07/06/2020 12:17:37 - INFO - __main__ -   Instantaneous batch size per GPU = 1
07/06/2020 12:17:37 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 2
07/06/2020 12:17:37 - INFO - __main__ -   Gradient Accumulation steps = 1
07/06/2020 12:17:37 - INFO - __main__ -   Total optimization steps = 7931450
Traceback (most recent call last):
  File "run_vqa.py", line 1222, in <module>
    main()
  File "run_vqa.py", line 1145, in main
    global_step, tr_loss = train(args, train_dataset, eval_dataset, model, tokenizer)
  File "run_vqa.py", line 554, in train
    for step, batch in enumerate(train_dataloader):
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 682, in __init__
    w.start()
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I think it is due to a lack of GPU memory.
My GPUs are 1080 Tis, and I use two of them.
Which GPUs do you use?
Thank you!
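For what it's worth, the traceback above ends in os.fork() inside _MultiProcessingDataLoaderIter, so the failure is host RAM being exhausted while spawning DataLoader worker processes (the -j/--workers setting), rather than GPU memory. A generic PyTorch illustration of the knob involved (not the repo's exact loader):

```python
# OSError: [Errno 12] at os.fork() means the host ran out of memory while
# forking DataLoader workers; fewer workers (or 0) reduces the pressure.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 8), torch.randint(0, 2, (1000,)))

# num_workers=0 loads batches in the main process and avoids os.fork() entirely;
# each extra worker is another forked copy of the process holding the dataset.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

for features, labels in loader:
    pass  # training step would go here
```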

Object Detector Trained on OID

An object detector trained on OID-V5 was used in your paper. Do you mind sharing this pre-trained object detector?

Thanks!

Generating inputs to Oscar model

Hi Oscar Team,

Thanks for the interesting paper and open-sourcing your model.

On your download page, you mention that images are fed into Oscar through the outputs of a "Faster R-CNN with ResNet-101, using object and attribute annotations from Visual Genome". Have you made this model available too? It would be great if you could give a link to this pre-trained model, as it is necessary to run Oscar on my own images (I'm interested in image captioning and VQA).

I have tried to look for it myself, and the closest thing I could find was the R101-FPN from the Detectron2 model zoo (PyTorch model). However, this was trained on the COCO dataset of object tags, and I understand that the Visual Genome has significantly more labels. So surely this one would fail to produce the image features that Oscar expects?

I'd be grateful if you could let me know if my thinking is correct and if there is a link to the appropriate PyTorch model for generating inputs that Oscar can use.

Thanks in advance!

Generating label.lineidx and feature.lineidx for my own images

Hey guys, great work!! I am trying to run the model on my own images. I followed other issues and was able to generate my own feature.tsv and label.tsv files, but I am not sure how to generate the feature.lineidx and label.lineidx files for my own images. I am not sure if I am missing something; it would be great if you could help me with this issue.

Thanks
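Not an official answer, but .lineidx files in TSV pipelines like this one are usually just the byte offset of the start of each line of the .tsv, one offset per line, so readers can seek to a row directly. Assuming this repo's loader follows that convention (please verify against the TSV utilities in the code), a sketch:

```python
# Write a .lineidx file containing the starting byte offset of every line in a .tsv,
# assuming that is the format expected by the line-indexed TSV reader.
import os

def build_lineidx(tsv_path: str) -> str:
    idx_path = os.path.splitext(tsv_path)[0] + ".lineidx"
    with open(tsv_path, "rb") as fin, open(idx_path, "w") as fout:
        offset = 0
        for line in fin:
            fout.write(f"{offset}\n")
            offset += len(line)
    return idx_path

build_lineidx("feature.tsv")
build_lineidx("label.tsv")
```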

ERROR 404: The specified blob does not exist

When I run the command wget https://biglmdiag.blob.core.windows.net/oscar/datasets/$coco_ir.zip, the following error occurs:
--2020-10-01 19:55:19-- https://biglmdiag.blob.core.windows.net/oscar/datasets/.zip
Resolving biglmdiag.blob.core.windows.net (biglmdiag.blob.core.windows.net)... 52.239.247.100
Connecting to biglmdiag.blob.core.windows.net (biglmdiag.blob.core.windows.net)|52.239.247.100|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2020-10-01 19:55:20 ERROR 404: The specified blob does not exist..
How can I solve it?

Few questions about the paper.

Our group is currently reviewing your paper. It's awesome :D.

We have a few questions about the model.

  1. Which Faster R-CNN version is used in Oscar? Is it the one from Ross Girshick’s GitHub, or did your group reproduce it?
  2. For image captioning, the paper says the process repeats until the [STOP] token is detected. Is the [STOP] token the same as [SEP] in BERT?
  3. During image captioning fine-tuning, the paper says “We randomly mask out 15% of the caption tokens…”. Are exactly 15% of the caption tokens masked, or does each token have a 15% probability of being masked?

Looking forward to reviewing the source code. :D
Cheers

Pre-training for image captioning

Hello, and congrats on your brilliant work!
I’d like to ask: for image captioning, you mention in the appendix:

we directly fine-tune Oscar for image captioning on COCO without additional pre-training on Conceptual Captions

Does that mean you only use the COCO dataset for pre-training, and not the rest (SBU, Flickr, GQA)? And is the CIDEr score of 1.4 achieved after fine-tuning the COCO-only pre-trained model?

COCO caption pretrained model output results are not good

Hello,

Thank you for your great work!

I used your pretrained model for COCO image captioning. Here is the command I used:

   python oscar/run_captioning.py \
--do_test \
--do_eval \
--test_yaml test.yaml \
--per_gpu_eval_batch_size 64 \
--num_beams 5 \
--max_gen_length 20 \
--eval_model_dir image_caption/Oscarrepo/Oscar/checkpoint-29-132780/

where checkpoint-29-132780 is the uncompressed pretrained COCO model folder. But the outputs are not good.
Some examples are the following:

caption claire libraries libraries libraries libraries libraries robbery libraries libraries libraries libraries libraries libraries libraries librariesletsletslets
caption demanded adoptedrredrred libraries libraries libraries libraries librariessteadsteadsteadsteadsteadstead libraries libraries libraries
caption typing curvature curvature libraries curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature curvature

Did I miss some important steps? Thank you for your help!
Also, where is test.yaml? Thanks.

How to test VQA?

Thanks for your great work!
I want to ask how to test on VQA v2.
Do you upload to the EvalAI website to test, or do you test with your own code?
Can you offer me the script for testing? Thanks a lot!
Looking forward to your reply!

training_args.bin not included in the downloaded base-vg-labels or large-vg-labels models

I am trying to run this project for COCO Captioning.

I downloaded the pretrained base and large vg-models as instructed in the DOWNLOAD.

These were the respective folders:

+-- base-vg-labels
| +-- ep_67_588997
| +-- ep_107_1192087
+-- large-vg-labels
| +-- ep_7_816000
| +-- ep_20_590000
| +-- ep_34_999600
| +-- ep_55_1617000

I tried to get the performance of those checkpoints, but after executing:

python oscar/run_captioning.py \
  --do_test \
  --do_eval \
  --data_dir ../Data/coco_caption \
  --test_yaml test.yaml \
  --per_gpu_eval_batch_size 64 \
  --max_gen_length 20 \
  --num_beams 5 \
  --eval_model_dir ../Models/base-vg-labels/ep_107_1192087

an error occurred pointing out that the file training_args.bin was not found inside the model's directory (base-vg-labels/ep_107_1192087).

I also downloaded the Checkpoint available in the MODEL_ZOO, under Image Captioning on COCO.
This checkpoint corresponds to checkpoint-29-66420, which includes a file training_args.bin.

These are the files included in each folder:

| checkpoint-29-66420     | large-vg-labels/ep_55_1617000 |
|-------------------------|-------------------------------|
| added_tokens.json       | added_tokens.json             |
| config.json             | config.json                   |
| pytorch_model.bin       | pytorch_model.bin             |
| special_tokens_map.json | special_tokens_map.json       |
| training_args.bin       | ???                           |
| vocab.txt               | vocab.txt                     |

It seems that the only file missing is training_args.bin. After fine-tuning the provided checkpoint, the generated checkpoints also include that file. Maybe you forgot to include it in the downloadable models?

Could you please provide those files?

Or am I missing something?

I also noted that the Checkpoint/checkpoint-29-66420 corresponds to training base-vg-labels with cross-entropy loss (deduced from the provided training logs). So I assume its training_args.bin file is probably used across the entire base-vg-labels training. I am now copying the missing file into base-vg-labels/ep_107_1192087 to test its performance. Does that make sense?

Edit:

The performance of base-vg-labels/ep_107_1192087 with the args.bin borrowed from checkpoint-29-66420 was a failure.

 {'SPICE': 0.00043991859734872146, 
  'Bleu_1': 4.759355107382894e-05, 
  'Bleu_2': 7.759555665291071e-13, 
  'Bleu_3': 2.0103762808759086e-15, 
  'Bleu_4': 1.0403605447565029e-16, 
  'ROUGE_L': 6.248437014771456e-05, 
  'CIDEr': 1.3248802263844757e-06}
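For anyone inspecting the same file: in transformers-style fine-tuning scripts like the ones here, training_args.bin is typically just the argparse.Namespace serialized with torch.save. That is an assumption about this repo rather than a confirmed fact, but under it the file can be inspected (or copied) as below; note that copying the args does not change the weights themselves, so a pretraining checkpoint still behaves like one.

```python
# Inspect (or copy) a training_args.bin, assuming it is an argparse.Namespace
# saved with torch.save, as in the standard transformers example scripts.
import torch

args = torch.load("checkpoint-29-66420/training_args.bin")
print(type(args))   # typically argparse.Namespace
print(vars(args))   # the flags the checkpoint was trained with

torch.save(args, "base-vg-labels/ep_107_1192087/training_args.bin")
```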

Azcopy fail

I failed to download your dataset by executing

azcopy copy https://biglmdiag.blob.core.windows.net/oscar/pretrained_models/coco_caption.zip .

coco_ir.zip also cannot be downloaded with azcopy, but the fine-tuned models you released are available through azcopy.

About the image features dimensions

Hello,
Thank you for your great work!
When I extract image features, the dimension of the result is 2048, but the input dimension of your model is 2054. Where does the difference come from?
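Not an official answer, but per the Oscar paper (also quoted in a later issue below), each region feature is the 2048-d detector feature concatenated with a 6-d box position vector, which gives 2054. A sketch of that concatenation follows; the exact 6-d layout (normalized corners plus box width/height) is an assumption, so check it against the released feature files.

```python
# Build a 2054-d region feature: 2048-d visual feature + 6-d box position vector.
# The 6-d layout here (normalized corners plus width/height) is an assumption.
import numpy as np

def region_feature(feat_2048: np.ndarray, box_xyxy, img_w: float, img_h: float) -> np.ndarray:
    x1, y1, x2, y2 = box_xyxy
    pos = np.array([x1 / img_w, y1 / img_h,
                    x2 / img_w, y2 / img_h,
                    (x2 - x1) / img_w, (y2 - y1) / img_h], dtype=np.float32)
    return np.concatenate([feat_2048.astype(np.float32), pos])

feat = region_feature(np.random.rand(2048), (30, 40, 200, 180), img_w=640, img_h=480)
print(feat.shape)  # (2054,)
```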

gcc version

I get the error "command 'gcc' failed with exit status 1" when running the line of INSTALL.md "python setup.py install --cuda_ext --cpp_ext"

What gcc version do you use?

Trained features?

For image captioning on COCO, I am trying to obtain image features from a trained model instead of generating the caption. In DOWNLOAD.md, under Datasets, are the image region features (e.g., train.feature.tsv) extracted before or after training the model on downstream tasks (e.g., image captioning on COCO)? If before, how can I obtain image features from a trained model?
One more question: in MODEL_ZOO.md, under Image Captioning on COCO, is the model checkpoint (checkpoint.zip) already trained and fine-tuned, or do we still need to train with cross-entropy loss and fine-tune with CIDEr optimization?

export INSTALL_DIR=$PWD

I'm a beginner programmer.

I'm following the installation steps, but I don't understand what I should do.

export INSTALL_DIR=$PWD << can i pass this code?

VQA custom dataset

Hi, first of all, thank you for making this work public.

I am quite new to this field, but I would like to use this model for VQA on custom data. I found some .pkl files in your dataset, but I can't find any code associated with the creation of these files.

Would you be so kind as to provide me with that code?
If that is not possible, could you at least tell me how these files were created?

Did you use peteanderson80/bottom-up-attention, as you did for image captioning, or some other public code for extracting image features?

Thank you.
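Not an official answer, but the file name trainval_ans2label.pkl (and the "#Labels: 3129" line in the VQA training log earlier on this page) suggests a pickled dict mapping each answer string to an integer class index. Assuming that format, here is a sketch of how such files could be built for custom data; the annotation layout below is made up for illustration.

```python
# Build answer<->label mappings for a custom VQA-style dataset and pickle them,
# assuming trainval_ans2label.pkl is simply {answer_string: class_index}.
import pickle
from collections import Counter

annotations = [  # hypothetical annotation records
    {"question": "what color is the car", "answer": "red"},
    {"question": "how many dogs", "answer": "2"},
    {"question": "what color is the sky", "answer": "blue"},
]

# Keep the most frequent answers as classes (VQA v2 uses 3129 of them).
counts = Counter(a["answer"] for a in annotations)
ans2label = {ans: idx for idx, (ans, _) in enumerate(counts.most_common())}
label2ans = {idx: ans for ans, idx in ans2label.items()}

with open("trainval_ans2label.pkl", "wb") as f:
    pickle.dump(ans2label, f)
with open("trainval_label2ans.pkl", "wb") as f:
    pickle.dump(label2ans, f)
```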

How long did you train and fine-tune the image-captioning model?

I have downloaded your coco_caption zip and tried to train and fine-tune the model, but it seems to take a long time; I don't know if that is expected. If convenient, could I know how long training and fine-tuning took on your 8 V100s?

The order of tag labels and image features

First, thanks for sharing the code of this nice work!

I have a question about the dataset you provide.
In the case where the number of tag labels and image features is the same, are they mapped one-to-one in the same order, or are they just randomly ordered? In other words, do the first/second/third labels correspond to the first/second/third image features, respectively? If not, can I get the mapping from labels to features?

Thanks :)

Pretrained Model Release

Hi,

In Table 1, the paper says the assembled dataset is used to pre-train the model weights, but among the released pretrained models, both the base and large models use either Visual Genome or Open Images labels. Have you released the models pretrained on the assembled dataset?

I got an error during apex installation

I got this error while installing apex.
I think it is because of the gcc version.
What is your gcc version?

Thank you:)

~~~~~~~~~~~~~~~
csrc/mlp.cpp: In lambda function:
csrc/mlp.cpp:125:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (int i = 0; i < num_layers; i++) {
                       ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:126:54: error: expected primary-expression before ‘>’ token
       w_ptr.push_back(inputs[i + 1].data_ptr<scalar_t>());
                                                      ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:126:56: error: expected primary-expression before ‘)’ token
       w_ptr.push_back(inputs[i + 1].data_ptr<scalar_t>());
                                                        ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:129:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (int i = 0; i < inputs.size(); i++) {
                       ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:130:57: error: expected primary-expression before ‘>’ token
       outputs_ptr.push_back(outputs[i].data_ptr<scalar_t>());
                                                         ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:130:59: error: expected primary-expression before ‘)’ token
       outputs_ptr.push_back(outputs[i].data_ptr<scalar_t>());
                                                           ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:137:44: warning: narrowing conversion of ‘(work_size / sizeof (scalar_t))’ from ‘long unsigned int’ to ‘long int’ inside { } [-Wnarrowing]
     auto work_space = at::empty({work_size / sizeof(scalar_t)}, inputs[0].type());
                                            ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:137:44: warning: narrowing conversion of ‘(work_size / sizeof (scalar_t))’ from ‘long unsigned int’ to ‘long int’ inside { } [-Wnarrowing]
     auto work_space = at::empty({work_size / sizeof(scalar_t)}, inputs[0].type());
                                            ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:140:36: error: expected primary-expression before ‘>’ token
         inputs[0].data_ptr<scalar_t>(),
                                    ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:140:38: error: expected primary-expression before ‘)’ token
         inputs[0].data_ptr<scalar_t>(),
                                      ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:141:43: error: expected primary-expression before ‘>’ token
         fprop_outputs[0].data_ptr<scalar_t>(),
                                           ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:141:45: error: expected primary-expression before ‘)’ token
         fprop_outputs[0].data_ptr<scalar_t>(),
                                             ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:147:46: error: expected primary-expression before ‘>’ token
         grad_o.contiguous().data_ptr<scalar_t>(),
                                              ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:147:48: error: expected primary-expression before ‘)’ token
         grad_o.contiguous().data_ptr<scalar_t>(),
                                                ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:148:43: error: expected primary-expression before ‘>’ token
         fprop_outputs[1].data_ptr<scalar_t>(),
                                           ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:148:45: error: expected primary-expression before ‘)’ token
         fprop_outputs[1].data_ptr<scalar_t>(),
                                             ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:149:37: error: expected primary-expression before ‘>’ token
         work_space.data_ptr<scalar_t>(),
                                     ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
csrc/mlp.cpp:149:39: error: expected primary-expression before ‘)’ token
         work_space.data_ptr<scalar_t>(),
                                       ^
/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/include/ATen/Dispatch.h:12:12: note: in definition of macro ‘AT_PRIVATE_CASE_TYPE’
     return __VA_ARGS__();                          \
            ^
csrc/mlp.cpp:123:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
   AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_backward", [&] {
   ^
error: command 'gcc' failed with exit status 1

Fails in INSTALL.md

I cannot successfully run git clone --recursive git@github.com:xjli/Oscar.git; it fails with the following error message:

Submodule 'coco_caption' (git@github.com:LuoweiZhou/coco-caption.git) registered for path 'coco_caption'
Submodule 'transformers' (git@github.com:huggingface/transformers.git) registered for path 'transformers'
Cloning into '/Github/Oscar/coco_caption'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:LuoweiZhou/coco-caption.git' into submodule path '/Github/Oscar/coco_caption' failed
Failed to clone 'coco_caption'. Retry scheduled
Cloning into '/Github/Oscar/transformers'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Extracted feature for VQA test-dev set

Thank you for making this excellent work public!
I hope to reproduce your results on the VQA task, but I ran into a problem with the dataset.
I downloaded the VQA dataset following this instruction: https://github.com/microsoft/Oscar/blob/master/DOWNLOAD.md#datasets, and I did not find the Faster R-CNN image features for test-dev. I'm not sure whether something went wrong during my download, or whether this part simply wasn't provided.
If it is not possible to share the Faster R-CNN features for test-dev, could you please provide some code and basic information about how to extract the features myself, so I can reproduce the work correctly? For example:

  • which version of Faster R-CNN was used to extract the features?
  • what is the correct structure for saving these features? (i.e., for each image, how to organize all the ROI features and locations and bind them to the image id or question id)

Really thank you for your kind help!

How can I generate a caption for any (my own) image?

In the coco_caption dataset, the train.yaml file shows that train.img.tsv holds the images, but I couldn't find train.img.tsv.

  1. Where can I find the train (val or test) .img.tsv files?

feature: train.feature.tsv

391895 {"num_boxes": 37, "features": "W6aDPlMKLj6FySc9zdycPyewQj7zsqw/8FjLQE+ABUEspTk+AAAAAEg0Dz8FzHo

  2. Can you explain how you converted the original images into features?

What I want to do is look at image-caption examples, like Fig. 5 in your paper.
  3. How can I generate a caption for any sample image?
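Not an official answer, but the feature row shown above looks like a tab-separated image id plus a JSON payload whose "features" field is a base64-encoded float array with num_boxes rows. A sketch of decoding one row under that assumption (float32 storage is also an assumption, so verify against the TSV utilities in this repo):

```python
# Decode one row of feature.tsv, assuming the "features" field is a
# base64-encoded float32 buffer of shape (num_boxes, feature_dim).
import base64
import json
import numpy as np

def decode_feature_row(tsv_line: str):
    image_id, payload = tsv_line.rstrip("\n").split("\t", 1)
    record = json.loads(payload)
    buf = base64.b64decode(record["features"])
    feats = np.frombuffer(buf, dtype=np.float32).reshape(record["num_boxes"], -1)
    return image_id, feats

with open("train.feature.tsv") as f:
    image_id, feats = decode_feature_row(next(f))
print(image_id, feats.shape)  # e.g. 391895 (37, 2054)
```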

Generating label.tsv and feature.tsv from image

Hi guys, I am trying to generate my own features.tsv and labels.tsv for my dataset, but I am stuck at the following:

  1. I am slightly confused about what exactly these features are. From the "Oscar" paper, I understand that each bounding box has a feature vector of the form (v', z), where v' is P-dimensional (2048) and z is 6-dimensional (position).
     I have difficulty understanding where these 2048 features come from. Initially, I thought they came from the FC layer of Faster R-CNN, but upon checking, the FC layer size in Faster R-CNN is 4096.

  2. The Oscar paper mentions: "Specifically, v and q are generated as follows. Given an image with K regions of objects (normally over-sampled and noisy), Faster R-CNN [28] is used to extract the visual semantics of each region." I am slightly confused about how these K regions are determined. Are these K image regions the bounding boxes output by Faster R-CNN?

I am relatively new to this area. Any help would be appreciated.

Unable to Reproduce the Baseline results for NLVR2 task

We tried to reproduce the baselines for the NLVR2 task. But our result was off by a visible margin.

Hardware Specifications

Graphic Card : GeForce RTX 208
CUDA version : 10.2

Command Given

CUDA_VISIBLE_DEVICES=0 python run_nlvr.py -j 4 --img_feature_dim 2054 --max_img_seq_length 40 --data_dir dataset/nlvr2/ft_corpus --model_type bert --model_name_or_path model/base-vg-labels/ep_107_1192087 --task_name nlvr --do_lower_case --max_seq_length 55 --per_gpu_eval_batch_size 8 --per_gpu_train_batch_size 9 --gradient_accumulation_steps 8 --learning_rate 3e-05 --num_train_epochs 20 --output_dir results2 --img_feature_type faster_r-cnn --data_label_type all --train_data_type all --eval_data_type all --loss_type xe --save_epoch -1 --seed 88 --evaluate_during_training --logging_steps -1 --drop_out 0.3 --do_train --weight_decay 0.05 --warmup_steps 10000 --classifier mlp --cls_hidden_scale 3 --num_choice 2 --use_pair

Evaluation Result

[{"epoch": 0, "eval_score": 0.5138928673732455, "best_score": 0.5138928673732455}, {"epoch": 1, "eval_score": 0.624462904611859, "best_score": 0.624462904611859}, {"epoch": 2, "eval_score": 0.6764537381839014, "best_score": 0.6764537381839014}, {"epoch": 3, "eval_score": 0.6975078773990261, "best_score": 0.6975078773990261}, {"epoch": 4, "eval_score": 0.7033801203093669, "best_score": 0.7033801203093669}, {"epoch": 5, "eval_score": 0.7413348610713263, "best_score": 0.7413348610713263}, {"epoch": 6, "eval_score": 0.7463477513606417, "best_score": 0.7463477513606417}, {"epoch": 7, "eval_score": 0.7472071039816671, "best_score": 0.7472071039816671}, {"epoch": 8, "eval_score": 0.7446290461185907, "best_score": 0.7472071039816671}, {"epoch": 9, "eval_score": 0.7464909767974792, "best_score": 0.7472071039816671}, {"epoch": 10, "eval_score": 0.7414780865081638, "best_score": 0.7472071039816671}, {"epoch": 11, "eval_score": 0.7593812661128616, "best_score": 0.7593812661128616}, {"epoch": 12, "eval_score": 0.764394156402177, "best_score": 0.764394156402177}, {"epoch": 13, "eval_score": 0.7691205958178172, "best_score": 0.7691205958178172}, {"epoch": 14, "eval_score": 0.7641077055285018, "best_score": 0.7691205958178172}, {"epoch": 15, "eval_score": 0.7656831853337153, "best_score": 0.7691205958178172}, {"epoch": 16, "eval_score": 0.7593812661128616, "best_score": 0.7691205958178172}, {"epoch": 17, "eval_score": 0.7583786880549985, "best_score": 0.7691205958178172}, {"epoch": 18, "eval_score": 0.7621025494127757, "best_score": 0.7691205958178172}, {"epoch": 19, "eval_score": 0.7653967344600401, "best_score": 0.7691205958178172}]

We get a best score of 0.7691205958178172, while the baseline for this task itself gives 0.7807218562016615.

Another issue we faced was a difference in the total number of parameters: the given code reports 114611714 total parameters, but we observed 114606338.

Thanks in advance!
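As a generic sanity check when comparing parameter counts (plain PyTorch, nothing specific to this repository; the model below is a placeholder):

```python
# Count total and trainable parameters of a PyTorch model, to compare against
# the 114611714 reported in the code versus the 114606338 observed above.
import torch

def count_parameters(model: torch.nn.Module):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

model = torch.nn.Linear(768, 3129)  # placeholder; use the instantiated NLVR2 model here
print(count_parameters(model))
```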

Pretraining Process

Thanks for your great work!
Could you please provide the code for the pretraining process?
What should I follow if I want to pre-train the model on another dataset?
Thanks again!

Oscar+

Hello, are there any papers or code published on Oscar+?

Installation Failure

Failing to clone the repo and its submodules, please help.

$ git clone --recursive git@github.com:microsoft/Oscar.git
Cloning into 'Oscar'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Additionally, when cloning via HTTPS ("https://github.com/microsoft/Oscar.git"), the submodules fail to install, giving the same error:

git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
