This repository is a fork of https://github.com/MILVLG/mcan-vqa. The original authors' README is preserved at https://github.com/suakow/mcan-vqa-thai/blob/master/old_README.md.
This repository is part of a term project for the NLP course at Chulalongkorn University, semester 2/2020. The project is about VQA (Visual Question Answering) in Thai. We chose the MCAN model (https://github.com/MILVLG/mcan-vqa) and modified its language-understanding component by replacing the Embedding and LSTM layers with WangchanBERTa from VISTEC-AI (https://github.com/vistec-AI/thai2transformers, https://arxiv.org/abs/2101.09635).
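As a rough illustration of the swap, the sketch below wires a Camembert-architecture encoder (WangchanBERTa's architecture) in place of the usual Embedding + LSTM question encoder. The class and variable names are hypothetical, not the ones used in this repository, and a tiny random-weight config is built here only to show the wiring; in practice the pretrained weights would be loaded with `AutoModel.from_pretrained(...)`.

```python
import torch
import torch.nn as nn
from transformers import CamembertConfig, CamembertModel

class ThaiQuestionEncoder(nn.Module):
    """Hypothetical sketch: replaces MCAN's nn.Embedding + nn.LSTM question
    encoder with a WangchanBERTa-style (Camembert-architecture) encoder,
    projected down to the hidden size the rest of MCAN expects."""

    def __init__(self, bert: CamembertModel, hidden_size: int = 512):
        super().__init__()
        self.bert = bert
        self.proj = nn.Linear(bert.config.hidden_size, hidden_size)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # (batch, seq_len, bert_hidden) -> (batch, seq_len, hidden_size)
        return self.proj(out.last_hidden_state)

# Tiny random-weight model just to demonstrate the shapes; the real model
# would come from AutoModel.from_pretrained on a WangchanBERTa checkpoint.
cfg = CamembertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                      num_attention_heads=2, intermediate_size=64)
enc = ThaiQuestionEncoder(CamembertModel(cfg), hidden_size=16)

ids = torch.randint(1, 100, (2, 7))     # batch of 2 questions, 7 tokens each
mask = torch.ones_like(ids)
feats = enc(ids, mask)
print(feats.shape)                      # torch.Size([2, 7, 16])
```

The projection layer matters because MCAN's co-attention stack expects a fixed hidden size, while WangchanBERTa outputs 768-dimensional token features.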
We used VQA 2.0 as our dataset, selecting 8,000 question-answer pairs as the training set and 2,000 pairs as the test set from the original VQA 2.0 validation set. All 10,000 selected question-answer pairs were translated into Thai with Google Translate and manually verified by our group.
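A fixed train/test split like the one above can be produced along these lines; the field names below are hypothetical placeholders, not the actual JSON schema shipped in this repository.

```python
import random

# Hypothetical records standing in for the 10,000 translated QA pairs;
# the real data uses the VQA 2.0 annotation format.
qa_pairs = [{"question_id": i, "question_th": f"คำถามที่ {i}"}
            for i in range(10_000)]

random.seed(42)            # fixed seed so the split is reproducible
random.shuffle(qa_pairs)

train_set = qa_pairs[:8_000]
test_set = qa_pairs[8_000:]
print(len(train_set), len(test_set))   # 8000 2000
```

Shuffling before slicing avoids any ordering bias (e.g. by image ID) leaking into the split.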
For image features, we reused the pre-extracted features from the original repository, which can be downloaded from this link.
You need Google Colaboratory Pro with GPU enabled for training and inference. If you set up your own environment, you can install the dependencies required by this project from requirements.txt:
$ pip install -r requirements.txt
You also have to download the image features from this link.
The question-answer pairs for training and inference (test set) are included in this repository. You can find them at this link.
Once the above steps are done, you can fine-tune the model for Thai on Google Colab or your own machine. You have to change the image feature path before running. You can click this link to open the notebook directly in Google Colab.
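The path change typically amounts to editing a couple of assignments near the top of the notebook, roughly like the sketch below. The variable names and directories are hypothetical (they depend on where you mounted Google Drive and unpacked the downloads), so match them to the actual notebook cells.

```python
import os

# Hypothetical locations -- point these at wherever you stored the
# downloaded image features and the translated QA pairs.
FEATURE_DIR = "/content/drive/MyDrive/mcan-vqa-thai/feats"
QA_DIR = "/content/drive/MyDrive/mcan-vqa-thai/datasets"

# Checking the paths up front fails fast with a readable message,
# instead of crashing deep inside the data loader mid-epoch.
for path in (FEATURE_DIR, QA_DIR):
    if not os.path.isdir(path):
        print(f"warning: {path} does not exist yet")
```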
For inference, you also have to download the original images from this link so they can be displayed during inference.
Set up the environment the same way as in the training step before running inference. The inference notebook is at this link, or click this link to open it directly in Google Colab. You may have to change the model weight file path (click here to see the code) and the original image path (click here to see the code) before running.