Comments (9)
Thanks for your attention. We used some of the same optimization techniques as LayoutLMv2; please refer to the paper. In addition, starting from the StructuralLM model, we also do continued pre-training on the DocVQA data, mainly adding 2D positions to the question. This follows the method of the champion of the CVPR'20 challenge. We will consider making this code and model open-source in the future.
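A minimal sketch of what "adding 2D position to the question" could look like in a LayoutLM-style input. In the usual setup each OCR token carries a bounding box while question tokens get the dummy box `[0, 0, 0, 0]`; the idea described above instead gives question tokens real page coordinates. The matching rule below (exact lower-case string match against OCR tokens) is an assumption for illustration, not the authors' published method.

```python
# Dummy box normally assigned to non-page tokens in LayoutLM-style models.
DUMMY_BOX = [0, 0, 0, 0]

def question_boxes(question_tokens, ocr_tokens, ocr_boxes):
    """Return one [x0, y0, x1, y1] box per question token.

    A question token that also appears on the page (case-insensitive
    match against the OCR tokens) inherits that OCR token's bounding
    box; all other question tokens keep the dummy box.
    """
    lookup = {tok.lower(): box for tok, box in zip(ocr_tokens, ocr_boxes)}
    return [lookup.get(tok.lower(), DUMMY_BOX) for tok in question_tokens]
```

The resulting boxes would be fed to the model's 2D position embeddings alongside the token ids, exactly as is done for the document tokens.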
from alicemind.
I have the same problem as @Cppowboy: using the released weights, I cannot reach 83.94. Without any tricks, what ANLS can the model reach on DocVQA?
Thanks for your attention. Using just the released weights, you can reach 78+ ANLS on the test set with some post-processing, which is commonly applied on this data set. As we mentioned above, we will consider making the continued pre-training code and model open-source in the future.
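For readers unfamiliar with the metric being discussed: ANLS (Average Normalized Levenshtein Similarity) is the standard DocVQA score. A sketch of it, assuming the usual 0.5 similarity threshold from the benchmark's evaluation protocol:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def anls(predictions, ground_truths, tau=0.5):
    """predictions: list of answer strings;
    ground_truths: list of lists of acceptable answers per question."""
    scores = []
    for pred, gts in zip(predictions, ground_truths):
        best = 0.0
        for gt in gts:
            p, g = pred.lower().strip(), gt.lower().strip()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            # Answers above the normalized-distance threshold score 0.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        scores.append(best)
    return sum(scores) / len(scores)
```

Per question, the best match over all acceptable ground-truth answers is taken; the reported score is the mean over questions, so "78+ ANLS" means an average normalized similarity of 0.78+.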
thanks
Can you briefly introduce the continued pre-training and QG? How much benefit does each bring?
Continued pre-training on the DocVQA set (the train and validation sets) brings about 2.0+ ANLS. QG brings about 2.4+ ANLS. In addition, merging the train and dev sets brings about another 1.8+ ANLS. Note that results on the test set are strongly affected by the hyperparameters, which can lead to a difference of 1+ ANLS.
How much data is used for the continued pre-training? How much data for QG?
The data for continued pre-training is the entire DocVQA data set, and the data for QG is more than one million samples.
May I ask whether the model's 83.94 result used 10-fold cross-validation?