rubenpt91 / mp-docvqa-framework


License: MIT License

Python 100.00%

mp-docvqa-framework's People

uakarsh nilesh-c carlos-vinicios phc2017002 cszjing habibzadeh estkae mach-12 htplex yizhong520

mp-docvqa-framework's Issues

Each page has all the page tokens

I'm not sure whether this is an issue or a misunderstanding on my part, but during the data preparation phase the input text for every page contains all of the page tokens:

page_tokens = ''.join([f"[PAGE_{i}]" for i in range(self.page_tokens)]) # Multiple representation
input_text = [f"{page_tokens}: question: {question[batch_idx]} context: {c}" for c in context[batch_idx]]

Why not just:
page_tokens = [f"[PAGE_{i}]" for i in range(self.page_tokens)] # Multiple representation
input_text = [f"{page_tokens[i]}: question: {question[batch_idx]} context: {c}" for i, c in enumerate(context[batch_idx])]

https://github.com/rubenpt91/MP-DocVQA-Framework/blob/master/models/HiVT5.py#L649
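To make the difference between the two variants concrete, here is a minimal standalone sketch. The function names and the toy inputs are made up for illustration and are not part of the framework; only the two string-building expressions mirror the snippets quoted above.

```python
def build_inputs_all_tokens(question, contexts, num_page_tokens):
    """Variant quoted from the repository: every page is prefixed with
    the full run of [PAGE_i] tokens."""
    page_tokens = "".join(f"[PAGE_{i}]" for i in range(num_page_tokens))
    return [f"{page_tokens}: question: {question} context: {c}" for c in contexts]


def build_inputs_one_token(question, contexts, num_page_tokens):
    """Variant proposed in this issue: page i is prefixed with [PAGE_i] only.
    Because it indexes by page position, it raises IndexError when there are
    more pages than page tokens."""
    page_tokens = [f"[PAGE_{i}]" for i in range(num_page_tokens)]
    return [
        f"{page_tokens[i]}: question: {question} context: {c}"
        for i, c in enumerate(contexts)
    ]


# Toy example (hypothetical data): two pages, two page tokens.
all_tokens = build_inputs_all_tokens("q", ["p0", "p1"], 2)
one_token = build_inputs_one_token("q", ["p0", "p1"], 2)
print(all_tokens)
print(one_token)
```

Note that the proposed variant only works when the number of page tokens is at least the number of pages, whereas the existing variant has no such constraint.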

Is freezing the VisualEmbeddings weights done on purpose?

Is freezing the VisualEmbeddings weights done on purpose?
I know the model loads the custom weights from Hugging Face, but is the freeze still intended? It cannot be changed via the config file; it has to be changed in the code.

https://github.com/rubenpt91/MP-DocVQA-Framework/blob/master/models/HiVT5.py#L38

Bug(?) in the SPDocVQA class

Hi,
Maybe I misused the SDK or misunderstood something, but I think there are some bugs in the SPDocVQA class.

First,

MP-DocVQA-Framework/datasets/SP_DocVQA.py

Line 30 in 7939f52

context = ' '.join([word.lower() for word in record['ocr_tokens']])

Should the context be a list here?

context = [' '.join([word.lower() for word in record['ocr_tokens']])]

Second, as a consequence, change

MP-DocVQA-Framework/datasets/SP_DocVQA.py

Line 80 in 7939f52

start_idx = context.find(answer)

to

start_idx = context[0].find(answer)

Regards,
Cat
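A small standalone sketch of why the two changes go together. The record below is invented for illustration; only the `' '.join(...)` expression and the `.find(answer)` call come from the lines quoted above.

```python
# Hypothetical record; real records come from the imdb files.
record = {"ocr_tokens": ["Invoice", "Total", "42"]}
answer = "total"

# Current code: context is a plain string.
context_str = " ".join([word.lower() for word in record["ocr_tokens"]])

# Proposed change: context is a one-element list (one string per page).
context_list = [context_str]

# With the string, .find works directly on the context.
start_from_str = context_str.find(answer)

# With the list, .find must be called on the single page, context[0].
start_from_list = context_list[0].find(answer)

# A list has no .find method, so the old call would fail on the new type.
list_has_find = hasattr(context_list, "find")

# Indexing the *string* with [0] yields only its first character, so
# context[0].find(answer) silently returns -1 if context stays a string.
start_from_first_char = context_str[0].find(answer)

print(start_from_str, start_from_list, list_has_find, start_from_first_char)
```

In other words, changing only one of the two lines either raises an `AttributeError` or silently returns -1.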

Slow training - Partial solution?

Hello,

If you want to partially fix the slow training: data preparation (tokenization and image feature extraction) should be done once, up front, outside the training loop,
just as is done for LayoutLMv2/v3: https://huggingface.co/docs/transformers/main/en/model_doc/layoutlmv3#transformers.LayoutLMv3Processor

On a different task (not VQA), I implemented these changes (rewriting the dataset script as a Hugging Face dataset so that I could use the .map function, then running the data processing once at the beginning) and got a 3x speedup in the training phase.
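The idea can be illustrated without any library. This is only a sketch: `preprocess` is a hypothetical stand-in for tokenization and image feature extraction, and the loops do no real training. It just counts how often the expensive step runs in each setup.

```python
class CountingPreprocessor:
    """Hypothetical stand-in for tokenization + image feature extraction."""

    def __init__(self):
        self.calls = 0

    def __call__(self, example):
        self.calls += 1
        return {"tokens": example["text"].lower().split()}


def train_on_the_fly(dataset, epochs, preprocess):
    """Preprocess inside the training loop: runs once per example per epoch."""
    for _ in range(epochs):
        for example in dataset:
            features = preprocess(example)
            _ = features  # a real loop would run the model here


def train_precomputed(dataset, epochs, preprocess):
    """Preprocess once up front, then reuse the cached features every epoch."""
    cached = [preprocess(example) for example in dataset]
    for _ in range(epochs):
        for features in cached:
            _ = features  # a real loop would run the model here


dataset = [{"text": "Page one"}, {"text": "Page two"}, {"text": "Page three"}]

slow = CountingPreprocessor()
train_on_the_fly(dataset, epochs=4, preprocess=slow)

fast = CountingPreprocessor()
train_precomputed(dataset, epochs=4, preprocess=fast)

print(slow.calls, fast.calls)
```

With a Hugging Face dataset, the caching step would be the `.map` call mentioned above; the actual speedup depends on how expensive the preprocessing is relative to the model's forward and backward passes.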

Add a License

Test dataset without labels of "answer_page_idx"

The test dataset has no "answer_page_idx" labels. How can I use the model to predict the answer page with the APPM module? There is a 20-page limit: when I feed the test dataset to HiVT5, I need to prepare it as in the fine-tuning stage, with the 20-page limit, which relies on knowing the answer page. But the labels of the test dataset are unseen, so I cannot upload my results to the website and get the ANLS or APPA scores.

Mismatch of imdb names

Hi,
I am trying to run inference for the spvqa challenge.
The model tries to load "new_imdb_test.npy" here:

MP-DocVQA-Framework/datasets/SP_DocVQA.py

Line 13 in 7939f52

data = np.load(os.path.join(imbd_dir, "new_imdb_{:s}.npy".format(split)), allow_pickle=True)

but the downloaded imdbs [1] only provide "imdb_test.npy":
[1] https://datasets.cvc.uab.es/rrc/DocVQA/Task1/spdocvqa_imdb.zip
Are they the same file? If not, where can I find the new_imdb_test.npy file?

Thanks,
Cat.
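Until that question is answered, a quick way to see which of the two names is actually present is a small path check like the one below. This is a sketch only: `resolve_imdb_path` is a hypothetical helper, not part of the framework, and it does not establish that "imdb_test.npy" and "new_imdb_test.npy" have the same contents.

```python
import tempfile
from pathlib import Path


def resolve_imdb_path(imdb_dir, split, prefixes=("new_imdb", "imdb")):
    """Return the first existing '<prefix>_<split>.npy' file in imdb_dir.

    Hypothetical helper: it only checks which file name exists. Whether the
    two files are interchangeable is an open question in this issue.
    """
    for prefix in prefixes:
        candidate = Path(imdb_dir) / f"{prefix}_{split}.npy"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No imdb file for split '{split}' found in {imdb_dir}"
    )


# Demo with a temporary directory that mimics the downloaded archive,
# which only contains 'imdb_test.npy'.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "imdb_test.npy").write_bytes(b"")
    found_name = resolve_imdb_path(tmp, "test").name

print(found_name)
```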
