rubenpt91 / mp-docvqa-framework
License: MIT License
I'm not sure whether this is an issue or a misunderstanding on my part, but during the data preparation phase, the input for every page is prefixed with all of the page tokens:
page_tokens = ''.join([f"[PAGE_{i}]" for i in range(self.page_tokens)]) # Multiple representation
input_text = [f"{page_tokens}: question: {question[batch_idx]} context: {c}" for c in context[batch_idx]]
Why not just:
page_tokens = [f"[PAGE_{i}]" for i in range(self.page_tokens)] # Multiple representation
input_text = [f"{page_tokens[i]}: question: {question[batch_idx]} context: {c}" for i, c in enumerate(context[batch_idx])]
https://github.com/rubenpt91/MP-DocVQA-Framework/blob/master/models/HiVT5.py#L649
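To make the difference between the two snippets above concrete, here is a minimal sketch that builds both variants side by side. The values of `num_page_tokens`, `question` and `pages` are hypothetical stand-ins for `self.page_tokens`, `question[batch_idx]` and `context[batch_idx]`; nothing here is taken from the actual HiVT5 code beyond the two string-building expressions quoted in the issue.

```python
# Hypothetical stand-ins for the values used during data preparation.
num_page_tokens = 3                      # plays the role of self.page_tokens
question = "what is the total?"          # plays the role of question[batch_idx]
pages = ["invoice header", "line items", "total: 42"]  # context[batch_idx]

# Behaviour described in the issue: every page is prefixed with ALL page tokens.
all_tokens = "".join(f"[PAGE_{i}]" for i in range(num_page_tokens))
current = [f"{all_tokens}: question: {question} context: {c}" for c in pages]

# Alternative proposed in the issue: page i is prefixed only with [PAGE_i].
per_page = [f"[PAGE_{i}]" for i in range(num_page_tokens)]
proposed = [
    f"{per_page[i]}: question: {question} context: {c}"
    for i, c in enumerate(pages)
]

for cur, prop in zip(current, proposed):
    print(cur)
    print(prop)
```

Note that the per-page variant indexes `per_page[i]`, so it only works when a document has no more pages than there are page tokens. Whether the shared prefix is intentional is exactly what the issue asks.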
Is freezing the VisualEmbeddings weights done on purpose?
I know the model loads the custom weights from Hugging Face, but is freezing them still necessary? This cannot be changed via the config file; it has to be changed in the code itself.
https://github.com/rubenpt91/MP-DocVQA-Framework/blob/master/models/HiVT5.py#L38
Hi,
Maybe I misused the SDK or misunderstood something, but I think there are some bugs in the SPDocVQA class.
First,
MP-DocVQA-Framework/datasets/SP_DocVQA.py
Line 30 in 7939f52
context = [' '.join([word.lower() for word in record['ocr_tokens']])]
Second, as a consequence, change
MP-DocVQA-Framework/datasets/SP_DocVQA.py
Line 80 in 7939f52
to
start_idx = context[0].find(answer)
Regards,
Cat
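The two lines quoted in the issue above can be exercised in isolation. This is a minimal sketch, assuming the point of the second change is that `context` is a one-element list, so the search has to run on `context[0]` rather than on the list itself. The `record` and `answer` values are hypothetical, and `end_idx` is added only for illustration.

```python
# Hypothetical record; only the 'ocr_tokens' key is taken from the issue.
record = {"ocr_tokens": ["Total", "Amount", "Due", "42"]}
answer = "amount due"  # assumed to be lowercased already

# Line quoted in the issue: context is a LIST holding one lowercased string.
context = [' '.join([word.lower() for word in record['ocr_tokens']])]

# A list has no .find method, so the search must target the string inside it.
start_idx = context[0].find(answer)

# Illustrative only: derive the end index, keeping -1 when the answer is absent.
end_idx = start_idx + len(answer) if start_idx != -1 else -1

print(start_idx, end_idx, context[0][start_idx:end_idx])
```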
Hello,
If you want to partially fix the slow training: data preparation (tokenization and image feature extraction) should be done once, at the beginning, outside of the training loop,
just like it is done for LayoutLMv2/v3: https://huggingface.co/docs/transformers/main/en/model_doc/layoutlmv3#transformers.LayoutLMv3Processor
On a different task (not VQA), implementing those changes (writing the dataset script as a Hugging Face dataset so that I could use the .map function, and then doing the data processing at the beginning) gave me a 3x speedup in the training phase.
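The idea of preparing the data once, up front, can be sketched without any framework. The example below uses plain Python instead of the Hugging Face `.map` function mentioned above, and `toy_encode` is a hypothetical stand-in for the real tokenization and image feature extraction. It only illustrates that the expensive step runs once per record rather than once per record per epoch.

```python
from typing import Callable, Dict, List

stats = {"encode_calls": 0}  # counts how often the expensive step runs


def toy_encode(record: Dict[str, str]) -> Dict[str, List[int]]:
    """Hypothetical stand-in for tokenization / feature extraction."""
    stats["encode_calls"] += 1
    return {"input_ids": [len(word) for word in record["question"].split()]}


def preprocess_once(
    records: List[Dict[str, str]],
    encode: Callable[[Dict[str, str]], Dict[str, List[int]]],
) -> List[Dict[str, List[int]]]:
    """Encode every record a single time, before training starts."""
    return [encode(record) for record in records]


records = [{"question": "what is the date"}, {"question": "who signed it"}]
cached = preprocess_once(records, toy_encode)

# The training loop only reads the cached features; nothing is re-encoded.
for epoch in range(3):
    for features in cached:
        _ = sum(features["input_ids"])  # stand-in for a training step

print(stats["encode_calls"])
```

With two records and three epochs, the encoder runs twice rather than six times. The actual gain in a real pipeline depends on how expensive the preparation step is.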
Could you please add a license to this project? Right now, without a license, it defaults to all rights reserved.
The test dataset has no "answer_page_idx" labels. How can I use the model to predict the answer page with the APPM module? There is a 20-page limit. When I feed the test dataset to HiVT5, I need to prepare it as in the fine-tuning stage, with the 20-page limit, which requires knowing the answer page, but the labels in the test dataset are unseen. As a result, I cannot upload my results to the website and get the ANLS or APPA performance.
Hi,
I am trying to run inference on the SP-DocVQA challenge.
The model tries to load "new_imdb_test.npy" from here,
MP-DocVQA-Framework/datasets/SP_DocVQA.py
Line 13 in 7939f52
Thanks,
Cat.