Cortx challenge: Deep learning model for answering Google's natural questions
This project tackles long-answer questions from Google's Natural Questions dataset using Hugging Face Transformers.
When given a test example with document tokens, question text, and a list of long answer candidates, the model should predict which long answer candidate in the list most accurately answers the question.
The broad idea is to fine-tune pre-trained, transformer-based question answering models from Hugging Face to answer questions based on Wikipedia pages.
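As a rough illustration of the task, the sketch below picks one long answer candidate for a question. It assumes the simplified Natural Questions layout (a space-separated `document_text` string and candidates given as `start_token`/`end_token` spans). `overlap_score` is a hypothetical stand-in for the fine-tuned model's score, not the scorer used in this project.

```python
from typing import Callable, Dict, List


def candidate_text(document_text: str, candidate: Dict[str, int]) -> str:
    """Return the text of a candidate span (end_token is exclusive)."""
    tokens = document_text.split(" ")
    return " ".join(tokens[candidate["start_token"]:candidate["end_token"]])


def overlap_score(question: str, passage: str) -> float:
    """Hypothetical scorer: fraction of question words found in the passage."""
    question_words = set(question.lower().split())
    passage_words = set(passage.lower().split())
    return len(question_words & passage_words) / max(len(question_words), 1)


def predict_long_answer(
    example: dict, score_fn: Callable[[str, str], float] = overlap_score
) -> int:
    """Return the index of the highest-scoring long answer candidate."""
    scores: List[float] = [
        score_fn(example["question_text"], candidate_text(example["document_text"], c))
        for c in example["long_answer_candidates"]
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example; real examples come from the Natural Questions files.
example = {
    "question_text": "what is the capital of france",
    "document_text": "France is in Europe . Paris is the capital of France .",
    "long_answer_candidates": [
        {"start_token": 0, "end_token": 5},
        {"start_token": 5, "end_token": 12},
    ],
}
best_index = predict_long_answer(example)
```

In the actual project the score would come from the fine-tuned model rather than word overlap; only the "score every candidate, keep the best" shape carries over.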
- Idea: The model is trained on examples that each contain a positive long-answer candidate and a negative long-answer candidate sampled uniformly from the positive examples. The aim is for the model to learn to tell correct candidates from incorrect ones.
- Model finetuned: bert-base-uncased
- Scores:
- long-best-threshold-f1: 0.4608
- long-best-threshold-precision: 0.4157
- long-best-threshold-recall: 0.5169
- Idea: The low F1 score may be because the sampled negative long-answer candidate is often not the most challenging one to tell apart from the positive candidate. The next step is therefore to sample a hard negative candidate from a distribution that reflects how hard each candidate is. To obtain this distribution, we can use the model trained in version 0, which gives the probability that a candidate is positive, to mine hard negative examples.
- Model to be finetuned: deepset/bert-large-uncased-whole-word-masking-squad2
- Scores: pending
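The two sampling strategies described above can be sketched as follows. This is a minimal illustration, not the code in `train_v1.py`. It assumes the negative is drawn from the same example's candidate list; `sample_negative` and its arguments are hypothetical names, and `positive_probs` stands for the per-candidate probability of being the positive answer, as produced by an earlier model.

```python
import random
from typing import Optional, Sequence


def sample_negative(
    num_candidates: int,
    positive_index: int,
    positive_probs: Optional[Sequence[float]] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample the index of one negative candidate, never the positive one."""
    rng = rng or random.Random()
    negatives = [i for i in range(num_candidates) if i != positive_index]
    if not negatives:
        raise ValueError("need at least one candidate besides the positive one")
    if positive_probs is None:
        # Uniform sampling: every negative candidate is equally likely.
        return rng.choice(negatives)
    # Hard-negative sampling: weight each negative by the earlier model's
    # probability that it is the answer, so confusing candidates come up more.
    weights = [positive_probs[i] for i in negatives]
    if sum(weights) <= 0:
        # Fall back to uniform sampling when no negative has any weight.
        return rng.choice(negatives)
    return rng.choices(negatives, weights=weights, k=1)[0]


rng = random.Random(0)
uniform_pick = sample_negative(5, positive_index=2, rng=rng)
hard_pick = sample_negative(
    5, positive_index=2, positive_probs=[0.0, 0.0, 0.9, 1.0, 0.0], rng=rng
)
```

Passing no probabilities gives the uniform scheme; passing them shifts the draw toward candidates the earlier model wrongly finds plausible. In the toy call, candidate 3 is the only negative with non-zero weight, so it is always chosen.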
System Requirements:
- Works with Python3
- Trains and validates on a single GPU or on a distributed system (multiple GPUs). This project used a Lambda Stack GPU cloud instance with 2x RTX 6000 (24 GB).
Major Packages Required
- transformers~=4.1.1
- datasets~=1.1.3
- apex
- torch~=1.7.1
Clone the repository
git clone https://github.com/gunjanpatil/answering_natural_questions.git
cd answering_natural_questions
Setup:
To download, unzip, and install all necessary packages, run setup.sh. This takes around 4-5 minutes, though it may vary with your network's download speed.
. setup.sh
After this setup, you will be in the src directory of this repository.
Training
To train a model with train_v1.py, first modify the training configuration in the configs/args.json file according to your requirements.
Mandatory modifications required:
- project_path: path to your repository
You can make changes to other arguments in the args file depending on your needs. You can also train using mixed precision by setting the fp16 argument to true.
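For readers who would rather not edit the JSON by hand, here is a small sketch of making these changes from a script. It assumes `configs/args.json` is a flat JSON object with `project_path` and `fp16` keys, as the text above suggests. `update_config` is a hypothetical helper, and the demo writes to a temporary file rather than the real config.

```python
import json
import tempfile
from pathlib import Path


def update_config(config_path, **overrides):
    """Load a JSON config, apply keyword overrides, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config.update(overrides)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config


# Demo on a throwaway file; in the repository the path would be configs/args.json.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "args.json"
    demo_path.write_text(json.dumps({"project_path": "", "fp16": False}))
    updated = update_config(demo_path, project_path="/path/to/repo", fp16=True)
    reloaded = json.loads(demo_path.read_text())
```

Keys that are not passed as overrides are left untouched, so the remaining training arguments keep their existing values.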
To launch training of a model using version 1, run the training script train_v1.py with a config file located in configs/args.json as follows:
python3 -m torch.distributed.launch --nproc_per_node=<number_of_gpus_in_system> train_v1.py --configs=configs/args.json > train_v1_logs.txt
Set nproc_per_node value to the number of GPUs in your system. This command also works with one GPU. The training configs used for this project are the same as the ones in configs/args.json. The model was trained on 2 RTX-6000 GPUs. The weights are saved in the output_dir mentioned in the configs file.
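If you want to assemble the launch command from a script, for example to fill in the GPU count automatically, something along these lines could work. `build_launch_command` is a hypothetical helper, not part of this repository; the command it builds is the one shown above.

```python
import shlex


def build_launch_command(num_gpus, config_path="configs/args.json"):
    """Build the distributed launch command for train_v1.py as an argument list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "train_v1.py",
        f"--configs={config_path}",
    ]


# With PyTorch installed, the count could come from torch.cuda.device_count().
command = build_launch_command(2)
command_line = " ".join(shlex.quote(part) for part in command)
```

The argument list can be passed to `subprocess.run(command, stdout=log_file, check=True)` from the src directory, with `log_file` being an open file such as train_v1_logs.txt.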
Validation:
To validate a model trained in version 1, run validate_v1.py to generate predictions on the validation dataset as follows:
python3 validate_v1.py -d=../datasets/natural_questions_simplified/v1.0-simplified/nq-dev-all.jsonl -o=<path_to_directory_to_store_predictions_file> -m=<model_name_or_path> -w=<path_to_saved_model_weights>
You can also add the --fp16 argument if your model was trained in mixed precision. Running the validation script generates a predictions.json file in the output path given on the command line.
Finally, run Google's evaluation script to generate the F1 scores.
python3 nq_eval.py --gold_path=../datasets/natural_questions_simplified/v1.0-simplified/nq-dev-all.jsonl.gz --predictions_path=<path_to_predictions.json_file> > scores_predictions.txt
All the scores will be written to the scores_predictions.txt file. The scores for the bert-base-uncased model trained in version 1 are stored under predictions/bert-base-uncased/
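As a quick consistency check on the numbers reported earlier, F1 is the harmonic mean of precision and recall. The snippet below recomputes it from the long-answer precision and recall listed in the scores above. `f1_score` is a local helper written for this illustration, not a function from `nq_eval.py`.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for bert-base-uncased.
recomputed_f1 = f1_score(0.4157, 0.5169)
print(round(recomputed_f1, 4))  # prints 0.4608
```

The recomputed value matches the reported long-best-threshold-f1 to four decimal places.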