This project is based on the Kaggle competition “Contradictory, My Dear Watson”.
In this project, we classify pairs of sentences (a premise and a hypothesis) into three categories: entailment (0), neutral (1), and contradiction (2).
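As an illustration of the label scheme, the mapping can be written as a small dictionary. The example pairs below are invented for demonstration and are not taken from the actual dataset:

```python
# Label scheme used by the competition:
# 0 = entailment, 1 = neutral, 2 = contradiction.
ID_TO_LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Invented (premise, hypothesis, label) triples for illustration only.
examples = [
    ("He is sleeping on the couch.", "He is asleep.", 0),         # entailment
    ("He is sleeping on the couch.", "He bought the couch.", 1),  # neutral
    ("He is sleeping on the couch.", "He is wide awake.", 2),     # contradiction
]

for premise, hypothesis, label in examples:
    print(f"{premise!r} / {hypothesis!r} -> {ID_TO_LABEL[label]}")
```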
The premise-hypothesis pairs come in fifteen languages: Arabic, Bulgarian, Chinese, German, Greek, English, Spanish, French, Hindi, Russian, Swahili, Thai, Turkish, Urdu, and Vietnamese.
Training samples: 12,120; testing samples: 5,195
- Traditional NLP (TF-IDF and GloVe embeddings)
- LASER embeddings
- Transformers (BERT and RoBERTa)
- Simple Transformers library (XLM-RoBERTa)
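As a sketch of the traditional NLP baseline, TF-IDF weighting can be computed from scratch with the standard library. A real run would use a library vectorizer (e.g. scikit-learn's `TfidfVectorizer`, which applies a smoothed formula by default) over the premise and hypothesis texts; the plain `tf * log(N / df)` variant below is for illustration:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute a simple TF-IDF weighting for a list of tokenized documents.

    tf = raw count / document length; idf = log(N / df).
    Terms that occur in every document get idf = 0.
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({
            term: (count / length) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy premise/hypothesis texts, invented for illustration.
docs = [
    "he is sleeping on the couch".split(),
    "he is asleep".split(),
]
vecs = tfidf(docs)
# "he" appears in both documents, so its idf (and weight) is 0;
# "couch" appears only in the first, so it gets a positive weight.
```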
- Predictions from the different models were submitted to the Kaggle leaderboard
- The best testing accuracy, 92.66%, was achieved with the XLM-RoBERTa-large-XNLI model, earning a Kaggle rank of 12
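A minimal sketch of the best-scoring setup with the Simple Transformers library is below. The README does not state the hyperparameters used for the submission, so the ones shown are placeholders; `joeddav/xlm-roberta-large-xnli` is assumed as the XNLI checkpoint, and the import is guarded so the snippet degrades gracefully when the library is not installed. The fine-tuning function is defined but not invoked here because it downloads the full model weights:

```python
ID_TO_LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

try:
    import pandas as pd
    from simpletransformers.classification import ClassificationModel
    HAVE_DEPS = True
except ImportError:
    HAVE_DEPS = False  # pandas / simpletransformers not installed; demo skipped

def build_and_predict():
    """Fine-tune and query an XLM-RoBERTa NLI model via Simple Transformers.

    Heavy: downloads the checkpoint weights, so it is not called at import time.
    """
    # Sentence-pair input: Simple Transformers expects text_a / text_b / labels.
    train_df = pd.DataFrame({
        "text_a": ["He is sleeping on the couch."],  # invented example row
        "text_b": ["He is asleep."],
        "labels": [0],
    })
    # Placeholder hyperparameters, not the ones from the actual submission.
    model = ClassificationModel(
        "xlmroberta", "joeddav/xlm-roberta-large-xnli",
        num_labels=3,
        args={"num_train_epochs": 2, "train_batch_size": 16,
              "overwrite_output_dir": True},
        use_cuda=False,
    )
    model.train_model(train_df)
    # NOTE: the checkpoint's own label order may differ from the competition's
    # 0/1/2 scheme; predictions should be checked and remapped if needed.
    preds, _ = model.predict([["He is sleeping on the couch.", "He is awake."]])
    return [ID_TO_LABEL[int(p)] for p in preds]
```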
- https://www.kaggle.com/c/contradictory-my-dear-watson/data
- https://simpletransformers.ai/docs/installation/
- https://huggingface.co/models
- https://engineering.fb.com/2019/01/22/ai-research/laser-multilingual-sentence-embeddings/
- Conneau et al., “Unsupervised Cross-lingual Representation Learning at Scale”, arXiv:1911.02116