Jeopardy--NLP

Problem Statement:

Build a model to predict the dollar value of a question in the TV game show “Jeopardy!”.
Data can be downloaded from this link: https://www.kaggle.com/tunguz/200000-jeopardy-questions 

Data Description
▪ 'category': the question category, e.g. "HISTORY"
▪ 'value': dollar value of the question as a string, e.g. "$200" (Note: "None" for Final Jeopardy! and Tiebreaker questions)
▪ 'question': text of the question (Note: this sometimes contains hyperlinks and other messy text, for example when the clue is a picture or video question)
▪ 'answer': text of the answer
▪ 'round': one of "Jeopardy!", "Double Jeopardy!", "Final Jeopardy!" or "Tiebreaker" (Note: Tiebreaker questions do happen but are very rare, roughly once every 20 years)
▪ 'show_number': show number as a string, e.g. '4680'
▪ 'air_date': the show's air date in YYYY-MM-DD format
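
The 'value' field arrives as a string, so it has to be parsed before it can serve as a prediction target. Below is a minimal sketch, assuming each record is a Python dict with the keys listed above; `parse_value` is a hypothetical helper, and the comma handling assumes larger amounts may be written like "$1,000". Missing values (the string "None" or a null) come back as None.

```python
from typing import Optional


def parse_value(raw: Optional[str]) -> Optional[int]:
    """Convert a 'value' string such as "$200" or "$1,000" to an int.

    Returns None for missing values ("None" is used for Final Jeopardy!
    and Tiebreaker questions; a null is treated the same way).
    """
    if raw is None or raw.strip() in ("", "None"):
        return None
    # Drop the leading "$" and any thousands separators before converting.
    return int(raw.strip().lstrip("$").replace(",", ""))


# Example records in the shape described above (values made up for illustration).
records = [
    {"category": "HISTORY", "value": "$200", "round": "Jeopardy!"},
    {"category": "SCIENCE", "value": "$1,000", "round": "Double Jeopardy!"},
    {"category": "LITERATURE", "value": "None", "round": "Final Jeopardy!"},
]

for r in records:
    print(r["round"], parse_value(r["value"]))
```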

Data Preparation

1- The first 100k samples from the dataset were taken.
2- Only samples from the "Jeopardy!" round were selected.
3- Redundant features ('round', 'show_number', 'air_date') were dropped.
4- The text was preprocessed: stopword removal, stemming, lemmatization, lower-casing, etc.
5- A class-balanced dataset was prepared for the binary or the multi-class setting, as applicable.
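
The preparation steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: the stopword list is a tiny hypothetical one, stemming and lemmatization are omitted (a library such as NLTK would normally supply them), and class balancing is assumed to be random downsampling to the smallest class, since the README does not say how it was done. `clean_text`, `prepare` and `balance` are hypothetical helper names.

```python
import random
import re
from collections import defaultdict

# Small illustrative stopword list (assumption); a full list, plus stemming
# and lemmatization, would be used in practice but is omitted here.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "this", "to", "was"}


def clean_text(text):
    """Strip HTML tags, lower-case, drop punctuation and remove stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)               # e.g. <a href="...">
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # keep letters, digits, spaces
    return " ".join(w for w in text.split() if w not in STOPWORDS)


def prepare(records, n_samples=100_000):
    """First n_samples records -> "Jeopardy!" round only -> drop unused fields."""
    rows = []
    for r in records[:n_samples]:
        if r["round"] != "Jeopardy!":
            continue
        rows.append({
            "category": clean_text(r["category"]),
            "question": clean_text(r["question"]),
            "answer": clean_text(r["answer"]),
            "value": r["value"],  # 'round', 'show_number', 'air_date' are dropped
        })
    return rows


def balance(rows, label_key="value", seed=0):
    """Downsample every class to the size of the smallest class (assumption)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


print(clean_text('<a href="http://example.com">The Eiffel Tower</a> is in Paris.'))
# -> eiffel tower paris
```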

Approach:

1- The important features are Question, Answer and Category.
2- The value is predicted from these three features.
3- To generate word embeddings, a fastText model is fine-tuned starting from vectors pre-trained on the wiki-news dataset.
   Pre-trained embeddings downloaded from: https://fasttext.cc/docs/en/english-vectors.html
4- To generate sentence vectors from these word embeddings, the concatenated power means (p-means) method is used.
   P-means paper: https://arxiv.org/pdf/1803.01400.pdf
5- The sentence vectors of the question, answer and category are concatenated to form the final feature matrix.
6- Various ML and DL models were trained on this feature matrix.
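
A minimal sketch of the p-means step and the final concatenation. A power mean of values x_1..x_n is ((x_1^p + ... + x_n^p) / n)^(1/p); p = 1 gives the arithmetic mean, and the limits p -> +inf and p -> -inf give the maximum and minimum. It is computed separately for each embedding dimension, and the results for the chosen p values are concatenated. The p values used in this project are not stated, so {1, +inf, -inf} is an assumption here; the toy 2-d embeddings are made up and stand in for the fine-tuned fastText vectors. `pmeans_vector` and `feature_row` are hypothetical names.

```python
import math

# p values for the power means: 1 (arithmetic mean), +inf (max), -inf (min).
# The p values actually used in this project are not stated; these are an assumption.
P_VALUES = (1, math.inf, -math.inf)


def power_mean(values, p):
    """Power mean ((1/n) * sum(x ** p)) ** (1/p), limited here to p in {1, +inf, -inf}."""
    if p == math.inf:
        return max(values)   # limit as p -> +inf
    if p == -math.inf:
        return min(values)   # limit as p -> -inf
    if p == 1:
        return sum(values) / len(values)
    raise ValueError("this sketch only supports p in {1, +inf, -inf}")


def pmeans_vector(tokens, embeddings, dim, p_values=P_VALUES):
    """Concatenate, over all p, the per-dimension power means of the word vectors."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * (dim * len(p_values))   # no known words: zero vector
    return [power_mean([v[d] for v in vectors], p)
            for p in p_values for d in range(dim)]


def feature_row(question, answer, category, embeddings, dim):
    """Final features: question, answer and category p-means vectors side by side."""
    return (pmeans_vector(question.split(), embeddings, dim)
            + pmeans_vector(answer.split(), embeddings, dim)
            + pmeans_vector(category.split(), embeddings, dim))


# Toy 2-d embeddings standing in for the fine-tuned 300-d fastText vectors.
toy = {"eiffel": [1.0, -2.0], "tower": [3.0, 0.0],
       "paris": [0.5, 0.5], "geography": [-1.0, 1.0]}
print(pmeans_vector(["eiffel", "tower"], toy, dim=2))
# -> [2.0, -1.0, 3.0, 0.0, 1.0, -2.0]
```

With 300-dimensional vectors and these three p values, each field yields 900 features and a full row 2700; a different choice of p values changes those sizes proportionally.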

Results

Case A: Binary Classification

Baseline for binary classification: https://github.com/yashajoshi/Predicting-Value-of-Jeopardy-Questions

Best reported metrics:

fastText with p-means:
Best results were obtained with an XGBoost classifier using the hyperparameters:
learning_rate=0.1, max_depth=2, n_estimators=140, objective="binary:logistic"

Case B: Multi-class Classification

3 Classes:

5 Classes:

Best performance was given by an XGBoost classifier with the hyperparameters:
learning_rate=0.1, max_depth=2, n_estimators=140, objective="multi:softmax"
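
XGBoost's multi-class objectives such as "multi:softmax" expect class labels encoded as integers 0 to K-1. The README does not say how the 3 or 5 value classes were formed, so this sketch assumes each class is already identified by some value (made-up dollar amounts below) and only shows the encoding plus the reported hyperparameters; `encode_labels` is a hypothetical helper.

```python
def encode_labels(values):
    """Map each distinct class value to a contiguous integer id 0..K-1."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


# Hyperparameters reported above for the multi-class case.
params = {
    "learning_rate": 0.1,
    "max_depth": 2,
    "n_estimators": 140,
    "objective": "multi:softmax",
}

# Made-up dollar values standing in for the (unstated) class definitions.
y, classes = encode_labels([600, 200, 1000, 200, 600])
print(y, classes)    # -> [1, 0, 2, 0, 1] [200, 600, 1000]
print(len(classes))  # number of classes K
```

With the xgboost package installed, these settings can be passed as `xgboost.XGBClassifier(**params)`, and the scikit-learn wrapper infers the number of classes from the encoded labels; the native training API instead needs `num_class` set explicitly for "multi:softmax".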

Contributors

mayank05942