bradly-plh / nlp_bakeoff_vac_pandas

This project forked from learn-co-students/nlp_bakeoff_vac_pandas

License: GNU General Public License v3.0

Jupyter Notebook 100.00%

NLP Bake-Off

Please read the instructions!


Context

Is it possible to identify the political party of U.S. Presidents from their inauguration speeches? That is your challenge in this bake-off!

U.S. Presidents have represented five different political parties in unequal numbers:

  • Democrats
  • Democratic-Republicans
  • Federalists
  • Republicans
  • Whigs

Of the 58 inaugural speeches, about 40% were given by representatives of the Democratic Party and about 40% by representatives of the Republican Party, with the remainder split among the other three parties named above.


Your Task

In this repo are three pickled pandas Series: X_train, y_train, and X_test. The contents of X_train and X_test are the speeches; the contents of y_train are the political parties of the presidents who gave the corresponding speeches in X_train. Your job is to build a model to predict the political parties of the presidents who gave the speeches contained in X_test.
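A minimal sketch of loading the three Series follows. The file names X_train.pkl, y_train.pkl, and X_test.pkl and the helper load_bakeoff_data are assumptions made for illustration; check the repo for the actual names. So that the sketch runs anywhere, it first writes tiny stand-in Series to a temporary folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_bakeoff_data(data_dir):
    """Load the three pickled Series from data_dir (file names are assumed)."""
    data_dir = Path(data_dir)
    X_train = pd.read_pickle(data_dir / "X_train.pkl")
    y_train = pd.read_pickle(data_dir / "y_train.pkl")
    X_test = pd.read_pickle(data_dir / "X_test.pkl")
    return X_train, y_train, X_test


# Stand-in data written to a temporary folder, only so the sketch is runnable.
with tempfile.TemporaryDirectory() as tmp:
    pd.Series(["speech one", "speech two"]).to_pickle(Path(tmp) / "X_train.pkl")
    pd.Series(["Whig", "Federalist"]).to_pickle(Path(tmp) / "y_train.pkl")
    pd.Series(["speech three"]).to_pickle(Path(tmp) / "X_test.pkl")
    X_train, y_train, X_test = load_bakeoff_data(tmp)

# X_train and y_train should line up one-to-one.
print(len(X_train), len(y_train), len(X_test))
```

In the real repo you would call load_bakeoff_data(".") from the repo folder, assuming the pickles sit at the top level.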

Preprocessing

Note that you will need to do some preprocessing of the data. The speeches in X_train (and X_test) are mostly clean, in the sense that they contain little beyond the English words of the speeches themselves. Even so, you may want to clean them further, and you will definitely want to consider strategies like:

  • Eliminating capital letters and punctuation;
  • Using a stemmer or a lemmatizer;
  • Tokenizing; and
  • Employing CountVectorizer() or TfidfVectorizer()

before you get to modeling.
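The strategies above could be sketched roughly as follows. The helper clean_text and the two toy sentences are invented for illustration and are not taken from the speeches. Stemming or lemmatizing is left out here; one way to add it would be through the vectorizer's tokenizer or preprocessor argument.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer


def clean_text(text):
    """Lowercase the text and replace anything that is not a letter with a space."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", text).strip()


# Toy sentences standing in for the speeches in X_train.
speeches = [
    "Fellow-Citizens of the Senate, and of the House!",
    "My fellow citizens: today we celebrate.",
]
cleaned = [clean_text(s) for s in speeches]

# TfidfVectorizer tokenizes each document and weights terms by TF-IDF.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)

print(cleaned[0])
print(X.shape)
```

TfidfVectorizer lowercases and drops punctuation during tokenization by default, so an explicit cleaning step mainly matters when you want control over details such as hyphenated words or digits.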

You may use any of the modeling techniques we have explored that apply to classification problems, including any of the modeling classes contained in sklearn.naive_bayes.

Your final model will be evaluated according to its accuracy.
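As one possible starting point, the sketch below chains CountVectorizer with MultinomialNB from sklearn.naive_bayes and scores the result with accuracy_score. The texts and the labels party_a and party_b are made up for illustration. With the real data you would hold out part of X_train for scoring (for example with train_test_split or cross_val_score), since the labels for X_test are not provided.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training texts and labels standing in for X_train and y_train.
train_texts = [
    "lower taxes and free enterprise",
    "free enterprise and small government",
    "public programs and social welfare",
    "social welfare and public schools",
]
train_labels = ["party_a", "party_a", "party_b", "party_b"]

# The pipeline vectorizes the raw text and then fits the classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Made-up held-out examples with known labels, used only to compute accuracy.
held_out_texts = [
    "free enterprise and lower taxes",
    "public schools and social programs",
]
held_out_labels = ["party_a", "party_b"]

predictions = model.predict(held_out_texts)
print(accuracy_score(held_out_labels, predictions))
```

Keeping the vectorizer inside the pipeline means the vocabulary is learned only from the data passed to fit, which avoids leaking held-out text into training.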

Get started

To begin the bakeoff:

  1. Fork this repo
  2. Clone your fork onto your local computer
  3. cd into the repo folder
  4. Create a notebook and get to baking!

Format Your Predictions

Once you have generated predictions for y_test.pkl, you will need to transform the array of numbers back into an array of letters (the original string labels). To do this, you can use .inverse_transform(). If you are unfamiliar with this process, an example is provided below:

from sklearn.preprocessing import LabelEncoder

array = ['a', 'b', 'c', 'd']
encoder = LabelEncoder()
# Map each distinct label to an integer code
encoded_array = encoder.fit_transform(array)
print(encoded_array)
# Output: [0 1 2 3]

# Map the integer codes back to the original labels
decoded_array = encoder.inverse_transform(encoded_array)
print(decoded_array)
# Output: ['a' 'b' 'c' 'd']

After you have transformed your predictions into an array of letters, please pickle your predictions and name the file using your and your teammate's initials, separated by an underscore and followed by '_predictions.pkl'.

For example, if Max and Greg were a team their pickled predictions would be called 'mb_gd_predictions.pkl'.
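A minimal sketch of building the file name and pickling the predictions is shown here. The helper predictions_filename and the label strings are hypothetical, and the sketch writes to a temporary folder only so that it can run anywhere. The validation notebook may expect a particular container type, so treat the NumPy array used here as an assumption.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def predictions_filename(initials_a, initials_b):
    """Build '<initials>_<initials>_predictions.pkl' from two sets of initials."""
    return f"{initials_a.lower()}_{initials_b.lower()}_predictions.pkl"


# Made-up decoded predictions; the real label strings come from y_train.
predictions = np.array(["Democrat", "Republican", "Whig"])

filename = predictions_filename("MB", "GD")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / filename
    # Write the predictions to disk.
    with open(path, "wb") as f:
        pickle.dump(predictions, f)
    # Read them back to confirm the round trip preserves the labels.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(filename)
```

In practice you would write the file into the repo folder rather than a temporary one.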

Validate and Submit Your Predictions

Validate

Once you have saved your predictions to file, please open the validate_predictions notebook and follow the instructions to ensure your predictions are the correct shape and datatype!

Submit

Once you pass all of the tests in the validation notebook, send your predictions to the thread on Slack.

Feel free to also push your work to your fork, and share your repo with the group.


Note: It would be possible for you to search for the speeches that are contained in X_test and discover who gave them (and also what the speakers' political parties were). For the sake of this bake-off we of course cannot allow that and will consider any such behavior as cheating. You are on your honor!


Contributors

gadamico, joelsewhere, j-max