Harry Potter by the Words: A Data-Driven Approach

License: Apache License 2.0

Python 61.82% Jupyter Notebook 38.18%

7th Annual Harry Potter Conference 2018: Harry Potter by the Words: A Data-Driven Approach Project

Goal: Use Python to compute lexical diversity scores, sentiment scores, and related metrics for each book, and compare them across the series.

Data Source:

J.K. Rowling's books, as plain-text files. Online source: http://www.glozman.com/textpages.html

Main Libraries used:

NLTK (Natural Language Toolkit), Textstat

Analysis 1 & 2

Lexical diversity: unique words, Automated Readability Index (ARI), average word lengths, and fine-grained words (W>15). In the process, label each fine-grained word as Potter-specific or not Potter-specific (the labeling was done in Excel and appears in the dataset). Lastly, compare the most frequent unigrams, bigrams, and trigrams, and see which characters are mentioned together most often in each book.
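
The metrics above can be sketched in plain Python. This is a minimal illustration, not the project's code: it uses a regex tokenizer as a stand-in for NLTK, reads "W>15" as "words longer than 15 characters" (an assumption), and computes ARI from the standard formula rather than calling Textstat, so the numbers may differ slightly from the library's. The names `tokenize`, `ngrams`, and `lexical_stats` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a simple regex stand-in for an NLTK tokenizer."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))


def lexical_stats(text, long_word_len=15):
    """Compute a few lexical-diversity style metrics for one text."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    # Approximate the sentence count by runs of terminal punctuation.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    chars = sum(len(t) for t in tokens)
    words = len(tokens)
    return {
        "unique_words": len(set(tokens)),
        # Share of distinct words among all words.
        "lexical_diversity": len(set(tokens)) / words,
        "avg_word_length": chars / words,
        # Standard ARI formula: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43
        "ari": 4.71 * chars / words + 0.5 * words / sentences - 21.43,
        # Assumption: "fine-grained" means longer than long_word_len characters.
        "long_words": sorted({t for t in tokens if len(t) > long_word_len}),
    }


sample = "Harry Potter looked at Ron. Ron looked at Harry Potter."
stats = lexical_stats(sample)
top_bigrams = Counter(ngrams(tokenize(sample), 2)).most_common(2)
```

Note that this sketch builds n-grams across sentence boundaries (for example, the bigram `("ron", "ron")` in the sample); splitting into sentences first would avoid that. Counting n-grams made of character names is one way to see which characters appear together.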

Analysis 3 (expanded beyond what was shown in the presentation)

Find sentiment scores (positive, negative, neutral, and compound) using the VADER sentiment analyzer in NLTK. This was the most challenging part, since VADER is used primarily for analyzing the sentiment of social media text, i.e., line by line. Courtesy of the good people of Stack Overflow, I was able to figure out a way to separate the text by characters into different lists, join and convert them into strings in one list, and then run VADER to get an overall sentiment score. Thankfully, after much trial and error, it worked!

Side note: This could be done sentence by sentence, but the resulting scores are not comparable across books, since the books get longer and contain more sentences as the series goes on. In effect, both the positive and the negative scores rise as the series progresses simply because there is more text, which does not reflect the books' actual sentiment.
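
The two approaches can be compared in a short sketch. It is an illustration under assumptions, not the project's code: the helper names are hypothetical, the scorer is passed in as a function so the logic can be tried without NLTK, and `make_vader_scorer` assumes NLTK is installed with the `vader_lexicon` resource downloaded. Averaging the sentence scores is shown only as one possible way to remove the length effect; the original analysis scores the joined text instead.

```python
from statistics import mean

# Keys returned by VADER's polarity_scores().
KEYS = ("neg", "neu", "pos", "compound")


def make_vader_scorer():
    """Return VADER's scoring function (requires nltk and 'vader_lexicon')."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores


def score_whole_text(chunks, score_fn):
    """Join the text pieces into one string and score it once."""
    return score_fn(" ".join(chunks))


def score_by_sentence(sentences, score_fn):
    """Score each sentence separately; return per-key sums and means."""
    scores = [score_fn(s) for s in sentences]
    # Sums grow with the number of sentences, so longer books score higher.
    totals = {k: sum(sc[k] for sc in scores) for k in KEYS}
    # Means are independent of how many sentences a book has.
    means = {k: mean(sc[k] for sc in scores) for k in KEYS}
    return totals, means
```

With NLTK available, `score_whole_text(chunks, make_vader_scorer())` returns a single dictionary with `neg`, `neu`, `pos`, and `compound` values for the whole text.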

Tableau Public for Corresponding Visualizations: https://public.tableau.com/profile/chantel.diaz#!/
