
lleiou / harry-potter-social-network


This project forked from tzstatsads/spr2016-proj5-grp5


Please Check Out the Project Website

Home Page: http://lleiou.github.io/4249FinalProject/index.html



title: When Text Mining meets Harry Potter series
author: Chenlu Ji, Hexiu Ye, Yueying Teng, Yusen Wang, Ao Liu
date: 04/27/2016
output: html_document

Project Webpage: HP Social Network


1 Introduction

In this project, we explored the Harry Potter series using text mining techniques and visualized the network of the leading characters. We also created an interactive webpage that mimics the job of the Sorting Hat in the novels.

2 Raw Text Processing

  • Named Entity Recognition

The complete novels were downloaded from https://github.com/abishekk92/potter/tree/master/dataset. First, each novel was read line by line in Python and written to a new text file. Next, for every book we created a dictionary called ep_nick that maps each character's full name to their nicknames. The full names were detected with the NLTK package in Python and stored in a list, which was then combined with each character's nicknames collected from the Internet.
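A minimal sketch of how an ep_nick-style dictionary could be built and then used to map raw mentions in a sentence back to full names. It assumes the full names have already been extracted (the NLTK step is not shown) and that the nicknames are supplied as a plain dictionary. The helper names `build_ep_nick` and `resolve_mentions` are hypothetical and are not taken from the project code.

```python
import re
from typing import Dict, List, Set


def build_ep_nick(full_names: List[str], nicknames: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each full name to every alias that should count as a mention of it."""
    ep_nick: Dict[str, List[str]] = {}
    for name in full_names:
        # The full name is always an alias; known nicknames are appended.
        ep_nick[name] = [name] + [n for n in nicknames.get(name, []) if n != name]
    return ep_nick


def resolve_mentions(sentence: str, ep_nick: Dict[str, List[str]]) -> Set[str]:
    """Return the full names of all characters mentioned in a sentence (case-sensitive)."""
    found: Set[str] = set()
    for full_name, aliases in ep_nick.items():
        for alias in aliases:
            # Word boundaries stop "Ron" from matching inside "Ronald".
            if re.search(r"\b" + re.escape(alias) + r"\b", sentence):
                found.add(full_name)
                break
    return found


ep_nick = build_ep_nick(
    ["Harry Potter", "Ronald Weasley"],
    {"Harry Potter": ["Harry"], "Ronald Weasley": ["Ron"]},
)
print(sorted(resolve_mentions("Ron looked at Harry.", ep_nick)))
# prints ['Harry Potter', 'Ronald Weasley']
```

Resolving every alias to a single full name means later steps (co-occurrence counting, network building) see "Ron" and "Ronald Weasley" as the same node.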

3 Text Mining

  • Obtain Summary Using PageRank

After removing all stopwords, we calculated the cosine similarity between every pair of sentences and stored the results in a matrix indexed by sentence. This matrix was then fed as input to the PageRank algorithm in the Python package NetworkX. The ten sentences with the highest PageRank scores were used as our summary.
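The summarization step can be sketched without external dependencies as follows. This is an illustrative sketch, not the project's code: the project used NetworkX's PageRank, while here a small weighted PageRank is written out by power iteration so the block is self-contained. The stopword list is a tiny placeholder, and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list; a real run would use a full list.
STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to", "was"}


def tokenize(sentence: str) -> Counter:
    """Bag of lowercase words with stopwords removed."""
    return Counter(w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences: List[str]) -> List[List[float]]:
    vecs = [tokenize(s) for s in sentences]
    n = len(vecs)
    # Diagonal stays 0 so a sentence does not vote for itself.
    return [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def pagerank(w: List[List[float]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> List[float]:
    """Weighted PageRank by power iteration; rows with no weight spread their score evenly."""
    n = len(w)
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            (1 - damping) / n + damping * sum(
                scores[i] * (w[i][j] / out[i] if out[i] > 0 else 1.0 / n) for i in range(n)
            )
            for j in range(n)
        ]
        converged = sum(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences: List[str], k: int = 10) -> List[str]:
    scores = pagerank(similarity_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original reading order.
    return [sentences[i] for i in sorted(top)]


sentences = [
    "Harry flew the broom over the pitch.",
    "The broom flew fast over the pitch.",
    "Hermione read a book.",
    "Ron ate dinner.",
]
print(summarize(sentences, k=2))
```

With NetworkX, as in the project, the PageRank step would instead be roughly `nx.pagerank(nx.from_numpy_array(np.array(matrix)))`, which reads the matrix entries as edge weights.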

  • WordCloud

We created word clouds for each novel. To make the word clouds more meaningful, we removed all stopwords and also deleted every occurrence of the names of the three main characters: Harry, Ron, and Hermione.

Book 1: [word cloud image]

Book 7: [word cloud image]
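The word-cloud preprocessing described above amounts to counting words after two filters. A minimal sketch, assuming an illustrative stopword list (the project's actual list is not given) and a hypothetical helper name `cloud_frequencies`:

```python
import re
from collections import Counter

# Illustrative stopword list only.
STOPWORDS = {"a", "and", "at", "he", "it", "of", "the", "to", "was"}
MAIN_CHARACTERS = {"harry", "ron", "hermione"}


def cloud_frequencies(text: str) -> Counter:
    """Word counts for a word cloud, without stopwords or the three main names."""
    # [a-z]+ splits "Harry's" into "harry" and "s", so possessive forms of the
    # names are removed too and the one-letter "s" is dropped by the length check.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(
        w for w in words
        if len(w) > 1 and w not in STOPWORDS and w not in MAIN_CHARACTERS
    )


text = "Harry and Ron saw the dragon. The dragon roared at Hermione."
print(cloud_frequencies(text).most_common(1))  # prints [('dragon', 2)]
```

The resulting counts could be rendered with the `wordcloud` package via `WordCloud().generate_from_frequencies(freqs)`; the source does not say which library produced the images.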

4 Network Building

  • Building the Network Using AdaBoost

Using sentiment analysis, we extracted two features, polarity and subjectivity, from the processed text file. In addition, we produced a co-occurrence matrix for each novel that counts how many times each pair of characters occurs together. The two features were normalized using the entries of the co-occurrence matrix, and AdaBoost then used these features to classify character pairs as having either a positive or a negative relationship.
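The pipeline above can be sketched in plain Python under several assumptions that the source does not confirm. Each sentence is assumed to arrive with the set of characters it mentions and with polarity/subjectivity scores from some sentiment analyzer (the source does not name one; TextBlob's `sentiment` property returns exactly these two quantities). "Normalized using the co-occurrence matrix" is read here as dividing a pair's summed scores by its co-occurrence count. Training labels (+1 positive, -1 negative) are assumed to be hand-supplied. AdaBoost is written out with one-feature decision stumps rather than calling a library, and every function name is hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]
Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, sign)


def pair_features(sentences: List[Tuple[Set[str], float, float]]) -> Dict[Pair, List[float]]:
    """Mean polarity and subjectivity over the sentences in which each pair co-occurs."""
    counts: Dict[Pair, int] = defaultdict(int)  # co-occurrence matrix entries
    sums: Dict[Pair, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for chars, polarity, subjectivity in sentences:
        for a, b in combinations(sorted(chars), 2):
            key = frozenset((a, b))
            counts[key] += 1
            sums[key][0] += polarity
            sums[key][1] += subjectivity
    return {k: [s[0] / counts[k], s[1] / counts[k]] for k, s in sums.items()}


def stump_predict(x: List[float], f: int, t: float, sign: int) -> int:
    return sign if x[f] >= t else -sign


def adaboost(X: List[List[float]], y: List[int], rounds: int = 20) -> List[Stump]:
    """Discrete AdaBoost with threshold stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict(xi, f, t, sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, sign)
        err, f, t, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, t, sign))
        # Up-weight misclassified pairs, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, sign))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:
            break  # training data already separated perfectly
    return model


def adaboost_predict(model: List[Stump], x: List[float]) -> int:
    score = sum(alpha * stump_predict(x, f, t, sign) for alpha, f, t, sign in model)
    return 1 if score >= 0 else -1
```

In practice a library implementation such as scikit-learn's `AdaBoostClassifier` would usually replace the hand-written `adaboost`; it is spelled out here only to show how the reweighting works.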

5 Sorting Hat

  • The Sorting Hat

We built a multi-class classifier that performs the job of the Sorting Hat in the novels. For each character of our age who attended Hogwarts, we parsed the following personal information: name, gender, eye color, hair color, and House. We then trained a random forest classifier on these data to find the House that corresponds to a given input.
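A stdlib-only sketch of a random forest over categorical character attributes, for illustration only. The feature keys `gender`, `eye_color`, and `hair_color` and the helper names are hypothetical; the project's actual feature encoding and library are not given. Name is left out here on the assumption that a unique identifier would not generalize to new inputs. Each tree is grown on a bootstrap sample, considers a random subset of features at every split, and splits on "feature == value" using Gini impurity; the forest predicts by majority vote.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]       # e.g. {"gender": "male", "eye_color": "green", "hair_color": "red"}
Tree = Union[str, tuple]   # a House name (leaf) or (feature, value, yes_branch, no_branch)


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows: List[Row], labels: List[str], features: List[str],
               rng: random.Random, depth: int = 5) -> Tree:
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Only a random subset of sqrt(#features) features is considered per split.
    k = max(1, int(math.sqrt(len(features))))
    best = None
    for f in rng.sample(features, k):
        for v in sorted({r[f] for r in rows}):
            yes = [i for i, r in enumerate(rows) if r[f] == v]
            no = [i for i, r in enumerate(rows) if r[f] != v]
            if not yes or not no:
                continue
            # Size-weighted Gini impurity of the two child nodes.
            score = (len(yes) * gini([labels[i] for i in yes])
                     + len(no) * gini([labels[i] for i in no])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v, yes, no)
    if best is None:
        return majority
    _, f, v, yes, no = best
    return (f, v,
            build_tree([rows[i] for i in yes], [labels[i] for i in yes], features, rng, depth - 1),
            build_tree([rows[i] for i in no], [labels[i] for i in no], features, rng, depth - 1))


def predict_tree(node: Tree, row: Row) -> str:
    while isinstance(node, tuple):
        f, v, yes, no = node
        node = yes if row.get(f) == v else no
    return node


def fit_forest(rows: List[Row], labels: List[str], features: List[str],
               n_trees: int = 25, seed: int = 0) -> List[Tree]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], features, rng))
    return forest


def sort_into_house(forest: List[Tree], row: Row) -> str:
    """Majority vote over all trees in the forest."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

An equivalent setup with scikit-learn would one-hot encode the attributes and fit a `RandomForestClassifier`; the hand-written version only makes the bootstrap, random-feature, and voting steps explicit.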

Finally, we built a webpage to present everything we obtained.


Contributors: frapoleon, lleiou, chenluji, yueying-dev, hy2450
