NOTE: This repository is abandoned for now, until the Twitter API becomes more accessible.
An analysis of Kanye's lyrics in correlation with his Twitter feed
This project explores Kanye West's music, his lyrics, and their meanings in relation to some of the strange things he has said on social media. Kanye has been in deep water lately, and though this is not the first time, it would be interesting to see whether the two media correlate. The end goal of this project is to find correlations between Kanye's lyrics and the text of his Twitter feed.
After my senior year in college, I developed an interest in data mining, and specifically in text analysis. I wanted to do a more substantial project that expands on the skills I gained from my final projects at university. These skills include basic text and sentiment analysis, working with the Twitter API, and gathering, preprocessing, and analyzing data. All of this follows the KDD (knowledge discovery in databases) process.
The main idea of this project is to find correlations between Kanye's creative and concrete selves. The person he portrays in his music may or may not correspond directly to what he says on social media, and I want to explore this idea further.
This project can be broken up into three major parts.
- Gathering lyrical data
- Gathering social media data
- Finding correlations
Each of the first two parts follows a clear-cut process of gathering, preprocessing, and analyzing the data. The final part will bring the results of the first two together.
The process of analysis can be broken down into three main steps.
Lyrical data will be gathered using the Genius API and will cover all of Kanye's music. Specifically, each song will have:
- Album
- Song Title
- Year released
- Lyrics
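The four fields above could be held in a small record type before any analysis happens. The following is only a sketch: the name `SongRecord` and the placeholder values are hypothetical, and nothing here calls the Genius API.

```python
from dataclasses import dataclass, asdict


@dataclass
class SongRecord:
    """One gathered song; the field names mirror the list above (hypothetical schema)."""
    album: str
    title: str
    year: int
    lyrics: str


# Placeholder values only; real records would be filled in by the gathering step.
example = SongRecord(
    album="Placeholder Album",
    title="Placeholder Song",
    year=2000,
    lyrics="placeholder lyrics text",
)

# A plain dict is convenient for writing rows to CSV or JSON later on.
row = asdict(example)
print(row)
```

Keeping the year as an integer makes it easier to line songs up against tweet dates later.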
The main item to be analyzed will be the lyrics. Some attributes to be mined from the data include:
- Word count
- Most used words
- Themes
Word counts and most used words come from simple frequency counts, while themes will be found using more advanced text mining techniques such as sentiment analysis and named entity recognition.
The lyrical data makes up one half of the overall dataset.
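The two simplest attributes, word count and most used words, can be sketched with the standard library alone. This is a minimal illustration and not the project's actual implementation: the helper names, the tiny stop-word set, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

# A deliberately tiny stop-word set, assumed for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "and", "is", "to", "of", "in", "it", "i"})


def tokenize(lyrics):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", lyrics.lower())


def word_count(lyrics):
    """Total number of tokens, stop words included."""
    return len(tokenize(lyrics))


def most_used_words(lyrics, n=10, stopwords=STOPWORDS):
    """The n most frequent tokens after dropping stop words."""
    counts = Counter(t for t in tokenize(lyrics) if t not in stopwords)
    return counts.most_common(n)


# Made-up sample text, not real lyrics.
sample = "The sun is up and the sun is bright"
print(word_count(sample))
print(most_used_words(sample, n=1))
```

Themes would need more than frequency counts, so they are left out of this sketch.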