evanm31 / twittr

R Shiny app for tweet analysis

R 100.00%
rshiny twitter r natural-language-processing text-analysis text-mining markov-chain latent-dirichlet-allocation

twittr's Introduction

twittR: Bag-of-Words Tweet Analysis with R

This R Shiny app creates a variety of informative figures based on the bag-of-words distribution of a corpus of Twitter data collected from either a user or a hashtag. These include frequency barplots and wordclouds for phrases one to four words long, a word correlation table and plot, topic modeling with a topic distribution heatmap, and a random tweet generator. To try the app, visit this link.
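The phrase-frequency barplots rest on counting every run of one to four consecutive words. A minimal sketch of that counting step follows, written in Python rather than the app's R. The `tokenize` and `ngram_counts` helpers are hypothetical, and the app's own corpus cleaning (the acknowledgements mention cleaning and stem completion) is not reproduced here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (keeps # and @ prefixes)."""
    return re.findall(r"[#@]?[a-z0-9']+", text.lower())


def ngram_counts(tweets, n):
    """Count every run of n consecutive tokens, never crossing tweet boundaries."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


tweets = [
    "Data science is fun",
    "data science with R is fun",
    "R Shiny makes data science apps",
]

# One frequency table per phrase length, as a barplot or wordcloud would use
for n in range(1, 5):
    print(n, ngram_counts(tweets, n).most_common(3))
```

Counting within each tweet separately is a deliberate choice: concatenating the corpus first would create phrases that span the end of one tweet and the start of the next.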


Acknowledgements

First, I would like to thank Gordon Anderson at UMass Amherst for giving me the idea to create this app through an assignment in his Introduction to Data Science class, as well as for providing code to access the Twitter API, extract tweets, and clean the corpus. RDataMining's Twitter Data Analysis with R presentation inspired the word correlation plot and stem completion functions, and I borrowed much of the code to compute the optimal number of topics from David Meza's excellent post Topic Modeling in R. Drew Schmidt and Christian Heckendorf's Guide to the ngram Package also proved to be a great resource for learning the library. Last but not least, I would be remiss not to mention the numerous StackOverflow and similar pages I consulted along the way to make this app a reality. Many thanks to all of you!

Description

The information provided by this app is conditioned on the bag-of-words model of text representation, in which grammar and word order are ignored in favor of tracking how often each word appears in the corpus. Though simplistic, this perspective quickly reveals which words matter most to the corpus through frequency, correlation, and n-gram analysis, and it lends itself easily to approaches like topic modeling that help us understand the text as a whole.
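To make the bag-of-words idea concrete, the Python sketch below turns tokenized tweets into a document-term matrix (one row of word counts per tweet, with word order discarded) and then scores word association as the Pearson correlation between two words' count columns. The helper names are hypothetical, and this correlation measure is an assumption chosen for illustration; the app's correlation table may be computed differently.

```python
import math
from collections import Counter


def doc_term_matrix(docs):
    """Build a bag-of-words matrix: rows are documents, columns are vocabulary words."""
    vocab = sorted({word for doc in docs for word in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)  # word order is discarded; only counts remain
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def associations(word, vocab, rows, min_corr=0.5):
    """Words whose count column correlates with `word` at or above min_corr."""
    j = vocab.index(word)
    target = [row[j] for row in rows]
    scores = {}
    for k, other in enumerate(vocab):
        if k != j:
            r = pearson(target, [row[k] for row in rows])
            if r >= min_corr:
                scores[other] = round(r, 2)
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


docs = [
    ["shiny", "app", "r"],
    ["shiny", "app", "twitter"],
    ["twitter", "api", "tweets"],
    ["r", "tweets", "analysis"],
]
vocab, rows = doc_term_matrix(docs)
print(associations("shiny", vocab, rows))  # prints {'app': 1.0}
```

Here "shiny" and "app" always occur in the same tweets, so their count columns are perfectly correlated, while every other word's correlation with "shiny" is zero or negative.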

The latent Dirichlet allocation (LDA) model used in this app is based on the idea that each document of the corpus (here, each tweet) is a mixture of a small number of topics, and that each topic is in turn dominated by a small number of words that appear most frequently within it. The model is fitted multiple times, once for each candidate number of topics, and the number of topics that yields the maximum log likelihood is kept. The likelihood itself is built from the probability that each word in a document would appear given the topics' word frequencies, an aspect of the model that lends itself nicely to bag-of-words style text classification. Once the optimal model has been fitted, each document is assigned a probability of belonging to each topic, and these probabilities are used to generate the app's heatmap of tweets and topics.
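As a rough illustration of how LDA produces the per-tweet topic probabilities behind the heatmap, here is a minimal collapsed Gibbs sampler in Python. Collapsed Gibbs sampling is one standard way to fit LDA, but using it here is an assumption: the app's R fitting code is not shown and may use another estimator, and the step that chooses the number of topics is not reproduced. The names `lda_gibbs` and `log_likelihood` and the hyperparameter values `alpha` and `beta` are illustrative.

```python
import math
import random


def lda_gibbs(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): theta[d][k] = P(topic k | doc d), phi[k][w] = P(word w | topic k).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(K)]   # word counts per topic
    n_k = [0] * K                        # total tokens assigned to each topic
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)         # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts before resampling it
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi


def log_likelihood(docs, theta, phi):
    """Plug-in estimate: sum over tokens of log(sum_k theta[d][k] * phi[k][w])."""
    return sum(math.log(sum(t * p[w] for t, p in zip(theta[d], phi)))
               for d, doc in enumerate(docs) for w in doc)


vocab = ["r", "shiny", "app", "goal", "match", "team"]
docs = [[0, 1, 2, 1], [2, 0, 1], [3, 4, 5, 4], [5, 3, 4]]  # word ids per tweet
theta, phi = lda_gibbs(docs, len(vocab), num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
print(round(log_likelihood(docs, theta, phi), 2))
```

`theta` is the tweets-by-topics matrix a heatmap would display. The plug-in log likelihood is measured on the same tweets used for fitting, so it tends to rise as topics are added; on its own it is a poor criterion for choosing the number of topics.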

Another interesting aspect of the app is the Markov chain-based tweet generator, which is built on strings of n-grams (sequences of n words) from the corpus. A Markov chain is a random process in which the probability of moving to the next state depends only on the current state of the system, not on the states that came before it. In the app, n-grams are linked together according to how likely each is to follow another in the corpus, with these word sequences serving as the states of the Markov chain. Since the transition probabilities are calculated from how frequently each word follows a given (n-1)-word phrase in the corpus, this frequency-based approach complements the bag-of-words model and greatly extends the possibilities for analysis that the rudimentary approach offers.
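The generator can be sketched as a word-level Markov chain whose states are (n-1)-word phrases and whose transitions are weighted by how often each word follows that phrase in the corpus. The Python sketch below is an independent illustration with hypothetical function names (`build_chain`, `babble`); the app draws on the R ngram package acknowledged above, whose implementation may differ in details such as tokenization and how dead ends are handled.

```python
import random
from collections import Counter, defaultdict


def build_chain(tweets, n=3):
    """Map each (n-1)-word state to a Counter of the words observed right after it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    chain = defaultdict(Counter)
    for tweet in tweets:
        tokens = tweet.lower().split()
        for i in range(len(tokens) - n + 1):
            state = tuple(tokens[i:i + n - 1])
            chain[state][tokens[i + n - 1]] += 1
    return chain


def babble(chain, length=10, seed=None):
    """Random walk on the chain: the next word depends only on the current state."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # random starting state
    order = len(state)                 # n - 1 words per state
    words = list(state)
    while len(words) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this phrase was never followed by another word
        # Sample the next word with probability proportional to its count
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        words.append(nxt)
        state = tuple(words[-order:])
    return " ".join(words)


tweets = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on a log",
]
chain = build_chain(tweets, n=3)
print(babble(chain, length=8, seed=1))
```

Because every transition is drawn from observed counts, each consecutive n-word window in the generated text also occurs somewhere in the corpus, even though the full sentence usually does not.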

twittr's Issues

Error

77: FUN [C:\Users\ADMIN\Documents\Late/server.R#149]
76: lapply
75: sapply
73: getNGramStr [C:\Users\ADMIN\Documents\Late/server.R#149]
72: observeEventHandler [C:\Users\ADMIN\Documents\Late/server.R#190]
1: shiny::runApp

Can anyone please help with this error?
