
akb46mayu / cloud-twitter-analytics-system


This project forked from acctforgh/cloud-twitter-analytics-system


Cloud-based Twitter Data Analytics System (MapReduce/EMR, HBase, MySQL, AWS, Undertow)

HTML 33.43% CSS 3.85% JavaScript 48.84% Java 12.36% Python 1.53%


Cloud-based Twitter Data Analytics System

The project is divided into two tasks (queries), each summarized below. Full details are in the project requirements file 15-319_619 Cloud Computing.html.

Query 1: A simple frontend that authenticates requests by deciphering an encrypted (wrapped) message with a private key. Target throughput: 15000 rps
Query 2: Text cleaning and analysis. This task filters out malformed entries (for example, entries with missing fields), calculates a sentiment score for each text using the AFINN dataset, and performs text filtering. Target throughput: 5500 rps
Budget constraint, including system and data processing costs: 40 USD
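The Query 2 cleaning and scoring steps can be sketched as below. This is a minimal illustration, not the project's JsonFilter.java or SentiAndCensor.java: the set of required fields (REQUIRED_FIELDS), the tokenizer regex, and the helper names are hypothetical assumptions. It assumes AFINN lines in the usual "word<TAB>score" form and computes the sentiment score as the sum of the AFINN values of the tokens, with unknown words scoring 0.

```python
import json
import re

# Hypothetical set of fields a well-formed tweet must carry; the real
# rules are defined in the course requirements file, not here.
REQUIRED_FIELDS = ("id_str", "created_at", "text", "user")


def parse_tweet(line):
    """Return the tweet as a dict, or None if the line is malformed."""
    try:
        tweet = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(tweet, dict):
        return None
    # Reject entries whose required fields are missing or empty.
    if any(not tweet.get(field) for field in REQUIRED_FIELDS):
        return None
    return tweet


def load_afinn(lines):
    """Parse AFINN lines of the form 'word<TAB>score' into a dict."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # rsplit keeps multi-word AFINN entries such as "can't stand" intact.
        word, score = line.rsplit("\t", 1)
        scores[word] = int(score)
    return scores


def sentiment_score(text, afinn):
    """Sum the AFINN score of every lowercase token (unknown words count 0)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return sum(afinn.get(token, 0) for token in tokens)


if __name__ == "__main__":
    afinn = load_afinn(["good\t3", "bad\t-3"])
    tweet = parse_tweet('{"id_str": "1", "created_at": "x", '
                        '"text": "Good food, bad service", "user": {"id": 7}}')
    print(sentiment_score(tweet["text"], afinn))  # 0
```

Because tokens are single words, multi-word AFINN phrases are loaded but never matched by this sketch; a fuller scorer would need phrase matching.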

Files Explained:

Frontend Files:
Cloud.java: Undertow server definition and configuration.
Decode: For Q1. Decodes the encrypted message.
HBase2: HBase connection and search methods for HBase-based solutions
JDBC4: Java MySQL connection and search methods for MySQL-based solutions
JDBCSelector.java: round-robin load balancer
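A round-robin selector like the one JDBCSelector.java is described as providing can be sketched as follows. This is a hedged Python illustration, not a port of the Java class: the class name, the method name next_backend, and the backend strings are hypothetical. A lock guards the shared rotation so that concurrent request threads each get the next backend in turn.

```python
import itertools
import threading


class RoundRobinSelector:
    """Hands out backends in strict rotation; safe to share between threads."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(backends)
        self._lock = threading.Lock()

    def next_backend(self):
        # Serialize access so concurrent callers never race on the iterator.
        with self._lock:
            return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical MySQL node addresses.
    selector = RoundRobinSelector(["mysql-1:3306", "mysql-2:3306"])
    print([selector.next_backend() for _ in range(3)])
    # ['mysql-1:3306', 'mysql-2:3306', 'mysql-1:3306']
```

Round robin spreads requests evenly without tracking node load, which suits identical, duplicated database nodes.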

Backend:
The system has two alternative backends on Amazon Web Services (AWS): one based on MySQL and one based on HBase. Both are loaded with the same data so that the performance of SQL and NoSQL databases can be compared and analyzed. HBase runs on AWS EMR (Elastic MapReduce) clusters, while MySQL is installed directly on AWS EC2 instances. Both solutions must be configured and optimized to handle heavy workloads reliably. This is done through system cache and memory tuning, duplicated MySQL nodes, multi-threading, connection pooling, and similar techniques. The scripts used for data loading are included in the repo.

  1. load.sql and loadall.sql: SQL commands for loading the filtered output into the MySQL database
  2. loadbase.py: Python script for loading the filtered output into HBase. HDFS bulk-load commands were also used but are not included in the repo.
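The connection pooling mentioned for the backend can be illustrated with a fixed-size pool. This is a minimal Python sketch under stated assumptions: ConnectionPool and its factory argument are hypothetical names, and factory stands in for whatever opens a real MySQL or HBase connection. Connections are opened once and reused, and a caller blocks until one is free instead of opening a new connection per request.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are created up front and reused."""

    def __init__(self, factory, size):
        # `factory` is a hypothetical zero-argument callable that opens
        # one database connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Block until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)


if __name__ == "__main__":
    pool = ConnectionPool(factory=object, size=2)  # dummy "connections"
    with pool.connection() as conn:
        print("using", conn)
```

Capping the pool size also caps the number of concurrent connections each database node must serve, which matters when many frontend threads share a few backends.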

Query 2 ETL Code

  1. JsonFilter.java: Cleans the original 1 TB of Twitter API data and filters out malformed entries
  2. batch process.py: Automated script for running the filter over the large datasets
  3. SimpleTweet.java, Tweet.java: Simplified data objects used during filtering
  4. StringToDataStructure.java: Converts JSON strings to Java objects for filtering
  5. SentiAndCensor.java: Defines methods for calculating tweet sentiment scores and censoring banned words (mostly vulgarities)
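The censoring half of SentiAndCensor.java can be sketched with a single compiled pattern. This is a hedged Python illustration, not the project's code: build_censor is a hypothetical name, and the masking rule (replace each banned word with asterisks of the same length, matching whole words case-insensitively) is an assumption, since the actual rule is defined in the project requirements.

```python
import re


def build_censor(banned_words):
    """Return a function that masks whole-word, case-insensitive matches.

    Assumes banned words consist of word characters, so the \\b word
    boundaries are meaningful. Masking rule (hypothetical): every
    character of a banned word becomes '*'.
    """
    words = {w.lower() for w in banned_words if w}
    if not words:
        return lambda text: text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.sub(lambda m: "*" * len(m.group(0)), text)


if __name__ == "__main__":
    censor = build_censor(["darn", "heck"])
    print(censor("Darn it, what the HECK? darned"))
    # **** it, what the ****? darned
```

Compiling every banned word into one alternation means each text is scanned once, rather than once per banned word, which helps when filtering a large dataset.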


Contributors

acctforgh
