
TwitterStreamApp

The Twitter API provides a streaming endpoint that delivers a random sample of roughly 1% of publicly available Tweets in real time. TwitterStreamApp collects this stream and reports statistics on the hashtags it contains.

Prerequisites:

Docker with Linux containers.

Run the project from the console:

docker compose build
docker compose up

Without Docker: install RabbitMQ locally and change the setting "Server": "rabbitmq" to "Server": "localhost" in appsettings.

Structure:

TwitterStreamV2App: collects the Twitter stream and publishes the collected tweets to a RabbitMQ message queue.
RabbitMQ: intermediate message broker that holds messages between the collector and the consumer.
TwitterMassagesConsumerApp: reads messages from RabbitMQ, stores them in memory, and publishes statistics after every 50 messages read.
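The flow between the three components can be illustrated with a minimal Python sketch. It assumes an in-process `queue.Queue` as a stand-in for RabbitMQ, and `produce` and `consume` are hypothetical names, not the project's actual classes; only the 50-message reporting interval comes from the description above.

```python
import queue
import threading
from typing import Callable, Iterable, List

PUBLISH_EVERY = 50  # consumer reports statistics after every 50 messages
_STOP = object()    # sentinel that marks the end of the stream


def produce(tweets: Iterable[str], broker: queue.Queue) -> None:
    """Stand-in for TwitterStreamV2App: publish each collected tweet to the broker."""
    for tweet in tweets:
        broker.put(tweet)
    broker.put(_STOP)


def consume(broker: queue.Queue, publish: Callable[[int], None]) -> List[str]:
    """Stand-in for TwitterMassagesConsumerApp: keep messages in memory and
    call `publish` with the running total after every PUBLISH_EVERY messages."""
    stored: List[str] = []
    while True:
        message = broker.get()
        if message is _STOP:
            break
        stored.append(message)
        if len(stored) % PUBLISH_EVERY == 0:
            publish(len(stored))
    return stored


if __name__ == "__main__":
    broker: queue.Queue = queue.Queue()  # stands in for the RabbitMQ queue
    snapshots: List[int] = []
    tweets = [f"tweet {i}" for i in range(120)]
    producer = threading.Thread(target=produce, args=(tweets, broker))
    producer.start()
    stored = consume(broker, snapshots.append)
    producer.join()
    print(len(stored), snapshots)  # 120 [50, 100]
```

Running the producer in its own thread mirrors the two apps running independently; with a real broker, each side would instead connect to RabbitMQ through a client library.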

Results:

Total Tweets count
Top 10 HashTags
Occurrence count for each HashTag
Percentage of HashTag occurrences relative to all Tweets received
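One way to derive these results is sketched below in Python. Assumptions: hashtags are parsed from the tweet text with a simple regular expression and folded to lowercase (the real app may read them from the tweet's entity data instead), and `compute_stats` and `Stats` are hypothetical names. The percentage divides a hashtag's occurrence count by the total number of tweets, so it counts every occurrence, including repeats inside one tweet.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Assumption: hashtags are taken from the raw text rather than tweet entities.
HASHTAG = re.compile(r"#(\w+)")


@dataclass
class Stats:
    total_tweets: int = 0
    counts: Counter = field(default_factory=Counter)  # hashtag -> occurrences

    def top(self, n: int = 10) -> List[Tuple[str, int]]:
        """Top-n hashtags by occurrence count."""
        return self.counts.most_common(n)

    def percentage(self, tag: str) -> float:
        """Occurrences of `tag` relative to all tweets received, in percent."""
        if self.total_tweets == 0:
            return 0.0
        return 100.0 * self.counts[tag] / self.total_tweets


def compute_stats(tweets: Iterable[str]) -> Stats:
    stats = Stats()
    for text in tweets:
        stats.total_tweets += 1
        # Lowercase so "#Python" and "#python" count as the same hashtag.
        stats.counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return stats
```

For example, `compute_stats(["Hello #Rust #python", "#python rocks", "no tags"])` gives `total_tweets == 3`, `top(2) == [("python", 2), ("rust", 1)]`, and `percentage("python")` of about 66.7.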

How to scale the app to consume 5,700 tweets/second:

Add a load-balancing service that distributes the stream across N containers running TwitterStreamV2App.
Use Kubernetes to spawn the N TwitterStreamV2App containers, which push messages into RabbitMQ or a similar broker (consider replacing RabbitMQ with Kafka).
Spawn N TwitterMassagesConsumerApp instances in Kubernetes to consume messages in parallel and store them in DynamoDB or a similar database that supports concurrent writes.
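The scaling plan relies on each consumer counting only its share of the stream, with the partial counts combined afterwards. The Python sketch below illustrates that idea under assumptions: tweets are routed to consumers by `tweet_id % n` (Kafka's default partitioner similarly routes keyed messages by a hash of the key), the `(tweet_id, hashtags)` shape is hypothetical, and all function names are illustrative. At the target rate, six consumer instances would each handle 5700 / 6 = 950 tweets/second.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

TARGET_RATE = 5700  # tweets/second the scaled system should absorb

Tweet = Tuple[int, List[str]]  # (tweet_id, hashtags); hypothetical shape


def per_instance_rate(n_instances: int) -> float:
    """Tweets/second each of n evenly loaded instances must handle."""
    return TARGET_RATE / n_instances


def partition(tweets: Sequence[Tweet], n: int) -> List[List[Tweet]]:
    """Route each tweet to one of n consumers by tweet_id % n (assumed key)."""
    buckets: List[List[Tweet]] = [[] for _ in range(n)]
    for tweet in tweets:
        buckets[tweet[0] % n].append(tweet)
    return buckets


def consume_partition(bucket: Iterable[Tweet]) -> Counter:
    """What one consumer instance computes: hashtag counts for its share."""
    counts: Counter = Counter()
    for _tweet_id, tags in bucket:
        counts.update(tags)
    return counts


def merge(partials: Iterable[Counter]) -> Counter:
    """Combine partial counts, as a shared database would by adding them up."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Because counts are additive, the merged result equals what a single consumer would compute over the whole stream, which is why the work can be split across N instances that write into a shared store.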

How to make results persistent:

Each TwitterMassagesConsumerApp instance needs to save its data to a NoSQL database such as AWS DynamoDB.
The database must handle concurrent writes from several consumers; DynamoDB, for example, supports conditional writes (optimistic locking) and atomic counters for this purpose.
To do that, replace the in-memory implementation of StorageRepository with a NoSQL-backed one.
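The persistence change can be sketched as a repository whose updates use optimistic locking: read the current count and version, then write only if the version is unchanged, retrying on conflict. This is a minimal Python sketch; `VersionedStore` is a hypothetical in-memory stand-in for a NoSQL table with conditional writes, and the Python `StorageRepository` interface shown here is an assumed mirror of the project's class, not its real signature.

```python
import threading
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class StorageRepository(ABC):
    """Hypothetical Python mirror of the project's StorageRepository."""

    @abstractmethod
    def increment(self, tag: str, amount: int = 1) -> None: ...

    @abstractmethod
    def get(self, tag: str) -> int: ...


class VersionedStore:
    """Toy stand-in for a NoSQL table with conditional writes: a write
    succeeds only if the stored version still matches the expected one."""

    def __init__(self) -> None:
        self._items: Dict[str, Tuple[int, int]] = {}  # key -> (count, version)
        self._guard = threading.Lock()  # each call is atomic, like one DB request

    def read(self, key: str) -> Tuple[int, int]:
        with self._guard:
            return self._items.get(key, (0, 0))

    def put_if_version(self, key: str, value: int, expected_version: int) -> bool:
        with self._guard:
            _, current = self._items.get(key, (0, 0))
            if current != expected_version:
                return False  # another consumer wrote first
            self._items[key] = (value, current + 1)
            return True


class NoSqlStorageRepository(StorageRepository):
    def __init__(self, store: VersionedStore) -> None:
        self._store = store

    def increment(self, tag: str, amount: int = 1) -> None:
        # Optimistic locking: read, compute, write conditionally, retry on conflict.
        while True:
            count, version = self._store.read(tag)
            if self._store.put_if_version(tag, count + amount, version):
                return

    def get(self, tag: str) -> int:
        return self._store.read(tag)[0]
```

With DynamoDB itself, an atomic counter update (an `UpdateItem` call with an `ADD` update expression) could replace the read-and-retry loop for simple counts; conditional writes are the more general tool when several fields of a record must change together.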

TODO: Add unit and integration tests to cover the functionality.

tsviet / twitterstreamapp