GithubHelp home page GithubHelp logo

karthikchaganti / stockafolio-insight-project Goto Github PK

View Code? Open in Web Editor NEW
3.0 1.0 1.0 41.63 MB

A Realtime Stock Portfolio Manager built using Apache Distributed Technologies!

Home Page: http://www.insightdata.karthikchaganti.com

Python 58.70% HTML 38.75% Shell 1.58% JavaScript 0.96%
spark sparkstreaming cassandra kafka

stockafolio-insight-project's Introduction

Stockafolio

Insight Data Engineering Fellowship, New York City

Your Portfolio at a glance!

What is Stock-a-folio?

  • A highly scalable, low-latent and always available stock portfolio managmeent application.
  • It can handle 10 million users and a throughput of 20,000 trades per second from different users! Yet your portfolio is updated in the blink of an eye.
  • The website hosting the application contains two dashboards, one for the Users and one for the Trading firm to look at the overall trading patterns of all the traders in a day and get some insights to serve them better.
  • Built using all distributed open-source technologies.

Alt text

The Data Pipeline

Alt text

Motivation

  • Back in the early 2000s, the stock portfolio updation was very slow. But today, instantaneous updation is vital as quick decisions make a faster and profiting trade. That is what I tried to replicate in this project.

The Stack

  • Built on Amazon Web Services EC2 - m4.4x large clusters total containing 8 Nodes with a total of 512 GB of RAM.
  • 1 TB of Magnetic Storage has been used.
  • All the code is written in Python.
Tecnology Purpose
Python Data Generator
Apache Kafka Ingestion
Apache Spark Batch Processing Engine
Apache Spark Streaming Stream Processing Engine
Apache Cassandra Data Storage
Python Flask Server-side Web App
Hadoop HDFS Batch File System
Amazon S3 Source of Truth

Data Schema

  • Data is engineered using original S&P 500 stocks and then gaussian distribution is applied to it to simulate change in every second. The script is written in python.

  • In order to simulate high scalability, a total of 10 million users are generated who keep trading on 1-500+ of these stocks randomly. Alt text

  • The throughput is set around 20,000 trades/sec i.e. those many messages are generated by the script every second and pushed into both Kafka and S3.

Stream Pipeline

  • Kafka serves as the beginning of the pipeline which ingests the trades and then outputs them to the Spark Streaming engine. For higher throughput, 4 partitions are used.
  • The Spark Stream engine collects the messages from Kafka in 1 second window batches and then processes them based on the queries selected to be displayed on the website.
  • The processed data is stored on cassandra. Flask is python api that serves as server-end technology to fetch the queries and show them on the website.
  • Inorder to improve the latencies on the database end, SS Tables count was reduced by allocating more I/O bandwidth by increasing the nodes in the cluster. Memtables (JVM Heap) was increased in order to retain more data before writing to the disks to improve the reads and reduce the disk seeks.
  • The latency of read/write from ingestion into Kafka and writing to database is calculated to be 0.27ms/0.015ms.

Batch Pipeline

  • The trades saved on the S3 are read into HDFS are processed by spark and pushed onto Cassandra. The batch serves as source for the Firm Dashboard which updates once in a day at 4 PM.
  • Batch can also be served as a fault-tolerant system to the stream pipeline.

Website

  • The website displays two dashboards, one for the user and the other for the trading firm. The following are displayed: Alt text Alt text

Analytics

stockafolio-insight-project's People

Contributors

karthikchaganti avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

Forkers

sbnair

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.