GithubHelp home page GithubHelp logo

large-file-stream's Introduction

large-file-stream

A POC application to read large files with stream processing approach

implementation details

two approaches implemented:

  • Stream: simple buffering with backpressure and aggregating with akka actors
  • Kafka:Seperating reading and processing with kafka

how to run

tech stack

  • scala 2.13
  • sbt 1.4.7
  • java 11
  • docker compose

1- Generate a large csv file in given format

1, "DE"
2, "NL"
1, "US"
3, "US"

You can run largecsvgenerator.go by to create a file of 10G.

WARNING A file named largefile.csv must be inside projects root folder.

1- to run stream solution

> export SOLUTION="stream"
> sbt run

2- to run kafka solution

start kafka using docker-compose

> docker-compose up

then

> export SOLUTION="kafka"
> sbt run

Results will be aggregated and displayed on console continously.

Example

...
(90,EnrichedProduct(90,Map(NL -> 17852, DE -> 17674, US -> 17624)))
(91,EnrichedProduct(91,Map(US -> 17915, NL -> 17688, DE -> 17755)))
(92,EnrichedProduct(92,Map(NL -> 17568, DE -> 17476, US -> 17770)))
(93,EnrichedProduct(93,Map(DE -> 17690, US -> 17841, NL -> 17584)))
(94,EnrichedProduct(94,Map(DE -> 17675, NL -> 17631, US -> 17560)))
(95,EnrichedProduct(95,Map(NL -> 17717, US -> 17550, DE -> 17790)))
(96,EnrichedProduct(96,Map(US -> 17609, DE -> 17921, NL -> 17732)))
(97,EnrichedProduct(97,Map(US -> 17441, NL -> 17804, DE -> 17370)))
(98,EnrichedProduct(98,Map(DE -> 17775, US -> 17797, NL -> 17658)))
(99,EnrichedProduct(99,Map(NL -> 17711, US -> 17629, DE -> 17813)))
(100,EnrichedProduct(100,Map(US -> 17830, NL -> 17675, DE -> 17805)))

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.