Accessing StateStore (kafka-streams, closed)

nodefluent commented on May 22, 2024
Accessing StateStore

from kafka-streams.

Comments (2)

Protoss78 commented on May 22, 2024

Hi,

I'm very familiar with Kafka Streams from a Java perspective, but I haven't used this JavaScript library yet. However, I can explain the functionality behind interactive queries.
Usually, when creating a Kafka Streams topology, you do the following:

  1. Consume data from one or more Kafka topics
  2. Transform the data using the Kafka Streams topology (map, filter, join, ...)
  3. Produce the transformed data into one or more Kafka topics
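
As a rough illustration of these three steps, here is a minimal sketch that uses plain in-memory arrays as stand-ins for Kafka topics. The names `TopicMessage`, `inputTopic`, `outputTopic` and `runTopology` are made up for this example and are not part of Kafka Streams or of this library.

```typescript
// Hypothetical in-memory stand-ins for Kafka topics; no broker is involved.
type TopicMessage = { key: string; value: number };

const inputTopic: TopicMessage[] = [
  { key: "sensor-a", value: 20 },
  { key: "sensor-b", value: -5 },
  { key: "sensor-a", value: 25 },
];
const outputTopic: TopicMessage[] = [];

// 1. Consume: read every message from the input "topic".
// 2. Transform: drop negative readings, then convert Celsius to Fahrenheit.
// 3. Produce: append each transformed message to the output "topic".
function runTopology(source: TopicMessage[], sink: TopicMessage[]): void {
  source
    .filter((m) => m.value >= 0)
    .map((m) => ({ key: m.key, value: (m.value * 9) / 5 + 32 }))
    .forEach((m) => sink.push(m));
}

runTopology(inputTopic, outputTopic);
console.log(outputTopic);
```

A real topology processes messages continuously as they arrive rather than over a finite array, but the consume, transform, produce shape is the same.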

You can, however, choose a different target for your streaming topology. Instead of producing data to another Kafka topic, you could also write it into a so-called "State Store". A State Store is basically a RocksDB key-value database that resides in the local file system of your application (usually in the OS temp folder). This state store can be accessed via the Kafka Streams API and lets you get the latest message value for a certain key. You could wrap this functionality behind a REST interface and make the content of a Kafka topic available to a web frontend. This is basically what they call interactive queries.
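
To make the idea concrete, here is a minimal sketch of a "latest value per key" store with a query handler in front of it. It uses an in-memory `Map` instead of RocksDB, and `LocalStateStore`, `handleQuery` and the `GET /state/:key` route are hypothetical names chosen for this example, not the Kafka Streams API.

```typescript
// Hypothetical stand-in for a local state store: keeps the latest value per key.
class LocalStateStore<V> {
  private readonly latest = new Map<string, V>();

  // Called for every message the topology writes into the store.
  put(key: string, value: V): void {
    this.latest.set(key, value);
  }

  // Point lookup, which is the operation an interactive query performs.
  get(key: string): V | undefined {
    return this.latest.get(key);
  }
}

const sensorStore = new LocalStateStore<number>();
sensorStore.put("sensor-a", 20);
sensorStore.put("sensor-b", 7);
sensorStore.put("sensor-a", 25); // overwrites the earlier value for sensor-a

// Hypothetical handler that a REST route such as GET /state/:key could call.
function handleQuery(key: string): { status: number; body: string } {
  const value = sensorStore.get(key);
  return value === undefined
    ? { status: 404, body: `no value for ${key}` }
    : { status: 200, body: JSON.stringify({ key, value }) };
}

console.log(handleQuery("sensor-a"));
console.log(handleQuery("sensor-z"));
```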

However, the above description is a little bit simplified. What I've left out is that there are two different kinds of state stores: regular state stores and global state stores. A global state store will always have all the data from a topic. When you start 3 instances of your application, the global state store will be created 3 times, each copy holding the full topic content. When you use a regular state store and you start 3 instances of your application, Kafka will assign roughly a third of the topic partitions to every instance, so the whole topic content is spread over 3 state stores.
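
The difference can be sketched with a few lines of code. The example below assumes a topic with 6 partitions, 3 application instances and a simple round-robin assignment; real Kafka assignors may distribute partitions differently, and the function names are invented for illustration.

```typescript
// Assumptions for this sketch: 6 partitions, 3 instances, round-robin assignment.
const partitionCount = 6;
const instanceCount = 3;

// Regular state store: an instance only holds data for its assigned partitions.
function regularStorePartitions(instance: number): number[] {
  const assigned: number[] = [];
  for (let p = 0; p < partitionCount; p++) {
    if (p % instanceCount === instance) assigned.push(p);
  }
  return assigned;
}

// Global state store: every instance holds data for all partitions.
function globalStorePartitions(): number[] {
  const all: number[] = [];
  for (let p = 0; p < partitionCount; p++) all.push(p);
  return all;
}

for (let i = 0; i < instanceCount; i++) {
  console.log(
    `instance ${i}: regular=[${regularStorePartitions(i).join(",")}] ` +
      `global=[${globalStorePartitions().join(",")}]`,
  );
}
```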

If you use such a regular state store and want to provide a REST service to query the state store content, you are now facing the problem that the data is spread over 3 different server instances. Therefore you have to find out which instance of your application is holding which partition of the Kafka topic. There is also an API available for that (at least in the Java version). That is the second big and important part of the term interactive queries.
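
A minimal sketch of that lookup, under stated assumptions: the key is hashed to a partition, and a routing table maps each partition to the instance that owns it. The hash function here is a simple illustrative one (Kafka's default partitioner uses murmur2 on the serialized key), and the host names and the `partitionOwners` table are hypothetical.

```typescript
// Illustrative string hash only; Kafka's default partitioner uses murmur2.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0; // keep the value in unsigned 32-bit range
  }
  return h;
}

const topicPartitions = 6;

// Hypothetical routing table: partition number -> host that owns it.
const partitionOwners = new Map<number, string>([
  [0, "app-0:8080"],
  [3, "app-0:8080"],
  [1, "app-1:8080"],
  [4, "app-1:8080"],
  [2, "app-2:8080"],
  [5, "app-2:8080"],
]);

function partitionForKey(key: string): number {
  return hashKey(key) % topicPartitions;
}

// Returns the host whose local state store should contain the key.
function hostForKey(key: string): string {
  const partition = partitionForKey(key);
  const host = partitionOwners.get(partition);
  if (host === undefined) {
    throw new Error(`no host owns partition ${partition}`);
  }
  return host;
}

console.log(`key "sensor-a" -> partition ${partitionForKey("sensor-a")} on ${hostForKey("sensor-a")}`);
```

An instance that receives a query for a key it does not own would forward the request to the host returned by this lookup.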

You will find a great description of the whole concept here: https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html

Hope this helps.


ahmadsholehin commented on May 22, 2024

Hi @Protoss78,

Your explanation is remarkably useful. You’re spot on in your second paragraph detailing what Kafka (at least from the Java perspective) provides in the form of interactive queries. This dispels the magic that Kafka packages, makes us aware of the layers of abstraction, and in the end helps us understand that the core of what Kafka gives is publishing and subscribing to topics.

I’m starting to understand that this kafka-streams library sits above the pub/sub topics core of Kafka and augments it with a Most.js syntax for stream processing of topic data. I also have to note that this stream processing (map, filter, etc.) occurs locally at the application instance as and when data comes in (e.g. a reduce operation does not combine results across multiple application instances).

The library does not come bundled with a state store or a global state store that mimics what the Kafka API in the Java world has. That, though, does not stop anyone from building a state store for every application instance themselves. For example, I’ve been trying to build a Redis-backed state store that does a subset of what the Java Kafka API does.

I also quickly realised that some non-trivial operations, especially aggregations (e.g. calculating the mean of a stream of values in a distributed manner across multiple application instances), require much more planning and thought. This library simply does not give you the required options out of the box, though do correct me if I’m wrong.
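
To illustrate why a distributed mean needs extra planning, here is a small sketch of one common approach: each instance keeps a partial (sum, count) pair, and the pairs are combined afterwards. The names `PartialMean`, `localPartial` and `combinePartials` are invented for this example and are not provided by the library; how the partials travel between instances (another topic, Redis, etc.) is left open.

```typescript
// Partial aggregate that each application instance can compute locally.
type PartialMean = { sum: number; count: number };

function localPartial(values: number[]): PartialMean {
  return values.reduce<PartialMean>(
    (acc, v) => ({ sum: acc.sum + v, count: acc.count + 1 }),
    { sum: 0, count: 0 },
  );
}

// Combines the partials from all instances into the overall mean.
function combinePartials(partials: PartialMean[]): number | undefined {
  const total = partials.reduce<PartialMean>(
    (acc, p) => ({ sum: acc.sum + p.sum, count: acc.count + p.count }),
    { sum: 0, count: 0 },
  );
  return total.count === 0 ? undefined : total.sum / total.count;
}

// Values seen by three hypothetical instances.
const valuesPerInstance = [[10, 20], [30], [40, 50, 60]];
const partialMeans = valuesPerInstance.map(localPartial);

// Correct overall mean: 210 / 6.
const overallMean = combinePartials(partialMeans);

// Naive alternative: averaging the per-instance means weights each instance
// equally, regardless of how many values it saw, and gives a different result.
const meanOfMeans =
  partialMeans.map((p) => p.sum / p.count).reduce((a, b) => a + b, 0) /
  partialMeans.length;

console.log({ overallMean, meanOfMeans });
```

The point of carrying `count` alongside `sum` is that a local mean alone loses the weight of each instance's contribution.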

Thank you once again for your time and your splendid explanation.
