findinpath / search-alert

Proof of concept project on implementing both near-real-time & batched search agent functionality

Kotlin 100.00%
elasticsearch percolator kafka kafka-consumer cassandra

Search alert

This is a proof of concept project on implementing both:

  • near real time (NRT)
  • batched (hourly/daily)

search alert functionality.

Introduction

A search alert is, as Google Alerts puts it, a way to:

Monitor the web for interesting new content

With a search index at its disposal, the system can check, at the time new content is indexed, whether that content matches any of the configured search alerts and, if so, notify the user about it.

This system is no match for the scale of Google Alerts; it is rather a proof of concept of how this functionality can be achieved in a decent fashion for medium-sized websites.

In the same fashion as Google Alerts, this system should be able to handle both:

  • immediate (as it happens)
  • batched (hourly, daily, weekly)

search alert notifications.

Technological stack

The system is built on the following components:

  • Apache Kafka, for streaming data between the components of the system
  • Elasticsearch, a distributed, RESTful search engine
  • Apache Cassandra, for storing information about the search alert messages already sent, in order to avoid sending duplicate messages (e.g. for a batched search alert on a specific hour/day)
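
The duplicate check can be illustrated with a minimal sketch. It assumes that one notification is identified by the alert id plus the end of its batching window; `SentKey` and `SentMessageLog` are hypothetical names, and an in-memory set stands in for the Cassandra table.

```kotlin
// Hypothetical key: one notification per alert and batching window.
data class SentKey(val alertId: String, val windowEndMillis: Long)

// In-memory stand-in for the Cassandra table; a real store would persist the keys.
class SentMessageLog {
    private val sent = mutableSetOf<SentKey>()

    // Returns true only the first time a key is recorded, so callers send at most once.
    fun markIfFirst(key: SentKey): Boolean = sent.add(key)
}
```

A messenger would call `markIfFirst` before sending and skip the message when it returns false. In Cassandra itself, a comparable check could probably be built on a conditional insert (`INSERT ... IF NOT EXISTS`), but that is a suggestion rather than something this project is known to do.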

Architecture

The architecture of this proof of concept system is relatively simple:

(architecture diagram)

To better illustrate the purpose of this system, we'll assume that it handles the search alert functionality for a news website. Users of the website who are interested in a certain topic can register a search alert to be notified when new articles matching their criteria are published on the platform. Users can configure each search alert to notify them about new matches immediately, hourly, or daily.

When a news article is published on the news platform, it is also checked against the registered search alerts to look for matches.
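
This "reverse search" (one new document checked against many stored queries) is the job of the percolator. The sketch below only mimics the idea in memory and is not how Elasticsearch implements it: it assumes an alert is a plain set of keywords that must all occur in the article text. `SearchAlert`, `Article`, `tokenize` and `matchingAlerts` are hypothetical names.

```kotlin
// Hypothetical, simplified model: an alert matches when every keyword occurs in the article.
data class SearchAlert(val id: String, val keywords: Set<String>)

data class Article(val id: String, val text: String)

// Lowercase the text and split it on anything that is not a letter or a digit.
fun tokenize(text: String): Set<String> =
    text.lowercase().split(Regex("[^\\p{L}\\p{N}]+")).filter { it.isNotEmpty() }.toSet()

// One incoming article is checked against all registered alerts.
fun matchingAlerts(article: Article, alerts: List<SearchAlert>): List<SearchAlert> {
    val tokens = tokenize(article.text)
    return alerts.filter { alert -> alert.keywords.all { it.lowercase() in tokens } }
}
```

In a real setup, the alerts would be stored as queries in an Elasticsearch index and the matching would be delegated to a percolate query.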

Immediate and batched search alerts are handled differently.

Immediate search alerts

The immediate search alerts that match a newly published article are pushed by the percolator component to the immediate topic, which is consumed almost on the spot by the immediate-messenger component to notify the user about a new possible match.

The immediate-messenger component runs indefinitely.

Batched search alerts

The batched search alert notifications need to be "parked" until their corresponding messaging period (hour/day) elapses.

Suppose a search alert is configured to notify a user hourly about new articles matching specific search criteria. When an article is published at 2020-06-19 09:21:05 on the news platform, the search alert should notify the user at 2020-06-19 10:00:00. In this case, the search alert information is pushed to the topic hourly_1592553600000 (1592553600000 is the epoch-millisecond timestamp of 2020-06-19 10:00:00 in UTC+2). Had the search alert been configured to deliver notifications daily, the search alert information would instead be published to the topic daily_1592604000000 (1592604000000 corresponds to 2020-06-20 00:00:00 in UTC+2).

The consumption of the entries from the hourly_1592553600000 topic will start at the earliest at 2020-06-19 10:00:00. The consumption of the entries from the daily_1592604000000 topic will start at the earliest at 2020-06-20 00:00:00.
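
The topic naming can be sketched as follows: the topic suffix is the end of the batching window that contains the publication time, in epoch milliseconds. `Frequency` and `topicFor` are hypothetical names, and the time zone is passed in explicitly because the example timestamps only line up with the stated wall-clock times at an offset of UTC+2.

```kotlin
import java.time.Instant
import java.time.ZoneId
import java.time.temporal.ChronoUnit

enum class Frequency { IMMEDIATE, HOURLY, DAILY }

// Picks the Kafka topic for a search alert hit, given the alert's notification frequency.
fun topicFor(frequency: Frequency, publishedAt: Instant, zone: ZoneId): String {
    val local = publishedAt.atZone(zone)
    return when (frequency) {
        Frequency.IMMEDIATE -> "immediate"
        // End of the current hour: truncate to the hour, then add one hour.
        Frequency.HOURLY ->
            "hourly_" + local.truncatedTo(ChronoUnit.HOURS).plusHours(1).toInstant().toEpochMilli()
        // End of the current day: start of the next calendar day in the given zone.
        Frequency.DAILY ->
            "daily_" + local.toLocalDate().plusDays(1).atStartOfDay(zone).toInstant().toEpochMilli()
    }
}
```

Under these assumptions, an article published at 2020-06-19 09:21:05 (UTC+2) maps to hourly_1592553600000 and daily_1592604000000, matching the example above.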

The batched-messenger component is responsible for handling the batched search alert hits from a specified topic that contains entries for the hourly/daily search alerts. The consumption from the topic corresponding to a specific hour/day time window begins only after the time window has elapsed.
Once the consumption reaches the end of the topic, the batched-messenger component can end its runtime.
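
Because the topic name carries the end of its time window, the "is this topic ready to be consumed?" decision can be derived from the name alone. A minimal sketch, assuming the `<period>_<epochMillis>` naming described above; `windowEnd` and `isDue` are hypothetical helpers.

```kotlin
import java.time.Instant

// Parses the epoch-millisecond suffix of names such as "hourly_1592553600000"; null if absent.
fun windowEnd(topic: String): Instant? =
    topic.substringAfterLast('_', "").toLongOrNull()?.let { Instant.ofEpochMilli(it) }

// A batched topic becomes consumable once `now` has reached the end of its window.
fun isDue(topic: String, now: Instant): Boolean =
    windowEnd(topic)?.let { !now.isBefore(it) } ?: false
```

Topics without a numeric suffix (such as the immediate topic) are never reported as due by this check, since they are not batched.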

Once all the partitions of the topic hourly_1592553600000 have been read by the batched-messenger component, the batched topic is considered obsolete (it can eventually be deleted) and is no longer taken into consideration for consumption.
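
One way to express "all partitions have been read" is to compare, per partition, the consumer group's committed offset with the partition's end offset. The sketch below works on plain maps keyed by partition number; `isFullyConsumed` is a hypothetical helper, and in a real setup the two maps would be filled from the Kafka consumer or admin client.

```kotlin
// True when, for every partition, the committed offset has reached the end offset.
// A partition without a committed offset is treated as not yet read (offset 0).
fun isFullyConsumed(endOffsets: Map<Int, Long>, committedOffsets: Map<Int, Long>): Boolean =
    endOffsets.all { (partition, end) -> (committedOffsets[partition] ?: 0L) >= end }
```

This relies on the Kafka convention that both the end offset and the committed offset point at the next record position, so the two are equal once a partition has been drained.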

OPTIONAL: When a matching article is found for a batched search alert, the percolator component is responsible for "pausing" the search alert until its current batching period (hour/day) elapses, in order to avoid unnecessary matching against new incoming articles.
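
The pausing idea can be sketched with a small registry that remembers until when each alert is paused. `AlertPauseRegistry` is a hypothetical name, and the in-memory map is an assumption made to keep the example self-contained; a real percolator component would need to persist this state.

```kotlin
import java.time.Instant

// Hypothetical in-memory registry of paused search alerts.
class AlertPauseRegistry {
    private val pausedUntil = mutableMapOf<String, Instant>()

    // Called after the first hit in a batching window; windowEnd is the end of that window.
    fun pause(alertId: String, windowEnd: Instant) {
        pausedUntil[alertId] = windowEnd
    }

    // An alert is active when it was never paused or its pause has expired.
    fun isActive(alertId: String, now: Instant): Boolean {
        val until = pausedUntil[alertId] ?: return true
        return !now.isBefore(until)
    }
}
```

The percolator would then skip alerts for which `isActive` returns false, at the cost of reporting only that a match exists in the window rather than every matching article.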

Orchestrating the batched search alerts

As mentioned previously, the batched search alert notifications need to be "parked" until their corresponding messaging period (hour/day) elapses.

At the beginning of each batched time window (hour/day), one or more instances of the batched-messenger component should therefore be started for the batched topics on which the batched-messenger's consumer offset lags behind.

Depending on the number of batched search alert hits in the topic that need to be processed, the orchestrator can then choose how many instances of the batched-messenger component to spawn.
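
A possible sizing rule for the orchestrator, as a sketch only: divide the total consumer lag by the number of hits one instance is expected to handle, and cap the result at the partition count, since consumers of the same group beyond the number of partitions would receive no work. `instancesFor` and `hitsPerInstance` are hypothetical names, and the rule itself is an assumption rather than the project's documented behaviour.

```kotlin
// Number of batched-messenger instances to start for one batched topic.
fun instancesFor(totalLag: Long, partitions: Int, hitsPerInstance: Long): Int {
    require(partitions > 0 && hitsPerInstance > 0) {
        "partitions and hitsPerInstance must be positive"
    }
    if (totalLag <= 0) return 0 // nothing left to process
    // Ceiling division: enough instances to cover the whole lag.
    val needed = (totalLag + hitsPerInstance - 1) / hitsPerInstance
    return minOf(needed, partitions.toLong()).toInt()
}
```

For example, a lag of 2500 hits on a topic with 4 partitions and 1000 hits per instance would yield 3 instances, while a much larger lag would be capped at 4.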
