luatnc87 / real-time-analytic-stack


This repo demonstrates a comprehensive real-time analytic stack built with popular open-source tools.

Shell 73.15% Python 26.85%
analytics-platform data-platform dbt doris seatunnel superset

real-time-analytic-stack's Introduction

Real Time Data Analytic Stack

Welcome to Real Time Analytics Stack! This repo showcases a complete real-time analytic stack using popular open-source tools.

In this tutorial, I demonstrate how to use Docker Compose to quickly set up a real-time data analytic stack using Apache SeaTunnel, Doris and Superset. The pipeline uses SeaTunnel to ingest real-time CDC events from a MySQL database into the Doris data warehouse, where you can optionally transform the data with dbt, and then visualizes the data with Superset.

[Figure: real-time data analytic stack architecture]

Components of the Real Time Data Analytic Stack

Before we set up the project, let’s briefly look at each tool used in this example of a real-time data analytic stack to make sure you understand their responsibilities.

Apache SeaTunnel

SeaTunnel is an easy-to-use, high-performance, distributed data integration platform that supports real-time synchronization of massive data volumes. It can synchronize tens of billions of data records stably and efficiently every day, and has been used in production by nearly 100 companies.

Apache Doris

Apache Doris is a high-performance, real-time analytic database based on the MPP (Massively Parallel Processing) architecture, known for its speed and ease of use. It returns query results with sub-second response times over massive amounts of data, and supports both highly concurrent point-query scenarios and high-throughput complex analytic scenarios.

Apache Superset

Apache Superset is a modern business intelligence, data exploration and visualization platform. Superset connects with a variety of databases and provides an intuitive interface for visualizing datasets. It offers a wide choice of visualizations as well as a no-code visualization builder. You can run Superset locally with Docker Compose or in the cloud using Preset. Superset sits at the end of this real time data analytics stack example and is used to visualize the data stored in Apache Doris.

Pre-requisites

To follow along, you need to:

Install Docker and Docker Compose on your machine. You can follow this guide to install Docker and this one to install Docker Compose.

Using Docker Compose to Bootstrap a Real Time Data Analytic Stack

This tutorial uses Docker Compose and a shell script to set up the required resources. Docker saves you from installing additional dependencies locally, and lets you start and stop the instances quickly.

The shell script setup.sh provides two commands, up and down, to start and stop the instances. The compose files are stored in seatunnel/docker-compose-seatunnel.yaml, doris/docker-compose-doris.yaml, and superset/docker-compose-superset.yaml. You can go through these files and make any necessary customization, for example, changing the ports where the instances start or installing additional dependencies.
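As a rough illustration of the up/down flow described above, the following Python sketch builds one Docker Compose command per compose file. It is not the contents of setup.sh, which may work differently. The function names `compose_commands` and `run` are hypothetical, and the use of the `docker compose` (v2 plugin) syntax is an assumption; with the older standalone binary the command would start with `docker-compose` instead.

```python
import subprocess

# Compose file paths as named in the tutorial text.
COMPOSE_FILES = [
    "seatunnel/docker-compose-seatunnel.yaml",
    "doris/docker-compose-doris.yaml",
    "superset/docker-compose-superset.yaml",
]


def compose_commands(action):
    """Return one `docker compose` command (as an argv list) per compose file."""
    if action == "up":
        extra = ["up", "-d"]   # start the containers in the background
    elif action == "down":
        extra = ["down"]       # stop and remove the containers
    else:
        raise ValueError("action must be 'up' or 'down'")
    # Assumption: Docker Compose v2 plugin syntax (`docker compose ...`).
    return [["docker", "compose", "-f", path, *extra] for path in COMPOSE_FILES]


def run(action, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in compose_commands(action):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run("up")` only prints the three commands, which is a safe way to review them before passing `dry_run=False`.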

Setting up SeaTunnel, Doris, Superset with Docker Compose

Setting up Apache SeaTunnel

The script launches the SeaTunnel instance at

Setting up Apache Doris

The script launches the Doris FE (frontend) instance at http://localhost:8030. You should see the following login screen, which indicates that the FE has started successfully.

[Image: doris_fe_login.png]

Note: Log in with the Doris built-in default user (root) and an empty password.
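If you prefer to check the FE from a script rather than the browser, a minimal sketch is shown below. It assumes the FE exposes the `/api/bootstrap` HTTP endpoint on the same port and that a healthy response is JSON with `"msg": "success"`; both are assumptions, so verify them against the Doris version you run. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

FE_HTTP_URL = "http://localhost:8030"  # FE address from the tutorial


def parse_bootstrap(body):
    """Return True if a bootstrap response body reports success.

    Assumption: a healthy FE answers with JSON containing "msg": "success".
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("msg") == "success"


def fe_is_ready(base_url=FE_HTTP_URL, timeout=3.0):
    """Return True if the FE bootstrap endpoint answers with a success payload."""
    url = f"{base_url}/api/bootstrap"  # assumed health-check endpoint
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_bootstrap(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready.
        return False
```

A caller could poll `fe_is_ready()` in a loop with a short sleep until it returns True before moving on to the next step.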

Setting up Apache Superset

Once the setup.sh command has completed, visit http://localhost:8088 to access the Superset UI. Enter admin as both the username and the password. Choose Apache Doris from the supported database drop-down, then provide the connection details to finish the configuration.
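The connection details usually come down to a single SQLAlchemy URI. The sketch below assembles one, assuming the `doris://user:password@host:port/catalog.database` form used by the pydoris driver, the default Doris query port 9030, and the default `internal` catalog. The database name `demo` and the function name are hypothetical. Note that from inside the Superset container, `localhost` will probably not reach Doris, so the host is likely the Doris FE service name or the host machine address.

```python
from urllib.parse import quote


def doris_sqlalchemy_uri(user="root", password="", host="localhost",
                         port=9030, catalog="internal", database="demo"):
    """Build a SQLAlchemy URI for Doris (assumed pydoris `doris://` scheme).

    Defaults mirror the tutorial's root user with an empty password;
    `database="demo"` is a placeholder, not a name from the tutorial.
    """
    auth = quote(user, safe="")
    if password:
        # Percent-encode so characters like '@' or ':' do not break the URI.
        auth += ":" + quote(password, safe="")
    return f"doris://{auth}@{host}:{port}/{catalog}.{database}"
```

For example, `doris_sqlalchemy_uri(host="doris-fe")` produces a URI pointing at a hypothetical `doris-fe` service, which you can paste into the SQLAlchemy URI field in Superset.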

Using the Real Time Data Analytic Stack

Once the stack is up and running, you can start using it to ingest and process your data.

Sync real-time CDC events from a MySQL database into the Apache Doris DWH

Create a materialized view to aggregate data in near real time

Visualize data on a dashboard using Superset

Cleaning up

Conclusion

About the author


real-time-analytic-stack's Issues

docker-compose-seatunnel.yaml is empty

Hi,

Really nice project. I wanted to try it out, though the seatunnel/docker-compose-seatunnel.yaml file is empty.

If you have it available, could you commit the content to that file as I'm particularly interested in trying out Apache SeaTunnel?
