🐝 NeonBee Core

NeonBee is an open source reactive dataflow engine, a data stream processing framework using Vert.x.

Description

NeonBee abstracts most of Vert.x's low-level functionality by adding an application layer for modeling a dataflow, data stream processing and for doing data consolidation, exposing everything via standardized RESTfull APIs in form of interfaces called endpoints.

Additionally NeonBee takes care of all the execution and comes bundled with a full-blown application server with all the core capabilities of Vert.x at hand but without you having to care about things like a boot sequence, command line parsing, configuration, monitoring, clustering, deployment, scaling, logging / log handling and much more. We put a rich convenience layer for data handling on top, simplifying technical data exchange.

To achieve this simplification, NeonBee reduces the scope of Vert.x by choosing and picking the most appropriate core and extensions components of Vert.x and providing them in a pre-configured / application server like fashion. For example, NeonBee comes with a default configuration of the SLF4J logging facade and Logback logging backend, so you won't have to deal with choosing and configuring the logging environment in the first place. However, in case you decide to go with the logging setup NeonBee provides, you are still free to change.

NeonBee vs. Vert.x

After reading the introduction, you might ask yourself when do I need NeonBee or when to choose NeonBee over Vert.x? Summarizing a few of our main advantages, that will let you help choose. If you say "no, I do that myself anyways", you might stay with Vert.x for now!

We choose, you use: With NeonBee we came to the conclusion that providing all capabilities of a framework is often not the best choice, especially when it comes to the later operation of the framework. Vert.x provides you with a lot of options to choose from: which cluster manager to take, which logging framework to choose, how many servers to start, what authentication plugins to use, etc. etc. This is great, if you know exactly what you need and everyone in your area that you are working in does. Problems start, when multiple projects should run on the same foundational basis, if it is required for your components to interact. Suddenly, choosing a different cluster manager, or event bus implementation, or for every team creating an own server in an own API format to expose their API soon gets troublesome! Thus, with NeonBee, we took a lot of those kinds of decisions for you and stuck to the industry / enterprise standard. Where there is debate, we made it configurable or provided hooks for you to integrate. However, you do no longer have to care, that your application is abel to run in a cluster, you need an API? You get an API, logging framework – pre-configured, metrics endpoint – check... NeonBee provides you with a perfect foundation to get started and for your team to build-up on, without having to deal with every bit and piece of the lower-level set-up of your project.
Configuration over code: Vert.x has the concept of exposing everything via APIs to developers. It is a purely development-oriented framework. On the one hand side NeonBee is a development-oriented framework as well. Verticles still need implementation effort, however for especially the application server-side tasks, the idea is to configure a well-tested component, rather than writing a new one from scratch every time you need one. Need a RESTful JSON-based CRUD endpoint? Well, have fun writing one in Vert.x or simply add one line to a config file in NeonBee's ServerVerticle to get one. Want to configure your verticles with an own config file when they get deployed, well sure, go ahead and use Vert.x Config to make every verticle configurable, or use NeonBee and have every verticle that gets deployed, read a config from a /config directory automatically. We have built-in support for different endpoint types (like REST, OData and soon more), different authentication schemes, you can define an own error template, configure if and how you cluster your Vert.x instances, which Micrometer registries to use, etc. With this approach we want to set a standard, assuming that it is easier to implement it generically once in a framework, than it is to create it from scratch with every use case that you come across. And if there is still the need to, customize a given part of NeonBee via code, be our guest, you still have plenty of hooks and Vert.x default methods to override to do so!
Need an API? Get an API: As briefly mentioned in the previous points already, Vert.x by default does not provide you with a standardized API endpoint. If you need one, build one, or use any of the Vert.x plugins to get one configured for you. How does it interact with other endpoints, is it properly integrated, e.g., into the event bus, so that consumption of thousands of requests from the endpoint remains scalable? Well, something that will depend heavily on your implementation of the server / endpoint. As a dataflow engine, we have been under the assumption that data processing means nothing, if you cannot expose your result. Thus, NeonBee comes with build-in endpoints and an extensible end-point concept. Meaning that you can get a REST and / or OData endpoint simply by configuring it. It runs integrated with the concept of DataVerticle and EntityVerticle, meaning that everything that you expose via a DataVerticle can also (without one line of additional code), be exposed via a REST endpoint if you choose to do so. Like back in the days, when you had been working on Servlets, you did not have to take care about that a server existed to expose the Servlets to, NeonBee provides you with a even extended concept of structured and unstructured endpoints, meaning that you can expose a full data model, of which the data is provided by an independently scalable set of verticles "under the hood". Again, one less thing you have to worry about!
Scalable by design: Building a scalable application can be hard. Especially if there are too many choices and that one wrong decision or misuse of a certain functionality could have catastrophic effects in terms of performance and scalability. Vert.x is a great and very performant framework, two big parts of this is that it internally based on Netty ant their event-loop concept, as well as providing options to horizontally scale, such as the concept of verticles, the event-bus as well as running in a cluster of multiple Vert.x instances. However, connecting these dots can be hard... You missed to use the event-bus to communicate between your verticles? Well though-luck going to use any clustering. You want to use clustering but are not familiar with the concept of cluster managers, well that will be a chore for you to get into. You need to exchange data between your verticles efficiently, but you do not want to bother with all the internals of the event-bus, codecs, etc.? Yes, this is where NeonBee's build-in data processing / data flow concept comes into full effect! If you stick to the implementation of our DataVerticle and EntityVerticle interfaces, scalability will be simply a given, that you again, do not have to take care about! Verticles in NeonBee can communicate with each other via the event bus by design. At the same time, it is completely transparent to use as a user. You tell, which data you require and NeonBee does take care of the rest, requesting the data, in advance to your verticle getting the data it requested. And this being fully horizontally scalable, by using the build-in support for clustering and cluster management into NeonBee. Starting / operating your server as a single or a horizontally scalable clustered instance, is essentially just two more command line arguments when starting NeonBee. And this only works because we did not allow every verticle to "do whatever they want" there are a couple of rules to stick by, which, we know, are good boundary conditions for a development with Vert.x anyways.
Modularization is key: Developing application in a monolithic design pattern, is a long accepted anti-pattern when it comes to scalable software development. In recent years micro-services had been en vogue, but they come with their own set of disadvantages. Generally micro-services mean, you need a full-stack experience in your developer-base. As the only "loose coupling rule" generally is that micro-services need to use the same protocol to communicate (for instance HTTP/S), there is a whole heck of things that every team needs to take care of. Well... now to put everything in one huge repository and to build one big Vert.x application out of it, might also not be the best thing when it comes to software design and maintainability. This is where NeonBee also shines: It comes with a build-in approach for componentization. Remember the DataVerticles and EntityVerticles that Team A build? Well, they should provide them to you a Fat/Ueber-Jar-like "Module Jar" format that NeonBee defines (essentially a Jar, that contains all dependencies of the verticles, but not shared dependencies, such as the dependency to NeonBee). NeonBee can then take this module and deploy it, into a completely encapsulated class-loader instance before deployment. This essentially allows you to build a scalable server, with as many verticles you would like, from as many teams as you would like. And like with micro-services, all verticles stay only loosely connected via the event-bus. Isn't that great!?
Trust us & don't worry, we got this!: We have been working with Vert.x and together with the Vert.x community for many years now. We know the internals of Vert.x, Netty and learned a lot of how "the clock ticks". Meaning that sometimes we purposefully limited some options or provided only a customizability without the option to further influence a given component by using a development hook. In these cases, we did our homework, to ensure NeonBee stays scalable and performant at any point in time. This is also why for instance the default interface of a DataVerticle is very easy to use, even for somebody who had maybe not had any experience with Vert.x's futurized / promise-based interfaces so far. Give it a shot, we are sure you will like it! If you have anything to do better, tell us and we will be happy to help!

Core Components

To facilitate the true nature of NeonBee, it features a certain set of core components, abstracting and thus simplifying the naturally broad spectrum of the underlying Vert.x framework components:

Server: boot sequence (read config, deploy verticles & data models, start instance), master / slave handling, etc.
Command Line: CLI argument parsing, using Vert.x CLI API
Configuration: YAML config, using Vert.x Config
Monitoring: Using Micrometer.io API and Prometheus
Clustering: Clustered operation using any Cluster Manager
Deployment: Automated deployment of verticles and data model elements
Supervisor & Scaling: Automated supervision and scaling of verticles
Logging: Using the Vert.x Logging and SLF4J facades and Logback as a back end
Authentication: Configurable authentication chain using Vert.x Auth

Dataflow Processing

While you may just use the NeonBee core components to consume functionalities of Vert.x more easily, the main focus of NeonBee lies on data processing via its stream design. Thus, NeonBee adds a sophisticated high-performance data processing interface you can easily extend plug-and-play. The upcoming sections describe how to use NeonBees data processing capabilities hands-on. In case you would like to understand the concept of NeonBees dataflow processing more in detail, for instance on how different resolution strategies can be utilized, for a highly optimized traversal of the data tree, please have a look at this document explaining the theory behind NeonBees data processing.

Data Verticles / Sources

The main component for data processing is more or less a specialization of the verticle concept of Vert.x. NeonBee introduces a new AbstractVerticle implementation called DataVerticle. These types of verticles implement a very simple data processing interface and communicate between each other using the Vert.x Event Bus. Processing data using the DataVerticle becomes a piece of cake. Data retrieval was split in two phases or tasks:

Require: Each verticle first announces the data it requires from other even multiple DataVerticle for processing the request. NeonBee will, depending on the resolution strategy (see below), attempt to pre-fetch all required data, before invoking the next phase of data processing.
Retrieve: In the retrieval phase, the verticle either processes all the data it requested in the previous require processing phase, or it perform arbitrary actions, such as doing a database calls, or plain data processing, mapping, etc.

Conveniently, the method signatures of DataVerticle are named exactly like that. So, it is very easy to request / consolidate / process data from many different sources in a highly efficient manner.

During either phase, data can be requested from other verticles or arbitrary data sources, however, it is to note that those kinds of requests start spanning a new three, thus they can only again be optimized according to the chosen resolution strategy in their sub-tree. It is best to stick with one require / retrieval phase for one request / data stream, however it could become necessary mixing different strategies to achieve the optimal result, depending on the use case.

Entity Verticles & Data Model

Entity verticles are an even more specific abstraction of the data verticle concept. While data verticles can really deal with any kind of data, for data processing it is mostly more convenient to know how the data is structured. To define the data structure, NeonBee utilizes the OASIS Open Data Protocol (OData) 4.0 standard, which is also internationally certified by ISO/IEC.

OData 4.0 defines the structure of data in so-called models. In NeonBee, models can be easily defined using the Core Data and Services (CDS) Definition Language. CDS provide a human-readable syntax for defining models. These models can then be interpreted by NeonBee to build valid OData 4.0 entities.

For processing entities, NeonBee uses the Apache Olingo™ library. Entity verticles are essentially data verticles dealing with Olingo entities. This provides NeonBee the possibility to expose these entity verticles in a standardized OData endpoint, as well as to perform many optimizations when data is processed.

Data Endpoints

Endpoints are standardized interfaces provided by NeonBee. Endpoints call entry verticles (see dataflow processing) to fetch / write back data. Depending on the type of endpoint, different APIs like REST, OData, etc. will be provided and a different set of verticles gets exposed. Given the default configuration, NeonBee will expose two simplified HTTP endpoints:

A /raw HTTP REST endpoint that returns the data of any data verticle, preferably in a JSON format and
a standardized /odata HTTP endpoint to get a valid OData response from entity verticles.

Getting Started

NeonBee itself doesn't provide any examples yet, but some examples can be found in this repository.

The NeonBee data processing framework can be started like any other ordinary application server by using the start-up scripts provided in the root folder of NeonBee. By default, the boot sequence will check for verticles in the /verticles directory as well as from any remote source configured and deploy all these verticles automatically. This can be customized via command line and via a central configuration file in the /config directory. Also, the data model definition can be updated / loaded during runtime. To do so, valid CDS files must be put into the /models directory.

Writing Your First Data Verticles

Writing a DataVerticle is a piece of cake. Imagine one of your verticles reads customer data from a database, while another verticle should do the processing of it. Let's start with the database verticle:

public class DatabaseVerticle extends DataVerticle<JsonArray> {
    @Override
    public String getName() {
        return "Database";
    }

    @Override
    public Future<JsonArray> retrieveData(DataQuery query, DataContext context) {
        // return a future to a JsonArray here, e.g. by connecting to the database
        return Future.succeededFuture(new JsonArray()
            .add(new JsonObject().put("Id", 1).put("Name", "Customer A"))
            .add(new JsonObject().put("Id", 2).put("Name", "Customer B")));
    }
}

Note how the first verticle returns the name Database. By default, the name is set to the full qualified name of the class. We just return a name here to make the implementation of our processing verticle easier. We create a second data verticle called ProcessingVerticle. In order for the processing verticle to get data from the database verticle, a DataRequest can be returned from the processing verticles requireData method. NeonBee will try resolve all the dependencies of a data verticle, before invoking the verticles retrieveData method. This way it becomes very easy to request some data from another data verticle.

public class ProcessingVerticle extends DataVerticle<JsonArray> {
    @Override
    public String getName() {
        return "CustomerData";
    }

    @Override
    public Future<Collection<DataRequest>> requireData(DataQuery query, DataContext context) {
        return Future.succeededFuture(List.of(new DataRequest("Database", query)));
    }

    @Override @SuppressWarnings("unchecked")
    public Future<JsonArray> retrieveData(DataQuery query, DataMap require, DataContext context) {
        AsyncResult<JsonArray> databaseResult = require.<JsonArray>findFirst("Database").orElse(null);
        if (databaseResult.failed()) {
            // retrieving data from the database failed
            return Future.failedFuture(databaseResult.cause());
        }

        // reverse the database result or do any other more reasonable processing ;-)
        List<Object> reversedDatabaseResult = new ArrayList<>();
        new ArrayDeque<Object>(databaseResult.result().getList())
            .descendingIterator()
            .forEachRemaining(reversedDatabaseResult::add);

        return Future.succeededFuture(new JsonArray(reversedDatabaseResult));
    }
}

And that's it. Deploy your verticles by simply putting a JAR file of them into NeonBees /verticles directory. We will soon release a sample repository on how to build a compliant JAR file. Compliant JAR files refer to so called "modules". A module JAR is a fat / ueber JAR, which comes with any dependencies of the verticle, excluding NeonBee / Vert.x dependencies which are provided by the framework.

Start-up the server, on its default port you can fetch the data of your verticle via the raw endpoint at http://localhost:8080/raw/CustomerData/. NeonBee can easily run this verticle in a highly scalable cluster setup out of the box.

Advancing to Entity Verticles

Dealing with plain data verticles is great if you control who consumes the data and own all interfaces. The downside of data verticles is that they do not provide a structured and obliging interface, but if e.g. announcing to expose JsonObject the given JSON object could have any structure. In case you want to provide a public interface or simply need more control over the data returned, dealing with raw JSON, strings or even binary, gets a bit more troublesome.

NeonBee covers that with the concept of EntityVerticle an extension to the DataVerticle concept. Let's define a very simple data model using the CDS Definition Language first:

namespace MyModel;

entity Customers {
    key Id : Integer;
    Name : String;
}

Compile the CDS definition to EDMX.

Put the .cds and the resulting .edmx file in the /models directory of NeonBee. The OData endpoint now already knows about the Customers entity. The last thing to do is to actually define an EntityVerticle which returns the entity data on request:

public class CustomerEntityVerticle extends EntityVerticle {
    @Override
    public Future<Set<FullQualifiedName>> entityTypeNames() {
        return Set.of(new FullQualifiedName("MyModel", "Customers"));
    }

    @Override
    public Future<Collection<DataRequest>> requireData(DataQuery query, DataContext context) {
        return Future.succeededFuture(List.of(new DataRequest("CustomerData", query)));
    }

    @Override
    public Future<EntityWrapper> retrieveData(DataQuery query, DataMap require, DataContext context) {
        List<Entity> entities = new ArrayList<>();

        for (Object object : require.<JsonArray>resultFor("CustomerData")) {
            JsonObject jsonObject = (JsonObject) object;

            Entity entity = new Entity();
            entity.addProperty(new Property(null, "Id", ValueType.PRIMITIVE, jsonObject.getValue("Id")));
            entity.addProperty(new Property(null, "Name", ValueType.PRIMITIVE, jsonObject.getValue("Name")));
            entities.add(entity);
        }

        return Future.succeededFuture(new EntityWrapper(new FullQualifiedName("MyModel", "Customers"), entities));
    }
}

And that's it. You have created your first valid OData 4.0 endpoint in NeonBee. With the default configuration access the entity at http://localhost:8080/odata/Customers/. Similarly to data verticles the OData endpoint and your entity is now available and is highly scalable by NeonBee.

For Your Convenience

NeonBee provides further simplifications when dealing with verticle development.

Especially in large-scale distributed systems, correlating log messages become crucial to reproduce what is actually going on. Conveniently, NeonBee offers a simple LoggingFacade which masks the logging interface with:

// alternatively you can use the masqueradeLogger method,
// to use the facade on an existing SF4J logger
LoggingFacade logger = LoggingFacade.create();

logger.correlateWith(context).info("Hello NeonBee");

The logger gets correlated with a correlated ID passed through the routing context. In the default implementation of NeonBees logging facade, the correlation ID will be logged alongside the actual message as a so-called marker and can easily be used to trace a certain log message, even in distributed clusters. Note that the correlateWith method does not actually correlate the whole logging facade, but only the next message logged. This means you have to invoke the correlateWith method once again when the next message is logged.

Similar to Vert.x's shared instance, NeonBee provides its own shared instance holding some additional properties, such as the NeonBee options and configuration objects, as well as general purpose local and cluster-wide shared map for you to use. Each NeonBee instance has a one to one relation to a given Vert.x instance. To retrieve the NeonBee instance from anywhere, just use the static NeonBee.neonbee method of the NeonBee main class:

NeonBee neonbee = NeonBee.neonbee(vertx);

// get access to the NeonBee CLI options
NeonBeeOptions options = neonbee.getOptions();

// general purpose shared local / async (cluster-wide) maps
LocalMap<String, Object> sharedLocalMap = neonbee.getLocalMap();
AsyncMap<String, Object> sharedAsyncMap = neonbee.getAsyncMap();

Our Road Ahead

We have ambitious plans for NeonBee and would like to extend it to be able to provide a whole platform for dataflow / data stream processing. Generalizing endpoints, so further interfaces like OpenAPI can be provided or extending the data verticle concept by more optimized / non-deterministic resolution strategies are only two of them. Please have a look at our roadmap for an idea on what we are planning to do next.

Contributing

If you have suggestions how NeonBee could be improved, or want to report a bug, read up on our guidelines for contributing to learn about our submission process, coding rules and more.

We'd love all and any contributions.

rohankumardubey / neonbee Goto Github PK

neonbee's Introduction