pellse / cohereflux

Assembler is a reactive data aggregation framework for querying and merging data from multiple data sources/services. CohereFlux enables efficient implementation of the API Composition Pattern and is also designed to solve the N + 1 query problem. Architecture-agnostic, it can be used as part of a monolithic or microservice architecture.

License: Apache License 2.0

Java 90.84% Kotlin 9.16%

java microservices datasource cqrs event-sourcing reactive-programming reactive reactive-streams kotlin composition-api project-reactor event-driven

cohereflux's Introduction

Assembler

Assembler is a reactive, functional, type-safe, and stateless data aggregation framework for querying and merging data from multiple data sources/services. Assembler enables efficient implementation of the API Composition Pattern and is also designed to solve the N + 1 query problem. Assembler is architecture-agnostic, allowing it to be used as part of a monolithic or microservice architecture.

Internally, Assembler leverages Project Reactor to implement end-to-end reactive stream pipelines that preserve the properties of reactive systems described by the Reactive Manifesto (responsiveness, resilience, elasticity and message-driven communication), along with the non-blocking back-pressure defined by Reactive Streams.

See the demo app for a comprehensive project utilizing Assembler.

Here is an example from the demo app GitHub repository which integrates Assembler with Spring GraphQL to implement real-time data aggregation of multiple data sources:

[Demo video: SpO2 readings]

The code implementing the whole real-time data aggregation pipeline is as simple as below:

Use Cases
Basic Usage
- Default values for missing data
Infinite Stream of Data
Reactive Caching
- Pluggable Reactive Caching Strategies
  - Third Party Reactive Cache Provider Integration
- Auto Caching
  - Event Based Auto Caching
Integration with non-reactive sources
Kotlin Support
What's Next?

Use Cases

Assembler can be used in situations where an application needs to access data or functionality that is spread across multiple services. Some common use cases include:

CQRS/Event Sourcing: Assembler can be used on the read side of a CQRS and Event Sourcing architecture to efficiently build materialized views that aggregate data from multiple sources.
API Gateway: Assembler can be used in conjunction with an API Gateway, which acts as a single entry point for all client requests. The API Gateway can combine multiple APIs into a single, unified API, simplifying the client's interactions with the APIs and providing a unified interface for the client to use.
Backends for Frontends: Assembler can also be used in conjunction with Backends for Frontends (BFFs). A BFF is a dedicated backend service that provides a simplified and optimized API specifically tailored for a particular client or group of clients.
Reduce network overhead: By combining multiple APIs into a single API, Assembler can reduce the amount of network traffic required for a client to complete a task. This can improve the performance of the client application and reduce the load on the server.
Solve the N + 1 Query Problem: Assembler can solve the N + 1 query problem by allowing a client to make a single request to a unified API that includes all the necessary data. Behind that API, related data is fetched with one batched query per data source instead of one query per entity, which reduces both the number of requests and the number of database queries and further optimizes the application's performance.


Basic Usage

Here is an example of how to use Assembler to generate transaction information from a list of customers of an online store. This example assumes the following fictional data model and API to access different services:

public record Customer(Long customerId, String name) {}

public record BillingInfo(Long id, Long customerId, String creditCardNumber) {
    
  public BillingInfo(Long customerId) {
    this(null, customerId, "0000 0000 0000 0000");
  }
}

public record OrderItem(String id, Long customerId, String orderDescription, Double price) {}

public record Transaction(Customer customer, BillingInfo billingInfo, List<OrderItem> orderItems) {}

Flux<Customer> getCustomers(); // e.g. call to a microservice or a Flux connected to a Kafka source
Flux<BillingInfo> getBillingInfo(List<Long> customerIds); // e.g. connects to relational database (R2DBC)
Flux<OrderItem> getAllOrders(List<Long> customerIds); // e.g. connects to MongoDB

In cases where the getCustomers() method returns a substantial number of customers, retrieving the associated BillingInfo for each customer would require an additional call per customerId. This would result in a considerable increase in network calls, causing the N + 1 queries issue. To mitigate this, we can retrieve all the BillingInfo for all the customers returned by getCustomers() with a single additional call. The same approach can be used for retrieving OrderItem information.
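To make the difference concrete, here is a small plain-Java sketch that counts calls against a fake billing service. It does not use Assembler; `FakeBillingService`, `fetchOne` and `fetchAll` are hypothetical names for illustration only. With N customers, the per-customer strategy issues N calls on top of the initial customer query, while the batched strategy issues exactly one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical stand-in for a remote billing service; it only counts round trips.
class FakeBillingService {
    int calls = 0;

    // One round trip per customer id (the "N" in N + 1)
    String fetchOne(long customerId) {
        calls++;
        return "billing-" + customerId;
    }

    // One round trip for the whole batch of customer ids
    List<String> fetchAll(List<Long> customerIds) {
        calls++;
        return customerIds.stream().map(id -> "billing-" + id).collect(Collectors.toList());
    }
}

class NPlusOneDemo {
    static int perCustomerCalls(List<Long> customerIds) {
        FakeBillingService service = new FakeBillingService();
        List<String> results = new ArrayList<>();
        for (long id : customerIds) {
            results.add(service.fetchOne(id)); // one call per customer
        }
        return service.calls;
    }

    static int batchedCalls(List<Long> customerIds) {
        FakeBillingService service = new FakeBillingService();
        service.fetchAll(customerIds); // a single call for all customers
        return service.calls;
    }

    public static void main(String[] args) {
        List<Long> ids = LongStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
        System.out.println(perCustomerCalls(ids)); // 1000
        System.out.println(batchedCalls(ids));     // 1
    }
}
```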

As we are working with three distinct and independent data sources, the process of joining data from Customer, BillingInfo, and OrderItem into a Transaction must be performed at the application level. This is the primary objective of Assembler.

When utilizing the Assembler, the aggregation of multiple reactive data sources and the implementation of the API Composition Pattern can be accomplished as follows:

import reactor.core.publisher.Flux;
import io.github.pellse.assembler.Assembler;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;

Assembler<Customer, Transaction> assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(call(this::getBillingInfo))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, call(this::getAllOrders))),
                Transaction::new)
        .build();

Flux<Transaction> transactionFlux = assembler.assemble(getCustomers());

The code snippet above first retrieves all customers, then concurrently retrieves all billing information and all orders associated with those customers (one query per data source), as defined by the Assembler rules. Finally, it aggregates each customer, their billing information and their list of order items (correlated by customer id) into a Transaction object. The result is a reactive stream (Flux) of Transaction objects.
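Conceptually, that final step is an in-memory join keyed by customer id. The following plain-Java sketch does not use Assembler (the `TransactionJoin.join` helper is hypothetical): it indexes billing info one-to-one and order items one-to-many by customerId, then builds one Transaction per customer in upstream order. The records from the data model above are redeclared so the snippet compiles on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

record Customer(Long customerId, String name) {}
record BillingInfo(Long id, Long customerId, String creditCardNumber) {}
record OrderItem(String id, Long customerId, String orderDescription, Double price) {}
record Transaction(Customer customer, BillingInfo billingInfo, List<OrderItem> orderItems) {}

class TransactionJoin {
    static List<Transaction> join(List<Customer> customers,
                                  List<BillingInfo> billingInfos,
                                  List<OrderItem> orderItems) {
        // One-to-one index; toMap throws IllegalStateException if a customerId appears twice
        Map<Long, BillingInfo> billingByCustomer = billingInfos.stream()
                .collect(Collectors.toMap(BillingInfo::customerId, Function.identity()));

        // One-to-many index: a list of OrderItem per customerId
        Map<Long, List<OrderItem>> ordersByCustomer = orderItems.stream()
                .collect(Collectors.groupingBy(OrderItem::customerId));

        // Correlate each customer with its sub-query results, preserving customer order
        return customers.stream()
                .map(c -> new Transaction(c,
                        billingByCustomer.get(c.customerId()), // null if no BillingInfo matched
                        ordersByCustomer.getOrDefault(c.customerId(), List.of())))
                .collect(Collectors.toList());
    }
}
```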


Default values for missing data

To provide a default value for each value missing from the result of the API call, a factory function can be supplied as a second parameter to the oneToOne() function. For example, when getCustomers() returns 3 customers [C1, C2, C3] and getBillingInfo([ID1, ID2, ID3]) returns only 2 associated BillingInfo [B1, B2], the missing value B3 can be generated as a default value. This guarantees that a null BillingInfo is never passed to the Transaction constructor:

rule(BillingInfo::customerId, oneToOne(call(this::getBillingInfo), customerId -> new BillingInfo(customerId)))

or more concisely:

rule(BillingInfo::customerId, oneToOne(call(this::getBillingInfo), BillingInfo::new))

Unlike oneToOne(), oneToMany() always defaults to an empty collection, so no default factory function is needed. In the example above, an empty List<OrderItem> is passed to the Transaction constructor for any customer with no matching OrderItem, e.g. when getAllOrders([1, 2, 3]) returns no orders for customer 3.
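The defaulting rules above can be approximated with plain `Map` lookups. This is a sketch of the documented semantics, not Assembler's implementation; `resolveOne` and `resolveMany` are hypothetical helpers. A missing one-to-one value is produced by a caller-supplied factory keyed by the correlation id, and a missing one-to-many value becomes an empty list.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DefaultValues {
    // oneToOne-style lookup: fall back to a factory applied to the correlation id
    static <ID, R> R resolveOne(Map<ID, R> results, ID id, Function<ID, R> defaultFactory) {
        R value = results.get(id);
        return value != null ? value : defaultFactory.apply(id);
    }

    // oneToMany-style lookup: fall back to an empty, immutable list
    static <ID, R> List<R> resolveMany(Map<ID, List<R>> results, ID id) {
        return results.getOrDefault(id, List.of());
    }
}
```

With the BillingInfo record above and a hypothetical `billingById` map, `resolveOne(billingById, 3L, BillingInfo::new)` would return `new BillingInfo(null, 3L, "0000 0000 0000 0000")` when no entry exists for customer 3.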


Infinite Stream of Data

In situations where an infinite or very large stream of data is being handled, such as dealing with 100,000+ customers, Assembler needs to completely drain the upstream from getCustomers() to gather all correlation IDs (customerId). This can lead to resource exhaustion if not handled correctly. To mitigate this issue, the stream can be split into multiple smaller streams and processed in batches. Most reactive libraries already support this concept. Below is an example of this approach, utilizing Project Reactor:

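import static java.time.Duration.ofSeconds;

Flux<Transaction> transactionFlux = getCustomers()
  .windowTimeout(100, ofSeconds(5)) // close a window after 100 customers or 5 seconds, whichever comes first
  .flatMapSequential(assembler::assemble);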
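The point of windowing is that only one bounded batch of correlation IDs has to be held at a time. The plain-Java sketch below illustrates size-based batching only (it ignores the 5-second timeout that `windowTimeout` adds); `Batching.process` is a hypothetical helper. It consumes an iterator lazily, cuts it into batches of at most `batchSize`, and hands each batch to a processing function, so memory use is bounded by the batch size rather than by the length of the stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class Batching {
    // Consume the source lazily, emitting batches of at most batchSize elements.
    // Returns the number of batches emitted.
    static <T> int process(Iterator<T> source, int batchSize, Consumer<List<T>> onBatch) {
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                onBatch.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // flush the final, partial batch
            onBatch.accept(batch);
            batches++;
        }
        return batches;
    }
}
```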


Reactive Caching

Apart from offering convenient helper functions to define mapping semantics such as oneToOne() and oneToMany(), Assembler also includes a caching/memoization mechanism for the downstream subqueries via the cached() wrapper function:

import io.github.pellse.assembler.Assembler;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
import static io.github.pellse.assembler.CacheFactory.cached;

var assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(cached(call(this::getBillingInfo)))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, cached(call(this::getAllOrders)))),
                Transaction::new)
        .build();

var transactionFlux = getCustomers()
        .window(3)
        .flatMapSequential(assembler::assemble);
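As a rough mental model of what a cache around a batch query does (not Assembler's actual `cached()` implementation), the following sketch keeps previously fetched results in a map and forwards only the IDs that are not cached yet to the underlying batch function. `MemoizedBatchQuery` is a hypothetical name; unlike Assembler's caches it is synchronous and not thread-safe.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MemoizedBatchQuery<ID, R> {
    private final Map<ID, R> cache = new HashMap<>();
    private final Function<List<ID>, Map<ID, R>> batchQuery;
    int sourceCalls = 0; // number of times the underlying source was queried

    MemoizedBatchQuery(Function<List<ID>, Map<ID, R>> batchQuery) {
        this.batchQuery = batchQuery;
    }

    Map<ID, R> fetch(List<ID> ids) {
        // Only query the source for ids that are not cached yet
        List<ID> missing = ids.stream()
                .filter(id -> !cache.containsKey(id))
                .collect(Collectors.toList());
        if (!missing.isEmpty()) {
            sourceCalls++;
            cache.putAll(batchQuery.apply(missing));
        }
        // Serve the whole request from the cache
        Map<ID, R> result = new HashMap<>();
        for (ID id : ids) {
            R value = cache.get(id);
            if (value != null) result.put(id, value);
        }
        return result;
    }
}
```

One design consequence worth noting: IDs for which the source returns nothing are not remembered in this sketch, so they are queried again on the next call.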


Pluggable Reactive Caching Strategies

The cached() function includes overloaded versions that enable users to utilize different Cache implementations. By providing an additional parameter of type CacheFactory to the cached() method, users can customize the caching mechanism as per their requirements. In case no CacheFactory parameter is passed to cached(), the default implementation will internally use a Cache based on HashMap.

All Cache implementations are internally decorated with non-blocking concurrency controls, making them safe for concurrent access and modifications.

The following example explicitly customizes the caching mechanism, e.g. to store cache entries in a TreeMap:

import java.util.TreeMap;
import io.github.pellse.assembler.Assembler;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
import static io.github.pellse.assembler.CacheFactory.cache;
import static io.github.pellse.assembler.CacheFactory.cached;

var assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(cached(call(this::getBillingInfo), cache(TreeMap::new)))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, cached(call(this::getAllOrders), cache(TreeMap::new)))),
                Transaction::new)
        .build();
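The `cache(TreeMap::new)` call above suggests a cache configured through a map factory. The sketch below models that idea in plain Java with a `Supplier<Map<K, V>>`. `SimpleCacheFactory` and `CacheFactoryDemo` are hypothetical classes, not Assembler's `CacheFactory`, and they do not reproduce the non-blocking concurrency controls described above. Swapping `HashMap::new` for `TreeMap::new` only changes the backing store, visible here as sorted key iteration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

class SimpleCacheFactory<K, V> {
    private final Supplier<Map<K, V>> mapFactory;

    SimpleCacheFactory(Supplier<Map<K, V>> mapFactory) {
        this.mapFactory = mapFactory;
    }

    // Default backing store, analogous to the HashMap-based default described above
    static <K, V> SimpleCacheFactory<K, V> defaultFactory() {
        return new SimpleCacheFactory<>(HashMap::new);
    }

    Map<K, V> create() {
        return mapFactory.get();
    }
}

class CacheFactoryDemo {
    static List<Long> keysInOrder(SimpleCacheFactory<Long, String> factory) {
        Map<Long, String> cache = factory.create();
        cache.put(30L, "C");
        cache.put(10L, "A");
        cache.put(20L, "B");
        return new ArrayList<>(cache.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keysInOrder(new SimpleCacheFactory<>(TreeMap::new))); // [10, 20, 30]
    }
}
```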


Third Party Reactive Cache Provider Integration

Below is a compilation of supplementary modules that are available for integration with third-party caching libraries. Additional modules will be incorporated in the future:

Assembler add-on module	Third party cache library
	Caffeine

Here is a sample implementation of CacheFactory that showcases the use of the Caffeine library, which can be accomplished via the caffeineCache() helper method. This helper method is provided as part of the caffeine add-on module:

import com.github.benmanes.caffeine.cache.Caffeine;

import static com.github.benmanes.caffeine.cache.Caffeine.newBuilder;
import static java.time.Duration.ofMinutes;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
import static io.github.pellse.assembler.CacheFactory.cached;
import static io.github.pellse.assembler.cache.caffeine.CaffeineCacheFactory.caffeineCache; // caffeine add-on module (package path may vary by version)

Caffeine<Object, Object> cacheBuilder = newBuilder()
        .recordStats()
        .expireAfterWrite(ofMinutes(10))
        .maximumSize(1000);

var assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(cached(call(this::getBillingInfo), caffeineCache(cacheBuilder)))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, cached(call(this::getAllOrders), caffeineCache()))),
                Transaction::new)
        .build();


Auto Caching

In addition to the cache mechanism provided by the cached() function, Assembler provides the autoCache() function to automatically and asynchronously update the cache in real time as new data becomes available. This keeps the cache up to date and, in most cases, avoids the need for cached() to fall back to fetching missing data from the original source.

The auto caching mechanism in Assembler can be seen as being conceptually similar to a KTable in Kafka. Both mechanisms provide a way to keep a key-value store updated in real-time with the latest value per key from its associated data stream. However, Assembler is not limited to just Kafka data sources and can work with any data source that can be consumed in a reactive stream.
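The KTable analogy can be illustrated in a few lines of plain Java: replaying a stream of records into a map keyed by correlation id leaves exactly the latest value per key. This is only a conceptual model; `LatestValueTable` is a hypothetical name, and autoCache() consumes a reactive stream asynchronously rather than a `List`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class LatestValueTable<K, V> {
    private final Map<K, V> table = new LinkedHashMap<>();
    private final Function<V, K> keyExtractor;

    LatestValueTable(Function<V, K> keyExtractor) {
        this.keyExtractor = keyExtractor;
    }

    // Each incoming element overwrites the previous value for its key (upsert)
    void onNext(V value) {
        table.put(keyExtractor.apply(value), value);
    }

    void replay(List<V> stream) {
        stream.forEach(this::onNext);
    }

    V get(K key) {
        return table.get(key);
    }

    int size() {
        return table.size();
    }
}
```

With the data model above, `new LatestValueTable<>(BillingInfo::customerId)` would keep the most recent BillingInfo seen for each customer.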

This is how autoCache() connects to a data stream to automatically and asynchronously update the cache in real time:

import reactor.core.publisher.Flux;
import io.github.pellse.assembler.Assembler;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
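import static io.github.pellse.assembler.CacheFactory.cached;
import static io.github.pellse.assembler.AutoCacheFactory.autoCache;
import static io.github.pellse.assembler.cache.caffeine.CaffeineCacheFactory.caffeineCache; // caffeine add-on module (package path may vary by version)

Flux<BillingInfo> billingInfoFlux = ...; // From e.g. Debezium/Kafka, RabbitMQ, etc.
Flux<OrderItem> orderItemFlux = ...; // From e.g. Debezium/Kafka, RabbitMQ, etc.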

var assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId,
                        oneToOne(cached(call(this::getBillingInfo), caffeineCache(), autoCache(billingInfoFlux)))),
                rule(OrderItem::customerId,
                        oneToMany(OrderItem::id, cached(call(this::getAllOrders), autoCache(orderItemFlux)))),
                Transaction::new)
        .build();

var transactionFlux = getCustomers()
        .window(3)
        .flatMapSequential(assembler::assemble);

It is also possible to customize the Auto Caching configuration via autoCacheBuilder():

import reactor.core.publisher.Flux;
import io.github.pellse.assembler.Assembler;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
import static io.github.pellse.assembler.CacheFactory.cached;
import static io.github.pellse.assembler.AutoCacheFactoryBuilder.autoCacheBuilder;
import static io.github.pellse.assembler.AutoCacheFactory.OnErrorMap.onErrorMap;
import static reactor.core.scheduler.Schedulers.newParallel;
import static java.lang.System.getLogger;
import static java.lang.System.Logger.Level.WARNING;
import static java.time.Duration.ofMillis;
import static java.time.Duration.ofSeconds;

var logger = getLogger("auto-cache-logger");

Flux<BillingInfo> billingInfoFlux = ...; // From e.g. Debezium/Kafka, RabbitMQ, etc.
Flux<OrderItem> orderItemFlux = ...; // From e.g. Debezium/Kafka, RabbitMQ, etc.

var assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(cached(call(this::getBillingInfo),
                        autoCacheBuilder(billingInfoFlux)
                                .maxWindowSizeAndTime(100, ofSeconds(5))
                                .errorHandler(error -> logger.log(WARNING, "Error in autoCache", error))
                                .scheduler(newParallel("billing-info"))
                                .maxRetryStrategy(50)
                                .build()))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, cached(call(this::getAllOrders),
                        autoCacheBuilder(orderItemFlux)
                                .maxWindowSize(50)
                                .errorHandler(onErrorMap(MyException::new))
                                .scheduler(newParallel("order-item"))
                                .backoffRetryStrategy(100, ofMillis(10))
                                .build()))),
                Transaction::new)
        .build();

var transactionFlux = getCustomers()
        .window(3)
        .flatMapSequential(assembler::assemble);

By default, the cache is updated for every element of the incoming stream. It can also be configured to batch cache updates (e.g. via maxWindowSize() or maxWindowSizeAndTime() above), which is useful when updating a remote cache, to reduce the number of network calls.
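To see why batching matters for a remote cache, the sketch below counts simulated round trips. `RemoteCache` and `BatchingCacheWriter` are hypothetical classes: the writer buffers updates and flushes them with one `putAll` per `maxWindowSize` elements, mirroring the intent (not the implementation) of `maxWindowSize()`. Time-based flushing, as in `maxWindowSizeAndTime()`, is left out for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical remote cache: every putAll counts as one network round trip
class RemoteCache<K, V> {
    final Map<K, V> store = new HashMap<>();
    int roundTrips = 0;

    void putAll(Map<K, V> entries) {
        roundTrips++;
        store.putAll(entries);
    }
}

class BatchingCacheWriter<K, V> {
    private final RemoteCache<K, V> cache;
    private final Function<V, K> keyExtractor;
    private final int maxWindowSize;
    private final List<V> buffer = new ArrayList<>();

    BatchingCacheWriter(RemoteCache<K, V> cache, Function<V, K> keyExtractor, int maxWindowSize) {
        this.cache = cache;
        this.keyExtractor = keyExtractor;
        this.maxWindowSize = maxWindowSize;
    }

    void onNext(V value) {
        buffer.add(value);
        if (buffer.size() >= maxWindowSize) flush();
    }

    // Send all buffered updates in a single round trip; later values win per key
    void flush() {
        if (buffer.isEmpty()) return;
        Map<K, V> batch = new HashMap<>();
        for (V v : buffer) batch.put(keyExtractor.apply(v), v);
        cache.putAll(batch);
        buffer.clear();
    }
}
```

With 120 updates and a window of 50, this performs 3 round trips (two full windows plus a final flush) instead of 120.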


Event Based Auto Caching

Assume the following custom domain events, which are unknown to Assembler:

sealed interface MyEvent<T> {
  T item();
}

record ItemUpdated<T>(T item) implements MyEvent<T> {}
record ItemDeleted<T>(T item) implements MyEvent<T> {}

record MyOtherEvent<T>(T value, boolean isAddOrUpdateEvent) {}

// E.g. Flux coming from a Change Data Capture/Kafka source
Flux<MyOtherEvent<BillingInfo>> billingInfoFlux = Flux.just(
  new MyOtherEvent<>(billingInfo1, true), new MyOtherEvent<>(billingInfo2, true),
  new MyOtherEvent<>(billingInfo2, false), new MyOtherEvent<>(billingInfo3, false));

// E.g. Flux coming from a Change Data Capture/Kafka source
Flux<MyEvent<OrderItem>> orderItemFlux = Flux.just(
  new ItemUpdated<>(orderItem11), new ItemUpdated<>(orderItem12), new ItemUpdated<>(orderItem13),
  new ItemDeleted<>(orderItem31), new ItemDeleted<>(orderItem32), new ItemDeleted<>(orderItem33));

Here is how autoCache() can be used to adapt those custom domain events to add, update or delete entries from the cache in real-time:

import io.github.pellse.assembler.Assembler;
import io.github.pellse.assembler.CacheFactory.CacheTransformer;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
import static io.github.pellse.assembler.CacheFactory.cached;
import static io.github.pellse.assembler.AutoCacheFactory.autoCache;

CacheTransformer<Long, BillingInfo, BillingInfo> billingInfoAutoCache =
        autoCache(billingInfoFlux, MyOtherEvent::isAddOrUpdateEvent, MyOtherEvent::value);

CacheTransformer<Long, OrderItem, List<OrderItem>> orderItemAutoCache =
        autoCache(orderItemFlux, ItemUpdated.class::isInstance, MyEvent::item);

Assembler<Customer, Transaction> assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(cached(call(this::getBillingInfo), billingInfoAutoCache))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, cached(call(this::getAllOrders), orderItemAutoCache))),
                Transaction::new)
        .build();

var transactionFlux = getCustomers()
        .window(3)
        .flatMapSequential(assembler::assemble);
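The predicate/mapper pair passed to autoCache() can be read as: when the predicate matches, the mapped item is added to or updated in the cache; otherwise it is removed. The plain-Java sketch below applies that reading to the `MyEvent` hierarchy from above (redeclared so the snippet compiles on its own). `EventCacheUpdater` is a hypothetical helper, not Assembler's implementation, and it models a simple one-to-one cache keyed by a caller-supplied id extractor.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

sealed interface MyEvent<T> {
    T item();
}

record ItemUpdated<T>(T item) implements MyEvent<T> {}
record ItemDeleted<T>(T item) implements MyEvent<T> {}

class EventCacheUpdater<E, K, V> {
    final Map<K, V> cache = new HashMap<>();
    private final Predicate<E> isAddOrUpdate; // e.g. ItemUpdated.class::isInstance
    private final Function<E, V> itemMapper;  // e.g. MyEvent::item
    private final Function<V, K> idExtractor; // e.g. BillingInfo::customerId

    EventCacheUpdater(Predicate<E> isAddOrUpdate, Function<E, V> itemMapper, Function<V, K> idExtractor) {
        this.isAddOrUpdate = isAddOrUpdate;
        this.itemMapper = itemMapper;
        this.idExtractor = idExtractor;
    }

    void apply(E event) {
        V item = itemMapper.apply(event);
        K key = idExtractor.apply(item);
        if (isAddOrUpdate.test(event)) {
            cache.put(key, item); // add or update
        } else {
            cache.remove(key);    // delete
        }
    }

    void applyAll(List<E> events) {
        events.forEach(this::apply);
    }
}
```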


Integration with non-reactive sources

A utility function toPublisher() is also provided to wrap non-reactive sources, useful when e.g. calling 3rd party synchronous APIs:

import reactor.core.publisher.Flux;
import io.github.pellse.assembler.Assembler;

import static io.github.pellse.assembler.AssemblerBuilder.assemblerOf;
import static io.github.pellse.assembler.RuleMapper.oneToMany;
import static io.github.pellse.assembler.RuleMapper.oneToOne;
import static io.github.pellse.assembler.RuleMapperSource.call;
import static io.github.pellse.assembler.Rule.rule;
import static io.github.pellse.assembler.QueryUtils.toPublisher;

List<BillingInfo> getBillingInfo(List<Long> customerIds); // non-reactive source

List<OrderItem> getAllOrders(List<Long> customerIds); // non-reactive source

Assembler<Customer, Transaction> assembler = assemblerOf(Transaction.class)
        .withCorrelationIdResolver(Customer::customerId)
        .withRules(
                rule(BillingInfo::customerId, oneToOne(call(toPublisher(this::getBillingInfo)))),
                rule(OrderItem::customerId, oneToMany(OrderItem::id, call(toPublisher(this::getAllOrders)))),
                Transaction::new)
        .build();
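Wrapping a blocking call is mainly about moving it off the caller's thread and exposing the result asynchronously. The sketch below shows that idea with the JDK alone: `BlockingSourceAdapter.wrap` is a hypothetical helper built on `CompletableFuture.supplyAsync` with a dedicated executor. It is not Assembler's `toPublisher()` (which produces a Reactive Streams `Publisher`), and it assumes the synchronous function is safe to call from another thread.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

class BlockingSourceAdapter {
    // Turns a synchronous batch query into one returning a CompletableFuture,
    // running the blocking call on the supplied executor instead of the caller's thread.
    static <ID, R> Function<List<ID>, CompletableFuture<List<R>>> wrap(
            Function<List<ID>, List<R>> blockingQuery, ExecutorService executor) {
        return ids -> CompletableFuture.supplyAsync(() -> blockingQuery.apply(ids), executor);
    }

    // Hypothetical non-reactive source, e.g. a 3rd party synchronous API
    static List<String> getBillingInfo(List<Long> customerIds) {
        return customerIds.stream().map(id -> "billing-" + id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Function<List<Long>, CompletableFuture<List<String>>> query =
                    wrap(BlockingSourceAdapter::getBillingInfo, executor);
            System.out.println(query.apply(List.of(1L, 2L)).join()); // [billing-1, billing-2]
        } finally {
            executor.shutdown();
        }
    }
}
```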


Kotlin Support

import io.github.pellse.reactive.assembler.kotlin.assembler
import io.github.pellse.reactive.assembler.kotlin.cached
import io.github.pellse.assembler.RuleMapper.oneToMany
import io.github.pellse.assembler.RuleMapper.oneToOne
import io.github.pellse.assembler.RuleMapperSource.call
import io.github.pellse.assembler.Rule.rule
import io.github.pellse.assembler.CacheFactory.cache
import io.github.pellse.assembler.AutoCacheFactory.autoCache
import io.github.pellse.assembler.AutoCacheFactoryBuilder.autoCacheBuilder
import reactor.core.publisher.Flux

sealed interface MyEvent<T> {
  val item: T
}

data class ItemUpdated<T>(override val item: T) : MyEvent<T>
data class ItemDeleted<T>(override val item: T) : MyEvent<T>

// E.g. Flux coming from a Change Data Capture/Kafka source
val billingInfoFlux: Flux<MyEvent<BillingInfo>> = Flux.just(
  ItemUpdated(billingInfo1), ItemUpdated(billingInfo2),
  ItemUpdated(billingInfo3), ItemDeleted(billingInfo3))

// E.g. Flux coming from a Change Data Capture/Kafka source
val orderItemFlux: Flux<MyEvent<OrderItem>> = Flux.just(
  ItemUpdated(orderItem31), ItemUpdated(orderItem32), ItemUpdated(orderItem33),
  ItemDeleted(orderItem31), ItemDeleted(orderItem32), ItemDeleted(orderItem33))

val assembler = assembler<Transaction>()
  .withCorrelationIdResolver(Customer::customerId)
  .withRules(
    rule(BillingInfo::customerId, oneToOne(::getBillingInfo.cached(cache(::sortedMapOf),
      autoCache(billingInfoFlux, ItemUpdated::class::isInstance) { it.item }))),
    rule(OrderItem::customerId, oneToMany(OrderItem::id, ::getAllOrders.cached(
      autoCacheBuilder(orderItemFlux, ItemUpdated::class::isInstance, MyEvent<OrderItem>::item)
        .maxWindowSize(3)
        .build()))),
    ::Transaction)
  .build()


What's Next?

See the list of issues for improvements planned in the near future.



cohereflux's Issues

Implement builder pattern for Assembler rules

Static helper methods make the api really nice and compact for default usage when we don't have to override too many configs (e.g. custom ID collection factory, results collection factory, map factory, etc.), but offering a fluent builder api for assembler rules could make the api even more user friendly

Add module for RxJava implementation

Support passing a rule context in the delegate chain for downstream assembler rules

We currently only pass the correlation ID extractor in rule(), which becomes a limitation for e.g. passing a cached Flux to oneToOne or oneToMany when the query function returning Flux accepts a collection of ids that is not of type List.

If we have:

Publisher<BillingInfo> getAllBillingInfo(Set<Long> customerIds)
Publisher<OrderItem> getAllOrders(Set<Long> customerIds)

public record Transaction(Customer customer, BillingInfo billingInfo, List<OrderItem> orderItems) { }

The following will fail at compile time as ids are of type Set<Long> instead of List<Long>:

var assembler = assemblerOf(TransactionSet.class)
        .withIdExtractor(Customer::customerId)
        .withAssemblerRules(
                rule(BillingInfo::customerId, oneToOne(cached(this::getAllBillingInfo))),
                rule(OrderItem::customerId, oneToManyAsSet(cached(this::getAllOrders))),
                Transaction::new)
        .build();

The current implementation supports id collections of types other than List, but the id collection supplier has to be passed to both oneToOne() and cached(), which makes the API not very user friendly.

Support third party integration for asynchronous caching

Many cache providers (e.g. Caffeine, Hazelcast, etc.) provide async apis, we should be able to integrate with those to keep the whole data aggregation process non-blocking

Add javadoc

Increase unit test coverage

Add module for CompletableFuture implementation

Push model for cache population

For real-time stream processing of high-velocity data, the current reactive implementation supports efficient reactive data aggregation/enrichment through data windowing, but for each window (batch) we still need to invoke downstream sources (i.e. sources defined by assembler rules) on demand to populate the cache.

It would be nice to also be able to connect each downstream and independently populate the cache as data is coming, so that when we perform API composition/data aggregation, the data is already available.

Add the ability to store either R or List<R> as cache values

Support end to end reactive workflow

Subqueries (e.g. oneToXXX() functions) are wrapped to be able to participate in a reactive workflow depending on the adapter used (e.g. FluxAdapter, FlowableAdapter, AkkaSourceAdapter), but the subqueries themselves are synchronous by nature (e.g. a database call, a call to a REST service, etc.). It would be nice to support native Publisher sources, e.g. reactive Spring Data repositories (like MongoDB), RSocket, MicroProfile Reactive Streams calls, etc.

Generalize the return type of query functions in MapperUtils helper methods

We currently declare those as D extends Collection<R>, but it would be nice to also be able to return any type of container (e.g. Flux<R>, Stream<R>, etc.). This would be better implemented with higher kinded types (e.g. in Scala), but we don't have that luxury in Java, so this will break the relationship between the container and its generic type, something to think about.

[Question] One to many unit test

Sorry to ask here, but I don't know how to create an Order record,
and I could not find the rule static method.

dependencies {
	implementation 'io.github.pellse:assembler-core:0.5.0'
	implementation 'io.github.pellse:assembler-util:0.5.0'
}

record Order(int id, List<OrderItem> orderItems) {}
record OrderItem(int id, int orderId, Product product) {}
record Product(int id) {}

CompletableFuture<Order> order = assemblerOf(Order.class)
    // ... ???

Pass top level entity instances in `RuleMapperSource` instead of IDs

This would improve:

- Type safety: passing IDs introduces the possibility of passing the wrong ID list when two classes have an ID field of the same type.
- It would make it possible to define a RuleMapperSource that reuses the stream of top-level entities to extract specific fields instead of issuing calls to external data sources.

Make Assembler reusable

We currently create assembler instances on-demand e.g.:

Flux<Transaction> transactionFlux = Flux.fromIterable(getCustomers())
    .buffer(10)
    .flatMap(customers -> assemblerOf(Transaction.class)
        .from(customers, Customer::getCustomerId)
        .assembleWith(
            oneToOne(this::getBillingInfoForCustomers, BillingInfo::getCustomerId),
            oneToManyAsList(this::getAllOrdersForCustomers, OrderItem::getCustomerId),
            Transaction::new)
        .using(fluxAdapter()));

but doing so means that a few new short-lived intermediate objects (used internally to build the assembler) are allocated on each invocation, which could put slightly more pressure on the GC under heavy load with very high-velocity data. For most use cases this has no impact, and we should be careful with premature optimization, but for code where the smallest performance gain is significant, changing the current implementation could potentially lead to a small increase in data throughput.

Anyhow, it might not be a bad idea to be able to save the 'recipe' once and just apply it whenever data is coming i.e. change the api to be able to do something that could look like:

Assembler<Flux<Transaction>> transactionAssembler = assemblerOf(Transaction.class)
   .withCorrelationId(Customer::getCustomerId)
   .assembleWith(
       oneToOne(this::getBillingInfoForCustomers, BillingInfo::getCustomerId),
       oneToManyAsList(this::getAllOrdersForCustomers, OrderItem::getCustomerId),
       Transaction::new)
   .using(fluxAdapter());

Flux<Transaction> transactionFlux = Flux.fromIterable(getCustomers())
   .buffer(10)
   .flatMap(transactionAssembler::assemble);

The final api might not be exactly like the example shown above which is just for illustration purpose.

Add module for Akka Stream implementation

Support relevant subset of JPA annotations

If the POJOs returned from the different method calls to be assembled declare JPA annotations, e.g. @Id, @OneToOne, etc., it would be nice to honor them. This would simplify the Assembler API invocation, as supplying an id extractor function would no longer be mandatory:

So the following:

@Data
@AllArgsConstructor
public class Transaction {
    private final Customer customer;
    private final BillingInfo billingInfo;
    private final List<OrderItem> orderItems;
}

Flux<Transaction> transactionFlux = FluxAssembler.of(this::getCustomers, Customer::getCustomerId)
    .assemble(
        oneToOne(this::getBillingInfoForCustomers, BillingInfo::getCustomerId),
        oneToManyAsList(this::getAllOrdersForCustomers, OrderItem::getCustomerId),
        Transaction::new);

could instead be declared as:

Flux<Transaction> transactionFlux = FluxAssembler.of(this::getCustomers)
    .assemble(
        oneToOne(this::getBillingInfoForCustomers),
        oneToManyAsList(this::getAllOrdersForCustomers),
        Transaction::new);

The purpose of providing an id extractor function is precisely to avoid having to support annotations, and it makes the framework a lot more flexible by allowing it to join/assemble arbitrary data sources. Full support for id extraction functions should therefore remain, and explicitly provided id extraction functions should override the mapping declared in JPA annotations.

Convert each id collection separately

The current framework erroneously assumes that every function participating in the aggregation process uses the same type of collection to pass ids, e.g.:

.assembleWith(
    oneToOne(AssemblerTestUtils::getBillingInfoForCustomers, BillingInfo::getCustomerId),
    oneToManyAsList(AssemblerTestUtils::getAllOrdersForCustomers, OrderItem::getCustomerId),
    Transaction::new)

if getBillingInfoForCustomers is defined to accept a set as ids while getAllOrdersForCustomers uses a list instead e.g.:

List<BillingInfo> getBillingInfoForCustomers(Set<Long> customerIds);
List<OrderItem> getAllOrdersForCustomers(List<Long> customerIds);

this will lead to a compilation error.

So the oneToOne() and oneToManyXXX() helper methods should allow overriding the conversion mechanism so that the collection types can be aligned.
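One way to picture the proposed override is an adapter applied per sub-query: the framework collects ids once (here as a `List`), and each query function declares how to convert them into the collection type it accepts. `IdCollectionAdapter.adapt` below is a hypothetical helper, not an existing Assembler API, and the two query functions are stand-ins for the ones quoted above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

class IdCollectionAdapter {
    // Adapts a query taking any collection type C into one taking a List of ids
    static <ID, C extends Collection<ID>, R> Function<List<ID>, R> adapt(
            Function<C, R> query, Supplier<C> collectionFactory) {
        return ids -> {
            C converted = collectionFactory.get();
            converted.addAll(ids); // per-query conversion of the shared id list
            return query.apply(converted);
        };
    }

    // Stand-in query functions using different id collection types
    static List<String> getBillingInfoForCustomers(Set<Long> customerIds) {
        return customerIds.stream().map(id -> "billing-" + id).collect(Collectors.toList());
    }

    static List<String> getAllOrdersForCustomers(List<Long> customerIds) {
        return customerIds.stream().map(id -> "order-" + id).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<Set<Long>, List<String>> billingQuery = IdCollectionAdapter::getBillingInfoForCustomers;
        Supplier<Set<Long>> setFactory = LinkedHashSet::new;
        Function<List<Long>, List<String>> billing = adapt(billingQuery, setFactory);

        Function<List<Long>, List<String>> ordersQuery = IdCollectionAdapter::getAllOrdersForCustomers;
        Supplier<List<Long>> listFactory = ArrayList::new;
        Function<List<Long>, List<String>> orders = adapt(ordersQuery, listFactory);

        List<Long> ids = List.of(1L, 2L, 1L);
        System.out.println(billing.apply(ids)); // [billing-1, billing-2]
        System.out.println(orders.apply(ids));  // [order-1, order-2, order-1]
    }
}
```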

please support passing in a `Publisher<T>` to the `assemble` method instead of a synchronous `Iterable<T>`.

I would like to use Assembler to stream the results and aggregate the details as the results are returning.

Ability to plug event based flux in autoCache

Currently we can plug asynchronous data sources via autoCache() to pre-populate cache, but the semantic is to only add or update existing cache. This feature would allow event based data sources so we can differentiate incoming data between add/update and remove events. A use case for this is e.g. a CDC (Change Data Capture) stream like a Debezium Kafka consumer or a MongoDB Change Streams/Cosmos DB Change Feed connected to autoCache() to synchronize the cache with the associated database or microservice.

Questions

Looks like great work.

a. Is it production-ready? Is anyone using it?

b. Is there support for CQRS and Event Sourcing out of the box, or do we need to write our own on top of it? Is there any example that demonstrates ES/CQRS?

c. Akka has changed its license. It is now BSL, which is very restrictive. Can we replace Akka with something else?

Thanks
