spring-attic / reactive-streams-commons Goto Github PK

View Code? Open in Web Editor NEW

356.0 36.0 51.0 2.34 MB

A joint research effort for building highly optimized Reactive-Streams compliant operators.

License: Apache License 2.0

CSS 0.10% Java 94.01% HTML 5.88%

reactive-streams experimental

reactive-streams-commons's Introduction

reactive-streams-commons is no longer actively maintained by VMware, Inc.

reactive-streams-commons

A joint research effort for building highly optimized Reactive-Streams compliant operators. Current implementors include RxJava2 and Reactor.

Java 8 required.

Maven

repositories {
    maven { url 'https://repo.spring.io/libs-snapshot' }
}

dependencies {
    compile 'io.projectreactor:reactive-streams-commons:0.6.0.BUILD-SNAPSHOT'
}

Snapshot directory.

Operator-fusion documentation

Supported datasources

I.e., converts non-reactive data sources into Publishers.

PublisherAmb : relays signals of that source Publisher which responds first with any signal
PublisherArray : emits the elements of an array
PublisherCallable : emits a single value returned by a Callable
PublisherCompletableFuture : emits a single value produced by a CompletableFuture
PublisherConcatArray : concatenate an array of Publishers
PublisherConcatIterable : concatenate an Iterable sequence of Publishers
PublisherDefer : calls a Supplier to create the actual Publisher the Subscriber will be subscribed to.
PublisherEmpty : does not emit any value and calls onCompleted; use instance() to get its singleton instance with the proper type parameter
PublisherError : emits a constant or generated Throwable exception
PublisherFuture : awaits and emits a single value emitted by a Future
PublisherGenerate : generate signals one-by-one via a function
PublisherInterval : periodically emits an ever increasing sequence of long values
PublisherIterable : emits the elements of an Iterable
PublisherJust : emits a single value
PublisherNever : doesn't emit any signal other than onSubscribe; use instance() to get its singleton instance with the proper type parameter
PublisherRange : emits a range of integer values
PublisherStream : emits elements of a Stream
PublisherTimer : emit a single 0L after a specified amount of time
PublisherUsing : create a resource, stream values in a Publisher derived from the resource and release the resource when the sequence completes or the Subscriber cancels
PublisherZip : Repeatedly takes one item from all source Publishers and runs it through a function to produce the output item

Supported transformations

ConnectablePublisherAutoConnect given a ConnectablePublisher, it connects to it once the given amount of subscribers subscribed
ConnectablePublisherRefCount given a ConnectablePublisher, it connects to it once the given amount of subscribers subscribed to it and disconnects once all subscribers cancelled
ConnectablePublisherPublish : allows dispatching events from a single source to multiple subscribers similar to a Processor but the connection can be manually established or stopped.
PublisherAccumulate : Accumulates the source values with an accumulator function and returns the intermediate results of this function application
PublisherAggregate : Aggregates the source values with an aggergator function and emits the last result.
PublisherAll : emits a single true if all values of the source sequence match the predicate
PublisherAny : emits a single true if any value of the source sequence matches the predicate
PublisherAwaitOnSubscribe : makes sure onSubscribe can't trigger the onNext events until it returns
PublisherBuffer : buffers certain number of subsequent elements and emits the buffers
PublisherBufferBoundary : buffers elements into continuous, non-overlapping lists where another Publisher signals the start/end of the buffer regions
PublisherBufferBoundaryAndSize : buffers elements into continuous, non-overlapping lists where the each buffer is emitted when they become full or another Publisher signals the boundary of the buffer regions
PublisherBufferStartEnd : buffers elements into possibly overlapping buffers whose boundaries are determined by a start Publisher's element and a signal of a derived Publisher
PublisherCollect : collects the values into a container and emits it when the source completes
PublisherCombineLatest : combines the latest values of many sources through a function
PublisherConcatMap : Maps each upstream value into a Publisher and concatenates them into one sequence of items
PublisherCount : counts the number of elements the source sequence emits
PublisherDistinct : filters out elements that have been seen previously according to a custom collection
PublisherDistinctUntilChanged : filters out subsequent and repeated elements
PublisherDefaultIfEmpty : emits a single value if the source is empty
PublisherDelaySubscription : delays the subscription to the main source until the other source signals a value or completes
PublisherDetach : detaches the both the child Subscriber and the Subscription on termination or cancellation.
PublisherDrop : runs the source in unbounded mode and drops values if the downstream doesn't request fast enough
PublisherElementAt : emits the element at the specified index location
PublisherFilter : filters out values which doesn't pass a predicate
PublisherFlatMap : maps a sequence of values each into a Publisher and flattens them back into a single sequence, interleaving events from the various inner Publishers
PublisherFlattenIterable : concatenates values from Iterable sequences generated via a mapper function
PublisherGroupBy : groups source elements into their own Publisher sequences via a key function
PublisherIgnoreElements : ignores values and passes only the terminal signals along
PublisherIsEmpty : returns a single true if the source sequence is empty
PublisherLatest : runs the source in unbounded mode and emits the latest value if the downstream doesn't request fast enough
PublisherLift : maps the downstream Subscriber into an upstream Subscriber which allows implementing custom operators via lambdas
PublisherMap : map values to other values via a function
PublisherPeek : peek into the lifecycle and signals of a stream
PublisherReduce : aggregates the source values with the help of an accumulator function and emits the the final accumulated value
PublisherRepeat : repeatedly streams the source sequence fixed or unlimited times
PublisherRepeatPredicate : repeatedly stream the source if a predicate returns true
PublisherRepeatWhen : repeats a source when a companion sequence signals an item in response to the main's completion signal
PublisherResume : if the source fails, the stream is resumed by another Publisher returned by a function for the failure exception
PublisherRetry : retry a failed source sequence fixed or unlimited times
PublisherRetryPredicate : retry if a predicate function returns true for the exception
PublisherRetryWhen : retries a source when a companion sequence signals an item in response to the main's error signal
PublisherSample : samples the main source whenever the other Publisher signals a value
PublisherScan : aggregates the source values with the help of an accumulator function and emits the intermediate results
PublisherSingle : expects the source to emit only a single item
PublisherSkip : skips a specified amount of values
PublisherSkipLast : skips the last N elements
PublisherSkipUntil : skips values until another sequence signals a value or completes
PublisherSkipWhile: skips values while the predicate returns true
PublisherStreamCollector : Collects the values from the source sequence into a java.util.stream.Collector instance; see Collectors utility class in Java 8+
PublisherSwitchIfEmpty : continues with another sequence if the first sequence turns out to be empty.
PublisherSwitchMap : switches to and streams a Publisher generated via a function whenever the upstream signals a value
PublisherTake : takes a specified amount of values and completes
PublisherTakeLast : emits only the last N values the source emitted before its completion
PublisherTakeWhile : relays values while a predicate returns true for the values (checked before each value)
PublisherTakeUntil : relays values until another Publisher signals
PublisherTakeUntilPredicate : relays values until a predicate returns true (checked after each value)
PublisherThrottleFirst : takes a value from upstream then uses the duration provided by a generated Publisher to skip other values until that other Publisher signals
PublisherThrottleTimeout : emits the last value from upstream only if there were no newer values emitted during the time window provided by a publisher for that particular last value
PublisherTimeout uses per-item Publishers that when they fire mean the timeout for that particular item unless a new item arrives in the meantime
PublisherWindow : splits the source sequence into possibly overlapping windows of given size
PublisherWindowBatch : batches the source sequence into continuous, non-overlapping windows where the length of the windows is determined by a fresh boundary Publisher or a maximum elemenets in that window
PublisherWindowBoundary : splits the source sequence into continuous, non-overlapping windows where the window boundary is signalled by another Publisher
PublisherWindowBoundaryAndSize : splits the source sequence into continuous, non-overlapping windows where the window boundary is signalled by another Publisher or if a window received a specified amount of values
PublisherWindowStartEnd : splits the source sequence into potentially overlapping windows controlled by a start Publisher and a derived end Publisher for each start value
PublisherWithLatestFrom : combines values from a master source with the latest values of another Publisher via a function
PublisherZip : Repeatedly takes one item from all source Publishers and runs it through a function to produce the output item
PublisherZipIterable : pairwise combines a sequence of values with elements from an iterable

Supported extractions

I.e., these allow leaving the reactive-streams world.

BlockingIterable : an iterable that consumes a Publisher in a blocking fashion
BlockingFuture : can return a future that consumes the source entierly and returns the very last value
BlockingStream : allows creating sequential and parallel j.u.stream.Stream flows out of a source Publisher
PublisherBase.blockingFirst : returns the very first value of the source, blocking if necessary; returns null for an empty sequence.
PublisherBase.blockingLast : returns the very last value of the source, blocking if necessary; returns null for an empty sequence.
PublisherBase.peekLast : returns the last value of a synchronous source or likely null for other or empty sequences.

reactive-streams-commons's People

Contributors

Stargazers

Watchers

reactive-streams-commons's Issues

Expand Micro-Fusion ( Back-Fusion and ConditionalSubscriber )

Review micro fusion path for more dropping back-fusion.

Gem 22) discussion

From @dsyer

Interesting. Do you mean Mono.fromCallable() (because when I looked at the source code it seemed to me that the Callable.call() only happens when there is a subscribe())? In general, what's a good way to assert or inspect statements like that? What did I misunderstand?

Fusing subscribeOn with upstream

@akarnokd has written in several places that the classic case of:

publisher.map(x -> someExpensiveComputation(x))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(x -> someFn(x))

means that map and observeOn cannot be fused together. I understand this point well. However, I'm interested in a variant of this pipeline:

publisher.map(x -> someExpensiveComputation(x))
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(x -> someFn(x))

I might be misunderstanding the matrix, but it seems to suggest that subscribeOn cannot be fused with the map which as far as I can tell is possible? Moreover, the matrix suggests that subscribeOn and observeOn can be fused together which I am failing to understand.

Also a related question, is the async fusion in subscribeOn currently implemented? From the code it doesn't look like it (I'm looking at https://github.com/reactor/reactive-streams-commons/blob/master/src/main/java/rsc/publisher/PublisherSubscribeOn.java) but I'm not sure if it's been worked on locally.

Finally is there a place where @smaldini and @akarnokd discuss what is happening in this repo? I would be very interested in following along!

Reactive puzzlers (working title)

This issue, similar to #21, collects some interesting cases and pitfalls in the form of Java Puzzlers, giving a scenario and offering 4-6 answers.

Add IndexedQueue contract to optimize draining paths

IndexedQueue usually work with volatile consumer indices. We could define a convention to "batch" increment this index by doing it once at the end of drain loops instead of using queue.poll() that increments index everytime.

Unbounded FlatMap with slow consumer

The unbounded flatMap has terrible performance if a consumer, such as observeOn processes slower.

The main reasons:

copy-on-write subscriber tracking, which will create N arrays of ever increasing size with O(n * n) complexity and then it takes O(n * n) to remove them one-by-one
the cleanup loop runs over all n sources in case requested is zero to evict empty and completed sources, adding another O(n * n) complexity.

For example, this test runs in about 30s:

range(1, 100_000).flatMap(v -> range(1, 10)).subscribe(new TestSubscriber<>(0L));

(then it takes 12s to drain by requesting 96 elements in a loop, see PublisherFlatMapTest.slowDrain().)

The cleanup can be short circuited because if one finds a non-empty source, there is no need to continue removing any potential finished and empty sources after that, the request-based drain will take care of those entries eventually.

The tracking overhead can be reduced via a free slot queue and power-of-2 allocations, but with some cost because of synchronized and no compaction as of now.

The benchmark results (i7 4790, Windows 7 x64, Java 8u72):

Where COW is the original copy-on-write tracking, FL is a free-list based solution which already shows some promise; COW-2 is COW where the cleanup is short-circuited and FL-2 is FL with the cleanup short circuited as well.

FL-2 seems to be promising with a bunch of +/- 5% loss/gain, probably due to noise.

I'm looking into non-synchronized options with the FL solution.

Explore timed microbatch operators

Currently written with a combination of 2 or 3 operators, timed microbatch like stream.window(10, () -> PublisherBase.timer(....)) or stream.buffer(10, () -> PublisherBase.timer(....)) would be useful.

Review and Mark Unbounded Operators

Unbounded Operators should at least implement Backpressurable#getCapacity() : Long.MAX_VALUE.
Some Unbounded Operators might also benefit from new accounting strategies. Let's find out possible candidate and extract relevant issues.

Add PatitionPublisher and PublisherPartition

Distribute the sequence onto N transformable Publisher that can be eventually joined back.
Ex :

stream.partition(6).dispatchOn(ForkJoinPool.commonPool()).map(service::blockingTask).merge()

Supported PartitionPublisher operator :

All PublisherBase
merge()
concat()

Partition resolution could be provided via key extractor and unlike GroupBy would use a fixed-delimited list of keys.

Clarification on the Scheduler.Worker.schedule contract

The Scheduler.Worker#schedule javadoc says:

    /**
     * Creates a worker of this Scheduler that executed task in a strict
     * FIFO order, guaranteed non-concurrently with each other.
     * <p>
     * Once the Worker is no longer in use, one should call shutdown() on it to
     * release any resources the particular Scheduler may have used.
     * 
     * <p>Tasks scheduled with this worker run in FIFO order and strictly non-concurrently, but
     * there are no ordering guarantees between different Workers created from the same
     * Scheduler.
     * 
     * @return the Worker instance.
     */

The FIFO order seems to be preserved in all of the code examples but I don't understand the semantics being used for "strictly non-concurrently".

For example it appears ExecutorServiceScheduler will have Workers that will run tasks concurrently unless a single thread executor is used.

In other Reactive Streams implementations that have a scheduler aka rxjava there is no mention of that guarantee.

Is there some implementation logic I'm missing or am I misunderstanding the doc?

101 Reactive Gems (working title)

In this issue, we should collect tips and tricks with reactive systems and dataflows.

These are not particularly advanced topics but the markdown support on GitHub makes it easier to write them up.

Once we run out of ideas, we may tidy it up and release it together (maybe a free ebook?).

Please post only gems here and open discussion about them in separate issues. Thanks.

Add PublisherEvery

Signatures:

PublisherBase<T> : <T> PublisherBase#every(int)
PublisherBase<T> : <T> PublisherBase#everyFirst(int)

Return the last or the first element of the given batch count. Allows for simple sampling with fixed size.

Expand Micro-Fusion (Front Fusion)

Use shaded dependency on JCTools instead of copy and paste

Currently this library uses some queue implementations lifted in spirit and to some extent code out of JCTools. The 2 projects share a contributor, @akarnokd.
I would recommend this project directly shades JCTools and uses the originals. Versioned dependencies allow for clear release boundaries and well understood state of code while also enabling future improvements and corrections to be fed in easily. This is why everyone does it...

Add PublisherIndexOf

Signatures:

PublisherBase<Long> : <T> PublisherBase#findIndex(T)
PublisherBase<Long> : <T> PublisherBase#findIndex(T, long)

Return a position relative to the sequence observed when element is found. Might onError(NoSuchElementException) or use the fallback argument long, e.g. -1L.

Design considerations

There are a few decisions to be made:

Repo initialization with Gradle files and directories

Who should do this? I have a gradle setup I usually copy into new projects that can upload to maven based on local settings.

Rackage naming, structure

What should be the base package name. In addition, we may need an internal package.

Lock-free or not?

Most serializing operators can be implemented in both ways but lock-free usually requires lock-free queues (i.e., JCTools).

The project states it wants to avoid queue-use so we don't lock in on Disruptor or JCTools but is it possible to abstract away the queue operations, i.e., is the j.u.c.Queue interface an adequate abstraction?

We can also have implementations for both cases in separate subpackages lockfree and locking.

Which Java version?

Java 8 allows nicer code but Java 6 allows Android use.

Processor implementations

Do we want to provide these as well? Note that the operator redo - which unifies retry and repeat behavior through the signal of a Publisher requires a PublishProcessor at least (or a BehaviorProcessor at most), although these may be inlined.

Factory or classes

Should the library provide the operators via factory methods, i.e., Publisher<R> map(Publisher<T> source, Function<T, R> mapper); or as named classes: class PublisherMap<T, R>?

The latter case helps discovering what's in a chain.

Operator fusion

I'm in a relatively early stage of describing and prototyping operator fusion. What's definitely necessary is that each operator has a distinct class which also exposes its input parameters (i.e., the source and mapper in 6))

Beta/Experimental annotation

Do we want to copy the RxJava way of introducing new operators/classes?

Benchmarks

Do we want JMH benchmarks in the repo?

Reactive-Streams TCK

I have two problems with it: a) requires TestNG which is quite invasive in Eclipse and b) is designed to test either just and a replaying-type Processor implementation. It doesn't work properly for range for example because the test wants to verify if the output of the Publisher is the same as the test generated.

Optimizations vs Reactive-Streams spec

Certain internal operator optimizations, such as reusing a Subscriber instance, may violate the specification although completely safe to do.

The spec says that you can't subscribe the same instance multiple times to the same or different Publishers and some implementations may actually check for this (such as RxJavaReactiveStreams).

Inlining and API leaks

To avoid allocations, some operators may build upon existing classes or implement interfaces that leak into the API. For example, an auto-sharing operator PublisherAutoShare may extend AtomicInteger directly to save on an atomic counter field. However, this class now has a bunch of other methods that leaked in and requires extra care from the users

Building blocks

There are a few frequent building blocks useful for operators (i.e., arbiters, terminal atomics tools). Do we want to officially support them as well or keep them hidden in internal?

Field updaters, Atomic instances or Unsafe?

Which technique do we want? Note that Java 8 is about to receive performance optimizations targeting field updaters so they get very close to Unsafe.

Expand Macro-Fusion paths

Everything that can be done pre-assembly or during assembly

Make GroupBy and future partitioning operator store-agnostic

Currently GroupBy uses a Map to store grouped sequence references. We probably use 4 interactions with Map that could be eventually translated as many java.util.function or java.lang callbacks. The reason is that some repository might actually offer more refined strategies to create and fetch group references. We could use a pooled registry of groups for instance, attach some behavior to a fresh group (timeout) or just increase our route resolution possibilities.

TestSubscriber constructor with delegate Subscriber

In Reactor I have created a TestSubscriber inspired from the one of RSC, and I have a question.

I am not sure if we want (on Reactor side) to keep constructors using a delegate Subscriber because I am not sure about the use cases, could you please give me some hints about what use cases they fit in?

Thanks in advance for your help.

Question: Using Reactor streams

Hi, I am from a Financial Organization. have multiple use cases to use stream processing to send alerts and notifications to customers. We are interested in reactor-streams and I came to this project. I understand it is the second generation of Reactor Streams. Please confirm and let me know when is the targeted release for this project to be consumed for production?

Please share if you have any documentation like user guide that would be helpful.

Thanks and Regards
Karthik

Explore Backpressurable#getCapacity optimizations

Backpressurable#getCapacity can be a useful tool to detect if the source Publisher is a Completable (0), Single (1), Unbounded (Long.MAX), Bounded (N > 1 && < Integer.MAX) or Mixed (-1) backpressure strategy.

Some operator might adapt their supplied Queue while other can choose to short-circuit more expensive inner Subscriber.

Throttling on leading and trailing edge

Request

It would be great when Reactor would support throttling on leading and trailing edge. This is quite a common use case when working on applications that provide data to some kind of reactive user interface, but may be an interesting feature in general.

Disclaimer: This ticket describes throttling requirements from the perspective of a UI. Various kinds of consumers may have the same requirements, even though they are not explicitly mentioned.

What does this mean?

The popular JS library lodash implements this throttling behavior as a default (as mentioned before, quite a common use case for UI applications). Instead of trying to explain this with words, please check out the following JS Bin which shows the behavior in action:

JS Bin: http://jsbin.com/modenetujo/edit?js,console
lodash throttling documentation: https://lodash.com/docs#throttle

For easier reference, here the example logic and resulting output:

const windowSize = 100;
const fn = _.throttle(v => console.log(v), windowSize);

fn('a');                                     // start of first window
setTimeout(() => fn('b'), windowSize * 0.5); // in first window, but dropped due to successive message
setTimeout(() => fn('c'), windowSize * 0.9); // last message in first window

setTimeout(() => fn('d'), windowSize * 1.2); // first message in new window


// expected output:
// a
// c
// d

Why is this interesting?

In reactive UIs, throttling logic which emits only on a single edge, i.e. either leading or trailing edge, is insufficient as:

The UI should not have to wait for the end of a time window to receive a value when there is only a single value (read: trailing edge only throttling).
Only emitting on the leading edge is insufficient as this can result in permanent inconsistent data (read: leading edge only throttling).

Naming?

Hi,

I only skimmed the README, but shouldn't the Transformations be named "Processor" rather than "Publisher"? (https://github.com/reactor/reactive-streams-commons#supported-transformations)