Sidereal Events is a standalone HTTP server providing reactive queries over streaming CloudEvents. When used in conjunction with Change Data Capture, it can turn any database into a real-time database.
A query in Sidereal Events is a disjunction of conjunctions, encoded in disjunctive normal form, e.g. (a AND b) OR (c AND d AND e) OR ..., of key/value terms, e.g. (key1=value1 AND key2=value2) OR (key3=value3 AND key4=value4) OR ....
Queries are internally indexed by every conjunction, by their terms (e.g. key2=value2 is an index entry), as an inversion of the usual practice of data being indexed by their fields. This query indexing allows Sidereal Events to scale efficiently to thousands of concurrent queries while still ingesting tens of thousands of events per second.
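To make the idea of indexing queries by their terms concrete, here is a minimal Python sketch. It is not Sidereal Events' actual data structure; the class name `QueryIndex` and its methods are hypothetical, and it only handles flat events whose values are strings or numbers.

```python
from collections import defaultdict


class QueryIndex:
    """Toy inverted index: each term points at the conjunctions containing it."""

    def __init__(self):
        # (key, value) -> set of (query_id, conjunction_number)
        self._by_term = defaultdict(set)
        # (query_id, conjunction_number) -> number of distinct terms required
        self._sizes = {}

    def add_query(self, query_id, dnf):
        """Register a query given as a list of conjunctions of (key, value) terms."""
        for number, conjunction in enumerate(dnf):
            terms = set(conjunction)
            self._sizes[(query_id, number)] = len(terms)
            for term in terms:
                self._by_term[term].add((query_id, number))

    def match(self, event_data):
        """Return the ids of queries with at least one fully satisfied conjunction."""
        hits = defaultdict(int)
        # Only the terms present in the event are looked up, so the cost
        # depends on the event's size rather than on the number of queries.
        for term in event_data.items():
            for conjunction_id in self._by_term.get(term, ()):
                hits[conjunction_id] += 1
        return {
            query_id
            for (query_id, number), count in hits.items()
            if count == self._sizes[(query_id, number)]
        }


index = QueryIndex()
index.add_query("q1", [[("a", "1"), ("b", "2")], [("c", "3")]])
index.add_query("q2", [[("a", "1"), ("d", "4")]])
print(index.match({"a": "1", "b": "2"}))
```

A conjunction matches when every one of its terms was found in the event, which is why the sketch compares the hit count against the conjunction's size.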
Sidereal Events is still very much a work in progress, and isn't yet at a state where it can be considered "production ready". Some aspects of it might change in a breaking manner in the near future. If you'd like to try Sidereal Events out, please proceed with caution.
You'll need a Java Development Kit (JDK) supporting Java 17 or greater. Sidereal Events builds and runs correctly using any of
Sidereal Events is compiled using Gradle, and this repository includes the Gradle Wrapper, so only a suitable JDK needs to be explicitly installed to build this project.
Run the server with the command
./gradlew run --args="serve"
This will start the Sidereal Events server listening for HTTP connections over TCP port 8232. Additional command-line flags can be passed in the string given to Gradle's --args option. Run
./gradlew run --args="serve --help"
for a list of these flags, or see the Server Configuration section for more details.
Automated tests can be run with
./gradlew test
Gradle can build a directly runnable JAR file with
./gradlew jar
The resulting JAR is saved to ./sidereal/build/libs/sidereal.jar and can be run directly using
java -jar ./sidereal/build/libs/sidereal.jar serve
Native builds are made possible by the Gradle Plugin for GraalVM Native Image Building. As a prerequisite for its use, you will need to
- Install a GraalVM JDK, which will also include the Native Build tooling.
- Export the location of the installed GraalVM JDK with the environment variable GRAALVM_HOME.
  On macOS, if your version of GraalVM covers a specific version of Java for which another JDK is not available, e.g. you have installed GraalVM 21 and do not have any other JDK installed for Java 21, you can set GRAALVM_HOME by using the java_home tool. For example,
  export GRAALVM_HOME=$(/usr/libexec/java_home -v 21)
Once GraalVM has been installed, and its installation path has been stored in the GRAALVM_HOME environment variable, you can build a native executable with
./gradlew nativeBuild
The resulting artifact is saved to ./sidereal/build/native/nativeCompile/sidereal.
Automated tests can be run against a native executable by running
./gradlew nativeTest
Sidereal Events is available under the terms of the MIT License as per the LICENSE file. Other licensing information is presented using SPDX License IDs embedded in the source files.
More information regarding the technical practice of maintaining license information is available in docs/development/licenses.md.
Events are sent to named "channels" by issuing an HTTP POST containing JSON against the path
/channels/CHANNEL-NAME
where CHANNEL-NAME is a percent-encoded string containing at least 1 character. For example, a CHANNEL-NAME of "with/slash" would be encoded as
/channels/with%2fslash
The name meta is used internally to report query registration, and does not receive events from external sources. Attempts to send an event to the meta channel will result in an HTTP 403 response.
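As a small illustration of the channel path encoding, the following Python sketch builds the request path for a channel name. The helper name `channel_path` is hypothetical; only the standard library's `urllib.parse.quote` is used.

```python
from urllib.parse import quote


def channel_path(channel_name: str) -> str:
    """Build the /channels/... path for a channel name."""
    if not channel_name:
        raise ValueError("channel names must contain at least 1 character")
    # safe="" makes quote() escape "/" as well, so a slash inside the
    # name cannot be mistaken for a path separator.
    return "/channels/" + quote(channel_name, safe="")


print(channel_path("with/slash"))
```

Python emits upper-case hex digits (`%2F`), which is equivalent to the lower-case `%2f` form, since percent-encoded hex digits are case-insensitive.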
The Content-Type of the data sent to this endpoint can be one of
- application/json
- application/cloudevents+json
- application/cloudevents-batch+json
with the contents interpreted according to the HTTP Protocol Binding for CloudEvents. Of note is how Content-Type: application/json implicitly describes the CloudEvent datacontenttype as application/json.
The CloudEvent specification expects each combination of "source" and "id" to be globally unique. Sidereal Events internally keeps track of these "source" and "id" combinations it has received, over a configurable time horizon with a configurable number of remembered events. If a producer tries to send a "source", "id" combination to a channel that has already received this combination, Sidereal Events will report the publication as having been successful, but will not deliver the event to consumers. Within the constraints of the time horizon and maximum remembered "source", "id" combinations, this makes event publishing an idempotent operation.
Note that the same "source", "id" combination can be published to multiple channels. Each channel will send the data for the same "source", "id" combination at least once.
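The deduplication behaviour can be pictured with a bounded, expiring set. The sketch below is an assumption about how such tracking could work, not a description of Sidereal Events' internals; the class name `IdempotencyCache` and its parameters are hypothetical.

```python
from collections import OrderedDict
import time


class IdempotencyCache:
    """Remembers (channel, source, id) keys for a limited time and count."""

    def __init__(self, max_keys, expiration_seconds, clock=time.monotonic):
        self._max_keys = max_keys
        self._expiration = expiration_seconds
        self._clock = clock
        # key -> time first seen; insertion order is oldest first
        self._seen = OrderedDict()

    def should_deliver(self, channel, source, event_id):
        """Return True the first time a key is seen within the horizon."""
        now = self._clock()
        # Forget entries that have aged out of the time horizon.
        while self._seen:
            oldest_key, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self._expiration:
                break
            del self._seen[oldest_key]
        key = (channel, source, event_id)
        if key in self._seen:
            return False  # duplicate: acknowledge, but do not deliver again
        self._seen[key] = now
        # Enforce the maximum number of remembered keys, oldest first.
        while len(self._seen) > self._max_keys:
            self._seen.popitem(last=False)
        return True
```

Including the channel in the key reflects the note above: the same "source", "id" combination is tracked separately for each channel it is published to.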
Consumers receive an event stream by issuing an HTTP GET against the same path used to send events. The response body of this HTTP GET follows the Server-sent events specification, with the event name being "data".
For example, if an event is sent with
POST /channels/events HTTP/1.1
Content-Type: application/json
Ce-Source: //somewhere
Ce-Id: some-id
Ce-Type: some.type
...
Then the event will be present in a GET to the same path:
GET /channels/events HTTP/1.1
...
HTTP/1.1 200 OK
Connection: keep-alive
transfer-encoding: chunked
Content-Type: text/event-stream; charset=utf-8
event: connect
data: {"timestamp":"...","clientID":"..."}
... After the event is sent
event: data
id: %2F%2Fsomewhere+some-id
data: {"source":"//somewhere","id":"some-id","type":"some.type","data":{...}}
The contents of the Server-sent event are formatted according to the JSON Event Format for CloudEvents.
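A consumer has to split the stream into individual server-sent events before decoding the JSON. The following Python sketch shows one way to do that over an iterable of text lines; the function name `parse_sse` is hypothetical, and it deliberately simplifies the specification (for instance, it does not carry the last event ID over between events).

```python
import json


def parse_sse(lines):
    """Yield (event_name, event_id, data) for each complete server-sent event."""
    name, event_id, data = "message", None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield name, event_id, "\n".join(data)
            name, event_id, data = "message", None, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            name = value
        elif field == "id":
            event_id = value
        elif field == "data":
            data.append(value)


stream = [
    'event: connect',
    'data: {"timestamp":"...","clientID":"..."}',
    '',
    'event: data',
    'id: %2F%2Fsomewhere+some-id',
    'data: {"source":"//somewhere","id":"some-id","type":"some.type"}',
    '',
]
for name, event_id, data in parse_sse(stream):
    if name == "data":
        print(event_id, json.loads(data)["type"])
```

Multiple `data:` lines belonging to one event are joined with newlines, which JSON treats as insignificant whitespace, so a payload split across several `data:` lines still decodes as a single document.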
Sidereal Events is designed to efficiently support deep-content filtering of
its JSON input across thousands of connected clients, with multiple query
terms as part of a logical disjunction of conjunctions (e.g.
(a AND b AND c) OR (d AND e) OR ...
). This filtering is enabled by passing
the terms of the filter as an HTTP query string. For example, if a consumer
were to connect to a channel using
GET /channels/example?one="one"&two="two"&three=3 HTTP/1.1
...
And the producer were to send the events
POST /channels/example HTTP/1.1
Content-Type: application/cloudevents-batch+json
[
{
"source": "somewhere",
"id": "1",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "one",
"two": "two",
"three": 3,
"four": "four"
}
}, {
"source": "somewhere",
"id": "2",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "one",
"two": "two",
"three": 3,
"four": "five"
}
}, {
"source": "somewhere",
"id": "3",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"two": "two",
"three": 3,
"four": "five"
}
}, {
"source": "somewhere",
"id": "4",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "two",
"two": "two",
"three": 3
}
}, {
"source": "somewhere",
"id": "5",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "one",
"two": "one",
"three": 3
}
}, {
"source": "somewhere",
"id": "6",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "one",
"two": 2,
"three": 3
}
}, {
"source": "somewhere",
"id": "7",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "one",
"two": "two",
"three": 4
}
}, {
"source": "somewhere",
"id": "8",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"one": "one",
"two": "two",
"three": "three"
}
}
]
then a client receiving events for the "example" channel would see
GET /channels/example?one="one"&two="two"&three=3 HTTP/1.1
...
HTTP/1.1 200 OK
Connection: keep-alive
transfer-encoding: chunked
Content-Type: text/event-stream; charset=utf-8
event: connect
data: {"timestamp":"...","clientID":"..."}
event: data
id: somewhere+1
data: {"source":"somewhere","id":"1","type":"com.example.sidereal","specversion":"1.0","data":
data: {"one":"one","two":"two","three":3,"four":"four"}}
event: data
id: somewhere+2
data: {"source":"somewhere","id":"2","type":"com.example.sidereal","specversion":"1.0","data":
data: {"one":"one","two":"two","three":3,"four":"five"}}
for the following reasons:
- 1 would match because data["one"] == "one", data["two"] == "two", data["three"] == 3. The contents, or even presence, of data["four"] has no effect on the given filter.
- 2 would match because data["one"] == "one", data["two"] == "two", data["three"] == 3. Similar to 1, the contents, or even presence, of data["four"] has no effect.
- 3 would not match because data["one"] is not present.
- 4 would not match because data["one"] == "two" when we expected data["one"] == "one".
- 5 would not match because data["two"] == "one" when we expected data["two"] == "two".
- 6 would not match because data["two"] == 2 when we expected data["two"] == "two".
- 7 would not match because data["three"] == 4 when we expected data["three"] == 3.
- 8 would not match because data["three"] == "three" when we expected data["three"] == 3.
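The matching rules in the list above can be restated as a few lines of Python. This is only a sketch of the semantics for top-level equality terms, not the server's implementation; `json_equal` and `matches_all` are hypothetical names.

```python
_MISSING = object()  # stands in for "key not present"


def json_equal(actual, expected):
    """Compare two JSON scalars without coercing between types."""
    # Python treats True == 1, so booleans need their own branch.
    if isinstance(actual, bool) or isinstance(expected, bool):
        return isinstance(actual, bool) and isinstance(expected, bool) and actual == expected
    if isinstance(actual, (int, float)) and isinstance(expected, (int, float)):
        return actual == expected
    return type(actual) is type(expected) and actual == expected


def matches_all(data, terms):
    """True when every term's key is present in data with an equal value."""
    return all(
        json_equal(data.get(key, _MISSING), expected)
        for key, expected in terms.items()
    )


terms = {"one": "one", "two": "two", "three": 3}
print(matches_all({"one": "one", "two": "two", "three": 3, "four": "four"}, terms))
print(matches_all({"one": "one", "two": 2, "three": 3}, terms))
```

Keys that are not named in the filter, such as "four", are never inspected, which is why their contents or absence cannot affect the result.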
Disjunctions of conjunctions are made possible by using the now-historical ; separator character. This separator has lower precedence than the & separator character, so terms joined by & are grouped into a conjunction before the groups separated by ; are combined. For example, if a consumer were to connect to a channel using
GET /channels/example?one="one"&two="two"&three=3;one=1&two=2&three="three" HTTP/1.1
...
this would have similar results to connecting twice with both
GET /channels/example?one="one"&two="two"&three=3 HTTP/1.1
...
GET /channels/example?one=1&two=2&three="three" HTTP/1.1
...
but with the added benefits of
- Only requiring one HTTP connection in the server-sent events interface
- Only reporting data once, even if multiple conjunctions are matched
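The grouping of `;` and `&` can be sketched as a two-level split of the query string. The Python below is an illustration under assumptions: `parse_query` and `decode_value` are hypothetical names, values are decoded as JSON scalars with a fallback to plain strings, and filter operator prefixes are not handled.

```python
import json
from urllib.parse import unquote


def decode_value(text):
    """Decode a JSON scalar, falling back to treating the text as a string."""
    try:
        value = json.loads(text)
    except ValueError:
        return text
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    return text  # arrays and objects are not valid match values


def parse_query(query_string):
    """Split a query string into a list of conjunctions of (key, value) terms."""
    dnf = []
    for group in query_string.split(";"):  # ";" separates the OR-ed groups
        conjunction = []
        for term in group.split("&"):  # "&" separates the AND-ed terms
            if not term:
                continue
            key, _, raw_value = term.partition("=")
            # Percent-decoding happens after splitting, so encoded
            # separators inside keys or values are preserved.
            conjunction.append((unquote(key), decode_value(unquote(raw_value))))
        if conjunction:
            dnf.append(conjunction)
    return dnf


print(parse_query('one="one"&two="two"&three=3;one=1&two=2&three="three"'))
```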
By default, if a key does not start with /
or ../
, it is assumed to be a
literal key within the "data" object of the event. For example, a query string
of the form
some.key="value"
is interpreted to match
{
"source": "...",
"id": "...",
"type": "...",
"specversion": "1.0",
"data": {
"some.key": "value"
}
}
Access to keys within JSON documents is made possible by using JSON Pointers. As an example, to match "some", then "key" in
{
"source": "...",
"id": "...",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"some": {
"key": "value"
}
}
}
you would use a query string
?/some/key="value"
Note that ~ in a key path component must be replaced with ~0, and / in a key path component must be replaced with ~1. The replacement of ~ with ~0 should occur before replacing / with ~1, so that the ~1 produced for a / is not itself rewritten as ~01. For a key path of
data["with/slash"]["with~tilde"]
the JSON Pointer encoding would be
?/with~1slash/with~0tilde
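The escaping order matters enough that a tiny helper is worth sketching. The function name `encode_pointer` is hypothetical; the escaping itself follows the JSON Pointer rules described above.

```python
def encode_pointer(path):
    """Encode a list of key path components as a JSON Pointer."""
    # "~" is escaped first; escaping "/" first would turn its "~1" into "~01".
    return "".join(
        "/" + part.replace("~", "~0").replace("/", "~1")
        for part in path
    )


print(encode_pointer(["with/slash", "with~tilde"]))
```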
Arrays can be accessed with zero-based, non-negative integer indices as the "key" in the reference. For example, the following data
{
"source": "...",
"id": "...",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"some": {
"key": [
"first",
"second",
"third",
"fourth"
]
}
}
}
would match the following query:
?/some/key/2="third"
Array access can be combined with deeper key access. The following data
{
"source": "...",
"id": "...",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"some": {
"key": [
{
"name": "first",
"value": 1
},
{
"name": "second",
"value": 2
}
]
}
}
}
would match the following query:
?/some/key/0/name="first"
Keys are matched starting from the "data" key in the resulting CloudEvent by default. As an extension to JSON Pointers, if a key starts with .. and the remainder is a JSON Pointer, the key is matched starting from the object root. As an example, to match the CloudEvent type in
{
"source": "...",
"id": "...",
"type": "com.example.sidereal",
"specversion": "1.0",
"data": {
"some": {
"key": "value"
}
}
}
you would use a query string
?../type="com.example.sidereal"
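The three key forms (literal key, JSON Pointer, and the `..` extension) can be summarised in a short resolver. This Python sketch is an interpretation of the rules above rather than the server's code; `resolve`, `decode_pointer`, and `MISSING` are hypothetical names.

```python
MISSING = object()  # returned when the key path does not exist


def decode_pointer(pointer):
    """Split a JSON Pointer into unescaped components."""
    # Decoding reverses the encoding order: "~1" first, then "~0".
    return [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]


def resolve(event, key):
    """Look up a query key inside a CloudEvent represented as a dict."""
    if key.startswith("../"):
        node, parts = event, decode_pointer(key[2:])  # from the event root
    elif key.startswith("/"):
        node, parts = event.get("data", MISSING), decode_pointer(key)
    else:
        node, parts = event.get("data", MISSING), [key]  # literal key in "data"
    for part in parts:
        if isinstance(node, dict) and part in node:
            node = node[part]
        elif (isinstance(node, list) and part.isascii() and part.isdigit()
              and int(part) < len(node)):
            node = node[int(part)]  # zero-based array index
        else:
            return MISSING
    return node


event = {
    "type": "com.example.sidereal",
    "data": {"some.key": "value", "some": {"key": ["first", "second", "third"]}},
}
print(resolve(event, "some.key"), resolve(event, "/some/key/2"), resolve(event, "../type"))
```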
Values are encoded according to their JSON representation. Only null
,
booleans, numbers, and strings are supported as match values. If a value
cannot be decoded as null
, a boolean, or a number, and does not start
with a filter operator prefix, it is assumed to be a string.
Sidereal Events supports more filters than just field equality. The following additional operators are available, but many with caveats on the number of operators per query.
- Logical Not, with a value prefix of !. This can be used multiple times in a single query. As an example, ?../type=!"com.example.sidereal", or ..%2Ftype=%21%22com.example.sidereal%22 if using strict percent-encoding.
- Array Contains, with a value prefix of [. This can be used multiple times in a single query. As an example, ?../type=["com.example.sidereal", or ?..%2Ftype=%5B%22com.example.sidereal%22 if using strict percent-encoding.
- Less Than, with a value prefix of <. This can only be used once in a single query, and precludes the use of Less Than or Equal and Starts With operators. It may be used in conjunction with Greater Than or Equal and Greater Than only if these operators are used with the same key. As an example, ?../type=<"com.example.sidereal", or ?..%2Ftype=%3C%22com.example.sidereal%22 if using strict percent-encoding.
- Less Than or Equal, with a value prefix of <=. This can only be used once in a single query, and precludes the use of the Less Than and Starts With operators. It may be used in conjunction with Greater Than or Equal and Greater Than only if these operators are used with the same key. As an example, ?../type=<="com.example.sidereal", or ?..%2Ftype=%3C%3D%22com.example.sidereal%22 if using strict percent-encoding.
- Greater Than or Equal, with a value prefix of >=. This can only be used once in a single query, and precludes the use of the Greater Than and Starts With operators. It may be used in conjunction with Less Than and Less Than or Equal only if these operators are used with the same key. As an example, ?../type=>="com.example.sidereal", or ?..%2Ftype=%3E%3D%22com.example.sidereal%22 if using strict percent-encoding.
- Greater Than, with a value prefix of >. This can only be used once in a single query, and precludes the use of the Greater Than or Equal and Starts With operators. It may be used in conjunction with the Less Than and Less Than or Equal operators only if these operators are used with the same key. As an example, ?../type=>"com.example.sidereal", or ?..%2Ftype=%3E%22com.example.sidereal%22 if using strict percent-encoding.
- Starts With, with a value prefix of ~. This can only be used once in a single query, can only be used with string values, and precludes the use of the Less Than, Less Than or Equal, Greater Than or Equal, and Greater Than operators. As an example, ?../type=~"com.example.sidereal", or ?..%2Ftype=%7E%22com.example.sidereal%22 if using strict percent-encoding.
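Because `<=` and `>=` share a first character with `<` and `>`, any code that recognises these prefixes has to test the longer ones first. The Python sketch below illustrates that, along with strict percent-encoding of a term; the operator names and the functions `split_operator` and `strict_term` are hypothetical and are not taken from the Sidereal Events source.

```python
from urllib.parse import quote

# Two-character prefixes come first so "<=" is not mistaken for "<".
OPERATORS = [
    ("<=", "less_than_or_equal"),
    (">=", "greater_than_or_equal"),
    ("<", "less_than"),
    (">", "greater_than"),
    ("!", "not"),
    ("[", "array_contains"),
    ("~", "starts_with"),
]


def split_operator(raw_value):
    """Return (operator_name, remaining_value) for a decoded query value."""
    for prefix, name in OPERATORS:
        if raw_value.startswith(prefix):
            return name, raw_value[len(prefix):]
    return "equals", raw_value


def strict_term(key, raw_value):
    """Percent-encode one key=value term, escaping every reserved character."""
    # Note: Python leaves "~" unescaped because it is an unreserved
    # character; the "%7E" form shown above is equivalent.
    return quote(key, safe="") + "=" + quote(raw_value, safe="")


print(split_operator('<="com.example.sidereal"'))
print(strict_term("../type", '!"com.example.sidereal"'))
```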
Sidereal Events accepts configuration through command-line flags and environment variables.
-
Flag: --server-port
Environment Variable: SIDEREAL_SERVER_PORT
Type: Integer
Default Value: 8232
Sidereal Events will listen for HTTP connections over this TCP port.
-
Flag: --source-name
Environment Variable: SIDEREAL_SOURCE_NAME
Type: String
Default Value: //name.djsweet.sidereal
CloudEvents emitted by Sidereal Events will use this string as the "source" metadata.
-
Flag: --log-level
Environment Variable: SIDEREAL_LOG_LEVEL
Type: One of trace, debug, info, warn, or error
Default Value: info
Sets the minimum logging level. Log levels are defined in a hierarchy, with trace being the lowest and error being the highest. If this is set to info, then all logs at a level of INFO, WARN, and ERROR are generated, but TRACE and DEBUG are ignored.
-
Flag: --router-threads
Environment Variable: SIDEREAL_ROUTER_THREADS
Type: Integer
Default Value: Number of logical CPU threads reported by the operating system.
Sidereal Events will spawn this many operating system threads to route events to consuming queries.
-
Flag: --translator-threads
Environment Variable: SIDEREAL_TRANSLATOR_THREADS
Type: Integer
Default Value: Number of logical CPU threads reported by the operating system.
Sidereal Events will spawn this many operating system threads to translate CloudEvents into its internal indexing representation.
-
Flag: --web-server-threads
Environment Variable: SIDEREAL_WEB_SERVER_THREADS
Type: Integer
Default Value: Twice the number of logical CPU threads reported by the operating system.
Sidereal Events will spawn this many operating system threads to service HTTP requests.
-
Flag: --max-body-size-bytes
Environment Variable: SIDEREAL_MAX_BODY_SIZE_BYTES
Type: Integer
Default Value: 10,485,760 (10MB)
Sidereal Events will reject HTTP bodies with a content length greater than this value, responding with an HTTP 413.
-
Flag: --max-idempotency-keys
Environment Variable: SIDEREAL_MAX_IDEMPOTENCY_KEYS
Type: Integer
Default Value: 1,048,576
Sidereal Events will retain this many "source", "id" combinations in a set before discarding the oldest values. Setting this value too low may cause duplicate publishes of events to become non-idempotent, but setting this value too high will result in excess memory usage.
-
Flag: --max-json-parsing-recursion
Environment Variable: SIDEREAL_MAX_JSON_PARSING_RECURSION
Type: Integer
Default Value: 64
Sidereal Events will recurse this deep when translating JSON into its internal indexed representation. At nested objects deeper than the configured value, Sidereal Events will use a stack-iterative algorithm that requires heap allocation. This value is chosen to trade off performance with StackOverflowError exceptions. While Sidereal Events dynamically configures itself to avoid StackOverflowErrors in other areas, it is not expected for JSON documents to contain thousands of levels of nesting, and thus it is left as a configurable value.
-
Flag: --max-outstanding-events-per-router-thread
Environment Variable: SIDEREAL_MAX_OUTSTANDING_EVENTS_PER_ROUTER_THREAD
Type: Integer
Default Value: 131,072
Sidereal Events keeps track of the number of events present "within" the system. An event must be delivered to all interested consumers before it is no longer tracked as being outstanding. If the number of outstanding events exceeds this number multiplied by the number of routing threads, Sidereal Events will respond to producers with an HTTP 429, establishing backpressure within the event routing path. The producers are expected to re-attempt the publication of their events after a brief period of waiting when encountering this HTTP 429.
-
Flag: --max-query-terms
Environment Variable: SIDEREAL_MAX_QUERY_TERMS
Type: Integer
Default Value: 32
Sidereal Events limits the number of filter terms available to consumers to prevent excessively large query indices. If a client attempts to use more filters than this configured value, Sidereal Events will reply with an HTTP 400.
-
Flag: --body-timeout-ms
Environment Variable: SIDEREAL_BODY_TIMEOUT_MS
Type: Integer
Default Value: 60,000
After all HTTP headers are received, Sidereal Events will wait up to this many milliseconds to receive the entire request body. If the request body is not fully received within this time, Sidereal Events will respond with an HTTP 408.
-
Flag: --idempotency-expiration-ms
Environment Variable: SIDEREAL_IDEMPOTENCY_EXPIRATION_MS
Type: Integer
Default Value: 180,000
Sidereal Events will remove its record of a "source", "id" combination from its internal tracking set after this many milliseconds. Any transmission of the same "source", "id" combination after the combination is removed from the tracking set will result in the data being sent to consumers.
-
Flag: --tcp-idle-timeout-ms
Environment Variable: SIDEREAL_TCP_IDLE_TIMEOUT_MS
Type: Integer
Default Value: 180,000
Sidereal Events will close a TCP connection after this many milliseconds of no activity. This prevents connections dropped without a TCP FIN or TCP RST from consuming resources.
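The HTTP 429 backpressure described for --max-outstanding-events-per-router-thread leaves the retry policy to the producer. One possible producer-side approach is exponential backoff, sketched below in Python. The function `publish_with_retry` and its parameters are hypothetical, and `send` stands for any caller-supplied function that performs the POST and returns the HTTP status code.

```python
import time


def publish_with_retry(send, max_attempts=5, initial_delay=0.1, sleep=time.sleep):
    """Call send() until it returns a status other than 429, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status != 429 or attempt == max_attempts:
            return status
        sleep(delay)  # wait briefly before re-attempting the publication
        delay *= 2    # back off further on each consecutive 429
```

Since duplicate "source", "id" combinations are ignored within the idempotency horizon, re-sending the same event after a 429 or an ambiguous failure should not cause consumers to see it twice.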
Metric observability for Sidereal Events is available by requesting
GET /metrics
The response body follows Prometheus' text-based format. The currently exposed metrics are
- sidereal_data_byte_budget
  A gauge indicating the maximum size of any key/value pair in a query term. This is unlikely to change during normal execution, but may be lowered automatically if Sidereal Events encounters a StackOverflowError when routing an event to consuming queries.
- sidereal_outstanding_events
  A gauge indicating the number of events accepted by Sidereal Events, but not yet confirmed delivered to consumers.
- sidereal_active_queries
  A gauge indicating the number of queries being serviced, labeled per router.
- sidereal_event_routing_seconds
  A summary, without quantiles, of the time (in seconds) spent routing events to consuming queries, labeled per router.
- sidereal_idempotency_key_cache_size
  A gauge indicating the number of "source", "id" combinations saved in the tracking set, labeled per router.
- sidereal_json_translation_seconds
  A summary, without quantiles, of the time (in seconds) spent translating events into Sidereal Events' internal index representation, labeled per translator.
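For quick inspection without a full Prometheus client, the text format can be read with a few lines of Python. This is a simplified sketch: `parse_metrics` is a hypothetical helper, it ignores comment lines and timestamps, and it leaves escape sequences in label values untouched. The sample text is invented for illustration, including the label name `router`, and is not real server output.

```python
import re

_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from Prometheus text output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples


# Hypothetical sample; real output will differ.
sample = """\
# TYPE sidereal_outstanding_events gauge
sidereal_outstanding_events 12
sidereal_active_queries{router="0"} 3
"""
for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```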