json-log-filter

High-performance filtering of to-be-logged JSON. Reads, filters and writes JSON in a single step, which increases throughput considerably (roughly 3x-9x compared to the Jackson-based filters). Typical use-cases:

- Filter sensitive values from logs (e.g. on request-/response-logging)
  - technical details like passwords and so on
  - sensitive personal information, for GDPR compliance and such
- Improve log readability by filtering
  - large String elements like base64-encoded binary data, or
  - whole JSON subtrees with low informational value
- Reduce the amount of data sent to log accumulation tools
  - lower cost
  - potentially reduce search / visualization latency
  - keep within max log-statement size
    - GCP: 256 KB
    - Azure: 32 KB

Features:

- Truncate large text values
- Mask (anonymize) scalar values like String, Number, Boolean and so on
- Remove (prune) whole subtrees
- Truncate large documents (max total output size)
- Skip or speed up filtering for the remainder of the document after a number of anonymize and/or prune hits
- Remove whitespace (for pretty-printed documents)
- Metrics for the above operations + total input and output size

The library contains multiple filter implementations, so that each combination of the above features is handled with as little overhead as possible. Equivalent filters are also implemented using Jackson.

Bugs, feature suggestions and help requests can be filed with the issue-tracker.

License

Apache 2.0

Obtain

The project is built with Maven and is available on the central Maven repository.

Maven coordinates

Add the property

<json-log-filter.version>x.x.x</json-log-filter.version>

then add

<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>api</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>
<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>core</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>

and optionally

<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>jackson</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>

Gradle coordinates

For

ext {
  jsonLogFilterVersion = 'x.x.x'
}

add

api("com.github.skjolber.json-log-filter:api:${jsonLogFilterVersion}")
api("com.github.skjolber.json-log-filter:core:${jsonLogFilterVersion}")

and optionally

api("com.github.skjolber.json-log-filter:jackson:${jsonLogFilterVersion}")

Usage

Use a DefaultJsonLogFilterBuilder or JacksonJsonLogFilterBuilder to configure a filter instance (all filters are thread safe):

JsonFilter filter = DefaultJsonLogFilterBuilder.createInstance()
                       .withMaxStringLength(127) // cuts long texts
                       .withAnonymize("$.customer.email") // inserts ***** for values
                       .withPrune("$.customer.account") // removes whole subtree
                       .withMaxPathMatches(16) // halt anon/prune after a number of hits
                       .withMaxSize(128*1024) // limits the total output size
                       .build();
                       
byte[] json = ...; // obtain JSON

String filtered = filter.process(json); // perform filtering

Max string sizes

Configure max string length for output like

{
    "icon": "QUJDREVGR0hJSktMTU5PUFFSU1... + 46"
}
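
The suffix format above can be illustrated with a minimal plain-Java sketch. It assumes the trailing number is the count of removed characters. `TruncateSketch` and `truncate` are hypothetical names used only for illustration, not part of the library, which performs this step while scanning the JSON.

```java
/** Illustration only: not the library's implementation. */
final class TruncateSketch {

    /**
     * Keeps the first maxLength characters and appends a marker with the
     * number of characters that were dropped (assumed meaning of the suffix).
     */
    static String truncate(String value, int maxLength) {
        if (value.length() <= maxLength) {
            return value; // short values pass through unchanged
        }
        int removed = value.length() - maxLength;
        return value.substring(0, maxLength) + "... + " + removed;
    }

    public static void main(String[] args) {
        // Base64 text that is longer than the configured limit of 26 characters
        System.out.println(truncate("QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo=", 26));
    }
}
```

The sketch counts `char` units and ignores details such as JSON escape sequences and surrogate pairs, which a real filter has to handle.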

Mask (anonymize)

Configure anonymize for output like

{
    "password": "*****"
}

for scalar values, and/or for all scalar values contained in objects / arrays:

{
    "credentials": {
        "username": "*****",
        "password": "*****"
    }
}

Remove arrays or objects (prune subtrees)

Configure prune to turn input

{
    "context": {
        "boringData": {
        ...
        },
        "staticData": [ ... ]
    }
}

to output like

{
    "context": "PRUNED"
}

Path syntax

A simple syntax is supported, where each path segment corresponds to a field name. Expressions are case-sensitive. Supported syntax:

/my/field/name

with support for wildcards;

/my/field/*

or a simple any-level field name search

//myFieldName

The filters within this library support using multiple expressions at once. Note that path expressions see through arrays, i.e. array levels do not count as path segments.
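
As a rough illustration of the three expression forms, the following plain-Java sketch matches an expression against the list of field names leading to a value. It is not the library's matcher: `PathSketch` and `matches` are hypothetical names, a wildcard is assumed to stand for exactly one field name, and arrays are assumed to contribute no segment to the field path.

```java
import java.util.List;

/** Illustration only: not the library's implementation. */
final class PathSketch {

    /** Returns true if the expression selects the field reached via fieldPath. */
    static boolean matches(String expression, List<String> fieldPath) {
        if (expression.startsWith("//")) {
            // any-level search: only the innermost field name is compared
            String name = expression.substring(2);
            return !fieldPath.isEmpty()
                    && fieldPath.get(fieldPath.size() - 1).equals(name);
        }
        // absolute path: drop the leading '/', then compare segment by segment
        String[] segments = expression.substring(1).split("/");
        if (segments.length != fieldPath.size()) {
            return false;
        }
        for (int i = 0; i < segments.length; i++) {
            // comparison is case-sensitive; '*' accepts any single field name
            if (!segments[i].equals("*") && !segments[i].equals(fieldPath.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("/my/field/name", List.of("my", "field", "name")));
        System.out.println(matches("/my/field/*", List.of("my", "field", "other")));
        System.out.println(matches("//myFieldName", List.of("a", "b", "myFieldName")));
    }
}
```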

Max path matches

Configure max path matches so that anonymize / prune filtering stops after a given number of matches. If the number of matches is known in advance, this can increase filter speed considerably; throughput approaches pass-through performance when the matches occur near the beginning of the document.

A typical example is a JSON document whose schema has a header + body structure, with the target value located in the header.
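
The idea can be sketched in plain Java on a flat list of name/value pairs (Java 9+ for `Map.entry`). This is a simplified model under stated assumptions, not the library's implementation: `MaxMatchesSketch` and `mask` are hypothetical names, and a single field name stands in for a path expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustration only: not the library's implementation. */
final class MaxMatchesSketch {

    /** Masks values of fields named target, but only for the first maxMatches hits. */
    static List<Map.Entry<String, String>> mask(
            List<Map.Entry<String, String>> fields, String target, int maxMatches) {
        List<Map.Entry<String, String>> out = new ArrayList<>(fields.size());
        int matches = 0;
        int i = 0;
        // filtering phase: every field name is compared against the target
        for (; i < fields.size() && matches < maxMatches; i++) {
            Map.Entry<String, String> field = fields.get(i);
            if (field.getKey().equals(target)) {
                out.add(Map.entry(field.getKey(), "*****"));
                matches++;
            } else {
                out.add(field);
            }
        }
        // pass-through phase: the remainder is copied without any comparison
        out.addAll(fields.subList(i, fields.size()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> fields = List.of(
                Map.entry("token", "a"), Map.entry("body", "b"), Map.entry("token", "c"));
        System.out.println(mask(fields, "token", 1));
    }
}
```

In this model, a limit lower than the actual number of occurrences leaves the later occurrences unfiltered, so the limit should only be set when the number of matches is reliably known.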

Max size

Configure max size to limit the size of the resulting document. This reduces the size of the document by (silently) deleting the JSON content after the limit is reached.

Metrics

Pass in a JsonFilterMetrics argument to the process method like so:

JsonFilterMetrics myMetrics = new DefaultJsonFilterMetrics();
String filtered = filter.process(json, myMetrics); // perform filtering

The resulting metrics could be logged as metadata alongside the JSON payload or passed to sensors like Micrometer for further processing, for example for

- Measuring the impact of the filtering, e.g. the reduction in data size
- Making sure filters are actually operating as intended
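
As a small worked example of the first point, the size reduction can be derived from the total input and output sizes. The sketch below takes the two sizes as plain numbers, since the accessor names on the metrics object are not shown here; `ReductionSketch` and `reductionPercent` are hypothetical names.

```java
/** Illustration only: plain arithmetic on input and output sizes. */
final class ReductionSketch {

    /** Returns the share of the input that filtering removed, in percent. */
    static double reductionPercent(long inputSize, long outputSize) {
        if (inputSize <= 0) {
            return 0.0; // nothing was processed, so nothing was saved
        }
        return 100.0 * (inputSize - outputSize) / inputSize;
    }

    public static void main(String[] args) {
        // e.g. a 2000 byte payload filtered down to 500 bytes
        System.out.println(reductionPercent(2000, 500));
    }
}
```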

Performance

The core processors within this project are faster than the Jackson-based processors. This is expected as parser/serializer features have been traded for performance:

- core is roughly 3x-9x as fast as the Jackson processors, where
  - skipping large parts of JSON documents (prune) decreases the difference, and
  - small documents increase the difference, as Jackson is more expensive to initialize.
- working directly on bytes is faster than working on characters for the core processors.

For a typical, light-weight web service, the overall system performance improvement for using the core filters over the Jackson-based filters will most likely be a few percent.

Memory use will be 2-8 times the raw JSON byte size, depending on the invoked JsonFilter method (some accept a String, others raw bytes or chars).

See the benchmark results (JDK 17) and the JMH module for running detailed benchmarks.

There is also a path artifact which helps facilitate per-path filters for request/response-logging applications, which should further improve performance.
