World's fastest JSON filter for the JVM

json-log-filter

High-performance filtering of to-be-logged JSON. Reads, filters and writes JSON in a single step - drastically increasing throughput (by ~3x-9x). Typical use-cases:

  • Filter sensitive values from logs (e.g. in request/response logging)
    • technical details like passwords and so on
    • sensitive personal information, for GDPR compliance and such
  • Improve log readability by filtering out
    • large String elements like base64-encoded binary data, or
    • whole JSON subtrees with low informational value
  • Reduce the amount of data sent to log aggregation tools
    • lower cost
    • potentially reduce search / visualization latency
    • keep within max log-statement size

Features:

  • Truncate large text values
  • Mask (anonymize) scalar values like String, Number, Boolean and so on.
  • Remove (prune) whole subtrees
  • Truncate large documents (max total output size)
  • Skip or speed up filtering for the remainder of the document after a number of anonymize and/or prune hits
  • Remove whitespace (for pretty-printed documents)
  • Metrics for the above operations + total input and output size

The library contains multiple filter implementations, so that any combination of the above features can be supported with as little overhead as possible. Equivalent filters are also implemented using Jackson.

Bugs, feature suggestions and help requests can be filed with the issue-tracker.

License

Apache 2.0

Obtain

The project is built with Maven and is available on the central Maven repository.

Maven coordinates

Add the property

<json-log-filter.version>x.x.x</json-log-filter.version>

then add

<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>api</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>
<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>core</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>

and optionally

<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>jackson</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>

or

Gradle coordinates

For

ext {
  jsonLogFilterVersion = 'x.x.x'
}

add

api("com.github.skjolber.json-log-filter:api:${jsonLogFilterVersion}")
api("com.github.skjolber.json-log-filter:core:${jsonLogFilterVersion}")

and optionally

api("com.github.skjolber.json-log-filter:jackson:${jsonLogFilterVersion}")

Usage

Use a DefaultJsonLogFilterBuilder or JacksonJsonLogFilterBuilder to configure a filter instance (all filters are thread safe):

JsonFilter filter = DefaultJsonLogFilterBuilder.createInstance()
                       .withMaxStringLength(127) // cuts long texts
                       .withAnonymize("$.customer.email") // inserts ***** for values
                       .withPrune("$.customer.account") // removes whole subtree
                       .withMaxPathMatches(16) // halt anon/prune after a number of hits
                       .withMaxSize(128*1024)
                       .build();
                       
byte[] json = ...; // obtain JSON

String filtered = filter.process(json); // perform filtering                       

Max string sizes

Configure max string length for output like

{
    "icon": "QUJDREVGR0hJSktMTU5PUFFSU1... + 46"
}
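The shape of the truncated value above can be reproduced with a few lines of plain Java. This is only a sketch of the idea, not the library's implementation: the class `TruncateSketch`, the method `truncate` and the exact marker format are assumptions made for illustration.

```java
final class TruncateSketch {

    // Hypothetical helper: keeps the first maxLength chars and appends the
    // number of removed chars, mimicking the "... + 46" marker shown above.
    static String truncate(String value, int maxLength) {
        if (value.length() <= maxLength) {
            return value; // short values pass through untouched
        }
        int removed = value.length() - maxLength;
        return value.substring(0, maxLength) + "... + " + removed;
    }

    public static void main(String[] args) {
        // 26 leading chars followed by 46 more chars of made-up payload
        String base64 = "QUJDREVGR0hJSktMTU5PUFFSU1" + "A".repeat(46);
        System.out.println(truncate(base64, 26));
    }
}
```

The marker keeps the log line short while still recording how much was cut.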

Mask (anonymize)

Configure anonymize for output like

{
    "password": "*****"
}

for scalar values, and/or to mask all scalar values contained in objects / arrays:

{
    "credentials": {
        "username": "*****",
        "password": "*****"
    }
}

Remove arrays or objects (prune subtrees)

Configure prune to turn input

{
    "context": {
        "boringData": {
        ...
        },
        "staticData": [ ... ]
    }
}

to output like

{
    "context": "PRUNED"
}

Path syntax

A simple syntax is supported, where each path segment corresponds to a field name. Expressions are case-sensitive. Supported syntax:

/my/field/name

with support for wildcards:

/my/field/*

or a simple any-level field name search:

//myFieldName

The filters within this library support using multiple expressions at once. Note that path expressions see through arrays: an expression matches fields of objects nested inside arrays without any array-specific syntax.
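To make the matching rules concrete, here is a minimal sketch of how the three expression forms could be evaluated against the field names leading to a value. It is not the library's matcher: `PathMatchSketch` and `matches` are hypothetical names, and the sketch assumes array levels are simply left out of the field path.

```java
final class PathMatchSketch {

    // Returns true if the expression matches the given field path (field names from the root).
    // Assumption: array levels are not part of fieldPath, so paths "see through" arrays.
    static boolean matches(String expression, String[] fieldPath) {
        if (expression.startsWith("//")) {
            // any-level search: only the innermost field name counts
            String name = expression.substring(2);
            return fieldPath.length > 0 && fieldPath[fieldPath.length - 1].equals(name);
        }
        String[] segments = expression.substring(1).split("/");
        if (segments.length != fieldPath.length) {
            return false;
        }
        for (int i = 0; i < segments.length; i++) {
            // "*" matches any field name at this level; names are compared case-sensitively
            if (!segments[i].equals("*") && !segments[i].equals(fieldPath[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, `/my/field/*` matches the path `my`, `field`, `other`, while `/my/field/name` does not match `my`, `field`, `Name`.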

Max path matches

Configure max path matches so that filtering stops after a number of matches. Filter speed can increase considerably if the number of matches is known to be a fixed number, and will approach pass-through performance if those matches are at the beginning of the document.

For example, when the to-be-filtered JSON document follows a schema with a header + body structure and the target value is in the header, the body can be copied through once the header has been filtered.
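The counter + limit idea can be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: a flat `Map` stands in for a streamed JSON document, and `MaxPathMatchesSketch` and `mask` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class MaxPathMatchesSketch {

    // Masks values whose key is a target, but stops checking keys after maxMatches hits.
    static Map<String, String> mask(Map<String, String> fields, Set<String> targets, int maxMatches) {
        Map<String, String> output = new LinkedHashMap<>();
        int remaining = maxMatches;
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            if (remaining > 0 && targets.contains(entry.getKey())) {
                output.put(entry.getKey(), "*****");
                remaining--;
            } else {
                // once the limit is reached this branch is always taken: plain copy
                output.put(entry.getKey(), entry.getValue());
            }
        }
        return output;
    }
}
```

In a byte-oriented filter, the plain-copy branch would presumably become a single bulk copy of the rest of the document, which is where the speed-up comes from.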

Max size

Configure max size to limit the size of the resulting document. This reduces the size of the document by (silently) deleting the JSON content after the limit is reached.

Metrics

Pass in a JsonFilterMetrics argument to the process method like so:

JsonFilterMetrics myMetrics = new DefaultJsonFilterMetrics();
String filtered = filter.process(json, myMetrics); // perform filtering

The resulting metrics could be logged as metadata alongside the JSON payload or passed to sensors like Micrometer for further processing, for example for

  • Measuring the impact of the filtering, e.g. the reduction in data size
  • Making sure filters are actually operating as intended

Performance

The core processors within this project are faster than the Jackson-based processors. This is expected as parser/serializer features have been traded for performance:

  • core is roughly 3x-9x as fast as the Jackson processors, where
  • skipping large parts of JSON documents (prune) decreases the difference, and
  • small documents increase the difference, as Jackson is more expensive to initialize.
  • working directly on bytes is faster than working on characters for the core processors.

For a typical, light-weight web service, the overall system performance improvement for using the core filters over the Jackson-based filters will most likely be a few percent.

Memory use will be 2-8 times the raw JSON byte size, depending on the invoked JsonFilter method (some accept a String, others raw bytes or chars).

See the benchmark results (JDK 17) and the JMH module for running detailed benchmarks.

There is also a path artifact which helps facilitate per-path filters for request/response-logging applications, which should further improve performance.

See also

See the xml-log-filter for corresponding high-performance filtering of XML, and JsonPath for more advanced filtering.

json-log-filter's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Detected dependencies

github-actions
.github/workflows/maven.yml
  • actions/checkout v4
  • actions/setup-java v4
  • codecov/codecov-action v4.3.1
maven
api/pom.xml
base/pom.xml
benchmark/jmh/pom.xml
  • org.openjdk.jmh:jmh-core 1.37
  • org.openjdk.jmh:jmh-generator-annprocess 1.37
  • org.ow2.asm:asm 9.7
  • org.ow2.asm:asm-commons 9.7
  • com.arakelian:json-filter 4.0.1
frameworks/jackson/pom.xml
impl/core/pom.xml
impl/path/pom.xml
pom.xml
  • org.slf4j:slf4j-api 1.7.36
  • com.fasterxml.jackson.core:jackson-core 2.17.1
  • com.fasterxml.jackson.core:jackson-databind 2.17.1
  • org.slf4j:jcl-over-slf4j 1.7.36
  • commons-io:commons-io 2.16.1
  • commons-codec:commons-codec 1.17.0
  • org.junit.jupiter:junit-jupiter-api 5.10.2
  • org.junit.jupiter:junit-jupiter-engine 5.10.2
  • org.mockito:mockito-core 5.11.0
  • com.google.truth:truth 1.4.2
  • com.google.truth.extensions:truth-java8-extension 1.4.2
  • com.jayway.jsonpath:json-path 2.9.0
  • io.github.classgraph:classgraph 4.8.165
  • com.google.jimfs:jimfs 1.3.0
  • org.apache.maven.plugins:maven-compiler-plugin 3.12.1
  • org.apache.maven.plugins:maven-surefire-plugin 3.0.0-M7
  • org.jacoco:jacoco-maven-plugin 0.8.12
  • org.sonatype.plugins:nexus-staging-maven-plugin 1.6.13
  • org.apache.maven.plugins:maven-source-plugin 3.2.1
  • org.apache.maven.plugins:maven-javadoc-plugin 3.6.3
  • org.apache.maven.plugins:maven-release-plugin 3.0.1
  • org.apache.maven.plugins:maven-gpg-plugin 3.1.0
  • org.apache.maven.plugins:maven-shade-plugin 3.5.2
  • org.apache.maven.plugins:maven-enforcer-plugin 3.4.1
  • org.moditect:moditect-maven-plugin 1.2.1.Final
  • org.sonarsource.scanner.maven:sonar-maven-plugin 3.9.1.2184

Add delete filter

Allow for completely deleting a field name + value/structure. Currently the filter always adds a placeholder value, and only deletes content to address max size concerns.

Add metrics

Add metrics for number of filter interactions.

Support code points on truncate

Make sure we're not leaving half a character in the truncated output. Check for a high/low surrogate on the last kept character.
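A minimal sketch of such a check, using only `java.lang.Character`. The names `CodePointTruncateSketch` and `truncate` are hypothetical; the issue does not say how the library would implement this.

```java
final class CodePointTruncateSketch {

    // Returns at most maxLength chars, backing off by one char if the cut
    // would otherwise fall between a high and a low surrogate.
    static String truncate(String value, int maxLength) {
        if (value.length() <= maxLength) {
            return value;
        }
        int end = maxLength;
        if (end > 0 && Character.isHighSurrogate(value.charAt(end - 1))) {
            end--; // last kept char is the first half of a pair: drop it too
        }
        return value.substring(0, end);
    }
}
```

The output is then at most one char shorter than the configured limit, but never ends in half a character.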

Add counter for documents with known number of target fields

Add functionality to stop filtering (i.e. stop parsing) for documents with a known number of target fields. For example, take a document with a header + body style schema: if the target is in the header, then after filtering the header, the rest of the document (body) can be skipped and copied through.

Simple implementation: counter + limit.
More complex implementation: exit level or exit path

Add support for max document length

Let users configure a max output size.

  • scan through document and keep track of object / array parent
  • at the limit, close the structure and add an informative message
  • consider parsing from the end of the document if that is shorter than from the front
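The first two steps above could look roughly like this. It is a sketch, not the library's implementation: `CloseStructureSketch` and `closingBrackets` are hypothetical names, and the code assumes a well-formed JSON prefix cut at a point where no string, number or field name is left half-written.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CloseStructureSketch {

    // Scans a JSON prefix and returns the brackets needed to close
    // every object / array that is still open at the end of the prefix.
    static String closingBrackets(String jsonPrefix) {
        Deque<Character> open = new ArrayDeque<>();
        boolean inString = false;
        for (int i = 0; i < jsonPrefix.length(); i++) {
            char c = jsonPrefix.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                open.push('}'); // remember the matching closer
            } else if (c == '[') {
                open.push(']');
            } else if (c == '}' || c == ']') {
                open.pop(); // structure closed within the prefix
            }
        }
        StringBuilder closers = new StringBuilder();
        while (!open.isEmpty()) {
            closers.append(open.pop()); // innermost structure is closed first
        }
        return closers.toString();
    }
}
```

Appending the returned closers to the prefix yields a structurally complete document. An informative message, as suggested above, would have to be written before the closers.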
