bric3 / drain-java

This is a pet project to explore log pattern extraction using DRAIN

License: Mozilla Public License 2.0

Java 100.00%
drain log tail java template-mining

drain-java's Issues

Replace the JVM file watcher with an alternative that is not affected by filesystem boundaries

The JVM's WatchService suffers from a few drawbacks in how it integrates with the OS. On Linux in particular, events from bind mounts are not received.

Let's investigate alternatives, in particular the Gradle native integration: https://github.com/gradle/native-platform

+    implementation("net.rubygrapefruit:file-events:0.22")
+    implementation("net.rubygrapefruit:native-platform:0.22")

File watching capabilities only just appeared in a 0.22 milestone. Unfortunately it is not completely released: the platform-specific native libraries are not published on Bintray for the published milestone.

Releases to follow: https://github.com/gradle/native-platform/releases

However, native-platform:0.21 is available, so it's possible to play with some APIs like the terminal or files, e.g.:

try {
    Terminals terminals = Native.get(Terminals.class);
    var isTerminal = terminals.withAnsiOutput().isTerminal(Output.Stdout);

    if (isTerminal) {
        var terminal = terminals.withAnsiOutput().getTerminal(Output.Stdout);
        terminal.write("Hello");
        SECONDS.sleep(5);
        terminal.cursorStartOfLine()
                .clearToEndOfLine()
                .bold().write("Bold hello")
                .reset();
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}

Resolve `MappedFileLineReaderTest` on Windows and macOS builds

On Windows the build fails with

  • MappedFileLineReaderTest.find_start_position_given_last_lines()

    org.opentest4j.AssertionFailedError:
    expected: 42L
    but was : 43L

  • MappedFileLineReaderTest.can_read_from_position()

    org.opentest4j.AssertionFailedError:
    expected: 183L
    but was : 186L

  • MappedFileLineReaderTest.should_watch_with_channel_sink(Path)

java.io.IOException: Failed to delete temp directory C:\Users\RUNNER~1\AppData\Local\Temp\junit8439001560896197356. The following paths could not be deleted (see suppressed exceptions for details): , test4653189040998961269log

On macOS the build fails with

  • MappedFileLineReaderTest.should_watch_with_channel_sink(Path)

org.opentest4j.AssertionFailedError:
expected: 592L
but was : 38L

Add a mechanism to retain log event metadata such as the severity

Drain is a log mining algorithm; its idea is to find patterns and group the messages of similar log events.

Good practice is to have the miner process only the message part, i.e. to strip elements like the date, the severity, the thread, and the name.

Yet it might be interesting to keep some of this information. For example, the severity or the log name are unlikely to have a high cardinality and may be good candidates for log cluster metadata.

2021-03-29 12:55:24.172 [] DEBUG --- [  restartedMain] o.s.b.w.s.ServletContextInitializerBeans : Mapping filters: filterRegistrationBean urls=[/*] order=-2147483647, requestContextFilter urls=[/*] order=-1, contextServletRequestFilter urls=[/*] order=-2147483648, characterEncodingFilter urls=[/*] order=-2147483648, edgeRequestContextFilter urls=[/*] order=-2147483646, hideEdgeTechnicalEndpointsFilter urls=[/*] order=-2147483646, enableDebugLogsFilter urls=[/*] order=-2147483645, newrelicTransactionsFilter urls=[/*] order=-2147483645, accountingFilter urls=[/*] order=-2147483644, formContentFilter urls=[/*] order=-9900, disabledForwardedHeaderFilter urls=[/*] order=2147483647
2021-03-29 12:55:24.173 [] DEBUG --- [  restartedMain] o.s.b.w.s.ServletContextInitializerBeans : Mapping servlets: metricsService urls=[/metrics], dispatcherServlet urls=[/rest/*, /doc/*, /actuator/*, /error/*, /favicon.ico], com.blablacar.common.java.web.JerseyConfig urls=[/*]
2021-03-29 12:55:24.554 [] INFO  --- [  restartedMain] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 3 endpoint(s) beneath base path '/actuator'
2021-03-29 12:55:24.606 [] INFO  --- [  restartedMain] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'
2021-03-29 12:55:24.972 [] INFO  --- [  restartedMain] o.s.b.d.a.OptionalLiveReloadServer       : LiveReload server is running on port 35729
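
As an illustration of the idea, the sample lines above could be split into metadata and message before the message is handed to the miner. This is only a sketch: the class name `LogLineSplitter`, the `ParsedLine` holder and the regular expression are hypothetical and tailored to the layout of the sample lines, they are not part of drain-java.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of drain-java.
final class LogLineSplitter {
    // Assumed layout, taken from the sample lines only:
    // <date> <time> [<context>] <SEVERITY> --- [<thread>] <logger> : <message>
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+ \\S+) \\[[^\\]]*\\] (\\w+)\\s+--- \\[\\s*([^\\]]+?)\\s*\\] (\\S+)\\s+: (.*)$");

    /** Low-cardinality metadata kept next to the message. */
    static final class ParsedLine {
        final String severity;
        final String logger;
        final String message;

        ParsedLine(String severity, String logger, String message) {
            this.severity = severity;
            this.logger = logger;
            this.message = message;
        }
    }

    /** Returns empty when the line does not follow the assumed layout. */
    static Optional<ParsedLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        // Groups 1 (timestamp) and 3 (thread) are matched but dropped here.
        return Optional.of(new ParsedLine(m.group(2), m.group(4), m.group(5)));
    }
}
```

Only `message` would be fed to the miner, while `severity` and `logger` could be attached to the resulting cluster.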

Print _log clusters_ periodically or on a signal or key combination

Currently in drain mode, the discovered log clusters are only dumped (printed) once the log file has been read entirely.

This is not suitable for watching a log file; there should be some mechanism to print the clusters:

  • Periodically, with a command-line flag to tweak the interval
  • On a signal (maybe allowing the user to pick one from the OS-supported signals listed by kill -l, without overriding the standard ones already handled by the JVM)
  • On a key combination sent in the running tty, like ctrl+d
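
The periodic option could be sketched with a plain `ScheduledExecutorService`. The class name `PeriodicClusterPrinter` is hypothetical, and the `Runnable` stands in for whatever code actually prints the clusters; signal and key handling are not covered here.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of drain-java.
final class PeriodicClusterPrinter implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cluster-printer");
                t.setDaemon(true); // must not keep the JVM alive on its own
                return t;
            });

    /** The interval would come from a command-line flag and must be positive. */
    PeriodicClusterPrinter(Runnable printClusters, Duration interval) {
        long millis = interval.toMillis();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                printClusters.run();
            } catch (RuntimeException e) {
                // An escaping exception would cancel all further runs.
                System.err.println("Failed to print clusters: " + e);
            }
        }, millis, millis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Since the printing task would run on a different thread than the one feeding the miner, access to the clusters would need to be thread-safe or otherwise coordinated.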

Process JSON log events

Currently the code treats a line as a log event message. However, in some production systems the application may use structured logging with a JSON document. The document may contain a lot of additional metadata or context data, but what Drain is interested in is the message; for this reason it has to be able to extract the message from the document given a path.

Log events are usually serialized as a single-line JSON document, see logstash-logback-encoder for example. So when parsing JSON, events will be assumed to be single line. However, the message itself may be multiline (newlines are likely encoded as \n).

So the only work to do is to pre-process the line as JSON and extract the message field.
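
A dependency-free sketch of that pre-processing step follows. It is deliberately naive: it looks a string field up by name with a regular expression instead of resolving a real path, it would also match a field of that name nested deeper in the document, and it leaves \uXXXX escapes untouched. A real implementation would more likely rely on a JSON parser. The class name `JsonMessageExtractor` is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of drain-java; a JSON parser is the robust choice.
final class JsonMessageExtractor {
    private final Pattern fieldPattern;

    JsonMessageExtractor(String fieldName) {
        // "field" : "value", where the value may contain backslash escapes
        this.fieldPattern = Pattern.compile(
                "\"" + Pattern.quote(fieldName) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
    }

    /** Returns the unescaped string value of the field, if present. */
    Optional<String> extract(String jsonLine) {
        Matcher m = fieldPattern.matcher(jsonLine);
        return m.find() ? Optional.of(unescape(m.group(1))) : Optional.empty();
    }

    /** Decodes the simple JSON escapes, e.g. \n becomes a real newline. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case '"':
                case '\\':
                case '/': out.append(next); break;
                default: out.append('\\').append(next); // unknown escapes kept as-is
            }
        }
        return out.toString();
    }
}
```

The extracted message may span several lines once decoded, so the miner would have to accept multiline messages or the caller would have to keep only the first line.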

Enable cluster lookup given a log message

#5 introduced an interesting feature to look up the cluster of a given log message.

The method findLogMessage only looks for an existing log cluster. This might be interesting for implementing a search feature.

Thanks to @TodorKrIv for the idea and initial implementation.

Question

Hello, did you port https://github.com/IBM/Drain3 or the original logpai implementation?
In the README you mention the IBM guys, but I see code mismatches (maybe it's because the IBM project is actively updated).

Handle multiline log events (stack traces)

Currently the code is only able to process single line log messages. However it's possible to have multiline log messages.

Scope

In particular, this ticket is about handling stack traces, whose lines usually start with whitespace. I am not familiar with stack traces in other languages, so the goal of this ticket is to focus on Java stack traces that may appear in a log trail.

Out of scope

  • Multiline log messages not starting with whitespace
  • Stack traces from languages other than Java
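
Within that scope, grouping could be sketched as follows: a line starting with whitespace is appended to the previous event, any other line starts a new event. The class name `MultilineEventAggregator` is hypothetical. Note that lines such as "Caused by: ..." do not start with whitespace, so this sketch would treat them as new events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of drain-java.
final class MultilineEventAggregator {
    /** Merges whitespace-indented continuation lines into the preceding event. */
    static List<String> aggregate(List<String> lines) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean continuation =
                    !line.isEmpty() && Character.isWhitespace(line.charAt(0));
            if (continuation && current != null) {
                current.append('\n').append(line);
            } else {
                // A new event starts: flush the previous one, if any.
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }
}
```

When tailing a file, the last event can only be emitted once the next non-indented line arrives (or after some timeout), since more stack frames may still follow.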

Allow passing custom masks

Currently the code uses really simple tricks to mask some elements of a log event, e.g. by stripping the date component.

However, there are other dynamic log components that may be worth masking: IPs, UUIDs, etc.
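
One possible shape for such masks is an ordered list of regular expressions, each paired with a placeholder. The class name `LogMasker`, the two patterns and the `<IP>` and `<UUID>` placeholders below are illustrative assumptions, not drain-java's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of drain-java.
final class LogMasker {
    private static final class Mask {
        final Pattern pattern;
        final String placeholder;

        Mask(String regex, String placeholder) {
            this.pattern = Pattern.compile(regex);
            this.placeholder = placeholder;
        }
    }

    private final List<Mask> masks = new ArrayList<>();

    /** Registers a mask; masks are applied in registration order. */
    LogMasker mask(String regex, String placeholder) {
        masks.add(new Mask(regex, placeholder));
        return this;
    }

    /** Replaces every match of every registered mask by its placeholder. */
    String apply(String message) {
        String result = message;
        for (Mask mask : masks) {
            result = mask.pattern.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(mask.placeholder));
        }
        return result;
    }

    /** Example configuration with illustrative UUID and IPv4 patterns. */
    static LogMasker withDefaults() {
        return new LogMasker()
                .mask("\\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\\b", "<UUID>")
                .mask("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b", "<IP>");
    }
}
```

Order matters when patterns can overlap, which is why the more specific UUID mask is registered before the looser IPv4 one.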

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

github-actions
.github/workflows/gradle.yml
  • actions/checkout v4
  • gradle/actions v3
  • actions/checkout v4
  • actions/setup-java v4
  • gradle/actions v3
  • actions/upload-artifact v4
  • actions/download-artifact v4
  • mikepenz/action-junit-report v4
  • actions/checkout v4
  • actions/setup-java v4
  • gradle/actions v3
gradle
settings.gradle.kts
build.gradle.kts
drain-java-bom/build.gradle.kts
drain-java-core/build.gradle.kts
drain-java-jackson/build.gradle.kts
gradle/libs.versions.toml
  • com.google.code.findbugs:jsr305 3.0.2
  • info.picocli:picocli 4.7.6
  • info.picocli:picocli-codegen 4.7.6
  • com.fasterxml.jackson.core:jackson-core 2.17.2
  • com.fasterxml.jackson.core:jackson-annotations 2.17.2
  • com.fasterxml.jackson.core:jackson-databind 2.17.2
  • org.assertj:assertj-core 3.26.3
  • org.junit.jupiter:junit-jupiter-api 5.10.3
  • org.junit.jupiter:junit-jupiter-engine 5.10.3
  • de.undercouch.download 5.6.0
  • com.github.johnrengelman.shadow 8.1.1
  • com.github.ben-manes.versions 0.51.0
  • com.github.hierynomus.license 0.16.1
  • com.github.vlsi.gradle-extensions 1.90
  • nebula.release 19.0.10
tailer/build.gradle.kts
gradle-wrapper
gradle/wrapper/gradle-wrapper.properties
  • gradle 8.9

Document Drain-Java

Currently I have only offered hints in the README, but I definitely need to spend some time documenting:

  • Drain algorithm
  • Drain java API
  • Drain usage
