
diy1 / clp


This project forked from y-scope/clp


Compressed Log Processor (CLP) is a free tool capable of compressing text logs and searching the compressed logs without decompression.

Home Page: https://yscope.com

License: Apache License 2.0

Shell 1.03% JavaScript 2.80% C++ 86.61% Python 5.75% CSS 0.01% ANTLR 0.06% HTML 0.01% CMake 3.41% Dockerfile 0.13% SCSS 0.19%


CLP


YScope's Compressed Log Processor (CLP) compresses your logs and allows you to search the compressed logs without decompressing them. CLP supports both JSON logs and unstructured (i.e., free-text) logs. It also supports real-time log compression within several logging libraries. CLP also includes purpose-built web interfaces for searching and viewing the compressed logs. To learn more about it, you can read our paper.
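The core idea behind searching without decompression, as described in the CLP paper, is to split each log message into its static text (a "log type") and its variable values, and to store each distinct log type only once. The toy sketch below illustrates that split under a deliberately crude assumption: any whitespace-delimited token containing a digit is treated as a variable. `ToyEncoder`, `EncodedMessage`, `find_variable`, and the placeholder character are hypothetical names for illustration only; they are not part of CLP's API, and CLP's real parser and archive format are far more sophisticated.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical placeholder marking where a variable was removed from the text.
PLACEHOLDER = "\x11"
TOKEN = re.compile(r"\S+")


@dataclass(frozen=True)
class EncodedMessage:
    logtype_id: int             # index into the deduplicated log-type table
    variables: tuple[str, ...]  # variable values, in order of appearance


class ToyEncoder:
    """Toy log-type/variable splitter; NOT CLP's actual parser or API."""

    def __init__(self) -> None:
        self.logtype_ids: dict[str, int] = {}  # log-type text -> id
        self.logtypes: list[str] = []          # id -> log-type text

    def encode(self, message: str) -> EncodedMessage:
        variables: list[str] = []

        def extract(match: re.Match[str]) -> str:
            token = match.group(0)
            # Crude assumption: a token containing a digit is a variable.
            if any(ch.isdigit() for ch in token):
                variables.append(token)
                return PLACEHOLDER
            return token

        logtype = TOKEN.sub(extract, message)
        if logtype not in self.logtype_ids:  # store each log type only once
            self.logtype_ids[logtype] = len(self.logtypes)
            self.logtypes.append(logtype)
        return EncodedMessage(self.logtype_ids[logtype], tuple(variables))

    def decode(self, encoded: EncodedMessage) -> str:
        # Re-insert the variables into the placeholders of the log type.
        parts = self.logtypes[encoded.logtype_id].split(PLACEHOLDER)
        pieces = [parts[0]]
        for value, static_text in zip(encoded.variables, parts[1:]):
            pieces.append(value)
            pieces.append(static_text)
        return "".join(pieces)


def find_variable(encoded_msgs: list[EncodedMessage], value: str) -> list[int]:
    """Return indices of messages that contain `value` as a whole variable,
    checked on the encoded form without reconstructing any message."""
    return [i for i, msg in enumerate(encoded_msgs) if value in msg.variables]


encoder = ToyEncoder()
messages = [
    "Task 12 finished in 350 ms",
    "Task 13 finished in 20 ms",
    "Connection lost",
]
encoded = [encoder.encode(m) for m in messages]
print(len(encoder.logtypes))         # 2
print(find_variable(encoded, "20"))  # [1]
```

Because the two "Task ... finished in ... ms" messages share one log type, that static text is stored once, and a query for a variable value is answered from the encoded fields without rebuilding any message.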

Benchmarks

[Figures: CLP benchmark on JSON logs; CLP benchmark on unstructured logs]

The figures above show CLP's compression and search performance compared to other tools. We separate the experiments between JSON and unstructured logs because (1) some tools can only handle one type of log, and (2) tools that can handle both types often use a different design for each type (as CLP does).

Compression ratio is measured as the average across a variety of log datasets. Some of these datasets can be found here. Search performance is measured using queries on the MongoDB logs (for JSON) and the Hadoop logs (for unstructured logs). Note that CLP uses an index-less design, so for a fair comparison, we disabled the indexes in MongoDB and PostgreSQL; if we had left them enabled, MongoDB's and PostgreSQL's compression ratios would be worse. We didn't disable indexing for Elasticsearch or Splunk since these tools are fundamentally index-based (i.e., logs cannot be searched without indexes). More details about our experimental methodology can be found in the CLP paper.
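A compression ratio is the uncompressed size divided by the compressed size, so higher is better. The sketch below shows one way to average it across datasets. It assumes an arithmetic mean of per-dataset ratios, which the text above does not specify, and the dataset names and sizes are hypothetical placeholders, not benchmark results.

```python
from statistics import mean


def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_bytes / compressed_bytes


# Hypothetical (uncompressed, compressed) sizes in bytes; NOT benchmark data.
datasets = {
    "dataset_a": (10_000_000, 250_000),
    "dataset_b": (5_000_000, 100_000),
}

# Assumed aggregation: arithmetic mean of the per-dataset ratios.
ratios = {name: compression_ratio(*sizes) for name, sizes in datasets.items()}
average_ratio = mean(ratios.values())  # mean of 40.0 and 50.0 -> 45.0

# For contrast: pooling all bytes first weights larger datasets more heavily.
pooled_ratio = compression_ratio(
    sum(u for u, _ in datasets.values()),
    sum(c for _, c in datasets.values()),
)
print(ratios, average_ratio, round(pooled_ratio, 1))
```

Averaging per-dataset ratios weights every dataset equally, whereas dividing total uncompressed bytes by total compressed bytes (about 42.9 for these hypothetical sizes) lets the largest datasets dominate the result.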

System Overview

[Figure: CLP systems overview]

CLP provides an end-to-end log management pipeline consisting of compression, search, analytics, and viewing. The figure above shows the CLP ecosystem architecture. It consists of the following features:

  • Compression and Search: CLP compresses logs into archives, which can be searched and analyzed in a web UI. The input can either be raw logs or CLP's compressed IR (intermediate representation) produced by CLP's logging libraries.

  • Real-time Compression with CLP Logging Libraries: CLP provides logging libraries for Python and Java (Log4j and Logback). The logging libraries compress logs in real time, so only compressed logs are written to disk or transmitted over the network. The compressed logs use CLP's intermediate representation (IR) format, which achieves a higher compression ratio than general-purpose compressors like Zstandard. Compressing IR into archives can further double the compression ratio and enable global search, but this uses more memory, since enough logs must be buffered before they are compressed. More details on IR versus archives can be found in this Uber Engineering Blog post.

  • Log Viewer: The compressed IR can be viewed in a web-based log viewer. Compared to viewing the logs in an editor, CLP's log viewer supports advanced features like filtering logs based on log level verbosity (e.g., only displaying logs with a log level equal to or higher than ERROR). These features are possible because CLP's logging libraries parse the logs before compressing them into IR.

  • IR Analytics Libraries: We also provide a Python library and a Go library that can analyze compressed IR.

  • Log Parser: CLP also includes a custom pushdown-automata-based log parser that is 3x faster than state-of-the-art regular expression engines like RE2. The log parser is available as a library that other applications can use.
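The log viewer's level filter works because the logging libraries record each event's level as a parsed field instead of leaving it buried in free text. A minimal sketch of such a filter follows; `LogLevel`, its numeric ordering, `ParsedEvent`, and `filter_by_min_level` are hypothetical names chosen for illustration and do not reflect CLP's actual IR schema or viewer code.

```python
from enum import IntEnum
from typing import Iterable, Iterator, NamedTuple


class LogLevel(IntEnum):
    # Hypothetical ordering: a larger value means a more severe level.
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARN = 3
    ERROR = 4
    FATAL = 5


class ParsedEvent(NamedTuple):
    # Hypothetical record: the level is already a parsed field, so filtering
    # compares integers instead of searching the message text.
    timestamp_ms: int
    level: LogLevel
    message: str


def filter_by_min_level(
    events: Iterable[ParsedEvent], min_level: LogLevel
) -> Iterator[ParsedEvent]:
    """Return an iterator over events whose level is >= `min_level`."""
    return (event for event in events if event.level >= min_level)


events = [
    ParsedEvent(1, LogLevel.INFO, "started"),
    ParsedEvent(2, LogLevel.ERROR, "disk full"),
    ParsedEvent(3, LogLevel.DEBUG, "cache hit"),
    ParsedEvent(4, LogLevel.FATAL, "shutting down"),
]
for event in filter_by_min_level(events, LogLevel.ERROR):
    print(event.level.name, event.message)
```

Comparing parsed levels avoids scanning each message for strings like "ERROR", which could also match unrelated text, such as a message that merely mentions an error code.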

Getting Started

You can download a release package which includes support for distributed compression and search. Or, to quickly try CLP's core compression and search, you can use a prebuilt container.

We also have guides for building the package and CLP core from source.

For some logs you can use to test CLP, check out our open-source datasets.

Providing Feedback

You can use GitHub issues to report a bug or request a feature.

Join us on Zulip to chat with developers and other community members.

Project Structure

CLP is currently split across a few different components in the components directory:

  • clp-package-utils contains Python utilities for operating the CLP package.
  • clp-py-utils contains Python utilities common to several of the other components.
  • core contains code to compress uncompressed logs, decompress compressed logs, and search compressed logs.
  • job-orchestration contains code to schedule compression jobs on the cluster.
  • package-template contains the base directory structure and files of the CLP package.

GitHub Packages

The artifacts published to GitHub Packages in this repo are a set of Docker container images useful for building and running CLP:

  • ghcr.io/y-scope/clp/clp-core-dependencies-x86-centos7.4:main

    The dependencies necessary to build CLP core in a CentOS 7.4 x86 environment.

  • ghcr.io/y-scope/clp/clp-core-dependencies-x86-ubuntu-focal:main

    The dependencies necessary to build CLP core in an Ubuntu Focal x86 environment.

  • ghcr.io/y-scope/clp/clp-core-dependencies-x86-ubuntu-jammy:main

    The dependencies necessary to build CLP core in an Ubuntu Jammy x86 environment.

  • ghcr.io/y-scope/clp/clp-core-x86-ubuntu-focal:main

    The CLP core binaries (clg, clp, clp-s, glt, etc.) built in an Ubuntu Focal x86 environment.

  • ghcr.io/y-scope/clp/clp-execution-x86-ubuntu-focal:main

    The dependencies necessary to run the CLP package in an x86 environment.

Next Steps

This is our open-source release, which we will continually update with bug fixes, features, etc. If you would like a feature or want to report a bug, please file an issue and we'll be happy to engage.

Contributing

Have an issue you want to fix or a feature you'd like to implement? We'd love to see it!

Linting

Before submitting a PR, ensure you've run our linting tools and either fixed any violations or suppressed the warnings. To run our linting workflows locally, you'll need Task. Alternatively, you can run the clp-lint workflow in your fork.

To perform the linting checks:

task lint:check

To also apply any automatic fixes:

task lint:fix

Contributors

kirkrodrigues, haiqi96, wraymo, gibber9809, linzhihao-723, davidlion, sharafmohamed, diy1, junhaoliao, abvarun226, all-less, jackluo923, oliversm95, davemarco, thepegasos
