This project forked from lz4/lz4-java

LZ4 compression for Java

License: Apache License 2.0

lz4-java's Introduction

LZ4 Java

LZ4 compression for Java, based on Yann Collet's work available at http://code.google.com/p/lz4/.

This library provides access to two compression methods that both generate a valid LZ4 stream:

  • fast scan (LZ4):
    • low memory footprint (~ 16 KB),
    • very fast (fast scan with skipping heuristics in case the input looks incompressible),
    • reasonable compression ratio (depending on the redundancy of the input).
  • high compression (LZ4 HC):
    • medium memory footprint (~ 256 KB),
    • rather slow (~ 10 times slower than LZ4),
    • good compression ratio (depending on the size and the redundancy of the input).

The streams produced by those 2 compression algorithms use the same compression format, are very fast to decompress and can be decompressed by the same decompressor instance.

Implementations

For LZ4 compressors, LZ4 HC compressors and decompressors, 3 implementations are available:

  • JNI bindings to the original C implementation by Yann Collet,
  • a pure Java port of the compression and decompression algorithms,
  • a Java port that uses the sun.misc.Unsafe API in order to achieve compression and decompression speeds close to the C implementation.

Have a look at LZ4Factory for more information.

Compatibility notes

  • Compressors and decompressors are interchangeable: it is perfectly correct to compress with the JNI bindings and to decompress with a Java port, or the other way around.

  • Compressors might not generate the same compressed streams on all platforms, especially if CPU endianness differs, but the compressed streams can be safely decompressed by any decompressor implementation on any platform.

Example

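import java.nio.charset.StandardCharsets;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;
import net.jpountz.lz4.LZ4SafeDecompressor;

// pick the fastest implementation available on this platform
LZ4Factory factory = LZ4Factory.fastestInstance();

// StandardCharsets avoids the checked exception thrown by getBytes("UTF-8")
byte[] data = "12345345234572".getBytes(StandardCharsets.UTF_8);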
final int decompressedLength = data.length;

// compress data
LZ4Compressor compressor = factory.fastCompressor();
int maxCompressedLength = compressor.maxCompressedLength(decompressedLength);
byte[] compressed = new byte[maxCompressedLength];
int compressedLength = compressor.compress(data, 0, decompressedLength, compressed, 0, maxCompressedLength);

// decompress data
// - method 1: when the decompressed length is known
LZ4FastDecompressor decompressor = factory.fastDecompressor();
byte[] restored = new byte[decompressedLength];
int compressedLength2 = decompressor.decompress(compressed, 0, restored, 0, decompressedLength);
// compressedLength == compressedLength2

// - method 2: when the compressed length is known (a little slower)
// the destination buffer needs to be over-sized
LZ4SafeDecompressor decompressor2 = factory.safeDecompressor();
int decompressedLength2 = decompressor2.decompress(compressed, 0, compressedLength, restored, 0);
// decompressedLength == decompressedLength2
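
Method 1 needs the decompressed length, which a raw LZ4 block does not record. One common approach is to store it next to the compressed bytes. The sketch below prefixes the block with a 4-byte big-endian length using only the JDK. The names LengthPrefixedBlock, frame, originalLength and payload are hypothetical and not part of lz4-java, and the layout is an assumption for illustration, not a standard LZ4 frame format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of lz4-java: stores the decompressed length
// in a 4-byte header in front of the compressed bytes.
final class LengthPrefixedBlock {
  private static final int HEADER_BYTES = 4;

  private LengthPrefixedBlock() {}

  // Builds [decompressedLength (4 bytes, big-endian)][compressed bytes].
  static byte[] frame(int decompressedLength, byte[] compressed, int compressedLength) {
    ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + compressedLength);
    out.putInt(decompressedLength);           // ByteBuffer is big-endian by default
    out.put(compressed, 0, compressedLength); // copy only the bytes actually written
    return out.array();
  }

  // Reads the decompressed length back from the header.
  static int originalLength(byte[] framed) {
    return ByteBuffer.wrap(framed).getInt();
  }

  // Returns a copy of the compressed bytes that follow the header.
  static byte[] payload(byte[] framed) {
    return Arrays.copyOfRange(framed, HEADER_BYTES, framed.length);
  }
}
```

On the receiving side, originalLength(framed) would give the size of the destination buffer for LZ4FastDecompressor, and payload(framed) the bytes to decompress.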

xxhash Java

xxhash hashing for Java, based on Yann Collet's work available at http://code.google.com/p/xxhash/. xxhash is a non-cryptographic, extremely fast and high-quality (SMHasher score of 10) hash function.

Implementations

As with LZ4, 3 implementations are available: JNI bindings, a pure Java port, and a Java port that uses sun.misc.Unsafe.

Have a look at XXHashFactory for more information.

Compatibility notes

  • All implementations return the same hash for the same input bytes:
    • on any JVM,
    • on any platform (even if the endianness or integer size differs).
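
One way to get this kind of platform independence is to never rely on the native byte order: input words are assembled from bytes in a fixed order, and Java's int is always 32 bits wide. The sketch below illustrates the idea with a fixed little-endian read. It is an illustration of the general technique, not the code used by lz4-java, and the names FixedOrderReads, readIntLE and readIntLEWithBuffer are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads 32-bit words in a fixed byte order so the result
// does not depend on the endianness of the machine running the code.
final class FixedOrderReads {
  private FixedOrderReads() {}

  // Assembles a 32-bit value from 4 bytes, least significant byte first.
  static int readIntLE(byte[] buf, int off) {
    return (buf[off] & 0xFF)
        | (buf[off + 1] & 0xFF) << 8
        | (buf[off + 2] & 0xFF) << 16
        | (buf[off + 3] & 0xFF) << 24;
  }

  // Same read through ByteBuffer with an explicit byte order.
  static int readIntLEWithBuffer(byte[] buf, int off) {
    return ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getInt(off);
  }
}
```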

Example

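import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import net.jpountz.xxhash.StreamingXXHash32;
import net.jpountz.xxhash.XXHashFactory;

// pick the fastest implementation available on this platform
XXHashFactory factory = XXHashFactory.fastestInstance();

// StandardCharsets avoids the checked exception thrown by getBytes("UTF-8")
byte[] data = "12345345234572".getBytes(StandardCharsets.UTF_8);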
ByteArrayInputStream in = new ByteArrayInputStream(data);

int seed = 0x9747b28c; // used to initialize the hash value, use whatever
                       // value you want, but always the same
StreamingXXHash32 hash32 = factory.newStreamingHash32(seed);
byte[] buf = new byte[8]; // for real-world usage, use a larger buffer, like 8192 bytes
for (;;) {
  int read = in.read(buf);
  if (read == -1) {
    break;
  }
  hash32.update(buf, 0, read);
}
int hash = hash32.getValue();
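
The loop above feeds the hash in small chunks. A streaming hash is expected to produce the same value however the input is split. The sketch below checks that property using only the JDK, with java.util.zip.CRC32 as a stand-in checksum. This is an assumption made so the example runs without lz4-java on the classpath; CRC32 is not xxhash. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper: computes a checksum over a stream in fixed-size chunks.
final class ChunkedChecksum {
  private ChunkedChecksum() {}

  // Reads the stream in chunks of at most bufferSize bytes and updates the
  // checksum incrementally, mirroring the read/update loop shown above.
  static long checksum(InputStream in, int bufferSize) throws IOException {
    CRC32 crc = new CRC32(); // stand-in for a streaming hash, not xxhash
    byte[] buf = new byte[bufferSize];
    int read;
    while ((read = in.read(buf)) != -1) {
      crc.update(buf, 0, read);
    }
    return crc.getValue();
  }
}
```

Calling checksum on the same bytes with a buffer of 8 bytes and with a buffer of 8192 bytes yields the same value, so the buffer size can be chosen for throughput alone.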

Download

You can download released artifacts from Maven Central.

Documentation

Performance

Both lz4 and xxhash focus on speed. Although compression, decompression and hashing performance can depend a lot on the input (there are lies, damn lies and benchmarks), here are some benchmarks that try to give a sense of the speed at which they compress/decompress/hash bytes.

Build

Requirements

  • JDK version 7 or newer,
  • ant,
  • ivy.

If ivy is not installed yet, ant can install it for you: run ant ivy-bootstrap. The library will be installed under ${user.home}/.ant/lib.

Instructions

Then run ant. It will:

  • generate some Java source files in build/java from the templates that are located under src/build,
  • compile the lz4 and xxhash libraries and their JNI (Java Native Interface) bindings,
  • compile Java sources in src/java (normal sources), src/java-unsafe (sources that make use of sun.misc.Unsafe) and build/java (auto-generated sources) to build/classes, build/unsafe-classes and build/generated-classes,
  • generate a JAR file called lz4-${version}.jar under the dist directory.

The JAR file that is generated contains Java class files, the native library and the JNI bindings. If you add this JAR to your classpath, the native library will be copied to a temporary directory and dynamically linked to your Java application.
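
The copy-then-link step described above can be sketched with plain JDK calls: read the library from a classpath resource, write it to a temporary file, and pass that file's absolute path to System.load. This is a minimal sketch of the general technique, not lz4-java's actual loader. The class name, the method names and the resource path passed in are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: extracts a native library bundled in a JAR and loads it.
final class NativeLibraryExtractor {
  private NativeLibraryExtractor() {}

  // Copies the given stream to a new temporary file and returns its path.
  static Path copyToTempFile(InputStream in, String prefix, String suffix) throws IOException {
    Path tmp = Files.createTempFile(prefix, suffix);
    tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
    // createTempFile already created the file, so the copy must overwrite it
    Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
    return tmp;
  }

  // Extracts a classpath resource (hypothetical path) and links it into the JVM.
  static void loadFromClasspath(String resourcePath) throws IOException {
    try (InputStream in = NativeLibraryExtractor.class.getResourceAsStream(resourcePath)) {
      if (in == null) {
        throw new IOException("Resource not found on classpath: " + resourcePath);
      }
      Path lib = copyToTempFile(in, "native-lib", ".tmp");
      System.load(lib.toAbsolutePath().toString()); // System.load requires an absolute path
    }
  }
}
```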

lz4-java's People

Contributors

blambov, clockfort, jpountz, linnaea, luben, magnet, ruippeixotog, sunxiaoguang, yilongli
