piezoid / pugz

This project forked from ebiggers/libdeflate

Truly parallel gzip decompression

License: MIT License

Makefile 4.19% C 4.83% C++ 89.49% Shell 1.49%
cpp decompression gzip library parallel

pugz's People

Contributors

ebiggers, piezoid, rchikhi

pugz's Issues

Multipart/blocked gzip file

Adding support wouldn't be so difficult.

Once a random access thread reaches a block marked as the last block of a gzip part, it could parse the footer and the header of the next part and start a classic decompression from there.
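
As a rough illustration of the footer/header step, here is a minimal sketch that follows the member layout in RFC 1952. It is not pugz code: the names GzipTrailer, parse_gzip_trailer and skip_gzip_header are made up for this example, and it assumes the caller already knows the byte offset where the previous DEFLATE stream ended (the trailer starts at the next byte boundary after the final block).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper types and functions, named for this sketch only.
struct GzipTrailer {
    std::uint32_t crc32; // CRC-32 of the uncompressed data
    std::uint32_t isize; // uncompressed size modulo 2^32
};

static std::uint32_t read_le32(const std::uint8_t* p) {
    return std::uint32_t(p[0]) | std::uint32_t(p[1]) << 8 |
           std::uint32_t(p[2]) << 16 | std::uint32_t(p[3]) << 24;
}

// Reads the 8-byte member trailer located at `pos`. Returns false if truncated.
bool parse_gzip_trailer(const std::uint8_t* data, std::size_t size, std::size_t pos, GzipTrailer& out) {
    assert(data != nullptr);
    if (pos > size || size - pos < 8) return false;
    out.crc32 = read_le32(data + pos);
    out.isize = read_le32(data + pos + 4);
    return true;
}

// Skips the member header located at `pos` and stores the offset of the first
// DEFLATE byte in `deflate_start`. Returns false on a malformed or truncated header.
bool skip_gzip_header(const std::uint8_t* data, std::size_t size, std::size_t pos, std::size_t& deflate_start) {
    assert(data != nullptr);
    if (pos > size || size - pos < 10) return false;
    if (data[pos] != 0x1f || data[pos + 1] != 0x8b || data[pos + 2] != 8) return false; // ID1, ID2, CM
    const std::uint8_t flg = data[pos + 3];
    if (flg & 0xE0) return false; // reserved flag bits must be zero
    pos += 10;                    // fixed part: magic, CM, FLG, MTIME, XFL, OS
    if (flg & 0x04) {             // FEXTRA: 2-byte little-endian length, then payload
        if (size - pos < 2) return false;
        const std::size_t xlen = data[pos] | (std::size_t(data[pos + 1]) << 8);
        pos += 2;
        if (size - pos < xlen) return false;
        pos += xlen;
    }
    auto skip_zero_terminated = [&]() -> bool {
        while (pos < size && data[pos] != 0) ++pos;
        if (pos == size) return false;
        ++pos; // consume the terminating zero
        return true;
    };
    if ((flg & 0x08) && !skip_zero_terminated()) return false; // FNAME
    if ((flg & 0x10) && !skip_zero_terminated()) return false; // FCOMMENT
    if (flg & 0x02) {                                          // FHCRC: 2-byte header CRC
        if (size - pos < 2) return false;
        pos += 2;
    }
    deflate_start = pos;
    return true;
}
```

A thread that has just decoded a final block would call parse_gzip_trailer at the next byte boundary, then skip_gzip_header 8 bytes later; if the latter succeeds, sequential decompression can resume at deflate_start.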

results in README

Add a table to the README with results (in MB/s), e.g. for counting lines and for outputting the whole text, versus gunzip, as the results from the paper will likely be improved.

core dump

trying to decompress or count lines of compressed tab separated files

~/git/pugz/gunzip -l file.tsv.gz

terminate called after throwing an instance of 'gzip_error'
what(): INVALID_LITERAL
Aborted (core dumped)

Race conditions

Somehow I am suddenly getting sporadic assertion failures, system_error exceptions, and deadlocks, and sometimes it works:

parallelization=64
fileSize=$(( parallelization * 512 * 1024 * 1024 ))
filePath="/dev/shm/base64.gz"
base64 /dev/urandom | head -c $fileSize | pigz > "$filePath"
for (( i = 0; i < 20; ++i )); do
    time taskset --cpu-list 0-$(( parallelization - 1 )) pugz -t $parallelization -l "$filePath";
done

using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.563s
user	0m39.937s
sys	0m5.495s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.549s
user	0m44.777s
sys	0m7.197s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.538s
user	0m47.619s
sys	0m3.533s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.525s
user	0m44.756s
sys	0m6.804s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.506s
user	0m47.683s
sys	0m4.834s
using 64 threads for decompression (experimental)
446230368

real	0m8.580s
user	8m3.151s
sys	0m16.146s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.566s
user	0m47.136s
sys	0m6.704s
using 64 threads for decompression (experimental)
446230368

real	0m8.553s
user	8m3.348s
sys	0m14.164s
using 64 threads for decompression (experimental)
terminate called after throwing an instance of 'std::system_error'
  what():  
Aborted

real	0m5.413s
user	4m57.656s
sys	0m11.702s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.524s
user	0m57.496s
sys	0m3.829s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.531s
user	0m44.179s
sys	0m7.995s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.542s
user	0m41.501s
sys	0m3.428s
using 64 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted

real	0m1.505s
user	0m44.743s
sys	0m4.971s
using 64 threads for decompression (experimental)
 ^C
real	1m33.887s
user	0m35.798s
sys	0m2.809s

The last one deadlocked, so I had to interrupt it.

I cannot reproduce the problem when compressing with gzip or igzip as opposed to pigz.

When compressing with pigz --oneblock, the error messages change a little:

using 32 threads for decompression (experimental)
pugz: pthread_mutex_lock.c:94: ___pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
using 32 threads for decompression (experimental)
pugz: pthread_mutex_lock.c:94: ___pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
using 32 threads for decompression (experimental)
13944699
using 32 threads for decompression (experimental)
pugz: pthread_mutex_lock.c:94: ___pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
using 32 threads for decompression (experimental)
pugz: pthread_mutex_lock.c:94: ___pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
using 32 threads for decompression (experimental)
pugz: pthread_mutex_lock.c:94: ___pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted
using 32 threads for decompression (experimental)
terminate called after throwing an instance of 'gzip_error'
  what():  Got a context from invalid position
Aborted
using 32 threads for decompression (experimental)
terminate called after throwing an instance of 'gzip_error'
  what():  Got a context from invalid position
Aborted
using 32 threads for decompression (experimental)
programs/../lib/deflate_decompress.hpp:945: Assertion '_state == state_t::FAIL' failed in 'std::pair<unique_span<unsigned char, lock_releaser<std::mutex> >, long unsigned int> DeflateThread::get_context()'.
Aborted
using 32 threads for decompression (experimental)
pugz: pthread_mutex_lock.c:94: ___pthread_mutex_lock: Assertion `mutex->__data.__owner == 0' failed.
Aborted

Using a larger --blocksize for pigz seems to alleviate the problems, but I guess they only become rarer, not impossible.

Compile / Installation problem

Hi,

I saw your arXiv paper and was excited to try the lib. When I try to compile pugz, I get this:

(base)  vini@mussismilia  ~/code/pugz   master  make
  AR       libdeflate.a  
  CXX      programs/gunzip.o  
In file included from programs/../lib/deflate_decompress.hpp:58:0,  
                 from programs/../lib/gzip_decompress.hpp:40,  
                 from programs/gunzip.cpp:28:  
programs/../lib/memory.hpp: In function 'malloc_span<T> alloc_huge(size_t)':  
programs/../lib/memory.hpp:382:38: error: 'MADV_HUGEPAGE' was not declared in this scope  
     auto res = ::madvise(ptr, bytes, MADV_HUGEPAGE);
                                      ^~~~~~~~~~~~~
programs/../lib/memory.hpp:382:38: note: suggested alternative: 'MADV_MERGEABLE'
     auto res = ::madvise(ptr, bytes, MADV_HUGEPAGE);
                                      ^~~~~~~~~~~~~
                                      MADV_MERGEABLE
make: *** [Makefile:179: programs/gunzip.o] Error 1

I followed the compiler's suggestion and replaced those strings by running sed -i -- 's/MADV_HUGEPAGE/MADV_MERGEABLE/g' */* . This changed lib/memory.hpp and programs/prog_util.hpp. Obviously, changing source code without guidance is not recommended, but I was only experimenting.
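
Note that MADV_MERGEABLE is a different hint (it asks the kernel to consider the pages for same-page merging), so the substitution compiles but no longer requests huge pages. A less invasive workaround might be to guard the call, as in the sketch below. This is only an assumption about how the code could be adapted, not the project's actual fix; advise_huge_pages is a hypothetical helper name, and the missing macro is presumably due to older system headers in the toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper (not part of pugz): request transparent huge pages when
// the headers provide MADV_HUGEPAGE, and degrade to a no-op otherwise.
// Returns true only if the kernel accepted the hint.
inline bool advise_huge_pages(void* ptr, std::size_t bytes) {
    assert(ptr != nullptr);
#ifdef MADV_HUGEPAGE
    return ::madvise(ptr, bytes, MADV_HUGEPAGE) == 0;
#else
    (void)bytes;
    return false; // hint unavailable; the memory is still perfectly usable
#endif
}
```

Since the hint is only an optimization, callers can ignore a false return value and keep using the allocation as-is.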

I then got this error:

(base)  vini@mussismilia  ~/code/pugz   master ●  make
  CXX      programs/gunzip.o  
  CXX      programs/prog_util.o  
  CXX      programs/tgetopt.o  
  CCLD     gunzip  
/home/vini/anaconda3/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: /tmp/ccLWiWLa.ltrans0.ltrans.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/home/vini/anaconda3/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: /home/vini/anaconda3/bin/../x86_64-conda_cos6-linux-gnu/sysroot/lib/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [Makefile:187: gunzip] Error 1

This was resolved by adding -pthread to the compiler flags in the Makefile.

So now the program compiled without errors. But it does not work:

(base)  vini@mussismilia  ~/code/pugz   master ●  ./gunzip predicted_bkp.tar.gz
terminate called after throwing an instance of 'gzip_error'
  what():  INVALID_LITERAL
[1]    41942 abort      ./gunzip predicted_bkp.tar.gz

Any ideas?

Thank you for any assistance you can provide.

PS: I read through #8, but because I got a different error, I thought it would be good to create a separate issue.

Make failing

Hi,

I just tried to build the latest commit

$ git rev-parse HEAD
42fb5b4f2ff825b2339a8e7b254ec400e822130c

but received the following error:

$ make
  AR       libdeflate.a
  CXX      programs/gunzip.o
In file included from programs/../lib/deflate_decompress.hpp:61:0,
                 from programs/../lib/gzip_decompress.hpp:40,
                 from programs/gunzip.cpp:28:
programs/../lib/input_stream.hpp: In member function ‘bool InputStream::set_position_bits(size_t)’:
programs/../lib/input_stream.hpp:233:16: warning: declaration of ‘bits’ shadows a member of ‘InputStream’ [-Wshadow]
         size_t bits  = bit_pos & 7;
                ^~~~
programs/../lib/input_stream.hpp:282:39: note: shadowed declaration is here
     template<typename T = uint32_t> T bits(bitbuf_size_t n = 8 * sizeof(T)) const
                                       ^~~~
In file included from programs/../lib/deflate_decompress.hpp:58:0,
                 from programs/../lib/gzip_decompress.hpp:40,
                 from programs/gunzip.cpp:28:
programs/../lib/memory.hpp: In instantiation of ‘unique_span<T, D>::unique_span() [with T = unsigned char; D = lock_releaser<std::mutex>]’:
programs/../lib/deflate_decompress.hpp:946:42:   required from here
programs/../lib/memory.hpp:227:21: error: no matching function for call to ‘lock_releaser<std::mutex>::lock_releaser()’
       , _end(nullptr)
                     ^
programs/../lib/memory.hpp:436:39: note: candidate: lock_releaser<std::mutex>::lock_releaser(std::unique_lock<std::mutex>::mutex_type&)
     using std::unique_lock<Lockable>::unique_lock;
                                       ^~~~~~~~~~~
programs/../lib/memory.hpp:436:39: note:   candidate expects 1 argument, 0 provided
programs/../lib/memory.hpp:436:39: note: candidate: lock_releaser<std::mutex>::lock_releaser(std::unique_lock<std::mutex>::mutex_type&, std::defer_lock_t)
programs/../lib/memory.hpp:436:39: note:   candidate expects 2 arguments, 0 provided
programs/../lib/memory.hpp:436:39: note: candidate: lock_releaser<std::mutex>::lock_releaser(std::unique_lock<std::mutex>::mutex_type&, std::try_to_lock_t)
programs/../lib/memory.hpp:436:39: note:   candidate expects 2 arguments, 0 provided
programs/../lib/memory.hpp:436:39: note: candidate: lock_releaser<std::mutex>::lock_releaser(std::unique_lock<std::mutex>::mutex_type&, std::adopt_lock_t)
programs/../lib/memory.hpp:436:39: note:   candidate expects 2 arguments, 0 provided
programs/../lib/memory.hpp:436:39: note: candidate: template<class _Clock, class _Duration> lock_releaser<std::mutex>::lock_releaser(std::unique_lock<std::mutex>::mutex_type&, const std::chrono::time_point<_Clock, _Duration1>&)
programs/../lib/memory.hpp:436:39: note:   template argument deduction/substitution failed:
programs/../lib/memory.hpp:227:21: note:   candidate expects 2 arguments, 0 provided
       , _end(nullptr)
                     ^
programs/../lib/memory.hpp:436:39: note: candidate: template<class _Rep, class _Period> lock_releaser<std::mutex>::lock_releaser(std::unique_lock<std::mutex>::mutex_type&, const std::chrono::duration<_Rep1, _Period1>&)
     using std::unique_lock<Lockable>::unique_lock;
                                       ^~~~~~~~~~~
programs/../lib/memory.hpp:436:39: note:   template argument deduction/substitution failed:
programs/../lib/memory.hpp:227:21: note:   candidate expects 2 arguments, 0 provided
       , _end(nullptr)
                     ^
programs/../lib/memory.hpp:438:5: note: candidate: lock_releaser<Lockable>::lock_releaser(std::unique_lock<_Mutex>&&) [with Lockable = std::mutex]
     lock_releaser(std::unique_lock<Lockable>&& lock) noexcept
     ^~~~~~~~~~~~~
programs/../lib/memory.hpp:438:5: note:   candidate expects 1 argument, 0 provided
programs/../lib/memory.hpp:434:49: note: candidate: lock_releaser<std::mutex>::lock_releaser(lock_releaser<std::mutex>&&)
 template<typename Lockable = std::mutex> struct lock_releaser : private std::unique_lock<Lockable>
                                                 ^~~~~~~~~~~~~
programs/../lib/memory.hpp:434:49: note:   candidate expects 1 argument, 0 provided
make: *** [programs/gunzip.o] Error 1

My compiler:

$ g++ --version
g++ (GCC) 6.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Not sure what I missed here.

Thank you for your input on this issue!

Best,

Cedric

Nanopore reads terminate called after throwing an instance of 'gzip_error' what(): Got a context from invalid position Aborted

Dear authors,

I used this tool to decompress a Nanopore reads file in gz format, using a command like this: gunzip -t 8 1.pass.fastq.gz > 1.fq
but it throws errors like the ones below:

terminate called after throwing an instance of 'gzip_error'
what(): Got a context from invalid position
Aborted

I found that the output file contains some reads, but the command failed to continue. Could you help me solve it?

Buggy ordering of g++ command line

See brewsci/homebrew-bio#644

Error:

/usr/bin/ld: /tmp/ccNbQPgH.ltrans0.ltrans.o: undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'
/lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Makefile:192: recipe for target 'gunzip' failed

Cause:

g++-5 -o gunzip   -std=c++14 -I. -Icommon -lpthread -Iexternal/type_safe/include -Iexternal/type_safe/external/debug_assert -Wall -Wundef -Wrestrict -Wnull-dereference -Wuseless-cast -Wshadow -Weffc++ -Wpedantic -Wvla -O4 -flto -march=native -mtune=native -g -D_POSIX_C_SOURCE=200809L -D_FILE_OFFSET_BITS=64 -DHAVE_CONFIG_H programs/gunzip.o programs/prog_util.o programs/tgetopt.o libdeflate.a -lrt

Reason:

The -lpthread is in the wrong location on this command line. All the -l options need to occur after the object files programs/gunzip.o programs/prog_util.o programs/tgetopt.o where the -lrt is currently found.

API design

Most gzip libraries provide an easy to use API of the form:

decompress(void* state, const char* input, size_t in_len, char* output, size_t out_len)

It decompresses the input stream to the output buffer until it runs out of either input data or output buffer space. The user fills/empties the buffers and makes another call, until the end of the file is reached.

In a multi-threaded implementation this synchronous interface is no longer possible. The code consuming the decompressed stream must be called back from each decompressor thread when a buffer is ready, not in any particular order.
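
To make the shape of such an interface concrete, here is a minimal sketch. Everything in it is hypothetical: chunk_callback and decompress_parallel are names invented for this example, and the "decompression" is a plain copy standing in for the real work. The point is only that the consumer is invoked from the worker threads, tagged with enough information (here a chunk index) to restore the order later.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical consumer signature: invoked from worker threads, in no particular order.
using chunk_callback =
    std::function<void(std::size_t chunk_index, const char* data, std::size_t len)>;

// Stand-in for a parallel decompressor: one worker per chunk. Each worker
// "decompresses" its chunk (here it merely copies it) and hands the result
// to the consumer from its own thread.
void decompress_parallel(const std::vector<std::string>& chunks, const chunk_callback& on_buffer) {
    assert(static_cast<bool>(on_buffer));
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        workers.emplace_back([&chunks, &on_buffer, i] {
            const std::string out = chunks[i]; // real code would inflate here
            on_buffer(i, out.data(), out.size());
        });
    }
    for (auto& w : workers) w.join();
}
```

A consumer that writes each buffer into a slot selected by chunk_index needs no locking, whereas one that appends to a shared stream has to reorder the buffers first.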

A gzip file is sliced into sections, which are processed sequentially. Each section is itself split into chunks, one per thread, which are processed in parallel. The first chunk is decompressed normally in CPU cache and yields ~32KB decompressed buffers to the user callback. The other chunks are decompressed into larger buffers (~100MB) and require a second pass of "back-reference translation". To do this in a cache-friendly manner, we propose to translate the buffer in segments that fit in the L1 cache and to invoke the callback after each cache fill.
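
The segment-wise second pass could look roughly like the sketch below. The symbol encoding is an assumption made for this example and may differ from what pugz does internally: each decoded symbol is stored as a 16-bit value, where values below 256 are already-known bytes and a value 256 + k is a placeholder for byte k of the 32 KiB window that was unknown when the chunk was decoded. translate_in_segments and segment_callback are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

constexpr std::size_t kWindowSize = 32 * 1024;   // DEFLATE history window
constexpr std::uint16_t kFirstPlaceholder = 256; // assumed encoding, see the text above

using segment_callback = std::function<void(const std::uint8_t* data, std::size_t len)>;

// Resolves placeholders against the now-known window, one small segment at a
// time, and hands each translated segment to the consumer while it is still hot
// in cache.
void translate_in_segments(const std::vector<std::uint16_t>& symbols,
                           const std::vector<std::uint8_t>& window,
                           std::size_t segment_len,
                           const segment_callback& on_segment) {
    assert(window.size() == kWindowSize && segment_len > 0);
    std::vector<std::uint8_t> segment(segment_len); // reused for every segment
    for (std::size_t begin = 0; begin < symbols.size(); begin += segment_len) {
        const std::size_t len = std::min(segment_len, symbols.size() - begin);
        for (std::size_t i = 0; i < len; ++i) {
            const std::uint16_t s = symbols[begin + i];
            if (s < kFirstPlaceholder) {
                segment[i] = static_cast<std::uint8_t>(s); // literal byte
            } else {
                const std::size_t k = s - kFirstPlaceholder;
                assert(k < kWindowSize);
                segment[i] = window[k];                    // resolved back-reference
            }
        }
        on_segment(segment.data(), len);
    }
}
```

Choosing segment_len so that the segment and the window together fit in the L1 data cache is the tuning knob here; the right value depends on the target CPU.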

There are other design matters, like the possibility of running decompression from your own custom thread pool, C FFI compatibility, etc. See the full discussion below.

Constraints:

  1. Translating the back-references over the whole buffer before yielding it to the client is costly, since it triggers unnecessary TLB and last-level cache misses on a large memory region. We need a cache-efficient translation/consumption step.

  2. The user callbacks will be called at arbitrary times from arbitrary threads with decompressed content from arbitrary positions in the stream. The user should be able to reorder them, either synchronously (e.g. blocking reordering for writing to stdout) or asynchronously (e.g. parser with rests).

  3. Support different threading implementations (OpenMP, custom work stealing, etc.).

  4. Static library with minimal headers. Smallish machine code blob. Support multiple languages, with C as the common denominator.

  5. Able to generate and load custom indexes.
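
For the synchronous (blocking) reordering mentioned in constraint 2, a minimal sketch with std::mutex and std::condition_variable might look as follows. OrderedSink and write_in_reverse_launch_order are hypothetical names for this example, and a std::string stands in for stdout.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: serializes writes by sequence number.
class OrderedSink {
public:
    // Blocks the caller until every part with a smaller index has been written.
    void write(std::size_t index, const std::string& part) {
        std::unique_lock<std::mutex> lock(mutex_);
        assert(index >= next_); // each index must be written exactly once
        turn_.wait(lock, [&] { return next_ == index; });
        output_ += part; // stand-in for writing to stdout
        ++next_;
        turn_.notify_all(); // wake the thread holding the next part
    }

    std::string output() {
        std::lock_guard<std::mutex> lock(mutex_);
        return output_;
    }

private:
    std::mutex mutex_;
    std::condition_variable turn_;
    std::size_t next_ = 0;
    std::string output_;
};

// Demo: starts one thread per part, last part first, and still gets ordered output.
std::string write_in_reverse_launch_order(const std::vector<std::string>& parts) {
    OrderedSink sink;
    std::vector<std::thread> threads;
    for (std::size_t i = parts.size(); i-- > 0;)
        threads.emplace_back([&sink, &parts, i] { sink.write(i, parts[i]); });
    for (auto& t : threads) t.join();
    return sink.output();
}
```

This only works if every index is eventually written by a thread that is actually running; with a bounded thread pool, a worker blocked in write can starve the task that holds the earlier part.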

Implementation:

  • The sequential access thread yields data in ~32KiB buffers directly from its context window.
  • For the random access (n-1) threads, the low-level interface will yield untranslated buffers + lookup tables.
  • Adaptors provide ways to translate the buffer in steps of 32KiB.
    • Something more flexible than fixed intervals for parsing, like requesting overlaps with the previous buffer. We'll see how it turns out with FASTQ parsing.
  • Possibility to specify the unconsumed buffer size: "I didn't consume the last x bytes, please resend them at the beginning of the next buffer".
  • Full buffer translation for the simpler-to-use interface (slow).
  2. We need to provide:
  • a total order over content parts,
    • needs polishing: currently (#section, #chunk, [#window flush])
    • could add or replace that with positions in the decompressed stream
  • a reordering for outputting parts in the correct order (currently std::mutex + std::condition_variable),
  • a way to pin user data to threads (or maybe thread_local is good enough for that?)
  3. Support different threading mechanisms.
  • Basic implementation launching n threads with std::thread.
  • API for allocating local decompressor data for each thread, bridging them together and starting them from the user's threads.
  • Move away from the header-only distribution. It takes too long to parse and generates too much code if multiple compilation units include it (without LTO).
  • Hide the implementation behind a Pimpl class + a virtual interface (or a struct of callbacks).
  • C interface for FFI interoperability: wrapping function + struct of callbacks.
  • Provide an easy-to-use cmake subproject + sample project.
  • An API yielding (stream position, content position, context) is enough to generate an index. (overlaps with 2.)
  • API to decompress from a sync point with an externally provided context (may overlap with 3.)
  • Generic index format and high-level API to use it. As noted by Heng Li, the 32KB contexts can make for a heavy index if we want fine granularity. Large granularity is sufficient for faster parallel decompression, but not for efficient random accesses.
    • We could remove unused characters by re-indexing the back-references.
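
The "wrapping function + struct of callbacks" idea for the C interface could be sketched as below. None of these names exist in pugz: sketch_callbacks, sketch_decompress, run_sketch and collect are placeholders, and the body just forwards the input in 4-byte parts instead of decompressing. What the sketch tries to show is the boundary itself: plain function pointers plus a user_data pointer, and no C++ exception escaping through the extern "C" function.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdexcept>
#include <string>

extern "C" {
// Hypothetical C-compatible callback table.
typedef struct sketch_callbacks {
    void* user_data;
    // Called for each part; `position` is its offset in the decompressed stream.
    void (*on_buffer)(void* user_data, size_t position, const char* data, size_t len);
    // Optional; called once if decompression fails.
    void (*on_error)(void* user_data, const char* message);
} sketch_callbacks;

int sketch_decompress(const char* input, size_t input_len, const sketch_callbacks* cb);
}

namespace {
// C++ side of the wrapper. A real implementation would run the decompressor
// here; this stand-in only forwards the input in small parts.
void run_sketch(const char* input, size_t input_len, const sketch_callbacks& cb) {
    if (input == nullptr && input_len != 0) throw std::invalid_argument("null input");
    const size_t part = 4;
    for (size_t pos = 0; pos < input_len; pos += part) {
        const size_t len = input_len - pos < part ? input_len - pos : part;
        cb.on_buffer(cb.user_data, pos, input + pos, len);
    }
}
} // namespace

// Returns 0 on success and -1 on failure; exceptions are turned into on_error calls.
extern "C" int sketch_decompress(const char* input, size_t input_len, const sketch_callbacks* cb) {
    if (cb == nullptr || cb->on_buffer == nullptr) return -1;
    try {
        run_sketch(input, input_len, *cb);
        return 0;
    } catch (const std::exception& e) {
        if (cb->on_error != nullptr) cb->on_error(cb->user_data, e.what());
        return -1;
    }
}

// Example C++ caller: gathers all parts into one string through user_data.
std::string collect(const std::string& input) {
    std::string out;
    sketch_callbacks cb = {
        &out,
        [](void* user, size_t, const char* data, size_t len) {
            static_cast<std::string*>(user)->append(data, len);
        },
        nullptr};
    const int rc = sketch_decompress(input.data(), input.size(), &cb);
    assert(rc == 0);
    (void)rc;
    return out;
}
```

The same struct could be filled in from C or from any language with a C FFI, which is the reason for keeping it free of C++ types.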
