GithubHelp home page GithubHelp logo

i-e-b / ryg_rans Goto Github PK

View Code? Open in Web Editor NEW

This project forked from rygorous/ryg_rans

0.0 2.0 0.0 341 KB

Simple rANS encoder/decoder (arithmetic coding-ish entropy coder).

License: Other

Makefile 0.49% C++ 58.23% C 41.28%

ryg_rans's Introduction

Purpose of this fork:

This is here to experiment with a managed C# port. Please see the original for the definitive version

Original Readme:

This is a public-domain implementation of several rANS variants. rANS is an entropy coder from the ANS family, as described in Jarek Duda's paper "Asymmetric numeral systems" (http://arxiv.org/abs/1311.2540).

  • "rans_byte.h" has a byte-aligned rANS encoder/decoder and some comments on how to use it. This implementation should work on all 32-bit architectures. "main.cpp" is an example program that shows how to use it.
  • "rans64.h" is a 64-bit version that emits entire 32-bit words at a time. It is (usually) a good deal faster than rans_byte on 64-bit architectures, and also makes for a very precise arithmetic coder (i.e. it gets quite close to entropy). The trade-off is that this version will be slower on 32-bit machines, and the output bitstream is not endian-neutral. "main64.cpp" is the corresponding example.
  • "rans_word_sse41.h" has a SIMD decoder (SSE 4.1 to be precise) that does IO in units of 16-bit words. It has less precision than either rans_byte or rans64 (meaning that it doesn't get as close to entropy) and requires at least 4 independent streams of data to be useful; however, it is also a good deal faster. "main_simd.cpp" shows how to use it.

See my blog http://fgiesen.wordpress.com/ for some notes on the design.

I've also written a paper on interleaving output streams from multiple entropy coders:

http://arxiv.org/abs/1402.3392

this documents the underlying design for "rans_word_sse41", and also shows how the same approach generalizes to e.g. GPU implementations, provided there are enough independent contexts coded at the same time to fill up a warp/wavefront or whatever your favorite GPU's terminology for its native SIMD width is.

Finally, there's also "main_alias.cpp", which shows how to combine rANS with the alias method to get O(1) symbol lookup with table size proportional to the number of symbols. I presented an overview of the underlying idea here:

http://fgiesen.wordpress.com/2014/02/18/rans-with-static-probability-distributions/

Results on my machine (Sandy Bridge i7-2600K) with rans_byte in 64-bit mode:


rANS encode: 12896496 clocks, 16.8 clocks/symbol (192.8MiB/s) 12486912 clocks, 16.2 clocks/symbol (199.2MiB/s) 12511975 clocks, 16.3 clocks/symbol (198.8MiB/s) 12660765 clocks, 16.5 clocks/symbol (196.4MiB/s) 12550285 clocks, 16.3 clocks/symbol (198.2MiB/s) rANS: 435113 bytes 17023550 clocks, 22.1 clocks/symbol (146.1MiB/s) 18081509 clocks, 23.5 clocks/symbol (137.5MiB/s) 16901632 clocks, 22.0 clocks/symbol (147.1MiB/s) 17166188 clocks, 22.3 clocks/symbol (144.9MiB/s) 17235859 clocks, 22.4 clocks/symbol (144.3MiB/s) decode ok!

interleaved rANS encode: 9618004 clocks, 12.5 clocks/symbol (258.6MiB/s) 9488277 clocks, 12.3 clocks/symbol (262.1MiB/s) 9460194 clocks, 12.3 clocks/symbol (262.9MiB/s) 9582025 clocks, 12.5 clocks/symbol (259.5MiB/s) 9332017 clocks, 12.1 clocks/symbol (266.5MiB/s) interleaved rANS: 435117 bytes 10687601 clocks, 13.9 clocks/symbol (232.7MB/s) 10637918 clocks, 13.8 clocks/symbol (233.8MB/s) 10909652 clocks, 14.2 clocks/symbol (227.9MB/s) 10947637 clocks, 14.2 clocks/symbol (227.2MB/s) 10529464 clocks, 13.7 clocks/symbol (236.2MB/s) decode ok!


And here's rans64 in 64-bit mode:


rANS encode: 10256075 clocks, 13.3 clocks/symbol (242.3MiB/s) 10620132 clocks, 13.8 clocks/symbol (234.1MiB/s) 10043080 clocks, 13.1 clocks/symbol (247.6MiB/s) 9878205 clocks, 12.8 clocks/symbol (251.8MiB/s) 10122645 clocks, 13.2 clocks/symbol (245.7MiB/s) rANS: 435116 bytes 14244155 clocks, 18.5 clocks/symbol (174.6MiB/s) 15072524 clocks, 19.6 clocks/symbol (165.0MiB/s) 14787604 clocks, 19.2 clocks/symbol (168.2MiB/s) 14736556 clocks, 19.2 clocks/symbol (168.8MiB/s) 14686129 clocks, 19.1 clocks/symbol (169.3MiB/s) decode ok!

interleaved rANS encode: 7691159 clocks, 10.0 clocks/symbol (323.3MiB/s) 7182692 clocks, 9.3 clocks/symbol (346.2MiB/s) 7060804 clocks, 9.2 clocks/symbol (352.2MiB/s) 6949201 clocks, 9.0 clocks/symbol (357.9MiB/s) 6876415 clocks, 8.9 clocks/symbol (361.6MiB/s) interleaved rANS: 435120 bytes 8133574 clocks, 10.6 clocks/symbol (305.7MB/s) 8631618 clocks, 11.2 clocks/symbol (288.1MB/s) 8643790 clocks, 11.2 clocks/symbol (287.7MB/s) 8449364 clocks, 11.0 clocks/symbol (294.3MB/s) 8331444 clocks, 10.8 clocks/symbol (298.5MB/s) decode ok!


Finally, here's the rans_word_sse41 decoder on an 8-way interleaved stream:


SIMD rANS: 435626 bytes 4597641 clocks, 6.0 clocks/symbol (540.8MB/s) 4514356 clocks, 5.9 clocks/symbol (550.8MB/s) 4780918 clocks, 6.2 clocks/symbol (520.1MB/s) 4532913 clocks, 5.9 clocks/symbol (548.5MB/s) 4554527 clocks, 5.9 clocks/symbol (545.9MB/s) decode ok!


There's also an experimental 16-way interleaved AVX2 version that hits faster rates still, developed by my colleague Won Chun; I will post it soon.

Note that this is running "book1" which is a relatively short test, and the measurement setup is not great, so take the results with a grain of salt.

-Fabian "ryg" Giesen, Feb 2014.

ryg_rans's People

Contributors

rygorous avatar i-e-b avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.