
kodejuice / arithmetic-compressor


An implementation of the Arithmetic Coding algorithm in Python.

License: MIT License

Python 100.00%
arithmetic-coding compression encoding-library entropy information-theory neural-network

arithmetic-compressor's Issues

AssertionError: Low or high out of range

I'm trying to use this module on enwik5 data (10 000 bytes), but I get this error:

AssertionError: Low or high out of range

Are there any additional limitations in the implementation, or am I doing something wrong?

The script below works fine with enwik4 data (1 000 bytes).

I count the symbol statistics myself and pass them to a StaticModel, but I get either the "Low or high out of range" assertion above or "ValueError: Symbol has zero frequency".

enwik5.zip

fn = 'enwik5'

print(fn)

def read_bytes(path):
    with open(path, 'rb') as f:
        return list(f.read())

data = read_bytes(fn)
nsyms = 256
stats = [0] * nsyms
for c in data:
    stats[c] += 1

from arithmetic_compressor import AECompressor

from arithmetic_compressor.models.base_adaptive_model import BaseFrequencyTable
from arithmetic_compressor.util import *

SCALE_FACTOR = 4096

class StaticModel:
  """A static model, which does not adapt to input data or statistics."""

  def __init__(self, counts_dict):
    #vals = (v for k, v in counts_dict.items())
    #counts_sum = sum(vals)
    #probability = {k: v / counts_sum for k, v in counts_dict.items()}
    #print(probability)
    probability = counts_dict

    symbols = list(probability.keys())

    self.name = "Static"
    self.symbols = symbols
    self.__prob = dict(probability)

    # compute cdf from given probability
    cdf = {}
    prev_freq = 0
    self.freq = {sym: round(SCALE_FACTOR * prob)
                 for sym, prob in probability.items()}
    # each symbol gets the half-open range [prev_freq, prev_freq + sym_freq)
    for sym, sym_freq in self.freq.items():
      cdf[sym] = Range(prev_freq, prev_freq + sym_freq)
      prev_freq += sym_freq
    self.cdf_object = cdf

  def cdf(self):
    return self.cdf_object

  def probability(self):
    return self.__prob

  def predict(self, symbol):
    assert symbol in self.symbols
    return self.probability()[symbol]

  def update(self, symbol):
    pass

  def test_model(self, gen_random=True, N=10000, custom_data=None):
    self.name = "Static Model"
    return BaseFrequencyTable.test_model(self, gen_random, N, custom_data)

freq_map = {
    sym: freq for sym, freq in enumerate(stats)
    if freq > 0
}

model = StaticModel(freq_map)
coder = AECompressor(model)

N = len(data)
compressed = coder.compress(data)
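
One plausible reading of the two errors, not confirmed against the library's internals: the script passes raw counts where the model expects probabilities, so round(SCALE_FACTOR * prob) yields frequencies whose total is about 10 000 times SCALE_FACTOR. With the commented-out normalization enabled instead, a symbol that occurs once in 10 000 bytes gets round(4096 / 10000) = 0. The sketch below shows one way to scale counts to integer frequencies that sum to exactly the scale factor while keeping every observed symbol at 1 or more. The helper name scale_counts is hypothetical and not part of arithmetic_compressor.

```python
def scale_counts(counts, total=4096):
    """Scale raw symbol counts to integer frequencies that sum to `total`.

    Hypothetical helper: every symbol with a nonzero count keeps a
    frequency of at least 1, so no observed symbol ends up at zero.
    """
    present = {s: c for s, c in counts.items() if c > 0}
    if not present:
        raise ValueError("no symbol has a nonzero count")
    if len(present) > total:
        raise ValueError("more distinct symbols than frequency slots")

    n = sum(present.values())
    # Proportional scaling (rounded down), clamped to a minimum of 1.
    freqs = {s: max(1, c * total // n) for s, c in present.items()}

    # Flooring and clamping can leave the sum off by a small amount.
    # Absorb the difference in the most frequent symbols, never going below 1.
    diff = total - sum(freqs.values())
    for s in sorted(freqs, key=freqs.get, reverse=True):
        if diff == 0:
            break
        step = diff if diff > 0 else max(diff, 1 - freqs[s])
        freqs[s] += step
        diff -= step
    return freqs


if __name__ == "__main__":
    # Toy counts: one dominant symbol, one common symbol, one very rare symbol.
    toy = {0: 9000, 1: 999, 2: 1}
    scaled = scale_counts(toy)
    print(scaled, sum(scaled.values()))
```

If the model scales probabilities by the same factor, as the StaticModel above does, passing {sym: f / 4096 for sym, f in scaled.items()} should reproduce these integer frequencies, since dividing by a power of two is exact in floating point. Whether that alone removes the assertion depends on the coder's precision, which this sketch does not address.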
