amit1rrr / numcompress

Python package to compress numerical series & numpy arrays into strings

License: MIT License

Python 100.00%
compression decompression series-data compression-library numpy-arrays

numcompress's Introduction

numcompress

Simple way to compress and decompress numerical series & numpy arrays.

  • Easily gets you above 80% compression ratio
  • You can specify the precision you need for floating-point values (up to 10 decimal places)
  • Useful to store or transmit stock prices, monitoring data & other time series data in compressed string format

The compression algorithm is based on Google's encoded polyline format. I modified it to preserve arbitrary precision and to work on any numerical series. The work was motivated by the usefulness of the time-aware polyline built by Arjun Attam at HyperTrack. After building this, I came across arrays, which are much more efficient than lists in terms of memory footprint. You might consider using them instead of numcompress if you don't need conversion to a string for transmission or storage.
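To make the idea concrete, here is a minimal sketch of a polyline-style encoder and decoder for a numerical series. It is an illustration only, not the library's source: the names `encode_series` and `decode_series` are hypothetical, and storing the precision in the first character is an assumption inferred from the example strings in the Usage section. Each value is scaled by 10^precision, rounded to an integer, delta-encoded against the previous value, and written as 5-bit chunks shifted into printable ASCII.

```python
def encode_series(series, precision=3):
    """Polyline-style sketch: scale, delta-encode, zigzag, emit 5-bit chunks."""
    factor = 10 ** precision
    out = [chr(precision + 63)]  # assumption: first character carries the precision
    previous = 0
    for value in series:
        scaled = int(round(value * factor))
        delta = scaled - previous
        previous = scaled
        # Zigzag: move the sign into the lowest bit so small deltas stay small.
        zigzag = ~(delta << 1) if delta < 0 else (delta << 1)
        # Emit 5 bits at a time, least significant first; 0x20 flags "more chunks follow".
        while zigzag >= 0x20:
            out.append(chr((0x20 | (zigzag & 0x1F)) + 63))
            zigzag >>= 5
        out.append(chr(zigzag + 63))
    return "".join(out)


def decode_series(text):
    """Inverse of encode_series; always returns floats."""
    factor = 10 ** (ord(text[0]) - 63)
    values, previous = [], 0
    shift = accumulator = 0
    for char in text[1:]:
        chunk = ord(char) - 63
        accumulator |= (chunk & 0x1F) << shift
        shift += 5
        if chunk < 0x20:  # no continuation flag: this value is complete
            delta = ~(accumulator >> 1) if accumulator & 1 else accumulator >> 1
            previous += delta
            values.append(previous / factor)
            shift = accumulator = 0
    return values
```

Because only the differences between consecutive values are stored, a slowly changing series needs fewer characters per value than a series that jumps around.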

Installation

pip install numcompress

Usage

from numcompress import compress, decompress

# Integers
>>> compress([14578, 12759, 13525])
'B_twxZnv_nB_bwm@'

>>> decompress('B_twxZnv_nB_bwm@')
[14578.0, 12759.0, 13525.0]

# Floats - lossless compression (precision covers every decimal place in the input)
# precision argument specifies how many decimal places to preserve, defaults to 3
>>> compress([145.7834, 127.5989, 135.2569], precision=4)
'Csi~wAhdbJgqtC'

>>> decompress('Csi~wAhdbJgqtC')
[145.7834, 127.5989, 135.2569]

# Floats - lossy compression (precision lower than the input's decimal places)
>>> compress([145.7834, 127.5989, 135.2569], precision=2)
'Acn[rpB{n@'

>>> decompress('Acn[rpB{n@')
[145.78, 127.6, 135.26]

# Compressing and decompressing numpy arrays
>>> from numcompress import compress_ndarray, decompress_ndarray
>>> import numpy as np

>>> series = np.random.randint(1, 100, 25).reshape(5, 5)
>>> compressed_series = compress_ndarray(series)
>>> decompressed_series = decompress_ndarray(compressed_series)

>>> series
array([[29, 95, 10, 48, 20],
       [60, 98, 73, 96, 71],
       [95, 59,  8,  6, 17],
       [ 5, 12, 69, 65, 52],
       [84,  6, 83, 20, 50]])

>>> compressed_series
'5*5,Bosw@_|_Cn_eD_fiA~tu@_cmA_fiAnyo@o|k@nyo@_{m@~heAnrbB~{BonT~lVotLoinB~xFnkX_o}@~iwCokuCn`zB_ry@'

>>> decompressed_series
array([[29., 95., 10., 48., 20.],
       [60., 98., 73., 96., 71.],
       [95., 59.,  8.,  6., 17.],
       [ 5., 12., 69., 65., 52.],
       [84.,  6., 83., 20., 50.]])

>>> (series == decompressed_series).all()
True
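The compressed string above starts with `5*5,` which looks like the array shape followed by the encoded values. The following is a hedged sketch of how such a prefix could be split off and used to rebuild the nesting in pure Python. It is inferred from the example output only; `split_shape_prefix` and `reshape_flat` are hypothetical helpers, not functions exported by numcompress.

```python
def split_shape_prefix(text):
    """Split 'dim*dim*...,payload' into a shape tuple and the payload string."""
    # Only the first comma is treated as the separator between shape and payload.
    shape_text, payload = text.split(",", 1)
    shape = tuple(int(dim) for dim in shape_text.split("*"))
    return shape, payload


def reshape_flat(values, shape):
    """Rebuild nested lists of the given shape from a flat, row-major list."""
    if len(shape) == 1:
        return list(values)
    step = len(values) // shape[0]
    return [
        reshape_flat(values[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]


shape, payload = split_shape_prefix("5*5,Bosw@")
rows = reshape_flat(list(range(6)), (2, 3))
```

In practice `decompress_ndarray` does all of this for you; the sketch only shows why a single string is enough to carry both the shape and the data.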

Compression Ratio

Test      # of Numbers  Compression ratio
Integers  10k           91.14%
Floats    10k           81.35%

You can run the test suite with the -s switch to see the compression ratio. You can also modify the tests to see what compression ratio you get for your own input.

pytest -s

Here's a quick example showing compression ratio:

>>> import random, sys
>>> series = random.sample(range(1, 100000), 50000)  # generate 50k random numbers between 1 and 100k
>>> text = compress(series)  # apply compression

>>> original_size = sum(sys.getsizeof(i) for i in series)  # exact sizes vary with Python version and platform
>>> original_size
1200000

>>> compressed_size = sys.getsizeof(text)
>>> compressed_size
284092

>>> compression_ratio = ((original_size - compressed_size) * 100.0) / original_size
>>> compression_ratio
76.32566666666666

We get ~76% compression for 50k random numbers between 1 and 100k. This ratio increases for real-world numerical series, because the difference between consecutive numbers tends to be smaller. Think of stock prices, monitoring data, and other time series.
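The effect of smaller differences can be estimated with a short sketch. It assumes a polyline-style scheme (scale by 10^precision, fold the sign into the lowest bit, 5 bits per character); `encoded_length` is a hypothetical helper for illustration, not part of numcompress.

```python
def encoded_length(delta, precision=3):
    """Characters a polyline-style encoder would need for a single delta."""
    scaled = int(round(delta * 10 ** precision))
    # Zigzag: the sign goes into the lowest bit.
    zigzag = ~(scaled << 1) if scaled < 0 else (scaled << 1)
    # Each character carries 5 bits; a zero delta still takes one character.
    return max(1, -(-zigzag.bit_length() // 5))


small_step = encoded_length(0.5)    # e.g. a price moving by half a unit
unit_step = encoded_length(1)       # a counter incrementing by one
large_jump = encoded_length(50000)  # a jump typical of uniformly random data
```

Under these assumptions a step of 0.5 costs 2 characters and a step of 1 costs 3, while a jump of 50000 costs 6, which is why random input compresses worse than a typical time series.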

Contribute

If you see any problem, open an issue or send a pull request. You can write to me at [email protected]

numcompress's People

Contributors

adnanmuttaleb, amit1rrr

numcompress's Issues

How about multi-dimensional lists (numpy arrays)?

Great work. I know I can write a small loop that collects the one-dimensional arrays from a multi-dimensional numpy array, compresses each one, and stores the results in a list of compressed strings. I can then get the numbers back by calling decompress on each string and rebuilding the original N-dimensional numpy array.

But still, do you have any plans to extend numcompress to take multi-dimensional array input?

Thanks.

No special characters

Good morning,
I would like to know whether it is possible to compress a list without using special characters, so that the output contains only numbers and letters.

Thank you very much.
