custom-bzip

Implementation of a text encoder + decoder similar to bzip.

How to run the program

This program consists of three scripts:

  1. st2sa.py - Builds a suffix array for a string using Ukkonen's algorithm and returns it. It takes the filename of the target string as its argument.
  2. bwtzip.py - Zips a given input string into a binary file. It takes the filename of the target string as its argument.
  3. bwtunzip.py - Unzips a file produced by bwtzip.py. It takes the location of the zipped file as its argument.
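As a minimal sketch of how the scripts might be driven, assuming each one is run with the Python interpreter and a single positional filename (the exact command-line form is not stated here), the commands could be built like this. The helper names `build_command` and `run_script`, and the file name `input.txt`, are hypothetical.

```python
import subprocess
import sys


def build_command(script: str, path: str) -> list[str]:
    """Return the argument list for running one script on one file.

    Assumption: each script accepts exactly one positional filename.
    """
    return [sys.executable, script, path]


def run_script(script: str, path: str) -> subprocess.CompletedProcess:
    """Run a script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(script, path), check=True)
```

For example, `run_script("bwtzip.py", "input.txt")` would compress `input.txt`. The name of the resulting binary file depends on the script itself, so it is not guessed here.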

Encoder and Decoder Design

The encoder combines several techniques to reduce the size of text efficiently, and the decoder reverses them. The encoder first uses Ukkonen's algorithm to generate a suffix array. The BWT stage then uses this suffix array and applies L-F mapping to compute the BWT-encoded text. Next, the run-length encoder, which consists of Huffman and Elias encoders, encodes the BWT-encoded text into a bitstream run by run. The final binary file starts with the number of unique characters in the text, followed by the Huffman table and the encoded text.

[Figure: Encoder Design]
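The stages described above can be sketched in a few short functions. This is an illustrative sketch under stated assumptions, not the repository's code: the suffix array is built by naive sorting rather than Ukkonen's algorithm, the input is assumed to end with a unique `$` terminator that sorts before every other character, Elias gamma is used as one possible Elias code, L-F mapping appears in the inverse transform, and the file header (unique-character count and Huffman table) is omitted. All function names are hypothetical.

```python
import heapq
from collections import Counter


def suffix_array(text: str) -> list[int]:
    """Naive suffix array by sorting suffixes (stand-in for Ukkonen's algorithm)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """Each BWT character is the one preceding a sorted suffix (index 0 wraps around)."""
    return "".join(text[i - 1] for i in sa)


def inverse_bwt(bwt: str, terminator: str = "$") -> str:
    """Invert the BWT with L-F mapping; assumes a unique, smallest terminator."""
    counts: dict[str, int] = {}
    rank = []  # rank[i] = occurrences of bwt[i] before position i
    for ch in bwt:
        rank.append(counts.get(ch, 0))
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0  # first[ch] = first row that starts with ch
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    out, row = [], 0  # row 0 is the rotation that starts with the terminator
    for _ in range(len(bwt) - 1):
        ch = bwt[row]
        out.append(ch)
        row = first[ch] + rank[row]  # the L-F step
    return "".join(reversed(out)) + terminator


def run_lengths(s: str) -> list[tuple[str, int]]:
    """Split a string into (character, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def elias_gamma(n: int) -> str:
    """Elias gamma code: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free Huffman code from character frequencies."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # The integer in each tuple breaks ties so dicts are never compared.
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode_runs(bwt: str) -> str:
    """Encode each run as Huffman(character) followed by EliasGamma(run length)."""
    codes = huffman_codes(bwt)
    return "".join(codes[ch] + elias_gamma(n) for ch, n in run_lengths(bwt))
```

For the input `banana$`, `bwt_from_suffix_array("banana$", suffix_array("banana$"))` gives `annb$aa`, and `inverse_bwt("annb$aa")` recovers `banana$`. Pairing a Huffman code for the character with an Elias code for the run length keeps each run self-delimiting, so a decoder can read the bitstream one run at a time.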

Contributors

superleesa
