fehmisener / lz77-and-deflate-compressor

License: GNU General Public License v3.0

LZ77 and Deflate Compressor

This project implements the LZ77 and DEFLATE compression algorithms.
The main.c file contains the implementations of both algorithms. It also contains the Huffman coding and Huffman tree code required by the DEFLATE algorithm.

General summary of the project

The project aims to compress and store a text document with the .txt extension, which is received as input.
To do this, the LZSS (Lempel-Ziv-Storer-Szymanski) and DEFLATE algorithms are used. LZSS, a variant of LZ77, compresses by replacing repeated sequences with references to their earlier occurrences.

How the LZ77 algorithm works

LZ77 iterates sequentially through the input string and keeps the most recently processed text in a search buffer. The compression process can be divided into 3 steps:
1- Find the longest match between the string that starts at the current position and a pattern available in the search buffer.
2- Output a triple (o, l, c) where,
-> o: offset, represents the number of positions that we would need to move backwards in order to find the start of the matching string.
-> l: length, represents the length of the match.
-> c: character, represents the character that is found after the match.
3- Move the cursor l+1 positions to the right.
(Figure: Simple Example of LZ77)
For more information: https://towardsdatascience.com/how-data-compression-works-exploring-lz77-3a2c2e06c097
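
The three steps above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code from main.c: the names Lz77Token, lz77_encode and lz77_decode, the window parameter, and the brute-force match search are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One LZ77 triple (o, l, c). Hypothetical type for this sketch. */
typedef struct {
    size_t offset;  /* o: distance back to the start of the match (0 if none) */
    size_t length;  /* l: number of matched characters */
    char   next;    /* c: character that follows the match */
} Lz77Token;

/* Encodes n bytes of `in` into `out` and returns the number of tokens.
 * `window` is the assumed search-buffer size. `out` needs room for n tokens
 * (worst case: one token per input byte). */
size_t lz77_encode(const char *in, size_t n, size_t window, Lz77Token *out)
{
    size_t pos = 0, count = 0;
    assert(in != NULL && out != NULL);

    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = pos > window ? pos - window : 0;

        /* Step 1: find the longest match that starts in the search buffer. */
        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;
            /* Stop one byte early so a literal `next` always remains. */
            while (pos + len + 1 < n && in[cand + len] == in[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = pos - cand;
            }
        }

        /* Step 2: output the triple (o, l, c). */
        out[count].offset = best_off;
        out[count].length = best_len;
        out[count].next   = in[pos + best_len];
        count++;

        /* Step 3: move the cursor l+1 positions to the right. */
        pos += best_len + 1;
    }
    return count;
}

/* Rebuilds the original bytes from the tokens; returns the byte count.
 * Copying one byte at a time lets a match overlap the text being written. */
size_t lz77_decode(const Lz77Token *tok, size_t count, char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        for (size_t k = 0; k < tok[i].length; k++, pos++)
            out[pos] = out[pos - tok[i].offset];
        out[pos++] = tok[i].next;
    }
    return pos;
}
```

Calling lz77_encode on a buffer and passing the resulting tokens to lz77_decode should reproduce the input. The quadratic search keeps the sketch short; practical encoders usually index the window (for example with hash chains) instead.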

How the DEFLATE algorithm works

The DEFLATE compressed data format consists of a series of blocks, corresponding to successive blocks of input data. Each block is compressed using a combination of the LZ77 algorithm and Huffman coding. The LZ77 algorithm finds repeated substrings and replaces them with backward references (relative distance offsets). A reference can point to a duplicated string in the same block or in previous blocks, up to 32K input bytes back.
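
The Huffman stage assigns shorter codes to more frequent symbols. As a rough sketch of that idea, the C function below derives a code length per symbol from symbol frequencies. It is not the code from main.c, and it is not the full DEFLATE scheme, which additionally limits code lengths to 15 bits and uses canonical codes. The name huffman_code_lengths and the MAX_SYMBOLS limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SYMBOLS 256  /* assumed alphabet limit for this sketch */

/* Fills len[i] with the Huffman code length of symbol i, given freq[i].
 * Symbols with frequency 0 get length 0. Returns the number of used symbols.
 * Nodes 0..n-1 are leaves; internal nodes are appended after them. */
size_t huffman_code_lengths(const unsigned long *freq, size_t n, unsigned *len)
{
    unsigned long weight[2 * MAX_SYMBOLS];
    size_t parent[2 * MAX_SYMBOLS];
    int active[2 * MAX_SYMBOLS];
    size_t total = n, remaining = 0, used;

    assert(n >= 1 && n <= MAX_SYMBOLS);
    for (size_t i = 0; i < n; i++) {
        weight[i] = freq[i];
        parent[i] = i;              /* a node that is its own parent is a root */
        active[i] = freq[i] > 0;
        remaining += (size_t)active[i];
        len[i] = 0;
    }
    used = remaining;

    /* Repeatedly merge the two lightest active nodes into a new parent. */
    while (remaining > 1) {
        size_t a = total, b = total;  /* `total` doubles as a "none" marker */
        for (size_t i = 0; i < total; i++) {
            if (!active[i])
                continue;
            if (a == total || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b == total || weight[i] < weight[b]) {
                b = i;
            }
        }
        weight[total] = weight[a] + weight[b];
        parent[total] = total;
        active[total] = 1;
        parent[a] = parent[b] = total;
        active[a] = active[b] = 0;
        total++;
        remaining--;
    }

    /* A symbol's code length is its depth: count the steps up to the root. */
    for (size_t i = 0; i < n; i++)
        for (size_t node = i; parent[node] != node; node = parent[node])
            len[i]++;

    /* A lone symbol still needs one bit. */
    if (used == 1)
        for (size_t i = 0; i < n; i++)
            if (freq[i] > 0)
                len[i] = 1;

    return used;
}
```

For the frequencies {5, 2, 1, 1} this yields the lengths {1, 2, 3, 3}: the most frequent symbol gets the shortest code. In DEFLATE the symbols fed to this stage are the literals, match lengths and distances produced by the LZ77 step.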

Project language

It is written in C, and the Dev-C++ IDE was used.
!!! IMPORTANT WARNING: The program has problems compressing text documents larger than 200-300 KB. (This is also explained in the report.) !!!
!!!!! Just put the text documents to be tested in the input folder; neither algorithm needs another folder. !!!!!

USING THE PROGRAM STEP BY STEP

1- Since two different compression algorithms are used, there are two different input folders: (INPUT) and (INPUTDEFLATE).
2- Of these, INPUT contains the txt files required for both the "LZSS algorithm" and "DEFLATE". You don't need to add anything to the other folders.
3- The txt files must be the same in number and content so that the results can be compared.
4- The names of the txt files placed in the folders are set on lines 461-465 of the code. Edit those lines so that they match your file names.
5- The same steps apply to the output files.
6- By default, there are 3 test txt files in the program folder.
7- After the necessary steps are done, all you have to do is compile and run the program.
8- When the program runs, the LZSS and DEFLATE algorithms run in turn and perform the compression.
9- The program displays the compression rates and file sizes to the user.
10- Compression rates may vary depending on the test text used. If not enough matches are found, the output is inflated instead of compressed ("PADDING", one of the disadvantages of the algorithm) and the file size increases. In that case the compression ratio can be negative ("-").
11- After the program runs, the output files are created at the locations specified on lines 467 and 477: files with the LZFG extension (our own extension) in the "output" folder for LZ77, and files with the .def extension (short for Deflate) in the "outputDEFLATE" folder for DEFLATE.
12- These files are the compressed versions of the .txt files in the input folder.
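
To make the negative ratio in step 10 concrete, here is a small C sketch of one common way to compute a compression rate. The formula (percentage of space saved relative to the original size) and the name compression_ratio_percent are assumptions for illustration. The program may compute its rate differently.

```c
#include <assert.h>

/* Assumed definition: percentage of space saved relative to the original.
 * A positive result means the file shrank. A negative result means the
 * output is larger than the input. */
double compression_ratio_percent(unsigned long original_bytes,
                                 unsigned long compressed_bytes)
{
    assert(original_bytes > 0);
    return 100.0 * (1.0 - (double)compressed_bytes / (double)original_bytes);
}
```

Under this definition, a 1000-byte file compressed to 250 bytes gives 75, while a 100-byte file that grows to 120 bytes gives a negative value.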

Resources Used

https://cs.stanford.edu/people/eroberts/courses/soco/projects/data-compression/lossless/lz77/index.htm
https://pdfs.semanticscholar.org/e8d7/c01594cf4359c3d50aef7db88b0153c7fcbd.pdf
http://corpus.canterbury.ac.nz/descriptions/#cantrbry
http://altanmesut.trakya.edu.tr/pubs/DR_Tez.pdf
https://www.youtube.com/watch?v=0vMyw4Lv3Hw
https://www.youtube.com/watch?v=goOa3DGezUA
https://www.youtube.com/watch?v=JsTptu56GM8
https://ysar.net/algoritma/lz77.html
https://zlib.net/feldspar.html
https://sites.google.com/site/datacompressionguide/lz77
http://michael.dipperstein.com/lzss/
http://www.mathcs.emory.edu/~cheung/Courses/255/Syllabus/1-C-intro/bitarray.html
https://github.com/dbrgn/algorithms/blob/master/lz/lz77.c
https://github.com/ggz/huffman_coding
https://people.ok.ubc.ca/ylucet/DS/Huffman.html
https://www.researchgate.net/figure/Simplified-DEFLATEalgorithm_fig3_228411140
