vyx-'s Introduction

What is this?

This is an attempt to compress Vyxal code. It's a Huffman tree compressor trained on 1500 Vyxal programs from Code Golf Stack Exchange (CGSE).
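For readers unfamiliar with Huffman coding, here is a minimal sketch of the general technique: count symbol frequencies in a corpus, merge the two lightest subtrees until one tree remains, then read each symbol's code off the path from the root. It is an illustration only, not this project's code. The function names are hypothetical, and it builds a single tree over characters, whereas the project loads a "forest" whose structure is not described here.

```python
import heapq
from collections import Counter


def build_codes(corpus):
    """Build a Huffman code table (symbol -> bit string) from a list of strings."""
    counts = Counter(ch for program in corpus for ch in program)
    # Heap entries are (weight, tie_breaker, tree). A tree is either a single
    # symbol (leaf) or a (left, right) tuple (internal node). The unique
    # tie_breaker keeps Python from ever comparing two trees directly.
    heap = [(weight, i, symbol)
            for i, (symbol, weight) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            # A corpus with one distinct symbol still needs a 1-bit code.
            codes[tree] = prefix or "0"

    if heap:
        walk(heap[0][2], "")
    return codes


def encode(text, codes):
    """Concatenate the code of every symbol in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a full code is seen."""
    reverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)
```

Frequent symbols end up near the root and get short codes, which is why a tree trained on real Vyxal programs can shorten typical programs while making unusual ones longer.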

How good is it?

Medium

Overall

N=1831    Original  Compressed  Difference  Ratio
mean         31.12       26.24       -4.87   0.92
stdev       191.14      131.28       92.76   0.10
min           0.00        0.00    -3680.00   0.00
10% med       4.00        3.00       -3.00   0.80
25% med       6.00        5.00       -1.00   0.87
median       10.00        9.00       -1.00   0.92
75% med      19.00       18.00        0.00   1.00
90% med      48.00       44.80        0.00   1.00
max        6553.00     5131.00       37.00   2.00
min mode      6.00        5.00       -1.00   1.00
max mode      6.00        5.00       -1.00   1.00

Short programs (length < 10)

N=906     Original  Compressed  Difference  Ratio
mean          5.67        5.17       -0.49   0.91
stdev         2.20        2.13        0.54   0.13
min           0.00        0.00       -2.00   0.00
10% med       3.00        2.00       -1.00   0.75
25% med       4.00        3.00       -1.00   0.83
median        6.00        5.00        0.00   1.00
75% med       8.00        7.00        0.00   1.00
90% med       9.00        8.00        0.00   1.00
max           9.00        9.00        1.00   2.00
min mode      6.00        5.00        0.00   1.00
max mode      6.00        5.00        0.00   1.00

Medium programs (10 <= length < 100)

N=843     Original  Compressed  Difference  Ratio
mean         25.17       23.29       -1.88   0.92
stdev        18.78       17.58        2.65   0.06
min          10.00        8.00      -25.00   0.69
10% med      10.00       10.00       -4.00   0.85
25% med      13.00       12.00       -2.00   0.89
median       18.00       16.00       -1.00   0.92
75% med      30.00       28.00       -1.00   0.96
90% med      52.60       49.00        0.00   1.00
max          99.00       97.00        7.00   1.18
min mode     10.00       10.00       -1.00   1.00
max mode     10.00       10.00       -1.00   1.00

Long programs (100 <= length)

N=82      Original  Compressed  Difference  Ratio
mean        373.39      289.45      -83.94   0.91
stdev       833.94      557.59      433.24   0.15
min         102.00       77.00    -3680.00   0.13
10% med     117.90      108.00      -84.80   0.76
25% med     144.00      128.00      -36.25   0.82
median      220.00      201.00      -15.00   0.89
75% med     268.25      268.25        9.50   1.05
90% med     533.30      470.40       19.00   1.08
max        6553.00     5131.00       37.00   1.09
min mode    250.00      108.00       18.00   0.92
max mode    250.00      108.00       18.00   1.07

It saves around 4-5 bytes on average (but with huge variability) and does nothing most of the time. Occasionally it makes the program slightly longer. Savings are larger for long programs, but there is still a small average reduction even for very short programs.
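One detail of the tables is easy to misread: in the overall table the mean ratio is 0.92, yet the mean compressed size divided by the mean original size is 26.24 / 31.12, or about 0.84. That is consistent with the ratio being computed per program and then averaged, which is an assumption here rather than something the benchmark script is known to do. The sketch below uses made-up lengths and a hypothetical summarize helper to show how the two figures diverge when one long program dominates.

```python
from statistics import mean


def summarize(pairs):
    """pairs: list of (original_length, compressed_length) in bytes."""
    differences = [compressed - original for original, compressed in pairs]
    # Skip empty programs to avoid dividing by zero. This is an assumption;
    # the real benchmark may treat zero-length programs differently.
    ratios = [compressed / original for original, compressed in pairs if original]
    originals = [original for original, _ in pairs]
    compresseds = [compressed for _, compressed in pairs]
    return {
        "mean_difference": mean(differences),
        "mean_ratio": mean(ratios),                           # average of per-program ratios
        "ratio_of_means": mean(compresseds) / mean(originals),  # dominated by long programs
    }


# Made-up example: one short program that does not shrink, one long one that halves.
stats = summarize([(10, 10), (1000, 500)])
```

With these two programs the mean ratio is 0.75 while the ratio of the means is roughly 0.50, so a single large saving (such as the -3680 outlier in the tables) moves the mean difference far more than it moves the mean ratio.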

How do I run it?

Collect your own data

If you want to collect your own data, set the STACK_API_KEY environment variable to your Stack Apps API key, then remove the if False guard from collect_data.py and run it.
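As a rough illustration of how a script can pick up that environment variable, the sketch below reads STACK_API_KEY and builds a Stack Exchange API request URL for answers on the codegolf site. The answers_url helper and the choice of endpoint and parameters are assumptions for illustration; collect_data.py may query the API differently. The sketch only builds the URL and does not send a request.

```python
import os
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"


def answers_url(page=1, environ=os.environ):
    """Build a request URL for one page of Code Golf answers (hypothetical helper)."""
    key = environ.get("STACK_API_KEY")
    if not key:
        raise RuntimeError("Set STACK_API_KEY to your Stack Apps API key first")
    params = {
        "site": "codegolf",     # Code Golf Stack Exchange
        "filter": "withbody",   # built-in filter that includes post bodies
        "pagesize": 100,
        "page": page,
        "key": key,
    }
    return f"{API_ROOT}/answers?{urlencode(params)}"
```

Failing early with a clear message when the variable is missing is friendlier than letting the API reject the request later.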

Building a tree from the data

This depends on the code_json.json file created in the previous step. You could also choose to replace this with your own corpus.

You can then run analize_data.py to build a tree.

To visualize the tree you can use the command dot -Tsvg graph.dot -o out.svg. It's debatable whether this visualization is helpful.
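To give a feel for what a DOT file for such a tree contains, here is a small sketch that turns a binary tree into Graphviz DOT text. It assumes a hypothetical representation (nested two-element tuples for internal nodes, strings for leaves); the graph.dot written by analize_data.py may be laid out differently.

```python
def tree_to_dot(tree):
    """Render a nested-tuple binary tree as Graphviz DOT text."""
    lines = ["digraph huffman {"]
    counter = 0

    def visit(node):
        nonlocal counter
        name = f"n{counter}"
        counter += 1
        if isinstance(node, tuple):
            # Internal node: unlabeled, with edges labeled by the bit taken.
            lines.append(f'  {name} [label=""];')
            for bit, child in enumerate(node):
                child_name = visit(child)
                lines.append(f'  {name} -> {child_name} [label="{bit}"];')
        else:
            # Leaf: escape backslashes and quotes so the label stays valid DOT.
            label = node.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'  {name} [label="{label}", shape=box];')
        return name

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


dot_text = tree_to_dot((("b", "c"), "a"))
```

Writing dot_text to a file and running the dot command on it draws the root at the top with "a" one edge away and "b" and "c" two edges away, mirroring their code lengths.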

Use the data

encode_decode.py benchmarks the encoding and produces tables like the ones above. It also checks that the encoding is sane.

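To use it in your own script:

from vyxμ import encode_decode

# Load the trained forest, encode `string` (the Vyxal program to compress)
# to bits, then pack those bits into bytes.
forest = encode_decode.load_forest()
encoded_string = encode_decode.bits_to_bytes(encode_decode.encode(string, forest))

To decode, apply the inverse operations in the reverse order.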
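The packing step and its inverse can be sketched as follows. This is a stand-in, not the project's bits_to_bytes: the names pack_bits and unpack_bits are hypothetical, bits are assumed to be a string of '0' and '1' characters, and the bit count is passed separately so the padding can be dropped again. The real module may represent bits and handle padding differently.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes, zero-padding the last byte."""
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def unpack_bits(data, bit_count):
    """Inverse of pack_bits; bit_count tells us how many bits are real data."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[:bit_count]


bits = "1011001110"
packed = pack_bits(bits)
restored = unpack_bits(packed, len(bits))
```

Because a Huffman-coded program rarely ends on a byte boundary, any byte-level format needs some way to tell padding from data; passing the length explicitly is the simplest option for a sketch.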
