What is this?

This is an attempt to compress Vyxal code. It's a Huffman tree compressor trained on 1500 Vyxal programs from Code Golf Stack Exchange (CGSE).
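For readers unfamiliar with Huffman coding, here is a minimal sketch of how a code table can be derived from symbol frequencies in a corpus. It is not the code in this repository: the function name `build_huffman_codes`, the toy corpus, and the choice of single characters as symbols are all assumptions made for illustration.

```python
import heapq
from collections import Counter


def build_huffman_codes(corpus):
    """Return a dict mapping each character to its bit string.

    Hypothetical helper: symbols are single characters here, which may
    not match how the real compressor splits programs.
    """
    counts = Counter("".join(corpus))
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Made-up corpus, only to show that frequent symbols get shorter codes.
toy_corpus = ["λλλ+", "λ+d", "λλ"]
codes = build_huffman_codes(toy_corpus)
```

In this toy corpus `λ` is the most frequent symbol, so it ends up with a one-bit code while `+` and `d` get two bits each.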

How good is it?

Medium

Overall

N=1831	Original	Compressed	Difference	Ratio
mean	31.12	26.24	-4.87	0.92
stdev	191.14	131.28	92.76	0.10
min	0.00	0.00	-3680.00	0.00
10% med	4.00	3.00	-3.00	0.80
25% med	6.00	5.00	-1.00	0.87
median	10.00	9.00	-1.00	0.92
75% med	19.00	18.00	0.00	1.00
90% med	48.00	44.80	0.00	1.00
max	6553.00	5131.00	37.00	2.00
min mode	6.00	5.00	-1.00	1.00
max mode	6.00	5.00	-1.00	1.00

Short programs (length < 10)

N=906	Original	Compressed	Difference	Ratio
mean	5.67	5.17	-0.49	0.91
stdev	2.20	2.13	0.54	0.13
min	0.00	0.00	-2.00	0.00
10% med	3.00	2.00	-1.00	0.75
25% med	4.00	3.00	-1.00	0.83
median	6.00	5.00	0.00	1.00
75% med	8.00	7.00	0.00	1.00
90% med	9.00	8.00	0.00	1.00
max	9.00	9.00	1.00	2.00
min mode	6.00	5.00	0.00	1.00
max mode	6.00	5.00	0.00	1.00

Medium programs (10 <= length < 100)

N=843	Original	Compressed	Difference	Ratio
mean	25.17	23.29	-1.88	0.92
stdev	18.78	17.58	2.65	0.06
min	10.00	8.00	-25.00	0.69
10% med	10.00	10.00	-4.00	0.85
25% med	13.00	12.00	-2.00	0.89
median	18.00	16.00	-1.00	0.92
75% med	30.00	28.00	-1.00	0.96
90% med	52.60	49.00	0.00	1.00
max	99.00	97.00	7.00	1.18
min mode	10.00	10.00	-1.00	1.00
max mode	10.00	10.00	-1.00	1.00

Long programs (100 <= length)

N=82	Original	Compressed	Difference	Ratio
mean	373.39	289.45	-83.94	0.91
stdev	833.94	557.59	433.24	0.15
min	102.00	77.00	-3680.00	0.13
10% med	117.90	108.00	-84.80	0.76
25% med	144.00	128.00	-36.25	0.82
median	220.00	201.00	-15.00	0.89
75% med	268.25	268.25	9.50	1.05
90% med	533.30	470.40	19.00	1.08
max	6553.00	5131.00	37.00	1.09
min mode	250.00	108.00	18.00	0.92
max mode	250.00	108.00	18.00	1.07

It saves around 4-5 bytes on average (but with huge variability) and often saves nothing at all. Occasionally it makes the program slightly longer. Savings are larger for long programs, but there is still a small average reduction even for very short programs.
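The Difference and Ratio columns in the tables above appear to be computed per program and then aggregated (the mean ratio is not the ratio of the means). A minimal sketch of that kind of summary, assuming Difference = compressed - original and Ratio = compressed / original; the function name `summarize`, the sample lengths, and the handling of empty programs are hypothetical and not taken from the benchmark script.

```python
import statistics


def summarize(pairs):
    """pairs: list of (original_length, compressed_length) in bytes."""
    diffs = [c - o for o, c in pairs]
    # Assumption: skip empty programs to avoid dividing by zero.
    ratios = [c / o for o, c in pairs if o > 0]
    return {
        "mean_diff": statistics.mean(diffs),
        "median_diff": statistics.median(diffs),
        "mean_ratio": statistics.mean(ratios),
        "shrunk": sum(d < 0 for d in diffs),
        "unchanged": sum(d == 0 for d in diffs),
        "grew": sum(d > 0 for d in diffs),
    }


# Made-up lengths, not taken from the real benchmark.
sample = [(6, 5), (10, 10), (20, 18), (4, 5)]
summary = summarize(sample)
```

Counting how many programs shrank, stayed the same, or grew gives a quicker feel for the "often saves nothing" behaviour than the mean alone.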

How do I run it?

Collect your own data

If you want to collect your own data, set the STACK_API_KEY environment variable to your Stack Apps API key, then remove the if False from collect_data.py and run it.
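As a rough illustration of how a script might pick up the key, the sketch below reads STACK_API_KEY from the environment and builds a paged request URL for the public Stack Exchange API. It is an assumption about the general approach, not a copy of collect_data.py: the function name `build_answers_url`, the choice of the `/answers` endpoint, and the parameters shown are illustrative, and the sketch makes no network request.

```python
import os
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"


def build_answers_url(page, api_key=None):
    """Build a paged request URL for answers on the Code Golf site.

    Hypothetical helper; the real script may query a different endpoint.
    """
    key = api_key or os.environ.get("STACK_API_KEY")
    if not key:
        raise RuntimeError("Set the STACK_API_KEY environment variable first")
    params = {
        "site": "codegolf",
        "page": page,
        "pagesize": 100,       # largest page size the API allows
        "filter": "withbody",  # built-in filter that includes post bodies
        "key": key,
    }
    return f"{API_ROOT}/answers?{urlencode(params)}"


# Placeholder key, used only so the example runs without an environment variable.
url = build_answers_url(1, api_key="EXAMPLE_KEY")
```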

Building a tree from the data

This depends on the code_json.json file created in the previous step. You could also choose to replace this with your own corpus.

You can then run analize_data.py to build a tree.

To visualize the tree you can use the command dot -Tsvg graph.dot -o out.svg. It's debatable whether this visualization is helpful.
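To give an idea of what a graph.dot file for a binary code tree can look like, here is a minimal sketch that renders a nested tuple as Graphviz DOT text. The tree representation (a `(left, right)` tuple with string leaves) and the function name `tree_to_dot` are assumptions for illustration; the file written by analize_data.py may be structured differently.

```python
def tree_to_dot(tree):
    """Render a nested (left, right) tuple tree as Graphviz DOT text."""
    lines = ["digraph huffman {"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        if isinstance(node, tuple):
            # Internal node: draw a point and label the edges with the bit taken.
            lines.append(f'  {node_id} [label="", shape=point];')
            for bit, child in zip("01", node):
                child_id = visit(child)
                lines.append(f'  {node_id} -> {child_id} [label="{bit}"];')
        else:
            # Leaf: escape backslashes and quotes so the label stays valid DOT.
            escaped = str(node).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'  {node_id} [label="{escaped}"];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Toy tree: "λ" sits one edge below the root, "d" and "+" two edges below.
dot_text = tree_to_dot((("d", "+"), "λ"))
```

Saving `dot_text` to a file named graph.dot would let the dot command above render it.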

Use the data

Running encode_decode.py will benchmark the encoding and produce tables like the ones shown above. It will also check that the encoding is sane.
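One common form of sanity check is a round trip: encode each program, decode the result, and compare with the original. Whether encode_decode.py does exactly this is an assumption. The sketch below uses hypothetical `toy_encode` and `toy_decode` functions over a hand-written prefix code table, not the forest-based functions from this repository.

```python
def toy_encode(text, codes):
    """Concatenate the bit string of every character in text."""
    return "".join(codes[ch] for ch in text)


def toy_decode(bits, codes):
    """Walk the bits, emitting a symbol whenever a full code has been read."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


# Hand-written prefix-free code table, for illustration only.
toy_codes = {"λ": "1", "+": "01", "d": "00"}
programs = ["λλ+", "d+λ", ""]
all_ok = all(toy_decode(toy_encode(p, toy_codes), toy_codes) == p for p in programs)
```

Because no code in the table is a prefix of another, the decoder never has to backtrack.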

To use it in your own script:

from vyxμ import encode_decode

forest = encode_decode.load_forest()
encoded_string = encode_decode.bits_to_bytes(encode_decode.encode(string, forest))

To decode, apply the inverse operations in the reverse order.
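The bit-to-byte step matters for decoding because a bit string rarely ends on a byte boundary, so the decoder needs some way to tell padding from data. Below is a minimal sketch of one possible scheme (a terminating 1 bit followed by zero padding). The names `pack_bits` and `unpack_bits` are hypothetical, and the padding scheme is an assumption; the real bits_to_bytes may work differently.

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    Assumed scheme: append a single 1 bit as an end marker, then pad
    with zeros up to the next byte boundary.
    """
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def unpack_bits(data):
    """Inverse of pack_bits: drop the end marker and the padding after it."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[: bits.rindex("1")]


packed = pack_bits("10100")
restored = unpack_bits(packed)
```

The cost of this scheme is at least one extra bit per program, which becomes a whole extra byte when the payload already fills its last byte exactly.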

