wizzat / distribution


Short, simple, direct scripts for creating ASCII graphical histograms in the terminal.

License: GNU General Public License v2.0

Perl 44.97% Python 50.33% Shell 4.70%

distribution's Issues

Support for buckets

Hello,

Does "distribution" have support for "buckets"? In other words, can keys be grouped together by value?

Example: when reading input with values in range 0-255, is there a way to create a histogram like this?
cat /dev/urandom | od -tu1 -w1 -An -v | head -10000 | histogram.py | cut -d: -f1

NumSamples = 10000; Min = 0.00; Max = 255.00
Mean = 126.874400; Variance = 5451.876425; SD = 73.836823; Median 127.000000
each ∎ represents a count of 14
    0.0000 -    25.5000 [  1052]
   25.5000 -    51.0000 [   994]
   51.0000 -    76.5000 [   976]
   76.5000 -   102.0000 [   997]
  102.0000 -   127.5000 [   989]
  127.5000 -   153.0000 [  1075]
  153.0000 -   178.5000 [   965]
  178.5000 -   204.0000 [   982]
  204.0000 -   229.5000 [   986]
  229.5000 -   255.0000 [   984]

Thanks a lot!
Jirka
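
The grouping asked about here can be sketched as equal-width bucketing. This is only an illustration, not code taken from distribution or histogram.py; the function name bucketize and its num_buckets parameter are hypothetical.

```python
def bucketize(values, num_buckets=10):
    """Group numeric values into equal-width buckets.

    Returns a list of (lower, upper, count) tuples, one per bucket.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bucket
        else:
            # Clamp so that the maximum value falls into the last bucket
            idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_buckets)]


if __name__ == "__main__":
    for lower, upper, count in bucketize([0, 10, 20, 130, 255]):
        print("%10.4f - %10.4f [%6d]" % (lower, upper, count))
```

Each input value is counted exactly once, and the upper edge of the range is assigned to the last bucket rather than to a bucket of its own.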

How do I install this package?

Please document installation instructions in README.md

perl/python true-up

Opening this to track fixing up inconsistencies between Perl and Python versions, but only insofar as such differences make it difficult to run tests, per #10

1. Perl randomizes the iteration order of hashes, so histogram ties are ordered randomly. This matters most when a tie falls at the cutoff for how many items the histogram displays, because then either key may be the one that fits. As a result, the "runTest" script returns different results between runs even on the same machine, since some of the input test files contain ties.
2. Perl renders left-justified tables, whereas Python right-justifies, e.g. " logline| 6|" (Python) vs. "logline |6 |" (Perl). Not sure whether this really matters, but it would be nice to make the tests independent of it.
3. The Perl script is more forgiving about command line syntax: it will accept "--h" or "-height", whereas Python only accepts "-h" or "--height". Again, this probably doesn't matter; the more important task is to make "runTests.sh" use syntax that works with both versions, as it currently uses some Perl-only option syntax.

Opened a branch (https://github.com/tstearns/distribution/tree/perl-python-trueup) to propose fixes, with item 1 above done and the other two to follow. I will open the branch merge as a PR for review when done, referencing this ticket.
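
For the tie-ordering problem described above, one common approach is to sort on the count first and on the key second, so that ties always resolve the same way. The sketch below is an assumption about how that could look, not the fix in the branch; top_keys and its parameters are hypothetical names.

```python
def top_keys(counts, limit):
    """Return up to `limit` (key, count) pairs, highest count first.

    Ties are broken by key in ascending order, so the result does not
    depend on dictionary or hash iteration order.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]


if __name__ == "__main__":
    print(top_keys({"b": 2, "a": 2, "c": 5}, 2))
```

With a secondary sort key, a tie at the display cutoff always keeps the same key, which is what a hash-based test comparison needs.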

Please tag a stable release

Thanks!

Add ability to read from file or files in addition to stdin

This seems like a minor thing, but it would be helpful to have the ability to read from one or more files passed as command line options instead of standard input.

This could save some keystrokes especially in the case of reading from multiple files, where instead of cat /foo/bar/* | distribution you can do distribution /foo/bar/*.
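
In Python, the standard fileinput module already covers this pattern: it reads the named files in order and falls back to standard input when no files are given. A minimal sketch, assuming a hypothetical helper named read_lines (nothing here is taken from distribution.py):

```python
import fileinput


def read_lines(paths):
    """Yield lines from the given files, or from stdin if `paths` is empty."""
    with fileinput.input(files=paths) as stream:
        for line in stream:
            yield line.rstrip("\n")


if __name__ == "__main__":
    import sys
    # e.g. `script.py /foo/bar/*` or `cat /foo/bar/* | script.py`
    for line in read_lines(sys.argv[1:]):
        print(line)
```

A real implementation would also need to separate file names from option arguments before passing them in.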

test discrepancies

The "runTests" script has a few items that I'm curious about:

1. The MD5 hashes don't reveal what the expected output is, so when they don't match, I can't tell why, because I don't know what the original output*.txt files were that generated those hashes. (This matters because I haven't been able to get the hashes on master or v1.2.2 to match on any system: either Python or Perl on OSX 10.11, CentOS 5.11, or OpenBSD 5.9.) Does it make sense to instead include the expected output files in the repo, and have the runTests script run MD5 on those files rather than storing the hashes directly in the script, so that it's possible to view discrepancies? I'd be happy to submit a PR for that, but can't include the expected files because I've never been able to generate them such that they match the existing hashes.
2. The Python and Perl scripts appear to produce slightly different output formatting, which means that their output hashes differ, which makes me wonder which script the tests are supposed to be run against. For example, the alignment of the key field is different when running tests with Python 2.7 or Perl 5.18.

Python:

^[[32m /etc/mateconf|^[[34m7780758 ^[[35m(44.60%) ^[[37m••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••^[[0m

Perl:

/etc/mateconf^[[0m |^[[32m7780758 ^[[35m(44.60%) ^[[34m••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••^[[0m
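
The suggestion above, keeping expected output files in the repo so that discrepancies can be inspected, could look roughly like this. It is a sketch under assumptions: diff_against_expected is a hypothetical helper, and the comparison is done on text with difflib rather than on MD5 hashes.

```python
import difflib


def diff_against_expected(actual_text, expected_path):
    """Return a unified diff (list of lines) between expected and actual output.

    An empty list means the output matches the expected file.
    """
    with open(expected_path, encoding="utf-8") as handle:
        expected_lines = handle.read().splitlines()
    actual_lines = actual_text.splitlines()
    return list(difflib.unified_diff(
        expected_lines, actual_lines,
        fromfile=expected_path, tofile="actual", lineterm=""))
```

A test runner could print the returned lines on failure, which shows exactly where formatting such as key alignment or color codes differs.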

alignment issues with multibyte characters

It seems distribution does not handle special characters well:

Chelsea Wolfe - Abyss                                           |44 (17.46%)  ▬▬▬▬▬▬▬▬
Chelsea Wolfe - Ἀποκάλυψις                           |23 (9.13%)   ▬▬▬▬
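
The cause is not confirmed here, but one plausible explanation is that padding is computed from the byte length of the key rather than from the number of terminal columns it occupies. The sketch below shows one way to approximate display width with the standard unicodedata module; display_width and pad_key are hypothetical names, and the width rule is a simplification.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks do not advance the cursor
        # Wide and fullwidth characters (e.g. CJK) take two columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_key(text, columns):
    """Left-justify `text` to `columns` terminal columns."""
    return text + " " * max(0, columns - display_width(text))


if __name__ == "__main__":
    for key in ("Chelsea Wolfe - Abyss", "Chelsea Wolfe - Ἀποκάλυψις"):
        print(pad_key(key, 40) + "|")
```

Padding by display width rather than by byte count keeps the "|" separator in the same column for both keys.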

Breaks with python 3

Change the shebang to python2 maybe?

new release tag?

Is it possible to tag a new release with the latest changes? I'd like to make a distribution package for OpenBSD, and will base it on the next release tag. Thanks!

distribution.py: IndexError: list index out of range

❯❯❯ seq 5 |distribution.py
Traceback (most recent call last):
  File "/usr/local/bin/distribution.py", line 536, in <module>
    main(sys.argv[1:])
  File "/usr/local/bin/distribution.py", line 520, in main
    s = Settings()
  File "/usr/local/bin/distribution.py", line 336, in __init__
    if '--rcfile' in sys.argv[1]:
IndexError: list index out of range
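
The traceback shows sys.argv[1] being read when no arguments were passed. A length check before indexing avoids the IndexError. The following is a sketch only: rcfile_from_argv is a hypothetical helper, and the --rcfile=PATH form is an assumption about how the option is written.

```python
def rcfile_from_argv(argv):
    """Return PATH from a leading --rcfile=PATH argument, or None.

    `argv` is the full argument list, including the program name.
    """
    # Check the length first: with no arguments, argv[1] does not exist.
    if len(argv) > 1 and argv[1].startswith("--rcfile="):
        return argv[1].split("=", 1)[1]
    return None


if __name__ == "__main__":
    import sys
    print(rcfile_from_argv(sys.argv))
```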

add a quick way of tokenizing by character

Because the "tokenize" parameter is tested for existence, it's challenging to tokenize on "nothing" (which would split everything into individual characters)

Notably, there is also a difference in behavior between the Python and Perl implementations: distribution.py will successfully split on "0", while Perl, given "-t=0", will act as though I hadn't passed any tokenize parameter at all.

The Perl-with-zero behavior should be easy to fix, but I'd suggest adding another special "tokenize" value (along with the existing "white" and "word") of "char" or something similar.

I'm not very experienced with Python, and while in Perl you can simply add a line like
elsif ($tokenize eq 'char') { $tokenize = ''; }
as far as I can tell, Python will not behave that way with splitting on an empty regex. And it's also beyond me how to properly test for "None" vs. some other existence thing to see if it was defined at all on the command line.

Anyway, there's always a work-around for now to split the entire thing before it even gets in.
e.g.
cat theFile | perl -ne 'print join "\n", split //' | distribution
But it feels like something that should be available more easily.
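
On the Python side, the two points raised above could be handled by testing the option with "is None" (so that "0" or an empty string still count as set) and by treating "char" as a special value. This is a hedged sketch, not the code in distribution.py: the function name, the "char" value, and the exact patterns used for "white" and "word" are assumptions.

```python
import re


def tokenize_line(line, tokenize=None):
    """Split `line` into tokens according to the `tokenize` setting."""
    if tokenize is None:
        return [line]            # option not given: the whole line is one key
    if tokenize == "char":
        return list(line)        # one token per character
    if tokenize == "white":
        return line.split()      # split on runs of whitespace
    if tokenize == "word":
        return re.findall(r"\w+", line)
    # Otherwise treat the value as a regular expression and drop empty tokens
    return [tok for tok in re.split(tokenize, line) if tok]


if __name__ == "__main__":
    print(tokenize_line("a1b", "char"))
    print(tokenize_line("10203", "0"))
```

Using list(line) sidesteps the question of how re.split behaves with an empty pattern, which differs between Python versions.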

Improve color behavior when piping output or redirecting to file

Currently, when using options that colorize the output, distribution doesn't play nicely when piping its output to another command or redirecting it to a file, due to the ANSI escape sequences used for colorizing the output.

For example, if your default options have color enabled, and you redirect the output to a file, opening that file in a text editor results in something like this (note all of the escape sequences):

342191L:25�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:24�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:23�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:22�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:21�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:20�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:19�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:18�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:17�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:16�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:15�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:14�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:13�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:12�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m
342191L:11�[0m|�[32m1 �[35m(3.85%) �[34m█████████████████████▏�[0m

I propose extending the --color option to accept three values: always, never, and auto, similar to GNU tools like grep and ls. The auto value detects whether the output is a terminal, and enables color only if so.
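
The auto detection proposed here can be sketched in Python with the isatty() method of the output stream. The function name should_colorize and the way the mode string is passed in are assumptions for illustration, not the existing option handling.

```python
import io
import sys


def should_colorize(mode, stream=None):
    """Decide whether to emit color codes for the given --color mode."""
    if stream is None:
        stream = sys.stdout
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "auto":
        # Color only when writing to a terminal, not to a pipe or file
        return stream.isatty()
    raise ValueError("unknown color mode: %r" % (mode,))


if __name__ == "__main__":
    # An in-memory stream is not a terminal, so auto disables color here
    print(should_colorize("auto", io.StringIO()))
```

With this check, redirecting the output to a file or piping it to another command would produce plain text unless always is requested explicitly.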
