bonzaithepenguin / wikisort
Fast and stable sort algorithm that uses O(1) memory. Public domain.
License: The Unlicense
Hello, Bonzai!
I've been quite interested in Wikisort and Block (Merge) Sort in general for a while now. Back in AP Comp Sci, you could say I was a bit floored when I found out a stable, O(1) space, and worst-case O(n log n) time sort actually existed. :P
I've also done a bit of personal study of sorting algorithms, and forked a decently popular "algorithm animation" program, which I've been working on for quite some time now! In fact, here are some videos I made regarding WikiSort (feel free to use them!):
WikiSort - Bar Graph: https://youtu.be/ciQG_uUG6O4
WikiSort - Scatter Plot: https://youtu.be/mMr4b0Yg4yg
WikiSort - Color Circle: https://youtu.be/OWxuXJQ3Guw
Wikisorting over 16k Numbers: https://youtu.be/X3SyGvfj1d8
I've been doing a lot of research on and off not only into WikiSort, but also Andrey Astrelin's Grailsort (I saw he talked to you in another post), which turns out to be a somewhat different implementation of Block Merge Sort. What it sacrifices in adaptivity in some cases (best-case O(n log n) time instead of WikiSort's O(n) best case), it makes up for with raw speed in others (I would wager Grailsort's "Build Blocks" function is one of the fastest merge sorts out there).
I've done some visualizations of the sort here; hopefully they demonstrate the major differences between Wiki and Grail:
Grail Sorting 2,048 Integers: https://youtu.be/LJbXsV_qGbs
GrailSort Redistributing Its Internal Buffer: https://youtu.be/8SV18oH8kPc
A Detailed Visual of GrailSort: https://youtu.be/U29NB96Y-9w
Grailsorting over 16k numbers: https://youtu.be/kZJ7h307bcQ
I also have some refactored repos for Grailsort. They're rough drafts, but hopefully the code is easier to read than the original:
Java Version: https://github.com/MusicTheorist/Grail-Sorting-for-Java
C/C++ Version: https://github.com/MusicTheorist/Grail-Sort-Refactored
I'm also looking to create some documentation for both Grail and Block Merge Sort to make them much more intuitive, like what you've done with your docs.
I basically agree with Mr. Astrelin on the major changes between Grail and Wiki: "[Grail] mainly differs by swapping blocks and their tags in parallel before merging, and shifting the position of an internal buffer used for locally merging/appending portions of an array".
While Grailsort is a great algorithm, I do see some significant room for improvement, and I thought I would ask you for some help if you're interested! There are definitely aspects of WikiSort I think Grailsort would benefit from, and hopefully a conversation would help make sure I understand block merge sort correctly!
My questions for now would be:
Feel free to answer whenever and however detailed you want. Hope you're staying well during the pandemic!
Thanks for reading!!
Just remembered that an optimization exists where if B is ever smaller than A, it'd be faster to merge B backwards into A. Right now it always merges A into B.
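That backward merge can be sketched as follows (a hypothetical helper for illustration, not the code in this repo): copy the smaller B into a buffer, then merge from the high end, so only |B| items need to go through the buffer instead of |A|.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch: when B (the right run, [A_end, B_end)) is smaller
   than A (the left run, [A_start, A_end)), copy B into a temporary buffer
   and merge backwards from the high end. */
static void merge_backward(int array[], size_t A_start, size_t A_end,
                           size_t B_end, int buffer[]) {
    size_t B_len = B_end - A_end;
    memcpy(buffer, &array[A_end], B_len * sizeof(int));

    size_t a = A_end;      /* one past the last unmerged item of A */
    size_t b = B_len;      /* number of unmerged buffered B items */
    size_t dest = B_end;   /* one past the next write position */

    while (b > 0) {
        if (a > A_start && array[a - 1] > buffer[b - 1])
            array[--dest] = array[--a];
        else
            array[--dest] = buffer[--b];   /* on ties, B lands to the right: stable */
    }
    /* any remaining A items are already in place */
}
```

Since writes proceed right to left and always land at or after the unmerged tail of A, the merge never overwrites an A item it still needs.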
WikiSort easily beats std::stable_sort now, so the next step is benchmarking it against a good TimSort implementation (which was unusually hard to find). WikiSort is destroyed in a few synthetic tests (groups of 10,000 presorted items? Really?), but it also loses by a decent margin in tests that are closer to real-world data so I'm looking into possible improvements.
The biggest and most obvious one was that it extracts and redistributes the internal buffers at each level of the merge sort even if it turned out they weren't needed since the items were already in order. A less-obvious optimization is pulling out buffers that are large enough to be used for two or more levels of the merge sort, so we can skip at least one extraction and redistribution step. This will be tricky to get right, but it could speed things up slightly in some situations.
Another possibility is using something similar to TimSort up to a certain point by using a fixed-size cache, after which it would switch back to the old algorithm. I love me some fixed-size caches.
In the C implementation, the element type is defined as typedef struct { int value, index; } Test. Is Test::index required, or is it just there for testing purposes? If we are allowed to add an index field to the array elements, it is trivial to make any sorting algorithm stable by using a comparison function such as
#define stable_lt(a,b) ((a).value < (b).value || ((a).value==(b).value \
&& (a).index<(b).index))
to break ties, but that is not a true in-place stable sort.
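For illustration, here is that index-tagging trick applied to qsort(), which the C standard does not guarantee to be stable (the stable_cmp name is just this sketch's, not anything from the repo):

```c
#include <stdlib.h>

/* Index-tagging trick: qsort() is not guaranteed to be stable, but tagging
   each element with its original position and breaking ties on that tag
   forces a stable order -- at the cost of the extra index field, so it is
   not a true in-place stable sort. */
typedef struct { int value, index; } Test;

static int stable_cmp(const void *pa, const void *pb) {
    const Test *a = pa, *b = pb;
    if (a->value != b->value)
        return (a->value > b->value) - (a->value < b->value);
    /* equal values: fall back to original position to break the tie */
    return (a->index > b->index) - (a->index < b->index);
}
```

After qsort(array, n, sizeof(Test), stable_cmp), equal values appear in their original relative order, exactly as a stable sort would leave them.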
Whoops, just noticed it pulls out two internal buffers even if it only ends up using one of them (when the A blocks fit into the cache). Should probably fix that...
The merge operation that is currently in place needs to be redone with a more intelligent algorithm, since it has an O(n^2) worst case. Right now I'm looking at a paper called "Optimizing stable in-place merging", by Jingchao Chen, but there are many similar papers online.
Thank you for publishing your work on this algorithm. It is very fascinating. I am reading Chapter 1. I would just like to get some clarification about the reverse method. I know that the method is simple, but does this really do what it says it does? I have written a small example to test it.
public class WikiSortTest {
    static class Range {
        public int start;
        public int end;

        public Range(int start1, int end1) {
            start = start1;
            end = end1;
        }

        public Range() {
            start = 0;
            end = 0;
        }

        void set(int start1, int end1) {
            start = start1;
            end = end1;
        }

        int length() {
            return end - start;
        }
    }

    // swap takes the two indices to exchange, not the values at those indices
    static void swap(int[] A, int i, int j) {
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp;
    }

    static void reverse(int[] A, Range range) {
        for (int index = range.length() / 2 - 1; index >= 0; index--) {
            swap(A, range.start + index, range.end - index - 1);
        }
    }

    static void printArray(int[] A) {
        System.out.println();
        for (int i = 0; i < A.length; i++) {
            System.out.printf(A[i] + " | ");
        }
        System.out.println();
    }

    public static void main(String[] args) {
        int[] A = new int[]{0, 1, 2, 3, 4};
        printArray(A);
        Range range = new Range(0, 3);
        reverse(A, range);
        printArray(A);
    }
}
The output from this program is
PS D:\Algorithm Analysis> javac WikiSortTest.java
PS D:\Algorithm Analysis> java WikiSortTest
0 | 1 | 2 | 3 | 4 |
2 | 1 | 0 | 3 | 4 |
Is this to be expected?
See WikiSort.java:8.
"you should probably just stick to Java's built-in sort so you get native speeds."
This is no longer correct; the JVM's very good at JIT compiling now. As long as your program is long-running and the Java code is good, there's no reason this won't be faster than Arrays.sort().
For comparison, here's the current sort implementation used in Java. It's not native.
Finally getting around to replacing those linear searches with something more intelligent, to drastically lower the number of comparisons needed when there aren't very many unique values within the array. Right now even after a few comparison optimizations it's still using 30% more compares than __inplace_stable_sort() in one test case. Should be able to do a lot better!
When comparing the performance of the two sorts, the seconds percentage uses the ratio time2/time1, while the compares percentage uses compares1/compares2. Shouldn't these two match? Should they be comparing Wiki to Merge, or Merge to Wiki?
On another note, would it be beneficial to time the test cases?
The line that says Rotate(array, Range_length(range) - count, range, cache, cache_size); calls into memmove(&array[range2.end - Range_length(range1)], &array[range1.start], Range_length(range1) * sizeof(array[0])); with a range where the end is less than the beginning, so it causes a crash. Any ideas why? I'm building for the iOS architecture, and on there it appears that size_t is unsigned. Is it signed for you?
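For what it's worth, size_t is unsigned on every conforming platform, so any length computed as end - start silently wraps to a huge value when end < start, which is exactly the kind of thing that feeds memmove() an enormous byte count. A tiny sketch (range_length is a hypothetical helper, not the repo's function):

```c
#include <stddef.h>

/* With unsigned size_t, end - start wraps around instead of going
   negative when end < start, producing a huge "length". */
static size_t range_length(size_t start, size_t end) {
    return end - start;   /* no negative values are possible here */
}
```

Guarding with an explicit end >= start check (or computing in a signed type and validating) before calling memmove() avoids the wraparound.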
This has been bothering me for a while, but the Java version is nowhere near the performance of a standard merge sort. I spent most of yesterday removing all class allocations (no Range, no Pull, and no Iterator), then when that didn't help I started ripping out chunks of code bit by bit, and eventually I got it down to a standard bottom-up merge sort and realized it was still a lot slower than a standard recursive merge sort. This doesn't make any sense.
There has to be something weird about this Java VM's internal design that causes it to not like the way the items in the array are being accessed, or something. The recursive version always accesses elements in the same areas, but the iterative version iterates over the array log(n) times. The number of accesses are no different, but the order of the accesses changes quite a bit. In the C version the iterative merge sort is much faster than the recursive one, which is what I would expect.
Maybe Java has its own caching system running behind the scenes, separately from any bare metal hardware caching...? Or maybe it isn't communicating its access intentions to the system properly, resulting in a ton of cache misses?
Anyway, I asked about it on Stack Overflow here:
http://stackoverflow.com/questions/23121831/why-is-my-bottom-up-merge-sort-so-slow-in-java
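For reference, the iterative pattern in question looks like this (a generic bottom-up merge sort sketch, not the repo's code): each doubling of the run width is one full pass over the array, so the array is traversed about log2(n) times end to end, while the recursive version keeps revisiting the same small regions.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the adjacent sorted runs a[lo..mid) and a[mid..hi) through tmp. */
static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  /* left run wins ties: stable */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(&a[lo], &tmp[lo], (hi - lo) * sizeof(int));
}

/* Bottom-up merge sort: one full pass over the array per run width. */
static void bottom_up_merge_sort(int *a, size_t n) {
    int *tmp = malloc(n * sizeof(int));
    if (tmp == NULL) return;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
    }
    free(tmp);
}
```

The total number of element accesses matches the recursive version; only the order differs, which is why cache behavior is the usual suspect.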
Just remembered that the block-based merge requires sqrt(n) unique values to exist, so an array with 1,000,000 items needs at least 1,000 unique values. The original paper offers a few techniques to deal with failing to extract enough unique values, but they have not been implemented yet. Looking into it now...
If anyone's curious, the trick to testing signedness is to replace all instances of "long" with "short" and see if it works for more than 32768 items. It doesn't.
Finally ready to start caring about how well WikiSort performs for small data sets. The worst case was obviously going to be Testing::Descending with slow comparisons, as that turns the InsertionSort call into the O(n^2) worst case. And yes, it's a LOT slower than std::stable_sort() at the moment, about 5x, actually.
The C++ implementation relies on std::memcpy and std::memmove. Unfortunately, this is undefined behaviour unless the arguments passed to those functions are pointers to POD types.
Searching the A and B subarrays for two internal buffers to pull out uses a linear search over the items in the array, and simply increments a counter whenever the value changes. In the case where the array does not contain enough unique values to fill the two internal buffers, this ends up performing a linear search through the entire array for every level of the merge. We could reduce the number of comparisons during this step with a binary search, and increment the counter whenever the middle value differs from the start or end value. If the middle value is ever the same as the start value, that means all of the values between it were the same too, so we don't have to search that range anymore.
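That skip can be sketched as a binary search that, given a position in a sorted range, finds the first index holding a different value (next_distinct is an illustrative name, not a function in the repo):

```c
#include <stddef.h>

/* Illustrative sketch: instead of stepping one item at a time, binary
   search sorted array[start..end) for the first index whose value differs
   from array[start]. Every index skipped is known to hold an equal value,
   so no comparisons are spent on it. */
static size_t next_distinct(const int array[], size_t start, size_t end) {
    size_t lo = start + 1, hi = end;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (array[mid] == array[start])
            lo = mid + 1;   /* equal: the run continues past mid */
        else
            hi = mid;       /* differs: the first change is at or before mid */
    }
    return lo;              /* == end if the whole range is one run */
}
```

Counting unique values in a sorted range then becomes repeated calls to next_distinct until it returns end, costing O(log run-length) comparisons per run instead of one per item.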