bonzaithepenguin / wikisort

Fast and stable sort algorithm that uses O(1) memory. Public domain.

License: The Unlicense

Languages: C 31.41%, C++ 36.84%, Java 31.76%

wikisort's Introduction

WikiSort

WikiSort is an implementation of "block merge sort", a stable merge sort based on the work described in "Ratio based stable in-place merging" by Pok-Son Kim and Arne Kutzner [PDF]. It's generally as fast as a standard merge sort while using O(1) memory, and it can optionally be given additional memory to further improve its speed.

C, C++, and Java versions are currently available, and you have permission from me and the authors of the paper (Dr. Kim and Dr. Kutzner) to do whatever you want with this code.
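
A minimal C++ usage sketch follows. It assumes the entry point is Wiki::Sort(first, last, compare) as defined in WikiSort.cpp, and that the Wiki namespace has been pulled out into an includable header (the "wikisort.h" name below is hypothetical); check WikiSort.cpp for the exact signature.

// Minimal usage sketch, not official documentation. Assumptions: the C++ entry
// point is Wiki::Sort(first, last, compare) as in WikiSort.cpp, and the Wiki
// namespace has been extracted into the hypothetical header included below.
#include <iostream>
#include <vector>
#include "wikisort.h"   // hypothetical header exposing the Wiki namespace

int main() {
    std::vector<int> items = {5, 1, 4, 1, 5, 9, 2, 6};
    // Stable sort using O(1) extra memory plus WikiSort's small fixed-size internal cache.
    Wiki::Sort(items.begin(), items.end(),
               [](const int &a, const int &b) { return a < b; });
    for (int value : items) std::cout << value << ' ';
    std::cout << '\n';   // 1 1 2 4 5 5 6 9
}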

Related: Check out the GrailSort project for a similar algorithm based on a paper by Huang and Langston, or the Rewritten Grailsort project which continues its work.


If you want to learn how it works, check out the documentation:
  • Chapter 1: Tools
  • Chapter 2: Merging
  • Chapter 3: In-Place
  • Chapter 4: Faster!

Or you can check out the Wikipedia page for block merge sort. WikiSort was actually (poorly) named out of the hope that you'll update and improve its various components almost like a Wikipedia article, since this is still very much an open area of research that could use your expertise!


WikiSort vs. std::stable_sort()
(clang++ version 3.2, sorting 0 to 1.5 million items)

Using a 512-item fixed-size cache for O(1) memory:

Test             Fast comparisons   Slow comparisons   150,000,000 items    0-32 items
Random               6% faster        95% as fast         35% faster        45% faster
RandomFew            5% faster        16% faster          20% faster        45% faster
MostlyDescending    97% as fast       13% faster          99% as fast       53% faster
MostlyAscending    149% faster       117% faster         286% faster        47% faster
Ascending         1280% faster       518% faster        1101% faster       242% faster
Descending          23% faster       121% faster          12% faster       164% faster
Equal             1202% faster       418% faster        1031% faster       227% faster
Jittered           526% faster       298% faster         733% faster        70% faster
MostlyEqual         15% faster        57% faster          10% faster        42% faster
Append             153% faster        90% faster         348% faster       112% faster

Using a dynamically allocated half-size cache:

Test             Fast comparisons   Slow comparisons
Random              11% faster         3% faster
RandomFew           10% faster         5% faster
MostlyDescending    19% faster        26% faster
MostlyAscending     98% faster        79% faster
Ascending          861% faster       463% faster
Descending          39% faster       142% faster
Equal              837% faster       460% faster
Jittered           326% faster       243% faster
MostlyEqual         15% faster         2% faster
Append             159% faster        94% faster

wikisort's People

Contributors

bonzaithepenguin, ctxppc, cusell, duta, morwenn, tbpalsulich, whackashoe

wikisort's Issues

In the C implementation, is Test::index required?

In the C implementation, the type is defined as typedef struct { int value, index; } Test. Is Test::index required, or is it there just for testing purposes? If we are allowed to add an index field to the array's elements, it is trivial to achieve a stable sort with any sorting algorithm by using a comparison function such as

#define stable_lt(a,b) ((a).value < (b).value || ((a).value==(b).value \
                        && (a).index<(b).index))

to break any ties, but that is not a true in-place stable sort.
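
To make that point concrete, here is an illustrative C++ sketch of the same trick the macro relies on: tagging each element with its original index makes any comparison sort behave stably, at the cost of storing an extra index with every element, which is exactly why it doesn't count as an in-place stable sort.

// Illustrative only: stabilizing an unstable sort by tagging elements with
// their original index, just as the stable_lt macro above does.
#include <algorithm>
#include <iostream>
#include <vector>

struct Test { int value; int index; };

int main() {
    std::vector<Test> array = {{3, 0}, {1, 1}, {3, 2}, {1, 3}};
    // std::sort is not stable, but breaking ties on the original index makes
    // the result indistinguishable from a stable sort of the values.
    std::sort(array.begin(), array.end(), [](const Test &a, const Test &b) {
        return a.value < b.value || (a.value == b.value && a.index < b.index);
    });
    for (const Test &item : array) std::cout << item.value << '/' << item.index << ' ';
    std::cout << '\n';   // 1/1 1/3 3/0 3/2
}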

C++ implementation is broken for non-PODs

The C++ implementation relies on std::memcpy and std::memmove. Unfortunately, calling these functions is undefined behaviour unless the arguments passed to them point to POD (trivially copyable) types.
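
One possible direction, sketched here rather than taken from the project's actual fix, is to dispatch on std::is_trivially_copyable (C++17 syntax for brevity): keep the memmove fast path for trivially copyable types and fall back to element-wise std::move for everything else.

// Sketch of a type-aware block move: memmove only where it is well-defined,
// element-wise std::move otherwise. Not the actual WikiSort.cpp code.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <type_traits>

template <typename T>
void BlockMove(T *destination, T *source, std::size_t count) {
    if constexpr (std::is_trivially_copyable<T>::value) {
        // Well-defined for trivially copyable types, and typically the fastest option.
        std::memmove(destination, source, count * sizeof(T));
    } else {
        // Handle overlapping ranges in either direction, as memmove does.
        if (destination < source)
            std::move(source, source + count, destination);
        else
            std::move_backward(source, source + count, destination + count);
    }
}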

Gah, why is the Java version so slow?

This has been bothering me for a while, but the Java version is nowhere near the performance of a standard merge sort. I spent most of yesterday removing all class allocations (no Range, no Pull, and no Iterator); when that didn't help, I started ripping out chunks of code bit by bit, and eventually I got it down to a standard bottom-up merge sort and realized it was still a lot slower than a standard recursive merge sort. This doesn't make any sense.

There has to be something weird about this Java VM's internal design that causes it to dislike the way the items in the array are being accessed, or something. The recursive version always accesses elements in the same areas, but the iterative version iterates over the array log(n) times. The number of accesses is no different, but the order of the accesses changes quite a bit. In the C version the iterative merge sort is much faster than the recursive one, which is what I would expect.

Maybe Java has its own caching system running behind the scenes, separately from any bare metal hardware caching...? Or maybe it isn't communicating its access intentions to the system properly, resulting in a ton of cache misses?

Anyway, I asked about it on Stack Overflow here:
http://stackoverflow.com/questions/23121831/why-is-my-bottom-up-merge-sort-so-slow-in-java

Reducing the number of comparisons

Finally getting around to replacing those linear searches with something more intelligent, to drastically lower the number of comparisons needed when there aren't very many unique values within the array. Right now even after a few comparison optimizations it's still using 30% more compares than __inplace_stable_sort() in one test case. Should be able to do a lot better!

Needs better in-place stable merging

The merge operation currently being used needs to be redone with a more intelligent algorithm, since it has an O(n^2) worst case. Right now I'm looking at a paper called "Optimizing stable in-place merging" by Jingchao Chen, but there are many similar papers online.
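
For context, the classic rotation-based alternative (sketched here in the style of the buffer-less fallback used by std::inplace_merge, not copied from any of the papers) merges two adjacent sorted ranges by binary-searching a split point, rotating, and recursing. It keeps O(1) space and O(n log n) comparisons, but the repeated rotations make the data movement superlinear.

// Illustrative sketch of a rotation-based in-place stable merge; not the merge
// currently used in WikiSort.
#include <algorithm>
#include <iterator>

template <typename Iterator>
void RotateMerge(Iterator first, Iterator middle, Iterator last) {
    auto lengthA = std::distance(first, middle);
    auto lengthB = std::distance(middle, last);
    if (lengthA == 0 || lengthB == 0) return;
    if (lengthA + lengthB == 2) {
        // Two elements: swap them if they are out of order.
        if (*middle < *first) std::iter_swap(first, middle);
        return;
    }
    // Split the larger side in half and binary-search the matching cut in the other side.
    Iterator cutA, cutB;
    if (lengthA > lengthB) {
        cutA = first + lengthA / 2;
        cutB = std::lower_bound(middle, last, *cutA);
    } else {
        cutB = middle + lengthB / 2;
        cutA = std::upper_bound(first, middle, *cutB);
    }
    // Rotate so everything belonging to the left half precedes everything in the right half.
    Iterator newMiddle = std::rotate(cutA, middle, cutB);
    // Recurse on the two smaller, independent merge problems.
    RotateMerge(first, cutA, newMiddle);
    RotateMerge(newMiddle, cutB, last);
}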

Possible improvement

Searching the A and B subarrays for two internal buffers to pull out uses a linear search over the items in the array, and simply increments a counter whenever the value changes. In the case where the array does not contain enough unique values to fill the two internal buffers, this ends up performing a linear search through the entire array for every level of the merge. We could reduce the number of comparisons during this step with a binary search, incrementing the counter whenever the middle value differs from the start or end value. If the middle value is ever the same as the start value, then all of the values between them must be the same too (the subarray is sorted), so we don't have to search that range anymore.
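
As a concrete illustration of the idea (a sketch with hypothetical names, not WikiSort's actual buffer-extraction code), a sorted subarray can be scanned for distinct values by jumping over each run of equal elements with a binary search, which costs O(log run) comparisons per distinct value instead of one comparison per item.

// Count distinct values in a sorted range, skipping equal runs via upper_bound.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

template <typename Iterator>
std::size_t CountDistinctSorted(Iterator first, Iterator last, std::size_t needed) {
    std::size_t count = 0;
    while (first != last && count < needed) {   // stop early once enough buffer keys exist
        ++count;
        // Skip the entire run of elements equal to *first in O(log run) comparisons.
        first = std::upper_bound(first, last, *first);
    }
    return count;
}

int main() {
    std::vector<int> sorted = {1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4};
    std::cout << CountDistinctSorted(sorted.begin(), sorted.end(), 100) << '\n';   // 4
}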

Performance Percentage Meanings

When comparing the performance of the two sorts, the seconds percentage uses the ratio time2/time1, while the compares percentage uses compares1/compares2. Shouldn't these two match? Should they be comparing Wiki to Merge, or Merge to Wiki?
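
A quick example with made-up numbers (purely illustrative, not measurements from the benchmark) shows why the two percentages read differently unless both ratios are oriented the same way.

// Hypothetical numbers only, to make the orientation question concrete.
// Suppose sort 1 is WikiSort and sort 2 is the reference merge sort.
#include <cstdio>

int main() {
    double time1 = 2.0, time2 = 3.0;             // seconds taken by each sort
    double compares1 = 900, compares2 = 1200;    // comparisons performed by each sort
    // time2/time1 = 1.50 reads as "WikiSort is 50% faster" ...
    std::printf("time2/time1         = %.2f\n", time2 / time1);
    // ... but compares1/compares2 = 0.75 points the other way ("75% as many compares").
    std::printf("compares1/compares2 = %.2f\n", compares1 / compares2);
    // Dividing the reference sort by WikiSort in both cases keeps the two
    // percentages pointing in the same direction.
    std::printf("compares2/compares1 = %.2f\n", compares2 / compares1);
}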

On another note, would it be beneficial to time the test cases?

Description of reverse

Thank you for publishing your work on this algorithm. It is very fascinating. I am reading Chapter 1. I would just like to get some clarification about the reverse method. I know that the method is simple, but does this really do what it says it does? I have written a small example to test it.

public class WikiSortTest{

    static class Range{
        public int start;
        public int end;

        public Range(int start1, int end1){
            start = start1;
            end = end1;
        }

        public Range(){
            start =0;
            end =0;
        }

        void set(int start1, int end1){
            start = start1;
            end = end1;
        }

        int length(){
            return end-start;
        }

    }

    static void swap(int[] A, int i, int j){
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp;
    }

    static void reverse(int[] A, Range range){
        // Reverses the half-open range [range.start, range.end) in place.
        // Note: swap takes array indices, not element values.
        for(int index=range.length()/2 -1; index >=0; index--){
            swap(A, range.start+index, range.end-index-1);
        }
    }

    static void printArray(int[] A){
        System.out.println();
        for(int i=0; i < A.length; i++){
            System.out.print(A[i] + " | ");
        }
        System.out.println();
    }

    public static void main(String[] args){
        int[] A = new int[]{0,1,2,3,4};
        printArray(A);
        Range range = new Range(0,3);
        reverse(A,range);
        printArray(A);
    }
}

The output from this program is

PS D:\Algorithm Analysis> javac WikiSortTest.java
PS D:\Algorithm Analysis> java WikiSortTest      

0 | 1 | 2 | 3 | 4 | 

2 | 1 | 0 | 3 | 4 |

Is this to be expected?

Merging many repeated values

Just remembered that the block-based merge requires sqrt(n) unique values to exist, so an array with 1,000,000 items needs at least 1,000 unique values. The original paper offers a few techniques to deal with failing to extract enough unique values, but they have not been implemented yet. Looking into it now...

Merge backwards

Just remembered that an optimization exists where if B is ever smaller than A, it'd be faster to merge B backwards into A. Right now it always merges A into B.
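
For reference, here is a from-scratch sketch of the idea (not WikiSort's code): when B is the smaller range, copying B into a buffer and merging from the right end means the buffer only needs |B| slots instead of |A|, which is the payoff of merging backwards.

// Backward merge of the adjacent sorted ranges A = [first, middle) and
// B = [middle, last), using a buffer only as large as B. Illustrative only.
#include <cstddef>
#include <vector>

template <typename T>
void MergeBackward(std::vector<T> &array, std::size_t first, std::size_t middle, std::size_t last) {
    std::vector<T> buffer(array.begin() + middle, array.begin() + last);   // copy of B
    std::size_t a = middle;          // one past the last unmerged element of A
    std::size_t b = buffer.size();   // one past the last unmerged element of B (in the buffer)
    std::size_t out = last;          // one past the next position to fill
    while (b > 0) {
        if (a > first && buffer[b - 1] < array[a - 1])
            array[--out] = array[--a];    // A's tail is strictly larger, so it goes last
        else
            array[--out] = buffer[--b];   // ties go to B, keeping the merge stable
    }
    // Any remaining elements of A are already in place to the left of `out`.
}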

Looking to improve Grailsort with ideas from Wiki; help wanted!!

Hello, Bonzai!

I've been quite interested in Wikisort and Block (Merge) Sort in general for a while now. Back in AP Comp Sci, you could say I was a bit floored when I found out a stable, O(1) space, and worst-case O(n log n) time sort actually existed. :P

I've also done a bit of personal study of sorting algorithms, and forked a decently popular "algorithm animation" program, which I've been working on for quite some time now! In fact, here are some videos I made regarding Wikisort (feel free to use them!):
WikiSort - Bar Graph: https://youtu.be/ciQG_uUG6O4
WikiSort - Scatter Plot: https://youtu.be/mMr4b0Yg4yg
WikiSort - Color Circle: https://youtu.be/OWxuXJQ3Guw
Wikisorting over 16k Numbers: https://youtu.be/X3SyGvfj1d8

I've been doing a lot of research on and off not only into Wikisort, but also into Andrey Astrelin's Grailsort (I saw he talked to you in another post), which turns out to be a somewhat different implementation of Block Merge Sort. What it sacrifices in adaptability in some places (best-case O(n log n) time instead of Wikisort's O(n) best case), it makes up for with raw speed in other cases (I would wager Grailsort's "Build Blocks" function is one of the fastest merge sorts out there).

I've done some visualizations of the sort here; hopefully they demonstrate the major differences between Wiki and Grail:
Grail Sorting 2,048 Integers: https://youtu.be/LJbXsV_qGbs
GrailSort Redistributing Its Internal Buffer: https://youtu.be/8SV18oH8kPc
A Detailed Visual of GrailSort: https://youtu.be/U29NB96Y-9w
Grailsorting over 16k numbers: https://youtu.be/kZJ7h307bcQ

I also have some refactored repos for Grailsort. They're rough drafts, but hopefully the code is easier to read than the original:
Java Version: https://github.com/MusicTheorist/Grail-Sorting-for-Java
C/C++ Version: https://github.com/MusicTheorist/Grail-Sort-Refactored

I'm also looking to create some documentation for both Grail and Block Merge Sort to make them much more intuitive, like what you've done with your docs.

I basically agree with Mr. Astrelin on the major changes between Grail and Wiki: "[Grail] mainly differs by swapping blocks and their tags in parallel before merging, and shifting the position of an internal buffer used for locally merging/appending portions of an array".

While Grailsort is a great algorithm, I do see some significant room for improvement, and I thought I would ask you for some help if you're interested! There are definitely aspects of Wikisort I think Grailsort would benefit from, and hopefully a conversation would help to make sure I understand Blocksort alright!

My questions for now would be:

  1. What do you consider the fastest/most efficient step(s) in Wikisort? What about the slowest/least efficient?
  2. I saw one of your closed issues asking if a backwards merge was possible. Is it, and even if it is, would it possibly break stability?
  3. What was your strategy for getting Wikisort to be adaptive?
  4. Do you have any updates on why Java's JIT compiler slows down Wikisort?
  5. Are there any important edge cases to watch out for when reordering/swapping A and B blocks?
  6. What's the complexity of your "extract keys" method? Grailsort's "find keys" method alone is O(n log n) comparisons in the worst case, i.e. when there are fewer than 2 * sqrt(n) unique keys in the array.

Feel free to answer whenever and however detailed you want. Hope you're staying well during the pandemic!

Thanks for reading!!

  • John

Adding unsigned corrections now

If anyone's curious, the trick to testing signedness is to replace all instances of "long" with "short" and see if it works for more than 32768 items. It doesn't.

Needs faster sorting for small data sets

Finally ready to start caring about how well WikiSort performs for small data sets. The worst case was obviously going to be Testing::Descending with slow comparisons, as that turns the InsertionSort call into the O(n^2) worst case. And yes, it's a LOT slower than std::stable_sort() at the moment – about 5x, actually.
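
One common mitigation for exactly this case (a sketch of the general technique, not necessarily the fix WikiSort adopted) is a binary insertion sort: the element moves are still O(n^2) in the worst case, but the comparisons drop to O(n log n), which is what matters when the comparison itself is slow.

// Binary insertion sort over array[first, last): stable, and uses a binary
// search to find each insertion point instead of comparing while shifting.
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T, typename Comparison>
void BinaryInsertionSort(std::vector<T> &array, std::size_t first, std::size_t last, Comparison compare) {
    for (std::size_t i = first + 1; i < last; i++) {
        T value = array[i];
        // upper_bound keeps equal elements in their original order (stability).
        std::size_t position = std::upper_bound(array.begin() + first, array.begin() + i,
                                                value, compare) - array.begin();
        // Shift [position, i) one slot to the right and drop the value into place.
        std::move_backward(array.begin() + position, array.begin() + i, array.begin() + i + 1);
        array[position] = value;
    }
}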

Getting a crash in the Rotate function

The line that says Rotate(array, Range_length(range) - count, range, cache, cache_size); calls into memmove(&array[range2.end - Range_length(range1)], &array[range1.start], Range_length(range1) * sizeof(array[0])); with a range whose end is less than its beginning, which causes a crash. Any ideas why? I'm building for iOS, and there it appears that size_t is unsigned; is it signed for you?
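
As a generic illustration of the likely failure mode (not a diagnosis of the actual bug): size_t is unsigned on every conforming platform, so computing a length from a range whose end precedes its start wraps around to a huge value, and memmove then tries to move that many bytes.

// Generic illustration: unsigned wrap-around when end < start.
#include <cstddef>
#include <cstdio>

int main() {
    std::size_t start = 10, end = 7;     // a range whose end is before its start
    std::size_t length = end - start;    // wraps around: size_t has no negative values
    std::printf("%zu\n", length);        // prints 18446744073709551613 on a 64-bit build
    // Passing length * sizeof(int) to memmove would read far past the buffer and
    // crash; the fix is to validate the range before computing its length.
}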

Faster than std::stable_sort, so now it's time to take on TimSort

WikiSort easily beats std::stable_sort now, so the next step is benchmarking it against a good TimSort implementation (which was unusually hard to find). WikiSort gets destroyed in a few synthetic tests (groups of 10,000 presorted items? Really?), but it also loses by a decent margin in tests that are closer to real-world data, so I'm looking into possible improvements.

The biggest and most obvious one is that it extracts and redistributes the internal buffers at each level of the merge sort even when they turn out not to be needed because the items were already in order. A less obvious optimization is pulling out buffers that are large enough to be used for two or more levels of the merge sort, so we can skip at least one extraction and redistribution step. This will be tricky to get right, but it could speed things up slightly in some situations.

Another possibility is using something similar to TimSort up to a certain point by using a fixed-size cache, after which it would switch back to the old algorithm. I love me some fixed-size caches.
