bonzaithepenguin / wikisort
Fast and stable sort algorithm that uses O(1) memory. Public domain.
License: The Unlicense
Hello, Bonzai!
I've been quite interested in Wikisort and Block (Merge) Sort in general for a while now. Back in AP Comp Sci, you could say I was a bit floored when I found out a stable, O(1) space, and worst-case O(n log n) time sort actually existed. :P
I've also done a bit of personal study of sorting algorithms, and forked a decently popular "algorithm animation" program, which I've been working on for quite some time now! In fact, here are some videos I made regarding WikiSort (feel free to use them!):
WikiSort - Bar Graph: https://youtu.be/ciQG_uUG6O4
WikiSort - Scatter Plot: https://youtu.be/mMr4b0Yg4yg
WikiSort - Color Circle: https://youtu.be/OWxuXJQ3Guw
Wikisorting over 16k Numbers: https://youtu.be/X3SyGvfj1d8
I've been doing a lot of research on and off not only into WikiSort, but also Andrey Astrelin's Grailsort (I saw he talked to you in another post), which turns out to be a somewhat different implementation of Block Merge Sort. What it sacrifices in adaptivity in some cases (best-case O(n log n) time instead of WikiSort's O(n) best case), it makes up for with raw speed in others (I would wager Grailsort's "Build Blocks" function is one of the fastest merge sorts out there).
I've done some visualizations of the sort here; hopefully they demonstrate the major differences between Wiki and Grail:
Grail Sorting 2,048 Integers: https://youtu.be/LJbXsV_qGbs
GrailSort Redistributing Its Internal Buffer: https://youtu.be/8SV18oH8kPc
A Detailed Visual of GrailSort: https://youtu.be/U29NB96Y-9w
Grailsorting over 16k numbers: https://youtu.be/kZJ7h307bcQ
I also have some refactored repos for Grailsort. They're rough drafts, but hopefully the code is easier to read than the original:
Java Version: https://github.com/MusicTheorist/Grail-Sorting-for-Java
C/C++ Version: https://github.com/MusicTheorist/Grail-Sort-Refactored
I'm also looking to create some documentation for both Grail and Block Merge Sort to make them much more intuitive, like what you've done with your docs.
I basically agree with Mr. Astrelin on the major changes between Grail and Wiki: "[Grail] mainly differs by swapping blocks and their tags in parallel before merging, and shifting the position of an internal buffer used for locally merging/appending portions of an array".
While Grailsort is a great algorithm, I do see some significant room for improvement, and I thought I would ask you for some help if you're interested! There are definitely aspects of WikiSort I think Grailsort would benefit from, and hopefully a conversation would help make sure I understand block merge sort correctly!
My questions for now would be:
Feel free to answer whenever and however detailed you want. Hope you're staying well during the pandemic!
Thanks for reading!!
Just remembered that an optimization exists where if B is ever smaller than A, it'd be faster to merge B backwards into A. Right now it always merges A into B.
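That backward merge can be sketched as follows (a hypothetical helper for illustration, not the code in this repo): copy the smaller B into a buffer, then merge from the high end, so only |B| items need to go through the buffer instead of |A|.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch: when B (the right run, [A_end, B_end)) is smaller
   than A (the left run, [A_start, A_end)), copy B into a temporary buffer
   and merge backwards from the high end. */
static void merge_backward(int array[], size_t A_start, size_t A_end,
                           size_t B_end, int buffer[]) {
    size_t B_len = B_end - A_end;
    memcpy(buffer, &array[A_end], B_len * sizeof(int));

    size_t a = A_end;      /* one past the last unmerged item of A */
    size_t b = B_len;      /* number of unmerged buffered B items */
    size_t dest = B_end;   /* one past the next write position */

    while (b > 0) {
        if (a > A_start && array[a - 1] > buffer[b - 1])
            array[--dest] = array[--a];
        else
            array[--dest] = buffer[--b];   /* on ties, B lands to the right: stable */
    }
    /* any remaining A items are already in place */
}
```

Since writes proceed right to left and always land at or after the unmerged tail of A, the merge never overwrites an A item it still needs.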
WikiSort easily beats std::stable_sort now, so the next step is benchmarking it against a good TimSort implementation (which was unusually hard to find). WikiSort is destroyed in a few synthetic tests (groups of 10,000 presorted items? Really?), but it also loses by a decent margin in tests that are closer to real-world data so I'm looking into possible improvements.
The biggest and most obvious one was that it extracts and redistributes the internal buffers at each level of the merge sort even if it turned out they weren't needed since the items were already in order. A less-obvious optimization is pulling out buffers that are large enough to be used for two or more levels of the merge sort, so we can skip at least one extraction and redistribution step. This will be tricky to get right, but it could speed things up slightly in some situations.
Another possibility is using something similar to TimSort up to a certain point by using a fixed-size cache, after which it would switch back to the old algorithm. I love me some fixed-size caches.
In the C implementation, the element type is defined as typedef struct { int value, index; } Test. Is Test::index required, or is it just there for testing purposes? If we are allowed to add an index field to the array elements, it is trivial to make any sorting algorithm stable by using a comparison function such as
#define stable_lt(a,b) ((a).value < (b).value || ((a).value==(b).value \
&& (a).index<(b).index))
to break ties, but that is not a true in-place stable sort.
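For illustration, here is that index-tagging trick applied to qsort(), which the C standard does not guarantee to be stable (the stable_cmp name is just this sketch's, not anything from the repo):

```c
#include <stdlib.h>

/* Index-tagging trick: qsort() is not guaranteed to be stable, but tagging
   each element with its original position and breaking ties on that tag
   forces a stable order -- at the cost of the extra index field, so it is
   not a true in-place stable sort. */
typedef struct { int value, index; } Test;

static int stable_cmp(const void *pa, const void *pb) {
    const Test *a = pa, *b = pb;
    if (a->value != b->value)
        return (a->value > b->value) - (a->value < b->value);
    /* equal values: fall back to original position to break the tie */
    return (a->index > b->index) - (a->index < b->index);
}
```

After qsort(array, n, sizeof(Test), stable_cmp), equal values appear in their original relative order, exactly as a stable sort would leave them.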
Whoops, just noticed it pulls out two internal buffers even if it only ends up using one of them (when the A blocks fit into the cache). Should probably fix that...
The merge operation that is currently in place needs to be redone with a more intelligent algorithm, since it has an O(n^2) worst case. Right now I'm looking at a paper called "Optimizing stable in-place merging", by Jingchao Chen, but there are many similar papers online.
Thank you for publishing your work on this algorithm. It is very fascinating. I am reading Chapter 1. I would just like to get some clarification about the reverse method. I know that the method is simple, but does this really do what it says it does? I have written a small example to test it.
public class WikiSortTest {
    static class Range {
        public int start;
        public int end;

        public Range(int start1, int end1) {
            start = start1;
            end = end1;
        }

        public Range() {
            start = 0;
            end = 0;
        }

        void set(int start1, int end1) {
            start = start1;
            end = end1;
        }

        int length() {
            return end - start;
        }
    }

    // swap takes the two indices to exchange, not the values at those indices
    static void swap(int[] A, int i, int j) {
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp;
    }

    static void reverse(int[] A, Range range) {
        for (int index = range.length() / 2 - 1; index >= 0; index--) {
            swap(A, range.start + index, range.end - index - 1);
        }
    }

    static void printArray(int[] A) {
        System.out.println();
        for (int i = 0; i < A.length; i++) {
            System.out.printf(A[i] + " | ");
        }
        System.out.println();
    }

    public static void main(String[] args) {
        int[] A = new int[]{0, 1, 2, 3, 4};
        printArray(A);
        Range range = new Range(0, 3);
        reverse(A, range);
        printArray(A);
    }
}
The output from this program is
PS D:\Algorithm Analysis> javac WikiSortTest.java
PS D:\Algorithm Analysis> java WikiSortTest
0 | 1 | 2 | 3 | 4 |
2 | 1 | 0 | 3 | 4 |
Is this to be expected?
See WikiSort.java:8.
"you should probably just stick to Java's built-in sort so you get native speeds."
This is no longer correct; the JVM's very good at JIT compiling now. As long as your program is long-running and the Java code is good, there's no reason this won't be faster than Arrays.sort().
For comparison, here's the current sort implementation used in Java. It's not native.
Finally getting around to replacing those linear searches with something more intelligent, to drastically lower the number of comparisons needed when there aren't very many unique values within the array. Right now even after a few comparison optimizations it's still using 30% more compares than __inplace_stable_sort() in one test case. Should be able to do a lot better!
When comparing the performance of the two sorts, the seconds percentage uses the ratio time2/time1, while the compares percentage uses compares1/compares2. Shouldn't these two match? Should they be comparing Wiki to Merge, or Merge to Wiki?
On another note, would it be beneficial to time the test cases?
The line that says Rotate(array, Range_length(range) - count, range, cache, cache_size); calls into memmove(&array[range2.end - Range_length(range1)], &array[range1.start], Range_length(range1) * sizeof(array[0])); with a range where the end is less than the beginning, so it causes a crash. Any ideas why? I'm building for the iOS architecture, and on there it appears that size_t is unsigned. Is it signed for you?
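For what it's worth, size_t is unsigned on every conforming platform, so any length computed as end - start silently wraps to a huge value when end < start, which is exactly the kind of thing that feeds memmove() an enormous byte count. A tiny sketch (range_length is a hypothetical helper, not the repo's function):

```c
#include <stddef.h>

/* With unsigned size_t, end - start wraps around instead of going
   negative when end < start, producing a huge "length". */
static size_t range_length(size_t start, size_t end) {
    return end - start;   /* no negative values are possible here */
}
```

Guarding with an explicit end >= start check (or computing in a signed type and validating) before calling memmove() avoids the wraparound.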
This has been bothering me for a while, but the Java version is nowhere near the performance of a standard merge sort. I spent most of yesterday removing all class allocations (no Range, no Pull, and no Iterator), then when that didn't help I started ripping out chunks of code bit by bit, and eventually I got it down to a standard bottom-up merge sort and realized it was still a lot slower than a standard recursive merge sort. This doesn't make any sense.
There has to be something weird about this Java VM's internal design that causes it to not like the way the items in the array are being accessed, or something. The recursive version always accesses elements in the same areas, but the iterative version iterates over the array log(n) times. The number of accesses are no different, but the order of the accesses changes quite a bit. In the C version the iterative merge sort is much faster than the recursive one, which is what I would expect.
Maybe Java has its own caching system running behind the scenes, separately from any bare metal hardware caching...? Or maybe it isn't communicating its access intentions to the system properly, resulting in a ton of cache misses?
Anyway, I asked about it on Stack Overflow here:
http://stackoverflow.com/questions/23121831/why-is-my-bottom-up-merge-sort-so-slow-in-java
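For reference, the iterative pattern in question looks like this (a generic bottom-up merge sort sketch, not the repo's code): each doubling of the run width is one full pass over the array, so the array is traversed about log2(n) times end to end, while the recursive version keeps revisiting the same small regions.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the adjacent sorted runs a[lo..mid) and a[mid..hi) through tmp. */
static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  /* left run wins ties: stable */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(&a[lo], &tmp[lo], (hi - lo) * sizeof(int));
}

/* Bottom-up merge sort: one full pass over the array per run width. */
static void bottom_up_merge_sort(int *a, size_t n) {
    int *tmp = malloc(n * sizeof(int));
    if (tmp == NULL) return;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
    }
    free(tmp);
}
```

The total number of element accesses matches the recursive version; only the order differs, which is why cache behavior is the usual suspect.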
Just remembered that the block-based merge requires sqrt(n) unique values to exist, so an array with 1,000,000 items needs at least 1,000 unique values. The original paper offers a few techniques to deal with failing to extract enough unique values, but they have not been implemented yet. Looking into it now...
If anyone's curious, the trick to testing signedness is to replace all instances of "long" with "short" and see if it works for more than 32768 items. It doesn't.
Finally ready to start caring about how well WikiSort performs for small data sets. The worst case was obviously going to be Testing::Descending with slow comparisons, as that turns the InsertionSort call into the O(n^2) worst case. And yes, it's a LOT slower than std::stable_sort() at the moment, about 5x, actually.
The C++ implementation relies on std::memcpy and std::memmove. Unfortunately, this is undefined behaviour unless the arguments passed to those functions are pointers to POD types.
Searching the A and B subarrays for two internal buffers to pull out uses a linear search over the items in the array, and simply increments a counter whenever the value changes. In the case where the array does not contain enough unique values to fill the two internal buffers, this ends up performing a linear search through the entire array for every level of the merge. We could reduce the number of comparisons during this step with a binary search, and increment the counter whenever the middle value differs from the start or end value. If the middle value is ever the same as the start value, that means all of the values between it were the same too, so we don't have to search that range anymore.
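That skip can be sketched as a binary search that, given a position in a sorted range, finds the first index holding a different value (next_distinct is an illustrative name, not a function in the repo):

```c
#include <stddef.h>

/* Illustrative sketch: instead of stepping one item at a time, binary
   search sorted array[start..end) for the first index whose value differs
   from array[start]. Every index skipped is known to hold an equal value,
   so no comparisons are spent on it. */
static size_t next_distinct(const int array[], size_t start, size_t end) {
    size_t lo = start + 1, hi = end;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (array[mid] == array[start])
            lo = mid + 1;   /* equal: the run continues past mid */
        else
            hi = mid;       /* differs: the first change is at or before mid */
    }
    return lo;              /* == end if the whole range is one run */
}
```

Counting unique values in a sorted range then becomes repeated calls to next_distinct until it returns end, costing O(log run-length) comparisons per run instead of one per item.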