orlp / glidesort

A Rust implementation of Glidesort, my stable adaptive quicksort/mergesort hybrid sorting algorithm.
I've had this half-finished for a while; I'm writing up what I can even though there's more to investigate. CC @scandum, @Voultapher, as I believe ipnsort has picked up glidesort's candidate approach.
I looked into how the choice of pivot candidates affects median accuracy, using a simulation on random piecewise-linear inputs. These seem to capture one type of order that might be expected in an input, and I don't have any other promising ideas for test inputs.
I found that the accuracy depends not only on how candidates are grouped into sets of 3 (what I initially wanted to test) but also on the positions of those candidates (unexpected, but obvious in retrospect). So I measured both for an even distribution and for Glidesort's recursive selection, using the midpoints of the nine intervals that come out of the recursion. I copied the arithmetic for this by hand; it looks right, but it's possible I made a mistake here. I've also tested various positions not shown here.

Error is measured as cost, assuming sorting each of the two partitions takes time proportional to n log(n). For context, the baseline cost is 8965784, so a relatively high cost difference of 30000 adds 0.33% to the total sorting cost due to that step (and it compounds at every partition). The tables below show cost for the best and worst groupings after testing every possibility, as well as some others:
The arrangement 011201220 that I've used and recommended previously does badly, often worse than 000111222.
Beyond any particular choice of grouping, it doesn't seem like Glidesort's positions do well: the "True of 9" row is the true median of those nine candidates, and it is worse than most pseudomedians with even spacing. The middle base offset of 4 (in 0‿4‿7) makes Glidesort's positions skewed and prevents perfect performance on 1-piece (totally linear) inputs, but changing it to 3.5 isn't much of an improvement (in fact the true median is worse for 2-piece inputs). So I think it's mainly the clustering that weakens it, although I can't say exactly why the effect is so strong. Something that seems a little better is to use 3 as the middle offset for one or two intervals and 4 for the others.
I also did some theoretical analysis on the median-of-median-of-... idea. I find that 3^k candidates processed with recursive medians have about as much power as a true median of (π/2)*2.25^k pivots. For example the pseudomedian of 243 for a 32768-element list is worth a true median of 91. The probability distributions for the median value found this way with random candidates are equal at the exact midpoint, and the shapes seem similar (if anything the pseudomedian is wider). I can write the math up if you're interested.
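To sanity-check that claim numerically, here is a small Monte-Carlo sketch in Python (my own, not part of the BQN source below): it compares the recursive median-of-3 of 9 uniform candidates against the true median of the same 9. Both distributions are centered at 0.5, and the pseudomedian's spread comes out wider, consistent with the description above.

```python
import random
import statistics

def pseudomedian(xs):
    # Recursive median-of-3; len(xs) must be a power of 3.
    if len(xs) == 1:
        return xs[0]
    step = len(xs) // 3
    parts = [pseudomedian(xs[i * step:(i + 1) * step]) for i in range(3)]
    return sorted(parts)[1]

random.seed(0)
samples = [[random.random() for _ in range(9)] for _ in range(20000)]
pseudo = [pseudomedian(s) for s in samples]
true9 = [statistics.median(s) for s in samples]

# Both estimators are centered on the midpoint, but the pseudomedian
# has a wider distribution than the true median of 9.
print(round(statistics.mean(pseudo), 2))                      # 0.5
print(statistics.pstdev(pseudo) > statistics.pstdev(true9))   # True
```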
Even spacing (0.5/9, ..., 8.5/9):
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 13413 | 25575 | 77986 |
True of 9 | 012345678 | 0 | 1557 | 2940 | 8960 |
012102120 | 048/136/257 | 0 | 1557 | 3174 | 16728 |
012012012 | 036/147/258 | 0 | 1557 | 4276 | 18077 |
000111222 | 012/345/678 | 0 | 13413 | 20636 | 28802 |
001112022 | 016/234/578 | 35848 | 31942 | 29785 | 30945 |
Glidesort arrangement (approx 0.008, 0.07, 0.117, 0.508, 0.57, 0.617, 0.883, 0.945, 0.992):
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 13413 | 25575 | 77986 |
True of 9 | 012345678 | 14184 | 69439 | 77943 | 74393 |
001122102 | 017/236/458 | 184 | 26029 | 43807 | 74776 |
012102120 | 048/136/257 | 14184 | 69439 | 79653 | 81007 |
012012012 | 036/147/258 | 14184 | 69439 | 84625 | 85439 |
000111222 | 012/345/678 | 14184 | 79120 | 97432 | 98255 |
011120220 | 058/123/467 | 39866 | 141889 | 139266 | 99083 |
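For reference, here is how I read the cost measure used in these tables, as a Python re-derivation (mine, not the BQN source below, so treat details as an approximation): each partition of size n is charged n·log2(n), and the cost of a perfect median split is subtracted.

```python
import math

def sort_cost(n):
    # Idealized cost of sorting n elements: n * log2(n).
    return n * math.log2(n) if n > 0 else 0.0

def excess_cost(n, pivot_rank):
    # Extra cost of splitting n elements at pivot_rank instead of at the
    # true median (rank n / 2).
    split = sort_cost(pivot_rank) + sort_cost(n - pivot_rank)
    return split - 2 * sort_cost(n / 2)

# A perfect split costs nothing extra; the baseline 8965784 quoted above
# is 1000 trials of 2 * sort_cost(500) ~= 1000 * 8965.8.
print(excess_cost(1000, 500))            # 0.0
print(round(1000 * 2 * sort_cost(500)))  # 8965784
```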
Source for this, run with CBQN. I can translate to some other language on request. The inputs are created with •rand, which changes between runs, but the results above don't change significantly.
```
# All partitions of 9 candidates into 3+3+3
part ← (⊐⊸≡∧(3⥊3)≡/⁼)¨⊸/⥊↕3⌊1+↕9
pfmt ← > ('0'⊸+ ⋈ ·∾⟜"/"⊸∾´ '0'+⊔)¨ part # Display
# Hard-coded partitions of interest
pint ← "000111222"‿"012012012"‿"012102120" #‿"011201220"
# Make random piecewise-linear functions
GetFn ← {
  R ← •rand.Range
  p←0∾1∾˜∧(𝕩-2)R 0 ⋄ q←𝕩 R 0 # x endpoints, y endpoints
  m←q÷○(«⊸-)p ⋄ y←q-m×p # Slope, y intercept
  {(⊏⟜y+𝕩×⊏⟜m)(1↓p)⍋𝕩}
}
dist ← GetFn¨ 1e3/≍2‿3‿4‿8 # 1e3 sets with each of 2, 3, 4, 8 vertices
Sample ← dist {𝕎𝕩}¨ < # Sample values given positions
# Candidate sampling
Mid ← {(0.5+↕𝕩)÷𝕩} # Midpoints of equal-sized intervals, 𝕩 total
glide ← {t←0‿4‿7÷8 ⋄ ⥊t+⌜(t+÷16)÷8} # Midpoints of glidesort candidate intervals
# Scoring: cost increase relative to true median on 1e3 elements
list ← Sample Mid l←1e3
ScoreAll ← {+˝˘ (2×{𝕩×2⋆⁼𝕩}l÷2) -˜ +○{𝕩×2⋆⁼𝕩}⟜(l⊸-) list +´∘≤¨⎉∞‿¯1 𝕩}
MakeTable ← {
  cand ← Sample 𝕩
  med ← part (1⊑∧){𝔽𝔽¨}∘⊔⌜ cand # Pseudomedians
  score ← +˝˘ scoremat ← ScoreAll med
  t39 ← > (⌊2÷˜3‿9) ⊑⟜∧¨¨ ⟨Sample Mid 3, cand⟩ # True medians
  ∾⟨
    ["True of 3"‿"147","True of 9"‿('0'+↕9)] ∾˘ ⌊ ScoreAll t39
    ((⌽⊸∨0=↕∘≠)∨pint∊˜⊏˘)⊸/ score ⍋⊸⊏ (⌊scoremat) ∾˘˜ pfmt
  ⟩
}
•Show∘MakeTable¨ ⟨Mid 9, glide⟩
```
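For readers who don't want to decode the BQN, the `glide` candidate positions can be reproduced directly. This is my translation of the expression `t←0‿4‿7÷8 ⋄ ⥊t+⌜(t+÷16)÷8` above, so verify it against the real code before relying on it:

```python
# Glidesort candidate positions as fractions of the array, translated
# from the BQN `glide`: t = [0, 4, 7] / 8; position = t_i + (t_j + 1/16) / 8.
t = [0 / 8, 4 / 8, 7 / 8]
positions = [ti + (tj + 1 / 16) / 8 for ti in t for tj in t]
print([round(p, 3) for p in positions])
# → [0.008, 0.07, 0.117, 0.508, 0.57, 0.617, 0.883, 0.945, 0.992]
```

These match the approximate positions quoted above the second table.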
Do you plan a parallel version with Rayon? That would be cool/fast.
This is just a hack to host visualization videos on Github.
The minimum run length commit seems to introduce quadratic behavior for runs somewhat shorter than sqrt(n / 2), because the run is repeatedly followed and discarded. If so, this would cause worst-case performance of O(n^(3/2)): O(n) time to get past each run, multiplied by the O(sqrt(n)) runs that fit in a length-n array. I took the following timings on a length-1e8 array to confirm that this has a practical impact; the input data is just 0, 1, ..., r-1 repeated, for run length r. sqrt(1e8 / 2) is about 7071; strangely, performance improves gradually from about 6000 up to that number instead of sharply as I'd expected. The "% create" here is a loose estimate from perf top of the fraction of time spent in LogicalRun<B,T>::create, and "Time create" is that multiplied by the total time.
Run length r | Time (s) | % create | Time create (s) |
---|---|---|---|
500 | 2.44 | 0.20 | 0.49 |
1000 | 2.96 | 0.25 | 0.74 |
2000 | 3.74 | 0.38 | 1.42 |
4000 | 4.48 | 0.45 | 2.02 |
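To reproduce the setup, the input pattern and the run count it implies can be sketched in Python (my sketch of the generator described above; the timings themselves come from the Rust code):

```python
def sawtooth(n, r):
    # 0, 1, ..., r-1 repeated: ascending runs of length r.
    return [i % r for i in range(n)]

def count_runs(xs):
    # Number of maximal ascending runs.
    return 1 + sum(b < a for a, b in zip(xs, xs[1:]))

# With r below sqrt(n / 2) there are ~n/r = O(sqrt(n)) runs; if following
# and discarding each run costs O(n), the total is O(n^(3/2)).
print(count_runs(sawtooth(1_000_000, 4000)))  # 250
```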
Add some kind of license, so that editing your repository and using it becomes legal.
If this sorting algorithm is stable and strictly better performance-wise than the standard library sort, it seems like the standard library implementation could be replaced completely (as happened previously with hashbrown and crossbeam-channel)?
Please consider using the optimizations from this cool article by @gerben-stavenga: https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html
Hey, glidesort is very impressive.
Could you provide an easy way to run benchmarks on our machines and see these results? (I'm willing to help if necessary.)

(EDIT: by chance, I have a 4800 MHz dual-channel system available and a 2666 MHz single-channel one; I'm curious to compare bench results on both.)
Well, I'll be blunt. I find the glidesort codebase hard to work with even by the standards of sorting research. I feel that in your desire to give a polished view of the algorithm to the world, you've actually made it harder for people who want to dig into the details. Given that pdqsort (Rust version included) was basically my entry point into high-performance sorting, seeing the next step like this is tough. Worse, from what I understand of the algorithm, it doesn't seem to be much more complicated than pdqsort, given that it throws out a lot of things like pivot scrambling and heapsort. I'll describe my difficulties as best as I'm able to give you the most information if you'd like to help.
I believe a real commit history would be very useful, and find the decision to publish as a single commit surprising for software presented at an open source conference. You apparently had enough of an implementation to benchmark for the talk in May, without full panic safety infrastructure. Because I can't access any version like this, I have no way to test your claim that panic safety accounts for 10-15% of time taken by the algorithm. Could it be different across processors? I have no insight into how tuning decisions were made, which is often available in the history too.
You've shared benchmarks from an ARM machine that's presumably your M1, and an unspecified AMD processor, as a png. Could you include, or link to, the processor specs and raw data in this repository?
Generally it feels that while sorting concepts are well explained, the way they are implemented isn't. As I understand it, the way you use Rust isn't typical, so I expect even fluent Rust readers (I'm not one) could use some help. For example, gap_guard.rs makes no attempt to explain what "the gap" is. And much of branchless_merge.rs is taken up by the implementation of the BranchlessMergeState structure, with no explanation of how this structure will be used.
Other structures have no comments at all; I suppose the names are supposed to be self-documenting. Take enum PartitionStrategy<T> in quicksort. LeftWithPivot is meaningless to me. What goes left? And of course there's a pivot, you're partitioning! Eventually I figured out that the unnamed parameter is the pivot value to be used. Is left pivoting the variety used for the left side in a bidirectional partition? Because the block comment at the top never connects to any specific part of the code, I can't tell. Are LeftIfNewPivotEquals and LeftIfNewPivotEqualsCopy identical other than the way they store the pivot? The definition of partition_left certainly suggests this, but later less_strategy and geq_strategy recognize only the Copy version.
Where does the recursive median-based strategy for pivot selection come from? Is there a reference? To me it seems obviously questionable, because if just two of the three systematically-chosen regions based on a, b, and c have lower median values, then you'll get a low pivot. For example, what happens with an array consisting of three up-down patterns?
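To make that worry concrete, here is a toy Python check (mine; it uses evenly spaced candidates with a 000111222-style grouping rather than glidesort's exact offsets): on three "tents", each region's median-of-3 lands on a low value, so the pseudomedian undershoots the true median.

```python
import statistics

def median3(a, b, c):
    return sorted((a, b, c))[1]

# Three up-down "tents": each third of the array rises then falls.
tent = list(range(1500)) + list(range(1500, 0, -1))
xs = tent * 3  # length 9000

# Nine evenly spaced candidates, grouped by region (000111222-style).
cands = [xs[int((i + 0.5) / 9 * len(xs))] for i in range(9)]
pivot = median3(*(median3(*cands[3 * g:3 * g + 3]) for g in range(3)))

# Each region samples (500, 1500, 500), so every region median is 500,
# well below the array's true median.
print(pivot)                  # 500
print(statistics.median(xs))  # 750.0
```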
What is tracking? I couldn't even google cfg(feature = "tracking").