orlp / glidesort

A Rust implementation of Glidesort, my stable adaptive quicksort/mergesort hybrid sorting algorithm.
I've had this half-finished for a while; I'm writing up what I can even though there's more to investigate. CC @scandum, @Voultapher, as I believe ipnsort has picked up glidesort's candidate approach.
I looked into how the choice of pivot candidates affects median accuracy, using a simulation on random piecewise-linear inputs. These seem to capture one type of order that might be expected in an input, and I don't have any other promising ideas for test inputs.
I found that the accuracy depends not only on how candidates are grouped into sets of 3 (what I initially wanted to test) but also on the positions of those candidates (unexpected, but obvious in retrospect). So I measured both for an even distribution and for Glidesort's recursive selection, using the midpoints of the nine intervals that come out of the recursion. I copied the arithmetic for this by hand; it looks right, but it's possible I made a mistake here. I've also tested various positions not shown here.

Error is measured as cost, assuming sorting each of the two partitions takes time proportional to n log(n). For context, the baseline cost is 8965784, so a relatively high cost difference of 30000 adds 0.33% to the total sorting cost due to that step (and it compounds at every partition). The tables below show cost for the best and worst groupings after testing every possibility, as well as some others:
The arrangement 011201220 that I've used and recommended previously does badly, often worse than 000111222.
Beyond any particular choice of grouping, it doesn't seem like Glidesort's positions do well: the "True of 9" row is the true median of those nine candidates, and it is worse than most pseudomedians with even spacing. The middle base offset of 4 (in 0‿4‿7) makes Glidesort's positions skewed and prevents perfect performance on 1-piece (totally linear) inputs, but changing it to 3.5 isn't much of an improvement (in fact the true median is worse for 2-piece inputs). So I think it's mainly the clustering that weakens it, although I can't say exactly why the effect is so strong. Something that seems a little better is to use 3 as the middle offset for one or two intervals and 4 for the others.
I also did some theoretical analysis on the median-of-median-of-... idea. I find that 3^k candidates processed with recursive medians have about as much power as a true median of (π/2)*2.25^k pivots. For example the pseudomedian of 243 for a 32768-element list is worth a true median of 91. The probability distributions for the median value found this way with random candidates are equal at the exact midpoint, and the shapes seem similar (if anything the pseudomedian is wider). I can write the math up if you're interested.
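To sanity-check that claim numerically, here is a small Monte-Carlo sketch in Python (my own, not part of the BQN source below): it compares the recursive median-of-3 of 9 uniform candidates against the true median of the same 9. Both distributions are centered at 0.5, and the pseudomedian's spread comes out wider, consistent with the description above.

```python
import random
import statistics

def pseudomedian(xs):
    # Recursive median-of-3; len(xs) must be a power of 3.
    if len(xs) == 1:
        return xs[0]
    step = len(xs) // 3
    parts = [pseudomedian(xs[i * step:(i + 1) * step]) for i in range(3)]
    return sorted(parts)[1]

random.seed(0)
samples = [[random.random() for _ in range(9)] for _ in range(20000)]
pseudo = [pseudomedian(s) for s in samples]
true9 = [statistics.median(s) for s in samples]

# Both estimators are centered on the midpoint, but the pseudomedian
# has a wider distribution than the true median of 9.
print(round(statistics.mean(pseudo), 2))                      # 0.5
print(statistics.pstdev(pseudo) > statistics.pstdev(true9))   # True
```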
Even spacing (0.5/9, ..., 8.5/9):
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 13413 | 25575 | 77986 |
True of 9 | 012345678 | 0 | 1557 | 2940 | 8960 |
012102120 | 048/136/257 | 0 | 1557 | 3174 | 16728 |
012012012 | 036/147/258 | 0 | 1557 | 4276 | 18077 |
000111222 | 012/345/678 | 0 | 13413 | 20636 | 28802 |
001112022 | 016/234/578 | 35848 | 31942 | 29785 | 30945 |
Glidesort arrangement (approx 0.008, 0.07, 0.117, 0.508, 0.57, 0.617, 0.883, 0.945, 0.992):
Partition | Diagram | 1 piece | 2 pieces | 3 pieces | 7 pieces |
---|---|---|---|---|---|
True of 3 | ~147 | 0 | 13413 | 25575 | 77986 |
True of 9 | 012345678 | 14184 | 69439 | 77943 | 74393 |
001122102 | 017/236/458 | 184 | 26029 | 43807 | 74776 |
012102120 | 048/136/257 | 14184 | 69439 | 79653 | 81007 |
012012012 | 036/147/258 | 14184 | 69439 | 84625 | 85439 |
000111222 | 012/345/678 | 14184 | 79120 | 97432 | 98255 |
011120220 | 058/123/467 | 39866 | 141889 | 139266 | 99083 |
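For reference, here is how I read the cost measure used in these tables, as a Python re-derivation (mine, not the BQN source below, so treat details as an approximation): each partition of size n is charged n·log2(n), and the cost of a perfect median split is subtracted.

```python
import math

def sort_cost(n):
    # Idealized cost of sorting n elements: n * log2(n).
    return n * math.log2(n) if n > 0 else 0.0

def excess_cost(n, pivot_rank):
    # Extra cost of splitting n elements at pivot_rank instead of at the
    # true median (rank n / 2).
    split = sort_cost(pivot_rank) + sort_cost(n - pivot_rank)
    return split - 2 * sort_cost(n / 2)

# A perfect split costs nothing extra; the baseline 8965784 quoted above
# is 1000 trials of 2 * sort_cost(500) ~= 1000 * 8965.8.
print(excess_cost(1000, 500))            # 0.0
print(round(1000 * 2 * sort_cost(500)))  # 8965784
```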
Source for this, run with CBQN. I can translate to some other language on request. The inputs are created with •rand, which changes between runs, but the results above don't change significantly.
```
# All partitions of 9 candidates into 3+3+3
part ← (⊐⊸≡∧(3⥊3)≡/⁼)¨⊸/⥊↕3⌊1+↕9
pfmt ← > ('0'⊸+ ⋈ ·∾⟜"/"⊸∾´ '0'+⊔)¨ part # Display
# Hard-coded partitions of interest
pint ← "000111222"‿"012012012"‿"012102120" #‿"011201220"
# Make random piecewise-linear functions
GetFn ← {
  R ← •rand.Range
  p←0∾1∾˜∧(𝕩-2)R 0 ⋄ q←𝕩 R 0 # x endpoints, y endpoints
  m←q÷○(«⊸-)p ⋄ y←q-m×p # Slope, y intercept
  {(⊏⟜y+𝕩×⊏⟜m)(1↓p)⍋𝕩}
}
dist ← GetFn¨ 1e3/≍2‿3‿4‿8 # 1e3 sets with each of 2, 3, 4, 8 vertices
Sample ← dist {𝕎𝕩}¨ < # Sample values given positions
# Candidate sampling
Mid ← {(0.5+↕𝕩)÷𝕩} # Midpoints of equal-sized intervals, 𝕩 total
glide ← {t←0‿4‿7÷8 ⋄ ⥊t+⌜(t+÷16)÷8} # Midpoints of glidesort candidate intervals
# Scoring: cost increase relative to true median on 1e3 elements
list ← Sample Mid l←1e3
ScoreAll ← {+˝˘ (2×{𝕩×2⋆⁼𝕩}l÷2) -˜ +○{𝕩×2⋆⁼𝕩}⟜(l⊸-) list +´∘≤¨⎉∞‿¯1 𝕩}
MakeTable ← {
  cand ← Sample 𝕩
  med ← part (1⊑∧){𝔽𝔽¨}∘⊔⌜ cand # Pseudomedians
  score ← +˝˘ scoremat ← ScoreAll med
  t39 ← > (⌊2÷˜3‿9) ⊑⟜∧¨¨ ⟨Sample Mid 3, cand⟩ # True medians
  ∾⟨
    ["True of 3"‿"147","True of 9"‿('0'+↕9)] ∾˘ ⌊ ScoreAll t39
    ((⌽⊸∨0=↕∘≠)∨pint∊˜⊏˘)⊸/ score ⍋⊸⊏ (⌊scoremat) ∾˘˜ pfmt
  ⟩
}
•Show∘MakeTable¨ ⟨Mid 9, glide⟩
```
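For readers who don't want to decode the BQN, the `glide` candidate positions can be reproduced directly. This is my translation of the expression `t←0‿4‿7÷8 ⋄ ⥊t+⌜(t+÷16)÷8` above, so verify it against the real code before relying on it:

```python
# Glidesort candidate positions as fractions of the array, translated
# from the BQN `glide`: t = [0, 4, 7] / 8; position = t_i + (t_j + 1/16) / 8.
t = [0 / 8, 4 / 8, 7 / 8]
positions = [ti + (tj + 1 / 16) / 8 for ti in t for tj in t]
print([round(p, 3) for p in positions])
# → [0.008, 0.07, 0.117, 0.508, 0.57, 0.617, 0.883, 0.945, 0.992]
```

These match the approximate positions quoted above the second table.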
Do you plan a parallel version with Rayon? That would be cool/fast.
This is just a hack to host visualization videos on Github.
The minimum run length commit seems to introduce quadratic behavior for runs somewhat shorter than sqrt(n / 2), because the run is repeatedly followed and discarded. If so, this would cause worst-case performance of O(n^(3/2)): O(n) time to get past each run, multiplied by the O(sqrt(n)) runs that fit in a length-n array. I took the following timings on a length-1e8 array to confirm that this has a practical impact; the input data is just 0, 1, ..., r-1 repeated, for run length r. sqrt(1e8 / 2) is about 7071; strangely, performance improves gradually from about 6000 up to that number instead of sharply as I'd expected. The "% create" here is a loose estimate from perf top of the fraction of time spent in LogicalRun<B,T>::create, and "Time create" is that multiplied by the total time.
Run length r | Time (s) | % create | Time create (s) |
---|---|---|---|
500 | 2.44 | 0.20 | 0.49 |
1000 | 2.96 | 0.25 | 0.74 |
2000 | 3.74 | 0.38 | 1.42 |
4000 | 4.48 | 0.45 | 2.02 |
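To reproduce the setup, the input pattern and the run count it implies can be sketched in Python (my sketch of the generator described above; the timings themselves come from the Rust code):

```python
def sawtooth(n, r):
    # 0, 1, ..., r-1 repeated: ascending runs of length r.
    return [i % r for i in range(n)]

def count_runs(xs):
    # Number of maximal ascending runs.
    return 1 + sum(b < a for a, b in zip(xs, xs[1:]))

# With r below sqrt(n / 2) there are ~n/r = O(sqrt(n)) runs; if following
# and discarding each run costs O(n), the total is O(n^(3/2)).
print(count_runs(sawtooth(1_000_000, 4000)))  # 250
```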
Add some kind of license, so that editing your repository and using it becomes legal.
If this sorting algorithm is stable and strictly better performance-wise than the standard library sort, it seems like the standard library implementation could be replaced completely (as happened previously with hashbrown and crossbeam-channel)?
Please consider using the optimizations from this cool article by @gerben-stavenga: https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubble-sorts-comeback.html
Hey, glidesort is very impressive.
Could you provide an easy way to run benchmarks on our machines and see these results? (I'm willing to help if necessary.)

(EDIT: by chance, I have a 4800 MHz dual-channel system available and a 2666 MHz single-channel one; I'm curious to compare bench results on both.)
Well, I'll be blunt. I find the glidesort codebase hard to work with even by the standards of sorting research. I feel that in your desire to give a polished view of the algorithm to the world, you've actually made it harder for people who want to dig into the details. Given that pdqsort (Rust version included) was basically my entry point into high-performance sorting, seeing the next step like this is tough. Worse, from what I understand of the algorithm, it doesn't seem to be much more complicated than pdqsort, given that it throws out a lot of things like pivot scrambling and heapsort. I'll describe my difficulties as best as I'm able to give you the most information if you'd like to help.
I believe a real commit history would be very useful, and find the decision to publish as a single commit surprising for software presented at an open source conference. You apparently had enough of an implementation to benchmark for the talk in May, without full panic safety infrastructure. Because I can't access any version like this, I have no way to test your claim that panic safety accounts for 10-15% of time taken by the algorithm. Could it be different across processors? I have no insight into how tuning decisions were made, which is often available in the history too.
You've shared benchmarks from an ARM machine that's presumably your M1, and an unspecified AMD processor, as a png. Could you include, or link to, the processor specs and raw data in this repository?
Generally it feels that while sorting concepts are well explained, the way they are implemented isn't. As I understand it, the way you use Rust isn't typical, so I expect even fluent Rust readers (I'm not one) could use some help. For example, gap_guard.rs makes no attempt to explain what "the gap" is. And much of branchless_merge.rs is taken up by the implementation of the BranchlessMergeState structure, with no explanation of how this structure will be used.
Other structures have no comments at all; I suppose the names are supposed to be self-documenting. Take enum PartitionStrategy<T> in quicksort. LeftWithPivot is meaningless to me. What goes left? And of course there's a pivot, you're partitioning! Eventually I figured out that the unnamed parameter is the pivot value to be used. Is left pivoting the variety used for the left side in a bidirectional partition? Because the block comment at the top never connects to any specific part of the code, I can't tell. Are LeftIfNewPivotEquals and LeftIfNewPivotEqualsCopy identical other than the way they store the pivot? The definition of partition_left certainly suggests this, but later less_strategy and geq_strategy recognize only the Copy version.
Where does the recursive median-based strategy for pivot selection come from? Is there a reference? To me it seems obviously questionable, because if just two of the three systematically-chosen regions based on a, b, and c have lower median values, then you'll get a low pivot. For example, what happens with an array consisting of three up-down patterns?
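To make that worry concrete, here is a toy Python check (mine; it uses evenly spaced candidates with a 000111222-style grouping rather than glidesort's exact offsets): on three "tents", each region's median-of-3 lands on a low value, so the pseudomedian undershoots the true median.

```python
import statistics

def median3(a, b, c):
    return sorted((a, b, c))[1]

# Three up-down "tents": each third of the array rises then falls.
tent = list(range(1500)) + list(range(1500, 0, -1))
xs = tent * 3  # length 9000

# Nine evenly spaced candidates, grouped by region (000111222-style).
cands = [xs[int((i + 0.5) / 9 * len(xs))] for i in range(9)]
pivot = median3(*(median3(*cands[3 * g:3 * g + 3]) for g in range(3)))

# Each region samples (500, 1500, 500), so every region median is 500,
# well below the array's true median.
print(pivot)                  # 500
print(statistics.median(xs))  # 750.0
```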
What is tracking? I couldn't even google cfg(feature = "tracking").