Comments (20)

BonzaiThePenguin commented on June 14, 2024

I'll be sure to check it out! The difference you're describing is known as the movement imitation buffer – this paper recommended using that too, but I found tagging to be faster and easier to visualize so I went with that.

I think the issue previous O(n log n) algorithms had was that they were many times slower than a standard merge sort, which made them impractical for general use. Even this one was 2-3x slower until I added a bunch of optimizations, and the Java version still has awful performance for some reason. All I've been able to determine so far is that it has something to do with the order of array accesses, and is caused by something within the JVM specifically.

BonzaiThePenguin commented on June 14, 2024

I implemented the algorithm described in "Ratio based stable in-place merging", by Pok-Son Kim and Arne Kutzner, and then I went back and wrote a much simpler version and got that working too! I should have the new version up by tomorrow night.

Mrrl commented on June 14, 2024

Nice algorithm :)
I wrote almost the same thing about 3 months ago: https://github.com/Mrrl/GrailSort . The difference is that instead of tagging blocks, I swapped elements of an internal buffer in parallel with the swapping of blocks, so I could tell the initial order of the blocks by comparing the internal buffer elements.
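
A minimal sketch of that trick, as I read it (my own illustration with hypothetical helper names, not GrailSort's actual code), assuming equal-sized blocks and a key buffer whose elements start in ascending order:

```cpp
#include <algorithm>
#include <cstddef>

// Swap blocks i and j, and mirror the move in the key buffer. Because the
// keys started out in ascending order, comparing keys[i] with keys[j] later
// reveals which of the two blocks originally came first.
template <typename T>
void swap_blocks(T* array, T* keys, std::size_t block_len,
                 std::size_t i, std::size_t j) {
    std::swap_ranges(array + i * block_len,
                     array + (i + 1) * block_len,
                     array + j * block_len);
    std::swap(keys[i], keys[j]);
}

// Did the block now in slot i originally precede the block now in slot j?
template <typename T>
bool was_before(const T* keys, std::size_t i, std::size_t j) {
    return keys[i] < keys[j];
}
```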

BonzaiThePenguin commented on June 14, 2024

On the first run your algorithm was 20% faster than WikiSort when I removed the cache, so you've got my attention! What else is it doing differently? Structurally it looks a lot different from WikiSort, and Huang and Langston weren't cited in Dr. Kim's and Dr. Kutzner's paper, so I'm not terribly familiar with their work.

Mrrl commented on June 14, 2024

Probably another difference is that in merging (both for sorting the elements within blocks and for block merging) I use a moving internal buffer: before the merge of two blocks or subblocks A and B it sits before block A, and after the merge it sits after the sorted part of (A+B). That is, the sequence (free, A, B) transforms into (sorted(A+B), free, rest(A or B)). This method reduces the number of swap operations.
As for other differences, it's difficult to say – they are well hidden in the implementation details. Right now I'm not sure whether there is a good way to add a cache array (of fixed size) to my implementation. Probably yes, but it would require duplicating half of the merging functions, with swaps replaced by moves.
Some description of my implementation is here: http://habrahabr.ru/post/205290/ but unfortunately it's in Russian.
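
Here is a minimal sketch of that moving-buffer merge (my reconstruction of the idea, not GrailSort's actual code). It assumes the buffer is at least as long as B – which holds during block merging, where B is at most one block – and it merges to completion rather than stopping at a leftover fragment:

```cpp
#include <cstddef>
#include <utility>

// Layout on entry:  [ buffer ][ A ][ B ]    (buffer holds junk/key elements)
// Layout on exit:   [ sorted(A+B) ][ buffer ]
// Requires buf_len >= b_len so the output never overruns the unread part of A.
template <typename T>
void merge_with_buffer(T* buf, std::size_t buf_len,
                       std::size_t a_len, std::size_t b_len) {
    T* a = buf + buf_len;
    T* b = a + a_len;
    T* out = buf;                   // merged output overwrites the buffer
    std::size_t i = 0, j = 0;
    while (i < a_len && j < b_len) {
        if (!(b[j] < a[i])) std::swap(*out++, a[i++]);  // take A on ties: stable
        else                std::swap(*out++, b[j++]);
    }
    while (i < a_len) {             // leftover A; out may have caught up to it
        if (out != a + i) std::swap(*out, a[i]);
        ++out; ++i;
    }
    while (j < b_len)               // leftover B gets dragged across the gap
        std::swap(*out++, b[j++]);
}
```

Compared to first swapping all of A into the buffer and then merging back, this does one swap per element instead of up to two.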

BonzaiThePenguin commented on June 14, 2024

I like the idea of only pulling out the internal buffer one time at the very start and using MergeInPlace to redistribute it – I had tried doing that on my own once, but I made the mistake of always keeping the buffer at the start rather than shifting it through the array as new items are added. This caused it to become an O(n^(3/2)) operation in the worst case, so I gave up and didn't think about it any further.

I'll be sure to look into the moving internal buffer idea as well. I may try bringing back the movement imitation buffer too to see if it ends up being faster with this different design.

If you're still working on GrailSort, you should try implementing the idea used in WikiSort where it scales the power-of-two math to the actual size of the array. I found that to be 10% faster overall since it reduced the number of merge operations needed when the array size isn't a power of two.
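
Here's a rough sketch of that scaling idea (my paraphrase using integer math, with std::stable_sort and std::inplace_merge as stand-ins for WikiSort's own routines – the real code uses its own fixed-point range calculations):

```cpp
#include <algorithm>
#include <cstdint>

// Map "virtual" power-of-two indices onto the real array size n, so every
// pass merges runs whose lengths scale smoothly with n instead of leaving
// a ragged, poorly balanced tail when n isn't a power of two.
void bottom_up_scaled(int* array, std::uint64_t n) {
    if (n < 2) return;
    std::uint64_t power = 1;
    while (power * 2 <= n) power *= 2;                 // floor power of two
    auto boundary = [&](std::uint64_t i) { return i * n / power; };

    // stably sort the scaled base runs (virtual length 16, real length 16-31)
    for (std::uint64_t s = 0; s < power; s += 16)
        std::stable_sort(array + boundary(s),
                         array + boundary(std::min(s + 16, power)));

    // merge pairs of runs, doubling the virtual run length each pass
    for (std::uint64_t len = 16; len < power; len *= 2)
        for (std::uint64_t s = 0; s < power; s += len * 2)
            std::inplace_merge(array + boundary(s),
                               array + boundary(s + len),
                               array + boundary(s + len * 2));
}
```

Since boundary(power) is exactly n, every pass covers the whole array with runs of nearly equal length, which is where the reduced merge work comes from.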

but unfortunately it's in Russian.

Not even a slight problem for Chrome! It translated it automatically and I can make sense of it.

Also, if you're interested, there's now a Wikipedia article for this type of sorting algorithm. It could certainly use a Russian translation and/or more information about this class of algorithms! I only detailed the algorithm used in Kim's and Kutzner's paper and a few parts of other papers, since that's all I knew about at the time. You could also link to your GitHub project at the bottom if you're willing to release it under a Creative Commons license or into the public domain.

BonzaiThePenguin commented on June 14, 2024

(It ended up being called "Block merge sort" because I thought Dr. Kutzner was the first to create a practical algorithm, so I let him name it; then I decided "block sort" was a nice nickname for it.)

Mrrl commented on June 14, 2024

I've added a couple of functions to my version – sorting with a fixed buffer (512 items) and with a dynamic buffer (fewer than 2*sqrt(N) items). On a 150M array with enough distinct values, the dynamic version is 1.3 times faster than stable_sort. Not much, but I can live with it :)

BonzaiThePenguin commented on June 14, 2024

Awesome! Minor correction, though: I think it's actually 0.3x faster, or 30% – 1x faster would mean twice as fast. The test you used seems closest to RandomFew, which was only 20% faster for WikiSort.

I switched WikiSort over to extracting an internal buffer one time at the start (the code isn't uploaded yet) and it's definitely faster than what it was doing before – it went from 70% as fast to 82% as fast, without the 512-item cache. Plus the code is a lot shorter now, so that's nice. Still need to implement the other parts.

Mrrl commented on June 14, 2024

I think that in the RandomFew test the number of keys is much smaller than 2*sqrt(N), so it's deep inside "not enough keys" territory. I've tested the algorithms in two situations: when the number of keys is slightly less than this margin (GrailSort's speed dropped to 0.75 of stable_sort's) and when it is slightly more (its speed rose to 1.3 of stable_sort's). I think that in the "dynamic" version I could allocate an int[sqrt(N)] array for block tagging, but that would be a completely different algorithm, with O(sqrt(N)) memory (and no internal buffers at all) – and it's not good to combine that with pure BlockSort.

Mrrl commented on June 14, 2024

Yes, the pure O(sqrt(N))-memory algorithm works 40% faster than stable_sort on random data.
https://github.com/Mrrl/SqrtSort

BonzaiThePenguin commented on June 14, 2024

Alright, I'm looking into the Pardo/Huang algorithm now. It looks like the block rearranging step is actually completely different than the one WikiSort uses. WikiSort rolls the A blocks through the B blocks and selection sorts the A blocks as it goes, but Pardo tags the A and B blocks using a pseudo-merge step then selection sorts both of them together. It looks like Pardo's version uses significantly fewer swaps, and possibly fewer comparisons.

Mrrl commented on June 14, 2024

As far as I can see, the tagging of blocks in Huang and Langston's article (by swapping two elements) doesn't work in all situations: with some combinations of equal elements it's impossible to recover the partition of the list into the A and B streams.
I haven't read Pardo's article yet.

BonzaiThePenguin commented on June 14, 2024

Yeah, I was having trouble trying to figure out how Huang's and Langston's paper was supposed to work in all cases – they just kind of gloss over the details while I'm writing version after version trying to fix the holes in my understanding of it. Pardo's paper is incredibly thorough and somewhat similar.

The design I'm considering right now is selection sorting the A and B blocks by their first values (giving precedence to A if the values are equal), then for any B blocks that contain at least two unique values you swap the first and last value so we can tell later that it's a B block. It turns out that if the B block contains all equal values we don't need to know that it used to be a B block.

For example, it's impossible for this B block to be inserted before the next A block unless it's less than the first value, which is 2:

A [0 1 1 1] B [1 1 1 1] A [2 2 2 2]

But as you can see, the blocks are already in order and nothing needs to be merged, so we didn't need to know the middle block came from B.

But if the first A block ends with a 2, like so:

A [0 1 1 2] B [1 1 1 1] A [2 2 2 2]

We'd know [1 1 1 1] was a B block because we can't have an A block that's less than the value of the previous A block.

And if B contains more than one unique value, like so:

A [0 1 1 1] B [1 1 1 2] A [2 2 2 2]
                     ^

We'd need to know it was a B block since the 2 is out of order, but since we have two unique values we can swap the first and last value:

A [0 1 1 1] B [2 1 1 1] A [2 2 2 2]
               ^     ^

I don't know if that's the way I'm supposed to be doing it – again, the paper seems to be missing a lot of important details and doesn't seem to work in all cases.

Mrrl commented on June 14, 2024

But what will you do in this example:
B:[0 1 1 1] A:[1 1 1 1] B:[1 1 1 1] A:[2 2 2 2]
In this case you have to remember that the third block is B, because all the 1s from the first block should go between the second and third blocks.

Actually, if you sort the blocks by selection sort, you can always tell for every block whether it's from A or from B:
If the array at the beginning is [A A A A A A A | B B B B B B], then after some rounds of selecting the minimal block and placing it into position you will get [ C C C A A A A | A A B B B B B ], where the Cs are the A and B blocks that have already been pulled out and merged, and you always know the position of the first remaining B block. The next block will be either one of the A blocks in the second half or that first B block, so when you find the next minimal block, you know whether it's A or B.
The problem is that in this process you can lose the order of the A blocks that end up in the second half. So we can reduce the number of unique keys to 3/2*sqrt(N) (sqrt(N) for the internal buffer and sqrt(N)/2 for tracking the order of the A blocks).
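
A minimal sketch of that bookkeeping (my paraphrase; origin_is_b is a hypothetical output array just for illustration):

```cpp
#include <algorithm>
#include <cstddef>

// Selection sort equal-sized blocks by their first values, with A blocks
// taking precedence on ties. Slots [0, first_b) start as A blocks and
// [first_b, count) as B blocks. The untouched B suffix keeps its original
// ascending order, so the minimal block is always either an A block or the
// *first* remaining B block – which is how each block's origin stays known.
template <typename T>
void sort_blocks(T* array, std::size_t block_len, std::size_t count,
                 std::size_t first_b, bool* origin_is_b) {
    for (std::size_t slot = 0; slot < count; ++slot) {
        std::size_t best = slot;
        for (std::size_t k = slot + 1; k < count; ++k)
            if (array[k * block_len] < array[best * block_len])
                best = k;           // strict '<' gives A precedence on ties
        origin_is_b[slot] = (best >= first_b);
        if (origin_is_b[slot])
            ++first_b;              // best == first_b; the displaced block
                                    // keeps the [A...A][B...B] shape intact
        if (best != slot)
            std::swap_ranges(array + slot * block_len,
                             array + (slot + 1) * block_len,
                             array + best * block_len);
    }
}
```

As noted above, this still scrambles the relative order of the displaced A blocks, which is what the extra sqrt(N)/2 keys would have to track.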

BonzaiThePenguin commented on June 14, 2024

You're completely right – thanks for catching that. One possible fix would be swapping the first and last values in the B blocks before selection sorting the blocks by their first values, although I haven't fully tested this yet:

These two cases stay the same in this system:

A [0 1 1 1] B [1 1 1 1] A [2 2 2 2]
A [0 1 1 2] B [1 1 1 1] A [2 2 2 2]

Then in this case where the B would have been selected as the second block:

A [0 1 1 1] B [1 1 1 2] A [2 2 2 2]

We instead swap the first and last value before selecting, which causes it to be inserted as the third block:

A [0 1 1 1] A [2 2 2 2] B [2 1 1 1]
                           ^     ^

Then instead of having this:

B [0 1 1 1] A [1 1 1 1] B [1 1 1 1] A [2 2 2 2]

The first and last values are swapped and B is selected as the second block:

A [1 1 1 1] B [1 1 1 0] B [1 1 1 1] A [2 2 2 2]
               ^     ^

I'm going to research this further right now; this was just a thought that popped into my head and hasn't been fully tested. The goal was that by rearranging and encoding the blocks in every pair of subarrays before merging all of them, the internal buffer could be reduced to sqrt(N).
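
A minimal sketch of the tagging scheme as described so far (untested, like the idea itself – the helper names are just for illustration):

```cpp
#include <cstddef>
#include <utility>

// Tag a sorted B block before the blocks are selection sorted: if it holds
// at least two distinct values, swap its first and last elements. A block
// whose values are all equal needs no tag (and, as shown above, its origin
// turns out not to matter).
template <typename T>
bool tag_b_block(T* block, std::size_t len) {
    if (len < 2 || !(block[0] < block[len - 1]))
        return false;                      // all values equal: no tag needed
    std::swap(block[0], block[len - 1]);
    return true;
}

// Only a tagged B block can have a last value less than its first value,
// since every untagged block is still sorted ascending.
template <typename T>
bool is_tagged_b_block(const T* block, std::size_t len) {
    return len >= 2 && block[len - 1] < block[0];
}

// Restore ascending order once the block's origin has been consumed.
template <typename T>
void untag_b_block(T* block, std::size_t len) {
    std::swap(block[0], block[len - 1]);
}
```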

Mrrl commented on June 14, 2024

Looks like you can work with an internal buffer of any size 2*sqrt(N/k) for a fixed k:

  • first, sort blocks of size sqrt(N*k), using a buffer of sqrt(N/k) elements for tagging
  • then split each block into k subblocks of size sqrt(N/k)
  • then reorder these subblocks, using O(k) = O(1) memory for their tagging, and immediately merge them using a second buffer of sqrt(N/k) (you will have no more than 2k unmerged subblocks at any moment)

It will cost an extra O(N*log(N)) swap operations for the second reordering, but you win in the number of comparison operations. (For example, with N = 10^6 and k = 4: sort blocks of size sqrt(N*k) = 2000, split each into 4 subblocks of size sqrt(N/k) = 500, and the two buffers total 2*sqrt(N/k) = 1000 keys instead of 2*sqrt(N) = 2000.)

davidrhee commented on June 14, 2024

Block Sort is not O(N log N) in time, because the number of reads and writes during the extract, tag, and drop-roll phases is worse than O(N log N).

Also, you cannot just "drop" the coefficients: an initial sort costing 16*N or 32*N operations already equals N*log(N) at N = 2^16 or N = 2^32 (about 4 billion), respectively.

So you need to sort arrays much larger than the block size to even show that you have gained back those efficiencies. Sadly, researchers present their ideas with small test sizes – even 100 million elements is too small to evaluate an algorithm.

Comparison operations are cheap compared to memory reads and writes; every practical algorithm is bottlenecked by memory transfer rates. E.g., a modern GPU may have 10 times the memory bandwidth of a CPU, and for this reason alone you can get at most a 10x speedup if your sort can be parallelized.

Finally, the authors of Block Sort should have realized that the minimum number of memory moves for an in-place MERGE of two arrays is N*log(N), and therefore the minimum for an in-place merge SORT should be N*log(N)^2.

Why N*log(N) for an in-place MERGE? Because at least log(N) rotations are required for any element to reach its final destination. You can think of it in terms of increasing powers of a base, e.g., 1, 10, 100, 1000: an element that starts up to 1000 positions from its destination needs to be rotated within the array only log10(1000) = 3 times.

With the power of desktop PCs today, anyone can verify these algorithms with hard data, but you'll struggle to find charts and performance analysis for significant sample sizes. So the myth will continue that this is the best algorithm, yet nobody uses it.

eaglgenes101 commented on June 14, 2024

Because at least Log(N) rotations are required for any element to reach it's final destination

If blocks of elements are rotated around together, then that rotation cost is distributed over the elements: rotating a block of m elements costs O(m) moves, i.e. O(1) amortized per element. That is what is happening in this sort.

If you still think that this sort isn't O(n log n) as claimed, you would do well to directly identify a mistaken reasoning step in the derivation of the time complexity.

RoyiAvital commented on June 14, 2024

Would you recommend this algorithm for low-overhead sorting of small arrays? For instance, ~5-500 unsigned integers?
