vsonnier / hppcrt

This project forked from carrotsearch/hppc


HPPC-RT, a fork of CarrotSearch's HPPC for Realtime

Java 98.23% HTML 0.28% ANTLR 1.49%
hppc hppcrt java-5 no-dependencies realtime retrolambda

hppcrt's People

Contributors: dweiss, vsonnier

hppcrt's Issues

HPPCRT-7: Use consistent min/max allowable size for containers

Review the containers to consistently implement the following policy, where not already done:

  • Their size is always clamped to a defined min/max, regardless of input (remove the constructor assertions about that).
  • Their maximum growth is limited, so that exceeding this limit triggers some kind of out-of-memory exception.
  • Upgrade the maximum allocatable size to something like Integer.MAX_VALUE - 32,
    since 64-bit JVMs are now capable of allocating arrays with almost Integer.MAX_VALUE elements.
    For instance, in JDK 7 the maximum ArrayList capacity is indeed Integer.MAX_VALUE - 8.
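
A minimal sketch of the intended clamping policy, assuming hypothetical constants and method names (the real values live in the containers themselves):

// Hypothetical illustration of the clamping policy described above.
final class CapacityPolicy {

    // Assumed bounds, not the actual HPPC-RT constants.
    private static final int MIN_CAPACITY = 4;
    private static final int MAX_CAPACITY = Integer.MAX_VALUE - 32;

    // Clamp any user-provided size into [MIN_CAPACITY, MAX_CAPACITY]
    // instead of asserting on it in constructors.
    static int clampInitialCapacity(final int requested) {
        return Math.max(MIN_CAPACITY, Math.min(requested, MAX_CAPACITY));
    }

    // Growth beyond MAX_CAPACITY is an error instead of a silent overflow.
    static int grow(final int currentCapacity) {
        if (currentCapacity >= MAX_CAPACITY) {
            throw new OutOfMemoryError("Container cannot grow beyond " + MAX_CAPACITY);
        }
        final long doubled = 2L * Math.max(currentCapacity, MIN_CAPACITY);
        return (int) Math.min(doubled, MAX_CAPACITY);
    }
}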

HPPCRT-9: Remove template files and other internal usage classes from output jars

Currently, the binary (runtime) and source jars are generated including files that should remain private to the project:

  • Template files
  • The Intrinsics class
  • The Internals class.

This makes the runtime jar bigger than necessary, and a user could end up using the template files in runtime code instead of the generated Object versions, which creates confusion and leads to lower-performance code.

HPPCRT-34: Relax signature of sort() methods to match JDK ones

Relax the signature of

<KType extends Comparable<? super KType>> KTypeSort.sort( Comparable ) 

because it restricts practical usage too much, into simply

<KType> KTypeSort.sort(...) 

just like in the JDK, the price being the same: a ClassCastException if the types are indeed not Comparable.

On the bright side, the sort APIs of primitive and generic list-like containers are now identical.
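
A sketch of the relaxed, JDK-style signature; the body here just delegates to java.util.Arrays to illustrate the contract, the real implementation uses its own sort routines:

// No Comparable bound on KType anymore: the method accepts any array,
// and throws ClassCastException at runtime if the elements are not
// mutually Comparable, exactly like java.util.Arrays.sort(Object[]).
final class KTypeSortSketch {

    static <KType> void sort(final KType[] table) {
        java.util.Arrays.sort(table);
    }
}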

HPPCRT-11: Iterator states are inconsistent

  1. When released back to their pools, iterators may keep references to internal buffers, which may cause memory leaks.
  2. In LinkedList, when inserting through iterators, the iterator state is not kept up to date with the potentially new buffer references following reallocations, causing crashes.

Solution:

  1. Add a reset() method to ObjectPool.
  2. Properly update the buffer references in the LinkedList insertAfter/insertBefore iterator methods.
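
A minimal sketch of the first point, assuming a simple array-backed iterator pool (all names hypothetical):

// Hypothetical pooled iterator: when it is released back to its pool,
// reset() drops the reference to the container's internal buffer so the
// pool does not keep dead arrays reachable (the memory leak of point 1).
abstract class AbstractPooledIterator<KType> {

    KType[] buffer;   // reference to the container's internal buffer
    int cursor;

    // Called by the pool (e.g. from an ObjectPool.reset()-style hook)
    // when the iterator is returned to it.
    void reset() {
        this.buffer = null;  // let the old buffer be garbage-collected
        this.cursor = 0;
    }
}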

HPPCRT-39: Port the new AST-based template processor of HPPC v0.7.x

This is a meta-issue tracking the "port" of the HPPC v0.7.0 template processor to HPPC-RT, in order to benefit from the much more powerful AST-based template processor of HPPC v0.7.0 and to ease future merges with HPPC evolutions.

HPPC-126 : Template preprocessor should fully parse and understand signatures. This is the main issue.
The port will proceed in several steps:

  • First step: make the new processor generate the same output as the 0.7.x-RT branch, i.e. the v0.68 release + HPPCRT-35 + HPPCRT-36 + HPPCRT-37.
  • Second step: make Intrinsics and other inlines "registered" with the processor, which will then make the necessary replacements itself (instead of the regex-powered filterInlines() pass). Once again, the processor will have to generate the same output as before.
  • Update 2015.05.22: WONTFIX, no clear advantage, because method calls, viewed as a generic expression rule, come in such variety that the rule would have to be analysed quite deeply to be usable. So, continue to use the regex-based replacer in filterInlines() as before.
  • Third step: introduce all the Intrinsics syntax novelties:
    HPPC-122 + HPPC-137: An overhaul of intrinsics (equality comparisons, no vtype/ktype distinction, etc.)
    Again, the processor will have to generate the same sources as before.
  • Final step, make the final code changes:
    HPPC-118: Buffer arrays for generic types should be declared as Object[] to avoid compiler-injected automatic casts.
    HPPC-110: putOrAdd and other primitive-only methods should be template-compilable.
Maybe ?
  • HPPC-105: Cleanup project structure and IDE integration
    ==> some manual cherry-picks of good ideas here and there, but not much more.

HPPCRT-31: Hash containers, merge allocated array and keys

Following this discussion, I experimented with the following idea for hash containers:

if (key == defaultValue) {
    // special case: use an explicit allocatedDefaultValue flag to test existence, etc.
} else {
    // since all keys are unique in a map/set, defaultValue in the keys array means
    // allocated == false, and every other value means allocated == true.
    // So effectively the allocated[] array is no longer needed.
}

That is, defaultValue is one special key value that also acts as a sentinel meaning "not allocated" in the keys array itself, effectively removing the need for a separate array of allocated flags.
Also, choosing defaultValue = 0 / null is likely to give good performance.
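
A sketch of a lookup under this single-array scheme, with int keys, defaultValue = 0 and linear probing (all names hypothetical):

// The default key value 0 marks an empty slot, so no allocated[] array is needed;
// the presence of the key 0 itself is tracked by a separate boolean flag.
final class IntSentinelSetSketch {

    private boolean allocatedDefaultKey;      // is the key 0 present?
    private final int[] keys = new int[16];   // power-of-two capacity, 0 = empty slot

    boolean contains(final int key) {
        if (key == 0) {
            return this.allocatedDefaultKey;  // special case for the sentinel value
        }
        final int mask = this.keys.length - 1;
        int slot = hash(key) & mask;
        while (this.keys[slot] != 0) {        // 0 means "not allocated", stop probing
            if (this.keys[slot] == key) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    private static int hash(final int key) {
        return key * 0x9E3779B9;              // placeholder mixing function
    }
}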

All in all, after much effort in the hash-single-array branch, the BenchmarkHashContainersSuite showed some definite improvements. Randomizedtesting also helped me a lot.

The modifications were done on the primitive hashes and Identity Hash, i.e. where all keys behave like PODs.
The CustomHash and Object containers remain Robin Hood, the main advantage there being that the allocated array also acts as a hash caching mechanism.

All in all, the primitive hashes show roughly a 10% improvement. The Identity Hash improvement is much smaller, but since this solution also uses less memory, it's a keeper either way.

As usual, the single-array code is enabled in the template code by conditional compilation, so we may revert it in the future. Note however that Robin Hood and single-array are incompatible in template code, for simplification, which effectively removes Robin Hood for the primitive OpenHash containers.

While testing and trying to complete unit test coverage, I also managed to catch some more bugs: #32 and 40640fe, thanks to the EclEmma plugin in Eclipse and Randomizedtesting.

HPPCRT-37: API changes and internal code refactorings from HPPC v0.7.0

This is a meta-issue tracking what I may add to HPPC-RT from HPPC v0.7.0 master.

Already done in v0.68 and before:

HPPC-97 : Use unallocated slot marker key instead of an explicit allocation table for hash containers
HPPC-101 : All current benchmarks are JMH or ad-hoc ones.
HPPC-106 : Drop Guava adapter (and dependency)
HPPC-113: Cleaned up API transitioning from capacity to expected elements. This is already more or less the HPPC-RT principle; RT also has a capacity() method.
HPPC-119 : Make the default hashing strategy an intrinsic (through the generic inlining mechanism)

Added for next release:
  • HPPC-115 and HPPC-103: Return of the lost perturbation. I'll roll my own vision here with HPPCRT-35, without any customization, for simplification.
  • HPPC-111: Drop the identity (map|set) specialization and add an overridable comparator. Easy enough by subclassing CustomHashMap; done through HPPCRT-36.
  • HPPC-104: Ability to create a read-only view of a map. Consists of removing the l* methods from maps and sets.
  • HPPC-117: API simplifications and changes. For me, just newInstanceXXX() renamed to newInstance() with proper overrides.
  • HPPC-120: Rework entry shifting routine to be less hairy
  • HPPC-121: Rename remove{All|First|Last}Occurrences(key) to remove{All|First|Last}(key)
  • HPPC-108: Rename IntDoubleLinkedSet to DoubleLinkedIntSet
  • HPPC-125: equals should not compare with subclasses of itself.
    This was mostly already done, with some modifications: all containers with overridable criteria for equality or order are already compared against their overridable criteria. So add a getClass() test instead of instanceof to prevent comparison with subclasses (see the sketch after this list).
    On the other hand, all lists have an implicit order independent of the nature of their elements, so in the end all KTypeIndexedContainer are comparable with each other.
  • HPPC-140: Maps return a fully fledged collection from values(). Indeed, keys() and values() have already returned collections for a long time in HPPC-RT. Still, rename the returned types from KeysContainer/ValuesContainer to KeysCollection/ValuesCollection to stress that fact.
  • HPPC-130: removeAll(KTypeLookupContainer) had an incorrect generic signature
  • HPPC-131: retainAll(KTypeLookupContainer) had an incorrect generic signature
  • HPPC-133: KTypeContainer.toArray(Class) can return incorrect array type
  • HPPC-135: KTypeVTypeAssociativeContainer#removeAll had an incorrect generic signature
  • HPPC-114: Buffer resizing and allocation should throw non-assertion-mode exceptions. Finally add it, among lots of other little changes here and there to converge on the HPPC sources.
  • HPPC-134: Set and Map's removeAll() should pick the best removal strategy.
  • HPPC-141: Drop mutables (*Holder classes), but keep the IntHolder for tests.
  • HPPC-145: Remove "Open" to simplify class names of hash containers.
  • HPPC-146: Remove DoubleLinkedIntSet, duplicate of HPPCRT-40.
  • HPPC-149: Recognize tests.seed as the initialization seed for randomized key mix strategies. Simply plug the property into Containers.randomSeed64(), which generates the unique perturbation seed per container. The only real usefulness is indeed for tests, where we can then have deterministic perturbation values.
  • HPPC-152: Add XorShift128+ and replace XorShiftRandom. Make it extend Random and add next(bits) like the old XorShiftRandom, so that it is also a fully fledged Random class.
  • HPPC-159: Add .visualizeKeyDistribution(int characters) to maps and sets, but only for tests. No need to weigh down the public API with something the end user has no control over. (HPPC-RT key "randomization" is not configurable, contrary to HPPC.)
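
For HPPC-125 above, a minimal sketch of the getClass()-based check (the container name is hypothetical):

// Containers with overridable equality/order criteria reject subclasses:
// an exact getClass() comparison, unlike instanceof, excludes them.
class ExampleHashSetSketch {

    @Override
    public boolean equals(final Object obj) {
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return sameContentsAs((ExampleHashSetSketch) obj);
    }

    @Override
    public int hashCode() {
        return 0;  // placeholder; the real implementation hashes the contents
    }

    private boolean sameContentsAs(final ExampleHashSetSketch other) {
        return true;  // placeholder for the real content comparison
    }
}
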
Won't do:

HPPC-112: Add ensureCapacity(elements) to containers.
For simplification's sake, don't implement Preallocable
and only do HPPC-114 to be able to catch OOM situations.
The HPPC-RT philosophy is to use preallocated sizes if the user wants zero-garbage behaviour. The moment reallocation occurs you have lost anyway, which means either your application is misbehaving, or you fucked up by not preallocating at the right size. In this case, the diagnostic messages from HPPC-114 are enough.

HPPC-116: Index-based methods to replace lget()/lslot() and so on. Let's drop the subject completely for the sake of API simplification and from a maintenance POV.
HPPC-139: Add release() to the API.
HPPC-143: Add KTypeScatterSet and KTypeVTypeScatterMap. For API simplicity's sake (and maintenance, from the dev POV), don't add another specialization that is only there for performance reasons.
HPPC-144: Separate esoteric container combinations into a separate JAR. Although I agree that floating-point keys in hash maps do not make a lot of sense (even more so for indirect floating-point CustomHash!), I don't want to make another jar: so either drop them completely, or keep them as is. Either way, it is only a matter of the existing Velocity directives doNotGenerateKType() to customize...

HPPCRT-16: Import some HPPC v60 fixes + Template inlines

Shamelessly import those fixes:
HPPC-85: addTo and putOrAdd pulled up to the ObjectIntMap interface.
HPPC-93: NaN keys are not treated correctly in hash sets/maps. Completed by sorting tests.
Plus pom.xml fixes.

New internal feature: add some kind of primitive, localized (private) inline expressions, like:

/*! #if ($TemplateOptions.inline("add","(x, y)", "x+y")) !*/
private int add(int x, int y) {
    return x + y;
}
/*! #end !*/

The add(x, y) call is then inlined as "x + y" in expressions in the generated code.
Use this feature to make some special functions in ArrayDeque and LinkedList private instead of putting them into Intrinsics (oneLeft, oneRight, linkXXXX... etc.).
This will allow transparent use of small expression functions that are guaranteed to be inlined in the generated code, for performance. For B+tree, one day. :)

HPPCRT-42: Remove Boolean containers entirely.

Well, boolean containers are not that useful, really. We can easily emulate them with simple expressions such as (bool_expression ? 1 : 0) in a normal container, and back.
However, the bigger problem is that booleans cannot be boxed/unboxed into a Number, which means the whole unit test suite would need heavy adaptation. So much so that, until now, the boolean variants have not been test-covered at all.
Although the risk of bugs is small, the boolean versions are not actually proven correct, which may forbid the use of the lib in safety-sensitive applications.

_Conclusion_: kill all booleans!
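
A sketch of the emulation mentioned above; the int[] stands in for any int-based HPPC-RT container:

// Emulating a boolean container with an int container:
// store (b ? 1 : 0) on the way in, read back with (v != 0).
final class BooleanEmulationSketch {

    public static void main(final String[] args) {
        final int[] flags = new int[8];          // stand-in for an int container
        final boolean input = true;

        flags[0] = input ? 1 : 0;                // boolean -> int
        final boolean output = flags[0] != 0;    // int -> boolean

        System.out.println(output);              // prints "true"
    }
}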

Caliper benchmark framework looks abandoned, migrate the benchmarks to JMH

Caliper looks dead since January 2014. Besides, Caliper v1.0 has become a big bloat of code compared with v0.5, forcing new usage patterns such as:

  • no longer a simple console output, only results in "web publishing" style...
  • not even a simple statically-generated site, à la Surefire reports, which would be fine.

Those are show-stoppers for all kinds of scripting processes, and more importantly, what about real-life closed-source corporate development? We certainly don't want any kind of internal info leaking out.

So either migrate those benchmarks to "misc" as hand-made "synthetic" benchmarks, or use JMH v1.0+, which looks like the new boss in town.
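
A minimal JMH skeleton of the kind such a migration produces (the measured operation is a placeholder, not an actual HPPC-RT benchmark):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ExampleJmhBenchmark {

    private int[] data;

    @Setup
    public void prepare() {
        data = new int[1_000_000];   // placeholder workload
    }

    @Benchmark
    public long sumAll() {
        long acc = 0;
        for (final int v : data) {
            acc += v;
        }
        return acc;                  // returned so it is not dead-code eliminated
    }
}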

HPPCRT-41: Remove senseless key types for CustomHash containers

This is a specific version of "HPPC-144: Separate esoteric container combinations into a separate JAR" applied to the CustomHash containers. I cannot conceive of an application of "indirect" hash classes outside the cases where keys are either Objects (of course) or indexes, i.e. pointer-like indirections into another structure (int or long).
So, finally, do not generate CustomHash for keys = byte, char, short, float, double, boolean;
only keep (int, long, Object).

This leads to a smaller jar and quicker unit test execution as a bonus.

Comparison to other Collection libraries

Hello,

I'm interested in seeing a comparison to other libraries - both hppc and hppc-rt seem to be fairly active nowadays, so why should I pick hppc-rt? Is it really RT as it claims? How does it compare to fastutil, Javolution, SmoothieMap, Koloboke?

I see you have quite a lot of JMH benchmarks in there, so I would be interested to see the results on a wiki page or something.

Or, which would be much, much better - contribute benchmarks for hppc-rt to other libraries so that they have a direct comparison which they can showcase, too.

Thank you,
Petr

Add test coverage for out-of-bounds conditions

In 0.72+, test coverage is complete for the nominal use cases, except for trivial short methods or small facades which do not deserve testing.
What is missing is coverage of the out-of-bounds conditions:

  • Correct OutOfMemoryError translated into BufferAllocationException (all containers)

    Update 2016.04.30: done by code inspection.

  • Range methods of KTypeIndexedContainer, triggering either IllegalArgumentException or ArrayIndexOutOfBoundsException (see the sketch below),

  • assert conditions (all containers).
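
A sketch of what a range-check test for the second point could look like, assuming JUnit 4; the checkRange() helper mirrors what a container's internal check is expected to do, it is not the actual HPPC-RT code:

import static org.junit.Assert.fail;

import org.junit.Test;

public class RangeCheckSketchTest {

    @Test
    public void reversedRangeIsRejected() {
        try {
            checkRange(10, 5, 2);    // fromIndex > toIndex must fail
            fail("Expected IllegalArgumentException");
        } catch (final IllegalArgumentException e) {
            // expected
        }
    }

    @Test
    public void outOfBoundsRangeIsRejected() {
        try {
            checkRange(10, 0, 11);   // toIndex > size must fail
            fail("Expected IndexOutOfBoundsException");
        } catch (final IndexOutOfBoundsException e) {
            // expected
        }
    }

    // Hypothetical helper mirroring a container's internal range check.
    private static void checkRange(final int size, final int fromIndex, final int toIndex) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException("fromIndex > toIndex");
        }
        if (fromIndex < 0 || toIndex > size) {
            throw new IndexOutOfBoundsException("range out of bounds");
        }
    }
}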

HPPCRT-1: Initial constructor sizes of some containers do not guarantee zero reallocations

This is the HPPC-RT version of the HPPC-91 issue.
In the realtime context, the initially provided size for HashSets and HashMaps must be considered a true guarantee that the container will never be reallocated as long as the number of elements stays <= the initial constructor size.

The solution, as suggested in the HPPC-91 discussion, is to take the load factor into account in constructors:
allocateBuffers(roundCapacity((int) (initialCapacity / loadFactor) + 1));
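
A sketch of that computation, where roundCapacity() is a hypothetical power-of-two rounding standing in for the real one:

// Guarantee: initialCapacity elements can be added without any reallocation,
// so the internal buffer must be sized past initialCapacity / loadFactor.
final class HashCapacitySketch {

    static int internalCapacityFor(final int initialCapacity, final float loadFactor) {
        final int raw = (int) (initialCapacity / loadFactor) + 1;
        return roundCapacity(raw);
    }

    // Hypothetical rounding to the next power of two, as hash buffers require.
    static int roundCapacity(final int capacity) {
        return Integer.highestOneBit(Math.max(capacity, 2) - 1) << 1;
    }
}

For instance, initialCapacity = 100 with loadFactor = 0.75 gives raw = 134, rounded up to a 256-slot buffer, so 100 elements fit without reallocation.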

HPPCRT-26: Add specialized IdentityHash containers

Following the remark here, I've decided that maybe real IdentityHashMap and IdentityHashSet containers would be worth the performance gain, instead of using CustomHashMap with an IdentityHash strategy.

Of course, System.identityHashCode() supposedly being a "perfect hash", there is no need for Robin Hood hashing there.

HPPCRT-12: Make ArrayDeque a KTypeIndexedContainer

It turns out an ArrayDeque is almost a KTypeIndexedContainer with constant-cost address calculation, so it is convenient to set() and get() on it, and to iterate by index instead of using boring and ugly iterators.
Just like for Stacks, the meaning of indices is now consistent with KTypeIndexedContainer: index 0 is the head and size() - 1 the last element, in the case of the deque.

Consequently, the existing removeFirst/LastOccurrence() methods are now consistent with KTypeIndexedContainer and return indices in the above-mentioned way.

The "almost" part means I won't bother supporting insert() and removeRange(), which are very costly on a circular buffer and not in the spirit of a deque at all.
Alternatively, use LinkedList, which is incidentally both a full KTypeIndexedContainer and a deque.
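
A sketch of the constant-cost index mapping behind this, assuming a circular buffer with a power-of-two capacity (field names hypothetical):

// Logical index 0 is the head, size() - 1 is the last element: the physical
// position is just an offset from head, wrapped around the circular buffer.
final class CircularIndexingSketch {

    private final int[] buffer = new int[16];  // power-of-two capacity
    private int head;                          // physical slot of logical index 0
    private int size;

    int get(final int index) {
        assert index >= 0 && index < this.size;
        // constant-time address calculation, no iteration needed
        return this.buffer[(this.head + index) & (this.buffer.length - 1)];
    }
}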

HPPCRT-45: API with intervals arguments is inconsistent with JDK conventions

I'll be damned.
The KTypeIndexedContainer methods removeRange() and forEach (procedure/predicate over slices) are inconsistent with JDK conventions, and so are the KTypeSort.quickSort(beginIndex, endIndex) methods.

The usual convention is that intervals are expressed as [beginIndex, endIndex), and that any input with 0 <= beginIndex <= endIndex <= size() is considered valid. HPPC-RT follows the former but not the latter.
That means the quite common beginIndex == endIndex idiom, meaning a zero-sized interval, throws exceptions instead of being accepted as it is in the JDK.
For instance, KTypeSort.quicksort(table, 0, table.size()) on an empty table should be accepted instead of throwing.
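
In other words, range checks should follow the JDK's half-open convention, roughly like this sketch:

// JDK-style convention: the range [beginIndex, endIndex) is valid whenever
// 0 <= beginIndex <= endIndex <= size, so beginIndex == endIndex (an empty
// range) is accepted and simply results in a no-op.
final class RangeCheckSketch {

    static void checkRange(final int beginIndex, final int endIndex, final int size) {
        if (beginIndex < 0 || beginIndex > endIndex || endIndex > size) {
            throw new IndexOutOfBoundsException(
                    "Invalid range [" + beginIndex + ", " + endIndex + ") for size " + size);
        }
    }
}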

HPPCRT-22: Follow-up of HPPCRT-18 for more perturbation-free fixes in hash containers

This completes the HPPCRT-18 issue.
As Dawid Weiss told me in this discussion, the reverse-iteration trick was missing the complete picture: indeed, problems could arise for ANY kind of process that iterates the hash container buffer.
In that case, direct iteration, as well as forEach() or removeAll(final KTypePredicate<? super KType> predicate), still iterates the underlying buffer "in order",
which should create the same kind of trouble as putAll() if such iterations are used to fill other hash containers.

Solution:

  • Also apply reverse iteration to all forEach() methods of the hash containers.
  • We cannot change the iteration direction of removeAll(final KTypePredicate<? super KType> predicate), so add a big warning in the Javadoc.
  • Add a Javadoc warning about direct iteration of the keys buffer.

Also add some benchmarks to show the performance dangers of direct iteration.

HPPCRT-43: Remove Stacks entirely.

Stacks have indeed little added value.

How so:

  • They are a counter-intuitive KTypeIndexedContainer more often than not,
  • ArrayDeque is just fine for the same job, with a clear view of how elements are ordered,
  • I have never used them.

To compensate, some Stack-like methods have been added directly to KTypeArrayList.

HPPCRT-35: Hash to hash batch copying hangs, regression from the working perturbation policy

This is the great return of the problem (which indeed never left) of the hash-to-hash putAll() hang.
It was first discovered, then fixed, by the 'perturbation' policy, as it was named back then: HPPC-80.
Later, in HPPCRT-18 and then HPPCRT-22, I believed I had removed the performance impact of perturbation by replacing it with "reverse iteration".

Alas, the benchmark on which I based my judgement was flawed, as Dawid showed me recently.
Indeed, this simple code hangs beautifully:

public class HashCollisionsCornerCaseTest
{
    // Preallocation sizes: hangs with N = 1e7 and P = 16 (the default).
    private static final int N = 10_000_000;
    private static final int P = 16;

    @Test
    public void testHashHangs()
    {
        IntOpenHashSet a = new IntOpenHashSet(N);
        for (int i = N; i-- != 0;) {
            a.add(i);
        }
        System.out.println("Start...");
        IntOpenHashSet b2 = new IntOpenHashSet(P);
        b2.addAll(a);
        System.out.println("End...");
    }
}

With N and P the preallocation sizes: it hangs with N = 1e7 and P = 16 (the default).

HPPCRT-18: Remove perturbation methods in hash containers and replace them with reverse iteration.

Perturbation methods were introduced by
HPPC-80: Practical deadlock on populating a set/map with an iterator over another map
to prevent excessive slowness in putAll().
However, this method introduces a slowdown in the general case.
Use this instead: make the iterators of hash containers walk the buffer "in reverse",
i.e. in the opposite direction of the filling of the destination container, thus side-stepping the "longest conflict chain". By the way, this is exactly what expandAndXXX() does already!

Benchmarks show a clear performance gain (BenchmarkPerturbedVsHashedOnly): we can now putAll() into another hash with the same performance as before, while getting the same performance as the previous "non-perturbed" versions on all the other methods.
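
A sketch of the reverse-iteration idea applied to a hash-to-hash copy; the buffer layout is simplified to an int[] with 0 as the empty-slot sentinel, and the names are hypothetical:

// Copy every key of the source hash container into the destination by walking
// the source buffer backwards. Filling the destination in the direction
// opposite to the source's probe chains side-steps the "longest conflict
// chain" degenerate case that makes an in-order putAll() crawl.
final class ReverseCopySketch {

    interface IntDestination {
        void add(int key);        // stand-in for any int hash set/map
    }

    static void addAllReversed(final int[] sourceKeys, final IntDestination destination) {
        for (int slot = sourceKeys.length - 1; slot >= 0; slot--) {
            final int key = sourceKeys[slot];
            if (key != 0) {       // 0 marks an empty slot in this sketch
                destination.add(key);
            }
        }
    }
}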

HPPCRT-6: Stack uses ArrayList.sort() for sorting, which leads to a counterintuitive sort

By inheritance, Stack uses the ArrayList.sort() methods. That leads to the stack being
sorted with the smallest elements at the bottom of the stack, which is surely not the expected behaviour.
Three ways to fix it:

  1. Be rough: override sort() on Stack and throw UnsupportedOperationException.
  2. Be lazy: document the existing behaviour.
  3. Be a good boy: implement inverse sorts in the sorting package, and use a specialized implementation for Stacks.
    3-2) Then realize that you can actually add inverse sorts to all the other containers too...
    3-3) Damn, that means loads of additional unit tests!!!

I haven't chosen an option yet.

HPPCRT-23: Make KTypeIndexedPriorityQueue an IntKTypeMap + API additions

Several sub-tasks:

A) Another API-breaking change, meh.
I was not satisfied with the API duplication that IndexedHeapPriorityQueue induced, with
the redefinition of all Predicates, Procedures, and so on, while this container is, well, just a (K,V) = (int, VType) map.

So I decided to change it and make KTypeIndexedPriorityQueue an IntVTypeMap<VType> map,
or more practically a KTypeIndexedPriorityQueue<KType> implements IntKTypeMap<KType>.
The trouble is, IntKTypeMap does not exist in the templates, or it would collide with KTypeVTypeMap<KType, VType> when instantiated.
Finally it turns out to be quite simple, using a poor man's partial template specialization, in templates
only: just create the IntKType[Map, Procedure|Predicate...] types in templates and annotate them
with
/*! ${TemplateOptions.doNotGenerateKType("all")} !*/
so they are not instantiated at all.
For smoother operation, I then modified TemplateProcessor so that
Velocity parsing truly bails out when the ${TemplateOptions.doNotGenerateKType(...)} directive arises.
This also means that, contrary to before, the file may even be an invalid Velocity file, except for the first lines before /*! ${TemplateOptions.doNotGenerateKType("all")} !*/ appears.

So well, it works. IndexedHeaps are now truly maps, supporting all methods, forEach(), keys(), values() and all the fun stuff, with detailed Javadoc explaining the differences.

To finish, KTypePriorityQueue.insert() was also renamed KTypePriorityQueue.add() for general naming consistency.

B) API refactorings:
- getComparator() renamed to comparator()
- changePriority() and refreshPriorities() renamed to updatePriority() and updatePriorities(), to better reflect their function.

C) API additions inspired by the fastutil heaps API, which are indeed interesting to have:
- updateTopPriority(): updates the priority of the top() element (all heaps)
- topKey(): gives the key of the top() element of the IndexedHeap

HPPCRT-28: Sort improvements: generic KTypeIndexedContainer sorts and others

Tasks:

A) Because [Double|Float].compare(a, b) is overkill to do every time for each pair of values, do a dedicated NaN pass and proceed afterwards with normal ordering comparisons (<, >, ==), which are faster. That is what java.util.Arrays has done since forever, after all. (See the sketch after this list.)

B) Add a stable, in-place sort version for objects. Found this one by Thomas Baudel.

C) Add generic sorts for KTypeIndexedContainer, and thereby add range sorts to Deques and LinkedLists (with a big warning for LinkedLists).
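
For task A), a sketch of the NaN pre-pass on doubles (only the partitioning step; the follow-up sort then uses plain <, >, == comparisons):

// Move all NaN values to the end of the range first, so the main sort can use
// cheap primitive comparisons instead of Double.compare() for every pair.
final class NaNPrePassSketch {

    // Returns the index where the NaN block starts; sort a[fromIndex, result)
    // afterwards with normal <, >, == comparisons.
    static int partitionNaNs(final double[] a, final int fromIndex, final int toIndex) {
        int end = toIndex;
        int i = fromIndex;
        while (i < end) {
            if (Double.isNaN(a[i])) {
                end--;
                final double tmp = a[i];
                a[i] = a[end];
                a[end] = tmp;     // swap the NaN to the tail, re-examine slot i
            } else {
                i++;
            }
        }
        return end;
    }
}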
