vsonnier / hppcrt

This project forked from carrotsearch/hppc


HPPC-RT, a fork of CarrotSearch's HPPC for Realtime

Java 98.23% HTML 0.28% ANTLR 1.49%
hppc hppcrt java-5 no-dependencies realtime retrolambda

hppcrt's People

Contributors: dweiss, vsonnier

hppcrt's Issues

HPPCRT-7: Use consistent min/max allowable size for containers

Review the containers to consistently implement the following policy, where not already done:

  • Their size is always clamped to a defined min/max, regardless of input (remove the constructor assertions about that).
  • Their maximum growth is limited, so that exceeding this limit triggers some kind of out-of-memory exception.
  • Upgrade the maximum allocatable size to something like Integer.MAX_VALUE - 32,
    since 64-bit JVMs are now capable of allocating arrays with almost Integer.MAX_VALUE elements.
    For instance, in JDK 7 the maximum ArrayList capacity is indeed Integer.MAX_VALUE - 8.
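
A minimal sketch of the intended clamping policy, assuming hypothetical constants and method names (the real values live in the containers themselves):

// Hypothetical illustration of the clamping policy described above.
final class CapacityPolicy {

    // Assumed bounds, not the actual HPPC-RT constants.
    private static final int MIN_CAPACITY = 4;
    private static final int MAX_CAPACITY = Integer.MAX_VALUE - 32;

    // Clamp any user-provided size into [MIN_CAPACITY, MAX_CAPACITY]
    // instead of asserting on it in constructors.
    static int clampInitialCapacity(final int requested) {
        return Math.max(MIN_CAPACITY, Math.min(requested, MAX_CAPACITY));
    }

    // Growth beyond MAX_CAPACITY is an error instead of a silent overflow.
    static int grow(final int currentCapacity) {
        if (currentCapacity >= MAX_CAPACITY) {
            throw new OutOfMemoryError("Container cannot grow beyond " + MAX_CAPACITY);
        }
        final long doubled = 2L * Math.max(currentCapacity, MIN_CAPACITY);
        return (int) Math.min(doubled, MAX_CAPACITY);
    }
}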

HPPCRT-9: Remove template files and other internal usage classes from output jars

Currently, the binary (runtime) and source jars are generated including files that should remain private to the project:

  • Template files
  • The Intrinsics class
  • The Internals class.

This makes the runtime jar bigger than necessary, and a user could end up using the template files in runtime code instead of the generated Object versions, which creates confusion and leads to lower-performance code.

HPPCRT-34: Relax signature of sort() methods to match JDK ones

Relax the signature of

<KType extends Comparable<? super KType>> KTypeSort.sort( Comparable ) 

because it restricts practical usage too much, into simply

<KType> KTypeSort.sort(...) 

just like in the JDK, the price being the same: a ClassCastException if the types are indeed not Comparable.

On the bright side, the sort APIs of primitive and generic list-like containers are now identical.
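
A sketch of the relaxed, JDK-style signature; the body here just delegates to java.util.Arrays to illustrate the contract, the real implementation uses its own sort routines:

// No Comparable bound on KType anymore: the method accepts any array,
// and throws ClassCastException at runtime if the elements are not
// mutually Comparable, exactly like java.util.Arrays.sort(Object[]).
final class KTypeSortSketch {

    static <KType> void sort(final KType[] table) {
        java.util.Arrays.sort(table);
    }
}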

HPPCRT-11: Iterator states are inconsistent

  1. When released back to their pools, iterators may keep references to internal buffers, which may cause memory leaks.
  2. In LinkedList, when inserting through iterators, the iterator state is not kept up to date with the potentially new buffer references following reallocations, causing crashes.

Solution:

  1. Add a reset() method to ObjectPool.
  2. Properly update the buffer references in the LinkedList insertAfter/insertBefore iterator methods.
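
A minimal sketch of the first point, assuming a simple array-backed iterator pool (all names hypothetical):

// Hypothetical pooled iterator: when it is released back to its pool,
// reset() drops the reference to the container's internal buffer so the
// pool does not keep dead arrays reachable (the memory leak of point 1).
abstract class AbstractPooledIterator<KType> {

    KType[] buffer;   // reference to the container's internal buffer
    int cursor;

    // Called by the pool (e.g. from an ObjectPool.reset()-style hook)
    // when the iterator is returned to it.
    void reset() {
        this.buffer = null;  // let the old buffer be garbage-collected
        this.cursor = 0;
    }
}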

HPPCRT-39: Port the new AST-based template processor of HPPC v0.7.x

This is a meta-issue tracking the "port" of the HPPC v0.7.0 template processor to HPPC-RT, in order to benefit from the much more powerful AST-based template processor of HPPC v0.7.0 and to ease future merges with HPPC evolutions.

HPPC-126 : Template preprocessor should fully parse and understand signatures. This is the main issue.
The port will proceed in several steps:

  • First step: make the new processor generate the same output as the 0.7.x-RT branch, i.e. the v0.68 release + HPPCRT-35 + HPPCRT-36 + HPPCRT-37.
  • Second step: make Intrinsics and other inlines "registered" with the processor, which will then make the necessary replacements itself (instead of the regex-powered filterInlines() pass). Once again, the processor will have to generate the same output as before.
  • Update 2015.05.22: WONTFIX, no clear advantage, because method calls, viewed as a generic expression rule, come in such variety that the rule would have to be analysed quite deeply to be usable. So, continue to use the regex-based replacer in filterInlines() as before.
  • Third step: introduce all the Intrinsics syntax novelties:
    HPPC-122 + HPPC-137: An overhaul of intrinsics (equality comparisons, no vtype/ktype distinction, etc.)
    Again, the processor will have to generate the same sources as before.
  • Final step, make the final code changes:
    HPPC-118: Buffer arrays for generic types should be declared as Object[] to avoid compiler-injected automatic casts.
    HPPC-110: putOrAdd and other primitive-only methods should be template-compilable.
Maybe ?
  • HPPC-105: Cleanup project structure and IDE integration
    ==> some manual cherry-picks of good ideas here and there, but not much more.

HPPCRT-31: Hash containers, merge allocated array and keys

Following this discussion, I experimented with the following idea for hash containers:

if (key == defaultValue) {
    // special case: use an explicit allocatedDefaultValue flag to test existence, etc.
} else {
    // since all keys are unique in a map/set, defaultValue in the keys array means
    // allocated == false, and every other value means allocated == true.
    // So effectively the allocated[] array is no longer needed.
}

That is, defaultValue is one special key value that also acts as a sentinel meaning "not allocated" in the keys array itself, effectively removing the need for a separate array of allocated flags.
Also, choosing defaultValue = 0 / null is likely to give good performance.
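
A sketch of a lookup under this single-array scheme, with int keys, defaultValue = 0 and linear probing (all names hypothetical):

// The default key value 0 marks an empty slot, so no allocated[] array is needed;
// the presence of the key 0 itself is tracked by a separate boolean flag.
final class IntSentinelSetSketch {

    private boolean allocatedDefaultKey;      // is the key 0 present?
    private final int[] keys = new int[16];   // power-of-two capacity, 0 = empty slot

    boolean contains(final int key) {
        if (key == 0) {
            return this.allocatedDefaultKey;  // special case for the sentinel value
        }
        final int mask = this.keys.length - 1;
        int slot = hash(key) & mask;
        while (this.keys[slot] != 0) {        // 0 means "not allocated", stop probing
            if (this.keys[slot] == key) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    private static int hash(final int key) {
        return key * 0x9E3779B9;              // placeholder mixing function
    }
}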

All in all, after much effort in the hash-single-array branch, the BenchmarkHashContainersSuite showed some definite improvements. Randomizedtesting also helped me a lot.

The modifications were done on the primitive hashes and Identity Hash, i.e. where all keys behave like PODs.
The CustomHash and Object containers remain Robin Hood, the main advantage there being that the allocated array also acts as a hash caching mechanism.

All in all, the primitive hashes show roughly a 10% improvement. The Identity Hash improvement is much smaller, but since this solution also uses less memory, it's a keeper either way.

As usual, the single-array code is enabled in the template code by conditional compilation, so we may revert it in the future. Note however that Robin Hood and single-array are incompatible in template code, for simplification, which effectively removes Robin Hood for the primitive OpenHash containers.

While testing and trying to complete unit test coverage, I also managed to catch some more bugs: #32 and 40640fe, thanks to the EclEmma plugin in Eclipse and Randomizedtesting.

HPPCRT-37: API changes and internal code refactorings from HPPC v0.7.0

This is a meta-issue tracking what I may add to HPPC-RT from HPPC v0.7.0 master.

Already done in v0.68 and before:

HPPC-97 : Use unallocated slot marker key instead of an explicit allocation table for hash containers
HPPC-101 : All current benchmarks are JMH or ad-hoc ones.
HPPC-106 : Drop Guava adapter (and dependency)
HPPC-113: Cleaned up API transitioning from capacity to expected elements. This is already more or less the HPPC-RT principle; RT also has a capacity() method.
HPPC-119 : Make the default hashing strategy an intrinsic (through the generic inlining mechanism)

Added for next release:
  • HPPC-115 and HPPC-103: Return of the lost perturbation. I'll roll my own vision here with HPPCRT-35, without any customization, for simplification.
  • HPPC-111: Drop the identity (map|set) specialization and add an overridable comparator. Easy enough by subclassing CustomHashMap; done through HPPCRT-36.
  • HPPC-104: Ability to create a read-only view of a map. Consists of removing the l* methods from maps and sets.
  • HPPC-117: API simplifications and changes. For me, just newInstanceXXX() renamed to newInstance() with proper overrides.
  • HPPC-120: Rework entry shifting routine to be less hairy
  • HPPC-121: Rename remove{All|First|Last}Occurrences(key) to remove{All|First|Last}(key)
  • HPPC-108: Rename IntDoubleLinkedSet to DoubleLinkedIntSet
  • HPPC-125: equals should not compare with subclasses of itself.
    This was mostly already done, with some modifications: all containers with overridable criteria for equality or order are already compared against their overridable criteria. So add a getClass() test instead of instanceof to prevent comparison with subclasses (see the sketch after this list).
    On the other hand, all lists have an implicit order independent of the nature of their elements, so in the end all KTypeIndexedContainer are comparable with each other.
  • HPPC-140: Maps return a fully fledged collection from values(). Indeed, keys() and values() have already returned collections for a long time in HPPC-RT. Still, rename the returned types from KeysContainer/ValuesContainer to KeysCollection/ValuesCollection to stress that fact.
  • HPPC-130: removeAll(KTypeLookupContainer) had an incorrect generic signature
  • HPPC-131: retainAll(KTypeLookupContainer) had an incorrect generic signature
  • HPPC-133: KTypeContainer.toArray(Class) can return incorrect array type
  • HPPC-135: KTypeVTypeAssociativeContainer#removeAll had an incorrect generic signature
  • HPPC-114: Buffer resizing and allocation should throw non-assertion-mode exceptions. Finally add it, among lots of other little changes here and there to converge on the HPPC sources.
  • HPPC-134: Set and Map's removeAll() should pick the best removal strategy.
  • HPPC-141: Drop mutables (*Holder classes), but keep the IntHolder for tests.
  • HPPC-145: Remove "Open" to simplify class names of hash containers.
  • HPPC-146: Remove DoubleLinkedIntSet, duplicate of HPPCRT-40.
  • HPPC-149: Recognize tests.seed as the initialization seed for randomized key mix strategies. Simply plug the property into Containers.randomSeed64(), which generates the unique perturbation seed per container. The only real usefulness is indeed for tests, where we can then have deterministic perturbation values.
  • HPPC-152: Add XorShift128+ and replace XorShiftRandom. Make it extend Random and add next(bits) like the old XorShiftRandom, so that it is also a fully fledged Random class.
  • HPPC-159: Add .visualizeKeyDistribution(int characters) to maps and sets, but only for tests. No need to weigh down the public API with something the end user has no control over. (HPPC-RT key "randomization" is not configurable, contrary to HPPC.)
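
For HPPC-125 above, a minimal sketch of the getClass()-based check (the container name is hypothetical):

// Containers with overridable equality/order criteria reject subclasses:
// an exact getClass() comparison, unlike instanceof, excludes them.
class ExampleHashSetSketch {

    @Override
    public boolean equals(final Object obj) {
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return sameContentsAs((ExampleHashSetSketch) obj);
    }

    @Override
    public int hashCode() {
        return 0;  // placeholder; the real implementation hashes the contents
    }

    private boolean sameContentsAs(final ExampleHashSetSketch other) {
        return true;  // placeholder for the real content comparison
    }
}
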
Won't do:

HPPC-112: Add ensureCapacity(elements) to containers.
For simplification's sake, don't implement Preallocable
and only do HPPC-114 to be able to catch OOM situations.
The HPPC-RT philosophy is to use preallocated sizes if the user wants zero-garbage behaviour. The moment reallocation occurs you have lost anyway, which means either your application is misbehaving, or you fucked up by not preallocating at the right size. In this case, the diagnostic messages from HPPC-114 are enough.

HPPC-116: Index-based methods to replace lget()/lslot() and so on. Let's drop the subject completely for the sake of API simplification and from a maintenance POV.
HPPC-139: Add release() to the API.
HPPC-143: Add KTypeScatterSet and KTypeVTypeScatterMap. For API simplicity's sake (and maintenance, from the dev POV), don't add another specialization that is only there for performance reasons.
HPPC-144: Separate esoteric container combinations into a separate JAR. Although I agree that floating-point keys in hash maps do not make a lot of sense (even more so for indirect floating-point CustomHash!), I don't want to make another jar: so either drop them completely, or keep them as is. Either way, it is only a matter of the existing Velocity directives doNotGenerateKType() to customize...

HPPCRT-16: Import some HPPC v60 fixes + Template inlines

Shamelessly import those fixes:
HPPC-85: addTo and putOrAdd pulled up to the ObjectIntMap interface.
HPPC-93: NaN keys are not treated correctly in hash sets/maps. Completed by sorting tests.
Plus pom.xml fixes.

New internal feature: add some kind of primitive, localized (private) inline expressions, like:

/*! #if ($TemplateOptions.inline("add","(x, y)", "x+y")) !*/
private int add(int x, int y) {
    return x + y;
}
/*! #end !*/

The add(x, y) call is then inlined as "x + y" in expressions in the generated code.
Use this feature to make some special functions in ArrayDeque and LinkedList private instead of putting them into Intrinsics (oneLeft, oneRight, linkXXXX... etc.).
This will allow transparent use of small expression functions that are guaranteed to be inlined in the generated code, for performance. For B+tree, one day. :)

HPPCRT-42: Remove Boolean containers entirely.

Well, boolean containers are not that useful, really. We can easily emulate them with simple expressions such as (bool_expression ? 1 : 0) in a normal container, and back.
However, the bigger problem is that booleans cannot be boxed/unboxed into a Number, which means the whole unit test suite would need heavy adaptation. So much so that, until now, the boolean variants have not been test-covered at all.
Although the risk of bugs is small, the boolean versions are not actually proven correct, which may forbid the use of the lib in safety-sensitive applications.

_Conclusion_: kill all booleans!
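
A sketch of the emulation mentioned above; the int[] stands in for any int-based HPPC-RT container:

// Emulating a boolean container with an int container:
// store (b ? 1 : 0) on the way in, read back with (v != 0).
final class BooleanEmulationSketch {

    public static void main(final String[] args) {
        final int[] flags = new int[8];          // stand-in for an int container
        final boolean input = true;

        flags[0] = input ? 1 : 0;                // boolean -> int
        final boolean output = flags[0] != 0;    // int -> boolean

        System.out.println(output);              // prints "true"
    }
}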

Caliper benchmark framework looks abandoned, migrate the benchmarks to JMH

Caliper looks dead since January 2014. Besides, Caliper v1.0 has become a big bloat of code compared with v0.5, forcing new usage patterns such as:

  • no longer a simple console output, only results in "web publishing" style...
  • not even a simple statically-generated site, à la Surefire reports, which would be fine.

Those are show-stoppers for all kinds of scripting processes, and more importantly, what about real-life closed-source corporate development? We certainly don't want any kind of internal info leaking out.

So either migrate those benchmarks to "misc" as hand-made "synthetic" benchmarks, or use JMH v1.0+, which looks like the new boss in town.
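
A minimal JMH skeleton of the kind such a migration produces (the measured operation is a placeholder, not an actual HPPC-RT benchmark):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ExampleJmhBenchmark {

    private int[] data;

    @Setup
    public void prepare() {
        data = new int[1_000_000];   // placeholder workload
    }

    @Benchmark
    public long sumAll() {
        long acc = 0;
        for (final int v : data) {
            acc += v;
        }
        return acc;                  // returned so it is not dead-code eliminated
    }
}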

HPPCRT-41: Remove senseless key types for CustomHash containers

This is a specific version of "HPPC-144: Separate esoteric container combinations into a separate JAR" applied to the CustomHash containers. I cannot conceive of an application of "indirect" hash classes outside the cases where keys are either Objects (of course) or indexes, i.e. pointer-like indirections into another structure (int or long).
So, finally, do not generate CustomHash for keys = byte, char, short, float, double, boolean;
only keep (int, long, Object).

This leads to a smaller jar and quicker unit test execution as a bonus.

Comparison to other Collection libraries

Hello,

I'm interested in seeing a comparison to other libraries - both hppc and hppc-rt seem to be fairly active nowadays, so why should I pick hppc-rt? Is it really RT as it claims? How does it compare to fastutil, Javolution, SmoothieMap, Koloboke?

I see you have quite a lot of JMH benchmarks in there, so I would be interested to see the results on a wiki page or something.

Or, which would be much, much better - contribute benchmarks for hppc-rt to other libraries so that they have a direct comparison which they can showcase, too.

Thank you,
Petr

Add test coverage for out-of-bounds conditions

In 0.72+, test coverage is complete for the nominal use cases, except for trivial short methods or small facades which do not deserve testing.
What is missing is coverage of the out-of-bounds conditions:

  • Correct OutOfMemoryError translated into BufferAllocationException (all containers)

    Update 2016.04.30: done by code inspection.

  • Range methods of KTypeIndexedContainer, triggering either IllegalArgumentException or ArrayIndexOutOfBoundsException (see the sketch below),

  • assert conditions (all containers).
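
A sketch of what a range-check test for the second point could look like, assuming JUnit 4; the checkRange() helper mirrors what a container's internal check is expected to do, it is not the actual HPPC-RT code:

import static org.junit.Assert.fail;

import org.junit.Test;

public class RangeCheckSketchTest {

    @Test
    public void reversedRangeIsRejected() {
        try {
            checkRange(10, 5, 2);    // fromIndex > toIndex must fail
            fail("Expected IllegalArgumentException");
        } catch (final IllegalArgumentException e) {
            // expected
        }
    }

    @Test
    public void outOfBoundsRangeIsRejected() {
        try {
            checkRange(10, 0, 11);   // toIndex > size must fail
            fail("Expected IndexOutOfBoundsException");
        } catch (final IndexOutOfBoundsException e) {
            // expected
        }
    }

    // Hypothetical helper mirroring a container's internal range check.
    private static void checkRange(final int size, final int fromIndex, final int toIndex) {
        if (fromIndex > toIndex) {
            throw new IllegalArgumentException("fromIndex > toIndex");
        }
        if (fromIndex < 0 || toIndex > size) {
            throw new IndexOutOfBoundsException("range out of bounds");
        }
    }
}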

HPPCRT-1: Initial constructor sizes of some containers do not guarantee zero reallocations

This is the HPPC-RT version of the HPPC-91 issue.
In the realtime context, the initially provided size for HashSets and HashMaps must be considered a true guarantee that the container will never be reallocated as long as the number of elements stays <= the initial constructor size.

The solution, as suggested in the HPPC-91 discussion, is to take the load factor into account in constructors:
allocateBuffers(roundCapacity((int) (initialCapacity / loadFactor) + 1));
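
A sketch of that computation, where roundCapacity() is a hypothetical power-of-two rounding standing in for the real one:

// Guarantee: initialCapacity elements can be added without any reallocation,
// so the internal buffer must be sized past initialCapacity / loadFactor.
final class HashCapacitySketch {

    static int internalCapacityFor(final int initialCapacity, final float loadFactor) {
        final int raw = (int) (initialCapacity / loadFactor) + 1;
        return roundCapacity(raw);
    }

    // Hypothetical rounding to the next power of two, as hash buffers require.
    static int roundCapacity(final int capacity) {
        return Integer.highestOneBit(Math.max(capacity, 2) - 1) << 1;
    }
}

For instance, initialCapacity = 100 with loadFactor = 0.75 gives raw = 134, rounded up to a 256-slot buffer, so 100 elements fit without reallocation.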

HPPCRT-26: Add specialized IdentityHash containers

Following the remark here, I've decided that maybe real IdentityHashMap and IdentityHashSet containers would be worth the performance gain, instead of using CustomHashMap with an IdentityHash strategy.

Of course, System.identityHashCode() supposedly being a "perfect hash", there is no need for Robin Hood hashing there.

HPPCRT-12: Make ArrayDeque a KTypeIndexedContainer

It turns out an ArrayDeque is almost a KTypeIndexedContainer with constant-cost address calculation, so it is convenient to set() and get() on it, and to iterate by index instead of using boring and ugly iterators.
Just like for Stacks, the meaning of indices is now consistent with KTypeIndexedContainer: index 0 is the head and size() - 1 the last element, in the case of the deque.

Consequently, the existing removeFirst/LastOccurrence() methods are now consistent with KTypeIndexedContainer and return indices in the above-mentioned way.

The "almost" part means I won't bother supporting insert() and removeRange(), which are very costly on a circular buffer and not in the spirit of a deque at all.
Alternatively, use LinkedList, which is incidentally both a full KTypeIndexedContainer and a deque.
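
A sketch of the constant-cost index mapping behind this, assuming a circular buffer with a power-of-two capacity (field names hypothetical):

// Logical index 0 is the head, size() - 1 is the last element: the physical
// position is just an offset from head, wrapped around the circular buffer.
final class CircularIndexingSketch {

    private final int[] buffer = new int[16];  // power-of-two capacity
    private int head;                          // physical slot of logical index 0
    private int size;

    int get(final int index) {
        assert index >= 0 && index < this.size;
        // constant-time address calculation, no iteration needed
        return this.buffer[(this.head + index) & (this.buffer.length - 1)];
    }
}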

HPPCRT-45: API with intervals arguments is inconsistent with JDK conventions

I'll be damned.
The KTypeIndexedContainer methods removeRange() and forEach (procedure/predicate over slices) are inconsistent with JDK conventions, and so are the KTypeSort.quickSort(beginIndex, endIndex) methods.

The usual convention is that intervals are expressed as [beginIndex, endIndex), and that any input with 0 <= beginIndex <= endIndex <= size() is considered valid. HPPC-RT follows the former but not the latter.
That means the quite common beginIndex == endIndex idiom, meaning a zero-sized interval, throws exceptions instead of being accepted as it is in the JDK.
For instance, KTypeSort.quicksort(table, 0, table.size()) on an empty table should be accepted instead of throwing.
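
In other words, range checks should follow the JDK's half-open convention, roughly like this sketch:

// JDK-style convention: the range [beginIndex, endIndex) is valid whenever
// 0 <= beginIndex <= endIndex <= size, so beginIndex == endIndex (an empty
// range) is accepted and simply results in a no-op.
final class RangeCheckSketch {

    static void checkRange(final int beginIndex, final int endIndex, final int size) {
        if (beginIndex < 0 || beginIndex > endIndex || endIndex > size) {
            throw new IndexOutOfBoundsException(
                    "Invalid range [" + beginIndex + ", " + endIndex + ") for size " + size);
        }
    }
}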

HPPCRT-22: Follow-up of HPPCRT-18 for more perturbation-free fixes in hash containers

This completes the HPPCRT-18 issue.
As Dawid Weiss told me in this discussion, the reverse-iteration trick was missing the complete picture: indeed, problems could arise for ANY kind of process that iterates the hash container buffer.
In that case, direct iteration, as well as forEach() or removeAll(final KTypePredicate<? super KType> predicate), still iterates the underlying buffer "in order",
which should create the same kind of trouble as putAll() if such iterations are used to fill other hash containers.

Solution:

  • Also apply reverse iteration to all forEach() methods of the hash containers.
  • We cannot change the iteration direction of removeAll(final KTypePredicate<? super KType> predicate), so add a big warning in the Javadoc.
  • Add a Javadoc warning about direct iteration of the keys buffer.

Also add some benchmarks to show the performance dangers of direct iteration.

HPPCRT-43: Remove Stacks entirely.

Stacks have indeed little added value.

How so:

  • They are a counter-intuitive KTypeIndexedContainer more often than not,
  • ArrayDeque is just fine for the same job, with a clear view of how elements are ordered,
  • I have never used them.

To compensate, some Stack-like methods have been added directly to KTypeArrayList.

HPPCRT-35: Hash to hash batch copying hangs, regression from the working perturbation policy

This is the great return of the problem (which indeed never left) of the hash-to-hash putAll() hang.
It was first discovered, then fixed, by the 'perturbation' policy, as it was named back then: HPPC-80.
Later, in HPPCRT-18 and then HPPCRT-22, I believed I had removed the performance impact of perturbation by replacing it with "reverse iteration".

Alas, the benchmark on which I based my judgement was flawed, as Dawid showed me recently.
Indeed, this simple code hangs beautifully:

public class HashCollisionsCornerCaseTest
{
    // Preallocation sizes: hangs with N = 1e7 and P = 16 (the default).
    private static final int N = 10_000_000;
    private static final int P = 16;

    @Test
    public void testHashHangs()
    {
        IntOpenHashSet a = new IntOpenHashSet(N);
        for (int i = N; i-- != 0;) {
            a.add(i);
        }
        System.out.println("Start...");
        IntOpenHashSet b2 = new IntOpenHashSet(P);
        b2.addAll(a);
        System.out.println("End...");
    }
}

With N and P the preallocation sizes: it hangs with N = 1e7 and P = 16 (the default).

HPPCRT-18: Remove perturbation methods in hash containers and replace them with reverse iteration.

Perturbation methods were introduced by
HPPC-80: Practical deadlock on populating a set/map with an iterator over another map
to prevent excessive slowness in putAll().
However, this method introduces a slowdown in the general case.
Use this instead: make the iterators of hash containers walk the buffer "in reverse",
i.e. in the opposite direction of the filling of the destination container, thus side-stepping the "longest conflict chain". By the way, this is exactly what expandAndXXX() does already!

Benchmarks show a clear performance gain (BenchmarkPerturbedVsHashedOnly): we can now putAll() into another hash with the same performance as before, while getting the same performance as the previous "non-perturbed" versions on all the other methods.
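
A sketch of the reverse-iteration idea applied to a hash-to-hash copy; the buffer layout is simplified to an int[] with 0 as the empty-slot sentinel, and the names are hypothetical:

// Copy every key of the source hash container into the destination by walking
// the source buffer backwards. Filling the destination in the direction
// opposite to the source's probe chains side-steps the "longest conflict
// chain" degenerate case that makes an in-order putAll() crawl.
final class ReverseCopySketch {

    interface IntDestination {
        void add(int key);        // stand-in for any int hash set/map
    }

    static void addAllReversed(final int[] sourceKeys, final IntDestination destination) {
        for (int slot = sourceKeys.length - 1; slot >= 0; slot--) {
            final int key = sourceKeys[slot];
            if (key != 0) {       // 0 marks an empty slot in this sketch
                destination.add(key);
            }
        }
    }
}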

HPPCRT-6: Stack uses ArrayList.sort() for sorting, which leads to a counterintuitive sort

By inheritance, Stack uses the ArrayList.sort() methods. That leads to the stack being
sorted with the smallest elements at the bottom of the stack, which is surely not the expected behaviour.
Three ways to fix it:

  1. Be rough: override sort() on Stack and throw UnsupportedOperationException.
  2. Be lazy: document the existing behaviour.
  3. Be a good boy: implement inverse sorts in the sorting package, and use a specialized implementation for Stacks.
    3-2) Then realize that you can actually add inverse sorts to all the other containers too...
    3-3) Damn, that means loads of additional unit tests!!!

I haven't chosen an option yet.

HPPCRT-23: Make KTypeIndexedPriorityQueue an IntKTypeMap + API additions

Several sub-tasks:

A) Another API-breaking change, meh.
I was not satisfied with the API duplication that IndexedHeapPriorityQueue induced, with
the redefinition of all Predicates, Procedures, and so on, while this container is, well, just a (K,V) = (int, VType) map.

So I decided to change it and make KTypeIndexedPriorityQueue an IntVTypeMap<VType> map,
or more practically a KTypeIndexedPriorityQueue<KType> implements IntKTypeMap<KType>.
The trouble is, IntKTypeMap does not exist in the templates, or it would collide with KTypeVTypeMap<KType, VType> when instantiated.
Finally it turns out to be quite simple, using a poor man's partial template specialization, in templates
only: just create the IntKType[Map, Procedure|Predicate...] types in templates and annotate them
with
/*! ${TemplateOptions.doNotGenerateKType("all")} !*/
so they are not instantiated at all.
For smoother operation, I then modified TemplateProcessor so that
Velocity parsing truly bails out when the ${TemplateOptions.doNotGenerateKType(...)} directive arises.
This also means that, contrary to before, the file may even be an invalid Velocity file, except for the first lines before /*! ${TemplateOptions.doNotGenerateKType("all")} !*/ appears.

So well, it works. IndexedHeaps are now truly maps, supporting all methods, forEach(), keys(), values() and all the fun stuff, with detailed Javadoc explaining the differences.

To finish, KTypePriorityQueue.insert() was also renamed KTypePriorityQueue.add() for general naming consistency.

B) API refactorings:
- getComparator() renamed to comparator()
- changePriority() and refreshPriorities() renamed to updatePriority() and updatePriorities(), to better reflect their function.

C) API additions inspired by the fastutil heaps API, which are indeed interesting to have:
- updateTopPriority(): updates the priority of the top() element (all heaps)
- topKey(): gives the key of the top() element of the IndexedHeap

HPPCRT-28: Sort improvements: generic KTypeIndexedContainer sorts and others

Tasks:

A) Because [Double|Float].compare(a, b) is overkill to do every time for each pair of values, do a dedicated NaN pass and proceed afterwards with normal ordering comparisons (<, >, ==), which are faster. That is what java.util.Arrays has done since forever, after all. (See the sketch after this list.)

B) Add a stable, in-place sort version for objects. Found this one by Thomas Baudel.

C) Add generic sorts for KTypeIndexedContainer, and thereby add range sorts to Deques and LinkedLists (with a big warning for LinkedLists).
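
For task A), a sketch of the NaN pre-pass on doubles (only the partitioning step; the follow-up sort then uses plain <, >, == comparisons):

// Move all NaN values to the end of the range first, so the main sort can use
// cheap primitive comparisons instead of Double.compare() for every pair.
final class NaNPrePassSketch {

    // Returns the index where the NaN block starts; sort a[fromIndex, result)
    // afterwards with normal <, >, == comparisons.
    static int partitionNaNs(final double[] a, final int fromIndex, final int toIndex) {
        int end = toIndex;
        int i = fromIndex;
        while (i < end) {
            if (Double.isNaN(a[i])) {
                end--;
                final double tmp = a[i];
                a[i] = a[end];
                a[end] = tmp;     // swap the NaN to the tail, re-examine slot i
            } else {
                i++;
            }
        }
        return end;
    }
}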
