dadhi / imtools

Fast and memory-efficient immutable collections and helper data structures

License: MIT License

C# 99.59% Batchfile 0.16% F# 0.25%
immutable persistent data-structures functional-programming performance value-semantics reference-semantics compare-and-swap lock-free map

imtools's Issues

Consider renaming KV into KeyValue.

KV is not a .NET/BCL-style name. Even the newest NuGet packages for .NET do not do such things. How would an enterprise developer add a reference to this assembly?

Replace the ImMap methods that take an Update delegate parameter with composable Add and Update methods

To improve the extensibility I have introduced the base GetOrAddEntry and ReplaceEntry methods.
Another important thing is to invert the control: instead of passing delegates that operate on a particular type of Entry (either a Single entry or a Conflicting Hash entry), GetOrAddEntry returns the entry, which can then be updated or added on the consumer side based on its type, and then passed to ReplaceEntry.
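A minimal sketch of that inversion of control, under loud assumptions: Entry<V>, MapSketch<V> and the AddOrUpdate extension are hypothetical names, the signatures are not the library's actual ones, and ImmutableDictionary stands in for the real tree. It only shows how AddOrUpdate can be composed on the consumer side from GetOrAddEntry and ReplaceEntry without passing an update delegate into the map.

```csharp
using System.Collections.Immutable;

// Hypothetical entry type; the real library distinguishes single and hash-conflict entries.
public sealed class Entry<V>
{
    public readonly int Hash;
    public readonly V Value;
    public Entry(int hash, V value) { Hash = hash; Value = value; }
}

// Stand-in map: ImmutableDictionary replaces the real tree to keep the sketch short.
public sealed class MapSketch<V>
{
    public static readonly MapSketch<V> Empty =
        new MapSketch<V>(ImmutableDictionary<int, Entry<V>>.Empty);

    private readonly ImmutableDictionary<int, Entry<V>> _entries;
    private MapSketch(ImmutableDictionary<int, Entry<V>> entries) { _entries = entries; }

    // Returns the entry already stored for the hash, or adds newEntry and returns it.
    public Entry<V> GetOrAddEntry(Entry<V> newEntry, out MapSketch<V> map)
    {
        if (_entries.TryGetValue(newEntry.Hash, out var existing))
        {
            map = this;
            return existing;
        }
        map = new MapSketch<V>(_entries.Add(newEntry.Hash, newEntry));
        return newEntry;
    }

    // Swaps a known entry for a new one and returns the new map.
    public MapSketch<V> ReplaceEntry(Entry<V> oldEntry, Entry<V> newEntry) =>
        new MapSketch<V>(_entries.SetItem(oldEntry.Hash, newEntry));

    public bool TryFind(int hash, out V value)
    {
        if (_entries.TryGetValue(hash, out var entry)) { value = entry.Value; return true; }
        value = default(V);
        return false;
    }
}

public static class MapSketchExtensions
{
    // AddOrUpdate composed by the consumer from the two base methods.
    public static MapSketch<V> AddOrUpdate<V>(this MapSketch<V> map, int hash, V value)
    {
        var newEntry = new Entry<V>(hash, value);
        var found = map.GetOrAddEntry(newEntry, out var result);
        return ReferenceEquals(found, newEntry)
            ? result                                 // the entry was added
            : result.ReplaceEntry(found, newEntry);  // the entry existed, so replace it
    }
}
```

The consumer decides what to do with the returned entry, so other policies (keep the old value, merge values, and so on) need no extra overloads on the map itself.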

Upgrade build system to .NET Core

It is much harder for me to work with the old dotnet, as VS Code does not process package.config automatically. Also, config and csproj files conflict in old projects, and new projects can build a NuGet package straight from the csproj.

Review of public Ref(external count, Thread.Yield, error return)

I did a review of it; hope you find some of it useful.

throw new InvalidOperationException(_errorRetryCountExceeded);

Also, since it is public, it may bring some caveats for possible users:

  1. The spin should probably yield for better CPU usage: https://referencesource.microsoft.com/#mscorlib/system/threading/thread.cs,dd960cd58d3d20c1,references
  2. The default retry count should be part of the public API, not a private hidden surprise in production.
  3. Consider returning an error result (value tuple, option, or by-ref return) from Swap instead of throwing an exception. Exceeding the retry count could be a normal condition at a low level.
  4. Reproduce in a test that the retry count is reached, via one fast and one slow getNewValue. Unless getNewValue hangs, and if Thread.Yield is used, I guess the retry count could be made infinite by default.
  5. Consider making it more private/internal if possible.
  6. T is a class. Consider adding swaps for known primitives, like Interlocked has for long, int, etc.
    public static T Swap<T>(ref T value, Func<T, T> getNewValue) where T : class
  7. /// <summary>Compares current Referred value with <paramref name="currentValue"/> and if equal replaces current with <paramref name="newValue"/></summary>
    Consider documenting "reference equals" instead of "equal" (to be crystal clear).
  8. CompareAndSwap or a similar name may be a better name:
    public bool TrySwapIfStillCurrent(T currentValue, T newValue) =>
  9. Interlocked.CompareExchange(ref _value, newValue, currentValue) == currentValue;
    does compare by reference, but the returned bool depends on a possible == https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/equality-comparison-operator. The same seems to apply here:
    if (Interlocked.CompareExchange(ref value, newValue, oldValue) == oldValue)
  10. Consider documenting usage scenarios, i.e. it seems not to be meant for long-running operations or for ordered operations (the while loop does not order the getNewValue delegates).
  11. For inspiration: https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf :) and think of an Intel laptop with 12 cores or a Ryzen with 32 cores https://en.wikipedia.org/wiki/Ryzen (these are commodities now)
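To make points 1, 2, 3 and 9 concrete, here is a hedged sketch rather than the library's code: RefSketch, TrySwap and DefaultRetryCount are hypothetical names and the default of 64 is an arbitrary assumption. It uses SpinWait.SpinOnce to back off between attempts, exposes the retry count in the public signature, reports failure through the return value instead of throwing, and spells out the reference comparison with ReferenceEquals.

```csharp
using System;
using System.Threading;

public static class RefSketch
{
    // Hypothetical default, visible in the public API instead of hidden (point 2).
    public const int DefaultRetryCount = 64;

    // Returns false when the retries are exhausted instead of throwing (point 3).
    public static bool TrySwap<T>(ref T value, Func<T, T> getNewValue, out T oldValue,
        int retryCount = DefaultRetryCount) where T : class
    {
        var spin = new SpinWait();
        for (var i = 0; i <= retryCount; i++)
        {
            oldValue = Volatile.Read(ref value);
            var newValue = getNewValue(oldValue);

            // CompareExchange compares references; ReferenceEquals makes that explicit (point 9).
            var seen = Interlocked.CompareExchange(ref value, newValue, oldValue);
            if (ReferenceEquals(seen, oldValue))
                return true;

            // Backs off and eventually yields the thread to others (point 1).
            spin.SpinOnce();
        }

        oldValue = null;
        return false;
    }
}
```

A caller that treats exhaustion as exceptional can still throw on a false result, so the throwing Swap can be layered on top of this shape.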

Benchmark hybrid immutable array of ImHashMaps

The array will represent the root of the data structure, used for an initial lookup by hash & (arrLength - 1) where arrLength is a power of 2.

The array item would be an Im(Hash)Map tree.

Space saving here:

  1. HAMT root array.
  2. The part of the hash used for the array index can be reused for storing the height.
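A rough sketch of the hybrid layout for benchmarking purposes, with assumptions stated up front: BucketedMap is a hypothetical name, and ImmutableSortedDictionary is only a stand-in for the Im(Hash)Map tree in each slot. It shows the hash & (arrLength - 1) slot selection and the copy-on-write of the root array; it does not attempt the height-in-hash space saving.

```csharp
using System;
using System.Collections.Immutable;

public sealed class BucketedMap<V>
{
    // Root array; its length is a power of 2 so the slot index is a cheap bit mask.
    private readonly ImmutableSortedDictionary<int, V>[] _slots;

    public BucketedMap(int slotCount = 16)
    {
        if (slotCount <= 0 || (slotCount & (slotCount - 1)) != 0)
            throw new ArgumentException("slotCount must be a power of 2", nameof(slotCount));

        _slots = new ImmutableSortedDictionary<int, V>[slotCount];
        for (var i = 0; i < slotCount; i++)
            _slots[i] = ImmutableSortedDictionary<int, V>.Empty;
    }

    private BucketedMap(ImmutableSortedDictionary<int, V>[] slots) { _slots = slots; }

    public BucketedMap<V> AddOrUpdate(int hash, V value)
    {
        var index = hash & (_slots.Length - 1);

        // Copy only the root array; the untouched slot trees are shared with the old map.
        var copy = (ImmutableSortedDictionary<int, V>[])_slots.Clone();
        copy[index] = copy[index].SetItem(hash, value);
        return new BucketedMap<V>(copy);
    }

    public bool TryFind(int hash, out V value) =>
        _slots[hash & (_slots.Length - 1)].TryGetValue(hash, out value);
}
```

The mask also works for negative hashes, because the bitwise AND with a non-negative mask always yields a non-negative index.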

ImHashMap<int, SampleClass> vs MemoryOwner<SampleClass>

Hello,
for my specific project I need to be able to use the int type as a key, so I decided to compare ImHashMap and MemoryOwner, and what I got surprised me:
[Image: ImToolsVsMemoryOwner]
[Attachment: BenchmarkProject.zip]

I was expecting MemoryOwner to be faster and better in terms of allocation, but I was not prepared for such a big difference. So I'm wondering whether I have used the ImTools library in the wrong way.

I feel that going with what seems to be the obvious choice is no guarantee that it is the right choice. So I would like to know what a very skilled and talented developer like you would choose based on the following:

  • A source generator will run at each compilation and will allow me to map keys to int and calculate the exact buffer size needed.
  • The value is a class that stores delegates and closures. This means that every once in a while I will have to clear some of the values to allow GC. At runtime, with ImTools I can delete and re-add the whole entry as needed. With MemoryOwner I'm not sure it is possible to resize without performing a full reallocation (e.g. if a value is mapped to index 9, that index must always be available for that value because I might need to add that value again). I'm unable to determine the impact on memory usage.
  • I will keep only one static (and lazily loaded) instance of the collection, so its lifetime will be the same as the application's.
  • The collection is mainly used to cache lambda delegates (Func and Action). I'm unable to estimate an exact size, but how many lambda delegates can an application use? 10, 50, 1000?

I initially tried ImHashMap<Type, Type> and I was surprised by how well ImTools performed compared to other solutions. No mapping needed, no source generators needed, and no wasted memory. However, when I tried int keys, I saw a big increase in performance, which became gigantic with MemoryOwner, and now I'm in doubt about what the best solution could be. I feel that they both have their pros and cons, which I'm unable to weigh in order to make an informed decision.
Your help would be really appreciated.
Thanks

Concurrent HashMap

  1. Fix concurrent update to work as expected in all cases, and prove it
  2. Improve on current linear probing version to the max
  3. Experiment with new impl., e.g. leap-frog probing

Update

Here is an existing implementation to copy and adapt: Ariadne

Test and verify via RelaSharp.

Update 2

Here is the Fibonacci hashing article, which shows a way to fit the hash into a smaller space instead of the current modulo approach:
https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/
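For reference, a small sketch of the Fibonacci hashing idea from the linked article (FibonacciHashing and ToIndex are hypothetical names, not part of ImTools): multiply the hash by 2^32 divided by the golden ratio and keep the top bits, instead of taking a modulo or masking the low bits.

```csharp
public static class FibonacciHashing
{
    // 2^32 / golden ratio, as the odd 32-bit constant 0x9E3779B9.
    private const uint GoldenRatio32 = 2654435769u;

    // Maps a 32-bit hash into [0, 2^bits) using the top bits of the product.
    // Assumes 1 <= bits <= 31.
    public static int ToIndex(int hash, int bits) =>
        unchecked((int)(((uint)hash * GoldenRatio32) >> (32 - bits)));
}
```

Because the result is taken from the high bits of the product, every bit of the input hash influences the index, which is the property the article contrasts with masking the low bits.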

Review of ArrayTools to use Span, documentation, attributes

Span

As I understand it, ArrayTools treats arrays as immutable (while the .NET developers have built Spans so that arrays can be really immutable), so Span may be worth applying to

public static class ArrayTools

https://msdn.microsoft.com/en-us/magazine/mt814808.aspx?f=255&MSPPError=-2147217396
(these are only a couple of percent slower)

Would you accept migration to Span in some future version?
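As an illustration of what such a migration could look like (a sketch only; SpanArraySketch and Append are hypothetical and not the existing ArrayTools API), a helper can accept ReadOnlySpan<T> so that the callee cannot mutate the source, and still return a fresh array:

```csharp
using System;

public static class SpanArraySketch
{
    // The read-only span guarantees the source elements are not modified here.
    public static T[] Append<T>(ReadOnlySpan<T> source, T value)
    {
        var result = new T[source.Length + 1];
        source.CopyTo(result);          // T[] converts implicitly to Span<T>
        result[source.Length] = value;
        return result;
    }
}
```

Usage: `var extended = SpanArraySketch.Append<int>(new[] { 1, 2 }, 3);` leaves the original array untouched and works equally for array slices passed as spans.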

Doc

Replace "Methods to work with immutable arrays" with "Methods to work with arrays as immutable" in:

/// <summary>Methods to work with immutable arrays, and general array sugar.</summary>

Attributes

Consider the attributes that existing array APIs already use:

        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        public static int BinarySearch<T>(T[] array, int index, int length, T value, IComparer<T> comparer);

        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        [SecuritySafeCritical]
        public static int LastIndexOf(Array array, object value, int startIndex, int count);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public ReadOnlyMemory<T> Slice(int start)

Add ImMap.GetSurePresentValue method

It will save us some checks and will improve the performance in case we know that the key is in the map. And it is not a rare situation because the map is immutable and no one can remove the key without us noticing.

Related: dadhi/DryIoc#322

Provide number of slots to ImMapArray via struct type argument

  • Config settings should be jitted as constants without performance overhead
  • Don't take space in ImMapArray to store the config
struct ImMapArray<V, Config> where Config : struct, IImArrayMapSlotConfig
{
    public bool TryFind(int key, out V value) =>
        Slots[key & default(Config).SlotMask].TryFind(key & default(Config).KeyMask, out value);
}

struct DefaultConfig : IImArrayMapSlotConfig
{
    public int SlotCount => 32;
    public int SlotMask  => 31;
    public int KeyMask   => ~31;
}

Add output of the ImHashMap as mermaid diagram

Example of ToMermaidString<K, V>(this ImHashMap<K, V> map) output:

graph TD
L5_-2122864884(`-2122864884`,`2`,`b`/`-2106679681`,`12`,`12`/`-1525889598`,`18`,`18`/`-1333024111`,`20`,`20`/`-1103748092`,`14`,`14`)
L5P_-2025051795>`-2025051795`,`22`,`22`]-->L5_-2122864884
L2_-756060967(`-756060967`,`7`,`7`/`-697679721`,`17`,`17`)
B2_-897298592[`-897298592`,`4`,`d`]-->L5P_-2025051795
B2_-897298592-->L2_-756060967
L5_-355422310(`-355422310`,`15`,`15`/`-159924239`,`11`,`11`/`-16327267`,`16`,`16`/`561526489`,`1`,`a`/`843106263`,`13`,`13`)
L5P_709486099>`709486099`,`23`,`23`]-->L5_-355422310
L5_1127870384(`1127870384`,`5`,`e`/`1319350685`,`19`,`19`/`1759459841`,`10`,`10`/`2002406165`,`9`,`9`/`2030411590`,`3`,`c`)
L5P_1959121513>`1959121513`,`21`,`21`]-->L5_1127870384
B2_1063643491[`1063643491`,`6`,`6`]-->L5P_709486099
B2_1063643491-->L5P_1959121513
B2_-377816965[`-377816965`,`8`,`8`]-->B2_-897298592
B2_-377816965-->B2_1063643491

ImMap and ImHashMap should provide enumeration without IEnumerable

What we often need is to invoke some method or delegate for each map element. We don't need a separate data structure for this; we may just pass the delegate to the iteration method. The cost of delegate creation is usually much smaller than the cost of supporting an IEnumerable/IEnumerator implementation.
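A sketch of delegate-based iteration on a plain binary tree, assuming hypothetical TreeNode<V> and Fold names (the real map nodes differ). The state is threaded through explicitly, so the handler does not need to capture anything in a closure:

```csharp
using System;

// Hypothetical immutable tree node, standing in for the real map nodes.
public sealed class TreeNode<V>
{
    public readonly int Key;
    public readonly V Value;
    public readonly TreeNode<V> Left, Right;

    public TreeNode(int key, V value, TreeNode<V> left = null, TreeNode<V> right = null)
    {
        Key = key; Value = value; Left = left; Right = right;
    }
}

public static class TreeNodeEnumeration
{
    // In-order traversal that calls the handler for each element, no IEnumerator involved.
    public static S Fold<V, S>(this TreeNode<V> node, S state, Func<S, int, V, S> handler)
    {
        if (node == null)
            return state;

        state = node.Left.Fold(state, handler);
        state = handler(state, node.Key, node.Value);
        return node.Right.Fold(state, handler);
    }
}
```

The recursion depth equals the tree height, which stays small for a balanced tree; a very deep tree would call for an explicit stack instead.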

Optimize Im(Hash)Map memory and performance

The list of opportunities:

  • Optimize KeepBalanced, remove thrown away allocations
  • Inline internal _data, benchmark vs inlined, compacted hierarchy
  • Make a struct Enumerator, benchmark
  • (X) Compact leaf nodes
  • [?] Enumerate should return nodes as-is without allocations
  • Benchmark different ways of Equals and GetHashCode. Select the right way.
  • (X) Encode the height imbalance in 2 bits of the hash and get rid of the height field
    • It seems that there is a way, but it still needs some bits and a separate field.
  • Fix the lookup in empty maps so that it does not call GetHashCode unnecessarily

Add ImHashMap23 on par with ImMap23

The newly added maps have a better performance and memory consumption than the V2 maps, so they are replacing the V2 maps and dropping the 23 suffix.

Consistent DU naming

  • Should emphasize that the class of the union and the case is just a wrapper, a holder of the nested interface.

  • Should consider named and unnamed unions.

  • Should consider the ergonomics of the fact that types are declared once but used often: names of classes in the declaration may be longer, but names of the nested implementation things should be shorter.

  • Should consider what the access to the final stored value looks like:

case of<A> a: a.val

Try out 2-3-4 tree

todo

  • proof of concept and tests
  • add Leaf 4, 5
  • add right-leaning branch3 and fast lookups
  • polish and improve the perf plus benchmarks
  • add Enumerate
  • add Soft-delete in branches and hard delete in leafs - no need for rebalancing
  • Add under Experimental.ImMap234 name
  • Add bucketed map implementation

why

2-3-4 tree is a self-balanced tree with much simpler and more straightforward balancing compared to AVL rotations (IMHO).

The plain Leaf2 and Leaf3 nodes and the Branch3 node, compared to the binary-only branches of AVL, promise memory savings.

In addition I may apply some of the techniques I used in the AVL Experimental.ImMap for further savings and speedup:

  1. Prevent unnecessary temp node creation and construct the final structure directly. Compared to AVL, t234 has only the split-3-leaf/branch case to consider.

  2. Flatten the lower-level structure to save space, e.g. a Branch2 with Leaf branches can be flattened to special 4, 5, 6, 7 leaf nodes. We can start with 4 and 5 leaf nodes because an addition won't split them, so we keep the current split approach intact.
    Moreover, we can hold a single shape for 4 leafs (right-leaning) and hold the centralized shape for 5 leafs. This way the nodes will be locally rebalanced, minimizing the branching split!

Here is the current WIP tree structure; after it I will describe the adjustments for point 2.

class ImMap
{ 
    static ImMap Empty = new ImMap();
    class Entry : ImMap {}
    class Leaf2 : ImMap {}
    class Leaf3 : Leaf2 {}
    class Branch2 : ImMap {}
    class Branch3 : Branch2 {}
}

Adjustments:

class ImMap
{ 
    static ImMap Empty = new ImMap();
    class Entry : ImMap {}
    class Leaf2 : ImMap {}
    // maybe not need for virtual Leaf lookup *
    class Leaf3 : Leaf2 {} 

    // don't inherit Leaf3 because it is checked for split
    class Branch2Leaf4 : ImMap {}
    class Branch2Leaf5 : Branch2Leaf4 {}

    class Branch2 : ImMap {}
    class Branch3 : Branch2 {}
}

* - Regarding Lookup, we may combine iteration for branches with a virtual call (polymorphism) for Leafs and BranchLeafs, but that should be proved by a benchmark. In that case Leaf3 : Leaf2 won't be needed to speed up the lookup; it would likely be faster if Leaf3 inherits directly from ImMap.

Fast mutable non-concurrent HashMap

I want a HashMap variant that is fast, configurable for small sizes, and allocates less than Dictionary and DictionarySlim.

Alternative implementation to look for and compare:

Tasks:

  • Implementation, tests and documentation
  • Battle test in FastExpressionCompiler
  • Benchmarks in readme
  • Establish consistent naming across the other ImTools types, e.g. SmallMap vs FHashMap, EtcMap
  • Add a variant that keeps the items on the stack for a small map and progressively expands to the heap if needed. Ensure there is no "significant" performance degradation compared to the full-on heap version

Results:

  • The new SmallMap hash table

Make it possible to use ImTools concurrently with DryIoc

Currently, it is impossible to use ImTools alongside other projects that embed this library.

It may sound strange, but it makes sense when DryIoc and ImTools itself are on different upgrade paths in a project. The container will always be on a lower version for safety and compatibility, while other parts of the project could benefit from the new functionality of ImTools.

I would suggest using a compile-time macro to differentiate the base namespace between the embedded and the normal usage of the library, thus making it easy for everyone.

Any response is appreciated. Thanks!
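One possible shape of that suggestion, as a sketch: IMTOOLS_EMBEDDED is a hypothetical compilation symbol and DryIoc.ImTools is an assumed namespace for the embedded copy; NamespaceProbe exists only to show which namespace was compiled.

```csharp
// A project that embeds the sources would define the (hypothetical) IMTOOLS_EMBEDDED
// symbol, moving its copy out of the way of the ImTools NuGet package.
#if IMTOOLS_EMBEDDED
namespace DryIoc.ImTools
#else
namespace ImTools
#endif
{
    public static class NamespaceProbe
    {
        // Reports the namespace this copy of the sources was compiled into.
        public static string Name => typeof(NamespaceProbe).Namespace;
    }
}
```

With the two copies in different namespaces, the same type names no longer clash when a project references both the container with its embedded copy and a newer standalone package.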

Optimize the case of splitting B2 to B3 when the other B2 leaf can still accommodate the Entry

The Branch2 at the right, with a small right leaf but a full left leaf:

graph TD
L5_21030449(`21030449`,`21637`,`21637`/`131778175`,`24995`,`24995`/`221540030`,`53169`,`53169`/`507310657`,`41691`,`41691`/`631038222`,`80976`,`80976`)
L5P_364635560>`364635560`,`94562`,`94562`]-->L5_21030449
L5_761574385(`761574385`,`33461`,`33461`/`1049478692`,`15842`,`15842`/`1278461947`,`61688`,`61688`/`1593872025`,`54708`,`54708`/`2037997398`,`34187`,`34187`)
L5P_848679699>`848679699`,`7046`,`7046`]-->L5_761574385
L5PP_752405386>`752405386`,`56521`,`56521`]-->L5P_848679699
B2_648516453[`648516453`,`21318`,`21318`]-->L5P_364635560
B2_648516453-->L5PP_752405386

Add to the ImHashMap ref state arguments to be propagated into the Update delegates

Pros:

  • The idea is that we pass the struct state deep down the stack when updating (or doing other operations on) the map.
  • We may get the result back through the same stack, like the old value, some calculated value, etc.
  • Encourage passing state by enabling more scenarios where you can pass it, avoiding the memory allocation and performance hit of introducing a closure in the delegate.
  • Have fewer overloads, because passing the struct parameter by ref allows combining multiple state items into one without hurting performance.
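A sketch of the idea with stand-ins: the Update delegate shape, UpdateState and the AddOrUpdate extension below are hypothetical, and ImmutableDictionary takes the place of ImHashMap. The state struct goes down by ref and carries the old value back up, so the lambda captures nothing.

```csharp
using System.Collections.Immutable;

// Hypothetical delegate shape: the state travels by ref next to the usual arguments.
public delegate V Update<S, K, V>(ref S state, K key, V oldValue, V newValue);

// Several state items combined into one struct, so a single overload is enough.
public struct UpdateState
{
    public int OldValue;
    public bool Updated;
}

public static class RefStateSketch
{
    public static ImmutableDictionary<K, V> AddOrUpdate<S, K, V>(
        this ImmutableDictionary<K, V> map, K key, V value, ref S state, Update<S, K, V> update)
    {
        // The update delegate is called only when the key is already present.
        return map.TryGetValue(key, out var oldValue)
            ? map.SetItem(key, update(ref state, key, oldValue, value))
            : map.Add(key, value);
    }
}
```

Usage: after `map = map.AddOrUpdate("a", 5, ref state, (ref UpdateState s, string k, int o, int n) => { s.OldValue = o; s.Updated = true; return o + n; });` the caller can read `state.OldValue` and `state.Updated` without any closure allocation.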
