GithubHelp home page GithubHelp logo

jtmueller / collections.pooled Goto Github PK

View Code? Open in Web Editor NEW
540.0 25.0 48.0 6.68 MB

Fast, low-allocation ports of List, Dictionary, HashSet, Stack, and Queue using ArrayPool and Span.

License: MIT License

C# 100.00%
list dictionary stack arraypool span hashset queue

collections.pooled's Introduction

Collections.Pooled

Master Latest
Build status Build status

This library is based on classes from System.Collections.Generic that have been altered to take advantage of the new System.Span<T> and System.Buffers.ArrayPool<T> libraries to minimize memory allocations, improve performance, and/or allow greater interoperablity with modern API's.

Collections.Pooled supports both .NET Standard 2.0 (.NET Framework 4.6.1+) as well as an optimized build for .NET Core 2.1+. An extensive set of unit tests and benchmarks have been ported from corefx.

Total tests: 27501. Passed: 27501. Failed: 0. Skipped: 0.
Test Run Successful.
Test execution time: 9.9019 Seconds

Installation

NuGet Version

Install-Package Collections.Pooled
dotnet add package Collections.Pooled
paket add Collections.Pooled

Benchmarks

PooledList<T>

PooledList<T> is based on the corefx source code for System.Collections.Generic.List<T>, modified to use ArrayPool for internal array-storage allocation, and to support Span<T>.

There are some API changes worth noting:

  • Find and FindLast have become TryFind and TryFindLast, following the standard .NET "try" pattern.
  • The new PooledList<T>.Span property returns a Span<T> over the portion of the internal array store that is populated. This span can be further sliced, or passed into other methods that can read from or write to a span.
    • Enumerating the Span property can be more than twice as fast as enumerating the list itself, but as we are unable to do the normal "collection modified during enumeration" checks, please use with caution.
    • The list is unable to increment its version-checking flag when setting values via the Span. Therefore, "collection modified during enumeration" checks can be bypassed. Please use with caution.
  • GetRange now returns a Span<T>. For example, foo.GetRange(5, 100) is equivalent to foo.Span.Slice(5, 100).
  • CopyTo now takes a Span<T> and doesn't offer any start-index or count parameters. Use the Span property and slicing instead.
  • AddRange and InsertRange can now accept a ReadOnlySpan<T>.
  • The new AddSpan and InsertSpan methods ensure the internal storage has the capacity to add the requested number of items, and return a Span<T> that can be used to write directly to that section of the internal storage. Caveats about "collection modified during enumeration" checks apply here as well.
  • Delegate types such as Predicate<T> and Converter<T1, T2> have been replaced with standard Func<> equivalents.
  • PooledList implements IDisposable. Disposing the list returns the internal array to the ArrayPool. If you forget to dispose the list, nothing will break, but memory allocations and GC pauses will be closer to those of List<T> (you will still benefit from pooling of intermediate arrays as the PooledList is resized).
  • A selection of ToPooledList() extension methods are provided.
  • You can optionally supply a custom implementation of ArrayPool<T> to the constructor, to be used instead of the default ArrayPool<T>.Shared pool.
  • The clearMode constructor parameter gives you control over whether data is cleared before returning arrays to the ArrayPool.
  • The sizeToCapacity constructor parameter causes the list to start out with Count == Capacity. All entries in the list will have the default value for the type, or if clearMode is set to ClearMode.Never then entries in the list may have a previously-used value from the array pool. This feature is primarily useful when working with value types and avoiding unnecessary allocations.

Performance

Adding items to a list is one area where ArrayPool helps us quite a bit: List Add Benchmarks

PooledDictionary<TKey, TValue>

PooledDictionary<TKey, TValue> is based on the corefx source code for System.Collections.Generic.Dictionary<TKey, TValue>, modified to use ArrayPool for internal storage allocation, and to support Span<T>.

There are some API changes worth noting:

  • New methods include: AddRange, GetOrAdd, AddOrUpdate
  • Both constructors and AddRange can take a sequence of KeyValuePair<TKey, TValue> objects, or a sequence of ValueTuple<TKey, TValue> objects.
  • Significantly reduced memory allocations when adding many items.
  • PooledDictionary implements IDisposable. Disposing the dictionary returns the internal arrays to the ArrayPool. If you forget to dispose the dictionary, nothing will break, but memory allocations and GC pauses will be closer to those of Dictionary<TKey, TValue> (you will still benefit from pooling of intermediate arrays as the PooledDictionary is resized).
  • A selection of ToPooledDictionary() extension methods are provided.
  • The ClearMode constructor parameter gives you control over whether data is cleared before returning arrays to the ArrayPool.

Performance

Adding to dictionaries is where using ArrayPool really has an impact: Dictionary Add Benchmarks

PooledSet<T>

PooledSet<T> is based on the corefx source code for System.Generic.Collections.HashSet<T>, modified to use ArrayPool for internal storage allocation, and to support ReadOnlySpan<T> for all set functions.

  • Constructors, and all set methods have overloads that accept ReadOnlySpan<T>.
  • There are also overloads that accept arrays but forward to the span implementation. They are present to avoid ambiguous method calls.
  • PooledSet slightly outstrips HashSet in performance in almost every case, while allocating less memory. In cases where the internal arrays must be resized, PooledSet enjoys a substantial advantage.
  • PooledSet implements IDisposable. Disposing the set returns the internal arrays to the ArrayPool. If you forget to dispose the set, nothing will break, but memory allocations and GC pauses will be closer to those of HashSet<T> (you will still benefit from pooling of intermediate arrays as the PooledSet is resized).
  • A selection of ToPooledSet() extension methods are provided.
  • The ClearMode constructor parameter gives you control over whether data is cleared before returning arrays to the ArrayPool.

Performance

Here's what pooling does for us when adding to a PooledSet.

Set Add Benchmarks

PooledStack<T>

PooledStack<T> is based on the corefx source code for System.Generic.Collections.Stack<T>, modified to use ArrayPool for internal storage allocation.

  • Other than the ability to pass Spans into the constructor, the only other API change from Stack<T> is the addition of the RemoveWhere method: because sometimes you just need to remove something from a stack.
  • Significantly reduced memory allocations when pushing many items.
  • PooledStack implements IDisposable. Disposing the stack returns the internal array to the ArrayPool. If you forget to dispose the stack, nothing will break, but memory allocations and GC pauses will be closer to those of Stack<T> (you will still benefit from pooling of intermediate arrays as the PooledStack is resized).
  • A selection of ToPooledStack() extension methods are provided.
  • You can optionally supply a custom implementation of ArrayPool<T> to the constructor, to be used instead of the default ArrayPool<T>.Shared pool.
  • The ClearMode constructor parameter gives you control over whether data is cleared before returning arrays to the ArrayPool.

Performance

Once again, pushing to a stack shows off some of the advantages of using ArrayPool: Stack Push Benchmarks

PooledQueue<T>

PooledQueue<T> is based on the corefx source code for System.Generic.Collections.Queue<T>, modified to use ArrayPool for internal storage allocation.

  • Other than the ability to pass Spans into the constructor, the only other API change from Queue<T> is the addition of the RemoveWhere method: because sometimes you just need to remove something from a queue.
  • Significantly reduced memory allocations when enqueueing many items.
  • PooledQueue implements IDisposable. Disposing the queue returns the internal array to the ArrayPool. If you forget to dispose the queue, nothing will break, but memory allocations and GC pauses will be closer to those of Queue<T> (you will still benefit from pooling of intermediate arrays as the PooledQueue is resized).
  • A selection of ToPooledQueue() extension methods are provided.
  • You can optionally supply a custom implementation of ArrayPool<T> to the constructor, to be used instead of the default ArrayPool<T>.Shared pool.
  • The ClearMode constructor parameter gives you control over whether data is cleared before returning arrays to the ArrayPool.

Performance

Queue Enqueue Benchmarks

collections.pooled's People

Contributors

ares128 avatar dzmitry-lahoda avatar ipavel83 avatar jtmueller avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

collections.pooled's Issues

PooledDictionary get fails with ValueTuple key

Describe the bug
Lookups fail in a PooledDictionary<(int, int), string> but succeed in a Dictionary<(int, int), string>.

To Reproduce

class Program
{
    private static void PooledTest()
    {
        var dict = new PooledDictionary<(int, int), string>
        {
            { (1, 0), "1,0" },
            { (2, 0), "2,0" },
            { (0, 1), "0,1" },
            { (0, 2), "0,2" }
        };

        bool found1 = dict.TryGetValue((1, 0), out var _);
        bool found2 = dict.TryGetValue((2, 0), out var _);
        bool found3 = dict.TryGetValue((0, 1), out var _);
        bool found4 = dict.TryGetValue((0, 2), out var _);

        Console.WriteLine("PooledDictionary: {0}, {1}, {2}, {3}", found1, found2, found3, found4);
    }

    private static void DictTest()
    {
        var dict = new Dictionary<(int, int), string>
        {
            { (1, 0), "1,0" },
            { (2, 0), "2,0" },
            { (0, 1), "0,1" },
            { (0, 2), "0,2" }
        };

        bool found1 = dict.TryGetValue((1, 0), out var _);
        bool found2 = dict.TryGetValue((2, 0), out var _);
        bool found3 = dict.TryGetValue((0, 1), out var _);
        bool found4 = dict.TryGetValue((0, 2), out var _);

        Console.WriteLine("Dictionary: {0}, {1}, {2}, {3}", found1, found2, found3, found4);
    }

    static void Main(string[] args)
    {
        DictTest();
        PooledTest();
    }
}

Expected behavior

Dictionary: True, True, True, True
PooledDictionary: False, False, False, True

We should get the same results from both dictionary types, without having to specify custom comparers.

Add a Span-based constructor overload to PooledStack

Is your feature request related to a problem? Please describe.

If foo is PooledList<int>...
new PooledStack<int>(foo.Span)
...does not work without an explicit cast.

Describe the solution you'd like
It would be nice to not have to cast. There may be other pooled collections to which this applies.

GC of boxing if key is struct

Is your feature request related to a problem? Please describe.

In Unity runtime if key is struct, than key == null produced garbage of boxing. Probably same happens within .NET Core.

Describe the solution you'd like

Could check !typeof(TKey).IsValueType before comparing with null, or may be is null could do that already. How fast is that check?

Describe alternatives you've considered
Copy and paste whole code.

Add support for Tuple<TKey, TValue> to dictionary constructor/AddRange

Is your feature request related to a problem? Please describe.
We already support ValueTuple, but there are cases where support for Tuple would be convenient, particularly in F# to make an equivalent to the dict function.

Describe the solution you'd like
Constructors, AddRange, and extension methods for PooledDictionary should support IEnumerable<Tuple<TKey, TValue>>. No need to support spans of same.

Make PooledStack Indexable and Insertable

Is your feature request related to a problem? Please describe.
Although not need for my primary use cases, there are some situations where I'd find it beneficial for a Stack implementation to be indexable and insertable. Unfortunately, most implementations do not support this, therefore I have to use a List and it's Add and RemoveAt and forgo the optimizations done for Pop, Peak, and Push.

Describe the solution you'd like
Add an Indexer and implement Insert(int index, T item) for PooledStack

Binary Serialization

Describe the bug
This is a reminder to test a likely issue that's not been verified yet: it's likely that when pooled collections are serialized and then deserialized, disposing them causes an exception because the deserialized arrays do not come from the ArrayPool, which throws an error if you return to it an array it didn't create.

Expected behavior
Either ArrayPool returns should be in a try/catch block, or deserialization should copy the deserialized array into an ArrayPool array instead of using it directly.

Support for .NET Standard 2.1

.NET Standard 2.1 is in Preview (as part of .NET Core 3.0) as of this writing. The current version...

  • Is implemented by .NET Core 3.0 but not .NET Core 2.1 or 2.2.
  • Adds Span, Memory, ArrayPool, and support for the C# 8 Index and Range syntax
  • Fully supports everything Collections.Pooled needs.

The downside is this:

  • Currently we target .NET Standard 2.0 (with NuGet packages for Span and ArrayPool) and .NET Core 2.1 (with a higher-performance built-in Span)
  • It would be nice to target only .NET Standard 2.0 and 2.1, without targeting any specific frameworks. However, this would mean that .NET Core 2.1/2.2 clients (which don't support .NET Standard 2.1) would get the Standard 2.0 version of the library with the slower implementation of Span, even though there's a faster one built in. Maybe type redirection would cause it to use fast-Span regardless, but I'm not sure.
  • However, keeping the .NET Core 2.1 target while adding .NET Standard 2.1 is also a problem. A .NET Core 3.0 client (at least as of the current preview) will default to loading the .NET Core 2.1 version of the library when presented with a choice between Core 2.1 and Standard 2.1. I guess that specific frameworks trump standards, even when the supported standard is newer. This means that .NET Core 3.0 clients would not get the version of the library that supports Indexes and Ranges.
  • To fix that last issue, I would have to target four frameworks: Standard 2.0 and 2.1, Core 2.1 and 3.0. This list would likely keep growing over time. I don't want to do this, because it complicates conditional compilation, and I'd rather target a small number of standards than many different versions of .NET Core.

My current plan is this, when the final release of .NET Core 3 comes out:

  • Increment the major version number, as this is a breaking change for .NET Core 2.1/2.2 users.
  • Target only .NET Standard 2.0 and 2.1.

By incrementing the major version, .NET Core 2.x users will not auto-upgrade to the new release, and can keep using Collections.Pooled 1.x. Users of .NET Core 3 will get the Standard 2.1 version, and all other users will get Standard 2.0 (until another framework adds support for Standard 2.1 of course).

This code is currently fully implemented in the core-3 branch if you want to experiment with it.

@dzmitry-lahoda and @Wixred I'm tagging you because you've shown an interest. Do you have any objection to this plan, or any alternate suggestions?

Pooled Array

It would be great if there was PooledArray implementation along w/ the it's supporting LINQ methods (ToPooledArray etc)

Performance of dictionary

I looked into dozen of dict benchmarks. I do not see when pooled is faster. In some cases it is even slower for some reason. Could you please document?

PooledList.Dispose semantics

As I see current semantics is to return array into pool, set capacity-size to 0, and allow use later.

        /// <summary>
        /// Returns the internal buffers to the ArrayPool.
        /// </summary>
        public void Dispose()
        {
            ReturnArray();
            _size = 0;
            _version++;
        }

This contradicts with corefx classes Dispose when you cannot use object after Dispose.

Real semantic seems to be Dispose = Reallocate(capacity: 0). ReturnPool => Reallocate(capacity: 0). I suggest to rename Dispose to something else.

Or document:

        /// <summary>
        /// Returns the internal buffers to the ArrayPool and sets capacity to zero. Instance still can be used.
        /// </summary>

Add support for PooledQueue

Is your feature request related to a problem? Please describe.
PooledQueue is the only major non-Sorted collection left to support.

Describe the solution you'd like
Implement PooledQueue, as with the others.

Add suppport for custom ArrayPools

Is your feature request related to a problem? Please describe.
Currently the pooled collections are all hard-coded to use ArrayPool<T>.Shared, but ArrayPool<T> is an abstract class intended to allow for custom implementations. For example, on a resource-constrained device someone might need an ArrayPool of a fixed size, or that pre-allocates arrays or something. At the moment there is no way to get PooledCollections to make use of such a custom ArrayPool.

Describe the solution you'd like
We need a way to optionally specify a custom ArrayPool, and default to ArrayPool<T>.Shared if none is provided. However, passing in an instance of a custom ArrayPool wouldn't work for PooledDictionary or PooledSet, because those types rely on arrays of a private type inaccessible from outside.

Perhaps there's a way to use a generic type parameter to specify the type of a custom ArrayPool, and then create an instance of this type inside the collection code?

Resurrect rented buffers or arrays in finalyzer

Has it been considered to provide an option of returning the rented buffers/arrays from the collection finalyzer. May be useful when disposing of a pooled collection instance is too complicated or impossible to enforce. For instance, a collection is being returned from some library as its base interface (IList as an example).

To not waste anybody's time I am not providing possible ways to implementing such feature until know for sure it is not a duplicate to some other, known request or issue.

Thank you.

PooledSet is generating GC when item gets removed

in my test HashSet is much faster and 0 gc

using System.Collections.Generic;
using Collections.Pooled;
using Gma.DataStructures;
using UnityEngine;

public class Test169 : bs
{
    HashSet<int> HashSet = new HashSet<int>(){1,2,3,4,5,6,7,8,9};
    PooledSet<int> PooledSet = new PooledSet<int>(){1,2,3,4,5,6,7,8,9};
    
    public void Update()
    {
        
        using (bs.Profile("pooled"))
            for (int i = 0; i < 1000; i++)
            {
                PooledSet.Remove(2);
            }
        
        using(bs.Profile("list"))
            for (int i = 0; i < 1000; i++)
            {
                HashSet.Remove(2);
            }
        // Debug.Log(dd);
    }
}

linq benefits from pooled?

Hi,

I've update my project to pooled collections, but it seems time costs has not been imporoved. I've used a lot of linq technologies in my project. So I wonder does linq benefits from Pooled?

Thank you.

PooledDictionary.Remove with Count 0 = Exception

Describe the bug
PooledDictionary.Remove with Count 0 = Exception

To Reproduce

var dict = new PooledDictionary<string, string>();
dict.Remove("Something");

Expected behavior
return without doing anything.

PooledList: the indexer-get operation is 9-10 times slower than that of List, but Span is fine

PooledList: the indexer-get operation is 9-10 times slower than that of List, despite being the exact same code, as far as I can tell! I haven't found any way of closing that gap other than removing the bounds-check, which isn't a good idea. I suspect some sort of List-specific optimization in the framework that PooledList can't benefit from. However, indexing into PooledList.Span is 84% faster than indexing into List, so there's that.

https://github.com/jtmueller/Collections.Pooled/blob/master/docs/benchmarks/netcoreapp2.2/List_Indexer_Types-report-github.md

I will look into this.

Aggressive optimization and inlining foor PooledList.Span

Is your feature request related to a problem? Please describe.
Next method is kind of slow in Unity runtime (not sure about Core):

public Span<T> Span => _items.AsSpan(0, _size);

Describe the solution you'd like
Mark this method as aggressive optimization and inlining.

Describe alternatives you've considered
Provide unsafe access to writable array as ReadOnlyProperty.

Additional context
We get data from Span by ref.

I know Core 3 optimized Span, but there are issues to measure that because of BDN-JIT instability. Same time Unity is involved into .NET Ecosystem and hopefully will optimize Span in some time (because people will use .NET Standard with Spans and Unity stated it is 2.0 and will be 2.1)

Pooled Lookup

Would it be possible to have a PooledLookup implementation that implements ILookup?

This would drastically help us in scenarios we have a large (>1m objects), intermediate lookup we create that gets GCā€™d right away.

Nested pooled collections crash the Visual Studio heap snapshot

Describe the bug
When nesting pooled collections inside each other, the debugger crashes when taking a heap snapshot.

To Reproduce
Create a nested collection, for example PooledDictionary<int, PooledSet<int>> and attempt to take a heap snapshot in the Visual Studio debugger. This occurs in both the prerelease and 1.0 Nuget packages.

Expected behavior
The debugger should not crash.

Is it also possible that nesting pooled collections is just not recommended usage? I'm not really sure how it works internally.

Retrieve Memory<T> from PooledList<T>

I am new to Span and Memory, and am trying to migrate my library to them. I discovered PooledList when looking into how to AddRange a Span to a List. My initial migration increased the warm of my program by 30% (parsing a large data file and loading it into different lists and dictionaries in memory). Thanks!
However I did find one point that I needed to work around:
It currently isn't possible to get a Memory from PooledList.
Being able to access a Memory from the PooledList will allow me to pass around heap storable references of sub-ranges which makes the PooledList even more flexible. I'd be able to use references to it with Linq IEnumerable and also be able to use it in local functions. Ie. it would free the references for the restrictions that are required by Span.

I understand that PooledList is built upon ArrayPool which means that returning a Memory will cause a memory leak when expanding the PooledList.
Is it possible to implement this functionality - or would it come as an unacceptable tradeoff to the performance/complexity of the library?

Thanks again

new PooledList<PlayerState>(Players.Count, preallocate : true)

Is your feature request related to a problem? Please describe.
Zero overhead structs clone.

Describe the solution you'd like

            PooledList<PlayerState> players = null;
            if (Players != null)
            {
                players = new PooledList<PlayerState>(Players.Count, preallocate : true); // i guess preallocated just sets _size = capacity;
                for (var i = 0; i < players.Count; i++)
                {
                    PlayerState.Clone(in Players.Span[i], ref players.Span[i]);
                }
            }

Describe alternatives you've considered

            PooledList<PlayerState> players = null;
            if (Players != null)
            {
                players = new PooledList<PlayerState>(Players.Count);
                for (var i = 0; i < players.Count; i++)
                {
                    // Clone will fail until
                    players[i] = default; // overhead to create 200+ byte struct on stack and pass it into indexer (not sure if optimized away by compilers?)
                    PlayerState.Clone(in Players.Span[i], ref players.Span[i]);
                }
            }

Additional context
Usual List could not do that because it would leak abstraction (that it is actually array list).
PooledList already leaks Span as property and ArrayPool as ctor arg(also ctor arg is no of so problem). So adding that method will not increase leakage, but will allow more performance uses for structs.

Add ability to force clear of pooled data

Is your feature request related to a problem? Please describe.
Pooled collections have different behavior between .NET Core 2.1+ and .NET Standard 2.0. Because .NET Standard has no mechanism to easily determine if a value type contains a reference type, the .NET Standard version always clears values before returning arrays to the array pool.

The .NET Core 2.1+ version has an optimization that only clears values before returning arrays when those values are reference types or contain reference types. This helps performance, but some users might be concerned about leaking sensitive data to other code running in the same process, and might want even value-type data to be cleared before returning arrays to the pool.

Describe the solution you'd like
Pooled collections should allow the user to specify an "always clear" behavior - at least in the .NET Core 2.1+ version.

Support for ObservableCollection

Would be amazing if you could add support for ObservableCollection as it's the only collection type that fully supports binding. If you add support I would recommend or suggest you also add a AddRange method as the original collection hasn't but for performance reasons it's a must-have. There are many modified versions that added such as a method but none which is optimized with ArrayPool etc.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    šŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. šŸ“ŠšŸ“ˆšŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ā¤ļø Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.