spaskalev / buddy_alloc

A single header buddy memory allocator for C & C++

Home Page: https://spaskalev.github.io/buddy_alloc/

License: BSD Zero Clause License

C 97.57% Makefile 1.90% CMake 0.34% C++ 0.19%
allocator allocators memory-allocation memory-management memory-allocator buddy-allocator c single-header-lib osdev

buddy_alloc's Introduction

buddy_alloc

A buddy memory allocator for C

Licensing

This project is licensed under the 0BSD license. See the LICENSE.md file for details.

Overview

This is a memory allocator suitable for use in applications that require predictable allocation and deallocation behavior. The allocator's metadata is kept separate from the arena and its size is a function of the arena size and the minimum allocation size.

Features

  • Bounded allocation and deallocation cost
  • Fixed call stack usage, no recursion, no global state
  • C99-compatibility for code and tests
  • 100% line and branch test coverage
  • Supports 32-bit and 64-bit platforms
  • Endian-agnostic, works on both LE and BE
  • Compiles with GCC, Clang, MSVC and Pelles C

Usage

Initializing and using the buddy allocator with metadata external to the arena is done using the buddy_init function.

size_t arena_size = 65536;
/* You need space for the metadata and for the arena */
void *buddy_metadata = malloc(buddy_sizeof(arena_size));
void *buddy_arena = malloc(arena_size);
struct buddy *buddy = buddy_init(buddy_metadata, buddy_arena, arena_size);

/* Allocate using the buddy allocator */
void *data = buddy_malloc(buddy, 2048);
/* Free using the buddy allocator */
buddy_free(buddy, data);

free(buddy_metadata);
free(buddy_arena);

Initializing and using the buddy allocator with metadata internal to the arena is done using the buddy_embed function.

size_t arena_size = 65536;
/* You need space for arena and builtin metadata */
void *buddy_arena = malloc(arena_size);
struct buddy *buddy = buddy_embed(buddy_arena, arena_size);

/* Allocate using the buddy allocator */
void *data = buddy_malloc(buddy, 2048);
/* Free using the buddy allocator */
buddy_free(buddy, data);

free(buddy_arena);

Metadata sizing

The following table documents the allocator metadata space requirements according to desired arena (8MB to 1024GB) and alignment/minimum allocation (64B to 8KB) sizes. The resulting values are rounded up to the nearest unit.

         |     64B |   128B |   256B |   512B |    1KB |    2KB |    4KB |    8KB |
---------+---------+--------+--------+--------+--------+--------+--------+--------+
    8 MB |    65KB |   33KB |   17KB |    9KB |    5KB |    3KB |    2KB |   678B |
   16 MB |   129KB |   65KB |   33KB |   17KB |    9KB |    5KB |    3KB |    2KB |
   32 MB |   257KB |  129KB |   65KB |   33KB |   17KB |    9KB |    5KB |    3KB |
   64 MB |   513KB |  257KB |  129KB |   65KB |   33KB |   17KB |    9KB |    5KB |
  128 MB |     2MB |  513KB |  257KB |  129KB |   65KB |   33KB |   17KB |    9KB |
  256 MB |     3MB |    2MB |  513KB |  257KB |  129KB |   65KB |   33KB |   17KB |
  512 MB |     5MB |    3MB |    2MB |  513KB |  257KB |  129KB |   65KB |   33KB |
    1 GB |     9MB |    5MB |    3MB |    2MB |  513KB |  257KB |  129KB |   65KB |
    2 GB |    17MB |    9MB |    5MB |    3MB |    2MB |  513KB |  257KB |  129KB |
    4 GB |    33MB |   17MB |    9MB |    5MB |    3MB |    2MB |  513KB |  257KB |
    8 GB |    65MB |   33MB |   17MB |    9MB |    5MB |    3MB |    2MB |  513KB |
   16 GB |   129MB |   65MB |   33MB |   17MB |    9MB |    5MB |    3MB |    2MB |
   32 GB |   257MB |  129MB |   65MB |   33MB |   17MB |    9MB |    5MB |    3MB |
   64 GB |   513MB |  257MB |  129MB |   65MB |   33MB |   17MB |    9MB |    5MB |
  128 GB |  1025MB |  513MB |  257MB |  129MB |   65MB |   33MB |   17MB |    9MB |
  256 GB |  2049MB | 1025MB |  513MB |  257MB |  129MB |   65MB |   33MB |   17MB |
  512 GB |  4097MB | 2049MB | 1025MB |  513MB |  257MB |  129MB |   65MB |   33MB |
 1024 GB |  8193MB | 4097MB | 2049MB | 1025MB |  513MB |  257MB |  129MB |   65MB |

Design

The allocator was designed with the following requirements in mind.

  • Allocation and deallocation operations should behave in a similar and predictable way regardless of the state of the allocator.
  • The allocator's metadata size should be predictable based on the arena's size and not dependent on the state of the allocator.
  • The allocator's metadata location should be external to the arena.
  • Returned memory should be aligned to known and specified block size.
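
The alignment requirement can be illustrated with a small, standalone sketch. It assumes the classic buddy scheme in which every slot size is the minimum allocation size multiplied by a power of two; the helper names `slot_size_for` and `is_aligned` are hypothetical and are not part of the buddy_alloc API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest slot (min_alloc * 2^k) that can hold `request`.
 * Assumes min_alloc is a power of two, as in a classic buddy scheme. */
static size_t slot_size_for(size_t request, size_t min_alloc) {
    assert(min_alloc > 0 && (min_alloc & (min_alloc - 1)) == 0);
    assert(request <= SIZE_MAX / 2);        /* keeps the doubling below from overflowing */
    size_t slot = min_alloc;
    while (slot < request) {
        slot <<= 1;                         /* each level up doubles the slot size */
    }
    return slot;
}

/* Hypothetical helper: true when `addr` is a multiple of the power-of-two `align`. */
static int is_aligned(uintptr_t addr, size_t align) {
    return (addr & (align - 1)) == 0;
}
```

Under these assumptions a 10000-byte request with a 64-byte minimum lands in a 16384-byte slot, and any slot offset that is a multiple of the minimum allocation size passes the `is_aligned` check.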

The following were not design goals

  • To be used by multiple threads at the same time without additional locking.
  • To be a general purpose malloc() replacement.

Rationale

Why use a custom allocator (like buddy_alloc) ?

A custom allocator is useful where there is no system allocator (e.g. on bare-metal) or when the system allocator does not meet some particular requirements, usually in terms of performance or features. The buddy_alloc custom allocator has bounded performance and bounded storage overhead for its metadata. The bounded performance is important in time-sensitive systems that must perform some action in a given amount of time. The bounded storage overhead is important for ensuring system reliability and allows for upfront system resource planning.

A common example of systems that require both bounded performance and bounded storage overhead from their components are games and gaming consoles. Games are time-sensitive in multiple aspects: they have to render frames fast to ensure a smooth display and sample input regularly to account for player input. But just fast is not enough. If an allocator is fast on average but an operation is occasionally an order of magnitude slower, this will impact both the display of the game and the input, and may frustrate the player. Games and game consoles are also sensitive to their storage requirements: game consoles usually ship with fixed hardware and game developers have to optimize their games to perform well on the given machines.

A custom allocator can supplement the system allocator where needed. A parser that is parsing some structured data (e.g. a JSON file) may need to allocate objects based on the input's structure. Using the system allocator for this is a risk, as the parser may have a bug that causes it to allocate too much, or the input may be crafted to trigger excessive allocation. Using a custom allocator with a fixed size for this sort of operation allows the operation to fail safely without impacting the application or the overall system stability.

An application developer may also need object allocation that is relocatable. Using memory representation as serialization output is a valid technique and it is used for persistence and replication. The buddy_alloc embedded mode is relocatable allowing it to be serialized and restored to a different memory location, a different process or a different machine altogether (provided matching architecture and binaries).

With the introduction of the buddy_walk function the allocator can be used to iterate over all the allocated slots within its arena. This can be used, for example, for a space-bounded mailbox where a failure to allocate means the mailbox is full and the walk can be used to process its content. This can also form the basis of a managed heap for garbage collection.

Implementation

+-------------+                  +----------------------------------------------------------+
|             |                  | The allocator component works with 'struct buddy *' and  |
|  allocator  +------------------+ is responsible for the allocator interface (malloc/free )|
|             |                  | and for interfacing with the allocator tree.             |
+------+------+                  +----------------------------------------------------------+
       |
       |(uses)
       |
+------v------+                   +------------------------------------------------------+
|             |                   | The allocator tree is the core internal component.   |
|  allocator  +-------------------+ It provides the actual allocation and deallocation   |
|    tree     |                   | algorithms and uses a binary tree to keep its state. |
|             |                   +------------------------------------------------------+
+------+------+
       |
       |(uses)
       |
+------v------+                   +---------------------------------------------------+
|             |                   | The bitset is the allocator tree backing store.   |
|   bitset    +-------------------+                                                   |
|             |                   | The buddy_tree_internal_position_* functions map  |
+-------------+                   | a tree position to the bitset.                    |
                                  |                                                   |
                                  | The write_to and read_from (internal position)    |
                                  | functions encode and decode values in the bitset. |
                                  |                                                   |
                                  | Values are encoded in unary with no separators as |
                                  | the struct internal_position lists their length.  |
                                  | The unary encoding is faster to encode and decode |
                                  | on unaligned boundaries.                          |
                                  +---------------------------------------------------+
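
The unary encoding described in the diagram can be sketched in isolation. This is a minimal illustration and not the library's code: the names `bit_put`, `bit_get`, `write_unary` and `read_unary` are hypothetical, and the least-significant-bit-first layout inside each byte is an assumption. A value is stored as that many one bits followed by zeros, inside a field whose length is known to the caller, so no separator is needed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Set or clear a single bit (assumed layout: LSB-first within each byte). */
static void bit_put(unsigned char *bits, size_t pos, int on) {
    unsigned char mask = (unsigned char)(1u << (pos % CHAR_BIT));
    if (on) {
        bits[pos / CHAR_BIT] |= mask;
    } else {
        bits[pos / CHAR_BIT] &= (unsigned char)~mask;
    }
}

static int bit_get(const unsigned char *bits, size_t pos) {
    return (bits[pos / CHAR_BIT] >> (pos % CHAR_BIT)) & 1;
}

/* Store `value` (0..length) as `value` ones followed by zeros,
 * in the field that starts at bit `offset` and is `length` bits wide. */
static void write_unary(unsigned char *bits, size_t offset, size_t length, size_t value) {
    assert(value <= length);
    for (size_t i = 0; i < length; i++) {
        bit_put(bits, offset + i, i < value);
    }
}

/* Count the leading ones of the field; the field length bounds the scan. */
static size_t read_unary(const unsigned char *bits, size_t offset, size_t length) {
    size_t value = 0;
    while (value < length && bit_get(bits, offset + value)) {
        value++;
    }
    return value;
}
```

Because the field length comes from the tree position rather than from the bit stream, fields can start at any bit offset and sit back to back without padding.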

Metadata

The allocator uses a bitset-backed perfect binary tree to track allocations. The tree is fixed in size and remains outside of the main arena. This allows for better cache performance in the arena as the cache is not loading allocator metadata when processing application data.

Allocation and deallocation

The binary tree nodes are labeled with the largest allocation slot available under them. This allows allocation to happen with a limited number of operations. Allocations that cannot be satisfied are fast to fail. Once a free node of the desired size is found it is marked as used and the nodes leading to the root of the tree are updated to account for any difference in the largest available size. Deallocation works in a similar way: the allocated block size for the given address is found, the block is marked as free, and the same node update as with allocation is used to update the tree upwards.
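
The scheme above can be sketched with a toy tree. This is an illustration under simplifying assumptions, not the library's implementation: the tree is a plain byte array instead of a bitset, slots are identified by node index rather than by address, the descent always tries the left child first, and all names (`toy_alloc`, `toy_free`, `update_parents`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ORDER 4                  /* tree levels; 2^(ORDER-1) = 8 leaf slots */
#define NODES (1u << ORDER)      /* heap-style array, index 0 is unused */

/* tree[n] = height of the largest free slot under node n (0 = none). */
static uint8_t tree[NODES];

/* Height of a node: leaves have height 1, the root has height ORDER. */
static unsigned height_of(unsigned node) {
    unsigned depth = 0;
    while (node) { node >>= 1; depth++; }
    return ORDER - depth + 1;
}

static void toy_init(void) {
    for (unsigned n = 1; n < NODES; n++) tree[n] = (uint8_t)height_of(n);
}

/* Walk towards the root, recomputing labels; stop once a label is unchanged. */
static void update_parents(unsigned node) {
    while (node > 1) {
        node >>= 1;
        unsigned h = height_of(node);
        uint8_t l = tree[2 * node], r = tree[2 * node + 1];
        /* Two fully free children merge into one free slot of height h. */
        uint8_t v = (l == h - 1 && r == h - 1) ? (uint8_t)h : (l > r ? l : r);
        if (tree[node] == v) break;
        tree[node] = v;
    }
}

/* Allocate a slot of the given height; returns its node index, or 0 on failure. */
static unsigned toy_alloc(unsigned target) {
    unsigned node = 1;
    if (target == 0 || target > ORDER || tree[node] < target) return 0; /* fast fail */
    while (height_of(node) > target) {
        unsigned left = 2 * node;
        node = (tree[left] >= target) ? left : left + 1;
    }
    tree[node] = 0;              /* mark as used */
    update_parents(node);
    return node;
}

static void toy_free(unsigned node) {
    assert(node >= 1 && node < NODES);
    tree[node] = (uint8_t)height_of(node);   /* fully free again */
    update_parents(node);
}
```

The root label alone decides whether a request can succeed, which is why unsatisfiable requests fail without any descent, and both operations touch at most one node per tree level.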

Fragmentation

To minimize fragmentation the allocator will pick the more heavily-used branches when descending the tree to find a free slot. This ensures that larger continuous spans are kept available for larger-sized allocation requests. A minor benefit is that clumping allocations together can allow for better cache performance.
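
One way to picture this preference is a child-selection rule used during the descent. The sketch below is an assumption-laden illustration: it treats "the child whose largest free slot is smaller, but still large enough" as a proxy for "more heavily used", which may differ from the heuristic the library actually applies. The function name `pick_child` is hypothetical.

```c
#include <assert.h>

/* Hypothetical selection rule: given the largest free slot height under each
 * child, choose the tighter fit for a request of height `target`.
 * Returns 0 for the left child, 1 for the right child, -1 if neither fits. */
static int pick_child(unsigned left_free, unsigned right_free, unsigned target) {
    assert(target > 0);
    int left_ok = left_free >= target;
    int right_ok = right_free >= target;
    if (!left_ok && !right_ok) return -1;
    if (!right_ok) return 0;
    if (!left_ok) return 1;
    /* Both fit: prefer the branch with less free space, keeping the
     * roomier branch intact for larger requests. Ties go left. */
    return (left_free <= right_free) ? 0 : 1;
}
```

With this rule a small request is steered into a branch that is already partially used, so an untouched sibling branch remains available as one large span.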

Space requirements

The tree is stored in a bitset with each node using just enough bits to store the maximum allocation slot available under it. For leaf nodes this is a single bit. Other node sizes depend on the height of the tree.
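
A rough estimate of the bitset size can be derived from this description. The sketch below assumes that a node at height h (leaves at height 1) takes h bits, which follows from a unary encoding of values 0..h, and it ignores any fixed bookkeeping the allocator keeps alongside the tree. The names `tree_order` and `tree_bits` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of tree levels needed so that there is one leaf per minimum-size slot.
 * Assumes arena_size / min_alloc is a power of two. */
static unsigned tree_order(size_t arena_size, size_t min_alloc) {
    assert(min_alloc > 0 && arena_size >= min_alloc);
    size_t leaves = arena_size / min_alloc;
    unsigned order = 1;
    while (leaves > 1) {
        leaves >>= 1;
        order++;
    }
    return order;
}

/* Total bits for a perfect binary tree with `order` levels,
 * assuming a node at height h is stored in h bits. */
static uint64_t tree_bits(unsigned order) {
    uint64_t total = 0;
    uint64_t nodes = 1;                  /* node count on the current level */
    for (unsigned h = order; h >= 1; h--) {
        total += nodes * h;              /* the root level comes first */
        nodes *= 2;
    }
    return total;
}
```

For an 8 MB arena with 64-byte slots this gives 18 levels and 524268 bits, just under 64 KB, which is in line with the 65KB entry in the sizing table once the allocator's own bookkeeping, not modeled here, is added.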

Non-power-of-two arena sizes

The perfect binary tree always tracks an arena whose size is a power of two. When the allocator is initialized or resized with an arena that is not a perfect fit, the binary tree is updated to mask out the virtual arena complement up to the next power of two.
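
The size of the masked-out region is simple arithmetic, sketched below. The helper names `ceil_pow2` and `virtual_complement` are hypothetical; how the library marks the corresponding slots in the tree is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= n. */
static size_t ceil_pow2(size_t n) {
    assert(n > 0 && n <= SIZE_MAX / 2 + 1);   /* the result must fit in size_t */
    size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Bytes the tree tracks beyond the real arena; these must never be handed out. */
static size_t virtual_complement(size_t arena_size) {
    return ceil_pow2(arena_size) - arena_size;
}
```

A 100000-byte arena is tracked by a tree sized for 131072 bytes, so 31072 bytes of virtual space have to be marked as unavailable.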

Resizing

Resizing is available for both split and embedded allocator modes and supports both growing the arena and shrinking it. Checks prevent shrinking the arena while memory in the region to be removed is still allocated.

Users

If you are using buddy_alloc in your project and you would like your project to be featured here, please send a PR or file an issue. If you like buddy_alloc please star it on GitHub so that more users can learn of it. Thanks!

  • Use in game development - 1
  • Use in OS kernels - 1, 2
  • Use in user-space software - 1
  • Use in scientific software - 1

buddy_alloc's People

Contributors

anders-zpt, itay2805, oreyg, photoszzt, spaskalev, xtexchooser, yunhsiao

buddy_alloc's Issues

Expose the buddy_embed_offset function

The purpose of the embedded mode is to be able to copy, relocate, serialize and deserialize (etc.) an arena as a single span with the allocator metadata inside of it. The pointer returned from buddy_embed points within the arena, but users currently have no way to get the buddy pointer again after they move the arena.

The buddy_embed_offset function solves this internally and should be available to users for this purpose.

Optimize buddy tree resize

The current implementation builds on the tree abstraction. It is possible to achieve the same using ranged shifts on the backing bitset, which should be faster simply by virtue of performing fewer operations.

alignment is not passed to buddy_sizeof

Describe the bug

size_t size = buddy_sizeof(memory_size);

The alignment is not passed to the buddy_sizeof. It should be buddy_sizeof_alignment.

virtual slots should be marked on the left side of the tree

Currently virtual slots for non-power-of-two arenas are marked on the right side of the tree. This makes find_free prefer the right side for allocations (until full). Nothing wrong with that as far as regular usage goes but if there are actual allocations on the right it makes downsizing the arena impossible. If the virtual slots were put on the left side of the tree the bias will be on the left which will allow for downsizing on the right (provided that it is free of course).

Revisit the first fit algorithm after recent speed improvements

The change in 9126073 reduced full position reads substantially but still needs them every now and then to implement fragmentation minimizing allocation. Full position reads (as well as unavoidable branch misses because of the tree) are the most time-consuming operations done when allocating.

When extra fragmentation isn't that much of a problem (e.g. for a managed heap that will be eventually defragmented regardless of it) going with a biased (left-fit or right-fit) approach can result in additional speed improvements.

optimize read_from_internal_position

The flamegraph shows that the next flat spot is in read_from_internal_position. The read (and write) currently operate on a single bit in a loop. Using masks to fetch or set larger fragments should help.
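
The idea can be sketched as follows. This is an illustration of the proposal rather than the library's code: it assumes 8-bit bytes, a least-significant-bit-first layout, and a unary field (ones followed by zeros), so the value equals the number of set bits in the field and can be counted one masked byte at a time. The names `popcount8` and `read_unary_masked` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the set bits in one byte by clearing the lowest set bit repeatedly. */
static size_t popcount8(unsigned char b) {
    size_t count = 0;
    while (b) {
        b &= (unsigned char)(b - 1);
        count++;
    }
    return count;
}

/* Read a unary value from the field [from, from + length) by masking whole
 * bytes instead of testing individual bits. Assumes 8-bit bytes. */
static size_t read_unary_masked(const unsigned char *bits, size_t from, size_t length) {
    size_t to = from + length;               /* exclusive end of the field */
    size_t value = 0;
    while (from < to) {
        size_t lo = from % 8;                /* first field bit inside this byte */
        size_t n = 8 - lo;                   /* field bits available in this byte */
        if (n > to - from) n = to - from;
        unsigned char mask = (unsigned char)(((1u << n) - 1u) << lo);
        value += popcount8((unsigned char)(bits[from / 8] & mask));
        from += n;
    }
    return value;
}
```

A field that spans two bytes is handled in two iterations instead of one iteration per bit; a compiler builtin or a lookup table could replace `popcount8` where available.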

Convincing Use Cases

Convincing use cases are required to give the library some authority. For example, stack-allocated metadata?

Something like:

size_t arena_size = 65536;
/* STACK space for the metadata and for the arena */
void *buddy_metadata = alloca(buddy_sizeof(arena_size));
/* Assumption is this works too ? */
void *buddy_arena = alloca(arena_size);
struct buddy *buddy = buddy_init(buddy_metadata, buddy_arena, arena_size);

/* Allocate using the buddy allocator */
void *data = buddy_malloc(buddy, 2048);

/* freeing not necessary */

More Compilers Support?

Great library, really helped a lot by doing exactly the thing we needed!

Any chance to support more compilers/environments? It's basically pretty straightforward, by making some simple changes we were able to make it work in our target environments: (right now an MSVC cpp project only but soon to be tested on many more platforms)
https://github.com/YunHsiao/buddy_alloc/tree/cpp

Not sure it's the best way to do this, but you are welcome to use it as a reference!

16 byte alignment

I'd like 16 byte alignment (for SIMD). I noticed the codebase uses sizeof(size_t) throughout and it's not immediately clear if simply changing BUDDY_ALLOC_ALIGN is enough.

buddy alignment doesn't pass to buddy_init_alignment

Describe the bug
In this line in buddy_embed_alignment,

struct buddy *buddy = buddy_init(main+result.offset, main, result.offset);

Why is the alignment not passed to buddy_init_alignment, with the default alignment value used instead?

Expected behavior
I'm not sure this is a bug. But I expect to use buddy_init_alignment instead of buddy_init.

Add a realloc-like function that ignores data

Describe the request
When using the allocator to handle dynamic buffers, the buffers could be resized on demand and reduced to some fixed default after processing. Doing this with buddy_realloc will incur a memcpy operation if the resulting slots don't have the same address. If the user does not care about the data after processing, a resize request could ignore the data, e.g. a trim-like realloc.

Justification
This will provide a speed improvement for certain uses.

Add a build check that detects and fails on recursion

Certain industrial coding standards disallow the use of recursion. (FWIW they also disallow the use of dynamic memory allocation ;)) The only place where recursion is currently used in the allocator is in the tree debug function.

Memoize size_for_order

The current flamegraph shows a fair amount of time spent in size_for_order. Since for the majority of the calls the tree order doesn't change we can cache the results for it in a move to front buffer.

Tree resize use where the tree is morphed in place should continue using the existing calculation.

Return a status from buddy_safe_free

The intention for buddy_safe_free is to avoid freeing a slot just because it matches in location but doesn't match in size. A call like this is unconditionally a bug in the calling code and calling code should be made aware of it so that it can assert and fail early (as opposed to currently the free doing nothing and leaking memory, therefore delaying the failure until the arena is eventually unable to satisfy a request)

abstract the tree walk state machine

The tree walk state machine code is repeated with a couple of variations - in buddy_walk, in buddy_debug, in check_invariant, in fragmentation report. It should be made generic.

Support compiling with Pelles C

Pelles C is a freeware C compiler and IDE for Windows. It currently fails to compile the allocator due to lacking ssize_t. It would also be nice to have it in the CI/CD tests.

Memoization of the tree path during allocation for use in update_parent_chain

The find_free function traverses the tree downwards to find a suitable slot. After the slot is marked, update_parent_chain traverses the tree upwards to restore the invariant and update the parent nodes. This means that nodes are read twice: during the descent and during the ascent (with an early terminating condition on the ascent).

If the status of the nodes is stored during the descent it can be used for quick reading during the ascent. Since this allocator is mostly bound by reading its tree, this should provide some speedup at the cost of using some scratch memory during allocation.

Support range reservations

When used as a physical memory allocator reserving certain ranges can be helpful. A helper function can be added for this that would mark the slots covering the required range as already-allocated.

Convincing Benchmarking

Will not ruin this lib. It will just show honestly how slow or fast it is, compared to standard malloc and friends. And on different platforms, too.

I use and recommend UBENCH. Hint: it is a single header and small but eminently usable.

Thus it will not be a lot of work to repeat the comparisons on all the OS-es that matter.

BUDDY_ALLOC_ALIGN default definition is misleading

Using #define BUDDY_ALLOC_ALIGN (sizeof(size_t) * CHAR_BIT) doesn't actually align it to the target size but aligns it to the result that can hold the expression.

This doesn't affect functionality but works by chance - a better definition is needed.

Add a way to visualize the state of the allocator

While the buddy_debug and buddy_tree_debug functions will print out the internal state of the allocator, their output can be hard for users to interpret. Visualizing the output graphically can help users understand the allocator better.

Issue with (very) large arena

Hello,

I am currently using buddy_alloc.h in my project to manage the address space of a GPU accelerator. I took buddy_alloc.h from this commit :

commit eef4b9097248196f26b752758113c39d2760687b
Author: Stanislav Paskalev <[email protected]>
Date:   Sun Dec 17 13:18:33 2023 +0200

    Create CODE_OF_CONDUCT.md (#101)

The allocator fails allocating memory on a 120Gb arena :

[ECA buddy]$ !cat
cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUDDY_ALLOC_IMPLEMENTATION
#include "buddy_alloc.h"
#undef BUDDY_ALLOC_IMPLEMENTATION

typedef struct
{
  void * metadata;
  void * arena;
  struct buddy * buddy;
} buddy_t;


void main (int argc, char * argv[])
{
  buddy_t B;
  long long int dev_alloc_size = atoll (argv[1]);
  printf (" DEV_ALLOC_SIZE = %lld\n", dev_alloc_size);

  B.metadata = malloc (buddy_sizeof (dev_alloc_size));
  B.arena    = malloc (dev_alloc_size);
  B.buddy    = buddy_init (B.metadata, B.arena, dev_alloc_size);


  size_t siz = 10000;
  void * ptr;

  ptr = buddy_malloc (B.buddy, siz);

  printf (" ptr = 0x%llx\n", ptr);

}

[ECA buddy]$ cc -std=gnu99  main.c ; ./a.out  120000000000 
 DEV_ALLOC_SIZE = 120000000000
 ptr = 0x0

While it works for smaller sizes (eg 60Gb).

Apparently, using the latest version of buddy_alloc.h solves the issue.

Do you confirm this bug has been fixed in the latest version of buddy_alloc.h ?

Thank you for your help and this very useful piece of code.
