The 4 allocators every C developer should know

The 4 allocators every C developer should know

malloc is a general-purpose tool. It handles arbitrary-sized blocks, in any order, with individual deallocation. That flexibility has a cost: per-block metadata, fragmentation, underlying system calls (brk, mmap), internal locks for thread safety.

On a burst of one million allocations, a bump allocator is routinely an order of magnitude faster than malloc. Not because malloc is badly written — glibc is excellent — but because it solves a more general problem than the one you actually have.

Here are four specialized allocators, from the simplest to the most complete. Each trades one capability of malloc for performance or simplicity.

1. Bump allocator (linear allocator)

Principle: one big block of memory, one pointer that advances. Each allocation moves the pointer forward. No individual deallocation — everything is freed at once at the end.

typedef struct {
    unsigned char *buffer;
    size_t capacity;
    size_t offset;
} BumpAllocator;

void *bump_allocate(BumpAllocator *allocator, size_t size, size_t alignment) {
    size_t aligned_offset = (allocator->offset + alignment - 1) & ~(alignment - 1);
    if (aligned_offset + size > allocator->capacity) return NULL;
    void *pointer = allocator->buffer + aligned_offset;
    allocator->offset = aligned_offset + size;
    return pointer;
}

void bump_reset(BumpAllocator *allocator) {
    allocator->offset = 0;
}

The alignment parameter guarantees the returned pointer is correctly aligned for the requested type (typically _Alignof(type), or 16 for generic use). Without it, accesses to types like double or structures cause undefined behavior on some architectures.

Cost: one addition, one mask, and one comparison per allocation. No metadata, no free list, no fragmentation.

Use case: batch processing. You allocate everything you need to process a request, a file, an image — then reset the allocator and start over. It’s the “allocate, process, free everything” pattern.

Limitation: you cannot free an individual block. If you need to free the third of ten blocks, the bump allocator doesn’t fit.

Performance: an order of magnitude faster than malloc on large allocation bursts. The gain comes from the complete absence of internal bookkeeping.

2. Pool allocator

Principle: a block of memory divided into fixed-size cells. A linked list links the free cells. Allocation pops the first free cell; free pushes it back at the head of the list.

typedef struct PoolFreeNode {
    struct PoolFreeNode *next;
} PoolFreeNode;

typedef struct {
    unsigned char *buffer;
    size_t cell_size;
    size_t cell_count;
    PoolFreeNode *free_list;
} PoolAllocator;

void *pool_allocate(PoolAllocator *allocator) {
    if (!allocator->free_list) return NULL;
    PoolFreeNode *node = allocator->free_list;
    allocator->free_list = node->next;
    return node;
}

void pool_deallocate(PoolAllocator *allocator, void *pointer) {
    PoolFreeNode *node = pointer;
    node->next = allocator->free_list;
    allocator->free_list = node;
}

Cost: one pointer operation per allocation and per free. Guaranteed O(1), no fragmentation possible (all cells have the same size).

Use case: when you manage many objects of the same size — tree nodes, table entries, network connections, game engine entities. bc-duplicate uses a pool allocator for its file entries: every entry is exactly the same size, hundreds of thousands are allocated and freed during execution.

Limitation: a single block size. If you need 32-byte blocks and 256-byte blocks, you need two separate pools (or a different allocator).

3. Free-list allocator

Principle: like malloc, but simplified. One block of memory, a linked list of free regions of variable sizes. On each allocation, you walk the list to find a large-enough block. On each free, you push the block back into the list and coalesce adjacent blocks.

typedef struct FreeBlock {
    size_t size;
    struct FreeBlock *next;
} FreeBlock;

typedef struct {
    unsigned char *buffer;
    size_t capacity;
    FreeBlock *free_list;
} FreeListAllocator;

void *freelist_allocate(FreeListAllocator *allocator, size_t size) {
    FreeBlock *previous = NULL;
    FreeBlock *current = allocator->free_list;

    while (current) {
        if (current->size >= size) {
            if (previous)
                previous->next = current->next;
            else
                allocator->free_list = current->next;
            return (unsigned char *)current + sizeof(FreeBlock);
        }
        previous = current;
        current = current->next;
    }
    return NULL;
}

void freelist_deallocate(FreeListAllocator *allocator, void *pointer) {
    FreeBlock *block = (FreeBlock *)((unsigned char *)pointer - sizeof(FreeBlock));
    block->next = allocator->free_list;
    allocator->free_list = block;
}

Allocation walks the list (first-fit strategy here) and returns the first block that’s large enough. Free pushes the block at the list’s head. This minimal version doesn’t split blocks and doesn’t coalesce neighbors — a complete implementation would add both to keep fragmentation in check.

Cost: O(n) worst case (n = number of free blocks). Slower than bump and pool, but lets you free any block individually.

Use case: when you need variable-sized blocks with individual free, but want to avoid malloc (for instance, to control total memory used, or to avoid syscalls). Game engines often use a free-list for resources with unpredictable lifetimes.

Limitation: fragmentation. Over time, memory splinters into small unusable blocks. Coalescing strategies help but don’t eliminate it. That’s exactly the problem malloc addresses with arenas, bins, and fallback mmap.

4. Arena allocator

Principle: a bump allocator that knows how to grow. When the current block is full, the arena allocates a new block and keeps going. Deallocation stays collective: everything is freed at once.

typedef struct ArenaBlock {
    unsigned char *buffer;
    size_t capacity;
    size_t offset;
    struct ArenaBlock *next;
} ArenaBlock;

typedef struct {
    ArenaBlock *current;
    size_t default_block_size;
} ArenaAllocator;

Allocation works like a bump. If the current block is out of room, a new block is chained and becomes the current. Reset walks the chain and frees or resets each block.

Cost: almost identical to the bump — one addition and one comparison per allocation, plus a rare block allocation. Amortization is excellent.

Use case: the most versatile of the four. It combines bump speed with the ability to grow without a predetermined cap. Ideal when object sizes vary, the total count is not known ahead of time, and everything will be freed together at the end of processing — for instance, file paths accumulated during a directory walk.

Compilers use arenas heavily: each phase (lexing, parsing, code generation) allocates into its own arena, then everything is freed at the end of the phase.

Limitation: like the bump, no individual free. And the block chain can waste memory if block size is poorly calibrated (too small = too many system allocations, too large = unused memory).

When to use what

AllocatorIndividual freeVariable sizeCost per allocationTypical use case
BumpNoYesO(1), near zeroBatch processing
PoolYesNoO(1)Homogeneous objects
Free-listYesYesO(n) worst caseUnpredictable lifetimes
ArenaNoYesO(1) amortizedCompiler phases, parsers

The real message

These four allocators don’t replace malloc. They complement it. malloc remains the right choice when you don’t know your allocation patterns ahead of time, when portability matters, when the code is called by third parties that expect free().

But in your own code, in critical loops, in the paths where you allocate thousands of objects per second — knowing these alternatives gives you a choice. And choice, in C, is the reason you picked the language.

The source code of all four allocators is available in bc-allocators under LGPLv3 (library, links cleanly into proprietary code).