# Windows (v5.0.6): out-of-bounds heap write in monkey libevent backend (cb_event) under high event load — out_forward to an unreachable upstream #11905

@reto-bachmann

Summary

On Windows, an out_forward output pointing at an upstream that stays unreachable accumulates a large number of retry timers (each a socketpair + libevent event on the engine event base). The monkey libevent backend (lib/monkey/mk_core/mk_event_libevent.c) collects ready events in a fixed-size ctx->fired array that is allocated once at loop creation (256 entries for the engine loop) and never grows, while cb_event appends to it with no bounds check. When more than queue_size events become ready in a single event_base_loop() pass, cb_event writes past the end of the array — an out-of-bounds heap write that corrupts adjacent allocations or, when it reaches the guard/unmapped page, faults directly.

Confirmed on v5.0.6 (latest release) under full page heap: the faulting write lands exactly on the guard page immediately after ctx->fired, and the debugger names the overrun variable fired. The same defect is also present in v4.0.13 (identical fault, identical Windows FAILURE_ID_HASH), so this is a long-standing issue, not a recent regression.

Environment

  • Fluent Bit v5.0.6 — official Windows x64 build (fluent-bit-5.0.6-win64, FileVersion 5.0.6.0); source tag v5.0.6. Crash reproduced in ~7–22 minutes.
  • Windows Server 2019 (10.0.17763), x64, 4 processors.
  • Output: forward with TLS + Upstream, multiple workers; continuous input (tail / Windows event log).
  • libevent (bundled) built without thread locking (evthread_use_* is never called).
  • Also reproduced on v4.0.13 (see "Also present in v4.0.13").

Reproduction

Point an out_forward at a black-hole address so that every connect attempt runs into the timeout and retries pile up. The crash was first seen in production while the real destination was unavailable; the black-hole address only makes it reproduce faster.

[OUTPUT]
    Name         forward
    Match        *
    Host         10.255.255.1   # black hole, no RST
    Port         24224
    Retry_Limit  false          # unlimited retries -> many concurrent retry timers

Drive continuous input so the engine keeps scheduling flushes/retries. The process crashes after minutes. Capture with:
procdump -accepteula -ma -t -e -w fluent-bit.exe C:\dumps

Root cause

The engine event loop is created with a fixed size:

evl = mk_event_loop_create(256);          /* src/flb_engine.c  */

which allocates the fired array exactly once and records its capacity:

/* _mk_event_loop_create() */
ctx->fired = mk_mem_alloc_z(sizeof(struct mk_event) * size);   /* size = 256 */
ctx->queue_size = size;

_mk_event_add() registers further fds into libevent (event_new(... cb_event, event); event_add(...)) without ever growing ctx->fired or queue_size. The number of registered events is therefore unbounded, but the fired array stays at 256.

cb_event() appends one entry per ready event, with no bounds check:

/* cb_event(), mk_event_libevent.c */
i = ctx->fired_count;
fired = &ctx->fired[i];
fired->fd   = event->fd;       /* line 99 */
fired->mask = mask;            /* line 100 */
fired->data = event;
ctx->fired_count++;

fired_count is reset to 0 before each loop and counts up across all events fired in that pass:

/* _mk_event_wait_with_flags() */
ctx->fired_count = 0;
event_base_loop(ctx->base, flags);

When more than queue_size (256) events become ready in a single pass, &ctx->fired[fired_count] walks past the allocation and cb_event corrupts whatever follows it on the heap. (The same unchecked append exists in _mk_event_inject().)
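
The overflow condition can be illustrated with a small standalone model. This is only a sketch: the `model_*` names are hypothetical stand-ins, not the monkey types, and the model adds the capacity check that cb_event lacks so that it can count the out-of-bounds appends instead of performing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one slot of ctx->fired. */
struct model_fired {
    int   fd;
    int   mask;
    void *data;
};

/* Hypothetical stand-in for the loop context. */
struct model_ctx {
    struct model_fired *fired;  /* allocated once, never grown */
    int queue_size;             /* capacity recorded at creation */
    int fired_count;            /* reset to 0 before every pass */
    int out_of_bounds;          /* appends that would land past the array */
};

static int model_ctx_init(struct model_ctx *ctx, int size)
{
    ctx->fired = calloc((size_t) size, sizeof(struct model_fired));
    if (ctx->fired == NULL) {
        return -1;
    }
    ctx->queue_size    = size;
    ctx->fired_count   = 0;
    ctx->out_of_bounds = 0;
    return 0;
}

/* Same append pattern as cb_event, plus a check so the model only counts
 * the writes that the real code would perform out of bounds. */
static void model_cb_event(struct model_ctx *ctx, int fd, int mask)
{
    struct model_fired *fired;

    if (ctx->fired_count >= ctx->queue_size) {
        ctx->out_of_bounds++;
        return;
    }
    assert(ctx->fired_count < ctx->queue_size);
    fired = &ctx->fired[ctx->fired_count];
    fired->fd   = fd;
    fired->mask = mask;
    fired->data = NULL;
    ctx->fired_count++;
}

/* One simulated loop pass in which `ready` events become ready.
 * Returns how many appends fell outside the array. */
static int model_loop_pass(struct model_ctx *ctx, int ready)
{
    int i;

    ctx->fired_count   = 0;
    ctx->out_of_bounds = 0;
    for (i = 0; i < ready; i++) {
        model_cb_event(ctx, i, 1);
    }
    return ctx->out_of_bounds;
}
```

With a capacity of 256, a pass with 256 ready events stays in bounds, while a pass with 257 produces exactly one append past the end, which is the write that faults in the dumps below.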

Faulting dump (v5.0.6, without page heap)

fluent_bit!cb_event+0xa8                       [mk_event_libevent.c @ 100]   <-- mov [rax+8],ecx
fluent_bit!event_persist_closure+0x2f6         [libevent/event.c @ 1580]
fluent_bit!event_process_active_single_queue   [libevent/event.c @ 1639]
fluent_bit!event_process_active                [libevent/event.c @ 1738]
fluent_bit!event_base_loop+0x296               [libevent/event.c @ 1961]
fluent_bit!_mk_event_wait_with_flags+0x3a      [mk_event_libevent.c @ 456]
fluent_bit!mk_event_wait / flb_engine_start    [src/flb_engine.c @ 1141]

rax = 0x0000023e9628eff8, write to [rax+8] = 0x0000023e9628f000 (next, unmapped page), ecx = 1 (MK_EVENT_READ). struct mk_event is { int fd; int type; uint32_t mask; ... }, so offset 8 is mask — the faulting instruction is exactly fired->mask = mask, with fired = &ctx->fired[fired_count] at the end of the 256-entry allocation.
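
The address arithmetic above can be checked mechanically. The sketch below is an illustration only: `demo_mk_event_head` is a hypothetical struct that reproduces just the three leading fields quoted above, and the 4 KiB page size is the usual value on Windows x64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: only the leading fields of struct mk_event as quoted in
 * the report; the remaining fields are omitted. */
struct demo_mk_event_head {
    int      fd;
    int      type;
    uint32_t mask;
};

#define DEMO_PAGE_SIZE 0x1000u   /* 4 KiB pages on Windows x64 */

/* Address touched by `mov [rax+8],ecx`, i.e. the mask field of the entry
 * that rax points to. */
static uint64_t demo_write_address(uint64_t rax)
{
    return rax + offsetof(struct demo_mk_event_head, mask);
}

/* Nonzero when addr is the first byte of a page. */
static int demo_is_page_aligned(uint64_t addr)
{
    return (addr % DEMO_PAGE_SIZE) == 0;
}
```

Feeding in the rax value from the dump gives 0x0000023e9628f000, the first byte of the next page, which matches a write to fired->mask for an entry that begins 8 bytes before the page boundary.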

Confirmed under full page heap (v5.0.6)

Re-run with full page heap enabled (gflags /p /enable fluent-bit.exe /full; NTGLOBALFLAG: 2000000, APPLICATION_VERIFIER_LOADED: 1):

fluent_bit!cb_event+0x9d   [mk_event_libevent.c @ 99]   mov dword ptr [rax],ecx
FAULTING_LOCAL_VARIABLE_NAME:  fired
FAILURE_BUCKET_ID:  INVALID_POINTER_WRITE_AVRF_c0000005_fluent-bit.exe!cb_event

rax = 0x0000025a923e6000 is exactly page-aligned — the page-heap guard page placed immediately after the ctx->fired allocation. With page heap the fault now occurs on the first field write of the entry (fired->fd = event->fd, line 99, ecx = the fd value) rather than fired->mask, because the whole entry now starts past the array end. The debugger names the overrun target directly: FAULTING_LOCAL_VARIABLE_NAME: fired. The guard-page alignment places this write at index 256 of the 256-entry array — the append for the 257th simultaneously-ready event in one event_base_loop pass. This is a definitive heap buffer overrun of ctx->fired, not a use-after-free.

Also present in v4.0.13

Under full page heap, v4.0.13 faults identically (cb_event, line 99, FAULTING_LOCAL_VARIABLE_NAME: fired, write on the guard page after ctx->fired) and Windows assigns it the same FAILURE_ID_HASH as the v5.0.6 page-heap crash — i.e. it is classified as the same defect. Without page heap the overrun surfaced in v4.0.13 as roaming corruption of adjacent structures (a libevent timer min-heap and the engine event priority queue, with stray fd-range integers and partial-pointer overwrites), consistent with struct mk_event entries written past ctx->fired. The defect is unchanged across releases.

Secondary defect — timeout teardown (also present in v5.0.6)

Independent of the overflow, the timer teardown double-closes the read-end fd and has two owners freeing the same ev_map:

  • _mk_event_timeout_destroy() closes event->fd (= ev_map->pipe[0]) without invalidating it (setting it to -1), then calls _mk_event_del(), which closes ev_map->pipe[0] again. On Windows the fd is reused immediately, so the second close can hit a socket now owned by another event_base.
  • cb_timeout() self-frees ev_map on send failure while _mk_event_del() also frees it (double-free / UAF).

Worth fixing in the same pass, but not the corruptor demonstrated above.

Suggested fixes

1. Bound / grow the fired array (primary)

The number of events that can fire in one event_base_loop() pass is limited only by the number of registered events, which is unbounded, so the fixed-capacity fired array must be able to grow with it. Both append sites (cb_event and _mk_event_inject) need the guard, so factor it into one helper in mk_event_libevent.c:

/* Append a fired event, growing ctx->fired on demand so it can never
 * overflow when more events fire in one loop pass than queue_size. */
static inline int mk_event_fired_push(struct mk_event_ctx *ctx,
                                      evutil_socket_t fd, int mask,
                                      struct mk_event *event)
{
    struct mk_event *tmp;
    int new_size;

    if (ctx->fired_count >= ctx->queue_size) {
        new_size = (ctx->queue_size > 0) ? (ctx->queue_size * 2) : 256;
        tmp = mk_mem_realloc(ctx->fired, sizeof(struct mk_event) * new_size);
        if (tmp == NULL) {
            return -1;            /* OOM: drop rather than overflow */
        }
        ctx->fired      = tmp;
        ctx->queue_size = new_size;
    }

    ctx->fired[ctx->fired_count].fd   = fd;
    ctx->fired[ctx->fired_count].mask = mask;
    ctx->fired[ctx->fired_count].data = event;
    ctx->fired_count++;

    return 0;
}

cb_event then becomes:

static void cb_event(evutil_socket_t fd, short flags, void *data)
{
    int mask = 0;
    struct mk_event *event = data;
    struct ev_map  *map    = event->data;

    if (flags & EV_READ)  mask |= MK_EVENT_READ;
    if (flags & EV_WRITE) mask |= MK_EVENT_WRITE;

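    /* On OOM the event is dropped for this pass instead of overflowing;
     * the return value is intentionally ignored. */
    (void) mk_event_fired_push(map->ctx, event->fd, mask, event);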
}

and the append block in _mk_event_inject:

    event->mask = mask;
    if (mk_event_fired_push(ctx, event->fd, mask, event) == 0) {
        loop->n_events++;
    }
    return 0;

The mk_mem_realloc happens inside cb_event during event_base_loop(), but that is safe: no pointer into ctx->fired is cached across cb_event calls (each call re-indexes ctx->fired[ctx->fired_count]), libevent holds no pointer into it, and the consumer reads ctx->fired only after the loop returns. Simply raising the static 256 in mk_event_loop_create() is not a fix — it only moves the threshold.
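
The grow-on-demand behaviour can be exercised outside Fluent Bit with a small harness. This is a sketch under assumptions: the `grow_*` names are hypothetical, the standard `realloc` stands in for mk_mem_realloc, and the `INT_MAX / 2` guard against overflow when doubling is an addition that is not part of the helper proposed above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one slot of ctx->fired. */
struct grow_entry {
    int   fd;
    int   mask;
    void *data;
};

/* Hypothetical stand-in for the loop context. */
struct grow_ctx {
    struct grow_entry *fired;
    int queue_size;
    int fired_count;
};

/* Append one entry, doubling the array when it is full.
 * Returns 0 on success, -1 if the array could not grow. */
static int grow_fired_push(struct grow_ctx *ctx, int fd, int mask, void *data)
{
    struct grow_entry *tmp;
    int new_size;

    if (ctx->fired_count >= ctx->queue_size) {
        if (ctx->queue_size > INT_MAX / 2) {
            return -1;                /* doubling would overflow int */
        }
        new_size = (ctx->queue_size > 0) ? ctx->queue_size * 2 : 256;
        tmp = realloc(ctx->fired, sizeof(*tmp) * (size_t) new_size);
        if (tmp == NULL) {
            return -1;                /* old array is still valid */
        }
        ctx->fired      = tmp;
        ctx->queue_size = new_size;
    }
    assert(ctx->fired_count < ctx->queue_size);

    ctx->fired[ctx->fired_count].fd   = fd;
    ctx->fired[ctx->fired_count].mask = mask;
    ctx->fired[ctx->fired_count].data = data;
    ctx->fired_count++;
    return 0;
}

/* Simulate one pass with n ready events, then verify that every slot
 * survived the reallocations. Returns 0 on success. */
static int grow_push_many(struct grow_ctx *ctx, int n)
{
    int i;

    ctx->fired_count = 0;
    for (i = 0; i < n; i++) {
        if (grow_fired_push(ctx, i, 1, NULL) != 0) {
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        if (ctx->fired[i].fd != i) {
            return -1;
        }
    }
    return 0;
}
```

Starting from an empty context, a pass with 1000 events grows the array from 256 to 512 to 1024 entries and keeps all previously written slots intact, because every append re-indexes from ctx->fired rather than holding a pointer across calls.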

2. Single-owner timeout teardown (secondary)

static inline int _mk_event_timeout_destroy(struct mk_event_ctx *ctx, void *data)
{
    if (data == NULL) {
        return 0;
    }
    /* _mk_event_del() is the single owner: it closes both pipe ends, sets them
     * to -1, and frees the event + ev_map exactly once. Do NOT pre-close
     * event->fd here -- it aliases ev_map->pipe[0] and would be closed twice
     * (the second close can hit a fd already reused by another event_base). */
    return _mk_event_del(ctx, (struct mk_event *) data);
}

static void cb_timeout(evutil_socket_t fd, short flags, void *data)
{
    uint64_t val = 1;
    struct ev_map *ev_map = data;
    /* Signal only. Lifetime is owned solely by the explicit destroy path; never
     * free here, or it races with _mk_event_del() and double-frees ev_map. */
    (void) send(ev_map->pipe[1], (char *) &val, sizeof(uint64_t), 0);
}

This makes the explicit destroy path the sole owner. It assumes every timeout is torn down via mk_event_timeout_destroy(); any timeout that relied on cb_timeout's self-cleanup (on read-end close) should be reviewed before adopting this.
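
The single-owner rule can be modelled without real sockets. The sketch below is hypothetical throughout: the `td_*` names are illustrative, and a counter array replaces the real close call so that a double close is observable rather than harmful.

```c
#include <assert.h>
#include <stddef.h>

#define TD_MAX_FD 16

/* Hypothetical: records how many times each descriptor number is closed. */
static int td_close_calls[TD_MAX_FD];

static void td_close(int fd)
{
    if (fd >= 0 && fd < TD_MAX_FD) {
        td_close_calls[fd]++;
    }
}

struct td_ev_map {
    int pipe[2];
};

struct td_event {
    int fd;                     /* aliases map->pipe[0] */
    struct td_ev_map *map;
};

/* Close a descriptor at most once and invalidate the slot, so that any
 * later call on the same slot is a no-op. */
static void td_close_once(int *fd)
{
    if (*fd >= 0) {
        td_close(*fd);
        *fd = -1;
    }
}

/* Single owner: the only place where the pipe ends are closed. */
static void td_event_del(struct td_event *ev)
{
    td_close_once(&ev->map->pipe[0]);
    td_close_once(&ev->map->pipe[1]);
    ev->fd = -1;                /* drop the alias as well */
}

/* Destroy delegates to the owner and never pre-closes ev->fd. */
static void td_timeout_destroy(struct td_event *ev)
{
    if (ev == NULL) {
        return;
    }
    td_event_del(ev);
}
```

In this model, calling td_timeout_destroy() twice on the same event still closes each pipe end exactly once, because the slot is set to -1 at the point of the close.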

Capture notes (for maintainers)

The overflow is confirmed under full page heap (see "Confirmed under full page heap" above): the fault lands on the guard page immediately after ctx->fired, and the debugger identifies the overrun variable as fired. !heap -p -a <addr-inside-block> shows the offending allocation originates from _mk_event_loop_create (mk_mem_alloc_z(sizeof(struct mk_event) * 256)).

Workaround for affected users (mitigation, not a fix)

Keep the number of simultaneously-ready events on the engine loop well under the 256-entry fired capacity:

  • keep storage.max_chunks_up well below 256 (the default is 128), which caps the concurrent up-chunk/task/timer population
  • finite Retry_Limit on the forward output — failed chunks leave the retry population instead of accumulating
  • storage.total_limit_size to bound the per-output backlog (drops oldest chunks)
  • log_level info to cut log-pipe pressure

These keep the load under the overflow threshold but do not remove the bug; safety depends on chunk size and burst patterns. Lowering log_level alone did not prevent the crash in testing.
