
Single-shot reliable consumers with XREADGROUP CLAIM in Redis 8.4

May 26, 2026 · 15 minute read
Sergey Georgiev

In Redis 8.4, we extended XREADGROUP with a new optional CLAIM parameter that lets a single command both consume new stream entries and reclaim idle pending ones. In this blog post, we'll cover:

  • Why reliable Redis Streams consumers historically required multiple commands per loop iteration
  • How the new CLAIM option collapses that loop into a single round trip
  • The performance gains it delivers — up to 22.5x faster than XAUTOCLAIM
  • The data structures that make it efficient, and how we made them even leaner with a follow-up optimization

Should I care? If you use Redis Streams consumer groups and want workers to automatically recover messages abandoned by crashed, slow, or unhealthy consumers, then yes. Before Redis 8.4, that recovery usually required a loop combining XPENDING, XCLAIM or XAUTOCLAIM, and XREADGROUP. Now a single XREADGROUP call can first reclaim idle pending messages and then read new ones. You still need to XACK messages after successful processing — CLAIM simplifies recovery, but it does not replace acknowledgements.

A quick primer on the Pending Entries List

Redis Streams is an append-only log data structure introduced in Redis 5.0. On top of it, consumer groups give you the building blocks for reliable, at-least-once message processing across a fleet of workers.

The mechanism that makes this reliability possible is the Pending Entries List (PEL). Here's how it works.

When a client reads messages from a stream with XREADGROUP, it identifies itself with a consumer group and a consumer name. For every message that gets delivered, Redis creates a pending entry inside the group. Internally, each pending entry is a small C struct called streamNACK (short for "negative acknowledgment" — it represents a message that hasn't yet been acknowledged). Each streamNACK records:

  • Which message it refers to (the stream ID)
  • Which consumer received it
  • When it was delivered (delivery timestamp)
  • How many times it has been delivered (delivery count)

You'll see streamNACK referenced throughout this post; whenever it appears, just read it as "the record for one pending entry."

The pending entry sits in the PEL until the client confirms it has finished processing the message by calling XACK. Once acknowledged, the entry is removed and Redis considers that message done.

This is what makes Streams suitable for work queues and event pipelines: if a consumer crashes mid-processing — or simply takes too long — its pending entries stay in the PEL. Another consumer in the same group can then take ownership of those entries with XCLAIM and pick up the work. To find out which entries are eligible for this kind of takeover, clients use XPENDING, which can filter pending entries by how long they've been idle.

There's an important corollary: a pending entry that nobody acknowledges never goes away. It stays in the PEL forever and continues to consume memory. Correct consumer-group usage isn't just about making sure every message gets processed — it's about making sure every message gets acknowledged, even if processing fails. The PEL is the heart of fault tolerance in Streams, and it's also where the operational complexity lives.
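Here is a toy model of the PEL lifecycle described above. A delivery creates (or refreshes) a record holding the four streamNACK fields, XACK removes it, and anything that is never acknowledged stays put and eventually shows up as idle. The class and method names are illustrative only, not Redis internals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PendingEntry:
    """Illustrative stand-in for one streamNACK record (field names are hypothetical)."""
    stream_id: tuple[int, int]  # (milliseconds, sequence), e.g. 1700000000000-0
    consumer: str               # consumer that currently owns the entry
    delivery_time: int          # ms timestamp of the last delivery
    delivery_count: int         # how many times the entry has been delivered


class ConsumerGroupPEL:
    """Toy Pending Entries List: deliveries add entries, XACK removes them."""

    def __init__(self) -> None:
        self.pending: dict[tuple[int, int], PendingEntry] = {}

    def deliver(self, stream_id: tuple[int, int], consumer: str, now_ms: int) -> PendingEntry:
        entry = self.pending.get(stream_id)
        if entry is None:
            entry = PendingEntry(stream_id, consumer, now_ms, 0)
            self.pending[stream_id] = entry
        # A (re)delivery transfers ownership, refreshes the timestamp, and bumps the count.
        entry.consumer = consumer
        entry.delivery_time = now_ms
        entry.delivery_count += 1
        return entry

    def ack(self, stream_id: tuple[int, int]) -> bool:
        # XACK semantics: the entry leaves the PEL; acking an unknown ID is a no-op.
        return self.pending.pop(stream_id, None) is not None

    def idle(self, now_ms: int, min_idle_ms: int) -> list[PendingEntry]:
        # Roughly what an XPENDING IDLE filter reports: entries idle for >= min_idle_ms.
        return [e for e in self.pending.values() if now_ms - e.delivery_time >= min_idle_ms]
```

An entry that is delivered but never acknowledged stays in `pending` indefinitely, which is exactly the memory-growth corollary described above.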

The reliable consumer loop, before Redis 8.4

To build a correct consumer that handles both fresh messages and orphaned pending ones, you need three commands working together:

  • XREADGROUP with the special > ID, which means "give me messages never delivered to any consumer in this group"
  • XPENDING to discover entries that have been idle long enough to be considered abandoned
  • XCLAIM or XAUTOCLAIM to take ownership of those idle entries

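A typical loop runs, on every iteration, XPENDING to find idle entries and XCLAIM to take them over (or XAUTOCLAIM to do both at once), followed by XREADGROUP with > to fetch new work.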
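As a rough sketch of that loop, here is one iteration written against a hypothetical `call` transport: any client that can send a raw command and return a decoded reply would do, and the flattened reply shapes are a simplifying assumption. It uses XPENDING with its IDLE filter plus XCLAIM; an XAUTOCLAIM variant would fold the first two steps into one call. The fake transport at the end only records which commands were sent, which makes the round-trip cost visible.

```python
from typing import Any, Callable

# Hypothetical transport: sends one command (list of strings) and returns the decoded
# reply, flattened for simplicity to a list of rows/entries. Not a real client API.
Call = Callable[[list[str]], Any]


def legacy_iteration(call: Call, stream: str, group: str, consumer: str,
                     min_idle_ms: int, count: int) -> list:
    """One pass of the pre-8.4 reliable-consumer loop (a sketch, not production code)."""
    work = []
    # Round trip 1: which entries have been idle for at least min_idle_ms?
    pending = call(["XPENDING", stream, group, "IDLE", str(min_idle_ms),
                    "-", "+", str(count)])
    if pending:
        # Round trip 2 (only when something is idle): take ownership of those IDs.
        ids = [row[0] for row in pending]
        work += call(["XCLAIM", stream, group, consumer, str(min_idle_ms), *ids])
    # Round trip 2 or 3: fetch never-delivered messages with the special ">" ID.
    work += call(["XREADGROUP", "GROUP", group, consumer,
                  "COUNT", str(count), "STREAMS", stream, ">"]) or []
    return work


class RecordingTransport:
    """Fake transport that records command names and returns canned replies."""

    def __init__(self, replies: dict[str, Any]) -> None:
        self.replies = replies
        self.sent: list[str] = []

    def __call__(self, cmd: list[str]) -> Any:
        self.sent.append(cmd[0])
        return self.replies.get(cmd[0], [])
```

With nothing idle, an iteration still costs two round trips (XPENDING, XREADGROUP); with idle entries it costs three.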

This pattern works, but it has real costs:

  1. Extra round trips. Every iteration of every consumer makes two or three calls to Redis even when nothing needs to be claimed.
  2. Implementation burden. Each application reimplements the same orchestration logic — often subtly wrong. Multi-stream consumers compound the complexity.
  3. Inefficient scans. XAUTOCLAIM walks the PEL in stream-ID order and checks idle time per entry. With a large PEL and few idle entries, most of that scan is wasted work.

We wanted to give Streams users a single command that does the right thing.

Introducing the CLAIM option

At a high level, CLAIM tells XREADGROUP to do two things in one call: first sweep up any pending messages that have been sitting idle in the group for longer than a threshold you specify, then read new messages as usual. The reclaimed entries are returned alongside the new ones, with extra metadata so the consumer can tell them apart and decide how to handle them.

That single command does the same job the old recovery loop did, but with a few properties that are hard to get right by hand: it runs in one round trip, it prioritizes orphaned work over new arrivals, it blocks reactively on both new messages and aging pending entries, and it returns the metadata clients need to build retry caps and dead-letter logic. We'll cover each of those below. In Redis 8.4, XREADGROUP accepts an optional CLAIM min-idle-time argument, given before the STREAMS keyword like the other options (XREADGROUP ... CLAIM ... STREAMS ...).

When CLAIM min-idle-time is specified, the command does two things in order, sharing a single COUNT budget:

  1. Claim first. It scans for pending entries across the requested streams that have been idle for at least min-idle-time milliseconds, claims them for the calling consumer, and adds them to the response — up to COUNT entries total.
  2. Then read. If the claim step filled the COUNT budget, the command returns immediately with only reclaimed entries. Otherwise it spends the remaining budget reading new entries from the streams, exactly as a normal XREADGROUP would.

So COUNT caps the total number of entries returned, not each step independently. Reclaimed entries always get priority: a consumer never starts processing new work while there's old work it could be picking up instead.
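To make the shared COUNT budget concrete, here is a small in-memory model of the two steps. It is a simplification, not Redis's implementation: entry IDs are plain integers, the PEL is a dict, idle entries are claimed oldest-first, and the reported idle time and delivery count are taken before this delivery updates them, so new entries report 0 and 0.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy consumer-group state (hypothetical; not Redis's internal layout)."""
    last_delivered: int = 0                              # highest ID handed out via ">"
    pel: dict[int, dict] = field(default_factory=dict)   # id -> {consumer, time, count}


def xreadgroup_claim(group: Group, stream_ids: list[int], consumer: str,
                     count: int, min_idle_ms: int, now_ms: int) -> list[tuple[int, int, int]]:
    """Model of XREADGROUP ... COUNT count CLAIM min-idle-time STREAMS key >.

    Returns (id, idle_ms, delivery_count) triples: reclaimed entries first, then new
    entries, all drawn from one COUNT budget. stream_ids must be in ascending order.
    """
    out = []
    # Step 1: claim idle pending entries, oldest delivery first.
    idle = sorted((e["time"], i) for i, e in group.pel.items()
                  if now_ms - e["time"] >= min_idle_ms)
    for _, i in idle[:count]:
        e = group.pel[i]
        out.append((i, now_ms - e["time"], e["count"]))  # values before this delivery
        e.update(consumer=consumer, time=now_ms, count=e["count"] + 1)
    # Step 2: spend whatever budget is left on never-delivered entries.
    for i in stream_ids:
        if len(out) >= count:
            break  # claims used up the whole budget: return without reading new entries
        if i > group.last_delivered:
            group.pel[i] = {"consumer": consumer, "time": now_ms, "count": 1}
            group.last_delivered = i
            out.append((i, 0, 0))  # new entries report idle 0 and delivery count 0
    return out
```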

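The consumer loop above collapses to a single XREADGROUP call with CLAIM per iteration, plus an XACK for each entry that was processed successfully.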
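A sketch of a CLAIM-based iteration, written against a hypothetical `call` transport that sends one raw command and returns its decoded reply. CLAIM is placed before STREAMS as in XREADGROUP ... CLAIM ... STREAMS ...; the per-entry tuple shape `(id, fields, idle_ms, delivery_count)` is an assumption about how a client might decode the reply, not the exact RESP layout.

```python
from typing import Any, Callable

# Hypothetical transport: sends one command (list of strings), returns the decoded reply.
Call = Callable[[list[str]], Any]


def claim_iteration(call: Call, stream: str, group: str, consumer: str,
                    count: int, block_ms: int, min_idle_ms: int,
                    handle: Callable[[str, dict], None]) -> int:
    """One pass of a CLAIM-based consumer loop (sketch). Returns how many entries were acked."""
    # Single round trip: reclaim idle pending entries first, then read new ones with ">".
    batch = call(["XREADGROUP", "GROUP", group, consumer,
                  "COUNT", str(count), "BLOCK", str(block_ms),
                  "CLAIM", str(min_idle_ms), "STREAMS", stream, ">"])
    acked = 0
    # Assumed decoded shape per entry: (id, fields, idle_ms, delivery_count).
    for entry_id, fields, _idle_ms, _delivery_count in batch or []:
        handle(entry_id, fields)  # if this raises, the entry stays pending and can be reclaimed later
        # CLAIM does not acknowledge anything: XACK only after successful processing.
        acked += call(["XACK", stream, group, entry_id])
    return acked
```

Compared with the pre-8.4 loop, recovery and fresh reads share one command; the only other traffic is the XACK that every reliable consumer needs anyway.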

One command. One round trip. Idle pending entries are prioritized so that no message gets stuck indefinitely behind a backlog of new arrivals.

Response format with CLAIM

When CLAIM is in use, each returned entry carries two extra fields so the client can make informed decisions without going back to Redis:

  • Idle time (ms) — milliseconds since this entry was last delivered. A value greater than zero means the entry was reclaimed; zero means it's freshly delivered.
  • Delivery count — how many times this entry has been delivered. 0 for new messages, 1 or more for claimed ones.

These two fields are what make this useful for building self-healing consumers. With them in hand, a client can implement retry caps, route poison messages to a dead-letter stream, escalate critically delayed work, and detect stuck processing — all without an XPENDING call.
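A minimal sketch of the kind of per-entry policy those two fields enable. `MAX_DELIVERIES` and `ESCALATE_AFTER_MS` are hypothetical thresholds and the `Action` names are made up for illustration; the only inputs are the idle time and delivery count returned with each entry.

```python
from enum import Enum

# Hypothetical policy thresholds; tune them for your workload.
MAX_DELIVERIES = 5
ESCALATE_AFTER_MS = 60_000


class Action(Enum):
    PROCESS = "process"          # freshly delivered entry
    RETRY = "retry"              # reclaimed from a slow or crashed consumer
    ESCALATE = "escalate"        # reclaimed, but it sat idle for a worryingly long time
    DEAD_LETTER = "dead_letter"  # delivered too many times: likely a poison message


def triage(idle_ms: int, delivery_count: int) -> Action:
    """Classify one entry returned by XREADGROUP ... CLAIM using only its metadata."""
    if delivery_count == 0:
        return Action.PROCESS    # new entries report a delivery count of 0
    if delivery_count >= MAX_DELIVERIES:
        return Action.DEAD_LETTER
    if idle_ms >= ESCALATE_AFTER_MS:
        return Action.ESCALATE
    return Action.RETRY
```

On `DEAD_LETTER`, a consumer would typically XADD the entry to a separate dead-letter stream and then XACK it so it leaves the PEL; for the other outcomes it processes the entry and XACKs on success.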

If the client passes a specific message ID instead of > — meaning "replay my own pending history from this point" rather than "give me new messages" — CLAIM is ignored and the response uses the standard format.

Behavior with BLOCK

BLOCK is more interesting. A reliable consumer that blocks on new messages also wants to wake up the moment a pending entry crosses the min-idle-time threshold — otherwise the whole point of CLAIM is defeated during quiet periods.

To handle this, Redis tracks, per stream, the earliest timestamp at which the next pending entry will become claimable. A small bookkeeping function called from the blockedBeforeSleep hook checks that timestamp against the wall clock and wakes the relevant blocked clients exactly when needed. When multiple blocked clients are watching the same stream with different min-idle-time values, Redis keeps the minimum of their wakeup times so the earliest interested consumer is served first.

The result is reactive blocking: clients sleep efficiently and wake up as soon as either condition becomes true — new entries arrive, or pending entries age into eligibility.
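A simplified model of that wake-up bookkeeping, assuming the relevant per-stream value is derived from the delivery time of the oldest pending entry, which is always the first to become claimable. The function names are hypothetical; the real logic is C code driven from blockedBeforeSleep.

```python
from __future__ import annotations


def next_claim_wakeup(oldest_delivery_ms: int | None,
                      blocked_min_idle_ms: list[int]) -> int | None:
    """Earliest time (ms) at which any blocked CLAIM client could claim something."""
    if oldest_delivery_ms is None or not blocked_min_idle_ms:
        return None  # nothing pending, or nobody waiting with CLAIM: only new entries wake clients
    # The oldest pending entry crosses a threshold first, and the smallest threshold wins.
    return oldest_delivery_ms + min(blocked_min_idle_ms)


def clients_to_wake(now_ms: int, oldest_delivery_ms: int | None,
                    blocked: dict[str, int]) -> list[str]:
    """Pre-sleep check: which blocked clients (name -> min-idle-time) can claim right now?"""
    if oldest_delivery_ms is None:
        return []
    idle_ms = now_ms - oldest_delivery_ms
    return sorted(name for name, min_idle in blocked.items() if idle_ms >= min_idle)
```

Keeping only the minimum wake-up time per stream means one comparison against the wall clock per stream per event-loop cycle, rather than one per blocked client.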

Performance impact

The whole reason to add this option — beyond ergonomics — is that it can be implemented far more efficiently than the multi-command alternative. The bottleneck in the old approach is finding which pending entries are idle. XAUTOCLAIM solves this by scanning the PEL in stream-ID order, which has no relationship to delivery time, so it ends up checking many entries that aren't yet eligible.

We took a different approach: maintain a separate, time-ordered index of the PEL so finding idle entries is a range query rather than a scan.

When this helps most

The speedup is largest when your PEL is large but only a small number of messages are actually idle enough to reclaim. That's common when consumers are mostly healthy and only a few messages get stuck because a worker crashed or timed out. In that case XAUTOCLAIM may scan many pending entries to find a few reclaimable ones, while XREADGROUP CLAIM can go directly to the idle entries. If your PEL is small, or most pending messages are already idle, the speedup will be smaller.

Latency benchmarks

To compare the two approaches, we designed a test that stresses the case where XAUTOCLAIM does the most wasted work: a large PEL where only a small fraction of entries are actually idle enough to claim. This is a realistic production scenario — a backlog of recently-delivered work where a handful of stragglers have timed out — and it's exactly the shape that the time-ordered index is designed to handle well.

Test setup:

  1. Insert 20,000 messages into a stream
  2. Read all of them with XREADGROUP to fully populate the PEL
  3. Set idle time to 1100 ms on 1,000 randomly selected pending messages (the 5% that are eligible to claim)
  4. Set idle time to 50 ms on the remaining 19,000 (ineligible)
  5. Execute the target command with min-idle-time=1000 and COUNT=1000 to claim the eligible entries
  6. Repeat steps 3–5 for 1,000 iterations

| Metric | XAUTOCLAIM | XREADGROUP CLAIM | Improvement |
|---|---|---|---|
| Average | 54.671 ms | 2.426 ms | 95.6% lower |
| Median | 53.582 ms | 2.571 ms | 95.2% lower |
| P95 | 62.536 ms | 3.370 ms | 94.6% lower |
| P99 | 68.800 ms | 4.212 ms | 93.9% lower |
| Max | 71.596 ms | 4.653 ms | — |

That's 22.5x faster on average on this workload, with a much tighter tail. The improvement isn't a constant-factor win from removing a round trip; it's algorithmic. The new index turns a per-entry scan into an O(log n + k) range query, where k is the number of idle entries actually returned. (As we'll see further down, a follow-up optimization brought this down to O(k) by replacing the index with a simpler structure.)
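The ratios behind the headline number can be recomputed from the table. Only the average reaches 22.5x; the median is about 20.8x and the tail percentiles come in around 15x to 19x. Meanwhile, the absolute spread between median and max shrinks from roughly 18 ms to about 2 ms.

```python
# Latencies from the benchmark table, in milliseconds.
XAUTOCLAIM_MS = {"avg": 54.671, "median": 53.582, "p95": 62.536, "p99": 68.800, "max": 71.596}
CLAIM_MS = {"avg": 2.426, "median": 2.571, "p95": 3.370, "p99": 4.212, "max": 4.653}


def improvements() -> dict[str, tuple[float, float]]:
    """Per metric: (speedup factor, fractional latency reduction)."""
    return {
        m: (XAUTOCLAIM_MS[m] / CLAIM_MS[m], 1 - CLAIM_MS[m] / XAUTOCLAIM_MS[m])
        for m in XAUTOCLAIM_MS
    }


for metric, (speedup, reduction) in improvements().items():
    print(f"{metric:>6}: {speedup:4.1f}x faster, {reduction:.1%} lower")
```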

Where does the 22.5x come from?

The arithmetic here is worth unpacking, because it tells you when to expect this kind of speedup in your own workload — and when not to.

XAUTOCLAIM walks the PEL in stream-ID order, which has no relationship to delivery time. To find the 1,000 eligible entries in this test, it has to examine roughly all 20,000 pending entries: that's O(n). XREADGROUP CLAIM uses the time-ordered index, so beyond the O(log n) step to locate the range it visits only the 1,000 entries it actually returns: O(log n + k), which is effectively O(k) at this scale. The theoretical ratio is n / k = 20,000 / 1,000 = 20x, and the measured 22.5x is close to that.
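A toy simulation of the benchmark's shape makes the n / k argument concrete: 20,000 pending entries, 1,000 of them idle, and a count of how many entries each strategy has to inspect. It models only the access pattern (an ID-ordered scan versus a walk over a time-ordered sequence), not Redis's actual XAUTOCLAIM or index code; the sort stands in for an index that Redis maintains incrementally and is not counted.

```python
import random

N, K = 20_000, 1_000            # pending entries, and how many of them are idle enough
MIN_IDLE_MS, NOW_MS = 1_000, 10_000

rng = random.Random(42)
idle_ids = set(rng.sample(range(N), K))
# Stream IDs are 0..N-1 (already in ID order). Eligible entries have been idle 1,100 ms,
# the rest only 50 ms, mirroring the benchmark setup.
delivery_time = {i: NOW_MS - (1_100 if i in idle_ids else 50) for i in range(N)}


def is_idle(i: int) -> bool:
    return NOW_MS - delivery_time[i] >= MIN_IDLE_MS


def scan_in_id_order() -> tuple[int, list[int]]:
    """Walk entries in stream-ID order, checking each one, until K idle entries are found."""
    visited, found = 0, []
    for i in range(N):
        visited += 1
        if is_idle(i):
            found.append(i)
            if len(found) == K:
                break
    return visited, found


def walk_in_time_order() -> tuple[int, list[int]]:
    """Walk a time-ordered sequence from the oldest entry; stop at the first non-idle one."""
    by_time = sorted(range(N), key=lambda i: (delivery_time[i], i))  # stands in for the index
    visited, found = 0, []
    for i in by_time:
        visited += 1
        if not is_idle(i):
            break
        found.append(i)
    return visited, found


scanned, _ = scan_in_id_order()
walked, _ = walk_in_time_order()
print(f"ID-order scan inspected {scanned} entries; time-ordered walk inspected {walked}")
```

The scan has to reach the last idle entry, which with random placement sits near the end of the PEL, so it inspects almost all 20,000 entries; the walk inspects the 1,000 idle entries plus the one that ends the walk.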

So the speedup scales with the ratio of total PEL size to actually-idle entries. The bigger the gap, the bigger the win:

  • PEL mostly fresh, few stragglers (this benchmark): large speedup, because XAUTOCLAIM wastes work on every ineligible entry.
  • PEL mostly idle (e.g., 18,000 of 20,000 eligible): much smaller speedup. Both approaches end up touching most of the PEL, so they do similar amounts of work.
  • Small PEL: the difference shrinks too; constants start to dominate.

The headline number isn't "every workload becomes 22.5x faster" — it's "the pathological case for XAUTOCLAIM stops being pathological." That case happens to be very common in production: a busy stream where consumers are mostly keeping up, with occasional stuck messages that need to be reclaimed. That's exactly the regime where reliability matters most, and where the old approach paid the highest cost.

The cost we paid

The time-ordered index isn't free, and it's worth being explicit about what it costs — because some of those costs are paid even by consumers that don't use CLAIM.

Extra work on the write path. Every time XREADGROUP delivers a message, Redis now has to insert a key for the new pending entry into the time-ordered index (pel_by_time, a rax tree described below) in addition to the existing PEL structures. Every XACK has to remove that key. Every XCLAIM and re-delivery has to update its position. Each of those rax-tree operations is O(log n) with a non-trivial constant: a 32-byte key traversal plus tree rebalancing. That's overhead on Redis's hot path, and it's paid whether or not anything ever queries the index.

For consumers that do use CLAIM, the trade is obviously worth it. For consumers that don't, they're paying a small tax for a feature they aren't using. We'll come back to this in a moment, because it's a big part of why the second optimization was worth doing.

Memory footprint. To measure the index's memory cost, we ran a separate test:

  1. Insert 200,000 messages into a stream
  2. Read them in blocks of 100 with XREADGROUP, populating the PEL
  3. Wait 5 ms between blocks to simulate realistic processing delays
  4. Compare memory used with and without the index

| | Without index | With index |
|---|---|---|
| After insertion | 6.80 MB | 6.81 MB |
| After reading | 41.53 MB | 45.07 MB |
| Increase from reading | 34.72 MB | 38.27 MB |

The index added 3.55 MB across 200,000 pending entries, or about 18.6 bytes per entry — a roughly 8.7% overhead on total memory. The overhead only applies to entries that are actually pending; once a message is acknowledged, its index entry goes away with it.

For most workloads these costs are acceptable, but they're real, and they motivated the follow-up work we describe next.

Under the hood

The rest of this post explains the internal Redis data structures that make CLAIM efficient. If you mainly want to use the feature, the key takeaway is that XREADGROUP CLAIM replaces the old multi-command recovery loop with a single command.

The original design: A time-ordered rax tree

The first version of the feature introduced a new rax (radix) tree per consumer group called pel_by_time. Each entry in this tree is keyed by the pending entry's delivery time followed by its stream ID.

The 32-byte composite key gives us three properties for free:

  • Uniqueness. Two pending entries can share a delivery time, but they can never share a stream ID. The composite is globally unique.
  • Time ordering. Rax trees sort lexicographically; with delivery_time as the prefix, that's equivalent to chronological order.
  • Range queries. "Find all entries idle for at least N ms" becomes a range scan from the start of the tree up to current_time - N. That's O(log n + k).

No node values were needed — the stream ID is embedded in the key itself, so once we've located an idle entry we can immediately retrieve its full streamNACK from the existing PEL structures.

This design shipped in the first version of XREADGROUP CLAIM and is what powered the benchmarks above.
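To illustrate the composite-key idea, here is a sketch that substitutes a sorted Python list and the standard `bisect` module for the rax tree. The key layout (delivery time, then the two halves of the stream ID) mirrors the description above; the class and method names are hypothetical. Note that changing an entry's delivery time means removing the old key and inserting a new one, which is the two-operation cost on the hot path.

```python
import bisect
import math


class TimeIndex:
    """Sorted-list stand-in for pel_by_time (a rax tree in Redis). Names are hypothetical."""

    def __init__(self) -> None:
        # Composite keys (delivery_time_ms, id_ms, id_seq): the time prefix gives chronological
        # order, and the stream-ID suffix keeps keys unique when delivery times collide.
        self.keys: list[tuple[int, int, int]] = []

    def insert(self, delivery_ms: int, stream_id: tuple[int, int]) -> None:
        bisect.insort(self.keys, (delivery_ms, *stream_id))  # list insert is O(n); a rax is O(log n)

    def remove(self, delivery_ms: int, stream_id: tuple[int, int]) -> None:
        key = (delivery_ms, *stream_id)
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            del self.keys[pos]

    def update(self, old_ms: int, new_ms: int, stream_id: tuple[int, int]) -> None:
        # The time is part of the key, so changing it costs two index operations.
        self.remove(old_ms, stream_id)
        self.insert(new_ms, stream_id)

    def idle(self, now_ms: int, min_idle_ms: int, count: int) -> list[tuple[int, int]]:
        # Range query: every key with delivery_time <= now - min_idle sorts before this probe.
        probe = (now_ms - min_idle_ms, math.inf, math.inf)
        end = min(bisect.bisect_right(self.keys, probe), count)
        return [(ms, seq) for _, ms, seq in self.keys[:end]]
```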

The follow-up: From rax tree to linked list

Once we had the time-ordered index in production-shaped tests, we noticed something interesting about the workload.

99% of delivery_time updates set the time to "now."

This matters a lot, because, as we noted in the "cost we paid" section above, every XREADGROUP delivery, every XCLAIM, and every re-delivery is doing a rax-tree update on the hot path. Because delivery time is part of the key, every time an entry was reclaimed or re-delivered we had to remove its old key from the rax tree and insert a new one.

Two rax operations — each touching a 32-byte key, walking down a tree — for what is, fundamentally, an append to the tail of a time-ordered sequence. The rax tree was over-engineered for this access pattern, and it was making every consumer pay for it, even ones that never used CLAIM.

We replaced it with a doubly linked list embedded directly in each streamNACK: each pending-entry record now carries its stream ID plus prev and next pointers to its neighbors in the time-ordered sequence.

The consumer group now keeps pel_time_head and pel_time_tail pointers. Updating a NACK's delivery time to "now" becomes an unlink from its current position followed by an append at the tail.

For the typical case ("this entry was just delivered, push it to the tail") we do a handful of pointer updates. For the rare case of XCLAIM with an explicit IDLE value in the past, pelListInsertSorted() scans backward from the tail; that path is rare enough that its O(n) worst case doesn't matter in practice.

Why this works

The linked list is a perfect fit because:

  • Idle-entry queries still start at the head. The list is sorted by delivery time, so the oldest entries are at the front. Finding entries idle for at least N ms is a forward walk from the head until the condition stops holding — no lookup step needed, because we always begin at the head.
  • Updates are appends. The 99% case sets delivery time to "now," which means moving the NACK to the tail. Both unlink and append are O(1).
  • No separate structure to maintain. The previous design kept a streamNACK in the PEL plus a separate key in the rax tree. The list pointers live inside the streamNACK itself, so cache locality improves and there is one fewer allocation per entry.
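A Python sketch of the embedded list. Each record carries its own prev/next links and the list keeps head and tail pointers. The three operations described above map to a move-to-tail for the common "now" update, a backward scan from the tail for the rare out-of-order insert (a stand-in for pelListInsertSorted(), whose actual code isn't reproduced here), and a forward walk from the head to collect idle entries. All class and method names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Nack:
    """Pending-entry record that carries its own links in the time-ordered list."""
    stream_id: tuple[int, int]
    delivery_time: int
    prev: Nack | None = field(default=None, repr=False)
    next: Nack | None = field(default=None, repr=False)


class TimeOrderedPEL:
    """Head = oldest delivery, tail = newest. Hypothetical stand-in for the group's list."""

    def __init__(self) -> None:
        self.head: Nack | None = None
        self.tail: Nack | None = None

    def append(self, n: Nack) -> None:
        n.prev, n.next = self.tail, None
        if self.tail is not None:
            self.tail.next = n
        else:
            self.head = n
        self.tail = n

    def unlink(self, n: Nack) -> None:
        if n.prev is not None:
            n.prev.next = n.next
        else:
            self.head = n.next
        if n.next is not None:
            n.next.prev = n.prev
        else:
            self.tail = n.prev
        n.prev = n.next = None

    def touch(self, n: Nack, now_ms: int) -> None:
        """Common case: delivery time becomes "now", so move the record to the tail. O(1)."""
        self.unlink(n)
        n.delivery_time = now_ms
        self.append(n)

    def insert_sorted(self, n: Nack) -> None:
        """Rare case (time in the past): scan backward from the tail. O(n) worst case."""
        cur = self.tail
        while cur is not None and cur.delivery_time > n.delivery_time:
            cur = cur.prev
        if cur is None:                 # older than everything: becomes the new head
            n.prev, n.next = None, self.head
            if self.head is not None:
                self.head.prev = n
            else:
                self.tail = n
            self.head = n
        else:                           # splice in right after cur
            n.prev, n.next = cur, cur.next
            if cur.next is not None:
                cur.next.prev = n
            else:
                self.tail = n
            cur.next = n

    def idle(self, now_ms: int, min_idle_ms: int, count: int) -> list[tuple[int, int]]:
        """Walk from the head, stopping at the first entry that is not idle enough. O(k)."""
        out, cur = [], self.head
        while cur is not None and len(out) < count and now_ms - cur.delivery_time >= min_idle_ms:
            out.append(cur.stream_id)
            cur = cur.next
        return out
```

Because the list is always sorted by delivery time, the idle walk never needs a lookup step: the first entry that fails the idle test proves that every later entry fails it too.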

Putting the algorithmic story side by side:

| Operation | XAUTOCLAIM (pre-CLAIM) | CLAIM with rax tree | CLAIM with linked list |
|---|---|---|---|
| Find idle entries | O(n): scans the PEL in stream-ID order | O(log n + k): locate range, then walk | O(k): walk from head |
| Update delivery time | O(log n): two tree operations | O(log n): two tree operations | O(1): unlink + append |

The first jump (scan → range query) is what gave us the headline speedup over XAUTOCLAIM on the stress workload; as we saw, that speedup scales with n / k, the ratio of total pending entries to idle ones. The second jump (range query → head walk, plus O(1) updates) is what gave us the additional 28% throughput on top.

Memory got better too

The arithmetic is straightforward:

| | Per pending entry |
|---|---|
| Added by linked list (id + 2 pointers) | 32 bytes |
| Removed with rax tree (composite key + node overhead) | ~40–50 bytes |
| Net | Lower memory |
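The 32-byte figure for the linked list follows from the field sizes. Here is a quick `ctypes` check, assuming a 64-bit build and a hypothetical struct that holds just the added fields; a stream ID is two unsigned 64-bit integers (milliseconds and sequence number). This only checks the "id + 2 pointers" arithmetic and does not reproduce the rax-side estimate.

```python
import ctypes


class StreamID(ctypes.Structure):
    # A stream ID is two unsigned 64-bit integers: milliseconds and sequence number.
    _fields_ = [("ms", ctypes.c_uint64), ("seq", ctypes.c_uint64)]


class AddedListFields(ctypes.Structure):
    # Hypothetical grouping of what the linked list adds to each pending-entry record.
    _fields_ = [
        ("id", StreamID),           # 16 bytes
        ("prev", ctypes.c_void_p),  # 8 bytes on a 64-bit build
        ("next", ctypes.c_void_p),  # 8 bytes on a 64-bit build
    ]


print(ctypes.sizeof(StreamID), ctypes.sizeof(AddedListFields))  # 16 32 on a 64-bit build
```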

So we got better throughput, lower latency, and a smaller footprint.

Throughput benchmarks for the optimization

We re-ran the workload after the switch to a linked list using memtier_benchmark with 2M messages.

| Implementation | XREADGROUP RPS | Avg latency | P99 latency |
|---|---|---|---|
| Rax tree | 4,935 ops/sec | 0.195 ms | 0.212 ms |
| Linked list | 6,321 ops/sec | 0.152 ms | 0.168 ms |
That's +28% throughput, –22% average latency, –21% P99 latency, on top of the original 22.5x improvement over XAUTOCLAIM. XADD performance was unchanged at ~69K ops/sec — the optimization is purely on the consumer path.

Compatibility

The CLAIM option is fully optional. Consumers that don't use it see the same behavior and response format as before. Within a single consumer group, you can freely mix:

  • Consumers that use CLAIM and process both new and orphaned entries
  • Consumers that don't, and only handle new ones

The linked-list optimization is what makes this clean: the per-delivery bookkeeping is now O(1), so consumers that never touch CLAIM no longer pay a measurable performance cost for the feature's existence. And the optimization itself is fully internal — no protocol, RDB, or AOF format changes.

Wrapping up

Reliable Streams consumers used to require stitching together XPENDING, XCLAIM/XAUTOCLAIM, and XREADGROUP per loop iteration. With Redis 8.4, a single XREADGROUP ... CLAIM ... STREAMS ... does the same job in one round trip, prioritizes orphaned work correctly, blocks reactively on both new arrivals and aging pending entries, and returns the metadata clients need to build retry caps and dead-letter logic.

Under the hood, a time-ordered index turns "find me the idle entries" from a scan into a range query, and a linked-list implementation of that index gives us O(1) updates for the overwhelmingly common case where delivery time advances to "now."

The end result is up to 22.5x lower claim latency, 28% higher throughput, and a substantially simpler consumer loop, with full backward compatibility for everyone who isn't ready to change.

We're excited to see what reliability patterns the community builds on top of this. Happy streaming.
