Delivery guarantees

Delivery guarantees define how many times each event from the input topic is processed by a streaming query. Understanding them is essential when you design data pipelines.

Note

We are actively improving stream processing. Guarantees will get stronger in future releases.

Data plane guarantees:

  • At-least-once — for all query types, every event is processed at least once.

Anomalies when modifying queries (control plane):

Checkpoints and recovery

YDB periodically saves a checkpoint — a snapshot of query state containing:

  • Offsets in input topics — positions up to which events were read and processed;
  • Aggregation state — intermediate results, for example accumulators in GROUP BY HOP.

YDB stores read offsets in its own checkpoints; it does not rely on external consumer offsets.

On recovery, the query rolls back to the latest checkpoint: it resumes reading from saved offsets and restores aggregation state. Events that arrived between the checkpoint and the failure are processed again. For more on checkpoints, see Checkpoints.

Data plane: at-least-once

If processing fails (compute restart, network loss, timeout), YDB automatically recovers the query from the latest checkpoint. At-least-once delivery holds for all streaming query types — every event is processed at least once. The query resumes from the saved offset and may emit results again. That applies to queries without aggregation (filter, enrich, transform) and to queries with windowed aggregation.

When writing to a table via UPSERT, retries do not duplicate rows: UPSERT updates the row by primary key. Data is not lost and duplicates do not accumulate.

When writing to an output topic, retries can duplicate messages: the same events may be written more than once. Downstream consumers must deduplicate if needed.

Control plane: query modification anomalies

Changing query text without stopping the query is not supported today. Updates use DROP + CREATE; in that case at-least-once semantics across the replacement do not hold — some events may be skipped. Scenarios are described below.

Partial first window after start

Time windows (GROUP BY HOP) align to wall-clock time. Window boundaries snap to multiples from the epoch: for a 1-minute window, boundaries are 12:00:00, 12:01:00, 12:02:00, etc., regardless of when the query started. If the query starts at 12:00:30, it lands in the window [12:00:00 .. 12:01:00], but data only arrives from 12:00:30. The aggregate for the first window therefore covers 30 seconds instead of a full minute.

This is expected on first start — later windows cover full intervals. Consider it when recreating queries.

Lost events when recreating a query

To change query text you use DROP + CREATE. On DROP, the checkpoint is deleted with the query; YDB stores read offsets internally, so they are removed too. The new query has no saved position and starts reading from the end of the topic. Events that arrived between deleting the old query and starting the new one are not read.

The same happens if data referenced by an offset in the checkpoint was already removed from the topic by TTL.

For windowed queries, the first windows after recreation may have gaps and understated aggregates.

Incomplete aggregates without watermarks

In stream processing, watermarks are system signals that, after a certain time, all data for an interval has been received. YDB does not support watermarks yet.

Note

Watermarks are planned for release 26.1.

Without watermarks, YDB closes a time window by wall-clock time, not by completeness of data. If a topic has multiple partitions and one partition is slow, some events may arrive after the window closes and are excluded from the aggregate.

This shows up when:

  • the topic has several partitions with uneven load;
  • producers write with different latency to partitions;
  • the network or a producer is temporarily slow.

Window aggregates can be understated by the share of events from “slow” partitions. The larger the latency spread across partitions, the stronger the effect.

See also