Backup Collection

A backup collection organizes full and incremental backups of selected row-oriented tables into managed chains. It allows recovery to the state of the latest backup in the chain, providing protection against accidental data loss such as erroneous deletions or modifications.

Note

For practical instructions on creating and managing backup collections, see the Backup and Recovery guide.

Overview

Backup collections address common backup challenges for production workloads:

Storage efficiency: Incremental backups capture only changes since the previous backup, significantly reducing storage requirements compared to multiple full backups.
Consistent recovery: All tables in a collection are backed up from the same global snapshot, ensuring referential integrity across tables during restoration.
Chain-based recovery: A restore brings the collection to the state of the latest backup in the current chain (see RESTORE).

For a comparison with other backup methods (export/import, dump/restore), see Backup concepts.

Key Concepts

These terms are essential to understanding backup collections. For detailed definitions, see the glossary.

Full backup: A complete snapshot of all data in the collection at a specific point in time. Serves as the foundation for subsequent incremental backups.
Incremental backup: Captures only changes (inserts, updates, deletes) since the previous backup. Requires the entire backup chain for restoration.
Backup chain: An ordered sequence starting with a full backup followed by zero or more incremental backups.
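The chain definition above can be expressed as a simple check. The following is a minimal sketch, not part of YDB: the `Backup` type and `is_valid_chain` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    name: str  # hypothetical label for the backup
    kind: str  # "full" or "incremental"


def is_valid_chain(backups: List[Backup]) -> bool:
    """A chain is one full backup followed by zero or more incrementals."""
    if not backups or backups[0].kind != "full":
        return False
    return all(b.kind == "incremental" for b in backups[1:])
```

A second full backup does not extend a chain under this definition; it starts a new one.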

Limitations

Before using backup collections, understand these constraints:

Row-oriented tables only: Column-oriented tables are not supported.
One collection per table: A table can only belong to one backup collection at a time. To include a table in a different collection, run DROP BACKUP COLLECTION for the current collection and create a new one with the desired set of tables.
Immutable membership: Once created, the table list in a collection cannot be modified. To change the table set, drop the existing collection and create a new one that includes all desired tables.
No partial restore: You cannot restore individual tables from a collection; the entire collection is restored together.
External scheduling required: YDB does not provide built-in backup scheduling. Use external tools like cron for automated backups.
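Because scheduling is external, a job started by cron has to decide on each run whether to take a full or an incremental backup. The sketch below shows one possible policy, assuming a full backup once a week and incremental backups on the other days. The policy, the function name, and the statement text (`BACKUP ...` and `BACKUP ... INCREMENTAL`) are assumptions for illustration; verify the statement syntax against the YQL reference before use.

```python
from datetime import date


def backup_statement(collection: str, today: date, full_weekday: int = 6) -> str:
    """Return the statement a scheduled job would submit today.

    full_weekday uses date.weekday() numbering (Monday=0 ... Sunday=6).
    The statement text is an assumption; check it against the YQL reference.
    """
    if today.weekday() == full_weekday:
        return f"BACKUP `{collection}`;"
    return f"BACKUP `{collection}` INCREMENTAL;"
```

A cron entry would run a script that calls this function and submits the returned statement with whatever client the deployment already uses.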

Architecture

Backup collections use a copy-on-write mechanism combined with changefeeds for efficient incremental backups. This section explains how the components work together.

How Backup Collections Work

The backup workflow has three stages: collection creation, full backup, and incremental backup.

Collection creation defines which tables to include and creates the schema object. This is a fast, metadata-only operation.

Full backup creates a consistent snapshot of all tables in the collection. Key characteristics:

Uses a global snapshot that ensures referential integrity across all tables in the collection.
Creates changefeeds on each table to track subsequent modifications.
Uses copy-on-write: The backup is created quickly by referencing existing data; actual data copying occurs only when source data is modified.

Incremental backup captures all changes since the previous backup:

Uses a distributed transaction to read changefeeds from all tables at a consistent point, ensuring referential integrity across the collection.
Reads the changes that the changefeeds created during the full backup have accumulated since the previous backup (full or incremental).
Records all modifications: inserts, updates, and deletes (as tombstone records).
Compacts changefeed data into incremental backup tables.
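The compaction step can be pictured as collapsing an ordered stream of changes to the last event per key, with deletes kept as tombstones. This is a conceptual model only, not YDB's implementation or storage format; the `Change` type and `compact_changes` function are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# One change event: (primary key, new row), where None stands for a delete.
Change = Tuple[str, Optional[dict]]


def compact_changes(changes: Iterable[Change]) -> Dict[str, Optional[dict]]:
    """Collapse an ordered change stream to the last event per key.

    A value of None is kept as a tombstone so that a later restore
    knows the row must be removed rather than left untouched.
    """
    compacted: Dict[str, Optional[dict]] = {}
    for key, row in changes:
        compacted[key] = row  # later events overwrite earlier ones
    return compacted
```

Keeping the tombstone matters: if a deleted key were simply omitted, a restore could not distinguish "deleted" from "unchanged since the previous backup".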

Warning

Schema changes (ALTER TABLE) to tables in a backup collection are not tracked by incremental backups. If you need to modify the schema of a backed-up table, create a new full backup after the schema change to ensure the backup chain reflects the new structure.

Note

Changefeeds created during full backup are automatically removed when the backup collection is dropped. They cannot be manually removed or reused for other purposes while the collection exists.

Storage

Backup collections are stored within the YDB cluster in a dedicated directory structure:

/Root/database/.backups/collections/
├── my_collection/
│   ├── 20250821141425Z_full/
│   │   ├── table_1/
│   │   └── table_2/
│   └── 20250821151519Z_incremental/
│       ├── table_1/
│       └── table_2/
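The directory names in this example appear to combine a UTC timestamp with the backup type. The sketch below parses such names and sorts them oldest first. The naming pattern is inferred from the example tree alone and may not hold in general; `BackupDir`, `parse_backup_dir`, and `chain_order` are hypothetical helper names.

```python
from datetime import datetime
from typing import List, NamedTuple


class BackupDir(NamedTuple):
    taken_at: datetime
    kind: str  # "full" or "incremental"
    name: str  # original directory name


def parse_backup_dir(name: str) -> BackupDir:
    """Split a name such as '20250821141425Z_full' into timestamp and kind."""
    stamp, _, kind = name.partition("_")
    if kind not in ("full", "incremental"):
        raise ValueError(f"unexpected backup directory name: {name!r}")
    return BackupDir(datetime.strptime(stamp, "%Y%m%d%H%M%SZ"), kind, name)


def chain_order(names: List[str]) -> List[str]:
    """Return directory names sorted by timestamp, oldest first."""
    return [b.name for b in sorted(parse_backup_dir(n) for n in names)]
```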

Note

The .backups directory is created automatically when the first backup collection is created. Do not create this directory manually. Once it exists, you can manage backup tables within it (for example, when exporting or importing backups).

Cluster Storage

By default, backups are stored within the cluster. Cluster-stored backups are designed for recovery from logical errors such as accidental DROP TABLE, TRUNCATE TABLE, or erroneous data modifications. Benefits include:

Fast backup and restore operations.
Integrated security mechanisms.
No external infrastructure required.

Warning

Cluster-stored backups share the same fault domain as the data they protect. If the cluster experiences a failure that exceeds its fault tolerance (such as total cluster loss or catastrophic data center events), both the data and backups may be lost. For protection against such scenarios, use external storage.

External Storage

For disaster recovery protection against cluster-wide failures, regularly export backup collections to external storage (S3-compatible storage or filesystem) using export/import operations.

To export backups to external storage, use the YDB CLI:

ydb export s3 for S3-compatible storage.
ydb tools dump for filesystem storage.

Each backup in the chain must be exported separately. Preserve the chain order during export/import to ensure successful restoration.

Background Operations

All backup and restore operations run asynchronously, allowing normal database operations to continue. Monitor progress using ydb operation list incbackup.

Restoring from Backups

Restoration recovers data to the state of the latest backup in the chain currently in the cluster. To restore to an earlier point, import only the desired prefix of the chain from external storage (see Importing and Restoring).
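Choosing which exported backups to import for an earlier restore point amounts to taking a prefix of the chain. A minimal sketch, assuming the chain is available as a list of backup names ordered oldest first; the function name is hypothetical.

```python
from typing import List


def chain_prefix(chain: List[str], restore_point: str) -> List[str]:
    """Return the backups to import: everything up to and including restore_point.

    `chain` must be ordered oldest first and start with the full backup.
    """
    if restore_point not in chain:
        raise ValueError(f"{restore_point!r} is not part of the chain")
    return chain[: chain.index(restore_point) + 1]
```

The prefix always includes the full backup, because every incremental backup depends on all backups before it.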

Restore Workflow

Import from external storage (if applicable): If backups were exported, import the full backup and all incremental backups up to the desired restore point.
Execute restore: Run RESTORE collection_name to restore all tables from the backup collection. The system applies the full backup and all incremental backups in sequence to reach the most recent backup point.
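Applying a full backup and then each incremental backup in order can be modeled as follows. This is a conceptual sketch for a single table held in memory, not a description of how YDB performs the restore; the types and the `restore_state` function are hypothetical.

```python
from typing import Dict, List, Optional

Row = dict
Table = Dict[str, Row]            # primary key -> row
Delta = Dict[str, Optional[Row]]  # None marks a deleted row (tombstone)


def restore_state(full: Table, incrementals: List[Delta]) -> Table:
    """Apply incrementals, oldest first, on top of a copy of the full backup."""
    state = dict(full)
    for delta in incrementals:
        for key, row in delta.items():
            if row is None:
                state.pop(key, None)  # tombstone: remove the row if present
            else:
                state[key] = row      # insert or update
    return state
```

Order matters here: swapping two incrementals that touch the same key would leave the older value in place.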

Warning

The restore operation fails if any of the tables being restored already exists at the same path. Rename or drop the conflicting tables before restoring.

The restore operation maintains transactional consistency across all tables in the collection.

Note

During the restore operation, target tables are unavailable for modifications. Partially restored data may be visible to read workloads. Plan restoration during a maintenance window or disable application access to affected tables until the operation completes.
