Backup Collection
A backup collection organizes full and incremental backups of selected row-oriented tables into managed chains. It allows recovery to the state of the latest backup in the chain, providing protection against accidental data loss such as erroneous deletions or modifications.
Note
For practical instructions on creating and managing backup collections, see the Backup and Recovery guide.
Overview
Backup collections address common backup challenges for production workloads:
- Storage efficiency: Incremental backups capture only changes since the previous backup, significantly reducing storage requirements compared to multiple full backups.
- Consistent recovery: All tables in a collection are backed up from the same global snapshot, ensuring referential integrity across tables during restoration.
- Chain-based recovery: Restoring a collection recovers data to the state of the latest backup in the current chain; see RESTORE.
For a comparison with other backup methods (export/import, dump/restore), see Backup concepts.
Key Concepts
These terms are essential to understanding backup collections. For detailed definitions, see the glossary.
- Full backup: A complete snapshot of all data in the collection at a specific point in time. Serves as the foundation for subsequent incremental backups.
- Incremental backup: Captures only changes (inserts, updates, deletes) since the previous backup. Restoring from it requires the full backup and every incremental backup that precedes it in the chain.
- Backup chain: An ordered sequence starting with a full backup followed by zero or more incremental backups.
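The chain definition above can be expressed as a small check. This is a minimal sketch of the concept only, not YDB's internal representation; the `Backup` class, its fields, and the labels `b0`, `b1`, `b2` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    name: str  # hypothetical label, not a real YDB path
    kind: str  # "full" or "incremental"


def is_valid_chain(backups: list[Backup]) -> bool:
    """Return True if the sequence is one full backup followed by
    zero or more incremental backups."""
    if not backups or backups[0].kind != "full":
        return False
    return all(b.kind == "incremental" for b in backups[1:])


chain = [
    Backup("b0", "full"),
    Backup("b1", "incremental"),
    Backup("b2", "incremental"),
]
print(is_valid_chain(chain))      # True
print(is_valid_chain(chain[1:]))  # False: no full backup at the start
```

A single full backup with no incrementals also passes the check, which matches the "zero or more" wording in the definition.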
Limitations
Before using backup collections, understand these constraints:
- Row-oriented tables only: Column-oriented tables are not supported.
- One collection per table: A table can only belong to one backup collection at a time. To include a table in a different collection, run DROP BACKUP COLLECTION for the current collection and create a new one with the desired set of tables.
- Immutable membership: Once created, the table list in a collection cannot be modified. To add new tables, create a new collection that includes all desired tables.
- No partial restore: You cannot restore individual tables from a collection; the entire collection is restored together.
- External scheduling required: YDB does not provide built-in backup scheduling. Use external tools like cron for automated backups.
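Because scheduling is left to external tools, a job started by cron has to decide on each run whether to take a full or an incremental backup. The sketch below shows one possible policy; the seven-day interval and the function name `choose_backup_kind` are assumptions made for illustration and are not YDB defaults or APIs.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed policy for this sketch: start a new chain once a week.
FULL_BACKUP_INTERVAL = timedelta(days=7)


def choose_backup_kind(now: datetime, last_full: Optional[datetime]) -> str:
    """Return "full" if no full backup exists yet or the last one is
    older than the interval; otherwise return "incremental"."""
    if last_full is None or now - last_full >= FULL_BACKUP_INTERVAL:
        return "full"
    return "incremental"


now = datetime(2025, 8, 21, 14, 0, 0)
print(choose_backup_kind(now, None))                      # full
print(choose_backup_kind(now, now - timedelta(days=2)))   # incremental
print(choose_backup_kind(now, now - timedelta(days=10)))  # full
```

The scheduled job would then issue the corresponding backup statement for the collection. Shorter chains restore faster but cost more storage, so the interval is a trade-off to tune per workload.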
Architecture
Backup collections use a copy-on-write mechanism combined with changefeeds for efficient incremental backups. This section explains how the components work together.
How Backup Collections Work
The following diagram illustrates the backup workflow:
Collection creation defines which tables to include and creates the schema object. This is a fast, metadata-only operation.
Full backup creates a consistent snapshot of all tables in the collection. Key characteristics:
- Uses a global snapshot that ensures referential integrity across all tables in the collection.
- Creates changefeeds on each table to track subsequent modifications.
- Uses copy-on-write: The backup is created quickly by referencing existing data; actual data copying occurs only when source data is modified.
Incremental backup captures all changes since the previous backup:
- Uses a distributed transaction to read changefeeds from all tables at a consistent point, ensuring referential integrity across the collection.
- Reads the changes that the changefeeds created during the full backup have accumulated since the previous backup (full or incremental).
- Records all modifications: inserts, updates, and deletes (as tombstone records).
- Compacts changefeed data into incremental backup tables.
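To make the last two points concrete, the following sketch folds an ordered stream of change events into one record per key, with deletes kept as tombstones. It is an illustrative model only: the event tuple layout, the use of `None` as the tombstone marker, and the last-write-wins compaction rule are assumptions, not a description of YDB's actual changefeed or storage format.

```python
from typing import Any, Dict, Hashable, Iterable, Tuple

TOMBSTONE = None  # marker for a deleted row in this sketch

Event = Tuple[str, Hashable, Any]  # (operation, primary key, new value)


def compact_changes(events: Iterable[Event]) -> Dict[Hashable, Any]:
    """Collapse ordered change events into the latest record per key.

    Inserts and updates store the new value; deletes store a tombstone
    so that a later restore knows the row must be removed.
    """
    latest: Dict[Hashable, Any] = {}
    for op, key, value in events:
        if op == "delete":
            latest[key] = TOMBSTONE
        elif op in ("insert", "update"):
            latest[key] = value
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return latest


events = [
    ("insert", 1, "a"),
    ("update", 1, "b"),
    ("insert", 2, "c"),
    ("delete", 2, None),
]
print(compact_changes(events))  # {1: 'b', 2: None}
```

Keeping the tombstone for key 2, rather than dropping the key, matters because the row may exist in an earlier backup in the chain and must be deleted when the incremental is applied.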
Warning
Schema changes (ALTER TABLE) to tables in a backup collection are not tracked by incremental backups. If you need to modify the schema of a backed-up table, create a new full backup after the schema change to ensure the backup chain reflects the new structure.
Note
Changefeeds created during full backup are automatically removed when the backup collection is dropped. They cannot be manually removed or reused for other purposes while the collection exists.
Storage
Backup collections are stored within the YDB cluster in a dedicated directory structure:
/Root/database/.backups/collections/
├── my_collection/
│ ├── 20250821141425Z_full/
│ │ ├── table_1/
│ │ └── table_2/
│ └── 20250821151519Z_incremental/
│ ├── table_1/
│ └── table_2/
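The directory names in the listing above appear to combine a timestamp with the backup kind. The sketch below parses such names and orders them chronologically; it assumes the pattern is `YYYYMMDDhhmmssZ_<kind>` and that the trailing `Z` denotes UTC, both inferred from the example rather than taken from a format specification.

```python
from datetime import datetime, timezone
from typing import Tuple


def parse_backup_dir(name: str) -> Tuple[datetime, str]:
    """Split a name like '20250821141425Z_full' into (timestamp, kind)."""
    stamp, _, kind = name.partition("_")
    if kind not in ("full", "incremental"):
        raise ValueError(f"unexpected backup kind in {name!r}")
    # The literal 'Z' in the format string must match the suffix exactly.
    taken_at = datetime.strptime(stamp, "%Y%m%d%H%M%SZ")
    return taken_at.replace(tzinfo=timezone.utc), kind


names = ["20250821151519Z_incremental", "20250821141425Z_full"]
for name in sorted(names, key=lambda n: parse_backup_dir(n)[0]):
    taken_at, kind = parse_backup_dir(name)
    print(taken_at.isoformat(), kind)
# 2025-08-21T14:14:25+00:00 full
# 2025-08-21T15:15:19+00:00 incremental
```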
Note
The .backups directory is created automatically when the first backup collection is created. Do not create this directory manually. Once it exists, you can manage backup tables within it (for example, when exporting or importing backups).
Cluster Storage
By default, backups are stored within the cluster. Cluster-stored backups are designed for recovery from logical errors such as accidental DROP TABLE, TRUNCATE TABLE, or erroneous data modifications. Benefits include:
- Fast backup and restore operations.
- Integrated security mechanisms.
- No external infrastructure required.
Warning
Cluster-stored backups share the same fault domain as the data they protect. If the cluster experiences a failure that exceeds its fault tolerance (such as total cluster loss or catastrophic data center events), both the data and backups may be lost. For protection against such scenarios, use external storage.
External Storage
For disaster recovery protection against cluster-wide failures, regularly export backup collections to external storage (S3-compatible storage or filesystem) using export/import operations.
To export backups to external storage, use the YDB CLI:
- ydb export s3 for S3-compatible storage.
- ydb tools dump for filesystem storage.
Each backup in the chain must be exported separately. Preserve the chain order during export/import to ensure successful restoration.
Background Operations
All backup and restore operations run asynchronously, allowing normal database operations to continue. Monitor progress using ydb operation list incbackup.
Restoring from Backups
Restoration recovers data to the state of the latest backup in the chain currently in the cluster. To restore to an earlier point, import only the desired prefix of the chain from external storage — see Importing and Restoring.
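Choosing "the desired prefix of the chain" can be sketched as follows. The example assumes backup names carry a fixed-width timestamp prefix, as in the storage listing, so that sorting the names as strings yields chronological order; the function name `chain_prefix` and the third backup name are hypothetical.

```python
from typing import List


def chain_prefix(names: List[str], restore_point: str) -> List[str]:
    """Return the backups to import, from the full backup up to and
    including the backup named by restore_point."""
    ordered = sorted(names)  # fixed-width timestamps sort chronologically
    if restore_point not in ordered:
        raise ValueError(f"{restore_point!r} is not part of the chain")
    prefix = ordered[: ordered.index(restore_point) + 1]
    if not prefix[0].endswith("_full"):
        raise ValueError("a chain must start with a full backup")
    return prefix


exported = [
    "20250821151519Z_incremental",
    "20250821141425Z_full",
    "20250821161000Z_incremental",  # hypothetical later backup
]
print(chain_prefix(exported, "20250821151519Z_incremental"))
# ['20250821141425Z_full', '20250821151519Z_incremental']
```

Importing only this prefix leaves the later incremental out of the cluster, so the restore stops at the chosen point.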
Restore Workflow
- Import from external storage (if applicable): If backups were exported, import the full backup and all incremental backups up to the desired restore point.
- Execute restore: Run RESTORE collection_name to restore all tables from the backup collection. The system applies the full backup and all incremental backups in sequence to reach the most recent backup point.
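Applying a full backup and then each incremental in sequence can be modeled with plain dictionaries. This is a conceptual sketch, not YDB's restore implementation; it assumes each incremental maps a primary key to its latest value, with `None` standing in for a tombstone (deleted row).

```python
from typing import Any, Dict, Hashable, List

Rows = Dict[Hashable, Any]


def replay_chain(full: Rows, incrementals: List[Rows]) -> Rows:
    """Start from the full backup and apply incrementals oldest first."""
    state = dict(full)  # copy so the full backup itself is not mutated
    for incremental in incrementals:
        for key, value in incremental.items():
            if value is None:          # tombstone: the row was deleted
                state.pop(key, None)
            else:                      # insert or update
                state[key] = value
    return state


full = {1: "a", 2: "b"}
incrementals = [
    {2: None, 3: "c"},  # delete row 2, insert row 3
    {1: "z"},           # update row 1
]
print(replay_chain(full, incrementals))  # {1: 'z', 3: 'c'}
```

The order of the incrementals is significant: swapping them can resurrect deleted rows or lose later updates, which is why the whole chain must be present and ordered before a restore.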
Warning
The restore operation fails if any of the tables being restored already exists at the same path. Rename or drop the conflicting tables before restoring.
The restore operation maintains transactional consistency across all tables in the collection.
Note
During the restore operation, target tables are unavailable for modifications. Partially restored data may be visible to read workloads. Plan restoration during a maintenance window or disable application access to affected tables until the operation completes.
See Also
- Backup concepts: Overview of all backup approaches in YDB
- Backup and Recovery guide: Practical operations guide
- Recipes and examples: Common scenarios and examples
- YQL reference: