Importing table schemas and data from MySQL
Introduction
Migrating schemas and data from MySQL to YDB — including large tables whose size exceeds available RAM — is performed using the mysql2ydb utility.
The utility's goal is to get an exact one-to-one copy of the source database. Table and column names are preserved as in MySQL; data types are mapped to the nearest equivalents in YDB (INT → Int32, VARCHAR → Text, AUTO_INCREMENT → BigSerial, etc.). Since names are transferred unchanged, they must be valid identifiers in YDB (characters, length, reserved words). The result should be a YDB database that replicates the original structure and allows queries with minimal "translation" from MySQL.
Each data fragment is read from MySQL by a single bounded query SELECT, written to YDB in a batch (BulkUpsert or transactional UPSERT), and then freed from memory. The process repeats until the entire table is processed — no more than one batch is stored in memory at a time (no more than two with pipeline reading).
The import is performed in the following order:
- The utility connects to MySQL and, based on the
information_schemametadata, determines the list of tables and their structure. - Target tables with mapped data types are created in YDB.
- Data is transferred in batches: reading from MySQL and writing to YDB.
- After each batch is successfully written, the cursor position is saved to the state table — in case of failure, migration can be resumed from the last successful fragment.
MySQL schema mapping
When transferring the schema, mysql2ydb aims for structural correspondence: the same table and column names, indexes where YDB supports them, and types that preserve values without loss (unsigned integers remain unsigned, boolean types remain boolean, etc.).
The utility reads MySQL metadata from information_schema and generates YDB DDL directly. The main type mappings are:
| MySQL type | Type in YDB |
|---|---|
INT, MEDIUMINT, SMALLINT, TINYINT |
Int32 |
BIGINT |
Int64 |
INT UNSIGNED, MEDIUMINT UNSIGNED, SMALLINT UNSIGNED, TINYINT UNSIGNED |
Uint32 |
BIGINT UNSIGNED |
Uint64 |
TINYINT(1) |
Bool |
FLOAT |
Float |
DOUBLE, REAL |
Double |
BIT |
Uint64 (including BIT(1) — wrapped in Uint64, not in Bool) |
CHAR, VARCHAR, TEXT, MEDIUMTEXT, LONGTEXT |
Text |
ENUM, SET, JSON |
Text |
BINARY, VARBINARY, BLOB, MEDIUMBLOB, LONGBLOB |
String (bytes inline in the same column) |
DATE |
Date |
DATETIME, TIMESTAMP |
Timestamp — the value is stored as UTC. For DATETIME (a type without time zone), the interpretation depends on the time_zone MySQL setting during reading; keep this in mind when migrating historical data. |
DECIMAL(p, s), NUMERIC(p, s) |
Decimal(22, 9) for all — the original p/s are not preserved; values that do not fit in 22 digits will be lost |
YEAR |
Uint16 |
| Other types | Text (fallback) |
Structural decisions
| MySQL schema element | mysql2ydb decision |
|---|---|
| Table and column names | As in MySQL (users, orders, …); must be valid YDB identifiers |
AUTO_INCREMENT |
BigSerial + ALTER SEQUENCE … START WITH from TABLES.AUTO_INCREMENT |
Secondary KEY / UNIQUE KEY |
INDEX … GLOBAL ASYNC / GLOBAL UNIQUE SYNC in CREATE TABLE |
Table without PRIMARY KEY |
Not supported when automatically creating the schema — see Tables without a primary key |
| Partitioning | Only AUTO_PARTITIONING_BY_LOAD |
Secondary indexes
Non-unique MySQL keys become INDEX … GLOBAL ASYNC, unique ones (except the primary key) become INDEX … GLOBAL UNIQUE SYNC. Tables with synchronous unique indexes automatically switch to transactional UPSERT during data loading, because BulkUpsert only supports asynchronous indexes.
Examples
AUTO_INCREMENT — the MySQL column id BIGINT AUTO_INCREMENT becomes:
`id` BigSerial NOT NULL,
PRIMARY KEY (`id`)
After CREATE TABLE, if the value of information_schema.TABLES.AUTO_INCREMENT is known:
ALTER SEQUENCE `<db>/<table>/_serial_column_id` START WITH <next_value> RESTART
Tables without a primary key
In YDB, each table must have a PRIMARY KEY. mysql2ydb takes the key only from MySQL columns with COLUMN_KEY = 'PRI'; UNIQUE KEY indexes become secondary GLOBAL UNIQUE SYNC, but are not substituted for the primary key. If a table has no PRIMARY KEY, schema creation fails with an error (PRIMARY KEY () — invalid DDL), and data transfer does not start.
Before migration, you need to set a key in MySQL for such tables, for example:
ALTER TABLE my_table ADD COLUMN id BIGINT AUTO_INCREMENT PRIMARY KEY FIRST;
or assign PRIMARY KEY to an existing column (or set of columns) if it is truly unique.
If you cannot add PRIMARY KEY in MySQL, but you can prepare the schema in YDB manually, create the table with the required key yourself and run the utility with the -data-only flag. Reading from MySQL without PRIMARY KEY in the metadata will go through LIMIT/OFFSET: on each page, MySQL rescans all previous rows, so migration time grows quadratically with table size — for large tables, it is better to add a key in MySQL beforehand. The key columns in YDB must be present among the MySQL columns.
If you need automatic import without a key in MySQL, consider YDB Importer — it adds a synthetic column ydb_synth_key (SHA-256 of the row). mysql2ydb does not create synthetic keys.
Efficient access to MySQL
The utility minimizes accesses to MySQL and limits the size of each query:
- Page-by-page reading by primary key —
WHERE (pk…) > (?) ORDER BY pk LIMIT n. Without full table scan into memory. - Read → write pipeline — one goroutine reads the next batch from MySQL while another writes the previous one to YDB. No more than two batches are in flight at the same time per table.
- Adaptive batch size — the number of rows is limited by available RAM and the average row size from
information_schema. - Progress checkpoint — after each successfully written batch, the cursor position and row counter are saved in the YDB service table; a restart continues from the last batch.
MySQL: SELECT batch N → BulkUpsert(batch N) → YDB
MySQL: SELECT batch N+1 → BulkUpsert(batch N+1) → YDB
...
Features
- One-to-one schema copy — original table and column names; mapping of MySQL types to the nearest YDB types (
AUTO_INCREMENT→BigSerial,UNSIGNED→Uint32/Uint64, secondary indexes,TINYINT(1)→Bool; see the MySQL schema mapping section). - Schema creation — tables in YDB are created from MySQL metadata (
information_schema). - Fragment-by-fragment data migration: the table is processed in fixed-size chunks; the data volume may exceed available RAM.
- Upsert by key — data is written via
BulkUpsertor transactionalUPSERT; re-running does not create duplicate rows. - Resumption — progress for each table is stored in YDB; an interrupted migration can be resumed from the last batch.
Installation
Source code and build instructions are in the mysql-ydb-importer repository on GitHub.
Building requires Go version not lower than specified in go.mod of the repository:
git clone https://github.com/ydb-platform/mysql-ydb-importer.git
cd mysql-ydb-importer
go build -o mysql2ydb ./cmd/mysql2ydb
Usage procedure
MySQL connection parameters are read by default from the ~/.my.cnf file (like the mysql client), section [client]. The -mysql flag overrides the configuration file.
Tables without primary key
Tables without PRIMARY KEY in MySQL are not supported during automatic schema creation. Before running, add a primary key in MySQL or prepare the schema manually and use -data-only (see the section Tables without a primary key).
Example ~/.my.cnf:
[client]
user = myuser
password = mypass
host = localhost
port = 3306
database = mydb
If the server requires a secure connection (require_secure_transport=ON), add to [client]:
ssl-mode = REQUIRED
For a self-signed certificate (without verification):
ssl-mode = REQUIRED
ssl-verify = 0
Alternative option — ssl=1 and ssl-verify=0.
Minimum startup when ~/.my.cnf is present (only -ydb is required):
./mysql2ydb -ydb "grpc://localhost:2136"
With explicit MySQL DSN (ignores ~/.my.cnf):
./mysql2ydb -mysql "user:$MYSQL_PASSWORD@tcp(localhost:3306)/mydb" -ydb "grpc://localhost:2136" -batch-size 10000
Command line flags
| Flag | Description |
|---|---|
-mysql |
DSN MySQL (if specified, overrides ~/.my.cnf) |
-ydb |
Endpoint YDB (required) |
-ydb-database |
Path to YDB database (default local) |
-schema-only |
Only create the schema, without data transfer |
-data-only |
Only transfer data (the schema in YDB must already exist — including one created manually) |
-batch-size |
Target batch size in rows (default 10,000) |
-max-chunk-rows |
Hard upper limit of rows in a batch (default 25,000) |
-parallel-tables |
Number of tables for parallel transfer (default 1) |
-tables |
Comma-separated list of tables (default – all) |
-force |
Migrate all tables from scratch, ignoring the saved state |
-force-recreate |
Delete all tables in YDB and recreate the schema from scratch |
Examples
Schema only:
./mysql2ydb -mysql "..." -ydb "grpc://localhost:2136" -schema-only
Only data (after schema creation):
./mysql2ydb -mysql "..." -ydb "grpc://localhost:2136" -data-only -batch-size 5000
Selected tables:
./mysql2ydb -mysql "..." -ydb "grpc://localhost:2136" -tables "users,orders"
Fragment-based reading
- For tables with a primary key in MySQL, cursor-based pagination is used:
WHERE (pk > ?) ORDER BY pk LIMIT batch_size. No more than one batch is kept in memory at a time. - If there is no
PRIMARY KEYin MySQL, but the schema in YDB has already been created manually (mode-data-only),LIMIT batch_size OFFSET offsetis applied — see the section Tables without a primary key.
Writing to YDB
By default, data is written via BulkUpsert. For tables with synchronous unique indexes, the utility automatically switches to transactional UPSERT, since BulkUpsert only supports asynchronous indexes.
Queries to YDB are marked as idempotent, so the SDK automatically retries them on network failures. The absence of duplicates when restarting the migration is ensured by overwriting rows by key and saving progress in a service table (see the Features section, the "Resumption" item).
Comparison with YDB Importer
There are other tools for loading data into YDB. YDB Importer is a universal JDBC importer for PostgreSQL, Oracle, SQL Server, and other sources: XML configuration, parallel import, partitioning configuration.
mysql2ydb solves a more specific task — one-to-one migration of a MySQL database using a single binary file, without JVM. For migration only from MySQL with large tables, ~/.my.cnf and the -ydb flag are sufficient; progress is saved in the service table YDB.
| YDB Importer | mysql2ydb | |
|---|---|---|
| Runtime | Java + JDBC drivers + XML configuration | One binary file in Go, native MySQL protocol |
| Scope | Multiple JDBC sources | Only MySQL |
| Default batch size | 1,000 rows (max-batch-rows) |
10,000 rows (-batch-size), with auto-tuning |
| Large tables | Parallel ranges, partition buffers | Cursor-based paginated reading, batch size based on RAM |
| Resumption after failure | Re-running the import | Checkpoint per batch in the service table YDB |
| Setting | JDBC JAR files, XML configuration | ~/.my.cnf and two flags |
YDB Importer is designed for importing from many JDBC sources into a configurable YDB structure (table-name-format, blob-name-format, date conversion modes, optional synthetic keys). Below are the practical differences when importing the same MySQL database with two tools:
| MySQL feature | mysql2ydb (one-to-one copy) | ydb-importer |
|---|---|---|
| Table names | As in MySQL (users, orders, …) |
Rename by pattern, e.g., mysql1/${schema}/${table} → mysql1/mydb/users ( configuration example) |
| Column names | As in MySQL | Sanitization: spaces, ., /, ` → _ |
AUTO_INCREMENT |
BigSerial + ALTER SEQUENCE … START WITH from TABLES.AUTO_INCREMENT |
Regular Int32/Int64; no BigSerial or sequence reset |
INT UNSIGNED, BIGINT UNSIGNED, … |
Uint32 / Uint64 |
JDBC maps integers to signed Int32/Int64 |
TINYINT(1) |
Bool |
TINYINT → Int32 (BOOLEAN → Bool) |
BIT |
Uint64 |
Bool (JDBC Types.BIT) |
Secondary KEY / UNIQUE KEY |
INDEX … GLOBAL ASYNC / GLOBAL UNIQUE SYNC in CREATE TABLE |
Only PRIMARY KEY in DDL; secondary indexes are not created |
ENUM, SET, JSON |
Text |
Text (similar) |
BLOB / BINARY |
Same column, value inline in String (bytes) |
Column becomes Int64 (blob identifier); data is moved to a separate table ${schema}/${table}_${field} with (id, pos, val) rows in 64 KB blocks |
TEXT / CLOB (large text) |
Inline Text in the main table |
Optionally a separate CLOB table (32 K character blocks) or inline Text depending on settings |
| Table without a primary key | Not supported — CREATE TABLE fails with an error; add PRIMARY KEY in MySQL before migration |
A column ydb_synth_key Text is added as PK (SHA-256 of the row); duplicates are collapsed into one row |
DATE |
Date |
Date32 by default (conv-date=DATE_NEW) |
DATETIME / TIMESTAMP |
Timestamp |
Datetime64 / Timestamp64 by default |
TIME |
Text (fallback) |
Int32 (seconds since midnight) |
DECIMAL(p,s) |
Decimal(22, 9) for all |
Decimal(p,s) when allow-custom-decimal=true |
YEAR |
Uint16 |
Not covered in MySQL type tests |
| Partitioning | Only AUTO_PARTITIONING_BY_LOAD |
PARTITION_AT_KEYS, splitting by source partitions, HASH for column store, etc. |
Working with BLOBs
MySQL table attachments (id INT, data MEDIUMBLOB):
Column data in the main table |
Where bytes are stored | |
|---|---|---|
| mysql2ydb | data String |
In the same row |
| ydb-importer | data Int64 (reference) |
Separate table …/attachments_data with chunked rows |
With mysql2ydb, you can perform SELECT data FROM attachments WHERE id = 1 the same way as in MySQL. With ydb-importer, you need to join the main table with an additional blob table.