TABLESAMPLE and SAMPLE
Warning
Supported only for row-oriented tables. Support for column-oriented tables is currently under development.
Building a random sample from the data source specified in FROM
.
TABLESAMPLE
is part of the SQL standard and works as follows:
-
The operating mode is specified:
BERNOULLI
means "slowly, straightforwardly going through all the data, but in a truly random way".SYSTEM
uses knowledge about the physical data storage of data to avoid full data scans, but somewhat sacrificing randomness of the sample.
The data is split into sufficiently large blocks, and the whole data blocks are sampled. For applied calculations on sufficiently large tables, the result may well be consistent.
- The size of the random sample is indicated as a percentage after the operating mode, in parentheses.
- To manage the block size in the
SYSTEM
mode, use theyt.SamplingIoBlockSize
pragma. - Optionally, it can be followed by the
REPEATABLE
keyword and an integer in parentheses to be used as a seed for a pseudorandom number generator.
SAMPLE
is a shorter alias without sophisticated settings and sample size specified as a fraction. It currently corresponds to the BERNOULLI
mode.
Note
In the BERNOULLI
mode, if the REPEATABLE
keyword is added, the seed is mixed with the chunk ID for each chunk in the table. That's why sampling from different tables with the same content might produce different results.
Examples
SELECT *
FROM my_table
TABLESAMPLE BERNOULLI(1.0) REPEATABLE(123); -- one percent of the table
SELECT *
FROM my_table
TABLESAMPLE SYSTEM(1.0); -- about one percent of the table
SELECT *
FROM my_table
SAMPLE 1.0 / 3; -- one-third of the table