SELECT
Retrieves data from a table.
Syntax
[with_clause] SELECT [ALL | DISTINCT | DISTINCTROW] select_expr [[AS] alias], ... [INTO variable [, ...]] [ FROM table_references [WHERE expr] [GROUP BY [{{col_name | expr | position}, ...} | ALL]] | extended_grouping_expr}] [HAVING expr] [ORDER BY [{col_name | expr | position} [ASC | DESC ][ NULLS {FIRST | LAST}], ... | ALL]] [LIMIT {[offset,] row_count | row_count OFFSET offset}] [FOR UPDATE] ] [{ INTO OUTFILE 'file_name' | INTO FS 'destination_directory/file_name' [WITH COMPRESSION] | INTO S3 bucket/target CONFIG configuration_json CREDENTIALS credentials_json [WITH COMPRESSION] | INTO HDFS '<hdfs://<namenode DNS> | <IP address>:<port>/<directory>' [ CONFIG configuration_json ] | INTO GCS bucket/path CONFIG configuration_json CREDENTIALS credentials_json [WITH COMPRESSION] | INTO KAFKA kafka_topic_endpoint [kafka_configuration] [kafka_credentials] | INTO AZURE "container/blob-prefix" CREDENTIALS credentials_json [WITH COMPRESSION] } [format_options] ] [INTO LINK [db name.]connection_name 'backup path'] format_options: csv_options | external_format_options csv_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ] external_format_options: FORMAT PARQUET table_references: table_factor | join_table table_factor: tbl_name [[AS] alias] [sample_ratio_clause] | (subquery) [[AS] alias]
Remarks
The
join_table
clause is documented in JOIN and Subqueries.The
join
andusing
clause is documented in JOIN and USING.CONFIG
andCREDENTIALS
can be specified in either order (CONFIG
followed byCREDENTIALS
orCREDENTIALS
followed byCONFIG
). For configuration examples refer BACKUP DATABASEformat_options
clauses are documented in the Format Options section.sample_ratio_clause
is documented in the SELECT … WITH (SAMPLE_RATIO = <value>) section.with_clause
is documented on the WITH (Common Table Expressions) page.If you are querying against a columnstore table with a FULLTEXT index defined, see MATCH and HIGHLIGHT for syntax and examples on how to do full-text search queries.
Non-aggregated data can also be transformed into a pivot table output format. See PIVOT for syntax and examples.
extended_grouping_expr
clauses include CUBE and ROLLUP. See CUBE and ROLLUP for syntax and examples.A subquery does not require an alias, assuming that removing the alias does not create ambiguity.
In a transaction, you can read from multiple databases.
This command must be run on the master aggregator or a child aggregator node (see Node Requirements for SingleStoreDB Commands).
You can specify a table named
DUAL
as a placeholder table for use with queries that do not reference a table.DUAL
contains one logical row. This convenience is provided for situations where you may be required to include a FROM clause in every SQL statement, even statements that do not need one in order to function. It can also help when porting applications from other database products that supportDUAL
.SELECT CURRENT_DATE from DUAL; **** +--------------+ | CURRENT_DATE | +--------------+ | 2023-04-03 | +--------------+
SELECT CURRENT_DATE from DUAL
is identical toSELECT CURRENT_DATE
.SELECT CURRENT_DATE; **** +--------------+ | CURRENT_DATE | +--------------+ | 2023-04-03 | +--------------+
A column expression can reference a column alias. For example, in the query
SELECT a + b as c, c + 10 FROM t;
, the column expressionc + 10
references the column aliasc
.A
WHERE
clause can reference a column alias. For example, in the querySELECT a + b as c FROM t WHERE c < 100;
, theWHERE
clause references the column aliasc
.A
HAVING
clause can reference a column alias. For example, in the querySELECT a + 50 as b,COUNT(*) FROM t GROUP BY a HAVING b < 1000;
, theHAVING
clause references the column aliasb
.Queries using the
WHERE. . .IN
clause with multiple columns is not supported.SELECT * FROM <table_name_1> WHERE (column_name_1, column_name_2) IN (SELECT column_name_1, column_name_2 FROM <table_name_2>;)
The following table and dataset were used for the examples
CREATE TABLE assets( asset_id INT, asset_type VARCHAR(50), asset_desc VARCHAR(50), asset_value DECIMAL(6,2), emp_id NUMERIC (5)); INSERT INTO assets VALUES('1049','laptop','mac_book_pro','1999.00','14'), ('49', 'cell_phone', 'iphone_12','879.00','102'), ('1100', 'laptop', 'mac_book_pro','1999.00','298'), ('2037', 'laptop', 'mac_book_air_M2','1199.00', '399'), ('58', 'cell_phone', 'iphone_12', '879.00','410'), ('130', 'cell_phone', 'iphone_13', '699','110'), ('210', 'laptop', 'mac_book_pro','2500.00','312'), ('111', 'laptop', 'mac_book_pro','2100.00', '089'), ('099', 'laptop', 'mac_book_air_M1','999','075'), ('140', 'cell_phone', 'iphone_13_pro','999.00', '263'), ('2100', 'laptop', 'mac_book_pro_M2', '2500.00', '691'), ('160', 'cell_phone', 'iphone_14_pro_max','1200.00', '691'), ('2110', 'laptop', 'mac_book_pro_M2', '2500.00', '817'), ('2120', 'laptop', 'mac_book_pro_M2', '2500.00', NULL), ('150', 'cell_phone', 'iphone_14_pro_','1100.00', NULL);
NULLS FIRST
andNULLS LAST
added to anORDER BY
clause sorts NULL values to the beginning or end of the results set. Ascending order is assumed for the non-NULL values.DESC
can be added to the clause to order the non-NULL values in descending order.SELECT * FROM assets ORDER BY emp_id NULLS FIRST; **** +----------+------------+-------------------+-------------+--------+ | asset_id | asset_type | asset_desc | asset_value | emp_id | +----------+------------+-------------------+-------------+--------+ | 2120 | laptop | mac_book_pro_M2 | 2500.00 | NULL | | 150 | cell_phone | iphone_14_pro_ | 1100.00 | NULL | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | 691 | | 160 | cell_phone | iphone_14_pro_max | 1200.00 | 691 | | 2110 | laptop | mac_book_pro_M2 | 2500.00 | 817 | +----------+------------+-------------------+-------------+--------+
SELECT * FROM assets ORDER BY emp_id DESC NULLS FIRST; **** +----------+------------+-------------------+-------------+--------+ | asset_id | asset_type | asset_desc | asset_value | emp_id | +----------+------------+-------------------+-------------+--------+ | 150 | cell_phone | iphone_14_pro_ | 1100.00 | NULL | | 2120 | laptop | mac_book_pro_M2 | 2500.00 | NULL | | 2110 | laptop | mac_book_pro_M2 | 2500.00 | 817 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | 691 | | 160 | cell_phone | iphone_14_pro_max | 1200.00 | 691 | | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | +----------+------------+-------------------+-------------+--------+
SELECT * FROM assets ORDER BY emp_id DESC NULLS LAST; **** +----------+------------+-------------------+-------------+--------+ | asset_id | asset_type | asset_desc | asset_value | emp_id | +----------+------------+-------------------+-------------+--------+ | 2110 | laptop | mac_book_pro_M2 | 2500.00 | 817 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | 691 | | 160 | cell_phone | iphone_14_pro_max | 1200.00 | 691 | | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | | 150 | cell_phone | iphone_14_pro_ | 1100.00 | NULL | | 2120 | laptop | mac_book_pro_M2 | 2500.00 | NULL | +----------+------------+-------------------+-------------+--------+
ORDER BY ALL
sorts by all the columns in the same order they are written. ASC/DESC may be used but it will apply to all the columns.SELECT asset_id, asset_type, asset_desc FROM assets ORDER BY ALL; **** +----------+------------+-------------------+ | asset_id | asset_type | asset_desc | +----------+------------+-------------------+ | 49 | cell_phone | iphone_12 | | 58 | cell_phone | iphone_12 | | 99 | laptop | mac_book_air_M1 | | 111 | laptop | mac_book_pro | | 130 | cell_phone | iphone_13 | | 140 | cell_phone | iphone_13_pro | | 150 | cell_phone | iphone_14_pro_ | | 160 | cell_phone | iphone_14_pro_max | | 210 | laptop | mac_book_pro | | 1049 | laptop | mac_book_pro | | 1100 | laptop | mac_book_pro | | 2037 | laptop | mac_book_air_M2 | | 2100 | laptop | mac_book_pro_M2 | | 2110 | laptop | mac_book_pro_M2 | | 2120 | laptop | mac_book_pro_M2 | +----------+------------+-------------------+
The
GROUP BY
clause is used with aSELECT
statement to organize data intoGROUPS
. Aggregates are applied within those groups. One row of output is produced for each unique combination of values for the specified grouping expressions.SELECT asset_type, SUM(asset_value) AS total_value FROM assets GROUP BY asset_type; **** +------------+-------------+ | asset_type | total_value | +------------+-------------+ | cell_phone | 5756.00 | | laptop | 18296.00 | +------------+-------------+
The
GROUP BY ALL
clause causes the query to group by all non-aggregate expressions in the SELECT list, so all the columns to be grouped by do not have to be entered explicitly.SELECT asset_id, asset_type, asset_desc, SUM(asset_value) FROM assets GROUP BY asset_id, asset_type, asset_desc; SELECT asset_id, asset_type, asset_desc, SUM(asset_value) FROM assets GROUP BY ALL; **** +----------+------------+-------------------+------------------+ | asset_id | asset_type | asset_desc | SUM(asset_value) | +----------+------------+-------------------+------------------+ | 160 | cell_phone | iphone_14_pro_max | 1200.00 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | | 2110 | laptop | mac_book_pro_M2 | 2500.00 | | 140 | cell_phone | iphone_13_pro | 999.00 | | 99 | laptop | mac_book_air_M1 | 999.00 | | 210 | laptop | mac_book_pro | 2500.00 | | 150 | cell_phone | iphone_14_pro_ | 1100.00 | | 58 | cell_phone | iphone_12 | 879.00 | | 49 | cell_phone | iphone_12 | 879.00 | | 1049 | laptop | mac_book_pro | 1999.00 | | 2120 | laptop | mac_book_pro_M2 | 2500.00 | | 1100 | laptop | mac_book_pro | 1999.00 | | 130 | cell_phone | iphone_13 | 699.00 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | | 111 | laptop | mac_book_pro | 2100.00 | +----------+------------+-------------------+------------------+
Below is an S3 example for
SELECT * FROM <TABLE> INTO OUTFILE
using compression.SELECT * FROM <table_name> into S3 's3://<bucket_name>/<filename>' CONFIG '{"region":"us-east-1"}' CREDENTIALS {"aws_access_key_id": "<xxxxxxxx>", "aws_secret_access_key": "<xxxxxxxxxxxxxxx>"}' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' WITH COMPRESSION GZIP;
Aggregations for Expression Syntax
SingleStore supports these Aggregate Functions for expression syntax in SELECT
statements:
All aggregate functions exclude NULLs from their computations. For example, SELECT COUNT(c) FROM t returns the number of rows in t where c is not NULL. SELECT AVG(c) FROM t returns the average of the non-NULL c’s in t.
COUNT(*)
is equivalent to COUNT(1)
. It counts every row matching the query, even rows whose column data contain nothing but NULLs.
SELECT … LIMIT
Syntax
LIMIT {[offset,] row_count | row_count OFFSET offset}
Remarks
The
LIMIT
clause constrains the number of rows returned by theSELECT
statement.Both the arguments must be non-negative integer constants.
The
row_count
specifies the number of rows to return from the beginning of the result set, and theoffset
specifies the offset of the first row to return.The offset of the first row in a table is 0 (not 1).
Examples
SELECT * FROM hrRec; **** +-----------+-----------+----------+--------+ | FirstName | LastName | City | Tenure | +-----------+-----------+----------+--------+ | Adam | Gale | Brooklyn | 40 | | Samantha | Beck | New York | 44 | | Clara | Wakefield | DC | 24 | | Todd | Bridges | DC | 30 | | Ron | Fletcher | New York | 23 | +-----------+-----------+----------+--------+
SELECT * FROM hrRec LIMIT 2; **** +-----------+----------+----------+--------+ | FirstName | LastName | City | Tenure | +-----------+----------+----------+--------+ | Adam | Gale | Brooklyn | 40 | | Samantha | Beck | New York | 44 | +-----------+----------+----------+--------+
SELECT * FROM hrRec LIMIT 1,2; **** +-----------+-----------+----------+--------+ | FirstName | LastName | City | Tenure | +-----------+-----------+----------+--------+ | Samantha | Beck | New York | 44 | | Clara | Wakefield | DC | 24 | +-----------+-----------+----------+--------+
SELECT * FROM hrRec LIMIT 2 OFFSET 1; **** +-----------+-----------+----------+--------+ | FirstName | LastName | City | Tenure | +-----------+-----------+----------+--------+ | Samantha | Beck | New York | 44 | | Clara | Wakefield | DC | 24 | +-----------+-----------+----------+--------+
SELECT … FOR UPDATE
The SELECT ... FOR UPDATE
command is intended to be used inside of a transaction. This command takes write locks on the rows returned from the SELECT
query and the locks are held until the end of the transaction. Other transactions are blocked and cannot write to the locked rows until the locks are released.
We recommend the following when using SELECT ... FOR UPDATE
:
It is recommended to commit or abort the transaction immediately so that the locks are released. Too many locks in a transaction can result in a huge queue of transactions waiting on the locked rows.
To limit the number of rows that are locked, it is recommended to use a column with unique values in the
WHERE
clause of theSELECT
statement, for example thePRIMARY KEY
column.
Example
The following example uses the Orders
table:
DESCRIBE Orders; **** +--------------+-------------+------+------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------------+-------------+------+------+---------+-------+ | OrderNumber | int(11) | NO | PRI | NULL | | | Status | varchar(20) | NO | | NULL | | | CustomerName | char(20) | YES | | NULL | | +--------------+-------------+------+------+---------+-------+
SELECT * FROM Orders; **** +-------------+------------+--------------+ | OrderNumber | Status | CustomerName | +-------------+------------+--------------+ | 1 | Delivered | John | | 3 | In Transit | Bon | | 2 | Delivered | Kerry | | 4 | Delivered | Tom | +-------------+------------+--------------+
The following transaction locks the row where OrderNumber
is 3
, using the FOR UPDATE
clause. The row will remain locked as long as the transaction is open.
BEGIN; SELECT * FROM Orders WHERE OrderNumber = 3 FOR UPDATE; **** +-------------+------------+--------------+ | OrderNumber | Status | CustomerName | +-------------+------------+--------------+ | 3 | In Transit | Bon | +-------------+------------+--------------+
Now, execute the following query in a different connection:
UPDATE Orders SET Status = "Delivered" WHERE OrderNumber=3; **** ERROR 1205 (HY000): Leaf Error (127.0.0.1:3307): Lock wait timeout exceeded; try restarting transaction. Lock owned by connection id 77, query `open idle transaction`
The above query returns an error since the rows are locked by the previous transaction. To remove the lock on the rows, commit or rollback the open transaction. See the Query Errors topic for more information on locked rows.
JOIN and Subqueries
Syntax
SingleStoreDB supports the following JOIN
and subquery syntax for the table_reference
part of SELECT
statements:
join_table: table_reference {LEFT | RIGHT | FULL} [OUTER] JOIN table_factor join_condition | table_reference [INNER | CROSS] JOIN table_factor [join_condition] | table_reference NATURAL {LEFT | RIGHT} [OUTER] JOIN table_factor | table_reference STRAIGHT_JOIN table_factor [join_condition] | table_reference, table_factor join_condition: ON conditional_expr
Remarks
This command must be run on the master aggregator or a child aggregator node (see Node Requirements for SingleStoreDB Commands)
STRAIGHT_JOIN
forces tables to be joined in the order in which they are listed in theFROM
clauseFULL OUTER JOIN
requires the join condition to be an equality. Also, the right argument to a full outer join cannot be a reference table.SingleStoreDB supports joining tables across databases:
CREATE DATABASE test1; USE test1; CREATE TABLE t1(id INT, col1 VARCHAR(10)); CREATE DATABASE test2; USE test2; CREATE TABLE t2(id INT, col1 VARCHAR(10)); SELECT test1.t1.*, test2.t2.* FROM test1.t1 JOIN test2.t2 ON t1.id = t2.id;
Examples
SELECT * FROM my_MemSQL_table WHERE col = 1; SELECT COUNT(*), user_name, page_url from clicks, users, pages -> WHERE clicks.user_id = users.user_id AND pages.page_id = clicks.page_id -> GROUP BY users.user_id, pages.page_id -> ORDER BY COUNT(*) DESC; +- ---------+- ----------+- -------------------------------+ | COUNT(*) | user_name | page_url | +- ---------+- ----------+- -------------------------------+ | 5 | jake | memsql.com | | 2 | john | http://www.singlestore.com/download | | 1 | jake | docs.singlestore.com | | 1 | jake | memsql.com | | 1 | jake | http://www.singlestore.com/download | +- ---------+- ----------+- -------------------------------+ 5 rows in set (0.00 sec)
SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON t1.a = t2.a;
JOIN and USING
Syntax
Using a JOIN
clause with the USING
clause will match only one column when there are more than one columns that match.
SELECT <table1_name>.<column_name> AS <column_name> FROM <table_name> JOIN <table_name> USING (<column_name>);
Remarks
You can utilize the
USING
clause instead of theON
clause inJOIN
operations with an explicitJOIN
clause.
Examples
TABLE 1: CREATE TABLE assets( asset_id int, asset_type varchar(50), asset_desc varchar(50), emp_id numeric (5) ); TABLE 2: CREATE TABLE employees( emp_id numeric (5), emp_name varchar(75) ); SELECT employees.emp_name AS employees, asset_desc as assets, assets.asset_type FROM assets JOIN employees USING (emp_id); **** +------------------------------------+ |employees |assets |asset_type | +------------------------------------+ |T_Willams |macbook pro | laptop | +------------------------------------+ |A_Young |iphone 12 | cell phone | +------------------------------------+ |D_Karras |macbook air | laptop | +------------------------------------+
SELECT … INTO <variable>
SELECT ... INTO variable
statement is used to initialize variables in stored procedures. It allows you to query a single value, a single column, or multiple columns from one or more tables and store the selected values into variable(s).
For information about creating user defined variables for use outside of stored procedures, see User-Defined Variables.
Remarks
The
SELECT ... INTO
statement must return only a single result row.The number of columns/expressions in the
SELECT
query must be the same as the number of variables being initialized in theINTO
list.SELECT ... INTO variable
statements must be used inside MPSQL procedure blocks.The
INTO variable
clause can only be used once inside aSELECT
query.A
SELECT ... INTO
statement cannot be used inside of a sub-select query.The variables in
SELECT ... INTO
statements must be declared with scalar data types.SELECT ... INTO variable
andSELECT ... INTO OUTFILE/S3/KAFKA
cannot be used in the same query.If the
SELECT ... INTO
statement returns0
rows, SingleStore throws an error:ER_INTO_VARIABLES_NO_ROWS
. To accommodate for this error, you can specify anEXCEPTION
in theDECLARE
block of a stored procedure (see Example 3 below).
Examples
Note: The following examples use the hrRec table.
DESC hrRec; **** +-----------+-------------+------+------+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+-------------+------+------+---------+-------+ | FirstName | varchar(20) | YES | | NULL | | | LastName | varchar(20) | YES | | NULL | | | City | varchar(20) | YES | | NULL | | | Tenure | int(11) | YES | | NULL | | +-----------+-------------+------+------+---------+-------+
Example 1
The following example queries multiple columns from a single row and stores them in variables.
DELIMITER // CREATE OR REPLACE PROCEDURE into_var () AS DECLARE fname VARCHAR(20); lname VARCHAR(20); city VARCHAR(20); ten INT; BEGIN SELECT FirstName, LastName, City, Tenure INTO fname, lname, city, ten FROM hrRec WHERE Tenure > 40; ECHO SELECT CONCAT(fname, " ", lname) AS "Name", city AS "City", ten AS "Tenure"; END // DELIMITER ; CALL into_var(); **** +---------------+----------+--------+ | Name | City | Tenure | +---------------+----------+--------+ | Samantha Beck | New York | 44 | +---------------+----------+--------+
Example 2
The following example queries multiple aggregate functions and stores their values in variables.
DELIMITER // CREATE OR REPLACE PROCEDURE into_var () AS DECLARE row_c INT; sum_t INT; BEGIN SELECT COUNT(*), SUM(Tenure) INTO row_c, sum_t FROM hrRec; ECHO SELECT row_c AS "Row Count", sum_t AS "Tenure"; END // DELIMITER ; CALL into_var(); **** +-----------+--------+ | Row Count | Tenure | +-----------+--------+ | 5 | 170 | +-----------+--------+
Alternatively, you can query the values in a dynamic query and store them in variables, like
DELIMITER // CREATE OR REPLACE PROCEDURE into_var () AS DECLARE row_c INT; sum_t INT; qry VARCHAR(50); BEGIN qry = "SELECT COUNT(*), SUM(Tenure) FROM hrRec"; EXECUTE IMMEDIATE qry INTO row_c, sum_t; ECHO SELECT row_c AS "Row Count", sum_t AS "Tenure"; END // DELIMITER ; CALL into_var(); **** +-----------+--------+ | Row Count | Tenure | +-----------+--------+ | 5 | 170 | +-----------+--------+
Example 3
The following example shows how to assign a value to a variable if the SELECT ... INTO
statement returns 0
rows.
DELIMITER // CREATE OR REPLACE PROCEDURE cityCount () AS DECLARE cnum INT; BEGIN SELECT City INTO cnum FROM hrRec WHERE Tenure > 50; ECHO SELECT cnum; END // DELIMITER ; CALL cityCount(); **** ERROR 2439 (HY000): Unhandled exception Type: ER_INTO_VARIABLES_NO_ROWS (2439) Message: Query with 'INTO VARIABLES' clause returned zero rows whereas expected 1 row
Add an exception to the stored procedure.
DELIMITER // CREATE OR REPLACE PROCEDURE cityCount () AS DECLARE cnum INT; BEGIN SELECT City INTO cnum FROM hrRec WHERE Tenure > 50; ECHO SELECT cnum; EXCEPTION WHEN ER_INTO_VARIABLES_NO_ROWS THEN cnum = 'No city found!'; ECHO SELECT cnum; END // DELIMITER ; CALL cityCount(); **** +----------------+ | cnum | +----------------+ | No city found! | +----------------+
You can also add an exception for a scenario where the SELECT ... INTO
statement returns more than one row. For example,
EXCEPTION WHEN ER_SUBQUERY_NO_1_ROW THEN ECHO SELECT 'SELECT ... INTO statement returned more than 1 row.';
Nested Scalar Sub-Selects
SELECT
statements can be nested in SingleStoreDB queries.
SELECT ... [SELECT ...[SELECT [...]]]
Remarks
For scalar sub-selects, sub-select queries must not return more than one row.
The maximum allowed depth of nested sub-select queries is 40.
Sub-selects are not supported inside
GROUP BY/ORDER BY/HAVING
clauses, for nested sub-select queries of depth > 2.
Examples
The following examples show the use of nested sub-selects.
SELECT cust_id FROM customers WHERE EXISTS ( SELECT order_id FROM orders WHERE order_id IN ( SELECT id FROM transaction WHERE count > 5) );
DELETE FROM records WHERE id = ( SELECT order_id FROM orders WHERE order_date > ( SELECT CURRENT_DATE() + 30) );
SELECT ... ALL DISTINCT DISTINCTROW
Return rows with or without duplicates.
Syntax
SELECT [ALL | DISTINCT | DISTINCTROW] FROM table_reference
Remarks
The
ALL
clause returns all the matching rows including duplicates.The
DISTINCT
orDISTINCTROW
clause returns the matching rows but eliminates the duplicate rows from the output.DISTINCTROW
is a synonym ofDISTINCT
.
Examples
SELECT * FROM Product; +------------+------------+----------+ | Product_id | Brand_name | City | +------------+------------+----------+ | 1 | Nike | London | | 3 | Nike | New York | | 2 | Adidas | Paris | | 3 | Puma | Spain | +------------+------------+----------+ SELECT ALL Brand_name FROM Product; +------------+ | Brand_name | +------------+ | Nike | | Nike | | Adidas | | Puma | +------------+ SELECT DISTINCT Brand_name FROM Product; +------------+ | Brand_name | +------------+ | Nike | | Adidas | | Puma | +------------+
SELECT … INTO OUTFILE
SELECT ... INTO OUTFILE
formats and writes the results of a SELECT
query to a text file. The format_options
are similar to the parsing options used with LOAD DATA
. See Format Options for more information.
format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]
Remarks
If a relative path that that does not start with
/
is specified, SingleStoreDB writes the file to the directory specified in the global variabletmpdir
(the same asdatadir
by default). To specify another location, enter the absolute path to the file as thefile_name
parameter.The default text formatting, used when the user omits the
FIELDS
andLINES
blocks, is to separate values with tabs (\t
) and rows with newlines (\n
).
Example
The following query writes the contents of table_name
to a file in the home directory of username. The file is formatted as a comma-separated values (CSV) file.
SELECT * FROM table_name INTO OUTFILE '/home/username/file_name.csv' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
SELECT ... INTO AZURE
SELECT ... INTO AZURE
performs a distributed SELECT
into an Azure container. The CREDENTIALS
clause is required and its value is defined in credentials_json
as follows:
credentials_json: '{"account_name": "your_account_name_here", "account_key": "your_account_key_here" }'
The WITH COMPRESSION GZIP
clause, if specified, must be included after the CREDENTIALS
clause. If WITH COMPRESSION GZIP
is specified, the results of the SELECT
query are written to the Azure container in compressed gzip files. A compressed file with the .gz extension is created for each partition.
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause. Both clauses are equivalent. However, if other compression formats are supported in the future, WITH COMPRESSION
may use a compression format other than gzip.
format_options
, if specified, must be included after the CREDENTIALS
clause, or after the WITH COMPRESSION clause (if it is specified). The format_options
are similar to the parsing options used with LOAD DATA
. See Format Options for more information.
format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]
As an alternative to using a SELECT ... INTO AZURE
statement where you specify the CREDENTIALS
clause, you can use a SELECT ... INTO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.
SELECT … INTO HDFS
SELECT INTO ... HDFS
writes the result of a SELECT
query to HDFS. The configuration options for SELECT INTO ... HDFS
are similar to the HDFS Pipelines syntax.
Example
The following example demonstrates how to write the contents of a table to HDFS at a specified path.
SELECT * FROM stockTo INTO HDFS 'hdfs://hadoop-namenode:8020/stock_dir/records_file.csv'
SELECT … INTO LINK
SELECT ... INTO LINK
writes the results of a SELECT
query to S3, GCS, HDFS, or Kafka using a connection link. To use this command, you only need to know the connection link name, not the connection details and configuration. However, you require the SHOW LINK
permission, provided by your administrator, to use a connection link.
Example
The following example writes the contents of the table t1
, to the S3 bucket at the specified path, using the S3 connection link S3con
stored in the db1
database:
USE db1; SELECT * FROM t1 INTO LINK S3con 'testing/output';
Note: The connection link S3Con
should already exist in db1
. Refer to the CREATE LINK topic for details on creating a connection link.
SELECT … INTO FS
SELECT ... INTO FS
works similarly to SELECT INTO OUTFILE
, except that if the SELECT
logic determines that the results can be computed in parallel (e.g. there is no order by
clause, etc.), the query is pushed down to each leaf and executed on each partition, and a separate file is written for each partition. SELECT INTO OUTFILE
always writes to a single file.
When writing to multiple files, the file names will be:
destination_directory/file_name_0 destination_directory/file_name_1 destination_directory/file_name_2 etc.
Otherwise:
destination_directory/file_name
If WITH COMPRESSION GZIP
is specified, the results of the SELECT
query are written to the filesystem in compressed gzip files. A compressed file with the .gz extension is created for each partition.
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause. Both clauses are equivalent. However, if other compression formats are supported in the future, WITH COMPRESSION
may use a compression format other than gzip.
The format_options
are similar to the parsing options used with LOAD DATA
. See Format Options for more information.
format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]
Remarks
By default, SingleStoreDB writes the files to the directory specified in the global variable
tmpdir
(The same asdatadir
by default). To specify another location, enter the absolute path to the file as thedestination_directory
parameter.The default text formatting, used when the user omits the
FIELDS
andLINES
blocks, is to separate values with tabs (\t
) and rows with newlines (\n
).
Example
The following query writes the contents of table_name
to a set of files in the /tmp
directory on each leaf node (/tmp/a_0
, /tmp/a_1
, /tmp/a_2
, etc.). The files are comma-separated value (CSV) with one output file per partition (a_0
is for partition 0, a_1
is for partition 1, etc.).
SELECT * FROM table_name INTO FS '/tmp/a' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
SELECT … INTO S3
SELECT ... INTO S3
performs a distributed select into a S3 bucket. The CREDENTIALS
clause is required and its value is defined in credentials_json
as follows.
configuration_json: '{"region":"your_region" [, "multipart_chunk_size_mb":<size_in_MB>]}' credentials_json: '{"aws_access_key_id": "replace_with_your_access_key_id", "aws_secret_access_key": "replace_with_your_secret_access_key", ["aws_session_token": "replace_with_your_temp_session_token",] ["role_arn":"replace_with_your_role_arn"] }'
The WITH COMPRESSION GZIP
clause, if specified, must be included after the CREDENTIALS
clause. If WITH COMPRESSION GZIP
is specified, the results of the SELECT
query are written to the S3 bucket in compressed gzip files. A compressed file with the .gz extension is created for each partition.
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause. Both clauses are equivalent. However, if other compression formats are supported in the future, WITH COMPRESSION
may use a compression format other than gzip.
format_options
, if specified, must be included after the CREDENTIALS
clause, or after the WITH COMPRESSION clause (if it is specified). The format_options
are similar to the parsing options used with LOAD DATA
. See Format Options for more information.
format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]
Remarks
Warning
The S3 bucket needs to be created before running this command.
The
multipart_chunk_size_mb
must be in the range of [5..500]. By default, the chunk size is5 MB
. A larger chunk size allows users to upload large files without going over Amazon’s limitation on maximum number of parts per upload. Although, a larger chunk size increases the chance of a network error during the upload to S3. If a chunk fails to upload, SingleStoreDB retries uploading it until the limit set on the number of retries by AWS is reached.Note
Each partition will use “multipart_chunk_size_mb” MB(s) of additional memory.
The output of
SELECT ... INTO S3
is stored with the content type binary/octet-stream in the S3 bucket.If the insert select logic determines that the
SELECT ... INTO S3
query can be run as distributed, the query will be pushed down to each leaf and executed on each partition. The name of each object will be:<bucket/target>_<partition ID>
If the insert select logic determines that the
SELECT ... INTO S3
query can only be on the aggregator because it contains aggregations, ORDER BY, GROUP BY, etc. then the query will be run on each leaf but the result will be collected on the aggregator and then output to S3. The object name will just be:<bucket/target>
The
SELECT
query will validate if the<bucket/target>
or<bucket/target>_
already exists on the S3 bucket first and fail if any of the object(s) already exist.As an alternative to using a
SELECT ... INTO S3
statement where you specify theCONFIG
andCREDENTIALS
clauses, you can use aSELECT ... INTO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.
Examples
The following simple example shows how to save all rows in table t1 to an S3 bucket using an AWS access key.
SELECT * FROM t1 INTO S3 'testing/output' CONFIG '{"region":"us-east-1"}' CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}'
The following example saves the result set of a SELECT
query with a GROUP BY
clause and sends the file in chunks to an S3 bucket using an Amazon Resource Name (ARN) for AWS Identity and Access Management (IAM).
SELECT t1.a, t2.a FROM t1, t2 WHERE t1.a = t2.a GROUP BY t1.a INTO S3 'bucket_name/file_name' CONFIG '{"region":"us-east-1", "multipart_chunk_size_mb":100}' CREDENTIALS '{"role_arn": "arn:aws:iam::<AccountID>:role/EC2AmazonS3FullAccess"}'
The following example uses the format options to output the data in CSV format.
SELECT * FROM t INTO S3 'tmp/a' CONFIG '{"region":"us-east-1"}' CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
SELECT … INTO GCS
SELECT ... INTO GCS
performs a distributed select into a Google Cloud Storage (GCS) bucket.The CONFIG
clause is not required, and can be excluded or left empty as defined in the configuration_json
that follows. The CREDENTIALS
clause is required and its value is defined in the credentials_json
that follows.
configuration_json: '{}' credentials_json: '{"access_id": "replace_with_your_google_access_key", "secret_key": "replace_with_your_google_secret_key" }'
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause. Both clauses are equivalent. However, if other compression formats are supported in the future, WITH COMPRESSION
may use a compression format other than gzip.
format_options
, if specified, must be included after the CREDENTIALS
clause, or after the WITH COMPRESSION clause (if it is specified). The format_options
are similar to the parsing options used with LOAD DATA
. See Format Options for more information.
format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]
Remarks
Unlike other filesystem options, with
SELECT ... INTO GCS
, data written by a single partition is not divided into chunks. However, there may still be multiple partitions and thus multiple files (one per partition) will appear in GCS.The maximum object size supported by GCS is 5 TB; this limit also applies to
SELECT ... INTO GCS
.The
CONFIG
clause may optionally specify an endpoint_url.The
CREDENTIALS
clause is required.We support only HMAC keys.
The
CREDENTIALS
clause should be a JSON object with two fields:
access_id
: usually a 24 or 60 character alphanumeric string, which is linked to the Google account, typically all uppercase and starts with GOOG
.
secret_key
: usually a 40 character Base-64 encoded string that is linked to a specific access_id
.
As an alternative to using a SELECT ... INTO GCS
statement where you specify the CONFIG
and CREDENTIALS
clauses, you can use a SELECT ... INTO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.
Examples
The following simple example shows how to save all rows in table table_name
to a GCS bucket using a Google access key, and outputs them as a CSV.
CREATE TABLE table_name (column_name INT); INSERT INTO table_name VALUES (1), (2), (3); SELECT * FROM table_name INTO GCS 'bucket/path' CREDENTIALS '{"access_id": "replace_with_your_google_access_key", "secret_key": "replace_with_your_google_secret_key"}' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
SELECT … INTO KAFKA …
SELECT ... INTO KAFKA ...
runs a SELECT
query, constructs a Kafka message for each row in the result set, and publishes the messages to a Kafka topic.
When SELECT ... INTO KAFKA ...
constructs a Kafka message, it includes every column value in the result set’s row and separates the column values by a delimiter. To configure the delimiter and other message formatting settings, use the Format Options clauses.
kafka_configuration: CONFIG 'string' kafka_credentials: CREDENTIALS 'string' format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [TERMINATED BY 'string'] [STARTING BY 'string'] ] kafka_topic_endpoint: host:port[, ...]/topic
Arguments
kafka_configuration
Optional. The Kafka configuration properties, specified in JSON format, that are used while SingleStoreDB publishes messages to a topic. The configuration properties are the same properties that are stored in the server.properties
file on each Kafka broker.
kafka_credentials
Optional. The credentials, in JSON format, used to connect to Kafka.
format_options
Optional. Clauses that specify how a Kafka message is formatted when it is constructed from a row in the SELECT
result set. See the Format Options section for more information.
See an example of how to use the clauses.
kafka_topic_endpoint
The list of Kafka brokers, followed by the topic to which SingleStoreDB will publish messages. For each broker in the list, specify its host and port.
Remarks
When possible, SingleStoreDB queries leaf nodes directly, bypassing the aggregator. This allows SingleStoreDB to send data directly from the leaves to Kafka partitions.
SingleStoreDB constructs a Kafka message as an array of bytes.
As an alternative to using a
SELECT ... INTO KAFKA
statement where you specify theCONFIG
andCREDENTIALS
clauses, you can use aSELECT ... INTO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.
Examples
Example: Specifying the Kafka Message Format
The following query uses the FIELDS
and LINES
clauses to format the Kafka messages that are constructed from the SELECT
result set.
SELECT col1, col2, col3 FROM t ORDER BY col1 INTO KAFKA 'host.example.com:9092/test-topic' FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY "\t" LINES TERMINATED BY '}' STARTING BY '{';
Suppose the result set returned by SELECT col1, col2, col3 FROM t ORDER BY col1
is:
+------+------+------+ | col1 | col2 | col3 | +------+------+------+ | a | b | c | | d | e | f | | g | h\ti | j | +------+------+------+
This result set will be converted to three Kafka messages having the following format:
Message 1: {a,b,c}
Message 2: {d,e,f}
Message 3: {g,h<tab character>i,j}
<tab character>
will be replaced by the number of spaces that your tab is set to.
Note
If a SELECT ... INTO ... KAFKA
query does not include the FIELDS...
and LINES..
. clauses, the default Kafka message formatting settings are used. The default settings would look something like this:
SELECT col1, col2, col3 FROM t ORDER BY col1 INTO KAFKA 'host.example.com:9092/test-topic' FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' STARTING BY '';
Example: Using the CONFIG
and CREDENTIALS
Clauses
The following example uses the CONFIG
and CREDENTIALS
clauses.
SELECT text FROM t INTO KAFKA 'host.example.com:9092/test-topic' CONFIG '{"security.protocol": "ssl", "ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem", "ssl.key.location": "/var/private/ssl/client_memsql_client.key", "ssl.ca.location": "/var/private/ssl/ca-cert.pem"}' CREDENTIALS '{"ssl.key.password": "abcdefgh"}'
For more information on what settings to include in the CONFIG
and CREDENTIALS
clauses, see Securely Connect to Kafka.
Example: Specifying One Kafka Broker
The following example imports the data in the column text
from table t
into the Kafka topic test-topic
. One Kafka broker is specified; it is located on port 9092
at host.example.com
.
SELECT text FROM t INTO KAFKA 'host.example.com:9092/test-topic'
Example: Specifying Multiple Kafka Brokers
The following example imports the data in the column text
from table t
into the Kafka topic test-topic
. Three Kafka brokers are specified, located on port 9092
, at host1.example.com
, host2.example.com
and host3.example.com
.
SELECT text FROM t INTO KAFKA 'host1.example.com:9092,host2.example.com:9092,host3.example.com:9092/test-topic'
SELECT INTO ... FORMAT PARQUET
SELECT INTO ... FORMAT PARQUET
allows SingleStoreDB data to be exported into Apache Parquet (Parquet) format for the given SQL type. See Apache Parquet documentation for detailed information.
Some SQL types may not fit into a particular Parquet type. This is dependent on the actual stored values for that type. For data that does not fit, the user will receive suggestions for the appropriate type for storing data in Parquet format.
Syntax
FORMAT PARQUET
Remarks
The
FORMAT PARQUET
clause will go at the end of theSELECT
statement.Time and date types are stored in the engine timezone without adjusting to UTC. Below are some examples of the differences between SQL and Parquet time and date formats. If any value of SQL TIME(6) is outside the range of the Parquet TIME, the following error message may occur:
can not save SQL TIME(6) … because parquet's TIME is within a single day (hour should be between 0 and 23); consider casting this column to BIGINT
If any value of TIMESTAMP is zero, which is outside the range of the Parquet TIMESTAMP, the following error message may occur:
can not be saved as TIMESTAMP with millisecond precision because zero value is outside the scope of the parquet type; consider casting this column to INT64
Examples
The examples below have been shortened to illustrate the use of the Parquet format. The data ingestion, creation of table2, and start pipeline are not shown.
Example 1
Creates a table.
CREATE TABLE table1 (f_CHAR_2_5 CHAR(2), f_LONGTEXT4 LONGTEXT NOT NULL, f_DATETIME_6_3 DATETIME(6), f_NUMERIC_27_13__UNSIGNED2 NUMERIC(27,13) UNSIGNED, f_TINYINT1 TINYINT);
Uses the Parquet format.
SELECT * FROM table1 into fs ‘/tmp/parquet_files’ FORMAT PARQUET;
Creates the pipeline using Parquet syntax.
CREATE PIPELINE pipe1 AS LOAD DATA fs ‘/tmp/parquet_files’ INTO TABLE table2 FORMAT PARQUET (f_CHAR_2_5 <- %::f_CHAR_2_5, f_LONGTEXT4 <- %::f_LONGTEXT4, @f_DATETIME_6_3 <- %::f_DATETIME_6_3, f_NUMERIC_27_13__UNSIGNED2 <- %::f_NUMERIC_27_13__UNSIGNED2, f_TINYINT1 <- %::f_TINYINT1 ) SET f_DATETIME_6_3=DATE_ADD('1970-01-01', INTERVAL @f_DATETIME_6_3 MICROSECOND);
Example 2
Creates a table.
CREATE TABLE table1 (f_REAL3 REAL NOT NULL, f_TIME2 TIME, f_ENUM__small___medium___large__1 ENUM('small','medium','large'));
Uses the Parquet format.
SELECT * FROM table1 into fs '/tmp/parquet_files' FORMAT PARQUET;
Creates the pipeline using Parquet syntax.
CREATE PIPELINE pipe2 AS LOAD DATA fs '/tmp/parquet_files' INTO TABLE table2 FORMAT PARQUET (f_REAL3 <- %::f_REAL3, @f_TIME2 <- %::f_TIME2, f_ENUM__small___medium___large__1 <- %::f_ENUM__small___medium___large__1 ) SET f_TIME2=sec_to_time(@f_TIME2/1000);
Example 3
Creates a table.
CREATE ROWSTORE TABLE table1 (f_TEXT5 TEXT NOT NULL, f_GEOGRAPHY4 GEOGRAPHY NOT NULL, f_VARCHAR_200_3 VARCHAR(200), f_DECIMAL_27_13_2 DECIMAL(27,13), f_INT_UNSIGNED1 INT UNSIGNED);
Uses the Parquet format.
SELECT * FROM table1 INTO OUTFILE '/tmp/parquet_files3' FORMAT PARQUET;
Create Link.
CREATE LINK p_link_20_56_14_557666 AS fs CREDENTIALS '{}' CONFIG '{}';
Creates the pipeline using Parquet syntax.
CREATE PIPELINE pipe3 AS LOAD DATA LINK p_link_20_56_14_557666 '/tmp/parquet_files3' INTO TABLE table2 FORMAT PARQUET (f_TEXT5 <- %::f_TEXT5, f_GEOGRAPHY4 <- %::f_GEOGRAPHY4, f_VARCHAR_200_3 <- %::f_VARCHAR_200_3, f_DECIMAL_27_13_2 <- %::f_DECIMAL_27_13_2, f_INT_UNSIGNED1 <- %::f_INT_UNSIGNED1);
Format Options
format_options: [{FIELDS | COLUMNS} [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ]
FIELDS TERMINATED BY 'string'
: The delimiter to use between each column value in the message. If not specified, defaults to\t
.FIELDS ENCLOSED BY 'char'
: Prefixes and suffixes each column value of the message with the “char” value. If not specified, defaults to nothing. TheOPTIONALLY
keyword does not affect the behavior of this option; it exists to maintain compatibility with MySQL.FIELDS ESCAPED BY 'char'
: Translates the escape sequence in the “char” value, instead of using the literal “char” value. Translation is done for all occurrences of the “char” value in the message. If not specified, defaults to\\
.LINES TERMINATED BY 'string'
: A string to insert at the end of the message. If not specified, defaults to\n
.LINES STARTING BY 'string'
: A string to insert at the beginning of the message. If not specified, defaults to nothing.
SELECT … WITH (SAMPLE_RATIO = <value>)
SELECT ... WITH (SAMPLE_RATIO = <value>)
retrieves a random sample of data from a table (specifically, a Bernoulli sample).
sample_ratio_clause: WITH (SAMPLE_RATIO = ratio)
Argument
ratio
A decimal between 0.0 and 1.0 specifying the probability that each row will be retrieved from the table. The probability is applied to each row independently.
Remarks
When you run multiple
SELECT
statements against the same table usingWITH (SAMPLE_RATIO = <value>)
, you will receive similar results. This is the case even when your sample ratio is different in yourSELECT
statements. For example, the sample returned for a sample ratio of0.25
is often a subset of the sample ratio returned for0.5
.Using a sample ratio that is one divided by a power of two allows
WITH (SAMPLE_RATIO = <value>)
to run more efficiently.You can use
WITH (SAMPLE_RATIO = <value>)
in aSELECT
subquery.
Warning
You can use WITH (SAMPLE_RATIO = <value>)
with any table. It operates most efficiently when used with a rowstore table whose primary key is not a hash key, or a columnstore table with AUTOSTATS_SAMPLING
enabled (the default). Otherwise, it will scan all rows, or all rows in a range, to do sampling.
Examples
Simple SELECT
statements
The following example populates a table with the integers 1
to 10
and selects two random samples.
DROP DATABASE IF EXISTS singlestore_docs_example; CREATE DATABASE singlestore_docs_example; USE singlestore_docs_example; CREATE TABLE example_table(num INT); INSERT INTO example_table VALUES (5), (6), (10), (11), (12), (14), (20), (24), (33), (40);
The query below returns a random sample of the num
column. Each row in the sample has a seventy percent probability of being returned.
SELECT num FROM example_table WITH (SAMPLE_RATIO = 0.7) ORDER BY num;
Example output follows. You will likely see different results, as the query returns a random sample.
+------+ | num | +------+ | 6 | | 11 | | 14 | | 20 | | 24 | | 33 | +------+
The query below returns the average of a random sample of the num
column. Each row in the sample has an eighty percent probability being included in the sample and therefore in the AVG
computation.
SELECT AVG(num) FROM example_table WITH (SAMPLE_RATIO = 0.8);
Example output follows. You will likely see different results, as the query returns a random sample.
+----------+ | AVG(num) | +----------+ | 18.1111 | +----------+
Using WITH (SAMPLE_RATIO = <value>)
with SELECT
Subqueries
You may use WITH (SAMPLE_RATIO = <value>)
as part of a SELECT
subquery, as shown in the following example.
CREATE TABLE example_table_2 AS SELECT * FROM example_table WITH (SAMPLE_RATIO = 0.6); SELECT num FROM example_table2 ORDER BY num;
Example output follows. You will likely see different results, as the query returns a random sample.
+------+ | num | +------+ | 10 | | 12 | | 14 | | 24 | | 33 | | 40 | +------+
Using WITH (SAMPLE_RATIO = <value>)
with Joins
You may use WITH (SAMPLE_RATIO = <value>)
in a SELECT
statement that contains a join. However, you should do so carefully to ensure the join yields the results you intend. For example, consider joining a CUSTOMER
table to an ORDER
table using the customer_id
field in the ORDER
table. The placement of the WITH (SAMPLE_RATIO = <value>)
clause within the join will greatly impact the result, as shown below.
The following query selects a sample of customers and returns the orders for those customers only.
SELECT c.name, o.order_id, o.order_total FROM customer WITH (SAMPLE_RATIO = 0.4) c JOIN order o ON c.customer_id = o.customer_id
The following query selects a sample of all orders and returns only those orders along with their associated customers.
SELECT c.name, o.order_id, o.order_total FROM customer c JOIN order o WITH (SAMPLE_RATIO = 0.4) ON c.customer_id = o.customer_id