SELECT
On this page
Retrieves data from a table.
Syntax
[with_clause]SELECT[ALL | DISTINCT | DISTINCTROW]select_expr [[AS] alias], ...[COLLATE collation][INTO variable [, ...]][ FROM table_references[WHERE expr][GROUP BY [{{col_name | expr | position}, ...} | ALL]]| extended_grouping_expr}][HAVING expr][ORDER BY [{col_name | expr | position} [ASC | DESC ][ NULLS {FIRST | LAST}], ... | ALL]][LIMIT {[offset,] row_count | row_count OFFSET offset}][FOR UPDATE]][{ INTO OUTFILE 'file_name' |INTO FS 'destination_directory/file_name' [WITH COMPRESSION] |INTO S3 bucket/target CONFIG configuration_json CREDENTIALS credentials_json [WITH COMPRESSION] |INTO HDFS '<hdfs://<namenode DNS> | <IP address>:<port>/<directory>' [ CONFIG configuration_json ] |INTO GCS bucket/path CONFIG configuration_json CREDENTIALS credentials_json [WITH COMPRESSION] |INTO KAFKA kafka_topic_endpoint [kafka_configuration] [kafka_credentials] [KAFKA KEY <column_name>] |INTO AZURE "container/blob-prefix" CREDENTIALS credentials_json [WITH COMPRESSION]}[format_options]][INTO LINK [db name.]connection_name 'backup path']format_options:csv_options | external_format_optionscsv_options:[{FIELDS | COLUMNS}[TERMINATED BY 'string'][[OPTIONALLY] ENCLOSED BY 'char'][ESCAPED BY 'char']][LINES[STARTING BY 'string'][TERMINATED BY 'string']]external_format_options:FORMAT PARQUETtable_references:table_factor | join_tabletable_factor:tbl_name [[AS] alias] [sample_ratio_clause]| (subquery) [[AS] alias]
Remarks
-
The
join_
clause is documented in JOIN and Subqueries.table -
The
join
andusing
clause is documented in JOIN and USING. -
The
LATERAL
keyword can be placed before any subquery (also known as a lateral derived table) used as an inner table within a join.Refer to Lateral Join for details. -
COLLATE
specifies the collation of a string literal.SELECT "My string literal" COLLATE utf8mb4_unicode_ci; -
CONFIG
andCREDENTIALS
can be specified in either order (CONFIG
followed byCREDENTIALS
orCREDENTIALS
followed byCONFIG
).For configuration examples refer BACKUP DATABASE -
format_
clauses are documented in the Format Options section.options -
sample_
is documented in the SELECT … WITH (SAMPLE_ratio_ clause RATIO = <value>) section. -
with_
is documented on the WITH (Common Table Expressions) page.clause -
If you are querying against a columnstore table with a FULLTEXT index defined, see MATCH and HIGHLIGHT for syntax and examples on how to do full-text search queries.
-
Non-aggregated data can also be transformed into a pivot table output format.
See PIVOT for syntax and examples. -
extended_
clauses include CUBE and ROLLUP.grouping_ expr See CUBE and ROLLUP for syntax and examples. -
A subquery does not require an alias, assuming that removing the alias does not create ambiguity.
-
In a transaction, you can read from multiple databases.
-
You can specify a table named
DUAL
as a placeholder table for use with queries that do not reference a table.DUAL
contains one logical row.This convenience is provided for situations where you may be required to include a FROM clause in every SQL statement, even statements that do not need one in order to function. It can also help when porting applications from other database products that support DUAL
.SELECT CURRENT_DATE from DUAL;+--------------+ | CURRENT_DATE | +--------------+ | 2023-04-03 | +--------------+
SELECT CURRENT_
is identical toDATE from DUAL SELECT CURRENT_
.DATE SELECT CURRENT_DATE;+--------------+ | CURRENT_DATE | +--------------+ | 2023-04-03 | +--------------+
-
A column expression can reference a column alias.
For example, in the query SELECT a + b as c, c + 10 FROM t;
, the column expressionc + 10
references the column aliasc
. -
A
WHERE
clause can reference a column alias.For example, in the query SELECT a + b as c FROM t WHERE c < 100;
, theWHERE
clause references the column aliasc
. -
A
HAVING
clause can reference a column alias.For example, in the query SELECT a + 50 as b,COUNT(*) FROM t GROUP BY a HAVING b < 1000;
, theHAVING
clause references the column aliasb
. -
Queries using the
WHERE.
clause with multiple columns is not supported.. . IN SELECT * FROM <table_name_1> WHERE (column_name_1, column_name_2) IN (SELECT column_name_1, column_name_2 FROM <table_name_2>;) -
The following table and dataset were used for the examples
CREATE TABLE assets(asset_id INT,asset_type VARCHAR(50),asset_desc VARCHAR(50),asset_value DECIMAL(6,2),emp_id NUMERIC (5));INSERT INTO assets VALUES('1049','laptop','mac_book_pro','1999.00','14'),('49', 'cell_phone', 'iphone_12','879.00','102'),('1100', 'laptop', 'mac_book_pro','1999.00','298'),('2037', 'laptop', 'mac_book_air_M2','1199.00', '399'),('58', 'cell_phone', 'iphone_12', '879.00','410'),('130', 'cell_phone', 'iphone_13', '699','110'),('210', 'laptop', 'mac_book_pro','2500.00','312'),('111', 'laptop', 'mac_book_pro','2100.00', '089'),('099', 'laptop', 'mac_book_air_M1','999','075'),('140', 'cell_phone', 'iphone_13_pro','999.00', '263'),('2100', 'laptop', 'mac_book_pro_M2', '2500.00', '691'),('160', 'cell_phone', 'iphone_14_pro_max','1200.00', '691'),('2110', 'laptop', 'mac_book_pro_M2', '2500.00', '817'),('2120', 'laptop', 'mac_book_pro_M2', '2500.00', NULL),('150', 'cell_phone', 'iphone_14_pro_','1100.00', NULL); -
NULLS FIRST
andNULLS LAST
added to anORDER BY
clause sorts NULL values to the beginning or end of the results set.Ascending order is assumed for the non-NULL values. DESC
can be added to the clause to order the non-NULL values in descending order.SELECT * FROM assets ORDER BY emp_id NULLS FIRST;+----------+------------+-------------------+-------------+--------+ | asset_id | asset_type | asset_desc | asset_value | emp_id | +----------+------------+-------------------+-------------+--------+ | 2120 | laptop | mac_book_pro_M2 | 2500.00 | NULL | | 150 | cell_phone | iphone_14_pro_ | 1100.00 | NULL | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | 691 | | 160 | cell_phone | iphone_14_pro_max | 1200.00 | 691 | | 2110 | laptop | mac_book_pro_M2 | 2500.00 | 817 | +----------+------------+-------------------+-------------+--------+
SELECT * FROM assets ORDER BY emp_id DESC NULLS FIRST;+----------+------------+-------------------+-------------+--------+ | asset_id | asset_type | asset_desc | asset_value | emp_id | +----------+------------+-------------------+-------------+--------+ | 150 | cell_phone | iphone_14_pro_ | 1100.00 | NULL | | 2120 | laptop | mac_book_pro_M2 | 2500.00 | NULL | | 2110 | laptop | mac_book_pro_M2 | 2500.00 | 817 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | 691 | | 160 | cell_phone | iphone_14_pro_max | 1200.00 | 691 | | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | +----------+------------+-------------------+-------------+--------+
SELECT * FROM assets ORDER BY emp_id DESC NULLS LAST;+----------+------------+-------------------+-------------+--------+ | asset_id | asset_type | asset_desc | asset_value | emp_id | +----------+------------+-------------------+-------------+--------+ | 2110 | laptop | mac_book_pro_M2 | 2500.00 | 817 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | 691 | | 160 | cell_phone | iphone_14_pro_max | 1200.00 | 691 | | 58 | cell_phone | iphone_12 | 879.00 | 410 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | 399 | | 210 | laptop | mac_book_pro | 2500.00 | 312 | | 1100 | laptop | mac_book_pro | 1999.00 | 298 | | 140 | cell_phone | iphone_13_pro | 999.00 | 263 | | 130 | cell_phone | iphone_13 | 699.00 | 110 | | 49 | cell_phone | iphone_12 | 879.00 | 102 | | 111 | laptop | mac_book_pro | 2100.00 | 89 | | 99 | laptop | mac_book_air_M1 | 999.00 | 75 | | 1049 | laptop | mac_book_pro | 1999.00 | 14 | | 150 | cell_phone | iphone_14_pro_ | 1100.00 | NULL | | 2120 | laptop | mac_book_pro_M2 | 2500.00 | NULL | +----------+------------+-------------------+-------------+--------+
-
ORDER BY ALL
sorts by all the columns in the same order they are written.ASC/DESC may be used but it will apply to all the columns. SELECT asset_id, asset_type, asset_desc FROM assets ORDER BY ALL;+----------+------------+-------------------+ | asset_id | asset_type | asset_desc | +----------+------------+-------------------+ | 49 | cell_phone | iphone_12 | | 58 | cell_phone | iphone_12 | | 99 | laptop | mac_book_air_M1 | | 111 | laptop | mac_book_pro | | 130 | cell_phone | iphone_13 | | 140 | cell_phone | iphone_13_pro | | 150 | cell_phone | iphone_14_pro_ | | 160 | cell_phone | iphone_14_pro_max | | 210 | laptop | mac_book_pro | | 1049 | laptop | mac_book_pro | | 1100 | laptop | mac_book_pro | | 2037 | laptop | mac_book_air_M2 | | 2100 | laptop | mac_book_pro_M2 | | 2110 | laptop | mac_book_pro_M2 | | 2120 | laptop | mac_book_pro_M2 | +----------+------------+-------------------+
-
The
GROUP BY
clause is used with aSELECT
statement to organize data intoGROUPS
.Aggregates are applied within those groups. One row of output is produced for each unique combination of values for the specified grouping expressions. SELECT asset_type, SUM(asset_value) AS total_valueFROM assets GROUP BY asset_type;+------------+-------------+ | asset_type | total_value | +------------+-------------+ | cell_phone | 5756.00 | | laptop | 18296.00 | +------------+-------------+
-
The
GROUP BY ALL
clause causes the query to group by all non-aggregate expressions in the SELECT list, so all the columns to be grouped by do not have to be entered explicitly.SELECT asset_id, asset_type, asset_desc, SUM(asset_value)FROM assets GROUP BY asset_id, asset_type, asset_desc;SELECT asset_id, asset_type, asset_desc, SUM(asset_value)FROM assets GROUP BY ALL;+----------+------------+-------------------+------------------+ | asset_id | asset_type | asset_desc | SUM(asset_value) | +----------+------------+-------------------+------------------+ | 160 | cell_phone | iphone_14_pro_max | 1200.00 | | 2100 | laptop | mac_book_pro_M2 | 2500.00 | | 2110 | laptop | mac_book_pro_M2 | 2500.00 | | 140 | cell_phone | iphone_13_pro | 999.00 | | 99 | laptop | mac_book_air_M1 | 999.00 | | 210 | laptop | mac_book_pro | 2500.00 | | 150 | cell_phone | iphone_14_pro_ | 1100.00 | | 58 | cell_phone | iphone_12 | 879.00 | | 49 | cell_phone | iphone_12 | 879.00 | | 1049 | laptop | mac_book_pro | 1999.00 | | 2120 | laptop | mac_book_pro_M2 | 2500.00 | | 1100 | laptop | mac_book_pro | 1999.00 | | 130 | cell_phone | iphone_13 | 699.00 | | 2037 | laptop | mac_book_air_M2 | 1199.00 | | 111 | laptop | mac_book_pro | 2100.00 | +----------+------------+-------------------+------------------+
-
SELECT .
a 3rd party file storage system (S3, HDFS, Azure Blob, GSC, or Kafka) requires. . INTO SELECT
,FILE WRITE
, andOUTBOUND
privileges.Without the required privileges, an error will be generated:
SELECT ... INTO:singlestore> SELECT * FROM t1INTO S3 'testing/output'CONFIG '{"region":"us-east-1"}'CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}';ERROR 1142 (42000): OUTBOUND command denied to user 'user1'@'%' for table 't1'Note
Users will still be able to execute
SELECT .
but only if outbound connectivity is not created.. . FROM; Here's how to set the privileges for a user to be allowed to execute this command:
SELECT ... INTO:singlestore> GRANT SELECT, FILE WRITE, OUTBOUND ON *.* to user1;The
OUTBOUND
privilege will be displayed in theSHOW GRANTS
command:SHOW GRANTS FOR user1;+-----------------------------------------------------------+ | Grants for user1@% | +-----------------------------------------------------------+ | GRANT SELECT, FILE WRITE, OUTBOUND ON *.* TO 'user1'@'% | +-----------------------------------------------------------+
-
Below is an S3 example for
SELECT * FROM <TABLE> INTO OUTFILE
using compression.SELECT * FROM <table_name> into S3 's3://<bucket_name>/<filename>'CONFIG '{"region":"us-east-1"}'CREDENTIALS{"aws_access_key_id": "<xxxxxxxx>","aws_secret_access_key": "<xxxxxxxxxxxxxxx>"}'FIELDS TERMINATED BY ','LINES TERMINATED BY '\n'WITH COMPRESSION GZIP;
Aggregations for Expression Syntax
SingleStore supports these Aggregate Functions for expression syntax in SELECT
statements:
All aggregate functions exclude NULLs from their computations.
COUNT(*)
is equivalent to COUNT(1)
.
SELECT … LIMIT
Syntax
LIMIT {[offset,] row_count | row_count OFFSET offset}
Remarks
-
The
LIMIT
clause constrains the number of rows returned by theSELECT
statement. -
Both the arguments must be non-negative integer constants.
-
The
row_
specifies the number of rows to return from the beginning of the result set, and thecount offset
specifies the offset of the first row to return. -
The offset of the first row in a table is 0 (not 1).
Examples
SELECT * FROM hrRec;
+-----------+-----------+----------+--------+
| FirstName | LastName | City | Tenure |
+-----------+-----------+----------+--------+
| Adam | Gale | Brooklyn | 40 |
| Samantha | Beck | New York | 44 |
| Clara | Wakefield | DC | 24 |
| Todd | Bridges | DC | 30 |
| Ron | Fletcher | New York | 23 |
+-----------+-----------+----------+--------+
SELECT * FROM hrRec LIMIT 2;
+-----------+----------+----------+--------+
| FirstName | LastName | City | Tenure |
+-----------+----------+----------+--------+
| Adam | Gale | Brooklyn | 40 |
| Samantha | Beck | New York | 44 |
+-----------+----------+----------+--------+
SELECT * FROM hrRec LIMIT 1,2;
+-----------+-----------+----------+--------+
| FirstName | LastName | City | Tenure |
+-----------+-----------+----------+--------+
| Samantha | Beck | New York | 44 |
| Clara | Wakefield | DC | 24 |
+-----------+-----------+----------+--------+
SELECT * FROM hrRec LIMIT 2 OFFSET 1;
+-----------+-----------+----------+--------+
| FirstName | LastName | City | Tenure |
+-----------+-----------+----------+--------+
| Samantha | Beck | New York | 44 |
| Clara | Wakefield | DC | 24 |
+-----------+-----------+----------+--------+
SELECT … FOR UPDATE
The SELECT .
command is intended to be used inside of a transaction.SELECT
query and the locks are held until the end of the transaction.
SingleStore recommends the following when using SELECT .
:
-
SingleStore recommends to commit or abort the transaction immediately so that the locks are released.
Too many locks in a transaction can result in a huge queue of transactions waiting on the locked rows. -
To limit the number of rows that are locked, SingleStore recommends to use a column with unique values in the
WHERE
clause of theSELECT
statement, for example thePRIMARY KEY
column.
Example
The following example uses the Orders
table:
DESCRIBE Orders;
+--------------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+------+---------+-------+
| OrderNumber | int(11) | NO | PRI | NULL | |
| Status | varchar(20) | NO | | NULL | |
| CustomerName | char(20) | YES | | NULL | |
+--------------+-------------+------+------+---------+-------+
SELECT * FROM Orders;
+-------------+------------+--------------+
| OrderNumber | Status | CustomerName |
+-------------+------------+--------------+
| 1 | Delivered | John |
| 3 | In Transit | Bon |
| 2 | Delivered | Kerry |
| 4 | Delivered | Tom |
+-------------+------------+--------------+
The following transaction locks the row where OrderNumber
is 3
, using the FOR UPDATE
clause.
BEGIN;SELECT * FROM Orders WHERE OrderNumber = 3 FOR UPDATE;
+-------------+------------+--------------+
| OrderNumber | Status | CustomerName |
+-------------+------------+--------------+
| 3 | In Transit | Bon |
+-------------+------------+--------------+
Now, execute the following query in a different connection:
UPDATE Orders SET Status = "Delivered" WHERE OrderNumber=3;
ERROR 1205 (HY000): Leaf Error (127.0.0.1:3307): Lock wait timeout exceeded; try restarting transaction. Lock owned by connection id 77, query `open idle transaction`
The above query returns an error since the rows are locked by the previous transaction.
JOIN and Subqueries
Syntax
SingleStore Helios supports the following JOIN
and subquery syntax for the table_
part of SELECT
statements:
join_table:
table_reference {LEFT | RIGHT | FULL} [OUTER] JOIN table_factor join_condition
| table_reference [INNER | CROSS] JOIN table_factor [join_condition]
| table_reference NATURAL {LEFT | RIGHT} [OUTER] JOIN table_factor
| table_reference STRAIGHT_JOIN table_factor [join_condition]
| table_reference, table_factor
| table_reference LEFT [OUTER] JOIN LATERAL (subquery) [[AS] alias] join_condition
| table_reference [INNER | CROSS] JOIN LATERAL (subquery) [[AS] alias] [join_condition]
| table_reference, LATERAL (subquery) [[AS] alias]
join_condition:
ON conditional_expr
Remarks
-
STRAIGHT_
forces tables to be joined in the order in which they are listed in theJOIN FROM
clause -
FULL OUTER JOIN
requires the join condition to be an equality. -
SingleStore supports joining tables across databases:
CREATE DATABASE test1;USE test1;CREATE TABLE t1(id INT, col1 VARCHAR(10));CREATE DATABASE test2;USE test2;CREATE TABLE t2(id INT, col1 VARCHAR(10));SELECT test1.t1.*, test2.t2.* FROM test1.t1 JOIN test2.t2 ON t1.id = t2.id;
Examples
SELECT * FROM my_MemSQL_table WHERE col = 1;SELECT COUNT(*), user_name, page_url from clicks, users, pages-> WHERE clicks.user_id = users.user_id AND pages.page_id = clicks.page_id-> GROUP BY users.user_id, pages.page_id-> ORDER BY COUNT(*) DESC;+- ---------+- ----------+- -------------------------------+| COUNT(*) | user_name | page_url |+- ---------+- ----------+- -------------------------------+| 5 | jake | memsql.com || 2 | john | http://www.singlestore.com/download || 1 | jake | docs.singlestore.com || 1 | jake | memsql.com || 1 | jake | http://www.singlestore.com/download |+- ---------+- ----------+- -------------------------------+5 rows in set (0.00 sec)
SELECT t1.*, t2.* FROM t1 FULL OUTER JOIN t2 ON t1.a = t2.a;
Nested Scalar Sub-Selects
SELECT
statements can be nested in SingleStore Helios queries.
SELECT ... [SELECT ...[SELECT [...]]]
Remarks
-
For scalar sub-selects, sub-select queries must not return more than one row.
-
The maximum allowed depth of nested sub-select queries is 40.
-
Sub-selects are not supported inside
GROUP BY/ORDER BY/HAVING
clauses, for nested sub-select queries of depth > 2.
Examples
The following examples show the use of nested sub-selects.
SELECT cust_id FROM customersWHERE EXISTS( SELECT order_id FROM ordersWHERE order_id IN( SELECT id FROM transaction WHERE count > 5));
DELETE FROM recordsWHEREid = ( SELECT order_idFROM ordersWHERE order_date > ( SELECT CURRENT_DATE() + 30));
Lateral Join
Lateral join allows a subquery in the FROM
clause of a SQL query to reference another table in that same FROM
clause.LATERAL
keyword such references are allowed.
For example, the following query will generate an error because the subquery (SELECT b FROM t2 WHERE b = t1.
references t1
which is not allowed.
SELECT *FROM t1 LEFT JOIN (SELECT b FROM t2 WHERE b = t1.a) AS t3 ...;
Adding the LATERAL
keyword before the subquery, as shown below, makes the reference to t1
allowed and the following query is legal.(SELECT b from t2 WHERE b = t1.
is referred to as a lateral derived table, and the clause b = t1.
is referred to as the correlating condition.
SELECT *FROM t1 LEFT JOIN LATERAL (SELECT b FROM t2 WHERE b = t1.a) AS t3 ...;
Syntax
Lateral joins use the same syntax as joins with the LATERAL
keyword inserted before the lateral derived table as shown in the queries above.
join_table:table_reference LEFT [OUTER] JOIN LATERAL (subquery) [[AS] alias] join_condition| table_reference [INNER | CROSS] JOIN LATERAL (subquery) [[AS] alias] [join_condition]| table_reference, LATERAL (subquery) [[AS] alias]join_condition:ON conditional_expr
The (subquery) [[AS] alias]
clause is referred to as the lateral derived table.
Remarks
The following are restrictions on lateral join:
-
Lateral derived tables can appear only in
FROM
clauses.A comma-separated list of tables or a join specification with JOIN
,INNER JOIN
,CROSS JOIN
,LEFT [OUTER] JOIN
are supported. -
The correlating condition can appear only in the
WHERE
clause of the lateral derived table. -
The correlating condition must be an equality condition (uses =).
-
The correlating condition may use
AND
, but may not useOR
. -
Lateral derived tables may not use
GROUP BY
. -
If a lateral derived table references an aggregate function, the function's aggregation query cannot be the one that owns the
FROM
clause in which the lateral derived table occurs. -
If the lateral derived table has an
ORDER BY … LIMIT
clause, then theLIMIT
must be>0
. -
TABLE
functions are implicitly executed as a lateral join.The LATERAL
keyword is not allowed beforeTABLE
functions.
Examples
These examples use a table of artists and a table of albums by those artists.
The statements below create artists
and albums
tables and insert data into those tables.artist_
.
CREATE TABLE artists(id INT, name TEXT);CREATE TABLE albums(album_id INT, title TEXT,genre ENUM("R&B", "Soft Rock", "Jazz"),artist_id INT, sales DOUBLE);INSERT INTO artists VALUES(1, "Fleetwood Mac"),(2, "Whitney Huston"),(3, "The Eagles"),(4, "Norah Jones"),(5, "ABBA");INSERT INTO albums VALUES(1, "The Bodyguard", "R&B", 2, 45),(2, "Their Greatest Hits (1971-1975)", "Soft Rock", 3, 44),(3, "Hotel California", "Soft Rock", 3, 42),(4, "Rumours", "Soft Rock", 1, 40),(5, "Come Away with Me", "Jazz", 4, 28);
Example 1 - INNER JOIN - Maximum Album Sales by Artist
The following query calculates, for each artist, the sales of their highest-selling album.LATERAL
or with a GROUP BY
as shown below.
The query below uses a comma-separated list of tables to specify the join.
SELECT AR.name, AL1.max_salesFROM artists AR,LATERAL (SELECT MAX(sales) as max_salesFROM albums WHERE artist_id = AR.id) AS AL1ORDER BY AL1.max_sales DESC;
+----------------+-----------+
| name | max_sales |
+----------------+-----------+
| Whitney Huston | 45 |
| The Eagles | 44 |
| Fleetwood Mac | 40 |
| Norah Jones | 28 |
+----------------+-----------+
LATERAL
can also be used with a JOIN
specification as shown in the query below.
SELECT AR.name, AL1.max_salesFROM artists ARJOIN LATERAL (SELECT MAX(sales) as max_salesFROM albumsWHERE artist_id = AR.id) AS AL1ORDER BY AL1.max_sales DESC;
+----------------+-----------+
| name | max_sales |
+----------------+-----------+
| Whitney Huston | 45 |
| The Eagles | 44 |
| Fleetwood Mac | 40 |
| Norah Jones | 28 |
+----------------+-----------+
The query below shows the same query with a GROUP BY
instead of a LATERAL
.
SELECT name, MAX(sales)FROM artists JOIN albums ON artists.id = albums.artist_idGROUP BY artists.idORDER BY sales DESC;
+----------------+------------+
| name | MAX(sales) |
+----------------+------------+
| Whitney Huston | 45 |
| The Eagles | 44 |
| Fleetwood Mac | 40 |
| Norah Jones | 28 |
+----------------+------------+
Example 2 - LEFT JOIN - Include Artists with No Album Sales Data
The queries in Example 1 found the maximum album sales for each artist, but did not include artists without album sales data.LEFT JOIN
.
The following query uses LEFT JOIN LATERAL
to find the maximum sales of any album by an artist including artists for which there is no album sales data.GROUP BY
.LEFT OUTER JOIN
.
LEFT JOIN
requires a join condition, thus an ON
clause is required despite the fact that the join is specified in the lateral derived table.ON = TRUE
).INNER JOIN LATERAL
could also be in this query.
SELECT AR.name, AL1.max_salesFROM artists ARLEFT JOIN LATERAL (SELECT sales, MAX(sales) as max_salesFROM albums WHERE artist_id = AR.id) AS AL1 ON TRUEORDER BY AL1.max_sales DESC;
+----------------+-----------+
| name | max_sales |
+----------------+-----------+
| Whitney Huston | 45 |
| The Eagles | 44 |
| Fleetwood Mac | 40 |
| Norah Jones | 28 |
| ABBA | NULL |
+----------------+-----------+
Example 3 - Most Sales by Artist - With Album Name
The queries in Example 1 found the maximum album sales for each artist, but did not include the title of the album with those sales.
SELECT AR.name AS artist, AL2.title, AL1.max_salesFROM artists ARJOIN LATERAL (SELECT MAX(sales) as max_salesFROM albumsWHERE artist_id = AR.id) AS AL1 ON TRUEJOIN LATERAL (SELECT titleFROM albumsWHERE artist_id = AR.idAND sales = AL1.max_sales) AS AL2 ON TRUEORDER BY AL1.max_sales DESC;
+----------------+---------------------------------+-----------+
| artist | title | max_sales |
+----------------+---------------------------------+-----------+
| Whitney Huston | The Bodyguard | 45 |
| The Eagles | Their Greatest Hits (1971-1975) | 44 |
| Fleetwood Mac | Rumours | 40 |
| Norah Jones | Come Away with Me | 28 |
+----------------+---------------------------------+-----------+
Lateral joins are implemented with query rewrites.EXPLAIN DUPLICATE
can be used to show the rewritten query.LATERAL
can often simplify a query.
EXPLAIN DUPLICATESELECT AR.name AS artist, AL2.title, AL1.max_salesFROM artists ARJOIN LATERAL (SELECT MAX(sales) as max_salesFROM albumsWHERE artist_id = AR.id) AS AL1 ON TRUEJOIN LATERAL (SELECT titleFROM albumsWHERE artist_id = AR.idAND sales = AL1.max_sales) AS AL2 ON TRUEORDER BY AL1.max_sales DESC;
SELECT `artists_4`.`name` AS `artist`, `SUBQ_VWW_2_6`.`title` AS `title`,
`SUBQ_VWW_1_5`.`max_sales` AS `max_sales`
FROM (( (
SELECT WITH(NO_MERGE_THIS_SELECT=1) `artists_4_0`.`name` AS `name`,
`artists_4_0`.`id` AS `id`
FROM `test`.`artists` as `artists_4_0` ) AS `artists_4`
JOIN (
SELECT WITH(NO_MERGE_THIS_SELECT=1) `SUBQ_VWW_1_5_0`.
`max_sales` AS `max_sales`, `SUBQ_VWW_1_5_0`.`artist_id` AS `artist_id`
FROM (
SELECT `albums_2`.`artist_id` AS `artist_id`,
MAX(`albums_2`.`sales`) AS `max_sales`
FROM `test`.`albums` as `albums_2` GROUP BY 1 ) AS `SUBQ_VWW_1_5_0` )
AS `SUBQ_VWW_1_5`)
JOIN (
SELECT WITH(NO_MERGE_THIS_SELECT=1) `SUBQ_VWW_2_6_0`.`title` AS `title`,
`SUBQ_VWW_2_6_0`.`artist_id` AS `artist_id`,
`SUBQ_VWW_2_6_0`.`sales` AS `sales`
FROM `test`.`albums` as `SUBQ_VWW_2_6_0` ) AS `SUBQ_VWW_2_6`)
WHERE ((`SUBQ_VWW_2_6`.`artist_id` = `artists_4`.`id`)
AND (`SUBQ_VWW_2_6`.`sales` = `SUBQ_VWW_1_5`.`max_sales`)
AND (`SUBQ_VWW_1_5`.`artist_id` = `artists_4`.`id`)
AND (`SUBQ_VWW_1_5`.`artist_id` = `SUBQ_VWW_2_6`.`artist_id`))
ORDER BY 3 DESC
Example 4 - Album with the Most Sales by Genre
This example attempts to find the albums with the most sales by genre.LATERAL
which does work to answer this question.LATERAL
.
The following is an attempt to find albums with the most sales by genre using a GROUP BY
.
SELECT genre, MAX(sales)FROM albumsGROUP BY genre;
+-----------+------------+
| genre | MAX(sales) |
+-----------+------------+
| R&B | 45 |
| Jazz | 28 |
| Soft Rock | 44 |
+-----------+------------+
To try to add the album titles, the query below adds the album title to the SELECT
statement.GROUP BY
attributes.
SELECT genre, title, MAX(sales)FROM albumsGROUP BY genre;
Query run with strict ANSI SQL and the error message produced.
SET SQL_MODE = ANSI;SELECT genre, title, MAX(sales)FROM albumsGROUP BY genre;
ERROR 2295 (HY000): Column 'albums.title' from project list
is used outside an aggregate function and does not appear in
the GROUP BY clause. This is illegal because there may be
multiple possible values for the field. If it is in fact
uniquely determined, or if you wish to use one of the many
possible values, you can use the ANY_VALUE aggregate function.
Query run without strict ANSI SQL and the warnings produced.SHOW WARNINGS
; however, those warnings only occur the first time the query is run.
SELECT genre, title, MAX(sales)FROM albumsGROUP BY genre;SHOW WARNINGS;
Warning | 2295 | Column 'albums.title' from project list is used
outside an aggregate function and does not appear in the GROUP BY
clause. This is illegal because there may be multiple possible values
for the field. If it is in fact uniquely determined, or if you wish to
use one of the many possible values, you can use the ANY_VALUE aggregate
function.
Finally, the query below uses JOIN LATERAL
to find the albums with the most sales by genre.AL
must be used in the inner queries to refer to the albums
table instance in the outer query.
SELECT AL2.genre, AL2.title, AL1.max_salesFROM albums ALJOIN LATERAL (SELECT MAX(sales) as max_salesFROM albumsWHERE genre = AL.genre) AS AL1 ON TRUEJOIN LATERAL (SELECT title, genreFROM albumsWHERE title = AL.titleAND genre = AL.genreAND sales = AL1.max_sales) AS AL2 ON TRUEORDER BY AL1.max_sales DESC;
+-----------+---------------------------------+-----------+
| genre | title | max_sales |
+-----------+---------------------------------+-----------+
| R&B | The Bodyguard | 45 |
| Soft Rock | Their Greatest Hits (1971-1975) | 44 |
| Jazz | Come Away with Me | 28 |
+-----------+---------------------------------+-----------+
If the dataset was larger and the request was to find the top 10 best selling albums per genre, that query will require a LATERAL
or alternatively a window function.
Example 5 - TABLE
When the TABLE
keyword is used, there is an implicit lateral join.LATERAL
keyword is not allowed to be used with TABLE
.t
with a TABLE
expression is allowed; the second query will return an error.
SELECT * FROM t, TABLE(JSON_TO_ARRAY('[1,2,3]'));SELECT * FROM t, LATERAL TABLE(JSON_TO_ARRAY('[1,2,3]'));
Example 6 - RIGHT JOIN
RIGHT JOIN LATERAL
is not supported and will produce the following error.
SELECT artists.name, AL1.max_salesFROM artistsRIGHT JOIN LATERAL (SELECT sales, MAX(sales) as max_salesFROM albums WHERE artist_id = artists.id) AS AL1 ON TRUEORDER BY AL1.max_sales DESC;
ERROR 1054 (42S22): Unknown column 'artists' in 'previous table'
Example 7 - Access to Multiple Levels Up
Lateral joins can reference fields from more than one level up in the query tree.AR.
references a table two levels up in the query tree.
SELECT AR.name AS artist, AL2.max_salesFROM artists ARJOIN LATERAL(SELECT FORMAT(AL1.max_sales_in_mil*1000000,0) as max_salesFROM(SELECT MAX(sales) as max_sales_in_milFROM albumsWHERE artist_id = AR.id) AS AL1) AS AL2 ON TRUEORDER BY AL2.max_sales DESC;
+----------------+------------+
| artist | max_sales |
+----------------+------------+
| Whitney Huston | 45,000,000 |
| The Eagles | 44,000,000 |
| Fleetwood Mac | 40,000,000 |
| Norah Jones | 28,000,000 |
+----------------+------------+
JOIN and USING
Syntax
Using a JOIN
clause with the USING
clause will match only one column when there are more than one columns that match.
SELECT <table1_name>.<column_name> AS <column_name>FROM <table_name> JOIN <table_name> USING (<column_name>);
Remarks
-
You can utilize the
USING
clause instead of theON
clause inJOIN
operations with an explicitJOIN
clause.
Examples
TABLE 1:CREATE TABLE assets(asset_id int,asset_type varchar(50),asset_desc varchar(50),emp_id numeric (5));TABLE 2:CREATE TABLEemployees(emp_id numeric (5),emp_name varchar(75));SELECT employees.emp_name AS employees, asset_desc as assets, assets.asset_typeFROM assets JOIN employees USING (emp_id);
+------------------------------------+
|employees |assets |asset_type |
+------------------------------------+
|T_Willams |macbook pro | laptop |
+------------------------------------+
|A_Young |iphone 12 | cell phone |
+------------------------------------+
|D_Karras |macbook air | laptop |
+------------------------------------+
SELECT … INTO <variable>
SELECT .
statement is used to initialize variables in stored procedures, anonymous code blocks, and session variables.
For information about creating user defined variables for use outside of stored procedures, see User-Defined Variables.
Remarks
-
The
SELECT .
statement must return only a single result row.. . INTO -
The number of columns/expressions in the
SELECT
query must be the same as the number of variables being initialized in theINTO
list. -
SELECT .
statements must be used inside PSQL procedure blocks.. . INTO variable -
The
INTO variable
clause can only be used once inside aSELECT
query. -
A
SELECT .
statement cannot be used inside of a sub-select query.. . INTO -
The variables in
SELECT .
statements must be declared with scalar data types.. . INTO -
SELECT .
and. . INTO variable SELECT .
cannot be used in the same query.. . INTO OUTFILE/S3/KAFKA -
If the
SELECT .
statement returns. . INTO 0
rows, SingleStore throws an error:ER_
.INTO_ VARIABLES_ NO_ ROWS To accommodate for this error, you can specify an EXCEPTION
in theDECLARE
block of a stored procedure (see Example 3 below).
Examples
Note: The following examples use the hrRec table.
DESC hrRec;
+-----------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+------+---------+-------+
| FirstName | varchar(20) | YES | | NULL | |
| LastName | varchar(20) | YES | | NULL | |
| City | varchar(20) | YES | | NULL | |
| Tenure | int(11) | YES | | NULL | |
+-----------+-------------+------+------+---------+-------+
Example 1
The following example queries multiple columns from a single row and stores them in variables.
DELIMITER //CREATE OR REPLACE PROCEDURE into_var ()ASDECLAREfname VARCHAR(20); lname VARCHAR(20); city VARCHAR(20); ten INT;BEGINSELECT FirstName, LastName, City, TenureINTO fname, lname, city, tenFROM hrRecWHERE Tenure > 40;ECHO SELECT CONCAT(fname, " ", lname) AS "Name", city AS "City", ten AS "Tenure";END //DELIMITER ;CALL into_var();
+---------------+----------+--------+
| Name | City | Tenure |
+---------------+----------+--------+
| Samantha Beck | New York | 44 |
+---------------+----------+--------+
Example 2
The following example queries multiple aggregate functions and stores their values in variables.
DELIMITER //CREATE OR REPLACE PROCEDURE into_var () ASDECLARErow_c INT; sum_t INT;BEGINSELECT COUNT(*), SUM(Tenure) INTO row_c, sum_t FROM hrRec;ECHO SELECT row_c AS "Row Count", sum_t AS "Tenure";END //DELIMITER ;CALL into_var();
+-----------+--------+
| Row Count | Tenure |
+-----------+--------+
| 5 | 170 |
+-----------+--------+
Alternatively, you can query the values in a dynamic query and store them in variables, like
DELIMITER //CREATE OR REPLACE PROCEDURE into_var () ASDECLARErow_c INT; sum_t INT; qry VARCHAR(50);BEGINqry = "SELECT COUNT(*), SUM(Tenure) FROM hrRec";EXECUTE IMMEDIATE qry INTO row_c, sum_t;ECHO SELECT row_c AS "Row Count", sum_t AS "Tenure";END //DELIMITER ;CALL into_var();
+-----------+--------+
| Row Count | Tenure |
+-----------+--------+
| 5 | 170 |
+-----------+--------+
Example 3
The following example shows how to assign a value to a variable if the SELECT .
statement returns 0
rows.
DELIMITER //CREATE OR REPLACE PROCEDURE cityCount () ASDECLARE cnum INT;BEGINSELECT City INTO cnum FROM hrRec WHERE Tenure > 50;ECHO SELECT cnum;END //DELIMITER ;CALL cityCount();
ERROR 2439 (HY000): Unhandled exception
Type: ER_INTO_VARIABLES_NO_ROWS (2439)
Message: Query with 'INTO VARIABLES' clause returned zero rows whereas expected 1 row
Add an exception to the stored procedure.
DELIMITER //CREATE OR REPLACE PROCEDURE cityCount () ASDECLARE cnum INT;BEGINSELECT City INTO cnum FROM hrRec WHERE Tenure > 50;ECHO SELECT cnum;EXCEPTIONWHEN ER_INTO_VARIABLES_NO_ROWS THENcnum = 'No city found!';ECHO SELECT cnum;END //DELIMITER ;CALL cityCount();
+----------------+
| cnum |
+----------------+
| No city found! |
+----------------+
You can also add an exception for a scenario where the SELECT .
statement returns more than one row.
EXCEPTIONWHEN ER_SUBQUERY_NO_1_ROW THENECHO SELECT 'SELECT ... INTO statement returned more than 1 row.';
SELECT . . . ALL DISTINCT DISTINCTROW
Return rows with or without duplicates.
Syntax
SELECT [ALL | DISTINCT | DISTINCTROW] FROM table_reference
Remarks
-
The
ALL
clause returns all the matching rows including duplicates.The
DISTINCT
orDISTINCTROW
clause returns the matching rows but eliminates the duplicate rows from the output.DISTINCTROW
is a synonym ofDISTINCT
.
Examples
SELECT * FROM Product;
+------------+------------+----------+
| Product_id | Brand_name | City |
+------------+------------+----------+
| 1 | Nike | London |
| 3 | Nike | New York |
| 2 | Adidas | Paris |
| 3 | Puma | Spain |
+------------+------------+----------+
SELECT ALL Brand_name FROM Product;
+------------+
| Brand_name |
+------------+
| Nike |
| Nike |
| Adidas |
| Puma |
+------------+
SELECT DISTINCT Brand_name FROM Product;
+------------+
| Brand_name |
+------------+
| Nike |
| Adidas |
| Puma |
+------------+
SELECT . . . INTO AZURE
SELECT .
performs a distributed SELECT
into an Azure container.CREDENTIALS
clause is required and its value is defined in credentials_
as follows:
credentials_json:
'{"account_name": "your_account_name_here",
"account_key": "your_account_key_here"
}'
The WITH COMPRESSION GZIP
clause, if specified, must be included after the CREDENTIALS
clause.WITH COMPRESSION GZIP
is specified, the results of the SELECT
query are written to the Azure container in compressed gzip files.
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause.WITH COMPRESSION
may use a compression format other than gzip.
format_
, if specified, must be included after the CREDENTIALS
clause, or after the WITH COMPRESSION clause (if it is specified).format_
are similar to the parsing options used with LOAD DATA
.
format_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
As an alternative to using a SELECT .
statement where you specify the CREDENTIALS
clause, you can use a SELECT .
statement, where you reference a connection link.
SELECT … INTO HDFS
SELECT INTO .
writes the result of a SELECT
query to HDFS.SELECT INTO .
are similar to the HDFS Pipelines syntax.
Example
The following example demonstrates how to write the contents of a table to HDFS at a specified path.
SELECT * FROM stockToINTO HDFS 'hdfs://hadoop-namenode:8020/stock_dir/records_file.csv'
SELECT … INTO LINK
SELECT .
writes the results of a SELECT
query to S3, GCS, HDFS, or Kafka using a connection link.SHOW LINK
permission, provided by your administrator, to use a connection link.
Note
SASL OAUTHBEARER is supported with CREATE PIPELINE
but not with SELECT INTO KAFKA
.
Example
The following example writes the contents of the table t1
, to the S3 bucket at the specified path, using the S3 connection link S3con
stored in the db1
database:
USE db1;SELECT * FROM t1 INTO LINK S3con 'testing/output';
Note: The connection link S3Con
should already exist in db1
.
SELECT … INTO S3
SELECT .
performs a distributed select into a S3 bucket.CREDENTIALS
clause is required and its value is defined in credentials_
as follows.
configuration_json:
'{"region":"your_region" [, "multipart_chunk_size_mb":<size_in_MB>]}'
credentials_json:
'{"aws_access_key_id": "replace_with_your_access_key_id",
"aws_secret_access_key": "replace_with_your_secret_access_key",
["aws_session_token": "replace_with_your_temp_session_token",]
}'
The WITH COMPRESSION GZIP
clause, if specified, must be included after the CREDENTIALS
clause.WITH COMPRESSION GZIP
is specified, the results of the SELECT
query are written to the S3 bucket in compressed gzip files.
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause.WITH COMPRESSION
may use a compression format other than gzip.
format_
, if specified, must be included after the CREDENTIALS
clause, or after the WITH COMPRESSION clause (if it is specified).format_
are similar to the parsing options used with LOAD DATA
.
format_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
To overwrite existing files in an S3 bucket, specify ENABLE_
after the CREDENTIALS
clause in the SELECT … INTO S3
statement.
SELECT *FROM t1INTO S3 'testing/output'CONFIG '{"region":"us-east-1"}'CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}'ENABLE_OVERWRITE;
To overwrite existing files in a connection link destination/target, specify ENABLE_
after the INTO LINK
clause.
SELECT *FROM t1INTO LINK S3con 'testing/output'ENABLE_OVERWRITEFields terminated by ‘,’Lines terminated by ‘\n’;
Remarks
Warning
The S3 bucket needs to be created before running this command.
-
The
multipart_
must be in the range of [5.chunk_ size_ mb . 500]. By default, the chunk size is 5 MB
.A larger chunk size allows users to upload large files without going over Amazon’s limitation on maximum number of parts per upload. Although, a larger chunk size increases the chance of a network error during the upload to S3. If a chunk fails to upload, SingleStore Helios retries uploading it until the limit set on the number of retries by AWS is reached. Note
Each partition will use
multipart_
MB(s) of additional memory.chunk_ size_ mb -
The output of
SELECT .
is stored with the content type binary/octet-stream in the S3 bucket.. . INTO S3 -
The
ENABLE_
clause is only supported in SELECT .OVERWRITE . . INTO {S3 | LINK } statements. -
If the insert select logic determines that the
SELECT .
query can be run as distributed, the query will be pushed down to each leaf and executed on each partition.. . INTO S3 The name of each object will be: <bucket/target>_<partition ID>
-
If the insert select logic determines that the
SELECT .
query can only be on the aggregator because it contains aggregations, ORDER BY, GROUP BY, etc.. . INTO S3 then the query will be run on each leaf but the result will be collected on the aggregator and then output to S3. The object name will just be: <bucket/target>
-
The
SELECT
query will validate if the<bucket/target>
or<bucket/target>_
already exists on the S3 bucket first and fail if any of the object(s) already exist. -
As an alternative to using a
SELECT .
statement where you specify the. . INTO S3 CONFIG
andCREDENTIALS
clauses, you can use aSELECT .
statement, where you reference a connection link.. . INTO LINK For more information, see Configuring and Using Connection Links.
Examples
The following simple example shows how to save all rows in table t1 to an S3 bucket using an AWS access key.
SELECT *FROM t1INTO S3 'testing/output'CONFIG '{"region":"us-east-1"}'CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}'
The following example uses the format options to output the data in CSV format.
SELECT *FROM tINTO S3 'tmp/a'CONFIG '{"region":"us-east-1"}'CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}'FIELDS TERMINATED BY ','LINES TERMINATED BY '\n'
SELECT … INTO GCS
SELECT .
performs a distributed select into a Google Cloud Storage (GCS) bucket.CONFIG
clause is not required, and can be excluded or left empty as defined in the configuration_
that follows.CREDENTIALS
clause is required and its value is defined in the credentials_
that follows.
configuration_json:
'{}'
credentials_json:
'{"access_id": "replace_with_your_google_access_key",
"secret_key": "replace_with_your_google_secret_key"
}'
The WITH COMPRESSION GZIP
clause, if specified, must be included after the CREDENTIALS
clause.WITH COMPRESSION GZIP
is specified, the results of the SELECT
query are written to the GCS bucket in compressed gzip files.
The WITH COMPRESSION
clause can be specified instead of the WITH COMPRESSION GZIP
clause.WITH COMPRESSION
may use a compression format other than gzip.
format_
, if specified, must be included after the CREDENTIALS
clause, or after the WITH COMPRESSION clause (if it is specified).format_
are similar to the parsing options used with LOAD DATA
.
format_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
Remarks
-
Unlike other filesystem options, with
SELECT .
, data written by a single partition is not divided into chunks.. . INTO GCS However, there may still be multiple partitions and thus multiple files (one per partition) will appear in GCS. -
The maximum object size supported by GCS is 5 TB; this limit also applies to
SELECT .
.. . INTO GCS -
The
CONFIG
clause may optionally specify an endpoint_url. -
The
CREDENTIALS
clause is required. -
We support only HMAC keys.
-
The
CREDENTIALS
clause should be a JSON object with two fields:
access_
: usually a 24 or 60 character alphanumeric string, which is linked to the Google account, typically all uppercase and starts with GOOG
.
secret_
: usually a 40 character Base-64 encoded string that is linked to a specific access_
.
As an alternative to using a SELECT .
statement where you specify the CONFIG
and CREDENTIALS
clauses, you can use a SELECT .
statement, where you reference a connection link.
Examples
The following simple example shows how to save all rows in table table_
to a GCS bucket using a Google access key, and outputs them as a CSV.
CREATE TABLE table_name (column_name INT);INSERT INTO table_name VALUES (1), (2), (3);SELECT *FROM table_nameINTO GCS 'bucket/path'CREDENTIALS '{"access_id": "replace_with_your_google_access_key", "secret_key": "replace_with_your_google_secret_key"}' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
SELECT … INTO KAFKA …
SELECT .
runs a SELECT
query, constructs a Kafka message for each row in the result set, and publishes the messages to a Kafka topic.
When SELECT .
constructs a Kafka message, it includes every column value in the result set's row and separates the column values by a delimiter.
kafka_configuration:
CONFIG 'string'
kafka_credentials:
CREDENTIALS 'string'
format_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[TERMINATED BY 'string']
[STARTING BY 'string']
]
kafka_topic_endpoint:
host:port[, ...]/topic
Arguments
kafka_ configuration
Optional.server.
file on each Kafka broker.
kafka_ credentials
Optional.
format_ options
Optional.SELECT
result set.
See an example of how to use the clauses.
kafka_ topic_ endpoint
The list of Kafka brokers, followed by the topic to which SingleStore Helios will publish messages.
KAFKA KEY <column_ name>
Optional.
KAFKA KEY
should be used only for kafka targets.
The Kafka key column name must be a valid column name in the database table.
The Kafka key column name must be the same as the first column name of the SELECT INTO
statement.
Remarks
-
When possible, SingleStore Helios queries leaf nodes directly, bypassing the aggregator.
This allows SingleStore Helios to send data directly from the leaves to Kafka partitions. -
SingleStore Helios constructs a Kafka message as an array of bytes.
-
As an alternative to using a
SELECT .
statement where you specify the. . INTO KAFKA CONFIG
andCREDENTIALS
clauses, you can use aSELECT .
statement, where you reference a connection link.. . INTO LINK For more information, see Configuring and Using Connection Links.
Examples
Example: Specifying the Kafka Message Format
The following query uses the FIELDS
and LINES
clauses to format the Kafka messages that are constructed from the SELECT
result set.
SELECT col1, col2, col3 FROM tORDER BY col1INTO KAFKA 'host.example.com:9092/test-topic'FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY "\t"LINES TERMINATED BY '}' STARTING BY '{';
Suppose the result set returned by SELECT col1, col2, col3 FROM t ORDER BY col1
is:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| a | b | c |
| d | e | f |
| g | h\ti | j |
+------+------+------+
This result set will be converted to three Kafka messages having the following format:
Message 1: {a,b,c}
Message 2: {d,e,f}
Message 3: {g,h<tab character>i,j}
<tab character>
will be replaced by the number of spaces that your tab is set to.
Note
If a SELECT .
query does not include the FIELDS.
and LINES.
.
SELECT col1, col2, col3 FROM tORDER BY col1INTO KAFKA 'host.example.com:9092/test-topic'FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'LINES TERMINATED BY '\n' STARTING BY '';
Example: Publishing Kafka Messages to Confluent Cloud
SELECT text FROM t INTO
KAFKA '<Confluent Cloud workspace endpoint>/test-topic'
CONFIG '{"sasl.username": "<CLUSTER_API_KEY>",
"sasl.mechanism": "PLAIN",
"security.protocol": "SASL_SSL",
"ssl.ca.location": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"}'
CREDENTIALS '{"sasl.password": "<CLUSTER_API_SECRET>"}';
Prior to running the SELECT .
statement, you must upload an SSL certificate to the SingleStore Helios.
Use the following steps to enable TLS/SSL encryption between SingleStore Helios and Kafka.
-
From the Deployments page on the Cloud Portal, select the workspace group on which to enable TLS/SSL connections.
-
Select the Security tab at the top of the page.
-
Select the Upload Certificate button to upload your CA certificate.
This will make it available to all the nodes and will allow you to secure outbound connections via TLS/SSL.
Example: Specifying One Kafka Broker
The following example imports the data in the column text
from table t
into the Kafka topic test-topic
.9092
at host.
.
SELECT text FROM t INTOKAFKA 'host.example.com:9092/test-topic'
Example: Specifying Multiple Kafka Brokers
The following example imports the data in the column text
from table t
into the Kafka topic test-topic
.9092
, at host1.
, host2.
and host3.
.
SELECT text FROM t INTOKAFKA 'host1.example.com:9092,host2.example.com:9092,host3.example.com:9092/test-topic'
Example: Specifying a Kafka Key
For this example, suppose table kktest has three columns: column_SELECT .
statement might look like this:
SELECT column_b, column_a, column_b, column_c FROM kktest INTO
KAFKA 'localhost:1234/kafkaWithKey' KAFKA KEY "column_b";
SELECT INTO . . . FORMAT PARQUET
SELECT INTO .
allows SingleStore data to be exported into Apache Parquet (Parquet) format for the given SQL type.
Some SQL types may not fit into a particular Parquet type.
Syntax
FORMAT PARQUET
Remarks
-
The
FORMAT PARQUET
clause will go at the end of theSELECT
statement. -
Time and date types are stored in the engine timezone without adjusting to UTC.
Below are some examples of the differences between SQL and Parquet time and date formats. If any value of SQL TIME(6) is outside the range of the Parquet TIME, the following error message may occur: can not save SQL TIME(6) … because parquet's TIME is within a single day (hour should be between 0 and 23); consider casting this column to BIGINT
-
If any value of TIMESTAMP is zero, which is outside the range of the Parquet TIMESTAMP, the following error message may occur:
can not be saved as TIMESTAMP with millisecond precision because zero value is outside the scope of the parquet type; consider casting this column to INT64
Examples
Below is a Parquet file with the following schema:
message m1 {
required boolean f1;
required fixed_len_byte_array(4) f2;
optional group g1 {
optional binary f3 (STRING);
required int64 f4;
}
}
Create a JSON formatted file with these records:
{"f1": false, "f2": "rec1", "g1": null}{"f1": true, "f2": "rec2", "g1": null}{"f1": true, "f2": "rec3", "g1": {"f3": null, "f4": 3}}{"f1": true, "f2": "rec4", "g1": {"f3": "four", "f4": 4}}
Create a table:
CREATE TABLE parquet_tbl(c2 BLOB, c3 BLOB, c4 BIGINT UNSIGNED);
Create a pipeline to pull the example file from an AWS S3 bucket:
CREATE PIPELINE parquet_pipeAS LOAD DATA S3 's3://<bucket_name>/<filename.parquet>'CONFIG '{"region":"us-west-2"}'CREDENTIALS '{"aws_access_key_id": "<XXXXXXXXXXXXXXXXX>","aws_secret_access_key": "<XXXXXXXXXXXXXX>"}'INTO TABLE parquet_tbl(@v1 <- f1,@v2 <- f2,c3 <- g1::f3,c4 <- g1::f4)FORMAT PARQUETSET c2 = UPPER(CONVERT(@v2, CHAR))WHERE @v1 = TRUE;
Start the pipeline:
START PIPELINE parquet_pipe;
Check the table for the imported data:
SELECT * FROM parquet_tbl;+------+------+------+| c2 | c3 | c4 |+------+------+------+| REC2 | NULL | NULL || REC3 | NULL | 3 || REC4 | four | 4 |+------+------+------+
Format Options
format_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
-
FIELDS TERMINATED BY 'string'
: The delimiter to use between each column value in the message.If not specified, defaults to \t
. -
FIELDS ENCLOSED BY 'char'
: Prefixes and suffixes each column value of the message with thechar
value.If not specified, defaults to nothing. The OPTIONALLY
keyword does not affect the behavior of this option; it exists to maintain compatibility with MySQL. -
FIELDS ESCAPED BY 'char'
: Translates the escape sequence in thechar
value, instead of using the literalchar
value.Translation is done for all occurrences of the char
value in the message.If not specified, defaults to \\
. -
LINES TERMINATED BY 'string'
: A string to insert at the end of the message.If not specified, defaults to \n
. -
LINES STARTING BY 'string'
: A string to insert at the beginning of the message.If not specified, defaults to nothing.
Specify Partition ID and Timestamp in Output Filename in SELECT INTO
Users may define placeholders for partition ID (ORDINAL) and timestamp as part of the output filename in SELECT INTO
statements.
The placeholders for partition ID (${PARTITION_
) and timestamp (${TIME_
) are automatically replaced with their respective values during query execution.
In the query below, the ${PARTITION_
placeholder is replaced with the value of the partition ID in the output filenames.789
and the format was parquet, the query and output would be as follows:
SELECT * FROM tablename INTO S3 's3://bucket/test/filename_${PARTITION_ID}.parquet';
bucket/test/filename_789.parquet
Similar functionality is provided for the ${TIME_
placeholder as shown in the examples below.
This placeholder functionality works for outputs into S3, Azure, HDFS, Link, and GCS.
Examples
The examples below show how to use the placeholders.
These examples assume the existence of a table t
and assume that TIME_
= 123456
and that the table t
has two partitions with partition IDs 1
and 2
.
Example 1: Resolve Placeholders in Filenames
Placeholders are dynamically replaced in filenames with actual values to generate context-specific filenames based on the actual values of the placeholders.
SELECT * FROM tINTO S3 's3://bucket/test/file_${TIME_STAMP}_${PARTITION_ID}.parquet';
bucket/test/file_123456_.parquet
bucket/test/file_123456_1.parquet
bucket/test/file_123456_2.parquet
Example 2: Default Placeholder
The partition ID must always be included in the filenames.${PARTITION_
placeholder is not in the path, the partition ID is appended to the output filenames.
SELECT * FROM tINTO S3 's3://bucket/test/file_${TIME_STAMP}.parquet';
bucket/test/file_123456.parquet
bucket/test/file_123456.parquet_1
bucket/test/file_123456.parquet_2
Example 3: Paths Without Placeholders
When no placeholders are found in the file path, the partition IDs are appended to the filename.
SELECT * FROM tINTO S3 's3://bucket/test/file.parquet';
bucket/test/file.parquet
bucket/test/file.parquet_1
bucket/test/file.parquet_2
Example 4: Placeholders in Directory Paths
Placeholders are resolved in the filename, but not in the directory path.
When the placeholder ${PARTITION_
is used in the directory path as in the query below, ${PARTITION_
will not be translated to the value of the partition ID.${PARTITION_
is used in the filename, as is also shown in the query below, ${PARTITION_
will be translated to the value of the partition ID.${TIMESTAMP}
placeholder.
SELECT * FROM tINTO S3 's3://bucket/${PARTITION_ID}/file_${PARTITION_ID}.parquet';
/bucket/${PARTITION_ID}/file.parquet
/bucket/${PARTITION_ID}/file_1.parquet
/bucket/${PARTITION_ID}/file_2.parquet
SELECT … WITH (SAMPLE_ RATIO = <value>)
SELECT .
retrieves a random sample of data from a table (specifically, a Bernoulli sample).
sample_ratio_clause: WITH (SAMPLE_RATIO = ratio)
Argument
ratio
A decimal between 0.
Remarks
-
When you run multiple
SELECT
statements against the same table usingWITH (SAMPLE_
, you will receive similar results.RATIO = <value>) This is the case even when your sample ratio is different in your SELECT
statements.For example, the sample returned for a sample ratio of 0.
is often a subset of the sample ratio returned for25 0.
.5 -
Using a sample ratio that is one divided by a power of two allows
WITH (SAMPLE_
to run more efficiently.RATIO = <value>) -
You can use
WITH (SAMPLE_
in aRATIO = <value>) SELECT
subquery. -
SAMPLE_
is not applicable on Common Table Expressions (CTE) but on the base tables.RATIO
Warning
You can use WITH (SAMPLE_
with any table.AUTOSTATS_
enabled (the default).
Examples
Simple SELECT
statements
The following example populates a table with the integers 1
to 10
and selects two random samples.
DROP DATABASE IF EXISTS singlestore_docs_example;CREATE DATABASE singlestore_docs_example;USE singlestore_docs_example;CREATE TABLE example_table(num INT);INSERT INTO example_table VALUES (5), (6), (10), (11), (12),(14), (20), (24), (33), (40);
The query below returns a random sample of the num
column.
SELECT num FROM example_table WITH (SAMPLE_RATIO = 0.7) ORDER BY num;
Example output follows.
+------+
| num |
+------+
| 6 |
| 11 |
| 14 |
| 20 |
| 24 |
| 33 |
+------+
The query below returns the average of a random sample of the num
column.AVG
computation.
SELECT AVG(num) FROM example_table WITH (SAMPLE_RATIO = 0.8);
Example output follows.
+----------+
| AVG(num) |
+----------+
| 18.1111 |
+----------+
Using WITH (SAMPLE_
with SELECT
Subqueries
You may use WITH (SAMPLE_
as part of a SELECT
subquery, as shown in the following example.
CREATE TABLE example_table_2 AS SELECT * FROM example_tableWITH (SAMPLE_RATIO = 0.6);SELECT num FROM example_table2 ORDER BY num;
Example output follows.
+------+
| num |
+------+
| 10 |
| 12 |
| 14 |
| 24 |
| 33 |
| 40 |
+------+
Using WITH (SAMPLE_
with Joins
You may use WITH (SAMPLE_
in a SELECT
statement that contains a join.CUSTOMER
table to an ORDER
table using the customer_
field in the ORDER
table.WITH (SAMPLE_
clause within the join will greatly impact the result, as shown below.
The following query selects a sample of customers and returns the orders for those customers only.
SELECT c.name, o.order_id, o.order_totalFROM customer WITH (SAMPLE_RATIO = 0.4) cJOIN order oON c.customer_id = o.customer_id
The following query selects a sample of all orders and returns only those orders along with their associated customers.
SELECT c.name, o.order_id, o.order_totalFROM customer cJOIN order o WITH (SAMPLE_RATIO = 0.4)ON c.customer_id = o.customer_id
Last modified: January 13, 2025