Performing Upserts
On this page
An upsert inserts a row r
into a table if the table does not already contain an existing row with the same primary key or unique key as r
.
An upsert can be performed by specifying the ON DUPLICATE KEY UPDATE
clause of the INSERT
statement.
-
If a row is to be inserted that would result in a duplicate value in a
PRIMARY KEY
orUNIQUE KEY
, SingleStore Helios will instead perform anUPDATE
of the old row. -
An upsert can only be done against a
PRIMARY KEY
orUNIQUE KEY
column. -
ON DUPLICATE KEY UPDATE
cannot update unique key columns in columnstore tables.For rowstore tables, ON DUPLICATE KEY UPDATE
is allowed as long as the updated column isn't part of the table's explicitly defined shard key or primary key if there's no explicitly defined shard key. -
An upsert conflict occurs when there are multiple rows that could potentially be updated, because there are multiple unique keys defined on the table.
When such a conflict occurs, only one row is updated. There is no guarantee as to which one. -
The affected row count will be 0 if no existing row is changed, 1 if a new row is inserted, and 2 if an existing row is updated.
In Example 1 below, the ON DUPLICATE KEY UPDATE
clause updates the existing row instead of inserting a new row to avoid duplicate entry in aPRIMARY KEY
field, and the query result shows that 2 rows were affected.
Examples
The following examples use the cust
table, which is defined as a columnstore table as follows:
CREATE TABLE cust(name VARCHAR(32), id INT(11), orders INT(11), SORT KEY(id), UNIQUE KEY(id) USING HASH, SHARD KEY(id));
DESC cust;
+--------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+------+---------+-------+
| name | varchar(32) | YES | | NULL | |
| id | int(11) | YES | PRI | NULL | |
| orders | int(11) | YES | | NULL | |
+--------+-------------+------+------+---------+-------+
Add three rows of sample data for the examples:
INSERT INTO cust (name, id, orders) VALUES ("Chris",7214,2), ("Adam",3412,5), ("Elen",8301,4);
You could also define cust
as a rowstore table, as follows, and the examples would return the same results.
CREATE ROWSTORE TABLE cust(name VARCHAR(32), id INT(11) PRIMARY KEY, orders INT(11));
Example 1: Basic Example of ON DUPLICATE KEY UPDATE
The following example demonstrates the use of INSERT
with ON DUPLICATE KEY UPDATE
.
SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 2 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 3)ON DUPLICATE KEY UPDATE ORDERS=3;
Query OK, 2 rows affected
SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 3 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
Example 2: Using the VALUES() Function
The VALUES()
function can be used to refer to column values from the INSERT
clause of the INSERT .
statement.
SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 2 |
| Adam | 3412 | 5 |
| Elen | 8301 | 4 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 4)ON DUPLICATE KEY UPDATE ORDERS = VALUES(ORDERS) + ORDERS;
Query OK, 2 rows affected
SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 6 |
| Adam | 3412 | 5 |
| Elen | 8301 | 4 |
+-------+------+--------+
Here, the VALUES()
function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.
Example 3: Using SELECT with ON DUPLICATE KEY UPDATE
The following code block demonstrates the use of SELECT
with ON DUPLICATE KEY UPDATE
:
SELECT * FROM cust_new;
+------+------+--------+
| NAME | ID | ORDERS |
+------+------+--------+
| Bill | 21 | 5 |
| Gwen | 7214 | 3 |
| Sam | 22 | 2 |
+------+------+--------+
The cust_
table is defined using the same table definition as cust
, which is shown in the beginning of the examples.
SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 6 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
INSERT INTO cust (NAME, ID, ORDERS) SELECT * FROM cust_new ON DUPLICATE KEY UPDATE NAME = VALUES(NAME), ORDERS = VALUES (ORDERS);
Query OK, 4 rows affected
Records: 3 Duplicates: 1 Warnings: 0
SELECT * FROM cust;
+------+------+--------+
| NAME | ID | ORDERS |
+------+------+--------+
| Bill | 21 | 5 |
| Gwen | 7214 | 3 |
| Elen | 8301 | 4 |
| Sam | 22 | 2 |
| Adam | 3412 | 5 |
+------+------+--------+
Here, the VALUES()
function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.
Example 4: Updating Values through a Two-step Process
The above examples are functionally equivalent to this common two-step operation.
Note: While this works, it is much less performant than the single-steps above.
SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 6 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
SELECT * FROM cust_new;
+------+------+--------+
| NAME | ID | ORDERS |
+------+------+--------+
| Bill | 21 | 5 |
| Gwen | 7214 | 3 |
| Sam | 22 | 2 |
+------+------+--------+
The cust_
table is defined using the same table definition as cust
, which is shown in the beginning of the examples.
Copy all the unique values from the cust_
to cust
table.
INSERT IGNORE INTO cust SELECT * FROM cust_new;SELECT * FROM cust;
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Bill | 21 | 5 |
| Chris | 7214 | 6 |
| Elen | 8301 | 4 |
| Sam | 22 | 2 |
| Adam | 3412 | 5 |
+-------+------+--------+
Now, perform an update to copy all the non-unique (duplicate) values.
UPDATE custJOIN cust_new ON ( cust_new.ID=cust.ID )SET cust.NAME = cust_new.NAME,cust.ORDERS = cust_new.ORDERS;SELECT * FROM cust;
+------+------+--------+
| NAME | ID | ORDERS |
+------+------+--------+
| Bill | 21 | 5 |
| Gwen | 7214 | 3 |
| Elen | 8301 | 4 |
| Sam | 22 | 2 |
| Adam | 3412 | 5 |
+------+------+--------+
Example 5: Using ON DUPLICATE KEY UPDATE with a Pipeline
The following example demonstrates an upsert on a Pipeline.
Assume the directory /order_
has one file orders1.
, which contains the following data:
Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9
Create a Pipeline with an ON DUPLICATE KEY UPDATE
clause:
CREATE PIPELINE p AS LOAD DATA FS '/order_files/orders*.txt'INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = 3;
Start the Pipeline:
START PIPELINE p;
As the Pipeline ingests the data from orders1.
into the cust
table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301
and 3412
.cust
), are updated with 3
, because orders = 3
is specified in the ON DUPLICATE KEY UPDATE
clause.
SELECT * FROM cust ORDER BY name;
+--------+------+--------+
| name | id | orders |
+--------+------+--------+
| Adam | 3412 | 3 |
| Chris | 7214 | 6 |
| Elen | 8301 | 3 |
| Rachel | 9125 | 2 |
+--------+------+--------+
Example 5 works because the new value for orders
is specified directly (orders = 3
).
Feature 'UPDATE value depends on tables other than that being updated.' is not supported by MemSQL.
In order to update existing values with a new value specified in the pipeline, use the VALUES()
clause.
Chris,7214,6
Elen,8301,99
Adam,3412,99
Rachel,9125,2
Susan,8301,99
George,3412,9
And we update the ON DUPLICATE KEY UPDATE statement:
CREATE PIPELINE p2 AS LOAD DATA FS '/order_files/orders*.txt'INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = VALUES(orders);START PIPELINE p2;
The updated results:
SELECT * FROM cust ORDER BY name;
+--------+------+--------+
| name | id | orders |
+--------+------+--------+
| Adam | 3412 | 9 |
| Chris | 7214 | 6 |
| Elen | 8301 | 99 |
| Rachel | 9125 | 2 |
+--------+------+--------+
Example 6: Using ON DUPLICATE KEY UPDATE
with a Multi-Column Key
This example uses the product
table, which is defined as follows, with the multi-column unique key (id_
.
CREATE TABLE product(name VARCHAR(32), id_1 INT(11), id_2 INT(11), quantity INT(11), SORT KEY (id_1,id_2), UNIQUE KEY(id_1,id_2) USING HASH, SHARD KEY(id_1,id_2));
Assume the table contains the following data:
+-------------+------+------+----------+
| name | id_1 | id_2 | quantity |
+-------------+------+------+----------+
| blue pen | 2792 | 3 | 200 |
| red pen | 2792 | 5 | 250 |
| white paper | 4624 | 1 | 100 |
| brown paper | 4624 | 8 | 175 |
+-------------+------+------+----------+
Attempt to insert a record with the quantity 325
.(2792,5)
is already in the table, the existing record is updated with 325
:
INSERT INTO product(name,id_1,id_2,quantity) VALUES ('red pen',2792,5,325) ON DUPLICATE KEY UPDATE quantity = 325;
Attempt to insert a record with the quantity 125
.(4624,7)
, the insert succeeds.
INSERT INTO product(name,id_1,id_2,quantity) VALUES ('yellow paper',4624,7,125) ON DUPLICATE KEY UPDATE quantity = 125;
+--------------+------+------+----------+
| name | id_1 | id_2 | quantity |
+--------------+------+------+----------+
| blue pen | 2792 | 3 | 200 |
| red pen | 2792 | 5 | 325 |
| white paper | 4624 | 1 | 100 |
| yellow paper | 4624 | 7 | 125 |
| brown paper | 4624 | 8 | 175 |
+--------------+------+------+----------+
Last modified: March 22, 2024