Performing Upserts

An upsert inserts a row r into a table if the table does not already contain an existing row with the same primary key or unique key as r. Otherwise, the upsert updates the existing row.

An upsert can be performed by specifying the ON DUPLICATE KEY UPDATE clause of the INSERT statement. The following applies for an upsert:

  • If a row is to be inserted that would result in a duplicate value in a PRIMARY KEY or UNIQUE KEY, SingleStore will instead perform an UPDATE of the old row.

  • An upsert can only be done against a PRIMARY KEY or UNIQUE KEY column.

  • ON DUPLICATE KEY UPDATE cannot update unique key columns in columnstore tables. For rowstore tables, ON DUPLICATE KEY UPDATE is allowed as long as the updated column isn't part of the table's explicitly defined shard key or primary key if there's no explicitly defined shard key.

  • An upsert conflict occurs when there are multiple rows that could potentially be updated, because there are multiple unique keys defined on the table. When such a conflict occurs, only one row is updated. There is no guarantee as to which one.

  • The affected row count will be 0 if no existing row is changed, 1 if a new row is inserted, and 2 if an existing row is updated. In Example 1 below, the ON DUPLICATE KEY UPDATE clause updates the existing row instead of inserting a new row to avoid duplicate entry in a PRIMARY KEY field, and the query result shows that 2 rows were affected.

Examples

The following examples use the cust table, which is defined as a columnstore table as follows:

CREATE TABLE cust(name VARCHAR(32), id INT(11), orders INT(11), SORT KEY(id), UNIQUE KEY(id) USING HASH, SHARD KEY(id));
DESC cust;
+--------+-------------+------+------+---------+-------+
| Field  | Type        | Null | Key  | Default | Extra |
+--------+-------------+------+------+---------+-------+
| name   | varchar(32) | YES  |      | NULL    |       |
| id     | int(11)     | YES  | PRI  | NULL    |       |
| orders | int(11)     | YES  |      | NULL    |       |
+--------+-------------+------+------+---------+-------+

You could also define cust as a rowstore table, as follows, and the examples would return the same results.

CREATE ROWSTORE TABLE cust(name VARCHAR(32), id INT(11) PRIMARY KEY, orders INT(11));

Example 1: Basic Example of ON DUPLICATE KEY UPDATE

The following example demonstrates the use of INSERT with ON DUPLICATE KEY UPDATE.

SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      2 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 3)
ON DUPLICATE KEY UPDATE ORDERS=3;
Query OK, 2 rows affected
SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      3 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+

Example 2: Using the VALUES() Function

The VALUES() function can be used to refer to column values from the INSERT clause of the INSERT ... ON DUPLICATE KEY UPDATE statement. For example:

SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      2 |
| Adam  | 3412 |      5 |
| Elen  | 8301 |      4 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 4)
ON DUPLICATE KEY UPDATE ORDERS = VALUES(ORDERS) + ORDERS;
Query OK, 2 rows affected
SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Adam  | 3412 |      5 |
| Elen  | 8301 |      4 |
+-------+------+--------+

Here, the VALUES() function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.

Example 3: Using SELECT with ON DUPLICATE KEY UPDATE

The following code block demonstrates the use of SELECT with ON DUPLICATE KEY UPDATE:

SELECT * FROM cust_new;
+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Sam  |   22 |      2 |
+------+------+--------+

The cust_new table is defined using the same table definition as cust, which is shown in the beginning of the examples.

SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+
INSERT INTO cust (NAME, ID, ORDERS) SELECT * FROM cust_new ON DUPLICATE KEY UPDATE NAME = VALUES(NAME), ORDERS = VALUES (ORDERS);
Query OK, 4 rows affected
Records: 3  Duplicates: 1  Warnings: 0
SELECT * FROM cust;
+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Elen | 8301 |      4 |
| Sam  |   22 |      2 |
| Adam | 3412 |      5 |
+------+------+--------+

Here, the VALUES() function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.

Example 4: Updating Values through a Two-step Process

The above examples are functionally equivalent to this common two-step operation.

Note: While this works, it is much less performant than the single-steps above.

SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+
SELECT * FROM cust_new;
+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Sam  |   22 |      2 |
+------+------+--------+

The cust_new table is defined using the same table definition as cust, which is shown in the beginning of the examples.

Copy all the unique values from the cust_new to cust table.

INSERT IGNORE INTO cust SELECT * FROM cust_new;
SELECT * FROM cust;
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Bill  |   21 |      5 |
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Sam   |   22 |      2 |
| Adam  | 3412 |      5 |
+-------+------+--------+

Now, perform an update to copy all the non-unique (duplicate) values.

UPDATE cust
JOIN cust_new ON ( cust_new.ID=cust.ID )
SET cust.NAME = cust_new.NAME,
cust.ORDERS = cust_new.ORDERS;
SELECT * FROM cust;
+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Elen | 8301 |      4 |
| Sam  |   22 |      2 |
| Adam | 3412 |      5 |
+------+------+--------+

Example 5: Using ON DUPLICATE KEY UPDATE with a Pipeline

The following example demonstrates an upsert on a Pipeline.

Assume the directory /order_files has one file orders1.txt, which contains the following data:

Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9

Create a Pipeline with an ON DUPLICATE KEY UPDATE clause:

CREATE PIPELINE p AS LOAD DATA FS '/order_files/orders*.txt'
INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = 3;

Start the Pipeline:

START PIPELINE p;

As the Pipeline ingests the data from orders1.txt into the cust table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301 and 3412. The second and third records containing those duplicate keys (which have already been imported into cust), are updated with 3, because orders = 3 is specified in the ON DUPLICATE KEY UPDATE clause.

SELECT * FROM cust ORDER BY name;
+--------+------+--------+
| name   | id   | orders |
+--------+------+--------+
| Adam   | 3412 |      3 |
| Chris  | 7214 |      6 |
| Elen   | 8301 |      3 |
| Rachel | 9125 |      2 |
+--------+------+--------+

Last modified: March 15, 2024

Was this article helpful?