Performing Upserts

If a row is to be inserted that would result in a duplicate value in a PRIMARY KEY or UNIQUE KEY, SingleStore will instead perform an UPDATE of the old row.
An upsert can only be done against a PRIMARY KEY or UNIQUE KEY column.
ON DUPLICATE KEY UPDATE cannot update unique key columns in columnstore tables. For rowstore tables, ON DUPLICATE KEY UPDATE is allowed as long as the updated column isn't part of the table's explicitly defined shard key or primary key if there's no explicitly defined shard key.
An upsert conflict occurs when there are multiple rows that could potentially be updated, because there are multiple unique keys defined on the table. When such a conflict occurs, only one row is updated. There is no guarantee as to which one.
The affected row count will be 0 if no existing row is changed, 1 if a new row is inserted, and 2 if an existing row is updated. In Example 1 below, the ON DUPLICATE KEY UPDATE clause updates the existing row instead of inserting a new row to avoid duplicate entry in a PRIMARY KEY field, and the query result shows that 2 rows were affected.

Examples

The following examples use the cust table, which is defined as a columnstore table as follows:

SQL

CREATE TABLE cust(name VARCHAR(32), id INT(11), orders INT(11), SORT KEY(id), UNIQUE KEY(id) USING HASH, SHARD KEY(id));

SQL

DESC cust;

+--------+-------------+------+------+---------+-------+
| Field  | Type        | Null | Key  | Default | Extra |
+--------+-------------+------+------+---------+-------+
| name   | varchar(32) | YES  |      | NULL    |       |
| id     | int(11)     | YES  | PRI  | NULL    |       |
| orders | int(11)     | YES  |      | NULL    |       |
+--------+-------------+------+------+---------+-------+

Add three rows of sample data for the examples:

SQL

INSERT INTO cust (name, id, orders) VALUES ("Chris",7214,2), ("Adam",3412,5), ("Elen",8301,4);

You could also define cust as a rowstore table, as follows, and the examples would return the same results.

SQL

CREATE ROWSTORE TABLE cust(name VARCHAR(32), id INT(11) PRIMARY KEY, orders INT(11));

Example 1: Basic Example of ON DUPLICATE KEY UPDATE

The following example demonstrates the use of INSERT with ON DUPLICATE KEY UPDATE.

SQL

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      2 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+

SQL

INSERT INTO cust (ID, ORDERS) VALUES (7214, 3)
ON DUPLICATE KEY UPDATE ORDERS=3;

Query OK, 2 rows affected

SQL

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      3 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+

Example 2: Using the VALUES() Function

The VALUES() function can be used to refer to column values from the INSERT clause of the INSERT ... ON DUPLICATE KEY UPDATE statement. For example:

SQL

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      2 |
| Adam  | 3412 |      5 |
| Elen  | 8301 |      4 |
+-------+------+--------+

SQL

INSERT INTO cust (ID, ORDERS) VALUES (7214, 4)
  ON DUPLICATE KEY UPDATE ORDERS = VALUES(ORDERS) + ORDERS;

Query OK, 2 rows affected

SQL

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Adam  | 3412 |      5 |
| Elen  | 8301 |      4 |
+-------+------+--------+

Here, the VALUES() function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.

Example 3: Using SELECT with ON DUPLICATE KEY UPDATE

The following code block demonstrates the use of SELECT with ON DUPLICATE KEY UPDATE:

SQL

SELECT * FROM cust_new;

+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Sam  |   22 |      2 |
+------+------+--------+

The cust_new table is defined using the same table definition as cust, which is shown in the beginning of the examples.

SQL

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+

SQL

INSERT INTO cust (NAME, ID, ORDERS) SELECT * FROM cust_new ON DUPLICATE KEY UPDATE NAME = VALUES(NAME), ORDERS =  VALUES (ORDERS);

Query OK, 4 rows affected
Records: 3  Duplicates: 1  Warnings: 0

SQL

SELECT * FROM cust;

+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Elen | 8301 |      4 |
| Sam  |   22 |      2 |
| Adam | 3412 |      5 |
+------+------+--------+

Here, the VALUES() function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.

Example 4: Updating Values through a Two-step Process

The above examples are functionally equivalent to this common two-step operation.

Note: While this works, it is much less performant than the single-steps above.

SQL

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+

SQL

SELECT * FROM cust_new;

+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Sam  |   22 |      2 |
+------+------+--------+

The cust_new table is defined using the same table definition as cust, which is shown in the beginning of the examples.

Copy all the unique values from the cust_new to cust table.

SQL

INSERT IGNORE INTO cust SELECT * FROM cust_new;

SELECT * FROM cust;

+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Bill  |   21 |      5 |
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Sam   |   22 |      2 |
| Adam  | 3412 |      5 |
+-------+------+--------+

Now, perform an update to copy all the non-unique (duplicate) values.

SQL

UPDATE cust
JOIN cust_new ON ( cust_new.ID=cust.ID )
SET cust.NAME = cust_new.NAME,
    cust.ORDERS = cust_new.ORDERS;

SELECT * FROM cust;

+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Elen | 8301 |      4 |
| Sam  |   22 |      2 |
| Adam | 3412 |      5 |
+------+------+--------+

Example 5: Using ON DUPLICATE KEY UPDATE with a Pipeline

The following example demonstrates an upsert on a Pipeline.

Assume the directory /order_files has one file orders1.txt, which contains the following data:

Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9

Create a Pipeline with an ON DUPLICATE KEY UPDATE clause:

SQL

CREATE PIPELINE p AS LOAD DATA FS '/order_files/orders*.txt' 
INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = 3;

Start the Pipeline:

SQL

START PIPELINE p;

As the Pipeline ingests the data from orders1.txt into the cust table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301 and 3412. The second and third records containing those duplicate keys (which have already been imported into cust), are updated with 3, because orders = 3 is specified in the ON DUPLICATE KEY UPDATE clause.

SQL

SELECT * FROM cust ORDER BY name;

+--------+------+--------+
| name   | id   | orders |
+--------+------+--------+
| Adam   | 3412 |      3 |
| Chris  | 7214 |      6 |
| Elen   | 8301 |      3 |
| Rachel | 9125 |      2 |
+--------+------+--------+

Example 5 works because the new value for orders is specified directly (orders = 3). If you try to update the value with an expression (such as a value from the input file), you will see an error like this:

Feature 'UPDATE value depends on tables other than that being updated.' is not supported by MemSQL.

In order to update existing values with a new value specified in the pipeline, use the VALUES() clause. So if the orders.txt file looks instead like this:

Chris,7214,6
Elen,8301,99
Adam,3412,99
Rachel,9125,2
Susan,8301,99
George,3412,9

And we update the ON DUPLICATE KEY UPDATE statement:

SQL

CREATE PIPELINE p2 AS LOAD DATA FS '/order_files/orders*.txt' 
INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = VALUES(orders);

START PIPELINE p2;

The updated results:

SELECT * FROM cust ORDER BY name;

+--------+------+--------+
| name   | id   | orders |
+--------+------+--------+
| Adam   | 3412 |      9 |
| Chris  | 7214 |      6 |
| Elen   | 8301 |     99 |
| Rachel | 9125 |      2 |
+--------+------+--------+