# LOAD DATA

> **❗ Important**: SingleStore clusters can be integrated with many [third-party ETL and CDC tools](https://docs.singlestore.com/db/v9.1/load-data/integrate-with-singlestore.md).

The `LOAD DATA` command imports data stored in a CSV, JSON, BSON, Avro, or Parquet file into a SingleStore table (referred to as the destination table in this topic).

## Remarks

The syntax and semantics of loading data from a CSV, JSON, BSON, Avro, or Parquet file are detailed below.

`REPLACE`, `SKIP CONSTRAINT ERRORS`, and `SKIP DUPLICATE KEY ERRORS` are supported with non-CSV pipelines.

During the import of data stored in any of these files, you can optionally apply operations to the data as follows:

* Use the `WHERE` clause to filter on incoming data. Only rows that satisfy the expression in the `WHERE` clause will be loaded into SingleStore. For an example of how to use the `WHERE` clause, see the [examples section](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data/#UUID-a1ff6bf6-7cd6-5486-9be7-369bd43ad19d.md).
* Use the `SET` clause to set columns using specific values or expressions with variables. For example, if your input file has 9 columns but the table has a 10th column called `foo`, you can add `SET foo=0` or `SET foo=@myVariable`. Note that column names may only be used on the left side of `SET` expressions.
* Use the `CHARACTER SET` clause to import files with any supported character set into SingleStore.

  For more information, see [Character Encoding](https://docs.singlestore.com/db/v9.1/reference/sql-reference/character-encoding.md).

Refer to the [Permissions Matrix](https://docs.singlestore.com/db/v9.1/reference/sql-reference/security-management-commands/permissions-matrix.md) for the required permissions.

> **❗ Important**: If a query uses `@` in a `LOAD DATA` statement, SingleStore interprets it as a reference to a `LOAD DATA` assignment to a variable, not as a reference to a [user-defined variable](https://docs.singlestore.com/db/v9.1/reference/sql-reference/user-defined-variables.md).

The behavior of SingleStore’s `LOAD DATA` command has several functional differences from MySQL’s command:

* `LOAD DATA` will load the data into SingleStore in parallel to maximize performance. This makes `LOAD DATA` in SingleStore much faster on machines with a larger number of processors.
* `LOAD DATA` supports loading compressed .gz files.
* The only supported `charset_name` is `utf8`.
* While it is possible to execute `LOAD DATA` on leaf nodes, it should only be run on master aggregator or child aggregator node types. See [Cluster Management Commands](https://docs.singlestore.com/db/v9.1/reference/sql-reference/cluster-management-commands.md) for more information. Note that when running this command on reference tables you must connect to the master aggregator.

The `mysqlimport` utility can also be used to import data into SingleStore. `mysqlimport` uses `LOAD DATA` internally.

By default, the `LOAD DATA` operation does not fail when it encounters row-level errors. Instead, SingleStore records errors and continues ingesting data. The default maximum number of errors allowed is 1000 per table, not per partition. If the total number of errors across all partitions exceeds this limit, the operation fails.

This prevents out-of-memory issues when unintentionally loading large files with incorrect formatting or an incorrect `LOAD DATA` statement. Use the `MAX_ERRORS` clause at the end of the `LOAD DATA` statement to change the limit on the number of errors per partition before the operation fails:

* Specify `MAX_ERRORS` `n` (where `n > 0`) to allow up to n errors per partition.
* Specify `MAX_ERRORS 0` to disable error tracking. In this case, all row errors are ignored and the operation continues without failing.

For example, if the total number of errors across all partitions exceeds 1000, the `LOAD DATA` operation fails. When using the `ERRORS HANDLE` clause, the default `MAX_ERRORS` limit is disabled, and errors are handled according to the specified error-handling configuration.

Writing to multiple databases in a transaction is not supported.

## Transforming Data with Expressions

The `LOAD DATA` command supports inline data transformation using the `SET` clause. This allows you to clean, format, and manipulate data during ingestion, eliminating the need for post-processing.

## Syntax

```
LOAD DATA [LOCAL] INFILE 'file_path'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(@var1, @var2, @var3, ...)
SET
   column1 = expression1,
   column2 = expression2,
   ...;

```

The `SET` clause maps user-defined variables (prefixed with `@`) to table columns and applies transformations.

The right-hand side of the `=` operator in the `SET` clause can include any scalar SQL expression. Subqueries are not supported.

Expressions can reference one or more input column variables (for example, `@var1`, `@var2`) and can combine them using conditional logic or other operations.

Supported expressions include:

* Built-in functions (for example, `UPPER()`, `CAST()`, `IF()`, `COALESCE()`)
* Arithmetic and string operations
* User-defined functions (UDFs), including Procedural SQL (PSQL) and external UDFs
* Expressions that reference multiple input column variables (for example, `SET col = IF(@var1 = '', @var2, @var1)`)

## Examples

The following examples demonstrate how to use the `SET` clause in `LOAD DATA` to perform inline data transformations during ingestion. These transformations ensure that data is consistent, properly formatted, and ready for use at the time of insertion.

## Example 1: Clean and Normalize Text Data

Standardize string values during import.

```sql
CREATE TABLE products (
   product_id INT,
   product_name VARCHAR(100),
   category VARCHAR(50)
);

LOAD DATA LOCAL INFILE '/tmp/products.csv'
INTO TABLE products
FIELDS TERMINATED BY ','
(@id, @name, @cat)
SET
   product_id = @id,
   product_name = UPPER(TRIM(@name)),   -- Convert to uppercase and trim whitespace
   category = LOWER(TRIM(@cat));        -- Convert to lowercase and trim whitespace

```

## Example 2: Extract Fields from JSON

Parse JSON input and map fields to columns.

```sql
CREATE TABLE user_events (
   user_id INT,
   event_type VARCHAR(50),
   event_value VARCHAR(255),
   metadata JSON
);

LOAD DATA INFILE '/data/events.json'
INTO TABLE user_events
FIELDS TERMINATED BY ','
(@user, @json_data)
SET
   user_id = @user,
   event_type = JSON_EXTRACT_STRING(@json_data, 'event_type'),
   event_value = JSON_EXTRACT_STRING(@json_data, 'value'),
   metadata = @json_data;  -- Store full JSON payload

```

## CSV LOAD DATA

## Syntax

```sql
LOAD DATA [LOCAL] INFILE '<file_name>'
  [REPLACE | IGNORE | SKIP { ALL | CONSTRAINT | DUPLICATE KEY | PARSER } ERRORS]
  INTO TABLE <table_name>
  [CHARACTER SET <character_set_name>]
  [{FIELDS | COLUMNS}
  [TERMINATED BY '<string>']
      [[OPTIONALLY] ENCLOSED BY '<char>']
      [ESCAPED BY '<char>']
  ]
  [LINES
  [STARTING BY '<string>']
  [TERMINATED BY '<string>']
  ]
  [TRAILING NULLCOLS]
  [NULL DEFINED BY <string> [OPTIONALLY ENCLOSED]]
  [IGNORE <number> LINES]
  [ ({<column_name> | @<variable_name>}, ...) ]
  [SET <column_name> = <expression>,...]
  [WHERE <expression>,...]
  [MAX_ERRORS <number>]
  [ERRORS HANDLE <string>]

```

> **📝 Note**: For more information on using LOAD DATA to load CSV files, refer to [Load CSV Files with LOAD DATA](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-csv-files/load-csv-files-with-load-data.md).

## JSON LOAD DATA

## Syntax

```sql
LOAD DATA [LOCAL] INFILE 'file_name'
  [REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT JSON
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path [DEFAULT literal_expr], ...)

subvalue_path:
  {% | [%::]ident [::ident ...]}

```

> **📝 Note**: For more information on using LOAD DATA to load JSON files, refer to [Load JSON Files with LOAD DATA](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-json-files/load-json-files-with-load-data.md).

## Loading JSON Data from a File

The following example loads data from a JSON file hosted on S3 into a SingleStore table and maps nested JSON fields to individual table columns.

This example uses the following sample data set (`books.json`):

```json
{
  "title": "Onyx Storm",
  "author": "Rebecca Yarros",
  "details": {
    "publisher": "Entangled:Red Tower Books",
    "numpages": 544,
    "series": "The Empyrean"
  }
}
{
  "title": "The Maid",
  "author": "Nita Prose",
  "details": {
    "publisher": "Ballantine Books",
    "numpages": 385
  }
}
```

Create a table to store the data:

```sql
CREATE TABLE books_json_multi_col (
    title_s2 TEXT,
    author_s2 TEXT,
    publisher_s2 TEXT,
    numpages_s2 INT,
    series_s2 TEXT
);
```

Load the JSON data and extract nested fields using the `::` subvalue path syntax:

```sql
LOAD DATA LOCAL INFILE 'books.json'
INTO TABLE books_json_multi_col
(
    title_s2 <- title,
    author_s2 <- author,
    publisher_s2 <- details::publisher,
    numpages_s2 <- details::numpages,
    series_s2 <- details::series DEFAULT NULL
)
FORMAT JSON;
```

The JSON data has been ingested and is now stored in your SingleStore database.

```sql
SELECT * FROM books_json_multi_col;

```

```output

+-----------------+----------------+---------------------------+-------------+--------------+
| title_s2        | author_s2      | publisher_s2              | numpages_s2 | series_s2    |
+-----------------+----------------+---------------------------+-------------+--------------+
| Onyx Storm      | Rebecca Yarros | Entangled:Red Tower Books | 544         | The Empyrean |
| The Maid        | Nita Prose     | Ballantine Books          | 385         | NULL         |
+-----------------+----------------+---------------------------+-------------+--------------+
```

## BSON LOAD DATA

The `LOAD DATA` command supports loading BSON data from files using the `FORMAT BSON` clause. The `LOAD DATA ... FORMAT BSON` SQL statement is similar to `LOAD DATA ... FORMAT JSON` with the following exceptions:

* The `FORMAT BSON` clause does not support default values.
* The `subvalue_mapping` clause must be specified in the `LOAD DATA ... FORMAT BSON` SQL statement.
* The target columns in the `subvalue_mapping` clause must be `BSON` type columns. If the target columns are non-`BSON` type, they must be mapped to a user-defined variable and then assigned to the column using the `SET` clause.

Refer to [JSON LOAD DATA](https://docs.singlestore.com/#UUID-390757f0-40b4-4d47-acec-d7c9a4c84ceb.md) for more information.

## Syntax

```sql
LOAD DATA [LOCAL] INFILE 'file_name'
  [REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT BSON
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
  {% | [%::]ident [::ident ...]}
```

## Loading BSON Data from a File

The following example restores a MongoDB® backup into SingleStore.

This example uses the following sample data set.

```MongoDB
use dbm
db.bsonExport.insertMany( [
  { _id: 1, Code: "xv1f", Qty: 45 },
  { _id: 2, Code: "nm3w", Qty: 30 },
  { _id: 3, Code: "qoma", Qty: 20 },
  { _id: 4, Code: "hr3k", Qty: 15 } ] )

```

```output

{ acknowledged: true,
  insertedIds: { '0': 1, '1': 2, '2': 3, '3': 4 } }
```

Create a binary export of the MongoDB® data using the `mongodump` tool:

```shell
mongodump --uri="mongodb://<username>:<password>@<mongodb-endpoint>:27017/?authMechanism=PLAIN&tls=true&loadBalanced=true" --db="dbm" --collection="bsonExport" --out="<path_to_output_directory>"
```

This command creates a `bsonExport.bson` file in the target output directory.

Create a table in your SingleStore database to store the BSON data:

```sql
CREATE TABLE bsonExport (
  _id BSON NOT NULL, 
  _more BSON NOT NULL COMMENT 'KAI_MORE', 
  `$_id` AS BSON_NORMALIZE_NO_ARRAY(`_id`) PERSISTED LONGBLOB COMMENT 'KAI_AUTO', 
  SHARD KEY (`$_id`), PRIMARY KEY (`$_id`));
```

Load the `bsonExport.bson` file into SingleStore using the following command:

```sql
LOAD DATA INFILE '<path_to_output_directory>/bsonExport.bson'  
  INTO TABLE bsonEx  FORMAT BSON (_id <- %::_id, @V1 <- %) 
  SET _more = BSON_EXCLUDE_MASK(@V1,'{"_id":1}');
```

The BSON data has been ingested and is now stored in your SingleStore database.

```sql
SELECT _id:>JSON AS "_id", _more:>JSON AS "_more" FROM bsonEx;

```

```output

+------+--------------------------+
| _id  | _more                    |
+------+--------------------------+
| 4    | {"Code":"hr3k","Qty":15} |
| 3    | {"Code":"qoma","Qty":20} |
| 2    | {"Code":"nm3w","Qty":30} |
| 1    | {"Code":"xv1f","Qty":45} |
+------+--------------------------+
```

## Avro LOAD DATA

## Syntax for LOAD DATA Local Infile

```sql
LOAD DATA [LOCAL] INFILE 'file_name'
  WHERE/SET/SKIP ERRORS[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT AVRO SCHEMA REGISTRY {"IP" | "Hostname"}
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]
  [SCHEMA 'avro_schema']

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
  {% | [%::]ident [::ident ...]}

```

See the associated [GitHub repo](https://github.com/singlestore-labs/singlestore-avro-sample).

> **📝 Note**: For more information on using LOAD DATA to load AVRO files, refer to [Load Avro Files with LOAD DATA](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-with-load-data.md).

## Loading Parquet Data

The `LOAD DATA` command supports loading Parquet files from AWS S3 or local files. You can also use the `LOAD DATA` clause in a [CREATE PIPELINE .. FORMAT ](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md) statement to create a pipeline that loads Parquet files.

## Syntax for LOAD DATA AWS S3 or Local File Source

Parquet-formatted data stored in an AWS S3 bucket or the local filesystem can be loaded via a LOAD DATA query without a pipeline. This streamlines the process of loading cloud-stored data into tables. Other `LOAD DATA` clauses (`SET`, `WHERE`, etc.) are supported (but not shown) in the following syntax examples.

For S3:

```sql
LOAD DATA S3 '<bucket name>'
CONFIG '{"region" : "<region_name>"}' 
CREDENTIALS '{"aws_access_key_id" : "<key_id> ", 
             "aws_secret_access_key": "<access_key>"}' 
INTO TABLE <table_name>
       (`<col_a>` <- %, 
 `<col_b>` <- % DEFAULT NULL , 
  ) FORMAT PARQUET;
```

This data can also be loaded from S3 by using a connection link. Refer to [CREATE LINK](https://docs.singlestore.com/db/v9.1/reference/sql-reference/security-management-commands/create-link.md) for more information on. connection links.

```sql
LOAD DATA LINK <link_name> '<bucket name>/<path>'
INTO TABLE <table_name>(`<col_a>` <- %,
`<col_b>` <- % DEFAULT NULL ,
) FORMAT PARQUET;
```

For local file:

```sql
LOAD DATA INFILE '<path_to_file/file_name>'
INTO TABLE <table_name>
    (val1 <- source1, 
     val2 <- source2
     [ ... ]
) [COMPRESSION { AUTO | NONE | LZ4 | GZIP }]
[ ... ]
FORMAT PARQUET;
```

## Handling Data Compression

The `COMPRESSION` clause specifies how `LOAD DATA` handles the compression of an input file.

## Syntax for LOAD DATA Local Infile

```sql
LOAD DATA INFILE 'filename' COMPRESSION { AUTO | NONE | LZ4 | GZIP } INTO TABLE ...

LOAD DATA INFILE 'filename' INTO TABLE `tablename` COMPRESSION { AUTO | NONE | LZ4 | GZIP } ...

```

## Arguments

* `AUTO`: This is the default setting, it tells `LOAD DATA` to identify the compression type from the input file’s extension.
* `NONE`: Specifies that the input file is uncompressed.
* `LZ4`: Specifies that the input file is compressed with `LZ4` compression algorithm.
* `GZIP`: Specifies that the input file is compressed with `GZIP` compression algorithm.

## Remarks

* If `COMPRESSION` is set to `NONE`, `LZ4`, or `GZIP`, `LOAD DATA` will not use the extension of the input file to determine the type of compression. For example, if you load a file `test.gz` and specify the `COMPRESSION` as `NONE`, then `LOAD DATA` will handle `test.gz` as an uncompressed file.

## LOCAL

`LOCAL` affects the expected file location, the search behavior for relative path names, and [Error Handling](https://docs.singlestore.com/#UUID-a9508e60-d2a3-8105-7a41-5fd7459d3972.md) behavior.

When you specify `LOCAL`, the client reads `file_name` and sends it to the server. The expected location of the file is within the client directory. If `file_name` is a relative path, it is relative to the current working directory of the client.

When `LOCAL` is not specified, the file is read by the server, and needs to be located on the related server host. If a relative file path name is specified, it is searched for as relative to the server’s data directory, or as relative to the directory of the default database in cases where no leading components are given.

Because files need to be sent from the client to the server, specifying `LOCAL` can be slower. However, it does have the security advantage in that the client user must be able to read the file(s) being loaded. Where `LOCAL` is not specified, the server needs access to the full data directory, meaning that any user who has the permissions to `LOAD DATA` or `CREATE PIPELINE` can read the directory. This is because those permissions include `FILE READ` [Permissions Matrix](https://docs.singlestore.com/db/v9.1/reference/sql-reference/security-management-commands/permissions-matrix.md)

`LOCAL` does not support *globbing* (such as using wildcards in directory or filenames).

An example of using `LOCAL` follows:

```sql
LOAD DATA LOCAL INFILE '/example-directory/foo.csv'
  INTO TABLE foo
  COLUMNS TERMINATED BY ',';
```

## Error Logging

When you run the `LOAD DATA` command and use the `ERRORS HANDLE` clause, `LOAD DATA` logs errors to the `information_schema.LOAD_DATA_ERRORS` table. The logged errors are the erroneous lines that `LOAD DATA` encountered as it processed the input file. When this clause is used, it disables the `MAX_ERRORS` default limit of 1000.

When `LOAD DATA` is used with a `TEMPORARY TABLE`, errors are not logged in `information_schema.load_data_errors` if the `LOAD DATA` statement fails.

See the next section for example data that `LOAD DATA ... ERRORS HANDLE` populates in the `information_schema.LOAD_DATA_ERRORS` [table](https://docs.singlestore.com/db/v9.1/reference/information-schema-reference/data-ingest/load-data-errors.md).

Use the [CLEAR LOAD ERRORS](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/clear-load-errors.md) command to remove errors from `information_schema.LOAD_DATA_ERRORS`.

## Error Handling

`LOAD DATA` has several options to handle errors that it encounters as it processes the input file. When you write a `LOAD DATA` statement, you can decide which option to use.

* By default, `LOAD DATA` returns errors to the client application. Errors are returned one at a time.
* To ignore duplicate key/index value errors in the input file, use the `REPLACE` clause to replace existing rows with input rows. This clause first deletes the existing rows that have the same value for a primary key or unique index as the input rows, and then inserts the new row.
* To skip errors in the input file, use the `SKIP ... ERRORS` clause. Data in the erroneous lines will not be inserted into the destination table.
* To ignore errors in the input file, use the `IGNORE` clause. This clause replaces invalid values with their defaults, discards extra fields, or discards erroneous lines completely.
* When `LOCAL` is specified, duplicate-key and data interpretation errors do not stop the operation. When `LOCAL` is not specified, duplicate-key and data interpretation stop the operation.

> **⚠️ Warning**: In most cases, use `SKIP ... ERRORS` instead of `IGNORE`. If you use `IGNORE` without understanding how it behaves, `LOAD DATA` may produce unexpected results as it inserts data into the destination table.

The four error handling options are discussed in the following topics.

## Default Error Handling

By default, `LOAD DATA` returns errors to the client application. Errors are returned one at a time. If it returns an error, no data will be inserted into the destination table.

## Error Handling Example

Create a new table with a `PRIMARY KEY` column:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time TIMESTAMP NOT NULL
  );

```

The following CSV file will loaded be into this table as `orders.csv`. Note that line 2 has an extra column and there is a duplicate primary key value of `2` in line 4.

```
1,372,Apples,2016-05-09
3,307,Oranges,2016-07-31,1000
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ',';

```

```output

ERROR 1262 (01000): Row 2 was truncated; it contained more data than there were input columns

```

After removing the extra column from row 2:

```sql
ERROR 1062 (23000): Leaf Error (127.0.0.1:3308): Duplicate entry '2' for key 'PRIMARY'

```

After removing the duplicate primary key entry, the `LOAD DATA` statement is successful and the input file is loaded into the table.

## `REPLACE` Error Handling

SingleStore’s `REPLACE` behavior allows you to replace the existing rows with the new rows; only those rows that have the same value for a primary key or unique index as the input rows are replaced. In case of an error that arises due to duplicate key value, it first deletes the conflicting row in the destination that has the duplicate key value and then inserts the new row from the source file.

`LOAD DATA` inserts source file rows into the destination table in the order in which the rows appear in the source file. When `REPLACE` is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

* If the destination table’s schema specifies a unique or primary key column, and
* The source file contains a row with the same primary or unique key value as the destination table, then
* The row in the destination table that has the same unique or primary key value as the row in the source file will be deleted and a new row from the source file that matches the primary key value will be inserted into the destination table.

**Note**: If the source file contains multiple rows with the same primary or unique key value as the destination table, then only the last row in the source file with the same primary or unique key value (as the destination table) replaces the existing row in the destination table.

**Note**: `REPLACE` cannot be combined with `SKIP DUPLICATE KEY ERRORS`. The default behavior of SingleStore is to throw an error for duplicate keys. However, both `REPLACE` and `SKIP DUPLICATE KEY ERRORS` does not throw a duplicate key error; `REPLACE` replaces the old row with the new row, while `SKIP DUPLICATE KEY ERRORS` discards the new row and retains the old row.

## `REPLACE` Error Handling Example

Create a new table with a `PRIMARY KEY` column:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

```

A row with a primary key `4` is inserted as follows:

```sql
INSERT INTO orders VALUES(4,236,"Bananas",2016-06-23);

```

The following CSV file is loaded into the table as `orders.csv`. Note the duplicate primary key value of `4` in line 2:

```
1,372,Apples,2016-05-09
4,138,Pears,2016-07-14
3,307,Oranges,2016-07-31

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  REPLACE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Line 2 in the source file contained a duplicate primary key `4`. The `REPLACE` error handler deletes the row `4,236,"Bananas",2016-06-23` in the destination table and replaces it with the value `4,138,Pears,2016-07-14` from the source file.

## `SKIP ... ERRORS` Error Handling

SingleStore’s `SKIP ... ERRORS` behavior allows you to specify an error scenario that, when encountered, discards an offending row. Three kinds of error scenarios can be skipped:

* `SKIP DUPLICATE KEY ERRORS`: Any row in the source data that contains a duplicate unique or primary key will be discarded. If the row contains invalid data other than a duplicate key, an error will be generated. See [SKIP DUPLICATE KEY ERRORS](https://docs.singlestore.com/#UUID-cc0145ee-ec9d-05a4-261f-c0ff174faed2.md) below.
* `SKIP CONSTRAINT ERRORS`: Inclusive of `SKIP DUPLICATE KEY ERRORS`. If a row violates a column’s `NOT NULL` constraint, or the row contains invalid JSON or Geospatial values, the row will be discarded. If the row contains invalid data outside the scope of constraint or invalid value errors, an error will be generated. See [SKIP CONSTRAINT ERRORS](https://docs.singlestore.com/#UUID-5d886ddc-b1dc-b08f-ed95-8ff7fac9da9e.md) below.
* `SKIP ALL ERRORS`: Inclusive of `SKIP DUPLICATE KEY ERRORS` and `SKIP CONSTRAINT ERRORS`. Also includes any parsing errors in the row caused by issues such as an invalid number of fields. See [SKIP ALL ERRORS](https://docs.singlestore.com/#UUID-8d8456d6-b63b-4a11-94e1-da02f8ee4083.md) below.

## SKIP DUPLICATE KEY ERRORS

When `SKIP DUPLICATE KEY ERRORS` is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

* If the destination table’s schema specifies a unique or primary key column, and
* The source file contains one or more rows with a duplicate key value that already exists in the destination table or exists elsewhere in the source file, then
* Every duplicate row in the source file will be discarded and will not be inserted into the destination table.

`SKIP DUPLICATE KEY ERRORS` cannot be combined with `REPLACE`.

## SKIP DUPLICATE KEY ERRORS Example

Create a new table with a `PRIMARY KEY` column:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time TIMESTAMP NOT NULL
  );

```

The following CSV file will loaded be into this table as `orders.csv`. Note the duplicate primary key value of `2` in line 3:

```
1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
3,307,Oranges,2016-07-31

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  SKIP DUPLICATE KEY ERRORS
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Note that only 3 rows were inserted even though 4 rows were present in the source file. Line 3 in the source file contained a duplicate primary key, and you can verify that it was not inserted by querying the `INFORMATION_SCHEMA.LOAD_DATA_ERRORS` table:

```sql
SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;

```

```output

+-----------------------+---------------------------+--------------------------------+
| load_data_line_number | load_data_line            | error_message                  |
+-----------------------+---------------------------+--------------------------------+
|                     3 | 2,236,Bananas,2016-06-23  | Duplicate entry for unique key |
+-----------------------+---------------------------+--------------------------------+

```

## SKIP CONSTRAINT ERRORS

`SKIP CONSTRAINT ERRORS` is inclusive of `SKIP DUPLICATE KEY ERRORS` if `REPLACE` is not specified. It also applies to rows that violate a column’s `NOT NULL` constraint and fields that contain invalid JSON or Geospatial values, and handles the offending rows in the following ways:

**NOT NULL Constraint**

* If a column in the destination table specifies a `NOT NULL` constraint, and
* The source file contains one or more rows with a null value for the constraint column, then
* The offending row(s) will be discarded and will not be inserted into the destination table.

**Invalid JSON or Geospatial Data**

* If a column in the destination table specifies a `JSON`, `GEOGRAPHYPOINT`, or `GEOGRAPHY` data type, and
* The source file contains one or more rows with invalid values for fields of these types, then
* The offending row(s) will be discarded and will not be inserted into the destination table.

`SKIP CONSTRAINT ERRORS` can also be combined with the `REPLACE` clause.

## SKIP CONSTRAINT ERRORS Example

Create a new table with a `JSON` column type that also has a `NOT NULL` constraint:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_properties JSON NOT NULL
  );

```

The following CSV file will loaded be into this table as `orders.csv`. Note the malformed JSON in line 2, as well as a null value (`\N`) for JSON in line 4:

```
1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears,{"order-date"}
3,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,\N

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  SKIP CONSTRAINT ERRORS
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Note that only 2 rows were inserted even though 4 rows were present in the source file. Line 2 contained malformed JSON, and Line 4 contained an invalid null value. You can verify that both of these offending rows were not inserted by querying the `INFORMATION_SCHEMA.LOAD_DATA_ERRORS` table:

```sql
SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;

```

```output

+-----------------------+-----------------------------+--------------------------------------------------------------+
| load_data_line_number | load_data_line              | error_message                                                |
+-----------------------+-----------------------------+--------------------------------------------------------------+
|                     2 | 2,138,Pears,{"order-date"}  | Invalid JSON value for column 'order_properties'             |
|                     4 | 4,307,Oranges,\N            | NULL supplied to NOT NULL column 'order_properties' at row 4 |
+-----------------------+-----------------------------+--------------------------------------------------------------+

```

## SKIP ALL ERRORS

`SKIP ALL ERRORS` is inclusive of `SKIP DUPLICATE KEY ERRORS` and `SKIP CONSTRAINT ERRORS` in addition to any parsing error. Offending rows are handled in the following way:

* If one or more rows in the source file cause `... DUPLICATE KEY ...` or `... CONSTRAINT ...` errors, or
* If one or more rows in the source file cause parsing errors such as invalid delimiters or an invalid number of fields,
* The offending row(s) will be discarded and will not be inserted into the destination table.

`SKIP ALL ERRORS` can also be combined with `REPLACE`.

## SKIP ALL ERRORS Example

Create a new table with a `JSON` column type that also has a `NOT NULL` constraint:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_properties JSON NOT NULL
  );

```

The following CSV file will loaded be into this table as `orders.csv`. There are three things wrong with this file:

* Line 2 contains only 3 fields
* Line 3 has a duplicate primary key
* Line 4 has a null value for a `NOT NULL` constraint

```
1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears
1,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,\N

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  SKIP ALL ERRORS
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Only 1 row was written, despite the source file containing 4 rows. Line 2 was dropped because it contained an invalid number of fields, Line 3 was dropped because it contained a duplicate primary key, and line 4 was dropped because it contained a null value for a `NOT NULL` constraint. You can verify that these offending rows were not inserted by querying the `INFORMATION_SCHEMA.LOAD_DATA_ERRORS` table:

```sql
SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;

```

```output

+-----------------------+--------------------------------------------+--------------------------------------------------------------+
| load_data_line_number | load_data_line                             | error_message                                                |
+-----------------------+--------------------------------------------+--------------------------------------------------------------+
|                     2 | 2,138,Pears                                | Row 2 doesn't contain data for all columns                   |
|                     3 | 1,236,Bananas,{"order-date":"2016-06-23"}. | Duplicate entry for unique key                               |
|                     4 | 4,307,Oranges,\N                           | NULL supplied to NOT NULL column 'order_properties' at row 4 |
+-----------------------+--------------------------------------------+--------------------------------------------------------------+

```

## `IGNORE` Error Handling

SingleStore’s `IGNORE` behavior is identical to [MySQL’s `IGNORE` behavior](http://dev.mysql.com/doc/refman/5.7/en/load-data.html), and exists only to support backwards compatibility with applications written for MySQL. `IGNORE` either discards malformed rows, discards extra fields, or replaces invalid values with default data type values. In addition, if an inserted row would have produced an error if `IGNORE` was not specified, it will be converted to a warning instead.

## Consequences of Using IGNORE Instead of SKIP ERRORS

Unlike `SKIP ... ERRORS` which discards offending rows, `IGNORE` may change the inserted row’s data to ensure that it adheres to the table schema. This behavior can have serious repercussions for the data integrity of the destination table.

In a best case scenario where a malformed row uses the proper delimiters and contains the correct number of fields, the row can be partially salvaged. Any invalid values are updated with default values, and the modified row is written to the destination table. The result is that at least some of the source data was written to the destination table.

However, the worst case scenario can be severe. For example, if a row’s values are separated by an invalid delimiter, each field is updated with meaningless default values and the modified row is written to the destination table. For the sake of the table’s data integrity, it would have been better if the offending row was discarded. But a row with meaningless data was inserted instead.

Due to the potential consequences of using `IGNORE`, in most cases `SKIP ... ERRORS` is a better option. To understand `IGNORE`’s behavior for each error scenario, continue reading the sections below:

* [Duplicate Unique or Primary Key Values](https://docs.singlestore.com/#UUID-a9dbe656-98db-f564-2b7d-bf81d7e953e0.md)
* [Values with Invalid Types According to the Destination Table’s Schema](https://docs.singlestore.com/#UUID-40a6c57c-40b7-6adc-9852-e157fd233fa6.md)
* [Rows That Contain an Invalid Number of Fields](https://docs.singlestore.com/#UUID-087b6fdb-d441-4556-e014-754445007b87.md)

## Duplicate Unique or Primary Key Values

When `IGNORE` is specified, source files that contain duplicate unique or primary key values will be handled in the following way:

* If the destination table’s schema specifies a unique or primary key column, and
* The source file contains one or more rows with a duplicate key value that already exists in the destination table or exists elsewhere in the source file, then
* Every duplicate row in the source file will be discarded (ignored) and will not be inserted into the destination table.

## Duplicate Unique or Primary Key Values Example

Create a new table with a `PRIMARY KEY` column:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

```

The following CSV file will loaded be into this table as `orders.csv`. Note the duplicate primary key value of `2` in line 3:

```
1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
2,236,Bananas,2016-06-23
3,307,Oranges,2016-07-31

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  IGNORE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Note that only 3 rows were inserted even though 4 rows were present in the source file. Line 3 in the source file contained a duplicate primary key. You can verify that these offending rows were not inserted by querying the `INFORMATION_SCHEMA.LOAD_DATA_ERRORS` table:

```sql
SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;

```

```output

+-----------------------+---------------------------+--------------------------------+
| load_data_line_number | load_data_line            | error_message                  |
+-----------------------+---------------------------+--------------------------------+
|                     3 | 2,236,Bananas,2016-06-23  | Duplicate entry for unique key |
+-----------------------+---------------------------+--------------------------------+

```

Line 3 in the source file contained a duplicate primary key and was discarded because line 2 was inserted first.

## Values with Invalid Types According to the Destination Table’s Schema

When `IGNORE` is specified, source files that contain rows with invalid types that violate the destination table’s schema will be handled in the following way:

* If the source file contains one or more rows with values that do not adhere to the destination table’s schema,
* Each value of an invalid type in a row will be replaced with the default value of the appropriate type, and
* The modified row(s) will be inserted into the destination table.

`IGNORE` behaves in a potentially unexpected way for columns that have a `DEFAULT` value specified. When an invalid value in the inserted row is replaced with the default value of the column’s type, the column’s `DEFAULT` value is ignored. Instead, the default value for the column’s data type is used.

## Example

Create a new table with a `PRIMARY KEY` column:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

```

The following CSV file will be loaded be into this table as `orders.csv`. Line 4 contains a `NULL` value for `order_time`, whereas the table schema does not allow `NULL` values for this field.

```
1,372,Apples,2016-05-09
2,138,Pears,2016-07-14
3,236,Bananas,2016-06-23
4,307,Oranges,\N

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  IGNORE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Note that 4 rows were inserted despite the fact that line 4 in the source file contained a null value for a `NOT NULL` column. You can verify the error with the fourth row by querying the `INFORMATION_SCHEMA.LOAD_DATA_ERRORS` table:

```sql
SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;

```

```output

+-----------------------+------------------+--------------------------------------------------------+
| load_data_line_number | load_data_line   | error_message                                          |
+-----------------------+------------------+--------------------------------------------------------+
|                     4 | 4,307,Oranges,\N | NULL supplied to NOT NULL column 'order_time' at row 4 |
+-----------------------+------------------+--------------------------------------------------------+

```

To see what was inserted by replacing the invalid `DATETIME` value with a default value, query the table:

```sql
SELECT * FROM orders ORDER BY 1;

```

```output

+----+-------------+------------------+---------------------+
| id | customer_id | item_description | order_time          |
+----+-------------+------------------+---------------------+
|  1 |         372 | Apples           | 2016-05-09 00:00:00 |
|  2 |         138 | Pears            | 2016-07-14 00:00:00 |
|  3 |         236 | Bananas          | 2016-06-23 00:00:00 |
|  4 |         307 | Oranges          | 0000-00-00 00:00:00 |
+----+-------------+------------------+---------------------+

```

In this example, the invalid null `DATETIME` value was replaced with its default value: `0000-00-00 00:00:00`.

## Rows That Contain an Invalid Number of Fields

When `IGNORE` is specified, source files that contain rows with an invalid number of fields will be handled in one of two ways:

**Too Few Fields**

* If the source file contains one or more rows with too few fields according to the destination table’s schema,
* Each row’s empty field(s) will be updated with default values, and
* The row will be inserted into the destination table.

**Too Many Fields**

* If the source file contains one or more rows with too many fields according to the destination table’s schema,
* Each extra field in the row(s) will be discarded (ignored), and
* The row will be inserted into the destination table.

## Example

Create a new table with a `PRIMARY KEY` column:

```sql
CREATE TABLE orders(
  id BIGINT PRIMARY KEY,
  customer_id INT,
  item_description VARCHAR(255),
  order_time DATETIME NOT NULL
  );

```

The following CSV file will loaded be into this table as `orders.csv`. There are two things wrong with this file:

* Line 2 contains only 3 fields instead of 4 and does not have a `TIMESTAMP`:
* Line 4 contains an extra field, for a total of 5

```
1,372,Apples,2016-05-09
2,138,Pears
3,236,Bananas,2016-06-23
4,307,Oranges,2016-07-31,Berries

```

Load the data into the table:

```sql
LOAD DATA INFILE 'orders.csv'
  IGNORE
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  ERRORS HANDLE 'orders_errors';

```

Note that 4 rows were inserted despite the invalid number of fields for two of the rows. You can verify the error with the fourth row by querying the `INFORMATION_SCHEMA.LOAD_DATA_ERRORS` table:

```sql
SELECT load_data_line_number, load_data_line, error_message
  FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS
  WHERE handle = 'orders_errors'
  ORDER BY load_data_line_number;

```

```output

+-----------------------+----------------------------------+---------------------------------------------------------------------------+
| load_data_line_number | load_data_line                   | error_message                                                             |
+-----------------------+----------------------------------+---------------------------------------------------------------------------+
|                     2 | 2,138,Pears                      | Row 2 doesn't contain data for all columns                                |
|                     4 | 4,307,Oranges,2016-07-31,Berries | Row 4 was truncated; it contained more data than there were input columns |
+-----------------------+----------------------------------+---------------------------------------------------------------------------+

```

Note that there is a warning for the missing value in row 2 and the extra value in row 4. To see how the data was inserted, query the table:

```sql
SELECT * FROM orders ORDER BY 1;

```

```output

+----+-------------+------------------+---------------------+
| id | customer_id | item_description | order_time          |
+----+-------------+------------------+---------------------+
|  1 |         372 | Apples           | 2016-05-09 00:00:00 |
|  2 |         138 | Pears            | 0000-00-00 00:00:00 |
|  3 |         236 | Bananas          | 2016-06-23 00:00:00 |
|  4 |         307 | Oranges          | 2016-07-31 00:00:00 |
+----+-------------+------------------+---------------------+

```

Line 2 did not have a `DATETIME` value, so the default value for its type was inserted instead. Line 4’s extra value was discarded, and otherwise the row was inserted with the expected data.

## Performance Considerations

**Shard Keys**

Loading data into a table with a shard key requires reading the necessary columns on the aggregator to compute the shard key before sending data to the leaves. For CSV LOAD DATA *only*, SingleStore recommends that columns included in the shard key appear earlier in input rows, if either the shard key design or the input format is flexible. Order does not significantly affect the performance of Avro or JSON LOAD DATA.

**Keyless Sharding**

Loading data into a keylessly sharded table (no shard key is declared, or `shard()` is specified) will result in batches of data loaded into different partitions, in a round-robin fashion. See [Sharding](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/sharding.md) for more information.

## Retrieve loading status

The `information_schema.LMV_LOAD_DATA_STATUS` table reports information about rows and bytes read by in-progress `LOAD DATA` queries.

It also reports activity and database names, which you can use to find corresponding rows in workload profiling tables. See [Management View Reference](https://docs.singlestore.com/db/v9.1/reference/information-schema-reference/management.md) for more details.

> **❗ Important**: Result sets will only be returned if `LMV_LOAD_DATA_STATUS` is queried on the same aggregator as the in-progress `LOAD_DATA` queries.

## information\_schema.LMV\_LOAD\_DATA\_STATUS Table Schema

| Column Name     | Description                                                                     |
| --------------- | ------------------------------------------------------------------------------- |
| `ID`            | The connection ID.                                                              |
| `ACTIVITY_NAME` | The name of the database activity.                                              |
| `DATABASE_NAME` | The name of the database associated with the file being loaded into thecluster. |
| `BYTES_READ`    | Bytes read from the input file stream.                                          |
| `ROWS_READ`     | A count of rows read in from the source file (including skipped rows).          |

```sql
SELECT * FROM information_schema.LMV_LOAD_DATA_STATUS;

```

```output

+------+------------------------------------+---------------+------------+-----------+
| ID   | ACTIVITY_NAME                      | DATABASE_NAME | BYTES_READ | ROWS_READ |
+------+------------------------------------+---------------+------------+-----------+
| 2351 | load_data_company_0e8dec6d07d9cba5 | trades        |   94380647 |    700512 |
+------+------------------------------------+---------------+------------+-----------+

```

## Related Topics

* [INSERT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/insert.md)
* [REPLACE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/replace.md)
* [Performing Upserts](https://docs.singlestore.com/db/v9.1/query-data/advanced-query-topics/performing-upserts.md)
* [Sharding](https://docs.singlestore.com/db/v9.1/introduction/distributed-architecture/sharding.md)

***

Modified at: June 11, 2026

Source: [/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data/](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data/)

(An index of the documentation is available at /llms.txt)
