# Load Avro Files with LOAD DATA

Data from AVRO files can be loaded using the LOAD DATA command. For more information on the LOAD DATA command, refer to [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md). To see a few examples of using LOAD DATA with AVRO files, refer to [Load Avro Files Examples](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-examples.md).

## Avro LOAD DATA

## Syntax for LOAD DATA Local Infile

```sql
LOAD DATA [LOCAL] INFILE 'file_name'
  WHERE/SET/SKIP ERRORS[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT AVRO SCHEMA REGISTRY {"IP" | "Hostname"}
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]
  [SCHEMA 'avro_schema']

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
  {% | [%::]ident [::ident ...]}

```

See the associated [GitHub repo](https://github.com/singlestore-labs/singlestore-avro-sample).

## Syntax for LOAD DATA AWS S3 Source

Avro-formatted data stored in an AWS S3 bucket can use a LOAD DATA query without a pipeline. This streamlines the process of loading cloud-stored data into tables.

```sql
LOAD DATA S3 '<bucket name>'
CONFIG '{"region" : "<region_name>"}' 
CREDENTIALS '{"aws_access_key_id" : "<key_id> ", 
             "aws_secret_access_key": "<access_key>"}'
INTO TABLE <table_name>
       (`<col_a>` <- %, 
 `<col_b>` <- % DEFAULT NULL , 
  ) FORMAT AVRO;
```

This data can also be loaded from S3 with a connection link. Refer to [CREATE LINK](https://docs.singlestore.com/db/v9.1/reference/sql-reference/security-management-commands/create-link.md) for more information on connection links.&#x20;

```sql
LOAD DATA LINK <link_name> '<bucket name>'
INTO TABLE <table_name>
(`<col_a>` <- %,`<col_b>` <- % DEFAULT NULL ,
) FORMAT AVRO;
```

## Semantics

Error Logging and Error Handling are discussed in the [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md) page.

`LOAD DATA` for Avro does not support file name globbing (for example: `LOAD DATA INFILE '/data/nfs/gp1/*.avro`). `LOAD DATA` for Avro only supports loading a single file per statement.

Extract specified subvalues from each Avro value in `file_name`. Assign them to specified columns of a new row in `tbl_name`, or to variables used for a column assignment in a `SET` clause. Discard rows which don’t match the `WHERE` clause.

To specify the compression type of an input file, use the `COMPRESSION` clause. See [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data/#UUID-55530b3c-3939-f404-a8e0-db8a648b36fd.md) for more information.

Avro `LOAD DATA` expects Avro data in one of two “sub-formats”, depending on the `SCHEMA` clause.

If no `SCHEMA` clause is provided, `file_name` must name an Avro *Object Container File* as described in [version 1.8.2 of the Avro specification](https://avro.apache.org/docs/1.8.2/spec.html). In addition, the following restrictions hold:

* The compression codec of the file must be `null`.
* Array and map values must not have more than 16384 elements.
* The type name of a `record` must not be used in a symbolic “reference to previously defined name” in any of its fields. It may still be used in a symbolic reference outside the record definition, however.

  For example, self-referential schemas like the following are rejected by `LOAD DATA`:
  ```json
  {
    "type": "record",
    "name": "PseudoLinkedList",
    "fields" : [{"name": "value", "type": "long"},
                {"name": "next", "type": ["null", "PseudoLinkedList"]}]
  }

  ```

If a `SCHEMA` clause is provided, the file must be a “raw stream” consisting of *only* the concatenated binary encodings of instances of `avro_schema`. `avro_schema` must be a SQL string containing a JSON Avro schema. The restrictions on *Object Container Files* also apply to “raw stream” files.

> **⚠️ Warning**: It’s an error to provide a `SCHEMA` clause when loading an *Object Container File* because it contains metadata alongside the encoded values.

All optional Avro schema attributes except the `namespace` attribute are ignored. Notably, `logicalType` attributes are ignored.

If `file_name` ends in `.gz` or `.lz4`, it will be decompressed.

Writing to multiple databases in a transaction is not supported.

The `SCHEMA REGISTRY {"IP" | "Hostname"}` option allows `LOAD DATA` to pull the schema from a schema registry. For more information, see the [Avro Schema Evolution With Pipelines](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/avro-schema-evolution-with-pipelines.md) topic.

***

Modified at: April 8, 2025

Source: [/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-with-load-data/](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-with-load-data/)

(An index of the documentation is available at /llms.txt)
