Load Avro Files with LOAD DATA
Data from Avro files can be loaded using the LOAD DATA command.
Avro LOAD DATA
Syntax for LOAD DATA Local Infile
LOAD DATA [LOCAL] INFILE 'file_name'
    [REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
    INTO TABLE tbl_name
    FORMAT AVRO
    SCHEMA REGISTRY {"IP" | "Hostname"}
    subvalue_mapping
    [SET col_name = expr,...]
    [WHERE expr,...]
    [MAX_ERRORS number]
    [ERRORS HANDLE string]
    [SCHEMA 'avro_schema']

subvalue_mapping:
    ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
    {% | [%::]ident [::ident ...]}
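For illustration, here is a minimal sketch of this syntax. It assumes an Avro Object Container File named books.avro whose schema has id and name fields, and a destination table books(id, name); all of these names are hypothetical.

LOAD DATA LOCAL INFILE 'books.avro'
INTO TABLE books
FORMAT AVRO
-- map the id and name fields of each Avro value to table columns
( id   <- %::id,
  name <- %::name );

In a subvalue_path, % refers to the whole Avro value and %::id extracts its id field.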
See the associated GitHub repo.
Syntax for LOAD DATA AWS S3 Source
Avro-formatted data stored in an AWS S3 bucket can be loaded with a LOAD DATA query, without using a pipeline.
LOAD DATA S3 '<bucket name>'
    CONFIG '{"region" : "<region_name>"}'
    CREDENTIALS '{"aws_access_key_id" : "<key_id>",
                  "aws_secret_access_key": "<access_key>"}'
    INTO TABLE <table_name>
    (`<col_a>` <- %,
     `<col_b>` <- % DEFAULT NULL)
    FORMAT AVRO;
This data can also be loaded from S3 with a connection link.
LOAD DATA LINK <link_name> '<bucket name>'
    INTO TABLE <table_name>
    (`<col_a>` <- %,
     `<col_b>` <- % DEFAULT NULL)
    FORMAT AVRO;
Semantics
Error Logging and Error Handling are discussed in the LOAD DATA page.
LOAD DATA for Avro does not support file name globbing (for example: LOAD DATA INFILE '/data/nfs/gp1/*.'). LOAD DATA for Avro only supports loading a single file per statement.
The subvalue_mapping extracts the specified subvalues from each Avro value in file_name and assigns them to the specified columns of a new row in tbl_name, or to variables used for a column assignment in a SET clause. Rows that do not match the WHERE clause are discarded.
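As a hedged sketch of how the mapping, SET, and WHERE clauses combine (the file, table, column, and field names below are hypothetical):

LOAD DATA INFILE 'events.avro'
INTO TABLE events
FORMAT AVRO
-- map top-level fields to columns, and a nested field to a variable
( event_id <- %::id,
  status   <- %::status,
  @ts      <- %::payload::ts )
-- derive a column value from the variable
SET created_at = FROM_UNIXTIME(@ts)
-- discard rows whose status column is not 'active'
WHERE status = 'active';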
To specify the compression type of an input file, use the COMPRESSION clause.
Avro LOAD DATA expects Avro data in one of two sub-formats, depending on whether the SCHEMA clause is provided.
If no SCHEMA clause is provided, file_name must name an Avro Object Container File, as described in version 1 of the Avro specification. Such files are subject to the following restrictions:
- The compression codec of the file must be null.
- Array and map values must not have more than 16384 elements.
- The type name of a record must not be used in a symbolic reference to a previously defined name in any of its fields. It may still be used in a symbolic reference outside the record definition, however. For example, self-referential schemas like the following are rejected by LOAD DATA:

{
  "type": "record",
  "name": "PseudoLinkedList",
  "fields" : [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "PseudoLinkedList"]}
  ]
}
If a SCHEMA clause is provided, the file must be a raw stream consisting of only the concatenated binary encodings of instances of avro_schema. avro_schema must be a SQL string containing a JSON Avro schema.
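For example, a sketch of loading a raw stream with an inline schema; the file name, table, and schema shown here are hypothetical:

LOAD DATA INFILE 'people.avro'
INTO TABLE people
FORMAT AVRO
( id   <- %::id,
  name <- %::name )
-- the SCHEMA string is a JSON Avro schema describing the raw stream
SCHEMA
'{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}';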
Warning
It’s an error to provide a SCHEMA clause when loading an Object Container File, because the file already contains its schema as metadata alongside the encoded values.
All optional Avro schema attributes except the namespace attribute are ignored. In particular, logicalType attributes are ignored.
If file_name ends in .gz or .lz4, it will be decompressed.
Writing to multiple databases in a transaction is not supported.
The SCHEMA REGISTRY {"IP" | "Hostname"} option allows LOAD DATA to pull the schema from a schema registry.
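A minimal sketch of that option, assuming the registry address shown (host and port are hypothetical) serves the writer's schema for the file:

LOAD DATA INFILE 'events.avro'
INTO TABLE events
FORMAT AVRO
-- fetch the Avro schema from the registry instead of the file or a SCHEMA clause
SCHEMA REGISTRY 'schema-registry.example.com:8081'
( event_id <- %::id,
  status   <- %::status );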