Load Avro Files Examples
On this page
Note
For more information on the LOAD DATA command, refer to LOAD DATA.
Examples
Example 1
Consider an Avro Object Container Fileexample. with the following schema:
{"type": "record","name": "data","fields": [{ "name": "id", "type": "long"},{ "name": "payload", "type": [ "null","string" ]}]}
example. contains three Avro values whose JSON encodings are:
{"id":1,"payload":{"string":"first"}}{"id":1,"payload":{"string":"second"}}{"id":1,"payload":null}
example. can be loaded as follows:
CREATE TABLE t(payload TEXT, input_record JSON);LOAD DATA LOCAL INFILE "example.avro"INTO TABLE tFORMAT AVRO( payload <- %::payload::string,input_record <- % );SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record |
+---------+----------------------------------------+
| first | {"id":1,"payload":{"string":"first"}} |
| second | {"id":1,"payload":{"string":"second"}} |
| NULL | {"id":1,"payload":null} |
+---------+----------------------------------------+LOAD DATA was able to parse example. because Avro Object Container Files have a header which contains their schema.
Example 2
Consider a file named example., with the same values as example. from Example 1 but in the raw stream format
.example. consists of the binary encoded values and nothing else.SCHEMA clause to tell LOAD DATA to expect a raw stream
with the provided schema:
CREATE TABLE t(payload TEXT, input_record JSON);LOAD DATA LOCAL INFILE "example.raw_avro"INTO TABLE tFORMAT AVRO( payload <- %::payload::string,input_record <- % )schema'{"type": "record","name": "data","fields": [{ "name": "id", "type": "long"},{ "name": "payload", "type": [ "null", "string" ]}]}';SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record |
+---------+----------------------------------------+
| first | {"id":1,"payload":{"string":"first"}} |
| second | {"id":1,"payload":{"string":"second"}} |
| NULL | {"id":1,"payload":null} |
+---------+----------------------------------------+Example 3
Consider an Object Container Fileexample3. with a more complicated payload than Example 1.
{ "type": "record","namespace": "ns","name": "data","fields": [{ "name": "id", "type": "long" },{ "name": "payload", "type":[ "null",{ "type": "record","name": "payload_record","namespace": "ns","fields": [{ "name": "f_bytes", "type": "bytes"},{ "name": "f_string", "type": "string"},{ "name": "f_map", "type":{ "type": "map","values": { "type": "array","items": "int" }}}]}]}]}
The raw JSON encoding of the contents of this file can be seen in column c_ after the following LOAD DATA:
CREATE TABLE t (c_id bigint,c_bytes longblob,c_string longblob,c_array_second int,c_whole_raw longblob,c_whole_json json);LOAD DATA INFILE "example3.avro"INTO TABLE tFORMAT AVRO( c_id <- %::id,c_bytes <- %::payload::`ns.payload_record`::f_bytes,c_string <- %::payload::`ns.payload_record`::f_string,@v_map <- %::payload::`ns.payload_record`::f_map,c_whole_raw <- %,c_whole_json <- %)SET c_array_second = JSON_EXTRACT_JSON(@v_map, "a", 1);SELECT * FROM t;
*** 1. row ***
c_id: 1
c_bytes: NULL
c_string: NULL
c_array_second: NULL
c_whole_raw: {"id":1,"payload":null}
c_whole_json: {"id":1,"payload":null}
*** 2. row ***
c_id: 2
c_bytes: "A
c_string: "A
c_array_second: 2
c_whole_raw: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\u0022\u0041","f_string":"\"A","f_map":{"a":[1,2]}}}}
c_whole_json: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\"A","f_map":{"a":[1,2]},"f_string":"\"A"}}}There are several things to note:
-
We attempted to extract subvalues of the
payload_branch of the union-typerecord payloadfield.Since that wasn’t the selected member of the union in record 1, LOAD DATAassignedNULLtoc_andbytes @v_.map -
We assigned the JSON encoding of
f_tomap @v_and then performed JSON map and array lookups in themap SETclause to ultimately extract2. -
f_andstring f_had the same contents, but we can see how their different Avro types affected their JSON encodings and interacted with the SQL JSON typebytes -
The JSON encoding of the Avro
stringvaluef_, as seen instring c_, encodes special characters likewhole_ raw "as the escape sequence\". -
The JSON encoding of the Avro
bytesvaluef_, as seen inbytes c_, encodes every byte with a JSON escape.whole_ raw -
When converting the JSON encoding of record 2 to the SQL JSON type while assigning to
c_,whole_ json LOAD DATAnormalized both representations of the byte sequence"Ato\"A.
-
Last modified: April 8, 2025