Warning
SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.
Load Avro Files Examples
On this page
Note
For more information on the LOAD DATA command, refer to LOAD DATA.
Examples
Example 1
Consider an Avro Object Container Fileexample.
with the following schema:
{"type": "record","name": "data","fields": [{ "name": "id", "type": "long"},{ "name": "payload", "type": [ "null","string" ]}]}
example.
contains three Avro values whose JSON encodings are:
{"id":1,"payload":{"string":"first"}}{"id":1,"payload":{"string":"second"}}{"id":1,"payload":null}
example.
can be loaded as follows:
CREATE TABLE t(payload TEXT, input_record JSON);LOAD DATA LOCAL INFILE "example.avro"INTO TABLE tFORMAT AVRO( payload <- %::payload::string,input_record <- % );SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record |
+---------+----------------------------------------+
| first | {"id":1,"payload":{"string":"first"}} |
| second | {"id":1,"payload":{"string":"second"}} |
| NULL | {"id":1,"payload":null} |
+---------+----------------------------------------+
LOAD DATA
was able to parse example.
because Avro Object Container Files have a header which contains their schema.
Example 2
Consider a file named example.
, with the same values as example.
from Example 1 but in the raw stream format
.example.
consists of the binary encoded values and nothing else.SCHEMA
clause to tell LOAD DATA
to expect a raw stream
with the provided schema:
CREATE TABLE t(payload TEXT, input_record JSON);LOAD DATA LOCAL INFILE "example.raw_avro"INTO TABLE tFORMAT AVRO( payload <- %::payload::string,input_record <- % )schema'{"type": "record","name": "data","fields": [{ "name": "id", "type": "long"},{ "name": "payload", "type": [ "null", "string" ]}]}';SELECT * FROM t;
+---------+----------------------------------------+
| payload | input_record |
+---------+----------------------------------------+
| first | {"id":1,"payload":{"string":"first"}} |
| second | {"id":1,"payload":{"string":"second"}} |
| NULL | {"id":1,"payload":null} |
+---------+----------------------------------------+
Example 3
Consider an Object Container Fileexample3.
with a more complicated payload than Example 1.
{ "type": "record","namespace": "ns","name": "data","fields": [{ "name": "id", "type": "long" },{ "name": "payload", "type":[ "null",{ "type": "record","name": "payload_record","namespace": "ns","fields": [{ "name": "f_bytes", "type": "bytes"},{ "name": "f_string", "type": "string"},{ "name": "f_map", "type":{ "type": "map","values": { "type": "array","items": "int" }}}]}]}]}
The raw JSON encoding of the contents of this file can be seen in column c_
after the following LOAD DATA
:
CREATE TABLE t (c_id bigint,c_bytes longblob,c_string longblob,c_array_second int,c_whole_raw longblob,c_whole_json json);LOAD DATA INFILE "example3.avro"INTO TABLE tFORMAT AVRO( c_id <- %::id,c_bytes <- %::payload::`ns.payload_record`::f_bytes,c_string <- %::payload::`ns.payload_record`::f_string,@v_map <- %::payload::`ns.payload_record`::f_map,c_whole_raw <- %,c_whole_json <- %)SET c_array_second = JSON_EXTRACT_JSON(@v_map, "a", 1);SELECT * FROM t;
*** 1. row ***
c_id: 1
c_bytes: NULL
c_string: NULL
c_array_second: NULL
c_whole_raw: {"id":1,"payload":null}
c_whole_json: {"id":1,"payload":null}
*** 2. row ***
c_id: 2
c_bytes: "A
c_string: "A
c_array_second: 2
c_whole_raw: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\u0022\u0041","f_string":"\"A","f_map":{"a":[1,2]}}}}
c_whole_json: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\"A","f_map":{"a":[1,2]},"f_string":"\"A"}}}
There are several things to note:
-
We attempted to extract subvalues of the
payload_
branch of the union-typerecord payload
field.Since that wasn’t the selected member of the union in record 1, LOAD DATA
assignedNULL
toc_
andbytes @v_
.map -
We assigned the JSON encoding of
f_
tomap @v_
and then performed JSON map and array lookups in themap SET
clause to ultimately extract2
. -
f_
andstring f_
had the same contents, but we can see how their different Avro types affected their JSON encodings and interacted with the SQL JSON typebytes -
The JSON encoding of the Avro
string
valuef_
, as seen instring c_
, encodes special characters likewhole_ raw "
as the escape sequence\"
. -
The JSON encoding of the Avro
bytes
valuef_
, as seen inbytes c_
, encodes every byte with a JSON escape.whole_ raw -
When converting the JSON encoding of record 2 to the SQL JSON type while assigning to
c_
,whole_ json LOAD DATA
normalized both representations of the byte sequence"A
to\"A
.
-
Last modified: April 8, 2025