# Load Avro Files Examples

> **📝 Note**: For more information on the LOAD DATA command, refer to [LOAD DATA](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-manipulation-language-dml/load-data.md).

## Examples

**Example 1**

Consider an Avro *Object Container File*`example.avro` with the following schema:

```json
{
  "type": "record",
  "name": "data",
  "fields": [{ "name": "id", "type": "long"},
             { "name": "payload", "type": [ "null",
                                            "string" ]}]
}

```

`example.avro` contains three Avro values whose JSON encodings are:

```json
{"id":1,"payload":{"string":"first"}}
{"id":1,"payload":{"string":"second"}}
{"id":1,"payload":null}

```

`example.avro` can be loaded as follows:

```sql
CREATE TABLE t(payload TEXT, input_record JSON);

LOAD DATA LOCAL INFILE "example.avro"
  INTO TABLE t
  FORMAT AVRO
  ( payload <- %::payload::string,
  input_record <- % );

SELECT * FROM t;

```

```output

+---------+----------------------------------------+
| payload | input_record                           |
+---------+----------------------------------------+
| first   | {"id":1,"payload":{"string":"first"}}  |
| second  | {"id":1,"payload":{"string":"second"}} |
| NULL    | {"id":1,"payload":null}                |
+---------+----------------------------------------+

```

`LOAD DATA` was able to parse `example.avro` because Avro *Object Container Files* have a header which contains their schema.

**Example 2**

Consider a file named `example.raw_avro`, with the same values as `example.avro` from Example 1 but in the “raw stream format”. That is, `example.raw_avro` consists of the binary encoded values and nothing else. We add a `SCHEMA` clause to tell `LOAD DATA` to expect a “raw stream” with the provided schema:

```sql
CREATE TABLE t(payload TEXT, input_record JSON);
LOAD DATA LOCAL INFILE "example.raw_avro"
  INTO TABLE t
  FORMAT AVRO
  ( payload <- %::payload::string,
  input_record <- % )
  schema
  '{
    "type": "record",
    "name": "data",
    "fields": [{ "name": "id", "type": "long"},
      { "name": "payload", "type": [ "null", "string" ]}]
  }';

SELECT * FROM t;

```

```output

+---------+----------------------------------------+
| payload | input_record                           |
+---------+----------------------------------------+
| first   | {"id":1,"payload":{"string":"first"}}  |
| second  | {"id":1,"payload":{"string":"second"}} |
| NULL    | {"id":1,"payload":null}                |
+---------+----------------------------------------+

```

**Example 3**

Consider an *Object Container File*`example3.avro` with a more complicated payload than Example 1. We illustrate extracting values from nested unions and records, and also indirectly extracting elements of nested maps and arrays.

```json
{ "type": "record",
  "namespace": "ns",
  "name": "data",
  "fields": [
      { "name": "id", "type": "long" },
      { "name": "payload", "type":
        [ "null",
          { "type": "record",
            "name": "payload_record",
            "namespace": "ns",
            "fields": [
                { "name": "f_bytes", "type": "bytes"},
                { "name": "f_string", "type": "string"},
                { "name": "f_map", "type":
                  { "type": "map",
                    "values": { "type": "array",
                                "items": "int" }}}
            ]
          }
        ]
      }
  ]
}

```

The raw JSON encoding of the contents of this file can be seen in column `c_whole_raw` after the following `LOAD DATA`:

```sql
CREATE TABLE t (
  c_id bigint,
  c_bytes longblob,
  c_string longblob,
  c_array_second int,
  c_whole_raw longblob,
  c_whole_json json
  );

LOAD DATA INFILE "example3.avro"
  INTO TABLE t
  FORMAT AVRO
  ( c_id <- %::id,
  c_bytes <- %::payload::`ns.payload_record`::f_bytes,
  c_string <- %::payload::`ns.payload_record`::f_string,
  @v_map <- %::payload::`ns.payload_record`::f_map,
  c_whole_raw <- %,
  c_whole_json <- %)
  SET c_array_second = JSON_EXTRACT_JSON(@v_map, "a", 1);

SELECT * FROM t;

```

```output

*** 1. row ***
          c_id: 1
       c_bytes: NULL
      c_string: NULL
c_array_second: NULL
   c_whole_raw: {"id":1,"payload":null}
  c_whole_json: {"id":1,"payload":null}
*** 2. row ***
          c_id: 2
       c_bytes: "A
      c_string: "A
c_array_second: 2
   c_whole_raw: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\u0022\u0041","f_string":"\"A","f_map":{"a":[1,2]}}}}
  c_whole_json: {"id":2,"payload":{"ns.payload_record":{"f_bytes":"\"A","f_map":{"a":[1,2]},"f_string":"\"A"}}}

```

There are several things to note:

* We attempted to extract subvalues of the `payload_record` branch of the union-type `payload` field. Since that wasn’t the selected member of the union in record 1, `LOAD DATA` assigned `NULL` to `c_bytes` and `@v_map`.
* We assigned the JSON encoding of `f_map` to `@v_map` and then performed JSON map and array lookups in the `SET` clause to ultimately extract `2`.
* `f_string` and `f_bytes` had the same contents, but we can see how their different Avro types affected their JSON encodings and interacted with the SQL JSON type

  * The JSON encoding of the Avro `string` value `f_string`, as seen in `c_whole_raw`, encodes special characters like `"` as the escape sequence `\"`.
  * The JSON encoding of the Avro `bytes` value `f_bytes`, as seen in `c_whole_raw`, encodes *every* byte with a JSON escape.
  * When converting the JSON encoding of record 2 to the SQL JSON type while assigning to `c_whole_json`, `LOAD DATA` normalized both representations of the byte sequence `"A` to `\"A`.

***

Modified at: April 8, 2025

Source: [/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-examples/](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-examples/)

(An index of the documentation is available at /llms.txt)
