# Load Data from Avro Files

SingleStore supports loading data from AVRO files. These files can be stored in multiple data sources. You can also load data from AVRO files that are stored locally. To view the list of data sources supported by SingleStore, refer to [Data Sources](https://docs.singlestore.com/db/v9.1/load-data/data-sources.md).

## Syntax for Avro LOAD DATA

Below is the basic syntax for loading data from a local Avro file into a table.

```sql
LOAD DATA [LOCAL] INFILE 'file_name'
  WHERE/SET/SKIP ERRORS[REPLACE | SKIP { CONSTRAINT | DUPLICATE KEY } ERRORS]
  INTO TABLE tbl_name
  FORMAT AVRO SCHEMA REGISTRY {"IP" | "Hostname"}
  subvalue_mapping
  [SET col_name = expr,...]
  [WHERE expr,...]
  [MAX_ERRORS number]
  [ERRORS HANDLE string]
  [SCHEMA 'avro_schema']

subvalue_mapping:
  ( {col_name | @variable_name} <- subvalue_path, ...)

subvalue_path:
  {% | [%::]ident [::ident ...]
```

## Order of Operations for Updating an Avro Schema and its Associated Database Objects

When updating an Avro schema, follow the steps below (in order) to modify the pipeline and the pipeline’s target table to match the updated schema. The schema (which resides in the schema registry) can be modified at any time during this process.

1. Run [STOP PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/stop-pipeline.md) to detach the Pipeline. **When adding a field to the schema, this step is not required**.

2. Using [ALTER TABLE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-definition-language-ddl/alter-table.md), to add or remove fields from the pipeline’s target tables.

3. Using [ALTER PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/alter-pipeline.md) or [CREATE OR REPLACE PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md), to add or remove fields from the pipeline.

4. Run [START PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/start-pipeline.md) to start the pipeline. **If a field is added to the schema and the pipeline is running, this step is not required.**

If adding a field to the schema prior to adding a field to the pipeline and its target table, and have not stopped the Pipeline, the Pipeline continues to run. The newly added field will not contain data unless it has also been added to the pipeline's target table. If a field is added to the schema before being added to a running pipeline and its target table, the pipeline will continue to run.

Before the new field is added to the schema, a default value for the field is populated in the pipeline’s target table. See [Example 1: Adding a New Field to an Avro Schema](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/avro-schema-evolution-with-pipelines/#UUID-fd9ae8c9-caf6-52dd-d4f6-2ef6d48666c2.md) below for details.

## Schema Evolution Examples

## Example 1: Adding a New Field to an Avro Schema

Consider an Avro schema `product` with the fields `id` and `color`:

```json
{
  "type": "record",
  "name": "product",
  "fields": [
    { "name": "id", 
      "type": "long"
    },
    { 
      "name": "color", 
      "type": [ 
        "null",
        "string" 
      ]
    }
  ]
}
```

The fields are loaded into the SingleStore table `t` through the Pipeline via the following commands:

```sql
CREATE TABLE t(id BIGINT DEFAULT 0, color VARCHAR(30) DEFAULT NULL, input_record JSON DEFAULT NULL);
CREATE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro" 
  INTO TABLE t 
  FORMAT AVRO SCHEMA REGISTRY "your_schema_registry_host_name_or_ip:your_schema_registry_port" 
 (id <- %::id, color <- %::color, input_record <- %);

```

Now, the Avro schema evolves with a new field `price`:

```json
{ 
  "type": "record", 
  "name": "product", 
  "fields": [
    { 
      "name": "id", 
      "type": "long"
    }, 
    { 
      "name": "color", 
      "type": [ 
        "null", 
        "string" 
      ]
    }, 
    { 
      "name": "price", 
      "type": "float" 
    }
  ]
}

```

To reflect the addition of the new field, you need to first update the table `t` along with the default value of the new field `age`:

```sql
ALTER TABLE t ADD field price FLOAT DEFAULT NULL;

```

Using the `CREATE OR REPLACE PIPELINE` command, update the Pipeline to load the new field. This allows the Pipeline to continue running without losing any offsets.

```sql
CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro" 
  INTO TABLE t FORMAT AVRO SCHEMA REGISTRY "your_schema_registry_host_name_or_ip:your_schema_registry_port" 
  (id <- %::id, color <- %::color, price <- %::price DEFAULT NULL, input_record <- %);

```

## Populating the Pipeline’s Target Table with a Default Value

If, prior to updating the Avro schema to include the new field `price`, you updated the Pipeline and its target table to include the field, the table’s field would be populated with the `DEFAULT` value. This value can be specified either in the `ALTER` statement that alters the table (`DEFAULT NULL` in this example), or in the `CREATE OR REPLACE PIPELINE` statement that alters the Pipeline (`DEFAULT NULL` in this example). If the `DEFAULT` value is specified in both places, as in this example, the table’s `DEFAULT` value is used. If the `DEFAULT` value is specified in neither place, an error is thrown.

## Example 2: Removing a Field from an Avro Schema

Consider an Avro schema `product` with the fields `id`, `color`, and `price`:

```json
{ 
  "type": "record", 
  "name": "product", 
  "fields": [
    { 
      "name": "id", 
      "type": "long"
    }, 
    { "name": "color", 
      "type": [ 
        "null", 
        "string" 
      ]
    }, 
    { 
      "name": "price", 
      "type": "float" 
    }
  ]
}

```

Load the schema into the SingleStore table `t` through the Pipeline by the following command:

```sql
CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro" 
  INTO TABLE t FORMAT AVRO SCHEMA REGISTRY "your_schema_registry_host_name_or_ip:your_schema_registry_port" 
 (id <- %::id, color <- %::color, price <- %::price, input_record <- %);

```

Now, the Avro schema evolves by no longer containing the field `price`:

```json
{ 
  "type": "record", 
  "name": "data", 
  "fields": [
    { 
      "name": "id", 
      "type": "long"
    }, 
    { 
      "name": "color", 
      "type": [ 
        "null", 
        "string" 
      ]
    }
  ]
}

```

Run the following commands in sequence to evolve the schema:

```sql
STOP PIPELINE DETACH p;

```

**Note:** The `DETACH` option in the `STOP PIPELINE` command allows to temporarily stop a pipeline in order to make changes to a source table to remove a field.

```sql
ALTER TABLE t REMOVE field age;

```

```sql
CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro" 
  INTO TABLE t FORMAT AVRO SCHEMA REGISTRY "" (id <- %::id, color <- %::color, input_record <- %);

```

```sql
START PIPELINE p;

```

## Example 3: Connecting to Confluent Schema Registry Over SSL

> **📝 Note**: This example only applies to SingleStore 7.3.5 and later.

The following example shows how to connect to Confluent Schema Registry over SSL, using the SSL configuration settings in the `CONFIG` and `CREDENTIALS` clauses of `CREATE PIPELINE`.

```sql
CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro" 
  INTO TABLE t FORMAT AVRO SCHEMA REGISTRY "" (id <- %::id, color <- %::color, input_record <- %); 
  CONFIG '{"schema.registry.ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem", 
    "schema.registry.ssl.key.location": "/var/private/ssl/client_memsql_client.key", 
    "schema.registry.ssl.ca.location": "/var/private/ssl/ca-cert.pem"}' 
  CREDENTIALS '{"schema.registry.ssl.key.password": "abcdefgh"}'
```

```sql
START PIPELINE p;
```

> **📝 Note**: You can use a subset of the \`ssl.\` settings as follows:- `schema.registry.ssl.key.location`, `schema.registry.ssl.ca.location`, and `schema.registry.ssl.key.password`
> - `schema.registry.ssl.certificate.location`, `schema.registry.ssl.key.location`, and `schema.registry.ssl.key.password`

`schema.registry.ssl.key.password` is only required if your SSL certificate key has a password.

## In this section

* [Avro Schema Evolution with Pipelines](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/avro-schema-evolution-with-pipelines.md)
* [Load Avro Files with LOAD DATA](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-with-load-data.md)
* [Extracting and Converting Avro Values](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/extracting-and-converting-avro-values.md)
* [Load Avro Files Examples](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/load-avro-files-examples.md)

***

Modified at: April 3, 2025

Source: [/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/](https://docs.singlestore.com/db/v9.1/load-data/load-data-from-files/load-data-from-avro-files/)

(An index of the documentation is available at /llms.txt)
