Schema Evolution Examples

Example 1: Adding a New Field to an Avro Schema

Consider an Avro schema product with the fields id and color:

{
"type": "record",
"name": "product",
"fields": [{ "name": "id", "type": "long"},
{ "name": "color", "type": [ "null","string" ]}]
}

The fields are loaded into the SingleStoreDB table t through the Pipeline via the following commands:

CREATE TABLE t(id BIGINT DEFAULT 0, color VARCHAR(30) DEFAULT NULL, input_record JSON DEFAULT NULL);
CREATE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro"
INTO TABLE t
FORMAT AVRO
SCHEMA REGISTRY "your_schema_registry_host_name_or_ip:your_schema_registry_port"
(id <- %::id,
color <- %::color,
input_record <- %);

Now, the Avro schema evolves with a new field price:

{
"type": "record",
"name": "product",
"fields": [{ "name": "id", "type": "long"},
{ "name": "color", "type": [ "null", "string" ]}
{ "name": "price", "type": "float" }]
}

To reflect the addition of the new field, you need to first update the table t along with the default value of the new field age:

ALTER TABLE t ADD field price FLOAT DEFAULT NULL;

Using the CREATE OR REPLACE PIPELINE command, update the Pipeline to load the new field. This allows the Pipeline to continue running without losing any offsets.

CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro"
INTO TABLE t
FORMAT AVRO
SCHEMA REGISTRY "your_schema_registry_host_name_or_ip:your_schema_registry_port"
(id <- %::id,
color <- %::color,
price <- %::price DEFAULT NULL,
input_record <- %);

Populating the Pipeline’s Target Table with a Default Value

If, prior to updating the Avro schema to include the new field price, you updated the Pipeline and its target table to include the field, the table’s field would be populated with the DEFAULT value. This value can be specified either in the ALTER statement that alters the table (DEFAULT NULL in this example), or in the CREATE OR REPLACE PIPELINE statement that alters the Pipeline (DEFAULT NULL in this example). If the DEFAULT value is specified in both places, as in this example, the table’s DEFAULT value is used. If the DEFAULT value is specified in neither place, an error is thrown.

Example 2: Removing a Field from an Avro Schema

Consider an Avro schema product with the fields id, item, and price:

{
"type": "record",
"name": "product",
"fields": [{ "name": "id", "type": "long"},
{ "name": "color", "type": [ "null", "string" ]}
{ "name": "price", "type": "float" }]
}

Load the schema into the SingleStoreDB table t through the Pipeline by the following command:

CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro"
INTO TABLE t
FORMAT AVRO
SCHEMA REGISTRY "your_schema_registry_host_name_or_ip:your_schema_registry_port"
(id <- %::id,
color <- %::color,
price <- %::price,
input_record <- %);

Now, the Avro schema evolves by no longer containing the field price:

{
"type": "record",
"name": "data",
"fields": [{ "name": "id", "type": "long"},
{ "name": "color", "type": [ "null", "string" ]}
}

Run the following commands in sequence to evolve the schema:

STOP PIPELINE DETACH p;

Note: The DETACH option in the STOP PIPELINE command allows to temporarily stop a pipeline in order to make changes to a source table to remove a field.

ALTER TABLE t REMOVE field age;
CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro"
INTO TABLE t
FORMAT AVRO
SCHEMA REGISTRY ""
(id <- %::id,
color <- %::color,
input_record <- %);
START PIPELINE p;

Example 3: Connecting to Confluent Schema Registry Over SSL

Note

This example only applies to SingleStoreDB 7.3.5 and later.

The following example shows how to connect to Confluent Schema Registry over SSL, using the SSL configuration settings in the CONFIG and CREDENTIALS clauses of CREATE PIPELINE.

CREATE OR REPLACE PIPELINE p AS LOAD DATA FS "/path/to/files/data.avro"
INTO TABLE t
FORMAT AVRO
SCHEMA REGISTRY ""
(id <- %::id,
color <- %::color,
input_record <- %);
CONFIG '{"schema.registry.ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem",
"schema.registry.ssl.key.location": "/var/private/ssl/client_memsql_client.key",
"schema.registry.ssl.ca.location": "/var/private/ssl/ca-cert.pem"}'
CREDENTIALS '{"schema.registry.ssl.key.password": "abcdefgh"}'
START PIPELINE p;

Note

You can use a subset of the `ssl.` settings as follows:

  • schema.registry.ssl.key.location, schema.registry.ssl.ca.location, and schema.registry.ssl.key.password

  • schema.registry.ssl.certificate.location, schema.registry.ssl.key.location, and schema.registry.ssl.key.password

schema.registry.ssl.key.password is only required if your SSL certificate key has a password.

Last modified: September 5, 2023

Was this article helpful?