# Iceberg Ingest > **📝 Note**: This is a Preview feature. [Apache Iceberg](https://iceberg.apache.org/) is an open-source table format that helps simplify analytical data processing for large datasets in data lakes. SingleStore can be used to add a speed layer to Iceberg tables. Iceberg tables can be directly ingested into SingleStore without the need for an external ETL tool and subsequently processed using SingleStore's high-performance database engine. ## Remarks The following are supported: * Iceberg Version 1 tables and Iceberg Version 2 tables with data files in [Parquet](https://parquet.apache.org/) format. * Iceberg tables stored in Amazon S3 with catalogs: GLUE, Snowflake, REST, JDBC, Hive, Hadoop, Polaris, and Unity. > **❗ Important**: The global engine variable `enable_iceberg_ingest` must be set to `ON` to use Iceberg Ingest. This variable is set to `OFF` by default.```sql > SET GLOBAL enable_iceberg_ingest = ON; > ``` ## Syntax ```sql CREATE [OR REPLACE] PIPELINE AS LOAD DATA S3 '[]' CONFIG '{"catalog_type": "GLUE|SNOWFLAKE|REST|JDBC|HIVE|HADOOP", [, "endpoint_url": ""] [, "ingest_mode": "append|upsert|one_time"] [, "catalog_name": "" ] [, “catalog.” : “property_value” [, …]] [, "iceberg_vended_credentials":true|false] }' CREDENTIALS '’ [REPLACE | MERGE] INTO TABLE FORMAT ICEBERG; :( { | @}<- [, ... ]) : {ident [::ident ...]} ``` All the data shaping options for Parquet pipelines are supported for Iceberg pipelines. Refer to [Data Shaping with Pipelines](https://docs.singlestore.com/cloud/load-data/about-singlestore-pipelines/pipeline-concepts/data-shaping-with-pipelines.md) for more information. Schema inference for Iceberg pipelines is supported. Refer to [Schema and Pipeline Inference](https://docs.singlestore.com/cloud/load-data/about-singlestore-pipelines/pipeline-concepts/schema-and-pipeline-inference.md) and [Infer Data from Iceberg Files](https://docs.singlestore.com/cloud/load-data/about-singlestore-pipelines/pipeline-concepts/schema-and-pipeline-inference/#section-idm234989481470077.md) for details. ## Table Identifier * The `` identifies the Iceberg table. The `` is catalog-specific but is typically in the form: `database_name.table_name`. ## CONFIG ## Catalog Specification * The `catalog_type` is required for the catalog specification. * The `catalog_name` is a name to associate with the catalog when reading table metadata and is used internally in SingleStore for logging and metrics purposes. The `catalog_name` is required for the JDBC catalog and is optional for other catalogs. * The `catalog.property` is a list of key-value pairs for configuring the catalog connection. The property and value are passed directly to the Iceberg SDK to establish the catalog connection. Refer to [Iceberg Catalog and Security Configuration](https://docs.singlestore.com/#section-id235382377087001.md) for more information. ## S3 Specification * The `` is a JSON string for S3 configuration parameters such as `region`, `endpoint_url`, and `compatibility_mode`. Refer to [CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline.md) and [CREATE LINK](https://docs.singlestore.com/cloud/reference/sql-reference/security-management-commands/create-link.md) for more information. ## S3 Compatible Backend Storage SingleStore supports Amazon S3 compatible backend storage for all catalog types used with Iceberg tables. The `endpoint_url` is an Amazon S3 compatible storage backend, such as MinIO or IBM Storage Ceph. This value is distinct from the `endpoint_url` defined in the [S3 Specification](https://docs.singlestore.com/#section-idm234375301764122.md). ## Ingest Modes SingleStore supports multiple mechanisms for ingesting Iceberg data. SingleStore recommends using `MERGE` pipelines as the preferred method for ingesting Iceberg data. If `MERGE` pipelines do not meet performance or functional requirements, three alternative ingest modes are available using `ingest_mode:` `one_time`, `append`, and `upsert`. Refer to [Types of Ingest](https://docs.singlestore.com/#section-id235382401268725.md) for details and a comparison of the ingest mechanisms. > **❗ Important**: `MERGE` pipelines cannot be used with `ingest_mode` options. You cannot specify `MERGE` and an `ingest_mode` option in the same pipeline. ## Vended Credentials (`iceberg_vended_credentials`) SingleStore supports vended credentials for Iceberg REST catalogs with S3 storage. Refer to [Iceberg Catalog and Security Configuration](https://docs.singlestore.com/#section-id235382377087001.md) for details. ## AWS EKS IRSA (`eks_irsa`) SingleStore supports AWS Elastic Kubernetes Service (EKS) IAM vended credentials for Iceberg REST catalogs with S3 storage. Refer to [Iceberg Catalog and Security Configuration](https://docs.singlestore.com/#section-id235382377087001.md) for details. ## CREDENTIALS (``) * The `` specifies S3 credentials in JSON format. With this specification, credentials are directly included in the pipeline. While including credentials in the pipeline is simpler, it is less secure than using vended credentials or EKS IRSA. For information about the supported credential options, refer to [CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline/#UUID-e9b729f5-e821-593f-4780-33e798409694.md). * When using Vended Credentials or EKS IRSA, the `CREDENTIALS` clause is optional; however, the clause may be included to add additional authentication parameters. ## REPLACE INTO [REPLACE INTO](https://docs.singlestore.com/cloud/reference/sql-reference/data-manipulation-language-dml/replace.md) is required for pipelines that use `ingest_mode: upsert`. SingleStore recommends that `REPLACE INTO` not be used with other types of pipelines as doing so causes unnecessary overhead. When you execute a `REPLACE INTO` statement on a table with a `PRIMARY KEY` or `UNIQUE` index, the engine checks for matching values in these indexes. If a match is found, it deletes the existing row and replaces it with the new row. If no match exists or if there is no `PRIMARY KEY` or `UNIQUE` index, `REPLACE INTO` operates like `INSERT`. ## MERGE INTO The `MERGE INTO` clause creates a pipeline, called a `MERGE` pipeline, that continuously ingests data from an Iceberg table into a SingleStore table. A `MERGE` pipeline continuously detects inserts, updates, and deletes to and from the Iceberg table and applies those updates to the SingleStore table. SingleStore recommends `MERGE` pipelines as the preferred method for ingesting Iceberg data. Refer to [Types of Ingest](https://docs.singlestore.com/#section-id235382401268725.md) for details on `MERGE` and a comparison of `MERGE` and `ingest_mode`. ## Subvalue Mappings The `iceberg_subvalue_mapping` assigns fields from the Iceberg table to columns in the SingleStore table or to temporary variables. A `::`-separated list of field names is used in `iceberg_subvalue_path` to look up fields in nested schemas. When the files in the Iceberg table are Parquet files, the `::`-separated list of field names is used to look up fields in nested Parquet schemas. The following rules apply: * The last field in `iceberg_subvalue_path` must be a primitive type. * All `iceberg_subvalue_path` components containing whitespace or punctuation must be surrounded by backticks (\`). * The `iceberg_subvalue_path` may not contain Parquet nested types (list or map types). Refer to [Parquet - Nested Types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#nested-types) for more information. ## Iceberg Catalog and Security Configuration ## Catalog Types SingleStore supports the following Iceberg Catalogs: Glue, Snowflake, REST, JDBC, Hive, and Hadoop. Polaris and Unity are REST catalogs and are supported. ## Vended Credentials SingleStore supports "vended credentials" for Iceberg REST catalogs with S3 storage. Vended credentials allow clients to access Iceberg tables without the Iceberg catalog credentials. The Iceberg catalog generates temporary limited credentials for a client to access the storage, improving the security and ease of catalog administration. Catalogs have catalog-specific settings for vended credentials. All such settings must be added as catalog properties using `"catalog."` in the `CONFIG` section. In addition, specify `"iceberg_vended_credentials": true` in the `CONFIG`. ## AWS Elastic Kubernetes Service (EKS) IAM Roles for Service Accounts (IRSA) AWS EKS IRSA can be used to provide credentials for a S3 pipeline. To use EKS IRSA: * Configure [delegated entities](https://docs.singlestore.com/cloud/user-and-workspace-administration/cloud-workload-identity-and-delegated-entities/#section-id235390403966648.md) to use [Cloud Workload Identity](https://docs.singlestore.com/cloud/user-and-workspace-administration/cloud-workload-identity-and-delegated-entities.md) instead of static credentials. * EKS IRSA must be enabled by setting the `enable_eks_irsa` engine variable to `ON`. ```sql SET GLOBAL enable_eks_irsa = ON; ``` * Include the `"creds_mode": "eks_irsa"` option in the pipeline `CONFIG`. SingleStore supports two scenarios for access with IAM roles and EKS IRSA: 1. The role assigned to a pod has direct access to S3 resources. 1. Include the `"creds_mode": "eks_irsa"` option in the pipeline `CONFIG`. ```sql CREATE PIPELINE AS LOAD DATA S3 's3://bucket_name/directory' CONFIG '{"region":"us-west-2", "creds_mode": "eks_irsa"}' CREDENTIALS '{}' INTO TABLE FORMAT ICEBERG; ``` 2. The role assigned to a pod does not have direct access to the S3 resources, but it has permissions to assume a role that has the desired access. 1. Provide the role to be assumed with `role_arn` in the pipeline `CREDENTIALS`. ```sql CREATE PIPELINE AS LOAD DATA S3 's3://bucket_name/directory' CONFIG '{"region":"us-west-2", "creds_mode": "eks_irsa"}' CREDENTIALS '{"role_arn": ""}' INTO TABLE FORMAT ICEBERG; ``` Refer to the Minimum Required S3 Pipeline Syntax and AWS Elastic Kubernetes Service (EKS) IAM Roles for Service Accounts (IRSA) Authentication examples in [CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline/#UUID-e9b729f5-e821-593f-4780-33e798409694.md). ## AWS Secrets Manager Secrets Secrets in the AWS Secrets Manager can be used in the pipeline `CONFIG` or `CREDENTIALS`. To place a secret part of a pipeline `CONFIG` or `CREDENTIALS` into the AWS Secrets Manager: 1. Identify the part of the configuration that you want to hide. Use the secrets feature only for credentials; do not store the entire configuration into the Secrets Manager, as this may cause problems with the pipeline. 2. Add the secret part as a secret in AWS Secrets Manager. As AWS Secrets Manager stores secrets as JSON, a secret may contain multiple key-value pairs. For example, you might create a secret that contains values for `"catalog.credential"` and `"catalog.another_credential"`. 3. Ensure the pipeline's AWS credentials allow accessing the AWS Secrets Manager. Those credentials could be either static credentials, or a service role provided to the pod by EKS IRSA. 4. Remove the secret part or parts from the pipeline CONFIG and replace them with `"ref": "arn_of_your_secret"`. ```json { "catalog_type":"REST", "region":"us-east-1", "catalog.uri":"https://whatever.snowflakecomputing.com/polaris/api/catalog", "ref": "arn:aws:secretsmanager:us-east-1:188440000000:secret:test-secret-VkYPHc", ... } ``` Every `"ref"` property in the pipeline `CONFIG` or `CREDENTIALS` is replaced by the values contained in the secret mentioned in the property's value. These values are placed on the same level as the `"ref"` they replace. The AWS Secrets Manager can be accessed with a role that must be assumed before accessing the secret. Set this role in `CREDENTIALS` using `secretsmanager_role_arn`, or in `ref` by using a pair of role ARN and secret ARN separated by a pipe; for example: `"ref": "arn:aws:iam::188440000000:role/role_to_assume|arn:aws:secretsmanager:us-east-1:188440000000:secret:my-secret"`. When AWS credentials are used only to access the Secrets Manager, and not to access data on S3 (for example, when vended credentials are used to access S3) those credentials must be prefixed with the string `"secretsmanager_"`, such as, `"secretsmanager_role_arn"`. ## Types of Ingest ## Compare Types of Continuous Ingest SingleStore supports several types of continuous ingest: `MERGE` pipelines, Append-Only, and Upsert. With continuous ingest, a running pipeline automatically detects updates to the Iceberg table and ingests them into the SingleStore table. These types of ingest have different characteristics, SingleStore recommends using `MERGE` pipelines as the preferred method for ingesting Iceberg data. Append-Only and Upsert are provided as performance optimizations. * `MERGE` can process Append, Overwrite, Replace, and Delete (positional and equality) Iceberg snapshots; Append-Only and Upsert using `ingest_mode` have limitations on the types of Iceberg snapshots processed. * The `MERGE` pipeline has the broadest coverage of what can be ingested as incremental updates, merge pipelines are therefore inherently more complex. * `MERGE` requires adding three columns (`$_file`, `$_row`, `$_delete`) in the destination (SingleStore) table. These three extra columns have the following impacts: * `MERGE` is slightly slower on initial ingest compared to Append-Only mode. * `MERGE` takes slightly more space in storage. * The `MERGE` pipeline statement is more complex, but that can be mitigated with [INFER](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/infer-pipeline.md). * When using `MERGE`, users may not directly update the destination (SingleStore) table. If the destination (SingleStore) table is updated, updates made to the source Iceberg table may not propagate properly. * When using `MERGE`, the Replace type of snapshot is not a no-op. Data files will be re-ingested if the source table went through compaction. * `MERGE` implements merge-on-read (MOR). * `MERGE` requires an extra condition for `SELECT` statements to manage MOR-style deletes. * An extra entity (`VIEW`) is created by `INFER` by default. * Impacts scan performance. * `ingest_mode` updates the SingleStore table directly and does not have a read penalty. ## Merge Pipelines (`MERGE`) The `MERGE` clause creates a merge pipeline, that continuously ingests data from an Iceberg table into a SingleStore table. Append, Overwrite, Replace, and Delete (positional and equality) Iceberg V2 snapshots are supported. For a merge pipeline, the SingleStore (destination) table: * Must have the following three columns: * `` `$_file` VARCHAR(2048) COMMENT 'ICEBERG_FILE_PATH' `` * `` `$_row` BIGINT COMMENT 'ICEBERG_FILE_POS' `` * `` `$_delete` JSON DEFAULT '{}' COMMENT 'ICEBERG_DELETED_BY' `` * You may use different column names for the `` `$_file` ``, `` `$_row` ``, and `` `$_delete` `` columns as long as the columns are marked with the `COMMENT` clauses shown above. * Must have shard key defined as: * ``SHARD KEY(`$_file`, `$_row`)`` * Must not be updated or modified. If the destination table is modified, updates may be applied incorrectly. The pipeline declaration must set the `` `$_file` `` and `` `$_row` `` columns as follows: ```sql SET `$_file` = pipeline_source_file(), `$_row` = pipeline_source_file_row() ``` To ingest deletes, a merge pipeline must be created using a [VIEW](https://docs.singlestore.com/cloud/reference/sql-reference/data-definition-language-ddl/create-view.md) over the destination table. This view must have the following `WHERE` clause: ```sql WHERE JSON_LENGTH(`$_delete`) = 0 ``` Merge pipelines perform merge-on-read on-demand during table scan when a view is used. [MERGE Pipeline](https://docs.singlestore.com/#section-idm235005322019542.md) provides a merge pipeline example. ## Append-Only (`ingest_mode`) Append-only mode is used for scenarios in which new rows are added to the Iceberg table, but no rows are deleted or modified. In append-only mode (`"ingest_mode":"append"`) the pipeline will process `APPEND` Iceberg snapshots. If a `DELETE` or `OVERWRITE` snapshot is encountered, an error is raised. Users may override those errors by specifying `"ignore_unsupported_modifications":true` in the pipeline configuration. SingleStore does not recommend setting `"ignore_unsupported_modifications":true` as doing so may lead to data inconsistency. ## Upsert (`ingest_mode`) Upsert mode is used for scenarios where updates modify non-key columns based on a specified key column(s). An upsert is an insert and update, an upsert updates a row in the table if there is a primary key match or inserts the row if there is no primary key match. In upsert mode (`"ingest_mode":"upsert"`), the pipeline will process updates to the Iceberg table as upserts to the SingleStore table. That is, the pipeline will process Iceberg `APPEND` and `OVERWRITE` snapshots of the Iceberg table as upserts to the SingleStore table. Requirements: * The SingleStore table must have a primary key or a unique index. In the ``, a column(s) in the Iceberg table must be mapped to the column(s) in SingleStore on which there is a key or unique index. Typically, the column(s) in the Iceberg table will also be declared as a key. * The pipeline must be created using `REPLACE`. Refer to [Additional CREATE PIPELINE Examples](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/additional-create-pipeline-examples/#section-idm4517954193070433070171573573.md) for more information. Limitations: * Pipelines will fail on delete snapshots. Users may override these errors by specifying `"ignore_unsupported_modifications":true` in the pipeline configuration. SingleStore does not recommend setting `"ignore_unsupported_modifications":true` as doing so may lead to data inconsistency. * Limitations are expected to be addressed in future releases. ## One-Time (ingest\_mode) A one-time ingest (`"ingest_mode":"one_time"`) is supported. In this mode, SingleStore requests Iceberg table metadata, and loads data from the latest snapshot available at that moment. Subsequent updates to the Iceberg table are not automatically ingested. ## CREATE OR REPLACE When a pipeline with a specified `pipeline_name` already exists, the `CREATE OR REPLACE` command functions similarly to `CREATE PIPELINE`, with the added benefit of preserving existing pipeline metadata, such as loaded offsets and data files. Running `CREATE OR REPLACE` on an existing pipeline initiates the Iceberg pipeline to retrieve a new snapshot, schema, and data files, and inserts data from these new files into the destination table in SingleStore. Executing `CREATE OR REPLACE` on an existing Iceberg pipeline may cause some data files to be ingested twice. To avoid this, use `CREATE OR REPLACE` only with `REPLACE` statements or in an upsert configuration. ```sql CREATE PIPELINE books_create_pipe AS LOAD DATA S3 'db.books' CONFIG '{"region":"us-west-2", "catalog_type": "GLUE", "catalog_name": "s3_catalog", "ingest_mode": "one_time" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' REPLACE INTO TABLE books (Id <- id, Name <- name, NumPages <- numPages, Rating <- rating) FORMAT ICEBERG; ``` Refer to [CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline.md) for syntax for `CREATE OR REPLACE PIPELINE`, `REPLACE INTO TABLE`, and `ON DUPLICATE KEY UPDATE`. Refer to [Performing Upserts](https://docs.singlestore.com/cloud/query-data/advanced-query-topics/performing-upserts.md) for more information on upserts. ## Iceberg Pipeline Configuration ## Data Type Translation The following table lists Iceberg Types, the SingleStore data type that can be used to store those types, and the recommended conversion to be applied with a `SET` clause. | **Iceberg Type** | **RecommendedSingleStoreData Type** | **Recommended Conversion** | | ---------------- | ----------------------------------- | --------------------------------------------------------------- | | boolean | `TINYINT`/`BOOL`/`BOOLEAN` | | | int | `INT` | | | long | `BIGINT` | | | float | `FLOAT` | | | double | `DOUBLE` | | | decimal(P,S) | `DECIMAL` | | | date | `DATE` | DATE\_ADD('1970-01-01', INTERVAL @date DAY) | | time | `TIME(6)` | DATE\_ADD('1970-01-01', INTERVAL @time\_value MICROSECOND) | | timestamp | `DATETIME(6)` | DATE\_ADD('1970-01-01', INTERVAL @timestamp\_value MICROSECOND) | | timestamptz | `DATETIME(6)` | DATE\_ADD('1970-01-01', INTERVAL @timestamp\_value MICROSECOND) | | string | `LONGTEXT (utf8mb4_bin)` | | | uuid | `BINARY(16)` | | | fixed (L) | `BINARY(L)` | | | binary | `LONGBLOB` | | ## Data File Filtering When ingesting from Iceberg tables, SingleStore applies file-level filtering to reduce the number of data files downloaded. File-level filtering is determined by the `WHERE` clause in the pipeline definition. * Only a subset of expressions can be used for file filtering. * Expressions that are not eligible for file-level filtering are applied later as row-level filtering after the file is downloaded. To enable file filtering before download, write the `WHERE` clause using the following rules: * Use any combination of the boolean operators `AND` and `OR`. * Use the binary operators `=`, `<`, `<=`, `>`, `>=` with a column name on one side and a literal value on the other. For example, `WHERE` `column_name_1 < 9999` * Function calls are not supported in file filtering. Pipelines always evaluate the full `WHERE` clause during ingestion, after file-level filtering. * Row-level filtering may include more complex expressions, such as function calls, that are not eligible for file-level filtering. * When the `WHERE` clause uses an `AND` operator, unsupported expressions can be discarded for file-level filtering while still being applied during row-level filtering. For example, `WHERE col1 > 1000 AND col2 = my_udf(col3)` * Per-file filtering: `col1 > 1000` * Per-row filtering: `col1 > 1000 AND col2 = my_udf(col3)` If none of the expressions in the `WHERE` clause can be applied at the file level, the `CREATE PIPELINE` statement issues the following warning: `Pipeline's where clause is not suitable for file-level filtering` File-level filtering relies on column-level metadata that Iceberg generates for each data file. Refer to [Manifest Entry Fields](https://iceberg.apache.org/spec/#manifest-entry-fields) for more information. ## Monitor Ingest The per-file status of the Iceberg table ingestion can be queried through the [PIPELINES\_ICEBERG\_FILES](https://docs.singlestore.com/cloud/reference/information-schema-reference/data-ingest/pipelines-iceberg-files.md) information schema view. This view lists all files currently tracked by the pipeline and their corresponding status. Enabling `OFFSETS_METADATA_GC` on the Iceberg pipeline reduces the number of entries by tracking files only referenced by the pipeline’s current Iceberg table snapshot. The option `OFFSETS_METADATA_GC` can be set when you [CREATE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline.md) the pipeline or modify it using [ALTER PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/alter-pipeline.md). ## Configure Iceberg Ingest The global engine variable `enable_iceberg_ingest` must be set to `ON` to use Iceberg ingest. This variable is set to `OFF` by default. ```sql SET GLOBAL enable_iceberg_ingest = ON; ``` Engine variables can be used to control parameters including memory usage and timeouts for Iceberg pipelines. The default values of these engine variables are defined to work well for common Iceberg pipelines uses. Pipeline parameters including memory usage and timeouts can be controlled using global engine variables with the `pipelines_*` prefix as specified in [List of Engine Variables](https://docs.singlestore.com/cloud/reference/configuration-reference/engine-variables/list-of-engine-variables.md). Memory usage can also be controlled using the `java_pipelines_heap_size` and `pipelines_iceberg_data_workers_heap_size` engine variables. Separate variables are used to control the memory used by leaf and aggregator nodes. Adjust these variables if there are out of memory (OOM) errors in Java extractors. Pipeline timeouts can be controlled using `pipelines_extractor_get_offsets_timeout_ms` and `pipelines_extractor_idle_timeout_ms`. For Iceberg pipelines, the values of these variables are adjusted to be a minimum of 5 minutes. Increase the values of these variables if there are timeouts in pipeline creation or while running pipeline batches. Refer to [List of Engine Variables](https://docs.singlestore.com/cloud/reference/configuration-reference/engine-variables/list-of-engine-variables.md) for more information. ## Examples ## Glue Catalog on Amazon S3 An Iceberg table with data files in Parquet format that is stored in an AWS S3 bucket using AWS Glue can be loaded into a SingleStore table using a pipeline ([CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline.md). Refer to [Apache Iceberg - Glue Catalog](https://iceberg.apache.org/docs/1.5.0/aws/#glue-catalog) for information on using a GLUE catalog with Iceberg. In this example, a table named `books` is created and data from an Iceberg table that meets this schema is loaded into the `books` table. Create the table. ```sql CREATE TABLE books( Id INT, Name TEXT, NumPages INT, Rating DOUBLE, PRIMARY KEY(Id)); ``` The following data is used for this example. ```shell (1, 'Happy Place', 400, 4.9) (2, 'Legends & Lattes', 304, 4.9) (3, 'The Vanishing Half', 352, 4.9) (4, 'The Wind Knows My Name', 304, 4.9) ``` The `PIPELINE` statement below will load data from an Iceberg table containing the data above into the `books` table. The column names on the left side of the `<-` are the column names from the SingleStore table into which the data will be loaded. The column names on the right side of the `<-` are the column names from the Iceberg table which is to be loaded into SingleStore. ```sql CREATE PIPELINE books_pipe AS LOAD DATA S3 'db.books' CONFIG '{"region":"us-west-2", "catalog_type": "GLUE" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE books (Id <- id, Name <- name, NumPages <- numPages, Rating <- rating) FORMAT ICEBERG; ``` Test the pipeline. ```sql TEST PIPELINE books_pipe; ``` ```output +------+------------------------+----------+--------+ | Id | Name | NumPages | Rating | +------+------------------------+----------+--------+ | 4 | The Wind Knows My Name | 304 | 4.9 | | 1 | Happy Place | 400 | 4.9 | | 2 | Legends & Lattes | 304 | 4.9 | | 3 | The Vanishing Half | 352 | 4.9 | +------+------------------------+----------+--------+ ``` Refer to [START PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/start-pipeline.md) for more information on starting pipelines. ## Use Subvalue Mappings This example shows the use of subvalue mappings to load nested elements from an Iceberg schema into a SingleStore table. Create a table. ```sql CREATE TABLE addresses( Id INT, Name TEXT, Street TEXT, City TEXT, Country TEXT, PRIMARY KEY(Id)); ``` The following data is used for this example. ```shell (1, 'Mylo', struct('123 Main St', 'New York', 'USA')) (2, 'Naya', struct('456 Elm St', 'San Francisco', 'USA')) ``` The `PIPELINE` statement below will load data from an Iceberg table containing the data above into the `addresses` table. The column names on the left side of the `<-` are the column names from the SingleStore table into which the data will be loaded. The column names on the right side of the `<-` are the column names from the Iceberg table which is to be loaded into SingleStore. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db2.addresses' CONFIG '{"region":"us-west-2", "catalog_type": "GLUE", "catalog_name": "s3_catalog" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` Test the pipeline. ```sql TEST PIPELINE addresses_pipe; ``` ```output +------+------+-------------+---------------+---------+ | Id | Name | Street | City | Country | +------+------+-------------+---------------+---------+ | 1 | Mylo | 123 Main St | New York | USA | | 2 | Naya | 456 Elm St | San Francisco | USA | +------+------+-------------+---------------+---------+ ``` Refer to [START PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/start-pipeline.md) for more information on starting pipelines. ## Snowflake Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 with a Snowflake catalog. Iceberg tables to be ingested in SingleStore must be created on an external volume. Refer to [Tutorial: Create your first Iceberg table](https://docs.snowflake.com/en/user-guide/tutorials/create-your-first-iceberg-table), [Create an external volume](https://docs.snowflake.com/en/user-guide/tutorials/create-your-first-iceberg-table#create-an-external-volume), and [Snowflake Iceberg Catalog SDK](https://docs.snowflake.com/en/user-guide/tables-iceberg-catalog) for more information. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db_name.schema_name.table_name' CONFIG '{"region": "us-west-2", "catalog_type": "SNOWFLAKE", "catalog.uri": "jdbc:snowflake://tpq12345.snowflakecomputing.com", "catalog.jdbc.user":"", "catalog.jdbc.password":"", "catalog.jdbc.role":""}' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` For the Snowflake catalog, the `` must consist of three parts - the database name, the schema name, and the table name, `db_name.schema_name.table_name` in the example above. The catalog.uri can be obtained from running `SELECT SYSTEM$ALLOWLIST();` in the Snowflake system. In addition, the `catalog.uri`, `catalog.jdbc.user`, `catalog.jdbc.password`, and `catalog.jdbc.role` are required when using the Snowflake catalog. ## REST Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 with REST catalog. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db_name.table_name' CONFIG '{"region": "us-west-2", "catalog_type": "REST", "catalog.uri": "http://host.address:8181"}' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` ## File Filtering with REST Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 with REST catalog using file filtering. Assume there exists an Iceberg table that stores weather sensor data and is partitioned with a day transform on the timestamp as follows. ```sql CREATE TABLE sensors.weather ( sensorid int, temperature double, humidity double, pressure double, ts timestamp) PARTITIONED BY day(ts); ``` This data is to be loaded into a SingleStore table defined as follows: ```sql CREATE TABLE weather_data( SensorId INT, Temperature DOUBLE, Humidity DOUBLE, Pressure DOUBLE, TS DATETIME, PRIMARY KEY(SensorId)); ``` The following pipeline will load all weather data for January 1, 2026 and later into the `weather_data` table. As the Iceberg table is partitioned by day(ts), SingleStore will read and load only the Iceberg files containing data for 2026 (and future years). ```sql CREATE PIPELINE weather_pipe AS LOAD DATA S3 'sensors.weather' CONFIG '{"region": "us-west-2", "catalog_type": "REST", "catalog.uri": "http://host.address:8181"}' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE weather_data (SensorId <- sensorid, Temperature <- temperature, Humidity <- humidity, Pressure <- pressure, TS <- ts) FORMAT ICEBERG WHERE timestamp >= '2026-01-01'; ``` ## JDBC Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 with JDBC catalog. Refer to [Iceberg JDBC Integration](https://iceberg.apache.org/docs/1.5.0/jdbc/) for more information on using Iceberg with JDBC catalog. SingleStore supports Postgres, MySQL, and SQLite JDBC drivers by default. The following example uses JDBC with SQLite. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db_name.table_name' CONFIG '{"region": "us-west-2", "catalog_type": "JDBC", "catalog_name": "catalog_name", "catalog.warehouse": "s3://path_to_warehouse", "catalog.uri":"jdbc:sqlite_:file:/path_jdbc"}' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` The following example uses JDBC with MySQL. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db_name.table_name' CONFIG '{"region": "us-west-2", "catalog_type": "JDBC", "catalog_name": "catalog_name", "catalog.warehouse": "s3://path_to_warehouse", "catalog.uri": "jdbc:mysql://host.address:3306/default", "catalog.jdbc.user": "", "catalog.jdbc.password": ""}' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Country <- address::country) FORMAT ICEBERG; ``` The `catalog.warehouse` and `catalog_name` are required for JDBC catalogs. ## Hive Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 using Hive Catalog. Iceberg tables to be ingested in the Hive catalog must use Hive Metastore service. Refer to [Apache Hive](https://hive.apache.org/) and [Hive - Apache Iceberg](https://iceberg.apache.org/docs/latest/hive/) for more information. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db_name.table_name' CONFIG '{"catalog_type": "HIVE", "catalog.uri": "thrift://:46590", "region": "us-east-1", "catalog.hive.metastore.client.auth.mode": "PLAIN", "catalog.hive.metastore.client.plain.username": "", "catalog.hive.metastore.client.plain.password": "", "catalog.metastore.use.SSL": "true", "catalog.hive.metastore.truststore.type": "PKCS12", "catalog.hive.metastore.truststore.path": "/path/to/your/project/hive/truststore.12" "catalog.hive.metastore.truststore.password": '' }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` The `catalog.uri` is the base URL for accessing the Hive catalog's API or service endpoint. The `catalog.hive.metastore.client.auth.mode` is the authentication mode for connecting to the Hive Metastore. The `catalog.hive.metastore.client.plain.username` is the username used to authenticate with the Hive Metastore. The `catalog.hive.metastore.client.plain.password` is the password for the authenticated user. The `catalog.metastore.use.SSL` is a boolean flag that secures communication with the Hive Metastore. The `catalog.hive.metastore.truststore.type` is the truststore format used to validate the SSL certificate. The `catalog.hive.metastore.truststore.path` is the file path that contains the SSL certificate. The `catalog.hive.metastore.truststore.password` is the password needed to access the truststore. Refer to [GitHub](https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java) for additional Hive configurations. ## Hadoop Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 with Hadoop catalog. Hadoop catalog supports S3 compatible backend storages. Refer to [Apache Hadoop](https://hadoop.apache.org/) and [Hadoop - Apache Iceberg](https://iceberg.apache.org/javadoc/nightly/org/apache/iceberg/hadoop/HadoopCatalog.html) for more information. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA s3 'db_name.table_name' CONFIG '{"catalog_type": "HADOOP", "catalog.warehouse": "s3://path_to_warehouse", "endpoint_url": "", "region": "us-east-1" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key":""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` ## Polaris Catalog on Amazon S3 Ingest an Iceberg table stored in Amazon S3 using Polaris Catalog. Polaris catalog uses the Apache Iceberg REST API. Refer to [Getting Started with Snowflake Open Catalog](https://other-docs.snowflake.com/opencatalog/tutorials/open-catalog-gs) for more information. ```sql CREATE PIPELINE addresses_pipe AS LOAD DATA S3 'db_name.table_name' CONFIG '{"catalog_type": "REST", "catalog.warehouse": "", "table_id": "db_name.table_name", "region":"us-east-1", "catalog.uri":"https://tpq12345.snowflakecomputing.com/polaris/api/catalog", "catalog.credential":"catalog.credential", "catalog.scope": "PRINCIPAL_ROLE:ALL"}' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE addresses (Id <- id, Name <- name, Street <- address::street, City <- address::city, Country <- address::country) FORMAT ICEBERG; ``` The `catalog.uri` is the base URL for accessing the Polaris catalog's API or service endpoint. The `catalog.scope` defines the access permissions for the Polaris catalog. The `catalog.connection` is a secret key from Polaris catalog connection, formatted as `:`. Get this when you configure the service connection in Snowflake while creating a connection for a Polaris catalog. ## Upsert (`ingest_mode`) The example below shows a pipeline using `ingest_mode` of `upsert`. As required when `ingest_mode` is `upsert`, the books table has a primary key, `Id`, in this example. When started, this pipeline will ingest data from the latest snapshot of the Iceberg table into the SingleStore `books` table. Then, when the Iceberg table is updated, those updates will automatically be applied to the SingleStore table as upserts. ```sql CREATE PIPELINE books_upsert_pipe AS LOAD DATA S3 'db.books' CONFIG '{"region":"us-west-2", "catalog_type": "GLUE", "catalog_name": "s3_catalog", "ingest_mode": "upsert" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' REPLACE INTO TABLE books (Id <- id, Name <- name, NumPages <- numPages, Rating <- rating) FORMAT ICEBERG; ``` ## Append (`ingest_mode`) The example below shows a pipeline using `ingest_mode` of `append`. When started, this pipeline will ingest data from the latest snapshot of the Iceberg table into the SingleStore `books` table. Then, when the Iceberg table is updated, any appends to that table will automatically be appended to the SingleStore table. If non-append updates are encountered, an error will be thrown. ```sql CREATE PIPELINE books_append_pipe AS LOAD DATA S3 'db.books' CONFIG '{"region":"us-west-2", "catalog_type": "GLUE", "catalog_name": "s3_catalog", "ingest_mode": "append" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' INTO TABLE books (Id <- id, Name <- name, NumPages <- numPages, Rating <- rating) FORMAT ICEBERG; ``` ## MERGE Pipeline The example below shows an example of a creating a merge pipeline. In this example, a destination table in SingleStore , named `_books` , is created with the necessary `$_file`, `$_row`, and `$_delete` columns. Then a view, `books` , is created on top of the `_books` table. Finally, a merge pipeline is defined. When started, this pipeline will ingest data from the latest snapshot of the source Iceberg table into the SingleStore `_books` table. When the source Iceberg table is updated, those updates (append, replace, delete) will be propagated to the `_books` table. Create the destination `_books` table. ```sql CREATE TABLE _books( Id INT, Name TEXT, NumPages INT, Rating DOUBLE, `$_file` VARCHAR(2048), `$_row` BIGINT, `$_delete` JSON DEFAULT '{}', KEY(Id), SHARD KEY(`$_file`,`$_row`)); ``` Create the `books` view. ```sql CREATE VIEW books AS SELECT Id, Name, NumPages, Rating from _books WHERE JSON_LENGTH(`$_delete`) = 0; ``` Define the pipeline. ```sql CREATE PIPELINE books_pipe AS LOAD DATA S3 'db.books' CONFIG '{"region":"us-west-2", "catalog_type": "GLUE", "catalog_name": "s3_catalog" }' CREDENTIALS '{"aws_access_key_id": "", "aws_secret_access_key": ""}' MERGE INTO TABLE _books (Id <- id, Name <- name, NumPages <- numPages, Rating <- rating) FORMAT ICEBERG SET `$_file` = pipeline_source_file(), `$_row` = pipeline_source_file_row(); ``` ## Debug Datetime Conversions Datetime value conversion can be debugged by loading raw datetime from an Iceberg file into a `BLOB`. The example below creates a `books_debug` table with a `PublishTimestamp` attribute of type `DATETIME(6)` and an extra attribute, `RawTimestamp`, into which the raw timestamp data will be loaded. If the datetimes are stored as strings, try loading the raw values into a `TEXT` attribute in addition to, or instead of, using a `BLOB` attribute. ```sql CREATE TABLE books_debug( Id INT, Name TEXT, NumPages INT, Rating DOUBLE, PublishTimestamp DATETIME(6), RawTimestamp BLOB); ``` The `PIPELINE` statement below loads the `publishTimestamp` from the Iceberg file into both the `PublishTimestamp` column and the `RawTimestamp` column. A `SET` statement is used to convert the `publishTimestamp` from Iceberg to the `PublishTimestamp` in SingleStore. The column names on the left side of the `<-` are the column names from the SingleStore table into which the data will be loaded. The column names on the right side of the `<-` are the column names from the Iceberg file which is to be loaded into SingleStore. ```sql CREATE PIPELINE books_debug_pipe AS LOAD DATA S3 '' CONFIG '{ "region":"us-west-2", "catalog_type": "GLUE", "catalog_name": "s3_catalog", "table_id": "dbtest.books" }' CREDENTIALS '{ "aws_access_key_id": "", "aws_secret_access_key": "", "aws_session_token": ""}' INTO TABLE books_debug (Id <- IdIce, Name <- Name, NumPages <- NumPages, Rating <- Rating, @ts<-PublishTimestamp, RawTimestamp<-PublishTimestamp ) FORMAT ICEBERG SET PublishTimestamp = DATE_ADD('1970-01-01', INTERVAL @ts MICROSECOND); ``` Use `TEST PIPELINE` to compare the values in the two columns. The `PublishTimestamp` column shows the converted timestamp, the `RawTimestamp` column contains the timestamp from the Iceberg file, in microseconds in this example. The combination of these two columns can be used to debug the conversion used in the `SET` statement. ```sql TEST PIPELINE books_debug_pipe; ``` ```output +------+--------------------+----------+--------+------------------+-------+---------------------+ | Id | Name | NumPages | Rating | RawTimestamp | PublishTimestamp | +------+--------------------+----------+--------+------------------+-------+---------------------+ | 1 | Happy Place | 400 | 4.9 | 1680721200000000 | 2023-04-05 12:00:00.000000 | | 2 | Legends & Lattes | 304 | 4.9 | 1669665600000000 | 2022-11-28 12:00:00.000000 | | 3 | The Vanishing Half | 352 | 4.9 | 1591124400000000 | 2020-06-02 12:00:00.000000 | +------+--------------------+----------+--------+------------------+-------+---------------------+ ``` ## Troubleshooting The following table lists errors that can occur when creating an Iceberg Ingest pipeline. Also, refer to [Debugging Pipeline Errors](https://docs.singlestore.com/cloud/load-data/about-singlestore-pipelines/pipeline-troubleshooting/debugging-pipeline-errors.md) for additional information on troubleshooting pipeline errors. | Catalog | Error | Cause and Resolution | | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Snowflake | Certificate for <...> doesn't match any of the subject alternative names. | An incorrect URI may cause this error.Verify that the catalog.uri is valid. | | Snowflake | SEVERE: WARNING!!! Using fail-open to connect. Driver is connecting to an HTTPS endpoint without OCSP based Certificate Revocation checking as it could not obtain a valid OCSP Response to use from the CA OCSP responder. | This issue needs to be resolved on the Snowflake side. Refer to[OCSP Configuration](https://docs.snowflake.com/en/user-guide/ocsp#ocsp-certification-checks-require-port-80)for more information | | Snowflake | Parquet parsing errors such as “Dictionary encoding not implemented". | Set Snowflake table property`STORAGE_SERIALIZATION_POLICY = COMPATIBLE`as in[CREATE ICEBERG TABLE (Snowflake as the Iceberg catalog) \| Snowflake Documentation](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake). | | All | protocol error … Process died unexpectedly or didn't start. | An incorrect value of the engine variable`java_pipelines_java11_path`may cause this error. Ensure sure the path is valid, for example, “/usr/bin/java”. | | All | java.lang.OutOfMemoryError: Java heap space | This error occurs if:

Heap memory usage exceeds the pipelines_iceberg_heap_size variable limit
There is an excessive amount of table metadata
Parquet row groups are very large

To resolve:

If the Parquet row groups are very large, switch to a MERGE pipeline by removing the ingest_mode option from the CONFIG call and adding MERGE so the statement becomes MERGE INTO <table_name>
Increase the value of the pipelines_iceberg_heap_size engine variable

| | All | Unsupported snapshot with operation`%s`for table`%s`(snapshot=`%d`). | This error occurs when the Iceberg pipeline is used on an unsupported snapshot type.To resolve:

Set ignore_unsupported_modifications config value to true to ignore it.
If applicable (for example, when you want to ingest updates), set ingest_mode to upsert so that updates are processed.
Alternatively, remove the ingest_mode option from the CONFIG and use a MERGE pipeline instead, so the statement becomes MERGE INTO <table_name>.

| ## Related Topics * [CREATE PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/create-pipeline.md) * [CREATE LINK](https://docs.singlestore.com/cloud/reference/sql-reference/security-management-commands/create-link.md) *** Modified at: April 25, 2026 Source: [/cloud/load-data/data-sources/iceberg-ingest/](https://docs.singlestore.com/cloud/load-data/data-sources/iceberg-ingest/) (An index of the documentation is available at /llms.txt)