Important
The SingleStore 9.1 release candidate (RC) gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 9.0 is recommended for production workloads; it can later be upgraded to SingleStore 9.1.
Pipeline Built-in Functions
SingleStore provides three built-in functions for use with pipelines to help load data. Each function is used in the SET clause of a CREATE PIPELINE statement.
pipeline_source_file()
Pipelines persist the name of the data source file for each row. Use the pipeline_source_file() built-in function in a SET clause to set a table column to the name of the pipeline data source file.
For example, given the table definition CREATE TABLE b(isbn NUMERIC(13), title VARCHAR(50));, use the following statement to set each row's title column to the name of its source file while ingesting data from AWS S3.
```sql
CREATE PIPELINE books AS
LOAD DATA S3 's3://<bucket_name>/Books/'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>",
              "aws_secret_access_key": "<secret_access_key>"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE b(isbn)
SET title = pipeline_source_file();
```
For more information on using the pipeline_source_file() function to load data from AWS S3, refer to Load Data from Amazon Web Services (AWS) S3.
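Once the pipeline has run, the persisted file names can be queried like any other column. A minimal sketch, assuming the table b above has been populated by the books pipeline:

```sql
-- Count how many rows each source file contributed to table b.
SELECT title AS source_file, COUNT(*) AS rows_loaded
FROM b
GROUP BY title
ORDER BY rows_loaded DESC;
```

This kind of query is useful for auditing a load or spotting files that contributed unexpectedly few rows.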
pipeline_batch_id()
Pipelines persist the ID of the batch used to load data. Use the pipeline_batch_id() built-in function in a SET clause to set a table column to the ID of the batch used to load the data.
For example, given a table t that contains a b_id column, use this statement to load the batch ID into the b_id column:
```sql
CREATE PIPELINE p AS
LOAD DATA ...
INTO TABLE t(@b_id, column_2)
...
SET b_id = pipeline_batch_id();
```
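The persisted batch IDs make it possible to trace rows back to the batch that loaded them. A minimal sketch, assuming table t has been populated by pipeline p as above:

```sql
-- Count rows per ingest batch; useful for auditing a load or
-- for isolating the rows written by one specific batch.
SELECT b_id AS batch_id, COUNT(*) AS rows_loaded
FROM t
GROUP BY b_id
ORDER BY batch_id;
```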
pipeline_source_metadata()
Pipelines persist metadata about the source file from which each row is ingested. Use the pipeline_source_metadata() built-in function in a SET clause to populate table columns with this metadata.
The following are the supported metadata properties for each supported source:
| Source | Supported Metadata Properties |
|---|---|
| S3 | `file_name`, `last_modified_timestamp`, `size`, `owner`, `entity_tag`, `storage_class` |
| GCS | |
| FS | |
| Azure | |
Note
To store metadata values, the corresponding table columns in the CREATE TABLE statement must exist and must be of type TEXT.
For example, given the table definition:

```sql
CREATE TABLE t(a TEXT, b TEXT, c TEXT,
    file_name TEXT, last_modified_timestamp TEXT,
    size TEXT, owner TEXT, entity_tag TEXT,
    storage_class TEXT);
```

use the following statement to load source file metadata into the corresponding columns:
```sql
CREATE PIPELINE pl AS
LOAD DATA S3 '<path>'
CONFIG '<config>'
CREDENTIALS '<credentials>'
INTO TABLE t(a, b, c)
SET
    last_modified_timestamp = pipeline_source_metadata("last_modified_timestamp"),
    file_name = pipeline_source_metadata("file_name"),
    entity_tag = pipeline_source_metadata("entity_tag"),
    size = pipeline_source_metadata("size"),
    owner = pipeline_source_metadata("owner"),
    storage_class = pipeline_source_metadata("storage_class");
```
After the pipeline runs, the target table includes the ingested data along with the metadata of the source file for each row.
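The stored metadata can then be queried like any other column. A minimal sketch, assuming table t was populated by pipeline pl above; note that the metadata columns are TEXT, so the values are returned as strings:

```sql
-- Summarize ingested rows by source file, including each file's
-- size and last-modified timestamp as captured at load time.
SELECT file_name,
       MAX(size) AS file_size,
       MAX(last_modified_timestamp) AS last_modified,
       COUNT(*) AS rows_loaded
FROM t
GROUP BY file_name;
```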
Last modified: February 18, 2026