PIPELINES_BATCHES

This view contains detailed, low-level information about individual batches as they’re loaded into the database.

It contains one row per batch partition and is useful for understanding partition-level details of a pipeline’s behavior.

The data in this view is not persistent across node restarts. After restarting a leaf node, the view will have batches with NULLs for the respective batch partitions. If the MA is restarted, all batch data is removed. To maintain this data across restarts, write the data to a separate table periodically.

The following pipeline engine variable is used to control an aspect of pipeline behavior. For more detailed information consult the Sync Variables Lists section of the Engine Variable List.

  • pipelines_batches_metadata_to_keep - controls the number of batch metadata entries to keep before they are overwritten by incoming batches.

information_schema.PIPELINES_BATCHES Schema

Name

Description

DATABASE_NAME

The name of the database associated with the batch.

PIPELINE_NAME

The name of the pipeline associated with the batch.

BATCH_ID

The internal unique identifier for the batch. While every row in the table contains unique information about a batch, multiple rows with the same BATCH_ID may be present.

BATCH_STATE

Specifies whether the batch transaction was successful. Possible values: Succeeded, In Progress, Failed, Queued, Canceled, or No Data. Succeeded: The batch was successfully loaded into the destination table. In Progress: The batch is currently being loaded into the destination table. Failed: The batch failed to be written. Batch failures are written as errors in the information_schema.PIPELINES_ERRORS table, and you can use the BATCH_ID to correlate a specific batch failure with more detailed error information. Queued: The batch is waiting for enough resources in the cluster to free up so it can run. Canceled: a pipeline batch's internal query has been killed. That can be triggered internally, by running STOP PIPELINE while a batch is running, or externally by CTRL-C-ing START PIPELINE FOREGROUND. No Data: The batch contained no data. Batches that don’t contain data will be deleted as new batches are written.

BATCH_ROWS_WRITTEN

The number of rows that were affected in the destination table as a result of the batch transaction. When batches are written to the destination table, they consist of insert, replace, upsert, or delete operations. The BATCH_ROWS_WRITTEN value is the sum total of all the operations that occurred when the batch was written. This value may be NULL if the BATCH_STATE is any value other than Succeeded.

BATCH_TIME

The elapsed time in seconds from the start of the batch transaction to its success or failure. This value reflects the complete transaction time from the perspective of the master aggregator node. Therefore, the elapsed time may be longer than a BATCH_PARTITION_TIME value for a given batch, as BATCH_PARTITION_TIME reflects the perspective of the leaf node.

BATCH_START_UNIX_TIMESTAMP

The time that the batch transaction started in Unix timestamp format.

BATCH_PARTITION_STATE

Specifies whether the batch transaction was successful for a given partition. There are five possible values: Succeeded, In Progress, Failed, No Data, or NULL. Succeeded: The batch was successfully loaded into the destination table. In Progress: The batch is currently being loaded into the destination table. Failed: The batch failed to be written. Batch failures are written as errors in the information_schema.PIPELINES_ERRORS table, and you can use the BATCH_ID and PARTITION values to correlate a specific batch failure with more detailed error information. No Data: The batch contained no data. Batches that don’t contain data will be deleted as new batches are written. NULL: The batch data either contains no data, or the batch transaction hasn’t yet been executed on the partition itself.

BATCH_PARTITION_PARSED_ROWS

The number of rows that were parsed during the batch transaction for a given partition. This value can be NULL if the batch transaction is still in progress or if the batch failed.

BATCH_SOURCE_PARTITION_ID

The identifier for a data source’s partition from which batch partition data is extracted. This value may be NULL if the batch’s BATCH_PARTITION_STATE is also NULL. For an S3 pipeline, this represents the name of an object in the source bucket.

BATCH_EARLIEST_OFFSET

Specifies the earliest offset for the batch partition. This value indicates the start of the offset range for a batch, while BATCH_LATEST_OFFSET indicates the end of the offset range. This value may be NULL if the batch’s BATCH_PARTITION_STATE is also NULL. For an S3 pipeline, this value will always be 0 when a bucket’s object has been successfully processed. For more information, see Load Data from Amazon Web Services (AWS) S3

BATCH_LATEST_OFFSET

Specifies the latest offset for the batch. This value indicates the end of the offset range for a batch, while BATCH_EARLIEST_OFFSET indicates the start of the offset range. This value may be NULL if the batch’s BATCH_PARTITION_STATE is also NULL. For an S3 pipeline, this value will always be 1 when a bucket’s object has been successfully processed. For more information, see Load Data from Amazon Web Services (AWS) S3

BATCH_PARTITION_TIME

The elapsed time in seconds from the start of the batch transaction to its success or failure. This value reflects the batch transaction time from the perspective of the leaf node. Therefore, the elapsed time may be shorter than a BATCH_TIME for a given batch, as BATCH_TIME reflects the perspective of the master aggregator node.

BATCH_PARTITION_EXTRACTED_BYTES

The number of bytes that was extracted from the data source for the batch. This value may be NULL if no data was extracted for the batch.

BATCH_PARTITION_TRANSFORMED_BYTES

The number of bytes that were transformed for the batch. This value may be NULL if a transform was not specified for the pipeline, or if no data was successfully transformed.

BATCH_PARTITION_EXTRACTOR_WAIT_TIME

The amount of time, in seconds, that the pipeline spent waiting on the extractor to complete its operation.

BATCH_PARTITION_TRANSFORM_WAIT_TIME

The amount of time, in seconds, that the pipeline spent waiting on the transform to complete its operation.

HOST

The hostname or host IP address for the leaf node that processed the batch. This value may be NULL if BATCH_SOURCE_PARTITION_ID or PARTITION is also NULL. The combination of a batch’s HOST, PORT, and PARTITION identify the specific leaf node partition that attempted to load batch data.

PORT

The port number for the leaf node that processed the batch. This value may be NULL if BATCH_SOURCE_PARTITION_ID or PARTITION is also NULL. The combination of a batch’s HOST, PORT, and PARTITION identify the specific leaf node partition that attempted to load batch data.

PARTITION

Specifies the partition ID on a leaf node that processed the batch. This value may be NULL if BATCH_SOURCE_PARTITION_ID is also NULL. The combination of a batch’s HOST, PORT, and PARTITION identify the specific leaf node partition that attempted to load batch data.

Last modified: March 4, 2024

Was this article helpful?