Debugging Pipeline Errors

Viewing pipeline errors

The Data Ingest provide information about pipeline errors that have occurred. Some useful queries against these tables are provided in this section.

Query the information_schema.PIPELINES_ERRORS table

You can run the following query to show all errors that have occurred, per database, per pipeline, per batch, and per partition.

SELECT DATABASE_NAME, PIPELINE_NAME, BATCH_ID, PARTITION, BATCH_SOURCE_PARTITION_ID,
ERROR_KIND, ERROR_CODE, ERROR_MESSAGE, LOAD_DATA_LINE_NUMBER, LOAD_DATA_LINE
FROM information_schema.PIPELINES_ERRORS;

Query files that were skipped

The query in the previous section does not show files that were skipped because they had errors. To return such files that were skipped per database and per pipeline (but not per batch nor per partition), run the following query.

SELECT * FROM information_schema.PIPELINES_FILES WHERE FILE_STATE = 'Skipped';

If you need additional information, such as the database, the partition, the error that was generated and the line of the error file or object that caused the issue, run the following query.

SELECT pe.DATABASE_NAME, pe.PIPELINE_NAME, pe.BATCH_ID, pe.PARTITION,
pe.BATCH_SOURCE_PARTITION_ID, pe.ERROR_TYPE, pe.ERROR_KIND, pe.ERROR_CODE, pe.ERROR_MESSAGE,
pe.LOAD_DATA_LINE_NUMBER, pe.LOAD_DATA_LINE
FROM information_schema.PIPELINES_FILES pf, information_schema.PIPELINES_ERRORS pe
WHERE pe.BATCH_SOURCE_PARTITION_ID = pf.FILE_NAME and pf.FILE_STATE = 'Skipped';

Use CREATE PIPELINE… FORCE to Collect More Logs

When the CREATE PIPELINE statement fails, more information can be found with the SHOW WARNINGS command.

If SHOW WARNINGS does not reveal enough information, the pipeline can be created using the FORCE option and the errors can be fixed later. This will create a pipeline regardless of the connection error and store the errors in the information_schema.pipelines_errors table.

CREATE PIPELINE <pipeline_name> FORCE AS LOAD DATA <data_source>
INTO TABLE <table_name>;

Use SELECT * to view the pipeline errors.

SELECT * FROM information_schema.pipelines_errors;
| DATABASE_NAME | PIPELINE_NAME | ERROR_UNIX_TIMESTAMP | ERROR_TYPE | ERROR_CODE | ERROR_MESSAGE|

Logging to Show More Information with Failures

Some data sources have debug logging availability which provides additional information along with the errors.

Use the following engine variable and command in the order they appear:

  • SET GLOBAL pipelines_extractor_debug_logging=ON - this engine variable will set debug logging for pipelines if available.

  • FLUSH EXTRACTOR POOLS - Flush extractor pools to remove any cached extractors.

Larger errors in the information_schema.pipelines_errors table will show more context about the cause of the failure.

Collecting Full Error Logs

This method is particularly useful when creating pipelines, where the error messages received may not be descriptive enough or can be truncated due to size limitations.

By default, pipeline errors are limited to 65 KB of characters. However, with debug logging turned on (using the engine variable pipelines_extractor_debug_logging), errors can easily grow up to a few megabytes in size. This results in the error log being truncated.

To capture the complete error details, use a debug pipeline. This allows you to collect larger error buffers, up to 10 MB.

Note

  • Using a debug pipeline will slow down the pipeline execution, and it should only be used for error analysis.

  • The debug pipeline behaves as if the pipelines_extractor_debug_logging variable is set to on. It will produce all of the debugging logs.

CREATE PIPELINE p DEBUG as
LOAD DATA KAFKA 'host.example.com/my-topic'
INTO TABLE t;

The debugging logs will be stored in the information_schema.pipeline_errors table. View these errors by exporting them in a new file.

SELECT * FROM information_schema.pipeline_errors INTO OUTFILE '/tmp/out.txt';

The out.txt file will contain the errors with complete debugging logs for further analysis.

Debug Running Pipelines

There are six tables in the information_schema database that relate to pipelines:

  • PIPELINES - stores high-level information about any pipelines.

  • PIPELINES_BATCHES_SUMMARY - contains high-level information about individual batches as they are loaded into the database.

  • PIPELINES_BATCHES - contains detailed, low-level information about individual batches as they are loaded into a database.

  • PIPELINES_ERRORS - contains detailed information about errors that occurred during extraction, transformation, or loading. Each row represents a single error event.

  • PIPELINES_CURSORS - contains information about a pipeline’s offset ranges.

  • PIPELINES_FILES - stores information about files that have been extracted from a file system-like data source, such as Amazon S3.

Last modified: November 18, 2024

Was this article helpful?