Pipeline Troubleshooting

Concepts

This topic requires an understanding of pipeline batches, which are explained in The Lifecycle of a Pipeline.

Monitoring

Users can monitor the status and performance of pipelines in SingleStore via a dashboard. The Pipeline Summary section of this dashboard provides a high-level overview of all pipelines, including the number and percentage of pipelines in each state, such as running, stopped, and errored.

The Pipeline Performance section also provides insights into the operation of pipelines. For example, metrics such as execution count, average CPU time per execution, average elapsed time per execution, and others, can aid in identifying and optimizing aspects of pipeline performance.
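Pipeline states can also be checked directly in SQL. The following is a minimal sketch, assuming that the `information_schema.PIPELINES` view exposes a `STATE` column; it is not necessarily the query that the dashboard itself runs.

```sql
-- Count pipelines per state (for example, running, stopped, or errored).
-- Assumes information_schema.PIPELINES has a STATE column.
SELECT STATE, COUNT(*) AS pipeline_count
FROM information_schema.PIPELINES
GROUP BY STATE;
```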

Address specific errors

The following table lists errors that can occur when running a pipeline statement, such as CREATE PIPELINE, and errors that can occur while a pipeline is extracting, shaping, or loading data.

Error	Resolution
Syntax error when running `CREATE PIPELINE`.	Both `CREATE PIPELINE` and `LOAD DATA` (which is part of the `CREATE PIPELINE` syntax) have many options. Verify that the included options are specified in the correct order.
Error `1970: Subprocess timed out`	The master aggregator is likely unable to connect to the pipeline's data source. Check the connection parameters, such as `CONFIG` and `CREDENTIALS`, that specify how to connect to the data source. Additionally, verify that the data source is reachable from the master aggregator.
`CREATE PIPELINE ... S3` returns an error that the bucket cannot be located.	The bucket name is case-sensitive. Verify that the S3 bucket names are in the same case as the bucket names specified in your `CREATE PIPELINE ... S3` statement.
Error `1953: exited with failure result (8 : Exec format error)` or `No such file or directory`	This error can occur when a pipeline attempts to run a transform. Check the following: Does the first line of the transform contain a shebang? The shebang specifies the interpreter (such as Python) used to run the script. Is that interpreter installed on all leaves? If the transform was written on a Windows machine, do the newlines use `\r\n`? If so, convert them to `\n`.
`CREATE PIPELINE ... WITH TRANSFORM` fails with a `libcurl` error.	An incorrect path to the transform was likely specified. If the path to the transform is correct, then running `curl` with the path to the transform will succeed.
Error: `java.lang.OutOfMemoryError: Java heap space`	This error may occur when the heap memory usage exceeds the value of the `java_pipelines_heap_size` engine variable. Increasing the value of this variable may resolve the error.
A parsing error occurs in your transform.	To debug your transform, you can run `EXTRACT PIPELINE ... INTO OUTFILE`. This command saves a sample of the data extracted from the data source to a file. For debugging purposes, you can make changes to the file as needed and then send the file to the transform. For more information, refer to EXTRACT PIPELINE … INTO OUTFILE.
S3 pipeline create/start delays (approximately 60 seconds) or “Subprocess timed out” outside AWS	Reduce the value of the `subprocess_ec2_metadata_timeout_ms` engine variable (for example, to `1000`) or provide explicit S3 `CREDENTIALS`. Refer to Sync Variables Lists for more information.
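Several of the resolutions in the table involve engine variables or the `EXTRACT PIPELINE` command. The following is a minimal sketch, assuming a pipeline named `my_pipeline` and an output path of `/tmp/my_pipeline_sample.txt` (both hypothetical), and a user with permission to set global variables. The heap size value is purely illustrative; check the variable's units and default before changing it.

```sql
-- Inspect the current values before changing anything.
SHOW VARIABLES LIKE 'java_pipelines_heap_size';
SHOW VARIABLES LIKE 'subprocess_ec2_metadata_timeout_ms';

-- Raise the Java heap available to pipelines (256 is an illustrative value).
SET GLOBAL java_pipelines_heap_size = 256;

-- Shorten the EC2 metadata lookup when the cluster runs outside AWS.
SET GLOBAL subprocess_ec2_metadata_timeout_ms = 1000;

-- Save a sample of the extracted data to a file for debugging a transform.
-- The pipeline name and file path are placeholders.
EXTRACT PIPELINE my_pipeline INTO OUTFILE '/tmp/my_pipeline_sample.txt';
```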

Error: An error that isn't associated with any specific source partition of the pipeline occurred during the batch loading process. The whole batch will be failed.

Issue

The batch loading process was able to load the data from the source, but it failed to ingest the data into the SingleStore database. This error is caused by a secondary error which is the root cause of the pipeline failure. The secondary error may be caused by resource overload, lock wait timeouts, etc.

Solution

Address the secondary error to solve the issue. Query the PIPELINES_ERRORS information schema table for more information on the error that caused the pipeline failure.
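A query along the following lines can surface the secondary error. It is a sketch that assumes a database named `my_db` and a pipeline named `my_pipeline` (both hypothetical), and that the listed columns exist in your version of `information_schema.PIPELINES_ERRORS`.

```sql
-- Show the most recent errors recorded for one pipeline.
-- Database and pipeline names are placeholders.
SELECT ERROR_UNIX_TIMESTAMP, ERROR_CODE, ERROR_MESSAGE, BATCH_ID
FROM information_schema.PIPELINES_ERRORS
WHERE DATABASE_NAME = 'my_db'
  AND PIPELINE_NAME = 'my_pipeline'
ORDER BY ERROR_UNIX_TIMESTAMP DESC
LIMIT 10;
```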

For example, consider the following error:

Error,2790,"An error that isn't associated with any specific source partition of the pipeline
occurred during the batch loading process. The whole batch will be failed.
Error 1205 : ""Leaf Error (svchost:3306): Lock wait timeout exceeded; try restarting transaction.
Unique key Row Value lock owned by connection id xxxx, query `open idle transaction`"""

In this case, the pipeline failed because the query was unable to acquire the row locks needed to ingest data. Identify the transaction that caused the timeout, and kill its connection.
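One way to find and end the blocking transaction is sketched below. It assumes that the statements are run on the node that owns the lock (the leaf `svchost:3306` in the example error), and that the connection ID reported in the error message is substituted for the placeholder `12345`.

```sql
-- List current connections and look for the ID named in the lock wait error.
SHOW PROCESSLIST;

-- End the idle transaction by killing its connection.
-- 12345 is a placeholder for the connection ID from the error message.
KILL CONNECTION 12345;
```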

Rename a table referenced by a pipeline

Trying to rename a table that is referenced by a pipeline returns the following error:

SQL
```
ERROR 1945 ER_CANNOT_DROP_REFERENCED_BY_PIPELINE: Cannot rename table because it is referenced by pipeline <pipeline_name>
```

The following sequence demonstrates how to rename a table that is referenced by a pipeline:

Save the pipeline settings.

SQL
```
SHOW CREATE PIPELINE <pipeline_name> EXTENDED;
```

Stop the pipeline.
SQL
```
STOP PIPELINE <pipeline_name>;
```
Drop the pipeline.
SQL
```
DROP PIPELINE <pipeline_name>;
```

Change the name of the table.

SQL
```
ALTER TABLE <old_table_name> RENAME <new_table_name>;
```

Recreate the pipeline with the required configuration options, and change the table name to reflect the new table name.
Start the pipeline.
SQL
```
START PIPELINE <pipeline_name>;
```
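The full sequence can be sketched end to end as follows. All names are hypothetical: a pipeline `orders_pipeline` that loads a table `orders`, which is renamed to `orders_v2`. The `CREATE PIPELINE` statement is only a stand-in that assumes a filesystem source and CSV data; in practice, reuse the definition saved in the first step and change only the table name.

```sql
-- 1. Save the definition so that the pipeline can be recreated later.
SHOW CREATE PIPELINE orders_pipeline EXTENDED;

-- 2 and 3. Stop and drop the pipeline that references the table.
STOP PIPELINE orders_pipeline;
DROP PIPELINE orders_pipeline;

-- 4. Rename the table.
ALTER TABLE orders RENAME orders_v2;

-- 5. Recreate the pipeline against the new table name.
--    The source path and options below are placeholders.
CREATE PIPELINE orders_pipeline AS
  LOAD DATA FS '/data/orders/*.csv'
  INTO TABLE orders_v2
  FIELDS TERMINATED BY ',';

-- 6. Start the pipeline.
START PIPELINE orders_pipeline;
```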

Pipeline errors that are handled automatically

Typical error handling scenario

In most situations, an error that occurs while a pipeline is running is handled in this way:

If an error occurs while a batch is running, the batch fails and its transactions are rolled back.

Note

Monitor the source-specific configurations that you have set to ensure that your pipelines are operating within those limits.

For example, if operation.timeout.ms is set to 10 seconds in Kafka and a Kafka offset takes 20 seconds to fetch, an error will be thrown. To avoid this error, increase the operation.timeout.ms limit.

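The following table lists events, which may or may not cause errors, and how the events are handled. In the table, `b` refers to the batch that was running when the event occurred, and `nb` refers to the new batch that the pipeline runs afterward.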

Event	How the Event is Handled
The pipeline cannot access a file or object.	`nb` skips the file/object.
The pipeline cannot read a file or object because it is corrupted.	`nb` skips the file or object. After fixing the issue with the corrupted file/object, you can have the pipeline reprocess the file/object by running `ALTER PIPELINE ... DROP FILE <filename>;`. The pipeline will process the file/object during the next batch.
A file or object is removed from the filesystem after the batch has started processing the file/object.	The batch does not fail; the file or object is processed.
A file is removed from the filesystem (or an object is removed from an object store) after the pipeline registers the file/object in `information_schema.PIPELINES_FILES`, but before the file/object is processed.	`nb` skips the file or object.
The cluster restarts while the batch is being processed.	The typical error handling scenario (mentioned earlier in this topic) applies. Once the cluster is online, `b` is retried.
A leaf node is unavailable before the pipeline starts.	This does not cause the pipeline to fail. The pipeline will not ingest any data to the unavailable leaf node.
A leaf node fails while the pipeline is running.	The batch fails. The batch is retried as described in the typical error handling scenario; that batch and all future batches no longer attempt to load data to the unavailable leaf node.
An aggregator fails while the pipeline is running.	The batch fails. When the aggregator is available, the batch is retried as described in the typical error handling scenario.
The pipeline reaches the allocated storage space for errors.	The pipeline pauses. To resolve this issue: Increase the value of the `ingest_errors_max_disk_space_mb` engine variable. Run `CLEAR PIPELINE ERRORS;` to free up storage space for errors. (Running this command removes all the existing pipeline errors that are shown when running `SHOW ERRORS;`).
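The commands mentioned in the table can be combined as in the sketch below. It assumes a pipeline named `my_pipeline` and a file named `data/part-0001.csv` (both hypothetical), and that `information_schema.PIPELINES_FILES` reports skipped files with a `FILE_STATE` of `'Skipped'`. The new storage limit is an illustrative value.

```sql
-- See which files or objects the pipeline has skipped.
-- Assumes FILE_STATE uses the value 'Skipped' for skipped files.
SELECT FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'my_pipeline'
  AND FILE_STATE = 'Skipped';

-- After fixing a corrupted file, make the pipeline forget it so that
-- the file is processed again during the next batch.
ALTER PIPELINE my_pipeline DROP FILE 'data/part-0001.csv';

-- If the pipeline paused because error storage is full, raise the limit
-- (2048 is an illustrative value).
SET GLOBAL ingest_errors_max_disk_space_mb = 2048;

-- Alternatively, free the space by removing all stored pipeline errors.
CLEAR PIPELINE ERRORS;
```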

Additional Information

For information on troubleshooting pipeline errors and performance issues, refer to Pipeline Dashboards.
