EXTRACT PIPELINE … INTO OUTFILE

This command copies a sample of the data streaming into your pipeline to a file on disk. Once the file exists, you can pipe it into your transform to test and debug the transform iteratively.

Syntax

EXTRACT PIPELINE pipe_line
[FROM 'source_partition'
[OFFSETS start_offset TO end_offset]
]
INTO OUTFILE 'file_name'

Remarks

  • pipe_line is the name of the configured pipeline.

  • file_name is the output file that will contain your sample data.

  • source_partition is a source partition ID.

  • start_offset and end_offset specify the exact range of offsets to extract from the source partition.

  • This command causes implicit commits. Refer to COMMIT for more information.

  • Refer to the Permission Matrix for the required permission.

Note

You cannot run EXTRACT PIPELINE when the pipeline is in a Running or Error state.

Return Type

A file containing sample data that can be used for debugging. For example, the following command pipes the output file into a transform script, which can reveal mistakes in how your transform code handles the data streaming into your pipeline.

cat sample_output | python transform.py
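
As a rough illustration of the debugging loop above, a transform can be written so that a bad record is reported with its line number instead of aborting the whole run. The following is only a sketch under stated assumptions: the record layout (comma-separated name,value), the tab-separated output, and the function names transform_line and run are all hypothetical and are not part of the pipeline documentation.

```python
import io


def transform_line(line):
    """Hypothetical per-record logic: 'name,value' in, 'name<TAB>value' out."""
    name, value = line.rstrip("\n").split(",")
    # int() raises ValueError on malformed values, which run() reports below.
    return f"{name}\t{int(value)}"


def run(in_stream, out_stream, err_stream):
    """Transform every record; log failures with their line number and keep going."""
    failures = 0
    for line_number, line in enumerate(in_stream, start=1):
        try:
            out_stream.write(transform_line(line) + "\n")
        except ValueError as exc:
            failures += 1
            err_stream.write(f"line {line_number}: {exc}\n")
    return failures


# Demonstration with an in-memory sample instead of a real extracted file.
sample = io.StringIO("alice,1\nbob,not_a_number\n")
out, err = io.StringIO(), io.StringIO()
failures = run(sample, out, err)
```

In an actual transform.py, the same run function would be called with sys.stdin, sys.stdout, and sys.stderr, so that the cat command shown above feeds it the extracted sample.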

Examples

The following saves random sample data.

EXTRACT PIPELINE p INTO OUTFILE 'transform_output';

The following is useful if there is a specific partition or file with a known problem.

EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';

The following extracts an exact range of data, which is useful if the problematic data is in a known region of a Kafka partition.

EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';
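
The three statements above differ only in their optional clauses, and OFFSETS is only valid inside a FROM clause. If you generate these statements from a script, a small helper can enforce that nesting. This is a sketch only: build_extract_statement is a hypothetical name, it produces statement text without executing it, and it assumes the pipeline name is a trusted identifier.

```python
def build_extract_statement(pipeline, outfile, partition=None, offsets=None):
    """Assemble an EXTRACT PIPELINE statement following the syntax in this article."""
    if offsets is not None and partition is None:
        raise ValueError("OFFSETS requires a FROM 'source_partition' clause")
    for value in (outfile, partition):
        # No escaping is attempted here; quoted values containing quotes are rejected.
        if value is not None and "'" in str(value):
            raise ValueError("single quotes are not supported in this sketch")

    parts = [f"EXTRACT PIPELINE {pipeline}"]
    if partition is not None:
        parts.append(f"FROM '{partition}'")
        if offsets is not None:
            start, end = offsets
            parts.append(f"OFFSETS {int(start)} TO {int(end)}")
    parts.append(f"INTO OUTFILE '{outfile}'")
    return " ".join(parts) + ";"


statement = build_extract_statement("p", "transform_output", partition="10", offsets=(0, 6))
```

Calling the helper with only a pipeline and an output file yields the first example, and adding partition="6" yields the second.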

Last modified: April 6, 2023
