EXTRACT PIPELINE … INTO OUTFILE
This command takes a sample of the data streaming into your pipeline and copies it into a file on disk. After your file has been created, you can pipe it back into your transform for iterative testing and debugging of the transform.
Syntax
EXTRACT PIPELINE pipe_line [FROM 'source_partition' [OFFSETS start_offset TO end_offset] ] INTO OUTFILE 'file_name'
Remarks
pipe_line is the configured pipeline.
file_name is the output file containing your sample data.
source_partition is a source partition ID.
start_offset and end_offset can be used to extract an exact range of sample data.
This command causes implicit commits. See COMMIT for more information.
See the Permission Matrix for the required permission.
Note
You cannot run EXTRACT PIPELINE when the pipeline is in a Running or Error state.
Return Type
A file containing sample data that can be used during debugging operations. For example, the following command takes the output file and pipes it into a transform script. This can reveal mistakes in how your transform code handles the data streaming into your pipeline.
cat sample_output | python transform.py
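As a rough illustration of what such a transform script might look like, here is a minimal sketch. It assumes the extracted sample is newline-delimited, comma-separated text and that the transform should emit tab-separated rows; the function names, the expected field count, and the sample records are all hypothetical, and the actual input format depends on your pipeline source and product version.

```python
import io


EXPECTED_FIELDS = 3  # hypothetical column count for the target table


def transform_record(line):
    """Turn one comma-separated input line into one tab-separated output row."""
    fields = [field.strip() for field in line.rstrip("\n").split(",")]
    if len(fields) != EXPECTED_FIELDS:
        raise ValueError(
            f"expected {EXPECTED_FIELDS} fields, got {len(fields)}"
        )
    return "\t".join(fields) + "\n"


def run(source, sink, errors):
    """Transform every line from source; report bad lines instead of crashing."""
    failed = 0
    for line_number, line in enumerate(source, start=1):
        try:
            sink.write(transform_record(line))
        except ValueError as exc:
            failed += 1
            errors.write(f"line {line_number}: {exc}\n")
    return failed


# Demonstration with an in-memory sample standing in for the extracted file.
sample = io.StringIO("1, alice ,NYC\n2,bob\n3,carol,SF\n")
out = io.StringIO()
err = io.StringIO()
failed = run(sample, out, err)
```

In a real transform.py, the demonstration lines would be replaced with a call such as `run(sys.stdin, sys.stdout, sys.stderr)` so that the piped sample file is read from standard input. Reporting the line number of each rejected record makes it easier to locate the problematic data in the extracted sample.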
Examples
The following saves random sample data.
EXTRACT PIPELINE p INTO OUTFILE 'transform_output';
The following is useful if there is a specific partition or file with a known problem.
EXTRACT PIPELINE p FROM '6' INTO OUTFILE 'transform_output';
The following extracts an exact range of data, which is useful if the problematic data lies within a known range of offsets in a Kafka partition.
EXTRACT PIPELINE p FROM '10' OFFSETS 0 TO 6 INTO OUTFILE 'transform_output';
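When extraction is scripted, the three forms above can be assembled from their optional parts. The following is a sketch only: `build_extract_statement` is a hypothetical helper, not part of any client library, and its quoting is deliberately simplistic (it only doubles single quotes), so treat it as an illustration of the syntax and not as a safe way to handle untrusted input.

```python
import re


def build_extract_statement(pipeline, outfile, partition=None, offsets=None):
    """Build an EXTRACT PIPELINE ... INTO OUTFILE statement as a string.

    partition and offsets are optional; offsets is a (start, end) tuple and,
    per the syntax, is only valid together with a partition.
    """
    # Restrict the pipeline name to a plain identifier (an assumption made
    # here to keep the sketch simple).
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", pipeline):
        raise ValueError(f"unsupported pipeline name: {pipeline!r}")
    if offsets is not None and partition is None:
        raise ValueError("OFFSETS requires a FROM partition")

    def quote(value):
        # Naive string literal quoting: double any embedded single quotes.
        return "'" + str(value).replace("'", "''") + "'"

    parts = ["EXTRACT PIPELINE", pipeline]
    if partition is not None:
        parts += ["FROM", quote(partition)]
        if offsets is not None:
            start, end = offsets
            parts += ["OFFSETS", str(int(start)), "TO", str(int(end))]
    parts += ["INTO OUTFILE", quote(outfile)]
    return " ".join(parts) + ";"


random_sample = build_extract_statement("p", "transform_output")
one_partition = build_extract_statement("p", "transform_output", partition="6")
exact_range = build_extract_statement(
    "p", "transform_output", partition="10", offsets=(0, 6)
)
```

The three variables hold the same statements as the three examples in this section, which can then be passed to whichever SQL client you use.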