CREATE PIPELINE ... WITH TRANSFORM
Creates a pipeline that uses a transform.
A transform is one of three methods you can use to shape data ingested from a pipeline.
SingleStoreDB Cloud does not support transforms.
CREATE PIPELINE ... WITH TRANSFORM: Each of the transform’s parameters is described below:
uri: The transform’s URI is the location from which the user-provided program can be downloaded, specified as either a file system or a network endpoint.
If the URI points to a tarball with a .tar.gz or .tgz extension, its contents will be automatically extracted.
If the uri contains a tarball, the program parameter must also be specified.
Alternatively, if the URI specifies the user-provided program file itself, the program and arguments parameters can be empty.
program: The filename of the user-provided program to run.
This parameter is required if a tarball was specified as the endpoint for the transform’s uri.
If the uri specifies the user-provided program file itself, this parameter can be empty.
arguments: A series of arguments that are passed to the transform at runtime.
Each argument must be delimited by a space.
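As a rough illustration of how a transform might consume these arguments, the sketch below assumes the space-delimited values reach the program as ordinary command-line arguments. The --delimiter and --skip-header flags are hypothetical names invented for this example; they are not options defined by SingleStoreDB.

```python
import argparse


def parse_transform_args(argv):
    """Parse hypothetical transform options from a list of argument strings."""
    parser = argparse.ArgumentParser(description="Example transform options")
    # Both flags are illustrative assumptions, not SingleStoreDB-defined options.
    parser.add_argument("--delimiter", default=",")
    parser.add_argument("--skip-header", action="store_true")
    return parser.parse_args(argv)


# The arguments string is space-delimited, so str.split() mimics its tokenization.
example = parse_transform_args("--delimiter ; --skip-header".split())
print(example.delimiter, example.skip_header)
```

In a real transform script, the same function would likely be called with sys.argv[1:] instead of a hard-coded string.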
For information on creating a pipeline other than using the WITH TRANSFORM clause, see CREATE PIPELINE.
WITH TRANSFORM('http://memsql.com/my-transform-tarball.tar.gz', 'my-transform.py','')
WITH TRANSFORM('http://memsql.com/my-transform-tarball.tar.gz', 'my-transform.py', '-arg1 -arg1')
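A tarball like the one referenced in these examples could be produced with Python's standard tarfile module. This is a minimal sketch: placing the script at the archive root and marking it executable are assumptions of the example, not requirements stated on this page.

```python
import tarfile
from pathlib import Path


def build_transform_tarball(script_path, tarball_path):
    """Package one transform script into a gzip-compressed tarball."""
    script_path = Path(script_path)
    with tarfile.open(tarball_path, "w:gz") as archive:
        # Store the script at the archive root under its own filename.
        info = archive.gettarinfo(str(script_path), arcname=script_path.name)
        # Assumption: mark the script executable so it can be run after extraction.
        info.mode = 0o755
        with open(script_path, "rb") as handle:
            archive.addfile(info, handle)
    return tarball_path
```

For instance, build_transform_tarball("my-transform.py", "my-transform-tarball.tar.gz") would create an archive whose program parameter is 'my-transform.py'.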
During pipeline creation, a cluster’s master aggregator distributes the transform to each leaf node in the cluster.
Each leaf node then executes the transform every time a batch partition is processed.
When the CREATE PIPELINE statement is executed, the transform must be accessible at the specified file system or network endpoint.
If the transform is unavailable, pipeline creation will fail.
Depending on the language used to write the transform and the platform used to deploy it, virtual machine overhead may greatly reduce a pipeline’s performance.
Transforms are executed every time a batch partition is processed, which can be many times per second. Virtual machine overhead will reduce the execution speed of a transform, and thus degrade the performance of the entire pipeline.
You must install any required dependencies for your transform (such as Python) on each leaf node in your cluster.
Test out your pipeline by running TEST PIPELINE before running START PIPELINE to make sure your nodes are set up properly.
Transforms can be written in any language, but the SingleStoreDB node’s host Linux distribution must have the required dependencies to execute the transform.
For example, if you write a transform in Python, the node’s Linux distribution must have Python installed and configured before it can be executed.
At the top of your transform file, use a shebang to specify the interpreter to use to execute the script (e.g., #!/usr/bin/env python3 for Python 3 or #!/usr/bin/env ruby for Ruby).
Use Unix line endings in your transform file.
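The shebang and line-ending requirements can be checked before deploying a transform. The helper below is a hypothetical preflight check written for this example, not a SingleStoreDB tool; it only inspects the file's bytes.

```python
from pathlib import Path


def check_transform_file(path):
    """Return a list of problems found in a transform script (empty if none)."""
    data = Path(path).read_bytes()
    problems = []
    # The first two bytes of the file should be the shebang marker.
    if not data.startswith(b"#!"):
        problems.append("missing shebang on the first line")
    # Any carriage return means the file does not use plain Unix line endings.
    if b"\r" in data:
        problems.append("contains carriage returns (non-Unix line endings)")
    return problems
```

An empty list means neither problem was detected. It does not confirm that the interpreter named in the shebang is installed on each node.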
A transform reads from stdin to receive data from a pipeline’s extractor.
After shaping the input data, the transform writes to stdout, which returns the results to the pipeline.
Transactional guarantees apply only to data written to stdout.
There are no transactional guarantees for any side effects that are coded in the transform logic.
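The stdin-to-stdout flow described above can be sketched in a few lines of Python. This example assumes the input arrives as newline-delimited, comma-separated text, which this page does not specify; the actual framing of data on stdin depends on the pipeline's source. The shaping step (trimming whitespace around each field) is purely illustrative.

```python
#!/usr/bin/env python3
import sys


def shape_line(line):
    """Example shaping step: trim whitespace around each comma-separated field."""
    fields = [field.strip() for field in line.rstrip("\n").split(",")]
    return ",".join(fields) + "\n"


def run_transform(in_stream, out_stream):
    """Read records from in_stream, shape them, and write them to out_stream."""
    for line in in_stream:
        out_stream.write(shape_line(line))
    out_stream.flush()


if __name__ == "__main__":
    # Only stdout is used for results; the sketch performs no other side effects.
    run_transform(sys.stdin, sys.stdout)
```

Keeping the shaping logic in a function that takes streams makes it possible to exercise the transform locally, for example by piping a small sample file into the script, before testing it with a pipeline.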
The SUPER permission is required if you use a transform and the URI in the WITH TRANSFORM clause does not have the prefix …
For an example implementation of a transform, see Writing a Transform to Use with a Pipeline.
Last modified: May 17, 2023