CREATE PIPELINE . . . WITH TRANSFORM
Creates a pipeline that uses a transform.
A transform is one of three methods you can use to shape data ingested from a pipeline.
Note
SingleStore Helios does not support transforms.
Syntax
CREATE PIPELINE . . . WITH TRANSFORM ('uri', 'program', 'arguments')
Each of the transform’s parameters is described below:
- uri: The transform’s URI is the location from which the user-provided program can be downloaded, specified as either an http:// or file:// endpoint. If the URI points to a tarball with a .tar.gz or .tgz extension, its contents will be automatically extracted. If the uri points to a tarball, the program parameter must also be specified. Alternatively, if the URI specifies the user-provided program filename itself (such as file://localhost/root/path/to/my-transform.py), the program and arguments parameters can be empty.
- program: The filename of the user-provided program to run. This parameter is required if a tarball was specified as the endpoint for the transform’s uri. If the uri specifies the user-provided program file itself, this parameter can be empty.
- arguments: A series of arguments that are passed to the transform at runtime. Each argument must be delimited by a space.
Note
For information on creating a pipeline without the WITH TRANSFORM clause, see CREATE PIPELINE.
WITH TRANSFORM('http://memsql.com/my-transform.py','','')
WITH TRANSFORM('file://localhost/root/path/to/my-transform.py','','')
WITH TRANSFORM('http://memsql.com/my-transform-tarball.tar.gz', 'my-transform.py','')
WITH TRANSFORM('http://memsql.com/my-transform-tarball.tar.gz', 'my-transform.py', '-arg1 -arg1')
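The parameter rules described above can be captured in a small helper. The following is only a sketch: with_transform_clause is a hypothetical function name, not part of any SingleStore client library, and it assumes that a tarball can be recognized purely by its .tar.gz or .tgz extension.

```python
def with_transform_clause(uri, program="", arguments=()):
    """Build a WITH TRANSFORM clause from its three parameters.

    Hypothetical helper for illustration only.
    """
    # Assumption: a tarball is identified by its file extension alone.
    is_tarball = uri.endswith((".tar.gz", ".tgz"))
    if is_tarball and not program:
        raise ValueError("program is required when uri points to a tarball")

    # Arguments are passed as a single space-delimited string.
    args = " ".join(arguments)

    # No SQL escaping is attempted, so reject values containing quotes.
    for value in (uri, program, args):
        if "'" in value:
            raise ValueError("single quotes are not supported in this sketch")

    return "WITH TRANSFORM('{}', '{}', '{}')".format(uri, program, args)


if __name__ == "__main__":
    print(with_transform_clause("http://memsql.com/my-transform.py"))
    print(with_transform_clause(
        "http://memsql.com/my-transform-tarball.tar.gz",
        "my-transform.py",
        ["-arg1", "-arg1"],
    ))
```

The two calls in the main block reproduce the first and last example clauses above, apart from whitespace after the commas.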
Remarks
- During pipeline creation, a cluster’s master aggregator distributes the transform to each leaf node in the cluster. Each leaf node then executes the transform every time a batch partition is processed.
- When the CREATE PIPELINE statement is executed, the transform must be accessible at the specified file system or network endpoint. If the transform is unavailable, pipeline creation will fail.
- Depending on the language used to write the transform and the platform used to deploy it, virtual machine overhead may greatly reduce a pipeline’s performance. Transforms are executed every time a batch partition is processed, which can be many times per second, so any overhead that slows a transform degrades the performance of the entire pipeline.
- You must install any required dependencies for your transform (such as Python) on each leaf node in your cluster. Before running START PIPELINE, run TEST PIPELINE to make sure your nodes are set up properly.
- Transforms can be written in any language, but the Linux distribution on each SingleStore node’s host must have the dependencies required to execute the transform. For example, if you write a transform in Python, Python must be installed and configured on the node’s Linux distribution before the transform can be executed.
- At the top of your transform file, use a shebang to specify the interpreter that executes the script (e.g., #!/usr/bin/env python3 for Python 3 or #!/usr/bin/env ruby for Ruby).
- Use Unix line endings in your transform file.
- A transform reads from stdin to receive data from a pipeline’s extractor. After shaping the input data, the transform writes to stdout, which returns the results to the pipeline.
- Transactional guarantees apply only to data written to stdout. There are no transactional guarantees for any side effects that are coded in the transform logic.
- The SUPER permission is required if the URI in the WITH TRANSFORM clause does not have the memsql:// prefix.
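The shebang and line-ending remarks above lend themselves to a quick local check before a transform is deployed. The sketch below is one possible way to do that; check_transform_file is a hypothetical helper name, and the check covers only those two remarks, not interpreter availability on the leaf nodes.

```python
from pathlib import Path


def check_transform_file(path):
    """Return a list of problems found in a transform script (empty if none).

    Hypothetical helper for illustration only.
    """
    data = Path(path).read_bytes()
    problems = []

    # The script should begin with a shebang such as #!/usr/bin/env python3.
    if not data.startswith(b"#!"):
        problems.append("missing shebang on the first line")

    # Windows (CRLF) line endings are not Unix line endings.
    if b"\r\n" in data:
        problems.append("contains CRLF line endings")

    return problems
```

A file that passes this check can still fail at runtime, so running TEST PIPELINE remains the more reliable way to confirm that the nodes are set up properly.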
Note
For an example implementation of a transform, see Writing a Transform to Use with a Pipeline.
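As a rough illustration of the stdin-to-stdout contract described in the remarks, the sketch below shows what a very small Python transform might look like. It assumes the extractor delivers newline-delimited CSV text on stdin and that the pipeline expects CSV back on stdout; this page does not specify the data format, so treat both as assumptions. The column being upper-cased is likewise purely hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical transform: upper-cases the second CSV column of each row."""
import csv
import sys


def transform(in_stream, out_stream):
    # Assumption: the input is CSV text, one record per line.
    reader = csv.reader(in_stream)
    # Emit Unix line endings on output.
    writer = csv.writer(out_stream, lineterminator="\n")
    for row in reader:
        if len(row) > 1:
            row[1] = row[1].upper()
        writer.writerow(row)


if __name__ == "__main__":
    # Only data written to stdout is returned to the pipeline.
    transform(sys.stdin, sys.stdout)
```

Keeping the logic in a function that takes the streams as parameters makes it possible to exercise the transform with in-memory data before running TEST PIPELINE.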
Last modified: February 5, 2024