Data Shaping with Pipelines
After data is extracted from a SingleStore Pipeline's data source, it can optionally be shaped (modified) before it is loaded.
Common data shaping operations include:
- Lookups from other SingleStore tables (in addition to the destination table(s))
- Normalizing data
- Denormalizing data
- Adding computed columns
- Filtering data (excluding specific columns or records)
- Mapping data values from the data source to new values
- Splitting records from the data source into multiple destination tables
- Adding surrogate keys
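To make several of these operations concrete at the record level, here is a minimal Python sketch. It is illustrative only: the record fields (`order_id`, `country_code`, `amount_cents`, `internal_note`), the `COUNTRY_NAMES` mapping, and the `shape` and `split` helpers are all hypothetical. In a real pipeline the same logic would live in the CREATE PIPELINE statement, a stored procedure, or a transform. Lookups, normalization, and denormalization are not shown.

```python
import itertools
from typing import Dict, Iterable, Iterator, List

# Hypothetical mapping from source country codes to display names.
COUNTRY_NAMES = {"US": "United States", "DE": "Germany"}


def shape(records: Iterable[dict], first_key: int = 1) -> Iterator[dict]:
    """Apply several common shaping operations to each source record."""
    keys = itertools.count(first_key)  # source of surrogate key values
    for rec in records:
        # Filtering records: skip rows whose amount is not positive.
        if rec["amount_cents"] <= 0:
            continue
        yield {
            # Adding a surrogate key.
            "row_id": next(keys),
            "order_id": rec["order_id"],
            # Mapping values: unknown codes pass through unchanged.
            "country": COUNTRY_NAMES.get(rec["country_code"], rec["country_code"]),
            # Adding a computed column (cents converted to dollars).
            "amount_usd": rec["amount_cents"] / 100,
            # Filtering columns: "internal_note" is simply not copied.
        }


def split(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Splitting records: route parts of each row to two destination 'tables'."""
    tables: Dict[str, List[dict]] = {"orders": [], "order_amounts": []}
    for row in rows:
        tables["orders"].append(
            {"row_id": row["row_id"], "order_id": row["order_id"], "country": row["country"]}
        )
        tables["order_amounts"].append(
            {"row_id": row["row_id"], "amount_usd": row["amount_usd"]}
        )
    return tables
```

The surrogate key is assigned only to records that survive filtering, so the keys stay dense; `split` reuses that key so the two destination tables can be joined back together.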
Data modifications made during shaping are not written back to the data source, unless done explicitly in a transform (SingleStore Self-Managed only).
Ways to specify data shaping logic:
- In a CREATE PIPELINE statement.
- In a stored procedure that is called from the pipeline.
- In a transform that is called from the pipeline.
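For the transform option, the shaping logic runs in an external program rather than in SQL. The following is a minimal Python sketch, assuming a file-based source whose CSV contents arrive on standard input and assuming the pipeline loads whatever CSV rows the program writes to standard output (other source types may frame their input differently). The column layout and the names `transform`, `transform_string`, and `main` are hypothetical; the guide Writing a Transform to Use with a Pipeline describes the actual contract.

```python
import csv
import io
import sys
from typing import TextIO


def transform(inp: TextIO, out: TextIO) -> None:
    """Stream CSV rows (order_id, country_code, amount_cents) from `inp`
    and write shaped rows (order_id, COUNTRY_CODE, amount_usd) to `out`."""
    reader = csv.reader(inp)
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if len(row) != 3:
            continue  # skip lines that do not have exactly three fields
        order_id, country_code, amount_cents = row
        cents = int(amount_cents)
        if cents <= 0:
            continue  # filter out records with non-positive amounts
        # Map the code to upper case and add a computed dollar amount;
        # integer arithmetic avoids floating-point rounding for money.
        amount_usd = f"{cents // 100}.{cents % 100:02d}"
        writer.writerow([order_id, country_code.upper(), amount_usd])


def transform_string(text: str) -> str:
    """Run the transform on an in-memory string (handy for local testing)."""
    out = io.StringIO()
    transform(io.StringIO(text), out)
    return out.getvalue()


def main() -> None:
    """Entry point when deployed: read the batch from stdin, write to stdout."""
    transform(sys.stdin, sys.stdout)
```

When deployed, the script would typically end with `if __name__ == "__main__": main()`; the guard is left out here so that importing the module for testing does not block on standard input. Processing row by row keeps memory use flat regardless of batch size.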
Methods for Data Shaping with Pipelines
The details of each data shaping method are explained in the following table.
Data Shaping Method | Amount of Customization Logic Allowed | Ease of Use | Comments | Examples |
---|---|---|---|---|
In a CREATE PIPELINE statement | Low | Easiest | Pros: Generally runs the fastest of the three data shaping methods; transactional guarantees. | |
Pipeline Stored Procedure | Medium | More Difficult | Pros: Transactional guarantees; cons of specifying data shaping logic directly in your … | See examples in CREATE PIPELINE. |
Transform | High | Most Difficult | Pros: Can use nearly any programming language and leverage third-party libraries. | See the guide Writing a Transform to Use with a Pipeline. |
Last modified: October 23, 2023