Parallelized Data Extraction with Pipelines
A pipeline extracts data from a source in parallel, following these general rules:
- The pipeline pairs n source partitions or objects with p SingleStore leaf node partitions.
- Each leaf node partition runs its own extraction process independently of other leaf nodes and their partitions.
- Extracted data is stored on the leaf node where a partition resides until it can be written to the destination table. Depending on how your table is sharded, the extracted data may be stored on this leaf node only temporarily.
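The rules above can be illustrated with a minimal Python sketch. It is not SingleStore's implementation: the round-robin pairing, the function names (`pair_partitions`, `extract`, `run_pipeline`), and the placeholder data strings are all assumptions made for illustration. It only shows the general shape of pairing n source partitions with p leaf partitions and letting each leaf partition extract independently.

```python
from concurrent.futures import ThreadPoolExecutor


def pair_partitions(n_source, p_leaf):
    """Pair n source partitions with p leaf partitions.

    Round-robin is an assumption for illustration; the actual pairing
    strategy used by the pipeline is not specified here.
    """
    assignment = {leaf: [] for leaf in range(p_leaf)}
    for src in range(n_source):
        assignment[src % p_leaf].append(src)
    return assignment


def extract(leaf, sources):
    """Hypothetical stand-in for one leaf partition's extraction process."""
    # The extracted data stays with the leaf partition that pulled it
    # until it can be written to the destination table.
    return leaf, [f"data-from-source-{s}" for s in sources]


def run_pipeline(n_source, p_leaf):
    """Run one extraction task per leaf partition, independently of the others."""
    assignment = pair_partitions(n_source, p_leaf)
    with ThreadPoolExecutor(max_workers=p_leaf) as pool:
        futures = [
            pool.submit(extract, leaf, srcs)
            for leaf, srcs in assignment.items()
        ]
        return dict(f.result() for f in futures)
```

For example, under this round-robin assumption `pair_partitions(5, 2)` assigns source partitions 0, 2, and 4 to leaf partition 0 and source partitions 1 and 3 to leaf partition 1.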
Note
The term batch partition is used below and elsewhere in the documentation.
Last modified: January 20, 2022