Parallelized Data Extraction with Pipelines

A pipeline extracts data from a source in parallel, following these general rules:

  • The pipeline pairs n source partitions or objects with p SingleStore leaf node partitions.

  • Each leaf node partition runs its own extraction process independently of other leaf nodes and their partitions.

  • Extracted data is stored on the leaf node where the extracting partition resides until it can be written to the destination table. Depending on how the destination table is sharded, the extracted data may be stored on this leaf node only temporarily.
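
The rules above can be illustrated with a small Python sketch. It is not SingleStore's scheduling logic: the round-robin pairing, the function names pair_round_robin and extract, and the source-N labels are all assumptions made for illustration. The sketch only shows the general shape, in which n source partitions are spread across p leaf node partitions and each leaf node partition then extracts independently of the others.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def pair_round_robin(source_partitions, leaf_partition_count):
    """Spread source partitions over leaf partitions (assumed round-robin)."""
    if leaf_partition_count <= 0:
        raise ValueError("leaf_partition_count must be positive")
    assignment = defaultdict(list)
    for i, src in enumerate(source_partitions):
        assignment[i % leaf_partition_count].append(src)
    return dict(assignment)


def extract(leaf_partition, sources):
    """Hypothetical stand-in for extraction: report what would be read."""
    return leaf_partition, [f"rows from {s}" for s in sources]


# n = 6 source partitions paired with p = 4 leaf node partitions.
pairs = pair_round_robin([f"source-{i}" for i in range(6)], 4)

# Each leaf partition runs its own extraction, independent of the others.
with ThreadPoolExecutor(max_workers=len(pairs)) as pool:
    results = dict(pool.map(lambda item: extract(*item), pairs.items()))
```

In this sketch, when n is greater than p some leaf partitions handle more than one source partition, and when n is smaller than p some leaf partitions receive none.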

Note

The term batch partition is used below and elsewhere in the documentation. A batch partition is a partition in the data source. If the data source does not contain partitions, then a batch partition refers to a single object in the data source.
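
The definition of a batch partition can be expressed as a short Python sketch. The dictionary layout (the partitions and objects keys) and the function name batch_partitions are hypothetical and chosen only to mirror the wording of the note; they do not correspond to any SingleStore API.

```python
def batch_partitions(source):
    """Return the batch partitions of a data source description.

    Assumed layout: a dict with an optional "partitions" list and an
    optional "objects" list. Partitions win when the source has them;
    otherwise each object counts as one batch partition.
    """
    partitions = source.get("partitions")
    if partitions:
        return list(partitions)
    return list(source.get("objects", []))


# A partitioned source: each source partition is a batch partition.
partitioned_source = {"partitions": ["p0", "p1", "p2"]}

# A source without partitions: each object is a batch partition.
object_source = {"objects": ["a.csv", "b.csv"]}
```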

Last modified: January 20, 2022