Data Loading for S3 Pipelines
For S3 pipelines, each leaf node partition processes a single object from the source bucket per batch.
If the source bucket contains objects that differ greatly in size, it’s important to understand how an S3 pipeline’s performance may be affected. Consider two partitions: partition1 is processing an object that is 1 KB in size, while partition2 is processing an object that is 10 MB in size. partition1 will finish processing its object sooner than partition2. partition1 will then sit idle and will not extract the next object from the bucket until partition2 finishes processing its 10 MB object. New objects are extracted only once partition1 and partition2 have both finished processing their respective objects.
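
The effect of uneven object sizes can be illustrated with a small toy model. This is only a sketch, not pipeline code: the function name simulate_batch and the throughput of 1 MB per second per partition are assumptions made for illustration, and the model ignores network and scheduling overhead. It assumes each partition processes exactly one object and that the batch ends when the slowest partition finishes.

```python
def simulate_batch(object_sizes, bytes_per_second):
    """Model one batch in which each partition processes exactly one object.

    Returns (batch_seconds, idle_seconds_per_partition).
    """
    # Time each partition spends actually processing its object.
    busy = [size / bytes_per_second for size in object_sizes]
    # The batch ends only when the slowest partition finishes.
    batch_seconds = max(busy)
    # Every other partition waits for the remainder of the batch.
    idle = [batch_seconds - b for b in busy]
    return batch_seconds, idle


# Hypothetical numbers: a 1 KB and a 10 MB object, 1 MB/s per partition.
batch_seconds, idle = simulate_batch([1_000, 10_000_000], bytes_per_second=1_000_000)
print(f"batch takes {batch_seconds:.3f} s")
for name, seconds in zip(["partition1", "partition2"], idle):
    print(f"{name} idle for {seconds:.3f} s")
```

Under these assumptions, partition1 is idle for nearly the whole batch, which suggests that keeping objects in the source bucket at similar sizes reduces wasted partition time.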
Last modified: June 22, 2022