Data Loading for S3 Pipelines
For S3 pipelines, each leaf node partition processes a single object from the source bucket in each batch.
If the objects in the source bucket differ greatly in size, it is important to understand how this affects an S3 pipeline's performance: a partition that finishes a small object early cannot move on until the rest of the batch is done.
For example, consider two partitions on a leaf node: partition1 is processing an object that is 1 KB in size, while partition2 is processing an object that is 10 MB in size.
partition1 will finish processing its object sooner than partition2, but it will then sit idle and will not extract the next object from the bucket until partition2 finishes processing its 10 MB object.
The batch completes only when partition1 and partition2 have both finished processing their respective objects.
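The effect described in the 1 KB / 10 MB example can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not SingleStore's actual scheduler: it assumes processing time is proportional to object size at a fixed, hypothetical throughput (`BYTES_PER_SECOND`), that objects are handed to partitions in list order, and that every batch waits for its slowest partition. The helper names `simulate_batch` and `total_pipeline_time` are hypothetical.

```python
# Simplified model (assumption): processing time is proportional to object size.
BYTES_PER_SECOND = 1_000_000  # hypothetical per-partition throughput, illustrative only


def simulate_batch(object_sizes):
    """Model one batch where partition i processes object_sizes[i] (bytes)."""
    times = [size / BYTES_PER_SECOND for size in object_sizes]
    batch_time = max(times)  # the batch ends when the slowest partition finishes
    idle = [batch_time - t for t in times]  # time each partition sits waiting
    return {"batch_time": batch_time, "idle": idle}


def total_pipeline_time(object_sizes, num_partitions):
    """Sum batch durations, assigning objects to partitions in list order (assumption)."""
    total = 0.0
    for start in range(0, len(object_sizes), num_partitions):
        batch = object_sizes[start:start + num_partitions]
        total += max(batch) / BYTES_PER_SECOND  # each batch waits for its largest object
    return total


if __name__ == "__main__":
    result = simulate_batch([1_000, 10_000_000])  # 1 KB and 10 MB objects
    print(result["batch_time"], result["idle"])
```

In this model, partition1 is idle for almost the entire 10-second batch, so roughly half of the two partitions' combined capacity goes unused. Under the same assumptions, grouping objects of similar size into the same batch reduces that idle time.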
Last modified: June 22, 2022