Data Reflection and Query Acceleration

Data Reflection

Dremio maintains physically optimized representations of source data known as Data Reflections. The query optimizer can accelerate a query by utilizing one or more Data Reflections to partially or entirely satisfy that query, rather than processing the raw data in the underlying data source.

Dremio supports two fundamental types of Data Reflections: Raw Reflections and Aggregation Reflections. The two types share many of the same configuration and management options, but each is designed to accelerate a different kind of query pattern.

Raw Reflections

Raw Reflections preserve the row-level fidelity of the anchor dataset. A Raw Reflection includes one or more fields from the anchor dataset and is sorted and partitioned by specified fields of that dataset, as Dremio retrieves it from the source database. You can use Raw Reflections for several optimizations, such as reading only the fields a query needs or skipping partitions that a query's filter excludes.
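To make the idea concrete, here is a minimal in-memory sketch of what a Raw Reflection keeps and how partitioning helps at read time. It is not Dremio's API or storage format: `build_raw_reflection`, `scan_with_pruning`, and the sample `orders` rows are hypothetical, and a Python dict of lists stands in for the reflection's physical storage.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical in-memory model; not Dremio's storage format or API.
Row = dict[str, Any]


def build_raw_reflection(rows: list[Row], fields: list[str],
                         partition_by: str, sort_by: str) -> dict[Any, list[Row]]:
    """Project `fields`, group rows by `partition_by`, and sort each group by `sort_by`."""
    partitions: dict[Any, list[Row]] = defaultdict(list)
    for row in rows:
        # Row-level fidelity: every source row yields exactly one reflection row.
        partitions[row[partition_by]].append({f: row[f] for f in fields})
    for part in partitions.values():
        part.sort(key=lambda r: r[sort_by])
    return dict(partitions)


def scan_with_pruning(reflection: dict[Any, list[Row]],
                      keep_partition: Callable[[Any], bool]) -> list[Row]:
    """Read only the partitions whose key passes the filter (partition pruning)."""
    result: list[Row] = []
    for key, part in reflection.items():
        if keep_partition(key):
            result.extend(part)
    return result


# Usage with hypothetical sample data.
orders = [
    {"id": 1, "region": "EU", "amount": 30.0, "note": "gift"},
    {"id": 2, "region": "US", "amount": 10.0, "note": ""},
    {"id": 3, "region": "EU", "amount": 20.0, "note": "rush"},
]
refl = build_raw_reflection(orders, ["id", "region", "amount"],
                            partition_by="region", sort_by="amount")
eu_rows = scan_with_pruning(refl, lambda region: region == "EU")
print([r["id"] for r in eu_rows])  # [3, 1]: the US partition is never read
```

The `note` column is left out of this reflection, so a query that reads `note` could not be answered from it alone.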

Aggregation Reflections

Aggregation Reflections maintain summary data about the anchor dataset. An Aggregation Reflection includes one or more dimension and measure fields from the anchor dataset and is sorted, partitioned, and distributed by specified columns. Columns configured as dimensions are those that queries use in GROUP BY (or DISTINCT) clauses; columns configured as measures are those that queries pass to aggregate functions such as MAX, MIN, AVG, SUM, and COUNT.
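The sketch below illustrates the dimension/measure split in plain Python. It is an assumption-based model, not Dremio's implementation: `build_agg_reflection`, `rollup`, and the sample `sales` rows are hypothetical. It stores SUM, COUNT, MIN, and MAX for each combination of dimension values and derives AVG as SUM / COUNT, which is one common way to let a summary answer GROUP BY queries on a subset of its dimensions.

```python
from typing import Any

# Hypothetical model of an aggregation reflection; not Dremio's API.
Stats = dict[str, float]


def _merge(into: Stats, other: Stats) -> None:
    """Combine two partial aggregates that belong to the same group."""
    into["sum"] += other["sum"]
    into["count"] += other["count"]
    into["min"] = min(into["min"], other["min"])
    into["max"] = max(into["max"], other["max"])


def build_agg_reflection(rows: list[dict[str, Any]], dimensions: list[str],
                         measure: str) -> dict[tuple, Stats]:
    """Precompute SUM, COUNT, MIN, and MAX of `measure` per dimension-value combination."""
    summary: dict[tuple, Stats] = {}
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        v = row[measure]
        stats = {"sum": v, "count": 1, "min": v, "max": v}
        if key in summary:
            _merge(summary[key], stats)
        else:
            summary[key] = stats
    return summary


def rollup(summary: dict[tuple, Stats], dimensions: list[str],
           group_by: list[str]) -> dict[tuple, Stats]:
    """Answer a GROUP BY on a subset of the dimensions by merging summary rows."""
    positions = [dimensions.index(g) for g in group_by]
    merged: dict[tuple, Stats] = {}
    for key, stats in summary.items():
        new_key = tuple(key[i] for i in positions)
        if new_key in merged:
            _merge(merged[new_key], stats)
        else:
            merged[new_key] = dict(stats)  # copy so the reflection itself is not mutated
    # AVG is derived from SUM and COUNT; averaging per-group averages would be
    # wrong whenever the groups have different sizes.
    return {k: {**s, "avg": s["sum"] / s["count"]} for k, s in merged.items()}


sales = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 30},
    {"region": "US", "product": "A", "amount": 5},
    {"region": "EU", "product": "A", "amount": 20},
]
refl = build_agg_reflection(sales, ["region", "product"], "amount")
by_region = rollup(refl, ["region", "product"], ["region"])
print(by_region[("EU",)])  # {'sum': 60, 'count': 3, 'min': 10, 'max': 30, 'avg': 20.0}
```

Here `region` and `product` play the role of dimensions and `amount` the role of a measure; the per-region query is answered from three summary rows instead of the four source rows.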

Query Acceleration

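Dremio uses Data Reflections to accelerate queries. Choosing whether and how to use them involves two steps: finding the Data Reflections that can answer the query, and then selecting the lowest-cost plan.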

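When Dremio receives a user query, it first determines whether any Data Reflections have at least one physical dataset in common with the query, after both the query and the reflections have undergone dataset expansion (that is, after each virtual dataset they reference has been replaced by the physical datasets it is built on). Every Data Reflection that passes this step is then evaluated to determine whether it covers the query, that is, whether it can supply the data needed to satisfy all or part of the query.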
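The two checks in this step can be modeled roughly as follows. This is a simplified, hypothetical sketch: the dataset names, the `definitions` mapping, and the helper functions are illustrative only, and `covers` treats coverage as "the reflection contains every field the query reads", which is far simpler than the plan matching Dremio actually performs.

```python
from typing import Mapping

# Hypothetical catalog: each virtual dataset maps to the datasets it is defined on.
# Names missing from the mapping are treated as physical datasets. Assumed acyclic.
Definitions = Mapping[str, list[str]]


def expand(dataset: str, definitions: Definitions) -> set[str]:
    """Dataset expansion: recursively replace virtual datasets with physical ones."""
    parents = definitions.get(dataset)
    if parents is None:
        return {dataset}
    physical: set[str] = set()
    for parent in parents:
        physical |= expand(parent, definitions)
    return physical


def candidate_reflections(query_datasets: list[str], reflections: dict[str, dict],
                          definitions: Definitions) -> list[str]:
    """Keep reflections that share at least one physical dataset with the query."""
    query_physical: set[str] = set()
    for d in query_datasets:
        query_physical |= expand(d, definitions)
    return [name for name, r in reflections.items()
            if query_physical & expand(r["anchor"], definitions)]


def covers(reflection: dict, query_fields: set[str]) -> bool:
    """Simplified coverage test: the reflection holds every field the query reads."""
    return query_fields <= set(reflection["fields"])


definitions = {
    "vds.eu_orders": ["pds.orders"],
    "vds.orders_with_customers": ["vds.eu_orders", "pds.customers"],
}
reflections = {
    "raw_orders": {"anchor": "vds.eu_orders", "fields": ["id", "region", "amount"]},
    "raw_products": {"anchor": "pds.products", "fields": ["sku", "name"]},
}
candidates = candidate_reflections(["vds.orders_with_customers"], reflections, definitions)
usable = [n for n in candidates if covers(reflections[n], {"id", "amount"})]
print(candidates, usable)  # ['raw_orders'] ['raw_orders']
```

`raw_products` is discarded in the first check because, after expansion, it shares no physical dataset with the query.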

For each Data Reflection that covers the query, Dremio estimates the cost of executing the query using that reflection. These costs are compared with the cost of executing the query directly against the physical datasets, and the lowest-cost plan is selected for physical plan generation. Typically, a plan that uses one or more Data Reflections is less expensive than one that reads the raw physical data.
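A minimal sketch of the final selection, assuming each candidate plan already carries a cost estimate: the lowest-cost plan wins, and the plan that reads the physical datasets directly is always one of the options. `Plan`, `choose_plan`, and the cost numbers are hypothetical; Dremio's actual cost model is not represented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate query plan with an estimated cost in abstract units (hypothetical)."""
    description: str
    estimated_cost: float


def choose_plan(raw_plan: Plan, reflection_plans: list[Plan]) -> Plan:
    """Pick the cheapest plan; the raw-data plan is always a candidate.

    min() returns the first minimum it encounters, so on a tie this sketch
    keeps the raw-data plan.
    """
    return min([raw_plan, *reflection_plans], key=lambda p: p.estimated_cost)


raw = Plan("scan physical dataset pds.orders", 1000.0)
with_reflections = [
    Plan("scan raw reflection raw_orders", 120.0),
    Plan("scan aggregation reflection agg_orders", 15.0),
]
print(choose_plan(raw, with_reflections).description)  # scan aggregation reflection agg_orders
print(choose_plan(raw, []).description)                # scan physical dataset pds.orders
```

Passing an empty list models the case where no reflection covers the query, so the query runs against the raw data.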

Last modified: April 24, 2021

