Best practices guide
The following sections describe best practices for administering and operating a Dremio cluster.
Think in Terms of Several Discrete Data Reflections
Data Reflections let administrators take an iterative approach to performance optimization, building several discrete reflections rather than a single one that tries to cover every query.
Optimize Data Reflections
To determine the optimal set of Data Reflections, administrators should isolate known query patterns into groups that do not interact with one another. This approach offers several benefits:
- Smaller Data Reflections on disk.
- More efficient Data Reflection maintenance.
- More efficient query execution.
Keep in mind that a single query can use multiple Data Reflections and a single Data Reflection can serve many queries.
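One way to start isolating query patterns is to describe each pattern by the set of fields it reads and to merge patterns that share fields. The Python sketch below assumes that "do not interact" means "touch no common fields"; that definition, the `group_query_patterns` function, and the dataset and field names are hypothetical illustrations, not part of Dremio. It uses a small union-find structure so that each resulting group is a candidate for its own Data Reflection.

```python
from collections import defaultdict


def group_query_patterns(patterns):
    """Group query patterns that share at least one field (hypothetical helper).

    Assumption: two patterns "interact" if they read any common field.
    patterns: dict mapping a pattern name to the set of fields it reads.
    Returns a sorted list of groups, each a sorted list of pattern names;
    patterns in different groups read disjoint sets of fields.
    """
    parent = {name: name for name in patterns}

    def find(x):
        # Find the group representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the groups containing a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link each pattern to the first pattern seen reading the same field.
    first_reader = {}
    for name, fields in patterns.items():
        for field in fields:
            if field in first_reader:
                union(first_reader[field], name)
            else:
                first_reader[field] = name

    groups = defaultdict(list)
    for name in patterns:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical query patterns and the fields each one reads.
    patterns = {
        "daily_sales": {"sales.date", "sales.region", "sales.amount"},
        "regional_totals": {"sales.region", "sales.amount"},
        "support_backlog": {"tickets.status", "tickets.opened_at"},
    }
    for group in group_query_patterns(patterns):
        print(group)
```

Running it prints `['daily_sales', 'regional_totals']` followed by `['support_backlog']`: the two sales patterns share fields and land in one group, while the support pattern is independent. Because a single query can use several Data Reflections, this grouping is only a starting point for deciding which reflections to build.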
Accelerate a query pattern
Dremio supports two types of Data Reflections: Raw Reflections and Aggregation Reflections.
Aggregation
Dremio can pre-aggregate data at multiple levels of granularity.
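Pre-aggregating at a fine grain can still serve coarser queries when the aggregates are additive. The Python sketch below is a conceptual illustration only, not a description of how Dremio stores or matches Aggregation Reflections; the rows, columns, and function names are hypothetical. It pre-aggregates SUM and COUNT per day and region, then answers a per-month query from that result without rescanning the raw rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (day as "YYYY-MM-DD", region, amount).
ROWS = [
    ("2022-06-01", "east", 10.0),
    ("2022-06-01", "east", 5.0),
    ("2022-06-02", "west", 7.5),
    ("2022-07-01", "east", 2.5),
]


def pre_aggregate(rows):
    """Fine-grained pre-aggregate: (SUM, COUNT) per (day, region)."""
    agg = defaultdict(lambda: [0.0, 0])
    for day, region, amount in rows:
        cell = agg[(day, region)]
        cell[0] += amount
        cell[1] += 1
    return {key: tuple(v) for key, v in agg.items()}


def roll_up_to_month(fine):
    """Answer a coarser (month, region) query from the fine pre-aggregate.

    SUM and COUNT roll up by re-summing the partial results; AVG is
    derived as SUM / COUNT instead of averaging the daily averages.
    """
    coarse = defaultdict(lambda: [0.0, 0])
    for (day, region), (total, count) in fine.items():
        cell = coarse[(day[:7], region)]  # "YYYY-MM" prefix of the date
        cell[0] += total
        cell[1] += count
    return {key: (total, count, total / count) for key, (total, count) in coarse.items()}


if __name__ == "__main__":
    fine = pre_aggregate(ROWS)
    for key, value in sorted(roll_up_to_month(fine).items()):
        print(key, value)
```

Keeping COUNT alongside SUM is what makes AVG recoverable at the coarser level; averaging the daily averages would give every day equal weight regardless of how many rows it contains.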
Calculated Fields
For calculated fields that Data Consumers use frequently, administrators have two options for accelerating these calculations:
- Add the calculated field to a virtual dataset. The administrator can add a new column that computes the calculation. Depending on the expression, Dremio may be able to match queries to the new column without Data Consumers referencing it explicitly; otherwise, they must include the new column in their queries.
- Use a Supporting Anchor Dataset. The administrator can create a Supporting Anchor Dataset that includes the calculated field along with other fields from the dataset, and Dremio automatically uses the associated Data Reflection to accelerate the query.
Last modified: June 22, 2022