Important
The SingleStore 9.1 release candidate (RC) gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 9.0 is recommended for production workloads, which can later be upgraded to SingleStore 9.1.
Cross Query Stats and Sampling Cache
On this page
The cross-query stats cache is an in‑memory cache on each aggregator node that stores rowcount statistics and sampling results from previously executed queries.
-
Reuse runtime rowcount statistics across related queries, improving cardinality estimates and enabling better plans for common subqueries in future executions.
-
Cache sampling query results so that the optimizer can avoid rerunning the same sampling queries, reducing optimization time and resource usage.
Each cache entry is identified by a 64‑bit hash derived from the query or subquery shape.
This feature is available in SingleStore 9.
Write Path: Collecting Runtime Stats
The cross-query stats cache can store runtime rowcount statistics and sampling results from executed queries.cross_ global variable controls which statistics are written into the cache and supports the following values:
-
Off: No cross-query stats are stored in the cross-query stats cache.
-
SamplingOnly: Only results from sampling queries are collected.
-
SamplingAndMisEstimated: Sampling results and significantly misestimated feedback rowcounts are stored, saving memory while still correcting bad estimates.
-
Full: All legitimate runtime rowcounts that can be used for future optimization are stored.
-
Auto: Reserved for future flexibility; currently behaves the same as
Offand is the default value.
SingleStore heuristically decides which profiled rowcounts are legitimate to store, and exposes the tracked entries in query profiles using the fr_ and fr_ fields.
Read Path: Using Cached Stats During Optimization
The optimizer can read the cross-query stats cache in two ways:
-
Use cached feedback rowcounts when estimating cardinalities during optimization.
-
Use cached sampling results instead of issuing new sampling queries.
These behaviors are controlled by the following session variables:
-
enable_: Use the cache when the optimizer would otherwise issue a sampling query.cross_ query_ stats_ for_ sampling -
optimizer_: Use runtime feedback rowcounts from the cache during cardinality estimation.use_ cross_ query_ feedback_ stats
Both variables are ENUMs with values Off, On, and Auto, and default to Auto (currently equivalent to Off).optimizer_ variable also affects the plan cache query hash, so changing it can cause new plans to be generated.
When enable_ is enabled, the optimizer reuses cached sampling results instead of issuing new sampling queries.
Cached sampling results expire based on cross_ and cross_.DROP SAMPLE FROM cross_) trigger new sampling queries.
When optimizer_ is enabled and a matching cached rowcount is found, that rowcount takes priority over other estimation sources, and is propagated through later phases of optimization.
Cache Management, TTL, and Memory Limits
The cross-query stats cache is a TTL‑based in‑memory cache with garbage collection behavior similar to the in‑memory plan cache.
The following global variables control expiry and memory usage:
|
Global Variable |
Description and Default Value |
|---|---|
|
|
A cache entry must be reused (written or read) at least once within this interval, otherwise it is evicted. Default: 12 × 60 minutes (12 hours) |
|
|
Soft memory limit (in megabytes) for the entire cache. Default: 1024 MB (1 GB) |
|
|
Maximum time window (in minutes) since last update during which sampling entries remain valid for reuse. Default: 5 minutes |
You can also configure whether to store query text for sampling entries using the global variable enable_.
Cache entries can also be cleared manually.
Observability and Profiling
Cross-query stats are primarily observable through query profiles.
-
fr_: Table identifier used by feedback reoptimization.track_ table_ id -
fr_Hash derived from the query/subquery shape; this hash matches thetrack_ hash: HASHcolumn in the cache tables.
In profiles, est_ indicates the source of estimated rowcounts.
-
CROSS_: Stats originate from the cross-query stats cache.QUERY_ STATS -
CROSS_: Sampling results are reused from the cache.QUERY_ CACHED_ SAMPLING
Additional observability is provided by the information schema table, its managed view, and the SHOW CROSS_ command.INFORMATION_ and INFORMATION_ for more information.
Interactions with Auto Profiling and Feedback Reoptimization
The cross-query stats cache is closely related to existing profiling and reoptimization features:
-
With auto profile smart mode,
enable_defaults toauto_ profile TRUEandauto_defaults toprofile_ type SMART.Under this mode, SingleStore automatically profiles runtime rowcounts for queries with distributed operations. When cross_is enabled, the system reuses these profiled stats to populate the cache, without changing auto profile behavior itself.query_ stats_ collection_ strategy -
(Auto) Feedback Reoptimization and cross-query stats both aim to improve plans using runtime rowcounts:
-
Feedback reoptimization reoptimizes a query using stats from its own execution.
-
Cross-query stats cache allows a query to leverage stats from other queries that share common subpatterns.
-
A more general hash algorithm for arbitrary subquery structures is integrated into feedback reoptimization, removing a previous limitation in SingleStore 9.
Examples
Example 1: Enable Collection and Inspect Cache Entries
The following steps show how to enable cross-query stats collection, run a query, and inspect the resulting cache entries.
-- Enable collection of all legitimate runtime rowcountsSET GLOBAL cross_query_stats_collection_strategy = 'Full';-- Run a representative queryPROFILE SELECT COUNT(*) FROM t WHERE created_at >= '2025-01-01';-- Inspect cache entries on the connected aggregatorSELECT * FROM information_schema.cross_query_stats_cache;-- View aggregate cache statisticsSHOW cross_query_stats_cache STATUS;
In this scenario:
-
The
PROFILEquery populates entries in the cross-query stats cache (subject to the heuristics described earlier). -
INFORMATION_exposes individual entries, includingSCHEMA. CROSS_ QUERY_ STATS_ CACHE HASH,ROWCOUNT_,INITIALIZED ROWCOUNT, andHIT_.COUNT -
SHOW cross_returns totals such as:query_ stats_ cache STATUS -
Number of cache entries
-
Total cache hits
-
Total memory in use (in bytes).
-
Example 2: Reuse Sampling Results and Reset Sampling Entries
This example enables reuse of sampling results, then shows how to drop only sampling-related entries without clearing the entire cache.
-- Allow sampling queries to use cached resultsSET SESSION enable_cross_query_stats_for_sampling = 'On';-- Run a query that requires samplingEXPLAIN SELECT * FROM t WHERE some_predicate;-- Later, reset only sampling-related cache entriesDROP SAMPLE FROM cross_query_stats_cache;-- Optionally, check status after the dropSHOW cross_query_stats_cache STATUS;
In this scenario:
-
Turning on
enable_lets sampling queries check the cache first and reuse stored rowcounts when valid.cross_ query_ stats_ for_ sampling -
DROP SAMPLE FROM cross_invalidates sampling-related entries (for example, after data or workload changes), while preserving other feedback rowcount entries.query_ stats_ cache
Last modified: