The first time a query is executed against MemSQL, it is compiled and cached in memory (see Code Generation).
User queries are always directed to an aggregator. Except for DDL operations and writes to reference tables, which must go through the master aggregator, any query can be run against any aggregator.
Queries that involve only reference tables are executed directly on the aggregator. The aggregator does not send these queries to the leaves because reference tables are fully replicated to every aggregator and leaf.
Queries that involve sharded tables are more involved.
In the simplest case, the query involves only data on one partition, so that the query can be forwarded to the correct leaf unchanged except for rewriting the database name to reflect the partition. An example is
INSERT INTO db.table VALUES (15). If the row maps to partition 3 of
db, the query can be rewritten as
INSERT INTO db_3.table VALUES (15)and forwarded to the correct leaf.
If the query involves data that spans more than one partition, however, then the aggregator combines results from many leaves. For example,
SELECT COUNT(*) from twill send a
COUNT(*)to each partition, sum up the responses, and return the final result as one row to the user.
Some queries involve significantly more query transformation and aggregation logic, but they follow the same general flow. Note that running EXPLAIN on a query will show the planned division of labor between aggregator and leaves, including the rewritten queries that will be sent to the leaves.