Read Query Results in Parallel

In SingleStore, the results of a SELECT statement can be read in parallel by multiple readers. Each reader queries a unique partition, which contains a slice of the result set.

Parallel query reads are intended for use by client applications that query data from SingleStore. These client applications need to connect to the master aggregator to run parallel query reads.

A parallel query read can be coded in your client application logic using the following steps, in order.

  1. Create a result table to store the result set whose data the application will query in parallel.

    CREATE [MATERIALIZED] RESULT TABLE <result_table_name> AS SELECT ... FROM <table(s)>;

    The result table can be created in one of two read modes:

    • Single-read mode: Each reader can read the query results, from a specified partition, only once.

    • Multiple-read mode: Each reader can read the query results, from a specified partition, more than once. Also, multiple readers can read from the same partition. In this mode, creating the result table is slower and consumes more memory, as compared to single-read mode. To create the result table in multiple-read mode, specify the MATERIALIZED option.

    If the contents of <table(s)> change after the result table is created, the contents of the result table are not updated.

    Note

    The connection that is used to create the result table (both non-materialized and materialized) must be open while a parallel read is in progress. Once the connection used to run CREATE RESULT TABLE ... is closed, the result table is dropped.

  2. From each reader, query a partition in the result table that was created in the previous step.

    SELECT * FROM :: <result_table_name> WHERE partition_id() = <partition_id>;

    There are some limits on the query shapes that are supported by the aggregator result table:

    • SELECT from an aggregator results table. It can only include * or a list of columns. There should be only exactly one partition_id() = <partition> filter. It can include an optional partition_row_id() comparison operator filter (=, <, >, <=, >=, or !=). For example:

      SELECT * FROM :: t1_result_table WHERE partition_id() = 1 and partition_row_id() < 4;
    • INSERT INTO … SELECT from an aggregator result table into a sharded table. It can only include * or a list of columns. The aggregator result table and the sharded table should have a matching shard key or an error will be generated. Filters are not allowed. For example:

      INSERT INTO t1 (colint, colchar, colst) SELECT * FROM :: t1_result_table;

    Note

    In single-read mode, each reader needs to run in a separate parallel process or thread. Also, in single-read mode, each reader must initiate its read query before any readers will receive data.

    To find the number of partitions in a database, run:

    SELECT num_partitions FROM information_schema.DISTRIBUTED_DATABASES WHERE database_name = '<database_name>';
  3. When readers no longer read from the result table, drop the result table.

    DROP RESULT TABLE <result_table_name>;

    DROP RESULT TABLE can be run on the master aggregator or a child aggregator.

Last modified: February 27, 2024

Was this article helpful?