Analyzing Time Series Data

Time series data consists of sequences of events, where each event includes a timestamp. Examples include events generated by utilities, energy production infrastructure, financial applications, software services, and Internet of Things (IoT) devices. This topic explains how to ingest, structure, and query time series data in SingleStore.

Storing Time Series Data

SingleStore stores time series data in rowstore or columnstore tables. Each row must include a time-based attribute that records when the event occurred.

Use DATETIME(6) to define the time attribute when fractional-second precision is required.
Use DATETIME, if fractional-second precision is not required.
Use DATETIME(6) or DATETIME instead of TIMESTAMP, because timestamp values are limited to dates through 2038.
Applications must supply timestamp values directly. Do not rely on server-side defaults such as NOW().
By default, the time attribute uses the column name ts.

The following example creates a table that stores time series events generated by a wind turbine.

SQL

CREATE TABLE turbine_reading(
 tid int NOT NULL, -- turbine ID
 ts datetime(6) NOT NULL,
 rpm double,
 temperature double,
 vibration double,
 output double,
 wind_direction double,
 wind_speed double,
 SHARD(tid),
 SORT KEY(ts)
);

This table uses a columnstore table by default. Columnstore tables store data on disk, scale efficiently to large data sets, and provide the best performance for analytical queries that process large volumes of data. For most time series workloads, SingleStore recommends columnstore tables.

If a time series table contains a large number of attributes and queries frequently retrieve individual rows, consider adding a column group index to the columnstore table. A column group index improves row lookup performance without sacrificing the scalability, compression, and scan performance benefits of columnstore storage.

Consider using a rowstore table if either of the following conditions apply:

Your workload has a high update rate and the entire dataset fits in available memory.
You need low-latency point-in-time lookups using AS OF queries. Refer to Finding a Row current AS OF a Point in Time for more information. Rowstore query processing can efficiently locate a small number of rows within a narrow time range and return rows with many attributes more efficiently than columnstore processing for this access pattern.

SingleStore recommends creating a SORT KEY on the ts column because time series queries filter on ranges of timestamp values. For columnstore tables, a SORT KEY stores the data in timestamp order, which allows the query engine to process range filters efficiently through segment elimination. For rowstore tables, a KEY on the ts column creates an index that enables efficient processing of range filters.

For time series data sets with a large number of attributes, where applications retrieve all table attributes and the data fits within the available table memory, use a rowstore table to store the data. Rowstore query processing can efficiently locate a small number of rows within a narrow time range and return rows with many attributes more efficiently than columnstore query processing.

Descriptive Data

Store descriptive properties that remain unchanged across time series events in a separate table. For example, store information about individual turbines in a table such as the following:

SQL

CREATE REFERENCE TABLE turbine(
 tid int,
 name varchar(60),
 model varchar(60),
 max_output double,
 latitude double,
 longitude double,
 PRIMARY KEY(tid)
);

Use a reference table for small collections of descriptive properties. For larger collections, use a standard (partitioned) table.

The descriptive data table described in the previous section serves as a dimension table that links to the fact table containing the time series events. Dimensional modeling concepts commonly used in data warehouses also apply to time series data.

The following example uses the sample turbine data in the table turbine:

SQL

INSERT INTO turbine VALUES
 (1, 'Hood River A', 'Volkswind Mega 5', 5.0, 47.130, 113.187),
 (2, 'Hood River B', 'Volkswind Mega 5+', 5.3, 47.141, 113.199);

Ingesting Time Series Data

SingleStore supports multiple methods for ingesting time series data. Select the method that matches with your workload:

Pipelines: Use pipelines for continuous ingestion from files or Kafka topics and for high-volume data loads.
LOAD DATA: Use LOAD DATA for bulk loading historical time series data from files.
INSERT: Use INSERT for ingesting individual rows or small batches directly from an application.

You can use any combination of these methods. SingleStore optimizes data ingestion across all supported ingestion paths.

Time Series Functions

SingleStore provides a set of built-in functions designed to work together for time series analysis:

Function	Type	Purpose
`FIRST`	Aggregate	Returns the value associated with the earliest timestamp in a group
`LAST`	Aggregate	Returns the value associated with the latest timestamp in a group
`TIME_BUCKET`	Scalar	Groups timestamps into fixed-width intervals

In addition to time series functions, you can use the SERIES TIMESTAMP option in the CREATE TABLE command to designate a table column as the default timestamp for time series functions.These features eliminate the need for complex window functions and CTEs when performing common time series operations such as downsampling, open/close calculations, and time-based grouping.

The following example demonstrates the use of FIRST, LAST, TIME_BUCKET, and SERIES TIMESTAMP in a single query that downsamples 5-minute candlestick bars from raw trade data:

SQL

CREATE TABLE tick (
    ts DATETIME(6) SERIES TIMESTAMP,
    symbol VARCHAR(5),
    price NUMERIC(18,4),
    volume INT,
    SORT KEY (ts),
    SHARD KEY (symbol)
);

INSERT INTO tick VALUES
  ('2024-01-15 09:30:15.000000', 'AAPL', 185.20, 1500),
  ('2024-01-15 09:31:45.000000', 'AAPL', 185.45, 800),
  ('2024-01-15 09:33:20.000000', 'AAPL', 185.10, 1200),
  ('2024-01-15 09:35:50.000000', 'AAPL', 185.80, 2000),
  ('2024-01-15 09:37:10.000000', 'AAPL', 186.00, 900),
  ('2024-01-15 09:38:30.000000', 'AAPL', 185.90, 1100),
  ('2024-01-15 09:40:05.000000', 'AAPL', 186.20, 1800),
  ('2024-01-15 09:42:45.000000', 'AAPL', 186.50, 700),
  ('2024-01-15 09:44:15.000000', 'AAPL', 186.10, 1300);

SELECT TIME_BUCKET('5m') AS bucket,
       symbol,
       FIRST(price) AS open,
       LAST(price) AS close,
       MAX(price) AS high,
       MIN(price) AS low,
       SUM(volume) AS total_volume
FROM tick
WHERE symbol = 'AAPL'
  AND ts >= '2024-01-15 09:30:00'
  AND ts < '2024-01-15 16:00:00'
GROUP BY bucket, symbol
ORDER BY bucket;

+----------------------------+--------+----------+----------+----------+----------+--------------+
| bucket                     | symbol | open     | close    | high     | low      | total_volume |
+----------------------------+--------+----------+----------+----------+----------+--------------+
| 2024-01-15 09:30:00.000000 | AAPL   | 185.2000 | 185.1000 | 185.4500 | 185.1000 |         3500 |
| 2024-01-15 09:35:00.000000 | AAPL   | 185.8000 | 185.9000 | 186.0000 | 185.8000 |         4000 |
| 2024-01-15 09:40:00.000000 | AAPL   | 186.2000 | 186.1000 | 186.5000 | 186.1000 |         3800 |
+----------------------------+--------+----------+----------+----------+----------+--------------+

Querying Time Series Data

Continuing the wind turbine example from the previous section, assume the following data is inserted into the turbine_reading table:

SQL

INSERT INTO turbine_reading VALUES
 (1, '2020-03-14 13:00:33', 10, 33, 100, 1000000, 90, 15),
 (1, '2020-03-14 13:00:34', 10, 33, 100, 1000000, 90, 15),
 (1, '2020-03-14 13:00:35', 11, 33, 105, 1050000, 91, 16),
 (1, '2020-03-14 13:00:36', 11, 33.1, 104, 1000000, 90, 16),
 (2, '2020-03-14 13:00:33', 18, 30, 170, 2000000, 0, 23),
 (2, '2020-03-14 13:00:34', 18, 30, 170, 2000000, 0, 23),
 (2, '2020-03-14 13:00:35', 18.5, 30, 176, 2050000, 0, 23.5),
 (2, '2020-03-14 13:00:36', 19, 30.1, 174, 2070000, 1, 23.6),
 (1, '2020-03-15 13:00:33', 11, 32, 99, 1010000, 45, 15.1),
 (1, '2020-03-15 13:00:34', 11, 32, 99, 1020000, 45, 15.2),
 (1, '2020-03-15 13:00:35', 12, 32.1, 101, 1030000, 45, 15.2),
 (1, '2020-03-15 13:00:36', 13, 32.15, 102, 1030000, 46, 15.2);

The following query demonstrates how to calculate the average of all time series values in the table.

SQL

-- average RPM by turbine
SELECT tid, AVG(rpm)
FROM turbine_reading
GROUP BY tid;

+-----+----------+
| tid | AVG(rpm) |
+-----+----------+
|   2 |   18.375 |
|   1 |   11.125 |
+-----+----------+

Time Bucketing

Use the TIME_BUCKET function to group timestamps into fixed intervals and return the start time of each bucket.

The following example uses TIME_BUCKET to group time series data into 5-day intervals and calculate the average value for each interval:

SQL

SELECT tid, TIME_BUCKET("5d", ts), AVG(output) FROM turbine_reading GROUP BY 1, 2 ORDER BY 1, 2;

+-----+----------------------------+-------------+
| tid | TIME_BUCKET("5d", ts)      | AVG(output) |
+-----+----------------------------+-------------+
|   1 | 2020-03-13 00:00:00.000000 |     1017500 |
|   2 | 2020-03-13 00:00:00.000000 |     2030000 |
+-----+----------------------------+-------------+

To bucket data by day, one approach is to cast a high-resolution DATETIME(6) value to the DATE data type. To bucket data by a fixed interval of N seconds, convert the timestamp to a unix_timestamp, divide the result by N using the DIV operator, multiply the result by N, and then convert it back to a timestamp value. Dividing by N and then multiplying by N removes the remainder and produces a value that is evenly divisible by N. This value represents the start of the time bucket and provides a standardized timestamp for grouping data.

SQL

-- Find high, low, and average output for each turbine, bucketed by day,
-- sorted by day.
SELECT tid, ts :> date, MIN(output), MAX(output), AVG(output)
FROM turbine_reading
GROUP by 1, 2
ORDER BY 1, 2;

+-----+------------+-------------+-------------+-------------+
| tid | ts :> date | MIN(output) | MAX(output) | AVG(output) |
+-----+------------+-------------+-------------+-------------+
|   1 | 2020-03-14 |     1000000 |     1050000 |     1012500 |
|   1 | 2020-03-15 |     1010000 |     1030000 |     1022500 |
|   2 | 2020-03-14 |     2000000 |     2070000 |     2030000 |
+-----+------------+-------------+-------------+-------------+

SQL

-- Find high, low, and average output for each turbine,
-- bucketed by three second intervals, sorted by interval start time.

SELECT tid,
      from_unixtime(unix_timestamp(ts) DIV 3 * 3) as ts,
      MIN(output), MAX(output), AVG(output)
FROM turbine_reading
GROUP by 1, 2
ORDER BY 1, 2;

+-----+---------------------+-------------+-------------+--------------------+
| tid | ts                  | MIN(output) | MAX(output) | AVG(output)        |
+-----+---------------------+-------------+-------------+--------------------+
|   1 | 2020-03-14 13:00:33 |     1000000 |     1050000 | 1016666.6666666666 |
|   1 | 2020-03-14 13:00:36 |     1000000 |     1000000 |            1000000 |
|   1 | 2020-03-15 13:00:33 |     1010000 |     1030000 |            1020000 |
|   1 | 2020-03-15 13:00:36 |     1030000 |     1030000 |            1030000 |
|   2 | 2020-03-14 13:00:33 |     2000000 |     2050000 | 2016666.6666666667 |
|   2 | 2020-03-14 13:00:36 |     2070000 |     2070000 |            2070000 |
+-----+---------------------+-------------+-------------+--------------------+

Gap Filling

When you aggregate time series data into buckets, some intervals may not contain data points. These missing intervals create gaps in the output, which can cause issues for dashboards, alerting systems, and analytics pipelines that expect a continuous sequence of time buckets.

For example, if you bucket turbine readings into 1-second intervals over a 10-second window, but data exists only for seconds 33 through 36, the result contains 4 rows instead of 10.

Gap filling ensures that every expected time bucket appears in the result set. This approach uses three components:

A reference table of time buckets: A table that contains one row per time interval and covers the full range of interest without gaps.
A LEFT JOIN: A left outer join between the reference table and the aggregated data that preserves all time buckets in the output.
NVL for default values: A function that replaces NULL values (where no data exists) with a default value, such as 0.

Create a table that contains one row per time interval for the required time range. In a performance-intensive workload, consider creating time_days as a reference table to avoid any need to shuffle or broadcast during the LEFT JOIN. Because reference tables are replicated to all nodes, joins against them generally avoid network-intensive data redistribution of fact tables. For example, the following table stores one row per day:

SQL

CREATE REFERENCE TABLE time_days (
   ts DATETIME(6) NOT NULL,
   PRIMARY KEY (ts)
);

Populate the table with rows that cover the desired time range.

INSERT INTO time_days (ts) VALUES
   ('2020-03-14 00:00:00'),
   ('2020-03-15 00:00:00'),
   ('2020-03-16 00:00:00');

Use the turbine_reading table defined in Storing Time Series Data without gap filling, a standard bucketed aggregation returns only intervals where data exists:

SQL

SELECT
   TIME_BUCKET('1d', ts) AS bucket,
   tid,
   SUM(output) AS total_output,
   AVG(output) AS avg_output
FROM turbine_reading
WHERE ts >= '2020-03-14 00:00:00'
 AND ts < '2020-03-17 00:00:00'
GROUP BY bucket, tid
ORDER BY tid, bucket;

+----------------------------+-----+--------------+------------+
| bucket                     | tid | total_output | avg_output |
+----------------------------+-----+--------------+------------+
| 2020-03-14 00:00:00.000000 |   1 |      4050000 |    1012500 |
| 2020-03-15 00:00:00.000000 |   1 |      4090000 |    1022500 |
| 2020-03-14 00:00:00.000000 |   2 |      8120000 |    2030000 |
+----------------------------+-----+--------------+------------+

Note that turbine 2 is missing a row for 2020-03-15, and neither turbine has a row for 2020-03-16. A dashboard plotting this data would show discontinuous lines or missing data points for those intervals.

The following query fills the gaps using a LEFT JOIN to the reference table:

SQL

WITH filtered AS (
  SELECT *
  FROM turbine_reading
  WHERE ts >= '2020-03-14 00:00:00'
    AND ts < '2020-03-17 00:00:00'
),
turbines AS (
  SELECT DISTINCT tid
  FROM filtered
),
agg AS (
  SELECT
      TIME_BUCKET('1d', ts) AS bucket,
      tid,
      SUM(output) AS total_output,
      AVG(output) AS avg_output
  FROM filtered
  GROUP BY 1, 2
),
ref AS (
  SELECT
      TIME_BUCKET('1d', td.ts) AS bucket,
      t.tid
  FROM time_days td
  CROSS JOIN turbines t
  WHERE td.ts >= '2020-03-14 00:00:00'
    AND td.ts < '2020-03-17 00:00:00'
)
SELECT
  ref.bucket,
  ref.tid,
  NVL(agg.total_output, 0) AS total_output,
  NVL(agg.avg_output, 0) AS avg_output
FROM ref
LEFT JOIN agg
  ON agg.bucket = ref.bucket
 AND agg.tid = ref.tid
ORDER BY ref.tid, ref.bucket;

+----------------------------+-----+--------------+------------+
| bucket                     | tid | total_output | avg_output |
+----------------------------+-----+--------------+------------+
| 2020-03-14 00:00:00.000000 |   1 |      4050000 |    1012500 |
| 2020-03-15 00:00:00.000000 |   1 |      4090000 |    1022500 |
| 2020-03-16 00:00:00.000000 |   1 |            0 |          0 |
| 2020-03-14 00:00:00.000000 |   2 |      8120000 |    2030000 |
| 2020-03-15 00:00:00.000000 |   2 |            0 |          0 |
| 2020-03-16 00:00:00.000000 |   2 |            0 |          0 |
+----------------------------+-----+--------------+------------+

Every turbine now has a row for every day in the range. Intervals without data show 0 instead of being absent from the result.

This query works as follows:

The ref subquery generates every combination of time bucket and turbine ID by cross-joining the reference table (bucketed using TIME_BUCKET) with the distinct turbine IDs in the data range. This ensures a row exists for every turbine in every time bucket in the interval.
The agg subquery computes the actual aggregates (SUM, AVG) grouped by bucket and turbine.
The LEFT JOIN preserves all rows from ref, including the intervals without readings. The NVL function replaces the resulting NULL values with 0.

Gap filling is useful in the following scenarios:

When visualizing time series in charts where gaps would break line continuity or mislead viewers.
When feeding data to monitoring or alerting systems that expect values at regular intervals.
When performing calculations (such as moving averages or cumulative sums) that require a complete sequence of equally-spaced data points.
When exporting data to external systems that do not handle missing intervals gracefully.

The gap-filling approach uses NVL to replace missing intervals with a default value of 0. Use interpolation to estimate values from surrounding data points instead of using a fixed default.

Interpolation

A time series may contain gaps that you want to fill so that a data point exists at every interval for a chosen time granularity. Gaps commonly occur when you convert a time series with data points at irregular intervals into one with data points at regular intervals by bucketing the data. For example, if data points arrive at random intervals approximately every half second, some one-second intervals may contain no data points. When you bucket the data into one-second intervals, those intervals appear as gaps in the resulting time series.

You can interpolate missing data points by using a stored procedure. The following example demonstrates this approach with a simple set of stock ticks. In this example, the procedure interpolates data points that are missing from a time series that has already been bucketed into one-second intervals.

SQL

DROP TABLE IF EXISTS tick_interpolation;
CREATE TABLE tick_interpolation(ts datetime(6), symbol varchar(5),
  price numeric(18,4));
INSERT INTO tick_interpolation VALUES
 ('2019-02-18 10:55:36.000000', 'ABC', 100.00),
 ('2019-02-18 10:55:37.000000', 'ABC', 102.00),
 ('2019-02-18 10:55:40.000000', 'ABC', 103.00),
 ('2019-02-18 10:55:42.000000', 'ABC', 104.00);

DELIMITER //
CREATE OR REPLACE PROCEDURE driver() AS
DECLARE
 q query(ts datetime(6), symbol varchar(5), price numeric(18,4));
BEGIN
 q = SELECT ts, symbol, price FROM tick_interpolation ORDER BY ts;
 ECHO SELECT 'Input time series' AS message;
 ECHO SELECT * FROM q ORDER BY ts;
 ECHO SELECT 'Interpolated time series' AS message;
 CALL interpolate_ts(q);
END //
DELIMITER ;

DELIMITER //
CREATE OR REPLACE PROCEDURE interpolate_ts(
 q query(ts datetime(6), symbol varchar(5), price numeric(18,4)))
   -- Important: q must produce sorted output by ts
AS
DECLARE
 c array(record(ts datetime(6), symbol varchar(5), price numeric(18,4)));
 r record(ts datetime(6), symbol varchar(5), price numeric(18,4));
 r_next record(ts datetime(6), symbol varchar(5), price numeric(18,4));
 n int;
 i int;
 _ts datetime(6); _symbol varchar(5); _price numeric(18,4);
 time_diff int;
 delta numeric(18,4);
BEGIN
 DROP TABLE IF EXISTS tmp;
 CREATE TEMPORARY TABLE tmp LIKE tick_interpolation;
 c = collect(q);
 n = length(c);
 IF n < 2 THEN
   ECHO SELECT * FROM q ORDER BY ts;
   return;
 END IF;

 i = 0;
 r = c[i];
 r_next = c[i + 1];


 WHILE (i < n) LOOP
   -- IF at last row THEN output it and exit
   IF i = n - 1 THEN
     _ts = r.ts; _symbol = r.symbol; _price = r.price;
     INSERT INTO tmp VALUES(_ts, _symbol, _price);
     i += 1;
     CONTINUE;
   END IF;


   time_diff = unix_timestamp(r_next.ts) - unix_timestamp(r.ts);


   IF time_diff <= 0 THEN
     RAISE user_exception("time series not sorted or has duplicate timestamps");
   END IF;


   -- output r
   _ts = r.ts; _symbol = r.symbol; _price = r.price;
   INSERT INTO tmp VALUES(_ts, _symbol, _price);


   IF time_diff = 1 THEN
     r = r_next; -- advance to next row
   ELSIF time_diff > 1 THEN
     -- output time_diff-1 rows by extending current row and interpolating price
     delta = (r_next.price - r.price) / time_diff;
     FOR j in 1..time_diff-1 LOOP
       _ts += 1; _price += delta;
       INSERT INTO tmp VALUES(_ts, _symbol, _price);
     END LOOP;
     r = r_next; -- advance to next row
   ELSE
     RAISE user_exception("time series not sorted");
   END IF;


   i += 1;
   IF i < n - 1 THEN r_next = c[i + 1]; END IF;
 END LOOP;
 ECHO SELECT * FROM tmp ORDER BY ts;
 DROP TABLE tmp;
END //
DELIMITER ;

The driver() procedure produces the following output:

SQL

CALL driver();

+-------------------+
| message           |
+-------------------+
| Input time series |
+-------------------+
1 row in set (0.02 sec)

+----------------------------+--------+----------+
| ts                         | symbol | price    |
+----------------------------+--------+----------+
| 2019-02-18 10:55:36.000000 | ABC    | 100.0000 |
| 2019-02-18 10:55:37.000000 | ABC    | 102.0000 |
| 2019-02-18 10:55:40.000000 | ABC    | 103.0000 |
| 2019-02-18 10:55:42.000000 | ABC    | 104.0000 |
+----------------------------+--------+----------+
4 rows in set (0.06 sec)

+--------------------------+
| message                  |
+--------------------------+
| Interpolated time series |
+--------------------------+
1 row in set (0.16 sec)

+----------------------------+--------+----------+
| ts                         | symbol | price    |
+----------------------------+--------+----------+
| 2019-02-18 10:55:36.000000 | ABC    | 100.0000 |
| 2019-02-18 10:55:37.000000 | ABC    | 102.0000 |
| 2019-02-18 10:55:38.000000 | ABC    | 102.3333 |
| 2019-02-18 10:55:39.000000 | ABC    | 102.6666 |
| 2019-02-18 10:55:40.000000 | ABC    | 103.0000 |
| 2019-02-18 10:55:41.000000 | ABC    | 103.5000 |
| 2019-02-18 10:55:42.000000 | ABC    | 104.0000 |
+----------------------------+--------+----------+
7 rows in set (0.16 sec)

The output fills the gaps between 37 and 40 seconds and between 40 and 42 seconds with linearly interpolated values.

Smoothing

Smoothing reduces noise in volatile signals so trends are easier to see. Use AVG as a window function to smooth time series data. The following query returns the output value and the moving average calculated over the current row and the previous row for a specified date.

SQL

SELECT tid, ts, output, AVG(output) OVER w
FROM turbine_reading
WHERE DATE(ts) = '2020-03-14'
WINDOW w as (PARTITION BY tid ORDER BY ts
            ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
ORDER BY 1, 2;

+-----+----------------------------+---------+--------------------+
| tid | ts                         | output  | AVG(output) OVER w |
+-----+----------------------------+---------+--------------------+
|   1 | 2020-03-14 13:00:33.000000 | 1000000 |            1000000 |
|   1 | 2020-03-14 13:00:34.000000 | 1000000 |            1000000 |
|   1 | 2020-03-14 13:00:35.000000 | 1050000 |            1025000 |
|   1 | 2020-03-14 13:00:36.000000 | 1000000 |            1025000 |
|   2 | 2020-03-14 13:00:33.000000 | 2000000 |            2000000 |
|   2 | 2020-03-14 13:00:34.000000 | 2000000 |            2000000 |
|   2 | 2020-03-14 13:00:35.000000 | 2050000 |            2025000 |
|   2 | 2020-03-14 13:00:36.000000 | 2070000 |            2060000 |
+-----+----------------------------+---------+--------------------+

Finding a Row Current AS OF a Point in Time

A common time series operation is to retrieve the row that is current AS OF a specific point in time. The following query uses ORDER BY and LIMIT to identify that row:

SQL

-- find turbine reading for tid 1 that is current
-- AS OF 2020-03-14 13:00:35.5
SELECT *
FROM turbine_reading
WHERE ts <= '2020-03-14 13:00:35.5'
AND tid = 1
ORDER BY ts DESC
LIMIT 1;

+-----+----------------------------+------+-------------+-----------+---------+----------------+------------+
| tid | ts                         | rpm  | temperature | vibration | output  | wind_direction | wind_speed |
+-----+----------------------------+------+-------------+-----------+---------+----------------+------------+
|   1 | 2020-03-14 13:00:35.000000 |   11 |          33 |       105 | 1050000 |             91 |         16 |
+-----+----------------------------+------+-------------+-----------+---------+----------------+------------+

You can use EXPLAIN to view the query plan for this query. The query plan uses the index on ts to seek directly to the target timestamp and scan matching rows in reverse order, returning the row current as of a specified timestamp in O(log n) time.

This optimization requires a rowstore table because columnstore tables do not support index-based top-1 seeks. If your workload relies heavily on AS OF lookups, consider using a rowstore table instead of the default columnstore table.

You can implement equivalent AS OF queries on columnstore tables by using MAX(ts) with a window function or a correlated subquery. However, these approaches typically scan more data and may have higher query latency.

To find the current row for each turbine as of a specific point in time, use the stored procedure shown in the following example.

SQL

DELIMITER //
CREATE OR REPLACE PROCEDURE get_turbine_readings_as_of(_ts datetime(6))
AS
DECLARE
 q_turbines QUERY(tid int) = SELECT tid FROM turbine;
 a ARRAY(RECORD(tid int));
 _tid int;
BEGIN
 DROP TABLE IF EXISTS r;
 CREATE TEMPORARY TABLE r LIKE turbine_reading;

 a = COLLECT(q_turbines);
 FOR x IN a LOOP
   _tid = x.tid;
   INSERT INTO r
     SELECT *
     FROM turbine_reading t
     WHERE t.tid = _tid
     AND ts <= _ts
     ORDER BY ts DESC
     LIMIT 1;
 END LOOP;
 ECHO SELECT * FROM r ORDER BY tid;
 DROP TABLE r;
END //
DELIMITER ;

CALL get_turbine_readings_as_of('2020-03-14 13:00:35.5');

+-----+----------------------------+------+-------------+-----------+---------+----------------+------------+
| tid | ts                         | rpm  | temperature | vibration | output  | wind_direction | wind_speed |
+-----+----------------------------+------+-------------+-----------+---------+----------------+------------+
|   1 | 2020-03-14 13:00:35.000000 |   11 |          33 |       105 | 1050000 |             91 |         16 |
|   2 | 2020-03-14 13:00:35.000000 | 18.5 |          30 |       176 | 2050000 |              0 |       23.5 |
+-----+----------------------------+------+-------------+-----------+---------+----------------+------------+

Managing the Life Cycle of Time Series Data

Manage the lifecycle of time series data by moving older data from a rowstore table to a columnstore table when the data grows beyond available memory. When the data is no longer needed, remove it with the DELETE statement. Bulk DELETE of large sets of rows is extremely fast in SingleStore, unlike legacy B-tree based database systems. Refer to this blog for related information.

Analyzing Time Series Data

On this page

Storing Time Series Data

Descriptive Data

Ingesting Time Series Data

Time Series Functions

Querying Time Series Data

Time Bucketing

Gap Filling

Interpolation

Smoothing

Finding a Row Current AS OF a Point in Time

Managing the Life Cycle of Time Series Data

Was this article helpful?

On this page

Was this article helpful?

Analyzing Time Series Data

On this page

Storing Time Series Data

Descriptive Data

Ingesting Time Series Data

Time Series Functions

Querying Time Series Data

Time Bucketing

Gap Filling

Interpolation

Smoothing

Finding a Row Current AS OF a Point in Time

Managing the Life Cycle of Time Series Data

Related Topics

Was this article helpful?

On this page

Was this article helpful?