Load TPC-H Data into SingleStore
On this page
This tutorial shows how to load the TPC-H dataset into your SingleStore deployment using Pipelines and then query it to obtain business-critical metrics.
What is TPC-H?
The TPC Benchmark™H (TPC-H) is a decision support benchmark, which consists of a suite of business oriented ad-hoc queries and concurrent data modifications.
Create Database and Tables
Connect to your SingleStore Helios deployment and run the following SQL commands to create a database named tpch and eight associated tables.
For SingleStore Helios deployments, you can also run these commands in the SQL Editor, select Cloud Portal > Develop > Data Studio > SQL Editor.
Note
The SQL Editor only runs the queries that you select, so ensure you have them all selected before selecting Run.
DROP DATABASE IF EXISTS tpch;CREATE DATABASE tpch;USE tpch;CREATE TABLE `customer` (`c_custkey` int(11) NOT NULL,`c_name` varchar(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`c_address` varchar(40) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`c_nationkey` int(11) NOT NULL,`c_phone` char(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`c_acctbal` decimal(15,2) NOT NULL,`c_mktsegment` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`c_comment` varchar(117) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`c_custkey`) UNENFORCED RELY,SHARD KEY (`c_custkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `lineitem` (`l_orderkey` bigint(11) NOT NULL,`l_partkey` int(11) NOT NULL,`l_suppkey` int(11) NOT NULL,`l_linenumber` int(11) NOT NULL,`l_quantity` decimal(15,2) NOT NULL,`l_extendedprice` decimal(15,2) NOT NULL,`l_discount` decimal(15,2) NOT NULL,`l_tax` decimal(15,2) NOT NULL,`l_returnflag` char(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`l_linestatus` char(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`l_shipdate` date NOT NULL,`l_commitdate` date NOT NULL,`l_receiptdate` date NOT NULL,`l_shipinstruct` char(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`l_shipmode` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`l_comment` varchar(44) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`l_orderkey`, `l_linenumber`) UNENFORCED RELY,SHARD KEY (`l_orderkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `nation` (`n_nationkey` int(11) NOT NULL,`n_name` char(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`n_regionkey` int(11) NOT NULL,`n_comment` varchar(152) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`n_nationkey`) UNENFORCED RELY,SHARD KEY (`n_nationkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `orders` (`o_orderkey` bigint(11) NOT NULL,`o_custkey` int(11) NOT NULL,`o_orderstatus` char(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`o_totalprice` decimal(15,2) NOT NULL,`o_orderdate` date NOT NULL,`o_orderpriority` char(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`o_clerk` char(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`o_shippriority` int(11) NOT NULL,`o_comment` varchar(79) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`o_orderkey`) UNENFORCED RELY,SHARD KEY (`o_orderkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `part` (`p_partkey` int(11) NOT NULL,`p_name` varchar(55) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`p_mfgr` char(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`p_brand` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`p_type` varchar(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`p_size` int(11) NOT NULL,`p_container` char(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`p_retailprice` decimal(15,2) NOT NULL,`p_comment` varchar(23) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`p_partkey`) UNENFORCED RELY,SHARD KEY (`p_partkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `partsupp` (`ps_partkey` int(11) NOT NULL,`ps_suppkey` int(11) NOT NULL,`ps_availqty` int(11) NOT NULL,`ps_supplycost` decimal(15,2) NOT NULL,`ps_comment` varchar(199) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`ps_partkey`,`ps_suppkey`) UNENFORCED RELY,SHARD KEY(`ps_partkey`),KEY (`ps_partkey`,`ps_suppkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `region` (`r_regionkey` int(11) NOT NULL,`r_name` char(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`r_comment` varchar(152) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`r_regionkey`) UNENFORCED RELY,SHARD KEY (`r_regionkey`) USING CLUSTERED COLUMNSTORE);CREATE TABLE `supplier` (`s_suppkey` int(11) NOT NULL,`s_name` char(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`s_address` varchar(40) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`s_nationkey` int(11) NOT NULL,`s_phone` char(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,`s_acctbal` decimal(15,2) NOT NULL,`s_comment` varchar(101) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk (`s_suppkey`) UNENFORCED RELY,SHARD KEY (`s_suppkey`) USING CLUSTERED COLUMNSTORE);
Load Data with Pipelines
This part of the tutorial shows how to ingest TPC-H data from a public AWS S3 bucket into the SingleStore database using pipelines.
Note
The SQL Editor only runs the queries that you select, so ensure you have them all selected before selecting Run.
-
Run the following SQL commands to create the pipelines:
USE tpch;CREATE OR REPLACE PIPELINE tpch_100_lineitemAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/lineitem/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE lineitemFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_customerAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/customer/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE customerFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_nationAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/nation/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE nationFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_ordersAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/orders/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE ordersFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_partAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/part/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE partFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_partsuppAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/partsupp/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE partsuppFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_regionAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/region/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE regionFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n';CREATE OR REPLACE PIPELINE tpch_100_supplierAS LOAD DATA S3 'memsql-tpch-dataset/sf_100/supplier/'config '{"region":"us-east-1"}'SKIP DUPLICATE KEY ERRORSINTO TABLE supplierFIELDS TERMINATED BY '|'LINES TERMINATED BY '|\n'; -
Run the following SQL commands to start the pipelines:
USE tpch;START ALL PIPELINES;Once the Success message is returned for all the created pipelines, SingleStore starts ingesting the data from the S3 bucket.
Verify Pipeline Success
Query the pipelines_
information schema view to inspect the progress of the pipelines.tpch_
pipeline has finished ingesting data.
SELECT * FROM information_schema.pipelines_filesWHERE pipeline_name = "tpch_100_lineitem";
You can also run the pipelines in foreground to easily verify that all the data has been ingested.
START PIPELINE <pipeline_name> FOREGROUND;
For SingleStore Helios deployments, you can also monitor the progress of your pipelines on the Cloud Portal, select Deployments > <your_
Run Queries on Data
You can start running queries once the data is loaded into your SingleStore database.
Query 1: Pricing Summary Report
This query reports the total number of items, their prices, and average quantities shipped within 90 days of the most recent ship date (specified date) in the database.
USE tpch;SELECTl_returnflag,l_linestatus,SUM(l_quantity) AS sum_qty,SUM(l_extendedprice) AS sum_base_price,SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,AVG(l_quantity) AS avg_qty,AVG(l_extendedprice) AS avg_price,AVG(l_discount) AS avg_disc,COUNT(*) AS count_orderFROMlineitemWHEREl_shipdate <= date('1998-12-01' - interval '90' day)GROUP BYl_returnflag,l_linestatusORDER BYl_returnflag,l_linestatus;
+--------------+--------------+--------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order |
+--------------+--------------+--------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| A | F | 83289121.00 | 124912195605.43 | 118665316925.8745 | 123413447483.580151 | 25.491969 | 38231.377828 | 0.050007 | 3267269 |
| N | F | 2190539.00 | 3282604241.36 | 3118652485.5821 | 3243864574.415261 | 25.478790 | 38180.915863 | 0.049948 | 85975 |
| N | O | 164028717.00 | 245978182329.81 | 233676744963.8986 | 243022073269.855210 | 25.499480 | 38239.131169 | 0.050004 | 6432630 |
| R | F | 83339258.00 | 124970353864.21 | 118715983030.2238 | 123467940050.739462 | 25.517608 | 38264.613849 | 0.050047 | 3265951 |
+--------------+--------------+--------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
4 rows in set (4.17 sec)
Note
Run the query again and observe the execution time difference.
Query 2: Shipping Priority
This query retrieves the 10 unshipped orders with the highest value.
USE tpch;SELECTl_orderkey,SUM(l_extendedprice * (1 - l_discount)) AS revenue,o_orderdate,o_shippriorityFROMcustomer,orders,lineitemWHEREc_mktsegment = 'BUILDING'AND c_custkey = o_custkeyAND l_orderkey = o_orderkeyAND o_orderdate < DATE('1995-03-15')AND l_shipdate > DATE('1995-03-15')GROUP BYl_orderkey,o_orderdate,o_shippriorityORDER BYrevenue DESC,o_orderdateLIMIT 5;
+------------+-------------+-------------+----------------+
| l_orderkey | revenue | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
| 81316036 | 459781.3856 | 1995-02-24 | 0 |
| 54427397 | 391221.7016 | 1995-02-28 | 0 |
| 82073798 | 390074.7002 | 1995-03-12 | 0 |
| 81872546 | 386608.3992 | 1995-02-28 | 0 |
| 45138020 | 386056.2188 | 1995-01-30 | 0 |
+------------+-------------+-------------+----------------+
5 rows in set (46.47 sec)
Query 3: Returned Item Reporting
This query identifies the top 20 customers that may be experiencing issues with the items shipped to them.
USE tpch;SELECTc_custkey,c_name,SUM(l_extendedprice * (1 - l_discount)) AS revenue,c_acctbal,n_name,c_address,c_phone,c_commentFROMcustomer,orders,lineitem,nationWHEREc_custkey = o_custkeyAND l_orderkey = o_orderkeyAND o_orderdate >= DATE('1993-10-01')AND o_orderdate < DATE('1993-10-01') + interval '3' monthAND l_returnflag = 'R'AND c_nationkey = n_nationkeyGROUP BYc_custkey,c_name,c_acctbal,c_phone,n_name,c_address,c_commentORDER BYrevenue descLIMIT 5;
+-----------+--------------------+-------------+-----------+-----------+---------------------------------------+-----------------+------------------------------------------------------------------------------------------------+
| c_custkey | c_name | revenue | c_acctbal | n_name | c_address | c_phone | c_comment |
+-----------+--------------------+-------------+-----------+-----------+---------------------------------------+-----------------+------------------------------------------------------------------------------------------------+
| 2490736 | Customer#002490736 | 473138.4854 | 4701.80 | RUSSIA | rjaIP,XKH,v0R 0u7Qyb | 32-234-811-3202 | s sleep carefully against the ironic pinto beans. blithely unusual deposits haggle furiously a |
| 7658659 | Customer#007658659 | 440156.4690 | 1442.76 | MOROCCO | ucA926oE6y1fOr,teHGthr6GXz6y qf1ySK3 | 25-418-150-7556 | nusual requests nag blithely according to the regular requests: pending, unusual |
| 11381863 | Customer#011381863 | 437190.0882 | 4397.51 | JORDAN | TFd7KvQQmf6XJjSQUBbvZ9rlxTUle1mwfqt9 | 23-407-202-8891 | lose blithely along the furiously pending sentiments. quickly final theodo |
| 4804573 | Customer#004804573 | 432390.5681 | 4930.12 | INDONESIA | LgPei3cnZL19sW8lM,qM | 19-664-439-8190 | ial accounts. blithely final platelets breach. furious |
| 2247622 | Customer#002247622 | 414271.9102 | 4292.76 | PERU | c5eYVZ7PXycKnsgLmCtarE5aCcWYga5u0RIwa | 27-403-465-9280 | ly. sly packages haggle. even theodolites wa |
+-----------+--------------------+-------------+-----------+-----------+---------------------------------------+-----------------+------------------------------------------------------------------------------------------------+
5 rows in set (37.33 sec)
Query 4: Product Type Profit Measure
This query calculates the profit made on a given line of items, broken down by supplier nation and year.
USE tpch;SELECTnation,o_year,SUM(amount) AS sum_profitFROM(SELECTn_name AS nation,EXTRACT(year FROM o_orderdate) AS o_year,l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amountFROMpart,supplier,lineitem,partsupp,orders,nationWHEREs_suppkey = l_suppkeyAND ps_suppkey = l_suppkeyAND ps_partkey = l_partkeyAND p_partkey = l_partkeyAND o_orderkey = l_orderkeyAND s_nationkey = n_nationkeyAND p_name LIKE '%green%') AS profitGROUP BYnation,o_yearORDER BYnation,o_year DESCLIMIT 5;
+---------+--------+---------------+
| nation | o_year | sum_profit |
+---------+--------+---------------+
| ALGERIA | 1998 | 20613196.5474 |
| ALGERIA | 1997 | 37498211.6758 |
| ALGERIA | 1996 | 35471708.2126 |
| ALGERIA | 1995 | 36720114.9923 |
| ALGERIA | 1994 | 34921048.6736 |
+---------+--------+---------------+
5 rows in set (49.48 sec)
Analyze Query Performance
The SQL Editor allows you to record and visualize workload profiles at either the query or node level.
-
In SQL Editor, select Visual Explain and select Profile.
-
The Visual Explain window opens.
You can now explore the query and navigate around different operations.
Last modified: October 28, 2024