Load Data into SingleStore and Run Queries

Welcome to SingleStore Helios! We put together this guide to walk you through using the Cloud Portal to create a database, load sample data quickly, and then run some queries against that data.

Specifically, this guide walks you through:

  1. Creating a SingleStore Helios workspace.

  2. Creating a database and a table.

  3. Creating an S3 Pipeline and running that pipeline to load sample data into your database. The data we are loading is approximately 20 million rows of sales data for a cosmetics store.

  4. Running queries against the data you loaded.

Create a SingleStore Helios Workspace

If you already have a workspace, feel free to skip to the next step and create the database.

Go to the Cloud Portal and follow these instructions to create a workspace. A free trial workspace is all you need for this example, but you can also create a new paid workspace or a new database in your existing workspace.

Create a Database and Table

Use the SQL Editor to run the following commands. Paste in the text, highlight it, and then click the Run button. All of the SQL queries for this and the following sections are also available on GitHub.

Note

Free shared tier users cannot create new databases.

Create the database:

-- create a database
CREATE DATABASE cosmeticshop;

Create the table:

-- create a table
USE cosmeticshop;
CREATE TABLE cosmeticshopfunnel
(
event_time TIMESTAMP,
event_type CHAR(18),
product_id CHAR(10),
category_id CHAR(20),
category_code TINYTEXT,
brand TINYTEXT,
price FLOAT,
user_id TINYTEXT,
user_session TINYTEXT
);
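
Optionally, you can confirm that the table was created with the expected columns. This quick verification step is not part of the original tutorial script, but it can be run as-is in the SQL Editor:

-- optional: confirm the table definition
DESCRIBE cosmeticshopfunnel;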

Create and Run an S3 Pipeline

Again, run these commands in the SQL Editor.

Note

For this example, you must set data_conversion_compatibility_level to a value of 7.5 or lower.
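
If you need to check or change this engine variable, one approach is sketched below; the value '7.0' is just an example of a level at or below 7.5:

-- check the current data conversion compatibility level
SHOW VARIABLES LIKE 'data_conversion_compatibility_level';
-- set it to a level of 7.5 or lower, for example 7.0
SET GLOBAL data_conversion_compatibility_level = '7.0';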

First, create the pipeline:

-- create a pipeline to ingest the data in AWS S3
CREATE OR REPLACE PIPELINE cosmeticshoppipe
AS LOAD DATA S3 's3://studiotutorials/eCommerce/*'
CONFIG '{"region": "us-east-1"}'
INTO TABLE `cosmeticshopfunnel`
FIELDS TERMINATED BY ',' ENCLOSED BY '"';

Next, run the pipeline. Because the pipeline ingests approximately 20 million rows of data, this process takes some time to complete.

-- start the pipeline
START PIPELINE cosmeticshoppipe;

You can monitor the pipeline's progress by checking how many rows have been loaded into the table.

-- see how many events have been ingested
SELECT COUNT(*) FROM cosmeticshopfunnel;
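
You can also check the pipeline itself. For example, SHOW PIPELINES reports each pipeline's current state, and the information_schema.PIPELINES_BATCHES_SUMMARY view (assumed here as a way to inspect batch-level progress) summarizes recent batches:

-- optional: check the pipeline's state
SHOW PIPELINES;
-- optional: inspect recent pipeline batches
SELECT * FROM information_schema.PIPELINES_BATCHES_SUMMARY
WHERE pipeline_name = 'cosmeticshoppipe'
ORDER BY batch_id DESC
LIMIT 5;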

Once the rows are loaded, you can verify that the data has been ingested correctly by retrieving some rows.

-- see the data that has been ingested
SELECT * FROM cosmeticshopfunnel LIMIT 100;

You should see 100 rows of cosmetics sales data. The queries in the next section show some possible uses for the data.
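
Optionally, once all of the data has finished loading, you can stop the pipeline so that it no longer polls the S3 bucket. This step is not required for the rest of this guide.

-- optional: stop the pipeline after the data has loaded
STOP PIPELINE cosmeticshoppipe;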

Run Queries on the Data

The following queries show some possible ways to interact with the data.

-- create a holiday reference table to store all holiday dates
CREATE REFERENCE TABLE holidays
(
holiday TINYTEXT,
date_of_holiday DATE PRIMARY KEY
);
-- insert holiday dates
INSERT INTO holidays VALUES
("New Year's Day", "2020-1-1"),
("Martin Luther King Jr. Day", "2020-02-20"),
("Memorial Day", "2020-05-25"),
("Independence Day", "2020-07-04"),
("Labor Day", "2020-09-07"),
("Veterans Day", "2019-11-11"),
("Thanksgiving", "2019-11-28"),
("Christmas Day", "2019-12-25");
SELECT * FROM holidays;
-- find out which holiday has the most activity, interesting results!
SELECT holiday, COUNT(holiday) FROM cosmeticshopfunnel
JOIN (SELECT holiday, DATE_SUB(date_of_holiday, INTERVAL 3 DAY) AS beforedate, DATE_ADD(date_of_holiday, INTERVAL 3 DAY) AS afterdate FROM holidays) AS holiday_windows
ON event_time > beforedate AND event_time < afterdate
GROUP BY holiday
ORDER BY COUNT(holiday) DESC;
-- find out which is the top brand purchased during each of the holidays
SELECT holiday, brand, COUNT(brand) FROM cosmeticshopfunnel
JOIN (SELECT holiday, DATE_SUB(date_of_holiday, INTERVAL 3 DAY) AS beforedate, DATE_ADD(date_of_holiday, INTERVAL 3 DAY) AS afterdate FROM holidays) AS holiday_windows
ON event_time > beforedate AND event_time < afterdate
WHERE event_type = "purchase" AND brand != ""
GROUP BY holiday, brand
ORDER BY holiday, COUNT(brand) DESC;
-- find out if customers are sensitive to the average price of the brands
SELECT brand, AVG(price), COUNT(event_type) FROM cosmeticshopfunnel
WHERE brand != ""
GROUP BY brand
ORDER BY COUNT(event_type) DESC;
-- find out which brands have been removed from cart the most
SELECT brand, COUNT(brand) AS c FROM cosmeticshopfunnel
WHERE event_type = "remove_from_cart" AND brand != ""
GROUP BY brand ORDER BY c DESC;
-- find out which brands have been purchased the most
SELECT brand, COUNT(brand) AS c FROM cosmeticshopfunnel
WHERE event_type = "purchase"
GROUP BY brand
ORDER BY c DESC;
-- find out which categories have been purchased the most
SELECT category_code, COUNT(category_code) AS c FROM cosmeticshopfunnel
WHERE event_type = "purchase" GROUP BY category_code ORDER BY c DESC;
-- find out which product_id has been the most removed from cart
SELECT product_id, COUNT(product_id) AS c FROM cosmeticshopfunnel
WHERE event_type = "remove_from_cart" GROUP BY product_id ORDER BY c desc;

Last modified: October 28, 2024
