Load Data and Run Queries into SingleStore

This tutorial shows how to create a database, load sample data, and run queries using SingleStore Helios workspace.

It tutorial includes the following steps:

  1. Create a SingleStore Helios workspace.

  2. Create a database and a table.

  3. Create an S3 pipeline and run it to load approximately 18 million rows of sample sales data for a retail store into the SingleStore database.

  4. Run queries against the loaded data.

Create a SingleStore Helios Workspace

If a workspace already exists, skip to the next step to create the database. Navigate to the Cloud Portal and follow these instructions to create a workspace.

Create a Database and a Table

Connect to your SingleStore deployment and run the following SQL commands to create a database named sales and associated table. Free shared tier users cannot create new databases.

For SingleStore Helios deployments, you can also run these commands in the SQL Editor, select Cloud Portal > Develop > Data Studio > SQL Editor.

Note

The SQL Editor only runs the queries that you select, so ensure you have them all selected before selecting Run.

CREATE DATABASE sales;
USE sales;
CREATE TABLE SalesData (
Date TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
Store_ID BIGINT(20) DEFAULT NULL,
ProductID TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
Product_Name TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
Product_Category TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
Quantity_Sold BIGINT(20) DEFAULT NULL,
Price FLOAT DEFAULT NULL,
Total_Sales FLOAT DEFAULT NULL
);

Create and Run an S3 Pipeline

This step shows how to ingest the sample sales data from a public AWS S3 bucket into the SingleStore database using pipelines. Pipelines enable you to extract, transform, and load external data without needing third-party tools or middleware.

Note

For this example, you must set data_conversion_compatibility_level to a value of 7.5 or lower.

Run the following SQL command to create the pipeline:

CREATE PIPELINE SalesData_Pipeline AS
LOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'
CONFIG '{ "region": "ap-south-1" }'
INTO TABLE SalesData
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 lines;

Run the following SQL command to start the pipeline:

START PIPELINE SalesData_Pipeline;

The pipeline ingests 18 million rows of data, which takes some time.

Verify the Pipeline Success

Query the pipelines_files information schema view to inspect the progress of the pipelines. The following command to check whether the SalesData_Pipeline pipeline has finished ingesting data.

SELECT * FROM information_schema.pipelines_files
WHERE pipeline_name = "SalesData_Pipeline";

You can also run the pipelines in the foreground to easily verify that all the data has been ingested.

START PIPELINE <pipeline_name> FOREGROUND;

For SingleStore Helios deployments, you can also monitor the progress of your pipelines on the Cloud Portal, select Deployments > <your_workspace_group> > Databases > sales > Pipelines.

Run Queries on the Data

Once the data is loaded into your SingleStore database, the following queries can be used on the dataset.

Query 1: Total Sales for Each Product Category

This query finds the total sales for each product category.

SELECT Product_Category, SUM(Total_Sales) AS Total_Category_Sales
FROM SalesData
GROUP BY Product_Category
ORDER BY Total_Category_Sales DESC;
+------------------+----------------------+
| Product_Category | Total_Category_Sales |
+------------------+----------------------+
| Toys             |           2363715564 |
| Groceries        |           2363682480 |
| Pharmacy         |           2363608712 |
| Furniture        |           2362957132 |
| Clothing         |           2361497888 |
| Electronics      |           2360690400 |
| Pet Supplies     |           2358230168 |
+------------------+----------------------+

Query 2: Top Five Best-Selling Products by Quantity Sold

This query identifies the top five products with the highest quantity sold.

SELECT Product_Name, SUM(Quantity_Sold) AS Total_Quantity_Sold
FROM SalesData
GROUP BY Product_Name
ORDER BY Total_Quantity_Sold DESC
LIMIT 5;
+--------------+---------------------+
| Product_Name | Total_Quantity_Sold |
+--------------+---------------------+
| Coats        |             1355284 |
| Jeans        |             1352520 |
| Vests        |             1348256 |
| Sweaters     |             1347210 |
| Jackets      |             1346336 |
+--------------+---------------------+

Query 3: Monthly Sales Trend for a Specific Year

This query shows the monthly sales in the year 2024.

SELECT DATE_FORMAT(Date, '%Y-%m') AS Month, SUM(Total_Sales) AS Monthly_Sales
FROM SalesData
WHERE YEAR(Date) = 2024
GROUP BY Month
ORDER BY Month;
+---------+--------------------+
| Month   | Monthly_Sales      |
+---------+--------------------+
| 2024-01 | 1398745836.2047148 |
| 2024-02 | 1307658583.4984026 |
| 2024-03 | 1402409728.1016903 |
| 2024-04 | 1353704241.8081808 |
| 2024-05 | 1401236442.2752619 |
| 2024-06 | 1354872518.1564188 |
| 2024-07 |  90234796.49502373 |
+---------+--------------------+

Query 4: Top Three Stores with the Highest Sales in the Electronics Category

This query identifies the top three stores with the highest sales in the Electronics category.

SELECT Store_ID, SUM(Total_Sales) AS Electronics_Sales
FROM SalesData
WHERE Product_Category = 'Electronics'
GROUP BY Store_ID
ORDER BY Electronics_Sales DESC
LIMIT 3;
+----------+-------------------+
| Store_ID | Electronics_Sales |
+----------+-------------------+
|     1080 |      23858836.625 |
|     1014 |     23779521.1875 |
|     1062 |    23777309.65625 |
+----------+-------------------+

Query 5: Total Revenue by Product Category for a Specific Store

This query finds the total sales revenue for each product category for a specific store (Store ID 1084).

SELECT
Store_ID,
Product_Category,
SUM(Total_Sales) AS Total_Revenue
FROM
SalesData
WHERE
Store_ID = 1084
GROUP BY
Store_ID, Product_Category;
+----------+------------------+----------------+
| Store_ID | Product_Category | Total_Revenue  |
+----------+------------------+----------------+
|     1084 | Pharmacy         | 23310525.40625 |
|     1084 | Groceries        |   23128682.625 |
|     1084 | Electronics      | 23364940.28125 |
|     1084 | Clothing         | 23447722.09375 |
|     1084 | Furniture        | 23373544.15625 |
|     1084 | Toys             |  23508566.8125 |
|     1084 | Pet Supplies     | 23458592.53125 |
+----------+------------------+----------------+

Query 6: Top Three Most Profitable Days by Total Sales

This query identifies the top three days with the highest total sales across all stores.

SELECT Date, SUM(Total_Sales) AS Daily_Total_Sales
FROM SalesData
GROUP BY Date
ORDER BY Daily_Total_Sales DESC
LIMIT 3;
+------------+-------------------+
| Date       | Daily_Total_Sales |
+------------+-------------------+
| 2024-03-12 |      46100396.875 |
| 2023-10-04 |     46064106.5625 |
| 2023-08-10 |     45997283.8125 |
+------------+-------------------+

Query 7: Yearly Sales Growth Comparison

This query compares the total sales by year to analyze growth.

SELECT YEAR(Date) AS Year, SUM(Total_Sales) AS Annual_Sales
FROM SalesData
GROUP BY YEAR(Date)
ORDER BY Year;
+------+--------------+
| Year | Annual_Sales |
+------+--------------+
| 2023 |   8225510272 |
| 2024 |   8308821056 |
+------+--------------+

Query 8: Most Expensive Product Sold

This query finds the most expensive product sold, along with its details.

SELECT Product_Name, Price, Total_Sales
FROM SalesData
ORDER BY Price DESC
LIMIT 1;
+--------------+-------+-------------+
| Product_Name | Price | Total_Sales |
+--------------+-------+-------------+
| Corn         |   100 |         700 |
+--------------+-------+-------------+

Query 9: Average Quantity Sold per Product Category

This query calculates the average quantity sold for each product category.

SELECT Product_Category, AVG(Quantity_Sold) AS Average_Quantity_Sold
FROM SalesData
GROUP BY Product_Category;
+------------------+-----------------------+
| Product_Category | Average_Quantity_Sold |
+------------------+-----------------------+
| Pet Supplies     |               10.4948 |
| Toys             |               10.4935 |
| Electronics      |               10.5035 |
| Groceries        |               10.5019 |
| Pharmacy         |               10.4952 |
| Clothing         |               10.4956 |
| Furniture        |               10.4928 |
+------------------+-----------------------+

Last modified: November 21, 2024

Was this article helpful?