Load Data and Run Queries into SingleStore
On this page
This tutorial shows how to create a database, load sample data, and run queries using SingleStore Helios workspace.
It tutorial includes the following steps:
-
Create a SingleStore Helios workspace.
-
Create a database and a table.
-
Create an S3 pipeline and run it to load approximately 18 million rows of sample sales data for a retail store into the SingleStore database.
-
Run queries against the loaded data.
Create a SingleStore Helios Workspace
If a workspace already exists, skip to the next step to create the database.
Create a Database and a Table
Connect to your SingleStore deployment and run the following SQL commands to create a database named sales and associated table.
For SingleStore Helios deployments, you can also run these commands in the SQL Editor, select Cloud Portal > Editor > Open SQL Editor.
Note
The SQL Editor only runs the queries that you select, so ensure you have them all selected before selecting Run.
CREATE DATABASE sales;USE sales;CREATE TABLE SalesData (Date TEXT COLLATE utf8mb4_bin,Store_ID BIGINT(20) DEFAULT NULL,ProductID TEXT COLLATE utf8mb4_bin,Product_Name TEXT COLLATE utf8mb4_bin,Product_Category TEXT utf8mb4_bin,Quantity_Sold BIGINT(20) DEFAULT NULL,Price FLOAT DEFAULT NULL,Total_Sales FLOAT DEFAULT NULL);
Create and Run an S3 Pipeline
This step shows how to ingest the sample sales data from a public AWS S3 bucket into the SingleStore database using pipelines.
Note
For this example, you must set data_ to a value of 7. or lower.
Run the following SQL command to create the pipeline:
CREATE PIPELINE SalesData_Pipeline ASLOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'CONFIG '{ "region": "ap-south-1" }'INTO TABLE SalesDataFIELDS TERMINATED BY ','LINES TERMINATED BY '\n'IGNORE 1 lines;
Run the following SQL command to start the pipeline:
START PIPELINE SalesData_Pipeline;
The pipeline ingests 18 million rows of data, which takes some time.
Verify the Pipeline Success
Query the pipelines_ information schema view to inspect the progress of the pipelines.SalesData_ pipeline has finished ingesting data.
SELECT * FROM information_schema.pipelines_filesWHERE pipeline_name = "SalesData_Pipeline";
You can also run the pipelines in the foreground to easily verify that all the data has been ingested.
START PIPELINE <pipeline_name> FOREGROUND;
For SingleStore Helios deployments, you can also monitor the progress of your pipelines on the Cloud Portal, select Ingestion > Pipelines.
Run Queries on the Data
Once the data is loaded into your SingleStore database, the following queries can be used on the dataset.
Query 1: Total Sales for Each Product Category
This query finds the total sales for each product category.
SELECT Product_Category, SUM(Total_Sales) AS Total_Category_SalesFROM SalesDataGROUP BY Product_CategoryORDER BY Total_Category_Sales DESC;
+------------------+----------------------+
| Product_Category | Total_Category_Sales |
+------------------+----------------------+
| Furniture | 473073264 |
| Clothing | 472839616 |
| Toys | 472607632 |
| Groceries | 472568544 |
| Pharmacy | 472009800 |
| Electronics | 471887584 |
| Pet Supplies | 470901324 |
+------------------+----------------------+Query 2: Top Five Best-Selling Products by Quantity Sold
This query identifies the top five products with the highest quantity sold.
SELECT Product_Name, SUM(Quantity_Sold) AS Total_Quantity_SoldFROM SalesDataGROUP BY Product_NameORDER BY Total_Quantity_Sold DESCLIMIT 5;
+--------------+---------------------+
| Product_Name | Total_Quantity_Sold |
+--------------+---------------------+
| Coats | 542856 |
| Jeans | 541885 |
| Vests | 539984 |
| Sweaters | 539455 |
| Jackets | 538003 |
+--------------+---------------------+Query 3: Monthly Sales Trend for a Specific Year
This query shows the monthly sales in the year 2024.
SELECT DATE_FORMAT(Date, '%Y-%m') AS Month, SUM(Total_Sales) AS Monthly_SalesFROM SalesDataWHERE YEAR(Date) = 2024GROUP BY MonthORDER BY Month;
+---------+--------------------+
| Month | Monthly_Sales |
+---------+--------------------+
| 2024-01 | 559552342.1877503 |
| 2024-02 | 523234602.4183636 |
| 2024-03 | 560848984.9077282 |
| 2024-04 | 542002073.6334558 |
| 2024-05 | 560440739.4689851 |
| 2024-06 | 541728606.9727273 |
| 2024-07 | 36056514.9656029 |
+---------+--------------------+Query 4: Top Three Stores with the Highest Sales in the Electronics Category
This query identifies the top three stores with the highest sales in the Electronics category.
SELECT Store_ID, SUM(Total_Sales) AS Electronics_SalesFROM SalesDataWHERE Product_Category = 'Electronics'GROUP BY Store_IDORDER BY Electronics_Sales DESCLIMIT 3;
+----------+-------------------+
| Store_ID | Electronics_Sales |
+----------+-------------------+
| 1080 | 23858838.78125 |
| 1014 | 23779519.59375 |
| 1062 | 23777307.40625 |
+----------+-------------------+Query 5: Total Revenue by Product Category for a Specific Store
This query finds the total sales revenue for each product category for a specific store (Store ID 1084).
SELECTStore_ID,Product_Category,SUM(Total_Sales) AS Total_RevenueFROMSalesDataWHEREStore_ID = 1084GROUP BYStore_ID, Product_Category;
+----------+------------------+----------------+
| Store_ID | Product_Category | Total_Revenue |
+----------+------------------+----------------+
| 1084 | Toys | 23508569.87500 |
| 1084 | Electronics | 23364933.40625 |
| 1084 | Pharmacy | 23310525.65625 |
| 1084 | Furniture | 23373549.37500 |
| 1084 | Groceries | 23128677.03125 |
| 1084 | Pet Supplies | 23458593.21875 |
| 1084 | Clothing | 23447724.09375 |
+----------+------------------+----------------+Query 6: Top Three Most Profitable Days by Total Sales
This query identifies the top three days with the highest total sales across all stores.
SELECT Date, SUM(Total_Sales) AS Daily_Total_SalesFROM SalesDataGROUP BY DateORDER BY Daily_Total_Sales DESCLIMIT 3;
+------------+-------------------+
| Date | Daily_Total_Sales |
+------------+-------------------+
| 2024-03-12 | 46100391.87500 |
| 2023-10-04 | 46064107.06250 |
| 2023-08-10 | 45997295.31250 |
+------------+-------------------+Query 7: Yearly Sales Growth Comparison
This query compares the total sales by year to analyze growth.
SELECT YEAR(Date) AS Year, SUM(Total_Sales) AS Annual_SalesFROM SalesDataGROUP BY YEAR(Date)ORDER BY Year;
+------+--------------+
| Year | Annual_Sales |
+------+--------------+
| 2023 | 8225504096 |
| 2024 | 8308831376 |
+------+--------------+Query 8: Most Expensive Product Sold
This query finds the most expensive product sold, along with its details.
SELECT Product_Name, Price, Total_SalesFROM SalesDataORDER BY Price DESCLIMIT 1;
+----------------------+-------+-------------+
| Product_Name | Price | Total_Sales |
+----------------------+-------+-------------+
| Ginkgo Biloba 120 mg | 100 | 700 |
+----------------------+-------+-------------+Query 9: Average Quantity Sold per Product Category
This query calculates the average quantity sold for each product category.
SELECT Product_Category, AVG(Quantity_Sold) AS Average_Quantity_SoldFROM SalesDataGROUP BY Product_Category;
+------------------+-----------------------+
| Product_Category | Average_Quantity_Sold |
+------------------+-----------------------+
| Groceries | 10.5019 |
| Pet Supplies | 10.4948 |
| Clothing | 10.4956 |
| Electronics | 10.5035 |
| Pharmacy | 10.4952 |
| Toys | 10.4935 |
| Furniture | 10.4928 |
+------------------+-----------------------+Last modified: