Load Data and Run Queries into SingleStore
On this page
This tutorial shows how to create a database, load sample data, and run queries using SingleStore Helios workspace.
It tutorial includes the following steps:
-
Create a SingleStore Helios workspace.
-
Create a database and a table.
-
Create an S3 pipeline and run it to load approximately 18 million rows of sample sales data for a retail store into the SingleStore database.
-
Run queries against the loaded data.
Create a SingleStore Helios Workspace
If a workspace already exists, skip to the next step to create the database.
Create a Database and a Table
Connect to your SingleStore deployment and run the following SQL commands to create a database named sales and associated table.
For SingleStore Helios deployments, you can also run these commands in the SQL Editor, select Cloud Portal > Develop > Data Studio > SQL Editor.
Note
The SQL Editor only runs the queries that you select, so ensure you have them all selected before selecting Run.
CREATE DATABASE sales;USE sales;CREATE TABLE SalesData (Date TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,Store_ID BIGINT(20) DEFAULT NULL,ProductID TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,Product_Name TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,Product_Category TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,Quantity_Sold BIGINT(20) DEFAULT NULL,Price FLOAT DEFAULT NULL,Total_Sales FLOAT DEFAULT NULL);
Create and Run an S3 Pipeline
This step shows how to ingest the sample sales data from a public AWS S3 bucket into the SingleStore database using pipelines.
Note
For this example, you must set data_
to a value of 7.
or lower.
Run the following SQL command to create the pipeline:
CREATE PIPELINE SalesData_Pipeline ASLOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'CONFIG '{ "region": "ap-south-1" }'INTO TABLE SalesDataFIELDS TERMINATED BY ','LINES TERMINATED BY '\n'IGNORE 1 lines;
Run the following SQL command to start the pipeline:
START PIPELINE SalesData_Pipeline;
The pipeline ingests 18 million rows of data, which takes some time.
Verify the Pipeline Success
Query the pipelines_
information schema view to inspect the progress of the pipelines.SalesData_
pipeline has finished ingesting data.
SELECT * FROM information_schema.pipelines_filesWHERE pipeline_name = "SalesData_Pipeline";
You can also run the pipelines in the foreground to easily verify that all the data has been ingested.
START PIPELINE <pipeline_name> FOREGROUND;
For SingleStore Helios deployments, you can also monitor the progress of your pipelines on the Cloud Portal, select Deployments > <your_
Run Queries on the Data
Once the data is loaded into your SingleStore database, the following queries can be used on the dataset.
Query 1: Total Sales for Each Product Category
This query finds the total sales for each product category.
SELECT Product_Category, SUM(Total_Sales) AS Total_Category_SalesFROM SalesDataGROUP BY Product_CategoryORDER BY Total_Category_Sales DESC;
+------------------+----------------------+
| Product_Category | Total_Category_Sales |
+------------------+----------------------+
| Toys | 2363715564 |
| Groceries | 2363682480 |
| Pharmacy | 2363608712 |
| Furniture | 2362957132 |
| Clothing | 2361497888 |
| Electronics | 2360690400 |
| Pet Supplies | 2358230168 |
+------------------+----------------------+
Query 2: Top Five Best-Selling Products by Quantity Sold
This query identifies the top five products with the highest quantity sold.
SELECT Product_Name, SUM(Quantity_Sold) AS Total_Quantity_SoldFROM SalesDataGROUP BY Product_NameORDER BY Total_Quantity_Sold DESCLIMIT 5;
+--------------+---------------------+
| Product_Name | Total_Quantity_Sold |
+--------------+---------------------+
| Coats | 1355284 |
| Jeans | 1352520 |
| Vests | 1348256 |
| Sweaters | 1347210 |
| Jackets | 1346336 |
+--------------+---------------------+
Query 3: Monthly Sales Trend for a Specific Year
This query shows the monthly sales in the year 2024.
SELECT DATE_FORMAT(Date, '%Y-%m') AS Month, SUM(Total_Sales) AS Monthly_SalesFROM SalesDataWHERE YEAR(Date) = 2024GROUP BY MonthORDER BY Month;
+---------+--------------------+
| Month | Monthly_Sales |
+---------+--------------------+
| 2024-01 | 1398745836.2047148 |
| 2024-02 | 1307658583.4984026 |
| 2024-03 | 1402409728.1016903 |
| 2024-04 | 1353704241.8081808 |
| 2024-05 | 1401236442.2752619 |
| 2024-06 | 1354872518.1564188 |
| 2024-07 | 90234796.49502373 |
+---------+--------------------+
Query 4: Top Three Stores with the Highest Sales in the Electronics Category
This query identifies the top three stores with the highest sales in the Electronics category.
SELECT Store_ID, SUM(Total_Sales) AS Electronics_SalesFROM SalesDataWHERE Product_Category = 'Electronics'GROUP BY Store_IDORDER BY Electronics_Sales DESCLIMIT 3;
+----------+-------------------+
| Store_ID | Electronics_Sales |
+----------+-------------------+
| 1080 | 23858836.625 |
| 1014 | 23779521.1875 |
| 1062 | 23777309.65625 |
+----------+-------------------+
Query 5: Total Revenue by Product Category for a Specific Store
This query finds the total sales revenue for each product category for a specific store (Store ID 1084).
SELECTStore_ID,Product_Category,SUM(Total_Sales) AS Total_RevenueFROMSalesDataWHEREStore_ID = 1084GROUP BYStore_ID, Product_Category;
+----------+------------------+----------------+
| Store_ID | Product_Category | Total_Revenue |
+----------+------------------+----------------+
| 1084 | Pharmacy | 23310525.40625 |
| 1084 | Groceries | 23128682.625 |
| 1084 | Electronics | 23364940.28125 |
| 1084 | Clothing | 23447722.09375 |
| 1084 | Furniture | 23373544.15625 |
| 1084 | Toys | 23508566.8125 |
| 1084 | Pet Supplies | 23458592.53125 |
+----------+------------------+----------------+
Query 6: Top Three Most Profitable Days by Total Sales
This query identifies the top three days with the highest total sales across all stores.
SELECT Date, SUM(Total_Sales) AS Daily_Total_SalesFROM SalesDataGROUP BY DateORDER BY Daily_Total_Sales DESCLIMIT 3;
+------------+-------------------+
| Date | Daily_Total_Sales |
+------------+-------------------+
| 2024-03-12 | 46100396.875 |
| 2023-10-04 | 46064106.5625 |
| 2023-08-10 | 45997283.8125 |
+------------+-------------------+
Query 7: Yearly Sales Growth Comparison
This query compares the total sales by year to analyze growth.
SELECT YEAR(Date) AS Year, SUM(Total_Sales) AS Annual_SalesFROM SalesDataGROUP BY YEAR(Date)ORDER BY Year;
+------+--------------+
| Year | Annual_Sales |
+------+--------------+
| 2023 | 8225510272 |
| 2024 | 8308821056 |
+------+--------------+
Query 8: Most Expensive Product Sold
This query finds the most expensive product sold, along with its details.
SELECT Product_Name, Price, Total_SalesFROM SalesDataORDER BY Price DESCLIMIT 1;
+--------------+-------+-------------+
| Product_Name | Price | Total_Sales |
+--------------+-------+-------------+
| Corn | 100 | 700 |
+--------------+-------+-------------+
Query 9: Average Quantity Sold per Product Category
This query calculates the average quantity sold for each product category.
SELECT Product_Category, AVG(Quantity_Sold) AS Average_Quantity_SoldFROM SalesDataGROUP BY Product_Category;
+------------------+-----------------------+
| Product_Category | Average_Quantity_Sold |
+------------------+-----------------------+
| Pet Supplies | 10.4948 |
| Toys | 10.4935 |
| Electronics | 10.5035 |
| Groceries | 10.5019 |
| Pharmacy | 10.4952 |
| Clothing | 10.4956 |
| Furniture | 10.4928 |
+------------------+-----------------------+
Last modified: November 21, 2024