Quickstart Loading Data and Running Queries
Welcome to SingleStoreDB Cloud! We put together this guide to walk you through using the Portal to create a database, load sample data quickly, and then run some queries against that data.
Specifically, this guide walks you through:
Creating a SingleStoreDB Cloud workspace.
Creating a database and a table.
Creating an S3 Pipeline and running that pipeline to load sample data into your database. The data we are loading is approximately 20 million rows of sales data for a cosmetics store.
Running queries against the data you loaded.
Create a SingleStoreDB Cloud Workspace
If you already have a workspace, feel free to skip to the next step and create the database.
Go to https://portal.singlestore.com and follow these instructions to create a workspace. A free trial workspace is all you need for this example, but you can also create a new paid workspace or a new database in your existing workspace.
Create a Database and Table
Use the SQL Editor to run the following commands. Paste in the text, highlight it, and then click the Run button. All of the SQL queries for this and the following sections are also available online here: https://github.com/singlestore-labs/getting-started-with-singlestore/blob/main/init.sql
Create the database:
-- create a database CREATE DATABASE cosmeticshop;
Create the table:
-- create a table USE cosmeticshop; CREATE TABLE cosmeticshopfunnel ( event_time TIMESTAMP, event_type CHAR(18), product_id CHAR(10), category_id CHAR(20), category_code TINYTEXT, brand TINYTEXT, price FLOAT, user_id TINYTEXT, user_session TINYTEXT );
Create and Run an S3 Pipeline
Again, run these commands in the SQL Editor.
First, create the pipeline:
-- create a pipeline to ingest the data in AWS S3 CREATE or REPLACE PIPELINE cosmeticshoppipe AS LOAD DATA S3 's3://studiotutorials/eCommerce/*' CONFIG '{"region": "us-east-1"}' INTO TABLE `cosmeticshopfunnel` FIELDS TERMINATED BY ',' ENCLOSED BY '"';
Next, run the pipeline. Because the pipeline is ingesting 20 million rows of data, this process will take a moment.
-- start the pipeline START PIPELINE cosmeticshoppipe;
You can check on the progress of the pipeline by checking to see how many rows are in the table.
-- see how many events have been ingested SELECT COUNT(*) FROM cosmeticshopfunnel;
Once you rows are loaded, you can verify the data has been ingested correctly by retrieving some rows.
-- see the data that has been ingested SELECT * FROM cosmeticshopfunnel LIMIT 100;
You should see 100 rows of cosmetics sales data. The queries in the next section show some possible uses for the data.
Run Queries on the Data
The following queries show some possible ways to interact with the data.
-- create a holiday reference table to store all holiday dates CREATE REFERENCE TABLE holidays ( holiday TINYTEXT, date_of_holiday DATE PRIMARY KEY ); -- insert holiday dates INSERT INTO holidays VALUES ("New Year's Day", "2020-1-1"), ("Martin Luther King Jr. Day", "2020-02-20"), ("Memorial Day", "2020-05-25"), ("Independence Day", "2020-07-04"), ("Labor Day", "2020-09-07"), ("Veterans Day", "2019-11-11"), ("Thanksgiving", "2019-11-28"), ("Christmas Day", "2019-12-25"); SELECT * FROM holidays; -- find out which holiday has the most activity, interesting results! SELECT holiday, count(holiday) FROM cosmeticshopfunnel JOIN (SELECT holiday, DATE_SUB(date_of_holiday, INTERVAL 3 DAY) AS beforedate, DATE_ADD(date_of_holiday, INTERVAL 3 DAY) AS afterdate FROM holidays) ON event_time > beforedate and event_time < afterdate GROUP BY holiday ORDER BY count(holiday) desc; -- find out which is the top brand purchased during each of the holidays SELECT holiday, brand, count(brand) FROM cosmeticshopfunnel JOIN (SELECT holiday, DATE_SUB(date_of_holiday, INTERVAL 3 DAY) AS beforedate, DATE_ADD(date_of_holiday, INTERVAL 3 DAY) AS afterdate FROM holidays) ON event_time > beforedate AND event_time < afterdate WHERE event_type = "purchase" AND brand != "" GROUP BY holiday ORDER BY count(holiday) desc; -- find out if customers are sensitive to the average price of the brands SELECT brand, AVG(price), COUNT(event_type) FROM cosmeticshopfunnel WHERE brand != "" GROUP BY brand ORDER BY count(event_type) desc; -- find out which brands have been removed from cart the most SELECT brand, COUNT(brand) COUNT FROM cosmeticshopfunnel WHERE event_type = "remove_from_cart" AND brand != "" GROUP BY brand ORDER BY COUNT desc; -- find out which brands have been purchased the most SELECT brand, COUNT(brand) AS c FROM cosmeticshopfunnel WHERE event_type = "purchase" GROUP BY brand ORDER BY c desc; -- find out which categories have been purchased the most SELECT DISTINCT category_code, COUNT(category_code) AS c FROM cosmeticshopfunnel WHERE event_type = "purchase" GROUP BY category_code ORDER BY c desc; -- find out which brands have been purchased the most SELECT brand, COUNT(brand) AS c FROM cosmeticshopfunnel WHERE event_type = "purchase" GROUP BY brand ORDER BY c desc; -- find out which product_id has been the most removed from cart SELECT product_id, COUNT(product_id) AS c FROM cosmeticshopfunnel WHERE event_type = "remove_from_cart" GROUP BY product_id ORDER BY c desc;