Get Started with the SingleStore Snowpark App

Note

This is a Preview feature.

Introduction

SingleStore is a distributed SQL database that delivers sub-second (often under 100 ms, and sometimes under 10 ms) performance at scale across a variety of challenging workloads, even when high concurrency is required. These include transactions, point reads and writes, real-time ingestion, relational and JSON analytics, full-text search, vector search, and more.

Snowpark Container Services (SPCS) is a fully managed container offering designed to facilitate the deployment, management, and scaling of containerized applications within Snowflake. This service allows containerized workloads to run directly within Snowflake, eliminating the need to move data out of the Snowflake environment for processing.

SingleStore, in combination with Snowflake, provides a highly capable foundation to power the next generation of intelligent apps and operational workloads. With SingleStore’s Snowflake Native App, combining streaming data with Iceberg data is now an option for providing real-time analytics to applications built on Snowflake. Best of all, SingleStore’s data platform can be used without moving the entire database out of Snowflake.

You can get the SingleStore Native App for Snowflake, SINGLESTORE_DB_APP, from the Snowflake Marketplace. It facilitates the creation, management, and scaling of SingleStore clusters from within your Snowflake deployment. The SingleStore app includes a set of Docker images, helper SQL scripts, and a comprehensive interface for managing clusters.

The following diagram is an overview of the full system architecture.

Prerequisites

Request Access

You must request access to the SingleStore app before you can use it. Access can only be granted to non-trial accounts.

  1. Locate the SingleStore app in the Snowflake Marketplace.

    Option 1: From the Snowflake Portal

    • Log into the Snowflake Portal.

    • In the search bar, search for SingleStore and click Marketplace to limit the search results to the Snowflake Marketplace.

    Option 2: Not logged into the Snowflake Portal

    • Navigate to the Snowflake Marketplace and click Browse Listings.

    • In the search bar, search for SingleStore.

  2. In the search results, click SingleStore’s Real-time Database for Intelligent Applications.

  3. On the Real-time Database for Intelligent Applications page, click Request. A form is displayed. Fill out the form with the requested information and click Request.

  4. After your request has been approved, the SingleStore app appears under Data Products > Apps.

  5. Click Get to download the SingleStore app. A confirmation dialog is displayed.

    Select the warehouse in which to install the SingleStore app and click Get.

    Once installed, the SingleStore app appears in Data Products > Apps > Installed Apps.

  6. Click on the SingleStore app.

    Under Grant Account Privileges, review the requested privileges and click Grant.

    Under Allow Connections, click Review. Review the requested connections and click Connect.

    When done, click Activate.

Set Up

  1. After the SingleStore app has been activated, click Launch app to launch it.

  2. After a few moments, a Welcome to SingleStore page is displayed.

    Click Start Dev Cluster to create a cluster.

  3. On the Create Cluster page, create a cluster.

    • Provide a name for this cluster. Choose a unique name that will help identify the type of workload connected with this cluster.

    • Select a cluster size to fit the workload. Larger clusters are available on request.

                                    Dev     Small
      Master Aggregators            1       1
      Child Aggregators             2       2
      vCPUs per Aggregator          1       3
      RAM per Aggregator (GB)       6       13
      Storage per Aggregator (GB)   128     128
      Leaf Nodes                    2       2
      vCPU per Leaf Node            1       6
      RAM per Leaf Node (GB)        6       28
      Storage per Leaf Node (GB)    128     512

    • Provide a password for the default database user.

      Each cluster starts with a default database user named root, which requires a password to connect to the cluster.

  4. Click Create Cluster. A progress bar is displayed as the cluster is created. It may take a few minutes to create the cluster.

Network Configuration for Loading Data

The default networking rules only allow outbound HTTP traffic from a SingleStore cluster. To load data from a data source, the endpoint of the data source must be explicitly added to the following network rule.

USE ROLE ACCOUNTADMIN;
ALTER NETWORK RULE SINGLESTORE_DB_APP_APP_DATA.CONFIGURATION.SINGLESTORE_DB_APP_ALL_ACCESS_EAI_NETWORK_RULE SET
VALUE_LIST=('<data-source-endpoint-or-IP-address>:<port>');

For example:

USE ROLE ACCOUNTADMIN;
ALTER NETWORK RULE SINGLESTORE_DB_APP_APP_DATA.CONFIGURATION.SINGLESTORE_DB_APP_ALL_ACCESS_EAI_NETWORK_RULE SET
VALUE_LIST=('192.168.1.2:443');
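
To confirm the new value list took effect, you can describe the rule from a Snowflake worksheet. This is an optional check using standard Snowflake commands.

USE ROLE ACCOUNTADMIN;
DESCRIBE NETWORK RULE SINGLESTORE_DB_APP_APP_DATA.CONFIGURATION.SINGLESTORE_DB_APP_ALL_ACCESS_EAI_NETWORK_RULE;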

Loading Data into SingleStore

The SQL Editor, which is included with each cluster, works like a worksheet where users can add multiple SQL queries and select which ones to run. By default, the SQL Editor runs only the highlighted query; if multiple queries are highlighted, they run sequentially.

To access the SQL Editor, which is part of SingleStore Studio:

  1. Launch the SingleStore app.

  2. Select your cluster from the drop-down.

  3. Click Go to SingleStore Studio under the Advanced dropdown.

  4. Connect to the selected cluster by using the root user and the password that you specified when the cluster was created.

    Note

    If an error is displayed while launching SingleStore Studio, try opening the link (as shown in the address bar) in a private/incognito window. You may be prompted to log into Snowflake.

Refer to SingleStore Studio SQL Editor for more information.

Unlimited Storage Databases

An unlimited storage database, also known as a "bottomless" database, is kept on remote object storage. The use of remote storage separates where data is stored (in an object store external to the SingleStore cluster) from where the data is processed (within a SingleStore cluster).

As unlimited storage databases are stored remotely, their size is not limited by the size of local cluster storage but by the available external object storage. On public cloud object stores, storage is effectively unlimited.

To create an unlimited storage database, use the {FOR | INTO | ON} <object_store_settings> option. Note that FOR, INTO, and ON are synonyms and their functionality is identical.

Prior to creating an unlimited storage database, the enable_bottomless engine variable must be set to ON, which is the default value.
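
Before creating the database, you can verify the variable from the SQL Editor. This is a minimal check; the Value column should read ON.

SHOW VARIABLES LIKE 'enable_bottomless';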

The following example creates an unlimited storage database.

CREATE DATABASE bottomless_db ON S3 "bottomless_db_bucket/bottomless_db_folder"
CONFIG '{"region":"us-east-1", "verify_ssl":false}'
CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}';

Refer to Local and Unlimited Database Storage Concepts for more information on using unlimited storage databases.

Amazon S3

SingleStore pipelines can extract objects from S3 buckets, optionally transform them, and insert them into a destination table. Refer to the Amazon S3 documentation for S3’s core concepts and terminology.

The following example shows how to load data from S3 via a SingleStore pipeline. You can use the SingleStore Studio SQL Editor to enter and run the following SQL statements.

  1. Create a SalesData table.

    CREATE TABLE `SalesData` (
    `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Store_ID` bigint(20) DEFAULT NULL,
    `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    `Quantity_Sold` bigint(20) DEFAULT NULL,
    `Price` float DEFAULT NULL,
    `Total_Sales` float DEFAULT NULL
    );
  2. Set a timeout before creating the SalesData pipeline.

    SingleStore recommends setting the global engine variable pipelines_extractor_get_offsets_timeout_ms to 200000 to prevent timeouts and allow adequate time for the pipeline to be processed. Increase this value further if the pipeline still encounters timeouts.

    SET GLOBAL pipelines_extractor_get_offsets_timeout_ms = 200000;
  3. Define a SalesData pipeline to load data from S3 into SingleStore.

    CREATE PIPELINE SalesData_Pipeline AS
    LOAD DATA S3 's3://singlestoreloaddata/SalesData/*.csv'
    CONFIG '{ \"region\": \"ap-south-1\" }'
    INTO TABLE SalesData
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    IGNORE 1 lines;
  4. Start the pipeline.

    START PIPELINE SalesData_Pipeline;
  5. Check for pipeline errors.

    SELECT * FROM information_schema.PIPELINES_ERRORS WHERE pipeline_name = 'SalesData_Pipeline';
  6. Confirm that this sample data was loaded. Additional optional monitoring queries are sketched after this list.

    SELECT * FROM SalesData;
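
Once the pipeline has been started, the following optional queries help track ingestion progress. This is a minimal sketch using standard SingleStore information_schema pipeline views.

-- Row count grows as batches are ingested
SELECT COUNT(*) FROM SalesData;

-- Summary of recent pipeline batches
SELECT * FROM information_schema.PIPELINES_BATCHES_SUMMARY
WHERE PIPELINE_NAME = 'SalesData_Pipeline';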

Refer to Load Data from Amazon Web Services (AWS) S3 for more information.

Kafka

Apache Kafka is an open-source distributed event-streaming platform for high-performance data pipelines, analytics, data integration, and mission-critical applications.

Similar to S3, data from Kafka can be loaded into SingleStore via a pipeline. For example, borrowing from the Ridesharing Demo, the following SQL creates Kafka pipelines that ingest trip and rider data in real time. You can use the SingleStore Studio SQL Editor to enter and run the following SQL statements.

Note: The default networking rules only allow outbound HTTP traffic from a SingleStore cluster. To load the following sample data from Kafka, the hostname of each broker must be explicitly added to the network rule in Snowflake.

USE ROLE ACCOUNTADMIN;
ALTER NETWORK RULE SINGLESTORE_DB_APP_APP_DATA.CONFIGURATION.SINGLESTORE_DB_APP_ALL_ACCESS_EAI_NETWORK_RULE SET
VALUE_LIST=(
'0.0.0.0:443',
'0.0.0.0:80',
'pkc-rgm37.us-west-2.aws.confluent.cloud:9092',
'b0-pkc-rgm37.us-west-2.aws.confluent.cloud:9092',
'b1-pkc-rgm37.us-west-2.aws.confluent.cloud:9092',
'b2-pkc-rgm37.us-west-2.aws.confluent.cloud:9092',
'b3-pkc-rgm37.us-west-2.aws.confluent.cloud:9092',
'b4-pkc-rgm37.us-west-2.aws.confluent.cloud:9092',
'b5-pkc-rgm37.us-west-2.aws.confluent.cloud:9092'
);
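
The first pipeline below replaces rows into a trips table whose schema is defined in the Ridesharing Demo repository. If you are adapting the example on its own, the following hypothetical DDL, inferred from the pipeline's column mapping, is a minimal stand-in; refer to the demo repository for the authoritative definition.

-- Hypothetical trips table inferred from the pipeline's column mapping.
DROP TABLE IF EXISTS trips;
CREATE TABLE trips (
id VARCHAR(255) NOT NULL,
rider_id VARCHAR(255),
driver_id VARCHAR(255),
status VARCHAR(20),
request_time DATETIME(6),
accept_time DATETIME(6),
pickup_time DATETIME(6),
dropoff_time DATETIME(6),
fare DOUBLE,
distance DOUBLE,
pickup_lat DOUBLE,
pickup_long DOUBLE,
dropoff_lat DOUBLE,
dropoff_long DOUBLE,
city VARCHAR(255),
PRIMARY KEY (id),
SORT KEY (status, city)
);
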
DROP PIPELINE IF EXISTS rideshare_kafka_trips;
CREATE OR REPLACE PIPELINE rideshare_kafka_trips AS
LOAD DATA KAFKA 'pkc-rgm37.us-west-2.aws.confluent.cloud:9092/ridesharing-sim-trips'
CONFIG '{"sasl.username": "username",
"sasl.mechanism": "PLAIN",
"security.protocol": "SASL_SSL",
"ssl.ca.location": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"}'
CREDENTIALS '{"sasl.password": "password"}'
DISABLE OUT_OF_ORDER OPTIMIZATION
REPLACE INTO TABLE trips
FORMAT JSON
(
id <- id,
rider_id <- rider_id,
driver_id <- driver_id,
status <- status,
@request_time <- request_time,
@accept_time <- accept_time,
@pickup_time <- pickup_time,
@dropoff_time <- dropoff_time,
fare <- fare,
distance <- distance,
pickup_lat <- pickup_lat,
pickup_long <- pickup_long,
dropoff_lat <- dropoff_lat,
dropoff_long <- dropoff_long,
city <- city
)
SET request_time = STR_TO_DATE(@request_time, '%Y-%m-%dT%H:%i:%s.%f'),
accept_time = STR_TO_DATE(@accept_time, '%Y-%m-%dT%H:%i:%s.%f'),
pickup_time = STR_TO_DATE(@pickup_time, '%Y-%m-%dT%H:%i:%s.%f'),
dropoff_time = STR_TO_DATE(@dropoff_time, '%Y-%m-%dT%H:%i:%s.%f');
START PIPELINE rideshare_kafka_trips;
-- Create a riders table and kafka pipeline that consumes the ridesharing-sim-riders topic and upserts data.
DROP TABLE IF EXISTS riders;
CREATE TABLE riders (
id VARCHAR(255) NOT NULL,
first_name VARCHAR(255),
last_name VARCHAR(255),
email VARCHAR(255),
phone_number VARCHAR(255),
date_of_birth DATETIME(6),
created_at DATETIME(6),
location_city VARCHAR(255),
location_lat DOUBLE,
location_long DOUBLE,
status VARCHAR(20),
PRIMARY KEY (id),
SORT KEY (status, location_city)
);
DROP PIPELINE IF EXISTS rideshare_kafka_riders;
CREATE OR REPLACE PIPELINE rideshare_kafka_riders AS
LOAD DATA KAFKA 'pkc-rgm37.us-west-2.aws.confluent.cloud:9092/ridesharing-sim-riders'
CONFIG '{"sasl.username": "username",
"sasl.mechanism": "PLAIN",
"security.protocol": "SASL_SSL",
"ssl.ca.location": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"}'
CREDENTIALS '{"sasl.password": "password"}'
DISABLE OUT_OF_ORDER OPTIMIZATION
REPLACE INTO TABLE riders
FORMAT JSON
(
id <- id,
first_name <- first_name,
last_name <- last_name,
email <- email,
phone_number <- phone_number,
@date_of_birth <- date_of_birth,
@created_at <- created_at,
location_city <- location_city,
location_lat <- location_lat,
location_long <- location_long,
status <- status
)
SET date_of_birth = STR_TO_DATE(@date_of_birth, '%Y-%m-%dT%H:%i:%s.%f'),
created_at = STR_TO_DATE(@created_at, '%Y-%m-%dT%H:%i:%s.%f');
START PIPELINE rideshare_kafka_riders;
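
As an optional health check, you can confirm that both pipelines are running and surface any ingestion errors.

SHOW PIPELINES;

SELECT * FROM information_schema.PIPELINES_ERRORS
WHERE pipeline_name IN ('rideshare_kafka_trips', 'rideshare_kafka_riders');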

Refer to Load Data from Kafka for more information.

Iceberg

Apache Iceberg is an open table format that simplifies data management by enabling seamless and scalable data storage across different platforms without the need for complex ETL processes. If the data you want to use with SingleStore is already in Snowflake, Iceberg is the easiest and most cost-effective way to move it to SingleStore. You can create Iceberg tables in Snowflake and ingest them directly into SingleStore with no other tools and without consuming any Snowflake credits.
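
For example, an Iceberg table can be created on the Snowflake side with standard Snowflake DDL. In this minimal sketch, the table columns and the external volume name my_iceberg_volume are placeholders, and the volume must already be configured in your account; the SingleStore-side ingestion steps are covered in the Iceberg Ingest documentation referenced below.

CREATE ICEBERG TABLE trips_iceberg (
id STRING,
fare DOUBLE,
city STRING
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'my_iceberg_volume'
BASE_LOCATION = 'trips_iceberg/';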

Refer to Snowflake’s Apache Iceberg™ tables, Iceberg Ingest, and Get More From Your Data with SingleStore + Snowflake Integration video for more information.

Ridesharing Demo

To help demonstrate what the combination of Snowflake data, the SingleStore app, and Iceberg tables can deliver, SingleStore has created a real-time ridesharing simulation. This simulation demonstrates how an existing analytics app built on top of Snowflake data can be augmented with real-time analytics powered by SingleStore. The Iceberg Integration video provides an overview of SingleStore’s Iceberg integration.

The instructions in the GitHub repository show how to set up SingleStore and the ridesharing simulation in SPCS, ingest Snowflake data through Iceberg, and connect existing Kafka streams to feed in real-time information, all without sacrificing ownership of the Snowflake data.

To build the ridesharing simulation in your own cluster, refer to Ridesharing Simulation in Snowpark.

Manage the Cluster

SingleStore App Interface

The SingleStore app provides the following functions for managing a cluster.

• View the Cluster Status: The selected cluster's status is displayed in the SingleStore app and in the main dashboard when SingleStore Studio first opens.

• Upgrade SingleStore: Upgrades the cluster to the latest version of SingleStore.

• Upgrade Toolbox: Upgrades the cluster to the latest version of Toolbox.

• Suspend a Cluster: Suspends a cluster. When a cluster is suspended, the cluster name and connection endpoint are preserved, but the cluster does not process queries or consume credits. Storage metering is unaffected.

• Resume a Cluster: Resumes a suspended cluster.

• Repair a Cluster: If the cluster is in an unhealthy state, all stopped services will be resumed and the Toolbox service will be restarted to repair the cluster.

• Delete a Cluster: Deletes a cluster. Deleting a cluster stops all running queries and permanently removes the cluster and its data.

• Toolbox Management Console: Provides Toolbox-level access to the cluster via a command line. Note: As this feature can render a cluster unusable, use it with caution. If an error is displayed while opening the Toolbox Management Console, try opening the link (as shown in the address bar) in a private/incognito window; you may be prompted to log into Snowflake.

• Get Service Logs: Obtains the service logs using the SYSTEM$GET_SERVICE_LOGS function. The result is saved on the @workspace stage of the selected cluster.

• Get Cluster Report: Runs the sdb-report collect command on the cluster. By default, all standard collectors are run. Standard collectors can be excluded, and additional collectors can be included. The result is saved on the @workspace stage of the selected cluster.

Snowflake Worksheet

You may also interact with and manage a SingleStore cluster directly from a SQL worksheet in Snowflake using the following stored procedures.

• CORE.GET_APP_S2_HOST_ENDPOINT(INSTANCE_NAME): Obtain the TCP host for connecting to the cluster.

• CORE.GET_APP_ENDPOINT(INSTANCE_NAME): Retrieve the SingleStore Studio and TTYD endpoints.

• CORE.SQL_COMMAND(INSTANCE_NAME, USER, PASSWORD, SQL): Run SQL commands directly against a cluster.

• CORE.LIST_APP_INSTANCE(INSTANCE_NAME): List all services within the cluster.

• CORE.CLUSTER_UPGRADE(INSTANCE_NAME): Upgrade SingleStore for all nodes in a cluster.

• CORE.TOOLBOX_SERVICE_UPGRADE(INSTANCE_NAME): Upgrade the Toolbox service for a cluster.

• CORE.STOP_APP_INSTANCE(INSTANCE_NAME): Suspend a cluster.

• CORE.START_APP_INSTANCE(INSTANCE_NAME): Resume a cluster.

• CORE.DROP_APP_INSTANCE(INSTANCE_NAME): Permanently delete a cluster. This action is irreversible and requires confirmation.
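
For example, the following worksheet sketch retrieves the cluster endpoint, runs a query, and suspends and resumes the cluster. The cluster name MY_CLUSTER and the password placeholder are assumptions used for illustration.

-- Retrieve the TCP endpoint for connecting external tools to the cluster
CALL CORE.GET_APP_S2_HOST_ENDPOINT('MY_CLUSTER');

-- Run a query against the cluster as the root database user
CALL CORE.SQL_COMMAND('MY_CLUSTER', 'root', '<root-password>', 'SELECT NOW()');

-- Suspend the cluster, then resume it later
CALL CORE.STOP_APP_INSTANCE('MY_CLUSTER');
CALL CORE.START_APP_INSTANCE('MY_CLUSTER');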

Obtain Support

To obtain support for the SingleStore app, either email spcs@singlestore.com or file a Support ticket.

Last modified: October 22, 2024
