Migrating Data from Another Database

Welcome to SingleStore Helios! We put together a guide to help you migrate your analytical application to SingleStore. Use this guide to follow a top-down approach to:

Extracting your data from your existing database
Interacting with SingleStore
Designing your tables to be optimized for your applications
Importing data from your originating database
Optimizing your queries for your applications
Connecting your application to SingleStore

Tutorial Outline

This tutorial is divided into the following sections:

Getting Started

Benefits of leveraging S2MS for your application
What to consider when bringing your application data from:
- MySQL, MariaDB, Postgres
- AWS, GCP, Azure
What to consider when designing your database coming from:
- MySQL, MariaDB, Postgres
- AWS, GCP, Azure

Preparing Your Data for SingleStore

Database Administration

Cloud Portal
Drivers
MySQL Workbench
MySQL Command Line

Designing Your Tables

Data types
Designing a schema for an application
- Sharding for application workloads
- Sort keys for application workloads
- Reference Tables

Ingesting Data

Using SingleStore Pipelines
Using INSERT Statements

Testing Your Queries and Performance

Run queries
Visual Explain
Concurrency test your queries

Connecting Your Application Development Tools

Code samples to connect to SingleStore

Getting Started

We are excited for you to get started migrating your application from your existing database to SingleStore Helios (S2MS). We have many users that have migrated internal and customer-facing applications to S2MS after they found that other solutions (MySQL, MariaDB, Postgres, etc.) were not meeting their needs. Users often find improved data ingestion speed and query latency, and better concurrency support using S2MS as their backend database instead of those platforms.

Some terminology used in this guide:

App/Application: Web- or mobile-based, customer or internal facing application.
Object storage: Cloud-based data repository.

Let’s get started by discussing some of the basic things to consider when bringing your data over from one of these databases.

Things to Consider

For users starting with a database already hosted in a cloud provider like AWS, GCP, or Azure, you may already have your data sitting in object storage (if not, we walk through how to do this later). The feature called Pipelines, allows you to bring in data from any of these places very quickly. We’ll go through how to do this in a bit. Alternatively, you can use mysqldump, since SingleStore follows the MySQL wire protocol.

Now, of course, every database is different. Let’s talk a bit about database design differences.

Things to know, regardless of your current platform:

SingleStore Helios can be deployed in any of the three major clouds, in any region – so you don’t have to worry about moving regions.

MySQL or MariaDB (for example, open source, AWS RDS MariaDB, Google Cloud SQL, Azure DB for MySQL, etc.)

S2MS is MySQL wire-compatible, making it very easy to transition any of these above databases.
S2MS is a distributed database system offering simple, powerful sharding capabilities through a shard key, which we will explain later.
S2MS will not validate whether foreign keys exist when your data changes, though you can still have foreign keys in your app.

PostgreSQL (for example, open source, EnterpriseDB, CitusDB)

S2MS primarily follows MySQL syntax, so you will have to re-write queries.
S2MS is distributed, and provides robust support for all data types including JSON.
S2MS will not validate whether foreign keys exist when your data changes, though you can still have foreign keys in your app.

Preparing Your Data for SingleStore

SingleStore has simple, powerful methods of bringing data in from object storage. Here are some options for getting your existing databases exported to object storage in places like S3, GCS and Azure Blob Storage.

Is your data already in cloud storage? Feel free to move on to the next section.

Existing Managed Cloud Databases:

AWS RDS to AWS S3
Google Cloud SQL to GCS
Azure Database to Blob Storage

For other databases that you’re looking to migrate, you can still export that data to CSV, JSON, etc. and then upload it to object storage. Typically this would involve something like a SELECT ... INTO OUTFILE ... from your existing database. From there, you can upload those files to object storage.

If you are importing data from MySQL or MariaDB, you can perform a simple mysqldump as listed in Transition from MySQL to SingleStore Helios.

Don’t want to do a bulk import? That’s fine too! After we discuss schema design, we’ll walk you through connecting directly from your application and writing data.

Database Administration

Now that we have identified our data source, let’s talk a bit about how we will interact with SingleStore.

Cloud Portal

When you signed up in the Cloud Portal, you got access to the SQL Editor and Visual Explain tools. The Cloud Portal is the best place to interact with your SingleStore data, build new data pipelines, and test out queries. Click here to explore the features in the Cloud Portal, as well as the Tutorials available through the Help button.

If you plan to use third party tools, make sure that you have the appropriate client drivers prior to using other database administration tools.

MySQL Workbench

If you’re coming from MariaDB or MySQL, you may already be comfortable with Workbench. You can download MySQL Workbench here.

Note: When defining your connection, you will need to go to the Advanced tab and insert defaultAuth=mysql_native_password in the Others: field to ensure proper authentication.

MySQL Command Line

Within your Cloud Portal, your workspace details will include a MySQL Command section. This gives you an easy way to use the command line to connect to SingleStore. You can find the download for the MySQL shell here.

Note: When defining your connection, you will need to enter this at the MySQL command line: --defaultAuth=mysql_native_password to ensure proper authentication.

Designing Your Tables

At this point, you should be using the SQL Editor or some other MySQL client to work with your cloud database. Before checking out how fast SingleStore can bring in your data, let’s make sure your tables are designed optimally for your application workload.

By default within SingleStore Helios, database tables are created using our Universal Storage format (i.e., disk-based columnstore). There are a few important things to consider when designing tables to house your data:

Data Types

If you’re familiar with relational databases, you may not need too much guidance on data types.

In addition to data types traditionally supported in relational databases, SingleStore also supports JSON and geospatial data types. Read about Data Types.

Shard Key

This key determines how data is distributed across the database workspace, and is critical to ensure that your data isn’t skewed. Skewed data can lead to longer query times. Data contained within unique values of your shard key will reside in individual partitions of the database.

Users of commercial Postgres offerings like Citus will find this familiar to, but not exactly the same as, distribution keys.
Users of MySQL in AWS RDS or GCP Cloud SQL will also be familiar with the concept of sharding.
We offer free training on sharding if you’d like to learn more!

So how do I pick a shard key best for my application workload?

If you have a primary key, make sure the shard key is a subset of it (because cardinality matters!).
If your application queries include frequent joins or filters on a specific set of columns, make sure the shard key is a subset of those.
Concurrency is very important with application workloads, so make sure your shard key allows your queries to be single partition, as explained below.

In this example, we use user_id as our shard key, which works nicely given its high cardinality as a part of this dataset. All records with the same user_id will be maintained together, which will improve query response time.

SQL

CREATE TABLE clicks (
        click_id BIGINT AUTO_INCREMENT,
        user_id INT,
        page_id INT,
        ts TIMESTAMP,
        SHARD KEY (user_id),
        SORT KEY (click_id, user_id))
);

Columnstore Key

In addition to identifying your shard key, it’s important to tell SingleStore how you would like to sort your data within each data segment. This helps enable segment elimination, which ensures a minimal amount of data needs to be read for each query. This also helps presort data for your queries.

So how do I pick a columnstore key best for my application workload?

If you have common filter columns, make sure those are in the columnstore key.
If you’re inserting in order by some column, it’s best to put that column first in the columnstore key.
Lower cardinality columns should be first in the columnstore key.

In this example, we use price as our sort key, so items are sorted in order of that column when queried.

SQL

CREATE TABLE products (
     ProductId INT,
     Color VARCHAR(10),
     Price INT,
     Qty INT,
     SORT KEY (Price),
     SHARD KEY (ProductId)
);

Reference Tables

If you have small, infrequently changing table(s) that are required for joins, consider making them Reference Tables.

Reference tables are a convenient way to recreate dimension tables that you may use in MySQL, MariaDB, or PostgreSQL.
Reference tables are replicated to each leaf in the workspace ensuring data does not need to go across the network between partitions to join data.
Reference table commands need to be run from the endpoint listed in the Cloud Portal.

Data Ingest

Now for the fun part, ingesting data! This is where things may look a bit different to you compared to other databases, because SingleStore has this unique ingest capability called Pipelines that supports high-frequency, parallel ingest of data from sources like S3, Azure Blob, GCS, Kafka, etc. Skip-list indexes and concurrent versioning technologies allow these writes to not lock tables, allowing reads to continue unimpacted during ingest.

Previously, we discussed how to bring your data from MySQL, MariaDB, etc. into object storage. Now it’s time to go fetch that data from your object storage and import it into SingleStore.

SingleStore Pipelines

To use a Pipeline to import data into SingleStore, write a CREATE PIPELINE statement using the SQL Editor or a MySQL client.

A few things to consider:

Make sure that your security settings in your blob storage will allow for access from SingleStore. For example, AWS S3 security settings can be found here.
Make sure your buckets are not public, but you should be able to obtain an access and secret key using the AWS doc here.
You can use wildcard notation when identifying your files from within the bucket.

Here’s an example of a CREATE PIPELINE statement:

SQL

CREATE PIPELINE library
AS LOAD DATA S3 'my-bucket-name'
CONFIG '{"region": "us-west-1", "suffixes": ["csv"]}'
CREDENTIALS '{"aws_access_key_id": "your_access_key_id", "aws_secret_access_key": "your_secret_access_key"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';

GCP Documentation:

Load Data from Google Cloud Storage (GCS) Using a Pipeline

Bucket Security

Azure Documentation:

Azure Blob Pipelines Quickstart

Bucket Security

INSERT Statements

Perhaps you already have an application that is writing to your existing database, and you simply want to redirect the writes from that application to SingleStore. That’s great! Many of our users do this with the MySQL JDBC driver. See INSERT for more examples on writing your insert statements.

Testing Your Queries and Performance

Running Queries

Hopefully at this point you have your data in SingleStore. You can check this by running some basic SELECT statements within the SQL Editor. Next, you may want to try out some queries that you were running with your last database. Generally, you’ll want to run queries twice to get a true understanding of the runtime as the first run must create and cache the query plan.

Visual Explain

One great feature of the Cloud Portal is our Visual Explain functionality. If you encounter a situation in which your query is taking longer than expected, highlight the desired query in the SQL Editor and click the Visual Explain icon to the left of the Run button. The icon resembles a tree. After clicking, you may then choose between EXPLAIN and PROFILE.

Once you identify a bottleneck, you should be able to make changes either to your schema or to your query itself in order to improve speed.

You can manually (non-visually) run EXPLAIN or PROFILE from any client; see the links above for details on the commands.

Benchmarking

If you’re moving over from any of the databases we’ve talked about so far, you are probably interested in improving performance. We have developed an easy-to-use tool for benchmarking called dbbench. You can check that out here.

Once you’ve installed the packages to your host machine, you can walk through this tutorial. All you will have to do is change the host from 127.0.0.1 to the endpoint listed for your workspace in the Cloud Portal (it should look something like: svc-xxx-dml.aws-virginia-1.db.memsql.com). The port will remain 3306, as listed. You can then move onto testing based on your application workload.

Connecting to Your Application Development Tools

We've created tutorials on how to connect to SingleStore using a variety of different frameworks, which you can find in the list below.

The tutorials below show how to leverage both SingleStore Helios and SingleStore Self-Managed (our self-managed product). You can skip to the part of each that details how to make the connection and then get started. Again, you'll be using your endpoint here provided in the Cloud Portal, along with the credentials you used when spinning up your workspace.

JavaScript / Node

SQL: https://github.com/singlestore-labs/start-with-singlestore-node

Stored Procedures: https://github.com/singlestore-labs/start-with-singlestore-node-stored-procedure

SQL: https://github.com/singlestore-labs/start-with-singlestore-csharp

Stored procedures: https://github.com/singlestore-labs/start-with-singlestore-csharp-stored-procedure

Java

SQL: https://github.com/singlestore-labs/start-with-singlestore-java

Stored procedures: https://github.com/singlestore-labs/start-with-singlestore-java-stored-procedure

SQL: https://github.com/singlestore-labs/start-with-singlestore-go

Stored procedures: https://github.com/singlestore-labs/start-with-singlestore-go-stored-procedure

Ruby

SQL: https://github.com/singlestore-labs/start-with-singlestore-ruby