Migrating Data from Another Database
On this page
Welcome to SingleStore Helios! We put together a guide to help you migrate your analytical application to SingleStore.
-
Extracting your data from your existing database
-
Interacting with SingleStore
-
Designing your tables to be optimized for your applications
-
Importing data from your originating database
-
Optimizing your queries for your applications
-
Connecting your application to SingleStore
Tutorial Outline
This tutorial is divided into the following sections:
Getting Started
-
Benefits of leveraging S2MS for your application
-
What to consider when bringing your application data from:
-
MySQL, MariaDB, Postgres
-
AWS, GCP, Azure
-
-
What to consider when designing your database coming from:
-
MySQL, MariaDB, Postgres
-
AWS, GCP, Azure
-
Preparing Your Data for SingleStore
Database Administration
-
Cloud Portal
-
Drivers
-
MySQL Workbench
-
MySQL Command Line
Designing Your Tables
-
Data types
-
Designing a schema for an application
-
Sharding for application workloads
-
Sort keys for application workloads
-
Reference Tables
-
Ingesting Data
-
Using SingleStore Pipelines
-
Using INSERT Statements
Testing Your Queries and Performance
-
Run queries
-
Visual Explain
-
Concurrency test your queries
Connecting Your Application Development Tools
-
Code samples to connect to SingleStore
Getting Started
We are excited for you to get started migrating your application from your existing database to SingleStore Helios (S2MS).
Some terminology used in this guide:
-
App/Application: Web- or mobile-based, customer or internal facing application.
-
Object storage: Cloud-based data repository.
Let’s get started by discussing some of the basic things to consider when bringing your data over from one of these databases.
Things to Consider
For users starting with a database already hosted in a cloud provider like AWS, GCP, or Azure, you may already have your data sitting in object storage (if not, we walk through how to do this later).mysqldump
, since SingleStore follows the MySQL wire protocol.
Now, of course, every database is different.
Things to know, regardless of your current platform:
-
SingleStore Helios can be deployed in any of the three major clouds, in any region – so you don’t have to worry about moving regions.
MySQL or MariaDB (for example, open source, AWS RDS MariaDB, Google Cloud SQL, Azure DB for MySQL, etc.
-
S2MS is MySQL wire-compatible, making it very easy to transition any of these above databases.
-
S2MS is a distributed database system offering simple, powerful sharding capabilities through a shard key, which we will explain later.
-
S2MS will not validate whether foreign keys exist when your data changes, though you can still have foreign keys in your app.
PostgreSQL (for example, open source, EnterpriseDB, CitusDB)
-
S2MS primarily follows MySQL syntax, so you will have to re-write queries.
-
S2MS is distributed, and provides robust support for all data types including JSON.
-
S2MS will not validate whether foreign keys exist when your data changes, though you can still have foreign keys in your app.
Preparing Your Data for SingleStore
SingleStore has simple, powerful methods of bringing data in from object storage.
Is your data already in cloud storage? Feel free to move on to the next section.
Existing Managed Cloud Databases:
-
Azure Database to Blob Storage
For other databases that you’re looking to migrate, you can still export that data to CSV, JSON, etc.SELECT .
from your existing database.
If you are importing data from MySQL or MariaDB, you can perform a simple mysqldump
as listed in Transition from MySQL to SingleStore Helios.
Don’t want to do a bulk import? That’s fine too! After we discuss schema design, we’ll walk you through connecting directly from your application and writing data.
Database Administration
Now that we have identified our data source, let’s talk a bit about how we will interact with SingleStore.
Cloud Portal
When you signed up in the Cloud Portal, you got access to our SQL Editor and Visual Explain tools.
If you plan to use third party tools, make sure that you have the appropriate client drivers prior to using other database administration tools.
MySQL Workbench
If you’re coming from MariaDB or MySQL, you may already be comfortable with Workbench.
Note: When defining your connection, you will need to go to the Advanced tab and insert defaultAuth=mysql_
in the Others:
field to ensure proper authentication.
MySQL Command Line
Within your Cloud Portal, your workspace details will include a MySQL Command section.
Note: When defining your connection, you will need to enter this at the MySQL command line: --defaultAuth=mysql_
to ensure proper authentication.
Designing Your Tables
At this point, you should be using the SQL Editor or some other MySQL client to work with your cloud database.
By default within SingleStore Helios, database tables are created using our Universal Storage format (i.
Data Types
If you’re familiar with relational databases, you may not need too much guidance on data types.
In addition to data types traditionally supported in relational databases, SingleStore also supports JSON and geospatial data types.
Shard Key
This key determines how data is distributed across the database workspace, and is critical to ensure that your data isn’t skewed.
-
Users of commercial Postgres offerings like Citus will find this familiar to, but not exactly the same as, distribution keys.
-
Users of MySQL in AWS RDS or GCP Cloud SQL will also be familiar with the concept of sharding.
-
We offer free training on sharding if you’d like to learn more!
So how do I pick a shard key best for my application workload?
-
If you have a primary key, make sure the shard key is a subset of it (because cardinality matters!).
-
If your application queries include frequent joins or filters on a specific set of columns, make sure the shard key is a subset of those.
-
Concurrency is very important with application workloads, so make sure your shard key allows your queries to be single partition, as explained below.
In this example, we use user_
as our shard key, which works nicely given its high cardinality as a part of this dataset.user_
will be maintained together, which will improve query response time.
CREATE TABLE clicks (click_id BIGINT AUTO_INCREMENT,user_id INT,page_id INT,ts TIMESTAMP,SHARD KEY (user_id),SORT KEY (click_id, user_id)));
Columnstore Key
In addition to identifying your shard key, it’s important to tell SingleStore how you would like to sort your data within each data segment.
So how do I pick a columnstore key best for my application workload?
-
If you have common filter columns, make sure those are in the columnstore key.
-
If you’re inserting in order by some column, it’s best to put that column first in the columnstore key.
-
Lower cardinality columns should be first in the columnstore key.
In this example, we use price
as our sort key, so items are sorted in order of that column when queried.
CREATE TABLE products (ProductId INT,Color VARCHAR(10),Price INT,Qty INT,SORT KEY (Price),SHARD KEY (ProductId));
Reference Tables
If you have small, infrequently changing table(s) that are required for joins, consider making them Reference Tables.
-
Reference tables are a convenient way to recreate dimension tables that you may use in MySQL, MariaDB, or PostgreSQL.
-
Reference tables are replicated to each leaf in the workspace ensuring data does not need to go across the network between partitions to join data.
-
Reference table commands need to be run from the endpoint listed in the Cloud Portal.
Data Ingest
Now for the fun part, ingesting data! This is where things may look a bit different to you compared to other databases, because SingleStore has this unique ingest capability called Pipelines that supports high-frequency, parallel ingest of data from sources like S3, Azure Blob, GCS, Kafka, etc.
Previously, we discussed how to bring your data from MySQL, MariaDB, etc.
SingleStore Pipelines
To use a Pipeline to import data into SingleStore, write a CREATE PIPELINE statement using our SQL Editor or a MySQL client.
A few things to consider:
-
Make sure that your security settings in your blob storage will allow for access from SingleStore.
For example, AWS S3 security settings can be found here. -
Make sure your buckets are not public, but you should be able to obtain an access and secret key using the AWS doc here.
-
You can use wildcard notation when identifying your files from within the bucket.
Here’s an example of a CREATE PIPELINE
statement:
CREATE PIPELINE libraryAS LOAD DATA S3 'my-bucket-name'CONFIG '{"region": "us-west-1", "suffixes": ["csv"]}'CREDENTIALS '{"aws_access_key_id": "your_access_key_id", "aws_secret_access_key": "your_secret_access_key"}'INTO TABLE `classic_books`FIELDS TERMINATED BY ',';
GCP Documentation:
Load Data from Google Cloud Storage (GCS) Using a Pipeline
Azure Documentation:
INSERT Statements
Perhaps you already have an application that is writing to your existing database, and you simply want to redirect the writes from that application to SingleStore.
Testing Your Queries and Performance
Running Queries
Hopefully at this point you have your data in SingleStore.
Visual Explain
One great feature of the Cloud Portal is our Visual Explain functionality.
Once you identify a bottleneck, you should be able to make changes either to your schema or to your query itself in order to improve speed.
You can manually (non-visually) run EXPLAIN
or PROFILE
from any client; see the links above for details on the commands.
Benchmarking
If you’re moving over from any of the databases we’ve talked about so far, you are probably interested in improving performance.
Once you’ve installed the packages to your host machine, you can walk through this tutorial.svc-xxx-dml.
).3306
, as listed.
Connecting to Your Application Development Tools
We've created tutorials on how to connect to SingleStore using a variety of different frameworks, which you can find in the list below.
The tutorials below show how to leverage both SingleStore Helios and SingleStore Self-Managed (our self-managed product).
JavaScript / Node
SQL: https://github.
Stored Procedures: https://github.
C#
SQL: https://github.
Stored procedures: https://github.
Java
SQL: https://github.
Stored procedures: https://github.
Go
SQL: https://github.
Stored procedures: https://github.
Ruby
SQL: https://github.
Stored procedures: https://github.
Last modified: September 27, 2023