Options for Loading Data Into SingleStore min read


Info

This topic discusses three options for loading data into SingleStore DB. SingleStore Managed Service supports options 2 and 3. To use option 3 with SingleStore Managed Service, use the admin endpoint.

This guide covers various ways of loading data into the system. We recommend trying option 1, and if that is not sufficient, trying option 2 then option 3.

  • Option 1: Loading from a file
  • Option 2: Streaming data using SingleStore Pipelines.
  • Option 3: Loading from MySQL

Option 1: Loading Data stored in a file

SingleStore DB supports standard SQL loading constructs. For instance, after creating tbl_name, run:

LOAD DATA INFILE 'file_name.tsv' INTO TABLE tbl_name
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'

For more details see LOAD DATA. For a tutorial on using LOAD DATA, see Build a Sample Stock Trade Database.

Option 2: Streaming Data Using SingleStore Pipelines

Data can be streamed from a variety of sources directly to SingleStore using SingleStore Pipelines. Pipelines currently supports streaming data from the following sources:

  • Kafka
  • Amazon S3
  • Azure Blob
  • Google Cloud Service
  • A Filesystem (not supported for SingleStore Managed Service)

You can set up and start running a SingleStore Pipeline using two simple commands:

  1. CREATE PIPELINE my_pipline AS LOAD DATA...
  2. START PIPELINE;

The required arguments for CREATE PIPELINE are specific to each source. For more information on getting started with the source of your choosing, please see the following Quickstart guides:

Option 3: Loading from MySQL

The popular open source tool mysqldump is a straightforward way to load data from MySQL into SingleStore DB. For example, log into a machine running a MySQL database (named foo in the following examples), and issue the following commands:

$ mysqldump -h 127.0.0.1 -u root --databases foo > foodump.sql

You can also dump specific tables by using the --tables option:

$ mysqldump -h 127.0.0.1 -u root --databases foo --tables tbl_name > foodump.sql

Since SingleStore DB is a distributed database, you need to tell SingleStore DB how to shard the data across the cluster. If all your tables have primary keys, SingleStore DB will automatically use the primary keys to shard them, so you can skip this next step. Otherwise, you need to open up foodump.sql in your favorite editor and either add shard keys or declare the tables as reference tables (for more details, see How to Port Your Applications to SingleStore DB.

For instance, suppose foodump.sql contains

CREATE TABLE users (
  id bigint(20) NOT NULL,
  username varchar(11) DEFAULT NULL,
  address varchar(10)
);

To shard this table on the id column, use the following statement:

CREATE TABLE users (
  id bigint(20) NOT NULL,
  username varchar(11) DEFAULT NULL,
  address varchar(10),
  SHARD KEY(id)
);

For more information on choosing a shard key, see Primary Key as the Shard Key. Once all your tables have either shard key or primary key declarations, you can run the following on the master aggregator:

mysql -h 127.0.0.1 -u root < foodump.sql

All tables and data from users will now be loaded into SingleStore DB.