SingleStore DB

Online Deployment Using YAML File - Debian Distribution

Introduction

You can install SingleStore DB on bare metal, on virtual machines, or in the cloud using popular configuration management tools or SingleStore’s own management tools.

In this guide, you will deploy a SingleStore DB cluster onto physical or virtual machines and connect to the cluster using SingleStore DB Studio, our monitoring, profiling, and debugging tool.

A four-node cluster is the minimal recommended cluster size for showcasing SingleStore DB as a distributed database with high availability; however, you can use the procedures in this tutorial to scale out to additional nodes for increased performance over large data sets or to handle higher concurrency loads. To learn more about SingleStore’s design principles and topology concepts, see Distributed Architecture.

Notice

There are no licensing costs for using up to four license units for the leaf nodes in your cluster. If you need a larger cluster with more or larger leaf nodes, please create an Enterprise License trial key.

Prerequisites

For this tutorial you will need:

  • One host (for a single-host cluster-in-a-box deployment for development) or four physical or virtual machines (hosts) with the following:

    • Each SingleStore DB node requires at least four (4) x86_64 CPU cores and eight (8) GB of RAM per host

    • Eight (8) vCPU and 32 GB of RAM are recommended for leaf nodes to align with license unit calculations

    • Running a 64-bit version of RHEL/CentOS 6 or higher, or Debian 8 or higher, with kernel 3.10 or higher

    • Port 3306 open on all hosts for intra-cluster communication. This default can be changed in the cluster file.

    • Port 8080 open on the main deployment host for the cluster

    • A non-root user with sudo privileges, available on all hosts in the cluster, that can be used to run SingleStore DB services and own the corresponding runtime state

  • SSH access to all hosts (installing and using ssh-agent is recommended for SSH keys with passwords).

    • If using SSH keys, make sure the identity key used on the main deployment host can be used to log in to the other hosts.

    • Refer to How to Setup Passwordless SSH Login for more information on using SSH without a password; a brief key-setup example follows this list.

  • A connection to the Internet to download required packages
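
For example, to set up key-based SSH from the main deployment host to the other hosts, you might generate a key pair and copy the public key to each host. This is a minimal sketch; the key path, user, and host placeholders are illustrative:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub <user>@<other-host>
ssh <user>@<other-host>   # confirm login succeeds without a password prompt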

If running this in a production environment, it is highly recommended that you follow our host configuration recommendations for optimal cluster performance.

Duplicate Hosts

As of SingleStore DB Toolbox 1.4.4, a check for duplicate hosts is performed before SingleStore DB is deployed, and will display a message similar to the following if more than one host has the same SSH host key:

✘ Host check failed. host 172.26.212.166 has the same ssh
host keys as 172.16.212.165, toolbox doesn't support
registering the same host twice

Confirm that all specified hosts are indeed different and aren’t using identical SSH host keys. Identical host keys can be present if you have instantiated your host instances from images (AMIs, snapshots, etc.) that contain existing host keys. When a host is cloned, the host key (typically stored in /etc/ssh/ssh_host_<cipher>_key) will also be cloned.

As each cloned host will have the same host key, an SSH client cannot verify that it is connecting to the intended host. The script that deploys SingleStore DB will interpret a duplicate host key as an attempt to deploy to the same host twice, and the deployment will fail.
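
To check whether two hosts share a host key, compare host key fingerprints on each host. A minimal sketch, assuming the common RSA host key filename (adjust for the key types present in /etc/ssh/):

ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub

If two hosts print the same fingerprint, they are using cloned host keys and the remedy below applies.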

The following steps demonstrate a potential remedy for the duplicate hosts message. Please note these steps may slightly differ depending on your Linux distribution and configuration.

$ sudo su -
# ls -al /etc/ssh/
# rm /etc/ssh/<your-ssh-host-keys>
# ssh-keygen -f /etc/ssh/<ssh-host-key-filename> -N '' -t rsa1
# ssh-keygen -f /etc/ssh/<ssh-host-rsa-key-filename> -N '' -t rsa
# ssh-keygen -f /etc/ssh/<ssh-host-dsa-key-filename> -N '' -t dsa
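
On newer systems, where the rsa1 and dsa key types are no longer supported by ssh-keygen, a simpler alternative is to remove the existing keys and regenerate all default host key types in one step. This is a sketch; verify the exact filenames and procedure for your distribution:

# rm /etc/ssh/ssh_host_*key*
# ssh-keygen -A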

For more information about SSH host keys, including the equivalent steps for Ubuntu-based systems, refer to Avoid Duplicating SSH Host Keys.

As of SingleStore DB Toolbox 1.5.3, sdb-deploy setup-cluster supports an --allow-duplicate-host-fingerprints option that can be used to ignore duplicate SSH host keys.

Network Configuration

Depending on the host and its function in deployment, some or all of the following port settings should be enabled on hosts in your cluster.

These routing and firewall settings must be configured to:

  • Allow database clients (e.g. your application) to connect to the SingleStore DB aggregators

  • Allow all nodes in the cluster to talk to each other over the SingleStore DB protocol (3306)

  • Allow you to connect to management and monitoring tools

  • TCP 22 (Inbound and Outbound): For host access. Required between nodes in SingleStore DB tool deployment scenarios. Also useful for remote administration and troubleshooting on the main deployment host.

  • TCP 443 (Outbound): To get the public repo key for package verification. Required for nodes downloading SingleStore APT or YUM packages.

  • TCP 3306 (Inbound and Outbound): Default port used by SingleStore DB. Required on all nodes for intra-cluster communication. Also required on aggregators for client connections.

  • TCP 8080 (Inbound and Outbound): Default port for SingleStore DB Studio. Only required for the host running Studio.

The service port values are configurable if the default values cannot be used in your deployment environment.

We also highly recommend configuring your firewall to prevent other hosts on the Internet from connecting to SingleStore DB.
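
For example, on a Debian host that uses ufw, the firewall rules might look like the following. This is a sketch only: the 10.0.0.0/24 subnet is an illustrative stand-in for your cluster's private network, and the 8080 rule is only needed on the host running Studio:

sudo ufw allow 22/tcp
sudo ufw allow from 10.0.0.0/24 to any port 3306 proto tcp
sudo ufw allow 8080/tcp
sudo ufw enable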

Install SingleStore Tools

The first step in deploying your cluster is to download and install the SingleStore Tools on one of the hosts in your cluster. This host will be designated as the main deployment host for deploying SingleStore DB across your other hosts and setting up your cluster.

These tools perform all major cluster operations including downloading the latest version of SingleStore DB onto your hosts, assigning and configuring nodes in your cluster, and other management operations. For the purpose of this guide, the main deployment host is the same as the designated Master Aggregator of the SingleStore DB cluster.

Note: If SingleStore DB is installed as a sudo user via packages, systemd will automatically start the associated SingleStore DB processes when a host is rebooted.

Online Installation - Debian Distribution
  1. SingleStore packages are signed to ensure integrity, so the GPG key needs to be added to this machine. When done, verify that the SingleStore signing key has been added using apt-key list.

    wget -O - 'https://release.memsql.com/release-aug2018.gpg'  2>/dev/null | sudo apt-key add - && apt-key list
  2. Verify you have apt-transport-https installed.

    apt-cache policy apt-transport-https

    If apt-transport-https is not installed, you must install it before proceeding.

    sudo apt -y install apt-transport-https
  3. Add the SingleStore repository to retrieve its packages.

    echo "deb [arch=amd64] https://release.memsql.com/production/debian memsql main" | sudo tee /etc/apt/sources.list.d/memsql.list
  4. To install the management tools, client application, and SingleStore DB Studio, run the following:

    sudo apt update && sudo apt -y install singlestoredb-toolbox singlestore-client singlestoredb-studio
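
To confirm that all three packages installed successfully, you can re-use the apt-cache check from step 2; each package should report an installed version rather than (none):

apt-cache policy singlestoredb-toolbox singlestore-client singlestoredb-studio
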
Deploy SingleStore DB
Prerequisites

Warning

Before deploying a SingleStore DB cluster in a production environment, please review and follow the host configuration recommendations.

Failing to follow these recommendations will result in sub-optimal cluster performance.

Notes on Users and Groups

The user that deploys SingleStore DB via SingleStore DB Toolbox must be able to SSH to each host in the cluster. When singlestoredb-server is installed via an RPM or Debian package during deployment, a memsql user and group are also created on each host in the cluster.

This memsql user does not have a shell, and attempting to log in or SSH as this user will fail. The user that deploys SingleStore DB is added to the memsql group, whose members can perform most Toolbox operations without escalating to sudo. Any other users who need to run SingleStore DB Toolbox commands must be added to the memsql group on each host in the cluster, and must also be able to SSH to each host.

Manually creating a memsql user and group is only recommended in a sudo-less environment when performing a tarball-based deployment of SingleStore DB. In order to run SingleStore DB Toolbox commands against a cluster, this manually-created memsql user must be configured so that it can SSH to each host in the cluster.
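
For example, to allow an existing user to run Toolbox commands on a host, you might add them to the memsql group (the username is a placeholder; repeat on each host in the cluster):

sudo usermod -aG memsql <username>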

Minimal Deployment

SingleStore DB has been designed to be deployed with at least two nodes:

  • A Master Aggregator node that runs SQL queries and aggregates the results, and

  • A single leaf node, which is responsible for storing and processing data

These two nodes can be deployed on a single host (via the cluster-in-a-box option), or on two hosts, with one SingleStore DB node on each host.

While additional aggregators and nodes can be added and removed as required, a minimal deployment of SingleStore DB always consists of at least these two nodes.
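
As a sketch, a minimal single-host (cluster-in-a-box) deployment could be described by a cluster file like the following, using the template fields documented in the next section; the license, version, and password values are placeholders:

license: <license-from-portal.singlestore.com>
memsql_server_version: 7.3
package_type: deb
root_password: <secure-password>
hosts:
- hostname: 127.0.0.1
  localhost: true
  nodes:
  - register: false
    role: Master
    config:
      port: 3306
  - register: false
    role: Leaf
    config:
      port: 3307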

Online Deployment Using YAML File

As of SingleStore DB Toolbox 1.3.0, the sdb-deploy setup-cluster command now accepts a YAML-based cluster configuration file (or simply “cluster file”), the format of which is validated before attempting to set up the specified cluster. Using a cluster file is the recommended method for creating new SingleStore DB clusters.

The command is designed to be idempotent: re-running sdb-deploy setup-cluster with the same cluster file will always produce the same cluster. This method is also resilient, allowing errors encountered at any stage of cluster construction to be corrected and sdb-deploy setup-cluster re-run to generate the desired cluster.

Complete Cluster File Template
license:                    <LICENSE | /path/to/LICENSE-file> [Required to bootstrap Master Aggregator]
high_availability:          <true | false>
memsql_server_version:      <the version of memsql you want to install (6.7+)>
memsql_server_file_path:    <path to the downloaded memsql server file>
package_type:               <deb|rpm|tar> [Required if multiple package present]
root_password:              <default password to be used for all nodes>
optimize:                   <true | false>
optimize_config:
  memory_percentage:        <percentage of memory you want memsql to use>
  no_numa:                  <true|false>
hosts:
- hostname:                 <host-name> [Required]
  localhost:                <true | false> 
  memsqlctl_path:           <path to memsqlctl> [ADVANCED]
  memsqlctl_config_path:    <path to memsqlctl config> [ADVANCED]
  tar_install_dir:          <path to tar install dir> [ADVANCED]
  tar_install_state:        <path to tar install state> [ADVANCED]
  ssh:                      [Required for remote Hosts]
    host:                   <ssh host name>
    port:                   <ssh port>
    user:                   <ssh user>
    private_key:            <path to your identity key>
  nodes:
  - register:               <true | false>
    role:                   <Unknown | Master | Leaf | Aggregator> (case sensitive) [Required]
    availability_group:     <availability group>
    no_start:               <true | false>
    config: 
      auditlogsdir:         <path to auditlogs directory> [ADVANCED]
      baseinstalldir:       <path to base install directory> [ADVANCED]
      configpath:           <path to configuration path> [ADVANCED] [Required if register is true]
      disable_auto_restart: <true | false>
      password:             <password>
      plancachedir:         <path to plancache directory> [ADVANCED]
      port:                 <port number> [Required for node creation]
      tracelogsdir:         <path to tracelogs directory> [ADVANCED]
      bind_address:         <bind address> [ADVANCED]
Deploy a Cluster

You can deploy your own SingleStore DB cluster with your desired cluster configuration using the cluster file template above, and/or the example cluster files in the following sections.

After creating the cluster file, you can deploy the corresponding SingleStore DB cluster via the sdb-deploy setup-cluster command.

Run the following with the path to the cluster file as input.

sdb-deploy setup-cluster --cluster-file </path/to/cluster-file>
Cluster File Notes
  • high_availability: Used to enable high availability on the cluster.

    • If set to true, each node may be assigned an availability group via the availability_group field.

    • Refer to Availability Groups for more information.

  • license: Use your license from the SingleStore Customer Portal. This can be the license itself, or the full path to a text file with the license in it.

  • memsql_server_version: You may specify either a major release of SingleStore DB (such as 7.3) or a specific release (such as 7.3.10). When a major release is specified, the latest patch level of that release will be deployed.

  • register: Set the value of this field to false to create a new node. Set the value to true only if the node is already present and you want to register it with SingleStore DB Toolbox; in that case, the configpath field and value are also required. Do not set this value to true to create a new node. For more information, refer to the sdb-deploy setup-cluster reference page.

  • Indicating a Host: You may use either an IP address or a hostname when indicating a host in the cluster file.

  • Aggregator Hosts: When deploying SingleStore DB, it is recommended that you deploy each aggregator to its own individual host. If the Master Aggregator goes down, the Child Aggregators can continue to run queries and to coordinate and execute writes. In this scenario, the only operations that cannot be performed are DDL commands and reference table management, which must be done on the Master Aggregator.

  • Optimize the Cluster: It is recommended that you include the optimize field in the cluster file and set it to true. Doing so checks your current cluster configuration against a set of best practices and either makes changes to maximize performance or provides recommendations for you. For hosts with NUMA support, this command will bind the leaf nodes to specific NUMA nodes.
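
For example, to enable optimization while capping the memory SingleStore DB may use, the relevant template fields might be set as follows (the percentage is illustrative):

optimize: true
optimize_config:
  memory_percentage: 80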

Cluster File Examples

SingleStore DB uses a combination of aggregator and leaf nodes that are typically configured in a specific ratio. For more information, refer to Cluster Components.

The examples below deploy two different types of SingleStore DB cluster:

  • A multi-host, multi-node SingleStore DB cluster with four hosts, two aggregators, and two leaf nodes

  • A multi-host, multi-node SingleStore DB cluster with two hosts, a single aggregator, and two leaf nodes

These cluster file examples can be used as a starting point for deploying a SingleStore DB cluster that fulfills your specific requirements.

For this example, you will need four hosts and the ability to ssh into each host from the main deployment host.

Set package_type to either rpm for Red Hat distributions or deb for Debian distributions to download and deploy the appropriate singlestoredb-server package.

The following cluster file deploys a highly available cluster across four hosts:

license: <license-from-portal.singlestore.com>
high_availability: true
memsql_server_version: 7.3
package_type: deb
hosts:
- hostname: 172.16.212.165
  localhost: true
  ssh: 
    host: 172.16.212.165
    private_key: /home/<user>/.ssh/id_rsa
  nodes:
  - register: false
    role: Master
    config:
      password: <secure-password>
      port: 3306
- hostname: 172.16.212.166
  localhost: false
  ssh: 
    host: 172.16.212.166
    private_key: /home/<user>/.ssh/id_rsa
  nodes:
  - register: false
    role: Aggregator
    config:
      password: <secure-password>
      port: 3306
- hostname: 172.16.212.167
  localhost: false
  ssh: 
    host: 172.16.212.167
    private_key: /home/<user>/.ssh/id_rsa
  nodes:
  - register: false
    role: Leaf
    config:
      password: <secure-password>
      port: 3306
- hostname: 172.16.212.168
  localhost: false
  ssh: 
    host: 172.16.212.168
    private_key: /home/<user>/.ssh/id_rsa
  nodes:
  - register: false
    role: Leaf
    config:
      password: <secure-password>
      port: 3306

Using this cluster file, sdb-deploy setup-cluster:

  1. Registers four hosts to the cluster.

  2. Enables High Availability.

  3. Installs the latest patch level of singlestoredb-server v7.3 on all four hosts.

  4. Creates a Master Aggregator node on port 3306 on host 172.16.212.165 and sets the SingleStore DB password to the one specified in the cluster file.

  5. Creates a Child Aggregator node on port 3306 on host 172.16.212.166 and sets the SingleStore DB password to the one specified in the cluster file.

  6. Creates a leaf node on port 3306 on host 172.16.212.167 and sets the SingleStore DB password to the one specified in the cluster file.

  7. Creates a leaf node on port 3306 on host 172.16.212.168 and sets the SingleStore DB password to the one specified in the cluster file.

  8. To deploy the cluster, run the following with the path to the cluster file as input.

    sdb-deploy setup-cluster --cluster-file </path/to/cluster-file>
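
Once setup-cluster completes, you can confirm the resulting layout with SingleStore DB Toolbox, which was installed alongside sdb-deploy:

sdb-admin list-nodes

For this example, the output should list one Master Aggregator, one Child Aggregator, and two leaf nodes, all in the Running state.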
    

For this example, you will need two hosts and the ability to ssh into each host from the main deployment host.

Set package_type to either rpm for Red Hat distributions or deb for Debian distributions to download and deploy the appropriate singlestoredb-server.

license: <license-from-portal.singlestore.com>
memsql_server_version: 7.3
package_type: deb
root_password: <secure-password>
hosts:
- hostname: 172.16.212.165
  localhost: true
  ssh:
    host: 172.16.212.165
    private_key: /home/<user>/.ssh/id_rsa
  nodes:
  - register: false
    role: Master
    config:
      auditlogsdir: /data/memsql/Master/auditlogs/
      datadir: /data/memsql/Master/data
      plancachedir: /data/memsql/Master/plancache
      tracelogsdir: /data/memsql/Master/tracelogs
      port: 3306
  - register: false
    role: Leaf
    config:
      auditlogsdir: /data/memsql/Leaf1/auditlogs
      datadir: /data/memsql/Leaf1/data
      plancachedir: /data/memsql/Leaf1/plancache
      tracelogsdir: /data/memsql/Leaf1/tracelogs
      port: 3307
- hostname: 172.16.212.166
  localhost: false
  ssh:
    host: 172.16.212.166
    private_key: /home/<user>/.ssh/id_rsa
  nodes:
  - register: false
    role: Leaf
    config:
      auditlogsdir: /data/memsql/Leaf2/auditlogs
      datadir: /data/memsql/Leaf2/data
      plancachedir: /data/memsql/Leaf2/plancache
      tracelogsdir: /data/memsql/Leaf2/tracelogs
      port: 3307

Using this cluster file, sdb-deploy setup-cluster:

  1. Registers two hosts to the cluster.

  2. Installs the latest patch level of singlestoredb-server v7.3 on both hosts.

  3. Creates a Master Aggregator node on port 3306 on host 172.16.212.165 and sets the SingleStore DB password to the one specified by the value in the root_password field.

  4. Creates a leaf node on port 3307 on host 172.16.212.165 and sets the SingleStore DB password to the one specified by the value in the root_password field.

  5. Creates a leaf node on port 3307 on host 172.16.212.166 and sets the SingleStore DB password to the one specified by the value in the root_password field.

  6. Sets the paths for the audit logs, data, plancache, and trace logs on each host. Notice that each field has its own path.

    Alternatively, you can replace these individual fields and paths on each node definition with a single baseinstalldir field and path, such as baseinstalldir: /data/memsql/Master. Each node definition would then resemble:

    nodes:
      - register: false
        role: Master
        config:
          baseinstalldir: /data/memsql/Master
          port: 3306
    
  7. To deploy the cluster, run the following with the path to the cluster file as input.

    sdb-deploy setup-cluster --cluster-file </path/to/cluster-file>
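
With the cluster deployed, you can verify connectivity from the main deployment host using the singlestore-client installed earlier. A minimal sketch, assuming the Master Aggregator is on the default port and the root password is the one set in root_password:

singlestore -h 127.0.0.1 -P 3306 -u root -p -e "SHOW LEAVES;"

When run against the Master Aggregator, SHOW LEAVES should list both leaf nodes.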
    
Additional Deployment Options

Notice

If this deployment method is not ideal for your target environment, you can choose one that fits your requirements from the Deployment Options.

Interact with your Cluster
Start Studio

On your main deployment host, run the following command to use SingleStore DB Studio to monitor and interact with your cluster.

Enable the SingleStore DB Studio service to start SingleStore DB Studio at system boot (recommended).

sudo systemctl enable singlestoredb-studio.service
Created symlink /etc/systemd/system/multi-user.target.wants/singlestoredb-studio.service → /lib/systemd/system/singlestoredb-studio.service.

If not already logged in to your main deployment host, SSH into it and run the following:

sudo systemctl start singlestoredb-studio

If your Linux distribution does not use systemd, you can run SingleStore DB Studio directly instead.

sudo singlestoredb-studio &

The Studio Web server will now be running on port 8080, which can be accessed via Web browser at http://<main-deployment-host>:8080.
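
To quickly verify that Studio is listening, you can probe the port from the main deployment host (assuming curl is installed; a status code in the 200-399 range indicates Studio is responding):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080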

Add a New Cluster to Studio
  1. With SingleStore DB Studio running, go to http://<main_deployment_host>:8080 and click Add New Cluster to set up a cluster.

    Important

    SingleStore DB Studio is only supported on Chrome and Firefox browsers at this time.

    To run Studio on a different port, add port = <port_number> to /etc/singlestore/singlestoredb-studio.hcl and restart Studio.

  2. Paste the main deployment host IP address into Hostname.

  3. Set Port to 3306.

  4. Specify root as the Username.

  5. In the Password field, provide the Superuser password that was set during cluster deployment.

  6. Click Create Cluster Profile and set Type as Development.

  7. Fill in Cluster Name and Description to your preference.

After you have successfully logged in, you will see the dashboard for your cluster. To run a query against your cluster, navigate to the SQL Editor using the navigation in the left pane.

Next Steps After Deployment

Now that you have installed SingleStore DB and connected to SingleStore DB Studio, check out the following resources to continue your learning: