AWS EC2 Best Practices

This document summarizes findings from the technical assistance provided by SingleStore engineers to customers operating a production SingleStore environment at Amazon EC2. This guide is designed for someone familiar with SingleStore fundamentals as well as technical basics, terminology, and economics of AWS operations.

Glossary

Amazon EBS Encryption

Amazon Simple Storage Service (Amazon S3)

AWS Certificate Manager (ACM)

AWS Nitro System

CloudTrail

EBS, Elastic Block Store

EC2 Instance

IAM Roles

SingleStore Aggregator Node - What is a SingleStore Aggregator?

SingleStore Leaf Node - What is a SingleStore Leaf?

Partition Placement Group

Regions and AZ (Availability Zones)

S3 Cross Region Replication

VPC (Virtual Private Cloud)

VPC Peering

Kernel Requirements

SingleStore will run on any Amazon Machine Image (AMI) with a kernel version of 3.10 or higher. Amazon Linux v2, RHEL/CentOS/AlmaLinux 7 or higher, and Debian 8 or higher all meet this requirement.
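The kernel requirement above can be checked with a short script. The following is a minimal sketch, not an official SingleStore tool; the function names are illustrative, and it assumes the release string starts with "major.minor" as reported by `uname -r`.

```python
import re

MIN_KERNEL = (3, 10)  # minimum kernel version stated in this guide


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.10.0-28-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    """Return True when the release is at or above the 3.10 minimum."""
    # Tuples compare element by element, so (3, 9) < (3, 10) < (4, 0).
    return parse_kernel_release(release) >= MIN_KERNEL
```

On a running instance, `platform.release()` from the Python standard library returns the same string as `uname -r` and can be passed to `kernel_is_supported`.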

SingleStore Cluster Provisioning Considerations

SingleStore is a shared-nothing MPP cluster of servers. In the context of EC2, a SingleStore server is an EC2 instance. As a general rule, a cluster should be operated within a single EC2 VPC/Region/Availability Zone and should utilize identically configured instances of the same type. The amount of EBS storage attached to the SingleStore EC2 instances, however, may differ depending on the type of node(s) hosted by the instance; aggregators require a minimal amount of EBS storage (aggregators may store reference table data) while EBS storage for leaves has to be provisioned as per user data capacity requirements.

The recommended decision making method is a two-step process:

  1. Select the proper instance type for the SingleStore server. This will be a cluster’s building block.

  2. Determine the required number of instances to scale out the cluster capacity horizontally to meet storage, response time, concurrency, and service availability requirements.

The basic principle when provisioning virtualized resources for a SingleStore server is to allocate CPU, RAM, storage I/O and Networking in a balanced manner so no particular resource becomes a bottleneck, leaving other resources underutilized.

EC2 users should keep in mind that Amazon EC2 dedicates some resources of the host computer, such as CPU, memory, and instance storage, to a particular instance. Amazon EC2 shares other resources of the host computer, such as the network and the disk subsystem, among instances.

AWS Instance Types

Selecting Instance Type (CPU, RAM)

SingleStore recommends provisioning a minimum of 8GB of RAM per physical core or virtual processor. For faster CPUs, 16-24GB per core is commonly selected.

When selecting an instance type as a building block of a cluster in a production environment, users should only consider instances with 8GB or more RAM per vCPU.

Several available instance types meet the above guideline. The best instance type for a particular application can only be selected per specific performance and availability requirements.
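The memory guideline above reduces to a simple ratio check. The sketch below is illustrative only: the candidate shapes are hypothetical placeholders rather than real EC2 instance types, and actual vCPU and memory figures should be taken from the AWS instance type documentation.

```python
MIN_GB_PER_VCPU = 8  # minimum RAM-to-vCPU ratio recommended in this guide


def gb_per_vcpu(ram_gb: float, vcpus: int) -> float:
    """Return the amount of RAM available per vCPU."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return ram_gb / vcpus


def meets_memory_guideline(ram_gb: float, vcpus: int) -> bool:
    """Return True when the shape offers at least 8 GB of RAM per vCPU."""
    return gb_per_vcpu(ram_gb, vcpus) >= MIN_GB_PER_VCPU


# Hypothetical candidate shapes as (RAM in GB, vCPUs).
candidates = {"shape-a": (64, 8), "shape-b": (32, 8), "shape-c": (128, 16)}
eligible = [name for name, (ram, cpus) in candidates.items()
            if meets_memory_guideline(ram, cpus)]
```

Filtering candidates this way only narrows the field; the final choice still depends on the performance and availability requirements of the application.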

FAQ: Is it better to have a smaller number of bigger machines or a larger number of smaller machines?

Answer: Though there is no one-size-fits-all answer, the smallest number of medium size machines may be a good baseline. There is rarely a good reason to select larger than a 2-socket underlying server in an EC2 environment (e.g. 32 vCPU instances), while 4 vCPU or smaller instances are only suitable for low concurrency database workloads. An 8 vCPU medium size instance may be a good starting point. We at SingleStore have experimented extensively with r4.2xlarge instances (8 vCPUs, 64GB RAM - meets SingleStore guidance) and often find them a potent option with reasonable commercial terms. For heavy columnstore or concurrent workloads, machines with 16 or more cores are recommended.

Note: If you select a multi-socket EC2 instance, you must enable NUMA and deploy one leaf node per socket. See Configuring SingleStore for NUMA.

Networking

SingleStore recommends 10 Gbps networking and deploying clusters in a single availability zone on a VPC for latency and bandwidth considerations.

You can set up replication to a secondary cluster for a DR configuration, either in a different availability zone in the same region or in a different region. When a secondary cluster is deployed in a different VPC than the primary, you can leverage AWS Transit Gateway or VPC Peering so that replication traffic never passes through the public internet.

AWS Nitro

Nitro-based Instance types support network bandwidth up to 100 Gbps based on the instance type selected. SingleStore recommends optimizing the workload for the requirement. The key decision points for optimization are:

  • Instance type selection based on Memory requirement of the Aggregator node and Leaf node.

  • EBS volume IOPS should also be supported by the EC2 instance type selected.

SingleStore Deployment

As a general rule, all EC2 instances of a cluster should be configured on a single subnet. This means all SingleStore nodes will be within a single VPC/Region/Availability Zone.

All AWS customers get a VPC allocated to the account. Each EC2 instance must be assigned to a subnet in a VPC. Customers are expected to bind a subnet to an AZ and then place the instance in the subnet. An instance can only exist in one subnet and therefore one AZ.

For DR configurations or geographically distributed data services, customers can provision two or more clusters, each in a dedicated VPC, typically in separate regions, with VPC peering between VPCs (regions).

VPC Peering

The following examples illustrate use cases leveraging VPC peering:

  • A SingleStore environment includes a primary cluster in one region and a secondary cluster in a different geography (region) for DR. Connectivity between the primary and secondary sites is provided by VPC peering.

  • A cluster is ingesting data subscribing to a Kafka feed. A customer would typically set up a Kafka cluster in one VPC and a cluster in a different VPC, with a VPC peering to connect the SingleStore database to Kafka.

For VPC peering setup, scenarios, and configuration guidance see the VPC Peering Guide.

Partition Placement Groups

Partition placement groups help reduce the likelihood of correlated hardware failures for your application. When using partition placement groups, Amazon EC2 divides each group into logical segments called partitions. Amazon EC2 ensures that each partition within a placement group has its own set of racks. Each rack has its own network and power source. No two partitions within a placement group share the same racks, allowing you to isolate the impact of hardware failure within your application.

SingleStore recommends using Partition Placement Groups with SingleStore Availability Groups to implement HA solutions.
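One way to reason about this pairing is to keep the two SingleStore availability groups on disjoint sets of placement-group partitions, so that a rack-level failure affects only one availability group. The sketch below is an assumption-laden illustration, not a documented SingleStore procedure: the parity-based assignment rule, the function names, and the leaf names are all hypothetical.

```python
def assign_availability_groups(leaf_partitions: dict[str, int]) -> dict[str, int]:
    """Map each leaf to availability group 1 or 2 by partition-number parity.

    Hypothetical rule: leaves in odd partitions go to group 1, even to group 2.
    """
    return {leaf: 1 if partition % 2 == 1 else 2
            for leaf, partition in leaf_partitions.items()}


def groups_share_a_partition(leaf_partitions: dict[str, int],
                             leaf_groups: dict[str, int]) -> bool:
    """Return True if any partition hosts leaves from both availability groups."""
    partitions_by_group: dict[int, set[int]] = {1: set(), 2: set()}
    for leaf, group in leaf_groups.items():
        partitions_by_group[group].add(leaf_partitions[leaf])
    return bool(partitions_by_group[1] & partitions_by_group[2])


# Hypothetical leaves and the partition numbers EC2 placed them in.
placement = {"leaf-a": 1, "leaf-b": 2, "leaf-c": 3, "leaf-d": 4}
groups = assign_availability_groups(placement)
```

With this layout, `groups_share_a_partition(placement, groups)` is False, meaning no single partition holds leaves from both availability groups.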

Warning

Aligning AWS Availability Zones with SingleStore Availability Groups

If you are considering using AWS Availability Zones and SingleStore Availability Groups to add AZ level robustness to operating environments, note the following:

SingleStore operates most efficiently when all nodes of a cluster are within a single subnet. Separate AWS Availability Zones require separate subnets and as such are not optimal for SingleStore performance. The current guidance from SingleStore is to deploy all AWS nodes of a cluster into the same AWS Availability Zone.

Storage

Storage Capacity

When provisioning EBS volume capacity to meet application data retention requirements, SingleStore EC2 administrators need to include a safety margin and ensure that no production environment is operated with less than 40% free storage space on the data volume. Free disk space is required for temporary data materializations during database operations and to support continuous data growth. Users should also be aware that EBS performance generally tracks with EBS volume size.

For rowstore SingleStore deployments, provision a storage system for each node with at least 3 times the capacity of main memory. For columnstore workloads, provision SSD-based storage volumes.
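The two capacity rules above (at least 40% free space, and at least 3 times main memory for rowstore nodes) can be turned into a quick sizing calculation. This is a minimal sketch with illustrative function names; it assumes sizes are expressed in GiB and rounds up to whole GiB.

```python
import math

MIN_FREE_PERCENT = 40       # free-space floor from this guide
ROWSTORE_RAM_MULTIPLE = 3   # rowstore storage-to-memory multiple from this guide


def min_volume_gib_for_data(data_gib: float) -> int:
    """Smallest volume size that still leaves 40% free once data_gib is stored."""
    # If 40% must stay free, the data may occupy at most 60% of the volume.
    return math.ceil(data_gib * 100 / (100 - MIN_FREE_PERCENT))


def min_rowstore_volume_gib(ram_gib: float) -> int:
    """Smallest volume size for a rowstore node with ram_gib of main memory."""
    return math.ceil(ram_gib * ROWSTORE_RAM_MULTIPLE)
```

For example, 600 GiB of data calls for at least a 1,000 GiB volume under the free-space rule, and a rowstore node with 64 GiB of RAM calls for at least 192 GiB of storage. When both rules apply to a node, the larger of the two results is the safer target.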

EBS Storage

Please note that under-configured EC2 storage is a common root cause of inconsistent SingleStore EC2 cluster performance.

SingleStore is a shared-nothing MPP system, i.e. each SingleStore server (EC2 instance) manages its own internal storage.

To ensure persistent storage for a SingleStore server, users need to provision EBS volumes and attach them to SingleStore servers (EC2 instances).

EBS is a disk subsystem that is shared among instances, which means that SingleStore EC2 users may encounter somewhat unpredictable variations in I/O performance across SingleStore servers and even performance variability of the same EBS volume over time. EBS performance characteristics may be affected by activities of co-tenants (a noisy neighbor problem), by file replication for availability, by EBS rebalancing, etc.

The elastic nature of EBS means that the system is designed to monitor utilization of underlying hardware assets and automatically rebalance itself to avoid hotspots. This has both a positive and negative impact on end-user operations. Users are assured that EBS will resolve severe I/O contention reasonably promptly. On the other hand, relocation of files to new storage nodes during rebalancing adversely affects EBS volume performance.

To maximize consistency and performance characteristics of EBS, SingleStore encourages users to follow the general AWS recommendation to attach multiple EBS volumes to an instance and stripe across the volumes. This technique is widely employed by EC2 users. Since AWS charges by EBS volume capacity, there should be no economic penalty for using multiple smaller EBS volumes instead of one large EBS volume of the same total size.

Users can consider attaching 3-4 EBS volumes to each leaf server (instance) of the cluster and present this storage to the host as a software RAID0 device.

Studies show that there is an appreciable increase in RAID0 performance up to 4 EBS volumes in a stripe, with flattening after 6 volumes.

For more information, see the AWS documentation section Amazon EBS Volume Performance on Linux Instances.

EBS Type

SingleStore EC2 customers with extremely demanding database performance requirements may consider provisioning enhanced EBS types such as io1, delivering very high IOPS rates.

The General Purpose SSD (gp2) option provides a good balance of performance and cost for most deployments. It delivers single-digit millisecond latencies and the ability to burst to 3,000 IOPS for extended periods. It provides a consistent baseline performance of 3 IOPS/GiB; for example, a 1,000 GiB gp2 volume has a baseline of 3,000 IOPS (16 KiB I/O size). You can consider joining multiple EBS volumes together in a RAID 0 configuration to increase available throughput.
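The gp2 baseline arithmetic, and the effect of striping several volumes, can be sketched as follows. The 3 IOPS/GiB rate comes from this guide; the 100 IOPS floor and 16,000 IOPS per-volume cap are assumptions based on AWS's published gp2 limits and should be verified against current AWS documentation. The striped figure is an idealized upper bound for identical volumes, optionally capped by a per-instance EBS IOPS limit that the caller supplies.

```python
from typing import Optional, Sequence

GP2_IOPS_PER_GIB = 3    # baseline rate quoted in this guide
GP2_MIN_IOPS = 100      # assumption: AWS-published per-volume floor
GP2_MAX_IOPS = 16_000   # assumption: AWS-published per-volume cap


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a single gp2 volume of the given size."""
    return min(max(size_gib * GP2_IOPS_PER_GIB, GP2_MIN_IOPS), GP2_MAX_IOPS)


def striped_baseline_iops(volume_sizes_gib: Sequence[int],
                          instance_iops_limit: Optional[int] = None) -> int:
    """Idealized baseline IOPS of a RAID 0 stripe over the given gp2 volumes.

    The sum of per-volume baselines is an upper bound; the instance type's own
    EBS IOPS limit (if provided) caps the result.
    """
    total = sum(gp2_baseline_iops(size) for size in volume_sizes_gib)
    if instance_iops_limit is not None:
        total = min(total, instance_iops_limit)
    return total
```

For instance, four 500 GiB volumes give an idealized baseline of 6,000 IOPS, compared with 1,500 IOPS for any one of them alone, provided the instance type can sustain that rate.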

You do not have to over provision your EBS volumes based on your future expected workloads. You can benefit from the EBS Elastic Volume feature, which allows changes to the type, size, and IOPS with no downtime.

Storage Level Redundancy

As a reminder, SingleStore provides native out-of-the-box fault tolerance.

In SingleStore database environments running on physical hardware, SingleStore recommends supplementing a cluster’s fault tolerance with storage level redundancy supported by hardware RAID controllers. It’s a cost effective approach diminishing the impact of a single drive failure on cluster operations.

However, in EC2 environments storage-level redundancy provisions are not applicable because:

  • EBS volumes are not statistically independent (they may share the same physical network and storage infrastructure).

  • Studies and customer experience show that the performance of software RAID in a redundant configuration, in particular RAID5 over EBS volumes, is below acceptable levels.

For fault tolerance, SingleStore EC2 users can rely on cluster level redundancy and under-the-cover mirroring of EBS volumes provided by AWS.

Instance (Ephemeral) Storage

EC2 instance types that meet recommendations for a SingleStore server typically come with preconfigured temporary block storage referred to as instance store or ephemeral store. Since ephemeral storage is physically attached to the host computer, it delivers superior I/O performance compared to network-attached EBS.

However, due to the ephemeral nature of instance storage, proper care must be taken (configure HA, understand the limitations and potential risks) when using it for persistent data storage in a production environment.

The use of instance storage for SingleStore data is typically limited to scenarios where the database can be reloaded entirely from persistent backups or custom save points. For example, as a development sandbox, or for one-time data mining/ad hoc analytics, or when data files loaded since the last save point are preserved and may be used to restore the latest content, etc.

Encryption of Data at Rest

SingleStore recommends enabling EBS encryption. If an NVMe instance store is used, the data is encrypted at rest, by default.

Backup and Restore

SingleStore recommends backing up to an S3 bucket with cross-region replication enabled to protect against region failure and to meet disaster recovery requirements.

Load Balancing of Client Connections

Application clients access a SingleStore database cluster by connecting to aggregator nodes. Normally, multiple aggregator nodes are provisioned for fault tolerance and performance. A good practice is to spread client connections evenly across all aggregator nodes of a cluster. This can be achieved with either or both of the following methods:

  • Application side connection pool. Sophisticated connection pool implementations offer load balancing, failover and failback, and even multi-pool failover and failback.

  • NLB, Network Load Balancing service.
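The application-side approach can be as simple as rotating through the aggregator endpoints when opening connections. The following is a minimal round-robin sketch using only the Python standard library; the host names are hypothetical, and a production connection pool would also need health checks, failover, and failback, which are not shown here.

```python
import itertools
from typing import Iterator, Sequence


def aggregator_cycle(hosts: Sequence[str]) -> Iterator[str]:
    """Return an endless round-robin iterator over the aggregator hosts."""
    if not hosts:
        raise ValueError("at least one aggregator host is required")
    return itertools.cycle(hosts)


# Hypothetical aggregator endpoints.
aggregators = ["agg-1.example.internal",
               "agg-2.example.internal",
               "agg-3.example.internal"]

picker = aggregator_cycle(aggregators)
# Each new client connection takes the next host in the rotation.
first_six = [next(picker) for _ in range(6)]
```

Over six connections, each of the three hosts is selected twice, which is the even spread described above.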

Health Check Considerations

Expiring security certificates can be a security risk. AWS Certificate Manager (ACM) helps manage renewal of your Amazon-issued SSL/TLS certificates, and SingleStore recommends using it to mitigate the risk.
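As a complement to managed renewal, a health check can flag certificates that are close to expiry. The sketch below is illustrative and not part of ACM: it assumes the expiry date is available as a "notAfter" string in the format Python's ssl module reports for peer certificates, and the 30-day warning window is an arbitrary placeholder.

```python
import ssl

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Days between now_epoch and a certificate's notAfter timestamp.

    not_after uses the format "Jan 31 00:00:00 2024 GMT".
    """
    expiry_epoch = ssl.cert_time_to_seconds(not_after)
    return (expiry_epoch - now_epoch) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now_epoch: float, warn_days: int = 30) -> bool:
    """Return True when the certificate expires within warn_days (placeholder value)."""
    return days_until_expiry(not_after, now_epoch) <= warn_days
```

In a live check, `now_epoch` would typically be `time.time()`, and `not_after` would come from the `notAfter` field of the dictionary returned by `SSLSocket.getpeercert()`.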

Last modified: July 30, 2024
