AWS EC2 Best Practices
This topic does not apply to SingleStore Managed Service.
This document summarizes findings from the technical assistance SingleStore DB engineers have provided to customers operating production SingleStore DB environments on Amazon EC2. The guide is intended for readers familiar with SingleStore DB fundamentals as well as the technical basics, terminology, and economics of AWS operations.
- SingleStore DB Aggregator Node: see “What is a SingleStore DB Aggregator?”
- SingleStore DB Leaf Node: see “What is a SingleStore DB Leaf?”
SingleStore DB runs on any Amazon Machine Image (AMI) with a kernel version of 3.10 or higher. Amazon Linux 2, RHEL/CentOS 7 or higher, and Debian 8 or higher all meet this requirement.
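To confirm the kernel requirement on a candidate AMI, you can compare the running kernel's release string against 3.10. The following is a minimal Python sketch; it assumes a Linux host where `platform.release()` returns the same string as `uname -r`, and the function name `kernel_meets_minimum` is hypothetical.

```python
import platform
import re

MIN_KERNEL = (3, 10)  # minimum kernel version stated above

def kernel_meets_minimum(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    """True if a release string such as '4.14.256-197.484.amzn2.x86_64'
    is at least the given (major, minor) kernel version."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    release = platform.release()  # on Linux, same as `uname -r`
    status = "OK" if kernel_meets_minimum(release) else "too old"
    print(f"kernel {release}: {status}")
```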
SingleStore DB Cluster Provisioning Considerations
SingleStore DB is a shared-nothing MPP cluster of servers; on EC2, each SingleStore DB server is an EC2 instance. As a general rule, a cluster should run within a single EC2 VPC/Region/Availability Zone and use identically configured instances of the same type. The amount of EBS storage attached to each instance, however, may differ depending on the type of node(s) it hosts: aggregators need only a small amount of EBS storage (they may store reference table data), while EBS storage for leaves must be provisioned according to user data capacity requirements.
The recommended decision-making method is a two-step process:
- Select the proper instance type for the SingleStore DB server. This will be a cluster’s “building block”.
- Determine the required number of instances to scale out the cluster capacity horizontally to meet storage, response time, concurrency, and service availability requirements.
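The second step can be approximated with simple arithmetic. The sketch below estimates a leaf count for a rowstore workload; it is illustrative only and rests on assumptions that are not SingleStore defaults: all rowstore data must fit in leaf RAM, only a fraction of each leaf's RAM (`usable_fraction`, here 0.75) is available for data, and high availability keeps a second copy of every partition on a paired leaf. The function name is hypothetical.

```python
import math

def leaves_needed(data_gib: float, leaf_ram_gib: float,
                  usable_fraction: float = 0.75,
                  high_availability: bool = True) -> int:
    """Rough leaf count for holding rowstore data in memory.

    Assumptions (illustrative, not SingleStore defaults):
    - all rowstore data must fit in leaf RAM;
    - only `usable_fraction` of each leaf's RAM holds data;
    - high availability stores every partition twice.
    """
    if data_gib < 0 or leaf_ram_gib <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("invalid sizing inputs")
    copies = 2 if high_availability else 1
    usable_per_leaf = leaf_ram_gib * usable_fraction
    count = math.ceil(data_gib * copies / usable_per_leaf)
    if high_availability:
        # Leaves in the two availability groups are paired, so keep the count even.
        count += count % 2
    return max(count, copies)


# 300 GiB of rowstore data on 64 GiB leaves with HA enabled
print(leaves_needed(300, 64))  # 14
```

Storage, response time, and concurrency requirements can push the count higher, so the result is best treated as a lower bound for the memory dimension only.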
The basic principle when provisioning virtualized resources for a SingleStore DB server is to allocate CPU, RAM, storage I/O, and networking in a balanced manner, so that no single resource becomes a bottleneck while the others sit underutilized.
EC2 users should keep in mind that “Amazon EC2 dedicates some resources of the host computer, such as CPU, memory, and instance storage, to a particular instance. Amazon EC2 shares other resources of the host computer, such as the network and the disk subsystem, among instances.” (Source: AWS Instance Types documentation.)
Selecting Instance Type (CPU, RAM)
SingleStore recommends provisioning a minimum of 8GB of RAM per physical core or virtual processor. For faster CPUs, 16-24GB per core is commonly selected.
When selecting an instance type as a building block of a cluster in a production environment, users should only consider instances with 8GB or more RAM per vCPU.
Several instance types meet this guideline. The best one for a particular application can only be chosen against that application's specific performance and availability requirements.
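The RAM-per-vCPU screen can be expressed as a small filter. In the sketch below the instance specifications are hand-copied values (vCPUs, memory in GiB) for a few current-generation types and should be verified against the AWS instance type documentation. AWS lists memory in GiB, so the 8 GB guideline is treated as 8 GiB here, which is slightly stricter. `INSTANCE_SPECS` and the function names are hypothetical.

```python
# Hand-copied (vCPUs, memory in GiB); verify against current AWS documentation.
INSTANCE_SPECS = {
    "c5.2xlarge": (8, 16),
    "m5.2xlarge": (8, 32),
    "r5.2xlarge": (8, 64),
    "r5.4xlarge": (16, 128),
}

def ram_per_vcpu(instance_type: str) -> float:
    vcpus, memory_gib = INSTANCE_SPECS[instance_type]
    return memory_gib / vcpus

def production_candidates(min_gib_per_vcpu: float = 8.0) -> list:
    """Instance types in INSTANCE_SPECS with at least min_gib_per_vcpu of RAM per vCPU."""
    return sorted(t for t in INSTANCE_SPECS if ram_per_vcpu(t) >= min_gib_per_vcpu)


print(production_candidates())  # ['r5.2xlarge', 'r5.4xlarge']
```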
FAQ: Is it better to have a smaller number of bigger machines or a larger number of smaller machines?
Answer: Though there is no “one size fits all” answer, the smallest number of “medium”-sized machines that meets your requirements may be a good baseline. In an EC2 environment there is rarely a good reason to choose an instance larger than a two-socket underlying server (e.g. 32 vCPU instances), while instances with 4 vCPUs or fewer are suitable only for low-concurrency database workloads. An 8 vCPU “medium” instance may be a good starting point. SingleStore has experimented extensively with r4.2xlarge instances (8 vCPUs, 61 GiB RAM, which meets SingleStore guidance) and often finds them a potent option with reasonable commercial terms. For heavy columnstore or highly concurrent workloads, machines with 16 or more cores are recommended.
Note: If you select a multi-socket EC2 instance, you must enable NUMA and deploy one leaf node per socket. See Configuring SingleStore DB for NUMA.
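On Linux, the kernel exposes one `nodeN` directory per NUMA node under `/sys/devices/system/node`, which gives a quick way to see how many leaf nodes a host should run. The sketch below assumes one NUMA node per socket (features such as sub-NUMA clustering can break that assumption); the function names are hypothetical.

```python
import os
import re

NODE_ENTRY = re.compile(r"node\d+$")

def count_numa_nodes(entries: list) -> int:
    """Count node0, node1, ... among sysfs entries, ignoring files such as 'online'."""
    return max(sum(1 for e in entries if NODE_ENTRY.match(e)), 1)

def leaves_for_this_host(sysfs_dir: str = "/sys/devices/system/node") -> int:
    """One leaf node per NUMA node (assumed to equal one per socket)."""
    try:
        entries = os.listdir(sysfs_dir)
    except FileNotFoundError:
        return 1  # no NUMA topology exposed; treat the host as a single node
    return count_numa_nodes(entries)


if __name__ == "__main__":
    print(f"deploy {leaves_for_this_host()} leaf node(s) on this host")
```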
For latency and bandwidth reasons, we recommend 10 Gbps networking and deploying each cluster in a single Availability Zone within one VPC.
For disaster recovery (DR), you can replicate to a secondary cluster in a different Availability Zone of the same region or in a different region. When the secondary cluster is deployed in a different VPC than the primary, use AWS Transit Gateway or VPC peering so that replication traffic never traverses the public internet.
Nitro-based instance types support network bandwidth of up to 100 Gbps, depending on the instance type. SingleStore recommends sizing instances to the workload's actual requirements. The key decision points are:
- Instance type selection based on the memory requirements of the aggregator and leaf nodes.
- The selected EC2 instance type must be able to sustain the IOPS provisioned on its EBS volumes.
SingleStore DB Deployment
As a general rule, all EC2 instances of a cluster should be configured on a single subnet. This means all SingleStore nodes will be within a single VPC/Region/Availability Zone.
Every AWS account is provided with a default VPC in each region. Each EC2 instance must be launched into a subnet of a VPC, and each subnet resides in exactly one Availability Zone. Because an instance can exist in only one subnet, it also exists in only one AZ.
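Because the general rule is one subnet (and therefore one AZ) per cluster, it can be worth checking a running cluster's placement. The sketch below works on instance descriptions in the shape returned by boto3's `describe_instances`; the tag key `singlestore-cluster` used to find the cluster's instances is a hypothetical convention, not something SingleStore DB sets.

```python
def placement_summary(instances: list) -> tuple:
    """Subnets and Availability Zones used by a list of EC2 instance descriptions."""
    subnets = {inst["SubnetId"] for inst in instances}
    zones = {inst["Placement"]["AvailabilityZone"] for inst in instances}
    return subnets, zones

def is_single_subnet(instances: list) -> bool:
    subnets, zones = placement_summary(instances)
    return len(subnets) == 1 and len(zones) == 1

def cluster_instances(ec2_client, cluster_name: str) -> list:
    """Running instances tagged singlestore-cluster=<cluster_name> (hypothetical tag)."""
    paginator = ec2_client.get_paginator("describe_instances")
    filters = [
        {"Name": "tag:singlestore-cluster", "Values": [cluster_name]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
    found = []
    for page in paginator.paginate(Filters=filters):
        for reservation in page["Reservations"]:
            found.extend(reservation["Instances"])
    return found
```

With boto3 installed and credentials configured, `is_single_subnet(cluster_instances(boto3.client("ec2"), "prod"))` returns False if any node landed outside a single subnet.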
For DR configurations or geographically distributed data services, customers can provision two or more clusters, each in a dedicated VPC, typically in separate regions, with VPC peering between VPCs (regions).
The following examples illustrate use cases leveraging VPC peering:
- A SingleStore DB environment includes a primary cluster in one region and a secondary cluster in a different geography (region) for DR. Connectivity between the primary and secondary sites is provided by VPC peering.
- A cluster ingests data by subscribing to a Kafka feed. A customer would typically run the Kafka cluster in one VPC and the SingleStore DB cluster in another, with VPC peering connecting the SingleStore DB database to Kafka.
For VPC peering setup, scenarios, and configuration guidance see the VPC Peering Guide.
Partition Placement Groups
Partition placement groups help reduce the likelihood of correlated hardware failures for your application. When using partition placement groups, Amazon EC2 divides each group into logical segments called partitions. Amazon EC2 ensures that each partition within a placement group has its own set of racks. Each rack has its own network and power source. No two partitions within a placement group share the same racks, allowing you to isolate the impact of hardware failure within your application.
SingleStore recommends using Partition Placement Groups with SingleStore DB Availability Groups to implement HA solutions.
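One way to combine the two is to create a partition placement group and launch the leaves of SingleStore DB availability group 1 into partition 1 and those of availability group 2 into partition 2, so that paired leaves do not share racks. The sketch below shows that mapping using the EC2 `CreatePlacementGroup` API through a boto3-style client; the group name and helper names are hypothetical, and the mapping is an illustration rather than a documented SingleStore procedure.

```python
def create_partition_group(ec2_client, name: str, partitions: int = 2) -> None:
    """Create a partition placement group (EC2 allows up to 7 partitions per AZ)."""
    ec2_client.create_placement_group(
        GroupName=name, Strategy="partition", PartitionCount=partitions
    )

def leaf_placement(group_name: str, availability_group: int) -> dict:
    """`Placement` argument for run_instances: availability group N -> partition N."""
    if availability_group not in (1, 2):
        raise ValueError("SingleStore DB availability groups are numbered 1 and 2")
    return {"GroupName": group_name, "PartitionNumber": availability_group}
```

For example, `ec2.run_instances(..., Placement=leaf_placement("s2-leaves", 2))` would place a leaf from availability group 2 in partition 2.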
Aligning AWS Availability Zones with SingleStore DB Availability Groups
If you are considering using AWS Availability Zones and SingleStore DB Availability Groups to add “AZ level robustness” to operating environments, note the following:
SingleStore DB operates most efficiently when all nodes of a cluster are within a single subnet. Separate AWS Availability Zones require separate subnets and add inter-AZ network latency, so they are not optimal for SingleStore DB performance. SingleStore's current guidance is to deploy all nodes of a cluster into the same AWS Availability Zone.
When sizing EBS volume capacity to application data retention requirements, SingleStore DB EC2 administrators need to include headroom and ensure that no production environment runs with less than 40% free space on the data volume. Free disk space is needed for temporary data materializations during database operations and to absorb continuous data growth. Users should also be aware that EBS performance generally scales with volume size (for gp2, baseline IOPS is proportional to capacity).
For rowstore SingleStore DB deployments, provision storage for each node with at least 3 times the capacity of its main memory. For columnstore workloads, provision SSD-based storage volumes.
Under-provisioned EC2 storage is a common root cause of inconsistent SingleStore DB EC2 cluster performance.
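The two sizing rules above (at least 40% free space, and for rowstore at least 3 times main memory) combine into a single minimum volume size. The sketch below is a hypothetical helper that takes the larger of the two; it does not account for growth beyond the expected data size.

```python
import math

MIN_FREE_PERCENT = 40  # never run with less than 40% free space

def min_data_volume_gib(expected_data_gib: float, ram_gib: float,
                        rowstore: bool = True) -> int:
    """Smallest data volume (GiB) that satisfies both rules:
    - expected data occupies at most 60% of the volume;
    - rowstore nodes get at least 3x their main memory.
    """
    by_free_space = expected_data_gib * 100 / (100 - MIN_FREE_PERCENT)
    by_memory = 3 * ram_gib if rowstore else 0
    return math.ceil(max(by_free_space, by_memory))


print(min_data_volume_gib(600, 64))                  # 1000 (free-space rule dominates)
print(min_data_volume_gib(100, 64))                  # 192 (3x RAM rule dominates)
print(min_data_volume_gib(100, 64, rowstore=False))  # 167
```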
SingleStore DB is a shared-nothing MPP system, i.e. each SingleStore DB server (EC2 instance) manages its own “internal” storage.
To provide persistent storage for SingleStore DB servers, users need to provision EBS volumes and attach them to the SingleStore DB servers (EC2 instances).
EBS is a disk subsystem that is shared among instances, which means that SingleStore DB EC2 users may encounter somewhat unpredictable variations in I/O performance across SingleStore DB servers and even performance variability of the same EBS volume over time. EBS performance characteristics may be affected by activities of co-tenants (a “noisy neighbor” problem), by file replication for availability, by EBS rebalancing, etc.
EBS is elastic by design: the system monitors utilization of the underlying hardware and automatically rebalances itself to avoid hotspots. This has both positive and negative effects on end-user operations. On the one hand, users can rely on EBS to resolve severe I/O contention reasonably promptly. On the other hand, relocating files to new storage nodes during rebalancing degrades EBS volume performance.
To maximize the consistency and performance of EBS, SingleStore encourages users to follow the general AWS recommendation to attach multiple EBS volumes to an instance and stripe across them. This technique is widely used by EC2 users. Because AWS charges by EBS volume capacity, there is no economic penalty for using several smaller EBS volumes instead of one large volume of the same total size.
Users can consider attaching 3-4 EBS volumes to each leaf server (instance) of the cluster and presenting this storage to the host as a software RAID0 device.
Studies show that there is an appreciable increase in RAID0 performance up to 4 EBS volumes in a stripe, with flattening after 6 volumes.
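Creating the RAID0 device is typically done with `mdadm`. The Python sketch below only builds the command line and does not run it; the device names are hypothetical, since EBS volumes on Nitro instances appear as NVMe devices whose names vary. In practice you would run the command as root (for example with `subprocess.run(cmd, check=True)`), then create a filesystem on the resulting `/dev/md0` and mount it.

```python
import shlex

def raid0_create_command(devices: list, md_device: str = "/dev/md0") -> list:
    """argv for striping the given block devices into one RAID0 array with mdadm."""
    if len(devices) < 2:
        raise ValueError("RAID0 striping needs at least two volumes")
    return ["mdadm", "--create", md_device, "--level=0",
            f"--raid-devices={len(devices)}", *devices]


cmd = raid0_create_command(["/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1", "/dev/nvme4n1"])
print(shlex.join(cmd))
# mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
```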
For more information, see the AWS documentation section “Amazon EBS Volume Performance on Linux Instances”.
SingleStore DB EC2 customers with extremely demanding database performance requirements may consider provisioning Provisioned IOPS SSD (io1) volumes, which deliver very high IOPS rates.
The General Purpose SSD (gp2) volume type provides a good balance of performance and cost for most deployments. It delivers single-digit millisecond latencies and can burst to 3,000 IOPS for extended periods. Its consistent baseline performance is 3 IOPS/GiB; for example, a 1,000 GiB gp2 volume has a maximum of 3,000 IOPS (16 KiB I/O size). Consider combining multiple EBS volumes in a RAID 0 configuration to increase available throughput.
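The gp2 arithmetic is easy to check. The sketch below encodes the 3 IOPS/GiB baseline and the 3,000 IOPS burst described above, plus two limits not stated in this guide that come from AWS documentation (a 100 IOPS floor and a 16,000 IOPS per-volume cap; verify them before relying on the numbers). It shows why striping helps: four 250 GiB volumes have the same baseline as one 1,000 GiB volume, but each can burst independently while it has burst credits. The instance's own EBS limit is not modeled, and the function names are hypothetical.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100        # AWS floor for small gp2 volumes (verify)
GP2_MAX_IOPS = 16_000     # AWS per-volume cap (verify)
GP2_BURST_IOPS = 3_000

def gp2_baseline_iops(size_gib: int) -> int:
    return min(max(GP2_IOPS_PER_GIB * size_gib, GP2_MIN_IOPS), GP2_MAX_IOPS)

def gp2_peak_iops(size_gib: int) -> int:
    # Volumes below 1,000 GiB can burst to 3,000 IOPS while credits last;
    # larger volumes already have a baseline at or above that level.
    return max(gp2_baseline_iops(size_gib), GP2_BURST_IOPS)

def raid0_iops(sizes_gib: list) -> tuple:
    """(baseline, peak) IOPS of an evenly loaded RAID0 stripe; members add up."""
    return (sum(gp2_baseline_iops(s) for s in sizes_gib),
            sum(gp2_peak_iops(s) for s in sizes_gib))


print(raid0_iops([1000]))     # (3000, 3000)
print(raid0_iops([250] * 4))  # (3000, 12000)
```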
You do not have to over-provision EBS volumes for expected future workloads: the EBS Elastic Volumes feature lets you change a volume's type, size, and IOPS with no downtime.
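Elastic Volumes changes are requested through the EC2 `ModifyVolume` API. The sketch below calls it through a boto3-style client (`modify_volume`) to grow a volume; the helper name and the volume ID in the usage line are hypothetical. After the modification completes, the filesystem on top of the volume still has to be extended inside the instance (and a RAID0 stripe needs its own handling), which is not shown here.

```python
def grow_volume(ec2_client, volume_id: str, new_size_gib: int) -> str:
    """Request an online size increase and return EC2's modification state
    (for example 'modifying' or 'optimizing')."""
    response = ec2_client.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    return response["VolumeModification"]["ModificationState"]
```

Usage: `grow_volume(boto3.client("ec2"), "vol-0123456789abcdef0", 1500)`.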
Storage Level Redundancy
As a reminder, SingleStore DB provides native out-of-the-box fault tolerance.
In SingleStore DB environments running on physical hardware, SingleStore recommends supplementing a cluster’s fault tolerance with storage-level redundancy provided by hardware RAID controllers. This is a cost-effective way to reduce the impact of a single drive failure on cluster operations.
However, in EC2 environments storage-level redundancy provisions are not applicable because:
- EBS volumes are not statistically independent (they may share the same physical network and storage infrastructure).
- Studies and customer experience show that the performance of software RAID in a redundant configuration, in particular RAID5 over EBS volumes, is below acceptable levels.
For fault tolerance, SingleStore DB EC2 users can rely on cluster level redundancy and under-the-cover mirroring of EBS volumes provided by AWS.
Instance (Ephemeral) Storage
EC2 instance types that meet recommendations for a SingleStore DB server typically come with preconfigured temporary block storage referred to as instance store or ephemeral store. Since ephemeral storage is physically attached to the host computer, it delivers superior I/O performance compared to network-attached EBS.
However, because instance storage is ephemeral, proper care must be taken (configure HA, and understand the limitations and potential risks) before using it for persistent data in a production environment.
The use of instance storage for SingleStore DB data is typically limited to scenarios where the database can be reloaded entirely from persistent backups or custom save points. For example, as a “development sandbox”, or for one-time data mining/ad hoc analytics, or when data files loaded since the last save point are preserved and may be used to restore the latest content, etc.
Encryption of Data at Rest
SingleStore recommends enabling EBS encryption. Data on NVMe instance store volumes is encrypted at rest by default.
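EBS encryption can be enabled per volume or as an account-level default for each region. The sketch below turns on the regional default through the EC2 `GetEbsEncryptionByDefault` and `EnableEbsEncryptionByDefault` APIs via a boto3-style client; the helper name is hypothetical. The default applies only to volumes created afterwards, so existing volumes are unaffected.

```python
def ensure_default_ebs_encryption(ec2_client) -> bool:
    """Enable default EBS encryption in the client's region.
    Returns True if the setting was changed, False if it was already on."""
    status = ec2_client.get_ebs_encryption_by_default()
    if status["EbsEncryptionByDefault"]:
        return False
    ec2_client.enable_ebs_encryption_by_default()
    return True
```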
Backup and Restore
SingleStore recommends backing up to an S3 bucket with cross-region replication enabled to protect against region failure and to meet disaster recovery requirements.
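A backup to S3 is issued as a SQL statement on the cluster. The sketch below only assembles a `BACKUP DATABASE ... TO S3` statement in the form described in the SingleStore DB SQL reference; check that reference for your version. Database, bucket, and credential values are placeholders, the helper name is hypothetical, inputs are not escaped (trusted values are assumed), and cross-region replication is configured on the S3 bucket itself rather than in this statement.

```python
import json

def s3_backup_statement(database: str, bucket_path: str, region: str,
                        access_key_id: str, secret_access_key: str) -> str:
    """Build a SingleStore DB BACKUP ... TO S3 statement (inputs assumed trusted)."""
    config = json.dumps({"region": region})
    credentials = json.dumps({"aws_access_key_id": access_key_id,
                              "aws_secret_access_key": secret_access_key})
    return (f'BACKUP DATABASE {database} TO S3 "{bucket_path}" '
            f"CONFIG '{config}' CREDENTIALS '{credentials}';")


# Placeholders only; avoid printing or logging real credentials.
print(s3_backup_statement("sales", "my-backup-bucket/sales", "us-east-1",
                          "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"))
```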
Load Balancing of Client Connections
Application clients access a SingleStore DB cluster by connecting to aggregator nodes. Multiple aggregator nodes are normally provisioned for fault tolerance and performance. A good practice is to spread client connections evenly across all aggregator nodes of a cluster, using either or both of the following methods:
- Application side connection pool. Sophisticated connection pool implementations offer load balancing, failover and failback, and even multi-pool failover and failback.
- AWS Network Load Balancer (NLB).
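An application-side pool can be as simple as rotating through the aggregator endpoints and skipping any that refuse connections. The sketch below shows only that rotation-with-failover logic; `connect` stands in for your MySQL-compatible driver's connect call and is assumed to raise `OSError` on failure, and the class name and endpoints are hypothetical. Production pools add connection reuse, health checks, and failback timing.

```python
import itertools
from typing import Callable, Sequence

class AggregatorRoundRobin:
    """Hand out connections to aggregators in turn, skipping unreachable ones."""

    def __init__(self, endpoints: Sequence[str], connect: Callable[[str], object]):
        if not endpoints:
            raise ValueError("at least one aggregator endpoint is required")
        self._count = len(endpoints)
        self._cycle = itertools.cycle(list(endpoints))
        self._connect = connect

    def get_connection(self) -> object:
        last_error = None
        for _ in range(self._count):      # try each aggregator at most once
            endpoint = next(self._cycle)
            try:
                return self._connect(endpoint)
            except OSError as exc:        # unreachable: move on to the next one
                last_error = exc
        raise ConnectionError("no aggregator reachable") from last_error


def fake_connect(endpoint: str) -> str:
    """Stand-in driver call that pretends agg1 is down."""
    if endpoint.startswith("agg1"):
        raise OSError(f"{endpoint} refused connection")
    return f"connection to {endpoint}"


pool = AggregatorRoundRobin(["agg1:3306", "agg2:3306", "agg3:3306"], fake_connect)
print([pool.get_connection() for _ in range(3)])
# ['connection to agg2:3306', 'connection to agg3:3306', 'connection to agg2:3306']
```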
Health Check Considerations
Expiring security certificates can be a security risk. AWS Certificate Manager (ACM) helps manage the renewal of Amazon-issued SSL/TLS certificates, and we recommend using it to mitigate this risk.
Enable AWS CloudTrail Logs for Extended Log Retention
AWS CloudTrail event history is enabled by default for management events and provides visibility into the past 90 days of account activity. To store management event logs for longer than 90 days, create a CloudTrail trail for the account hosting the SingleStore DB cluster; the trail delivers log files to an Amazon S3 bucket.
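Creating such a trail takes two CloudTrail API calls, `CreateTrail` and `StartLogging`. The sketch below makes them through a boto3-style client; the trail and bucket names are placeholders, the helper name is hypothetical, and the S3 bucket must already exist with a bucket policy that allows CloudTrail to write to it. Log files then remain in S3 until you delete them or an S3 lifecycle rule expires them.

```python
def create_long_retention_trail(cloudtrail_client, trail_name: str, bucket: str) -> str:
    """Create a multi-region trail delivering management events to S3,
    start logging, and return the trail ARN."""
    trail = cloudtrail_client.create_trail(
        Name=trail_name,
        S3BucketName=bucket,
        IsMultiRegionTrail=True,       # capture management events from every region
        EnableLogFileValidation=True,  # write digest files to detect tampering
    )
    cloudtrail_client.start_logging(Name=trail_name)
    return trail["TrailARN"]
```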
Create an IAM Role
AWS users should not use the AWS root user to create or manage clusters. Use an IAM user with the SingleStore DB Cluster Management Role to deploy and manage clusters on AWS.
The SingleStore DB Cluster Management Role requires an IAM policy that grants, at minimum, the following actions:
"aws-marketplace:ListBuilds", "aws-marketplace:StartBuild", "aws-marketplace:Subscribe", "aws-marketplace:ViewSubscriptions",
"cloudformation:CreateChangeSet", "cloudformation:CreateStack", "cloudformation:CreateStackInstances", "cloudformation:CreateStackSet", "cloudformation:CreateUploadBucket", "cloudformation:DescribeChangeSet", "cloudformation:DescribeStackEvents", "cloudformation:DescribeStackInstance", "cloudformation:DescribeStackResource", "cloudformation:DescribeStackResources", "cloudformation:DescribeStacks", "cloudformation:DescribeStackSet", "cloudformation:GetStackPolicy", "cloudformation:GetTemplate", "cloudformation:GetTemplateSummary", "cloudformation:ListStackInstances", "cloudformation:ListStackResources", "cloudformation:ListStacks", "cloudformation:ListStackSetOperationResults", "cloudformation:ListStackSetOperations", "cloudformation:ListStackSets", "cloudformation:SetStackPolicy", "cloudformation:UpdateStack", "cloudformation:UpdateStackInstances", "cloudformation:UpdateStackSet", "cloudformation:UpdateTerminationProtection",
"ec2:AssociateRouteTable", "ec2:AttachInternetGateway", "ec2:AttachVolume", "ec2:AuthorizeSecurityGroupIngress", "ec2:CreateInternetGateway", "ec2:CreateKeyPair", "ec2:CreateRoute", "ec2:CreateRouteTable", "ec2:CreateSecurityGroup", "ec2:CreateSubnet", "ec2:CreateTags", "ec2:CreateVolume", "ec2:CreateVpc", "ec2:DescribeAccountAttributes", "ec2:DescribeAvailabilityZones", "ec2:DescribeInstanceAttribute", "ec2:DescribeInstances", "ec2:DescribeInstanceStatus", "ec2:DescribeInternetGateways", "ec2:DescribeKeyPairs", "ec2:DescribeRegions", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeTags", "ec2:DescribeVolumeAttribute", "ec2:DescribeVolumes", "ec2:DescribeVpcAttribute", "ec2:DescribeVpcs", "ec2:DetachInternetGateway", "ec2:ImportKeyPair", "ec2:ModifyVpcAttribute", "ec2:RebootInstances", "ec2:RunInstances", "ec2:StartInstances", "ec2:StopInstances", "ec2:TerminateInstances",
"elasticloadbalancing:AddTags", "elasticloadbalancing:CreateListener", "elasticloadbalancing:CreateLoadBalancer", "elasticloadbalancing:DescribeListeners", "elasticloadbalancing:DescribeLoadBalancers", "elasticloadbalancing:DescribeTargetGroups", "elasticloadbalancing:DescribeTargetHealth",
"iam:ListRoles",
"sns:CreateTopic", "sns:ListTopics", "sns:TagResource"
When deploying SingleStore DB via an AWS CloudFormation template, SingleStore recommends using the SingleStore DB Cluster Management Role to deploy and manage the cluster.
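To attach the actions listed above to a role, they need to be wrapped in an IAM policy document. The sketch below does that in Python; `"Resource": "*"` is an assumption (the list above does not specify resources), only a few of the actions are shown so the full list must be pasted in, and the helper name is hypothetical.

```python
import json

def cluster_management_policy(actions: list) -> dict:
    """IAM policy document allowing the given actions.
    Resource "*" is an assumption; narrow it where your deployment allows."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(set(actions)), "Resource": "*"}
        ],
    }


# Abbreviated: paste the complete action list from above.
ACTIONS = ["ec2:RunInstances", "ec2:DescribeInstances", "cloudformation:CreateStack"]
print(json.dumps(cluster_management_policy(ACTIONS), indent=2))
```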
AWS administrators should periodically rotate the access key ID and secret access key of any IAM user that manages the cluster.
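Key age can be checked from the metadata returned by the IAM `ListAccessKeys` API (in boto3, `list_access_keys(UserName=...)["AccessKeyMetadata"]`). The sketch below flags active keys older than a rotation window; the 90-day default is an assumed policy, not a SingleStore or AWS requirement, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def keys_due_for_rotation(key_metadata: list, max_age_days: int = 90,
                          now: Optional[datetime] = None) -> list:
    """IDs of active access keys created more than max_age_days ago.
    CreateDate values are timezone-aware datetimes, as boto3 returns them."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [key["AccessKeyId"] for key in key_metadata
            if key["Status"] == "Active" and key["CreateDate"] < cutoff]
```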
The SingleStore AWS team is actively soliciting customer feedback and would appreciate hearing from you. Please send your comments and suggestions to firstname.lastname@example.org.