Designing for Multi-Tenant Applications

SingleStore is a great solution for multi-tenant analytical applications because of its unique architecture which provides ultra fast ingest, super low latency queries, and high concurrency. This article is for developers looking to create secure, fast, scalable, and cost effective applications on top of SingleStore. It contains design patterns you can use to get the most out of SingleStore for your multi-tenant apps.

Definition: A multi-tenant application is one where a software-as-a-service (SaaS) application provider has many customers of their own. Each of their customers is a tenant of the application. While the typical design for a multi-tenant application assumes a Cloud-based platform, and that is the general approach described here, the principles and approach discussed herein also work well with a self-hosted environment. The diagrams below mention workspaces in the Cloud but the same concepts can be applied to clusters that you manage.

SaaS application providers typically have several goals, primarily:

create a compelling application
make the application fast and scalable
keep the cost per tenant reasonable
isolate tenants from one another

SingleStore can allow you to achieve all these goals.

Multi-Tenancy Approaches

There are the following main approaches to design your databases and table schemas to handle multi-tenancy:

tenant_id-column: This approach puts all tenants' data together in one database (DB), and puts a tenant_id column in every table to differentiate data for each tenant.
tables-per-tenant: This approach puts all tenant data in one DB, and gives each tenant their own set of tables. More than one tenant's tables may be placed in one database. All data in one table is for one tenant.
DB-per-tenant: This approach gives each tenant their own database. All data for a tenant is in tables in one database.
hybrid: Hybrid approaches combine elements of two or more of the above.

Here's a quick summary of the benefits and drawbacks of each and our suggestions of when each is appropriate.

Approach	Benefits	Drawbacks
tenant_id column	Low disk and memory overhead because only one DB and relatively few tables. Easy to aggregate data across many tenants. Use this if you have a very large number of tenants (say more than a thousand) or if you want to have an individual query aggregate data across many tenants.	Queries have to include tenant_id column filter on each table. Query plans may not be optimal for every tenant.
tables-per-tenant	Low disk overhead because creating a database and its associated log files is what takes the most disk space, at least initially.	Requires a table naming convention such as tenantID_LogicalTableName; application must substitute tenantID into actual table name at runtime.
DB-per-tenant	Easy naming convention – queries can look the same for each tenant. "USE db_for_tenant" sets the context DB for a tenant. Tenants are isolated from each other. Backup and restore, disaster recovery, resource governance, and other DB-level features work separately for each tenant.	Requires a lot of disk for each tenant for preallocated log files for each partition. Requires dedicated threads for each tenant DB (which may hit system limits with many DBs). Each tenant's data must be backed up separately.

Approach

Benefits

Drawbacks

tenant_id column

Low disk and memory overhead because only one DB and relatively few tables. Easy to aggregate data across many tenants.

Use this if you have a very large number of tenants (say more than a thousand) or if you want to have an individual query aggregate data across many tenants.

Queries have to include tenant_id column filter on each table. Query plans may not be optimal for every tenant.

tables-per-tenant

Low disk overhead because creating a database and its associated log files is what takes the most disk space, at least initially.

Requires a table naming convention such as tenantID_LogicalTableName; application must substitute tenantID into actual table name at runtime.

DB-per-tenant

Easy naming convention – queries can look the same for each tenant. "USE db_for_tenant" sets the context DB for a tenant. Tenants are isolated from each other. Backup and restore, disaster recovery, resource governance, and other DB-level features work separately for each tenant.

Requires a lot of disk for each tenant for preallocated log files for each partition. Requires dedicated threads for each tenant DB (which may hit system limits with many DBs).

Each tenant's data must be backed up separately.

Multi-Tenant Design Approaches

Customers have a few options when designing their databases with multi-tenancy in mind. The following sections describe various possible approaches.

Tenant_id Column Approach

The most scalable and cost effective model is one database and set of tables for all tenants, with a tenant_id column in each table used to identify data for different tenants. You could use customer_id or a similar column name to identify each tenant's data; we'll use tenant_id in the rest of this topic.

Highlights for tenant_id column approach:

The best system in terms of scalability, manageability, and cost control.
Keep customer data separate via addition of filters on tenant_id into SQL statements generated by the application.
Use the tenant_id column approach unless you have a good reason not to.

Reasons to not use tenant_id column approach:

Your customers have different schemas - For this, use tables-per-tenant or DB-per-tenant approaches.
For compliance reasons, some customers must not be in the same database as other customers - Go to DB-per-tenant, or a hybrid of tables-per-tenant and DB-per-tenant.
For compliance reasons, each customer must have its own database - Go to DB-per-tenant.

Here's an example schema that uses a tenant_id column to distinguish data among tenants "acme" (1) and "ajax" (2):

SQL

/* this table has one row per tenant to define tenant names and IDs */
CREATE TABLE tenant(id int, name varchar(80));
INSERT INTO tenant VALUES (1,"acme"), (2,"ajax");

/* data for each tenant in these tables tied to the owning tenant by 
   tenant_id */
CREATE TABLE sales(tenant_id int, product_id int, amount numeric(18,2));
CREATE TABLE product(tenant_id int, product_id int, product_name varchar(80), 
   list_price numeric(18,2));

Now, to retrieve all sales including the name of the item sold for tenant "ajax", you would write a SQL query like:

SQL

SELECT s.product_id, p.product_name, s.amount
FROM sales s, product p
WHERE s.tenant_id = 2 AND p.tenant_id = 2
AND s.product_id = p.product_id;

To make the SQL more dynamic so you don't have to substitute the tenants in with application code all the time, you could use a session variable:

SQL

SET @tenant_id = (SELECT id FROM tenant WHERE name = "ajax");

SELECT s.product_id, s.amount
FROM sales s, product p
WHERE s.tenant_id = @tenant_id AND p.tenant_id = @tenant_id
AND s.product_id = p.product_id;

Security in the Tenant_id-Column Approach

In the tenant_id-column approach, the application can keep data secure by (1) inserting the correct tenant_id in each new row and (2) putting filters like "table_name.tenant_id = @tenant_id" in the WHERE clause of every query, for every table referenced in the query. Normally, all data access must be done via the application. Direct access to the data must not be granted to individual tenants.

If direct read-only SQL access to the data is required, it is possible to create a view for each tenant for each table, restricting access to only their data, and grant SELECT access on the views to the appropriate tenant. It's also possible to use Row-Level Security so only one view will be needed and the appropriate tenant ID filter is put in at runtime

Tables-Per-Tenant Approach

Another great approach is to put all customers in the same database, but use separate sets of tables for each customer. We'll call this tables-per-tenant for short.

Highlights for tables-per-tenant approach:

This system is good at reducing total resources used (disk, threads), but you will need to carefully manage your memory use as you add customers. Even small tables take a certain amount of memory, which can become a limiting factor for tables-per-tenant.
This system is significantly more scalable than the DB-per-tenant approach so you should use it unless you strongly prefer DB-per-tenant for isolation of tenant data, for example, or you have few tenants or a big enough budget to provision enough disk for each tenant database needed in DB-per-tenant.

Figure 1. The tables-per-tenant approach, with a set of tables for each tenant, all in one database.

Memory Management Using Tables-Per-Tenant

Memory use is proportional to:

# of tables in set * # of sets of tables * overhead per table+ (number of frequent queries per tenant * number of active tenants)

Use the following strategies to minimize the above formula:

Use views instead of tables for each tenant (a hybrid tenant_id-column approach).
Use fewer unique queries (queries that require individual query plans to be compiled).
Decrease the number of tables via denormalization and/or avoiding vertical partitioning (wider tables are ok).

Another memory minimization strategy is:

Turn off table autostats (generally not recommended because it can affect query plan quality).

Consider this strategy to minimize disk usage:

Use fewer partitions (you can choose the number of partitions when you run CREATE DATABASE – but be aware that it is difficult to increase the number of partitions later). Tenants that you know will be small for a long time can have, say, only 2 or 4 partitions instead of the default value which is many times more.

Security in Tables-Per-Tenant Approach

The typical way to secure data in the tables-per-tenant approach is to perform all data access through applications and not grant direct SQL access to the tenant. If direct SQL access to the tables by the tenant is required, a user_id can be created for each tenant and that user can be granted access to their tables with GRANT statements. Typically you'd only grant read-only (SELECT) access. Granting UPDATE access would risk corrupting data and causing your application to fail.

DB-Per-Tenant Approach

It’s also possible to put each customer in its own database: DB-per-tenant. This is easy to set up and great for prototyping, but for many budgets it will not scale to a significant number of tenants due to the resources used for databases and tables.

Highlights for DB-per-tenant approach:

If every customer must be isolated at the database level for compliance reasons, the DB-per-tenant approach is your best option.
This option will work well for small numbers of customers, but because each database has significant local disk space overhead, you will need to scale up your Workspace Group (or cluster for self-hosted) as you add customers and databases.

Local Disk Use in DB-Per-Tenant

Unlimited Storage provides great benefits, but is distinct from the Local Disk (a.k.a. Persistent Cache), which is limited and holds cached data as well as other types of data, such as transaction logs, snapshots, and plan cache.

By default, an 8-partition database, on average, will use 4 GB for transaction logs for each partition when you create the database, so if you have a lot of databases, this can add up quickly. This means that if you plan to have a lot of customers, you will need a large amount of local disk using the DB-per-tenant approach, which can require scaling up your workspace (cluster).

Reducing Disk Usage Per DB

For smaller tenants, you can reduce the amount of disk needed per database by:

reducing the number of partitions
reducing the log_file_size_partitions and snapshot_trigger_size engine variables

Having less partitions means you'll need to allocate fewer log files (and thus less disk space) since each partition has a log file active and one pre-allocated, at time of DB creation. Reducing log_file_size_partitions will proportionally reduce storage for pre-allocated log files. Reducing snapshot_trigger_size reduces the total amount of log bytes that have to be kept around on disk in a steady state.

Aggressive settings for these two variables are:

Variable	Value	Formatted Value
log_file_size_partitions	8388608	8,388,608
snapshot_trigger_size	67108864	67,108,864

E.g. the following would create a 4-partition database with the log file size and snapshot trigger size shown:

SQL

SET GLOBAL log_file_size_partitions = 8388608;
SET GLOBAL snapshot_trigger_size = 67108864;
CREATE DATABASE db_tenant1 PARTITIONS = 4;

These values may work for small tenants but not be appropriate for large tenants. You can use different settings when creating different databases. For small tenants and a DB-per-tenant model, these settings can let you have many times more databases on the same size workspace (cluster).

Over time, if you start with small settings like this and a tenant grows to be large and needs to run on a large workspace or cluster, you may be required to to create a new database with more partitions and move the data over to it, which is an offline operation.

It's best to set these values appropriately in advance for each database. Large tenants should have larger values for these, when using a DB-per-tenant approach.

Isolating Sets of Customers with Workspace Groups (Hybrid)

Some groups of tenants might need to be physically isolated from other tenants. You can achieve this with separate workspace groups:

If some sets of customers must be totally walled off, Workspace Groups (or separate clusters in a self-hosted environment) provide the greatest isolation. For example, if all your customers in AWS Europe needed to be separated, you could have an AWS Europe Workspace Group.
You would need to use this in conjunction with a tenant_id-column or tables-per-tenant approach as having every customer in its own workspace group will not scale well in terms of operational complexity or cost.

Diagram showing an organization consisting of three tenants: AWS European Tenants, Azure European Tenants, and AWS Australian Tenants

Providing Capacity and Isolation for Tenants

Workspaces are how SingleStore achieves separation of storage and compute, which makes them ideal for use in multi-tenant applications.

Example

Let's assume that one of SingleStore's customers has 10,000 customers with free, standard and enterprise offerings. Workspaces can be used to create different service expectations between these offerings. Each group of tenants can be placed on a separate workspace, where the number of users and the size of the compute workspace could be adjusted to provide higher or lower quality of service per tenant.

Example offerings:

Free - S-2 Workspace - 9,000 tenants with low performance requirement and infrequent usage.
Standard - S-4 Workspace - 900 tenants, medium performance requirement with infrequent usage.
Enterprise - S-16 Workspace - 99 tenants with high performance requirements and frequent usage.

Here, a tenant_id-column approach would be used throughout, so 9,000 tenants in the "Free" group could fit on one S2 workspace. The same approach would be used for the higher-level groups so the application code would only have to be written once and not be different for different tiers.

Summary

When deciding which multi-tenancy approach is best for your particular circumstances, consider which factors including cost, scaling tenants over time, and isolation are most important to you. One of these approaches can work well for you:

tenant_id column
tables-per-tenant
DB-per-tenant
a hybrid of two or more of the above

Hybrid approaches can help you balance the relative strengths of the three foundational approaches. Moreover, placing databases containing different sets of tenants on different workspace groups (clusters) can give you additional scalability and isolation across your set of tenants.