Designing for Multi-Tenant Applications
On this page
SingleStore is a great solution for multi-tenant analytical applications because of its unique architecture which provides ultra fast ingest, super low latency queries, and high concurrency.
Definition: A multi-tenant application is one where a software-as-a-service (SaaS) application provider has many customers of their own.
SaaS application providers typically have several goals, primarily:
-
create a compelling application
-
make the application fast and scalable
-
keep the cost per tenant reasonable
-
isolate tenants from one another
SingleStore can allow you to achieve all these goals.
Multi-Tenancy Approaches
There are the following main approaches to design your databases and table schemas to handle multi-tenancy:
-
tenant_
id-column: This approach puts all tenants' data together in one database (DB), and puts a tenant_ id column in every table to differentiate data for each tenant. -
tables-per-tenant: This approach puts all tenant data in one DB, and gives each tenant their own set of tables.
More than one tenant's tables may be placed in one database. All data in one table is for one tenant. -
DB-per-tenant: This approach gives each tenant their own database.
All data for a tenant is in tables in one database. -
hybrid: Hybrid approaches combine elements of two or more of the above.
Here's a quick summary of the benefits and drawbacks of each and our suggestions of when each is appropriate.
Approach |
Benefits |
Drawbacks |
---|---|---|
tenant_ |
Low disk and memory overhead because only one DB and relatively few tables. Use this if you have a very large number of tenants (say more than a thousand) or if you want to have an individual query aggregate data across many tenants. |
Queries have to include tenant_ |
tables-per-tenant |
Low disk overhead because creating a database and its associated log files is what takes the most disk space, at least initially. |
Requires a table naming convention such as tenantID_ |
DB-per-tenant |
Easy naming convention – queries can look the same for each tenant. |
Requires a lot of disk for each tenant for preallocated log files for each partition. Each tenant's data must be backed up separately. |
Multi-Tenant Design Approaches
Customers have a few options when designing their databases with multi-tenancy in mind.
Tenant_ id Column Approach
The most scalable and cost effective model is one database and set of tables for all tenants, with a tenant_
Highlights for tenant_
-
The best system in terms of scalability, manageability, and cost control.
-
Keep customer data separate via addition of filters on tenant_
id into SQL statements generated by the application. -
Use the tenant_
id column approach unless you have a good reason not to.
Reasons to not use tenant_
-
Your customers have different schemas - For this, use tables-per-tenant or DB-per-tenant approaches.
-
For compliance reasons, some customers must not be in the same database as other customers - Go to DB-per-tenant, or a hybrid of tables-per-tenant and DB-per-tenant.
-
For compliance reasons, each customer must have its own database - Go to DB-per-tenant.
Here's an example schema that uses a tenant_
/* this table has one row per tenant to define tenant names and IDs */CREATE TABLE tenant(id int, name varchar(80));INSERT INTO tenant VALUES (1,"acme"), (2,"ajax");/* data for each tenant in these tables tied to the owning tenant bytenant_id */CREATE TABLE sales(tenant_id int, product_id int, amount numeric(18,2));CREATE TABLE product(tenant_id int, product_id int, product_name varchar(80),list_price numeric(18,2));
Now, to retrieve all sales including the name of the item sold for tenant "ajax", you would write a SQL query like:
SELECT s.product_id, p.product_name, s.amountFROM sales s, product pWHERE s.tenant_id = 2 AND p.tenant_id = 2AND s.product_id = p.product_id;
To make the SQL more dynamic so you don't have to substitute the tenants in with application code all the time, you could use a session variable:
SET @tenant_id = (SELECT id FROM tenant WHERE name = "ajax");SELECT s.product_id, s.amountFROM sales s, product pWHERE s.tenant_id = @tenant_id AND p.tenant_id = @tenant_idAND s.product_id = p.product_id;
Security in the Tenant_ id-Column Approach
In the tenant_WHERE
clause of every query, for every table referenced in the query.
If direct read-only SQL access to the data is required, it is possible to create a view for each tenant for each table, restricting access to only their data, and grant SELECT
access on the views to the appropriate tenant.
Tables-Per-Tenant Approach
Another great approach is to put all customers in the same database, but use separate sets of tables for each customer.
Highlights for tables-per-tenant approach:
-
This system is good at reducing total resources used (disk, threads), but you will need to carefully manage your memory use as you add customers.
Even small tables take a certain amount of memory, which can become a limiting factor for tables-per-tenant. -
This system is significantly more scalable than the DB-per-tenant approach so you should use it unless you strongly prefer DB-per-tenant for isolation of tenant data, for example, or you have few tenants or a big enough budget to provision enough disk for each tenant database needed in DB-per-tenant.
Figure 1.
Memory Management Using Tables-Per-Tenant
Memory use is proportional to:
# of tables in set * # of sets of tables * overhead per table+ (number of frequent queries per tenant * number of active tenants)
Use the following strategies to minimize the above formula:
-
Use views instead of tables for each tenant (a hybrid tenant_
id-column approach). -
Use fewer unique queries (queries that require individual query plans to be compiled).
-
Decrease the number of tables via denormalization and/or avoiding vertical partitioning (wider tables are ok).
Another memory minimization strategy is:
-
Turn off table autostats (generally not recommended because it can affect query plan quality).
Consider this strategy to minimize disk usage:
-
Use fewer partitions (you can choose the number of partitions when you run
CREATE DATABASE
– but be aware that it is difficult to increase the number of partitions later).Tenants that you know will be small for a long time can have, say, only 2 or 4 partitions instead of the default value which is many times more.
The enable_
engine variable defaults to Full
mode.
You can view your memory usage in the SingleStore Helios Portal.
Security in Tables-Per-Tenant Approach
The typical way to secure data in the tables-per-tenant approach is to perform all data access through applications and not grant direct SQL access to the tenant.GRANT
statements.SELECT
) access.UPDATE
access would risk corrupting data and causing your application to fail.
DB-Per-Tenant Approach
It’s also possible to put each customer in its own database: DB-per-tenant.
Highlights for DB-per-tenant approach:
-
If every customer must be isolated at the database level for compliance reasons, the DB-per-tenant approach is your best option.
-
This option will work well for small numbers of customers, but because each database has significant local disk space overhead, you will need to scale up your Workspace Group (or cluster for self-hosted) as you add customers and databases.
Local Disk Use in DB-Per-Tenant
Unlimited Storage provides great benefits, but is distinct from the Local Disk (a.
By default, an 8-partition database, on average, will use 4 GB for transaction logs for each partition when you create the database, so if you have a lot of databases, this can add up quickly.
Reducing Disk Usage Per DB
For smaller tenants, you can reduce the amount of disk needed per database by:
-
reducing the number of partitions
-
reducing the
log_
andfile_ size_ partitions snapshot_
engine variablestrigger_ size
Having less partitions means you'll need to allocate fewer log files (and thus less disk space) since each partition has a log file active and one pre-allocated, at time of DB creation.log_
will proportionally reduce storage for pre-allocated log files.snapshot_
reduces the total amount of log bytes that have to be kept around on disk in a steady state.
Aggressive settings for these two variables are:
Variable |
Value |
Formatted Value |
---|---|---|
log_ |
8388608 |
8,388,608 |
snapshot_ |
67108864 |
67,108,864 |
E.
SET GLOBAL log_file_size_partitions = 8388608;SET GLOBAL snapshot_trigger_size = 67108864;CREATE DATABASE db_tenant1 PARTITIONS = 4;
These values may work for small tenants but not be appropriate for large tenants.
Over time, if you start with small settings like this and a tenant grows to be large and needs to run on a large workspace or cluster, you may be required to to create a new database with more partitions and move the data over to it, which is an offline operation.
It's best to set these values appropriately in advance for each database.
Isolating Sets of Customers with Workspace Groups (Hybrid)
Some groups of tenants might need to be physically isolated from other tenants.
-
If some sets of customers must be totally walled off, Workspace Groups (or separate clusters in a self-hosted environment) provide the greatest isolation.
For example, if all your customers in AWS Europe needed to be separated, you could have an AWS Europe Workspace Group. -
You would need to use this in conjunction with a tenant_
id-column or tables-per-tenant approach as having every customer in its own workspace group will not scale well in terms of operational complexity or cost.
Providing Capacity and Isolation for Tenants
Workspaces are how SingleStore achieves separation of storage and compute, which makes them ideal for use in multi-tenant applications.
Example
Let's assume that one of SingleStore's customers has 10,000 customers with free, standard, premium and dedicated offerings.
Example offerings:
-
Free - S-2 Workspace - 9,000 tenants with low performance requirement and infrequent usage.
-
Standard - S-4 Workspace - 900 tenants, medium performance requirement with infrequent usage.
-
Premium - S-16 Workspace - 99 tenants with high performance requirements and frequent usage.
-
Dedicated - S-4 Workspace - 1 tenant whose data and compute capacity need to be dedicated.
Here, a tenant_
Summary
When deciding which multi-tenancy approach is best for your particular circumstances, consider which factors including cost, scaling tenants over time, and isolation are most important to you.
-
tenant_
id column -
tables-per-tenant
-
DB-per-tenant
-
a hybrid of two or more of the above
Hybrid approaches can help you balance the relative strengths of the three foundational approaches.
Last modified: May 31, 2024