Building a Multi-Tenant SaaS: Architecture Decisions That Matter

Building a Multi-Tenant SaaS: Architecture Decisions That Matter

Building a Multi-Tenant SaaS: Architecture Decisions That Matter

We're building a multi-tenant SaaS. Every customer shares the same infrastructure but their data must be isolated.

We made some good decisions. We made some mistakes. Here's what we learned.

The Core Question

How do you isolate tenant data?

Three approaches:

  1. Database per tenant: Each tenant gets their own database
  2. Schema per tenant: Shared database, separate schemas
  3. Shared schema: One database, one schema, tenant_id column

We chose option 3. Here's why.

Our Architecture

Shared Schema with tenant_id

Every table has a tenant_id column.

CREATE TABLE orders (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  user_id UUID NOT NULL,
  total DECIMAL(10,2),
  created_at TIMESTAMP
);

CREATE INDEX idx_orders_tenant ON orders(tenant_id);

Every query filters by tenant_id.

SELECT * FROM orders
WHERE tenant_id = $1 AND user_id = $2;

Why Shared Schema?

Pros:

  • Simple to manage
  • Easy to scale
  • Cost-effective
  • Cross-tenant analytics possible

Cons:

  • Risk of data leakage
  • Noisy neighbor problem
  • Complex queries

Why Not Database Per Tenant?

Database per tenant sounds safe. But:

  • Managing 1,000 databases is hard
  • Migrations take forever
  • Costs add up
  • Can't do cross-tenant analytics

We'd need it only for enterprise customers with strict compliance requirements.

Security: The Critical Part

Row-Level Security

PostgreSQL has Row-Level Security (RLS). It enforces tenant isolation at the database level.

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id')::UUID);

Now the database automatically filters by tenant_id. Even if application code forgets.

Application-Level Checks

We also enforce tenant_id in application code.

class TenantMiddleware:
    def __call__(self, request):
        tenant_id = get_tenant_from_request(request)
        request.tenant_id = tenant_id
        set_db_tenant_id(tenant_id)

Every request sets the tenant context. Every query uses it.

API Design

Our API URLs include tenant identifier.

GET /api/v1/tenants/{tenant_id}/orders
POST /api/v1/tenants/{tenant_id}/orders

We validate that the authenticated user belongs to the tenant.

Performance Considerations

Indexing

Every query filters by tenant_id. So every table needs an index on tenant_id.

CREATE INDEX idx_orders_tenant_created 
ON orders(tenant_id, created_at DESC);

Composite indexes for common query patterns.

Partitioning

For large tables, we partition by tenant_id.

CREATE TABLE orders (
  id UUID,
  tenant_id UUID,
  ...
) PARTITION BY LIST (tenant_id);

Each tenant's data lives in a separate partition. Queries are faster.

Caching

We cache per tenant.

cache_key = f"tenant:{tenant_id}:orders:{user_id}"
orders = cache.get(cache_key)
if not orders:
    orders = db.query(...)
    cache.set(cache_key, orders)

One tenant's cache doesn't affect another's.

Scaling Strategy

Horizontal Scaling

We can add more application servers. They're stateless.

Database Scaling

Read replicas for read-heavy workloads. Connection pooling to handle many tenants.

Tenant Sharding

Eventually, we'll shard by tenant_id. Large tenants get dedicated shards. Small tenants share shards.

Not needed yet, but the architecture supports it.

Billing and Metering

We track usage per tenant.

CREATE TABLE usage_events (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  event_type VARCHAR(50),
  quantity INTEGER,
  created_at TIMESTAMP
);

Aggregated monthly for billing.

SELECT tenant_id, 
       event_type,
       SUM(quantity) as total
FROM usage_events
WHERE created_at >= '2025-11-01'
  AND created_at < '2025-12-01'
GROUP BY tenant_id, event_type;

Tenant Onboarding

New tenant signup:

  1. Create tenant record
  2. Create admin user
  3. Set up default settings
  4. Send welcome email

All automated. Takes 2 seconds.

Tenant Offboarding

When a tenant cancels:

  1. Mark tenant as inactive
  2. Stop billing
  3. Schedule data deletion (30 days)
  4. Actually delete data

We soft-delete first. Hard-delete later.

Monitoring

We monitor per tenant:

  • Request rate
  • Error rate
  • Response time
  • Database query time

Alerts if one tenant is causing problems.

Mistakes We Made

1. Forgot tenant_id in a Query

We had a bug where one query didn't filter by tenant_id. A customer saw another customer's data.

Fix: Row-Level Security catches this now.

2. No Rate Limiting Per Tenant

One tenant hammered our API. Affected everyone.

Fix: Rate limiting per tenant.

3. Global Cache Keys

We used cache keys without tenant_id. One tenant's cache affected another's.

Fix: Always include tenant_id in cache keys.

Lessons Learned

  1. Security first: Enforce tenant isolation at multiple levels
  2. Index everything: tenant_id should be in every index
  3. Monitor per tenant: Know which tenants are causing issues
  4. Plan for scale: Architecture should support sharding
  5. Test isolation: Regularly test that tenants can't see each other's data

When to Use Each Approach

Shared schema: Most SaaS applications
Schema per tenant: Medium-sized tenants with compliance needs
Database per tenant: Enterprise customers with strict requirements

We use shared schema for 99% of tenants. We'll add database-per-tenant for enterprise later.

The Bottom Line

Multi-tenancy is complex. But it's the only way to build a scalable SaaS.

Shared schema works for most cases. Add Row-Level Security. Monitor everything. Test isolation.

Our architecture supports 1,000 tenants today. It'll support 100,000 tomorrow.

Subscribe to Blyss Blog

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
jamie@example.com
Subscribe