Building a Multi-Tenant SaaS: Architecture Decisions That Matter
Building a Multi-Tenant SaaS: Architecture Decisions That Matter
We're building a multi-tenant SaaS. Every customer shares the same infrastructure but their data must be isolated.
We made some good decisions. We made some mistakes. Here's what we learned.
The Core Question
How do you isolate tenant data?
Three approaches:
- Database per tenant: Each tenant gets their own database
- Schema per tenant: Shared database, separate schemas
- Shared schema: One database, one schema, tenant_id column
We chose option 3. Here's why.
Our Architecture
Shared Schema with tenant_id
Every table has a tenant_id column.
CREATE TABLE orders (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
user_id UUID NOT NULL,
total DECIMAL(10,2),
created_at TIMESTAMP
);
CREATE INDEX idx_orders_tenant ON orders(tenant_id);
Every query filters by tenant_id.
SELECT * FROM orders
WHERE tenant_id = $1 AND user_id = $2;
Why Shared Schema?
Pros:
- Simple to manage
- Easy to scale
- Cost-effective
- Cross-tenant analytics possible
Cons:
- Risk of data leakage
- Noisy neighbor problem
- Complex queries
Why Not Database Per Tenant?
Database per tenant sounds safe. But:
- Managing 1,000 databases is hard
- Migrations take forever
- Costs add up
- Can't do cross-tenant analytics
We'd need it only for enterprise customers with strict compliance requirements.
Security: The Critical Part
Row-Level Security
PostgreSQL has Row-Level Security (RLS). It enforces tenant isolation at the database level.
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id')::UUID);
Now the database automatically filters by tenant_id. Even if application code forgets.
Application-Level Checks
We also enforce tenant_id in application code.
class TenantMiddleware:
def __call__(self, request):
tenant_id = get_tenant_from_request(request)
request.tenant_id = tenant_id
set_db_tenant_id(tenant_id)
Every request sets the tenant context. Every query uses it.
API Design
Our API URLs include tenant identifier.
GET /api/v1/tenants/{tenant_id}/orders
POST /api/v1/tenants/{tenant_id}/orders
We validate that the authenticated user belongs to the tenant.
Performance Considerations
Indexing
Every query filters by tenant_id. So every table needs an index on tenant_id.
CREATE INDEX idx_orders_tenant_created
ON orders(tenant_id, created_at DESC);
Composite indexes for common query patterns.
Partitioning
For large tables, we partition by tenant_id.
CREATE TABLE orders (
id UUID,
tenant_id UUID,
...
) PARTITION BY LIST (tenant_id);
Each tenant's data lives in a separate partition. Queries are faster.
Caching
We cache per tenant.
cache_key = f"tenant:{tenant_id}:orders:{user_id}"
orders = cache.get(cache_key)
if not orders:
orders = db.query(...)
cache.set(cache_key, orders)
One tenant's cache doesn't affect another's.
Scaling Strategy
Horizontal Scaling
We can add more application servers. They're stateless.
Database Scaling
Read replicas for read-heavy workloads. Connection pooling to handle many tenants.
Tenant Sharding
Eventually, we'll shard by tenant_id. Large tenants get dedicated shards. Small tenants share shards.
Not needed yet, but the architecture supports it.
Billing and Metering
We track usage per tenant.
CREATE TABLE usage_events (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
event_type VARCHAR(50),
quantity INTEGER,
created_at TIMESTAMP
);
Aggregated monthly for billing.
SELECT tenant_id,
event_type,
SUM(quantity) as total
FROM usage_events
WHERE created_at >= '2025-11-01'
AND created_at < '2025-12-01'
GROUP BY tenant_id, event_type;
Tenant Onboarding
New tenant signup:
- Create tenant record
- Create admin user
- Set up default settings
- Send welcome email
All automated. Takes 2 seconds.
Tenant Offboarding
When a tenant cancels:
- Mark tenant as inactive
- Stop billing
- Schedule data deletion (30 days)
- Actually delete data
We soft-delete first. Hard-delete later.
Monitoring
We monitor per tenant:
- Request rate
- Error rate
- Response time
- Database query time
Alerts if one tenant is causing problems.
Mistakes We Made
1. Forgot tenant_id in a Query
We had a bug where one query didn't filter by tenant_id. A customer saw another customer's data.
Fix: Row-Level Security catches this now.
2. No Rate Limiting Per Tenant
One tenant hammered our API. Affected everyone.
Fix: Rate limiting per tenant.
3. Global Cache Keys
We used cache keys without tenant_id. One tenant's cache affected another's.
Fix: Always include tenant_id in cache keys.
Lessons Learned
- Security first: Enforce tenant isolation at multiple levels
- Index everything: tenant_id should be in every index
- Monitor per tenant: Know which tenants are causing issues
- Plan for scale: Architecture should support sharding
- Test isolation: Regularly test that tenants can't see each other's data
When to Use Each Approach
Shared schema: Most SaaS applications
Schema per tenant: Medium-sized tenants with compliance needs
Database per tenant: Enterprise customers with strict requirements
We use shared schema for 99% of tenants. We'll add database-per-tenant for enterprise later.
The Bottom Line
Multi-tenancy is complex. But it's the only way to build a scalable SaaS.
Shared schema works for most cases. Add Row-Level Security. Monitor everything. Test isolation.
Our architecture supports 1,000 tenants today. It'll support 100,000 tomorrow.