mTLS: Mutual TLS for Zero-Trust Service Authentication

Your microservices communicate over HTTP. You add an API gateway that validates JWTs for external traffic. But what about internal traffic? Service A calls Service B directly. Service B trusts the request because it came from inside the network. An attacker who compromises any internal service can now call any other service without authentication.

This is the “castle and moat” security model: a strong perimeter and no checks inside it. mTLS (mutual TLS) replaces it with zero-trust: every service authenticates every other service, regardless of network location.

What mTLS is

Regular TLS (one-way TLS) proves the server’s identity to the client. The server presents a certificate. The client verifies it against trusted certificate authorities. The client knows it is talking to the real server.

mTLS (mutual TLS) adds the reverse: the client also presents a certificate. The server verifies the client’s certificate. Both sides prove their identity.

Regular TLS:

Server presents certificate
Client verifies server identity
Client is anonymous to the server

mTLS:

Server presents certificate
Client verifies server identity
Client presents certificate
Server verifies client identity
Both sides are authenticated

graph LR
subgraph tls["Regular TLS"]
  C1["Client<br/>(anonymous)"] -->|"1. ClientHello"| S1["Server"]
  S1 -->|"2. Certificate<br/>(server identity)"| C1
  C1 -->|"3. Verify cert<br/>against CA"| C1
  C1 -->|"4. Encrypted request"| S1
end

subgraph mtls["mTLS"]
  C2["Client<br/>(has certificate)"] -->|"1. ClientHello"| S2["Server"]
  S2 -->|"2. Certificate<br/>+ request client cert"| C2
  C2 -->|"3. Client certificate"| S2
  S2 -->|"4. Verify client cert<br/>against CA"| S2
  C2 -->|"5. Encrypted request"| S2
end

style S1 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style S2 fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style C2 fill:#E1F5EE,stroke:#0F6E56,color:#085041
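
In code, the difference between the two handshakes is small. The following is a minimal sketch using Python’s standard ssl module, not a production setup: the server asks for a client certificate by setting verify_mode to CERT_REQUIRED, and the client loads its own certificate chain. The function names and file-path parameters are hypothetical, and the paths are optional so the sketch runs without real certificate files.

```python
import ssl
from typing import Optional


def server_context(cert: Optional[str] = None, key: Optional[str] = None,
                   client_ca: Optional[str] = None,
                   mutual: bool = True) -> ssl.SSLContext:
    """Server-side context. With mutual=True the client must present a certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    if cert and key:
        # The server's own identity (needed for regular TLS and mTLS alike).
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    if mutual:
        # This is the mTLS part: reject clients that present no valid certificate.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca:
            # Only certificates signed by this CA are accepted from clients.
            ctx.load_verify_locations(cafile=client_ca)
    return ctx


def client_context(server_ca: Optional[str] = None, cert: Optional[str] = None,
                   key: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context. Loading cert and key lets the client answer the
    server's certificate request."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    if cert and key:
        ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx
```

A regular TLS server would call server_context(mutual=False). Everything else in the connection setup stays the same.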

How mTLS works in practice

Certificate infrastructure

mTLS requires a PKI (Public Key Infrastructure):

Root CA: A certificate authority that signs intermediate CAs. Kept offline and highly secured.
Intermediate CA: Signs service certificates. Can be revoked if compromised.
Service certificates: Each service has a certificate signed by the intermediate CA. The certificate contains the service’s identity (SPIFFE ID or DNS name).

SPIFFE (Secure Production Identity Framework for Everyone): A standard for service identity in distributed systems. Each service gets a SPIFFE ID: spiffe://trust-domain/service-name. The certificate’s Subject Alternative Name (SAN) contains the SPIFFE ID.
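
To make the ID format concrete, here is a rough sketch of splitting a SPIFFE ID into its trust domain and path. It is a loose check written for illustration, not a full implementation of the SPIFFE specification. The names SpiffeId and parse_spiffe_id, and the example.org trust domain, are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SpiffeId(NamedTuple):
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> SpiffeId:
    """Split spiffe://trust-domain/path into its two parts (loose validation only)."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if "@" in parts.netloc or ":" in parts.netloc:
        raise ValueError("trust domain must not contain user info or a port")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs carry no query or fragment")
    return SpiffeId(trust_domain=parts.netloc, path=parts.path)


print(parse_spiffe_id("spiffe://example.org/payments/api"))
```

A server that reads the URI entry from the peer certificate’s SAN can compare the parsed trust domain and path against the identities it expects.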

Certificate issuance and rotation

Certificates must be rotated regularly (every 24 hours in some systems). Manual rotation is impractical at scale. Use an automated certificate management system:

SPIRE: SPIFFE Runtime Environment. Issues and rotates certificates automatically. Each service gets a short-lived certificate (hours, not years).
Vault: HashiCorp’s secrets management tool. Its PKI secrets engine issues certificates on demand.
cert-manager: Kubernetes operator for certificate management. Integrates with Let’s Encrypt and internal CAs.
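
Automated rotation usually means renewing well before expiry, so a failed renewal can be retried while the old certificate is still valid. The sketch below computes a renewal time from a certificate’s validity window. The 50% threshold is an assumption chosen for illustration, not a setting taken from any of the tools above, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def renewal_time(not_before: datetime, not_after: datetime,
                 fraction: float = 0.5) -> datetime:
    """Time at which to renew: after `fraction` of the lifetime has elapsed."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return not_before + (not_after - not_before) * fraction


def needs_renewal(now: datetime, not_before: datetime, not_after: datetime,
                  fraction: float = 0.5) -> bool:
    return now >= renewal_time(not_before, not_after, fraction)


# Example: a 24-hour certificate issued at midnight UTC.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)
print(renewal_time(issued, expires))
```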

Service mesh mTLS

Service meshes (Istio, Linkerd) implement mTLS transparently. A sidecar proxy (Envoy in Istio, linkerd2-proxy in Linkerd) is injected next to each service. The proxy performs the TLS handshake and handles certificate management on the service’s behalf, so the application code does not need to know about mTLS.

Istio mTLS modes:

PERMISSIVE: Accepts both mTLS and plain HTTP. Used during migration.
STRICT: Only accepts mTLS. Rejects plain HTTP.
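
In Istio, the mode is set with a PeerAuthentication resource. As a sketch, the following builds such a manifest as a Python dictionary and prints it as JSON, which kubectl apply accepts alongside YAML. The helper name and the payments namespace are hypothetical, and the apiVersion shown may differ across Istio releases.

```python
import json


def peer_authentication(namespace: str, mode: str) -> dict:
    """Build a namespace-wide PeerAuthentication manifest for the given mTLS mode."""
    if mode not in ("PERMISSIVE", "STRICT"):
        raise ValueError(f"unsupported mode: {mode}")
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        # A policy named "default" with no selector applies to the whole namespace.
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": mode}},
    }


print(json.dumps(peer_authentication("payments", "STRICT"), indent=2))
```

Generating the same manifest with PERMISSIVE first and switching to STRICT later is one way to keep the two migration stages in version control.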

graph LR
subgraph mesh["Service Mesh mTLS"]
  subgraph svcA["Service A pod"]
    APP_A["App A<br/>(plain HTTP)"]
    PROXY_A["Envoy sidecar<br/>(handles mTLS)"]
  end
  subgraph svcB["Service B pod"]
    PROXY_B["Envoy sidecar<br/>(handles mTLS)"]
    APP_B["App B<br/>(plain HTTP)"]
  end
  APP_A -->|"plain HTTP"| PROXY_A
  PROXY_A -->|"mTLS"| PROXY_B
  PROXY_B -->|"plain HTTP"| APP_B
end

style PROXY_A fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style PROXY_B fill:#EEEDFE,stroke:#534AB7,color:#3C3489
style APP_A fill:#E1F5EE,stroke:#0F6E56,color:#085041
style APP_B fill:#E1F5EE,stroke:#0F6E56,color:#085041

Where it breaks or gets interesting

Certificate revocation

If a service’s private key is compromised, you need to revoke its certificate. Certificate revocation is hard:

CRL (Certificate Revocation List): A list of revoked certificates. Clients must download and check it. Can be stale.
OCSP (Online Certificate Status Protocol): Real-time revocation check. Adds latency. OCSP stapling reduces this.
Short-lived certificates: If certificates expire in 24 hours, revocation is less critical. A compromised certificate is only valid for a short time.

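For this reason, short-lived certificates (hours, not years) are the modern approach for internal service identity: the exposure window is bounded by the certificate lifetime rather than by how quickly revocation data reaches every verifier.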

mTLS and load balancers

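A load balancer that terminates TLS ends the client’s mTLS session, so backends no longer see the original client certificate. To keep the hop to the backends authenticated, the load balancer must present its own client certificate to backend services, which requires it to be part of the PKI.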

Options: terminate mTLS at the load balancer and re-establish mTLS to backends (the load balancer acts as a client), or use passthrough mode (the load balancer forwards the TLS connection without terminating it).

Debugging mTLS

mTLS failures are harder to debug than regular TLS failures. Common issues:

Certificate expired
Certificate not trusted (wrong CA)
Certificate SAN does not match the expected identity
Clock skew (certificate not yet valid)

Use openssl s_client -connect host:port -cert client.crt -key client.key to test mTLS connections manually.
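
Some of these checks can also be scripted. The sketch below inspects a certificate dictionary in the shape returned by Python’s ssl.SSLSocket.getpeercert() and reports validity-window and SAN problems. It cannot detect an untrusted CA, because that failure happens during the handshake, before a certificate dictionary is available. The function name, the sample certificate, and the SPIFFE IDs are hypothetical.

```python
import ssl
import time
from typing import List, Optional


def diagnose_peer_cert(cert: dict, expected_san: str,
                       now: Optional[float] = None) -> List[str]:
    """Return the problems found in a getpeercert()-style dictionary."""
    now = time.time() if now is None else now
    problems = []
    if now < ssl.cert_time_to_seconds(cert["notBefore"]):
        problems.append("not yet valid (check for clock skew)")
    if now > ssl.cert_time_to_seconds(cert["notAfter"]):
        problems.append("expired")
    # subjectAltName is a tuple of (type, value) pairs, e.g. ("URI", "spiffe://...").
    sans = [value for _kind, value in cert.get("subjectAltName", ())]
    if expected_san not in sans:
        problems.append(f"SAN mismatch: expected {expected_san}, got {sans}")
    return problems


# Hand-written sample data standing in for a real peer certificate.
sample = {
    "notBefore": "Jan  1 00:00:00 2025 GMT",
    "notAfter": "Jan  2 00:00:00 2025 GMT",
    "subjectAltName": (("URI", "spiffe://example.org/orders"),),
}
check_time = ssl.cert_time_to_seconds("Jan  3 00:00:00 2025 GMT")
for problem in diagnose_peer_cert(sample, "spiffe://example.org/payments", now=check_time):
    print(problem)
```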

Performance overhead

mTLS adds overhead: certificate verification on every new connection. With connection reuse (HTTP/2, connection pooling), this overhead is amortized. For high-throughput services, use hardware acceleration (TLS offload cards) or ensure connection reuse is configured correctly.
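
The amortization is simple arithmetic. In the sketch below, the 2 ms handshake cost is a made-up figure used only to show the shape of the calculation. Measure your own services before drawing conclusions.

```python
def amortized_handshake_ms(handshake_ms: float, requests_per_connection: int) -> float:
    """Handshake cost spread evenly over every request sent on one connection."""
    if requests_per_connection < 1:
        raise ValueError("a connection carries at least one request")
    return handshake_ms / requests_per_connection


# Hypothetical 2 ms handshake: no reuse versus 1,000 requests per pooled connection.
print(amortized_handshake_ms(2.0, 1))
print(amortized_handshake_ms(2.0, 1000))
```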

Real-world systems

Istio - Service mesh that implements mTLS between services in a Kubernetes cluster. Its control plane (istiod) acts as the CA and issues workload certificates by default; SPIRE can be integrated as an alternative. Supports PERMISSIVE and STRICT modes.

Linkerd - Lightweight service mesh with automatic mTLS. Uses its own certificate management. Simpler than Istio.

Consul Connect - HashiCorp Consul’s service mesh. Uses mTLS for service-to-service communication. Integrates with Vault for certificate management.

Google BeyondCorp - Google’s zero-trust network model. Uses device certificates and user identity for access control. mTLS is a core component.

Cloudflare Access - Zero-trust access platform. Uses mTLS for device authentication.

How to apply it in practice

When to use mTLS

Use mTLS when:

You need zero-trust security (do not trust the network)
Services must authenticate each other (not just users)
You are running in a multi-tenant environment
Compliance requires strong service authentication

Use API keys or JWTs instead when:

mTLS operational complexity is too high
You are in a trusted network with strong perimeter security
Services are external (mTLS is for internal service-to-service)

Gradual adoption

Adopt mTLS gradually:

Start with PERMISSIVE mode (accept both mTLS and plain HTTP)
Deploy certificates to all services
Verify all services are using mTLS
Switch to STRICT mode (reject plain HTTP)

This prevents breaking existing services during the migration.

Certificate management automation

Never manage certificates manually at scale. Use:

SPIRE for SPIFFE-based certificate management
cert-manager for Kubernetes
Vault PKI for non-Kubernetes environments

Automate certificate rotation. Alert on certificates approaching expiry.

FAQ

Q: What is the difference between mTLS and API keys for service authentication?

API keys are shared secrets: the key is sent with every request, so anyone who obtains it (from logs, configuration, or the receiving service) can impersonate the legitimate caller. mTLS uses asymmetric cryptography: the private key never leaves the service, and the handshake only proves possession of it. Even if an attacker intercepts traffic, they cannot impersonate the service without the private key. mTLS also provides encryption in transit, while API keys on their own do not (they rely on a separate TLS layer). mTLS is more secure but more complex to operate. API keys are simpler but less secure.

Q: Does mTLS replace authorization?

No. mTLS proves identity (authentication). It does not determine what a service is allowed to do (authorization). After mTLS authentication, you still need authorization: is Service A allowed to call Service B’s /admin endpoint? Use a policy engine (OPA, Istio AuthorizationPolicy) for authorization after mTLS authentication.
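
The split between the two steps can be sketched as a default-deny lookup keyed by the caller identity that mTLS has already verified. This is an illustrative stand-in for a real policy engine, not how OPA or Istio evaluates policy. The policy contents, SPIFFE IDs, and function name are hypothetical.

```python
from typing import Dict, Set

# Maps a verified caller identity to the path prefixes it may call.
Policy = Dict[str, Set[str]]


def is_allowed(policy: Policy, caller_id: str, path: str) -> bool:
    """Default deny: a caller may only reach paths under its listed prefixes."""
    for prefix in policy.get(caller_id, set()):
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return True
    return False


policy: Policy = {
    "spiffe://example.org/service-a": {"/orders"},
    "spiffe://example.org/ops-console": {"/orders", "/admin"},
}

# Service A is authenticated, but it is still not allowed to call /admin.
print(is_allowed(policy, "spiffe://example.org/service-a", "/admin"))
```

The caller_id must come from the verified peer certificate, never from a request header the caller controls.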

Q: How do you handle mTLS for external clients (browsers, mobile apps)?

mTLS for external clients requires distributing client certificates to users’ devices. This is complex and rarely done for consumer applications. For enterprise applications (zero-trust access), device certificates are managed by MDM (Mobile Device Management) systems. For most consumer applications, use regular TLS for external clients and mTLS only for internal service-to-service communication.

Interview questions

Q1: You are migrating a microservices architecture to zero-trust security. How do you implement mTLS without downtime?

Strong answer: Use a service mesh (Istio or Linkerd) with a gradual migration. Phase 1: deploy the service mesh in PERMISSIVE mode. All services accept both mTLS and plain HTTP. No traffic is disrupted. Phase 2: deploy certificates to all services (using SPIRE or cert-manager). Verify that services are using mTLS by checking the service mesh telemetry. Phase 3: switch to STRICT mode for services that have been verified. Do this service by service, not all at once. Phase 4: once all services are in STRICT mode, plain HTTP traffic is rejected. Monitor for any services that were missed. The key is the PERMISSIVE mode: it allows gradual migration without breaking existing traffic.

Q2: A service’s private key is compromised. How do you respond?

Strong answer: Immediate response: revoke the certificate (publish it in the CRL or mark it revoked at the OCSP responder). If using short-lived certificates (24 hours), the certificate expires soon anyway. Issue a new certificate with a new key pair. Rotate the service’s identity (new SPIFFE ID if needed). Investigate how the key was compromised: was it stored insecurely? Was the service itself compromised? If the service was compromised, assume all data it had access to is compromised. Audit logs to see what the compromised service accessed. Notify affected services and users. Long-term: use short-lived certificates (hours, not days) to minimize the impact of future compromises. Store private keys in hardware security modules (HSMs), or generate keys inside the workload and have the CA (for example Vault’s PKI engine) sign a CSR, so the raw key never leaves the service.

Q3: How does a service mesh implement mTLS transparently without changing application code?

Strong answer: The service mesh injects a sidecar proxy (Envoy in Istio) into each pod. The sidecar intercepts all inbound and outbound network traffic using iptables rules. For outbound traffic: the application makes a plain HTTP request to another service. The sidecar intercepts it, establishes an mTLS connection to the destination service’s sidecar, and forwards the request. For inbound traffic: the sidecar receives the mTLS connection from the calling service’s sidecar, verifies the client certificate, terminates TLS, and forwards the plain HTTP request to the application. The application never sees TLS - it only sees plain HTTP. The sidecar handles certificate management (fetching certificates from the mesh’s certificate authority, such as istiod or SPIRE), certificate rotation, and mTLS negotiation. This is transparent to the application code.