Build a Secrets Management Vault
System Design Deep Dive
When every service needs a password - and stealing the database shouldn’t give an attacker all of them
Think of a bank vault with safety deposit boxes. The bank holds a master key that opens the vault door. Each customer holds a key that opens only their box. A bank employee cannot read your box without your key even if they have unrestricted physical access to the vault. A thief who steals the vault itself cannot open any box without both the master key and each individual customer key. This layered protection is the core intuition behind secrets management at scale.
At the scale of a real production system, hundreds of microservices each need dozens of credentials: database passwords, API keys for third-party providers, signing keys for JWTs, TLS private keys, OAuth client secrets. Credentials are accessed thousands of times per second. Some are long-lived static secrets that need rotating on a schedule. Others are ephemeral dynamic secrets generated on demand and valid for one hour. A single leaked credential in a shared environment has a blast radius proportional to how many services use it.
Three tensions dominate the design. Security demands encrypt-everything and audit-everything, producing an immutable record of every access. Performance demands sub-5ms read latency on the hot path because credentials are checked at service startup and on every authenticated API call. Availability is critical because if the vault is down, every service that reads credentials at startup fails to start, which turns a vault outage into a full platform outage.
We need to solve for a key hierarchy that limits blast radius so that compromising one layer does not expose all secrets, a policy engine that enforces least-privilege without becoming a bottleneck at 10,000 requests per second, and lease-based rotation that updates consumers before revoking the old credential with no service restart required.
Requirements and Constraints
Functional Requirements:
- CRUD secrets at arbitrary path hierarchies (secret/prod/db/postgres, secret/staging/api/stripe)
- Dynamic secrets: generate database credentials on demand with a TTL; no pre-provisioned shared passwords
- Static secret auto-rotation: trigger rotation before TTL expiry, notify consumers via watch endpoint
- Full audit trail: every read, write, delete, and list operation is logged with token identity and timestamp
- RBAC: tokens bound to policies; policies express path patterns and capability ACLs
- Seal/unseal: vault starts sealed with the master key encrypted; K-of-N key shards required to unseal
Non-Functional Requirements:
- p99 read latency under 5ms
- 10,000 req/s sustained throughput
- 99.99% availability (vault down means services cannot start)
- AES-256-GCM encryption at rest
- TLS 1.3 in transit, mTLS between internal components
- Audit logs retained 1 year minimum
- RPO under 1 second, RTO under 30 seconds
Constraints:
- API-first; not a secrets browser for human operators
- Secret payload capped at 512 KB
- No cross-region synchronous writes (async DR replication only)
High-Level Architecture
Seven components make up the system. The API Gateway handles TLS termination, rate limiting, and Bearer token extraction. The Auth and Policy Engine validates token identity and evaluates RBAC policies per request. The Vault Core orchestrates business logic: envelope encryption, decryption, and secret versioning. The Lease Manager tracks TTLs, triggers rotation for static secrets, and drops dynamic credentials when their lease expires. The Storage Backend persists encrypted blobs in a Raft-replicated store - it never sees plaintext. The KMS/HSM holds the root key material and performs key wrapping and unwrapping operations. The Audit Pipeline receives a write-ahead event before every response is returned, writing to an immutable append-only log.
Data flows like this for a read request: the client sends a Bearer token and a secret path to the API Gateway. The Auth Engine validates the token identity, evaluates the token’s bound policies against the requested path and operation, and passes an authenticated request to Vault Core. Vault Core fetches the encrypted blob and its associated key identifier from Storage. It sends the encrypted DEK to KMS for unwrapping. KMS returns the plaintext DEK without ever seeing the secret payload. Vault Core decrypts the blob in memory using the DEK and returns the plaintext to the client. Before returning, it writes an audit event to the Audit Pipeline.
The Lease Manager runs a background sweep every second. For static secrets, it checks whether any secret’s TTL is within the rotation window and triggers rotation if so. For dynamic secrets like ephemeral database users, it calls into the backend provider (Postgres, MySQL, AWS IAM) to drop the credential when the lease expires. Every successful access also records a lease so that explicitly revoked tokens immediately cut off access even if the in-process cache would otherwise have served the old value.
The vault never persists plaintext anywhere. The storage backend only sees encrypted blobs and wrapped keys - compromising storage gives an attacker nothing without also compromising KMS. Compromising KMS alone gives an attacker the ability to unwrap keys but no ciphertext to use them on. You must compromise both to extract any secret.
The Encryption Layer
The encryption layer transforms a plaintext secret into a ciphertext blob that can be stored safely anywhere, with the property that each secret is encrypted under its own key and that key is itself encrypted under a hierarchy of protection.
Envelope encryption is best understood as a box inside a box inside a box. The innermost box is the Data Encryption Key (DEK): a randomly generated 256-bit AES key used exactly once per secret. The DEK encrypts the secret blob. The DEK itself is then encrypted by a Key Encryption Key (KEK): one KEK exists per namespace or mount. The KEK is stored in the database encrypted by the Master Key. The Master Key is held only in the HSM hardware module and never leaves it in plaintext.
# AES-256-GCM envelope encryption for secret storage
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def generate_dek() -> bytes:
    return os.urandom(32)  # 256-bit DEK


def encrypt_secret(plaintext: bytes, dek: bytes) -> bytes:
    aesgcm = AESGCM(dek)
    nonce = os.urandom(12)  # 96-bit nonce for GCM
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)
    return nonce + ciphertext  # prepend nonce for storage


def decrypt_secret(blob: bytes, dek: bytes) -> bytes:
    aesgcm = AESGCM(dek)
    nonce, ciphertext = blob[:12], blob[12:]
    return aesgcm.decrypt(nonce, ciphertext, None)


def encrypt_dek_with_kek(dek: bytes, kek: bytes) -> bytes:
    aesgcm = AESGCM(kek)
    nonce = os.urandom(12)
    encrypted_dek = aesgcm.encrypt(nonce, dek, None)
    return nonce + encrypted_dek


def decrypt_dek_with_kek(encrypted_dek_blob: bytes, kek: bytes) -> bytes:
    aesgcm = AESGCM(kek)
    nonce, encrypted_dek = encrypted_dek_blob[:12], encrypted_dek_blob[12:]
    return aesgcm.decrypt(nonce, encrypted_dek, None)
Without envelope encryption, every secret is encrypted with one master key. A breach of that key means every secret in the vault is immediately decryptable. With envelope encryption, breaching the master key still requires fetching and unwrapping every individual KEK and DEK before any secret is readable. That is a time-bounded attack window, not instant compromise.
Key rotation at the DEK level re-encrypts one secret. At the KEK level, it re-encrypts all DEKs under that KEK - an expensive but scoped operation. Rotating the master key requires re-encrypting every KEK in the vault. Plan your key hierarchy depth accordingly.
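To make the rotation scope concrete, here is a minimal sketch that counts how many ciphertexts must be rewritten at each level. The namespace names and secret counts are invented for illustration, and `rewrap_count` is a hypothetical helper, not part of any real vault API.

```python
from typing import Optional

# Hypothetical inventory: secrets (one DEK each) per namespace (one KEK each).
SECRETS_PER_NAMESPACE = {"prod": 600_000, "staging": 300_000, "dev": 100_000}


def rewrap_count(level: str, namespace: Optional[str] = None) -> int:
    """Return how many ciphertexts must be rewritten when rotating at `level`."""
    if level == "dek":
        # A new DEK means re-encrypting exactly one secret payload.
        return 1
    if level == "kek":
        # A new KEK means re-wrapping every DEK in that namespace;
        # the secret payloads themselves are untouched.
        if namespace is None:
            raise ValueError("KEK rotation needs a namespace")
        return SECRETS_PER_NAMESPACE[namespace]
    if level == "master":
        # A new master key means re-wrapping every KEK, one per namespace.
        return len(SECRETS_PER_NAMESPACE)
    raise ValueError(f"unknown level: {level}")


print(rewrap_count("dek"))           # 1
print(rewrap_count("kek", "prod"))   # 600000
print(rewrap_count("master"))        # 3
```

Under these assumptions the KEK level is the expensive one: a master key rotation touches only as many records as there are namespaces.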
The Auth and Policy Engine
The policy engine decides whether a given token is permitted to perform a given operation on a given path, evaluated on every request before Vault Core is invoked.
Three authentication methods cover the main scenarios. AppRole uses a role ID baked into the service image plus a short-lived secret ID injected at deploy time - the combination proves both what the service is and that a human authorized its deployment. JWT/OIDC lets services authenticate using tokens from an existing identity provider, which is the dominant pattern in Kubernetes where pods already have a service account token. AWS IAM lets EC2 instances and Lambda functions prove their identity with the IAM role already attached to them, without any pre-provisioned secret.
# db-readonly policy: grants read-only access to production DB secrets
path "secret/data/prod/db/*" {
  capabilities = ["read"]
}

path "secret/metadata/prod/db/*" {
  capabilities = ["list", "read"]
}

path "secret/data/prod/db/admin" {
  capabilities = []
}
The last rule explicitly denies access to the admin credential even though the wildcard above would otherwise permit it. Rules are evaluated in specificity order: the most specific matching path wins. An empty capabilities list is an explicit deny.
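# Policy path evaluation - O(num_policies * num_rules) per request
from fnmatch import fnmatchcase  # case-sensitive on every platform

CAPABILITY_MAP = {
    "read": {"GET"},
    "write": {"POST", "PUT"},
    "delete": {"DELETE"},
    "list": {"LIST"},
    "create": {"POST"},
    "update": {"PUT", "PATCH"},
}


def evaluate_policy(token_policies: list[dict], path: str, operation: str) -> bool:
    """Returns True only if the most specific matching rule grants the operation."""
    matches = [
        rule
        for policy in token_policies
        for rule in policy["rules"]
        if fnmatchcase(path, rule["path"])
    ]
    if not matches:
        return False  # deny by default
    # Most specific rule wins. Specificity is approximated by the length of the
    # literal prefix before the first wildcard; ties go to the first rule found.
    best = max(matches, key=lambda rule: len(rule["path"].split("*", 1)[0]))
    allowed_ops = set()
    for cap in best["capabilities"]:  # an empty list is an explicit deny
        allowed_ops |= CAPABILITY_MAP.get(cap, set())
    return operation in allowed_ops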
Token metadata and bound policies are cached in-process to avoid a database round-trip on every request. The cache TTL is typically 5 seconds. If you revoke a token or change its policies, those changes take up to 5 seconds to propagate. In high-security environments, set the policy cache TTL to zero and accept the additional latency.
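The caching tradeoff can be sketched as a small TTL cache. This is an illustrative helper under stated assumptions, not the vault's actual cache: `PolicyCache` and its method names are hypothetical, and the clock is injectable only so the expiry behavior is easy to test.

```python
import time
from typing import Callable, Optional


class PolicyCache:
    """In-process cache of token ID -> policy names with a fixed TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token_id -> (stored_at, policies)

    def get(self, token_id: str) -> Optional[list[str]]:
        entry = self._entries.get(token_id)
        if entry is None:
            return None
        stored_at, policies = entry
        if self._clock() - stored_at >= self._ttl:
            # Stale: drop it so the caller reloads from storage.
            del self._entries[token_id]
            return None
        return policies

    def put(self, token_id: str, policies: list[str]) -> None:
        self._entries[token_id] = (self._clock(), policies)

    def invalidate(self, token_id: str) -> None:
        # Explicit revocation on this node does not have to wait out the TTL.
        self._entries.pop(token_id, None)
```

With `ttl_seconds=0` every `get` is a miss, which corresponds to the zero-TTL setting for high-security environments. Note that `invalidate` only helps on the node that processes the revocation; other nodes still rely on the TTL.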
The Lease Manager
The lease manager assigns a TTL to every secret access and tracks all outstanding leases so that expiry, renewal, and revocation are handled consistently regardless of whether a secret is static or dynamically generated.
Think of it like a hotel key card. The card is programmed to expire at checkout time. The hotel does not wait for you to hand it back. It simply stops working at the configured time. You can extend your stay (renew the lease) and the card’s expiry moves forward. If you check out early, the card can be invalidated immediately (explicit revocation). The card itself does not contain a record of your checkout time - the hotel’s system does.
Each lease record stores: the secret path it covers, the token that obtained it, when it was issued, when it expires, the maximum TTL it can be extended to, whether it is renewable, whether it is a dynamic secret, and the backend reference string used to drop the credential on revocation.
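As a rough illustration of that record, the fields can be modeled as a dataclass with a renewal method that never extends past the maximum TTL. The class below is a sketch with assumed names; it simply mirrors the fields listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Lease:
    secret_path: str
    token_id: str
    issued_at: datetime
    expires_at: datetime
    max_ttl: timedelta
    renewable: bool = True
    is_dynamic: bool = False
    backend_ref: Optional[str] = None  # e.g. the database username to drop

    def renew(self, increment: timedelta, now: datetime) -> bool:
        """Extend the lease, never past issued_at + max_ttl. Returns False if refused."""
        if not self.renewable or now >= self.expires_at:
            return False
        hard_limit = self.issued_at + self.max_ttl
        self.expires_at = min(self.expires_at + increment, hard_limit)
        return True


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
lease = Lease("secret/prod/db/postgres", "token-1", t0,
              t0 + timedelta(hours=1), max_ttl=timedelta(hours=2))
lease.renew(timedelta(hours=3), now=t0 + timedelta(minutes=30))
print(lease.expires_at - t0)  # 2:00:00, capped by max_ttl
```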
// Lease expiry sweep - runs every second to revoke expired leases
package lease

import (
	"context"
	"database/sql"
	"log"
	"time"
)

type Lease struct {
	ID         string
	SecretPath string
	TokenID    string
	ExpiresAt  time.Time
	Renewable  bool
	IsDynamic  bool
}

type Manager struct {
	db      *sql.DB
	revoker SecretRevoker
}

type SecretRevoker interface {
	RevokeSecret(ctx context.Context, path string, leaseID string) error
}

func (m *Manager) SweepExpired(ctx context.Context) error {
	rows, err := m.db.QueryContext(ctx, `
		SELECT id, secret_path, token_id, is_dynamic
		FROM leases
		WHERE expires_at < NOW() AND revoked_at IS NULL
		LIMIT 1000
	`)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var l Lease
		if err := rows.Scan(&l.ID, &l.SecretPath, &l.TokenID, &l.IsDynamic); err != nil {
			log.Printf("lease scan failed: %v", err)
			continue
		}
		if l.IsDynamic {
			// Revoke the credential from the backend provider (e.g., DROP USER in Postgres)
			if err := m.revoker.RevokeSecret(ctx, l.SecretPath, l.ID); err != nil {
				log.Printf("revocation failed for lease %s: %v", l.ID, err)
				continue // lease stays unrevoked, so the next sweep retries it
			}
		}
		if _, err := m.db.ExecContext(ctx, `UPDATE leases SET revoked_at = NOW() WHERE id = $1`, l.ID); err != nil {
			log.Printf("marking lease %s revoked failed: %v", l.ID, err)
		}
	}
	return rows.Err()
}

func (m *Manager) RenewLease(ctx context.Context, leaseID string, increment time.Duration) error {
	// Pass the increment as seconds: with most drivers a time.Duration is sent as
	// an integer nanosecond count, which Postgres cannot add to a timestamp.
	_, err := m.db.ExecContext(ctx, `
		UPDATE leases
		SET expires_at = LEAST(expires_at + make_interval(secs => $1), issued_at + max_ttl)
		WHERE id = $2 AND renewable = true AND revoked_at IS NULL
	`, increment.Seconds(), leaseID)
	return err
}
For static secrets, expiry just marks the lease as expired and triggers rotation of the underlying credential. The secret path continues to serve the new version. For dynamic secrets, expiry calls into the backend to drop the provisioned credential entirely. The database user is dropped, the IAM role binding is removed, or the API key is invalidated.
HashiCorp Vault uses a similar lease model. When a token is revoked, the leases created with it are revoked as well, and any dynamic database credentials tied to those leases are dropped from the database. In this design the cascade completes within a single lease sweep cycle, typically under one second.
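Cascade revocation can be pictured as an index from token to the leases it created. The sketch below is an in-memory stand-in for the leases table and its token index; `LeaseIndex` is a hypothetical name and the backend call is only indicated by a comment.

```python
from collections import defaultdict


class LeaseIndex:
    """Tracks which leases were created by which token."""

    def __init__(self) -> None:
        self._by_token = defaultdict(set)  # token_id -> set of lease IDs
        self.revoked = []                  # lease IDs revoked so far, in order

    def add(self, token_id: str, lease_id: str) -> None:
        self._by_token[token_id].add(lease_id)

    def revoke_token(self, token_id: str) -> list[str]:
        """Revoke every lease owned by the token and return their IDs."""
        lease_ids = sorted(self._by_token.pop(token_id, set()))
        # A real implementation would call the backend here (for example,
        # dropping the database user) before marking each lease revoked.
        self.revoked.extend(lease_ids)
        return lease_ids
```

Revoking the same token twice returns an empty list the second time, so the operation is safe to retry.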
The Audit Pipeline
The audit pipeline guarantees a complete, tamper-evident record of every operation performed against the vault, written before the response is returned to the caller.
Write-ahead is the key invariant. The audit event is appended to a write-ahead log before Vault Core returns the response. If the audit write fails and the vault is configured in strict mode, the request fails. This means there is no window where a secret was accessed but the access went unlogged. An attacker who reads a secret leaves a record. An operator who grants themselves access to a restricted path leaves a record.
{
"timestamp": "2026-06-03T10:42:33.847291Z",
"request_id": "f4a92c1e-7b3d-4f08-a92b-1c3d5e7f9012",
"token_display_name": "webapp-production-001",
"auth_method": "approle",
"operation": "read",
"path": "secret/data/prod/db/postgres",
"status": "success",
"client_ip": "10.42.3.17",
"response_time_ms": 3,
"metadata": {
"version": "5",
"lease_id": "a91b2c3d-..."
}
}
Never log the secret value in the audit trail. Log that the secret at path X was accessed by token Y at time Z with outcome success or denied. If you log plaintext for debugging convenience, you have created an unencrypted copy of every secret in a place that almost certainly has weaker access controls than the vault itself.
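The write-ahead invariant and the no-plaintext rule can be sketched together. The function below is a simplified illustration: `read_with_audit`, `fetch_secret`, and `append_audit` are assumed names, and a real audit sink would need durable, ordered appends rather than a plain callable.

```python
import json
from typing import Callable


class AuditUnavailable(Exception):
    """Raised when the audit sink rejects an event in strict mode."""


def read_with_audit(
    path: str,
    token_name: str,
    fetch_secret: Callable[[str], bytes],
    append_audit: Callable[[str], None],
    strict: bool = True,
) -> bytes:
    """Fetch a secret, returning it only after the audit event has been appended."""
    secret = fetch_secret(path)
    # The event records who touched which path, never the secret value itself.
    event = json.dumps({
        "operation": "read",
        "path": path,
        "token_display_name": token_name,
        "status": "success",
    })
    try:
        append_audit(event)
    except OSError as exc:
        if strict:
            # Strict mode: no audit record means no response.
            raise AuditUnavailable(path) from exc
    return secret
```

In non-strict mode the secret is still returned when the append fails, which is the degraded behavior a deployment has to opt into explicitly.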
Data Model
-- Key encryption keys per namespace/mount
-- (created first because secret_versions references it)
CREATE TABLE key_encryption_keys (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    namespace       TEXT NOT NULL,
    version         INTEGER NOT NULL,
    encrypted_kek   BYTEA NOT NULL,
    master_key_ver  INTEGER NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    retired_at      TIMESTAMPTZ,
    UNIQUE (namespace, version)
);

-- Core secrets table with versioning support
CREATE TABLE secret_versions (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    path            TEXT NOT NULL,
    version         INTEGER NOT NULL,
    encrypted_blob  BYTEA NOT NULL,
    kek_id          UUID NOT NULL REFERENCES key_encryption_keys(id),
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    deleted_at      TIMESTAMPTZ,
    metadata        JSONB DEFAULT '{}',
    UNIQUE (path, version)
);
CREATE INDEX idx_secret_versions_path ON secret_versions(path, version DESC);
CREATE INDEX idx_secret_versions_deleted ON secret_versions(deleted_at) WHERE deleted_at IS NOT NULL;

-- Active leases tracking TTL and dynamic credential lifecycle
CREATE TABLE leases (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    secret_path  TEXT NOT NULL,
    token_id     UUID NOT NULL,
    issued_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at   TIMESTAMPTZ NOT NULL,
    max_ttl      INTERVAL NOT NULL,
    renewable    BOOLEAN NOT NULL DEFAULT TRUE,
    is_dynamic   BOOLEAN NOT NULL DEFAULT FALSE,
    revoked_at   TIMESTAMPTZ,
    backend_ref  TEXT
);
CREATE INDEX idx_leases_expires ON leases(expires_at) WHERE revoked_at IS NULL;
CREATE INDEX idx_leases_token ON leases(token_id) WHERE revoked_at IS NULL;

-- Tokens: stored as HMAC(token_value, hmac_key) for O(1) lookup
CREATE TABLE tokens (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    token_hmac    TEXT NOT NULL UNIQUE,
    display_name  TEXT NOT NULL,
    auth_method   TEXT NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at    TIMESTAMPTZ,
    revoked_at    TIMESTAMPTZ,
    metadata      JSONB DEFAULT '{}'
);

-- Policies and RBAC binding
CREATE TABLE policies (
    id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name   TEXT NOT NULL UNIQUE,
    rules  JSONB NOT NULL
);
CREATE TABLE token_policies (
    token_id   UUID NOT NULL REFERENCES tokens(id),
    policy_id  UUID NOT NULL REFERENCES policies(id),
    PRIMARY KEY (token_id, policy_id)
);

-- Audit log: append-only, time-partitioned
CREATE TABLE audit_events (
    id             UUID DEFAULT gen_random_uuid(),
    ts             TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    request_id     UUID NOT NULL,
    token_display  TEXT NOT NULL,
    auth_method    TEXT,
    operation      TEXT NOT NULL,
    path           TEXT NOT NULL,
    status         TEXT NOT NULL,
    client_ip      INET,
    response_ms    INTEGER,
    metadata       JSONB DEFAULT '{}'
) PARTITION BY RANGE (ts);
-- Monthly partitions, e.g.:
-- CREATE TABLE audit_events_2026_06 PARTITION OF audit_events
--     FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
CREATE INDEX idx_audit_ts ON audit_events(ts DESC);
CREATE INDEX idx_audit_token ON audit_events(token_display, ts DESC);
CREATE INDEX idx_audit_path ON audit_events(path, ts DESC);
The audit log uses monthly range partitions. Dropping a partition is a metadata operation - no row-by-row delete, no table bloat, instant cleanup at the end of a compliance window. Secrets are sharded by path prefix (namespace), so all secrets under secret/prod/ colocate in storage and benefit from the same page cache. The leases index on (expires_at) WHERE revoked_at IS NULL keeps the sweep query fast even with millions of historical revoked leases in the table.
Key Algorithms and Protocols
Shamir Secret Sharing for Seal/Unseal
The seal/unseal mechanism is the vault’s protection against cold-boot attacks and rogue insiders. At initialization, the master key is split into N shards using Shamir’s Secret Sharing. Any K of the N shards reconstruct the master key. On every startup, K operators each provide their shard. The vault is sealed and inert until K humans cooperate. No single person can unseal alone.
# Simplified Shamir Secret Sharing - split and reconstruct a 32-byte key
# Production uses a library like secretsharing or a hardened HSM implementation
import secrets
from functools import reduce

# Field modulus: a Mersenne prime larger than 2**256, so any 32-byte key fits.
# (A 127-bit modulus would silently truncate a 256-bit secret.)
PRIME = 2**521 - 1


def _eval_polynomial(coefficients: list[int], x: int) -> int:
    """Evaluate polynomial at x over the prime field."""
    result = 0
    for coeff in reversed(coefficients):
        result = (result * x + coeff) % PRIME
    return result


def split_secret(secret_int: int, n: int, k: int) -> list[tuple[int, int]]:
    """Split secret into n shares where any k shares reconstruct it."""
    if not 0 <= secret_int < PRIME:
        raise ValueError("secret must be smaller than the field modulus")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    coefficients = [secret_int] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    return [(x, _eval_polynomial(coefficients, x)) for x in range(1, n + 1)]


def reconstruct_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 to reconstruct secret from k shares."""
    def lagrange_basis(i: int, xs: list[int]) -> int:
        num = reduce(lambda a, b: a * b % PRIME, ((-x) % PRIME for x in xs if x != i), 1)
        den = reduce(lambda a, b: a * b % PRIME, ((i - x) % PRIME for x in xs if x != i), 1)
        return num * pow(den, PRIME - 2, PRIME) % PRIME

    xs = [s[0] for s in shares]
    return sum(y * lagrange_basis(x, xs) for x, y in shares) % PRIME
With N=5 and K=3, you can lose up to 2 shards and still unseal, and an attacker who obtains up to 2 shards still cannot reconstruct the key. A full disk image of the storage backend is inert without K shard holders cooperating. This is the foundation of the vault’s insider threat model.
Token HMAC Lookup
Tokens are stored as HMAC-SHA256(token_plaintext, hmac_key). The plaintext token is never persisted. On every request, the vault computes the HMAC of the presented token and looks up that hash. This gives O(1) lookup with no plaintext in the database - an attacker who dumps the tokens table gets only irreversible hashes.
# Token storage and lookup via HMAC - O(1) lookup, no plaintext in DB
import hashlib
import hmac
import secrets as _secrets


def generate_token() -> str:
    return "s." + _secrets.token_urlsafe(32)  # "s." prefix marks vault tokens


def token_to_lookup_key(token: str, hmac_key: bytes) -> str:
    return hmac.new(hmac_key, token.encode(), hashlib.sha256).hexdigest()


# On storage: store only token_to_lookup_key(token, hmac_key)
# On lookup: compute key, query DB - no plaintext stored
Scaling and Performance
The read path is where performance is won or lost. Vault Core maintains an in-process LRU cache keyed by (path, version) with a 5-second TTL. At 10,000 req/s with an 80% cache hit rate, only 2,000 req/s actually reach storage and KMS. The remaining 8,000 req/s are served from the in-process cache in under 1ms.
The write path uses Raft consensus across a 3 or 5 node cluster. All writes go to the Raft leader. Followers serve reads. With 3 nodes and quorum writes (2 of 3 must acknowledge), you can lose 1 node without blocking either reads or writes. With 5 nodes you can lose 2.
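The fault-tolerance numbers follow from majority quorum arithmetic, sketched here with two small helper functions (the names are illustrative only).

```python
def quorum_size(nodes: int) -> int:
    """Smallest majority of a cluster of `nodes` voters."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Voters that can fail while a majority still exists."""
    return nodes - quorum_size(nodes)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are the usual choice.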
Given:
- 10,000 read req/s, 500 write req/s
- Average secret size: 1 KB
- Audit events: 10,500 per second
- Audit event size: 500 bytes
- Retention: 1 year audit, indefinite secrets
Compute (reads):
- 80% cache hit rate: 8,000 req/s served from in-process cache
- 2,000 req/s hit storage + KMS
- At 2ms KMS + 1ms storage = 3ms per cache-miss read
- Need: 2,000 req/s * 3ms = 6 KMS requests in flight on average (asyncio handles this fine)
Storage:
- Secrets: 1M secrets * 1KB * 3 replicas = 3 GB (trivial)
- Secret versions (10 per secret): 10M * 1KB = 10 GB
- Audit log: 10,500 events/s * 500B * 86,400s/day * 365 days = 166 TB/year
- With columnar compression (Parquet/ClickHouse): ~17 TB/year (10x compression)
Bandwidth:
- Inbound reads: 2,000 cache-miss req/s * 1KB = 2 MB/s (negligible)
- Audit stream: 5.25 MB/s to Kafka topic
The audit log is the dominant storage cost by orders of magnitude. Columnar compression with Parquet or ClickHouse brings 166 TB/year to around 17 TB/year. Monthly partitions mean you drop old data with a single DDL statement rather than a weeks-long DELETE operation.
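The estimate above can be reproduced with a few lines of arithmetic. The constants are the ones stated in the capacity figures; the 10x compression ratio is the article's assumption, not a measured value.

```python
READS_PER_S = 10_000
WRITES_PER_S = 500
CACHE_HIT_RATE = 0.80
MISS_LATENCY_S = 0.003       # 2 ms KMS + 1 ms storage
AUDIT_EVENT_BYTES = 500
COMPRESSION_RATIO = 10       # assumed columnar compression

cache_miss_per_s = READS_PER_S * (1 - CACHE_HIT_RATE)
# Little's law: requests in flight = arrival rate * time in system.
in_flight_kms_calls = cache_miss_per_s * MISS_LATENCY_S

audit_events_per_s = READS_PER_S + WRITES_PER_S
audit_bytes_per_s = audit_events_per_s * AUDIT_EVENT_BYTES
audit_tb_per_year = audit_bytes_per_s * 86_400 * 365 / 1e12

print(f"cache misses/s:       {cache_miss_per_s:.0f}")
print(f"KMS calls in flight:  {in_flight_kms_calls:.0f}")
print(f"audit stream MB/s:    {audit_bytes_per_s / 1e6:.2f}")
print(f"audit TB/year raw:    {audit_tb_per_year:.0f}")
print(f"audit TB/year packed: {audit_tb_per_year / COMPRESSION_RATIO:.0f}")
```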
HashiCorp Vault’s integrated storage uses the same Raft model. Cloudflare’s internal secrets system runs per-datacenter Vault clusters with async cross-datacenter replication, isolating a region outage from cascading to global credential access. The read cache is sized to cover the hot secret set for each datacenter independently.
Failure Modes and Recovery
| Failure | Detection | Impact | Recovery |
|---|---|---|---|
| Vault leader crash | Raft heartbeat timeout (default 10s) | Writes blocked during election; reads from followers unaffected | Raft elects new leader in under 30s; clients retry with exponential backoff |
| Vault sealed (OS restart) | Health endpoint returns 503 | All operations return error until unsealed | Unseal ceremony: K-of-N operators provide key shards |
| KMS/HSM unreachable | Decrypt call timeout | Cache-miss reads fail; cache-hit reads still serve | Use KMS multi-region replicas; fall back to secondary KMS after 500ms |
| Storage backend disk full | Write returns IO error | Writes blocked; reads still work from replicas | Add storage, compact old secret versions, purge expired leases |
| Audit pipeline down | WAL write timeout | Configurable: block requests (strict mode) or serve from WAL buffer (degraded) | Repair audit backend; WAL drains automatically on recovery |
| Clock skew between nodes | Raft log divergence | Leader election instability if skew exceeds election timeout | NTP enforced; election timeout set to 2x max expected skew |
A common operational mistake is unsealing the vault without rotating the unseal keys afterward. If an unseal shard is exposed during the unseal ceremony (screen recorded, shoulder-surfed, shared over Slack), rotating all shards immediately is the only safe response. Treat each shard like a root password.
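The leader-crash row in the table recommends client retries with exponential backoff. A minimal client-side sketch follows, assuming the client surfaces failures as `ConnectionError`; the function name, defaults, and the choice of full jitter are illustrative rather than taken from any particular vault client.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on ConnectionError with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries so clients do not stampede the new leader.
            sleep(random.uniform(0, ceiling))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be tested without real delays.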
Comparison of Approaches
| Approach | Read Latency | Audit Support | Dynamic Secrets | Failure Mode | Best Fit |
|---|---|---|---|---|---|
| Environment variables | 0ms | None | No | No rotation, no revocation | Solo scripts, local dev |
| Cloud SSM Parameter Store | 10-30ms | CloudTrail only | No | Provider outage | AWS-native workloads |
| Cloud Secrets Manager (AWS/GCP) | 5-20ms | Provider audit log | Limited | Vendor lock-in | Simple static secrets |
| HashiCorp Vault (open source) | 2-5ms | Full audit | Yes (all backends) | Sealed on restart | Full-featured, self-hosted |
| Custom KMS-backed store | 1-3ms | Custom | Manual | No lease management | Specialized requirements |
Choose HashiCorp Vault open-source when you need dynamic secrets, a full audit trail for compliance (SOC 2, PCI-DSS, HIPAA), and RBAC across many services. Choose cloud-native secret managers (AWS Secrets Manager, GCP Secret Manager) when you are already deep in one cloud provider’s ecosystem, want managed operations, and can accept vendor lock-in. The tradeoff is audit log portability and dynamic secret support - cloud managers bolt these on as features, whereas Vault builds from those primitives. Environment variables are acceptable only in local development and single-process scripts where rotation is not a concern.
Key Takeaways
- Envelope encryption means compromising storage gives attackers ciphertext only - you need both storage and KMS to reconstruct any secret.
- Key hierarchy depth determines rotation blast radius - rotating a DEK affects one secret; rotating a KEK affects all secrets under it.
- Dynamic secrets are safer than static ones because they expire automatically - no one forgets to rotate a credential that has a hard deadline.
- Lease-based revocation is the right primitive - every access creates a TTL, and access ends when the TTL expires, not when someone remembers to revoke it.
- Audit logging is mandatory and must be synchronous or write-ahead - an access that isn’t logged might as well not be controlled.
- RBAC with path-based ACLs enables least-privilege without per-secret configuration - one policy covers all secrets under a namespace pattern.
- Seal/unseal prevents cold-boot attacks - a vault image cloned from disk is useless until K humans cooperate to unseal it.
- The read cache is both a performance win and a latency SLO tool - size it to cover your hot secret set, keep TTL short enough that revocations take effect quickly.
The counter-intuitive lesson is that the vault’s availability guarantee runs against its security guarantee. A vault that is always available is one that either never seals (weakening cold-boot protection) or caches aggressively (meaning revocations take seconds to propagate). The right answer is to make the tension explicit: allow degraded reads from cache during brief KMS outages, but never serve an explicitly revoked token, and always block when the audit pipeline is down.
Frequently Asked Questions
Why not just use environment variables and rotate them in the deploy pipeline?
Environment variables are unencrypted in the process environment, visible in crash dumps, logged by container orchestrators, and require a full redeploy to rotate. A vault gives you rotation without restarts, revocation within seconds, and an audit trail of who read what when. Those three properties are required for any serious compliance posture: SOC 2, PCI-DSS, and HIPAA all require demonstrable access control with audit evidence, which environment variables cannot provide.
Why does the vault need to be sealed/unsealed rather than just starting normally?
The master key that decrypts all other keys must not live in plaintext on disk or in memory before an operator authorizes startup. The seal/unseal ceremony ensures that even a rogue insider with physical disk access cannot extract secrets without colluding with K key shard holders. Shamir splits mean no single person can unseal the vault alone, which is the defense against the insider threat that most companies discover only after an incident.
How do you handle the chicken-and-egg problem: services need vault credentials to start, but vault needs to be running first?
AppRole is designed for this. Services are provisioned with a role ID (semi-static, baked into the image) and a secret ID (short-lived, injected at pod start via init container or Vault Agent). The init container fetches the token before the main process starts, writing it to a shared tmpfs volume. Vault Agent can also auto-renew tokens in the background without the service process knowing. The key is that the service never calls vault directly at startup - the sidecar handles it.
Why not store secrets in a database with column-level encryption?
Column-level encryption in a database handles at-rest encryption but not access control granularity, dynamic secrets, or audit trails. You would need to build the policy engine, lease manager, and audit pipeline yourself. The vault pattern separates the secret storage concern from the application data concern - your main database does not need to know about credential management, and your secrets system does not need to store terabytes of application data.
How does secret rotation work without downtime for the service consuming it?
Two-phase rotation: write the new credential as version N+1 while version N is still valid. Notify consumers via a watch endpoint or push notification. Consumers re-read and reconnect with the new credential. After a grace period (configurable, typically 2-5 minutes), revoke version N. The dual-valid window size determines your rotation blast radius versus operational complexity tradeoff.
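The two phases can be modeled as a small state machine over accepted versions. This is a sketch with assumed names (`RotatingSecret`, `rotate`, `sweep`); timestamps are plain seconds so the grace window is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class RotatingSecret:
    """Tracks which versions of one secret are currently accepted."""
    current: int = 1
    # version -> revocation deadline in seconds (infinity = no deadline yet)
    valid: dict = field(default_factory=lambda: {1: float("inf")})

    def rotate(self, now: float, grace_seconds: float) -> int:
        """Phase 1: publish version N+1 and give version N a revocation deadline."""
        self.valid[self.current] = now + grace_seconds
        self.current += 1
        self.valid[self.current] = float("inf")
        return self.current

    def sweep(self, now: float) -> list:
        """Phase 2: revoke versions whose grace period has ended."""
        expired = [v for v, deadline in self.valid.items() if deadline <= now]
        for v in expired:
            del self.valid[v]
        return expired

    def accepts(self, version: int) -> bool:
        return version in self.valid
```

Between `rotate` and the `sweep` that passes the deadline, both versions are accepted, which is the dual-valid window consumers use to reconnect.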
Why use Raft for storage rather than an external database like Postgres?
External storage adds another component whose failure modes you must reason about, and whose credentials you must manage - a classic chicken-and-egg problem. Integrated Raft storage means the vault is self-contained. The data set is tiny (millions of secrets at 1 KB each is a few GB), so Raft scales easily. External storage only makes sense if you need to share the backend with other systems or already operate a hardened Postgres cluster at high availability.
Interview Questions
“How would you design the key rotation process for a vault that has 1 million stored secrets without taking it offline?”
Discuss DEK-level rotation first: re-encrypt one secret at a time in a background job, atomic read-decrypt-re-encrypt-write per record, progress tracked in a rotation_jobs table. No service impact since the path and version do not change. KEK-level rotation affects one namespace and can be batched with a cursor: fetch 1,000 DEKs, re-encrypt under the new KEK, persist, repeat. Master key rotation requires re-encrypting every KEK in the vault, typically done during a maintenance window or via a live dual-write migration where new writes use the new master key while the background job re-encrypts old KEKs. The critical correctness constraint is that each record’s re-encryption is atomic - partial rotation leaves the system in a valid state because the old key is still present until explicitly retired.
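The cursor-based KEK rotation can be sketched as follows. A dictionary stands in for the table of wrapped DEKs and `rewrap` is a caller-supplied placeholder for "unwrap with the old KEK, wrap with the new one"; both are assumptions made for illustration.

```python
from typing import Callable


def rewrap_in_batches(
    wrapped_deks: dict,
    rewrap: Callable[[bytes], bytes],
    batch_size: int = 1000,
) -> int:
    """Re-wrap every DEK one batch at a time and return the number of batches."""
    cursor = ""  # last secret ID processed; IDs are assumed to be non-empty strings
    batches = 0
    while True:
        # Stands in for: SELECT ... WHERE id > cursor ORDER BY id LIMIT batch_size
        batch = sorted(k for k in wrapped_deks if k > cursor)[:batch_size]
        if not batch:
            return batches
        for secret_id in batch:
            # Each record is replaced on its own; the secret payload is untouched.
            wrapped_deks[secret_id] = rewrap(wrapped_deks[secret_id])
        cursor = batch[-1]  # persisting this lets an interrupted job resume
        batches += 1


store = {f"secret-{i:04d}": b"wrapped-with-old-kek" for i in range(2500)}
print(rewrap_in_batches(store, lambda blob: b"wrapped-with-new-kek"))  # 3
```

Because the cursor only moves forward past completed batches, an interrupted job can resume from the last persisted ID without re-processing earlier records.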
“A service is reporting sub-millisecond vault read latency in testing but 50ms in production. How do you diagnose this?”
Check in-process cache hit rate first - it should be 80% or higher. If it is low in production, check whether the service accesses far more unique secret paths in production than in test (wide path fan-out kills cache efficiency). Then check KMS round-trip latency: is the KMS endpoint in the same AZ as the vault? Cross-AZ adds 1-3ms per call. Check Raft leader proximity: reads from a follower versus the leader have different latencies under replication lag. Check connection pool exhaustion to the storage backend - at 500 write/s and small pool, writers can starve readers. Finally check whether the audit pipeline is configured in synchronous mode, which adds its own write latency to every request.
“How do you implement dynamic database secrets - generating a fresh Postgres user per request and revoking it on lease expiry?”
Vault Core calls a secrets engine plugin for the database type. On read: connect to Postgres with a privileged rotation credential stored in the vault itself, run CREATE USER with a hashed name and random password, GRANT minimal privileges for the requested role, store the lease with is_dynamic=true and the username in backend_ref. On lease expiry: connect to Postgres, run DROP USER IF EXISTS using the value from backend_ref. The provisioning credential used to create and drop users must itself be stored in vault with rotation - avoiding the credential-outside-vault anti-pattern even for the provisioner.
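A rough sketch of the credential-generation step is shown below. It only builds the username, password, and SQL text; it does not talk to a database. The function name, the `v_` username prefix, and the assumption that `role` is an existing Postgres group role are all illustrative.

```python
import secrets
import string

_ALLOWED = set(string.ascii_lowercase + string.digits + "_")


def new_dynamic_credential(role: str, ttl_seconds: int) -> dict:
    """Build a one-off Postgres login plus the SQL to create and drop it."""
    if not role or not set(role) <= _ALLOWED:
        # Identifiers cannot be bound as query parameters, so whitelist them.
        raise ValueError(f"unsafe role name: {role!r}")
    username = f"v_{role}_{secrets.token_hex(4)}"
    # token_urlsafe yields only letters, digits, '-' and '_', so the value is
    # safe inside a single-quoted SQL literal.
    password = secrets.token_urlsafe(24)
    return {
        "username": username,        # stored in leases.backend_ref
        "password": password,        # returned to the caller, never logged
        "ttl_seconds": ttl_seconds,
        "create_sql": [
            f"CREATE ROLE {username} LOGIN PASSWORD '{password}'",
            f"GRANT {role} TO {username}",
        ],
        "revoke_sql": [f"DROP ROLE IF EXISTS {username}"],
    }
```

Since `create_sql` embeds the password, those statements should be kept out of query logs and audit events.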
“What happens to running services when the vault is sealed? How do you minimize the blast radius?”
Services that already have valid leases continue working until their leases expire. Services that need to fetch a new secret (restart or first startup) fail immediately. The mitigation is Vault Agent as a sidecar: it pre-fetches all secrets the service needs and writes them to a local tmpfs, renewing leases before expiry. When the vault is sealed, the sidecar continues serving from its local tmpfs cache for up to max_ttl. This decouples service health from vault availability for the common case - the only services affected are those whose max_ttl has expired during the sealed period, which for most credentials means the first 1-24 hours of an outage are handled transparently.