kyle/mcias

Files

Kyle Isom ec7c966ad2 trusted proxy, TOTP replay protection, new tests

- Trusted proxy config option for proxy-aware IP extraction
  used by rate limiting and audit logs; validates proxy IP
  before trusting X-Forwarded-For / X-Real-IP headers
- TOTP replay protection via counter-based validation to
  reject reused codes within the same time step (±30s)
- RateLimit middleware updated to extract client IP from
  proxy headers without IP spoofing risk
- New tests for ClientIP proxy logic (spoofed headers,
  fallback) and extended rate-limit proxy coverage
- HTMX error banner script integrated into web UI base
- .gitignore updated for mciasdb build artifact

Security: resolves CRIT-01 (TOTP replay attack) and
DEF-03 (proxy-unaware rate limiting); gRPC TOTP
enrollment aligned with REST via StorePendingTOTP

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-12 17:44:01 -07:00

13 KiB

Raw Blame History

MCIAS Security Audit Report

Date: 2026-03-12 Scope: Full codebase — authentication flows, token lifecycle, cryptography, database layer, REST/gRPC/UI servers, authorization, and operational security. Methodology: Static code analysis of all source files with adversarial focus on auth flows, crypto usage, input handling, and inter-component trust boundaries.

Executive Summary

MCIAS demonstrates strong security awareness throughout. The cryptographic foundations are sound, credential handling is careful, and the most common web/API authentication vulnerabilities have been explicitly addressed. The codebase shows consistent attention to defense-in-depth: constant-time comparisons, dummy Argon2 operations for unknown users, algorithm-confusion prevention in JWT validation, parameterized SQL, audit logging, and CSRF protection with HMAC-signed double-submit.

Two confirmed bugs with real security impact were found, along with several defense-in-depth gaps that should be addressed before production deployment. The overall security posture is well above average for this class of system.

Confirmed Vulnerabilities

CRIT-01 — TOTP Replay Attack (Medium-High)

File: internal/auth/auth.go:208-230, internal/grpcserver/auth.go:84, internal/ui/handlers_auth.go:152

ValidateTOTP accepts any code falling in the current ±1 time-step window (±30 seconds, so a given code is valid for ~90 seconds) but never records which codes have already been used. The same valid TOTP code can be submitted an unlimited number of times within that window. There is no last_used_totp_counter or last_used_totp_at field in the schema.

Attack scenario: An attacker who has observed a valid TOTP code (e.g. from a compromised session, shoulder surfing, or a MITM that delayed delivery) can reuse that code to authenticate within its validity window.

Fix: Track the last accepted TOTP counter per account in the database. Reject any counter ≤ the last accepted one. This requires a new column (last_totp_counter INTEGER) on the accounts table and a check-and-update in ValidateTOTP's callers (or within it, with a DB reference passed in).

CRIT-02 — gRPC `EnrollTOTP` Enables TOTP Before Confirmation (Medium)

File: internal/grpcserver/auth.go:202 vs internal/server/server.go:724-728

The REST EnrollTOTP handler explicitly uses StorePendingTOTP (which keeps totp_required=0) and a comment at line 724 explains why:

// Security: use StorePendingTOTP (not SetTOTP) so that totp_required
// is not enabled until the user confirms the code.

The gRPC EnrollTOTP handler at line 202 calls SetTOTP directly, which immediately sets totp_required=1. Any user who initiates TOTP enrollment over gRPC but does not immediately confirm will have their account locked out — they cannot log in because TOTP is required, but no working TOTP secret is confirmed.

Fix: Change grpcserver/auth.go:202 from a.s.db.SetTOTP(...) to a.s.db.StorePendingTOTP(...), matching the REST server's behavior and the documented intent of those two DB methods.

Defense-in-Depth Gaps

File: internal/ui/ui.go:264

uiMux.HandleFunc("POST /login", u.handleLoginPost)

The REST /v1/auth/login endpoint is wrapped with loginRateLimit (10 req/s per IP). The UI /login endpoint has no equivalent middleware. Account lockout (10 failures per 15 minutes) partially mitigates brute force, but an attacker can still enumerate whether accounts exist at full network speed before triggering lockout, and can trigger lockout against many accounts in parallel with no rate friction.

Fix: Apply the same middleware.RateLimit(10, 10) to POST /login in the UI mux. A simpler option is to wrap the entire uiMux with the rate limiter since the UI is also a sensitive surface.

DEF-02 — `pendingLogins` Map Has No Expiry Cleanup (Low)

File: internal/ui/ui.go:57

The pendingLogins sync.Map stores short-lived TOTP nonces (90-second TTL). When consumed via consumeTOTPNonce, entries are deleted via LoadAndDelete. However, entries that are created but never consumed (user abandons login at the TOTP step, closes browser) accumulate indefinitely — they are checked for expiry on read but never proactively deleted.

In normal operation this is a minor memory leak. Under adversarial conditions — an attacker repeatedly sending username+password to step 1 without proceeding to step 2 — the map grows without bound. At scale this could be used for memory exhaustion.

Fix: Add a background goroutine (matching the pattern in middleware.RateLimit) that periodically iterates the map and deletes expired entries. A 5-minute cleanup interval is sufficient given the 90-second TTL.

DEF-03 — Rate Limiter Uses `RemoteAddr`, Not `X-Forwarded-For` (Low)

File: internal/middleware/middleware.go:200

The comment already acknowledges this: the rate limiter extracts the client IP from r.RemoteAddr. When the server is deployed behind a reverse proxy (nginx, Caddy, a load balancer), RemoteAddr will be the proxy's IP for all requests, collapsing all clients into a single rate-limit bucket. This effectively disables per-IP rate limiting in proxy deployments.

Fix: Add a configurable TrustedProxy setting. When set, extract the real client IP from X-Forwarded-For or X-Real-IP headers only for requests coming from that proxy address. Never trust those headers unconditionally — doing so allows IP spoofing.

DEF-04 — Missing `nbf` (Not Before) Claim on Issued Tokens (Low)

File: internal/token/token.go:73-82

IssueToken sets iss, sub, iat, exp, and jti, but not nbf. Without a not-before constraint, a token is valid from the moment of issuance and a slightly clock-skewed client or intermediate could present it early. This is a defense-in-depth measure, not a practical attack at the moment, but it costs nothing to add.

Fix: Add NotBefore: jwt.NewNumericDate(now) to the RegisteredClaims struct. Add the corresponding validation step in ValidateToken (using jwt.WithNotBefore() or a manual check).

DEF-05 — No Maximum Token Expiry Ceiling in Config Validation (Low)

File: internal/config/config.go:150-158

The config validator enforces that expiry durations are positive but not that they are bounded above. An operator misconfiguration (e.g. service_expiry = "876000h") would issue tokens valid for 100 years. For human sessions (default_expiry, admin_expiry) this is a significant risk in the event of token theft.

Fix: Add upper-bound checks in validate(). Suggested maximums: 30 days for default_expiry, 24 hours for admin_expiry, 5 years for service_expiry. At minimum, log a warning when values exceed reasonable thresholds.

DEF-06 — `GetAccountByUsername` Comment Incorrect re: Case Sensitivity (Informational)

File: internal/db/accounts.go:73

The comment reads "case-insensitive" but the query uses WHERE username = ? with SQLite's default BINARY collation, which is case-sensitive. This means admin and Admin would be treated as distinct accounts. This is not a security bug by itself, but it contradicts the comment and could mask confusion.

Fix: If case-insensitive matching is intended, add COLLATE NOCASE to the column definition or the query. If case-sensitive is correct (more common for SSO systems), remove the word "case-insensitive" from the comment.

DEF-07 — SQLite `synchronous=NORMAL` in WAL Mode (Low)

File: internal/db/db.go:68

With PRAGMA synchronous=NORMAL and journal_mode=WAL, SQLite syncs the WAL file on checkpoints but not on every write. A power failure between a write and the next checkpoint could lose the most recent transactions. For an authentication system — where token issuance and revocation records must be durable — this is a meaningful risk.

Fix: Change to PRAGMA synchronous=FULL. For a single-node personal SSO the performance impact is negligible; durability of token revocations is worth it.

DEF-08 — gRPC `Login` Counts TOTP-Missing as a Login Failure (Low)

File: internal/grpcserver/auth.go:76-77

When TOTP is required but no code is provided (req.TotpCode == ""), the gRPC handler calls RecordLoginFailure. In the two-step UI flow this is defensible, but via the gRPC single-step Login RPC, a well-behaved client that has not yet obtained the TOTP code (not an attacker) will increment the failure counter. Repeated retries could trigger account lockout unintentionally.

Fix: Either document that gRPC clients must always include the TOTP code and treat its omission as a deliberate attempt, or do not count "TOTP code required" as a failure (since the password was verified successfully at that point).

DEF-09 — Security Headers Missing on REST API Docs Endpoints (Informational)

File: internal/server/server.go:85-94

The /docs and /docs/openapi.yaml endpoints are served from the parent mux and therefore do not receive the securityHeaders middleware applied to the UI sub-mux. The Swagger UI page at /docs is served without X-Frame-Options, Content-Security-Policy, etc.

Fix: Apply a security-headers middleware to the docs handlers, or move them into the UI sub-mux.

DEF-10 — Role Strings Not Validated Against an Allowlist (Low)

File: internal/db/accounts.go:302-311 (GrantRole)

There is no allowlist for role strings written to the account_roles table. Any string can be stored. While the admin-only constraint prevents non-admins from calling these endpoints, a typo by an admin (e.g. "admim") would silently create an unknown role that silently grants nothing. The RequireRole check would never match it, causing a confusing failure mode.

Fix: Maintain a compile-time allowlist of valid roles (e.g. "admin", "user") and reject unknown role names at the handler layer before writing to the database.

Positive Findings

The following implementation details are exemplary and should be preserved:

Area	Detail
JWT alg confusion	`ValidateToken` enforces `alg=EdDSA` in the key function, before signature verification — the only correct place
Constant-time comparisons	`crypto/subtle.ConstantTimeCompare` used consistently for password hashes, TOTP codes, and CSRF tokens
Timing uniformity	Dummy Argon2 computed (once, with full production parameters via `sync.Once`) for unknown/inactive users on both REST and gRPC paths
Token revocation	Every token is tracked by JTI; unknown tokens are rejected (fail-closed) rather than silently accepted
Token renewal atomicity	`RenewToken` wraps revocation + insertion in a single SQLite transaction
TOTP nonce design	Two-step UI login uses a 128-bit single-use server-side nonce to avoid transmitting the password twice
CSRF protection	HMAC-SHA256 signed double-submit cookie with `SameSite=Strict` and constant-time validation
Credential exclusion	`json:"-"` tags on all credential fields; proto messages omit them too
Security headers	All UI responses receive CSP, `X-Content-Type-Options`, `X-Frame-Options`, HSTS, and `Referrer-Policy`
Account lockout	10-attempt, 15-minute rolling lockout checked before Argon2 to prevent timing oracle
Argon2id parameters	Config validator enforces OWASP 2023 minimums and rejects weakening
SQL injection	All queries use parameterized statements; no string concatenation anywhere
Audit log	Append-only with actor/target/IP; no delete path provided
Master key handling	Env var cleared after reading; signing key zeroed on shutdown

Remediation Priority

Fixed	Priority	ID	Severity	Action
Yes	1	CRIT-02	Medium	Change `grpcserver/auth.go:202` to call `StorePendingTOTP` instead of `SetTOTP`
Yes	2	CRIT-01	Medium	Add `last_totp_counter` tracking to prevent TOTP replay within the validity window
Yes	3	DEF-01	Medium	Apply IP rate limiting to the UI `POST /login` endpoint
Yes	4	DEF-02	Low	Add background cleanup goroutine for the `pendingLogins` map
Yes	5	DEF-03	Low	Support trusted-proxy IP extraction for accurate per-client rate limiting
Yes	6	DEF-04	Low	Add `nbf` claim to issued tokens and validate it on receipt
Yes	7	DEF-05	Low	Add upper-bound caps on token expiry durations in config validation
Yes	8	DEF-07	Low	Change SQLite to `PRAGMA synchronous=FULL`
Yes	9	DEF-08	Low	Do not count gRPC TOTP-missing as a login failure
Yes	10	DEF-10	Low	Validate role strings against an allowlist before writing to the DB
Yes	11	DEF-09	Info	Apply security headers to `/docs` endpoints
Yes	12	DEF-06	Info	Correct the misleading "case-insensitive" comment in `GetAccountByUsername`

Schema Observations

The migration chain (migrations 001–006) is sound. Foreign key cascades are appropriate. Indexes are present on all commonly-queried columns. The failed_logins table uses a rolling window query approach which is correct.

One note: the accounts table has no unique index enforcing COLLATE NOCASE on username. This is consistent with treating usernames as case-sensitive but should be documented explicitly to avoid future ambiguity.

13 KiB Raw Blame History Unescape Escape