Merge transit engine branch, resolve conflicts in shared files
docs/engineering-standards.md
# Metacircular Dynamics — Engineering Standards

This document describes the standard repository layout, tooling, and software
development lifecycle (SDLC) for services built at Metacircular Dynamics. It is
derived from the conventions established in Metacrypt and codifies them as the
baseline for all new and existing services.

## Table of Contents

1. [Repository Layout](#repository-layout)
2. [Language & Toolchain](#language--toolchain)
3. [Build System](#build-system)
4. [API Design](#api-design)
5. [Authentication & Authorization](#authentication--authorization)
6. [Database Conventions](#database-conventions)
7. [Configuration](#configuration)
8. [Web UI](#web-ui)
9. [Testing](#testing)
10. [Linting & Static Analysis](#linting--static-analysis)
11. [Deployment](#deployment)
12. [Documentation](#documentation)
13. [Security](#security)
14. [Development Workflow](#development-workflow)

---

## Repository Layout

Every service follows a consistent directory structure. Adjust the
service-specific directories (e.g. `engines/` in Metacrypt) as appropriate,
but the top-level skeleton is fixed.

```
.
├── cmd/
│   ├── <service>/          CLI entry point (server, subcommands)
│   └── <service>-web/      Web UI entry point (if separate binary)
├── internal/
│   ├── auth/               MCIAS integration (token validation, caching)
│   ├── config/             TOML configuration loading & validation
│   ├── db/                 Database setup, schema migrations
│   ├── server/             REST API server, routes, middleware
│   ├── grpcserver/         gRPC server, interceptors, service handlers
│   ├── webserver/          Web UI server, template routes, HTMX handlers
│   └── <domain>/           Service-specific packages
├── proto/<service>/
│   ├── v1/                 Legacy proto definitions (if applicable)
│   └── v2/                 Current proto definitions
├── gen/<service>/
│   ├── v1/                 Generated Go gRPC/protobuf code
│   └── v2/
├── web/
│   ├── embed.go            //go:embed directive for templates and static
│   ├── templates/          Go HTML templates
│   └── static/             CSS, JS (htmx)
├── deploy/
│   ├── docker/             Docker Compose configuration
│   ├── examples/           Example config files
│   ├── scripts/            Install, backup, migration scripts
│   └── systemd/            systemd unit files and timers
├── docs/                   Internal engineering documentation
├── Dockerfile.api          API server container (if split binary)
├── Dockerfile.web          Web UI container (if split binary)
├── Makefile
├── buf.yaml                Protobuf linting & breaking-change config
├── .golangci.yaml          Linter configuration
├── .gitignore
├── CLAUDE.md               AI-assisted development instructions
├── ARCHITECTURE.md         Full system specification
└── <service>.toml.example  Example configuration
```

### Key Principles

- **`cmd/`** contains only CLI wiring (cobra commands, flag parsing). No
  business logic.
- **`internal/`** contains all service logic. Nothing in `internal/` is
  importable by other modules; the Go toolchain enforces this for any
  directory named `internal`.
- **`proto/`** is the source of truth for gRPC definitions. Generated code
  lives in `gen/` and is never edited by hand.
- **`deploy/`** contains everything needed to run the service in production.
  A new engineer should be able to deploy from this directory alone.
- **`web/`** is embedded into the binary via `//go:embed`. No external file
  dependencies at runtime.

### What Does Not Belong in the Repository

- Runtime data (databases, certificates, logs) — these live in `/srv/<service>`
- Real configuration files with secrets — only examples are committed
- IDE configuration (`.idea/`, `.vscode/`) — per-developer, not shared
- Vendored dependencies — Go module proxy handles this

---

## Language & Toolchain

| Tool | Version | Purpose |
|------|---------|---------|
| Go | 1.25+ | Primary language |
| protoc + protoc-gen-go | Latest | Protobuf/gRPC code generation |
| buf | Latest | Proto linting and breaking-change detection |
| golangci-lint | v2 | Static analysis and linting |
| Docker | Latest | Container builds |

### Go Conventions

- **Pure-Go dependencies** where possible. Avoid CGo — it complicates
  cross-compilation and container builds. Use `modernc.org/sqlite` instead
  of `mattn/go-sqlite3`.
- **`CGO_ENABLED=0`** for all production builds. Statically linked binaries
  deploy cleanly to Alpine containers.
- **Stripped binaries**: Build with `-trimpath -ldflags="-s -w"` to remove
  debug symbols and reduce image size.
- **Version injection**: Pass `git describe --tags --always --dirty` via
  `-X main.version=...` at build time. Every binary must report its version.

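A minimal sketch of the version-injection convention, assuming the variable is named `version` in package `main` (matching the `-X main.version=...` flag above). The `dev` default, the `versionString` helper, and the `exampled` service name are illustrative assumptions, not taken from any existing service.

```go
package main

import "fmt"

// version is overridden at build time, for example:
//
//	go build -ldflags="-X main.version=$(git describe --tags --always --dirty)"
//
// The "dev" default is an assumption for builds that omit the flag.
var version = "dev"

// versionString formats the version line for a hypothetical service name.
func versionString(service string) string {
	return fmt.Sprintf("%s %s", service, version)
}

func main() {
	fmt.Println(versionString("exampled"))
}
```

The `-X` flag only sets package-level string variables, so `version` has to stay an uninitialized or string-initialized `var`, not a `const`.
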
### Module Path

Services hosted on `git.wntrmute.dev` use:

```
git.wntrmute.dev/kyle/<service>
```

---

## Build System

Every repository has a Makefile with these standard targets:

```makefile
# <service> and proto are phony: a stale binary or the existing proto/
# directory would otherwise make these targets look up to date.
.PHONY: <service> build test vet lint proto proto-lint clean docker all

LDFLAGS := -trimpath -ldflags="-s -w -X main.version=$(shell git describe --tags --always --dirty)"

<service>:
	go build $(LDFLAGS) -o <service> ./cmd/<service>

build:
	go build ./...

test:
	go test ./...

vet:
	go vet ./...

lint:
	golangci-lint run ./...

proto:
	protoc --go_out=. --go_opt=module=<module> \
		--go-grpc_out=. --go-grpc_opt=module=<module> \
		proto/<service>/v2/*.proto

proto-lint:
	buf lint
	buf breaking --against '.git#branch=master,subdir=proto'

clean:
	rm -f <service>

docker:
	docker build -t <service> -f Dockerfile.api .

all: vet lint test <service>
```

### Target Semantics

| Target | When to Run | CI Gate? |
|--------|-------------|----------|
| `vet` | Every change | Yes |
| `lint` | Every change | Yes |
| `test` | Every change | Yes |
| `proto-lint` | Any proto change | Yes |
| `proto` | After editing `.proto` files | No (manual) |
| `all` | Pre-push verification | Yes |

The `all` target is the CI pipeline: `vet → lint → test → build`. If any
step fails, the pipeline stops.

---

## API Design

Services expose two synchronized API surfaces:

### gRPC (Primary)

- Proto definitions live in `proto/<service>/v2/`.
- Use strongly-typed, per-operation RPCs. Avoid generic "execute" patterns.
- Use `google.protobuf.Timestamp` for all time fields (not RFC 3339 strings).
- Run `buf lint` and `buf breaking` against master before merging proto
  changes.

### REST (Secondary)

- JSON over HTTPS. Routes live in `internal/server/routes.go`.
- Use `chi` for routing (lightweight, stdlib-compatible).
- Standard error format: `{"error": "description"}`.
- Standard HTTP status codes: `401` (unauthenticated), `403` (forbidden:
  authenticated but not authorized), `412` (precondition failed), `503`
  (service unavailable).

### API Sync Rule

**Every REST endpoint must have a corresponding gRPC RPC, and vice versa.**
When adding, removing, or changing an endpoint in either surface, the other
must be updated in the same change. This is enforced in code review.

### gRPC Interceptors

Access control is enforced via interceptor maps, not per-handler checks:

| Map | Effect |
|-----|--------|
| `sealRequiredMethods` | Returns `UNAVAILABLE` if the service is sealed/locked |
| `authRequiredMethods` | Validates MCIAS bearer token, populates caller info |
| `adminRequiredMethods` | Requires admin role on the caller |

Adding a new RPC means adding it to the correct interceptor maps. Forgetting
this is a security defect.

---

## Authentication & Authorization

### Authentication

All services delegate authentication to **MCIAS** (Metacircular Identity and
Access Service). No service maintains its own user database.

- Client sends credentials to the service's `/v1/auth/login` endpoint.
- The service forwards them to MCIAS via the client library
  (`git.wntrmute.dev/kyle/mcias/clients/go`).
- On success, MCIAS returns a bearer token. The service returns it to the
  client and optionally sets it as a cookie for the web UI.
- Subsequent requests include the token via `Authorization: Bearer <token>`
  header or cookie.
- Token validation calls MCIAS `ValidateToken()`. Results should be cached
  (keyed by SHA-256 of the token) with a short TTL (30 seconds or less).

### Authorization

Three role levels:

| Role | Meaning |
|------|---------|
| `admin` | Full access to everything. Policy bypass. |
| `user` | Access governed by policy rules. Default deny. |
| `guest` | Service-dependent restrictions. Default deny. |

Admin detection is based solely on the MCIAS `admin` role. The service never
promotes users locally.

Services that need fine-grained access control should implement a policy
engine (priority-based ACL rules stored in encrypted storage, default deny,
admin bypass). See Metacrypt's implementation as the reference.

---

## Database Conventions

### SQLite

SQLite is the default database for Metacircular services. It is simple to
operate, requires no external processes, and backs up cleanly with
`VACUUM INTO`.

Connection settings (applied at open time):

```sql
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 5000;
```

File permissions: `0600`. Created by the service on first run.

### Migrations

- Migrations are Go functions registered in `internal/db/` and run
  sequentially at startup.
- Each migration is idempotent where the SQL allows it, e.g.
  `CREATE TABLE IF NOT EXISTS`. SQLite has no `ADD COLUMN IF NOT EXISTS`,
  so a migration that adds a column must either check `PRAGMA table_info`
  first or rely on its `schema_migrations` record to run exactly once.
- Applied migrations are tracked in a `schema_migrations` table.
- Never modify a migration that has been deployed. Add a new one.

### Backup

Every service must provide a `snapshot` CLI command that creates a consistent
backup using `VACUUM INTO`. Automated backups run via a systemd timer
(daily, with retention pruning).

---

## Configuration

### Format

TOML. Parsed with `go-toml/v2`. Environment variable overrides via
`SERVICENAME_*` (e.g. `METACRYPT_SERVER_LISTEN_ADDR`).

### Standard Sections

```toml
[server]
listen_addr = ":8443"  # HTTPS API
grpc_addr = ":9443"  # gRPC (optional; disabled if unset)
tls_cert = "/srv/<service>/certs/cert.pem"
tls_key = "/srv/<service>/certs/key.pem"

[web]
listen_addr = "127.0.0.1:8080"  # Web UI (optional; disabled if unset)
vault_grpc = "127.0.0.1:9443"  # gRPC address of the API server
vault_ca_cert = ""  # CA cert for verifying API server TLS

[database]
path = "/srv/<service>/<service>.db"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
ca_cert = ""  # Custom CA for MCIAS TLS

[log]
level = "info"  # debug, info, warn, error
```

### Validation

Required fields are validated at startup. The service refuses to start if
any are missing. Do not silently default required values.

### Data Directory

All runtime data lives in `/srv/<service>/`:

```
/srv/<service>/
├── <service>.toml     Configuration
├── <service>.db       SQLite database
├── certs/             TLS certificates
└── backups/           Database snapshots
```

This convention enables straightforward service migration between hosts:
copy `/srv/<service>/` and the binary.

---

## Web UI

### Technology

- **Go `html/template`** for server-side rendering. No JavaScript frameworks.
- **htmx** for dynamic interactions (form submission, partial page updates)
  without full page reloads.
- Templates and static files are embedded in the binary via `//go:embed`.

### Structure

- `web/templates/layout.html` — shared HTML skeleton, navigation, CSS/JS
  includes. All page templates extend this.
- Page templates: one `.html` file per page/feature.
- `web/static/` — CSS, htmx. Keep this minimal.

### Architecture

The web UI runs as a separate binary (`<service>-web`) that communicates
with the API server via its gRPC interface. This separation means:

- The web UI has no direct database access.
- The API server enforces all authorization.
- The web UI can be deployed independently or omitted entirely.

### Security

- CSRF protection via signed double-submit cookies on all mutating requests
  (POST/PUT/PATCH/DELETE).
- Session cookie: `HttpOnly`, `Secure`, `SameSite=Strict`.
- All user input is escaped by `html/template` (the default).

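One possible shape for a signed double-submit token, assuming an HMAC-SHA256 signature over a random nonce. The token layout (`nonce.signature`), the function names, and the choice of HMAC are assumptions for illustration; the reference implementation may differ. The server sets the token as a cookie, the page echoes it in a form field or header, and a mutating request is accepted only if both copies match and the signature verifies.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
	"strings"
)

func encode(b []byte) string { return base64.RawURLEncoding.EncodeToString(b) }

// sign returns the HMAC-SHA256 of nonce under the server-side secret.
func sign(secret, nonce []byte) []byte {
	mac := hmac.New(sha256.New, secret)
	mac.Write(nonce)
	return mac.Sum(nil)
}

// newCSRFToken returns "nonce.signature", both base64url encoded.
func newCSRFToken(secret []byte) (string, error) {
	nonce := make([]byte, 32)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	return encode(nonce) + "." + encode(sign(secret, nonce)), nil
}

// validCSRF reports whether the cookie and submitted values are identical
// and the token carries a valid signature. Comparisons are constant time.
func validCSRF(secret []byte, cookie, submitted string) bool {
	if subtle.ConstantTimeCompare([]byte(cookie), []byte(submitted)) != 1 {
		return false
	}
	nonceB64, sigB64, ok := strings.Cut(cookie, ".")
	if !ok {
		return false
	}
	nonce, err1 := base64.RawURLEncoding.DecodeString(nonceB64)
	sig, err2 := base64.RawURLEncoding.DecodeString(sigB64)
	if err1 != nil || err2 != nil {
		return false
	}
	return hmac.Equal(sig, sign(secret, nonce))
}

func main() {
	secret := []byte("example-secret-for-demo-only") // real secrets come from crypto/rand
	token, err := newCSRFToken(secret)
	if err != nil {
		fmt.Println("token generation failed:", err)
		return
	}
	fmt.Println(validCSRF(secret, token, token))      // matching copies
	fmt.Println(validCSRF(secret, token, "tampered")) // mismatched copies
}
```
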
---

## Testing

### Philosophy

Tests are written using the Go standard library `testing` package. No test
frameworks (testify, gomega, etc.) — the standard library is sufficient and
keeps dependencies minimal.

### Patterns

```go
func TestFeatureName(t *testing.T) {
	// Setup: use t.TempDir() for isolated file system state.
	dir := t.TempDir()
	database, err := db.Open(filepath.Join(dir, "test.db"))
	if err != nil {
		t.Fatalf("open db: %v", err)
	}
	defer func() { _ = database.Close() }()
	// Assumes db.Migrate returns an error; an unchecked call would fail errcheck.
	if err := db.Migrate(database); err != nil {
		t.Fatalf("migrate: %v", err)
	}

	// Exercise the code under test.
	// ...

	// Assert with t.Fatal (not t.Error) for precondition failures.
	if !bytes.Equal(got, want) {
		t.Fatalf("got %q, want %q", got, want)
	}
}
```

### Guidelines

- **Use `t.TempDir()`** for all file-system state. Never write to fixed
  paths. Cleanup is automatic.
- **Use `errors.Is`** for error assertions, not string comparison.
- **No mocks for databases.** Tests use real SQLite databases created in
  temp directories. This catches migration bugs that mocks would hide.
- **Test files** live alongside the code they test: `barrier.go` and
  `barrier_test.go` in the same package.
- **Test helpers** call `t.Helper()` so failures report the caller's line.

### What to Test

| Layer | Test Strategy |
|-------|---------------|
| Crypto primitives | Roundtrip encryption/decryption, wrong-key rejection, edge cases |
| Storage (barrier, DB) | CRUD operations, sealed-state rejection, concurrent access |
| API handlers | Request/response correctness, auth enforcement, error codes |
| Policy engine | Rule matching, priority ordering, default deny, admin bypass |
| CLI commands | Flag parsing, output format (lightweight) |

---

## Linting & Static Analysis

### Configuration

Every repository includes a `.golangci.yaml` with this philosophy:
**fail loudly for security and correctness; everything else is a warning.**

### Required Linters

| Linter | Category | Purpose |
|--------|----------|---------|
| `errcheck` | Correctness | Unhandled errors are silent failures |
| `govet` | Correctness | Printf mismatches, unreachable code, suspicious constructs |
| `ineffassign` | Correctness | Dead writes hide logic bugs |
| `unused` | Correctness | Unused variables and functions |
| `errorlint` | Error handling | Proper `errors.Is`/`errors.As` usage |
| `gosec` | Security | Hardcoded secrets, weak RNG, insecure crypto, SQL injection |
| `staticcheck` | Security | Deprecated APIs, mutex misuse, deep analysis |
| `revive` | Style | Go naming conventions, error return ordering |
| `gofmt` | Formatting | Standard Go formatting |
| `goimports` | Formatting | Import grouping and ordering |

### Settings

- `errcheck`: `check-type-assertions: true` (catch `x.(*T)` without ok check).
- `govet`: all analyzers enabled except `shadow` (too noisy for idiomatic Go).
- `gosec`: severity and confidence set to `medium`. Exclude `G104` (overlaps
  with errcheck).
- `max-issues-per-linter: 0` — report everything. No caps.
- Test files: allow `G101` (hardcoded credentials) for test fixtures.

---

## Deployment

### Container-First

Services are designed for container deployment but must also run as native
systemd services. Both paths are first-class.

### Docker

Multi-stage builds:

1. **Builder**: `golang:1.25-alpine` (matching the Go 1.25+ requirement in
   the toolchain table). Compile with `CGO_ENABLED=0`, strip symbols.
2. **Runtime**: `alpine:3.21`. Non-root user (`<service>`), minimal attack
   surface.

If the service has separate API and web binaries, use separate Dockerfiles
(`Dockerfile.api`, `Dockerfile.web`) and a `docker-compose.yml` that wires
them together with a shared data volume.

### systemd

Every service ships with:

| File | Purpose |
|------|---------|
| `<service>.service` | Main service unit (API server) |
| `<service>-web.service` | Web UI unit (if applicable) |
| `<service>-backup.service` | Oneshot backup unit |
| `<service>-backup.timer` | Daily backup timer (02:00 UTC, 5-minute jitter) |

#### Security Hardening

All service units must include these security directives:

```ini
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=true
RestrictRealtime=true
ReadWritePaths=/srv/<service>
```

The web UI unit should use `ReadOnlyPaths=/srv/<service>` instead of
`ReadWritePaths` — it has no reason to write to the data directory.

### Install Script

`deploy/scripts/install.sh` handles:

1. Create system user/group (idempotent).
2. Install binary to `/usr/local/bin/`.
3. Create `/srv/<service>/` directory structure.
4. Install example config if none exists.
5. Install systemd units and reload the daemon.

### TLS

- **Minimum TLS version: 1.3.** No exceptions, no fallback cipher suites.
  Go's TLS 1.3 implementation manages cipher selection automatically.
- **Timeouts**: read 30s, write 30s, idle 120s.
- Certificate and key paths are required configuration — the service refuses
  to start without them.

### Graceful Shutdown

Services handle `SIGINT` and `SIGTERM`, shutting down cleanly:

1. Stop accepting new connections.
2. Drain in-flight requests (with a timeout).
3. Clean up resources (close databases, zeroize secrets if applicable).
4. Exit.

---

## Documentation

### Required Files

| File | Purpose | Audience |
|------|---------|----------|
| `CLAUDE.md` | AI-assisted development context | Claude Code |
| `ARCHITECTURE.md` | Full system specification | Engineers |
| `deploy/examples/<service>.toml` | Example configuration | Operators |

### CLAUDE.md

This file provides context for AI-assisted development. It should contain:

- Project overview (one paragraph).
- Build, test, and lint commands.
- High-level architecture summary.
- Project structure with directory descriptions.
- Ignored directories (runtime data, generated code).
- Critical rules (e.g. API sync requirements).

Keep it concise. AI tools read this on every interaction.

### ARCHITECTURE.md

This is the canonical specification for the service. It should cover:

1. System overview with a layered architecture diagram.
2. Cryptographic design (if applicable): algorithms, key hierarchy.
3. State machines and lifecycle (if applicable).
4. Storage design.
5. Authentication and authorization model.
6. API surface (REST and gRPC, with tables of every endpoint).
7. Web interface routes.
8. Database schema (every table, every column).
9. Configuration reference.
10. Deployment guide.
11. Security model: threat mitigations table and security invariants.
12. Future work.

This document is the source of truth. When the code and the spec disagree,
one of them has a bug.

### Engine/Feature Design Documents

For services with a modular architecture, each module gets its own design
document (e.g. `engines/sshca.md`). These are detailed implementation plans
that include:

- Overview and core concepts.
- Data model and storage layout.
- Lifecycle (initialization, teardown).
- Operations table with auth requirements.
- API definitions (gRPC and REST).
- Implementation steps (file-by-file).
- Security considerations.
- References to existing code patterns to follow.

Write these before writing code. They are the blueprint, not the afterthought.

---

## Security

### General Principles

- **Default deny.** Unauthenticated requests are rejected. Unauthorized
  requests are rejected. If in doubt, deny.
- **Fail closed.** If the service cannot verify authorization, it denies the
  request. If the database is unavailable, the service is unavailable.
- **Least privilege.** Service processes run as non-root. systemd units
  restrict filesystem access, syscalls, and capabilities.
- **No local user databases.** Authentication is always delegated to MCIAS.

### Cryptographic Standards

| Purpose | Algorithm | Notes |
|---------|-----------|-------|
| Symmetric encryption | AES-256-GCM | 12-byte random nonce per operation |
| Symmetric alternative | XChaCha20-Poly1305 | 24-byte nonce; for high message volumes per key, where random 12-byte nonces risk collision. Not nonce-misuse resistant |
| Key derivation | Argon2id | Memory-hard; tune params to hardware |
| Asymmetric signing | Ed25519, ECDSA (P-256, P-384) | Prefer Ed25519 |
| CSPRNG | `crypto/rand` | All keys, nonces, salts, tokens |
| Constant-time comparison | `crypto/subtle` | All secret comparisons |

- **Never use RSA for new designs.** Ed25519 and ECDSA are faster, produce
  smaller keys, and have simpler security models.
- **Zeroize secrets** from memory when they are no longer needed. Overwrite
  byte slices with zeros, nil out pointers.
- **Never log secrets.** Keys, passwords, tokens, and plaintext must never
  appear in log output.

### Web Security

- CSRF tokens on all mutating requests.
- `SameSite=Strict` on all cookies.
- `html/template` for automatic escaping.
- Validate all input at system boundaries.

---

## Development Workflow

### Local Development

```bash
# Build and run both servers locally:
make devserver

# Or build everything and run the full pipeline:
make all
```

The `devserver` target builds both binaries and runs them against a local
config in `srv/`. The `srv/` directory is gitignored — it holds your local
database, certificates, and configuration.

### Pre-Push Checklist

Before pushing a branch:

```bash
make all         # vet → lint → test → build
make proto-lint  # if proto files changed
```

### Proto Changes

1. Edit `.proto` files in `proto/<service>/v2/`.
2. Run `make proto` to regenerate Go code.
3. Run `make proto-lint` to check for linting violations and breaking changes.
4. Update REST routes to match the new/changed RPCs.
5. Update gRPC interceptor maps for any new RPCs.
6. Update `ARCHITECTURE.md` API tables.

### Adding a New Feature

1. **Design first.** Write or update the relevant design document. For a new
   engine or major subsystem, create a new doc in `docs/` or `engines/`.
2. **Implement.** Follow existing patterns — the design doc should reference
   specific files and line numbers.
3. **Test.** Write tests alongside the implementation.
4. **Update docs.** Update `ARCHITECTURE.md`, `CLAUDE.md`, and route tables.
5. **Verify.** Run `make all`.

### CLI Commands

Every service uses cobra for CLI commands. Standard subcommands:

| Command | Purpose |
|---------|---------|
| `server` | Start the service |
| `init` | First-time setup (if applicable) |
| `status` | Query a running instance's health |
| `snapshot` | Create a database backup |

Add service-specific subcommands as needed (e.g. `migrate-aad`, `unseal`).
Each command lives in its own file in `cmd/<service>/`.