# MCR Project Plan
Implementation plan for the Metacircular Container Registry. Each phase
contains discrete steps with acceptance criteria. Steps within a phase are
sequential unless noted as batchable. See `ARCHITECTURE.md` for the full
design specification.
## Status
| Phase | Description | Status |
|-------|-------------|--------|
| 0 | Project scaffolding | **Complete** |
| 1 | Configuration & database | **Complete** |
| 2 | Blob storage layer | **Complete** |
| 3 | MCIAS authentication | **Complete** |
| 4 | Policy engine | **Complete** |
| 5 | OCI API — pull path | **Complete** |
| 6 | OCI API — push path | **Complete** |
| 7 | OCI API — delete path | **Complete** |
| 8 | Admin REST API | **Complete** |
| 9 | Garbage collection | Not started |
| 10 | gRPC admin API | Not started |
| 11 | CLI tool (mcrctl) | Not started |
| 12 | Web UI | Not started |
| 13 | Deployment artifacts | Not started |
### Dependency Graph
```
Phase 0 (scaffolding)
 └─► Phase 1 (config + db)
      ├─► Phase 2 (blob storage) ─────┐
      └─► Phase 3 (MCIAS auth)        │
           └─► Phase 4 (policy)       │
                ├─► Phase 5 (pull) ◄──┘
                │    └─► Phase 6 (push)
                │         └─► Phase 7 (delete)
                │              └─► Phase 9 (GC)
                └─► Phase 8 (admin REST) ◄── Phase 1
                     └─► Phase 10 (gRPC)
                          ├─► Phase 11 (mcrctl)
                          └─► Phase 12 (web UI)

Phase 13 (deployment) depends on all above.
```
### Batchable Work
The following phases are independent and can be assigned to different
engineers simultaneously:
- **Batch A** (after Phase 1): Phase 2 (blob storage) and Phase 3 (MCIAS auth)
- **Batch B** (after Phase 4): Phase 5 (OCI pull) and Phase 8 (admin REST)
- **Batch C** (after Phase 10): Phase 11 (mcrctl) and Phase 12 (web UI)
---
## Phase 0: Project Scaffolding
Set up the Go module, build system, and binary skeletons. This phase
produces a project that builds and lints cleanly with no functionality.
### Step 0.1: Go module and directory structure
**Acceptance criteria:**
- `go.mod` initialized with module path `git.wntrmute.dev/kyle/mcr`
- Directory skeleton created per `ARCHITECTURE.md` §13:
`cmd/mcrsrv/`, `cmd/mcr-web/`, `cmd/mcrctl/`, `internal/`, `proto/mcr/v1/`,
`gen/mcr/v1/`, `web/`, `deploy/`
- `.gitignore` excludes: `mcrsrv`, `mcr-web`, `mcrctl`, `srv/`, `*.db`,
`*.db-wal`, `*.db-shm`
### Step 0.2: Makefile
**Acceptance criteria:**
- Standard targets per engineering standards: `build`, `test`, `vet`, `lint`,
`proto`, `proto-lint`, `clean`, `docker`, `all`, `devserver`
- `all` target runs: `vet` → `lint` → `test` → build binaries
- Binary targets: `mcrsrv`, `mcr-web`, `mcrctl` with version injection via
`-X main.version=$(shell git describe --tags --always --dirty)`
- `CGO_ENABLED=0` for all builds
- `make all` passes (empty binaries link successfully)
### Step 0.3: Linter and protobuf configuration
**Acceptance criteria:**
- `.golangci.yaml` with required linters per engineering standards:
errcheck, govet, ineffassign, unused, errorlint, gosec, staticcheck,
revive, gofmt, goimports
- `errcheck.check-type-assertions: true`
- `govet`: all analyzers except `shadow`
- `gosec`: exclude `G101` in test files
- `buf.yaml` for proto linting
- `golangci-lint run ./...` passes cleanly
### Step 0.4: Binary entry points with cobra
**Acceptance criteria:**
- `cmd/mcrsrv/main.go`: root command with `server`, `init`, `snapshot`
subcommands (stubs that print "not implemented" and exit 1)
- `cmd/mcr-web/main.go`: root command with `server` subcommand (stub)
- `cmd/mcrctl/main.go`: root command with `status`, `repo`, `gc`, `policy`,
`audit`, `snapshot` subcommand groups (stubs)
- All three binaries accept `--version` flag, printing the injected version
- `make all` builds all three binaries and passes lint/vet/test
---
## Phase 1: Configuration & Database
Implement config loading, SQLite database setup, and schema migrations.
Steps 1.1 and 1.2 can be **batched** (no dependency between them).
### Step 1.1: Configuration loading (`internal/config`)
**Acceptance criteria:**
- TOML config struct matching `ARCHITECTURE.md` §10 (all sections:
`[server]`, `[database]`, `[storage]`, `[mcias]`, `[web]`, `[log]`)
- Parsed with `go-toml/v2`
- Environment variable overrides via `MCR_` prefix
(e.g., `MCR_SERVER_LISTEN_ADDR`)
- Startup validation: refuse to start if required fields are missing
(`listen_addr`, `tls_cert`, `tls_key`, `database.path`,
`storage.layers_path`, `mcias.server_url`)
- Same-filesystem check for `layers_path` and `uploads_path` via device ID
- Default values: `uploads_path` defaults to `<layers_path>/../uploads`,
`read_timeout` = 30s, `write_timeout` = 0, `idle_timeout` = 120s,
`shutdown_timeout` = 60s, `log.level` = "info"
- `mcr.toml.example` created in `deploy/examples/`
- Tests: valid config, missing required fields, env override, device ID check
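The same-filesystem check can be sketched as a stat(2) device-ID comparison. A minimal sketch, Unix-only; the helper name `sameFilesystem` is illustrative, not part of the planned `internal/config` API:

```go
package main

import (
	"fmt"
	"syscall"
)

// sameFilesystem reports whether two paths reside on the same device by
// comparing the Dev field from stat(2). This is what makes the atomic
// rename in the blob-commit path safe: rename(2) cannot cross devices.
func sameFilesystem(a, b string) (bool, error) {
	var sa, sb syscall.Stat_t
	if err := syscall.Stat(a, &sa); err != nil {
		return false, fmt.Errorf("config: stat %s: %w", a, err)
	}
	if err := syscall.Stat(b, &sb); err != nil {
		return false, fmt.Errorf("config: stat %s: %w", b, err)
	}
	return sa.Dev == sb.Dev, nil
}

func main() {
	ok, err := sameFilesystem("/tmp", "/tmp")
	fmt.Println(ok, err)
}
```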
### Step 1.2: Database setup and migrations (`internal/db`)
**Acceptance criteria:**
- SQLite opened with `modernc.org/sqlite` (pure-Go, no CGo)
- Connection pragmas: `journal_mode=WAL`, `foreign_keys=ON`,
`busy_timeout=5000`
- File permissions: `0600`
- Migration framework: Go functions registered sequentially, tracked in
`schema_migrations` table, idempotent (`CREATE TABLE IF NOT EXISTS`)
- Migration 000001: `repositories`, `manifests`, `tags`, `blobs`,
`manifest_blobs`, `uploads` tables per `ARCHITECTURE.md` §8
- Migration 000002: `policy_rules`, `audit_log` tables per §8
- All indexes created per schema
- `db.Open(path)` → `*DB`, `db.Close()`, `db.Migrate()` public API
- Tests: open fresh DB, run migrations, verify tables exist, run migrations
again (idempotent), verify foreign key enforcement works
### Step 1.3: Audit log helpers (`internal/db`)
**Acceptance criteria:**
- `db.WriteAuditEvent(event_type, actor_id, repository, digest, ip, details)`
inserts into `audit_log`
- `db.ListAuditEvents(filters)` with pagination (offset/limit), filtering
by event_type, actor_id, repository, time range
- `details` parameter is `map[string]string`, serialized as JSON
- Tests: write events, list with filters, pagination
---
## Phase 2: Blob Storage Layer
Implement content-addressed filesystem operations for blob data.
### Step 2.1: Blob writer (`internal/storage`)
**Acceptance criteria:**
- `storage.New(layersPath, uploadsPath)` constructor
- `storage.StartUpload(uuid)` creates a temp file at
`<uploadsPath>/<uuid>` and returns a `*BlobWriter`
- `BlobWriter.Write([]byte)` appends data and updates a running SHA-256 hash
- `BlobWriter.Commit(expectedDigest)`:
- Finalizes SHA-256
- Rejects with error if computed digest != expected digest
- Creates two-char prefix directory under `<layersPath>/sha256/<prefix>/`
- Renames temp file to `<layersPath>/sha256/<prefix>/<hex-digest>`
- Returns the verified `sha256:<hex>` digest string
- `BlobWriter.Cancel()` removes the temp file
- `BlobWriter.BytesWritten()` returns current offset
- Tests: write blob, verify file at expected path, digest mismatch rejection,
cancel cleanup, concurrent writes to different UUIDs
### Step 2.2: Blob reader and metadata (`internal/storage`)
**Acceptance criteria:**
- `storage.Open(digest)` returns an `io.ReadCloser` for the blob file, or
`ErrBlobNotFound`
- `storage.Stat(digest)` returns size and existence check without opening
- `storage.Delete(digest)` removes the blob file and its prefix directory
if empty
- `storage.Exists(digest)` returns bool
- Digest format validated: must match `sha256:[a-f0-9]{64}`
- Tests: read after write, stat, delete, not-found error, invalid digest
format rejected
---
## Phase 3: MCIAS Authentication
Implement token validation and the OCI token endpoint.
### Step 3.1: MCIAS client (`internal/auth`)
**Acceptance criteria:**
- `auth.NewClient(mciasURL, caCert, serviceName, tags)` constructor
- `client.Login(username, password)` calls MCIAS `POST /v1/auth/login`
with `service_name` and `tags` in the request body; returns JWT string
and expiry
- `client.ValidateToken(token)` calls MCIAS `POST /v1/token/validate`;
returns parsed claims (subject UUID, account type, roles) or error
- Validation results cached by `sha256(token)` with 30-second TTL;
cache entries evicted on expiry
- TLS: custom CA cert supported; TLS 1.3 minimum enforced via
`tls.Config.MinVersion`
- HTTP client timeout: 10 seconds
- Errors wrapped with `fmt.Errorf("auth: %w", err)`
- Tests: use `httptest.Server` to mock MCIAS; test login success,
login failure (401), validate success, validate with revoked token,
cache hit within TTL, cache miss after TTL
### Step 3.2: Auth middleware (`internal/server`)
**Acceptance criteria:**
- `middleware.RequireAuth(authClient)` extracts `Authorization: Bearer <token>`
header, calls `authClient.ValidateToken`, injects claims into
`context.Context`
- Missing/invalid token returns OCI error format:
`{"errors":[{"code":"UNAUTHORIZED","message":"..."}]}` with HTTP 401
- 401 responses include `WWW-Authenticate: Bearer realm="/v2/token",service="<service_name>"`
header
- Claims retrievable from context via `auth.ClaimsFromContext(ctx)`
- Tests: valid token passes, missing header returns 401 with
WWW-Authenticate, invalid token returns 401, claims accessible in handler
### Step 3.3: Token endpoint (`GET /v2/token`)
**Acceptance criteria:**
- Accepts HTTP Basic auth (username:password from `Authorization` header)
- Accepts `scope` and `service` query parameters (logged but not used for
scoping)
- Calls `authClient.Login(username, password)`
- On success: returns `{"token":"<jwt>","expires_in":<seconds>,"issued_at":"<rfc3339>"}`
- On failure: returns `{"errors":[{"code":"UNAUTHORIZED","message":"..."}]}`
with HTTP 401
- Tests: valid credentials, invalid credentials, missing auth header
### Step 3.4: Version check endpoint (`GET /v2/`)
**Acceptance criteria:**
- Requires valid bearer token (via RequireAuth middleware)
- Returns `200 OK` with body `{}`
- Unauthenticated requests return 401 with WWW-Authenticate header
(this is the entry point for the OCI auth handshake)
- Tests: authenticated returns 200, unauthenticated returns 401 with
correct WWW-Authenticate header
---
## Phase 4: Policy Engine
Implement the registry-specific authorization engine.
### Step 4.1: Core policy types and evaluation (`internal/policy`)
**Acceptance criteria:**
- Types defined per `ARCHITECTURE.md` §4: `Action`, `Effect`, `PolicyInput`,
`Rule`
- All action constants: `registry:version_check`, `registry:pull`,
`registry:push`, `registry:delete`, `registry:catalog`, `policy:manage`
- `Evaluate(input PolicyInput, rules []Rule) (Effect, *Rule)`:
- Sorts rules by priority (stable)
- Collects all matching rules
- Deny-wins: any matching deny → return deny
- First allow → return allow
- Default deny if no match
- Rule matching: all populated fields ANDed; empty fields are wildcards
- `Repositories` glob matching via `path.Match`; empty list = match all
- When `input.Repository` is empty (global ops), only rules with empty
`Repositories` match
- Tests: admin wildcard, user allow, system account deny (no rules),
exact repo match, glob match (`production/*`), deny-wins over allow,
priority ordering, empty repository global operation, multiple
matching rules
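The evaluation order above can be sketched as follows. Types are trimmed to the fields the matching logic needs, and ascending-priority sort order is an assumption (the plan only says the sort is stable):

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

type Effect string

const (
	Allow Effect = "allow"
	Deny  Effect = "deny"
)

type Rule struct {
	ID           int64
	Priority     int
	Effect       Effect
	Actions      []string // empty = any action
	Repositories []string // globs; empty = wildcard
}

type PolicyInput struct {
	Action     string
	Repository string
}

func (r Rule) matches(in PolicyInput) bool {
	if len(r.Actions) > 0 && !contains(r.Actions, in.Action) {
		return false
	}
	if len(r.Repositories) == 0 {
		return true // empty list matches everything, including global ops
	}
	if in.Repository == "" {
		return false // global op: only repo-wildcard rules match
	}
	for _, g := range r.Repositories {
		if ok, _ := path.Match(g, in.Repository); ok {
			return true
		}
	}
	return false
}

func contains(ss []string, s string) bool {
	for _, v := range ss {
		if v == s {
			return true
		}
	}
	return false
}

// Evaluate: stable priority sort, deny-wins among matches, first allow
// otherwise, default deny when nothing matches.
func Evaluate(in PolicyInput, rules []Rule) (Effect, *Rule) {
	sorted := append([]Rule(nil), rules...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })
	var allow *Rule
	for i := range sorted {
		r := &sorted[i]
		if !r.matches(in) {
			continue
		}
		if r.Effect == Deny {
			return Deny, r // any matching deny short-circuits
		}
		if allow == nil {
			allow = r
		}
	}
	if allow != nil {
		return Allow, allow
	}
	return Deny, nil
}

func main() {
	rules := []Rule{
		{ID: 1, Priority: 10, Effect: Allow, Actions: []string{"registry:pull"}, Repositories: []string{"production/*"}},
		{ID: 2, Priority: 5, Effect: Deny, Actions: []string{"registry:pull"}, Repositories: []string{"production/secret"}},
	}
	eff, _ := Evaluate(PolicyInput{Action: "registry:pull", Repository: "production/secret"}, rules)
	fmt.Println(eff) // deny wins over the glob allow
}
```

Note that `path.Match` globs do not cross `/`, so `production/*` matches `production/app` but not `production/team/app`.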
### Step 4.2: Built-in defaults (`internal/policy`)
**Acceptance criteria:**
- `DefaultRules()` returns the built-in rules per `ARCHITECTURE.md` §4:
admin wildcard, human user full access, version check allow
- Default rules use negative IDs (-1, -2, -3)
- Default rules have priority 0
- Tests: admin gets allow for all actions, user gets allow for pull/push/
delete/catalog, system account gets deny for everything except
version_check, user gets allow for version_check
### Step 4.3: Policy engine wrapper with DB integration (`internal/policy`)
**Acceptance criteria:**
- `Engine` struct wraps `Evaluate` with DB-backed rule loading
- `engine.SetRules(rules)` caches rules in memory (merges with defaults)
- `engine.Evaluate(input)` calls stateless `Evaluate` with cached rules
- Thread-safe: `sync.RWMutex` protects the cached rule set
- `engine.Reload(db)` loads enabled rules from `policy_rules` table and
calls `SetRules`
- Tests: engine with only defaults, engine with custom rules, reload
picks up new rules, disabled rules excluded
### Step 4.4: Policy middleware (`internal/server`)
**Acceptance criteria:**
- `middleware.RequirePolicy(engine, action)` middleware:
- Extracts claims from context (set by RequireAuth)
- Extracts repository name from URL path (empty for global ops)
- Assembles `PolicyInput`
- Calls `engine.Evaluate`
- On deny: returns OCI error `{"errors":[{"code":"DENIED","message":"..."}]}`
with HTTP 403; writes `policy_deny` audit event
- On allow: proceeds to handler
- Tests: admin allowed, user allowed, system account denied (no rules),
system account with matching rule allowed, deny rule blocks access
---
## Phase 5: OCI API — Pull Path
Implement the read side of the OCI Distribution Spec. Requires Phase 2
(storage), Phase 3 (auth), and Phase 4 (policy).
### Step 5.1: OCI handler scaffolding (`internal/oci`)
**Acceptance criteria:**
- `oci.NewHandler(db, storage, authClient, policyEngine)` constructor
- Chi router with `/v2/` prefix; all routes wrapped in RequireAuth middleware
- Repository name extracted from URL path; names may contain `/`
(chi wildcard catch-all)
- OCI error response helper: `writeOCIError(w, code, status, message)`
producing `{"errors":[{"code":"...","message":"..."}]}` format
- All OCI handlers share the same `*oci.Handler` receiver
- Tests: error response format matches OCI spec
### Step 5.2: Manifest pull (`GET /v2/<name>/manifests/<reference>`)
**Acceptance criteria:**
- Policy check: `registry:pull` action on the target repository
- If `<reference>` is a tag: look up tag → manifest in DB
- If `<reference>` is a digest (`sha256:...`): look up manifest by
digest in DB
- Returns manifest content with:
- `Content-Type` header set to manifest's `media_type`
- `Docker-Content-Digest` header set to manifest's digest
- `Content-Length` header set to manifest's size
- `HEAD` variant returns same headers but no body
- Repository not found → `NAME_UNKNOWN` (404)
- Manifest not found → `MANIFEST_UNKNOWN` (404)
- Writes `manifest_pulled` audit event
- Tests: pull by tag, pull by digest, HEAD returns headers only,
nonexistent repo, nonexistent tag, nonexistent digest
### Step 5.3: Blob download (`GET /v2/<name>/blobs/<digest>`)
**Acceptance criteria:**
- Policy check: `registry:pull` action on the target repository
- Verify blob exists in `blobs` table AND is referenced by a manifest
in the target repository (via `manifest_blobs`)
- Open blob from storage, stream to response with:
- `Content-Type: application/octet-stream`
- `Docker-Content-Digest` header
- `Content-Length` header
- `HEAD` variant returns headers only
- Blob not in repo → `BLOB_UNKNOWN` (404)
- Tests: download blob, HEAD blob, blob not found, blob exists
globally but not in this repo → 404
### Step 5.4: Tag listing (`GET /v2/<name>/tags/list`)
**Acceptance criteria:**
- Policy check: `registry:pull` action on the target repository
- Returns `{"name":"<repo>","tags":["tag1","tag2",...]}` sorted
alphabetically
- Pagination via `n` (limit) and `last` (cursor) query parameters
per OCI spec
- If more results: `Link` header with next page URL
- Empty tag list returns `{"name":"<repo>","tags":[]}`
- Repository not found → `NAME_UNKNOWN` (404)
- Tests: list tags, pagination, empty repo, nonexistent repo
### Step 5.5: Catalog listing (`GET /v2/_catalog`)
**Acceptance criteria:**
- Policy check: `registry:catalog` action (no repository context)
- Returns `{"repositories":["repo1","repo2",...]}` sorted alphabetically
- Pagination via `n` and `last` query parameters
- If more results: `Link` header with next page URL
- Tests: list repos, pagination, empty registry
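The `n`/`last` scheme used by both `tags/list` and `_catalog` can be sketched as one cursor helper over an already-sorted slice (the function name is illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// paginate returns the page after cursor `last`, at most n entries, plus
// the value to use as `last` on the next request ("" when done). A
// non-positive n is treated as unlimited.
func paginate(sorted []string, n int, last string) (page []string, next string) {
	start := 0
	if last != "" {
		// first entry strictly after the cursor
		start = sort.SearchStrings(sorted, last)
		if start < len(sorted) && sorted[start] == last {
			start++
		}
	}
	end := start + n
	if n <= 0 || end > len(sorted) {
		end = len(sorted)
	}
	page = sorted[start:end]
	if end < len(sorted) {
		next = sorted[end-1] // handler puts this in the Link header URL
	}
	return page, next
}

func main() {
	repos := []string{"alpha", "beta", "gamma", "delta"}
	sort.Strings(repos)
	p, next := paginate(repos, 2, "")
	fmt.Println(p, next) // [alpha beta] beta
}
```

Cursor pagination (as opposed to offset) stays correct even if entries are inserted or deleted between page requests.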
---
## Phase 6: OCI API — Push Path
Implement blob uploads and manifest pushes. Requires Phase 5 (shared
OCI infrastructure).
### Step 6.1: Blob upload — initiate (`POST /v2/<name>/blobs/uploads/`)
**Acceptance criteria:**
- Policy check: `registry:push` action on the target repository
- Creates repository if it doesn't exist (implicit creation)
- Generates upload UUID (`crypto/rand`)
- Inserts row in `uploads` table
- Creates temp file via `storage.StartUpload(uuid)`
- Returns `202 Accepted` with:
- `Location: /v2/<name>/blobs/uploads/<uuid>` header
- `Docker-Upload-UUID: <uuid>` header
- `Range: 0-0` header
- Tests: initiate returns 202 with correct headers, implicit repo
creation, UUID is unique
### Step 6.2: Blob upload — chunked and monolithic
**Acceptance criteria:**
- `PATCH /v2/<name>/blobs/uploads/<uuid>`:
- Appends request body to the upload's temp file
- Updates `byte_offset` in `uploads` table
- `Content-Range` header processed if present
- Returns `202 Accepted` with updated `Range` and `Location` headers
- `PUT /v2/<name>/blobs/uploads/<uuid>?digest=<digest>`:
- If request body is non-empty, appends it first (monolithic upload)
- Calls `BlobWriter.Commit(digest)`
- On digest mismatch: `DIGEST_INVALID` (400)
- Inserts row in `blobs` table (or no-op if digest already exists)
- Deletes row from `uploads` table
- Returns `201 Created` with:
- `Location: /v2/<name>/blobs/<digest>` header
- `Docker-Content-Digest` header
- Writes `blob_uploaded` audit event
- `GET /v2/<name>/blobs/uploads/<uuid>`:
- Returns `204 No Content` with `Range: 0-<offset>` header
- `DELETE /v2/<name>/blobs/uploads/<uuid>`:
- Cancels upload: deletes temp file, removes `uploads` row
- Returns `204 No Content`
- Upload UUID not found → `BLOB_UPLOAD_UNKNOWN` (404)
- Tests: monolithic upload (POST then PUT with body), chunked upload
(POST → PATCH → PATCH → PUT), digest mismatch, check progress,
cancel upload, nonexistent UUID
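Chunk bookkeeping in the PATCH handler hinges on the `Content-Range` check against the stored `byte_offset`. A minimal sketch, assuming the simple `<start>-<end>` form; the exact error-to-status mapping (400 vs 416) is not pinned down here:

```go
package main

import (
	"fmt"
)

// parseContentRange handles the "<start>-<end>" Content-Range form used
// by chunked PATCH uploads and rejects chunks that don't resume at the
// upload's current offset.
func parseContentRange(h string, offset int64) (start, end int64, err error) {
	if _, err = fmt.Sscanf(h, "%d-%d", &start, &end); err != nil {
		return 0, 0, fmt.Errorf("oci: malformed Content-Range %q: %w", h, err)
	}
	if start != offset {
		return 0, 0, fmt.Errorf("oci: chunk starts at %d, upload offset is %d", start, offset)
	}
	if end < start {
		return 0, 0, fmt.Errorf("oci: invalid range %q", h)
	}
	return start, end, nil
}

func main() {
	s, e, err := parseContentRange("0-99", 0)
	fmt.Println(s, e, err)
}
```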
### Step 6.3: Manifest push (`PUT /v2/<name>/manifests/<reference>`)
**Acceptance criteria:**
- Policy check: `registry:push` action on the target repository
- Implements the full manifest push flow per `ARCHITECTURE.md` §5:
1. Parse manifest JSON; reject malformed → `MANIFEST_INVALID` (400)
2. Compute SHA-256 digest of raw bytes
3. If reference is a digest, verify match → `DIGEST_INVALID` (400)
4. Parse layer and config descriptors from manifest
5. Verify all referenced blobs exist → `MANIFEST_BLOB_UNKNOWN` (400)
6. Single SQLite transaction:
a. Create repository if not exists
b. Insert/update manifest row
c. Populate `manifest_blobs` join table
d. If reference is a tag, insert/update tag row
7. Return `201 Created` with `Docker-Content-Digest` and `Location`
headers
- Writes `manifest_pushed` audit event (includes repo, tag, digest)
- Tests: push by tag, push by digest, push updates existing tag (atomic
tag move), missing blob → 400, malformed manifest → 400, digest
mismatch → 400, re-push same manifest (idempotent)
---
## Phase 7: OCI API — Delete Path
Implement manifest and blob deletion.
### Step 7.1: Manifest delete (`DELETE /v2/<name>/manifests/<digest>`)
**Acceptance criteria:**
- Policy check: `registry:delete` action on the target repository
- Reference must be a digest (not a tag) → `UNSUPPORTED` (405) if tag
- Deletes manifest row; cascades to `manifest_blobs` and `tags`
(ON DELETE CASCADE)
- Returns `202 Accepted`
- Writes `manifest_deleted` audit event
- Tests: delete by digest, attempt delete by tag → 405, nonexistent
manifest → `MANIFEST_UNKNOWN` (404), cascading tag deletion verified
### Step 7.2: Blob delete (`DELETE /v2/<name>/blobs/<digest>`)
**Acceptance criteria:**
- Policy check: `registry:delete` action on the target repository
- Verify blob exists and is referenced in this repository
- Removes the `manifest_blobs` rows for this repo's manifests (does NOT
delete the blob row or file — that's GC's job, since other repos may
reference it)
- Returns `202 Accepted`
- Writes `blob_deleted` audit event
- Tests: delete blob, blob still on disk (not GC'd yet), blob not in
repo → `BLOB_UNKNOWN` (404)
---
## Phase 8: Admin REST API
Implement management endpoints under `/v1/`. Can be **batched** with
Phase 5 (OCI pull) since both depend on Phase 4 but not on each other.
### Step 8.1: Auth endpoints (`/v1/auth`)
**Acceptance criteria:**
- `POST /v1/auth/login`: accepts `{"username":"...","password":"..."}`
body, forwards to MCIAS, returns `{"token":"...","expires_at":"..."}`
- `POST /v1/auth/logout`: requires bearer token, calls MCIAS token
revocation (if supported), returns `204 No Content`
- `GET /v1/health`: returns `{"status":"ok"}` (no auth required)
- Error format: `{"error":"..."}` (platform standard)
- Tests: login success, login failure, logout, health check
### Step 8.2: Repository management endpoints
**Acceptance criteria:**
- `GET /v1/repositories`: list repositories with metadata
(tag count, manifest count, total size). Paginated. Requires bearer.
- `GET /v1/repositories/{name}`: repository detail (tags with digests,
manifests, total size). Requires bearer. Name may contain `/`.
- `DELETE /v1/repositories/{name}`: delete repository and all associated
manifests, tags, manifest_blobs rows. Requires admin role. Writes
`repo_deleted` audit event.
- Tests: list repos, repo detail, delete repo cascades correctly,
non-admin delete → 403
### Step 8.3: Policy management endpoints
**Acceptance criteria:**
- Full CRUD per `ARCHITECTURE.md` §6:
- `GET /v1/policy/rules`: list all rules (paginated)
- `POST /v1/policy/rules`: create rule from JSON body
- `GET /v1/policy/rules/{id}`: get single rule
- `PATCH /v1/policy/rules/{id}`: update priority, enabled, description
- `DELETE /v1/policy/rules/{id}`: delete rule
- All endpoints require admin role
- Write operations trigger policy engine reload
- Audit events: `policy_rule_created`, `policy_rule_updated`,
`policy_rule_deleted`
- Input validation: priority must be >= 1 (0 reserved for built-ins),
actions must be valid constants, effect must be "allow" or "deny"
- Tests: full CRUD cycle, validation errors, non-admin → 403,
engine reload after create/update/delete
### Step 8.4: Audit log endpoint
**Acceptance criteria:**
- `GET /v1/audit`: list audit events. Requires admin role.
- Query parameters: `event_type`, `actor_id`, `repository`, `since`,
`until`, `n` (limit, default 50), `offset`
- Returns JSON array of audit events
- Tests: list events, filter by type, filter by repository, pagination
### Step 8.5: Garbage collection endpoints
**Acceptance criteria:**
- `POST /v1/gc`: trigger async GC run. Requires admin role. Returns
`202 Accepted` with `{"id":"<gc-run-id>"}`. Returns `409 Conflict`
if GC is already running.
- `GET /v1/gc/status`: returns current/last GC status:
`{"running":bool,"last_run":{"started_at":"...","completed_at":"...",
"blobs_removed":N,"bytes_freed":N}}`
- Writes `gc_started` and `gc_completed` audit events
- Tests: trigger GC, check status, concurrent trigger → 409
---
## Phase 9: Garbage Collection
Implement the two-phase GC algorithm. Requires Phase 7 (delete path
creates unreferenced blobs).
### Step 9.1: GC engine (`internal/gc`)
**Acceptance criteria:**
- `gc.New(db, storage)` constructor
- `gc.Run(ctx)` executes the two-phase algorithm per `ARCHITECTURE.md` §9:
- Phase 1 (DB): acquire lock, begin write tx, find unreferenced blobs,
delete blob rows, commit
- Phase 2 (filesystem): delete blob files, remove empty prefix dirs,
release lock
- Registry-wide lock (`sync.Mutex`) blocks new blob uploads during phase 1
- Lock integration: upload initiation (Step 6.1) must check the GC lock
before creating new uploads
- Returns `GCResult{BlobsRemoved int, BytesFreed int64, Duration time.Duration}`
- `gc.Reconcile(ctx)` scans filesystem, deletes files with no `blobs` row
(crash recovery)
- Tests: GC removes unreferenced blobs, GC does not remove referenced blobs,
concurrent GC rejected, reconcile cleans orphaned files
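The lock interplay between GC and uploads can be sketched as a small gate. The plan specifies a `sync.Mutex`; this sketch wraps it with a flag so the same primitive can also answer the 409-on-concurrent-trigger and "may uploads start?" questions (the `gcGate` type and method names are illustrative):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// gcGate lets GC phase 1 block new blob uploads registry-wide and
// rejects a second concurrent GC run.
type gcGate struct {
	mu      sync.Mutex
	running bool
}

var errGCRunning = errors.New("gc already running")

func (g *gcGate) begin() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running {
		return errGCRunning // caller maps this to 409 Conflict
	}
	g.running = true
	return nil
}

func (g *gcGate) end() {
	g.mu.Lock()
	g.running = false
	g.mu.Unlock()
}

// allowUpload is what upload initiation (Step 6.1) consults before
// creating a new upload row and temp file.
func (g *gcGate) allowUpload() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return !g.running
}

func main() {
	var g gcGate
	fmt.Println(g.begin() == nil, g.allowUpload()) // true false
}
```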
### Step 9.2: Wire GC into server and CLI
**Acceptance criteria:**
- `POST /v1/gc` and gRPC `GarbageCollect` call `gc.Run` in a goroutine
- GC status tracked in memory (running flag, last result)
- `mcrctl gc` triggers via REST/gRPC
- `mcrctl gc status` fetches status
- `mcrctl gc --reconcile` runs filesystem reconciliation
- Tests: end-to-end GC via API trigger
---
## Phase 10: gRPC Admin API
Implement the protobuf definitions and gRPC server. Requires Phase 8
(admin REST, to share business logic).
### Step 10.1: Proto definitions
**Acceptance criteria:**
- Proto files per `ARCHITECTURE.md` §7:
`registry.proto`, `policy.proto`, `audit.proto`, `admin.proto`,
`common.proto`
- All RPCs defined per §7 service definitions table
- `buf lint` passes
- `make proto` generates Go stubs in `gen/mcr/v1/`
- Generated code committed
### Step 10.2: gRPC server implementation (`internal/grpcserver`)
**Acceptance criteria:**
- `RegistryService`: `ListRepositories`, `GetRepository`,
`DeleteRepository`, `GarbageCollect`, `GetGCStatus`
- `PolicyService`: full CRUD for policy rules
- `AuditService`: `ListAuditEvents`
- `AdminService`: `Health`
- All RPCs call the same business logic as REST handlers (shared
`internal/db` and `internal/gc` packages)
- Tests: at least one RPC per service via `grpc.NewServer` + in-process
client
### Step 10.3: Interceptor chain
**Acceptance criteria:**
- Interceptor chain per `ARCHITECTURE.md` §7:
Request Logger → Auth Interceptor → Admin Interceptor → Handler
- Auth interceptor extracts `authorization` metadata, validates via
MCIAS, injects claims. `Health` bypasses auth.
- Admin interceptor requires admin role for GC, policy, delete, audit.
- Request logger logs method, peer IP, status code, duration. Never
logs the authorization metadata value.
- gRPC errors: `codes.Unauthenticated` for missing/invalid token,
`codes.PermissionDenied` for insufficient role
- Tests: unauthenticated → Unauthenticated, non-admin on admin
RPC → PermissionDenied, Health bypasses auth
### Step 10.4: TLS and server startup
**Acceptance criteria:**
- gRPC server uses same TLS cert/key as REST server
- `tls.Config.MinVersion = tls.VersionTLS13`
- Server starts on `grpc_addr` from config; disabled if `grpc_addr`
is empty
- Graceful shutdown: `grpcServer.GracefulStop()` called on SIGINT/SIGTERM
- Tests: server starts and accepts TLS connections
---
## Phase 11: CLI Tool (mcrctl)
Implement the admin CLI. Can be **batched** with Phase 12 (web UI)
since both depend on Phase 10 but not on each other.
### Step 11.1: Client and connection setup
**Acceptance criteria:**
- Global flags: `--server` (REST URL), `--grpc` (gRPC address),
`--token` (bearer token), `--ca-cert` (custom CA)
- Token can be loaded from `MCR_TOKEN` env var
- gRPC client with TLS, using same CA cert if provided
- REST client with TLS, `Authorization: Bearer` header
- Connection errors produce clear messages
### Step 11.2: Status and repository commands
**Acceptance criteria:**
- `mcrctl status` → calls `GET /v1/health`, prints status
- `mcrctl repo list` → calls `GET /v1/repositories`, prints table
- `mcrctl repo delete <name>` → calls `DELETE /v1/repositories/<name>`,
confirms before deletion
- Output: human-readable by default, `--json` for machine-readable
- Tests: at minimum, flag parsing tests
### Step 11.3: Policy, audit, GC, and snapshot commands
**Acceptance criteria:**
- `mcrctl policy list|create|update|delete` → full CRUD via REST/gRPC
- `mcrctl policy create` accepts `--json` flag for rule body
- `mcrctl audit tail [--n N]` → calls `GET /v1/audit`
- `mcrctl gc` → calls `POST /v1/gc`
- `mcrctl gc status` → calls `GET /v1/gc/status`
- `mcrctl gc --reconcile` → calls reconciliation endpoint
- `mcrctl snapshot` → triggers database backup
- Tests: flag parsing, output formatting
---
## Phase 12: Web UI
Implement the HTMX-based web interface. Requires Phase 10 (gRPC).
### Step 12.1: Web server scaffolding
**Acceptance criteria:**
- `cmd/mcr-web/` binary reads `[web]` config section
- Connects to mcrsrv via gRPC at `web.grpc_addr`
- Go `html/template` with `web/templates/layout.html` base template
- Static files embedded via `//go:embed` (`web/static/`: CSS, htmx)
- CSRF protection: signed double-submit cookies on POST/PUT/PATCH/DELETE
- Session cookie: `HttpOnly`, `Secure`, `SameSite=Strict`, stores
MCIAS JWT
- Chi router with middleware chain
### Step 12.2: Login and authentication
**Acceptance criteria:**
- `/login` page with username/password form
- Form submission POSTs to mcr-web, which calls MCIAS login via mcrsrv
gRPC (or directly via MCIAS client)
- On success: sets session cookie, redirects to `/`
- On failure: re-renders login with error message
- Logout link clears session cookie
### Step 12.3: Dashboard and repository browsing
**Acceptance criteria:**
- `/` dashboard: repository count, total size, recent pushes
(last 10 `manifest_pushed` audit events)
- `/repositories` list: table with name, tag count, manifest count,
total size
- `/repositories/{name}` detail: tag list (name → digest), manifest
list (digest, media type, size, created), layer list
- `/repositories/{name}/manifests/{digest}` detail: full manifest
JSON, referenced layers with sizes
- All data fetched from mcrsrv via gRPC
### Step 12.4: Policy management (admin only)
**Acceptance criteria:**
- `/policies` page: list all policy rules in a table
- Create form: HTMX form that submits new rule (priority, description,
effect, actions, account types, subject UUID, repositories)
- Edit: inline HTMX toggle for enabled/disabled, edit priority/description
- Delete: confirm dialog, HTMX delete
- Non-admin users see "Access denied" or are redirected
### Step 12.5: Audit log viewer (admin only)
**Acceptance criteria:**
- `/audit` page: paginated table of audit events
- Filters: event type dropdown, repository name, date range
- HTMX partial page updates for filter changes
- Non-admin users see "Access denied"
---
## Phase 13: Deployment Artifacts
Package everything for production deployment.
### Step 13.1: Dockerfile
**Acceptance criteria:**
- Multi-stage build per `ARCHITECTURE.md` §14:
Builder `golang:1.25-alpine`, runtime `alpine:3.21`
- `CGO_ENABLED=0`, `-trimpath -ldflags="-s -w"`
- Builds all three binaries
- Runtime: non-root user `mcr` (uid 10001)
- `EXPOSE 8443 9443`
- `VOLUME /srv/mcr`
- `ENTRYPOINT ["mcrsrv"]`, `CMD ["server", "--config", "/srv/mcr/mcr.toml"]`
- `make docker` builds image with version tag
### Step 13.2: systemd units
**Acceptance criteria:**
- `deploy/systemd/mcr.service`: registry server unit with security
hardening per engineering standards (`NoNewPrivileges`, `ProtectSystem`,
`ReadWritePaths=/srv/mcr`, etc.)
- `deploy/systemd/mcr-web.service`: web UI unit with
`ReadOnlyPaths=/srv/mcr`
- `deploy/systemd/mcr-backup.service`: oneshot backup unit running
`mcrsrv snapshot`
- `deploy/systemd/mcr-backup.timer`: daily 02:00 UTC with 5-min jitter
- All units run as `User=mcr`, `Group=mcr`
### Step 13.3: Install script and example configs
**Acceptance criteria:**
- `deploy/scripts/install.sh`: idempotent script that creates system
user/group, installs binaries to `/usr/local/bin/`, creates `/srv/mcr/`
directory structure, installs example config if none exists, installs
systemd units and reloads daemon
- `deploy/examples/mcr.toml` with annotated production defaults
- `deploy/examples/mcr-dev.toml` with local development defaults
- Script tested: runs twice without error (idempotent)