MCR Runbook
Service Overview
MCR (Metacircular Container Registry) is an OCI Distribution Spec-compliant container registry for the Metacircular platform. It stores and serves container images, with authentication delegated to MCIAS and a local policy engine for fine-grained access control.
MCR runs as two containers:
- mcr-api -- the registry server. Exposes OCI Distribution endpoints and an admin REST API on port 8443 (HTTPS), plus a gRPC admin API on port 9443. Handles blob storage, manifest management, and token-based authentication via MCIAS.
- mcr-web -- the web UI. Communicates with mcr-api via gRPC on port 9443. Provides repository/tag browsing and ACL policy management for administrators. Listens on port 8080. Guest accounts are blocked at login; only admin and user roles can access the web interface.
Both are fronted by MC-Proxy for TLS routing. Metadata is stored in
SQLite; blobs are stored as content-addressed files on the filesystem
under /srv/mcr/layers/.
Health Checks
REST
curl -k https://localhost:8443/v1/health
Expected: HTTP 200.
gRPC
Use the AdminService.Health RPC on port 9443. This method is public
(no auth required).
OCI Version Check
curl -k https://localhost:8443/v2/
Expected: HTTP 401 with WWW-Authenticate header (confirms the OCI
endpoint is alive and responding). An authenticated request returns
HTTP 200 with {}.
CLI
mcrctl status --addr https://localhost:8443
Expected output: ok
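The REST and OCI checks above can be run in one pass. The following is a minimal sketch, assuming curl is installed; the function names http_code, check_code, and mcr_health are illustrative and are not part of MCR or mcrctl.

```shell
#!/bin/sh
# Illustrative helpers; none of these function names come from MCR itself.

# http_code URL: print only the HTTP status code (curl prints 000 if the
# connection fails). -k skips certificate verification, as in the checks above.
http_code() {
    curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# check_code LABEL EXPECTED ACTUAL: report one result, return 1 on mismatch.
check_code() {
    if [ "$3" = "$2" ]; then
        echo "ok   $1 ($3)"
    else
        echo "FAIL $1 (expected $2, got $3)"
        return 1
    fi
}

# mcr_health [BASE_URL]: run the REST health check and the unauthenticated
# OCI version check, returning non-zero if either one is off.
mcr_health() {
    base=${1:-https://localhost:8443}
    rc=0
    check_code "REST /v1/health" 200 "$(http_code "$base/v1/health")" || rc=1
    check_code "OCI /v2/ (no auth)" 401 "$(http_code "$base/v2/")" || rc=1
    return "$rc"
}
```

Calling mcr_health with no argument checks https://localhost:8443; a different base URL can be passed as the first argument.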
Common Operations
Start the Service (MCP)
- Deploy via MCP:
mcp deploy mcr
- Verify health:
curl -k https://localhost:8443/v1/health
Start the Service (Docker Compose)
- Verify config exists:
ls /srv/mcr/mcr.toml
- Start the containers:
docker compose -f deploy/docker/docker-compose-rift.yml up -d
- Verify health:
curl -k https://localhost:8443/v1/health
Stop the Service
Via MCP:
mcp stop mcr
Via Docker Compose:
docker compose -f deploy/docker/docker-compose-rift.yml stop
MCR handles SIGTERM gracefully: it stops accepting new connections, drains in-flight requests (including ongoing uploads) for up to 60 seconds, then force-closes remaining connections and exits.
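One caveat for the Docker Compose path: docker compose stop waits 10 seconds by default before sending SIGKILL, which is shorter than the 60-second drain window, unless the compose file sets stop_grace_period. The sketch below passes an explicit timeout. The 70-second default and the DOCKER override (present only to allow a dry run) are assumptions, not part of the MCR deployment.

```shell
#!/bin/sh
# stop_mcr [GRACE_SECONDS]: stop the MCR containers, giving them longer than
# the 60-second drain window before Docker falls back to SIGKILL.
# DOCKER is a hypothetical override so the command can be dry-run with echo.
stop_mcr() {
    grace=${1:-70}
    "${DOCKER:-docker}" compose -f deploy/docker/docker-compose-rift.yml \
        stop --timeout "$grace"
}
```

Setting DOCKER=echo before calling stop_mcr prints the command instead of running it.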
Restart the Service
Via MCP:
mcp restart mcr
Via Docker Compose:
docker compose -f deploy/docker/docker-compose-rift.yml restart
Verify health after restart:
curl -k https://localhost:8443/v1/health
Backup (Snapshot)
MCR backups have two parts: the SQLite database (metadata) and the blob filesystem. The database snapshot alone is usable but incomplete -- missing blobs return 404 on pull.
- Run the snapshot command:
mcrsrv snapshot --config /srv/mcr/mcr.toml
- The snapshot is saved to /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db.
- Verify the snapshot file exists and has a reasonable size:
ls -lh /srv/mcr/backups/
- For a complete backup, also copy the blob directory:
rsync -a /srv/mcr/layers/ /backup/mcr/layers/
A systemd timer (mcr-backup.timer) runs the database snapshot daily
at 02:00 UTC with 5-minute jitter.
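The "exists and has a reasonable size" check can be scripted, for example to run after the timer fires. This is a minimal sketch: verify_snapshot and its 4096-byte default floor are illustrative assumptions, and a size check alone does not prove the snapshot is a valid database.

```shell
#!/bin/sh
# verify_snapshot FILE [MIN_BYTES]: succeed only if FILE is named like an MCR
# snapshot (mcr-YYYYMMDD-HHMMSS.db), exists, and is at least MIN_BYTES long.
# The function name and the 4096-byte default are illustrative assumptions.
verify_snapshot() {
    file=$1
    min=${2:-4096}
    case $(basename "$file") in
        mcr-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9].db) ;;
        *) echo "unexpected snapshot name: $file" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min" ] || { echo "too small ($size bytes): $file" >&2; return 1; }
    echo "ok: $file ($size bytes)"
}
```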
Restore from Snapshot
- Stop the service (see above).
- Back up the current database:
cp /srv/mcr/mcr.db /srv/mcr/mcr.db.pre-restore
- Copy the snapshot into place:
cp /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db /srv/mcr/mcr.db
- If restoring blobs as well:
rsync -a /backup/mcr/layers/ /srv/mcr/layers/
- Start the service (see above).
- Verify the service is healthy:
curl -k https://localhost:8443/v1/health
- Verify an image pull works:
docker pull mcr.svc.mcp.metacircular.net:8443/<repo>:<tag>
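Because snapshot names embed a sortable timestamp, the most recent snapshot can be picked by name rather than typed by hand. This is a minimal sketch; latest_snapshot is an illustrative helper, not an mcrsrv or mcrctl command.

```shell
#!/bin/sh
# latest_snapshot [DIR]: print the newest mcr-YYYYMMDD-HHMMSS.db in DIR.
# The names sort chronologically, and the shell expands the glob in sorted
# order, so the last match is the newest snapshot.
latest_snapshot() {
    dir=${1:-/srv/mcr/backups}
    newest=
    for f in "$dir"/mcr-*.db; do
        # When nothing matches, the pattern is left unexpanded; -f filters it out.
        if [ -f "$f" ]; then newest=$f; fi
    done
    [ -n "$newest" ] || { echo "no snapshots in $dir" >&2; return 1; }
    echo "$newest"
}
```

With the service stopped and the current database backed up, the copy step could then read cp "$(latest_snapshot)" /srv/mcr/mcr.db. If mcr.db-wal or mcr.db-shm files sit beside the old database, moving them aside together with it is a common SQLite precaution; whether MCR requires this is an assumption to confirm.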
Log Inspection
Container logs (mcr-api):
docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 100 mcr-api
Container logs (mcr-web):
docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 100 mcr-web
Follow logs in real time:
docker compose -f deploy/docker/docker-compose-rift.yml logs -f mcr-api mcr-web
Via MCP:
mcp logs mcr
MCR logs to stderr as structured text (slog). Log level is configured
via [log] level in mcr.toml (debug, info, warn, error).
Garbage Collection
Garbage collection removes unreferenced blobs -- blobs no longer referenced by any manifest. GC acquires a registry-wide lock that blocks new blob uploads for the duration of the mark-and-sweep phase. Pulls are not blocked.
- Trigger GC via CLI:
mcrctl gc --addr https://mcr.svc.mcp.metacircular.net:8443
- Check GC status:
mcrctl gc status --addr https://mcr.svc.mcp.metacircular.net:8443
- GC can also be triggered via the REST API:
curl -k -X POST -H "Authorization: Bearer <token>" https://localhost:8443/v1/gc
If a previous GC crashed after the database sweep but before filesystem cleanup, orphaned files may remain on disk. Run reconciliation to clean them up:
mcrctl gc --reconcile --addr https://mcr.svc.mcp.metacircular.net:8443
Incident Procedures
Database Corruption
Symptoms: server fails to start with SQLite errors, or API requests return unexpected errors.
- Stop the service.
- Check for WAL/SHM files alongside the database:
ls -la /srv/mcr/mcr.db*
- Attempt an integrity check:
sqlite3 /srv/mcr/mcr.db "PRAGMA integrity_check;"
- If integrity check fails, restore from the most recent snapshot:
cp /srv/mcr/mcr.db /srv/mcr/mcr.db.corrupt
cp /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db /srv/mcr/mcr.db
- Start the service and verify health.
- Note: blobs on the filesystem are unaffected by database corruption. Images pushed after the snapshot was taken will be missing from metadata. Their blobs remain on disk and will be cleaned up by GC unless the metadata is re-created.
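sqlite3 prints the single line ok when PRAGMA integrity_check finds no problems and one or more error lines otherwise, so the restore decision can be scripted. This is a minimal sketch, assuming the sqlite3 CLI is installed; integrity_verdict and db_is_healthy are illustrative names.

```shell
#!/bin/sh
# integrity_verdict OUTPUT: succeed only if OUTPUT is exactly "ok".
integrity_verdict() {
    [ "$1" = "ok" ]
}

# db_is_healthy DB_PATH: run the integrity check and interpret the result.
# A missing file or a failure to open the database counts as unhealthy; the
# -f test also stops sqlite3 from creating an empty database at a wrong path.
db_is_healthy() {
    [ -f "$1" ] || return 1
    out=$(sqlite3 "$1" "PRAGMA integrity_check;" 2>&1) || return 1
    integrity_verdict "$out"
}
```

A possible use, with the service stopped: db_is_healthy /srv/mcr/mcr.db || echo "integrity check failed, restore from snapshot".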
TLS Certificate Expiry
Symptoms: health check fails with TLS errors, Docker clients get certificate verification errors on push/pull.
- Check certificate expiry:
openssl x509 -in /srv/mcr/certs/cert.pem -noout -enddate
- Replace the certificate and key files at the paths configured in mcr.toml ([server] tls_cert and tls_key).
- Restart the service to load the new certificate.
- Verify health:
curl -k https://localhost:8443/v1/health
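openssl x509 also accepts -checkend, which exits non-zero if the certificate expires within the given number of seconds; that suits a scripted early warning. This is a minimal sketch: cert_valid_for and the 30-day default are illustrative assumptions. Note that curl -k skips certificate verification, so the health check above confirms the service is up rather than that clients will trust the new certificate.

```shell
#!/bin/sh
# days_to_seconds DAYS: convert a day count to seconds for -checkend.
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# cert_valid_for CERT_PATH [DAYS]: succeed if the certificate will still be
# valid DAYS from now (default 30, an arbitrary example threshold).
cert_valid_for() {
    openssl x509 -in "$1" -noout \
        -checkend "$(days_to_seconds "${2:-30}")" >/dev/null
}
```

A possible use: cert_valid_for /srv/mcr/certs/cert.pem 14 || echo "certificate expires within 14 days".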
MCIAS Outage
Symptoms: push/pull fails with 401 or 502 errors. Authentication cannot complete.
- Confirm MCIAS is unreachable:
curl -k https://svc.metacircular.net:8443/v1/health
- Cached token validation results remain valid for up to 30 seconds after the last successful MCIAS check. Operations using recently-validated tokens may continue briefly.
- Once cached tokens expire, all authenticated operations (push, pull, catalog, admin) will fail until MCIAS recovers.
- The OCI /v2/ version check endpoint still responds (confirms MCR itself is running).
- Escalate to MCIAS (see Escalation below).
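While waiting on escalation, it can help to poll the MCIAS health endpoint and be told when it recovers. This is a minimal sketch; retry and mcias_up are illustrative helpers, and the polling interval in the usage note is an arbitrary example.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds,
# sleeping DELAY_SECONDS between tries; give up after ATTEMPTS tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then return 1; fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# mcias_up: succeed when the MCIAS health endpoint answers without an HTTP
# error (-f makes curl fail on status 400 and above).
mcias_up() {
    curl -k -s -f -o /dev/null --max-time 5 \
        https://svc.metacircular.net:8443/v1/health
}
```

For example, retry 30 10 mcias_up && echo "MCIAS is back" polls every 10 seconds for roughly five minutes.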
Disk Full
Symptoms: blob uploads fail, database writes fail, container may crash.
- Check disk usage:
df -h /srv/mcr/
du -sh /srv/mcr/layers/ /srv/mcr/uploads/ /srv/mcr/mcr.db
- Clean up stale uploads:
ls -la /srv/mcr/uploads/
Remove upload files that are old and have no matching in-progress upload in the database.
- Run garbage collection to reclaim unreferenced blobs:
mcrctl gc --addr https://mcr.svc.mcp.metacircular.net:8443
- If GC does not free enough space, identify large repositories:
mcrctl repo list --addr https://mcr.svc.mcp.metacircular.net:8443
- Delete unused tags or repositories to free space, then run GC again.
- If the disk is completely full and the service cannot start, manually remove orphaned files from /srv/mcr/uploads/ to free enough space for the service to start, then run GC.
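Candidates for the stale-upload cleanup can be listed by age. This is a minimal sketch: stale_uploads and its 24-hour default are illustrative assumptions, and -mmin is supported by GNU and BSD find but is not in POSIX. Age alone does not show that no in-progress upload references a file, so the database check described above still applies before anything is deleted.

```shell
#!/bin/sh
# stale_uploads [DIR] [MINUTES]: list regular files under DIR that were last
# modified more than MINUTES minutes ago (default 1440, i.e. 24 hours; an
# example threshold). This only lists candidates; it deletes nothing.
stale_uploads() {
    find "${1:-/srv/mcr/uploads}" -type f -mmin +"${2:-1440}"
}
```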
Image Push/Pull Failures
Symptoms: docker push or docker pull returns errors.
- Verify the service is running and healthy:
curl -k https://localhost:8443/v1/health
- Test OCI endpoint:
curl -k https://localhost:8443/v2/
Expected: HTTP 401 with WWW-Authenticate header.
- Test authentication:
curl -k -u username:password https://localhost:8443/v2/token?service=mcr
Expected: HTTP 200 with a token response.
- Check if the issue is policy-related (403 Denied):
mcrctl policy list --addr https://mcr.svc.mcp.metacircular.net:8443
Review policy rules for the affected account and repository.
- Check audit log for denied requests:
mcrctl audit tail --n 20 --addr https://mcr.svc.mcp.metacircular.net:8443
- For push failures, verify all referenced blobs exist before pushing the manifest. The error MANIFEST_BLOB_UNKNOWN means a layer was not uploaded before the manifest push.
- Check logs for detailed error information:
docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 50 mcr-api
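The status codes mentioned across this runbook can be collected into a quick triage lookup. This is a sketch only: triage_status is an illustrative helper, and the mapping restates the symptoms described in the sections above rather than any MCR-defined error table.

```shell
#!/bin/sh
# triage_status HTTP_CODE: map a status code seen on push/pull to the part of
# this runbook most likely to apply. The mapping is a reading aid only.
triage_status() {
    case $1 in
        200) echo "ok" ;;
        401) echo "auth: check credentials and token; see MCIAS Outage" ;;
        403) echo "policy: review mcrctl policy list and the audit log" ;;
        404) echo "missing: repository, tag, or blob not found; check backups and GC" ;;
        502) echo "upstream: MCIAS unreachable; see MCIAS Outage" ;;
        000) echo "down: no HTTP response; check service health and TLS" ;;
        *)   echo "unknown: check mcr-api logs" ;;
    esac
}
```

The 000 case corresponds to what curl reports through -w '%{http_code}' when no HTTP response was received.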
MCP Deployment
MCR is deployed via MCP as a two-component service on the rift node.
Service Definition
name = "mcr"
node = "rift"
active = true
[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/mcr:v1.1.0"
network = "mcpnet"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:28443:8443", "127.0.0.1:29443:9443"]
volumes = ["/srv/mcr:/srv/mcr"]
cmd = ["server", "--config", "/srv/mcr/mcr.toml"]
[[components]]
name = "web"
image = "mcr.svc.mcp.metacircular.net:8443/mcr-web:v1.1.0"
network = "mcpnet"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:28080:8080"]
volumes = ["/srv/mcr:/srv/mcr"]
cmd = ["server", "--config", "/srv/mcr/mcr.toml"]
Port Mapping
| Component | Container Port | Host Port | Purpose |
|---|---|---|---|
| mcr-api | 8443 | 28443 | HTTPS (OCI + admin REST) |
| mcr-api | 9443 | 29443 | gRPC admin API |
| mcr-web | 8080 | 28080 | Web UI (HTTP, behind MC-Proxy) |
Both containers share the /srv/mcr volume for configuration, database,
and blob storage. They are connected to the mcpnet Docker network.
Escalation
Escalate when:
- Database corruption cannot be resolved by restoring a snapshot.
- MCIAS is down and registry operations are urgently needed.
- Disk full cannot be resolved by GC and cleanup.
- Push/pull failures persist after following the procedures above.
- Any issue not covered by this runbook.
Escalation path: Kyle (platform owner).