Kyle Isom acc4851549 Update RUNBOOK MCP example to use pinned version tags

Replace :latest with :v1.1.0 in the MCP service definition example
to match the new platform convention of explicit version pinning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-26 23:31:02 -07:00

MCR Runbook

Service Overview

MCR (Metacircular Container Registry) is an OCI Distribution Spec-compliant container registry for the Metacircular platform. It stores and serves container images, with authentication delegated to MCIAS and a local policy engine for fine-grained access control.

MCR runs as two containers:

- mcr-api -- the registry server. Exposes OCI Distribution endpoints and an admin REST API on port 8443 (HTTPS), plus a gRPC admin API on port 9443. Handles blob storage, manifest management, and token-based authentication via MCIAS.
- mcr-web -- the web UI. Communicates with mcr-api via gRPC on port 9443. Provides repository/tag browsing and ACL policy management for administrators. Listens on port 8080. Guest accounts are blocked at login; only admin and user roles can access the web interface.

Both are fronted by MC-Proxy for TLS routing. Metadata is stored in SQLite; blobs are stored as content-addressed files on the filesystem under /srv/mcr/layers/.

Health Checks

REST

curl -k https://localhost:8443/v1/health

Expected: HTTP 200.

gRPC

Use the AdminService.Health RPC on port 9443. This method is public (no auth required).

OCI Version Check

curl -k https://localhost:8443/v2/

Expected: HTTP 401 with WWW-Authenticate header (confirms the OCI endpoint is alive and responding). An authenticated request returns HTTP 200 with {}.
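
The status codes above can be turned into a small probe helper. The following is a minimal sketch, not part of MCR: the function name `classify_v2_status` is hypothetical, and the `000` case relies on curl printing `000` for `%{http_code}` when no HTTP response was received.

```shell
# Sketch: interpret the HTTP status code returned by a probe of /v2/.
# classify_v2_status is a hypothetical helper, not an MCR command.
classify_v2_status() {
    case "$1" in
        401) echo "alive: auth required (expected for an unauthenticated probe)" ;;
        200) echo "alive: request was authenticated" ;;
        000) echo "down: no HTTP response (connection or TLS failure)" ;;
        *)   echo "unexpected: HTTP $1" ;;
    esac
}

# Usage (run on the registry host):
#   code=$(curl -k -s -o /dev/null -w '%{http_code}' https://localhost:8443/v2/)
#   classify_v2_status "$code"
```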

CLI

mcrctl status --addr https://localhost:8443

Expected output: ok

Common Operations

Start the Service (MCP)

Deploy via MCP:
```
mcp deploy mcr
```

Verify health:

curl -k https://localhost:8443/v1/health

Start the Service (Docker Compose)

Verify config exists: ls /srv/mcr/mcr.toml

Start the containers:

docker compose -f deploy/docker/docker-compose-rift.yml up -d

Verify health:

curl -k https://localhost:8443/v1/health

Stop the Service

Via MCP:

mcp stop mcr

Via Docker Compose:

docker compose -f deploy/docker/docker-compose-rift.yml stop

MCR handles SIGTERM gracefully: it stops accepting new connections, drains in-flight requests (including ongoing uploads) for up to 60 seconds, then force-closes remaining connections and exits.
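
Docker's default stop timeout is 10 seconds, after which the container receives SIGKILL. Unless the compose file sets `stop_grace_period` (the file is not shown here, so this is an assumption), a plain `stop` may cut the 60-second drain short. The sketch below passes an explicit timeout via the `-t` flag of `docker compose stop`. The function name, the `MCR_COMPOSE_FILE` variable, and the 10-second margin are illustrative choices; how `mcp stop` handles timeouts is not covered here.

```shell
# Sketch: stop MCR via Docker Compose with a timeout longer than the drain window.
# Set DRY_RUN=1 to print the command instead of running it.
MCR_COMPOSE_FILE=${MCR_COMPOSE_FILE:-deploy/docker/docker-compose-rift.yml}

stop_mcr() {
    drain=${1:-60}     # drain window described above, in seconds
    margin=${2:-10}    # extra headroom; an arbitrary illustrative value
    timeout=$((drain + margin))

    # Build the command once so the dry-run output matches what would be executed.
    set -- docker compose -f "$MCR_COMPOSE_FILE" stop -t "$timeout"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 stop_mcr` first shows the exact command before anything is stopped.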

Restart the Service

Via MCP:

mcp restart mcr

Via Docker Compose:

docker compose -f deploy/docker/docker-compose-rift.yml restart

Verify health after restart:

curl -k https://localhost:8443/v1/health

Backup (Snapshot)

MCR backups have two parts: the SQLite database (metadata) and the blob filesystem. A database snapshot alone restores the metadata but not the image data: any blob missing from the filesystem returns 404 on pull.

Run the snapshot command:

mcrsrv snapshot --config /srv/mcr/mcr.toml

The snapshot is saved to /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db.
Verify the snapshot file exists and has a reasonable size:
```
ls -lh /srv/mcr/backups/
```
For a complete backup, also copy the blob directory:
```
rsync -a /srv/mcr/layers/ /backup/mcr/layers/
```

A systemd timer (mcr-backup.timer) runs the database snapshot daily at 02:00 UTC with 5-minute jitter.
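
To check that the timer is actually producing snapshots, the newest file can be located by name, since the `mcr-YYYYMMDD-HHMMSS.db` pattern sorts chronologically. This is a minimal sketch: `latest_snapshot` is a hypothetical helper, and it only checks that the newest file is non-empty, not that it is a valid database.

```shell
# Sketch: print the newest snapshot in a backup directory, or fail if none is usable.
latest_snapshot() {
    dir=${1:-/srv/mcr/backups}
    latest=

    # Glob results are sorted, and the timestamped names sort chronologically,
    # so the last match is the newest snapshot.
    for f in "$dir"/mcr-*.db; do
        [ -f "$f" ] || continue   # skips the literal pattern when nothing matches
        latest=$f
    done

    if [ -z "$latest" ]; then
        echo "no snapshots found in $dir" >&2
        return 1
    fi
    if [ ! -s "$latest" ]; then
        echo "newest snapshot is empty: $latest" >&2
        return 1
    fi
    printf '%s\n' "$latest"
}
```

The printed path can be passed straight to `ls -lh` for the size check, or used as the source file in the restore steps.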

Restore from Snapshot

Stop the service (see above).

Back up the current database:

cp /srv/mcr/mcr.db /srv/mcr/mcr.db.pre-restore

Copy the snapshot into place:

cp /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db /srv/mcr/mcr.db

If restoring blobs as well:

rsync -a /backup/mcr/layers/ /srv/mcr/layers/

Start the service (see above).

Verify the service is healthy:

curl -k https://localhost:8443/v1/health

Verify an image pull works:

docker pull mcr.svc.mcp.metacircular.net:8443/<repo>:<tag>

Log Inspection

Container logs (mcr-api):

docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 100 mcr-api

Container logs (mcr-web):

docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 100 mcr-web

Follow logs in real time:

docker compose -f deploy/docker/docker-compose-rift.yml logs -f mcr-api mcr-web

Via MCP:

mcp logs mcr

MCR logs to stderr as structured text (slog). Log level is configured via [log] level in mcr.toml (debug, info, warn, error).

Garbage Collection

Garbage collection removes unreferenced blobs -- blobs no longer referenced by any manifest. GC acquires a registry-wide lock that blocks new blob uploads for the duration of the mark-and-sweep phase. Pulls are not blocked.

Trigger GC via CLI:

mcrctl gc --addr https://mcr.svc.mcp.metacircular.net:8443

Check GC status:

mcrctl gc status --addr https://mcr.svc.mcp.metacircular.net:8443

GC can also be triggered via the REST API:

curl -k -X POST -H "Authorization: Bearer <token>" https://localhost:8443/v1/gc

If a previous GC crashed after the database sweep but before filesystem cleanup, orphaned files may remain on disk. Run reconciliation to clean them up:

mcrctl gc --reconcile --addr https://mcr.svc.mcp.metacircular.net:8443

Incident Procedures

Database Corruption

Symptoms: server fails to start with SQLite errors, or API requests return unexpected errors.

Stop the service.
Check for WAL/SHM files alongside the database:
```
ls -la /srv/mcr/mcr.db*
```

Attempt an integrity check:

sqlite3 /srv/mcr/mcr.db "PRAGMA integrity_check;"

If the integrity check prints anything other than ok, restore from the most recent snapshot:

cp /srv/mcr/mcr.db /srv/mcr/mcr.db.corrupt
cp /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db /srv/mcr/mcr.db

Start the service and verify health.
Note: blobs on the filesystem are unaffected by database corruption. Images pushed after the snapshot was taken will be missing from metadata. Their blobs remain on disk and will be cleaned up by GC unless the metadata is re-created.
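
The procedure lists the WAL/SHM files but does not say what to do with them. SQLite's documentation warns against pairing a database file with a WAL that does not belong to it, so one cautious approach is to move those sidecar files aside together with the corrupt database before copying the snapshot in. The sketch below assumes the service is already stopped; `restore_db_from_snapshot` and the `.corrupt` suffix default are illustrative, not part of MCR.

```shell
# Sketch: swap a snapshot into place, moving the old database and any
# WAL/SHM sidecar files aside first. Run only while the service is stopped.
restore_db_from_snapshot() {
    db=$1                  # e.g. /srv/mcr/mcr.db
    snapshot=$2            # e.g. /srv/mcr/backups/mcr-YYYYMMDD-HHMMSS.db
    suffix=${3:-corrupt}   # appended to the files that are moved aside

    # Refuse to proceed without a non-empty snapshot file.
    if [ ! -s "$snapshot" ]; then
        echo "snapshot missing or empty: $snapshot" >&2
        return 1
    fi

    # Move the database and its sidecars aside so a stale WAL is never
    # paired with the restored file.
    for f in "$db" "$db-wal" "$db-shm"; do
        if [ -e "$f" ]; then
            mv "$f" "$f.$suffix" || return 1
        fi
    done

    cp "$snapshot" "$db"
}
```

The same helper fits the routine restore procedure by passing `pre-restore` as the third argument.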

TLS Certificate Expiry

Symptoms: health check fails with TLS errors, Docker clients get certificate verification errors on push/pull.

Check certificate expiry:

openssl x509 -in /srv/mcr/certs/cert.pem -noout -enddate

Replace the certificate and key files at the paths configured in mcr.toml ([server] tls_cert and tls_key).
Restart the service to load the new certificate.

Verify health:

curl -k https://localhost:8443/v1/health
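
Expiry can also be caught before it becomes an incident. The sketch below wraps the `-checkend` option of `openssl x509`, which exits non-zero if the certificate expires within the given number of seconds. The function name `cert_valid_for` and the 14-day threshold in the usage comment are illustrative assumptions.

```shell
# Sketch: succeed only if the certificate is still valid SECONDS from now.
cert_valid_for() {
    cert=$1
    seconds=$2
    # -checkend exits 0 if the certificate will NOT expire within the window.
    openssl x509 -in "$cert" -noout -checkend "$seconds" >/dev/null
}

# Usage: warn when fewer than 14 days (1209600 seconds) remain.
#   cert_valid_for /srv/mcr/certs/cert.pem 1209600 || echo "certificate expires within 14 days"
```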

MCIAS Outage

Symptoms: push/pull fails with 401 or 502 errors. Authentication cannot complete.

Confirm MCIAS is unreachable:

curl -k https://svc.metacircular.net:8443/v1/health

Cached token validation results remain valid for up to 30 seconds after the last successful MCIAS check. Operations using recently-validated tokens may continue briefly.
Once cached tokens expire, all authenticated operations (push, pull, catalog, admin) will fail until MCIAS recovers.
The OCI /v2/ version check endpoint still responds (confirms MCR itself is running).
Escalate to MCIAS (see Escalation below).

Disk Full

Symptoms: blob uploads fail, database writes fail, container may crash.

Check disk usage:

df -h /srv/mcr/
du -sh /srv/mcr/layers/ /srv/mcr/uploads/ /srv/mcr/mcr.db

Clean up stale uploads:
```
ls -la /srv/mcr/uploads/
```
Remove upload files that are old and have no matching in-progress upload in the database.

Run garbage collection to reclaim unreferenced blobs:

mcrctl gc --addr https://mcr.svc.mcp.metacircular.net:8443

If GC does not free enough space, identify large repositories:

mcrctl repo list --addr https://mcr.svc.mcp.metacircular.net:8443

Delete unused tags or repositories to free space, then run GC again.
If the disk is completely full and the service cannot start, manually remove orphaned files from /srv/mcr/uploads/ to free enough space for the service to start, then run GC.
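
To narrow down which upload files count as old, candidates can be listed by modification time before anything is deleted. This is a minimal sketch: `list_stale_uploads` is a hypothetical helper, the one-day default is arbitrary, and it does not consult the database, so each result still needs to be checked against in-progress uploads as described above.

```shell
# Sketch: list (never delete) upload files last modified more than DAYS full days ago.
list_stale_uploads() {
    dir=${1:-/srv/mcr/uploads}
    days=${2:-1}
    # With find, "-mtime +N" matches files whose age in whole days exceeds N.
    find "$dir" -type f -mtime +"$days" -print
}
```

Piping the output through `xargs ls -lh` shows sizes, which helps decide where cleanup frees the most space.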

Image Push/Pull Failures

Symptoms: docker push or docker pull returns errors.

Verify the service is running and healthy:

curl -k https://localhost:8443/v1/health

Test OCI endpoint:
```
curl -k https://localhost:8443/v2/
```
Expected: HTTP 401 with WWW-Authenticate header.

Test authentication:

curl -k -u username:password "https://localhost:8443/v2/token?service=mcr"

Expected: HTTP 200 with a token response.

Check if the issue is policy-related (403 Denied):
```
mcrctl policy list --addr https://mcr.svc.mcp.metacircular.net:8443
```
Review policy rules for the affected account and repository.

Check audit log for denied requests:

mcrctl audit tail --n 20 --addr https://mcr.svc.mcp.metacircular.net:8443

For push failures, verify all referenced blobs exist before pushing the manifest. The error MANIFEST_BLOB_UNKNOWN means a layer was not uploaded before the manifest push.

Check logs for detailed error information:

docker compose -f deploy/docker/docker-compose-rift.yml logs --tail 50 mcr-api

MCP Deployment

MCR is deployed via MCP as a two-component service on the rift node.

Service Definition

name = "mcr"
node = "rift"
active = true

[[components]]
name = "api"
image = "mcr.svc.mcp.metacircular.net:8443/mcr:v1.1.0"
network = "mcpnet"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:28443:8443", "127.0.0.1:29443:9443"]
volumes = ["/srv/mcr:/srv/mcr"]
cmd = ["server", "--config", "/srv/mcr/mcr.toml"]

[[components]]
name = "web"
image = "mcr.svc.mcp.metacircular.net:8443/mcr-web:v1.1.0"
network = "mcpnet"
user = "0:0"
restart = "unless-stopped"
ports = ["127.0.0.1:28080:8080"]
volumes = ["/srv/mcr:/srv/mcr"]
cmd = ["server", "--config", "/srv/mcr/mcr.toml"]

Port Mapping

| Component | Container Port | Host Port | Purpose |
|-----------|----------------|-----------|---------|
| mcr-api | 8443 | 28443 | HTTPS (OCI + admin REST) |
| mcr-api | 9443 | 29443 | gRPC admin API |
| mcr-web | 8080 | 28080 | Web UI (HTTP, behind MC-Proxy) |

Both containers share the /srv/mcr volume for configuration, database, and blob storage. They are connected to the mcpnet Docker network.

Escalation

Escalate when:

- Database corruption cannot be resolved by restoring a snapshot.
- MCIAS is down and registry operations are urgently needed.
- Disk full cannot be resolved by GC and cleanup.
- Push/pull failures persist after following the procedures above.
- Any issue not covered by this runbook.

Escalation path: Kyle (platform owner).
