Add swap file journaling for crash recovery.

- Introduced `SwapManager` for buffering and writing incremental edits to sidecar `.kte.swp` files.
- Implemented basic operations: insertion, deletion, split, join, and checkpointing.
- Added recovery design doc (`docs/plans/swap-files.md`).
- Updated editor initialization to integrate `SwapManager` instance for crash recovery across buffers.
2025-12-04 08:48:32 -08:00
parent 495183ebd2
commit 78b9345799
24 changed files with 1933 additions and 545 deletions
--- a/docs/TestFrontend.md
+++ b/docs/TestFrontend.md
@@ -2,27 +2,43 @@

 ## Overview

-`TestFrontend` is a headless implementation of the `Frontend` interface designed to facilitate programmatic testing of editor features. It allows you to queue commands and text input manually, execute them step-by-step, and inspect the editor/buffer state.
+`TestFrontend` is a headless implementation of the `Frontend` interface
+designed to facilitate programmatic testing of editor features. It
+allows you to queue commands and text input manually, execute them
+step-by-step, and inspect the editor/buffer state.

 ## Components

 ### TestInputHandler
+
 A programmable input handler that uses a queue-based system:
-- `QueueCommand(CommandId id, const std::string &arg = "", int count = 0)` - Queue a specific command
-- `QueueText(const std::string &text)` - Queue text for insertion (character by character)
+
+- `QueueCommand(CommandId id, const std::string &arg = "", int count = 0)` -
+  Queue a specific command
+- `QueueText(const std::string &text)` - Queue text for insertion
+  (character by character)
 - `Poll(MappedInput &out)` - Returns queued commands one at a time
 - `IsEmpty()` - Check if the input queue is empty

 ### TestRenderer
+
 A minimal no-op renderer for testing:
-- `Draw(Editor &ed)` - No-op implementation, just increments draw counter
+
+- `Draw(Editor &ed)` - No-op implementation, just increments draw
+  counter
 - `GetDrawCount()` - Returns the number of times Draw() was called
 - `ResetDrawCount()` - Resets the draw counter

 ### TestFrontend
-The main frontend class that integrates TestInputHandler and TestRenderer:
-- `Init(Editor &ed)` - Initializes the frontend (sets editor dimensions to 24x80)
-- `Step(Editor &ed, bool &running)` - Processes one command from the queue and renders
+
+The main frontend class that integrates TestInputHandler and
+TestRenderer:
+
+- `Init(Editor &ed)` - Initializes the frontend (sets editor dimensions
+  to 24x80)
+- `Step(Editor &ed, bool &running)` - Processes one command from the
+  queue and renders
 - `Shutdown()` - Cleanup (no-op for TestFrontend)
 - `Input()` - Access the TestInputHandler
 - `Renderer()` - Access the TestRenderer
@@ -75,31 +91,55 @@ int main() {

 ## Key Features

-1. **Programmable Input**: Queue any sequence of commands or text programmatically
+1. **Programmable Input**: Queue any sequence of commands or text
+   programmatically
 2. **Step-by-Step Execution**: Run the editor one command at a time
-3. **State Inspection**: Access and verify editor/buffer state between commands
-4. **No UI Dependencies**: Headless operation, no terminal or GUI required
-5. **Integration Testing**: Test command sequences, undo/redo, multi-line editing, etc.
+3. **State Inspection**: Access and verify editor/buffer state between
+   commands
+4. **No UI Dependencies**: Headless operation, no terminal or GUI
+   required
+5. **Integration Testing**: Test command sequences, undo/redo,
+   multi-line editing, etc.

 ## Available Commands

 All commands from `CommandId` enum can be queued, including:
+
 - `CommandId::InsertText` - Insert text (use `QueueText()` helper)
 - `CommandId::Newline` - Insert newline
- `CommandId::Backspace` - Delete character before cursor  
+- `CommandId::Backspace` - Delete character before cursor
 - `CommandId::DeleteChar` - Delete character at cursor
-- `CommandId::MoveLeft`, `MoveRight`, `MoveUp`, `MoveDown` - Cursor movement
+- `CommandId::MoveLeft`, `MoveRight`, `MoveUp`, `MoveDown` - Cursor
+  movement
 - `CommandId::Undo`, `CommandId::Redo` - Undo/redo operations
 - `CommandId::Save`, `CommandId::Quit` - File operations
 - And many more (see Command.h)

 ## Integration

-TestFrontend is built into both `kte` and `kge` executables as part of the common source files. You can create standalone test programs by linking against the same source files and ncurses.
+TestFrontend is built into both `kte` and `kge` executables as part of
+the common source files. You can create standalone test programs by
+linking against the same source files and ncurses.

 ## Notes

 - Always call `InstallDefaultCommands()` before using any commands
-- Buffer must be initialized (via `OpenFile()` or `AddBuffer()`) before queuing edit commands
+- Buffer must be initialized (via `OpenFile()` or `AddBuffer()`) before
+  queuing edit commands
 - Undo/redo requires the buffer to have an UndoSystem attached
 - The test frontend sets editor dimensions to 24x80 by default
+
+## Highlighter stress harness
+
+For renderer/highlighter race testing without a UI, `kte` provides a
+lightweight stress mode:
+
+```
+kte --stress-highlighter=5
+```
+
+This runs a short synthetic workload (5 seconds by default) that edits
+and scrolls a buffer while
+exercising `HighlighterEngine::PrefetchViewport` and `GetLine`
+concurrently. Use Debug builds with
+AddressSanitizer enabled for best effect.
--- a/docs/plans/swap-files.md
+++ b/docs/plans/swap-files.md
@@ -0,0 +1,144 @@
+Swap files for kte — design plan
+================================
+
+Goals
+-----
+
+- Preserve user work across crashes, power failures, and OS kills.
+- Keep the editor responsive; avoid blocking the UI on disk I/O.
+- Bound recovery time and swap size.
+- Favor simple, robust primitives that work well on POSIX and macOS;
+  keep Windows feasibility in mind.
+
+Model overview
+--------------
+Per open buffer, maintain a sidecar swap journal next to the file:
+
+- Path: `.<basename>.kte.swp` in the same directory as the file (for
+  unnamed/unsaved buffers, use a per‑session temp dir like
+  `$TMPDIR/kte/` with a random UUID).
+- Format: append‑only journal of editing operations with periodic
+  checkpoints.
+- Crash safety: only append, fsync as per policy; checkpoint via
+  write‑to‑temp + fsync + atomic rename.
+
+File format (v1)
+----------------
+Header (fixed 64 bytes):
+
+- Magic: `KTE_SWP\0` (8 bytes)
+- Version: 1 (u32)
+- Flags: bitset (u32) — e.g., compression, checksums, endian.
+- Created time (u64)
+- Host info hash (u64) — optional, for telemetry/debug.
+- File identity: hash of canonical path (u64) and original file
+  size+mtime (u64+u64) at start.
+- Reserved/padding.
+
+Records (stream after header):
+
+- Each record: [type u8][len u24][payload][crc32 u32]
+- Types:
+    - `CHKPT` — full snapshot checkpoint of entire buffer content and
+      minimal metadata (cursor pos, filetype). Payload optionally
+      compressed. Written occasionally to cap replay time.
+    - `INS` — insert at (row, col) text bytes (text may contain
+      newlines). Encoded with varints.
+    - `DEL` — delete length at (row, col). If spanning lines, semantics
+      are defined as in `Buffer::delete_text`.
+    - `SPLIT`, `JOIN` — explicit structural ops (optional; can be
+      expressed via INS/DEL).
+    - `META` — update metadata (e.g., filetype, encoding hints).
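
The `[type u8][len u24][payload][crc32 u32]` layout above can be sketched as follows. This is a minimal illustration, not the `SwapManager` implementation: the numeric type codes, the little-endian field order, the choice to compute the CRC over type, length, and payload, and the names `EncodeRecord`, `DecodeRecord`, and `Crc32` are all assumptions that the plan does not specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical type codes: the plan names the record types, not their values.
enum class RecType : std::uint8_t { CHKPT = 1, INS = 2, DEL = 3, SPLIT = 4, JOIN = 5, META = 6 };

struct Record {
    RecType type;
    std::string payload;
};

// Reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit for brevity.
inline std::uint32_t Crc32(const std::uint8_t *p, std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Frame one record as [type][len: 3 bytes LE][payload][crc32: 4 bytes LE].
// Assumption: the CRC covers type, length, and payload.
inline std::optional<std::vector<std::uint8_t>> EncodeRecord(RecType type,
                                                             const std::string &payload) {
    if (payload.size() > 0xFFFFFFu) return std::nullopt; // does not fit in u24
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::vector<std::uint8_t> out;
    out.reserve(4 + payload.size() + 4);
    out.push_back(static_cast<std::uint8_t>(type));
    for (int i = 0; i < 3; ++i)
        out.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
    out.insert(out.end(), payload.begin(), payload.end());
    const std::uint32_t crc = Crc32(out.data(), out.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(crc >> (8 * i)));
    return out;
}

// Decode the record starting at `off`. On success, advance `off` past it.
// Returns nullopt for a truncated (torn) or CRC-mismatched record.
inline std::optional<Record> DecodeRecord(const std::vector<std::uint8_t> &buf,
                                          std::size_t &off) {
    if (off > buf.size() || buf.size() - off < 4) return std::nullopt;
    const std::uint8_t *p = buf.data() + off;
    const std::size_t len = static_cast<std::size_t>(p[1]) |
                            (static_cast<std::size_t>(p[2]) << 8) |
                            (static_cast<std::size_t>(p[3]) << 16);
    if (buf.size() - off - 4 < len + 4) return std::nullopt; // partial trailing record
    std::uint32_t stored = 0;
    for (int i = 0; i < 4; ++i)
        stored |= static_cast<std::uint32_t>(p[4 + len + i]) << (8 * i);
    if (stored != Crc32(p, 4 + len)) return std::nullopt;
    Record r{static_cast<RecType>(p[0]),
             std::string(reinterpret_cast<const char *>(p + 4), len)};
    off += 4 + len + 4;
    return r;
}
```

A reader would call `DecodeRecord` in a loop and stop at the first `nullopt`, treating the offset reached so far as the last good position. Note that a `u24` length caps a payload at 16 MiB minus one byte, which a full-buffer `CHKPT` could exceed; the plan does not say how larger snapshots would be handled.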
+
+Durability policy
+-----------------
+Configurable knobs (sane defaults in parentheses):
+
+- Time‑based flush: group edits and flush every 150–300 ms (200 ms).
+- Operation count flush: after N ops (200).
+- Idle flush: after a 500 ms idle lull, flush immediately.
+- Checkpoint cadence: after M KB of journal (512–2048 KB) or T seconds
+  (30–120 s), whichever comes first.
+- fsync policy:
+    - `always`: fsync every flush (safest, slowest).
+    - `grouped` (default): fsync at most every 1–2 s or on
+      idle/blur/quit.
+    - `never`: rely on OS flush (fastest, riskier).
+    - On POSIX, prefer `fdatasync` when available; fall back to `fsync`.
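
One way to read the flush knobs above is as a single predicate over the journal's current state. The sketch below is illustrative only: `FlushPolicy`, `JournalState`, and `ShouldFlush` are hypothetical names, and the defaults are the values listed in parentheses in the plan.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical configuration holder; defaults mirror the plan's suggested values.
struct FlushPolicy {
    std::chrono::milliseconds flush_interval{200}; // time-based flush
    std::size_t max_pending_ops{200};              // operation-count flush
    std::chrono::milliseconds idle_threshold{500}; // idle flush
};

// Hypothetical snapshot of what the writer knows when it wakes up.
struct JournalState {
    std::size_t pending_ops;                    // ops buffered but not yet written
    std::chrono::milliseconds since_last_flush; // time since the previous flush
    std::chrono::milliseconds since_last_edit;  // time since the user last edited
};

// Flush when any trigger fires, but never when there is nothing to write.
inline bool ShouldFlush(const FlushPolicy &p, const JournalState &s) {
    if (s.pending_ops == 0) return false;
    return s.pending_ops >= p.max_pending_ops ||
           s.since_last_flush >= p.flush_interval ||
           s.since_last_edit >= p.idle_threshold;
}
```

Whether a flush is followed by `fsync` is a separate decision under the `always`/`grouped`/`never` policy and is not modeled here.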
+
+Performance & threading
+-----------------------
+
+- Background writer thread per editor instance (shared) with a bounded
+  MPSC queue of per‑buffer records.
+- Each Buffer has a small in‑memory journal buffer; UI thread enqueues
+  ops (non‑blocking) and may coalesce adjacent inserts/deletes.
+- Writer batch‑writes records to the swap file, computes CRCs, and
+  decides checkpoint boundaries.
+- Backpressure: if the queue grows beyond a high watermark, signal the
+  UI to coalesce more aggressively and slow enqueueing (never hard-block
+  the editing path; at worst, drop optional `META` records).
+
+Recovery flow
+-------------
+
+On opening a file:
+
+1. Detect swap sidecar `.<basename>.kte.swp`.
+2. Validate header, iterate records verifying CRCs.
+3. Compare recorded original file identity against actual file; if
+   mismatch, warn user but allow recovery (content wins).
+4. Reconstruct buffer: start from the last good `CHKPT` (if any), then
+   replay subsequent ops. If trailing partial record encountered (EOF
+   mid‑record), truncate at last good offset.
+5. Present a choice: Recover (load recovered buffer; keep the swap file
+   until user saves) or Discard (delete swap file and open clean file).
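
Step 4 of the recovery flow can be sketched as below, under simplifying assumptions: the ops are taken to be the already CRC-validated prefix of the journal, the buffer is modeled as a plain vector of lines, and `DEL` is restricted to a single line because the multi-line semantics live in `Buffer::delete_text`, which is not shown here. `Op`, `Lines`, `ApplyInsert`, and `Recover` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

// Hypothetical decoded journal op (only the record types needed for replay).
struct Op {
    enum Kind { CHKPT, INS, DEL };
    Kind kind;
    std::size_t row;
    std::size_t col;
    std::string text; // INS: inserted bytes; CHKPT: full buffer snapshot
    std::size_t len;  // DEL: number of bytes to remove
};

// Split on '\n'; always yields at least one (possibly empty) line.
inline Lines SplitLines(const std::string &s) {
    Lines out(1);
    for (char c : s) {
        if (c == '\n') out.emplace_back();
        else out.back().push_back(c);
    }
    return out;
}

// Insert text (which may contain newlines) at (row, col).
inline void ApplyInsert(Lines &b, std::size_t row, std::size_t col, const std::string &text) {
    if (row >= b.size()) return; // defensive: ignore out-of-range ops
    col = std::min(col, b[row].size());
    const std::string tail = b[row].substr(col);
    b[row].erase(col);
    const Lines parts = SplitLines(text);
    b[row] += parts[0];
    for (std::size_t i = 1; i < parts.size(); ++i)
        b.insert(b.begin() + static_cast<std::ptrdiff_t>(row + i), parts[i]);
    b[row + parts.size() - 1] += tail;
}

// Start from the last checkpoint (or the on-disk content if there is none),
// then replay every op that follows it.
inline Lines Recover(const std::string &file_content, const std::vector<Op> &ops) {
    std::size_t start = 0;
    Lines buf = SplitLines(file_content);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (ops[i].kind == Op::CHKPT) {
            buf = SplitLines(ops[i].text);
            start = i + 1;
        }
    }
    for (std::size_t i = start; i < ops.size(); ++i) {
        const Op &op = ops[i];
        if (op.kind == Op::INS) {
            ApplyInsert(buf, op.row, op.col, op.text);
        } else if (op.kind == Op::DEL && op.row < buf.size()) {
            // Assumption: single-line delete only.
            buf[op.row].erase(std::min(op.col, buf[op.row].size()), op.len);
        }
    }
    return buf;
}
```

Ops recorded before the last checkpoint are skipped entirely, which is what bounds replay time.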
+
+Stability & corruption mitigation
+---------------------------------
+
+- Append‑only with per‑record CRC32 guards against torn writes.
+- Atomic checkpoint rotation: write `.<basename>.kte.swp.tmp`, fsync,
+  then rename over old `.swp`.
+- Size caps: rotate or compact when `.swp` exceeds a threshold (e.g.,
+  64–128 MB). Compaction creates a fresh file with a single checkpoint.
+- Low‑disk‑space behavior: on write failures, surface a non‑modal
+  warning and temporarily fall back to in‑memory only; retry
+  opportunistically.
+
+Security considerations
+-----------------------
+
+- Swap files mirror buffer content, which may be sensitive. Options:
+    - Configurable location (same dir vs. `$XDG_STATE_HOME/kte/swap`).
+    - Optional per‑file encryption (future work) using OS keychain.
+    - Ensure permissions are 0600.
+
+Interoperability & UX
+---------------------
+
+- Use a distinctive extension `.kte.swp` to avoid conflicts with other
+  editors.
+- Status bar indicator when swap is active; commands to purge/compact.
+- On save: do not delete swap immediately; keep until the buffer is
+  clean and idle for a short grace period (allows undo of accidental
+  external changes).
+
+Implementation plan (staged)
+----------------------------
+
+1. Minimal journal writer (append‑only INS/DEL) with grouped fsync;
+   single per‑editor writer thread.
+2. Reader/recovery path with CRC validation and replay.
+3. Checkpoints + atomic rotation; compaction path.
+4. Config surface and UI prompts; telemetry counters.
+5. Optional compression and advanced coalescing.
+
+Defaults balancing performance and stability
+-------------------------------------------
+
+- Grouped flush with fsync every ~1 s or on idle/quit.
+- Checkpoint every 1 MB or 60 s.
+- Bounded queue and batch writes to minimize syscalls.
+- Immediate flush on critical events (buffer close, app quit, power
+  source change on laptops if detectable).
--- a/docs/syntax.md
+++ b/docs/syntax.md
@@ -4,67 +4,118 @@ Syntax highlighting in kte
 Overview
 --------

-kte provides lightweight syntax highlighting with a pluggable highlighter interface. The initial implementation targets C/C++ and focuses on speed and responsiveness.
+kte provides lightweight syntax highlighting with a pluggable
+highlighter interface. The initial implementation targets C/C++ and
+focuses on speed and responsiveness.

 Core types
 ----------

-- `TokenKind` — token categories (keywords, types, strings, comments, numbers, preprocessor, operators, punctuation, identifiers, whitespace, etc.).
-- `HighlightSpan` — a half-open column range `[col_start, col_end)` with a `TokenKind`.
-- `LineHighlight` — a vector of `HighlightSpan` and the buffer `version` used to compute it.
+- `TokenKind` — token categories (keywords, types, strings, comments,
+  numbers, preprocessor, operators, punctuation, identifiers,
+  whitespace, etc.).
+- `HighlightSpan` — a half-open column range `[col_start, col_end)` with
+  a `TokenKind`.
+- `LineHighlight` — a vector of `HighlightSpan` and the buffer `version`
+  used to compute it.

 Engine and caching
 ------------------

-- `HighlighterEngine` maintains a per-line cache of `LineHighlight` keyed by row and buffer version.
-- Cache invalidation occurs when the buffer version changes or when the buffer calls `InvalidateFrom(row)`, which clears cached lines and line states from `row` downward.
-- The engine supports both stateless and stateful highlighters. For stateful highlighters, it memoizes a simple per-line state and computes lines sequentially when necessary.
+- `HighlighterEngine` maintains a per-line cache of `LineHighlight`
+  keyed by row and buffer version.
+- Cache invalidation occurs when the buffer version changes or when the
+  buffer calls `InvalidateFrom(row)`, which clears cached lines and line
+  states from `row` downward.
+- The engine supports both stateless and stateful highlighters. For
+  stateful highlighters, it memoizes a simple per-line state and
+  computes lines sequentially when necessary.

 Stateful highlighters
 ---------------------

-- `LanguageHighlighter` is the base interface for stateless per-line tokenization.
-- `StatefulHighlighter` extends it with a `LineState` and the method `HighlightLineStateful(buf, row, prev_state, out)`.
-- The engine detects `StatefulHighlighter` via dynamic_cast and feeds each line the previous line’s state, caching the resulting state per line.
+- `LanguageHighlighter` is the base interface for stateless per-line
+  tokenization.
+- `StatefulHighlighter` extends it with a `LineState` and the method
+  `HighlightLineStateful(buf, row, prev_state, out)`.
+- The engine detects `StatefulHighlighter` via dynamic_cast and feeds
+  each line the previous line’s state, caching the resulting state per
+  line.

 C/C++ highlighter
 -----------------

 - `CppHighlighter` implements `StatefulHighlighter`.
-- Stateless constructs: line comments `//`, strings `"..."`, chars `'...'`, numbers, identifiers (keywords/types), preprocessor at beginning of line after leading whitespace, operators/punctuation, and whitespace.
+- Stateless constructs: line comments `//`, strings `"..."`, chars
+  `'...'`, numbers, identifiers (keywords/types), preprocessor at
+  beginning of line after leading whitespace, operators/punctuation, and
+  whitespace.
 - Stateful constructs (v2):
-  - Multi-line block comments `/* ... */` — the state records whether the next line continues a comment.
-  - Raw strings `R"delim(... )delim"` — the state tracks whether we are inside a raw string and its delimiter `delim` until the closing sequence appears.
+    - Multi-line block comments `/* ... */` — the state records whether
+      the next line continues a comment.
+    - Raw strings `R"delim(... )delim"` — the state tracks whether we
+      are inside a raw string and its delimiter `delim` until the
+      closing sequence appears.

 Limitations and TODOs
 ---------------------

-- Raw string detection is intentionally simple and does not handle all corner cases of the C++ standard.
-- Preprocessor handling is line-based; continuation lines with `\\` are not yet tracked.
-- No semantic analysis; identifiers are classified via small keyword/type sets.
-- Additional languages (JSON, Markdown, Shell, Python, Go, Rust, Lisp, …) are planned.
-- Terminal color mapping is conservative to support 8/16-color terminals. Rich color-pair themes can be added later.
+- Raw string detection is intentionally simple and does not handle all
+  corner cases of the C++ standard.
+- Preprocessor handling is line-based; continuation lines with `\\` are
+  not yet tracked.
+- No semantic analysis; identifiers are classified via small
+  keyword/type sets.
+- Additional languages (JSON, Markdown, Shell, Python, Go, Rust,
+  Lisp, …) are planned.
+- Terminal color mapping is conservative to support 8/16-color
+  terminals. Rich color-pair themes can be added later.

 Renderer integration
 --------------------

-- Terminal and GUI renderers request line spans via `Highlighter()->GetLine(buf, row, buf.Version())`.
-- Search highlight and cursor overlays take precedence over syntax colors.
+- Terminal and GUI renderers request line spans via
+  `Highlighter()->GetLine(buf, row, buf.Version())`.
+- Search highlight and cursor overlays take precedence over syntax
+  colors.
+
+Renderer-side robustness
+------------------------
+
+- Renderers defensively sanitize `HighlightSpan` data before use to
+  ensure stability even if a highlighter misbehaves:
+    - Clamp `col_start/col_end` to the line length and ensure
+      `end >= start`.
+    - Drop empty/invalid spans and sort by start.
+    - Clip drawing to the horizontally visible region and the
+      tab-expanded line length.
+- The highlighter engine returns `LineHighlight` by value to avoid
+  cross-thread lifetime issues; renderers operate on a local copy for
+  each frame.
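
The clamp, drop, and sort steps listed above could be implemented along these lines. This is a sketch, not the renderers' actual code: the `HighlightSpan` here is a stand-in with an `int` in place of `TokenKind`, and `SanitizeSpans` is a hypothetical helper name. Clipping to the visible region and tab expansion are left out.

```cpp
#include <algorithm>
#include <vector>

// Stand-in for the real type; `kind` would be a TokenKind in kte.
struct HighlightSpan {
    int col_start;
    int col_end; // half-open: [col_start, col_end)
    int kind;
};

// Return a cleaned copy: columns clamped to [0, line_len], empty or inverted
// spans dropped, and the rest ordered by start column.
inline std::vector<HighlightSpan> SanitizeSpans(const std::vector<HighlightSpan> &spans,
                                                int line_len) {
    line_len = std::max(line_len, 0);
    std::vector<HighlightSpan> out;
    out.reserve(spans.size());
    for (HighlightSpan s : spans) {
        s.col_start = std::clamp(s.col_start, 0, line_len);
        s.col_end = std::clamp(s.col_end, 0, line_len);
        if (s.col_end <= s.col_start) continue; // empty or invalid after clamping
        out.push_back(s);
    }
    std::stable_sort(out.begin(), out.end(),
                     [](const HighlightSpan &a, const HighlightSpan &b) {
                         return a.col_start < b.col_start;
                     });
    return out;
}
```

Taking the input by const reference and returning a new vector matches the by-value approach described above: the renderer only ever touches its own copy.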

 Extensibility (Phase 4)
 -----------------------

-- Public registration API: external code can register custom highlighters by filetype.
-  - Use `HighlighterRegistry::Register("mylang", []{ return std::make_unique<MyHighlighter>(); });`
-  - Registered factories are preferred over built-ins for the same filetype key.
-  - Filetype keys are normalized via `HighlighterRegistry::Normalize()`.
-- Optional Tree-sitter adapter: disabled by default to keep dependencies minimal.
-  - Enable with CMake option `-DKTE_ENABLE_TREESITTER=ON` and provide
-    `-DTREESITTER_INCLUDE_DIR=...` and `-DTREESITTER_LIBRARY=...` if needed.
-  - Register a Tree-sitter-backed highlighter for a language (example assumes you link a grammar):
-    ```c++
-    extern "C" const TSLanguage* tree_sitter_c();
-    kte::HighlighterRegistry::RegisterTreeSitter("c", &tree_sitter_c);
-    ```
-  - Current adapter is a stub scaffold; it compiles and integrates cleanly when enabled, but
-    intentionally emits no spans until Tree-sitter node-to-token mapping is implemented.
+- Public registration API: external code can register custom
+  highlighters by filetype.
+    - Use
+      `HighlighterRegistry::Register("mylang", []{ return std::make_unique<MyHighlighter>(); });`
+    - Registered factories are preferred over built-ins for the same
+      filetype key.
+    - Filetype keys are normalized via
+      `HighlighterRegistry::Normalize()`.
+- Optional Tree-sitter adapter: disabled by default to keep dependencies
+  minimal.
+    - Enable with CMake option `-DKTE_ENABLE_TREESITTER=ON` and provide
+      `-DTREESITTER_INCLUDE_DIR=...` and `-DTREESITTER_LIBRARY=...` if
+      needed.
+    - Register a Tree-sitter-backed highlighter for a language (example
+      assumes you link a grammar):
+      ```c++
+      extern "C" const TSLanguage* tree_sitter_c();
+      kte::HighlighterRegistry::RegisterTreeSitter("c", &tree_sitter_c);
+      ```
+    - Current adapter is a stub scaffold; it compiles and integrates
+      cleanly when enabled, but
+      intentionally emits no spans until Tree-sitter node-to-token
+      mapping is implemented.