Files
kte/docs/syntax.md
Kyle Isom 78b9345799 Add swap file journaling for crash recovery.
- Introduced `SwapManager` for buffering and writing incremental edits to sidecar `.kte.swp` files.
- Implemented basic operations: insertion, deletion, split, join, and checkpointing.
- Added recovery design doc (`docs/plans/swap-files.md`).
- Updated editor initialization to integrate `SwapManager` instance for crash recovery across buffers.
2025-12-04 08:48:32 -08:00

4.5 KiB
Raw Permalink Blame History

Syntax highlighting in kte

Overview

kte provides lightweight syntax highlighting with a pluggable highlighter interface. The initial implementation targets C/C++ and focuses on speed and responsiveness.

Core types

  • TokenKind — token categories (keywords, types, strings, comments, numbers, preprocessor, operators, punctuation, identifiers, whitespace, etc.).
  • HighlightSpan — a half-open column range [col_start, col_end) with a TokenKind.
  • LineHighlight — a vector of HighlightSpan and the buffer version used to compute it.

Engine and caching

  • HighlighterEngine maintains a per-line cache of LineHighlight keyed by row and buffer version.
  • Cache invalidation occurs when the buffer version changes or when the buffer calls InvalidateFrom(row), which clears cached lines and line states from row downward.
  • The engine supports both stateless and stateful highlighters. For stateful highlighters, it memoizes a simple per-line state and computes lines sequentially when necessary.

Stateful highlighters

  • LanguageHighlighter is the base interface for stateless per-line tokenization.
  • StatefulHighlighter extends it with a LineState and the method HighlightLineStateful(buf, row, prev_state, out).
  • The engine detects StatefulHighlighter via dynamic_cast and feeds each line the previous lines state, caching the resulting state per line.

C/C++ highlighter

  • CppHighlighter implements StatefulHighlighter.
  • Stateless constructs: line comments //, strings "...", chars '...', numbers, identifiers (keywords/types), preprocessor at beginning of line after leading whitespace, operators/punctuation, and whitespace.
  • Stateful constructs (v2):
    • Multi-line block comments /* ... */ — the state records whether the next line continues a comment.
    • Raw strings R"delim(... )delim" — the state tracks whether we are inside a raw string and its delimiter delim until the closing sequence appears.

Limitations and TODOs

  • Raw string detection is intentionally simple and does not handle all corner cases of the C++ standard.
  • Preprocessor handling is line-based; continuation lines with \\ are not yet tracked.
  • No semantic analysis; identifiers are classified via small keyword/type sets.
  • Additional languages (JSON, Markdown, Shell, Python, Go, Rust, Lisp, …) are planned.
  • Terminal color mapping is conservative to support 8/16-color terminals. Rich color-pair themes can be added later.

Renderer integration

  • Terminal and GUI renderers request line spans via Highlighter()->GetLine(buf, row, buf.Version()).
  • Search highlight and cursor overlays take precedence over syntax colors.

Renderer-side robustness

  • Renderers defensively sanitize HighlightSpan data before use to ensure stability even if a highlighter misbehaves:
    • Clamp col_start/col_end to the line length and ensure end >= start.
    • Drop empty/invalid spans and sort by start.
    • Clip drawing to the horizontally visible region and the tab-expanded line length.
  • The highlighter engine returns LineHighlight by value to avoid cross-thread lifetime issues; renderers operate on a local copy for each frame.

Extensibility (Phase 4)

  • Public registration API: external code can register custom highlighters by filetype.
    • Use HighlighterRegistry::Register("mylang", []{ return std::make_unique<MyHighlighter>(); });
    • Registered factories are preferred over built-ins for the same filetype key.
    • Filetype keys are normalized via HighlighterRegistry::Normalize().
  • Optional Tree-sitter adapter: disabled by default to keep dependencies minimal.
    • Enable with CMake option -DKTE_ENABLE_TREESITTER=ON and provide -DTREESITTER_INCLUDE_DIR=... and -DTREESITTER_LIBRARY=... if needed.
    • Register a Tree-sitter-backed highlighter for a language (example assumes you link a grammar):
      extern "C" const TSLanguage* tree_sitter_c();
      kte::HighlighterRegistry::RegisterTreeSitter("c", &tree_sitter_c);
      
    • Current adapter is a stub scaffold; it compiles and integrates cleanly when enabled, but intentionally emits no spans until Tree-sitter node-to-token mapping is implemented.