kte/docs/plans/syntax on.md
Kyle Isom cbbde43dc2 Stub out previous undo implementation; update docs.
- Remove outdated `undo-state.md`
- Add two code quality/optimization reports that were used to guide previous work:
  - `code-report.md` (optimization)
  - `code-report-quality.md` (stability and code health)
- Add `themes.md`.
- Update undo system docs and roadmap.
2025-12-03 15:12:28 -08:00


Objective

Introduce fast, minimal-dependency syntax highlighting to kte, consistent with the current architecture (Editor/Buffer + GUI/Terminal renderers) and preserving ke UX and performance.

Guiding principles

  • Keep the core small and fast; no heavy dependencies (C++17 only).
  • Start simple (stateless per-line regex) and evolve incrementally (stateful highlighting, caching).
  • Work in both Terminal (ncurses) and GUI (ImGui) with consistent token classes and theme mapping.
  • Integrate without disrupting existing search highlight, selection, or cursor rendering.

Scope of v1

  • Languages: plain text (off), C/C++ minimal set (keywords, types, strings, chars, comments, numbers, preprocessor).
  • Stateless per-line highlighting; handle single-line comments and strings; defer multi-line state to v2.
  • Toggle: :syntax on|off and per-buffer filetype selection.

Architecture

  1. Core types (new):

    • enum class TokenKind { Default, Keyword, Type, String, Char, Comment, Number, Preproc, Constant, Function, Operator, Punctuation, Identifier, Whitespace, Error };
    • struct HighlightSpan { int col_start; int col_end; TokenKind kind; }; // 0-based columns (buffer indices) within each rendered line
    • struct LineHighlight { std::vector<HighlightSpan> spans; uint64_t version; };
  2. Interfaces (new):

    • class LanguageHighlighter { public: virtual ~LanguageHighlighter() = default; virtual void HighlightLine(const Buffer& buf, int row, std::vector<HighlightSpan>& out) const = 0; virtual bool Stateful() const { return false; } };
    • class HighlighterEngine { public: void SetHighlighter(std::unique_ptr<LanguageHighlighter>); const LineHighlight& GetLine(const Buffer&, int row, uint64_t buf_version); void InvalidateFrom(int row); };
    • class HighlighterRegistry { public: static const LanguageHighlighter& ForFiletype(std::string_view ft); static std::string DetectForPath(std::string_view path, std::string_view first_line); };
  3. Editor/Buffer integration:

    • Per-buffer settings: bool syntax_enabled; std::string filetype; std::unique_ptr<HighlighterEngine> highlighter;
    • Buffer exposes a version number that increases monotonically on every edit; renderers request line highlights by (row, version).
    • Invalidate the cache minimally on edits (v1: current line only; v2: from the current line down when stateful constructs are present).
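The following is a minimal sketch of the core types and a per-row cache keyed by buffer version. It is written against a hypothetical stand-in `Buffer` that holds lines and a `version` counter; kte's real `Buffer` API may differ. Two further assumptions: spans are taken to be half-open `[col_start, col_end)`, and `HashCommentHighlighter` is a toy highlighter that exists only to exercise the engine.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class TokenKind { Default, Keyword, Type, String, Char, Comment, Number, Preproc,
                       Constant, Function, Operator, Punctuation, Identifier, Whitespace, Error };

// Assumption: spans are half-open [col_start, col_end); the plan does not say.
struct HighlightSpan { int col_start; int col_end; TokenKind kind; };
struct LineHighlight { std::vector<HighlightSpan> spans; std::uint64_t version = 0; };

// Hypothetical stand-in for kte's Buffer: text lines plus an edit counter.
struct Buffer {
    std::vector<std::string> lines;
    std::uint64_t version = 0;
    void SetLine(int row, std::string text) { lines[row] = std::move(text); ++version; }
};

class LanguageHighlighter {
public:
    virtual ~LanguageHighlighter() = default;
    virtual void HighlightLine(const Buffer& buf, int row, std::vector<HighlightSpan>& out) const = 0;
    virtual bool Stateful() const { return false; }
};

class HighlighterEngine {
public:
    void SetHighlighter(std::unique_ptr<LanguageHighlighter> hl) {
        hl_ = std::move(hl);
        cache_.clear();  // cached spans belong to the previous highlighter
    }

    // Cache hit only if the row was computed for this exact buffer version.
    const LineHighlight& GetLine(const Buffer& buf, int row, std::uint64_t buf_version) {
        auto it = cache_.find(row);
        if (it != cache_.end() && it->second.version == buf_version) return it->second;
        LineHighlight& entry = cache_[row];
        entry.spans.clear();
        if (hl_) hl_->HighlightLine(buf, row, entry.spans);
        entry.version = buf_version;
        ++recomputes;
        return entry;
    }

    // Drop cached rows at or below `row` (what a stateful highlighter needs after an edit).
    void InvalidateFrom(int row) {
        for (auto it = cache_.begin(); it != cache_.end();) {
            if (it->first >= row) it = cache_.erase(it);
            else it = std::next(it);
        }
    }

    int recomputes = 0;  // instrumentation for this example only

private:
    std::unique_ptr<LanguageHighlighter> hl_;
    std::unordered_map<int, LineHighlight> cache_;
};

// Hypothetical toy highlighter: from the first '#' to end of line is a comment.
class HashCommentHighlighter : public LanguageHighlighter {
public:
    void HighlightLine(const Buffer& buf, int row, std::vector<HighlightSpan>& out) const override {
        const std::string& s = buf.lines[row];
        auto pos = s.find('#');
        if (pos != std::string::npos)
            out.push_back({static_cast<int>(pos), static_cast<int>(s.size()), TokenKind::Comment});
    }
};
```

Keying the cache on the buffer-wide version is simple and never returns stale spans. The cost is that any edit makes every cached row miss once. The v1 "current line only" invalidation would need per-line versions, which this sketch does not attempt.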

Rendering integration

  • TerminalRenderer/GUIRenderer changes:
    • During line rendering, query Editor.CurrentBuffer()->highlighter->GetLine(buf, row, buf_version) to obtain spans.
    • Apply token styles while drawing glyph runs.
  • Z-order and blending:
    1. Backgrounds (e.g., selection, search highlight rectangles)
    2. Text with syntax colors
    3. Cursor/IME decorations
  • Search highlights must remain visible over syntax colors:
    • Terminal: combine the syntax color/attribute with reverse or bold for search matches; if the colors conflict, prefer the search style.
    • GUI: draw semi-transparent rects behind the text (already present) and keep the syntax color for the text itself.
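This sketch shows one way a renderer might split a line into drawable runs so that search highlighting and syntax color coexist. It rests on three assumptions: half-open spans, at most one search match per line, and the hypothetical names `BuildRuns` and `Run`. Each run keeps its syntax kind, so the text color survives. It also carries an `in_search` flag, which the terminal path could map to reverse video and the GUI path to a background rectangle.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

enum class TokenKind { Default, Keyword, String, Comment };  // trimmed for the example

struct HighlightSpan { int col_start; int col_end; TokenKind kind; };  // assumed [start, end)
struct Run { int start; int end; TokenKind kind; bool in_search; };

// Cut [0, len) at every span and search boundary so each run has exactly one style.
std::vector<Run> BuildRuns(int len, const std::vector<HighlightSpan>& spans,
                           int search_start, int search_end) {
    std::set<int> cuts{0, len};
    for (const auto& s : spans) {
        cuts.insert(std::clamp(s.col_start, 0, len));
        cuts.insert(std::clamp(s.col_end, 0, len));
    }
    cuts.insert(std::clamp(search_start, 0, len));
    cuts.insert(std::clamp(search_end, 0, len));

    std::vector<Run> runs;
    for (auto it = cuts.begin(); std::next(it) != cuts.end(); ++it) {
        const int a = *it;
        const int b = *std::next(it);
        TokenKind kind = TokenKind::Default;
        for (const auto& s : spans)  // a later span wins if spans overlap
            if (s.col_start <= a && b <= s.col_end) kind = s.kind;
        const bool in_search = search_start <= a && b <= search_end;
        runs.push_back({a, b, kind, in_search});  // syntax kind kept even inside a match
    }
    return runs;
}
```

Passing an empty search range (for example `0, 0`) yields runs with `in_search` false everywhere.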

Theme and color mapping

  • Extend GUITheme.h with a SyntaxPalette mapping TokenKind -> ImVec4 ink (plus an optional background tint for comments/strings, disabled by default). Provide default Light/Dark palettes.
  • Terminal: map TokenKind to ncurses color pairs where available; degrade gracefully on 8/16-color terminals (e.g., comments=dim, keywords=bold, strings=yellow/green if available).
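Here is a sketch of a palette indexed by `TokenKind`, plus a terminal fallback that follows the degradation rules above. To compile without ImGui, a stand-in `Color` struct replaces `ImVec4`. A trailing `Count` sentinel, which is not in the plan's enum, is added for array sizing. The color values and the `TermStyle`/`FallbackStyle` names are illustrative assumptions, not kte's theme. The color constants follow the standard ANSI 0 to 7 order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Count is an added sentinel (assumption) used only to size the palette array.
enum class TokenKind { Default, Keyword, Type, String, Char, Comment, Number, Preproc,
                       Constant, Function, Operator, Punctuation, Identifier, Whitespace, Error, Count };

struct Color { float r, g, b, a; };  // stand-in for ImVec4

struct SyntaxPalette {
    std::array<Color, static_cast<std::size_t>(TokenKind::Count)> ink{};
    Color& operator[](TokenKind k) { return ink[static_cast<std::size_t>(k)]; }
    const Color& operator[](TokenKind k) const { return ink[static_cast<std::size_t>(k)]; }
};

SyntaxPalette DefaultDarkPalette() {
    SyntaxPalette p;
    p.ink.fill({0.85f, 0.85f, 0.85f, 1.0f});  // every kind starts as default text color
    p[TokenKind::Keyword] = {0.80f, 0.60f, 1.00f, 1.0f};
    p[TokenKind::String]  = {0.60f, 0.85f, 0.50f, 1.0f};
    p[TokenKind::Comment] = {0.50f, 0.55f, 0.60f, 1.0f};
    p[TokenKind::Number]  = {0.95f, 0.70f, 0.40f, 1.0f};
    return p;
}

// Terminal fallback: color index (-1 = terminal default) plus bold/dim attributes.
struct TermStyle { int color; bool bold; bool dim; };
enum { kBlack, kRed, kGreen, kYellow, kBlue, kMagenta, kCyan, kWhite };  // ANSI 0..7

TermStyle FallbackStyle(TokenKind k, int num_colors) {
    const bool has_color = num_colors >= 8;
    switch (k) {
        case TokenKind::Keyword: return {-1, true, false};   // bold works without color
        case TokenKind::Comment: return {-1, false, true};   // dim works without color
        case TokenKind::String:
        case TokenKind::Char:    return {has_color ? kGreen : -1, false, false};
        case TokenKind::Number:  return {has_color ? kYellow : -1, false, false};
        case TokenKind::Preproc: return {has_color ? kMagenta : -1, false, false};
        default:                 return {-1, false, false};
    }
}
```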

Language detection

  • v1: by file extension; allow manual :set filetype=<lang>.
  • v2: add shebang detection for scripts and, optionally, simple modelines.
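This sketch of `DetectForPath` follows the v1/v2 split: try the file extension first, then fall back to the shebang interpreter. The plan fixes only the signature. The extension table, the filetype names, the `"text"` result for plain text, and the `ShebangInterpreter` helper are all assumptions.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical helper: interpreter basename from a shebang line, skipping "env".
std::string_view ShebangInterpreter(std::string_view line) {
    if (line.substr(0, 2) != "#!") return {};
    line.remove_prefix(2);
    auto next_word = [&line]() -> std::string_view {
        auto b = line.find_first_not_of(" \t");
        if (b == std::string_view::npos) return {};
        line.remove_prefix(b);
        std::string_view w = line.substr(0, line.find_first_of(" \t"));
        line.remove_prefix(w.size());
        auto slash = w.find_last_of('/');
        return slash == std::string_view::npos ? w : w.substr(slash + 1);  // basename
    };
    std::string_view cmd = next_word();
    if (cmd == "env") cmd = next_word();  // "#!/usr/bin/env python3" names python3
    return cmd;
}

// Returns a filetype name, or "text" (highlighting off) when nothing matches.
std::string DetectForPath(std::string_view path, std::string_view first_line) {
    static const std::unordered_map<std::string_view, std::string_view> kByExtension = {
        {"c", "cpp"}, {"h", "cpp"}, {"cc", "cpp"}, {"cpp", "cpp"}, {"hpp", "cpp"},
        {"json", "json"}, {"md", "markdown"}, {"sh", "shell"}, {"go", "go"},
        {"py", "python"}, {"rs", "rust"}, {"lisp", "lisp"},
    };
    // v1: extension of the last path component.
    auto slash = path.find_last_of('/');
    std::string_view name = (slash == std::string_view::npos) ? path : path.substr(slash + 1);
    auto dot = name.find_last_of('.');
    if (dot != std::string_view::npos) {
        auto it = kByExtension.find(name.substr(dot + 1));
        if (it != kByExtension.end()) return std::string(it->second);
    }
    // v2: shebang fallback for extensionless scripts.
    std::string_view interp = ShebangInterpreter(first_line);
    if (interp.substr(0, 6) == "python") return "python";
    if (interp == "sh" || interp == "bash" || interp == "zsh") return "shell";
    return "text";
}
```

A manual `:set filetype=<lang>` would simply override whatever this returns.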

Commands/UX

  • :syntax on|off sets the global default; each buffer inherits it on open.
  • :set filetype=<lang> overrides the filetype per buffer.
  • :syntax reload rebuilds patterns/themes.
  • The status line shows the filetype and syntax state when either changes.
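The next sketch parses the three commands into a small action value that the editor could then apply. The `SyntaxCommand` and `ParseSyntaxCommand` names are hypothetical, and the input is assumed to be the text after the leading `:`. kte's actual dispatch lives in `Command.cc` and may be structured differently.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

struct SyntaxCommand {
    enum class Kind { On, Off, Reload, SetFiletype } kind;
    std::string filetype;  // used only by SetFiletype
};

// Accepts e.g. "syntax on", "syntax reload", "set filetype=go"; anything else is rejected.
std::optional<SyntaxCommand> ParseSyntaxCommand(std::string_view cmd) {
    using K = SyntaxCommand::Kind;
    if (cmd == "syntax on") return SyntaxCommand{K::On, {}};
    if (cmd == "syntax off") return SyntaxCommand{K::Off, {}};
    if (cmd == "syntax reload") return SyntaxCommand{K::Reload, {}};
    constexpr std::string_view kSet = "set filetype=";
    if (cmd.size() > kSet.size() && cmd.substr(0, kSet.size()) == kSet)
        return SyntaxCommand{K::SetFiletype, std::string(cmd.substr(kSet.size()))};
    return std::nullopt;  // caller can report "unknown command"
}
```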

Implementation plan (phased)

  1. Phase 1 — Minimal regex highlighter for C/C++

    • Implement CppRegexHighlighter : LanguageHighlighter with precompiled std::regex (or hand-rolled simple scanners to avoid regex backtracking). Token classes: line comment //…, block comment start /* (no state), string "…", char '…' (no multi-line), numbers, keywords/types, preprocessor ^\s*#\w+.
    • Add HighlighterEngine with a simple per-row cache keyed by (row, buf_version); no background worker.
    • Integrate into both renderers; add palette to GUITheme.h; add terminal color selection.
    • Add commands.
  2. Phase 2 — Stateful constructs and more languages

    • Add a state machine for multi-line comments /*…*/ and multi-line strings (C++11 raw strings), invalidating from the edited line downward until the carried state stabilizes.
    • Add simple highlighters: JSON (strings, numbers, booleans, null, punctuation), Markdown (headers/emphasis/code fences), Shell (comments, strings, keywords), Go (types, constants, keywords), Python (strings, comments, keywords), Rust (strings, comments, keywords), Lisp (comments, strings, keywords).
    • Filetype detection by extension + shebang.
  3. Phase 3 — Performance and caching

    • Viewport-first highlighting: compute only the visible rows each frame; a background task warms the cache around the viewport.
    • Reuse span buffers and avoid allocations; use a small-vector optimization if needed.
    • Bench with large files; ensure O(n_visible) cost per frame.
  4. Phase 4 — Extensibility

    • Public registration API for external highlighters.
    • Optional Tree-sitter adapter behind a compile flag (off by default) to keep dependencies minimal.
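Phase 1 offers hand-rolled scanners as an alternative to `std::regex`. The sketch below is one such scanner: a single left-to-right pass per line with no state carried between lines. It makes several assumptions:

* Spans are half-open.
* The keyword/type tables are a small illustrative subset.
* An unterminated `/*` is treated as a comment to end of line, matching "block comment start (no state)".
* `HighlightCppLine` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>
#include <unordered_set>
#include <vector>

enum class TokenKind { Default, Keyword, Type, String, Char, Comment, Number, Preproc };
struct HighlightSpan { int col_start; int col_end; TokenKind kind; };  // assumed [start, end)

static bool IsIdentStart(char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; }
static bool IsIdentChar(char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; }

// Stateless: each line is scanned on its own, left to right, in one pass.
void HighlightCppLine(std::string_view s, std::vector<HighlightSpan>& out) {
    static const std::unordered_set<std::string_view> kKeywords = {
        "if", "else", "for", "while", "return", "class", "struct", "const", "static", "namespace"};
    static const std::unordered_set<std::string_view> kTypes = {
        "int", "char", "bool", "void", "float", "double", "auto", "long", "unsigned"};
    const int n = static_cast<int>(s.size());
    int i = 0;
    while (i < n && (s[i] == ' ' || s[i] == '\t')) ++i;
    if (i < n && s[i] == '#') {                                   // preprocessor: ^\s*#\w+
        int start = i++;
        while (i < n && IsIdentChar(s[i])) ++i;
        out.push_back({start, i, TokenKind::Preproc});
    }
    while (i < n) {
        const char c = s[i];
        const int start = i;
        if (c == '/' && i + 1 < n && s[i + 1] == '/') {           // line comment to end of line
            out.push_back({start, n, TokenKind::Comment});
            return;
        }
        if (c == '/' && i + 1 < n && s[i + 1] == '*') {           // block comment, this line only
            auto close = s.find("*/", i + 2);
            i = (close == std::string_view::npos) ? n : static_cast<int>(close) + 2;
            out.push_back({start, i, TokenKind::Comment});
        } else if (c == '"' || c == '\'') {                       // string or char literal
            ++i;
            while (i < n && s[i] != c) i += (s[i] == '\\') ? 2 : 1;  // skip escaped chars
            i = std::min(i + 1, n);                               // include closing quote if present
            out.push_back({start, i, c == '"' ? TokenKind::String : TokenKind::Char});
        } else if (std::isdigit(static_cast<unsigned char>(c))) { // e.g. 42, 0x1F, 1.5, 10u
            while (i < n && (IsIdentChar(s[i]) || s[i] == '.' || s[i] == '\'')) ++i;
            out.push_back({start, i, TokenKind::Number});
        } else if (IsIdentStart(c)) {                             // keyword, type, or identifier
            while (i < n && IsIdentChar(s[i])) ++i;
            const std::string_view word = s.substr(start, i - start);
            if (kKeywords.count(word)) out.push_back({start, i, TokenKind::Keyword});
            else if (kTypes.count(word)) out.push_back({start, i, TokenKind::Type});
        } else {
            ++i;                                                  // whitespace, operators, punctuation
        }
    }
}
```

Each character is visited a bounded number of times, so the cost is linear in line length and cannot suffer regex backtracking.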

Data flow (per frame)

  • Renderer asks Editor for Buffer and viewport rows.
  • For each row: engine.GetLine(buf, row, buf.version) → spans.
  • Renderer emits runs with style from SyntaxPalette[kind].
  • Search highlights are applied as separate background rectangles (GUI) or attribute toggles (Terminal), not overriding text color.
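The per-frame flow above can be sketched with a hypothetical recording sink in place of a real renderer. The sketch visits only the rows in the viewport and reuses one span vector across rows, in the spirit of Phase 3's "reuse span buffers". It fills any gap between spans with `Default`. It assumes spans arrive sorted, non-overlapping and half-open; `RenderFrame`, `StyledRun` and the callback type are illustrative names, not kte's API.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

enum class TokenKind { Default, Keyword, Comment };
struct HighlightSpan { int col_start; int col_end; TokenKind kind; };  // sorted, non-overlapping, [start, end)
struct StyledRun { int row; std::string text; TokenKind kind; };

using HighlightFn = std::function<void(std::string_view, std::vector<HighlightSpan>&)>;

// Touches only rows in [top, top + height): cost is O(visible rows), not O(buffer size).
void RenderFrame(const std::vector<std::string>& lines, int top, int height,
                 const HighlightFn& highlight, std::vector<StyledRun>& sink) {
    std::vector<HighlightSpan> spans;  // reused across rows
    const int end = std::min(top + height, static_cast<int>(lines.size()));
    for (int row = top; row < end; ++row) {
        const std::string& line = lines[row];
        spans.clear();  // keeps capacity, so steady-state frames do not reallocate
        highlight(line, spans);
        int col = 0;
        for (const auto& s : spans) {
            if (s.col_start > col)  // unstyled gap before this span
                sink.push_back({row, line.substr(col, s.col_start - col), TokenKind::Default});
            sink.push_back({row, line.substr(s.col_start, s.col_end - s.col_start), s.kind});
            col = s.col_end;
        }
        if (col < static_cast<int>(line.size()))
            sink.push_back({row, line.substr(col), TokenKind::Default});
    }
}
```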

Testing

  • Unit tests for tokenization per language: golden inputs → spans.
  • Fuzz/edge cases: escaped quotes, numeric literals, preprocessor lines.
  • Renderer tests with TestRenderer asserting the sequence of style changes for a line.
  • Performance tests: highlight 1k visible lines repeatedly; assert that the time stays under a threshold.
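One way to keep golden tokenization tests readable is to format spans as a compact string, such as `String[4,8) Comment[9,16)`, and compare it against a literal. The sketch below does that. `FormatSpans`, `GoldenCase` and the toy `HighlightHashComment` highlighter are hypothetical; a real test would call the language highlighter under test instead.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

enum class TokenKind { Default, String, Comment };
struct HighlightSpan { int col_start; int col_end; TokenKind kind; };

const char* KindName(TokenKind k) {
    switch (k) {
        case TokenKind::String:  return "String";
        case TokenKind::Comment: return "Comment";
        default:                 return "Default";
    }
}

// Space-separated "Kind[start,end)" entries; empty string when there are no spans.
std::string FormatSpans(const std::vector<HighlightSpan>& spans) {
    std::string out;
    for (const auto& s : spans) {
        if (!out.empty()) out += ' ';
        out += KindName(s.kind);
        out += '[' + std::to_string(s.col_start) + ',' + std::to_string(s.col_end) + ')';
    }
    return out;
}

// Toy highlighter for the example: "..." strings (no escapes) and '#' comments.
void HighlightHashComment(std::string_view s, std::vector<HighlightSpan>& out) {
    const int n = static_cast<int>(s.size());
    for (int i = 0; i < n; ++i) {
        if (s[i] == '#') { out.push_back({i, n, TokenKind::Comment}); return; }
        if (s[i] == '"') {
            const int start = i++;
            while (i < n && s[i] != '"') ++i;
            out.push_back({start, i < n ? i + 1 : n, TokenKind::String});
        }
    }
}

struct GoldenCase { std::string_view input; std::string_view expected; };
```

When a case fails, the formatted string can be printed next to the expected literal, which makes the diff obvious.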

Risks and mitigations

  • Regex backtracking/perf: prefer linear scans; precompute keyword tables; avoid nested regex.
  • Terminal color limitations: feature-detect colors; provide bold/dim fallbacks.
  • Stateful correctness: invalidate conservatively (from edit line downward) and cap work per frame.
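Phase 2's invalidation rule and the "cap work per frame" mitigation can be combined as in this sketch. Each line records whether it ends inside a `/* ... */` comment. After an edit, rows are rescanned from the edited row downward, stopping once a row's recomputed end state matches its cached one (rows below then see unchanged input). A per-call row budget lets the remaining work resume on the next frame. `LineState`, `ScanAll`, `Rescan` and `RescanResult` are hypothetical names. The scanner tracks only block comments: it ignores string literals and raw strings, so `"/*"` inside a string would be misread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct LineState { bool in_block_comment = false; };  // state carried into the next line

// State at the end of `line`, given the state at its start.
LineState ScanBlockComments(std::string_view line, LineState st) {
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (st.in_block_comment) {
            if (line.compare(i, 2, "*/") == 0) { st.in_block_comment = false; ++i; }
        } else if (line.compare(i, 2, "//") == 0) {
            break;  // rest of the line is a line comment
        } else if (line.compare(i, 2, "/*") == 0) {
            st.in_block_comment = true; ++i;
        }
    }
    return st;
}

std::vector<LineState> ScanAll(const std::vector<std::string>& lines) {
    std::vector<LineState> ends;
    LineState st;
    for (const auto& l : lines) { st = ScanBlockComments(l, st); ends.push_back(st); }
    return ends;
}

struct RescanResult { int rescanned; int resume_row; };  // resume_row == -1 when finished

// Rescan from `row` down until the end state stabilizes or `budget` rows are spent.
RescanResult Rescan(const std::vector<std::string>& lines, std::vector<LineState>& end_states,
                    int row, int budget) {
    LineState st = row > 0 ? end_states[row - 1] : LineState{};
    const int n = static_cast<int>(lines.size());
    int rescanned = 0;
    for (int r = row; r < n; ++r) {
        if (rescanned == budget) return {rescanned, r};  // out of budget: continue next frame
        const LineState next = ScanBlockComments(lines[r], st);
        ++rescanned;
        const bool stable = next.in_block_comment == end_states[r].in_block_comment;
        end_states[r] = next;
        if (stable) break;  // rows below start from the same state as before
        st = next;
    }
    return {rescanned, -1};
}
```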

Deliverables

  • New files: Highlight.h/.cc, HighlighterEngine.h/.cc, LanguageHighlighter.h, CppHighlighter.h/.cc, optional HighlighterRegistry.h/.cc.
  • Renderer updates: GUIRenderer.cc, TerminalRenderer.cc to consume spans.
  • Theming: GUITheme.h additions for syntax colors.
  • Editor/Buffer: per-buffer syntax settings and highlighter handle.
  • Commands in Command.cc and help text updates.
  • Docs: README/ROADMAP update and a brief docs/syntax.md.
  • Tests: unit and renderer golden tests.