Files
kte/docs/plans/syntax on.md
Kyle Isom cbbde43dc2 Stub out previous undo implementation; update docs.
- Remove outdated `undo-state.md`
- Add two code quality/optimization reports that were used to guide previous work:
  - `code-report.md` (optimization)
  - `code-report-quality.md` (stability and code health)
- Add `themes.md`.
- Update undo system docs and roadmap.
2025-12-03 15:12:28 -08:00

102 lines
6.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
### Objective
Introduce fast, minimaldependency syntax highlighting to kte, consistent with current architecture (Editor/Buffer + GUI/Terminal renderers), preserving ke UX and performance.
### Guiding principles
- Keep core small and fast; no heavy deps (C++17 only).
- Start simple (stateless line regex), evolve incrementally (stateful, caching).
- Work in both Terminal (ncurses) and GUI (ImGui) with consistent token classes and theme mapping.
- Integrate without disrupting existing search highlight, selection, or cursor rendering.
### Scope of v1
- Languages: plain text (off), C/C++ minimal set (keywords, types, strings, chars, comments, numbers, preprocessor).
- Stateless perline highlighting; handle singleline comments and strings; defer multiline state to v2.
- Toggle: `:syntax on|off` and perbuffer filetype selection.
### Architecture
1. Core types (new):
- `enum class TokenKind { Default, Keyword, Type, String, Char, Comment, Number, Preproc, Constant, Function, Operator, Punctuation, Identifier, Whitespace, Error };`
- `struct HighlightSpan { int col_start; int col_end; TokenKind kind; };` // 0based columns in buffer indices per rendered line
- `struct LineHighlight { std::vector<HighlightSpan> spans; uint64_t version; };`
2. Interfaces (new):
- `class LanguageHighlighter { public: virtual ~LanguageHighlighter() = default; virtual void HighlightLine(const Buffer& buf, int row, std::vector<HighlightSpan>& out) const = 0; virtual bool Stateful() const { return false; } };`
- `class HighlighterEngine { public: void SetHighlighter(std::unique_ptr<LanguageHighlighter>); const LineHighlight& GetLine(const Buffer&, int row, uint64_t buf_version); void InvalidateFrom(int row); };`
- `class HighlighterRegistry { public: static const LanguageHighlighter& ForFiletype(std::string_view ft); static std::string DetectForPath(std::string_view path, std::string_view first_line); };`
3. Editor/Buffer integration:
- PerBuffer settings: `bool syntax_enabled; std::string filetype; std::unique_ptr<HighlighterEngine> highlighter;`
- Buffer emits a monotonically increasing `version` on edit; renderers request line highlights by `(row, version)`.
- Invalidate cache minimally on edits (v1: current line only; v2: from current line down when stateful constructs present).
### Rendering integration
- TerminalRenderer/GUIRenderer changes:
- During line rendering, query `Editor.CurrentBuffer()->highlighter->GetLine(buf, row, buf_version)` to obtain spans.
- Apply token styles while drawing glyph runs.
- Zorder and blending:
1) Backgrounds (e.g., selection, search highlight rectangles)
2) Text with syntax colors
3) Cursor/IME decorations
- Search highlights must remain visible over syntax colors:
- Terminal: combine color/attr with reverse/bold for search; if color conflicts, prefer search.
- GUI: draw semitransparent rects behind text (already present); keep syntax color for text.
### Theme and color mapping
- Extend `GUITheme.h` with a `SyntaxPalette` mapping `TokenKind -> ImVec4 ink` (and optional background tint for comments/strings disabled by default). Provide default Light/Dark palettes.
- Terminal: map `TokenKind` to ncurses color pairs where available; degrade gracefully on 8/16color terminals (e.g., comments=dim, keywords=bold, strings=yellow/green if available).
### Language detection
- v1: by file extension; allow manual `:set filetype=<lang>`.
- v2: add shebang detection for scripts, simple modelines (optional).
### Commands/UX
- `:syntax on|off` — global default; buffer inherits on open.
- `:set filetype=<lang>` — perbuffer override.
- `:syntax reload` — rebuild patterns/themes.
- Status line shows filetype and syntax state when changed.
### Implementation plan (phased)
1. Phase 1 — Minimal regex highlighter for C/C++
- Implement `CppRegexHighlighter : LanguageHighlighter` with precompiled `std::regex` (or handrolled simple scanners to avoid regex backtracking). Classes: line comment `//…`, block comment start `/*` (no state), string `"…"`, char `'…'` (no multiline), numbers, keywords/types, preprocessor `^\s*#\w+`.
- Add `HighlighterEngine` with a simple perrow cache keyed by `(row, buf_version)`; no background worker.
- Integrate into both renderers; add palette to `GUITheme.h`; add terminal color selection.
- Add commands.
2. Phase 2 — Stateful constructs and more languages
- Add state machine for multiline comments `/*…*/` and multiline strings (C++11 raw strings), with invalidation from edit line downward until state stabilizes.
- Add simple highlighters: JSON (strings, numbers, booleans, null, punctuation), Markdown (headers/emphasis/code fences), Shell (comments, strings, keywords), Go (types, constants, keywords), Python (strings, comments, keywords), Rust (strings, comments, keywords), Lisp (comments, strings, keywords),.
- Filetype detection by extension + shebang.
3. Phase 3 — Performance and caching
- Viewportfirst highlighting: compute only visible rows each frame; background task warms cache around viewport.
- Reuse span buffers, avoid allocations; smallvector optimization if needed.
- Bench with large files; ensure O(n_visible) cost per frame.
4. Phase 4 — Extensibility
- Public registration API for external highlighters.
- Optional Treesitter adapter behind a compile flag (off by default) to keep dependencies minimal.
### Data flow (per frame)
- Renderer asks Editor for Buffer and viewport rows.
- For each row: `engine.GetLine(buf, row, buf.version)` → spans.
- Renderer emits runs with style from `SyntaxPalette[kind]`.
- Search highlights are applied as separate background rectangles (GUI) or attribute toggles (Terminal), not overriding text color.
### Testing
- Unit tests for tokenization per language: golden inputs → spans.
- Fuzz/edge cases: escaped quotes, numeric literals, preprocessor lines.
- Renderer tests with `TestRenderer` asserting the sequence of style changes for a line.
- Performance tests: highlight 1k visible lines repeatedly; assert time under threshold.
### Risks and mitigations
- Regex backtracking/perf: prefer linear scans; precompute keyword tables; avoid nested regex.
- Terminal color limitations: featuredetect colors; provide bold/dim fallbacks.
- Stateful correctness: invalidate conservatively (from edit line downward) and cap work per frame.
### Deliverables
- New files: `Highlight.h/.cc`, `HighlighterEngine.h/.cc`, `LanguageHighlighter.h`, `CppHighlighter.h/.cc`, optional `HighlighterRegistry.h/.cc`.
- Renderer updates: `GUIRenderer.cc`, `TerminalRenderer.cc` to consume spans.
- Theming: `GUITheme.h` additions for syntax colors.
- Editor/Buffer: perbuffer syntax settings and highlighter handle.
- Commands in `Command.cc` and help text updates.
- Docs: README/ROADMAP update and a brief `docs/syntax.md`.
- Tests: unit and renderer golden tests.