- Introduced `HighlighterRegistry` with support for multiple language highlighters (e.g., JSON, Markdown, Python). - Added `JsonHighlighter` implementation for basic JSON syntax highlighting.
Syntax highlighting in kte
Overview
kte provides lightweight syntax highlighting with a pluggable highlighter interface. The initial implementation targets C/C++ and focuses on speed and responsiveness.
Core types
- `TokenKind`: token categories (keywords, types, strings, comments, numbers, preprocessor, operators, punctuation, identifiers, whitespace, etc.).
- `HighlightSpan`: a half-open column range `[col_start, col_end)` with a `TokenKind`.
- `LineHighlight`: a vector of `HighlightSpan` and the buffer `version` used to compute it.
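As a rough illustration of how these three types could fit together, here is a minimal sketch. The enumerator names, the field types, and the `KindAt` helper are assumptions made for the example; the actual kte declarations may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical enumerators: the categories come from the list above,
// but the exact names in kte are not confirmed.
enum class TokenKind {
    Default, Keyword, Type, String, Comment, Number,
    Preprocessor, Operator, Punctuation, Identifier, Whitespace
};

// Half-open column range [col_start, col_end) tagged with a token kind.
struct HighlightSpan {
    int col_start;
    int col_end;
    TokenKind kind;
};

// Spans for one line plus the buffer version they were computed from.
struct LineHighlight {
    std::vector<HighlightSpan> spans;
    std::uint64_t version = 0;
};

// Illustrative helper (not part of kte): returns the kind covering `col`,
// or TokenKind::Default when no span covers it.
inline TokenKind KindAt(const LineHighlight& lh, int col) {
    for (const HighlightSpan& s : lh.spans) {
        assert(s.col_start <= s.col_end);  // spans must be well-formed
        if (col >= s.col_start && col < s.col_end) return s.kind;
    }
    return TokenKind::Default;
}
```

Because the range is half-open, a span `[0, 3)` covers columns 0 through 2, and adjacent spans can share a boundary without overlapping.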
Engine and caching
- `HighlighterEngine` maintains a per-line cache of `LineHighlight` keyed by row and buffer version.
- Cache invalidation occurs when the buffer version changes or when the buffer calls `InvalidateFrom(row)`, which clears cached lines and line states from `row` downward.
- The engine supports both stateless and stateful highlighters. For stateful highlighters, it memoizes a simple per-line state and computes lines sequentially when necessary.
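The caching rules above can be sketched with a small stand-alone cache. `LineCache`, `CachedLine`, `Lookup`, and `Store` are hypothetical names used only for illustration, and `span_count` stands in for the real span vector; only `InvalidateFrom(row)` is named in this document.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a cached LineHighlight; span_count replaces the span vector.
struct CachedLine {
    bool valid = false;
    std::uint64_t version = 0;
    int span_count = 0;
};

class LineCache {
public:
    // Returns the entry for `row` only if it was computed for `version`;
    // a stale or missing entry yields nullptr so the caller recomputes.
    const CachedLine* Lookup(std::size_t row, std::uint64_t version) const {
        if (row >= lines_.size()) return nullptr;
        const CachedLine& e = lines_[row];
        return (e.valid && e.version == version) ? &e : nullptr;
    }

    // Records a freshly computed line for the given buffer version.
    void Store(std::size_t row, std::uint64_t version, int span_count) {
        if (row >= lines_.size()) lines_.resize(row + 1);
        lines_[row].valid = true;
        lines_[row].version = version;
        lines_[row].span_count = span_count;
    }

    // Drops every cached entry from `row` downward, mirroring the
    // InvalidateFrom(row) behavior described above.
    void InvalidateFrom(std::size_t row) {
        if (row < lines_.size()) lines_.resize(row);
    }

private:
    std::vector<CachedLine> lines_;
};
```

Keying on the version means an edit anywhere in the buffer makes old entries miss on lookup, while `InvalidateFrom` lets the buffer discard only the rows at or below the edit.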
Stateful highlighters
- `LanguageHighlighter` is the base interface for stateless per-line tokenization.
- `StatefulHighlighter` extends it with a `LineState` and the method `HighlightLineStateful(buf, row, prev_state, out)`.
- The engine detects `StatefulHighlighter` via `dynamic_cast` and feeds each line the previous line's state, caching the resulting state per line.
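The detection and state-threading flow might look roughly like the sketch below. It is a simplification under several assumptions: `Buffer` is a plain vector of strings, the output is a per-line label rather than spans, `ToyCommentHighlighter` and `HighlightAll` are invented for the example, and per-line state caching is omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Buffer = std::vector<std::string>;  // assumption: stand-in for kte's buffer

struct LineState { bool in_comment = false; };

class LanguageHighlighter {
public:
    virtual ~LanguageHighlighter() = default;
    virtual void HighlightLine(const Buffer& buf, int row, std::string& out) const = 0;
};

class StatefulHighlighter : public LanguageHighlighter {
public:
    virtual LineState HighlightLineStateful(const Buffer& buf, int row,
                                            const LineState& prev,
                                            std::string& out) const = 0;
    // Stateless entry point: treat the line as if nothing precedes it.
    void HighlightLine(const Buffer& buf, int row, std::string& out) const override {
        HighlightLineStateful(buf, row, LineState{}, out);
    }
};

// Toy highlighter: labels a line "comment" if it is inside or opens /* ... */.
// It only considers the first comment on a line.
class ToyCommentHighlighter : public StatefulHighlighter {
public:
    LineState HighlightLineStateful(const Buffer& buf, int row, const LineState& prev,
                                    std::string& out) const override {
        const std::string& line = buf[row];
        std::size_t pos = 0;
        if (!prev.in_comment) {
            const std::size_t open = line.find("/*");
            if (open == std::string::npos) { out = "code"; return prev; }
            pos = open + 2;  // search for the closer after the opener
        }
        out = "comment";
        LineState next;
        next.in_comment = line.find("*/", pos) == std::string::npos;
        return next;
    }
};

// Engine-side loop: detect the stateful interface once, then feed each line
// the state produced by the previous one.
std::vector<std::string> HighlightAll(const LanguageHighlighter& hl, const Buffer& buf) {
    std::vector<std::string> labels(buf.size());
    const auto* stateful = dynamic_cast<const StatefulHighlighter*>(&hl);
    LineState state;  // state in effect before row 0
    for (int row = 0; row < static_cast<int>(buf.size()); ++row) {
        if (stateful) {
            state = stateful->HighlightLineStateful(buf, row, state, labels[row]);
        } else {
            hl.HighlightLine(buf, row, labels[row]);
        }
    }
    return labels;
}
```

A real engine would store each returned `LineState` so that highlighting row `n` does not require rescanning from row 0 every time.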
C/C++ highlighter
- `CppHighlighter` implements `StatefulHighlighter`.
- Stateless constructs: line comments `//`, strings `"..."`, chars `'...'`, numbers, identifiers (keywords/types), preprocessor at beginning of line after leading whitespace, operators/punctuation, and whitespace.
- Stateful constructs (v2):
  - Multi-line block comments `/* ... */`: the state records whether the next line continues a comment.
  - Raw strings `R"delim(... )delim"`: the state tracks whether we are inside a raw string and its delimiter `delim` until the closing sequence appears.
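To make the two stateful constructs concrete, the following sketch advances a line state across one line of text. `CppLineState` and `AdvanceState` are hypothetical names, and the scanner is deliberately incomplete: it ignores ordinary string and character literals and does not validate raw-string delimiters, so it should be read as an illustration of the state carried between lines rather than as the `CppHighlighter` logic itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical state carried from one line to the next.
struct CppLineState {
    bool in_block_comment = false;
    bool in_raw_string = false;
    std::string raw_delim;  // delimiter of the open raw string, if any
};

// Scans `line` starting from state `st` and returns the state at end of line.
CppLineState AdvanceState(const std::string& line, CppLineState st) {
    std::size_t i = 0;
    while (i < line.size()) {
        if (st.in_block_comment) {
            const std::size_t end = line.find("*/", i);
            if (end == std::string::npos) return st;  // comment continues
            st.in_block_comment = false;
            i = end + 2;
        } else if (st.in_raw_string) {
            // A raw string only closes on )delim" with the same delimiter.
            const std::string closer = ")" + st.raw_delim + "\"";
            const std::size_t end = line.find(closer, i);
            if (end == std::string::npos) return st;  // raw string continues
            st.in_raw_string = false;
            st.raw_delim.clear();
            i = end + closer.size();
        } else if (line.compare(i, 2, "/*") == 0) {
            st.in_block_comment = true;
            i += 2;
        } else if (line.compare(i, 2, "//") == 0) {
            return st;  // rest of the line is a line comment
        } else if (line.compare(i, 2, "R\"") == 0) {
            const std::size_t paren = line.find('(', i + 2);
            if (paren == std::string::npos) { ++i; continue; }  // no opener
            st.in_raw_string = true;
            st.raw_delim = line.substr(i + 2, paren - (i + 2));
            i = paren + 1;
        } else {
            ++i;
        }
    }
    return st;
}
```

Note that a bare `)"` does not end a raw string opened with a non-empty delimiter, which is why the delimiter has to be part of the saved state.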
Limitations and TODOs
- Raw string detection is intentionally simple and does not handle all corner cases of the C++ standard.
- Preprocessor handling is line-based; continuation lines with `\\` are not yet tracked.
- No semantic analysis; identifiers are classified via small keyword/type sets.
- Additional languages (JSON, Markdown, Shell, Python, Go, Rust, Lisp, …) are planned.
- Terminal color mapping is conservative to support 8/16-color terminals. Rich color-pair themes can be added later.
Renderer integration
- Terminal and GUI renderers request line spans via `Highlighter()->GetLine(buf, row, buf.Version())`.
- Search highlight and cursor overlays take precedence over syntax colors.
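The precedence rule can be illustrated with a small per-column resolver. Everything here is hypothetical (`Style`, `Range`, `ResolveStyle`), and ranking the cursor above search matches is an assumption for the example; this document only states that both overlays win over syntax colors.

```cpp
#include <vector>

enum class Style { Plain, Syntax, SearchMatch, Cursor };

// Half-open column range [start, end), matching the span convention above.
struct Range {
    int start;
    int end;
};

inline bool Covers(const std::vector<Range>& ranges, int col) {
    for (const Range& r : ranges) {
        if (col >= r.start && col < r.end) return true;
    }
    return false;
}

// Picks the style for one column. Overlays are checked before syntax spans,
// so they take precedence; cursor-over-search ordering is an assumption.
inline Style ResolveStyle(int col,
                          const std::vector<Range>& syntax_spans,
                          const std::vector<Range>& search_hits,
                          int cursor_col) {
    if (col == cursor_col) return Style::Cursor;
    if (Covers(search_hits, col)) return Style::SearchMatch;
    if (Covers(syntax_spans, col)) return Style::Syntax;
    return Style::Plain;
}
```

Resolving overlays at draw time keeps the cached syntax spans independent of cursor movement and search state, so neither needs to invalidate the highlight cache.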