kte/docs/syntax.md
Kyle Isom ceef6af3ae
Add extensible highlighter registration and Tree-sitter support.
- Implemented runtime API for registering custom highlighters.
- Added optional Tree-sitter integration for advanced syntax parsing (disabled by default).
- Updated buffer initialization and copying to support dynamic highlighter configuration.
- Introduced `NullHighlighter` as a fallback for unsupported filetypes.
- Enhanced CMake configuration with `KTE_ENABLE_TREESITTER` option.
2025-12-01 19:04:37 -08:00

Syntax highlighting in kte
==========================
Overview
--------
kte provides lightweight syntax highlighting with a pluggable highlighter interface. The initial implementation targets C/C++ and focuses on speed and responsiveness.
Core types
----------
- `TokenKind` — token categories (keywords, types, strings, comments, numbers, preprocessor, operators, punctuation, identifiers, whitespace, etc.).
- `HighlightSpan` — a half-open column range `[col_start, col_end)` with a `TokenKind`.
- `LineHighlight` — a vector of `HighlightSpan` and the buffer `version` used to compute it.
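
As a rough illustration of how these three types could fit together, here is a minimal sketch. The enumerator names, field types, and the `KindAt` helper are assumptions made for this example and may not match kte's actual declarations.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical enumerators; the real TokenKind may name or order these differently.
enum class TokenKind {
    Default, Keyword, Type, String, Comment, Number,
    Preprocessor, Operator, Punctuation, Identifier, Whitespace
};

// Half-open column range [col_start, col_end) tagged with a token kind.
struct HighlightSpan {
    int col_start;
    int col_end;
    TokenKind kind;
};

// Spans for one line plus the buffer version they were computed against.
struct LineHighlight {
    std::vector<HighlightSpan> spans;
    std::uint64_t version = 0;
};

// Hypothetical helper: returns the kind covering `col`, or Default if none does.
inline TokenKind KindAt(const LineHighlight& lh, int col) {
    for (const HighlightSpan& s : lh.spans) {
        if (col >= s.col_start && col < s.col_end) return s.kind;
    }
    return TokenKind::Default;
}
```

Because the range is half-open, a span ending at column 3 and a span starting at column 3 never overlap, which keeps adjacent tokens unambiguous.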
Engine and caching
------------------
- `HighlighterEngine` maintains a per-line cache of `LineHighlight` keyed by row and buffer version.
- Cache invalidation occurs when the buffer version changes or when the buffer calls `InvalidateFrom(row)`, which clears cached lines and line states from `row` downward.
- The engine supports both stateless and stateful highlighters. For stateful highlighters, it memoizes a simple per-line state and computes lines sequentially when necessary.
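
The caching rules above can be sketched roughly as follows. This is not the `HighlighterEngine` implementation: `LineCache`, `CachedLine`, `Lookup`, `Store`, and `Size` are hypothetical names, and only `InvalidateFrom(row)` comes from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached LineHighlight.
struct CachedLine {
    std::uint64_t version;
    std::string payload;  // placeholder for the span vector
};

class LineCache {
public:
    // Returns the entry for `row` only if it was computed at `version`.
    const CachedLine* Lookup(int row, std::uint64_t version) const {
        auto it = lines_.find(row);
        if (it == lines_.end() || it->second.version != version) return nullptr;
        return &it->second;
    }

    void Store(int row, CachedLine line) { lines_[row] = std::move(line); }

    // Drops every cached line whose row number is >= `row`.
    void InvalidateFrom(int row) {
        lines_.erase(lines_.lower_bound(row), lines_.end());
    }

    std::size_t Size() const { return lines_.size(); }

private:
    std::map<int, CachedLine> lines_;  // ordered, so a row suffix is one range erase
};
```

An ordered map is used here only so that "from `row` downward" becomes a single range erase; a vector indexed by row would serve equally well.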
Stateful highlighters
---------------------
- `LanguageHighlighter` is the base interface for stateless per-line tokenization.
- `StatefulHighlighter` extends it with a `LineState` and the method `HighlightLineStateful(buf, row, prev_state, out)`.
- The engine detects `StatefulHighlighter` via `dynamic_cast` and feeds each line the previous line's state, caching the resulting state per line.
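
The detection and state threading described above might look something like this sketch. The signatures are simplified assumptions: the real methods take a buffer, a row, and an output span vector, all of which are elided here, and `RunLines` and `ToggleHighlighter` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-line state.
struct LineState { bool in_block_comment = false; };

class LanguageHighlighter {
public:
    virtual ~LanguageHighlighter() = default;
    virtual void HighlightLine(const std::string& line) const = 0;
};

class StatefulHighlighter : public LanguageHighlighter {
public:
    // Returns the state to hand to the following line.
    virtual LineState HighlightLineStateful(const std::string& line,
                                            const LineState& prev) const = 0;
    void HighlightLine(const std::string& line) const override {
        HighlightLineStateful(line, LineState{});
    }
};

// Toy stateful highlighter: "open" enters a comment, "close" leaves it.
class ToggleHighlighter : public StatefulHighlighter {
public:
    LineState HighlightLineStateful(const std::string& line,
                                    const LineState& prev) const override {
        LineState next = prev;
        if (line == "open") next.in_block_comment = true;
        if (line == "close") next.in_block_comment = false;
        return next;
    }
};

// Processes lines in order and returns the state after each one.
// Stateless highlighters leave every entry at the default state.
inline std::vector<LineState> RunLines(const LanguageHighlighter& hl,
                                       const std::vector<std::string>& lines) {
    std::vector<LineState> states(lines.size());
    const auto* stateful = dynamic_cast<const StatefulHighlighter*>(&hl);
    LineState prev;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (stateful) {
            prev = stateful->HighlightLineStateful(lines[i], prev);
            states[i] = prev;  // memoized so later lines can resume from here
        } else {
            hl.HighlightLine(lines[i]);
        }
    }
    return states;
}
```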
C/C++ highlighter
-----------------
- `CppHighlighter` implements `StatefulHighlighter`.
- Stateless constructs: line comments `//`, strings `"..."`, chars `'...'`, numbers, identifiers (keywords/types), preprocessor at beginning of line after leading whitespace, operators/punctuation, and whitespace.
- Stateful constructs (v2):
  - Multi-line block comments `/* ... */`: the state records whether the next line continues a comment.
  - Raw strings `R"delim(... )delim"`: the state tracks whether we are inside a raw string and its delimiter `delim` until the closing sequence appears.
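
To make the block-comment carry-over concrete, here is a minimal sketch of a line scanner that only tracks comment state. It is not `CppHighlighter`: `CommentState` and `ScanLine` are hypothetical names, no spans are emitted, and string, character, and raw-string literals are deliberately ignored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical per-line state: only block comments are tracked.
struct CommentState { bool in_block_comment = false; };

// Scans one line and returns the state that the next line should start with.
// Simplification: literals are not skipped, so "/*" inside a string is misread.
inline CommentState ScanLine(const std::string& line, CommentState state) {
    std::size_t i = 0;
    while (i < line.size()) {
        if (state.in_block_comment) {
            std::size_t end = line.find("*/", i);
            if (end == std::string::npos) return state;  // comment continues
            state.in_block_comment = false;
            i = end + 2;
        } else if (line.compare(i, 2, "//") == 0) {
            return state;  // a line comment swallows the rest of the line
        } else if (line.compare(i, 2, "/*") == 0) {
            state.in_block_comment = true;
            i += 2;
        } else {
            ++i;
        }
    }
    return state;
}
```

Feeding each line the state returned for the line before it is what lets a comment opened on one row color the rows that follow.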
Limitations and TODOs
---------------------
- Raw string detection is intentionally simple and does not handle all corner cases of the C++ standard.
- Preprocessor handling is line-based; continuation lines ending in `\` are not yet tracked.
- No semantic analysis; identifiers are classified via small keyword/type sets.
- Additional languages (JSON, Markdown, Shell, Python, Go, Rust, Lisp, …) are planned.
- Terminal color mapping is conservative to support 8/16-color terminals. Rich color-pair themes can be added later.
Renderer integration
--------------------
- Terminal and GUI renderers request line spans via `Highlighter()->GetLine(buf, row, buf.Version())`.
- Search highlight and cursor overlays take precedence over syntax colors.
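
One way to picture the precedence rule is a small resolver applied per cell. This is a sketch only: `Style` and `ResolveStyle` are hypothetical, and the relative order of cursor versus search match is an assumption, since the text above only says that both outrank syntax colors.

```cpp
// Hypothetical style categories a renderer might choose between.
enum class Style { Plain, Syntax, SearchMatch, Cursor };

// Overlays win over syntax colors; cursor-over-search is an assumed ordering.
inline Style ResolveStyle(bool has_syntax_span, bool in_search_match, bool is_cursor) {
    if (is_cursor) return Style::Cursor;
    if (in_search_match) return Style::SearchMatch;
    return has_syntax_span ? Style::Syntax : Style::Plain;
}
```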
Extensibility (Phase 4)
-----------------------
- Public registration API: external code can register custom highlighters by filetype.
  - Use `HighlighterRegistry::Register("mylang", []{ return std::make_unique<MyHighlighter>(); });`
  - Registered factories are preferred over built-ins for the same filetype key.
  - Filetype keys are normalized via `HighlighterRegistry::Normalize()`.
- Optional Tree-sitter adapter: disabled by default to keep dependencies minimal.
  - Enable with CMake option `-DKTE_ENABLE_TREESITTER=ON` and provide `-DTREESITTER_INCLUDE_DIR=...` and `-DTREESITTER_LIBRARY=...` if needed.
  - Register a Tree-sitter-backed highlighter for a language (example assumes you link a grammar):
```c++
extern "C" const TSLanguage* tree_sitter_c();
kte::HighlighterRegistry::RegisterTreeSitter("c", &tree_sitter_c);
```
- The current adapter is a stub scaffold: it compiles and integrates cleanly when enabled, but intentionally emits no spans until Tree-sitter node-to-token mapping is implemented.
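
The registration rules above (factories keyed by normalized filetype, with registered factories preferred over built-ins) can be sketched as below. This is not the real `HighlighterRegistry`: `SketchRegistry`, `SketchHighlighter`, `NamedHighlighter`, `AddBuiltin`, and `Create` are hypothetical, the real API is static rather than instance-based, and lower-casing is only an assumed normalization rule.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal highlighter interface for this sketch.
struct SketchHighlighter {
    virtual ~SketchHighlighter() = default;
    virtual std::string Name() const = 0;
};

struct NamedHighlighter : SketchHighlighter {
    explicit NamedHighlighter(std::string n) : name(std::move(n)) {}
    std::string Name() const override { return name; }
    std::string name;
};

class SketchRegistry {
public:
    using Factory = std::function<std::unique_ptr<SketchHighlighter>()>;

    // Assumption: normalization lower-cases the key; the real rules may differ.
    static std::string Normalize(std::string key) {
        std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return key;
    }

    void AddBuiltin(const std::string& filetype, Factory f) {
        builtins_[Normalize(filetype)] = std::move(f);
    }
    void Register(const std::string& filetype, Factory f) {
        registered_[Normalize(filetype)] = std::move(f);
    }

    // Registered factories are consulted first, then built-ins.
    // Returns nullptr when neither table knows the filetype.
    std::unique_ptr<SketchHighlighter> Create(const std::string& filetype) const {
        const std::string key = Normalize(filetype);
        auto it = registered_.find(key);
        if (it != registered_.end()) return it->second();
        it = builtins_.find(key);
        if (it != builtins_.end()) return it->second();
        return nullptr;
    }

private:
    std::map<std::string, Factory> registered_;
    std::map<std::string, Factory> builtins_;
};
```

Keeping registered and built-in factories in separate tables makes the precedence rule explicit in the lookup order instead of depending on insertion order.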