Files
kte/docs/syntax.md
Kyle Isom 78b9345799 Add swap file journaling for crash recovery.
- Introduced `SwapManager` for buffering and writing incremental edits to sidecar `.kte.swp` files.
- Implemented basic operations: insertion, deletion, split, join, and checkpointing.
- Added recovery design doc (`docs/plans/swap-files.md`).
- Updated editor initialization to integrate `SwapManager` instance for crash recovery across buffers.
2025-12-04 08:48:32 -08:00

122 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Syntax highlighting in kte
==========================
Overview
--------
kte provides lightweight syntax highlighting with a pluggable
highlighter interface. The initial implementation targets C/C++ and
focuses on speed and responsiveness.
Core types
----------
- `TokenKind` — token categories (keywords, types, strings, comments,
numbers, preprocessor, operators, punctuation, identifiers,
whitespace, etc.).
- `HighlightSpan` — a half-open column range `[col_start, col_end)` with
a `TokenKind`.
- `LineHighlight` — a vector of `HighlightSpan` and the buffer `version`
used to compute it.
Engine and caching
------------------
- `HighlighterEngine` maintains a per-line cache of `LineHighlight`
keyed by row and buffer version.
- Cache invalidation occurs when the buffer version changes or when the
buffer calls `InvalidateFrom(row)`, which clears cached lines and line
states from `row` downward.
- The engine supports both stateless and stateful highlighters. For
stateful highlighters, it memoizes a simple per-line state and
computes lines sequentially when necessary.
Stateful highlighters
---------------------
- `LanguageHighlighter` is the base interface for stateless per-line
tokenization.
- `StatefulHighlighter` extends it with a `LineState` and the method
`HighlightLineStateful(buf, row, prev_state, out)`.
- The engine detects `StatefulHighlighter` via dynamic_cast and feeds
each line the previous lines state, caching the resulting state per
line.
C/C++ highlighter
-----------------
- `CppHighlighter` implements `StatefulHighlighter`.
- Stateless constructs: line comments `//`, strings `"..."`, chars
`'...'`, numbers, identifiers (keywords/types), preprocessor at
beginning of line after leading whitespace, operators/punctuation, and
whitespace.
- Stateful constructs (v2):
- Multi-line block comments `/* ... */` — the state records whether
the next line continues a comment.
- Raw strings `R"delim(... )delim"` — the state tracks whether we
are inside a raw string and its delimiter `delim` until the
closing sequence appears.
Limitations and TODOs
---------------------
- Raw string detection is intentionally simple and does not handle all
corner cases of the C++ standard.
- Preprocessor handling is line-based; continuation lines with `\\` are
not yet tracked.
- No semantic analysis; identifiers are classified via small
keyword/type sets.
- Additional languages (JSON, Markdown, Shell, Python, Go, Rust,
Lisp, …) are planned.
- Terminal color mapping is conservative to support 8/16-color
terminals. Rich color-pair themes can be added later.
Renderer integration
--------------------
- Terminal and GUI renderers request line spans via
`Highlighter()->GetLine(buf, row, buf.Version())`.
- Search highlight and cursor overlays take precedence over syntax
colors.
Renderer-side robustness
------------------------
- Renderers defensively sanitize `HighlightSpan` data before use to
ensure stability even if a highlighter misbehaves:
- Clamp `col_start/col_end` to the line length and ensure
`end >= start`.
- Drop empty/invalid spans and sort by start.
- Clip drawing to the horizontally visible region and the
tab-expanded line length.
- The highlighter engine returns `LineHighlight` by value to avoid
cross-thread lifetime issues; renderers operate on a local copy for
each frame.
Extensibility (Phase 4)
-----------------------
- Public registration API: external code can register custom
highlighters by filetype.
- Use
`HighlighterRegistry::Register("mylang", []{ return std::make_unique<MyHighlighter>(); });`
- Registered factories are preferred over built-ins for the same
filetype key.
- Filetype keys are normalized via
`HighlighterRegistry::Normalize()`.
- Optional Tree-sitter adapter: disabled by default to keep dependencies
minimal.
- Enable with CMake option `-DKTE_ENABLE_TREESITTER=ON` and provide
`-DTREESITTER_INCLUDE_DIR=...` and `-DTREESITTER_LIBRARY=...` if
needed.
- Register a Tree-sitter-backed highlighter for a language (example
assumes you link a grammar):
```c++
extern "C" const TSLanguage* tree_sitter_c();
kte::HighlighterRegistry::RegisterTreeSitter("c", &tree_sitter_c);
```
- Current adapter is a stub scaffold; it compiles and integrates
cleanly when enabled, but
intentionally emits no spans until Tree-sitter node-to-token
mapping is implemented.