kte/docs/syntax.md
Kyle Isom ceef6af3ae
Add extensible highlighter registration and Tree-sitter support.
- Implemented runtime API for registering custom highlighters.
- Added optional Tree-sitter integration for advanced syntax parsing (disabled by default).
- Updated buffer initialization and copying to support dynamic highlighter configuration.
- Introduced `NullHighlighter` as a fallback for unsupported filetypes.
- Enhanced CMake configuration with `KTE_ENABLE_TREESITTER` option.
2025-12-01 19:04:37 -08:00

Syntax highlighting in kte

Overview

kte provides lightweight syntax highlighting with a pluggable highlighter interface. The initial implementation targets C/C++ and focuses on speed and responsiveness.

Core types

  • TokenKind — token categories (keywords, types, strings, comments, numbers, preprocessor, operators, punctuation, identifiers, whitespace, etc.).
  • HighlightSpan — a half-open column range [col_start, col_end) with a TokenKind.
  • LineHighlight — a vector of HighlightSpan and the buffer version used to compute it.
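
A minimal sketch of how these three types might be declared. Only the names TokenKind, HighlightSpan, LineHighlight, col_start, and col_end come from the text above; the enumerator names, the `spans` and `version` fields, and the `KindAt` helper are assumptions made for illustration and may not match kte's actual headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of token categories; the real enum may differ.
enum class TokenKind {
    Default, Keyword, Type, String, Comment, Number,
    Preproc, Operator, Punctuation, Identifier, Whitespace
};

// Half-open column range [col_start, col_end) tagged with a TokenKind.
struct HighlightSpan {
    int col_start;
    int col_end;
    TokenKind kind;
};

// Spans for one line plus the buffer version they were computed from.
// Field names are assumed.
struct LineHighlight {
    std::vector<HighlightSpan> spans;
    std::uint64_t version = 0;
};

// Illustrative helper (not part of kte): the kind covering a column,
// or TokenKind::Default when no span covers it.
inline TokenKind KindAt(const LineHighlight& lh, int col) {
    for (const HighlightSpan& s : lh.spans) {
        if (col >= s.col_start && col < s.col_end) return s.kind;
    }
    return TokenKind::Default;
}
```

Because the range is half-open, a span of {0, 3} covers columns 0, 1, and 2, and adjacent spans can share a boundary without overlapping.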

Engine and caching

  • HighlighterEngine maintains a per-line cache of LineHighlight keyed by row and buffer version.
  • Cache invalidation occurs when the buffer version changes or when the buffer calls InvalidateFrom(row), which clears cached lines and line states from row downward.
  • The engine supports both stateless and stateful highlighters. For stateful highlighters, it memoizes a simple per-line state and computes lines sequentially when necessary.
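
The caching rules above can be sketched roughly as follows. This is not HighlighterEngine itself: `LineCache`, `CachedLine`, `Find`, `Store`, and `Size` are hypothetical names, the payload is a placeholder string rather than a LineHighlight, and "from row downward" is read here as "row and every later row".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in for a cached LineHighlight: the version it was computed for
// and a placeholder payload.
struct CachedLine {
    std::uint64_t version;
    std::string payload;
};

class LineCache {
public:
    // Returns the cached entry only if it was computed for `version`;
    // a stale or missing entry yields nullptr so the caller recomputes.
    const CachedLine* Find(int row, std::uint64_t version) const {
        auto it = lines_.find(row);
        if (it == lines_.end() || it->second.version != version) return nullptr;
        return &it->second;
    }

    void Store(int row, std::uint64_t version, std::string payload) {
        lines_[row] = CachedLine{version, std::move(payload)};
    }

    // Drops the entry for `row` and for every later row.
    void InvalidateFrom(int row) {
        lines_.erase(lines_.lower_bound(row), lines_.end());
    }

    std::size_t Size() const { return lines_.size(); }

private:
    std::map<int, CachedLine> lines_;  // ordered by row
};
```

An ordered map keeps `InvalidateFrom` to a single range erase. A real engine would also drop the memoized per-line states for the same rows, which this sketch leaves out.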

Stateful highlighters

  • LanguageHighlighter is the base interface for stateless per-line tokenization.
  • StatefulHighlighter extends it with a LineState and the method HighlightLineStateful(buf, row, prev_state, out).
  • The engine detects StatefulHighlighter via dynamic_cast and feeds each line the previous line's state, caching the resulting state per line.
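
One way the dynamic_cast detection and state threading could look is sketched below. The class names LanguageHighlighter and StatefulHighlighter and the method name HighlightLineStateful come from the text; everything else is assumed for illustration: the `Buffer` alias, the `HighlightLine` method, the return type, the contents of `LineState`, the toy highlighter, and the `HighlightAll` driver. Output is a label per line instead of spans, and the per-line state cache is omitted to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Buffer = std::vector<std::string>;  // hypothetical stand-in for kte's buffer

struct LineState { bool in_block_comment; };

class LanguageHighlighter {
public:
    virtual ~LanguageHighlighter() = default;
    // Stateless entry point (name assumed).
    virtual std::string HighlightLine(const Buffer& buf, int row) const = 0;
};

class StatefulHighlighter : public LanguageHighlighter {
public:
    // Returns the state to hand to the next line (return type assumed).
    virtual LineState HighlightLineStateful(const Buffer& buf, int row,
                                            const LineState& prev_state,
                                            std::string& out) const = 0;
};

// Toy highlighter: labels a line "comment" if it starts inside, or opens,
// a /* ... */ block. Deliberately crude.
class ToyCommentHighlighter : public StatefulHighlighter {
public:
    std::string HighlightLine(const Buffer& buf, int row) const override {
        std::string out;
        HighlightLineStateful(buf, row, LineState{}, out);
        return out;
    }
    LineState HighlightLineStateful(const Buffer& buf, int row,
                                    const LineState& prev_state,
                                    std::string& out) const override {
        const std::string& line = buf[row];
        bool inside = prev_state.in_block_comment;
        bool touched = inside;
        if (!inside && line.find("/*") != std::string::npos) {
            inside = true;
            touched = true;
        }
        if (inside && line.find("*/") != std::string::npos) inside = false;
        out = touched ? "comment" : "code";
        return LineState{inside};
    }
};

// Driver: detect the stateful interface once, then thread state row by row.
std::vector<std::string> HighlightAll(const LanguageHighlighter& hl, const Buffer& buf) {
    std::vector<std::string> result;
    const auto* stateful = dynamic_cast<const StatefulHighlighter*>(&hl);
    LineState state{};
    for (int row = 0; row < static_cast<int>(buf.size()); ++row) {
        std::string out;
        if (stateful) {
            state = stateful->HighlightLineStateful(buf, row, state, out);
        } else {
            out = hl.HighlightLine(buf, row);
        }
        result.push_back(out);
    }
    return result;
}
```

Because each line's result depends on the state left by the line before it, a stateful highlighter has to be run sequentially from the last row whose state is known.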

C/C++ highlighter

  • CppHighlighter implements StatefulHighlighter.
  • Stateless constructs: line comments //, strings "...", chars '...', numbers, identifiers (keywords/types), preprocessor at beginning of line after leading whitespace, operators/punctuation, and whitespace.
  • Stateful constructs (v2):
    • Multi-line block comments /* ... */ — the state records whether the next line continues a comment.
    • Raw strings R"delim(... )delim" — the state tracks whether we are inside a raw string and its delimiter delim until the closing sequence appears.
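
The raw-string state described above can be illustrated with a small scanner. This is a simplified sketch, not CppHighlighter's code: `RawState` and `ScanRawStrings` are hypothetical names, and the scan ignores ordinary strings, comments, and identifiers that happen to end in R, so it shares the "intentionally simple" character noted under the limitations below.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// State carried from one line to the next.
struct RawState {
    bool in_raw;        // true while inside R"delim( ... )delim"
    std::string delim;  // delimiter of the raw string currently open
};

// Advances the raw-string state across one line and returns the state
// that the next line should start with.
RawState ScanRawStrings(const std::string& line, RawState st) {
    std::size_t i = 0;
    while (i < line.size()) {
        if (st.in_raw) {
            // Only the exact sequence )delim" closes the raw string.
            const std::string closer = ")" + st.delim + "\"";
            std::size_t end = line.find(closer, i);
            if (end == std::string::npos) return st;  // still open at end of line
            i = end + closer.size();
            st.in_raw = false;
            st.delim.clear();
        } else {
            std::size_t start = line.find("R\"", i);
            if (start == std::string::npos) return st;
            std::size_t paren = line.find('(', start + 2);
            if (paren == std::string::npos) return st;  // malformed opener; ignored
            st.in_raw = true;
            st.delim = line.substr(start + 2, paren - (start + 2));
            i = paren + 1;
        }
    }
    return st;
}
```

Remembering the delimiter is what lets a line containing `)"` stay inside a raw string opened with `R"sql(`; only `)sql"` ends it.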

Limitations and TODOs

  • Raw string detection is intentionally simple and does not handle all corner cases of the C++ standard.
  • Preprocessor handling is line-based; continuation lines ending in a backslash (\) are not yet tracked.
  • No semantic analysis; identifiers are classified via small keyword/type sets.
  • Additional languages (JSON, Markdown, Shell, Python, Go, Rust, Lisp, …) are planned.
  • Terminal color mapping is conservative to support 8/16-color terminals. Rich color-pair themes can be added later.

Renderer integration

  • Terminal and GUI renderers request line spans via Highlighter()->GetLine(buf, row, buf.Version()).
  • Search highlight and cursor overlays take precedence over syntax colors.

Extensibility (Phase 4)

  • Public registration API: external code can register custom highlighters by filetype.
    • Use HighlighterRegistry::Register("mylang", []{ return std::make_unique<MyHighlighter>(); });
    • Registered factories are preferred over built-ins for the same filetype key.
    • Filetype keys are normalized via HighlighterRegistry::Normalize().
  • Optional Tree-sitter adapter: disabled by default to keep dependencies minimal.
    • Enable with CMake option -DKTE_ENABLE_TREESITTER=ON and provide -DTREESITTER_INCLUDE_DIR=... and -DTREESITTER_LIBRARY=... if needed.
    • Register a Tree-sitter-backed highlighter for a language (example assumes you link a grammar):
      extern "C" const TSLanguage* tree_sitter_c();
      kte::HighlighterRegistry::RegisterTreeSitter("c", &tree_sitter_c);
      
    • Current adapter is a stub scaffold; it compiles and integrates cleanly when enabled, but intentionally emits no spans until Tree-sitter node-to-token mapping is implemented.
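
To make the registration rules concrete, here is a toy registry that mimics the behaviour described above: factories keyed by normalized filetype, with registered factories preferred over built-ins. It is not kte's HighlighterRegistry. `SketchRegistry`, `Create`, the `Highlighter` stand-in types, and the normalization rule (lowercase, strip a leading dot) are all assumptions; the real Normalize() may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in highlighter interface for the sketch.
struct Highlighter {
    virtual ~Highlighter() = default;
    virtual std::string Name() const = 0;
};
struct BuiltinCpp : Highlighter {
    std::string Name() const override { return "builtin-cpp"; }
};
struct MyHighlighter : Highlighter {
    std::string Name() const override { return "my"; }
};

class SketchRegistry {
public:
    using Factory = std::function<std::unique_ptr<Highlighter>()>;

    // Assumed normalization: strip a leading dot, then lowercase.
    static std::string Normalize(std::string key) {
        if (!key.empty() && key[0] == '.') key.erase(0, 1);
        std::transform(key.begin(), key.end(), key.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return key;
    }

    void Register(const std::string& filetype, Factory factory) {
        registered_[Normalize(filetype)] = std::move(factory);
    }

    // Registered factories win over built-ins for the same key.
    std::unique_ptr<Highlighter> Create(const std::string& filetype) const {
        const std::string key = Normalize(filetype);
        auto it = registered_.find(key);
        if (it != registered_.end()) return it->second();
        if (key == "cpp") return std::unique_ptr<Highlighter>(new BuiltinCpp());
        return nullptr;  // a real registry would likely return a fallback here
    }

private:
    std::map<std::string, Factory> registered_;
};
```

Storing factories rather than instances lets each buffer get its own highlighter object, which matters once highlighters carry per-buffer state.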