# PieceTable Migration Plan ## Executive Summary This document outlines the plan to remove GapBuffer support from kte and migrate to using a **single PieceTable per Buffer**, rather than the current vector-of-Lines architecture where each Line contains either a GapBuffer or PieceTable. ## Current Architecture Analysis ### Text Storage **Current Implementation:** - `Buffer` contains `std::vector rows_` - Each `Line` wraps an `AppendBuffer` (type alias) - `AppendBuffer` is either `GapBuffer` (default) or `PieceTable` (via `KTE_USE_PIECE_TABLE`) - Each line is independently managed with its own buffer - Operations are line-based with coordinate pairs (row, col) **Key Files:** - `Buffer.h/cc` - Buffer class with vector of Lines - `AppendBuffer.h` - Type selector (GapBuffer vs PieceTable) - `GapBuffer.h/cc` - Per-line gap buffer implementation - `PieceTable.h/cc` - Per-line piece table implementation - `UndoSystem.h/cc` - Records operations with (row, col, text) - `UndoNode.h` - Undo operation types (Insert, Delete, Paste, Newline, DeleteRow) - `Command.cc` - High-level editing commands ### Current Buffer API **Low-level editing operations (used by UndoSystem):** ```cpp void insert_text(int row, int col, std::string_view text); void delete_text(int row, int col, std::size_t len); void split_line(int row, int col); void join_lines(int row); void insert_row(int row, std::string_view text); void delete_row(int row); ``` **Line access:** ```cpp std::vector &Rows(); const std::vector &Rows() const; ``` **Line API (Buffer::Line):** ```cpp std::size_t size() const; const char *Data() const; char operator[](std::size_t i) const; std::string substr(std::size_t pos, std::size_t len) const; std::size_t find(const std::string &needle, std::size_t pos) const; void erase(std::size_t pos, std::size_t len); void insert(std::size_t pos, const std::string &seg); Line &operator+=(const Line &other); Line &operator+=(const std::string &s); ``` ### Current PieceTable Limitations The existing `PieceTable` class only supports: - `Append(char/string)` - add to end - `Prepend(char/string)` - add to beginning - `Clear()` - empty the buffer - `Data()` / `Size()` - access content (materializes on demand) **Missing capabilities needed for buffer-wide storage:** - Insert at arbitrary byte position - Delete at arbitrary byte position - Line indexing and line-based queries - Position conversion (byte offset ↔ line/col) - Efficient line boundary tracking ## Target Architecture ### Design Overview **Single PieceTable per Buffer:** - `Buffer` contains one `PieceTable content_` (replaces `std::vector rows_`) - Text stored as continuous byte sequence with `\n` as line separators - Line index cached for efficient line-based operations - All operations work on byte offsets internally - Buffer provides line/column API as convenience layer ### Enhanced PieceTable Design ```cpp class PieceTable { public: // Existing API (keep for compatibility if needed) void Append(const char *s, std::size_t len); void Prepend(const char *s, std::size_t len); void Clear(); const char *Data() const; std::size_t Size() const; // NEW: Core byte-based editing operations void Insert(std::size_t byte_offset, const char *text, std::size_t len); void Delete(std::size_t byte_offset, std::size_t len); // NEW: Line-based queries std::size_t LineCount() const; std::string GetLine(std::size_t line_num) const; std::pair GetLineRange(std::size_t line_num) const; // (start, end) byte offsets // NEW: Position conversion std::pair ByteOffsetToLineCol(std::size_t byte_offset) const; std::size_t LineColToByteOffset(std::size_t row, std::size_t col) const; // NEW: Substring extraction std::string GetRange(std::size_t byte_offset, std::size_t len) const; // NEW: Search support std::size_t Find(const std::string &needle, std::size_t start_offset) const; private: // Existing members std::string original_; std::string add_; std::vector pieces_; mutable std::string materialized_; mutable bool dirty_; std::size_t total_size_; // NEW: Line index for efficient line operations struct LineInfo { std::size_t byte_offset; // absolute byte offset from buffer start std::size_t piece_idx; // which piece contains line start std::size_t offset_in_piece; // byte offset within that piece }; mutable std::vector line_index_; mutable bool line_index_dirty_; // NEW: Line index management void RebuildLineIndex() const; void InvalidateLineIndex(); }; ``` ### Buffer API Changes ```cpp class Buffer { public: // NEW: Direct content access PieceTable &Content() { return content_; } const PieceTable &Content() const { return content_; } // MODIFIED: Keep existing API but implement via PieceTable void insert_text(int row, int col, std::string_view text); void delete_text(int row, int col, std::size_t len); void split_line(int row, int col); void join_lines(int row); void insert_row(int row, std::string_view text); void delete_row(int row); // MODIFIED: Line access - return line from PieceTable std::size_t Nrows() const { return content_.LineCount(); } std::string GetLine(std::size_t row) const { return content_.GetLine(row); } // REMOVED: Rows() - no longer have vector of Lines // std::vector &Rows(); // REMOVE private: // REMOVED: std::vector rows_; // NEW: Single piece table for all content PieceTable content_; // Keep existing members std::size_t curx_, cury_, rx_; std::size_t nrows_; // cached from content_.LineCount() std::size_t rowoffs_, coloffs_; std::string filename_; bool is_file_backed_; bool dirty_; bool read_only_; bool mark_set_; std::size_t mark_curx_, mark_cury_; std::unique_ptr undo_tree_; std::unique_ptr undo_sys_; std::uint64_t version_; bool syntax_enabled_; std::string filetype_; std::unique_ptr highlighter_; kte::SwapRecorder *swap_rec_; }; ``` ## Migration Phases ### Phase 1: Extend PieceTable (Foundation) **Goal:** Add buffer-wide capabilities to PieceTable without breaking existing per-line usage. **Tasks:** 1. Add line indexing infrastructure to PieceTable - Add `LineInfo` struct and `line_index_` member - Implement `RebuildLineIndex()` that scans pieces for '\n' characters - Implement `InvalidateLineIndex()` called by Insert/Delete 2. Implement core byte-based operations - `Insert(byte_offset, text, len)` - split piece at offset, insert new piece - `Delete(byte_offset, len)` - split pieces, remove/truncate as needed 3. Implement line-based query methods - `LineCount()` - return line_index_.size() - `GetLine(line_num)` - extract text between line boundaries - `GetLineRange(line_num)` - return (start, end) byte offsets 4. Implement position conversion - `ByteOffsetToLineCol(offset)` - binary search in line_index_ - `LineColToByteOffset(row, col)` - lookup line start, add col 5. Implement utility methods - `GetRange(offset, len)` - extract substring - `Find(needle, start)` - search across pieces **Testing:** - Write unit tests for new PieceTable methods - Test with multi-line content - Verify line index correctness after edits - Benchmark performance vs current line-based approach **Estimated Effort:** 3-5 days ### Phase 2: Create Buffer Adapter Layer (Compatibility) **Goal:** Create compatibility layer in Buffer to use PieceTable while maintaining existing API. **Tasks:** 1. Add `PieceTable content_` member to Buffer (alongside existing `rows_`) 2. Add compilation flag `KTE_USE_BUFFER_PIECE_TABLE` (like existing `KTE_USE_PIECE_TABLE`) 3. Implement Buffer methods to delegate to content_: ```cpp #ifdef KTE_USE_BUFFER_PIECE_TABLE void insert_text(int row, int col, std::string_view text) { std::size_t offset = content_.LineColToByteOffset(row, col); content_.Insert(offset, text.data(), text.size()); } // ... similar for other methods #else // Existing line-based implementation #endif ``` 4. Update file I/O to work with PieceTable - `OpenFromFile()` - load into content_ instead of rows_ - `Save()` - serialize content_ instead of rows_ 5. Update `AsString()` to materialize from content_ **Testing:** - Run existing buffer correctness tests with new flag - Verify undo/redo still works - Test file I/O round-tripping - Test with existing command operations **Estimated Effort:** 3-4 days ### Phase 3: Migrate Command Layer (High-level Operations) **Goal:** Update commands that directly access Rows() to use new API. **Tasks:** 1. Audit all usages of `buf.Rows()` in Command.cc 2. Refactor helper functions: - `extract_region_text()` - use content_.GetRange() - `delete_region()` - convert to byte offsets, use content_.Delete() - `insert_text_at_cursor()` - convert position, use content_ .Insert() 3. Update commands that iterate over lines: - Use `buf.GetLine(i)` instead of `buf.Rows()[i]` - Update line count queries to use `buf.Nrows()` 4. Update search/replace operations: - Modify `search_compute_matches()` to work with GetLine() - Update regex matching to work line-by-line or use content directly **Testing:** - Test all editing commands (insert, delete, newline, backspace) - Test region operations (mark, copy, kill) - Test search and replace - Test word navigation and deletion - Run through common editing workflows **Estimated Effort:** 4-6 days ### Phase 4: Update Renderer and Frontend (Display) **Goal:** Ensure all renderers work with new Buffer structure. **Tasks:** 1. Audit renderer implementations: - `TerminalRenderer.cc` - `ImGuiRenderer.cc` - `QtRenderer.cc` - `TestRenderer.cc` 2. Update line access patterns: - Replace `buf.Rows()[y]` with `buf.GetLine(y)` - Handle string return instead of Line object 3. Update syntax highlighting integration: - Ensure HighlighterEngine works with GetLine() - Update any line-based caching **Testing:** - Test rendering in terminal - Test ImGui frontend (if enabled) - Test Qt frontend (if enabled) - Verify syntax highlighting displays correctly - Test scrolling and viewport updates **Estimated Effort:** 2-3 days ### Phase 5: Remove Old Infrastructure (Cleanup) ✅ COMPLETED **Goal:** Remove GapBuffer, AppendBuffer, and Line class completely. **Status:** Completed on 2025-12-05 **Tasks:** 1. ✅ Remove conditional compilation: - Removed `#ifdef KTE_USE_BUFFER_PIECE_TABLE` (PieceTable is now the only way) - Removed `#ifdef KTE_USE_PIECE_TABLE` - Removed `AppendBuffer.h` 2. ✅ Delete obsolete code: - Deleted `GapBuffer.h/cc` - Line class now uses PieceTable internally (kept for API compatibility) - `rows_` kept as mutable cache rebuilt from `content_` PieceTable 3. ✅ Update CMakeLists.txt: - Removed GapBuffer from sources - Removed AppendBuffer.h from headers - Removed KTE_USE_PIECE_TABLE and KTE_USE_BUFFER_PIECE_TABLE options 4. ✅ Clean up includes and dependencies 5. ✅ Update documentation **Testing:** - Full regression test suite - Verify clean compilation - Check for any lingering references **Estimated Effort:** 1-2 days ### Phase 6: Performance Optimization (Polish) **Goal:** Optimize the new implementation for real-world usage. **Tasks:** 1. Profile common operations: - Measure line access patterns - Identify hot paths in editing - Benchmark against old implementation 2. Optimize line index: - Consider incremental updates instead of full rebuild - Tune rebuild threshold - Cache frequently accessed lines 3. Optimize piece table: - Tune piece coalescing heuristics - Consider piece count limits and consolidation 4. Memory optimization: - Review materialization frequency - Consider lazy materialization strategies - Profile memory usage on large files **Testing:** - Benchmark suite with various file sizes - Memory profiling - Real-world usage testing **Estimated Effort:** 3-5 days ## Files Requiring Modification ### Core Files (Must Change) - `PieceTable.h/cc` - Add new methods (Phase 1) - `Buffer.h/cc` - Replace rows_ with content_ (Phase 2) - `Command.cc` - Update line access (Phase 3) - `UndoSystem.cc` - May need updates for new Buffer API ### Renderer Files (Will Change) - `TerminalRenderer.cc` - Update line access (Phase 4) - `ImGuiRenderer.cc` - Update line access (Phase 4) - `QtRenderer.cc` - Update line access (Phase 4) - `TestRenderer.cc` - Update line access (Phase 4) ### Files Removed (Phase 5 - Completed) - `GapBuffer.h/cc` - ✅ Deleted - `AppendBuffer.h` - ✅ Deleted - `test_buffer_correctness.cc` - ✅ Deleted (obsolete GapBuffer comparison test) - `bench/BufferBench.cc` - ✅ Deleted (obsolete GapBuffer benchmarks) - `bench/PerformanceSuite.cc` - ✅ Deleted (obsolete GapBuffer benchmarks) - `Buffer::Line` class - ✅ Updated to use PieceTable internally (kept for API compatibility) ### Build Files - `CMakeLists.txt` - Update sources (Phase 5) ### Documentation - `README.md` - Update architecture notes - `docs/` - Update any architectural documentation - `REWRITE.md` - Note C++ now matches Rust design ## Testing Strategy ### Unit Tests - **PieceTable Tests:** New file `test_piece_table.cc` - Test Insert/Delete at various positions - Test line indexing correctness - Test position conversion - Test with edge cases (empty, single line, large files) - **Buffer Tests:** Extend `test_buffer_correctness.cc` - Test new Buffer API with PieceTable backend - Test file I/O round-tripping - Test multi-line operations ### Integration Tests - **Undo Tests:** `test_undo.cc` should still pass - Verify undo/redo across all operation types - Test undo tree navigation - **Search Tests:** `test_search_correctness.cc` should still pass - Verify search across multiple lines - Test regex search ### Manual Testing - Load and edit large files (>10MB) - Perform complex editing sequences - Test all keybindings and commands - Verify syntax highlighting - Test crash recovery (swap files) ### Regression Testing - All existing tests must pass with new implementation - No observable behavior changes for users - Performance should be comparable or better ## Risk Assessment ### High Risk - **Undo System Integration:** Undo records operations with row/col/text. Need to ensure compatibility or refactor. - *Mitigation:* Carefully preserve undo semantics, extensive testing - **Performance Regression:** Line index rebuilding could be expensive on large files. - *Mitigation:* Profile early, optimize incrementally, consider caching strategies ### Medium Risk - **Syntax Highlighting:** Highlighters may depend on line-based access patterns. - *Mitigation:* Review highlighter integration, test thoroughly - **Renderer Updates:** Multiple renderers need updating, risk of inconsistency. - *Mitigation:* Update all renderers in same phase, test each ### Low Risk - **Search/Replace:** Should work naturally with new GetLine() API. - *Mitigation:* Test thoroughly with existing test suite ## Success Criteria ### Functional Requirements - ✓ All existing tests pass - ✓ All commands work identically to before - ✓ File I/O works correctly - ✓ Undo/redo functionality preserved - ✓ Syntax highlighting works - ✓ All frontends (terminal, ImGui, Qt) work ### Code Quality - ✓ GapBuffer completely removed - ✓ No conditional compilation for buffer type - ✓ Clean, maintainable code - ✓ Good test coverage for new PieceTable methods ### Performance - ✓ Editing operations at least as fast as current - ✓ Line access within 2x of current performance - ✓ Memory usage reasonable (no excessive materialization) - ✓ Large file handling acceptable (tested up to 100MB) ## Timeline Estimate | Phase | Duration | Dependencies | |----------------------------|----------------|--------------| | Phase 1: Extend PieceTable | 3-5 days | None | | Phase 2: Buffer Adapter | 3-4 days | Phase 1 | | Phase 3: Command Layer | 4-6 days | Phase 2 | | Phase 4: Renderer Updates | 2-3 days | Phase 3 | | Phase 5: Cleanup | 1-2 days | Phase 4 | | Phase 6: Optimization | 3-5 days | Phase 5 | | **Total** | **16-25 days** | | **Note:** Timeline assumes one developer working full-time. Actual duration may vary based on: - Unforeseen integration issues - Performance optimization needs - Testing thoroughness - Code review iterations ## Alternatives Considered ### Alternative 1: Keep Line-based but unify GapBuffer/PieceTable - Keep vector of Lines, but make each Line always use PieceTable - Remove GapBuffer, remove AppendBuffer selector - **Pros:** Smaller change, less risk - **Cons:** Doesn't achieve architectural goal, still have per-line overhead ### Alternative 2: Hybrid approach - Use PieceTable for buffer, but maintain materialized Line objects as cache - **Pros:** Easier migration, maintains some compatibility - **Cons:** Complex dual representation, cache invalidation issues ### Alternative 3: Complete rewrite - Follow REWRITE.md exactly, implement in Rust - **Pros:** Modern language, better architecture - **Cons:** Much larger effort, different project ## Recommendation **Proceed with planned migration** (single PieceTable per Buffer) because: 1. Aligns with long-term architecture vision (REWRITE.md) 2. Removes unnecessary per-line buffer overhead 3. Simplifies codebase (one text representation) 4. Enables future optimizations (better undo, swap files, etc.) 5. Reasonable effort (16-25 days) for significant improvement **Suggested Approach:** - Start with Phase 1 (extend PieceTable) in isolated branch - Thoroughly test new PieceTable functionality - Proceed incrementally through phases - Maintain working editor at end of each phase - Merge to main after Phase 4 (before cleanup) to get testing - Complete Phase 5-6 based on feedback ## References - `REWRITE.md` - Rust architecture specification (lines 54-157) - Current buffer implementation: `Buffer.h/cc` - Current piece table: `PieceTable.h/cc` - Undo system: `UndoSystem.h/cc`, `UndoNode.h` - Commands: `Command.cc`