- Created `piece-table-migration.md` outlining the steps to transition from GapBuffer to a unified PieceTable architecture.
- Included phased approach: extending PieceTable, Buffer adapter layer, command updates, and renderer changes.
- Detailed API changes, file updates, testing strategy, risk assessment, and timeline for each migration phase.
- Document serves as a reference for architecture goals and implementation details.
PieceTable Migration Plan
Executive Summary
This document outlines the plan to remove GapBuffer support from kte and migrate to using a single PieceTable per Buffer, rather than the current vector-of-Lines architecture where each Line contains either a GapBuffer or PieceTable.
Current Architecture Analysis
Text Storage
Current Implementation:
- Buffer contains std::vector<Line> rows_
- Each Line wraps an AppendBuffer (type alias)
- AppendBuffer is either GapBuffer (default) or PieceTable (via KTE_USE_PIECE_TABLE)
- Each line is independently managed with its own buffer
- Operations are line-based with coordinate pairs (row, col)
Key Files:
- Buffer.h/cc - Buffer class with vector of Lines
- AppendBuffer.h - Type selector (GapBuffer vs PieceTable)
- GapBuffer.h/cc - Per-line gap buffer implementation
- PieceTable.h/cc - Per-line piece table implementation
- UndoSystem.h/cc - Records operations with (row, col, text)
- UndoNode.h - Undo operation types (Insert, Delete, Paste, Newline, DeleteRow)
- Command.cc - High-level editing commands
Current Buffer API
Low-level editing operations (used by UndoSystem):
void insert_text(int row, int col, std::string_view text);
void delete_text(int row, int col, std::size_t len);
void split_line(int row, int col);
void join_lines(int row);
void insert_row(int row, std::string_view text);
void delete_row(int row);
Line access:
std::vector<Line> &Rows();
const std::vector<Line> &Rows() const;
Line API (Buffer::Line):
std::size_t size() const;
const char *Data() const;
char operator[](std::size_t i) const;
std::string substr(std::size_t pos, std::size_t len) const;
std::size_t find(const std::string &needle, std::size_t pos) const;
void erase(std::size_t pos, std::size_t len);
void insert(std::size_t pos, const std::string &seg);
Line &operator+=(const Line &other);
Line &operator+=(const std::string &s);
Current PieceTable Limitations
The existing PieceTable class only supports:
- Append(char/string) - add to end
- Prepend(char/string) - add to beginning
- Clear() - empty the buffer
- Data() / Size() - access content (materializes on demand)
Missing capabilities needed for buffer-wide storage:
- Insert at arbitrary byte position
- Delete at arbitrary byte position
- Line indexing and line-based queries
- Position conversion (byte offset ↔ line/col)
- Efficient line boundary tracking
Target Architecture
Design Overview
Single PieceTable per Buffer:
- Buffer contains one PieceTable content_ (replaces std::vector<Line> rows_)
- Text stored as continuous byte sequence with \n as line separators
- Line index cached for efficient line-based operations
- All operations work on byte offsets internally
- Buffer provides line/column API as convenience layer
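The relationship between byte offsets and the line/column convenience layer can be illustrated with a small sketch. It works on a flat `std::string_view` rather than a real piece table, and the free functions `BuildLineStarts`, `LineColToByteOffset`, and `ByteOffsetToLineCol` are illustrative stand-ins (assumptions), not the final kte API. It also assumes that text ending in `\n` has an empty final line, a convention the plan does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: byte offsets at which each line starts.
// Line 0 always starts at offset 0; every '\n' starts a new line after it.
std::vector<std::size_t> BuildLineStarts(std::string_view text) {
    std::vector<std::size_t> starts;
    starts.push_back(0);
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') starts.push_back(i + 1);
    }
    return starts;
}

// (row, col) -> byte offset: look up the line start and add the column.
std::size_t LineColToByteOffset(const std::vector<std::size_t> &starts,
                                std::size_t row, std::size_t col) {
    assert(row < starts.size());
    return starts[row] + col;
}

// byte offset -> (row, col): binary search for the last line start <= offset.
std::pair<std::size_t, std::size_t>
ByteOffsetToLineCol(const std::vector<std::size_t> &starts, std::size_t offset) {
    assert(!starts.empty() && starts.front() == 0);
    auto it = std::upper_bound(starts.begin(), starts.end(), offset);
    std::size_t row = static_cast<std::size_t>(it - starts.begin()) - 1;
    return {row, offset - starts[row]};
}
```

Because the line starts are sorted, the offset-to-position direction is a binary search while the position-to-offset direction is a single array lookup, which is why caching the index matters for line-based callers.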
Enhanced PieceTable Design
class PieceTable {
public:
// Existing API (keep for compatibility if needed)
void Append(const char *s, std::size_t len);
void Prepend(const char *s, std::size_t len);
void Clear();
const char *Data() const;
std::size_t Size() const;
// NEW: Core byte-based editing operations
void Insert(std::size_t byte_offset, const char *text, std::size_t len);
void Delete(std::size_t byte_offset, std::size_t len);
// NEW: Line-based queries
std::size_t LineCount() const;
std::string GetLine(std::size_t line_num) const;
std::pair<std::size_t, std::size_t> GetLineRange(std::size_t line_num) const; // (start, end) byte offsets
// NEW: Position conversion
std::pair<std::size_t, std::size_t> ByteOffsetToLineCol(std::size_t byte_offset) const;
std::size_t LineColToByteOffset(std::size_t row, std::size_t col) const;
// NEW: Substring extraction
std::string GetRange(std::size_t byte_offset, std::size_t len) const;
// NEW: Search support
std::size_t Find(const std::string &needle, std::size_t start_offset) const;
private:
// Existing members
std::string original_;
std::string add_;
std::vector<Piece> pieces_;
mutable std::string materialized_;
mutable bool dirty_;
std::size_t total_size_;
// NEW: Line index for efficient line operations
struct LineInfo {
std::size_t byte_offset; // absolute byte offset from buffer start
std::size_t piece_idx; // which piece contains line start
std::size_t offset_in_piece; // byte offset within that piece
};
mutable std::vector<LineInfo> line_index_;
mutable bool line_index_dirty_;
// NEW: Line index management
void RebuildLineIndex() const;
void InvalidateLineIndex();
};
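To make the proposed Insert and Delete semantics concrete, here is a minimal, self-contained piece table sketch. `MiniPieceTable`, its `Piece` layout, and the `SplitAt` helper are assumptions made for illustration; the existing kte PieceTable may organise its pieces differently, and this sketch omits the materialization cache and the line index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative only: names and layout are assumptions, not kte's PieceTable.
class MiniPieceTable {
public:
    explicit MiniPieceTable(std::string original) : original_(std::move(original)) {
        if (!original_.empty()) pieces_.push_back({false, 0, original_.size()});
    }

    // Append text to the add buffer, then splice a piece in at `offset`.
    void Insert(std::size_t offset, const std::string &text) {
        assert(offset <= Size());
        if (text.empty()) return;
        Piece added{true, add_.size(), text.size()};
        add_ += text;
        std::size_t idx = SplitAt(offset);
        pieces_.insert(pieces_.begin() + idx, added);
    }

    // Force piece boundaries at both ends of the range, then drop the middle.
    void Delete(std::size_t offset, std::size_t len) {
        assert(offset + len <= Size());
        if (len == 0) return;
        std::size_t first = SplitAt(offset);
        std::size_t last = SplitAt(offset + len);
        pieces_.erase(pieces_.begin() + first, pieces_.begin() + last);
    }

    std::size_t Size() const {
        std::size_t n = 0;
        for (const Piece &p : pieces_) n += p.len;
        return n;
    }

    std::string Materialize() const {
        std::string out;
        for (const Piece &p : pieces_)
            out.append(p.from_add ? add_ : original_, p.start, p.len);
        return out;
    }

private:
    struct Piece {
        bool from_add;       // true: bytes live in add_, false: in original_
        std::size_t start;   // offset within the backing buffer
        std::size_t len;
    };

    // Ensure a piece boundary at `offset`; return the index of the piece
    // that begins there (or pieces_.size() when offset is at the end).
    std::size_t SplitAt(std::size_t offset) {
        std::size_t pos = 0;
        for (std::size_t i = 0; i < pieces_.size(); ++i) {
            if (pos == offset) return i;
            Piece &p = pieces_[i];
            if (offset < pos + p.len) {
                std::size_t head = offset - pos;
                Piece tail{p.from_add, p.start + head, p.len - head};
                p.len = head;
                pieces_.insert(pieces_.begin() + i + 1, tail);
                return i + 1;
            }
            pos += p.len;
        }
        return pieces_.size();
    }

    std::string original_, add_;
    std::vector<Piece> pieces_;
};
```

Neither operation moves existing text: Insert only appends to the add buffer and Delete only adjusts piece descriptors, which is the property that makes a single buffer-wide table attractive.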
Buffer API Changes
class Buffer {
public:
// NEW: Direct content access
PieceTable &Content() { return content_; }
const PieceTable &Content() const { return content_; }
// MODIFIED: Keep existing API but implement via PieceTable
void insert_text(int row, int col, std::string_view text);
void delete_text(int row, int col, std::size_t len);
void split_line(int row, int col);
void join_lines(int row);
void insert_row(int row, std::string_view text);
void delete_row(int row);
// MODIFIED: Line access - return line from PieceTable
std::size_t Nrows() const { return content_.LineCount(); }
std::string GetLine(std::size_t row) const { return content_.GetLine(row); }
// REMOVED: Rows() - no longer have vector of Lines
// std::vector<Line> &Rows(); // REMOVE
private:
// REMOVED: std::vector<Line> rows_;
// NEW: Single piece table for all content
PieceTable content_;
// Keep existing members
std::size_t curx_, cury_, rx_;
std::size_t nrows_; // cached from content_.LineCount()
std::size_t rowoffs_, coloffs_;
std::string filename_;
bool is_file_backed_;
bool dirty_;
bool read_only_;
bool mark_set_;
std::size_t mark_curx_, mark_cury_;
std::unique_ptr<UndoTree> undo_tree_;
std::unique_ptr<UndoSystem> undo_sys_;
std::uint64_t version_;
bool syntax_enabled_;
std::string filetype_;
std::unique_ptr<kte::HighlighterEngine> highlighter_;
kte::SwapRecorder *swap_rec_;
};
Migration Phases
Phase 1: Extend PieceTable (Foundation)
Goal: Add buffer-wide capabilities to PieceTable without breaking existing per-line usage.
Tasks:
- Add line indexing infrastructure to PieceTable
  - Add LineInfo struct and line_index_ member
  - Implement RebuildLineIndex() that scans pieces for '\n' characters
  - Implement InvalidateLineIndex() called by Insert/Delete
- Implement core byte-based operations
  - Insert(byte_offset, text, len) - split piece at offset, insert new piece
  - Delete(byte_offset, len) - split pieces, remove/truncate as needed
- Implement line-based query methods
  - LineCount() - return line_index_.size()
  - GetLine(line_num) - extract text between line boundaries
  - GetLineRange(line_num) - return (start, end) byte offsets
- Implement position conversion
  - ByteOffsetToLineCol(offset) - binary search in line_index_
  - LineColToByteOffset(row, col) - lookup line start, add col
- Implement utility methods
  - GetRange(offset, len) - extract substring
  - Find(needle, start) - search across pieces
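The line index rebuild described in the first task could look roughly like the sketch below. It assumes each piece can be viewed as a non-empty `std::string_view` over its bytes, which is a simplification; the free function `RebuildLineIndex` here is a hypothetical stand-in for the private member of the same name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Mirrors the LineInfo struct proposed for PieceTable.
struct LineInfo {
    std::size_t byte_offset;      // absolute byte offset from buffer start
    std::size_t piece_idx;        // which piece contains the line start
    std::size_t offset_in_piece;  // byte offset within that piece
};

// Assumption: pieces are given in document order as views over their bytes.
std::vector<LineInfo> RebuildLineIndex(const std::vector<std::string_view> &pieces) {
    std::vector<LineInfo> index;
    index.push_back({0, 0, 0});   // line 0 starts at the very beginning
    std::size_t base = 0;         // absolute offset of the current piece
    for (std::size_t p = 0; p < pieces.size(); ++p) {
        std::string_view bytes = pieces[p];
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            if (bytes[i] != '\n') continue;
            // The next line starts one byte after the newline. If the newline
            // is the last byte of this piece, that start belongs to the next
            // piece (or is one past the end for a trailing newline).
            if (i + 1 < bytes.size())
                index.push_back({base + i + 1, p, i + 1});
            else
                index.push_back({base + i + 1, p + 1, 0});
        }
        base += bytes.size();
    }
    return index;
}
```

With this shape, LineCount() is the size of the returned vector, and a full rebuild costs one pass over every byte, which is the cost Phase 6 looks at reducing.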
Testing:
- Write unit tests for new PieceTable methods
- Test with multi-line content
- Verify line index correctness after edits
- Benchmark performance vs current line-based approach
Estimated Effort: 3-5 days
Phase 2: Create Buffer Adapter Layer (Compatibility)
Goal: Create compatibility layer in Buffer to use PieceTable while maintaining existing API.
Tasks:
- Add PieceTable content_ member to Buffer (alongside existing rows_)
- Add compilation flag KTE_USE_BUFFER_PIECE_TABLE (like existing KTE_USE_PIECE_TABLE)
- Implement Buffer methods to delegate to content_:
    #ifdef KTE_USE_BUFFER_PIECE_TABLE
    void insert_text(int row, int col, std::string_view text) {
        std::size_t offset = content_.LineColToByteOffset(row, col);
        content_.Insert(offset, text.data(), text.size());
    }
    // ... similar for other methods
    #else
    // Existing line-based implementation
    #endif
- Update file I/O to work with PieceTable
  - OpenFromFile() - load into content_ instead of rows_
  - Save() - serialize content_ instead of rows_
- Update AsString() to materialize from content_
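The remaining low-level operations reduce to inserting or deleting `\n` bytes once content is a single byte sequence. The sketch below shows one possible mapping, using `std::string` as a stand-in for the PieceTable; `SketchBuffer` and `RowStart` are hypothetical names, rows are `std::size_t` rather than the `int` of the real API, and the treatment of the last row in `delete_row` is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the proposed Buffer: std::string plays the role of PieceTable.
struct SketchBuffer {
    std::string content;

    // Byte offset of the start of `row` (rows are '\n'-separated).
    // A real implementation would use the cached line index instead.
    std::size_t RowStart(std::size_t row) const {
        std::size_t off = 0;
        for (std::size_t r = 0; r < row; ++r) {
            std::size_t nl = content.find('\n', off);
            assert(nl != std::string::npos && "row out of range");
            off = nl + 1;
        }
        return off;
    }

    // Splitting a line is inserting one newline at (row, col).
    void split_line(std::size_t row, std::size_t col) {
        content.insert(RowStart(row) + col, 1, '\n');
    }

    // Joining `row` with the next row is deleting the newline that ends it.
    void join_lines(std::size_t row) {
        std::size_t nl = content.find('\n', RowStart(row));
        if (nl != std::string::npos) content.erase(nl, 1);
    }

    // Deleting a row removes its bytes plus one adjacent newline.
    void delete_row(std::size_t row) {
        std::size_t start = RowStart(row);
        std::size_t nl = content.find('\n', start);
        if (nl != std::string::npos) {
            content.erase(start, nl - start + 1);  // row plus its own newline
        } else {
            // Last row: remove the preceding newline instead, if there is one.
            std::size_t from = start > 0 ? start - 1 : 0;
            content.erase(from);
        }
    }
};
```

Keeping these as thin wrappers over byte-level Insert and Delete means the undo system can continue to call the row/col API unchanged during the adapter phase.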
Testing:
- Run existing buffer correctness tests with new flag
- Verify undo/redo still works
- Test file I/O round-tripping
- Test with existing command operations
Estimated Effort: 3-4 days
Phase 3: Migrate Command Layer (High-level Operations)
Goal: Update commands that directly access Rows() to use new API.
Tasks:
- Audit all usages of buf.Rows() in Command.cc
- Refactor helper functions:
  - extract_region_text() - use content_.GetRange()
  - delete_region() - convert to byte offsets, use content_.Delete()
  - insert_text_at_cursor() - convert position, use content_.Insert()
- Update commands that iterate over lines:
  - Use buf.GetLine(i) instead of buf.Rows()[i]
  - Update line count queries to use buf.Nrows()
- Update search/replace operations:
  - Modify search_compute_matches() to work with GetLine()
  - Update regex matching to work line-by-line or use content directly
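As a rough illustration of the region helper refactor, the sketch below converts a mark and cursor position into a single (offset, length) byte range and then extracts or deletes it. `Pos`, `ToOffset`, `RegionBytes`, `ExtractRegionText`, and `DeleteRegion` are hypothetical names, and `std::string` stands in for the PieceTable, so `substr` and `erase` play the roles of GetRange() and Delete().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical position type; kte stores cursor and mark as separate members.
struct Pos {
    std::size_t row;
    std::size_t col;
};

// Convert (row, col) to a byte offset by walking '\n' separators.
// A real implementation would consult the cached line index instead.
std::size_t ToOffset(const std::string &content, Pos p) {
    std::size_t off = 0;
    for (std::size_t r = 0; r < p.row; ++r) {
        std::size_t nl = content.find('\n', off);
        assert(nl != std::string::npos && "row out of range");
        off = nl + 1;
    }
    return off + p.col;
}

// Normalise mark/cursor into (start offset, length), whichever comes first.
std::pair<std::size_t, std::size_t> RegionBytes(const std::string &content,
                                                Pos mark, Pos cursor) {
    std::size_t a = ToOffset(content, mark);
    std::size_t b = ToOffset(content, cursor);
    if (a > b) std::swap(a, b);
    return {a, b - a};
}

std::string ExtractRegionText(const std::string &content, Pos mark, Pos cursor) {
    auto [off, len] = RegionBytes(content, mark, cursor);
    return content.substr(off, len);  // stands in for content_.GetRange()
}

void DeleteRegion(std::string &content, Pos mark, Pos cursor) {
    auto [off, len] = RegionBytes(content, mark, cursor);
    content.erase(off, len);          // stands in for content_.Delete()
}
```

A multi-line region becomes one contiguous byte range, so the per-row loop that a vector-of-Lines layout needs for the first, middle, and last rows is no longer required.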
Testing:
- Test all editing commands (insert, delete, newline, backspace)
- Test region operations (mark, copy, kill)
- Test search and replace
- Test word navigation and deletion
- Run through common editing workflows
Estimated Effort: 4-6 days
Phase 4: Update Renderer and Frontend (Display)
Goal: Ensure all renderers work with new Buffer structure.
Tasks:
- Audit renderer implementations:
  - TerminalRenderer.cc
  - ImGuiRenderer.cc
  - QtRenderer.cc
  - TestRenderer.cc
- Update line access patterns:
  - Replace buf.Rows()[y] with buf.GetLine(y)
  - Handle string return instead of Line object
- Update syntax highlighting integration:
  - Ensure HighlighterEngine works with GetLine()
  - Update any line-based caching
Testing:
- Test rendering in terminal
- Test ImGui frontend (if enabled)
- Test Qt frontend (if enabled)
- Verify syntax highlighting displays correctly
- Test scrolling and viewport updates
Estimated Effort: 2-3 days
Phase 5: Remove Old Infrastructure (Cleanup) ✅ COMPLETED
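Goal: Remove GapBuffer and AppendBuffer completely. The Line class was originally slated for removal as well, but was kept as a PieceTable-backed wrapper for API compatibility (see Tasks).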
Status: Completed on 2025-12-05
Tasks:
- ✅ Remove conditional compilation:
  - Removed #ifdef KTE_USE_BUFFER_PIECE_TABLE (PieceTable is now the only way)
  - Removed #ifdef KTE_USE_PIECE_TABLE
  - Removed AppendBuffer.h
- ✅ Delete obsolete code:
  - Deleted GapBuffer.h/cc
  - Line class now uses PieceTable internally (kept for API compatibility)
  - rows_ kept as mutable cache rebuilt from content_ PieceTable
- ✅ Update CMakeLists.txt:
  - Removed GapBuffer from sources
  - Removed AppendBuffer.h from headers
  - Removed KTE_USE_PIECE_TABLE and KTE_USE_BUFFER_PIECE_TABLE options
- ✅ Clean up includes and dependencies
- ✅ Update documentation
Testing:
- Full regression test suite
- Verify clean compilation
- Check for any lingering references
Estimated Effort: 1-2 days
Phase 6: Performance Optimization (Polish)
Goal: Optimize the new implementation for real-world usage.
Tasks:
- Profile common operations:
- Measure line access patterns
- Identify hot paths in editing
- Benchmark against old implementation
- Optimize line index:
- Consider incremental updates instead of full rebuild
- Tune rebuild threshold
- Cache frequently accessed lines
- Optimize piece table:
- Tune piece coalescing heuristics
- Consider piece count limits and consolidation
- Memory optimization:
- Review materialization frequency
- Consider lazy materialization strategies
- Profile memory usage on large files
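One way the incremental update idea might work is sketched below for the insert case: line starts after the insertion point are shifted, and new starts are added for each newline in the inserted text. The function name `UpdateLineStartsAfterInsert` and the plain vector of line-start offsets are assumptions; the real index also tracks piece positions, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// `starts` holds sorted line-start byte offsets, with starts[0] == 0.
// Updates it in place to reflect inserting `text` at byte `offset`.
void UpdateLineStartsAfterInsert(std::vector<std::size_t> &starts,
                                 std::size_t offset, std::string_view text) {
    assert(!starts.empty() && starts.front() == 0);

    // Lines starting strictly after the insertion point move right.
    // A line starting exactly at `offset` keeps its start: the inserted
    // bytes become the beginning of that line.
    auto first_after = std::upper_bound(starts.begin(), starts.end(), offset);
    for (auto it = first_after; it != starts.end(); ++it) *it += text.size();

    // Each newline in the inserted text starts a new line right after it.
    std::vector<std::size_t> added;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') added.push_back(offset + i + 1);
    }
    starts.insert(first_after, added.begin(), added.end());
}
```

The shift is still linear in the number of lines after the edit, but it avoids rescanning every byte of the buffer, which is the expensive part of a full rebuild on large files.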
Testing:
- Benchmark suite with various file sizes
- Memory profiling
- Real-world usage testing
Estimated Effort: 3-5 days
Files Requiring Modification
Core Files (Must Change)
- PieceTable.h/cc - Add new methods (Phase 1)
- Buffer.h/cc - Replace rows_ with content_ (Phase 2)
- Command.cc - Update line access (Phase 3)
- UndoSystem.cc - May need updates for new Buffer API
Renderer Files (Will Change)
- TerminalRenderer.cc - Update line access (Phase 4)
- ImGuiRenderer.cc - Update line access (Phase 4)
- QtRenderer.cc - Update line access (Phase 4)
- TestRenderer.cc - Update line access (Phase 4)
Files Removed (Phase 5 - Completed)
- GapBuffer.h/cc - ✅ Deleted
- AppendBuffer.h - ✅ Deleted
- test_buffer_correctness.cc - ✅ Deleted (obsolete GapBuffer comparison test)
- bench/BufferBench.cc - ✅ Deleted (obsolete GapBuffer benchmarks)
- bench/PerformanceSuite.cc - ✅ Deleted (obsolete GapBuffer benchmarks)
- Buffer::Line class - ✅ Updated to use PieceTable internally (kept for API compatibility)
Build Files
- CMakeLists.txt - Update sources (Phase 5)
Documentation
- README.md - Update architecture notes
- docs/ - Update any architectural documentation
- REWRITE.md - Note C++ now matches Rust design
Testing Strategy
Unit Tests
- PieceTable Tests: New file test_piece_table.cc
  - Test Insert/Delete at various positions
  - Test line indexing correctness
  - Test position conversion
  - Test with edge cases (empty, single line, large files)
- Buffer Tests: Extend test_buffer_correctness.cc
  - Test new Buffer API with PieceTable backend
  - Test file I/O round-tripping
  - Test multi-line operations
Integration Tests
- Undo Tests: test_undo.cc should still pass
  - Verify undo/redo across all operation types
  - Test undo tree navigation
- Search Tests: test_search_correctness.cc should still pass
  - Verify search across multiple lines
  - Test regex search
Manual Testing
- Load and edit large files (>10MB)
- Perform complex editing sequences
- Test all keybindings and commands
- Verify syntax highlighting
- Test crash recovery (swap files)
Regression Testing
- All existing tests must pass with new implementation
- No observable behavior changes for users
- Performance should be comparable or better
Risk Assessment
High Risk
- Undo System Integration: Undo records operations with row/col/text. Need to ensure compatibility or refactor.
  - Mitigation: Carefully preserve undo semantics, extensive testing
- Performance Regression: Line index rebuilding could be expensive on large files.
  - Mitigation: Profile early, optimize incrementally, consider caching strategies
Medium Risk
- Syntax Highlighting: Highlighters may depend on line-based access patterns.
  - Mitigation: Review highlighter integration, test thoroughly
- Renderer Updates: Multiple renderers need updating, risk of inconsistency.
  - Mitigation: Update all renderers in same phase, test each
Low Risk
- Search/Replace: Should work naturally with new GetLine() API.
- Mitigation: Test thoroughly with existing test suite
Success Criteria
Functional Requirements
- ✓ All existing tests pass
- ✓ All commands work identically to before
- ✓ File I/O works correctly
- ✓ Undo/redo functionality preserved
- ✓ Syntax highlighting works
- ✓ All frontends (terminal, ImGui, Qt) work
Code Quality
- ✓ GapBuffer completely removed
- ✓ No conditional compilation for buffer type
- ✓ Clean, maintainable code
- ✓ Good test coverage for new PieceTable methods
Performance
- ✓ Editing operations at least as fast as current
- ✓ Line access within 2x of current performance
- ✓ Memory usage reasonable (no excessive materialization)
- ✓ Large file handling acceptable (tested up to 100MB)
Timeline Estimate
| Phase | Duration | Dependencies |
|---|---|---|
| Phase 1: Extend PieceTable | 3-5 days | None |
| Phase 2: Buffer Adapter | 3-4 days | Phase 1 |
| Phase 3: Command Layer | 4-6 days | Phase 2 |
| Phase 4: Renderer Updates | 2-3 days | Phase 3 |
| Phase 5: Cleanup | 1-2 days | Phase 4 |
| Phase 6: Optimization | 3-5 days | Phase 5 |
| Total | 16-25 days | |
Note: Timeline assumes one developer working full-time. Actual duration may vary based on:
- Unforeseen integration issues
- Performance optimization needs
- Testing thoroughness
- Code review iterations
Alternatives Considered
Alternative 1: Keep Line-based but unify GapBuffer/PieceTable
- Keep vector of Lines, but make each Line always use PieceTable
- Remove GapBuffer, remove AppendBuffer selector
- Pros: Smaller change, less risk
- Cons: Doesn't achieve architectural goal, still have per-line overhead
Alternative 2: Hybrid approach
- Use PieceTable for buffer, but maintain materialized Line objects as cache
- Pros: Easier migration, maintains some compatibility
- Cons: Complex dual representation, cache invalidation issues
Alternative 3: Complete rewrite
- Follow REWRITE.md exactly, implement in Rust
- Pros: Modern language, better architecture
- Cons: Much larger effort, different project
Recommendation
Proceed with planned migration (single PieceTable per Buffer) because:
- Aligns with long-term architecture vision (REWRITE.md)
- Removes unnecessary per-line buffer overhead
- Simplifies codebase (one text representation)
- Enables future optimizations (better undo, swap files, etc.)
- Reasonable effort (16-25 days) for significant improvement
Suggested Approach:
- Start with Phase 1 (extend PieceTable) in isolated branch
- Thoroughly test new PieceTable functionality
- Proceed incrementally through phases
- Maintain working editor at end of each phase
- Merge to main after Phase 4 (before cleanup) to get testing
- Complete Phase 5-6 based on feedback
References
- REWRITE.md - Rust architecture specification (lines 54-157)
- Current buffer implementation: Buffer.h/cc
- Current piece table: PieceTable.h/cc
- Undo system: UndoSystem.h/cc, UndoNode.h
- Commands: Command.cc