Architecture
gtext follows a simple, pluggable architecture designed for extensibility and ease of use.
Overview
┌─────────────────┐
│    CLI / API    │  User Interface
└────────┬────────┘
         │
┌────────▼────────┐
│  TextProcessor  │  Core Engine
└────────┬────────┘
         │
    ┌────▼────┐
    │Extension│  Extension 1
    │ System  │
    └────┬────┘
         │
    ┌────▼────┐
    │Extension│  Extension 2
    │ System  │
    └────┬────┘
         │
    ┌────▼────┐
    │ Output  │  Generated Files
    └─────────┘
Components
1. CLI (gtext/cli.py)
Purpose: Command-line interface
Commands:
- cast - Process single file
- cast-all - Process multiple files
- watch - Watch and regenerate (planned)
Responsibilities:
- Argument parsing
- File discovery
- Error reporting
- User feedback
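As a rough illustration of the argument-parsing responsibility, the two available commands could be wired up with `argparse` as sketched below. Only the command names `cast` and `cast-all` come from this page; the positional argument names `input` and `inputs` are assumptions made for the example, and the real `gtext/cli.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the `cast` and `cast-all` subcommands."""
    parser = argparse.ArgumentParser(prog="gtext")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `cast`: process a single file ("input" is a hypothetical argument name).
    cast = subparsers.add_parser("cast", help="Process single file")
    cast.add_argument("input")

    # `cast-all`: process multiple files ("inputs" is a hypothetical name).
    cast_all = subparsers.add_parser("cast-all", help="Process multiple files")
    cast_all.add_argument("inputs", nargs="+")

    return parser


args = build_parser().parse_args(["cast", "document.md.gtext"])
print(args.command, args.input)
```

Using subcommands keeps each command's arguments separate, which leaves room for a later `watch` command without touching the existing ones.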
2. TextProcessor (gtext/processor.py)
Purpose: Core processing engine
Key Methods:
- process_file() - Process a .gtext file
- process_string() - Process text content
- add_extension() - Register an extension
Responsibilities:
- File I/O
- Extension management
- Processing pipeline
- Context management
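To make the relationship between the three key methods concrete, here is a minimal stand-in for the engine. The method names match the ones listed above, but the bodies, the `extensions` attribute, the method signatures, and the toy `Shout` extension are assumptions for illustration, not the actual contents of `gtext/processor.py`.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


class TextProcessor:
    """Illustrative stand-in for the core engine, not the real implementation."""

    def __init__(self) -> None:
        self.extensions: List[Any] = []  # hypothetical attribute name

    def add_extension(self, extension: Any) -> None:
        """Register an extension; registration order is execution order."""
        self.extensions.append(extension)

    def process_string(self, text: str, context: Optional[Dict] = None) -> str:
        """Run the text through every registered extension in turn."""
        context = {} if context is None else context
        for extension in self.extensions:
            text = extension.process(text, context)
        return text

    def process_file(self, input_path: Path) -> str:
        """Read a .gtext file and process its content."""
        input_path = Path(input_path)
        content = input_path.read_text(encoding="utf-8")
        return self.process_string(content, {"input_path": input_path})


class Shout:
    """Toy extension used only for this demonstration."""

    def process(self, content: str, context: Dict) -> str:
        return content.upper()


processor = TextProcessor()
processor.add_extension(Shout())
print(processor.process_string("hello"))
```

In this sketch `process_file()` is a thin wrapper around `process_string()`, so the pipeline can be exercised in tests without touching the filesystem.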
3. Extension System (gtext/extensions/)
Purpose: Pluggable transformations
Base Class: BaseExtension
Built-in Extensions:
- IncludeExtension - Includes static files, CLI command output, and glob-matched files
Responsibilities:
- Text transformation
- Content generation
- Validation
- Custom processing
Data Flow
File Processing
1. User runs: gtext cast document.md.gtext
2. CLI parses arguments
   ├─ Input: document.md.gtext
   └─ Output: document.md (auto-detected)
3. TextProcessor reads file
   └─ content = read(document.md.gtext)
4. Extensions process content sequentially
   ├─ Extension 1: content = ext1.process(content, context)
   ├─ Extension 2: content = ext2.process(content, context)
   └─ Extension N: content = extN.process(content, context)
5. Write output
   └─ write(document.md, processed_content)
6. CLI reports success
   └─ "✓ Processed document.md.gtext → document.md"
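The six steps above can be sketched as one function. This is an assumption-laden illustration rather than gtext's code: the `cast` function name, the `Step` type alias, and the use of plain callables in place of extension objects are all hypothetical, and the demo writes to a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Stands in for an extension's process(content, context) method.
Step = Callable[[str, Dict], str]


def cast(input_path: Path, steps: List[Step]) -> str:
    """Read the source, run each step in order, write the output, report."""
    output_path = input_path.with_suffix("")  # document.md.gtext -> document.md
    content = input_path.read_text(encoding="utf-8")
    context: Dict = {"input_path": input_path}
    for step in steps:
        content = step(content, context)
    output_path.write_text(content, encoding="utf-8")
    return f"✓ Processed {input_path.name} → {output_path.name}"


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "document.md.gtext"
    source.write_text("# title\n", encoding="utf-8")
    message = cast(source, [lambda text, ctx: text.replace("title", "Title")])
    result = (Path(tmp) / "document.md").read_text(encoding="utf-8")
```

Each step receives the output of the previous one, so the order of `steps` determines the final content.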
Extension Pipeline
   Input Content
         │
         ▼
┌─────────────────┐
│  Extension 1    │  IncludeExtension
│  - Static files │
│  - CLI commands │
│  - Glob patterns│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Extension 2    │  Custom Extension
│  - Variables    │
│  - Conditionals │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Extension N    │  Another Extension
│  - Validation   │
│  - Formatting   │
└────────┬────────┘
         │
         ▼
   Output Content
Extension Architecture
BaseExtension
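from abc import ABC, abstractmethod
from typing import Dict

class BaseExtension(ABC):
    name: str = "base"

    @abstractmethod
    def process(self, content: str, context: Dict) -> str:
        # Receives the current text and the shared context; returns the new text.
        pass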
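A custom extension only needs to subclass `BaseExtension` and implement `process()`. The sketch below repeats the interface locally so it runs on its own; `StripTrailingWhitespace` is a hypothetical extension invented for this example and is not shipped with gtext.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BaseExtension(ABC):
    """Local copy of the interface shown above, so the example runs alone."""

    name: str = "base"

    @abstractmethod
    def process(self, content: str, context: Dict) -> str:
        ...


class StripTrailingWhitespace(BaseExtension):
    """Hypothetical extension: removes trailing spaces and tabs from each line."""

    name = "strip-trailing-whitespace"

    def process(self, content: str, context: Dict) -> str:
        return "\n".join(line.rstrip(" \t") for line in content.split("\n"))


ext = StripTrailingWhitespace()
print(repr(ext.process("a  \nb\t\n", {})))
```

Because `process()` is abstract, instantiating `BaseExtension` directly raises `TypeError`, which catches subclasses that forget to implement it.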
Extension Lifecycle
1. Instantiation
   ext = MyExtension(option1=value)
2. Registration
   processor.add_extension(ext)
3. Execution (for each file)
   result = ext.process(content, context)
4. Cleanup
   None required (extensions are stateless)
Context Dictionary
The context dict carries metadata through the processing pipeline:
from pathlib import Path

context = {
    "input_path": Path("document.md.gtext"),
    # Extensions can add custom keys
    "custom_key": "custom_value",
}
Usage:
- Read by extensions for path resolution
- Written by extensions to share data
- Passed through the entire pipeline
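The read and write roles of the context can be shown with two cooperating extensions. Both classes and the `heading_count` key are hypothetical names chosen for this sketch; only `input_path` is a key documented on this page.

```python
from pathlib import Path
from typing import Dict


class CountHeadings:
    """Hypothetical extension that writes a value into the shared context."""

    def process(self, content: str, context: Dict) -> str:
        context["heading_count"] = sum(
            1 for line in content.splitlines() if line.startswith("#")
        )
        return content


class AppendSummary:
    """Hypothetical extension that reads what an earlier extension stored."""

    def process(self, content: str, context: Dict) -> str:
        count = context.get("heading_count", 0)
        source = context["input_path"].name
        return f"{content}\n<!-- {count} headings in {source} -->\n"


context: Dict = {"input_path": Path("document.md.gtext")}
text = "# One\nbody\n## Two"
for ext in (CountHeadings(), AppendSummary()):
    text = ext.process(text, context)
print(text)
```

Since the same dict object is passed to every extension, data written by one is visible to all that run after it, which makes registration order significant.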
File Extension Convention
source.md.gtext
│      │  │
│      │  └─ Identifies as gtext source
│      └──── Target format (Markdown)
└─────────── Base name
Output: source.md
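Deriving the output name from this convention takes one `pathlib` call. The helper below is a sketch: `output_path_for` is a hypothetical name, and rejecting files without the `.gtext` suffix is an assumption about how such input might be handled.

```python
from pathlib import Path

GTEXT_SUFFIX = ".gtext"


def output_path_for(source: Path) -> Path:
    """Strip the trailing .gtext suffix to get the output path."""
    if source.suffix != GTEXT_SUFFIX:
        # Assumed behavior: refuse inputs that do not follow the convention.
        raise ValueError(f"not a gtext source: {source}")
    return source.with_suffix("")


print(output_path_for(Path("source.md.gtext")))
```

`Path.with_suffix("")` removes only the last suffix, so the target format (`.md` here) is preserved.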
Error Handling Strategy
Fault Tolerance
- Philosophy: Never break the build
- Implementation: Errors become HTML comments
- Example:
<!-- ERROR: File not found: missing.md -->
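One way to get this behavior is to catch the failure where it occurs and return the comment in place of the missing content. The `include_file` helper is hypothetical, and the message format for errors other than a missing file is an assumption.

```python
from pathlib import Path


def include_file(path: Path) -> str:
    """Return the file's text, or an HTML comment describing the failure."""
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"<!-- ERROR: File not found: {path} -->"
    except OSError as exc:
        # Any other I/O problem is also reported inline rather than raised.
        return f"<!-- ERROR: {exc} -->"


print(include_file(Path("definitely-missing-gtext-example.md")))
```

The comment stays invisible in rendered Markdown or HTML but remains searchable in the generated file, so a failed include does not abort the run.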
Error Propagation
Design Principles
1. Zero Configuration
Works immediately after installation, with no configuration files or setup steps.
2. Convention over Configuration
- .gtext extension identifies source files
- Auto-detect output by stripping extension
- Relative paths from source file location
3. Extensibility
- Plugin architecture
- Simple base class
- Sequential processing
- Shared context
4. Fault Tolerance
- Errors don't stop builds
- Clear error messages
- Continue on failure
5. Simplicity
- Small core
- Clear interfaces
- Minimal dependencies
- Easy to understand
Performance Considerations
File I/O
- Read once, write once
- UTF-8 encoding
- Buffered I/O
Extension Processing
- Sequential (not parallel)
- Each extension processes full content
- No caching (stateless)
Optimization Opportunities
- Parallel file processing (future)
- Incremental processing (future)
- Caching for expensive operations (future)
Security Model
Current
- Trust model: Process only trusted .gtext files
- CLI execution: Full shell access with shell=True
- File access: No restrictions
Future
- --safe-mode flag
- Sandboxed CLI execution
- Restricted file access
- Command whitelist
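Since none of these features exist yet, the following is only a speculative sketch of what a command whitelist could look like. The `ALLOWED_COMMANDS` set, both function names, and the error message format are hypothetical; the point is that running without a shell keeps metacharacters such as `;` and `|` from being interpreted.

```python
import shlex
import subprocess
from typing import FrozenSet

# Hypothetical whitelist; a real safe mode would likely make this configurable.
ALLOWED_COMMANDS: FrozenSet[str] = frozenset({"date", "git", "ls"})


def is_allowed(command: str) -> bool:
    """Return True if the command's executable is on the whitelist."""
    try:
        args = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(args) and args[0] in ALLOWED_COMMANDS


def run_restricted(command: str) -> str:
    """Run a whitelisted command without a shell and return its stdout."""
    if not is_allowed(command):
        return f"<!-- ERROR: Command not allowed: {command} -->"
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=False
    )
    return result.stdout


print(is_allowed("git log -1"))
print(run_restricted("rm -rf build"))
```

A check on the executable name alone is a weak guarantee, because whitelisted programs can still read or write arbitrary files; it would need to be combined with the sandboxing and file-access restrictions listed above.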
Extension Points
Current
- Extensions: Inherit from BaseExtension
Future
- Hooks: Pre/post processing hooks
- Filters: Content filters
- Validators: Content validators
- Generators: Content generators
Dependencies
Runtime: None (zero dependencies)
Development:
- pytest
- pytest-cov
- black
- ruff
- mypy
Documentation:
- mkdocs
- mkdocs-material
- mkdocstrings
Future Directions
Planned Features
- Watch mode: Auto-regenerate on changes
- Configuration files: .gtextrc, pyproject.toml
- Safe mode: Restricted execution
- Parallel processing: Process multiple files concurrently
- Incremental builds: Only process changed files
- Extension registry: Discover and load extensions
- Hooks system: Pre/post processing hooks
Extension Ideas
- Variable substitution
- Conditional blocks
- Table of contents generation
- Link validation
- Spell checking
- AI-powered transformations