TS โ TFW-26 / Phase B: gen_docs.py Implementation¶
Date: 2026-04-05 Author: Coordinator Status: ๐ก TS_DRAFT โ Awaiting approval Parent HL: HL-TFW-26
1. Objective¶
Implement all # TODO: Phase B functions in docs/scripts/gen_docs.py. After this phase, mkdocs serve --config-file docs/mkdocs.yml produces a fully navigable local site with all sources compiled, references resolved, and frontmatter injected.
2. Scope¶
In Scope¶
- Implement gen_docs.py: all transformation functions, glob processing, reference resolver
- Read
tfw.task_prefixfrom PROJECT_CONFIG.yaml (replace hardcodedTFW) - Verify build locally with
mkdocs build --config-file docs/mkdocs.yml --strict - Verify reference resolution on real project artifacts (spot-check 5+ resolved links)
- Fix P{N} false positive issue (context-aware resolution)
Out of Scope¶
- Curated landing page (Phase C)
- GitHub Pages deployment verification (Phase C)
- MCP server implementation (future task)
- Migration of existing KNOWLEDGE.md/knowledge/ to reference format (TD-66, TD-67 โ tfw-docs scope)
3. Affected Files¶
| File | Action | Description |
|---|---|---|
docs/scripts/gen_docs.py |
MODIFY | Implement all TODO functions (~100 new LOC) |
docs/scripts/test_gen_docs.py |
CREATE | Unit tests: extract_title, add_frontmatter, resolve_references, validate_sources |
docs/scripts/test_integration.py |
CREATE | Integration test: build from real project, verify output pages and links |
docs/requirements.txt |
MODIFY | Add pytest, pyyaml |
Budget: 2 new files, 2 modifications = 4 total. Defaults: max 14 files, max 8 new, max 1200 LOC.
4. Detailed Steps¶
Step 1: Implement core utilities¶
def extract_title(content: str, filename: str) -> str:
"""Derive title from first '# ' heading, fallback to filename stem."""
for line in content.splitlines():
if line.startswith("# "):
return line[2:].strip()
return Path(filename).stem.replace("_", " ").title()
def add_frontmatter(content: str, title: str, source: str) -> str:
"""Prepend YAML frontmatter (ยง16.3). Skip if content already has frontmatter."""
if content.startswith("---"):
return content
return f"---\ntitle: \"{title}\"\nsource: \"{source}\"\n---\n\n{content}"
Step 2: Implement source validation¶
def validate_sources() -> list[str]:
"""Check required sources exist, warn on missing optional."""
warnings = []
root = Path(__file__).parent.parent.parent # project root
for source, output, required in STATIC_SOURCES:
path = root / source
if not path.exists():
if required:
raise FileNotFoundError(f"Required source missing: {source}")
warnings.append(f"Optional source missing: {source}")
return warnings
Step 3: Implement static source processing¶
def copy_with_frontmatter(source_path: str, output_path: str) -> None:
"""Read source file, add frontmatter, write as virtual MkDocs page."""
root = Path(__file__).parent.parent.parent
path = root / source_path
content = path.read_text(encoding="utf-8")
title = extract_title(content, source_path)
result = add_frontmatter(content, title, source_path)
result = resolve_references(result)
with mkdocs_gen_files.open(output_path, "w") as f:
f.write(result)
Handle INDEX_OVERRIDE: if docs/index.md exists, use it instead of README.md for index.md output.
Step 4: Implement glob source processing¶
def copy_glob(pattern: str, output_prefix: str) -> None:
"""Glob source files relative to project root, copy each with frontmatter."""
root = Path(__file__).parent.parent.parent
for path in sorted(root.glob(pattern)):
relative = path.relative_to(root)
# Compute output path: strip the source prefix, prepend output prefix
output_path = output_prefix + str(relative.relative_to(
Path(pattern).parent if not "**" in pattern
else _glob_base(pattern)
))
copy_with_frontmatter(str(relative), output_path)
Key considerations:
- tasks/**/*.md โ output preserves relative path: tasks/TFW-18__foo/RF__TFW-18.md โ same path
- knowledge/*.md โ knowledge/{name}.md
- .tfw/workflows/**/*.md โ reference/workflows/{subpath}
- .tfw/templates/**/*.md โ reference/templates/{subpath}
Step 5: Implement reference resolver¶
The resolver scans page content for reference patterns (ยง16.2) and replaces with markdown hyperlinks.
def resolve_references(content: str) -> str:
"""Replace reference patterns with markdown hyperlinks."""
content = _resolve_artifact_refs(content) # [RF TFW-18](../../TFW-18__knowledge_consolidation/RF__PhaseB__knowledge_quality.md), [HL-TFW-19](../../TFW-19__config_propagation/HL-TFW-19__config_propagation.md)
content = _resolve_phase_refs(content) # [RF TFW-18](../../TFW-18__knowledge_consolidation/RF__PhaseB__knowledge_quality.md)/A
content = _resolve_techdebt_refs(content) # [TD-59](../../../reference/tech-debt.md)
content = _resolve_decision_refs(content) # [D24](../../../knowledge-index.md#architecture-decisions)
# P{N} โ skip for now due to false positive risk. Revisit in Phase C
return content
Artifact reference resolution ([RF TFW-18](../../TFW-18__knowledge_consolidation/RF__PhaseB__knowledge_quality.md)):
def _resolve_artifact_refs(content: str) -> str:
"""Resolve '{TYPE} {PREFIX}-{N}' patterns to hyperlinks."""
root = Path(__file__).parent.parent.parent
pattern = re.compile(
r'(?<!\[)\b(HL|TS|RF|ONB|RES|REVIEW)[- ](' + TASK_PREFIX + r'-\d+)\b(?!\])'
)
def replacer(match):
artifact_type = match.group(1)
task_id = match.group(2)
# Glob for matching file
candidates = sorted(root.glob(f"tasks/{task_id}*/{artifact_type}__*.md"))
if not candidates:
# Try HL naming convention: HL-{PREFIX}-{N}
candidates = sorted(root.glob(f"tasks/{task_id}*/HL-{task_id}*.md"))
if candidates:
rel = candidates[0].relative_to(root)
return f"`{match.group(0)}`"
print(f"WARNING [gen_docs]: Unresolved reference: {match.group(0)}")
return f"โ ๏ธ {match.group(0)} *(unresolved)*"
return pattern.sub(replacer, content)
Key constraint: the regex must NOT match references already inside markdown links โ negative lookbehind (?<!\[) and negative lookahead (?!\]) prevent [RF TFW-18](...) from being double-wrapped.
Decision/TD reference resolution:
def _resolve_decision_refs(content: str) -> str:
"""Resolve 'D{N}' to KNOWLEDGE.md anchors. Only inside tables."""
# Only resolve D{N} when it appears in a table row (starts with |)
pattern = re.compile(r'(?<=\| )\bD(\d+)\b')
return pattern.sub(r'[D\1](knowledge-index.md#architecture-decisions)', content)
def _resolve_techdebt_refs(content: str) -> str:
"""Resolve 'TD-{N}' to TECH_DEBT.md anchors."""
pattern = re.compile(r'\bTD-(\d+)\b')
return pattern.sub(r'[TD-\1](reference/tech-debt.md)', content)
P{N} โ deferred to Phase C due to false positive risk. Needs heuristic tuning.
Step 6: Implement main()¶
def main():
"""Entry point for mkdocs-gen-files plugin."""
warnings = validate_sources()
for w in warnings:
print(f"WARNING [gen_docs]: {w}")
root = Path(__file__).parent.parent.parent
# 1. Static sources
for source, output, required in STATIC_SOURCES:
path = root / source
if not path.exists():
continue # already warned in validate_sources
# Handle index override
if output == "index.md":
override = root / INDEX_OVERRIDE
if override.exists():
copy_with_frontmatter(INDEX_OVERRIDE, output)
continue
copy_with_frontmatter(source, output)
# 2. Glob sources
for pattern, prefix, required in GLOB_SOURCES:
copy_glob(pattern, prefix)
Step 7: Read task_prefix from config¶
import yaml
def _read_task_prefix() -> str:
"""Read tfw.task_prefix from PROJECT_CONFIG.yaml."""
root = Path(__file__).parent.parent.parent
config_path = root / ".tfw" / "PROJECT_CONFIG.yaml"
if config_path.exists():
with open(config_path) as f:
config = yaml.safe_load(f)
return config.get("tfw", {}).get("task_prefix", "PROJ")
return "PROJ"
Add pyyaml to docs/requirements.txt.
Step 8: Write tests¶
Unit tests (docs/scripts/test_gen_docs.py)¶
import pytest
from gen_docs import extract_title, add_frontmatter, resolve_references
class TestExtractTitle:
def test_heading_found(self):
assert extract_title("# My Title\nContent", "file.md") == "My Title"
def test_no_heading_fallback_to_filename(self):
assert extract_title("No heading here", "my_file.md") == "My File"
def test_heading_with_prefix(self):
assert extract_title("# RF โ [TFW-26](../HL-TFW-26__documentation_site.md) / Phase A", "rf.md") == "RF โ [TFW-26](../HL-TFW-26__documentation_site.md) / Phase A"
class TestAddFrontmatter:
def test_adds_frontmatter(self):
result = add_frontmatter("# Title\nBody", "Title", "src.md")
assert result.startswith("---")
assert 'title: "Title"' in result
assert 'source: "src.md"' in result
def test_skips_existing_frontmatter(self):
content = "---\ntitle: Existing\n---\nBody"
assert add_frontmatter(content, "New", "src.md") == content
class TestResolveReferences:
def test_artifact_ref_resolved(self, tmp_path):
"""[RF TFW-18](../../TFW-18__knowledge_consolidation/RF__PhaseB__knowledge_quality.md) โ hyperlink when file exists."""
# Create mock task structure
task_dir = tmp_path / "tasks" / "TFW-18__knowledge"
task_dir.mkdir(parents=True)
(task_dir / "RF__TFW-18__knowledge.md").write_text("# RF")
# Test resolution with project_root override
content = "Source: [RF TFW-18](../../TFW-18__knowledge_consolidation/RF__PhaseB__knowledge_quality.md) ยง6"
result = resolve_references(content, project_root=tmp_path)
assert "[RF TFW-18]" in result
assert "RF__TFW-18" in result
def test_unresolved_ref_marked(self, tmp_path):
"""Unresolvable ref gets โ ๏ธ marker."""
content = "Source: RF TFW-999"
result = resolve_references(content, project_root=tmp_path)
assert "โ ๏ธ" in result
assert "unresolved" in result
def test_existing_link_not_doubled(self, tmp_path):
"""[RF TFW-18](path) should NOT be re-wrapped."""
content = "[RF TFW-18](tasks/TFW-18/RF.md)"
result = resolve_references(content, project_root=tmp_path)
assert content == result # unchanged
def test_td_ref_resolved(self):
content = "| [TD-59](../../../reference/tech-debt.md) | source |"
result = resolve_references(content, project_root=None) # no glob needed
assert "[TD-59]" in result
assert "tech-debt" in result
def test_decision_ref_in_table(self):
content = "| [D24](../../../knowledge-index.md#architecture-decisions) | decision |"
result = resolve_references(content, project_root=None)
assert "[D24]" in result
assert "knowledge-index" in result
Integration test (docs/scripts/test_integration.py)¶
import subprocess
import sys
from pathlib import Path
PROJECT_ROOT = Path(__file__).parent.parent.parent
def test_mkdocs_build_succeeds():
"""Full MkDocs build exits 0 on this project."""
result = subprocess.run(
[sys.executable, "-m", "mkdocs", "build",
"--config-file", str(PROJECT_ROOT / "docs" / "mkdocs.yml"),
"--strict"],
capture_output=True, text=True, cwd=str(PROJECT_ROOT)
)
assert result.returncode == 0, f"Build failed:\n{result.stderr}"
def test_static_pages_generated():
"""Key static pages exist in site/ output."""
site = PROJECT_ROOT / "site"
assert (site / "index.html").exists(), "index.html missing"
assert (site / "getting-started" / "index.html").exists()
assert (site / "concepts" / "philosophy" / "index.html").exists()
assert (site / "reference" / "conventions" / "index.html").exists()
assert (site / "reference" / "glossary" / "index.html").exists()
assert (site / "reference" / "changelog" / "index.html").exists()
def test_task_pages_generated():
"""Task artifacts are accessible."""
site = PROJECT_ROOT / "site"
# At least one task folder should have pages
task_pages = list((site / "tasks").rglob("index.html")) if (site / "tasks").exists() else []
assert len(task_pages) > 10, f"Expected 10+ task pages, got {len(task_pages)}"
def test_knowledge_pages_generated():
"""Knowledge topic files are compiled."""
site = PROJECT_ROOT / "site"
knowledge_dir = site / "knowledge"
assert knowledge_dir.exists(), "knowledge/ section missing"
topic_pages = list(knowledge_dir.glob("*/index.html"))
assert len(topic_pages) >= 3, f"Expected 3+ topics, got {len(topic_pages)}"
def test_no_unresolved_markers_in_output():
"""No โ ๏ธ unresolved markers in key pages."""
site = PROJECT_ROOT / "site"
# Check a few key pages for unresolved markers
key_pages = [
site / "knowledge-index" / "index.html",
site / "reference" / "tech-debt" / "index.html",
]
for page in key_pages:
if page.exists():
content = page.read_text(encoding="utf-8")
# Allow some unresolved in tech-debt (old format)
# but knowledge-index should be clean
if "knowledge-index" in str(page):
assert "โ ๏ธ" not in content, f"Unresolved ref in {page}"
Note:
resolve_references()needs aproject_rootparameter for testability (default:Path(__file__).parent.parent.parent). This is a minor signature change from the Phase A skeleton.
Step 9: Run tests and verify locally¶
cd d:/projects/research/steps-framework
pip install -r docs/requirements.txt
pytest docs/scripts/test_gen_docs.py -v
mkdocs build --config-file docs/mkdocs.yml --strict
pytest docs/scripts/test_integration.py -v
mkdocs serve --config-file docs/mkdocs.yml
Spot-check in browser:
1. / โ README.md with Task Board, links to tasks/ work
2. /tasks/TFW-26__documentation_site/PhaseA/RF__PhaseA__compilable_contract/ โ RF accessible
3. /knowledge-index โ KNOWLEDGE.md, scroll to ยง1 Architecture Decisions
4. /reference/conventions โ conventions.md including ยง16
5. /reference/changelog โ 366 lines of CHANGELOG
Verify reference resolution:
- Open /reference/tech-debt โ check that Source column TFW-26/A RF obs. references are resolved (or left as text if format doesn't match pattern)
- Open a knowledge/ topic file โ check Source column references
5. Acceptance Criteria¶
- [ ]
mkdocs build --config-file docs/mkdocs.ymlexits 0 with no ERROR-level messages - [ ] All 9 static sources produce pages with YAML frontmatter
- [ ] All 4 glob patterns produce pages (tasks/, knowledge/, workflows/, templates/)
- [ ] Task Board links in
index.mdnavigate to task pages - [ ] Reference resolver handles
[RF TFW-18](../../TFW-18__knowledge_consolidation/RF__PhaseB__knowledge_quality.md),[HL-TFW-19](../../TFW-19__config_propagation/HL-TFW-19__config_propagation.md),TD-{N}patterns - [ ] References already inside markdown links are NOT double-wrapped
- [ ] Missing optional sources produce WARNING (page not generated, nav entry skipped)
- [ ] Unresolvable references are marked with
โ ๏ธin output (never silent) - [ ]
mkdocs build --strictfails if any references are unresolvable - [ ]
tfw.task_prefixread from PROJECT_CONFIG.yaml (not hardcoded) - [ ]
docs/requirements.txtincludes pytest, pyyaml - [ ] Unit tests pass:
pytest docs/scripts/test_gen_docs.py(8+ tests) - [ ] Integration tests pass:
pytest docs/scripts/test_integration.py(5+ tests) - [ ]
resolve_references()acceptsproject_rootparameter for testability
6. Phase Risks¶
| Risk | Mitigation |
|---|---|
| mkdocs-gen-files API differs from expected (open/write pattern) | Executor tests against installed version. Docs: https://github.com/oprypin/mkdocs-gen-files |
Glob tasks/**/*.md picks up non-artifact files (e.g., research/ stage files) |
These ARE valid artifacts per ยง16.1. Include them all |
| Reference resolver slows build on 130+ task files | Regex is O(n) per file. Expected total: <1s for all files |
| P{N} false positives | Deferred to Phase C. Only TD-{N}, D{N}, artifact refs resolved in Phase B |
| Index override conflicts with gen-files | If docs/index.md exists AND gen_docs writes index.md, gen-files last-write-wins. Logic: skip README.md for index when override exists |
TS โ TFW-26 / Phase B: gen_docs.py Implementation | 2026-04-05