HL — TFW-38: Quality Enforcement — Staged Review, Handoff Enforcement, Diagram Collection
Date: 2026-04-14 Author: Coordinator Status: 📝 HL_DRAFT — Updated with RESEARCH (4 iterations)
1. Vision
TFW workflow agents consistently produce complete artifacts — including diagrams, fact candidates, and strategic insights — and reviewers perform genuine quality audits via a structured 4-stage review process (Map → Verify → Judge → Decide) with domain-aware modes. Diagrams produced during tasks are indexed in KNOWLEDGE.md as living project documentation.
Impact: Every RF contains §6-8 because the workflow enforces it. Every REVIEW follows a staged flow with independent verification. Every diagram is indexed and discoverable.
"The reviewer caught three claims in the RF that didn't match the actual code. The Judge stage forced them to evaluate quality against HL philosophy, not just check boxes."
2. Current State (As-Is)
Observed problems (empirically validated — 4 research iterations across 4+ projects, 80+ RF files):
| # | Problem | Root Cause | Evidence |
|---|---|---|---|
| P1 | Executor skips RF §6-8 (96-100% skip rate) | handoff.md Phase 3 doesn't enumerate §6-8 by name | RES1 F1: grep across 80+ RFs |
| P2 | Reviewer trusts RF claims without verification | review.md has no audit mandate, no spot-check instruction | RES1 F4: only TFW-19 has independent verification |
| P3 | Researcher skips Findings Map (~50% in newer, 0% in older projects) | research/base.md Step 6 omits Findings Map from enumeration | RES1 F5 |
| P4 | Diagrams abandoned after task closes | docs.md has no diagram collection mechanism | RES1 Q3: never attempted |
| P5 | REVIEW checklist is 44% code-specific — generates N/A noise on non-code tasks | Single checklist for all task types | RES2 F9: 4 of 9 items are N/A on docs/spec tasks |
| P6 | KNOWLEDGE.md read in 3 workflows but NEVER cited in output | No citation mandate in plan.md/handoff.md — cross-task knowledge stays siloed | RES4 F14: "read but don't use" pattern |
Root cause (unified — RES1 D1):
Agents follow workflow step instructions literally. Template has the section → but workflow doesn't enumerate it → agent skips it. Workflow wins over template in agent attention.
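The skip rate behind P1 can be measured mechanically. The following is a minimal sketch, not the grep used in RES1: it assumes RF sections are written as numbered markdown headings (for example `## 6. <title>`), and the function names `missing_sections` and `skip_rate` are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: each RF section starts with a numbered markdown heading,
# e.g. "## 6. <title>". Adjust the pattern to the real RF template.
HEADING = re.compile(r"^#{1,6}\s*(\d+)\.", re.MULTILINE)


def missing_sections(rf_text: str, required: Iterable[int] = (6, 7, 8)) -> List[int]:
    """Return the required section numbers that have no heading in rf_text."""
    present = {int(n) for n in HEADING.findall(rf_text)}
    return [n for n in required if n not in present]


def skip_rate(rf_texts: List[str], required: Iterable[int] = (6, 7, 8)) -> float:
    """Fraction of RF files that are missing at least one required section."""
    required = tuple(required)
    if not rf_texts:
        return 0.0
    skipped = sum(1 for text in rf_texts if missing_sections(text, required))
    return skipped / len(rf_texts)
```

A check like this only detects whether a heading exists. It cannot tell an explicit "No diagrams." from real content, so that judgment stays with the reviewer.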
3. Target State (To-Be)
3.1 Result Visualization
Before → After:
| Artifact | Before | After |
|---|---|---|
| RF §6-8 | Skipped 96-100% | Always present — handoff.md explicitly requires each |
| REVIEW | Single-pass read → trust → checklist | 4-stage flow: Map → Verify → Judge → Decide |
| REVIEW checklist | 9 items, 44% code-only | Mode-aware: 6 universal + 2-4 mode-specific |
| RES Findings Map | Skipped ~50% | Always present — research/base.md explicitly requires |
| Diagrams lifecycle | Created → abandoned | Indexed in KNOWLEDGE.md §2 by docs.md |
| Knowledge citations | Agent says "per D28" — no link, could be hallucinated | Citation table with links in HL §7.2 / ONB §7, reviewer verifies |
Sample 4-stage REVIEW flow:
┌──────┐   ┌────────┐   ┌────────┐   ┌────────┐
│ MAP  │ → │ VERIFY │ → │ JUDGE  │ → │ DECIDE │
│"What │   │"Are    │   │"Is the │   │"What's │
│ was  │   │ claims │   │ quality│   │ the    │
│ done"│   │ true?" │   │ good?" │   │verdict"│
└──────┘   └────────┘   └────────┘   └────────┘
Read RF    Open files   Checklist    Verdict +
+TS+HL     Spot-check   (mode:       Tech Debt +
Build      Re-run test  code/docs/   Fact Cands
mental     Check AC     spec)        Traces
model      evidence
3.2 Value Flow
EXECUTOR            REVIEWER                              DOCS
│                   │                                     │
handoff.md          review.md                             docs.md
Phase 3:            4-stage flow:                         Checklist #7:
§1-8 explicit       Map→Verify→Judge→Decide               Diagram index
│                   + mode selection (code/docs/spec)     │
▼                   ▼                                     ▼
RF with             REVIEW with                           KNOWLEDGE.md §2
all sections        evidence-based verdict                Diagram Index
Phase B: Knowledge Citation cascade:
Coordinator         Executor                      Reviewer
───────────         ────────                      ────────
plan.md Step 3      handoff.md Ph.1               review.md Step 2
│                   │                             │
SCANS PV Index      READS HL §7.2                 SCANS PV Index
(7 sources,         citations only                (independent)
glossary.md)        │                             │
▼                   ▼                             ▼
HL §7.2             ONB §7                        verify.md
Knowledge           "Read ✅,                      Citations Verified
Citations           Applied" /                    link resolves? ✅/❌
[D28](link)         "NEW: [D31](../../knowledge-index.md#architecture-decisions)"   hallucinations? {H}
4. Phases
Phase A: Review Restructure + Full Enforcement Chain 🔴
Requires: Independent
Context for coordinator:
1. review.md — full workflow
2. REVIEW.md template — full template
3. handoff.md — Phase 1 (onboarding) + Phase 3 (RF)
4. research/base.md — Step 6 (synthesis)
5. plan.md — Step 3 (knowledge citation)
6. conventions.md §3 Visual Sections + §14 Anti-patterns
7. RES D1, D6→D11, D7→D12, D8, D9, D14-D17

Key decisions:
- D1: explicit §6-8 mandate in handoff.md
- D11: 4-stage review: Map → Verify → Judge → Decide
- D12: 3 review modes: code / docs / spec
- ~~D8: stages as REVIEW sections (not separate files)~~ → superseded by D18
- D9: REVIEW template restructures to match stages
- D14: knowledge citation mandate in plan.md Step 3
- D15: KNOWLEDGE.md inconsistency check in handoff.md Phase 1
- D16: diagram creation mandate in handoff.md Phase 3
- D17: Findings Map mandate in research/base.md Step 6
Deliverables:
1. review.md — restructured with 4 stages + mode selection (Step 0)
2. REVIEW.md template — new §1-§7 structure matching stages
3. .tfw/workflows/review/code.md — code-mode checklist + verify actions [NEW]
4. .tfw/workflows/review/docs.md — docs-mode checklist + verify actions [NEW]
5. .tfw/workflows/review/spec.md — spec-mode checklist + verify actions [NEW]
6. handoff.md — Phase 1: add KNOWLEDGE.md to inconsistency check; Phase 3: explicitly enumerate §6-§8
7. research/base.md — Step 6 explicitly requires Findings Map
8. plan.md — Step 3: add "Check KNOWLEDGE.md, cite relevant items in HL §4"
9. conventions.md §14 — new anti-patterns for review trust / section skipping / knowledge non-citation
Phase A.2: Review Stage Files + Self-Check Gates 🔴
Requires: Phase A ✅
Key decisions:
- D18: review stages as separate files (map.md, verify.md, judge.md) — supersedes D8. Without file fixation, stages collapse into a single stream of consciousness, the same root cause as P1-P3.
Deliverables:
1. .tfw/templates/review/map.md — Map stage template with self-check gate [NEW]
2. .tfw/templates/review/verify.md — Verify stage template with self-check gate [NEW]
3. .tfw/templates/review/judge.md — Judge stage template with self-check gate [NEW]
4. review.md — Steps 1-3 write stage files; Step 4 synthesizes into REVIEW
5. REVIEW.md template — references stage files; synthesis artifact
6. conventions.md §3 — review stage files in artifact taxonomy
Phase B: Knowledge Citation Table 🔴
Requires: Phase A.2 ✅
Context for coordinator:
1. HL §7 P6 (Knowledge Gate) + S9 (cross-task knowledge as hard gate)
2. RES iter 4 D14-D15 (knowledge citation mandate)
3. glossary.md → Project Values (PV) — term and PV Index table (already added)
4. plan.md Step 3, handoff.md Phase 1, review/verify.md
5. conventions.md §3 Artifact Types

Key decisions:
- D19: Knowledge Citation Table — traceable table with links replacing silent "I checked"
- Cascade model: Coordinator + Reviewer do full PV scan. Executor references coordinator's citations.
- HL §7.2 (next to Principles), not §4.1 (Phases) — citations and principles are same cognitive space
- ONB §7 (standalone), not §6.1 (Inconsistencies) — citations ≠ inconsistencies
- Unified naming: "Knowledge Citations" everywhere (one cognitive mode = one name, per D28/D39)
Design rationale (why this structure):
- §7.2 in HL: coordinator writes "what I believe" (§7 Principles) and "what I read" (§7.2 Citations) together. Reviewer sees both in one place — can check: are principles grounded in real knowledge?
- §7 in ONB: executor DOESN'T rescan everything (wasteful). Reads coordinator's work, confirms reading, adds NEW items coordinator missed. Lightweight but traceable.
- Anti-hallucination: reviewer opens each link in verify.md. ❌ = hallucinated citation = Discrepancy. Without this: agent says "per D28" — D28 could be invented. With this: link or it didn't happen.
- Greenfield projects: "No applicable knowledge items — project in bootstrap phase" is valid.
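The link-resolution part of the anti-hallucination check lends itself to a small script. This is a sketch under stated assumptions, not part of the TFW workflow: it assumes citations are ordinary markdown links with relative file paths, it checks only that the target file exists (not that the anchor or the cited decision is present), and `check_citations` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Dict

# Matches markdown links such as [D28](knowledge.md#decisions).
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def check_citations(text: str, base_dir: Path) -> Dict[str, bool]:
    """Map each cited label to True if its link target file exists under base_dir.

    Assumption: link targets are relative paths. Anchor-only links ("#foo")
    are reported as unresolved because no file can be checked.
    """
    results: Dict[str, bool] = {}
    for label, target in LINK.findall(text):
        path_part = target.split("#", 1)[0]  # drop the anchor, keep the file path
        results[label] = bool(path_part) and (base_dir / path_part).is_file()
    return results
```

A `False` entry corresponds to the ❌ case above. A `True` entry only proves the file exists, so the reviewer still has to open the link and confirm that the cited item says what the citation claims.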
Deliverables:
1. HL.md template — new §7.2 Knowledge Citations (Coordinator fills, full PV scan)
2. ONB.md template — new §7 Knowledge Citations (Executor references HL §7.2)
3. review/verify.md template — Knowledge Citations Verified section (link resolution check)
4. plan.md Step 3 — instruct coordinator: scan PV Index → fill HL §7.2
5. handoff.md Phase 1 — instruct executor: read HL §7.2 → fill ONB §7

Diagram indexing → moved to TFW-39 (Visual Knowledge System)
5. Definition of Done (DoD)
- ✅ 1. review.md restructured with 4 stages (Map → Verify → Judge → Decide) + mode selection
- ✅ 2. REVIEW.md template restructured to §1-§7 matching stages
- ✅ 3. 3 review mode files created (code.md, docs.md, spec.md) in .tfw/workflows/review/
- ✅ 4. handoff.md Phase 1 checks KNOWLEDGE.md inconsistencies; Phase 3 explicitly lists §6-§8
- ✅ 5. research/base.md Step 6 explicitly requires Findings Map
- ✅ 6. plan.md Step 3 requires citing relevant KNOWLEDGE.md items in HL §4
- ✅ 7. docs.md has diagram indexing mechanism
- ✅ 8. conventions.md §14 has new anti-patterns (review trust, §6-8 skip, knowledge non-citation)
- ✅ 9. PROJECT_CONFIG.yaml has tfw.review.default_mode: code
- ✅ 10. All changes fit within scope budgets (≤14 files, ≤1200 LOC per phase)
6. Definition of Failure (DoF)
- ❌ 1. review.md exceeds 1200 words (token density rule)
- ❌ 2. Mode files duplicate universal checklist items
- ❌ 3. Template-workflow disconnect recreated (sections exist in template but workflow doesn't reference them)
On failure: Compress. Mode files contain only differential items. Universal items inline in review.md.
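The first two failure conditions are mechanical and could be checked before review. The sketch below is illustrative only: it counts whitespace-separated words as an approximation of the 1200-word rule, compares checklist items by case-insensitive text, and the checklist strings in the usage example are hypothetical.

```python
from typing import List


def word_count(text: str) -> int:
    """Count whitespace-separated words (an approximation of the word budget)."""
    return len(text.split())


def over_budget(workflow_text: str, limit: int = 1200) -> bool:
    """True if the workflow text exceeds the word limit (DoF item 1)."""
    return word_count(workflow_text) > limit


def duplicated_items(universal: List[str], mode_items: List[str]) -> List[str]:
    """Return mode-file items that repeat a universal checklist item (DoF item 2).

    Items are compared after trimming whitespace and lowercasing.
    """
    normalized = {item.strip().lower() for item in universal}
    return [item for item in mode_items if item.strip().lower() in normalized]
```

For example, `duplicated_items(["Scope respected"], ["scope respected", "Tests pass"])` returns `["scope respected"]`, which signals that the mode file should drop that item and keep only its differential ones.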
7. Principles
- Workflow > Template — Enforcement belongs in the workflow. Template = format spec.
- Map, Verify, Judge, Decide — Reviewer follows explicit cognitive stages, not a single-pass read.
- Mode, not checklist — Task type determines which checklist items apply (code/docs/spec).
- Index, don't copy — Diagrams stay in RF/RES traces with full context; KNOWLEDGE.md indexes them.
- Naming Creates Behavior — Stage/mode names must be self-explanatory: 1-2 syllables, active verbs/nouns.
- Knowledge Gate — Every role MUST cross-reference KNOWLEDGE.md before producing output. Not a recommendation — a hard gate. Coordinator cites in HL §4 (D14). Executor checks inconsistencies in ONB (D15). Reviewer verifies contradictions in verify.md and judge.md (A.2). "No applicable knowledge items" = valid trace. Silent omission = process failure.
8. Dependencies
| Dependency | Status |
|---|---|
| None | ✅ |
9. Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| review.md word count exceeds 1200 | Medium | High | Mode files hold differential items; keep main workflow lean |
| 3 new mode files = maintenance burden | Low | Medium | Mode files are small (~200 words each) and parallel the research modes |
| Reviewer mode selection adds friction | Low | Low | Default mode in PROJECT_CONFIG.yaml; auto-detect from TS context |
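The default-mode mitigation in the last row can be sketched as a small resolver. This is a minimal illustration, assuming PROJECT_CONFIG.yaml has already been parsed into a nested dict with the key path `tfw.review.default_mode`. The function name, the explicit-override parameter, and the fallback to `code` when the key is absent are assumptions. Auto-detection from TS context is not shown.

```python
from typing import Optional

# The three review modes named in this document.
VALID_MODES = ("code", "docs", "spec")


def resolve_review_mode(config: dict, override: Optional[str] = None) -> str:
    """Pick the review mode: explicit override first, then the configured default.

    Assumption: a missing config key falls back to "code".
    """
    configured = config.get("tfw", {}).get("review", {}).get("default_mode", "code")
    mode = override or configured
    if mode not in VALID_MODES:
        raise ValueError(f"unknown review mode: {mode!r}")
    return mode
```

Rejecting unknown modes keeps the set of output types finite, in line with the code/docs/spec naming used throughout.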
10. RESEARCH Case
Hypotheses (RESEARCH complete — 4 iterations)
| # | Hypothesis | Status | Evidence |
|---|---|---|---|
| H1 | Explicit §6-8 enumeration stops skipping | 🟢 confirmed | RES1: 96-100% skip rate, template alone insufficient |
| H2 | Audit step changes reviewer behavior | 🟢 → superseded by H3 | RES1: trust-chain failures confirmed |
| H3 | Structured review stages with modes produce more reliable reviews | 🟢 confirmed | RES2: ISO V&V mapping, mode eliminates N/A friction |
| H4 | "Naming Creates Behavior" — self-explanatory names eliminate comprehension friction | 🟢 confirmed | RES3: user tested — "comprehend"❌, "map"✅, "judge"✅ |
| H5 | Knowledge citation mandate closes the "read but don't use" gap | 🟢 confirmed | RES4: KNOWLEDGE.md in 3 context loads, 0 output citations |
Parking Lot (out of scope — future tasks)
- TS Template Conventions — L1-L4 specification level guidance (RES3 D13). Coordinators write 49-71% code in TS files — missing guidance on when to use which level.
11. Strategic Insights (Planning)
| # | Insight | Category | Source |
|---|---|---|---|
| S1 | Agents treat workflows as WHAT, templates as HOW. Workflow wins. Enforcement MUST be in workflows. | philosophy | User + empirical evidence |
| S2 | Reviewer needs cognitive mode shift: from "summarize" to "verify." Structured stages enforce this. | philosophy | User, HD project |
| S3 | Diagrams have documentation value beyond task lifecycle. Index, don't copy. | process | User + RES D3/D10 |
| S4 | ONB demonstrates "verify spec vs reality" pattern. Reviewer should apply same to RF claims. | philosophy | User observation |
| S5 | TFW has two classes of workflows: investigative (staged — research, review) and procedural (linear — handoff, docs). Investigative workflows need cognitive mode transitions. | philosophy | RES1 SS1 |
| S6 | Review stages Map→Verify→Judge→Decide parallel research Briefing→Gather→Extract→Challenge but without OODA loops/WAIT gates. | philosophy | RES2 D6-D8 |
| S7 | Mode names describe output type (code/docs/spec), not domain (business/education). Output-type naming is finite and domain-agnostic. | convention | RES2 F8, RES3 D12 |
| S8 | The "explicit N/A" pattern ("No diagrams.", "No applicable knowledge items.") transforms silent skips into conscious traces. The trace enables reviewer challenge: RF §8 says "No diagrams" for a phase with a state machine → reviewer can flag it. Without explicit N/A, reviewer can't distinguish "forgot" from "decided." | philosophy | RES4 F16 |
| S9 | KNOWLEDGE.md cross-reference must be a GATE across all 4 roles, not a soft "check". Coordinator works in task silo without connecting to prior decisions (D14 root cause). Pattern: every role has a mandatory checkpoint where they read KNOWLEDGE.md and either cite relevant items or write "No applicable knowledge items." The gate is complete only when the citation or N/A is written in the output artifact. Coordinator→HL §4, Executor→ONB §6, Reviewer→verify.md checkpoint, Researcher→RES Fact Candidates. | philosophy | RES4 D14-D15, A.2 cross-check |
HL — TFW-38: Quality Enforcement | 2026-04-14