HL — TFW-38: Quality Enforcement — Staged Review, Handoff Enforcement, Diagram Collection

Date: 2026-04-14
Author: Coordinator
Status: 📝 HL_DRAFT (updated with RESEARCH, 4 iterations)


1. Vision

TFW workflow agents consistently produce complete artifacts — including diagrams, fact candidates, and strategic insights — and reviewers perform genuine quality audits via a structured 4-stage review process (Map → Verify → Judge → Decide) with domain-aware modes. Diagrams produced during tasks are indexed in KNOWLEDGE.md as living project documentation.

Impact: Every RF contains §6-8 because the workflow enforces it. Every REVIEW follows a staged flow with independent verification. Every diagram is indexed and discoverable.

"The reviewer caught three claims in the RF that didn't match the actual code. The Judge stage forced them to evaluate quality against HL philosophy, not just check boxes."

2. Current State (As-Is)

Observed problems (empirically validated — 4 research iterations across 4+ projects, 80+ RF files):

| # | Problem | Root Cause | Evidence |
|---|---|---|---|
| P1 | Executor skips RF §6-8 (96-100% skip rate) | handoff.md Phase 3 doesn't enumerate §6-8 by name | RES1 F1: grep across 80+ RFs |
| P2 | Reviewer trusts RF claims without verification | review.md has no audit mandate, no spot-check instruction | RES1 F4: only TFW-19 has independent verification |
| P3 | Researcher skips Findings Map (~50% in newer, 0% in older projects) | research/base.md Step 6 omits Findings Map from enumeration | RES1 F5 |
| P4 | Diagrams abandoned after task closes | docs.md has no diagram collection mechanism | RES1 Q3: never attempted |
| P5 | REVIEW checklist is 44% code-specific, which generates N/A noise on non-code tasks | Single checklist for all task types | RES2 F9: 4 of 9 items are N/A on docs/spec tasks |
| P6 | KNOWLEDGE.md read in 3 workflows but NEVER cited in output | No citation mandate in plan.md/handoff.md; cross-task knowledge stays siloed | RES4 F14: "read but don't use" pattern |

Root cause (unified — RES1 D1):

Agents follow workflow step instructions literally. Template has the section → but workflow doesn't enumerate it → agent skips it. Workflow wins over template in agent attention.
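
A check of roughly this shape could reproduce the skip rates cited for P1 and P3. It is a minimal sketch, not the script used in RES1: the heading syntax (`## 6. ...`) and the function names are assumptions, since the source only states that RF sections §1-8 exist.

```python
import re

# Assumption: RF sections are numbered headings such as "## 6. Fact Candidates".
# The source does not specify the heading syntax; this pattern is hypothetical.
HEADING = re.compile(r"^#{1,6}[ \t]*(\d+)\.", re.MULTILINE)

REQUIRED_SECTIONS = range(1, 9)  # §1-8, the set handoff.md Phase 3 should enumerate


def missing_sections(rf_text: str) -> list[int]:
    """Return the required section numbers that have no heading in the RF text."""
    present = {int(n) for n in HEADING.findall(rf_text)}
    return [n for n in REQUIRED_SECTIONS if n not in present]


def skip_rate(rf_texts: list[str], sections=(6, 7, 8)) -> float:
    """Fraction of RFs that are missing at least one of the given sections."""
    if not rf_texts:
        return 0.0
    skipped = sum(
        1 for text in rf_texts
        if any(s in missing_sections(text) for s in sections)
    )
    return skipped / len(rf_texts)
```

Running `missing_sections` on a single RF before handoff would turn the template-only expectation into an explicit gate, which is the enforcement shift this HL argues for.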

3. Target State (To-Be)

3.1 Result Visualization

Before → After:

| Artifact | Before | After |
|---|---|---|
| RF §6-8 | Skipped 96-100% | Always present: handoff.md explicitly requires each |
| REVIEW | Single-pass read → trust → checklist | 4-stage flow: Map → Verify → Judge → Decide |
| REVIEW checklist | 9 items, 44% code-only | Mode-aware: 6 universal + 2-4 mode-specific |
| RES Findings Map | Skipped ~50% | Always present: research/base.md explicitly requires it |
| Diagrams lifecycle | Created → abandoned | Indexed in KNOWLEDGE.md §2 by docs.md |
| Knowledge citations | Agent says "per D28" with no link; could be hallucinated | Citation table with links in HL §7.2 / ONB §7, reviewer verifies |

Sample 4-stage REVIEW flow:

┌──────┐    ┌────────┐    ┌────────┐    ┌────────┐
│ MAP  │ →  │ VERIFY │ →  │ JUDGE  │ →  │ DECIDE │
│"What │    │"Are    │    │"Is the │    │"What's │
│ was  │    │ claims │    │ quality│    │ the    │
│ done"│    │ true?" │    │ good?" │    │verdict"│
└──────┘    └────────┘    └────────┘    └────────┘
 Read RF     Open files    Checklist     Verdict +
 +TS+HL      Spot-check    (mode:        Tech Debt +
 Build       Re-run test   code/docs/    Fact Cands
 mental      Check AC      spec)         Traces
 model       evidence
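
The ordering constraint in this flow can be sketched as a small gate that refuses a stage until every earlier stage has recorded output. The `Review` class and its method names are hypothetical helpers for illustration, not part of TFW; only the four stage names and their order come from this document.

```python
from dataclasses import dataclass, field

STAGES = ("map", "verify", "judge", "decide")  # order taken from the 4-stage flow


@dataclass
class Review:
    """Tracks which stages have recorded output (hypothetical helper)."""
    outputs: dict = field(default_factory=dict)

    def record(self, stage: str, output: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if not output.strip():
            raise ValueError(f"{stage}: empty output is not a trace")
        # Every earlier stage must already have output before this one is accepted.
        for earlier in STAGES[: STAGES.index(stage)]:
            if earlier not in self.outputs:
                raise RuntimeError(f"{stage} attempted before {earlier}")
        self.outputs[stage] = output

    def verdict_ready(self) -> bool:
        """True only when all four stages have recorded output."""
        return all(stage in self.outputs for stage in STAGES)
```

The point of the sketch is that Judge cannot run on an unverified RF: skipping Verify raises an error instead of silently collapsing into a single-pass read.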

3.2 Value Flow

EXECUTOR                REVIEWER                         DOCS
   │                       │                               │
handoff.md              review.md                       docs.md
Phase 3:                4-stage flow:                   Checklist #7:
§1-8 explicit           Map→Verify→Judge→Decide         Diagram index
   │                    + mode selection (code/docs/spec)    │
   ▼                       ▼                               ▼
RF with                 REVIEW with                    KNOWLEDGE.md §2
all sections            evidence-based verdict         Diagram Index

Phase B: Knowledge Citation cascade:

      Coordinator                    Executor                   Reviewer
      ───────────                    ────────                   ────────
      plan.md Step 3                 handoff.md Ph.1            review.md Step 2
           │                              │                          │
      SCANS PV Index              READS HL §7.2              SCANS PV Index
      (7 sources,                 citations only             (independent)
       glossary.md)                       │                          │
           ▼                              ▼                          ▼
      HL §7.2                        ONB §7                   verify.md
      Knowledge                      "Read ✅,               Citations Verified
      Citations                       Applied" /              link resolves? ✅/❌
      [D28](link)                    "NEW: [D31](../../knowledge-index.md#architecture-decisions)"              hallucinations? {H}

4. Phases

Phase A: Review Restructure + Full Enforcement Chain 🔴

Requires: Independent

Context for coordinator:
  1. review.md: full workflow
  2. REVIEW.md template: full template
  3. handoff.md: Phase 1 (onboarding) + Phase 3 (RF)
  4. research/base.md: Step 6 (synthesis)
  5. plan.md: Step 3 (knowledge citation)
  6. conventions.md §3 Visual Sections + §14 Anti-patterns
  7. RES D1, D6D11, D7D12, D8, D9, D14-D17

Key decisions:
  • D1: explicit §6-8 mandate in handoff.md
  • D11: 4-stage review: Map → Verify → Judge → Decide
  • D12: 3 review modes: code / docs / spec
  • ~~D8: stages as REVIEW sections (not separate files)~~ → superseded by D18
  • D9: REVIEW template restructures to match stages
  • D14: knowledge citation mandate in plan.md Step 3
  • D15: KNOWLEDGE.md inconsistency check in handoff.md Phase 1
  • D16: diagram creation mandate in handoff.md Phase 3
  • D17: Findings Map mandate in research/base.md Step 6

Deliverables:
  1. review.md: restructured with 4 stages + mode selection (Step 0)
  2. REVIEW.md template: new §1-§7 structure matching stages
  3. .tfw/workflows/review/code.md: code-mode checklist + verify actions [NEW]
  4. .tfw/workflows/review/docs.md: docs-mode checklist + verify actions [NEW]
  5. .tfw/workflows/review/spec.md: spec-mode checklist + verify actions [NEW]
  6. handoff.md: Phase 1 adds KNOWLEDGE.md to the inconsistency check; Phase 3 explicitly enumerates §6-§8
  7. research/base.md: Step 6 explicitly requires Findings Map
  8. plan.md: Step 3 adds "Check KNOWLEDGE.md, cite relevant items in HL §4"
  9. conventions.md §14: new anti-patterns for review trust / section skipping / knowledge non-citation

Phase A.2: Review Stage Files + Self-Check Gates 🔴

Requires: Phase A ✅

Key decisions:
  • D18: review stages as separate files (map.md, verify.md, judge.md); supersedes D8. Without file fixation, stages collapse into a single stream of consciousness, which is the same root cause as P1-P3.

Deliverables:
  1. .tfw/templates/review/map.md: Map stage template with self-check gate [NEW]
  2. .tfw/templates/review/verify.md: Verify stage template with self-check gate [NEW]
  3. .tfw/templates/review/judge.md: Judge stage template with self-check gate [NEW]
  4. review.md: Steps 1-3 write stage files; Step 4 synthesizes into REVIEW
  5. REVIEW.md template: references stage files; synthesis artifact
  6. conventions.md §3: review stage files in artifact taxonomy

Phase B: Knowledge Citation Table 🔴

Requires: Phase A.2 ✅

Context for coordinator:
  1. HL §7 P6 (Knowledge Gate) + S9 (cross-task knowledge as hard gate)
  2. RES iter 4 D14-D15 (knowledge citation mandate)
  3. glossary.md → Project Values (PV): term and PV Index table (already added)
  4. plan.md Step 3, handoff.md Phase 1, review/verify.md
  5. conventions.md §3 Artifact Types

Key decisions:
  • D19: Knowledge Citation Table, a traceable table with links that replaces a silent "I checked"
  • Cascade model: Coordinator + Reviewer do a full PV scan. Executor references the coordinator's citations.
  • HL §7.2 (next to Principles), not §4.1 (Phases): citations and principles share the same cognitive space
  • ONB §7 (standalone), not §6.1 (Inconsistencies): citations ≠ inconsistencies
  • Unified naming: "Knowledge Citations" everywhere (one cognitive mode = one name, per D28/D39)

Design rationale (why this structure):
  • §7.2 in HL: coordinator writes "what I believe" (§7 Principles) and "what I read" (§7.2 Citations) together. Reviewer sees both in one place and can check whether the principles are grounded in real knowledge.
  • §7 in ONB: executor DOESN'T rescan everything (wasteful). It reads the coordinator's work, confirms the reading, and adds NEW items the coordinator missed. Lightweight but traceable.
  • Anti-hallucination: reviewer opens each link in verify.md. ❌ = hallucinated citation = Discrepancy. Without this, an agent says "per D28" and D28 could be invented. With this: link or it didn't happen.
  • Greenfield projects: "No applicable knowledge items — project in bootstrap phase" is valid.
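
The link-resolution check described in the anti-hallucination point could look roughly like the sketch below. It assumes citations are ordinary markdown links of the form `[D28](path#anchor)` and that anchors follow a simple lowercase-and-hyphenate rule. Both are assumptions, and `verify_citations` is a hypothetical name, not an existing TFW tool.

```python
import re
from pathlib import Path

# Matches markdown links such as [D28](path/to/file.md#anchor).
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def slug(heading: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, spaces to hyphens.
    Real renderers differ in the details, so treat this rule as an assumption."""
    cleaned = re.sub(r"[^\w\s-]", "", heading.strip().lower())
    return re.sub(r"\s+", "-", cleaned)


def verify_citations(text: str, base_dir: Path) -> dict:
    """Map each cited label to True if its link resolves, False otherwise."""
    results = {}
    for label, target in LINK.findall(text):
        path_part, _, anchor = target.partition("#")
        file = (base_dir / path_part).resolve()
        ok = file.is_file()
        if ok and anchor:
            # The anchor must match one of the headings in the target file.
            headings = re.findall(
                r"^#+[ \t]*(.+)$", file.read_text(encoding="utf-8"), re.MULTILINE
            )
            ok = anchor in {slug(h) for h in headings}
        results[label] = ok
    return results
```

Any label mapped to `False` corresponds to a ❌ in the Citations Verified section. The check confirms only that the link target exists, not that the cited item actually says what the agent claims, so the reviewer's reading remains necessary.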

Deliverables:
  1. HL.md template: new §7.2 Knowledge Citations (Coordinator fills, full PV scan)
  2. ONB.md template: new §7 Knowledge Citations (Executor references HL §7.2)
  3. review/verify.md template: Knowledge Citations Verified section (link resolution check)
  4. plan.md Step 3: instruct coordinator to scan PV Index → fill HL §7.2
  5. handoff.md Phase 1: instruct executor to read HL §7.2 → fill ONB §7

Diagram indexing → moved to TFW-39 (Visual Knowledge System)

5. Definition of Done (DoD)

  • ✅ 1. review.md restructured with 4 stages (Map → Verify → Judge → Decide) + mode selection
  • ✅ 2. REVIEW.md template restructured to §1-§7 matching stages
  • ✅ 3. 3 review mode files created (code.md, docs.md, spec.md) in .tfw/workflows/review/
  • ✅ 4. handoff.md Phase 1 checks KNOWLEDGE.md inconsistencies; Phase 3 explicitly lists §6-§8
  • ✅ 5. research/base.md Step 6 explicitly requires Findings Map
  • ✅ 6. plan.md Step 3 requires citing relevant KNOWLEDGE.md items in HL §4
  • ✅ 7. docs.md has diagram indexing mechanism
  • ✅ 8. conventions.md §14 has new anti-patterns (review trust, §6-8 skip, knowledge non-citation)
  • ✅ 9. PROJECT_CONFIG.yaml has tfw.review.default_mode: code
  • ✅ 10. All changes fit within scope budgets (≤14 files, ≤1200 LOC per phase)

6. Definition of Failure (DoF)

  • ❌ 1. review.md exceeds 1200 words (token density rule)
  • ❌ 2. Mode files duplicate universal checklist items
  • ❌ 3. Template-workflow disconnect recreated (sections exist in template but workflow doesn't reference them)

On failure: Compress. Mode files contain only differential items. Universal items inline in review.md.
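
The two measurable failure conditions (word budget and duplicated checklist items) lend themselves to an automated check. The sketch below is illustrative only: the item names are placeholders, because this document gives only the counts (6 universal items plus 2-4 per mode) and the mode names.

```python
# Item names are placeholders; the source specifies only the counts and mode names.
UNIVERSAL = [f"universal-{i}" for i in range(1, 7)]

MODE_ITEMS = {
    "code": ["code-1", "code-2", "code-3"],
    "docs": ["docs-1", "docs-2"],
    "spec": ["spec-1", "spec-2"],
}

WORD_BUDGET = 1200  # DoF 1: review.md must not exceed this


def checklist_for(mode: str) -> list:
    """Universal items first, then the differential items for the chosen mode."""
    if mode not in MODE_ITEMS:
        raise ValueError(f"unknown review mode: {mode}")
    return UNIVERSAL + MODE_ITEMS[mode]


def duplicated_items() -> dict:
    """DoF 2: report any mode item that repeats a universal item."""
    universal = set(UNIVERSAL)
    dupes = {
        mode: [item for item in items if item in universal]
        for mode, items in MODE_ITEMS.items()
    }
    return {mode: found for mode, found in dupes.items() if found}


def over_budget(workflow_text: str, budget: int = WORD_BUDGET) -> bool:
    """DoF 1: whitespace-delimited word count, which is only an approximation."""
    return len(workflow_text.split()) > budget
```

Keeping the universal list in one place and the mode lists purely differential is what makes `duplicated_items` return an empty mapping; a non-empty result is the signal to compress as described above.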

7. Principles

  1. Workflow > Template — Enforcement belongs in the workflow. Template = format spec.
  2. Map, Verify, Judge, Decide — Reviewer follows explicit cognitive stages, not a single-pass read.
  3. Mode, not checklist — Task type determines which checklist items apply (code/docs/spec).
  4. Index, don't copy — Diagrams stay in RF/RES traces with full context; KNOWLEDGE.md indexes them.
  5. Naming Creates Behavior — Stage/mode names must be self-explanatory: 1-2 syllables, active verbs/nouns.
  6. Knowledge Gate — Every role MUST cross-reference KNOWLEDGE.md before producing output. Not a recommendation — a hard gate. Coordinator cites in HL §4 (D14). Executor checks inconsistencies in ONB (D15). Reviewer verifies contradictions in verify.md and judge.md (A.2). "No applicable knowledge items" = valid trace. Silent omission = process failure.

8. Dependencies

| Dependency | Status |
|---|---|
| None | |

9. Risks

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| review.md word count exceeds 1200 | Medium | High | Mode files hold differential items; keep main workflow lean |
| 3 new mode files = maintenance burden | Low | Medium | Mode files are small (~200 words each); they parallel the research modes |
| Reviewer mode selection adds friction | Low | Low | Default mode in PROJECT_CONFIG.yaml; auto-detect from TS context |
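
The default-mode mitigation can be sketched as a small resolver. It assumes PROJECT_CONFIG.yaml has already been parsed into a nested dict, and `resolve_mode` is a hypothetical name. Auto-detection from TS context is left out because this document does not describe how it would work.

```python
from typing import Optional

VALID_MODES = ("code", "docs", "spec")
FALLBACK_MODE = "code"  # mirrors tfw.review.default_mode: code from the DoD


def resolve_mode(explicit: Optional[str], config: dict) -> str:
    """Pick the review mode: an explicit choice wins, then the configured default."""
    configured = config.get("tfw", {}).get("review", {}).get("default_mode")
    mode = explicit or configured or FALLBACK_MODE
    if mode not in VALID_MODES:
        raise ValueError(f"unknown review mode: {mode!r}")
    return mode
```

With this shape a reviewer only has to name a mode when the task differs from the project default, which keeps the friction described in the risk low.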

10. RESEARCH Case

Hypotheses (RESEARCH complete — 4 iterations)

| # | Hypothesis | Status | Evidence |
|---|---|---|---|
| H1 | Explicit §6-8 enumeration stops skipping | 🟢 confirmed | RES1: 96-100% skip rate, template alone insufficient |
| H2 | Audit step changes reviewer behavior | 🟢 → superseded by H3 | RES1: trust-chain failures confirmed |
| H3 | Structured review stages with modes produce more reliable reviews | 🟢 confirmed | RES2: ISO V&V mapping, mode eliminates N/A friction |
| H4 | "Naming Creates Behavior": self-explanatory names eliminate comprehension friction | 🟢 confirmed | RES3 user test: "comprehend" ❌, "map" ✅, "judge" ✅ |
| H5 | Knowledge citation mandate closes the "read but don't use" gap | 🟢 confirmed | RES4: KNOWLEDGE.md in 3 context loads, 0 output citations |

Parking Lot (out of scope — future tasks)

  • TS Template Conventions — L1-L4 specification level guidance (RES3 D13). Coordinators write 49-71% code in TS files — missing guidance on when to use which level.

11. Strategic Insights (Planning)

| # | Insight | Category | Source |
|---|---|---|---|
| S1 | Agents treat workflows as WHAT, templates as HOW. Workflow wins. Enforcement MUST be in workflows. | philosophy | User + empirical evidence |
| S2 | Reviewer needs cognitive mode shift: from "summarize" to "verify." Structured stages enforce this. | philosophy | User, HD project |
| S3 | Diagrams have documentation value beyond task lifecycle. Index, don't copy. | process | User + RES D3/D10 |
| S4 | ONB demonstrates "verify spec vs reality" pattern. Reviewer should apply same to RF claims. | philosophy | User observation |
| S5 | TFW has two classes of workflows: investigative (staged: research, review) and procedural (linear: handoff, docs). Investigative workflows need cognitive mode transitions. | philosophy | RES1 SS1 |
| S6 | Review stages Map→Verify→Judge→Decide parallel research Briefing→Gather→Extract→Challenge but without OODA loops/WAIT gates. | philosophy | RES2 D6-D8 |
| S7 | Mode names describe output type (code/docs/spec), not domain (business/education). Output-type naming is finite and domain-agnostic. | convention | RES2 F8, RES3 D12 |
| S8 | The "explicit N/A" pattern ("No diagrams.", "No applicable knowledge items.") transforms silent skips into conscious traces. The trace enables reviewer challenge: RF §8 says "No diagrams" for a phase with a state machine → reviewer can flag it. Without explicit N/A, reviewer can't distinguish "forgot" from "decided." | philosophy | RES4 F16 |
| S9 | KNOWLEDGE.md cross-reference must be a GATE across all 4 roles, not a soft "check". Coordinator works in task silo without connecting to prior decisions (D14 root cause). Pattern: every role has a mandatory checkpoint where they read KNOWLEDGE.md and either cite relevant items or write "No applicable knowledge items." The gate is complete only when the citation or N/A is written in the output artifact. Coordinator→HL §4, Executor→ONB §6, Reviewer→verify.md checkpoint, Researcher→RES Fact Candidates. | philosophy | RES4 D14-D15, A.2 cross-check |

HL — TFW-38: Quality Enforcement | 2026-04-14