# HL — TFW-41: Execution Quality Gates
**Date:** 2026-04-20 | **Author:** Coordinator (AI) + PO | **Status:** 📝 HL_DRAFT (awaiting review)
## 1. Vision
TFW pipelines produce traces, but they do not prevent quality failures at the source. Executors copy code from over-detailed TS instead of engineering solutions. Coordinators write TS based on their own plans instead of the actual output of the previous phase. HL principles exist as text with no enforcement mechanism. After this task, every critical handoff point in TFW has a structural gate: not instructions, not guidelines, but mechanisms that make it harder to fail silently.
Impact: fewer Phase D "cleanup" phases. Executors think instead of copying. Coordinators verify against reality. HL principles survive into implementation.
> "Words without a hard gate = decorative words." (PO, HD-18 retrospective)
## 2. Current State (As-Is)
9 observed problems from production use (HD-9, HD-16, HD-18), documented in session notes:
| # | Problem | Source | Severity |
|---|---|---|---|
| 1 | RF drifts from template — executor writes from memory | HD-9 | Medium |
| 2 | Coordinator answers ONB questions confidently without sources | HD-9 | Medium |
| 3 | Sessions have no names — impossible to navigate | All | Low |
| 4 | TS = algorithm without "why" — executor follows steps, doesn't check result | HD-16 | High |
| 5 | Over-detailed TS = copy-paste executor (50KB of code in TS) | HD-16 | Critical |
| 6 | HL §7 principles without enforcement — "decorative words" | HD-18 | High |
| 7 | Coordinator verifies plan, not output of previous phase (plan≠fact drift) | HD-18 | Critical |
| 8 | Cross-phase test ownership unclear — rename breaks traceability | HD-18 | Medium |
| 9 | Phase dependency graph only in coordinator's head | HD-18 | Medium |
### Root cause taxonomy
Three clusters:

- **Cluster 1: Agent acts on inertia (#1–3).** The agent doesn't check context before a key action. Fix pattern: explicit "check before acting" gates.
- **Cluster 2: TS doesn't let the executor think (#4–5).** The TS contains ready-made code, so the executor copies instead of engineering. Fix pattern: Requirements-first TS.
- **Cluster 3: Coordinator doesn't verify context (#6–9).** The coordinator writes against the plan, not reality. Fix pattern: Pre-TS Gates, enforcement tracing.
### Key evidence: HD-16 case study
| Phase | TS size | Content | Result |
|---|---|---|---|
| A | 23 KB, 534 lines | Step-by-step code | ✅ 22/22 AC |
| B | 30 KB, 842 lines | Full TypeScript files | ✅ 16/16 AC. BUT: palette not designed, 89 hardcoded colors untouched |
| C | 50 KB, 1280 lines | Complete components copy-paste ready | ✅ 23/24 AC. per_page=100 vs spec 500 (backend cap unnoticed) |
| D | 20 KB | Created to fix B+C failures | HL explicitly states root cause: "TS contained code, not requirements" |
### Key evidence: HD-18 self-assessment
The coordinator scored its own TS quality 7/10 (an improvement over HD-16), but 3 errors still reached ONB:
- Wrong function signature (copied from other workers, not verified)
- Wrong worker count (didn't count `create_task` calls)
- Referenced an already-updated test (read its own TS, not the RF of the previous phase)
## 3. Target State (To-Be)
### 3.1 Result Visualization
Imagine it's 3 months after TFW-41 ships. A coordinator writes TS for Phase C of a multi-phase task:
```text
BEFORE (current):
Coordinator opens TS Phase B (own plan)           ← reads PLAN
Writes Phase C TS with code blocks                ← writes CODE
Executor copies code from TS                      ← COPIES
Reviewer finds: "palette not designed"            ← Phase D needed
3 TS errors reach ONB                             ← waste

AFTER (TFW-41):
Coordinator opens RF Phase B (actual output)      ← reads FACT
Writes Phase C TS with Requirements + Gates       ← writes WHAT, not HOW
Executor reads Requirements, engineers solution   ← THINKS
Executor runs Gate per Requirement                ← SELF-CHECKS
Reviewer verifies HL §7 principles survived       ← ENFORCES
0 cleanup phases needed                           ← no Phase D
```
What a TS looks like after TFW-41 (analytics-domain example):
```markdown
## §4 Acceptance Criteria

### AC-1: ETL covers all data sources
All 5 data sources produce records in the staging table.
- [ ] Row count per source > 0 after pipeline run
- [ ] No source returns only NULLs
Gate: query staging table → 5 sources present

### AC-2: Report reflects transformed data [depends: AC-1]
Summary report uses staging output, not raw data.
- [ ] Report totals match staging aggregates (±1% tolerance)
Gate: compare report vs staging query

### AC-3: Stakeholder review completed [depends: AC-2]
Report reviewed and approved by domain owner.
- [ ] Feedback incorporated or explicitly deferred with rationale

## §5 Technical Guidance
> Reference material, not instructions. Executor MAY deviate.
- Staging table: `analytics.staging_v2` (schema in KNOWLEDGE.md [D12](../../knowledge-index.md#architecture-decisions))
- Previous report used direct queries — this is the root cause of data drift
- Pattern: transform in staging, aggregate in report layer

## §6 Definition of Failure
- ❌ Report uses raw source tables directly → reject RF
- ❌ Any data source missing from staging with no documented reason
```
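The `[depends]` chain in this example (AC-1 → AC-2 → AC-3) is exactly what triggers an Execution Loop under DR3. A minimal Python sketch of such a loop, assuming each AC is held as a hypothetical `AcItem` with an id, its `[depends]` ids, and a gate callable: ACs run in dependency order (Kahn's algorithm), each Gate runs before moving on, and dependents of a failed AC are reported as blocked instead of being built on an unverified base. The row counts and totals are made-up stand-ins for the staging and report queries; none of these names come from TFW.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AcItem:
    """Hypothetical in-memory form of one TS acceptance criterion."""
    ac_id: str
    gate: Callable[[], bool]                          # True if the Gate passes
    depends: list[str] = field(default_factory=list)  # ids from [depends: AC-X]


def topo_order(items: list[AcItem]) -> list[AcItem]:
    """Kahn's algorithm: every AC comes after the ACs it depends on."""
    ids = {item.ac_id for item in items}
    for item in items:
        unknown = set(item.depends) - ids
        if unknown:
            raise ValueError(f"{item.ac_id} depends on unknown AC(s): {sorted(unknown)}")
    indegree = {item.ac_id: len(item.depends) for item in items}
    ready = [item for item in items if indegree[item.ac_id] == 0]
    ordered: list[AcItem] = []
    while ready:
        current = ready.pop(0)
        ordered.append(current)
        for other in items:
            if current.ac_id in other.depends:
                indegree[other.ac_id] -= 1
                if indegree[other.ac_id] == 0:
                    ready.append(other)
    if len(ordered) != len(items):
        raise ValueError("cycle in [depends] annotations")
    return ordered


def run_execution_loop(items: list[AcItem]) -> dict[str, str]:
    """Run each Gate in dependency order; dependents of a failed AC are blocked."""
    status: dict[str, str] = {}
    for item in topo_order(items):
        if any(status[dep] != "pass" for dep in item.depends):
            status[item.ac_id] = "blocked"  # don't build R_{n+1} on an unverified R_n
            continue
        status[item.ac_id] = "pass" if item.gate() else "fail"
    return status


# Illustrative numbers standing in for staging/report queries (not real data).
rows_per_source = {"crm": 120, "erp": 80, "web": 300, "billing": 45, "support": 0}
staging_total, report_total = 1000.0, 1004.0

items = [
    AcItem("AC-1", gate=lambda: len(rows_per_source) == 5
           and all(n > 0 for n in rows_per_source.values())),
    AcItem("AC-2", gate=lambda: abs(report_total - staging_total) <= 0.01 * staging_total,
           depends=["AC-1"]),
    AcItem("AC-3", gate=lambda: True, depends=["AC-2"]),  # stands in for stakeholder sign-off
]
print(run_execution_loop(items))
# {'AC-1': 'fail', 'AC-2': 'blocked', 'AC-3': 'blocked'}
```

ACs without `[depends]` all start in the first batch and never block one another, so the loop degrades to plain linear execution, which matches DR3's "threshold = dependency, not count".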
### 3.2 Value Flow
| User pain | TFW-41 change | Value |
|---|---|---|
| "Executor copies, doesn't think" | TS template: Requirements, not code. §4 = WHAT, §5 = hints | Executor engineers solutions |
| "Phase D cleanup phases needed" | Execution Loops: Gate per Requirement `R_n` before the next `R_{n+1}` | Errors caught at `R_n`, not at review |
| "Coordinator writes against own plan" | Pre-TS Gate: read RF N-1 before writing TS N | TS based on reality, not plan |
| "HL principles are decorative words" | Principles Check in TS + reviewer verification | Principles enforced or explicitly N/A |
| "RF drifts from template" | Pre-RF Gate: open template before writing | Template followed structurally |
| "Dependency graph in coordinator's head" | Phase Dependencies in Master HL | Any coordinator can write any Phase TS |
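The Pre-RF Gate row can also be backed by a mechanical structure check. A minimal sketch, assuming the RF template and a draft RF are both markdown and that the template's required structure is its `##` headings; the section names used here are placeholders, not the actual TFW RF template. It verifies structure only, not whether the sections say anything useful.

```python
import re

# Level-2 markdown headings ("## Title"); "###" and deeper do not match.
HEADING = re.compile(r"^##[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def headings(markdown: str) -> list[str]:
    """Return level-2 headings in document order."""
    return HEADING.findall(markdown)


def pre_rf_gate(template: str, draft_rf: str) -> list[str]:
    """Return template sections missing from the draft RF; an empty list means the gate passes."""
    present = set(headings(draft_rf))
    return [h for h in headings(template) if h not in present]


# Hypothetical template sections; the real RF template may differ.
template = "# RF\n## Summary\n## Deviations\n## Test Results\n"
draft = "# RF\n## Summary\nDone.\n## Test Results\nAll green.\n"
print(pre_rf_gate(template, draft))  # ['Deviations']
```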
## 4. Phases
### Phase Dependencies

```mermaid
graph LR
    A[Phase A: Templates] --> B[Phase B: Workflow Gates]
    A --> C[Phase C: Research Templates]
    B --> D[Phase D: Glossary + Adapters]
    C --> D
```
| Phase | Depends on | Shared files | Can run in parallel with |
|---|---|---|---|
| A | Independent | — | — |
| B | A (new TS template) | `conventions.md` | C |
| C | A (conventions for terminology origin) | `glossary.md` | B |
| D | B + C | `glossary.md`, adapters | — |
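The "Can run in parallel with" column follows from the edges in the graph above. A minimal sketch, assuming each phase is mapped to the set of phases it depends on; phases are grouped into waves whose prerequisites are all complete. It ignores the "Shared files" column, so parallel phases that touch the same file still need a manual merge check.

```python
def execution_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group phases into waves: every phase in a wave has all prerequisites done."""
    done: set[str] = set()
    remaining = dict(depends_on)
    waves: list[list[str]] = []
    while remaining:
        wave = sorted(p for p, deps in remaining.items() if deps <= done)
        if not wave:  # nothing can start: a cycle or an unknown prerequisite
            raise ValueError(f"unresolvable dependencies among {sorted(remaining)}")
        waves.append(wave)
        done.update(wave)
        for phase in wave:
            del remaining[phase]
    return waves


# Edges from the Phase Dependencies graph: phase -> prerequisites.
phases = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
print(execution_waves(phases))  # [['A'], ['B', 'C'], ['D']]
```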
### Phase A: TS Template + Conventions 🔴
**Requires:** Independent

**Context for coordinator:**
1. `.tfw/templates/TS.md`: current TS template to rewrite
2. `.tfw/templates/HL.md`: add Phase Dependencies section
3. `.tfw/conventions.md` §14: anti-patterns list to extend
4. RES iter1 DR1, DR6: research decisions on TS structure
5. Session notes: Problems 4–8 analysis

**Key decisions (from research):**
- DR1: TS §4 → Acceptance Criteria (verifiable requirements), §5 → Technical Guidance (context, patterns, constraints; NOT implementation)
- DR6: HL §7 Principles → TS AC mapping table (mandatory). Each TS contains a table: Principle → AC item → Gate
- D3: TS §6 = Definition of Failure (hard reject conditions)
- D5: Cross-Phase Modifications table for multi-phase tasks
- AC dependency annotation: `[depends: AC-X]`; the coordinator explicitly marks dependent AC items

**Deliverables:**
1. Rewrite the `TS.md` template: Requirements-first structure (§4 AC with `[depends]` annotation, §5 Technical Guidance, §6 Definition of Failure, Principles Check table)
2. Add 4 anti-patterns to `conventions.md` §14
3. Add a Phase Dependencies section to the `HL.md` template
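A minimal sketch of reading the `[depends: AC-X]` annotation back out of a TS, assuming AC items are written as `### AC-n: Title [depends: AC-x, AC-y]` headings like the example in §3.1. The regex, the function names, and the comma-separated multi-dependency form are illustrative assumptions, not an existing TFW tool; `needs_execution_loop` encodes DR3's rule that dependencies, not AC count, trigger a loop.

```python
import re

# Hypothetical heading format: "### AC-2: Report reflects transformed data [depends: AC-1]"
AC_HEADING = re.compile(
    r"^###\s+(AC-\d+):\s*(.*?)\s*(?:\[depends:\s*([^\]]+)\])?\s*$", re.MULTILINE
)


def parse_ac_dependencies(ts_markdown: str) -> dict[str, list[str]]:
    """Map each AC id to the AC ids it depends on (empty list = independent)."""
    deps: dict[str, list[str]] = {}
    for ac_id, _title, raw in AC_HEADING.findall(ts_markdown):
        deps[ac_id] = [d.strip() for d in raw.split(",")] if raw else []
    return deps


def needs_execution_loop(deps: dict[str, list[str]]) -> bool:
    """DR3: a loop is triggered by any dependency, not by the number of ACs."""
    return any(deps.values())


ts = """## §4 Acceptance Criteria
### AC-1: ETL covers all data sources
### AC-2: Report reflects transformed data [depends: AC-1]
### AC-3: Stakeholder review completed [depends: AC-2]
"""
deps = parse_ac_dependencies(ts)
print(deps)                        # {'AC-1': [], 'AC-2': ['AC-1'], 'AC-3': ['AC-2']}
print(needs_execution_loop(deps))  # True
```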
### Phase B: Workflow Gates 🔴
**Requires:** Phase A ✅

⚠️ **Shared files with Phase C:** `conventions.md` (A adds anti-patterns, C adds terminology origin)

**Context for coordinator:**
1. `.tfw/workflows/handoff.md`: execution workflow
2. `.tfw/workflows/plan.md`: planning workflow
3. `.tfw/workflows/review.md`: review workflow
4. `.tfw/templates/ONB.md`: ONB template
5. RES iter1 DR2, DR3: Pre-TS Gate and Execution Loops decisions

**Key decisions (from research):**
- DR2: Pre-TS Gate: the coordinator reads the RF of the latest completed phase in the dependency chain before writing the next TS
- DR3: Execution Loops: mandatory when AC items have `[depends]` annotations. Independent ACs → linear. Threshold = dependency, not count.
- D6: Pre-RF Gate in handoff: the executor opens the template before writing the RF
- D9: Coordinator ONB answer rule: no source → give options, don't decide
- D10: Reviewer checks HL §7 principles in the Judge phase
- D11: Session Naming convention: every workflow session starts with naming (`Role | Task ID | Phase`)

**Deliverables:**
1. Add Pre-RF Gate + Execution Loops (dependency-based, triggered by `[depends]`) to `handoff.md`
2. Add Pre-TS Gate to `plan.md`
3. Add the coordinator answer protocol to ONB handling in `handoff.md`
4. Add HL §7 principles verification to `review.md`
5. Add Session Naming Step 0 to all workflows
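A minimal sketch of the Pre-TS Gate (DR2), assuming the coordinator knows each phase's prerequisites and which phases already have an RF. Before the TS for a phase is written, the gate refuses to proceed while any prerequisite lacks an RF, and otherwise returns the RF files (actual output, not the earlier TS) to read. The `rf_paths` mapping and file names are hypothetical.

```python
def pre_ts_gate(phase: str,
                depends_on: dict[str, set[str]],
                rf_paths: dict[str, str]) -> list[str]:
    """Return the RF files to read before writing the TS for `phase`.

    Raises if a prerequisite has no RF yet: the TS would otherwise be written
    against the plan (the earlier TS), not against what was actually delivered.
    """
    prerequisites = sorted(depends_on[phase])
    missing = [p for p in prerequisites if p not in rf_paths]
    if missing:
        raise RuntimeError(f"Pre-TS Gate: no RF for prerequisite phase(s) {missing}")
    return [rf_paths[p] for p in prerequisites]


depends_on = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
rf_paths = {"A": "RF-phase-A.md", "B": "RF-phase-B.md"}  # hypothetical file names

print(pre_ts_gate("B", depends_on, rf_paths))  # ['RF-phase-A.md']
try:
    pre_ts_gate("D", depends_on, rf_paths)
except RuntimeError as err:
    print(err)  # Pre-TS Gate: no RF for prerequisite phase(s) ['C']
```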
### Phase C: Research Templates — Embedded Dimensional Analysis 🔴
**Requires:** Independent (only needs `conventions.md` for the terminology origin note)

⚠️ **Shared files with Phase D:** `glossary.md`

**Context for coordinator:**
1. `.tfw/templates/research/gather.md`: add Dimensions section
2. `.tfw/templates/research/extract.md`: add Configuration Space section
3. `.tfw/templates/research/challenge.md`: add Consistency Check section
4. `.tfw/workflows/research/base.md`: add dimensional analysis thread
5. RES iter2 DR7–DR13: embedded dimensional analysis decisions

**Key decisions (from research):**
- DR7: Supersedes DR4/DR5. Embedded analysis across stages, not a "mandatory Zwicky Box"
- DR8: Gather → `## Dimensions` (independent decision factors, ≥3 alternatives, no "recommended")
- DR9: Extract → `## Configuration Space` (cross-reference of Gather dimensions, no evaluation yet)
- DR10: Challenge → `## Consistency Check` (pairwise incompatibility, surviving configurations)
- DR11: Native terminology (Dimension, Alternative, Configuration Space, Consistency Check, Surviving Configuration). The glossary references Zwicky as the origin.
- DR12: Graceful degradation: <3 dimensions → comparison matrix
- DR13: Workflow Step 5: 4-line dimensional analysis description

**Deliverables:**
1. Add `## Dimensions` to the `gather.md` template
2. Add `## Configuration Space` to the `extract.md` template
3. Add `## Consistency Check` to the `challenge.md` template
4. Add the dimensional analysis thread to the `research/base.md` workflow
5. Add a terminology origin note to `conventions.md`
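The Gather → Extract → Challenge thread in DR8–DR12 can be illustrated in a few lines. A minimal sketch, assuming Gather's Dimensions are a mapping from dimension to alternatives: the Configuration Space is their Cartesian product with no evaluation, and the Consistency Check drops every configuration that contains a pair marked incompatible, leaving the surviving configurations. The dimension names and incompatible pairs are invented for illustration; `analysis_mode` only reflects DR12's fallback to a comparison matrix below 3 dimensions.

```python
from itertools import combinations, product

Dimensions = dict[str, list[str]]     # Gather: dimension -> alternatives
Config = tuple[tuple[str, str], ...]  # one (dimension, alternative) per dimension


def analysis_mode(dims: Dimensions) -> str:
    """DR12 graceful degradation: fewer than 3 dimensions -> comparison matrix."""
    return "comparison matrix" if len(dims) < 3 else "configuration space"


def configuration_space(dims: Dimensions) -> list[Config]:
    """Extract: every combination of one alternative per dimension (no evaluation)."""
    names = list(dims)
    return [tuple(zip(names, choice)) for choice in product(*dims.values())]


def consistency_check(space: list[Config],
                      incompatible: set[frozenset[tuple[str, str]]]) -> list[Config]:
    """Challenge: keep configurations that contain no incompatible pair."""
    return [cfg for cfg in space
            if not any(frozenset(pair) in incompatible for pair in combinations(cfg, 2))]


dims = {  # hypothetical Gather output
    "storage": ["files", "database", "cache"],
    "trigger": ["manual", "scheduled", "event"],
    "format": ["markdown", "json", "yaml"],
}
incompatible = {  # hypothetical pairwise conflicts found in Challenge
    frozenset({("storage", "files"), ("trigger", "event")}),
    frozenset({("storage", "cache"), ("format", "markdown")}),
}
space = configuration_space(dims)
survivors = consistency_check(space, incompatible)
print(analysis_mode(dims), len(space), len(survivors))  # configuration space 27 21
```

Because `consistency_check` consumes the output of `configuration_space`, which in turn consumes the Gather dimensions, a skipped stage leaves the next one with no input; this is the cross-stage enforcement described in S7.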
### Phase D: Glossary + Adapters Sync 🟡
**Requires:** Phase B + Phase C ✅

**Context for coordinator:**
1. `.tfw/glossary.md`: needs new terms from Phases A, B, C
2. Adapter files: sync new workflow steps

**Deliverables:**
1. Add terms to the glossary: Execution Loop, Pre-TS Gate, Pre-RF Gate, Principles Check, Definition of Failure (TS), Technical Guidance, Phase Dependencies, AC Dependency Annotation, Session Naming, Dimension, Alternative, Configuration Space, Consistency Check, Surviving Configuration
2. Sync adapters (claude-code, cursor, antigravity) with new workflow steps
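Because Deliverable 1 lists the new terms explicitly, the glossary update can be smoke-tested. A minimal sketch, assuming a case-insensitive substring search over the glossary text is good enough; it is loose by design ("Dimension" also matches "Dimensions", and a term mentioned in passing counts as present). The sample glossary text is hypothetical.

```python
NEW_TERMS = (
    "Execution Loop", "Pre-TS Gate", "Pre-RF Gate", "Principles Check",
    "Definition of Failure", "Technical Guidance", "Phase Dependencies",
    "AC Dependency Annotation", "Session Naming", "Dimension", "Alternative",
    "Configuration Space", "Consistency Check", "Surviving Configuration",
)


def missing_terms(glossary_text: str, terms: tuple[str, ...] = NEW_TERMS) -> list[str]:
    """Return terms that do not appear anywhere in the glossary (case-insensitive)."""
    lowered = glossary_text.lower()
    return [term for term in terms if term.lower() not in lowered]


glossary = (  # hypothetical partial glossary
    "**Execution Loop**: run a Gate per AC item before the next one.\n"
    "**Pre-TS Gate**: read RF N-1 before writing TS N.\n"
)
print(len(missing_terms(glossary)))  # 12
```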
## 5. Definition of Done (DoD)
- ✅ 1. TS template has §4 Acceptance Criteria (not Detailed Steps), §5 Technical Guidance (not implementation), §6 Definition of Failure
- ✅ 2. TS template has Principles Check table (HL §7 → AC mapping)
- ✅ 3. TS template has `[depends: AC-X]` annotation syntax for AC dependency chains
- ✅ 4. TS template has Cross-Phase Modifications table (for multi-phase)
- ✅ 5. HL template has Phase Dependencies section (mermaid + table)
- ✅ 6. `handoff.md` has Pre-RF Gate (executor opens template before writing)
- ✅ 7. `handoff.md` has Execution Loops (triggered by `[depends]` annotations)
- ✅ 8. `plan.md` has Pre-TS Gate (read RF of latest completed phase before writing next TS)
- ✅ 9. `handoff.md` has coordinator ONB answer protocol (no source → options)
- ✅ 10. `review.md` Judge phase checks HL §7 principles enforcement
- ✅ 11. `conventions.md` §14 has 4 new anti-patterns
- ✅ 12. All workflows have Session Naming Step 0 (`Role | Task ID | Phase`)
- ✅ 13. Research templates have embedded dimensional analysis (Gather: Dimensions, Extract: Configuration Space, Challenge: Consistency Check)
- ✅ 14. Research workflow has dimensional analysis thread (4 lines in Step 5)
- ✅ 15. Glossary updated with all new terms
- ✅ 16. All adapters synced
## 6. Definition of Failure (DoF)
- ❌ 1. TS template still contains `§4 Detailed Steps` — not shipped
- ❌ 2. No Execution Loops in handoff — executor still linear without self-check
- ❌ 3. No Pre-TS Gate in plan — coordinator still writes against own plan
- ❌ 4. Principles Check absent or optional — HL §7 still decorative
- ❌ 5. Research templates use GMA terminology ("morphological box", "Zwicky") instead of native TFW terms
On failure: Revert to current templates. Analyze which gate was rejected and why.
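Several DoD/DoF items reduce to text searches, in the spirit of the grep-based gates mentioned in S5. A minimal sketch, assuming the TS and research templates are available as strings and use the section titles named in the DoD; the sample templates are placeholders. Run the research check on templates only, since DR11 has the glossary legitimately cite Zwicky as the origin. A pass here is a smoke test, not a substitute for review.

```python
TS_REQUIRED = [
    "Acceptance Criteria",    # DoD 1: §4
    "Technical Guidance",     # DoD 1: §5
    "Definition of Failure",  # DoD 1: §6
    "Principles Check",       # DoD 2 / DoF 4
    "[depends: AC-",          # DoD 3: annotation syntax
]
TS_FORBIDDEN = ["detailed steps"]                     # DoF 1 (case-insensitive)
RESEARCH_FORBIDDEN = ["morphological box", "zwicky"]  # DoF 5 (case-insensitive)


def text_gate(text: str, required: list[str], forbidden: list[str]) -> list[str]:
    """Return reasons to reject the template; an empty list means this smoke test passes."""
    lowered = text.lower()
    problems = [f"missing: {term}" for term in required if term not in text]
    problems += [f"forbidden: {term}" for term in forbidden if term in lowered]
    return problems


ts_template = """## §4 Acceptance Criteria
### AC-1: <title> [depends: AC-X]
## §5 Technical Guidance
## §6 Definition of Failure
## Principles Check
"""
print(text_gate(ts_template, TS_REQUIRED, TS_FORBIDDEN))  # []

gather_template = "## Dimensions\nFill the Zwicky box for each factor.\n"
print(text_gate(gather_template, ["## Dimensions"], RESEARCH_FORBIDDEN))  # ['forbidden: zwicky']
```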
## 7. Principles
- Gates over guidelines — A gate is a structural mechanism that forces verification. A guideline is text that can be skipped. Every critical handoff point gets a gate, not a guideline.
- Requirements, not implementation — TS describes WHAT the result should achieve and HOW to verify it. Never HOW to implement it. Implementation is the executor's job.
- Verify against fact, not plan — When writing TS for Phase N, read RF Phase N-1 (actual output), not TS Phase N-1 (planned output). Plan ≠ fact.
- Enforce or remove — If a principle is important enough to write in HL §7, it must have an AC with a gate. If it doesn't deserve a gate, remove it from §7 — don't leave decorative text.
- Executor as engineer, not copier — Technical Guidance in TS gives hints and patterns. The executor MUST think, adapt, and verify — not copy-paste from TS.
- Domain-agnostic by default — TFW serves any domain (code, analytics, writing, education, business). Examples and terminology in templates must not assume a specific domain.
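The "Enforce or remove" principle implies a mechanical check on the Principles Check table (DR6): every HL §7 principle maps to an AC and a Gate, or is explicitly N/A with a reason. A minimal sketch, assuming the table has already been parsed into `(principle, ac, gate)` rows and that waivers are written as `N/A: <reason>`; both the row shape and the waiver convention are assumptions.

```python
from typing import NamedTuple


class PrincipleRow(NamedTuple):
    """One row of the Principles Check table: Principle → AC item → Gate."""
    principle: str
    ac: str    # e.g. "AC-2", or "N/A: <reason>"
    gate: str  # how the AC is verified; may be empty for N/A rows


def decorative_principles(hl_principles: list[str], table: list[PrincipleRow]) -> list[str]:
    """Return HL §7 principles that are neither enforced (AC + Gate) nor explicitly N/A."""
    rows = {row.principle: row for row in table}
    decorative = []
    for principle in hl_principles:
        row = rows.get(principle)
        if row is None:
            decorative.append(principle)  # missing from the table entirely
        elif row.ac.startswith("N/A:") and row.ac[len("N/A:"):].strip():
            continue  # explicitly waived, with a reason
        elif not (row.ac.startswith("AC-") and row.gate.strip()):
            decorative.append(principle)  # no AC, or an AC without a Gate
    return decorative


principles = ["Gates over guidelines", "Verify against fact, not plan",
              "Domain-agnostic by default"]
table = [
    PrincipleRow("Gates over guidelines", "AC-1", "query staging table"),
    PrincipleRow("Verify against fact, not plan", "AC-2", ""),
    PrincipleRow("Domain-agnostic by default", "N/A: single-domain task", ""),
]
print(decorative_principles(principles, table))  # ['Verify against fact, not plan']
```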
### 7.2 Knowledge Citations
| # | Source | Item | How it applies |
|---|---|---|---|
| 1 | conventions.md §14 | Anti-patterns list | Extending with 4 new patterns from observations |
| 2 | conventions.md §3 | TS definition: "self-contained: inputs/outputs/constraints/DoD" | Reinforcing: TS should contain constraints and DoD, not implementation code |
| 3 | glossary.md | Scope Budget | Budget limits exist but TS content quality has no structural limit — this task adds quality gates |
| 4 | README.md Values | "The thinking is the product" | Executor must think (Requirements-first) not copy (code-in-TS) |
## 8. Dependencies
| Dependency | Status |
|---|---|
| No external dependencies | ✅ |
| Session notes (9 problems) | ✅ Documented |
| HD-16, HD-18 case studies | ✅ Analyzed |
## 9. Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Over-constrained TS template — coordinators can't express nuance | Medium | High | Technical Guidance §5 gives flexible hint space |
| Execution Loops add token overhead for simple phases | Low | Low | Trigger only for AC items with `[depends]` annotations (DR3) |
| Existing TS from other projects become "non-compliant" | Low | Low | Template change is forward-only — old TS still readable |
| Workflow word count exceeds 1200-word design rule | Medium | Medium | Split additions across workflows, compress existing text |
| Dimensional analysis sections produce simulation (HD-19 pattern) | Medium | High | Cross-stage dependency prevents simulation: Extract needs Gather dimensions (DR7). Native terminology avoids compliance trap (DR11) |
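The word-count risk is the easiest one to monitor continuously. A minimal sketch, assuming a whitespace-delimited word count is an acceptable approximation of the 1200-word design rule (the rule's exact counting method is not specified here) and that workflow texts are available as strings keyed by file name; the sample texts are stand-ins.

```python
WORD_LIMIT = 1200  # design rule from the risk table


def over_budget(workflows: dict[str, str], limit: int = WORD_LIMIT) -> dict[str, int]:
    """Return workflows whose whitespace-delimited word count exceeds the limit."""
    counts = {name: len(text.split()) for name, text in workflows.items()}
    return {name: n for name, n in counts.items() if n > limit}


workflows = {"handoff.md": "word " * 1250, "plan.md": "word " * 900}  # stand-in text
print(over_budget(workflows))  # {'handoff.md': 1250}
```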
## 10. RESEARCH Case — ✅ COMPLETE (2 iterations)
### Hypotheses (final verdicts)
| # | Hypothesis | Verdict | Evidence |
|---|---|---|---|
| H1 | Requirements-first TS reduces "Phase D cleanup" phases | ✅ SUPPORTED | HD-16/C (50KB code-in-TS) → Phase D needed. HD-18/C (11KB requirements-first) → no Phase D. Same project/coordinator/executor. |
| H2 | Pre-TS Gate (read RF N-1) eliminates plan≠fact drift errors | ✅ STRONGLY SUPPORTED | HD-18: 3/3 coordinator errors caused by reading own TS, not RF. All 3 preventable. |
| H3 | Execution Loops catch more issues than linear execution | ⚠️ CONDITIONAL | Loops catch dependent-chain failures (HD-16/C per_page). No value for independent items. Threshold = dependency, not count. |
| H4 | Zwicky Box improves research Extract quality | ⚠️ CONDITIONAL → ✅ (via H5) | HD-19 = decorative (all Alt 1, no CCA). TFW-41 live test = genuine (4 of 1024 configurations survived CCA). Needs an embedded approach, not an instruction-based one. |
| H5 | Zwicky steps can be distributed across stages naturally | ✅ SUPPORTED | GMA maps 1:1 to Gather→Extract→Challenge. Cross-stage dependency creates natural enforcement. Native terminology prevents compliance trap. |
### Research Decisions (DR1-DR13)
| # | Decision | Iter |
|---|---|---|
| DR1 | TS §4 → Acceptance Criteria, §5 → Technical Guidance (NOT code) | 1 |
| DR2 | Pre-TS Gate: read RF of latest completed phase before writing next TS | 1 |
| DR3 | Execution Loops: dependency-based, not count-based | 1 |
| DR6 | HL §7 → TS AC mapping table (Principles Check) | 1 |
| DR7 | Supersedes DR4/DR5. Embedded dimensional analysis across stages | 2 |
| DR8 | Gather: add `## Dimensions` | 2 |
| DR9 | Extract: add `## Configuration Space` | 2 |
| DR10 | Challenge: add `## Consistency Check` | 2 |
| DR11 | Native terminology, not GMA terminology | 2 |
| DR12 | <3 dimensions → comparison matrix (graceful degradation) | 2 |
| DR13 | Workflow Step 5: 4-line dimensional analysis description | 2 |
## 11. Strategic Insights (Planning + Research)
| # | Insight | Category | Source |
|---|---|---|---|
| S1 | "Words without a hard gate = decorative words": HL principles that don't become AC are meaningless. This is the core insight driving the entire task. | philosophy | PO, HD-18 retrospective |
| S2 | TS detail level is inversely related to executor output quality: more code in the TS → less thinking by the executor. HD-16 Phase C (50 KB TS) produced worse results than Phase A (23 KB). | process | PO + HD-16 analysis |
| S3 | Research and Review have structured iterations (stages, passes). Execution is linear (ONB → code → RF). Adding Execution Loops brings execution to the same structural maturity. | philosophy | PO, session discussion |
| S4 | Coordinator reads own plan instead of RF — this is a systematic error observed across HD-18 phases. Not carelessness — structural: workflow doesn't require reading RF. | process | PO, HD-18 coordinator self-assessment |
| S5 | Phase D of HD-16 (design system) has an excellent HL that demonstrates Requirements-first approach: deliverables with AC, gates with grep commands, Definition of Failure. This HL serves as the "gold standard" for what the new TS template should produce. | domain | PO + HD-16 Phase D analysis |
| S6 | Instructions produce compliance, heuristics produce analysis. "MUST do Zwicky Box" → researcher fills box to comply. Template questions that guide thinking → researcher naturally decomposes. Same mechanism as Requirements-first TS: requirements target behavior, code targets form. | philosophy | RES iter2 FC4, PO observation |
| S7 | Cross-stage dependencies are natural enforcement. When Extract needs Gather's Dimensions, researcher can't skip decomposition. Stronger than checkpoint gates (structural, not procedural). | process | RES iter2 FC5 |
| S8 | PO explicitly stated: «I'm not interested in code in the TS. Requirements that cannot be bypassed.» Direct instruction to restructure TS from code delivery to requirements engineering. | stakeholder | RES iter1 SS1, PO F24 |