HL — TFW-41: Execution Quality Gates¶

Date: 2026-04-20 Author: Coordinator (AI) + PO Status: 📝 HL_DRAFT — Awaiting review

1. Vision¶

TFW pipelines produce traces but don't prevent quality failures at their source. Executors copy code from over-detailed TS instead of engineering solutions. Coordinators write TS based on their plans instead of the actual output of the previous phase. HL principles exist as text without enforcement mechanisms. After this task, every critical handoff point in TFW has a structural gate — not instructions, not guidelines, but mechanisms that make it harder to fail silently.

Impact: Fewer Phase D "cleanup" phases. Executors think instead of copy. Coordinators verify against reality. HL principles survive into implementation.

"Слова без hard gate = декоративные слова." — PO, HD-18 retrospective

2. Current State (As-Is)¶

9 observed problems from production use (HD-9, HD-16, HD-18), documented in session notes:

#	Problem	Source	Severity
1	RF drifts from template — executor writes from memory	HD-9	Medium
2	Coordinator answers ONB questions confidently without sources	HD-9	Medium
3	Sessions have no names — impossible to navigate	All	Low
4	TS = algorithm without "why" — executor follows steps, doesn't check result	HD-16	High
5	Over-detailed TS = copy-paste executor (50KB of code in TS)	HD-16	Critical
6	HL §7 principles without enforcement — "decorative words"	HD-18	High
7	Coordinator verifies plan, not output of previous phase (plan≠fact drift)	HD-18	Critical
8	Cross-phase test ownership unclear — rename breaks traceability	HD-18	Medium
9	Phase dependency graph only in coordinator's head	HD-18	Medium

Root cause taxonomy¶

Three clusters:

CLUSTER 1: Agent acts on inertia (#1-3)
  Agent doesn't check context before key action.
  Fix pattern: explicit "check before acting" gates

CLUSTER 2: TS doesn't let executor think (#4-5)
  TS contains ready code → executor copies instead of engineers.
  Fix pattern: Requirements-first TS

CLUSTER 3: Coordinator doesn't verify context (#6-9)
  Coordinator writes against plan, not reality.
  Fix pattern: Pre-TS gates, enforcement tracing

Key evidence: HD-16 case study¶

Phase	TS size	Content	Result
A	23 KB, 534 lines	Step-by-step code	✅ 22/22 AC
B	30 KB, 842 lines	Full TypeScript files	✅ 16/16 AC. BUT: palette not designed, 89 hardcoded colors untouched
C	50 KB, 1280 lines	Complete components copy-paste ready	✅ 23/24 AC. per_page=100 vs spec 500 (backend cap unnoticed)
D	20 KB	Created to fix B+C failures	HL explicitly states root cause: "TS contained code, not requirements"

Key evidence: HD-18 self-assessment¶

Coordinator scored own TS quality 7/10 (improvement over HD-16). But 3 errors reached ONB: - Wrong function signature (copied from other workers, not verified) - Wrong worker count (didn't count create_task calls) - Referenced already-updated test (read own TS, not RF of previous phase)

3. Target State (To-Be)¶

3.1 Result Visualization¶

Imagine it's 3 months after TFW-41 ships. A coordinator writes TS for Phase C of a multi-phase task:

BEFORE (current):
  Coordinator opens TS Phase B (own plan)     ← reads PLAN
  Writes Phase C TS with code blocks           ← writes CODE
  Executor copies code from TS                 ← COPIES
  Reviewer finds: "palette not designed"       ← Phase D needed
  3 TS errors reach ONB                        ← waste

AFTER ([TFW-41](HL-TFW-41__execution_quality_gates.md)):
  Coordinator opens RF Phase B (actual output)  ← reads FACT
  Writes Phase C TS with Requirements + Gates   ← writes WHAT, not HOW
  Executor reads Requirements, engineers solution ← THINKS
  Executor runs Gate per Requirement             ← SELF-CHECKS
  Reviewer verifies HL §7 principles survived    ← ENFORCES
  0 cleanup phases needed                        ← no Phase D

What a TS looks like after (3 domain examples):

## §4 Acceptance Criteria

### AC-1: ETL covers all data sources
  All 5 data sources produce records in the staging table.
  - [ ] Row count per source > 0 after pipeline run
  - [ ] No source returns only NULLs
  Gate: query staging table → 5 sources present

### AC-2: Report reflects transformed data  [depends: AC-1]
  Summary report uses staging output, not raw data.
  - [ ] Report totals match staging aggregates (±1% tolerance)
  Gate: compare report vs staging query

### AC-3: Stakeholder review completed  [depends: AC-2]
  Report reviewed and approved by domain owner.
  - [ ] Feedback incorporated or explicitly deferred with rationale

## §5 Technical Guidance
> Reference material, not instructions. Executor MAY deviate.
- Staging table: `analytics.staging_v2` (schema in KNOWLEDGE.md [D12](../../knowledge-index.md#architecture-decisions))
- Previous report used direct queries — this is the root cause of data drift
- Pattern: transform in staging, aggregate in report layer

## §6 Definition of Failure
- ❌ Report uses raw source tables directly → reject RF
- ❌ Any data source missing from staging with no documented reason

3.2 Value Flow¶

USER PAIN                    [TFW-41](HL-TFW-41__execution_quality_gates.md) CHANGE                    VALUE
─────────                    ──────────                       ─────
"Executor copies,            TS template: Requirements        Executor engineers
 doesn't think"              not code. §4 = WHAT, §5 = hints  solutions

"Phase D cleanup             Execution Loops: Gate per        Errors caught at R_n,
 phases needed"              Requirement before next R_{n+1}  not at review

"Coordinator writes          Pre-TS Gate: read RF N-1         TS based on reality,
 against own plan"           before writing TS N              not plan

"HL principles are           Principles Check in TS +         Principles enforced
 decorative words"           reviewer verification            or explicitly N/A

"RF drifts from              Pre-RF Gate: open template       Template followed
 template"                   before writing                   structurally

"Dependency graph             Phase Dependencies in           Any coordinator can
 in coordinator's head"      Master HL                        write any Phase TS

4. Phases¶

Phase Dependencies¶

graph LR
  A[Phase A: Templates] --> B[Phase B: Workflow Gates]
  A --> C[Phase C: Research Templates]
  B --> D[Phase D: Glossary + Adapters]
  C --> D

Phase	Depends on	Shared files	Can run in parallel with
A	Independent	—	—
B	A (new TS template)	`conventions.md`	C
C	A (conventions for terminology origin)	`glossary.md`	B
D	B + C	`glossary.md`, adapters	—

Phase A: TS Template + Conventions 🔴¶

Requires: Independent

Context for coordinator: 1. .tfw/templates/TS.md — current TS template to rewrite 2. .tfw/templates/HL.md — add Phase Dependencies section 3. .tfw/conventions.md §14 — anti-patterns list to extend 4. RES iter1 DR1, DR6 — research decisions on TS structure 5. Session notes — Problems 4-8 analysis

Key decisions (from research): - DR1: TS §4 → Acceptance Criteria (verifiable requirements), §5 → Technical Guidance (context, patterns, constraints — NOT implementation) - DR6: HL §7 Principles → TS AC mapping table (mandatory). Each TS contains table: Principle → AC item → Gate - D3: TS §6 = Definition of Failure (hard reject conditions) - D5: Cross-Phase Modifications table for multi-phase tasks - AC dependency annotation: [depends: AC-X] — coordinator explicitly marks dependent AC items

Deliverables: 1. Rewrite TS.md template: Requirements-first structure (§4 AC with [depends] annotation, §5 Technical Guidance, §6 Definition of Failure, Principles Check table) 2. Add 4 anti-patterns to conventions.md §14 3. Add Phase Dependencies section to HL.md template

Phase B: Workflow Gates 🔴¶

Requires: Phase A ✅

⚠️ Shared files with Phase C: conventions.md (A adds anti-patterns, C adds terminology origin)

Context for coordinator: 1. .tfw/workflows/handoff.md — execution workflow 2. .tfw/workflows/plan.md — planning workflow 3. .tfw/workflows/review.md — review workflow 4. .tfw/templates/ONB.md — ONB template 5. RES iter1 DR2, DR3 — Pre-TS Gate and Execution Loops decisions

Key decisions (from research): - DR2: Pre-TS Gate: coordinator reads RF of latest completed phase in dependency chain before writing next TS - DR3: Execution Loops: mandatory when AC items have [depends] annotations. Independent ACs → linear. Threshold = dependency, not count. - D6: Pre-RF Gate in handoff: executor opens template before writing RF - D9: Coordinator ONB answer rule: no source → give options, don't decide - D10: Reviewer checks HL §7 principles in Judge phase - D11: Session Naming convention: every workflow session starts with naming (Role | Task ID | Phase)

Deliverables: 1. Add Pre-RF Gate + Execution Loops (dependency-based, triggered by [depends]) to handoff.md 2. Add Pre-TS Gate to plan.md 3. Add coordinator answer protocol to ONB handling in handoff.md 4. Add HL §7 principles verification to review.md 5. Add Session Naming Step 0 to all workflows

Phase C: Research Templates — Embedded Dimensional Analysis 🔴¶

Requires: Independent (only needs conventions.md for terminology origin note)

⚠️ Shared files with Phase D: glossary.md

Context for coordinator: 1. .tfw/templates/research/gather.md — add Dimensions section 2. .tfw/templates/research/extract.md — add Configuration Space section 3. .tfw/templates/research/challenge.md — add Consistency Check section 4. .tfw/workflows/research/base.md — add dimensional analysis thread 5. RES iter2 DR7-DR13 — embedded dimensional analysis decisions

Key decisions (from research): - DR7: Supersedes DR4/DR5. Embedded analysis across stages, not "mandatory Zwicky Box" - DR8: Gather → ## Dimensions (independent decision factors, ≥3 alternatives, no "recommended") - DR9: Extract → ## Configuration Space (cross-reference of Gather dimensions, no evaluation yet) - DR10: Challenge → ## Consistency Check (pairwise incompatibility, surviving configurations) - DR11: Native terminology (Dimension, Alternative, Configuration Space, Consistency Check, Surviving Configuration). Glossary references Zwicky as origin. - DR12: Graceful degradation: <3 dimensions → comparison matrix - DR13: Workflow Step 5: 4-line dimensional analysis description

Deliverables: 1. Add ## Dimensions to gather.md template 2. Add ## Configuration Space to extract.md template 3. Add ## Consistency Check to challenge.md template 4. Add dimensional analysis thread to research/base.md workflow 5. Add terminology origin note to conventions.md

Phase D: Glossary + Adapters Sync 🟡¶

Requires: Phase B + Phase C ✅

Context for coordinator: 1. .tfw/glossary.md — needs new terms from Phase A, B, C 2. Adapter files — sync new workflow steps

Deliverables: 1. Add terms to glossary: Execution Loop, Pre-TS Gate, Pre-RF Gate, Principles Check, Definition of Failure (TS), Technical Guidance, Phase Dependencies, AC Dependency Annotation, Session Naming, Dimension, Alternative, Configuration Space, Consistency Check, Surviving Configuration 2. Sync adapters (claude-code, cursor, antigravity) with new workflow steps

5. Definition of Done (DoD)¶

✅ 1. TS template has §4 Acceptance Criteria (not Detailed Steps), §5 Technical Guidance (not implementation), §6 Definition of Failure
✅ 2. TS template has Principles Check table (HL §7 → AC mapping)
✅ 3. TS template has [depends: AC-X] annotation syntax for AC dependency chains
✅ 4. TS template has Cross-Phase Modifications table (for multi-phase)
✅ 5. HL template has Phase Dependencies section (mermaid + table)
✅ 6. handoff.md has Pre-RF Gate (executor opens template before writing)
✅ 7. handoff.md has Execution Loops (triggered by [depends] annotations)
✅ 8. plan.md has Pre-TS Gate (read RF of latest completed phase before writing next TS)
✅ 9. handoff.md has coordinator ONB answer protocol (no source → options)
✅ 10. review.md Judge phase checks HL §7 principles enforcement
✅ 11. conventions.md §14 has 4 new anti-patterns
✅ 12. All workflows have Session Naming Step 0 (Role | Task ID | Phase)
✅ 13. Research templates have embedded dimensional analysis (Gather: Dimensions, Extract: Configuration Space, Challenge: Consistency Check)
✅ 14. Research workflow has dimensional analysis thread (4 lines in Step 5)
✅ 15. Glossary updated with all new terms
✅ 16. All adapters synced

6. Definition of Failure (DoF)¶

❌ 1. TS template still contains §4 Detailed Steps — not shipped
❌ 2. No Execution Loops in handoff — executor still linear without self-check
❌ 3. No Pre-TS Gate in plan — coordinator still writes against own plan
❌ 4. Principles Check absent or optional — HL §7 still decorative
❌ 5. Research templates use GMA terminology ("morphological box", "Zwicky") instead of native TFW terms

On failure: Revert to current templates. Analyze which gate was rejected and why.

7. Principles¶

Gates over guidelines — A gate is a structural mechanism that forces verification. A guideline is text that can be skipped. Every critical handoff point gets a gate, not a guideline.
Requirements, not implementation — TS describes WHAT the result should achieve and HOW to verify it. Never HOW to implement it. Implementation is the executor's job.
Verify against fact, not plan — When writing TS for Phase N, read RF Phase N-1 (actual output), not TS Phase N-1 (planned output). Plan ≠ fact.
Enforce or remove — If a principle is important enough to write in HL §7, it must have an AC with a gate. If it doesn't deserve a gate, remove it from §7 — don't leave decorative text.
Executor as engineer, not copier — Technical Guidance in TS gives hints and patterns. The executor MUST think, adapt, and verify — not copy-paste from TS.
Domain-agnostic by default — TFW serves any domain (code, analytics, writing, education, business). Examples and terminology in templates must not assume a specific domain.

7.2 Knowledge Citations¶

#	Source	Item	How it applies
1	conventions.md §14	Anti-patterns list	Extending with 4 new patterns from observations
2	conventions.md §3	TS definition: "self-contained: inputs/outputs/constraints/DoD"	Reinforcing: TS should contain constraints and DoD, not implementation code
3	glossary.md	Scope Budget	Budget limits exist but TS content quality has no structural limit — this task adds quality gates
4	README.md Values	"The thinking is the product"	Executor must think (Requirements-first) not copy (code-in-TS)

8. Dependencies¶

Dependency	Status
No external dependencies	✅
Session notes (9 problems)	✅ Documented
HD-16, HD-18 case studies	✅ Analyzed

9. Risks¶

Risk	Probability	Impact	Mitigation
Over-constrained TS template — coordinators can't express nuance	Medium	High	Technical Guidance §5 gives flexible hint space
Execution Loops add token overhead for simple phases	Low	Low	Trigger only for AC items with cross-component dependencies (DR3)
Existing TS from other projects become "non-compliant"	Low	Low	Template change is forward-only — old TS still readable
Workflow word count exceeds 1200-word design rule	Medium	Medium	Split additions across workflows, compress existing text
Dimensional analysis sections produce simulation (HD-19 pattern)	Medium	High	Cross-stage dependency prevents simulation: Extract needs Gather dimensions (DR7). Native terminology avoids compliance trap (DR11)

10. RESEARCH Case — ✅ COMPLETE (2 iterations)¶

Hypotheses (final verdicts)¶

#	Hypothesis	Verdict	Evidence
H1	Requirements-first TS reduces "Phase D cleanup" phases	✅ SUPPORTED	HD-16/C (50KB code-in-TS) → Phase D needed. HD-18/C (11KB requirements-first) → no Phase D. Same project/coordinator/executor.
H2	Pre-TS Gate (read RF N-1) eliminates plan≠fact drift errors	✅ STRONGLY SUPPORTED	HD-18: 3/3 coordinator errors caused by reading own TS, not RF. All 3 preventable.
H3	Execution Loops catch more issues than linear execution	⚠️ CONDITIONAL	Loops catch dependent-chain failures (HD-16/C per_page). No value for independent items. Threshold = dependency, not count.
H4	Zwicky Box improves research Extract quality	⚠️ CONDITIONAL → ✅ (via H5)	HD-19 = decorative (all Alt 1, no CCA). TFW-41 live test = genuine (4 from 1024 survived CCA). Needs embedded approach, not instruction-based.
H5	Zwicky steps can be distributed across stages naturally	✅ SUPPORTED	GMA maps 1:1 to Gather→Extract→Challenge. Cross-stage dependency creates natural enforcement. Native terminology prevents compliance trap.

Research Decisions (DR1-DR13)¶

#	Decision	Iter
DR1	TS §4 → Acceptance Criteria, §5 → Technical Guidance (NOT code)	1
DR2	Pre-TS Gate: read RF of latest completed phase before writing next TS	1
DR3	Execution Loops: dependency-based, not count-based	1
DR6	HL §7 → TS AC mapping table (Principles Check)	1
DR7	Supersedes DR4/DR5. Embedded dimensional analysis across stages	2
DR8	Gather: add `## Dimensions`	2
DR9	Extract: add `## Configuration Space`	2
DR10	Challenge: add `## Consistency Check`	2
DR11	Native terminology, not GMA terminology	2
DR12	<3 dimensions → comparison matrix (graceful degradation)	2
DR13	Workflow Step 5: 4-line dimensional analysis description	2

11. Strategic Insights (Planning + Research)¶

#	Insight	Category	Source
S1	"Слова без hard gate = декоративные слова" — HL principles that don't become AC are meaningless. This is the core insight driving the entire task.	philosophy	PO, HD-18 retrospective
S2	TS detail level is inversely proportional to executor quality. More code in TS → less thinking by executor. HD-16 Phase C (50KB TS) produced worse results than Phase A (23KB).	process	PO + HD-16 analysis
S3	Research and Review have structured iterations (stages, passes). Execution is linear (ONB → code → RF). Adding Execution Loops brings execution to the same structural maturity.	philosophy	PO, session discussion
S4	Coordinator reads own plan instead of RF — this is a systematic error observed across HD-18 phases. Not carelessness — structural: workflow doesn't require reading RF.	process	PO, HD-18 coordinator self-assessment
S5	Phase D of HD-16 (design system) has an excellent HL that demonstrates Requirements-first approach: deliverables with AC, gates with grep commands, Definition of Failure. This HL serves as the "gold standard" for what the new TS template should produce.	domain	PO + HD-16 Phase D analysis
S6	Instructions produce compliance, heuristics produce analysis. "MUST do Zwicky Box" → researcher fills box to comply. Template questions that guide thinking → researcher naturally decomposes. Same mechanism as Requirements-first TS: requirements target behavior, code targets form.	philosophy	RES iter2 FC4, PO observation
S7	Cross-stage dependencies are natural enforcement. When Extract needs Gather's Dimensions, researcher can't skip decomposition. Stronger than checkpoint gates (structural, not procedural).	process	RES iter2 FC5
S8	PO explicitly stated: «Меня не интересует код в ТС. Требования которые нельзя обойти.» Direct instruction to restructure TS from code delivery to requirements engineering.	stakeholder	RES iter1 SS1, PO F24

HL — TFW-41: Execution Quality Gates | 2026-04-20