title: "Challenge — "What do we NOT expect?"" source: "tasks/TFW-41__execution_quality_gates/research/challenge.md"
Challenge — "What do we NOT expect?"
Parent: HL-TFW-41 Goal: Structural gates at every TFW handoff point — prevent failures, not just detect them.
Findings
C1: When Does Requirements-First TS FAIL? (H1 Stress Test)
Scenario 1: Novel implementation with no known pattern. Requirements say "WHAT" but the executor has never done anything similar. Without guidance (P1-V4 config), the executor makes bad architectural choices. Example: "Implement permissions cache with TTL" — executor builds Redis-backed solution when in-memory dict suffices (violating HL §7 P4 "No new abstractions").
Verdict: Requirements-first WITHOUT guidance fails for novel domains. The §5. Technical Guidance section (renamed from §4. Detailed Steps) is NOT "code" — it's context, patterns, constraints. The template must distinguish:
- §4 = Acceptance Criteria (WHAT, verifiable, gates)
- §5 = Technical Guidance (HOW context, patterns, NOT code — executor decides implementation)
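To make Scenario 1 concrete, the in-memory option for "Implement permissions cache with TTL" can be as small as the sketch below. `TTLCache`, its method names, and the injectable `clock` parameter are hypothetical and chosen for illustration only; nothing here is taken from the actual project code. It shows the scale of solution the requirement calls for, not text that would be pasted into §5.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache: an entry expires ttl_seconds after it is set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, object]] = {}

    def set(self, key: Hashable, value: object) -> None:
        # Store the value together with its absolute expiry time.
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry lazily on read.
            del self._entries[key]
            return None
        return value
```

A §5 guidance line such as "a plain dict with expiry timestamps is enough; no external store" would steer the executor toward this shape while leaving the implementation to them.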
Scenario 2: CSS/visual tasks without visual spec.
Requirements say "all hardcoded colors → CSS vars" but without a reference (concept.html), the executor doesn't know WHICH colors to use. The acceptance criterion `grep -E '#[0-9a-fA-F]{6}'` = 0 matches is necessary but insufficient: the executor could substitute ugly vars that technically pass. (The `-E` flag matters here; without it, grep treats `{6}` as literal text rather than a repetition count.)
Verdict: Visual tasks need concept.html (F23) or screenshot reference IN ADDITION TO AC. The TS template should allow optional visual reference links in §5 for frontend-heavy phases.
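The "0 matches" acceptance criterion from Scenario 2 can be sketched as a small check. This is a minimal illustration that reuses the pattern quoted above; the function name `find_hardcoded_colors` is hypothetical, and the pattern only covers 6-digit hex colors (3-digit shorthand, `rgb()` and named colors would slip through).

```python
import re
from typing import List, Tuple

# Same pattern as the acceptance criterion: '#' followed by six hex digits.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def find_hardcoded_colors(css_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, match) for every hardcoded 6-digit hex color."""
    hits = []
    for lineno, line in enumerate(css_text.splitlines(), start=1):
        for match in HEX_COLOR.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


css = "a { color: #ff0000; }\nb { color: var(--accent); }"
print(find_hardcoded_colors(css))  # [(1, '#ff0000')]
```

An empty result means the gate passes, which is exactly why it is insufficient alone: it says nothing about whether the replacement variables match the visual reference.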
Scenario 3: Small/trivial tasks (<3 requirements). For tasks like "delete junk file + run linter" — full requirements-first TS is overhead. HD-18/D had 5 deliverables but each was 1-line mechanical change. A full AC+gate per item adds process without catching anything.
Verdict: Scope budget already handles this (F14: <50 LOC cosmetic fixes bypass TFW). No template change needed: the existing gate ("Hotfix sessions bypass TFW gates") covers trivial cases.
C2: Execution Loops — The >3 Threshold (H3 Stress Test)
Question: Is >3 requirements the right threshold for mandatory Execution Loops?
Test against evidence:
| Case | # Requirements | Loops would have helped? |
|---|---|---|
| HD-16/C | 24 AC items | YES — per_page caught at R3, not at review |
| HD-18/C | 5 AC items | MAYBE — all passed anyway (requirements-first was sufficient) |
| HD-18/D | 6 AC items | NO — all mechanical, linear was fine |
| HD-9/A | 8 test items | YES — T7 resume was discovered during ONB, not at review |
Pattern: The threshold isn't about COUNT but about DEPENDENCY. HD-18/D had 6 items but they were independent (each file = separate fix). HD-16/C had items where R3 (per_page) depended on R1 (backend schema). HD-9/A had items where T7 (resume) depended on B1 (elapsed fix).
Revised threshold: Execution Loops are mandatory when any requirement has a cross-file or cross-component dependency. The executor must verify R_n gates before starting R_{n+1} when they share state. For independent requirements, linear execution is fine.
Practical test: At TS write time, coordinator marks each AC as [independent] or [depends: AC-X]. Dependent chains = mandatory loops. Independent items = linear.
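The practical test can be sketched as a classification over the coordinator's markings. The sketch assumes each AC carries at most one `depends` marker, encoded here as a dict value (`None` for `[independent]`); the function name and the `"loop"`/`"linear"` labels are hypothetical.

```python
from typing import Dict, Optional


def execution_mode(depends_on: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Map each AC id to 'loop' if it belongs to a dependency chain, else 'linear'."""
    in_chain = set()
    for ac, dep in depends_on.items():
        if dep is not None:
            # Both ends of a dependency need a verified gate between them.
            in_chain.add(ac)
            in_chain.add(dep)
    return {ac: ("loop" if ac in in_chain else "linear") for ac in depends_on}


# Shape of the HD-16/C case: R3 (per_page) depends on R1 (backend schema).
print(execution_mode({"R1": None, "R2": None, "R3": "R1"}))
# {'R1': 'loop', 'R2': 'linear', 'R3': 'loop'}
```

With all items marked independent, as in HD-18/D, every AC comes out as `linear` and no loop is required regardless of the item count.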
C3: Zwicky Box — Are 5 Enforcement Rules Sufficient? (H4 Stress Test)
Testing each rule against HD-19 failure:
| Rule | Would it have prevented HD-19 failure? | Over-constraining? |
|---|---|---|
| R1: ≥3 values per parameter | ✅ YES — forces researcher to think beyond binary | No — 3 is minimum for genuine alternatives |
| R2: CCA pairwise table | ✅ YES — would have revealed D2×D3 dependency | Potentially — 8 params × 4 values gives C(8,2) = 28 parameter pairs and up to 448 value-level checks. Manageable? |
| R3: No "recommended" before CCA | ✅ YES — prevents confirmation bias | No — purely procedural |
| R4: ≥2 surviving configs | ✅ YES — prevents single-option validation | No — if only 1 survives CCA, that's genuine signal |
| R5: ≥1 non-obvious config | ⚠️ HARD TO ENFORCE — what is "non-obvious"? | YES — subjective. Reviewer can't verify "non-obvious" |
R4 revision: Change to "≥2 surviving configs OR explicit statement that CCA eliminated all but one, with the elimination log."
R5 revision: Replace with "document which configuration(s) you did NOT initially expect to survive CCA." This is verifiable — the researcher can state their pre-CCA expectation and compare.
R2 concern at scale: At 8 parameters with 4 values each, there are C(8,2)=28 parameter pairs, each needing up to 4×4=16 pairwise checks. That's 448 checks. Impractical.
Fix: CCA should be done at PARAMETER level (8 parameters → 28 pairs), not VALUE level. Question: "Are parameters P_i and P_j independent?" If not: "Which P_i values are inconsistent with which P_j values?" This reduces to 28 questions with selective deep-dives.
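The arithmetic behind the value-level versus parameter-level comparison can be checked with a few lines. This is only a counting sketch; `cca_workload` and `parameter_pairs` are hypothetical helper names, and the parameter labels `P1`..`P8` are placeholders.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def cca_workload(n_params: int, n_values: int) -> Tuple[int, int]:
    """Return (parameter pairs, worst-case value-level checks)."""
    pairs = comb(n_params, 2)
    return pairs, pairs * n_values * n_values


def parameter_pairs(names: List[str]) -> List[Tuple[str, str]]:
    """List the parameter-level questions: 'are P_i and P_j independent?'"""
    return list(combinations(names, 2))


print(cca_workload(8, 4))  # (28, 448)
print(len(parameter_pairs([f"P{i}" for i in range(1, 9)])))  # 28
```

Only the pairs answered "not independent" then need the 4×4 value-level deep-dive, so the real workload lands somewhere between 28 and 448 depending on how coupled the parameters are.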
C4: Pre-TS Gate — Overhead vs Value (H2 Edge Case)
Scenario: Phase A (first phase, no RF N-1 to read). Pre-TS Gate says "read RF N-1 before writing TS." Phase A has no RF N-1. What does the coordinator read?
Answer: Phase A reads HL (already required) + RESEARCH artifact (if exists). Pre-TS Gate is N/A for Phase A — the gate applies only to Phase N where N > 1. This is already handled by the "Requires: Phase {N-1} ✅" field in the HL.
Scenario: Multi-phase with parallel phases (B and C independent of each other, both depend on A). Coordinator writes TS-B and TS-C after RF-A. Both should read RF-A. No issue — Pre-TS Gate applies to the LATEST completed RF in the dependency chain, not to a linear N-1.
Fix: Pre-TS Gate wording: "Read the RF of the most recent completed phase in this task's dependency chain."
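One way to read the revised wording is as a lookup over the phase dependency graph. The sketch below assumes phases declare their prerequisites in a `requires` mapping and that `completed` lists finished phases from oldest to newest; both structures and the function name `rf_to_read` are hypothetical.

```python
from typing import Dict, List, Optional


def rf_to_read(phase: str, requires: Dict[str, List[str]],
               completed: List[str]) -> Optional[str]:
    """Return the most recently completed phase in `phase`'s dependency chain.

    Returns None when the gate is N/A (no completed prerequisite, e.g. Phase A).
    """
    # Collect transitive prerequisites of the phase.
    chain = set()
    stack = list(requires.get(phase, []))
    while stack:
        current = stack.pop()
        if current not in chain:
            chain.add(current)
            stack.extend(requires.get(current, []))
    # Walk completions newest-first and pick the first one in the chain.
    for done in reversed(completed):
        if done in chain:
            return done
    return None


requires = {"A": [], "B": ["A"], "C": ["A"]}
print(rf_to_read("A", requires, []))          # None
print(rf_to_read("C", requires, ["A", "B"]))  # A
```

In the parallel case, TS-C resolves to RF-A even though B finished later, because B is not in C's dependency chain.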
C5: Template Enforcement — The Backward Compatibility Risk
Current TS template has §4. Detailed Steps. Changing it to §4. Acceptance Criteria + §5. Technical Guidance will break:
- All existing tasks in all projects that reference "TS §4 Detailed Steps" in their ONB/REVIEW
- Training data for coordinators who learned the current template
- REVIEW workflow that says "check TS §5 Acceptance Criteria" → would need renumbering
Mitigation: This is a versioned change (TFW v3 → v4?). The TS template change should be accompanied by:
- VERSION bump
- Migration note in CHANGELOG
- Old §4→§5 numbering preserved in meaning (guidance section moves to §5)
- Existing projects don't need to update; only new TS files use the new template
C6: When Enforcement Rules Become the Problem
Risk: The 5 Zwicky enforcement rules (E4-C in extract) could become bureaucratic overhead for research tasks where the solution space is genuinely simple (1-2 parameters, 2 values each).
Test: For a research question like "Should we use Redis or in-memory cache for permissions?" — Zwicky Box with 1 parameter and 2 values produces a 1×2 table. CCA is trivial. The rules mandate ≥3 values, forcing the researcher to invent a third option just to pass the gate.
Fix: Zwicky Box should be mandatory only when the research question has ≥3 independent parameters. For 1-2 parameter decisions, a simple comparison table (pros/cons) is sufficient. The workflow should specify: "If ≥3 parameters → Morphological Box. If 1-2 parameters → comparison matrix."
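The proposed rule can be sketched as a small selector that also applies R1 (≥3 values per parameter) only when the Morphological Box is actually required. The function name, the return labels, and the assumption that every listed parameter is independent are illustrative, not part of the workflow text.

```python
from typing import Dict, List, Tuple


def choose_method(params: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Pick the research method and list parameters that violate R1.

    Assumes every key in `params` is an independent parameter.
    """
    if len(params) < 3:
        # 1-2 parameters: a comparison matrix suffices and R1 does not apply.
        return "comparison_matrix", []
    too_few = [name for name, values in params.items() if len(values) < 3]
    return "morphological_box", too_few


# The Redis vs in-memory question: one parameter, two values.
print(choose_method({"cache_backend": ["redis", "in_memory"]}))
# ('comparison_matrix', [])
```

Under this reading the researcher is never pushed to invent a third cache option, while a genuine multi-parameter question still gets the full rule set.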
Checkpoint
| Found | Remaining |
|---|---|
| H1 FAILS when executor has no domain context → §5 Technical Guidance needed (but NOT code) | — |
| H3 threshold: not count-based but dependency-based. Loops mandatory for dependent AC chains | — |
| H4 R5 "non-obvious" is unverifiable → replace with "unexpected survivor" check | — |
| H4 CCA at scale: parameter-level, not value-level | — |
| Pre-TS Gate: N/A for Phase A, uses dependency chain not linear N-1 | — |
| Backward compatibility: version bump, no migration for existing projects | — |
| Zwicky Box: mandatory only for ≥3 parameters, comparison matrix for 1-2 | — |
Sufficiency:
- [x] External source used? (N/A for Challenge; this stage uses internal evidence to stress-test)
- [x] Briefing gap closed? All hypotheses stress-tested, edge cases mapped, thresholds refined

Deep mode criteria:
- [x] Counter-evidence actively sought? Yes: Requirements-first failure modes (C1), loop threshold revision (C2), Zwicky rule over-constraining (C3, C6)
- [x] Hypotheses revised based on challenge? H3 revised (count → dependency), H4 rules revised (R4, R5, scope threshold)
Stage complete: YES → Recommend: proceed to Synthesis. All hypotheses have verdicts. Challenge refined thresholds and fixed over-constraining rules. Ready to write final RES artifact.