
RES - TFW-38: Quality Enforcement (Iteration 3)

Date: 2026-04-14 | Author: Researcher | Status: 🔬 RES - Complete | Parent HL: HL-TFW-38 | Predecessors: RES iter 1, RES iter 2 | Mode: Pipeline


Research Context

Iteration 3 addressed two user-directed threads: (A) a TS over-specification audit: are coordinators writing implementation instead of requirements? (B) a complete naming refinement for review stages, modes, and all downstream references.

Decisions

| # | Decision | Rationale |
|---|---|---|
| D11 | Review stage names: Map → Verify → Judge → Decide | "Naming Creates Behavior" (TFW value): if you must explain it, it's named wrong. User rejected "comprehend" (unclear) and "assess" (vague; prefers "judge" as more intentional). All names are 1-2 syllables, active verbs from established disciplines (navigation, engineering, legal, management). Chain reads as a sentence: "Map the work, Verify the claims, Judge the quality, Decide the verdict." |
| D12 | Review mode names: code / docs / spec | User rejected "prose" (meaningless). Modes describe output type (noun), not domain. All 1 syllable. "docs" replaces "prose" for writing/documentation tasks: clear, short, no namespace collision with project docs/ folders. |
| D13 | TS over-specification is a tendency, not a systemic defect. Out of TFW-38 scope. | 4 TS sampled: 49-71% code. Coordinators write full code for complex/critical logic (SLA algorithms, security), signatures for standard patterns, pseudocode for UI. This varies naturally. The issue is missing guidance on WHEN to use which level (L1-L4). Recommend a separate task for TS template conventions. |

Hypothesis Status (Cumulative)

| # | Hypothesis | Status | Iteration |
|---|---|---|---|
| H1 | Explicit §6-8 enumeration stops skipping | 🟢 confirmed | 1 |
| H2 | Audit step changes reviewer behavior | 🟢 superseded by H3 | 1 |
| H3 | Domain-specific review stages produce more reliable reviews | 🟢 confirmed | 2 |
| H4 | "Naming Creates Behavior" applied to review stages eliminates comprehension friction | 🟢 confirmed | 3 (user test: "comprehend" fails, "map" passes) |

Final Naming Inventory

Review Stages

| Stage | Name | REVIEW Section | Cognitive Mode | Verbs Chain |
|---|---|---|---|---|
| 1 | Map | §1 Map | "Do I understand what was done?" | Map the work |
| 2 | Verify | §2 Verify | "Are the claims true?" | Verify the claims |
| 3 | Judge | §3 Judge | "Is the quality sufficient?" | Judge the quality |
| 4 | Decide | §4 Decide | "What's the verdict?" | Decide the verdict |
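The stage table can be read as an ordered enumeration whose members join into the verbs chain. The following is only a minimal sketch of that reading: the class name `ReviewStage` and the helper `verbs_chain` are hypothetical names for illustration, not part of TFW.

```python
from enum import Enum


class ReviewStage(Enum):
    """Review stages in order; each value is (stage name, object of the verb)."""

    MAP = ("Map", "the work")
    VERIFY = ("Verify", "the claims")
    JUDGE = ("Judge", "the quality")
    DECIDE = ("Decide", "the verdict")

    @property
    def phrase(self) -> str:
        # Combine the verb with its object, e.g. "Map the work".
        verb, obj = self.value
        return f"{verb} {obj}"


def verbs_chain() -> str:
    """Join all stage phrases, in definition order, into one sentence."""
    return ", ".join(stage.phrase for stage in ReviewStage) + "."


print(verbs_chain())
# Map the work, Verify the claims, Judge the quality, Decide the verdict.
```

Enum members iterate in definition order, so the chain always follows the Map → Verify → Judge → Decide sequence.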

Review Modes

| Mode | Tasks | Checklist Items | Verify Actions |
|---|---|---|---|
| code | Implementation | 10 items (6 universal + 4 domain) | Spot-check 2-3 files, check tests |
| docs | Writing, docs, design | 8 items (6 universal + 2 domain) | Verify deliverable existence, check structure |
| spec | Analytical, research | 8 items (6 universal + 2 domain) | Verify deliverables, check source citations |
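The checklist sizes follow one rule: every mode shares 6 universal items and adds its own domain items. A small sketch of that arithmetic, assuming a hypothetical `ReviewMode` record (the item counts come from the table; the type and field names are illustrative only):

```python
from dataclasses import dataclass

UNIVERSAL_ITEMS = 6  # shared by every mode, per the table above


@dataclass(frozen=True)
class ReviewMode:
    name: str
    domain_items: int  # mode-specific checklist items

    @property
    def checklist_size(self) -> int:
        # Total checklist length = universal items + domain items.
        return UNIVERSAL_ITEMS + self.domain_items


MODES = {
    mode.name: mode
    for mode in (
        ReviewMode("code", domain_items=4),
        ReviewMode("docs", domain_items=2),
        ReviewMode("spec", domain_items=2),
    )
}

for mode in MODES.values():
    print(f"{mode.name}: {mode.checklist_size} items")
# code: 10 items
# docs: 8 items
# spec: 8 items
```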

File Paths

| Entity | Path |
|---|---|
| Review workflow | `.tfw/workflows/review.md` |
| Code mode file | `.tfw/workflows/review/code.md` |
| Docs mode file | `.tfw/workflows/review/docs.md` |
| Spec mode file | `.tfw/workflows/review/spec.md` |
| REVIEW template | `.tfw/templates/REVIEW.md` |
| Config key | `tfw.review.default_mode` |
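The mode files share a directory and are named after the mode, so resolving a mode to its file is mechanical. The sketch below illustrates that lookup under two assumptions not stated in the source: that the configuration is available as a flat dictionary keyed by the dotted config key, and that an explicit mode overrides the configured default. The function name `mode_file` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

REVIEW_DIR = PurePosixPath(".tfw/workflows/review")
VALID_MODES = ("code", "docs", "spec")
DEFAULT_MODE_KEY = "tfw.review.default_mode"


def mode_file(config: dict, mode: Optional[str] = None) -> PurePosixPath:
    """Return the mode file path for an explicit mode or the configured default."""
    # Assumption: an explicitly passed mode takes precedence over the config value.
    chosen = mode or config.get(DEFAULT_MODE_KEY)
    if chosen not in VALID_MODES:
        raise ValueError(f"unknown review mode: {chosen!r}")
    return REVIEW_DIR / f"{chosen}.md"


config = {DEFAULT_MODE_KEY: "docs"}  # hypothetical config contents
print(mode_file(config))           # .tfw/workflows/review/docs.md
print(mode_file(config, "spec"))   # .tfw/workflows/review/spec.md
```

Rejecting unknown modes up front means a leftover "prose" value fails loudly instead of pointing at a file that no longer exists.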

Cross-Reference: Review vs Research Naming

| Research | Review | Structural Parallel |
|---|---|---|
| Briefing | Map | Prepare/understand |
| Gather | Verify | Collect evidence |
| Extract | Judge | Apply cognitive framework |
| Challenge | - | (no challenge in review) |
| RES | Decide | Synthesize conclusion |

HL Update Recommendations

| # | What to update | Source |
|---|---|---|
| 1 | Phase A: rename all stage references from Comprehend/Verify/Assess/Synthesize to Map/Verify/Judge/Decide | D11 |
| 2 | Phase A: rename mode "prose" to "docs" | D12 |
| 3 | Add new task recommendation to HL §7 Parking Lot: "TS Template Conventions: L1-L4 specification level guidance" | D13, G2-G6 |
| 4 | HL §10: add H4 (naming validation) | D11 |
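Updates 1 and 2 amount to a fixed term mapping applied across the HL. A minimal sketch of such a rename pass, assuming a case-sensitive whole-word replacement is acceptable (the function name `rename_terms` is hypothetical, and a real pass would need manual review wherever "prose" appears as an ordinary word):

```python
import re

# Old → new terms from D11 (stages) and D12 (modes). "Verify" is unchanged.
STAGE_RENAMES = {"Comprehend": "Map", "Assess": "Judge", "Synthesize": "Decide"}
MODE_RENAMES = {"prose": "docs"}


def rename_terms(text: str) -> str:
    """Replace old stage and mode names with the new ones, matching whole words only."""
    mapping = {**STAGE_RENAMES, **MODE_RENAMES}
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], text)


print(rename_terms("Comprehend → Verify → Assess → Synthesize"))
# Map → Verify → Judge → Decide
print(rename_terms("mode: prose"))
# mode: docs
```

The word boundaries keep longer words such as "Assessment" untouched.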

Fact Candidates

| # | Category | Candidate | Source | Confidence |
|---|---|---|---|---|
| F10 | convention | Review stages: Map → Verify → Judge → Decide. All 1-2 syllables, active verbs, from navigation/engineering/legal/management disciplines. Validated by TFW "Naming Creates Behavior" principle: "comprehend" failed user test (unclear), "assess" failed user test (too vague), "judge" passed (intentional, direct). | User, 2026-04-14, Challenge C1-C5 | High |
| F11 | convention | Review modes: code / docs / spec. All 1-syllable nouns describing output type. "prose" rejected by user (meaningless). Modes configure the Judge-stage checklist, not the domain. | User, 2026-04-14, G10, C4 | High |
| F12 | process | TS files across helpdesk and atamat contain 49-71% ready-to-paste code (average 58%). Coordinators write full implementation (L4) for complex/critical logic and method signatures (L3) for standard patterns. This is natural risk-averse behavior, not pathological. Missing: explicit guidance on when to use L1-L4 in the TS template. | G2-G6, C6, 4 TS sampled | High |
| F13 | philosophy | User (translated from Russian): "Check that the coordinator does not specify the task in the TS in such detail that they have almost completed it already. Do they leave room for the executor's creativity?" TS over-specification suppresses executor creativity and makes ONB trivial. Token double-spend occurs when the coordinator writes code without testing it and the executor copies it. | User, 2026-04-14 | High |
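The 58% average in F12 can be reproduced from the four per-file percentages listed in the Findings Map. The check below assumes an unweighted mean of those percentages; the source does not state how the average was computed.

```python
# Share of ready-to-paste code per sampled TS file (from the Findings Map).
samples = {
    "HD PhaseD": 71,
    "HD PhaseF": 56,
    "AT PhaseH": 55,
    "AT PhaseF2": 49,
}

mean_pct = sum(samples.values()) / len(samples)  # 231 / 4

print(f"mean = {mean_pct:.2f}% (rounded: {round(mean_pct)}%)")
# mean = 57.75% (rounded: 58%)
print(f"range = {min(samples.values())}-{max(samples.values())}%")
# range = 49-71%
```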

Strategic Insights

| # | Category | Insight | Source | Confidence |
|---|---|---|---|---|
| SS2 | process | TS content quality has a specification level spectrum (L1: Goal → L2: Requirement → L3: Design → L4: Implementation). Most TS files are at L4 for complex steps and L2-L3 for simple steps. The missing piece is explicit guidance: templates should recommend which specification level to use based on step complexity. This is a separate methodology improvement, not a quality enforcement fix. | G2-G6, E3-E4, C6 | ★★★ |
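The L1-L4 spectrum is an ordered scale, which an integer enumeration captures directly. The sketch below encodes the levels and the tendency observed in the sampled TS files; it is not the missing guidance itself, and the step-kind keys in `OBSERVED_TENDENCY` are hypothetical labels chosen for illustration.

```python
from enum import IntEnum


class SpecLevel(IntEnum):
    """Specification levels from SS2, ordered from least to most detailed."""

    GOAL = 1
    REQUIREMENT = 2
    DESIGN = 3
    IMPLEMENTATION = 4


def label(level: SpecLevel) -> str:
    # Render a level in the document's notation, e.g. "L4: Implementation".
    return f"L{level.value}: {level.name.capitalize()}"


# Observed coordinator behavior (F12), not a recommendation.
OBSERVED_TENDENCY = {
    "complex_or_critical": SpecLevel.IMPLEMENTATION,  # full code
    "standard_pattern": SpecLevel.DESIGN,             # method signatures
}

for kind, level in OBSERVED_TENDENCY.items():
    print(f"{kind}: {label(level)}")
# complex_or_critical: L4: Implementation
# standard_pattern: L3: Design
```

Using `IntEnum` keeps the levels comparable, so a future template rule could express limits such as "at most L3 for this kind of step" as a simple comparison.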

Findings Map

ITERATION 3 SCOPE
══════════════════

Thread A: TS Over-Spec                  Thread B: Naming
────────────────────                    ────────────────

4 TS sampled:                           User feedback:
HD PhaseD: 71% code                     ❌ "comprehend" = unclear
HD PhaseF: 56% code                     ❌ "prose" = meaningless
AT PhaseH: 55% code                     ✅ "judge" = intentional
AT PhaseF2: 49% code
     ↓                                       ↓
Average: 58% code                       "Naming Creates Behavior"
     ↓                                  filter applied:
Tendency, not defect                         ↓
Complex → L4 (natural)                  Stages: Map → Verify → Judge → Decide
Simple → L2-L3 (natural)                Modes:  code / docs / spec
     ↓                                       ↓
Missing: L1-L4 guidance                 All 1-2 syllables
→ Separate task, not                    All active, need no explanation
  [TFW-38](HL-TFW-38__quality_enforcement.md)
                                        ✅ User test passed

CUMULATIVE RESEARCH (3 iterations)
══════════════════════════════════

Iter 1: Template-workflow disconnect → §6-8 skip rate 96-100%
        Fix: explicit enumeration in handoff.md ([D1](../../knowledge-index.md#architecture-decisions))
        Reviewer trust-chain → audit step needed ([D2](../../knowledge-index.md#architecture-decisions))

Iter 2: Review should have staged structure ([D6](../../knowledge-index.md#architecture-decisions))
        3 modes based on output type ([D7](../../knowledge-index.md#architecture-decisions))
        Stages as REVIEW sections, not files ([D8](../../knowledge-index.md#architecture-decisions))
        Template must restructure to match ([D9](../../knowledge-index.md#architecture-decisions))
        Diagrams → index, not copy ([D10](../../knowledge-index.md#architecture-decisions))

Iter 3: Stages renamed: Map → Verify → Judge → Decide ([D11](../../knowledge-index.md#architecture-decisions))
        Modes renamed: code / docs / spec ([D12](../../knowledge-index.md#architecture-decisions))
        TS over-spec = tendency, separate task ([D13](../../knowledge-index.md#architecture-decisions))

Iteration Status

  • Iteration: 3 (minimum 2, maximum 3)
  • Hypotheses tested: H1 (🟢), H2 (🟢 → superseded by H3), H3 (🟢), H4 (🟢)
  • Hypotheses deferred: None
  • Gaps discovered: TS specification level guidance needed (out of scope)
  • Superseded decisions: D7 mode "prose" → D12 mode "docs"

Open Threads

| # | Thread | Status |
|---|---|---|
| 1 | Staged review design | ✅ Resolved: D6, D8, D9, D11 |
| 2 | Review modes | ✅ Resolved: D7 → D12 |
| 3 | Naming validation | ✅ Resolved: D11, D12, user test |
| 4 | TS over-specification | ✅ Resolved: D13, out of scope, separate task |
| 5 | Diagram indexing | ✅ Resolved: D10 |

Recommendation

  • [x] SUFFICIENT: proceed to /tfw-plan to update HL and write TS
  • [ ] MORE NEEDED

3 iterations completed (at max). All hypotheses tested. All open threads resolved. Naming validated by user. TS over-spec scoped out cleanly. Ready for TS specification.

Conclusion

Three research iterations produced a comprehensive evidence base for TFW-38 quality enforcement. Iteration 1 empirically confirmed the template-workflow disconnect (96-100% §6-8 skip rate across 80+ files) and the reviewer trust-chain failure. Iteration 2 designed a 4-stage domain-agnostic review flow with 3 output-type modes. Iteration 3 refined naming via TFW's own "Naming Creates Behavior" principle (final stages Map → Verify → Judge → Decide; modes code / docs / spec) and identified the TS over-specification tendency as a separate improvement track (L1-L4 guidance). The research is ready for HL update and TS specification.


RES - TFW-38: Quality Enforcement (Iteration 3) | 2026-04-14