title: "Gather — "What do we NOT know?"" source: "tasks/TFW-38__quality_enforcement/research/gather.md"

Gather — "What do we NOT know?"¶

Parent: HL-TFW-38 Goal: Empirical evidence on RF §6-8 skip rates, reviewer audit behavior, and Findings Map presence across real TFW projects.

Findings¶

G1: RF §6-8 Presence Audit (cross-project)¶

Corpus: 4 projects, ~100 RF files total.

Project	RF files	§6 Fact Candidates	§7 Strategic Insights
Helpdesk (12 RFs)	12	1 (PhaseG only)	1 (PhaseH only)
Steps-framework (40+ RFs)	40+	2 (TFW-21, TFW-27 PhaseB partial)	0
Auto-schedule (8+ RFs)	8+	0	0
Atamat (20+ RFs)	20+	0 (older TFW version)	0

Total skip rates across §6-8: - §6 Fact Candidates: present in ~3 of 80+ RFs → ~96% skip rate - §7 Strategic Insights: present in ~1 of 80+ RFs → ~99% skip rate - §8 Diagrams: present in ~0 of 80+ (confirmed zero) → 100% skip rate

Critical exception: The two most recent helpdesk RFs (PhaseG, PhaseH) are the ONLY ones with §6-8 fully filled. These were written 2026-04-14 — TODAY. This strongly suggests a recent user intervention or explicit prompting, not a workflow-driven behavior.

HD-4 PhaseH RF has all three sections: §6 (5 fact candidates), §7 (3 strategic insights), §8 (1 mermaid diagram). HD-3 PhaseG RF has §8 renamed as "Fact Candidates" (5 entries). Both files were written with the newer TFW templates active.

G2: RES Findings Map Presence Audit¶

Project	RES files	Findings Map present	Findings Map absent
Helpdesk	10	8 (all HD-2 iterations, all HD-3/G, HD-4 iter1)	2
Steps-framework	12+	2 (TFW-36 iter1, iter2 only)	10+
Auto-schedule	2	0	2

Helpdesk has 80% Findings Map presence. Steps-framework has ~15%. Auto-schedule has 0%.

Why helpdesk is different: Helpdesk was created AFTER the Findings Map was added to the RES template (TFW-32 Phase B). Older projects predate the section.

G3: Reviewer Audit Behavior¶

Pattern observed in REVIEW files:

No REVIEW file in any project contains an explicit "I independently verified X" audit step as a section. The word "verified" appears in checklist evidence only (e.g., "verified independently" as a checklist note).
TFW-36 Phase A REVIEW is the outlier — contains a self-critical "Reviewer Self-Assessment" section (lines 64-72) where the reviewer admits: "My initial review marked 'Data accuracy ✅' based on executor's self-reported fix. I did not: open knowledge_state.yaml to verify numbers independently, search for other unverified claims."
HD-3 Full Task REVIEW contains "Actual coverage % not verified independently" — the reviewer flags claims it couldn't check, but doesn't describe a verification protocol.
HD-3 PhaseF REVIEW has "Code audit" and "Acceptance Criteria Audit" as section names — suggesting the reviewer adopted audit language organically.
TFW-19 REVIEW has a "§3. Independent Verification" section — the only REVIEW with an explicit verification section.

Pattern: Reviewers who DO verify do so organically / per-reviewer personality. There is no workflow instruction that forces it.

G4: handoff.md Phase 3 Text Analysis¶

Current handoff.md Phase 3 (lines 72-79) says:

12. Create RF file... Must contain:
    - What was done (changes list with file paths)
    - Test results (pass/fail, output logs)
    - Known limitations or tech debt
    - Deviations from TS (if any, with justification)
    - Screenshots / logs if applicable
    - **Observations** (out-of-scope items noticed during work)

Missing: §6 Fact Candidates, §7 Strategic Insights, §8 Diagrams. The mention of "Fact Candidates" appears only in the blue info box below (lines 81-87), but as informational context, not as a mandatory section list item.

G5: External Research — LLM Instruction Following¶

Apple RLCF and TICK/STICK research (2025-2026) confirms: - Decomposing instructions into explicit, verifiable checklist items significantly improves model adherence - Models struggle with >10 constraints; explicit enumeration beats implicit template compliance - "Attention drift" — reasoning chains can cause models to lose sight of specific structural requirements - Industry consensus: verification is primary — checklist presence beats template hope

This directly supports H1: explicit enumeration in the workflow is the correct mechanism.

Checkpoint¶

Found	Remaining
§6-8 skip rate: 96-100% across projects	None — data is clear
Reviewer audit behavior: organic, not enforced	None
Findings Map: present mostly in newer projects	None
External evidence confirms explicit enumeration works	None

Sufficiency: - [x] External source used? (Apple RLCF, TICK/STICK research) - [x] Briefing gap closed? (All 4 problems empirically confirmed)

Stage complete: YES → User decision: ___