Briefing¶
Parent: HL-TFW-38 Goal: Empirical validation that TFW agents skip RF §6-8, reviewers rubber-stamp, and researchers omit Findings Maps — across real projects.
Research Plan¶
Gather¶
- Scan RF files from helpdesk, vlm-3, content marketing, and other active TFW projects
- Check which RF sections (§6 Fact Candidates, §7 Strategic Insights, §8 Diagrams) are present vs absent
- Scan REVIEW files — check if reviewers provide evidence or just checkmarks
- Scan RES files — check if Findings Map sections are present
Extract¶
- Analyze handoff.md Phase 3 (lines 71-107) — what does it explicitly mention vs what it omits
- Analyze review.md Step 1 — is there any verification/audit instruction?
- Analyze research/base.md Step 6 — does it mention Findings Map?
- Cross-reference: do templates contain sections that workflows don't mention?
Challenge¶
- Counter-hypothesis: maybe agents DO fill §6-8 when prompted well — check for positive examples
- Challenge the "workflow > template" claim — search for external evidence on LLM prompt-following behavior
- Check if audit-style review creates overhead that makes it impractical for trivial tasks
Hypotheses (from HL §10)¶
| # | Hypothesis | HL Status |
|---|---|---|
| H1 | Adding explicit §6-8 enumeration to handoff.md Phase 3 will stop executors from skipping them | open |
| H2 | An audit verification step in review.md will change reviewer behavior from "trust" to "verify" | open |
Scope Intent¶
- In scope: Empirical analysis of real TFW artifact quality; current workflow text analysis; external research on LLM instruction-following patterns
- Out of scope: Implementation of fixes (that's Phase A/B); CI/linting enforcement (TFW-34 scope); template format changes
Guiding Questions¶
- Which active projects have the most RF/RES/REVIEW files to sample from?
- Are there any projects where §6-8 ARE consistently filled — and if so, what's different about the prompt or context there?
- Is the diagram## User Direction
Q1 answer: Projects to scan: helpdesk (/d/projects/research/helpdesk), atamat (/d/projects/research/atamat), auto-schedule-prototype (/d/projects/research/auto-schedule-prototype), and steps-framework itself. Helpdesk is newest/most active. Atamat has most tasks but older TFW version. Scheduler has a few recent tasks.
Q2 answer: No projects where §6-8 are consistently filled — helpdesk is closest but only in the most recent phases.
Q3 answer: No — diagram collection has never been attempted.
User steering (post-Challenge): Review should be structured like the research flow — with explicit stages and cognitive mode transitions (Read → Verify → Challenge → Synthesize). This is a deeper insight than "add an audit step."
Stage complete: YES