title: "Gather — "What do we NOT know?"" source: "tasks/TFW-42__research_cycle_restructure/research/gather.md"
Gather — "What do we NOT know?"¶
Parent: HL-TFW-42 Goal: Determine how TFW should present multi-agent research orchestration to users via iterations.yaml.
Dimensions¶
| Dimension | Alt A | Alt B | Alt C | Alt D |
|---|---|---|---|---|
| D1: Agent specification | Free-text name only (agent: antigravity) |
Structured profile (agent: {name, strengths, model}) |
Enum from registry (agent: antigravity validated against .tfw/agents.yaml) |
No agent field — implicit from context |
| D2: Iteration dependency | Linear chain (depends_on: [iter-1]) |
DAG (arbitrary depends_on graph) |
None (pure sequential by number) | Implicit (each reads predecessor RES) |
| D3: Framework guidance | Document-only (conventions describe multi-agent as possible) | Prompt-in-template (iterations.yaml template includes agent: with comment guidance) |
Active suggestion (plan.md Step 6b prompts coordinator to consider agent assignment) | Auto-detect (framework infers best agent from iteration focus) |
Findings¶
G1: Multi-agent framework landscape (external research)¶
Surveyed 4 major multi-agent frameworks (CrewAI, AutoGen, LangGraph, MetaGPT) + industry methodology patterns.
Key patterns discovered:
| Framework | Agent specification | Task-agent binding | Config format | Orchestration |
|---|---|---|---|---|
| CrewAI | YAML: role, goal, backstory, allow_delegation | agent: field on task YAML |
Separate agents.yaml + tasks.yaml |
Sequential or Hierarchical (code) |
| AutoGen | Python: name, description, llm_config | speaker_selection_method (auto/manual/custom) |
Code + JSON for LLM config | GroupChat manager selects dynamically |
| LangGraph | Python: node functions | Conditional edges (routing functions) | Code (state machine) | Graph-based with state schema |
| MetaGPT | YAML: role-specific LLM config | _watch / _act cycle (message-passing) |
config2.yaml for LLM, Python for structure |
SOP (Standard Operating Procedure) |
Critical distinction: All frameworks separate agent DEFINITION (who the agent is) from task ASSIGNMENT (which task the agent works on). CrewAI is the closest analogue to TFW's model — YAML config, explicit agent assignment per task.
TFW's position: TFW doesn't need agent definition (agents are external tools — Antigravity, Claude Code, Codex CLI). TFW only needs task assignment — which iteration gets which agent. This is simpler than all surveyed frameworks.
G2: AFD-2 empirical patterns (internal analysis)¶
From HL §2 and iterations.yaml focus field:
| Iteration | Agent | Focus | Why this agent |
|---|---|---|---|
| 1 | Antigravity | Initial investigation (web research + domain analysis) | Best at web search, multi-source synthesis, large context |
| 2 | Codex CLI | Code audit (Gradle structure analysis) | Fast file traversal, code-native, can run commands |
| 3 | Codex CLI | Deeper code audit (module dependency mapping) | Continuation of iter 2, same tool strengths |
| 4 | Antigravity | Architecture synthesis (26-module design) | Required web research (Gradle best practices) + large synthesis |
| 5 | Claude Code | Server reconnaissance (production infra audit) | MCP server access, interactive SSH, real-time queries |
| 6 | Antigravity | Domain modeling (port-interface contracts) | Large context window needed for cross-module design |
| 7 | Antigravity | Build-logic research (convention plugins) | Web research on Gradle conventions + synthesis |
| 8 | Antigravity | Final synthesis (implementation strategy) | Cross-iteration synthesis requiring full context |
Patterns observed: 1. Agent choice correlates with iteration TYPE, not with sequential position 2. Three agent archetypes emerge: Web Researcher (Antigravity), Code Auditor (Codex), Infra Operator (Claude Code) 3. Consecutive iterations with same agent (2→3, 6→7→8) suggest "continuation" is a natural pattern 4. Agent switching happened at domain boundaries, not arbitrary points 5. No iteration used more than one agent — agent is per-iteration, not per-stage
G3: TFW coordinator workflow analysis (internal)¶
Current plan.md Step 6b creates iterations.yaml. The coordinator decides:
- How many iterations
- What each iteration investigates (focus + hypotheses)
- Status tracking
Missing from current workflow: - No prompt to consider agent assignment - No guidance on WHEN different agents are useful - No mechanism to express "this iteration needs web research" vs "this needs code audit"
The agent field in HL-TFW-42's proposed schema is a string — free text. This matches AFD-2's actual usage where agents were named informally.
G4: Industry methodology patterns (external research)¶
Supervisor-Worker is the dominant pattern for high-stakes multi-agent research: - Central coordinator decomposes work - Specialized workers execute subtasks - Results flow back to coordinator for synthesis
This maps directly to TFW's model: Coordinator (plan.md) → Researcher (research/base.md) → back to Coordinator.
Human-in-the-loop patterns: - "Pause points" / structured gates — TFW already has these (🛑 WAIT gates) - "Progressive autonomy" — TFW supports via CL/AG modes - "Role-based specialization" — TFW has Role Lock
Key insight from methodology research: The industry consensus is that agent selection should be coordinator-driven with guidance, not automated. Auto-detection requires capability models that don't exist yet for AI coding tools. Best practice: provide a decision framework, let the human/coordinator choose.
G5: Counter-evidence — when agent field adds overhead (deep mode requirement)¶
Counter-argument: Most TFW projects use a single agent. Adding agent field creates noise for the common case.
Evidence for counter:
- Of all TFW-42's predecessor tasks (TFW-2 through TFW-41), ALL used a single agent (Antigravity or Claude Code)
- Multi-agent research is observed only in AFD-2 (a large, multi-domain project)
- Single-agent projects would write agent: antigravity on every iteration — pure noise
Mitigation found in HL §7 P5: "Optional enrichment — new iterations.yaml fields are optional." This principle already addresses the counter-argument. The field exists for those who need it, doesn't burden those who don't.
Checkpoint¶
| Found | Remaining |
|---|---|
| 3 dimensions identified with 4 alternatives each | None — space well-mapped |
| 4 external frameworks analyzed + 1 methodology survey | None |
| AFD-2 empirical data mapped (8 iterations, 3 agent archetypes) | None |
| Counter-evidence found and addressed (single-agent overhead) | None |
Sufficiency: - [x] External source used? (5 web searches: CrewAI, AutoGen, LangGraph, MetaGPT, methodology patterns) - [x] Briefing gap closed? (H1 evidence gathered from both external and internal sources) - [x] Dimensions identified? (3 dimensions × 4 alternatives each)
Deep mode criteria: - [x] Hypothesis tested? (H1 — agent field in iterations.yaml vs separate mechanism — evidence gathered) - [x] Counter-evidence sought? (G5: single-agent overhead)
Metacognitive check: I discovered something NEW — the distinction between agent definition and task assignment. TFW doesn't need agent profiles (unlike CrewAI/MetaGPT) because agents are external tools. This simplifies the design significantly. I also discovered the 3-archetype pattern from AFD-2 data, which is actionable for framework guidance.
Stage complete: YES → User decision: proceed (autonomous mode)