Gather — Iteration 3¶

Parent: HL-TFW-38 Goal: (A) TS over-specification audit, (B) Naming matrix for all review terms.

Thread A: TS Over-Specification Audit¶

G1: TS Template vs Real TS Content¶

TS template (.tfw/templates/TS.md) defines §4 as:

## 4. Detailed Steps
### Step 1: {title}
{What to do, with code examples if relevant}

The template says "with code examples if relevant" — which is permissive, not prescriptive. It doesn't say "write the complete implementation."

G2: HD PhaseD TS — 1036 lines, 41KB¶

Content classification:

Section	Lines	Content Type
§1 Objective	3	Requirement (WHAT)
§2 Scope	25	Requirement
§3 Affected Files	16	Design direction
§4 Step 1 Common Schemas	27	FULL IMPLEMENTATION — complete Python classes
§4 Step 2 Ticket Schemas	121	FULL IMPLEMENTATION — every field, every type hint, every docstring
§4 Step 3 SLA Engine	194	FULL IMPLEMENTATION — complete class with all methods, algorithms, edge cases
§4 Step 4 Ticket Repository	134	FULL IMPLEMENTATION — every CRUD method, advisory lock, pagination
§4 Step 5 Ticket Service	260	FULL IMPLEMENTATION — complete business logic, state machine, SLA integration
§4 Steps 6-8 API Routes	100	FULL IMPLEMENTATION — partial with Python snippets
§4 Steps 9-10 Tests	20	Requirement (what to test)
§5 AC	34	Requirement

Total code in TS: ~736 lines of Python out of 1036 total. 71% of the TS is ready-to-paste code.

G3: HD PhaseF TS — 356 lines, 14KB¶

Content classification:

Section	Lines	Content Type
§1 Objective	3	Requirement
§2 Inputs	12	Context
§3 Steps 1-2 Mat views + schemas	95	CODE — DDL specs, complete Pydantic schemas
§3 Steps 3-4 Analytics Repo + Service	38	Mixed — method signatures + formulas, not full code
§3 Step 5 Report Service	8	Requirement (method list)
§3 Step 6 Workers	70	FULL CODE — complete asyncio workers with SQL
§3 Steps 7-10 Routes + Tests	70	Requirement (endpoint table, test list)
AC	35	Requirement

Code ratio: ~200 lines of code / 356 total = 56%. Still more code than spec, but closer to "design direction with examples."

G4: Atamat TFW-16 PhaseH TS — 546 lines, 19KB¶

Content classification:

Section	Lines	Content Type
§0-1 Scope + Manifest	38	Requirement/Design
Step 1 geocode_place	86	FULL CODE — complete Python function
Step 2 find_route	102	FULL CODE — complete Python function
Step 3 Agentic Loop integration	90	MIXED — code snippets at specific line numbers
Step 4 Prompt instructions	22	FULL CODE — exact XML block
Step 5 Widget frontend	118	MIXED — TypeScript types + pseudocode + CSS
AC + Verification	50	Requirement

Code ratio: ~300 / 546 = 55%.

G5: Atamat TFW-16 PhaseF2 TS — 512 lines, 16KB¶

Content classification:

Section	Lines	Content Type
§0-1 Scope + Manifest	30	Requirement/Design
Step 1-2 isLoading + thinking	72	FULL CODE — TSX, TypeScript
Step 3 RouteStrip	80	PSEUDOCODE — structure + state logic, not exact code
Step 4-5 Integration + CSS	152	FULL CSS CODE
AC + Verification	60	Requirement

Code ratio: ~250 / 512 = 49%.

G6: TS Over-Specification Summary¶

TS	Total lines	Code lines	Code %	Type
HD PhaseD	1036	~736	71%	FULL IMPLEMENTATION
HD PhaseF	356	~200	56%	MIXED (formulas + code)
Atamat PhaseH	546	~300	55%	FULL IMPLEMENTATION (tools)
Atamat PhaseF2	512	~250	49%	MIXED (pseudo + CSS)
Average	~612	~371	58%	More code than spec

Conclusion: On average, 58% of TS content is ready-to-paste code. HD PhaseD is the extreme case at 71% — the coordinator essentially wrote the entire implementation. The executor's job in such cases is to copy-paste from TS, adjust for runtime issues, and fill in gaps.

Token cost: HD PhaseD TS = 41KB. At ~4 chars/token, that's ~10,250 tokens. If 71% is code the executor would have written anyway, ~7,300 tokens were "double-spent" (coordinator writes it, executor reads it and types it).

Thread B: Naming Matrix¶

G7: TFW Naming Principles (from .tfw/README.md §Values)¶

Key value: "Naming Creates Behavior" (line 104-106):

"Right terminology triggers right associations in AI agents. A small prompt with precise terms is more effective than a long prompt with explanations. TFW adopted OODA, Sufficiency Verdict, Trust Protocol, Progressive Disclosure — each term replaced paragraphs of instructions. If you have to explain what a step does, the step is named wrong."

This is the ultimate filter: if the user has to ask "what does this mean?" — the name failed.

User feedback: - "prose мне непонятно вообще" → FAIL - "comprehend мне тоже непонятно что это вообще" → FAIL - "между assess и judge мне больше нравится judge, намерение явное" → assess is unclear, judge is direct

G8: Existing TFW Naming Patterns¶

Current Term	What It Triggers	Works?
Gather	"Collect data"	✅ Clear, active verb
Extract	"Pull out patterns"	✅ Clear, active verb
Challenge	"Test, push back"	✅ Clear, active verb
Briefing	"Preparation, plan"	✅ Military metaphor, understood
Onboarding	"Learning the context"	✅ Industry standard
Handoff	"Passing to someone"	✅ Clear metaphor
Role Lock	"Can't change role"	✅ Structural, enforcement
Verdict	"Final decision"	✅ Legal metaphor, decisive

Pattern: TFW uses short, active, metaphorical terms from established disciplines (military, legal, engineering). The best terms are 1-2 syllables and need no explanation.

G9: Candidate Matrix — Review Stages¶

Current proposal tested in iter 2: Comprehend → Verify → Assess → Synthesize

Candidate	Syllables	Meaning Clear?	User Test	Parallel Discipline	Score
Comprehend	3	"Understand deeply" — pretentious	❌ "непонятно"	Academic	2/5
Read	1	"Look at the document"	✅ Obvious	—	4/5 but too passive
Scan	1	"Quick look through"	✅ Clear	Military/medical	3/5 too shallow
Orient	3	"Get bearings"	⚠️ Known from OODA, but alone unclear	Military (OODA)	3/5
Study	2	"Read carefully"	✅ Clear	Academic	3/5 too passive
Map	1	"Build mental map of what was done"	✅ Metaphorical, active	Navigation	4/5
Verify	3	"Check if claims are true"	✅ Universal	Engineering/audit	5/5
Check	1	"Quick inspection"	✅ Simple	—	4/5 too casual
Audit	2	"Systematic examination"	✅ Clear, strong	Financial/engineering	4/5
Assess	2	"Evaluate quality" — vague	⚠️ "между assess и judge"	HR/education	3/5
Judge	1	"Make a decision about quality"	✅ User prefers	Legal	5/5
Rate	1	"Assign score"	⚠️ Too numerical	—	2/5
Weigh	1	"Consider pros/cons"	✅ Metaphorical	Legal	3/5
Synthesize	4	"Combine to produce whole" — academic	⚠️ Long	Academic	3/5
Decide	2	"Make the call"	✅ Clear, active	Management	4/5
Close	1	"Finish, wrap up"	✅ Clear	Project management	4/5

Top candidates per stage:

Stage 1 (understand): Map (1 syllable, active, metaphorical) Stage 2 (check evidence): Verify (clear, no alternatives) Stage 3 (quality judgment): Judge (user preference, 1 syllable, direct) Stage 4 (decide + capture): Decide (2 syllables, active) or Close (1 syllable, clear)

G10: Candidate Matrix — Review Modes¶

Current proposal: code, prose, spec

Candidate	For Code Tasks	For Writing Tasks	For Analytical Tasks
A: code / prose / spec	✅ clear	❌ user rejects "prose"	⚠️ OK
B: code / content / analysis	✅	⚠️ vague	⚠️ confusable with data analysis tasks
C: build / write / analyze	✅ active	✅ clear verb	⚠️ confusable
D: code / docs / research	✅	✅ simple	⚠️ research collides with RES
E: implementation / deliverable / specification	✅ formal	⚠️ long	⚠️ long
F: dev / text / study	✅ short	✅ short	⚠️ unclear

"prose" alternatives for writing tasks: - text — too generic - content — vague - docs — clear, short, understood - write — active verb, matches TFW pattern - creative — wrong connotation

Best option: modes should match the REVIEW template variant name. They'll appear in PROJECT_CONFIG.yaml and be referenced in workflow text. Short, memorable.

Checkpoint¶

Found	Remaining
TS are 49-71% ready-to-paste code — confirmed over-specification (G2-G6)	Need to determine: is this a PROBLEM or a FEATURE?
"Naming Creates Behavior" is the filter — if you must explain it, it's wrong (G7)	—
"comprehend" and "prose" fail the user test (G9, G10)	Need final naming decision
Top candidates: Map → Verify → Judge → Decide (G9)	Need challenge validation
Mode naming unclear — docs/write are top contenders for writing mode (G10)	Need final decision

Sufficiency: - [x] External source used? (ISO model from iter2, TFW values) - [x] Briefing gap closed? (Both threads gathered)

Stage complete: YES → User decision: ___