## Screenshots
<img width="1916" height="941" alt="image" src="https://github.com/user-attachments/assets/ac4648e0-36ac-4359-a6b7-af13e7811e80" />
## Summary
End-to-end chat-driven editing loop for Budget Bot 4.0:
- B3.1 — Register four section-editing tools with Claire's chat call (regenerate_section, rewrite_section, add_comment, update_table_cell) so she can propose concrete edits instead of only narrating advice.
- B3.4 + B3.5 — Render those proposals as inline cards in the chat panel with Accept / Reject buttons; Accept routes to the right backend endpoint per tool and the editor preview auto-updates.
- B3.7 — SectionComment model + endpoints + per-section open-comment badge in the SectionNav so add_comment Accept actually persists annotations.
Verified live on Skyvera Q2: regenerated the Goals section conversationally, editor refreshed in place without losing scroll, comment created and badge appeared.
## Why it's needed
After PR #2684 (Tester Sprint), Claire could talk about findings and section content but couldn't *do* anything — if a tester clicked a finding and asked her to fix it, she gave narrative advice and stopped. That was the gap in the "Cursor for documents" promise the editor architecture is built around.
This PR closes the gap end-to-end:
| After commit | Capability |
|---|---|
| B3.1 backend | Claire emits structured tool_use blocks alongside text (visible in API, invisible in UI) |
| B3.4 / B3.5 / B3.7 | Tool calls render as inline cards; Accept routes to production endpoints; add_comment has somewhere to land, plus a badge that advertises open comments |
| B3.4 follow-ups | Editor preview auto-refreshes after Accept; regenerated sections no longer ship 20+ citation footnotes; long regen shows a clearer "1–2 minutes" hint |

The tool count grew from the original 3 in the backlog spec to 4 because we added regenerate_section, which delegates to the existing production pipeline (_regenerate_section: full DataPackage + section-specific prompts + brainlift). The tool descriptions tell Claire to prefer it over inline rewrite_section for substantive rewrites, since her chat-only context lacks the data the pipeline already has.
## Changes
### Backend — B3.1 tool registration
- New klair-api/budget_bot/board_doc/claire_tools.py: per-tool Pydantic input validators, CLAIRE_TOOLS Anthropic schema list with curated descriptions, ToolCall model preserving the tool_use_id, and parse_tool_calls(response) that extracts validated proposals (unknown tool names + invalid input dicts are logged and dropped — a single bad proposal never crashes the chat reply).
- _create_message_sync grew an optional tools= kwarg (additive; all existing callers omit it).
- handle_chat registers CLAIRE_TOOLS on every chat call, parses tool calls from the response, and stashes them on data["tool_calls"] only when present (text-only replies are byte-identical to pre-B3.1).
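For reviewers, a minimal sketch of the drop-don't-crash parsing described above. It is not the code in claire_tools.py: it uses stdlib dataclasses in place of the Pydantic validators, takes the response's content blocks as plain dicts (the shape of Anthropic `tool_use` blocks: `type`, `id`, `name`, `input`), and the required fields per tool and the `attach_tool_calls` helper are hypothetical.

```python
import logging
from dataclasses import asdict, dataclass
from typing import Any, Callable

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ToolCall:
    tool_use_id: str  # kept so Accept/Reject can be tied back to the proposal
    name: str
    input: dict[str, Any]


def _require_str_fields(*fields: str) -> Callable[[dict[str, Any]], bool]:
    """Build a validator that checks each field is a non-empty string."""
    def validate(payload: dict[str, Any]) -> bool:
        return all(isinstance(payload.get(f), str) and payload[f] for f in fields)
    return validate


# Hypothetical required fields; the real per-tool validators may differ.
VALIDATORS = {
    "regenerate_section": _require_str_fields("section_id", "feedback"),
    "rewrite_section": _require_str_fields("section_id", "new_content"),
    "add_comment": _require_str_fields("section_id", "body"),
    "update_table_cell": _require_str_fields("section_id", "new_value"),
}


def parse_tool_calls(content_blocks: list[dict[str, Any]]) -> list[ToolCall]:
    """Return validated proposals; log and drop anything unknown or malformed."""
    calls: list[ToolCall] = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks pass through untouched elsewhere
        name, payload, tool_id = block.get("name"), block.get("input"), block.get("id")
        validator = VALIDATORS.get(name)
        if validator is None or not isinstance(tool_id, str):
            logger.warning("dropping unknown or unidentified tool call %r", name)
            continue
        if not isinstance(payload, dict) or not validator(payload):
            logger.warning("dropping invalid input for tool %r", name)
            continue
        calls.append(ToolCall(tool_use_id=tool_id, name=name, input=payload))
    return calls


def attach_tool_calls(data: dict[str, Any], calls: list[ToolCall]) -> dict[str, Any]:
    """Add data["tool_calls"] only when present, so text-only replies are unchanged."""
    if calls:
        data["tool_calls"] = [asdict(c) for c in calls]
    return data
```

The point of the shape is that one bad block costs one proposal, never the whole chat reply, and a reply with no valid proposals carries no extra key at all.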
### Backend — B3.7 section comments
- SectionComment Pydantic model: comment_id / section_id / paragraph_text / body / author (default "claire") / created_at / status ("open" | "resolved").
- WizardSession.section_comments: list[SectionComment] (flat list with section_id field — easier than per-section dict and makes badge-count a single pass).
- Three new endpoints under /board-doc/wizard/{id}:
  - POST /sections/{section_id}/comments: create
  - GET /comments: list (open + resolved; client filters)
  - PATCH /comments/{comment_id}: soft-resolve (no hard delete)
- All write paths use save_with_merge_retry (CF17 pattern) for cross-process safety.
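To illustrate why the flat list makes the badge count a single pass, here is a rough sketch. It uses a stdlib dataclass as a stand-in for the Pydantic model; the uuid-based comment_id and the helper names `open_comment_counts` and `resolve_comment` are assumptions for illustration, not the shipped code.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class SectionComment:
    section_id: str
    paragraph_text: str
    body: str
    author: str = "claire"
    status: str = "open"  # "open" | "resolved"
    # Assumption: ids are random hex strings; the real id scheme is not shown in this PR.
    comment_id: str = field(default_factory=lambda: uuid4().hex)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def open_comment_counts(comments: list[SectionComment]) -> Counter:
    """One pass over the flat list yields the per-section badge counts."""
    return Counter(c.section_id for c in comments if c.status == "open")


def resolve_comment(comments: list[SectionComment], comment_id: str) -> bool:
    """Soft-resolve: flip the status and keep the record. Returns False if not found."""
    for comment in comments:
        if comment.comment_id == comment_id:
            comment.status = "resolved"
            return True
    return False
```

Because resolving only flips a status, the list endpoint can keep returning open and resolved comments and leave filtering to the client.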
### Backend — B3.4 follow-up: citation strip on regen
- generate_custom_section had a hardcoded "Cite data sources in footnotes." in its system prompt that _resolve_system_prompt couldn't strip (it only swaps SHARED_SUFFIX_WITH_CITATIONS → SHARED_SUFFIX_NO_CITATIONS). Made the inline directive conditional on spec.include_citations (default False in the wizard flow).
- _regenerate_section now runs strip_citations_and_gaps as a belt-and-suspenders safety net before persisting (previously only the assembler did this on full publish). Users never see citations even if the LLM ignores the prompt fix.
- Fixed a pre-existing bug in strip_citations_and_gaps: the inline [N] strip ran BEFORE the legend-block strip, which removed the anchor the legend regex relies on and left orphan source names ("--- Redshift: arr_snowball_data GSheets: …"). The two passes are now reordered so the legend block is stripped first. The bug pre-dated this PR but only manifested once _regenerate_section started invoking the function.
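The ordering bug is easier to see with a toy reproduction. The two regexes below are hypothetical stand-ins (the real patterns in strip_citations_and_gaps are not shown in this PR), and the sample text is made up; the sketch only demonstrates why stripping inline markers first defeats a legend pattern that anchors on those same markers.

```python
import re

# Hypothetical patterns, for illustration only.
# Legend block: a "---" rule followed by one or more "[N] source" lines at the end.
LEGEND_BLOCK = re.compile(r"\n---\n(?:\[\d+\][^\n]*\n?)+\s*\Z")
# Inline marker: "[N]" plus any whitespace before it.
INLINE_MARKER = re.compile(r"\s*\[\d+\]")


def strip_legend_first(text: str) -> str:
    """Fixed order: remove the legend while its [N] anchors still exist."""
    text = LEGEND_BLOCK.sub("", text)
    return INLINE_MARKER.sub("", text).rstrip()


def strip_inline_first(text: str) -> str:
    """Buggy order: the [N] anchors are gone before the legend pattern runs."""
    text = INLINE_MARKER.sub("", text)
    return LEGEND_BLOCK.sub("", text).rstrip()


SAMPLE = (
    "ARR grew 12% [1] while churn fell [2].\n"
    "---\n"
    "[1] Redshift: arr_snowball_data\n"
    "[2] GSheets: churn_tracker\n"  # made-up sheet name
)
```

With this sample, `strip_legend_first(SAMPLE)` leaves only the sentence, while `strip_inline_first(SAMPLE)` leaves the rule and the orphaned source names behind, which is the same symptom as the one quoted in the bullet above.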
### Frontend — B3.4 / B3.5 proposal rendering + Accept routing
- New ChatToolProposal component with per-variant body:
  - rewrite_section: word-diff (when prior content is available) or preview (when not).
  - regenerate_section: feedback summary + a "1–2 minutes" status hint while busy (the regen call routinely takes 60–120s; without this, users assume the click hung).
  - add_comment: paragraph anchor + comment body.
  - update_table_cell: coordinates + new value.
- Inline diffWords LCS utility (~60 LOC, no npm dep — diff package install hit a transient npm error and the algorithm is small enough to own).
- ChatPanel renders proposals below assistant message bubbles, filtered by resolvedToolIds so Accept/Reject removes the card.
- Accept routing per tool:
  - rewrite_section → existing updateSection PUT (B2.4)
  - regenerate_section → existing wizardRegenerate POST with action=regenerate_section
  - add_comment → new createSectionComment POST (B3.7)
  - update_table_cell → shows a "land in a follow-up" notice; there is no backend yet for surgical markdown-table cell mutation
- markToolCallResolved on the wizard hook hides resolved cards (append-only resolved-id list per message; it is never cleared, so a re-render can't resurface a dismissed proposal).
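For context on the word-diff, here is the general LCS technique as a short sketch. The shipped diffWords utility is TypeScript and its source is not reproduced here; this Python version assumes whitespace-separated tokens and a hypothetical `(op, word)` output shape, so treat it as an illustration of the algorithm rather than a port.

```python
def diff_words(old: str, new: str) -> list[tuple[str, str]]:
    """Word-level diff via a longest-common-subsequence table."""
    a, b = old.split(), new.split()

    # lcs[i][j] holds the LCS length of a[i:] and b[j:], filled from the end.
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, emitting equal / removed / added words.
    ops: list[tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("equal", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            ops.append(("removed", a[i]))
            i += 1
        else:
            ops.append(("added", b[j]))
            j += 1

    # Whatever is left on either side is a pure removal or addition.
    ops.extend(("removed", word) for word in a[i:])
    ops.extend(("added", word) for word in b[j:])
    return ops
```

The table is O(n·m) in the word counts of the two inputs, which is fine for section-sized text and is part of why the utility is small enough to own.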
### Frontend — B3.4 follow-up: editor refresh after Accept
- New DocumentEditorActions.refetchSection(sectionId) action.
- useDocumentEditor.refetchSection re-fetches one section, updates the autosave baseline so the refresh isn't diffed as a stale-overwrite candidate, splits the current document, swaps just the target section's content, re-assembles. Preserves scroll position, cursor, and unsaved edits in OTHER sections — meaningfully nicer than a reloadNonce bump that would full-remount the editor.
- handleSectionUpdated calls it after every successful Accept.
- No-op (with logged warn) on unknown id or fetch failure so a refresh failure can never wedge the editor into a half-updated state.
### Frontend — B3.7 comment badges
- SectionNav grew an openCommentsBySection prop and renders a small MessageCircle badge with the open-comment count per section.
- DocumentEditorPage fetches comments on mount and re-fetches after each successful Accept.
- New apiClient.patch helper (was missing — only get / post / put / del existed).
### Frontend — infrastructure: vitest config
- Set pool: 'forks' + isolate: true in vitest.config.ts. This eliminates a vitest 4 SWC parallel-transform race that intermittently crashed BoardDoc spec files with SyntaxError: missing ) after argument list. Symptom: identical commands produced sometimes 0 failures, sometimes 5+, while every spec passed in isolation. This is the same race the _smoke-suffix workaround in DocumentEditorPage.smoke.spec.tsx was originally added for. Wall time is the same as the racy default (~2.4s).
## Breaking changes
None. Every change is additive or strictly safer:
- _create_message_sync(tools=...) is optional with None default.
- handle_chat returns the same StepResponse shape; data["tool_calls"] only appears when Claire emits tool calls.
- WizardSession.section_comments defaults to []; existing sessions deserialise cleanly.
- DocumentEditorActions.refetchSection is a new field; pre-existing callers that only used scrollToSection keep working unchanged.
- SectionNav.openCommentsBySection is optional; pre-B3.7 callers omit it and no badge renders.
- ChatPanel props for tool-call rendering (sessionId, getToken, currentContent, onResolveTool, onSectionUpdated) are all optional; without them, proposals don't render and the panel behaves exactly as before.
## Test plan
- [x] Backend full suite — uv run pytest tests/board_doc -q → 875 / 875 pass (~85s). New: 32 in test_claire_tools.py, 6 in test_chat_tool_calls.py, 12 in test_section_comments.py, 4 in test_regenerate_citation_strip.py. Pre-existing 821 unchanged.
- [x] Frontend full BoardDoc suite — npx vitest run src/screens/BoardDoc → 103 / 103 pass on two consecutive runs (was flaking 0–10 failures pre-vitest-config-fix). New: 8 for diffWords, 13 for ChatToolProposal, 5 for useDocumentEditor.refetchSection. Pre-existing 77 unchanged + 3 small mock additions for listSectionComments.
- [x] ruff check + ruff format --check clean on all touched backend files.
- [x] tsc --noEmit clean.
- [x] eslint --max-warnings 0 clean on all touched FE files.
- [x] Manual verification (Skyvera Q2, Apr 29): opened editor → asked Claire to regenerate the Goals section → proposal card rendered with regen feedback → clicked Accept → "1–2 minutes" hint appeared → after ~2 minutes the regenerated section auto-appeared in the editor preview *with no page reload*, *no citation footnotes*, *and scroll position preserved*. Same loop verified for the Prior Quarter Review section.
- [ ] Pending external review — once internal review fixes land.
## Follow-ups (next branches)
- B3.4-fu — Plumb live editor section content into ChatPanel.currentContent so rewrite_section always renders as a real word-diff (today it falls back to preview when prior content isn't passed).
- B3.5-fu / new — update_table_cell Accept handler. Needs a spec for surgical markdown-table mutation.
- C4.4 — "Address with Claire" button per finding (now unblocked by B3.1 + B3.4 + B3.5): pre-fills chat with finding context so Claire proposes a regenerate_section or rewrite_section.
- B5.4 — Claire artifact tool (attach_data_visualization); also unblocked by B3.1 (slots into the same tool-registration path).
- B3.6 — Quick action buttons in chat ("Rewrite this section", "Make more specific", etc.). Lone remaining B3 item, deferred per Marcus's call as the lowest-impact in the B3 set.
## Known cosmetic noise (non-blocking)
Local Windows dev sees harmless ConnectionResetError [WinError 10054] tracebacks from _ProactorBasePipeTransport._call_connection_lost after CORS preflight requests. Long-standing CPython + Proactor event loop quirk; production runs on Linux's SelectorEventLoop which never hits this path. HTTP layer never sees it (every preflight + actual request returns the right status). Discussed Apr 29 — left as-is rather than adding event-loop-policy startup config for a purely cosmetic local-dev annoyance.