## Demo
<img width="2624" height="1636" alt="image" src="https://github.com/user-attachments/assets/3d9c3dbf-9ef9-463b-beaa-4d7364c1bcff" />
Proves the dashboard's Anthropic dollars now tie out to the billed Cost API. Run by importing the changed service directly (no HTTP layer) against live Redshift — read-only: AICostsService.get_summary / get_time_series / get_by_bu / get_by_model / get_top_drivers, compared to ground truth SELECT SUM(amount) FROM core_finance.ai_spend_anthropic_cost_reports.
Backend — billed-dollar allocation (throwaway script run from klair-api/ with uv run; imports the service and calls each swapped consumer):
WINDOW 2026-05-13..2026-06-10========================================================================
1) Ground truth: billed Cost API total (SUM(amount)) = $801,242.15
2) get_summary().anthropic (direct + TF, allocated) = $801,242.14
TIE-OUT diff vs billed = $-0.02
openai (must be unchanged by this PR) = $411,129.34
claude_ai (separate provider, must be unaffected) = $136,437.99
3) get_time_series daily: 29 pts, duplicate keys=NONE
sum(anthropic) = $801,242.14 (diff vs summary $0.00)
4) get_by_bu sum(anthropic) = $801,242.14 (diff vs summary $0.00)
Central Engineering $ 199,787.32
DevFactory $ 123,352.05
Trilogy-Inc $ 106,604.41
Tech Super Builders $ 65,804.48
Core Education $ 43,306.68
5) get_by_model claude-fable-5 (was $0 pre-PR, no pricing row): $23,819.71
6) get_top_drivers (anthropic, top 5):
#1 Unknown $ 76,571.08
#2 Claude Code $ 76,216.94
#3 Claude Code $ 46,705.13
#4 Education $ 41,593.10
#5 Unknown $ 33,717.11
7) BU-filtered get_summary (Central Engineering) anthropic = $320,412.07 (sane: > 0, < total)
What this demonstrates:
- Full-bill tie-out (lines 1–2): dashboard Anthropic (direct allocation + rescaled TF slice) reproduces the billed total to within $0.02 (float rounding) over a 29-day window.
- claude-fable-5 recovered (line 5): previously $0 on the dashboard (no pricing-table row); now carries its billed $23,819.71.
- Line 7's value exceeds Central Engineering's by-BU row because BU-filtered summaries always include ANTHROPIC_SHARED_BUS — pre-existing semantics, unchanged by this PR.
Most at risk from this change:
1. TF slice double-count — direct consumers must exclude TF-flagged allocated rows while _tf_anthropic_* re-adds the rescaled slice; a regression here inflates totals by ~$177K. The $0.02 tie-out above proves it's absent, and test_direct_consumers_exclude_tf_flagged_allocated_rows guards it (verified to fail against pre-fix code).
2. Consumer divergence / chart breakage — all five swapped consumers share the new CTE; duplicate period keys or disagreement would break the dashboard. Lines 3–4: daily series = by-BU = summary to the cent, no duplicate keys.
3. Other providers — OpenAI ($411,129.34) and claude_ai ($136,437.99) flow through the same methods and are unchanged (line 2).
Verified via the scoped suite (uv run pytest tests/test_ai_costs_service.py -q):
153 passed in 0.37s
## Summary
Re-anchors the /ai-adoption dashboard's Anthropic dollars from the computed estimate (token counts × the hand-maintained ai_spend_token_pricing, materialized as ai_spend_claude_token_usage.total_cost_dollars) to the actual bill in core_finance.ai_spend_anthropic_cost_reports. Billed dollars are redistributed to BUs at query time via per-workspace API-key token shares — no new pipeline, no new table. After this change the dashboard's Anthropic total (direct + TrueFoundry) equals SUM(ai_spend_anthropic_cost_reports.amount) for any date window, to the dollar, while BU attribution still honors ai_spend_bu_overrides edits.
Specs 01–02 are backend-only (response envelopes reused verbatim); spec 03 adds a single client-side frontend change — a By BU roll-up in the Provider Detail panel (see below).
Linear ticket: [KLAIR-2878 — AI Spend: anchor Anthropic dashboard dollars to billed Cost API (redistribute billed $ to BUs via per-workspace API-key token shares)](https://linear.app/builder-team/issue/KLAIR-2878/ai-spend-anchor-anthropic-dashboard-dollars-to-billed-cost-api)
## Performance fix (spec 02)
Follow-up commit 2563f4e7e mitigates a performance regression the allocation CTE introduced. The CTE is ~10× heavier than the query it replaced (0.8s → 7.8s for a one-month window) and is embedded ~13× per dashboard load (each of the 5 consumers, plus every _tf_* helper re-runs the full allocation block via _tf_day_factor_with_block). Because the 5 dashboard endpoints were async def calling blocking Redshift code, FastAPI serialized the frontend's parallel requests on the event loop (~100s load) — so get_by_bu (the heaviest endpoint) was still pending when a user drilled into the Anthropic provider, leaving the "By Workspace" panel empty ("0 workspaces") even though the backend returns correct data.
Mitigation (allocation math untouched — direct + TF == SUM(amount) to the cent still holds):
- Threadpool concurrency — the 5 dashboard endpoints (summary, time-series, by-model, by-bu, top-drivers) are converted async def → sync def, so Starlette runs them concurrently instead of serializing on the event loop.
- Pool guardrail — a process-global bounded semaphore caps how many _execute_query calls hit the shared Redshift pool at once (≤ REDSHIFT_POOL_SIZE, default cap pool − 1) so the fan-out can't over-subscribe the pool and block on connection checkout.
- Result cache — a short-TTL ((query, params), default 300s) cache serves concurrent/repeat loads from memory and collapses the duplicate _tf_unmapped_billed lookups a single load fires. Disabled under tests via a conftest.py autouse fixture.
| Metric | Before (serialized) | After |
|--------|--------------------|-------|
| Cold parallel load | ~100s, by-bu never arrives → empty panel | 40s, all 5 endpoints succeed |
| Warm load (cache hit) | n/a | 0.9s (44×) |
| by-BU Anthropic tie-out | — | $822,205.55 vs billed $822,205.57 (−$0.02) |
| Anthropic workspace rows | 0 (panel empty) | 100 |
Trade-off: the first cold load (and loads after the TTL expires or for a new date range) is still ~40s but now reliably completes; repeat loads are instant. The deeper "compute the allocation once per load and aggregate in-app" refactor (collapses ~13× → 1×, ~8s cold) is deferred — documented in [spec 02](https://github.com/AI-Builder-Team/Klair/blob/feat/anthropic-billed-dollar-allocation/features/ai-spend-and-adoption/anthropic-billed-dollar-allocation/specs/02-dashboard-query-performance/spec.md).
## Frontend — By BU section (spec 03)
Commit 9ff3fa9ab adds the one frontend change: a By BU roll-up section in the Provider Detail drill-down panel, so spec 01's redistributed billed dollars are legible per BU. The panel previously showed only By Model and By Workspace — a reader had to mentally sum dozens of workspace rows to see each BU's share.
- Renders as §02, between By Model (§01) and By Workspace (now §03) — a natural drill-down order (provider → model → BU → workspace).
- Client-side only: rolls up the already-fetched by-bu response (the same provider-matched workspace rows the By-Workspace section uses) by BU — sum spend, count workspaces, % of provider total, sorted desc. No new request, no backend change.
- Reconciles to the cent with By Workspace (identical source rows).
- Provider-agnostic: keys on project.provider, so every provider's panel gets the section (OpenAI, Cursor, Bedrock, GCP, Azure, Anthropic), not Anthropic-only. Verified the backend yields the same per-BU total (each BU's anthropic_cost equals the sum of its anthropic-provider project costs).
- Extracts a shared RankedSpendTable rendered by both sections (net markup reduction).
klair-client/src/screens/AIAdoptionV2/components/panels/ProviderDetailPanel.tsx (+ .spec.tsx). Tests: existing single-table assertion scoped by aria-label; added BU roll-up + per-BU empty-state tests — 9 panel tests / 314 AIAdoptionV2 tests green; eslint / tsc / prettier clean. Documented in [spec 03](https://github.com/AI-Builder-Team/Klair/blob/feat/anthropic-billed-dollar-allocation/features/ai-spend-and-adoption/anthropic-billed-dollar-allocation/specs/03-provider-detail-by-bu-section/spec.md).
## Why (drift evidence)
Computed Anthropic dollars drift from what Anthropic actually bills:
- Zero-match models bill at $0. claude-fable-5 has a billed line but no pricing row — silently rendered as $0 on the dashboard (June 2026: ~$24K of real spend invisible).
- Prefix fallback bills new models at ancestor rates (cause of the Apr 2026 ~$70K remediation, KLAIR-2580).
- Computed ignores the negotiated 10% input/output discount → systematic ~+2% overstatement even when the pricing config is correct.
- Net drift: Apr computed +2.7% vs billed, May +0.6%, June running −5.0%.
The actual bill already lands daily in ai_spend_anthropic_cost_reports (~06:00 UTC, anthropic-cost-pipeline) but was only read by the raw-data report screens — no dashboard aggregate consumed it.
## What was implemented
All changes are inside klair-api/services/ai_costs_service.py: one new shared SQL builder plus call-site swaps in every dashboard read path.
- _anthropic_billed_allocation_cte(...) — a shared builder returning a WITH block + ordered params:
- cost_cells — billed dollars per (bu, workspace_id, model, context_window, report_date, token_type) from ai_spend_anthropic_cost_reports (cost_type='tokens').
- key_weights — the matching token column per api_key_id from ai_spend_claude_token_usage, summed per cell (carries is_truefoundry_routed), via a token_type → usage-column CASE map. Join keyed on the full cell including org bu with NULL-safe COALESCE on bu/workspace_id (the default workspace is NULL workspace_id in both tables for every org, so a bu-less key would smear one org's billed dollars across others).
- cell_totals — SUM(key_tokens) per cell.
- allocated — billed(cell) × key_tokens / cell_tokens at api_key_id grain, with the UNION ALL fallback cascade (tiers 0–3: full cell → drop context_window → workspace-day computed-cost share → org-bu passthrough) plus the non-token (web_search/session_usage) workspace-day split. Redshift has no FILTER clause — all conditional sums use SUM(CASE WHEN … THEN x END).
- Consumer swaps — get_summary (incl. its prior-period block), get_time_series, get_by_bu, get_by_model, get_top_drivers swap the direct-Anthropic FROM ai_spend_claude_token_usage … SUM(total_cost_dollars) aggregate for FROM (<allocation CTE>) … SUM(amount), keeping _get_anthropic_override_join + _build_effective_bu_filter(..., always_include=ANTHROPIC_SHARED_BUS) wiring identical (allocated rows keep api_key_id + bu grain).
- TF rescale — the _tf_anthropic_* helpers are multiplied by a per-day factor day_factor = tf_billed_day / tf_computed_day so the TF add-back equals the TF keys' allocated billed dollars. Days with TF billed $ but no TF usage rows route to Unmapped. Direct consumers keep ANTHROPIC_TF_EXCLUDE on the terminal CTE flag.
- get_filters — verified to have no dependency on the computed Anthropic cost column.
## Allocation model
- Cell = (org bu, workspace_id, model, context_window, report_date, token_type) over cost_type='tokens' rows.
- Weights = matching token column per api_key_id, summed per cell. token_type maps 1:1 to usage columns (uncached_input_tokens, output_tokens, cache_read_input_tokens, cache_creation.ephemeral_5m/1h_input_tokens → cache_creation_5m/1h_tokens).
- allocated(key, cell) = billed(cell) × key_tokens / cell_tokens. Weights are immune to pricing drift because price is uniform within a cell.
- Fallback cascade (defensive — May 2026 verified 100% of billed token $ matched at full cell grain): drop context_window → workspace-day computed-cost share → org-bu passthrough.
- Non-token billed rows (web_search, session_usage; ~$1K/mo): allocate by workspace-day computed-cost share.
- BU attribution semantics unchanged: allocated rows keep api_key_id grain, so effective_bu = COALESCE(o.bu_override, bu) and ANTHROPIC_SHARED_BUS always-include carry over verbatim. 58% of May billed $ sits in workspaces whose keys map to >1 effective BU — this genuinely redistributes, not just rescales.
- TF slice: the TF keys' allocated billed dollars become the TF slice total per day; re-attributed per BU by TF metered-Anthropic usage shares (the _tf_anthropic_* helpers × day_factor). Result: direct + TF = SUM(amount) for any window, to the dollar.
## Test coverage
tests/test_ai_costs_service.py — 159 pass:
- 27 new tests (spec 01): CTE structural contract, allocation math (multi-key cell split, fallback tiers, zero-weight cell), _bu_includes, TF day-factor / Unmapped routing, consumer swaps.
- 6 new tests (spec 02): TestDashboardQueryCache — result-cache hit/keying/TTL expiry, cross-instance sharing, disabled-cache refetch, concurrency-cap bound.
- 1 added regression guard.
- 3 legacy tf_claude_real tests updated to the new query contract.
- The autouse _neutralize_tf_claude_sources neutralization fixture is unchanged, keeping legacy ordered-mock tests valid.
## Self-review findings addressed
- IMPORTANT — spec/implementation contradiction on the TF exclude flag: fixed; spec synced to match the implementation.
- MINOR — computed_cost 5× ratio-only constraint: clarifying comment added.
- MINOR — days-metric edge case on TF-only days: accepted; the existing date-range fallback already covers it.
## Live verification
Live Redshift, window 2026-05-13 .. 2026-06-10:
| Check | Old/Reference | New (allocated) |
|-------|---------------|-----------------|
| Anthropic dashboard (direct + TF) | — | $801,242.14 |
| Billed SUM(amount) | $801,242.15 | tie-out within $0.02 (float rounding) |
| Daily series total | — | = by-BU = summary, to the cent |
| claude-fable-5 (no pricing row) | $0 | $23,819.71 |
| OpenAI (unaffected) | $411,129.34 | $411,129.34 (unchanged) |
| claude_ai (out of scope) | unaffected | unaffected |
## Expected dashboard shifts
Vs the old computed dashboard: May −0.6%, June +5% (the June jump includes claude-fable-5's previously-missing ~$24K). These are expected and correct — the dashboard now equals the bill.
## Out of scope
1. claude_ai provider line (Claude.ai Enterprise, separate table/API) — unchanged.
2. fct_ai_spend mart (022_fct_ai_spend.sql) — still computes from token usage; re-anchoring is a follow-up ticket.
3. MCP query_ai_spend — still reads computed usage; same follow-up.
4. Surtr pipelines — unchanged; both feeds land daily ~06:00 UTC at equal freshness.
5. Pricing-table fixes / drift detection (KLAIR-2580 Spec 2) — orthogonal; computed costs survive as allocation-fallback weights and for savings views.
6. Dashboard-wide BU pivot — the spec 03 By-BU section is scoped to the provider drill-down panel; a top-level "group by BU" for the whole Cost section is a separate change. (Backend response shapes are unchanged regardless.)
🤖 Generated with [Claude Code](https://claude.com/claude-code)