Vol. I  ·  No. 143  ·  Established 2026  ·  AI-Generated Daily  ·  Free to Read  ·  Free to Print

The Trilogy Times

All the news that's fit to generate  —  AI • Business • Innovation
SATURDAY, MAY 23, 2026  ·  Powered by Anthropic Claude  ·  Published on Klair  ·  Trilogy International © 2026

THE CHIP BAN BACKFIRES — CHINA RUNS AI ON THE CHEAP

DeepSeek trained frontier models without the premium silicon Washington banned, and the industry's cost moat just cracked wide open.

SAN FRANCISCO — A Chinese AI outfit called DeepSeek has Silicon Valley sweating this week, having trained top-tier language models on the cheap and without the premium Nvidia silicon Washington spent two years keeping out of Chinese hands.

The dispatches in The Wall Street Journal have Valley engineers calling DeepSeek's work "amazing and impressive," praise the Valley rarely hands to foreign competitors.

The why of it cuts deep. American AI runs on one thesis: billions of dollars and warehouses of high-end chips buy a moat no rival can swim. DeepSeek's models, trained under sanction and on lesser hardware, suggest that moat is shallower than advertised.

The how remains partly murky. The company claims it built its frontier model for a fraction of what OpenAI and Anthropic spend per training run. Skeptics counter that Beijing's books are not famously transparent and that the comparison numbers may not be apples to apples.

Wall Street did not wait for the audit. AI-adjacent issues took the news on the chin Monday morning as traders priced in the chance that the cost curve for training models is bending faster — and farther — than anyone modeled. Nvidia, the chipmaker that rode the AI wave to a three-trillion-dollar perch, drew the heaviest fire.

The irony is not lost on Washington. The chip export ban was drawn up to slow exactly this kind of progress. Instead it appears to have pushed Chinese engineers to do more with less, which is the oldest engineering trick in the book.

ELSEWHERE on the AI wires, the news rolled fast.

Reid Hoffman, the LinkedIn co-founder, raised $24.6 million for Manas AI, a cancer-research startup he is launching with Siddhartha Mukherjee, the oncologist who wrote "The Emperor of All Maladies." The pitch is direct: AI hunting drug candidates faster than wet labs.

In Berlin, a young firm called Peec doubled its annualized revenue in months to $10 million, sources told reporters. Peec sells software that tracks how brands turn up in AI-generated search results — a market that did not exist eighteen months ago.

The thread tying it all together: AI is no longer one Olympic event in Northern California. It is a global field. The upstarts are getting cheaper, faster, and harder to box in — and the Valley, for the first time in a stretch, looks like the one playing catch-up.

What to Know About China's DeepSeek AI  ·  Tech, Media & Telecom Roundup: Market Talk  ·  Silicon Valley Is Raving About a Made-in-China AI Model

The AI Map Is Being Redrawn — and Not Just by Washington and Beijing

Middle powers, Latin American fault lines, and a three-bloc world are reshaping who controls artificial intelligence's future.

BRUSSELS — The old binary is crumbling. For years, the geopolitics of artificial intelligence ran on a single axis: Washington versus Beijing, silicon versus silicon, democracy versus autocracy. That frame is still useful. It is no longer sufficient.

A cluster of analyses published this week makes the case, from different latitudes, that the AI power map has grown more complicated — and more interesting.

The broadest frame comes from analysts at iari.site, who argue the global AI order has settled into three distinct blocs: the United States innovates, China replicates and subsidizes, and the European Union regulates. Each strategy carries its own risks. American dominance in foundation models is real but brittle — concentrated in a handful of labs, dependent on Taiwanese chips, and increasingly scrutinized at home. China's replication strategy has proven more capable than Western analysts expected; DeepSeek changed that conversation in January. The EU's regulatory posture, meanwhile, may yet prove visionary or may simply export compliance costs to everyone else.

Below the tier-one powers, the story gets richer. A Eurasia Review analysis argues that middle powers — India, the UAE, Saudi Arabia, South Korea, and others — are no longer passive recipients of AI technology. They are building sovereign compute capacity, negotiating data-localization terms, and positioning themselves as swing votes in standards bodies. The country that hosts the data center has leverage the country that writes the algorithm may not anticipate.

In Latin America, the dynamics are sharper and more immediate. The Latin America Risk Report identifies five pressure points: election manipulation via synthetic media, AI-enabled surveillance by governments with weak accountability structures, labor displacement in export-dependent economies, cross-border data flows that escape any single regulator, and the concentration of AI infrastructure in foreign hands.

A separate academic analysis from Frontiers raises the hardest question of all: whether AI systems trained on ideologically homogeneous data sets can carry authoritarian assumptions across borders, embedded in products that look, on the surface, merely useful.

The geography of this story is not abstract. It lives in server farms outside Riyadh, in fiber cables crossing the Pacific, in the fine print of trade agreements being drafted right now. The map is being redrawn. The cartographers are not who they used to be.

Geopolitics: AI and China; enabling ideology? - Frontiers  ·  USA Innovates, China Replicates, EU Regulates: Geopolitics o  ·  Five ways AI impacts geopolitical risk in Latin America - La

Big Tech's AI Moat Strategy: Cooperation on Security, Competition on Architecture

As Ai2 bets on open-source to challenge closed frontier models, OpenAI, Google, and Anthropic quietly coordinate to block Chinese model cloning.

SAN FRANCISCO — The AI industry is running two contradictory plays simultaneously: fierce architectural competition among frontier labs and quiet cooperation on national security concerns, while a nonprofit research institute tries to blow the whole closed-model structure open.

On the cooperation front, OpenAI, Anthropic, and Google have been coordinating efforts to prevent Chinese actors from cloning their proprietary models — a rare instance of direct rivals sharing intelligence on adversarial threats. The specifics remain opaque, but the alignment signals that model weights and training methodologies are now treated as strategic assets, not just commercial IP.

Meanwhile, the Allen Institute for AI (Ai2) released an open-source web agent designed to perform browser-based tasks at a level competitive with closed systems from all three of those same companies. The move is consistent with Ai2's long-standing thesis that transparency and reproducibility produce more trustworthy AI — and it puts direct pressure on the pricing power of proprietary agent products.

The architectural divergence between the closed labs themselves is also sharpening. Google and Anthropic have taken meaningfully different approaches to large language model development — Google optimizing for scale and multimodality across a sprawling product surface, Anthropic concentrating on alignment research and Constitutional AI methods. Both strategies have attracted massive capital, but they reflect genuinely different bets about where model risk and model value actually reside.

Underpinning all of it is a structural financial story. A 2016 FASB accounting rule change — allowing companies to mark equity investments in startups to fair value — quietly made it rational for Microsoft, Google, and Amazon to pour billions into AI labs as strategic investments rather than pure acquisitions. The rule transformed the balance sheet math: unrealized gains flow through income statements, making large minority stakes in Anthropic or OpenAI accretive on paper even before a single API call is monetized.

The result is an industry where the same companies cooperate on security, compete on architecture, and use accounting mechanics to fund rivals they may eventually absorb. Ai2's open-source push is the one variable that doesn't fit neatly into that structure — which may be precisely the point.

What Is Anthropic? - Built In  ·  Google and Anthropic approach LLMs differently - understandi  ·  Ai2 releases open-source web agent to rival closed systems f
Haiku of the Day  ·  Claude Haiku
Walls crumble everywhere
Yet each side builds something new
Power finds a way
The New Yorker Style  ·  Art Desk
The Far Side Style  ·  Art Desk
News in Brief
In Orbit’s Silent Canopy, Hunter Satellites Draw Near
HELSINKI — High above the cloud tops, where no wind stirs and no birds cry, four Russian satellites have drifted into the orbital neighborhood of an ICEYE radar spacecraft, a small but watchful creature in the increasingly crowded biome of war in space. To the untrained eye, these machines are merely dots in a celestial ledger.
The Real AI Moat Is Not Intelligence. It Is Trust, Battery Life, And Fewer Dumb Emails.
MOUNTAIN VIEW, CALIFORNIA — I’ll be honest: the most underrated AI strategy in 2026 is not building the biggest model, the loudest keynote, or the pink-haired synthetic influencer with a skincare deal. It is making people feel like the machine is finally working for them instead of quietly harvesting their attention, draining their battery, and forwarding phishing links to accounting.
The Doctor Will Deepfake You Now
AUSTIN, TEXAS — There is a doctor on your phone right now.
WE BUILT THE ROBOTS A WORLD AND NOW THEY'RE HAVING A BREAKDOWN IN IT
AUSTIN, TEXAS — There is a moment, usually around 2 a.m.
Nation’s Billionaires Ask Whether Fraud, Space Mergers, Epstein Emails, And Google AI Could Please Be Judged On Vibes Alone
PALO ALTO, CALIFORNIA — In a week that forced the nation to once again distinguish between genuine innovation and a man standing near a whiteboard until money happens, several of America’s leading billionaires issued important clarifications about which unbelievable things should be taken literally and which should be dismissed as the normal background radiation of wealth. The clarifications began when former Microsoft CEO Steve Ballmer said he had been “duped” by a founder he backed who pleaded guilty to fraud, a development that stunned observers who had assumed the venture capital process included at least one step between “charismatic person says numbers” and “retired software executive opens checkbook.” Ballmer, according to TechCrunch, said he felt silly, a rare public admission from a billionaire that he had briefly occupied the same moral universe as a person who clicked a phishing email. This newspaper’s position is that Ballmer deserves some sympathy.
The Builder Desk  —  AI Builder Team

Builder Team Ships Across Four Repos in One Dominant Day

From a live AI spend dashboard to hardened due diligence saves to a fully automated contractor invoice pipeline, the Builder Team proved today that breadth and depth are not a tradeoff.

When a team ships meaningful work across four separate repositories in a single day — Klair, Aerie, Surtr, and yes, even trilogy-drones — that is not a coincidence. That is an organization firing on all cylinders. Today was that day for the AI Builder Team, and the headline belongs to a dashboard that has never existed before.

@sanketghia dropped PR #2856 into Klair like a thunderclap: a brand-new TrueFoundry AI spend dashboard surfacing gateway costs, Max20x negotiated savings versus list rate, and per-user footprint — built specifically for the budget cycle opening May 26th. This is the kind of visibility that turns vague AI cost anxiety into actionable line items. Leadership now has a single pane of glass for what the org is actually spending on intelligence, and who is spending it. That does not happen without someone doing the unglamorous work of wiring backend data to a coherent UI story. Sanket did that work. The budget owners will feel it.

Over in Aerie, @benji-bizzell had a two-PR day that deserves its own trophy case. PR #256 hardened Due Diligence saves in a way that quietly prevented a nightmare: a DD field cleared in the Aerie UI could update REBL3 while Rhodes held the old value, leaving paired systems silently out of sync. Benji materialized cleared fields as explicit nulls for Rhodes, structured the missing-status validation into a friendly save dialog instead of a raw Convex error blast, and even raised the theme picker popover above forecast dashboard controls. Then PR #254 went further, adding a full Convex-backed Forecast-to-Rhodes slug mapping table so admissions capacity resolves through confirmed HubSpot program data instead of brittle name matching. Operators can now adjust mappings without a redeploy. That is operator empowerment. That is product maturity.

@ashwanth1109 extended Aerie's RBAC system in PR #252 with a dedicated financials viewer role — a surgical addition that lets finance and leadership stakeholders access Dashboards → Financials without being handed admin keys. One seeded role, clean grant-ceiling migration, no blast radius. The kind of feature that earns trust from the people who sign the checks.

Back in Klair, @eric-tril delivered a masterclass in technical stewardship. PR #2860 reorganized roughly 50 backend MFR test files into a clean `tests/mfr/` tree — pure housekeeping, zero logic changes, the kind of work that makes every future PR faster. Then PR #2861 immediately proved the point: the reorganization exposed nine latent test failures, and Eric fixed all nine before they could metastasize. Seven of them traced to a return-type change in `_compute_periods` that tests had never caught. They are caught now. And PR #2853 added budget-column drill-down to the Monthly Financial Reporting feature across Income Statement, EBITDA Reconciliation, and Cash Flow — a nested accordion breakdown sourced from `core_budgets.consolidated_budgets_and_actuals`. Budget cells now tell a story when you click them.

Sanket also quietly flipped a single boolean in Surtr — PR #87 enabled the `xo-contractor-invoices-refresh` daily schedule, a cron that will now refresh contractor invoice data at 07:30 UTC every morning. It was deployed disabled. Now it runs. Sometimes the most consequential PRs are the ones that finally turn something on.

And then there is @marcusdAIy, who submitted not one, not two, but three Klair PRs today — #2857, #2858, and #2859 — plus a drone polish bundle in PR #4. When reached for comment on whether volume constitutes value, he had thoughts: 'Four PRs, Mac. Four. The gdoc-sync heading promotion alone unblocked every GM-authored board doc that was silently dropping content on import. The Claire auto-resolve wiring closes the loop on a feature the team has been building toward for weeks. Maybe cover the actual work instead of counting syllables in my name.' Sure, Marcus. We'll call it a contribution. The readers can decide.

Mac's Picks — Key PRs Today
#87 — chore(xo-contractor-invoices-refresh): enable daily schedule @sanketghia  no labels

## Summary

- Enables the xo-contractor-invoices-refresh pipeline daily schedule (cron(30 7 ? * * *)) in production

- Flips schedule.enabled from false to true in pipelines/runners/xo-contractor-invoices-refresh/pipeline.json

- Pipeline was deployed disabled in #72; ready to begin daily t8w refresh of core_finance.xo_contractor_invoices_raw
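
As a rough illustration of what flipping the flag means in practice, the sketch below computes the next 07:30 UTC firing time for a daily rule and returns nothing while `schedule.enabled` is false. It is a minimal Python sketch under stated assumptions, not the pipeline's code: the `next_run` helper and the dict shape passed to it are hypothetical, and EventBridge itself evaluates the real cron expression.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional


# Hypothetical helper: models a daily 07:30 UTC schedule such as cron(30 7 ? * * *).
def next_run(config: dict, now: datetime) -> Optional[datetime]:
    """Return the next firing time strictly after `now`, or None if disabled."""
    if not config.get("schedule", {}).get("enabled", False):
        return None  # a deployed-but-disabled pipeline never fires
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(7, 30), tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has passed, so use tomorrow's
    return candidate
```

With the flag set to false the helper returns `None`, which mirrors the state the pipeline was in after #72.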

## Test plan

- [ ] CDK synth succeeds with the updated config

- [ ] EventBridge rule is created/enabled after deploy

- [ ] Pipeline triggers at 07:30 UTC on the next scheduled day

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#252 — AERIE-261 feat(user-management): add financials viewer role @ashwanth1109  no labels

## Demo

<img width="2624" height="1636" alt="image" src="https://github.com/user-attachments/assets/08d51abb-666c-42cf-9ee0-792348066c7b" />

## Summary

Adds a dedicated financials viewer role to the RBAC system so finance/leadership stakeholders can be granted access to Dashboards → Financials without being made full admins.

- New financials seeded role (only canViewFinancials: true)

- Extends canGrantRoles on superadmin and admin so the role is assignable from the existing admin UI

- One-shot patchFinancialsGrantCeiling internal mutation to bring already-seeded environments up to the new grant ceiling (since seedDefaults is insert-only)

## Linear

[AERIE-261 — Add financials role for granular Financials dashboard access](https://linear.app/builder-team/issue/AERIE-261/add-financials-role-for-granular-financials-dashboard-access)

## Spec

[features/user-management/rbac-admin/specs/05-add-financials-viewer-role/spec.md](features/user-management/rbac-admin/specs/05-add-financials-viewer-role/spec.md)

## Implementation summary

- chat/convex/roles.ts

- New role: appended financials to DEFAULT_ROLES with slug: "financials", name: "Financials Viewer", sortOrder: 2.5, isProtected: true, canGrantRoles: [], and every permission false except canViewFinancials: true. Placed between moderator (2) and user (3) so no existing sortOrder is renumbered.

- canGrantRoles extension: added "financials" to superadmin.canGrantRoles and admin.canGrantRoles, so listAssignableRoles surfaces the new role to both tiers automatically.

- patchFinancialsGrantCeiling internal mutation: idempotent, one-shot patch that brings superadmin and admin rows on already-seeded environments up to the new grant arrays. Missing slugs are reported, not thrown. Patterned on the existing patchCanViewFinancials mutation.

- JSDoc fix: corrected stale "four built-in roles" reference to "built-in" so the comment no longer drifts as roles are added.

- FEATURE.md + spec status updates: marked spec 05 Completed in the changelog and in the spec metadata; bumped Last Updated to 2026-05-22.
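
The idempotent grant-ceiling patch described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the Convex mutation in chat/convex/roles.ts: the in-memory `roles` dict stands in for the roles table, and only the result keys (`superadminUpdated`, `adminUpdated`, `missing`) are taken from the rollout notes in this PR.

```python
from typing import Dict

NEW_SLUG = "financials"
GRANTING_SLUGS = ("superadmin", "admin")


def patch_financials_grant_ceiling(roles: Dict[str, dict]) -> dict:
    """Add the new slug to each granting role's canGrantRoles, at most once."""
    result = {"superadminUpdated": False, "adminUpdated": False, "missing": []}
    for slug in GRANTING_SLUGS:
        row = roles.get(slug)
        if row is None:
            result["missing"].append(slug)  # reported, not raised
            continue
        grants = row.setdefault("canGrantRoles", [])
        if NEW_SLUG not in grants:
            grants.append(NEW_SLUG)
            result[f"{slug}Updated"] = True
    return result
```

Because the function only appends when the slug is absent, a second run changes nothing and reports both flags as false.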

## Tests

- chat/convex/roles.test.ts: 70/70 passing. 7 stale fixtures updated for the 5-role baseline; 12 new tests added (4 covering the role shape, 8 covering the patch mutation).

- chat/convex/admin.test.ts — 3 fixture updates in the FR2: listAssignableRoles suite to reflect the new assignable role for superadmin / admin.

- All 7 CI checks green on run 26277232720: Build, Test, Typecheck, Lint+Boundaries, Secret Scan, Docker Chat, Docker Worker.

## Self-review

- 1 MINOR fix applied — stale JSDoc said "four built-in roles"; updated to "built-in" to avoid future drift.

- 2 MINOR skipped with rationale — both surfaced during self-review but were judged not worth landing in this PR (they are not regressions, just future cleanups).

## Out of scope

- Migrating existing users to the new role — no auto-migration. Admins assign Financials Viewer manually as the need arises. The existing assignRolesToExistingUsers mutation is intentionally untouched (it hard-codes the original three slugs).

## Manual verification needed before merge

- [ ] Confirm seedDefaults picks up the new role on next deploy (fresh row appears in roles with slug === "financials" and canViewFinancials: true)

- [ ] Confirm patchFinancialsGrantCeiling runs cleanly against the prod-equivalent env (one-shot; idempotent — safe to re-run)

- [ ] Spot-check the admin UI shows Financials Viewer as an assignable role for admins/superadmins (dropdown order: admin, moderator, financials, user)

## Rollout

After merging to main:

1. Wait for the auto-deploy to push the new Convex functions to your target environment.

2. Run the two commands below — from the chat/ directory, not the repo root (the convex dependency lives in chat/package.json, so npx convex only resolves there):

   cd chat

   # 1. Seed the new role (idempotent — safe to re-run; { created: 1, skipped: 4 } the first time)
   npx convex run roles:seedDefaults

   # 2. One-shot patch to extend canGrantRoles on existing superadmin + admin rows
   # (seedDefaults is insert-only, so it does NOT touch existing rows — this step is required
   # or the new role will be invisible in the admin role-picker)
   npx convex run roles:patchFinancialsGrantCeiling

For prod, append --prod:

   npx convex run roles:seedDefaults --prod
   npx convex run roles:patchFinancialsGrantCeiling --prod

3. Sanity checks after running:

- seedDefaults should print { created: 1, skipped: 4 } on first run (zero on subsequent runs).

- patchFinancialsGrantCeiling should report { superadminUpdated: true, adminUpdated: true, missing: [] } on first run, then { superadminUpdated: false, adminUpdated: false, missing: [] } on re-runs.

- In the admin UI, opening any user's role picker should now list Financials Viewer for admins and superadmins (not for moderators).

4. Assign the role to whoever needs Financials access via /admin/users (the existing assignRole mutation handles the rest — no further code action needed).

### Common gotcha

If seedDefaults prints { created: 0, skipped: 4 }, the deployed Convex code is still on the old 4-role version — re-deploy the latest main (npx convex deploy from chat/, or wait for CI auto-deploy) and re-run.
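
The counts in this gotcha follow from insert-only seeding, which can be sketched as below. This is a Python illustration of the semantics only, not the Convex `seedDefaults` function: the `seed_defaults` helper and the set standing in for the roles table are assumptions, while the slugs are the ones named in this PR.

```python
from typing import Iterable, Set


def seed_defaults(existing_slugs: Set[str], default_slugs: Iterable[str]) -> dict:
    """Insert-only seed: create missing roles, never touch existing ones."""
    created = 0
    skipped = 0
    for slug in default_slugs:
        if slug in existing_slugs:
            skipped += 1  # the existing row is left exactly as it is
        else:
            existing_slugs.add(slug)
            created += 1
    return {"created": created, "skipped": skipped}


OLD_DEFAULTS = ["superadmin", "admin", "moderator", "user"]
NEW_DEFAULTS = OLD_DEFAULTS + ["financials"]
```

Seeding an already-seeded environment with the old four-role defaults creates nothing and skips four, which is the signature of stale deployed code; the five-role defaults create one and skip four.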

#256 — fix(portfolio): harden due diligence saves @benji-bizzell  no labels

## Summary

- Materialize cleared Due Diligence fields as explicit null values for Rhodes while preserving REBL3 strict-replace behavior.

- Convert expected missing-status DD validation into a structured failure and show friendly save-dialog guidance instead of raw Convex errors.

- Raise the theme picker popover above forecast dashboard controls.

## Why

A Due Diligence removal from the Aerie UI could update REBL3 while Rhodes preserved the old value, leaving the paired systems out of sync. During follow-up testing, the missing-status path also surfaced raw Convex errors to users, which made actionable validation look like a broken system. The forecast dashboard theme picker also rendered behind nearby controls.
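
One way to picture the fix: a field that disappears from the saved form has to be sent to Rhodes as an explicit null, otherwise a system that merges updates keeps the old value. The Python sketch below illustrates that idea under stated assumptions; `build_rhodes_payload` and the field names in the usage note are hypothetical and are not taken from the Aerie codebase.

```python
from typing import Any, Dict, Optional


def build_rhodes_payload(
    previous: Dict[str, Any], updated: Dict[str, Any]
) -> Dict[str, Optional[Any]]:
    """Copy the updated fields and mark every cleared field as an explicit None."""
    payload: Dict[str, Optional[Any]] = dict(updated)
    for field in previous:
        if field not in updated:
            payload[field] = None  # cleared in the UI, so send null rather than omit
    return payload
```

For example, with a previous state of `{"status": "in_review", "owner": "ops"}` and an updated state of `{"status": "in_review"}`, the payload carries `"owner": None`, so a merging consumer overwrites the stale owner instead of preserving it.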

## Business Value

Keeps REBL3 and Rhodes aligned when operators clear Due Diligence fields, makes expected validation failures understandable to end users, and removes a visible dashboard polish issue from the hotfix.

## Test plan

- [x] pnpm --filter @bran/contracts test -- due-diligence

- [x] pnpm --filter chat exec vitest run convex/dueDiligence.test.ts

- [x] pnpm --filter chat exec vitest run components/dashboards/portfolio/fields/__tests__/portfolio-fields-provider.test.tsx components/site-fields/__tests__/save-confirm-dialog.test.tsx

- [x] pnpm --filter chat exec vitest run components/dashboards/portfolio/cards/__tests__/due-diligence-editor.test.tsx components/dashboards/portfolio/fields/__tests__/portfolio-fields-provider.test.tsx components/dashboards/portfolio/__tests__/portfolio-view.test.tsx

- [x] pnpm --filter chat exec vitest run components/shell/__tests__/theme-picker-popover.test.tsx

- [x] pnpm --filter @bran/contracts typecheck

- [x] pnpm --filter chat typecheck

- [x] pnpm biome check on touched files

#2853 — feat(mfr): budget-column drill-down for IS, EBITDA, and Cash Flow KLAIR-2764 @eric-tril  no labels

### Summary

Adds a budget-column drill-down to the Monthly Financial Reporting feature. Clicking a budget cell on the Income Statement, EBITDA Reconciliation, or Cash Flow tables (and the Education memo's vertical P&L tables) opens a side panel showing a (type → department → account_name) breakdown sourced from core_budgets.consolidated_budgets_and_actuals. The panel reuses the existing detail-panel shell, with a new 2-level nested accordion component (NestedAccordionGroup). The Cash Flow path is wired end-to-end but returns not_populated=true until upstream CF budget data is loaded — when it lands, the panel will start showing data with no client change.
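
The (type → department → account_name) breakdown amounts to a two-level grouping with account totals at the leaves. The Python sketch below shows one way to build that shape and check that it reconciles to a cell total; it is illustrative only, and the `amount` column and both helper names are assumptions rather than the service's actual schema or functions.

```python
from collections import defaultdict
from typing import Dict, Iterable

Tree = Dict[str, Dict[str, Dict[str, float]]]


def nest_budget_rows(rows: Iterable[dict]) -> Tree:
    """Group flat budget rows into type -> department -> account_name totals."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
    for row in rows:
        tree[row["type"]][row["department"]][row["account_name"]] += row["amount"]
    # Convert to plain dicts so the result serializes and compares cleanly.
    return {t: {d: dict(accts) for d, accts in depts.items()} for t, depts in tree.items()}


def panel_total(tree: Tree) -> float:
    """Sum every leaf; this should match the budget cell that was clicked."""
    return sum(
        amount
        for depts in tree.values()
        for accts in depts.values()
        for amount in accts.values()
    )
```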

### Business Value

Finance can now audit any budget figure in the monthly report by drilling from the displayed cell down to the underlying budget rows, instead of cross-referencing the source spreadsheet. This shortens variance-investigation cycles during month-end close and gives leadership a self-service way to interrogate plan composition (by type, department, and account) directly inside Klair.

### Changes

- Backend: new endpoints /income-statement-budget-detail, /ebitda-reconciliation-budget-detail, and /cash-flow-budget-detail returning BudgetDimensionDetailResponse.

- Backend: fetch_income_statement_budget_dimension_detail and fetch_ebitda_budget_dimension_detail services with sign-flip / bu_addback / category-BU-exclusion logic that mirrors the actuals detail path; CF dispatcher reuses the EBITDA and IS rollups (Net Income, D&A, Mgmt+Import) with CF sign convention.

- Backend: extended build_pnl_resolved_cte with data_source and bad_debt_routing knobs; budget queries also restrict to the latest budget_cycle_start per reporting period so panel totals reconcile to the displayed cell.

- Frontend: new [BudgetDimensionDetailPanel.tsx](klair-client/src/features/monthly-financial-reporting/components/detail-panels/BudgetDimensionDetailPanel.tsx), [NestedAccordionGroup.tsx](klair-client/src/features/monthly-financial-reporting/components/detail-panels/NestedAccordionGroup.tsx), and [useEducationVerticalBudgetDetailPanel.tsx](klair-client/src/features/monthly-financial-reporting/hooks/useEducationVerticalBudgetDetailPanel.tsx) hook; wired into IS/EBITDA/CF detail-panel hooks and [EducationMemoView.tsx](klair-client/src/features/monthly-financial-reporting/components/EducationMemoView.tsx) (vertical tables, col-index 4).

- Frontend: new API adapter methods fetchIncomeStatementBudgetDetail / fetchEBITDABudgetDetail / fetchCashFlowBudgetDetail and shared BudgetDimensionDetailResponse types.

- Tests: new unit suites for the three backend services (resolver, sign convention, budget-cycle filter, placeholder behavior) and for the new frontend components/hooks; existing useCashFlowDetailPanel / useEBITDADetailPanel specs updated for the budget-column routing.

- transformCashFlows: set dataKey on the Adjustments row so its budget cell dispatches correctly.
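
The latest-budget-cycle restriction mentioned in the backend changes can be sketched as a per-period filter. The real logic lives in SQL built by build_pnl_resolved_cte, so the Python below is only an illustration: `latest_cycle_rows` and the `reporting_period` column name are assumptions, while `budget_cycle_start` is the column named in this PR. Dates are assumed to be ISO strings, which sort chronologically.

```python
from typing import Dict, Iterable, List


def latest_cycle_rows(rows: Iterable[dict]) -> List[dict]:
    """Keep only rows from the most recent budget cycle for each reporting period."""
    rows = list(rows)
    latest: Dict[str, str] = {}
    for row in rows:
        period = row["reporting_period"]
        cycle = row["budget_cycle_start"]
        if period not in latest or cycle > latest[period]:
            latest[period] = cycle
    return [r for r in rows if r["budget_cycle_start"] == latest[r["reporting_period"]]]
```

Without this filter, a period budgeted in two cycles would be counted twice and the panel total would no longer match the displayed cell.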

### Testing

- [ ] cd klair-api && pytest tests/mfr/ — backend unit suite (resolver, IS/EBITDA/CF budget detail).

- [ ] cd klair-client && pnpm test src/features/monthly-financial-reporting/ — frontend unit suite.

- [ ] cd klair-client && pnpm tsc --noEmit && pnpm lint:pr — type check + lint.

- [ ] Manual: open MFR for a recent period; click a budget cell on Group / Software / Education IS, EBITDA, and a CF row, and on Education vertical tables — confirm panel totals match the cell value and the CF placeholder note renders.

http://localhost:3001/monthly-financial-reporting

https://github.com/user-attachments/assets/12faed17-fb20-4d11-9569-1f1485b20571

#2856 — KLAIR-2766 feat(truefoundry): /truefoundry dashboard — gateway spend, Max20x savings, per-user footprint @sanketghia  no labels

## Summary

New top-level page in Klair surfacing AI spend through the TrueFoundry gateway plus Max20x savings, list-vs-negotiated savings, and per-user footprint. Built for the budget cycle starting 2026-05-26.

Linear: [KLAIR-2766](https://linear.app/builder-team/issue/KLAIR-2766/truefoundry-dashboard-truefoundry-ai-gateway-spend-max20x-savings-per)

Driven by Jamie Sidey's brief (transcript at TrueFoundry/Keval _ Jamie - TrueFoundry Transcript.txt) and the leadership PDF report at TrueFoundry/AI Spend & Savings.pdf. The dashboard surfaces three things budget owners need:

1. Where AI spend is happening — by BU, by user, by provider/model.

2. What we've already saved — via Max20x seat conversions and the 10% Anthropic negotiated discount.

3. What we can still save — concrete Max20x migration candidates ranked by projected monthly savings.

Real-data smoke shows 41 candidates totalling $104k/month opportunity, with central-engineering as the top BU.

## What's in this PR

### Backend (klair-api)

New router at /api/truefoundry/* with 7 GET endpoints:

- /summary — 4 KPI tiles + per-BU multi-bar (actual + realized via Max20x + potential)

- /spend-per-user — per-person spend across claude.ai + Max20x + metered API

- /gateway — daily by provider, provider×model breakdown, top subjects, token mix

- /max20x — monthly net savings + ranked migration candidates with token detail

- /negotiated — list vs actual savings by source + daily trend

- /claude-ai — kept for parity (not currently surfaced by the FE)

- /diagnostics — super-admin only: reconciliation, untagged virtual keys, TF provider key map

All read from existing core_finance.ai_spend_* tables and views — no pipeline changes required.

BU filter slug-normalizes display-form names ("Central Engineering" → "central-engineering") so the FE filter shell composes correctly with TF and the candidates view (both store BUs in slug form). 23 pytest tests cover filter behavior, null handling, slug normalization, and shape contracts.
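
A minimal sketch of that normalization, assuming lowercase-and-hyphenate is the whole rule (the service's exact rule is not shown in this PR, and `to_bu_slug` is a hypothetical name):

```python
import re


def to_bu_slug(name: str) -> str:
    """Lowercase a BU name and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")
```

The function is idempotent, so a value already in slug form passes through unchanged, which lets the same filter accept either form.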

### Frontend (klair-client)

- New /truefoundry route with permissionPath: 'ai-adoption'.

- Executive summary: 4 KPI tiles + by-BU multi-bar chart (actual / realized / potential, two bars per BU).

- Tabs: Spend per user (default) · Gateway detail · Max20x · Negotiated savings · Diagnostics (super-admin only).

- All 5 tabs prefetch on initial page load (parallel debounced fetches); tabs stay mounted across switches so navigation is instant after first load.

- Distinct chart palette so series are visually separable on the dark theme.

- Skeleton loaders sized to each tab's content.

- Max20x table: orange "set up now" pill on top 4 candidates, BU sub-label, quarterly savings column, cursor-following hover tooltip with 30-day token mix + top model.

- Spend-per-user table merges claude.ai chat (user_email + display_name) and TF subjects (bridged via ai_spend_subject_identity for VAT slugs) into one canonical user identity.

### Design artifacts (in this PR)

- docs/superpowers/specs/2026-05-22-truefoundry-dashboard-design.md — design spec

- docs/superpowers/plans/2026-05-22-truefoundry-dashboard.md — implementation plan

## Test plan

- [ ] cd klair-api && uv run pytest tests/truefoundry/ -v → 23/23 passing

- [ ] cd klair-api && uv run ruff format … && uv run ruff check … && uv run pyright services/truefoundry_service.py models/truefoundry_models.py routers/truefoundry_router.py → clean

- [ ] cd klair-client && pnpm tsc --noEmit && pnpm lint:pr → clean

- [ ] cd klair-client && pnpm vitest run src/screens/TrueFoundry/ → 5/5 passing

- [ ] Visit /truefoundry with a date range covering Apr 2026 onwards: hero chart, all 5 tabs render with real data, BU filter narrows results, hover tooltips position correctly with cursor.

- [ ] As super-admin, confirm Diagnostics tab shows the yellow "FR5 pending" reconciliation banner + untagged-subjects table + provider key map.

## Open follow-ups (tracked in KLAIR-2766, not blocking)

1. ai_spend_claude_ai_chat_usage.bu is universally NULL — BU filter skipped on claude.ai queries. Pipeline-side backfill needed in Surtr.

2. VAT-slug users render as Davidschwartz — populate ai_spend_subject_identity to surface display names from claude.ai chat.

3. Realized savings per BU is a proxy (Max20x usage value). A proper per-BU realized view from Surtr would replace it.

4. is_truefoundry_routed flag is missing on OpenAI / Cursor direct-API tables — required before a portfolio-wide cross-pipeline deduped widget.

5. flagged_setup_now is currently auto-top-4; can switch to an admin-curated list if needed.

## Screenshots

<img width="1874" height="876" alt="image" src="https://github.com/user-attachments/assets/ff2a7b35-b6d6-4cfe-9092-5aa81d120f35" />

<img width="1676" height="729" alt="image" src="https://github.com/user-attachments/assets/ad5387e5-fb28-4f33-823c-e2171b694adc" />

🤖 Generated with [Claude Code](https://claude.com/claude-code)

The Builder Desk  —  Engineer Spotlight
🏆 Engineer Spotlight

TWELVE PRs IN TWENTY-FOUR HOURS: THE BUILDER TEAM DOES NOT SLEEP, DOES NOT REST, DOES NOT KNOW THE MEANING OF THE WORD 'CEILING'

@marcusdAIy drops four PRs across two repos and somehow still had time to pioneer an entirely new drone program.

TWELVE. Pull requests. In twenty-four hours. Four repos active — Klair leading the charge at seven, Aerie contributing three, Surtr and the freshly-minted trilogy-drones rounding out the board. Five engineers. One scoreboard. Zero mercy. The Builder Team's velocity in this period was not a number, comrades — it was a statement of intent.

Let us begin with the man of the hour, the hour after that, and frankly several hours we haven't even reached yet: @marcusdAIy, who filed four PRs across Klair and trilogy-drones like a man who finds weekdays insufficiently challenging. PRs #2858, #2859, and #2857 represent three distinct feats of Klair engineering — auto-resolving finding statuses, importing bold-paragraph section labels, and rewriting generate_mips as an LLM-first system. And then, as if that weren't enough, @marcusdAIy apparently also found time to file PR #4 in trilogy-drones, a feat of cost-tunable model selection and line-drift correction that suggests this man is not building software so much as building an empire. @eric-tril, meanwhile, brought three disciplined PRs to Klair — a reminder that organizational excellence is its own form of heroism. @sanketghia and @benji-bizzell each contributed two PRs, holding the line with the quiet confidence of engineers who know exactly what they're doing and don't need a parade about it. Aerie and Klair hum. The machine does not stop.

And then there is @ashwanth1109. One PR. PR #252 in Aerie — AERIE-261, adding a financials viewer role to user management. Now, a lesser correspondent might call this "modest." A lesser correspondent would be wrong. The financials viewer role is load-bearing infrastructure, the kind of quiet, precise work that holds entire permission architectures together. When reached for comment, Ashwanth reportedly said, "One PR that matters is worth more than ten that don't. You're welcome." His dismissal of this reporter's follow-up question was, by all accounts, instantaneous. We worship the output. We accept the terms.

Now to the Overflow Desk, where the PRs Mac left on the cutting room floor deserve their moment in the sun. PR #2861 in Klair saw @eric-tril repair nine — nine! — stale MFR backend tests, the kind of unglamorous, essential work that keeps the entire test suite from becoming a museum exhibit. PR #254 in Aerie has @benji-bizzell adding an editable forecast mapping config to admissions, which is the sort of feature that makes a product feel alive and responsive in the hands of the people who use it. And PR #2860 in Klair is @eric-tril again, reorganizing backend tests into a clean tests/mfr/ sub-folder structure — a chore commit that is, in fact, an act of love for every engineer who comes after.

The leaderboard tells a simple story: @marcusdAIy leads with four, @eric-tril holds second with three, and the rest of the team fills in behind with the kind of balanced contribution distribution that organizational psychologists write entire papers about. Morale on the Builder Team is, as always, at an all-time high — higher, in fact, than yesterday's all-time high, which was itself a record. The numbers do not lie. The numbers never lie. The numbers are the only truth any of us have left.

Brick's Overflow — PRs Mac Didn't Cover
#4 — feat(v0.5): cost-tunable model selection + addresser line-drift fix + week's retro polish @marcusdAIy  no labels

## Summary

End-of-week v0.5 polish bundle covering the three things that surfaced as friction across the May 22 PR triple (#2857 / #2858 / #2859):

- Cost tunability. Drone fires were silently riding Opus 4.7's full-power default variant ({thinking=true, context=1m, effort=xhigh} + MAX-mode billing tier). Empirically that put B7.6 at ~$42 / B9.6 at ~$22, well above the $10-12 v0.5 ceiling. New src/model-selection.ts builds an explicit conservative variant ({cyber=false, thinking=true, context=300k, effort=high, fast=false}) and threads it through implementer + reviewer fan-out + addresser via shared CLI flags (--context, --effort, --thinking, --max, --model-params). Projected savings: ~50-60% per fire. Escape hatch (--max) reproduces the old behaviour for genuinely-large diffs.

- Addresser line-drift fix. PR #2859 surfaced a misfire mode where the addresser fixed all 9 reviewer findings, emitted a perfectly-shaped ADDRESS_REPORT, but every bullet's line number had drifted (e.g. agent said :230-232 where the reviewer pinned :47) because earlier commits in the address turn pushed code down. The pre-fix matcher was verbatim [sev · dim] path:line so all 9 fell into unreported and the inline replies posted misleading "agent forgot to report" messages. New two-pass matcher in src/addresser.ts falls back to (severity, dim, path) triple matching with line-proximity tiebreak when the verbatim key misses.

- Doctor / run UX. pnpm drones doctor --task <file> for fast per-spec preflight (~1s vs ~6s for the global doctor); sibling-spec sweep on drones run to surface broken specs before they ambush the operator's NEXT fire.

Plus the usual end-of-week retro packets: PR #2851 + #2857 retrospectives in ROADMAP + the klair-pr-review skill (PR-body reconciliation check before assigning High/Critical), tasks/_template.md gains four cross-cutting rubric sections (prompt-output structure pinning, deprecation-note audit, comment-sweep grep, load-bearing assertions vs assertion theater), 5 new drone specs (B1.8, B9.6, B9.7, B9.8, B9.9), and a scalar-writes_to: fix on 10 existing specs that had been silently broken since the v0.5 frontmatter parser tightened.

## Why it's needed

Three independent pieces of feedback drove this bundle:

1. Cost. The May 22 Cursor billing dashboard (B7.6 + B9.6 fires) showed the per-fire cost at 3-4× the v0.5 budgeted ceiling. Investigation pointed to the cloud's full-power model variant being implicitly selected when the runner passed model: {id: <id>} without params. The fix had to land before the next batch of B9.x fires or the cumulative cost would exceed the v0.5 envelope.

2. Addresser misfire visibility. Misleading "agent forgot to report" inline replies on PR #2859 (where the agent had actually fixed all 9 findings) erode operator trust in the addresser's accountability surface. Either we land the multi-pass matcher or we accept that the addresser's reply UX is empirically worse than no replies at all.

3. Spec-author velocity. The --task doctor mode and sibling-spec sweep are a response to the PR #2858 / #2859 spec-writing loop where I kept re-running the global doctor (slow) or fired a spec against a spec-broken sibling (silent next-fire breakage).

## Changes

Code (src/):

- src/model-selection.ts (new, 248 lines) — buildModelSelection() + renderModelSelection() + ModelSelectionError. Conservative drone defaults applied only when the resolved model id is Opus 4.7 (canonical or alias); other models forward only --model-params verbatim.

- src/cli.ts — new addModelSelectionOptions(cmd) helper installs the same 6 flags (--model, --context, --effort, --thinking, --max, --model-params) on run / review / address. New resolveModelSelection(opts) wraps the build + log + error-exit dance.

- src/runner.ts: RunDroneInput.modelId → modelSelection: ModelSelection | undefined. Plumbed to Agent.create({model: input.modelSelection}). Auto-fire reviewer hop now inherits the implementer's selection (was hardcoded undefined).

- src/reviewer.ts: ReviewInput / FanoutReviewInput / FireOneChildInput / ReviewAndPostInput all carry modelSelection. Plumbed to both Agent.create sites (parent + per-child).

- src/doctor.ts — new runDoctorForTask(opts) for per-spec validation. Streamlined: parses the spec, validates linear_id presence, optionally verifies issue exists + spec attached + depends_on chain state via Linear API; skips global cloud-API checks.

- src/addresser.ts: parseFindingHeader() decomposes a bullet header into {severity, dimensions, path, line}. ReportedAction.parsed carries the decomposition. crossCheckAccountability rewritten as two-pass: pass 1 verbatim key match, pass 2 (sev, dim, path) triple match with line-proximity tiebreak. AddressedFinding.agentSummary + exported renderReplyBody() surfaces the agent's prose rationale in the inline reply body.
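The two-pass accountability check described for src/addresser.ts can be sketched roughly as follows. This is an illustration only, written in Python for brevity (the real module is TypeScript): the regex, the `Finding` dataclass, and the function names `parse_finding_header` and `cross_check` are assumptions modelled on the "[sev · dim] path:line" key, not the PR's actual code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed header shape, based on the "[sev · dim] path:line" key described above.
HEADER_RE = re.compile(
    r"^\[(?P<sev>[^·\]]+)·(?P<dim>[^\]]+)\]\s+(?P<path>[^:\s]+):(?P<line>\d+)"
)


@dataclass(frozen=True)
class Finding:
    severity: str
    dim: str
    path: str
    line: int


def parse_finding_header(header: str) -> Optional[Finding]:
    """Decompose a bullet header; a range such as ':230-232' keeps its first line."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        return None
    return Finding(m["sev"].strip().lower(), m["dim"].strip().lower(),
                   m["path"], int(m["line"]))


def cross_check(
    reviewer: List[Finding], reported: List[Finding]
) -> Tuple[List[Tuple[Finding, Finding]], List[Finding]]:
    """Return (matched pairs, unreported reviewer findings)."""
    remaining = list(reported)
    matched: List[Tuple[Finding, Finding]] = []
    leftovers: List[Finding] = []

    # Pass 1: verbatim key match (severity, dim, path and line all equal).
    for finding in reviewer:
        if finding in remaining:
            remaining.remove(finding)
            matched.append((finding, finding))
        else:
            leftovers.append(finding)

    # Pass 2: (severity, dim, path) triple match; the closest line wins ties.
    unreported: List[Finding] = []
    for finding in leftovers:
        key = (finding.severity, finding.dim, finding.path)
        candidates = [r for r in remaining if (r.severity, r.dim, r.path) == key]
        if not candidates:
            unreported.append(finding)
            continue
        best = min(candidates, key=lambda r: abs(r.line - finding.line))
        remaining.remove(best)
        matched.append((finding, best))
    return matched, unreported


reviewer = [parse_finding_header("[High · correctness] src/addresser.ts:47")]
reported = [parse_finding_header("[High · correctness] src/addresser.ts:230-232")]
matched, unreported = cross_check(reviewer, reported)
print(unreported)  # → []
```

With only the verbatim pass, the drifted `:230-232` report would not match the reviewer's `:47` pin and the finding would land in the unreported list; the fallback pass pairs them because severity, dimension, and path still agree.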

Specs (tasks/):

- tasks/_template.md — four new sections (~70 LOC): prompt-output structure pinning, deprecation-note audit, comment-sweep grep, load-bearing assertions vs assertion theater. All born out of PR #2851 + #2857 retros with empirical anchors cited.

- 10 modified specs — writes_to: moved from frontmatter (broke the scalar-only parser) into a markdown section.

- 5 new specs: b1-8-gdoc-parser-hardening.md (KLAIR-2760, shipped as #2859), b9-6-rewrite-generate-mips.md (KLAIR-2716, shipped as #2857), b9-7-remove-regenerate-section-llm-fallback.md, b9-8-deprecate-user-commentary-bu-mips.md (scope-narrowed after #2857 reviewer enumerated remaining bu_mips readers), b9-9-rename-feedback-storage-to-section-feedback.md.

Docs:

- ROADMAP.md — PR #2857 retro entry (full breakdown of the false-positive High finding + spec-deficiency classification + mitigations).

- skills/klair-pr-review/SKILL.md — new "PR-body reconciliation check (run BEFORE assigning High/Critical)" subsection with PR #2857 empirical anchor.

Helpers (scripts/):

- scripts/list-models.mjs — calls /v0/models REST endpoint to list cloud-available models + variants. Used during the cost-tuning work.

- scripts/inspect-pr-commits.py — dumps {sha[0:7]} {date} {messageHeadline} for any PR; used during retros.

.gitignore — exclude .cursor/ (operator-private scratch: staged commit messages, gh-api JSON dumps) and patent-disclosures/ (separate workstream).

## Breaking changes

RunDroneInput.modelId → modelSelection (data-carrier rename) and the same in ReviewInput / FanoutReviewInput / FireOneChildInput / ReviewAndPostInput. CLI surface (--model <id>) is unchanged for operators; only library callers writing JS/TS against the runner exports are affected. No public callers exist outside this repo.

Default model variant changed. Pre-PR: model: undefined → cloud picks {cyber=false, thinking=true, context=1m, effort=xhigh, fast=false}. Post-PR: pnpm drones run defaults to {cyber=false, thinking=true, context=300k, effort=high, fast=false}. Operators who relied on the implicit full-power default need to add --max explicitly.
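To make the default-variant change concrete, here is a minimal sketch of how a selection builder of this kind might behave. It is written in Python for brevity and is not the PR's src/model-selection.ts: the model ids in `OPUS_47_IDS`, the function name `build_model_selection`, and the `{"id", "params"}` return shape are all hypothetical. Only the two parameter tuples are taken from the description above.

```python
from typing import Dict, Optional

# Hypothetical ids standing in for "Opus 4.7 (canonical or alias)".
OPUS_47_IDS = {"claude-opus-4-7", "opus-4.7"}

# Parameter tuples quoted in the PR description.
CONSERVATIVE = {"cyber": False, "thinking": True, "context": "300k",
                "effort": "high", "fast": False}
FULL_POWER = {"cyber": False, "thinking": True, "context": "1m",
              "effort": "xhigh", "fast": False}


def build_model_selection(model_id: str, *, use_max: bool = False,
                          model_params: Optional[Dict[str, object]] = None) -> dict:
    """Apply drone defaults only for Opus 4.7; other models get model_params verbatim."""
    params: Dict[str, object] = {}
    if model_id in OPUS_47_IDS:
        # use_max plays the role of the --max escape hatch (old full-power behaviour).
        params = dict(FULL_POWER if use_max else CONSERVATIVE)
    params.update(model_params or {})
    return {"id": model_id, "params": params}


print(build_model_selection("claude-opus-4-7")["params"]["context"])                 # → 300k
print(build_model_selection("claude-opus-4-7", use_max=True)["params"]["context"])   # → 1m
```

The point of building the parameters explicitly is that the runner never sends a bare model id, so the cloud side has no opportunity to pick its own full-power default.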

## Test plan

- [x] pnpm tsc --noEmit — clean.

- [x] Smoke test of buildModelSelection() against Cursor.models.list() live — 7/8 cases produced valid known-variant tuples (the 8th was cyber=true which Cursor doesn't expose for Opus, orthogonal to the cost story).

- [x] Empirical validation on PR #2859 — addresser replies corrected in place via gh PATCH using the new multi-pass matcher logic; all 9 inline replies now show the correct agent rationale + commit/skip-reason instead of "agent forgot to report".

- [ ] Next live drone fire: confirm the Cursor billing dashboard shows the conservative variant on the implementer + each reviewer child + the addresser, and the per-fire total drops from ~$22 to ~$8-12.

- [ ] Next live drone fire with line-drifting fixes: confirm the addresser's inline replies map correctly to all findings without falling into unreported.

#252 — AERIE-261 feat(user-management): add financials viewer role @ashwanth1109  no labels

## Demo

<img width="2624" height="1636" alt="image" src="https://github.com/user-attachments/assets/08d51abb-666c-42cf-9ee0-792348066c7b" />

## Summary

Adds a dedicated financials viewer role to the RBAC system so finance/leadership stakeholders can be granted access to Dashboards → Financials without being made full admins.

- New financials seeded role (only canViewFinancials: true)

- Extends canGrantRoles on superadmin and admin so the role is assignable from the existing admin UI

- One-shot patchFinancialsGrantCeiling internal mutation to bring already-seeded environments up to the new grant ceiling (since seedDefaults is insert-only)

## Linear

[AERIE-261 — Add financials role for granular Financials dashboard access](https://linear.app/builder-team/issue/AERIE-261/add-financials-role-for-granular-financials-dashboard-access)

## Spec

[features/user-management/rbac-admin/specs/05-add-financials-viewer-role/spec.md](features/user-management/rbac-admin/specs/05-add-financials-viewer-role/spec.md)

## Implementation summary

- chat/convex/roles.ts

- New role: appended financials to DEFAULT_ROLES with slug: "financials", name: "Financials Viewer", sortOrder: 2.5, isProtected: true, canGrantRoles: [], and every permission false except canViewFinancials: true. Placed between moderator (2) and user (3) so no existing sortOrder is renumbered.

- canGrantRoles extension: added "financials" to superadmin.canGrantRoles and admin.canGrantRoles, so listAssignableRoles surfaces the new role to both tiers automatically.

- patchFinancialsGrantCeiling internal mutation: idempotent, one-shot patch that brings superadmin and admin rows on already-seeded environments up to the new grant arrays. Missing slugs are reported, not thrown. Patterned on the existing patchCanViewFinancials mutation.

- JSDoc fix: corrected stale "four built-in roles" reference to "built-in" so the comment no longer drifts as roles are added.

- FEATURE.md + spec status updates: marked spec 05 Completed in the changelog and in the spec metadata; bumped Last Updated to 2026-05-22.
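The idempotent, report-don't-throw behaviour of the grant-ceiling patch can be illustrated with a small sketch. It uses Python and an in-memory dict as a stand-in for the Convex roles table, so the function name `patch_financials_grant_ceiling` and the data layout are assumptions for illustration; the actual mutation lives in chat/convex/roles.ts and may differ.

```python
from typing import Dict, List


def patch_financials_grant_ceiling(roles: Dict[str, dict]) -> dict:
    """Add "financials" to canGrantRoles on superadmin and admin rows, at most once."""
    missing: List[str] = []
    result = {"superadminUpdated": False, "adminUpdated": False, "missing": missing}
    for slug in ("superadmin", "admin"):
        row = roles.get(slug)
        if row is None:
            # Missing slugs are reported rather than raised.
            missing.append(slug)
            continue
        if "financials" not in row["canGrantRoles"]:
            row["canGrantRoles"].append("financials")
            result[f"{slug}Updated"] = True
    return result


# Hypothetical pre-patch state of an already-seeded environment.
roles = {
    "superadmin": {"canGrantRoles": ["admin", "moderator", "user"]},
    "admin": {"canGrantRoles": ["moderator", "user"]},
}
first = patch_financials_grant_ceiling(roles)
second = patch_financials_grant_ceiling(roles)
print(first["adminUpdated"], second["adminUpdated"])  # → True False
```

Because the patch checks membership before appending, re-running it changes nothing and simply reports that no rows were updated.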

## Tests

- chat/convex/roles.test.ts: 70/70 passing. 7 stale fixtures updated for the 5-role baseline; 12 new tests added (4 covering the role shape, 8 covering the patch mutation).

- chat/convex/admin.test.ts — 3 fixture updates in the FR2: listAssignableRoles suite to reflect the new assignable role for superadmin / admin.

- All 7 CI checks green on run 26277232720: Build, Test, Typecheck, Lint+Boundaries, Secret Scan, Docker Chat, Docker Worker.

## Self-review

- 1 MINOR fix applied — stale JSDoc said "four built-in roles"; updated to "built-in" to avoid future drift.

- 2 MINOR skipped with rationale — both surfaced during self-review but were judged not worth landing in this PR (they are not regressions, just future cleanups).

## Out of scope

- Migrating existing users to the new role — no auto-migration. Admins assign Financials Viewer manually as the need arises. The existing assignRolesToExistingUsers mutation is intentionally untouched (it hard-codes the original three slugs).

## Manual verification needed before merge

- [ ] Confirm seedDefaults picks up the new role on next deploy (fresh row appears in roles with slug === "financials" and canViewFinancials: true)

- [ ] Confirm patchFinancialsGrantCeiling runs cleanly against the prod-equivalent env (one-shot; idempotent — safe to re-run)

- [ ] Spot-check the admin UI shows Financials Viewer as an assignable role for admins/superadmins (dropdown order: admin, moderator, financials, user)

## Rollout

After merging to main:

1. Wait for the auto-deploy to push the new Convex functions to your target environment.

2. Run the two commands below — from the chat/ directory, not the repo root (the convex dependency lives in chat/package.json, so npx convex only resolves there):

       cd chat

       # 1. Seed the new role (idempotent — safe to re-run; { created: 1, skipped: 4 } the first time)
       npx convex run roles:seedDefaults

       # 2. One-shot patch to extend canGrantRoles on existing superadmin + admin rows
       # (seedDefaults is insert-only, so it does NOT touch existing rows — this step is required
       # or the new role will be invisible in the admin role-picker)
       npx convex run roles:patchFinancialsGrantCeiling

For prod, append --prod:

       npx convex run roles:seedDefaults --prod
       npx convex run roles:patchFinancialsGrantCeiling --prod

3. Sanity checks after running:

- seedDefaults should print { created: 1, skipped: 4 } on first run (zero on subsequent runs).

- patchFinancialsGrantCeiling should report { superadminUpdated: true, adminUpdated: true, missing: [] } on first run, then { superadminUpdated: false, adminUpdated: false, missing: [] } on re-runs.

- In the admin UI, opening any user's role picker should now list Financials Viewer for admins and superadmins (not for moderators).

4. Assign the role to whoever needs Financials access via /admin/users (the existing assignRole mutation handles the rest — no further code action needed).

### Common gotcha

If seedDefaults prints { created: 0, skipped: 4 }, the deployed Convex code is still on the old 4-role version — re-deploy the latest main (npx convex deploy from chat/, or wait for CI auto-deploy) and re-run.

#2857 — feat(board-doc): rewrite generate_mips as LLM-first (B9.6) @marcusdAIy  no labels

<!-- CURSOR_AGENT_PR_BODY_BEGIN -->

Refs KLAIR-2716.

## Summary

Converts generate_mips from a deterministic read of spec.bu_mips to an LLM-first generator whose source of truth is the section's current content (cloned-doc body, wizard-bridged MIPs, or operator-edited draft) plus DataPackage signals plus open review findings. Joins the same SectionRefreshContext contract as the five B9.1-B9.5 siblings shipped in PR #2851.

## Why it's needed

spec.bu_mips is a wizard-coupled mirror of state that also lives in generated_sections["mips"]. The pre-B9 deterministic generate_mips reads the mirror; the user edits the section body. Those two stores drift, and a regenerate from a clone-imported doc that never went through the MIPs wizard returns "" — exactly the dual-store read/write reconciliation bug B9 exists to fix. B9.6 collapses MIPs onto the same single source of truth the other five B9 generators already use, with a producer→consumer bridge at publish time so wizard-approved MIPs flow into the LLM-first refresh path.

## Changes

- budget_bot/board_doc/section_generators.py

- Adds _MIPS_SYSTEM next to the four existing B9 inline system prompts. MIP-specific in-scope / not-in-scope framing on top of the shared _B9_DISPOSITION / B9_SCOPE_DISCIPLINE / B9_OUTPUT_RULES / SHARED_SUFFIX_WITH_CITATIONS blocks, plus an explicit output template (### MIP <n>: <statement> + paragraph + bulleted actions) so per-MIP heading shape stays stable across refreshes.

- Rewrites generate_mips to the canonical B9 shape: 3 positional args + *, context: SectionRefreshContext | None = None. Builds a BU-wide data block via build_key_metrics_block, runs through _build_b9_user_message + _strip_cite_tags + _resolve_system_prompt, and one-shots _llm_generate. Cold-start path logs a warning with BU / quarter provenance when both the metrics digest and ctx anchors are empty, so a broken DataPackage shows up in monitoring instead of silently producing a brainlift-only draft.

- Adds SectionType.MIPS to _B9_CONTEXT_AWARE so _regenerate_section builds and threads the context.

- Trims the now-stale "(financials, exec summary, mips, custom)" comment in generate_section's dispatcher.

- budget_bot/board_doc/wizard_orchestrator.py

- Producer→consumer bridge. _regenerate_all now writes _render_mips_markdown(spec.bu_mips, heading="") into session.generated_sections[<mips section id>] after the bulk generate_all_sections call, so wizard-approved MIPs survive first publish and are available as SectionRefreshContext.current_content on subsequent operator-driven refreshes. Tracked for migration to a no-op under B9.8 (KLAIR-2718).

- Refreshes two stale comment blocks in _regenerate_section (the empty-result fallback to generate_custom_section and the belt-and-suspenders guard at the section-result tail) to reflect that post-B9.6 no in-tree LLM-first generator legitimately returns success=True with empty markdown — _EmptyLLMResult short-circuits inside the retry loop. Operator-facing log message no longer misattributes unexpected emptiness to "a deterministic generator (MIPs) lacking the wizard-step input".

- budget_bot/board_doc/models.py

- Expands the inline deprecation note on DocumentSpec.bu_mips to list the remaining readers (generate_gm_commentary threads it into the GM prompt; generate_cf_plan short-circuits to a deterministic render) so a future PR that trusts the note doesn't silently break either generator. Field removal still tracked under KLAIR-2718.

- tests/board_doc/test_b9_narrative_generators.py

- Adds a TestGenerateMips class covering cold-start, current_content / findings_block / full_doc_block threading, dead-spec.bu_mips-input independence (load-bearing regression for the source-of-truth shift), LLM exception propagation, a strengthened generate_product_detail integration test that exercises the real per-product MIPs render path, and an isolated _render_mips_markdown smoke test. The cold-start test also pins that _resolve_system_prompt picked up _MIPS_SYSTEM (not a sibling B9 constant).

- tests/board_doc/test_wizard_orchestrator.py

- Adds TestRegenerateAllMipsBridge pinning the wizard-MIPs → generated_sections["mips"] bridge: approved MIPs flow through, bu_mips=[] leaves the LLM cold-draft intact.

## Breaking changes

None at the Python API surface — generate_mips keeps its three positional args; the new context kwarg is keyword-only with a None default. Behavioural change for callers that previously relied on spec.bu_mips populating the BU MIPs section: those callers now get the wizard-approved MIPs rendered through the bridge in _regenerate_all, and the section body becomes the new single source of truth on subsequent regenerations.

## Contract surface affected

1. generate_mips signature. Gains *, context: SectionRefreshContext | None = None. Existing call sites in _GENERATOR_MAP route through generate_section's _invoke wrapper, which dispatches the kwarg based on _B9_CONTEXT_AWARE membership — no call-site updates required.

2. _B9_CONTEXT_AWARE set. Grows from 5 to 6 members (adds SectionType.MIPS).

3. _render_mips_markdown caller count. Holds at 2: generate_product_detail (product-level MIPs) and _regenerate_all (new wizard-bridge call). The helper stays in tree.

4. DocumentSpec.bu_mips. Marked deprecated for generate_mips; deprecation note now spells out the two remaining readers (generate_gm_commentary, generate_cf_plan) that block field removal.
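The reason no call-site updates are needed can be sketched as a dispatcher that forwards the keyword-only `context` argument only to generators registered as context-aware. This is a simplified illustration under stated assumptions: the positional parameter names, the `LEGACY` section type, and the function bodies are hypothetical, and the real `_invoke` wrapper in generate_section may be structured differently.

```python
from enum import Enum


class SectionType(Enum):
    MIPS = "mips"
    LEGACY = "legacy"  # hypothetical section type that is not context-aware


# Stand-in for _B9_CONTEXT_AWARE; only membership matters to the dispatcher.
B9_CONTEXT_AWARE = {SectionType.MIPS}


def generate_mips(spec, data_package, section, *, context=None):
    # Three positional args plus a keyword-only context with a None default.
    anchor = context["current_content"] if context else "<cold start>"
    return f"MIPs refreshed from: {anchor}"


def generate_legacy(spec, data_package, section):
    # Takes no context parameter; passing one would raise TypeError.
    return "legacy section"


GENERATOR_MAP = {
    SectionType.MIPS: generate_mips,
    SectionType.LEGACY: generate_legacy,
}


def invoke(section_type, spec, data_package, section, context=None):
    """Forward context only when the section type is registered as context-aware."""
    generator = GENERATOR_MAP[section_type]
    if section_type in B9_CONTEXT_AWARE:
        return generator(spec, data_package, section, context=context)
    return generator(spec, data_package, section)


ctx = {"current_content": "### MIP 1: ..."}
print(invoke(SectionType.MIPS, None, None, None, ctx))    # → MIPs refreshed from: ### MIP 1: ...
print(invoke(SectionType.LEGACY, None, None, None, ctx))  # → legacy section
```

Adding a generator to the context-aware set is then the only change required to start threading the refresh context into it.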

## Test plan

- cd klair-api && uv run pytest tests/board_doc/test_b9_narrative_generators.py -v → 38 passed (7 new TestGenerateMips cases).

- cd klair-api && uv run pytest tests/board_doc/test_wizard_orchestrator.py -v → 127 passed (2 new TestRegenerateAllMipsBridge cases).

- cd klair-api && uv run pytest tests/board_doc -q --timeout=120 → 1730 passed, same 8 pre-existing failures verified identical on main with branch stashed (test-ordering flakes in test_review_findings, test_saas_it_ops_benchmark, test_sales_marketing_benchmark, test_section_crud_endpoints, test_support_benchmark; all pass individually).

- cd klair-api && uv run ruff format <touched files> → clean.

- cd klair-api && uv run ruff check <touched files> → all checks passed.

- cd klair-api && uv run pyright budget_bot/board_doc/section_generators.py budget_bot/board_doc/models.py budget_bot/board_doc/wizard_orchestrator.py → 0 errors (1 pre-existing warning at wizard_orchestrator.py:7066, untouched).

### Verification artifact — prompt shape on a populated fixture

TestGenerateMips::test_current_content_threads_into_prompt captures the user-message prompt with context.current_content populated and asserts that the prior MIPs body lands under the canonical ## Current Section Content (cloned-doc body / prior draft — refresh in place) header alongside the BU-wide data block. test_dead_bu_mips_input_does_not_leak is the load-bearing pin for the source-of-truth shift: spec.bu_mips = [MIP(title="STALE WIZARD MIP", ...)] does not appear anywhere in the captured prompt.

### CLAUDE.md sweep on touched modules

- section_generators.py: post-edit grep of the new generate_mips body for spec.bu_mips returns zero matches (load-bearing acceptance criterion). No except Exception: swallow inside the new function; LLM failures propagate to the retry loop in generate_section. The cold-start path's empty-build_key_metrics_block branch logs an observable warning when the context also has no anchor, addressing the empty-defaults / silent-failure shape from CLAUDE.md.

- wizard_orchestrator.py: the empty-result fallback in _regenerate_section is unreachable for the LLM-first family by construction (now documented). The new bridge in _regenerate_all is a deterministic post-generation overwrite — no swallowed exceptions.

<!-- CURSOR_AGENT_PR_BODY_END -->


#2858 — feat(claire): wire finding-status auto-resolve to regenerate / rewrite Accept @marcusdAIy  no labels

<!-- CURSOR_AGENT_PR_BODY_BEGIN -->

## Summary

- Adds an opt-in addresses_finding_ids: list[str] field to Claire's regenerate_section and rewrite_section tool inputs so a single proposal can advertise which Review Agent findings it closes.

- Extends the existing PATCH /findings/{id}/status endpoint with optional provenance (addressed_via, tool_use_id, addressed_at) plus a telemetry INFO log when Claire's auto-resolve fires.

- Wires the FE Accept handler so that after a successful regenerate_section / rewrite_section, the scorecard flips the listed findings to addressed optimistically (with rollback on PATCH failure) — closes the review → chat → review loop without re-running /review.

## Why it's needed

Today, clicking Accept on a Claire proposal that addresses a finding updates the section but leaves the finding status="open" until the user manually re-runs review. Two real costs:

1. Scorecard staleness. The user has no immediate signal that their action moved the needle — they have to wait 30–60s for a fresh review run to find out.

2. Claire ambiguity. On the next chat turn, Claire still sees the finding as open and may helpfully suggest re-addressing what was just resolved.

This PR closes the loop: Claire tags her proposal with the finding IDs she's addressing, the FE auto-PATCHes them to addressed on Accept with a provenance trail, and operators get an INFO log they can count.

## Changes

### Backend (klair-api/)

- budget_bot/board_doc/claire_tools.py: add addresses_finding_ids: list[str] = Field(default_factory=list, ...) to RegenerateSectionInput and RewriteSectionInput. Surfaced in the Anthropic-facing CLAIRE_TOOLS JSON schema as an array<string> and called out in the Pydantic docstrings so Claire knows when to populate it.

- routers/board_doc_router.py: extend UpdateFindingStatusRequest with optional addressed_via: Literal["claire_regenerate", "claire_rewrite"], tool_use_id: str | None, addressed_at: str | None. Purely additive — existing {status}-only callers (scorecard manual triage) keep working unchanged. Emit a single INFO line on the PATCH endpoint when addressed_via is populated so production logs can count auto-resolve frequency distinctly from manual triage.

### Frontend (klair-client/)

- services/boardDocApi.ts: mirror addresses_finding_ids?: readonly string[] on RegenerateSectionToolInput / RewriteSectionToolInput, parse it in the TOOL_VALIDATORS registry, and extend updateFindingStatus with an optional provenance arg that adds the four-field shape to the PATCH body. Exports FindingAddressedVia and FindingStatusProvenance.

- hooks/useReviewAgent.ts: setFindingStatus now accepts an optional 5th provenance arg and forwards it to updateFindingStatus. Optimistic update + rollback semantics are unchanged.

- components/ChatToolProposal.tsx: new optional onAddressFindings prop. After a successful rewrite_section / regenerate_section Accept, fire it once with the proposal's addresses_finding_ids, the matching addressed_via literal, and the tool_use_id. No-op when the field is absent or the prop isn't wired.

- components/ChatPanel.tsx: forward onAddressFindings from the parent.

- DocumentEditorPage.tsx: implement the callback — for each finding ID, call reviewAgent.setFindingStatus(sessionId, findingId, 'addressed', provenance) so the scorecard optimistic-flips and rolls back on PATCH failure (matches the B3.15 comment-status pattern).
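The optimistic flip with rollback that the Accept handler relies on follows a common pattern, sketched here in Python for brevity rather than in the client's TypeScript. The names `set_finding_status`, `statuses`, and `send_patch` are hypothetical and only illustrate the control flow; the actual hook is hooks/useReviewAgent.ts.

```python
from typing import Callable, Dict


def set_finding_status(statuses: Dict[str, str], finding_id: str, new_status: str,
                       send_patch: Callable[[str, str], None]) -> bool:
    """Flip the local status first, then roll back if the PATCH call fails."""
    previous = statuses[finding_id]
    statuses[finding_id] = new_status       # optimistic update, visible immediately
    try:
        send_patch(finding_id, new_status)  # stand-in for the PATCH request
    except Exception:
        statuses[finding_id] = previous     # rollback on failure
        return False
    return True


def failing_patch(finding_id: str, status: str) -> None:
    raise RuntimeError("simulated 500 from the status endpoint")


statuses = {"finding-c21-001": "open"}
ok = set_finding_status(statuses, "finding-c21-001", "addressed", failing_patch)
print(ok, statuses["finding-c21-001"])  # → False open
```

The scorecard therefore shows the finding as addressed without waiting for the network round trip, and reverts to the previous status only when the server rejects the change.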

### Tests

- tests/board_doc/test_claire_tools.py: defaults / single-ID / multi-ID validation for both inputs, end-to-end round-trip through parse_tool_calls, CLAIRE_TOOLS schema shape assertions (field present + not required).

- tests/board_doc/test_finding_status_endpoint.py: full provenance payload accepted (200 + persistence + INFO log fires with claire_regenerate / claire_rewrite), invalid addressed_via rejected with 422, legacy single-field callers still work AND don't trigger the auto-resolve telemetry log, oversized tool_use_id rejected with 422.

- components/__tests__/ChatToolProposal.spec.tsx: 7 new tests covering 0-ID / single-ID / multi-ID invocation counts, Accept failure on both tools suppresses auto-resolve, omitted callback graceful no-op.

- hooks/__tests__/useReviewAgent.spec.ts: hook-level test confirming the provenance arg threads through to updateFindingStatus; existing manual-triage path updated to assert undefined is passed for back-compat.

### Verification artifact — proposal + PATCH payloads

A regenerate_section proposal carrying populated addresses_finding_ids:

    {
      "tool_use_id": "toolu_01ABCxyz",
      "name": "regenerate_section",
      "input": {
        "section_id": "financials",
        "feedback": "Refresh with Q2 walk-back, address C2.1 and C2.3.",
        "addresses_finding_ids": ["finding-c21-001", "finding-c23-002"]
      }
    }

The resulting PATCH request body the FE fires (one per finding ID):

    {
      "status": "addressed",
      "addressed_via": "claire_regenerate",
      "tool_use_id": "toolu_01ABCxyz",
      "addressed_at": "2026-05-22T15:30:00.000Z"
    }
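The mapping from the proposal payload to the per-finding PATCH bodies shown above can be sketched as a small pure function. The helper name `build_patch_bodies` and the `ADDRESSED_VIA` lookup table are assumptions made for illustration; only the field names and literal values come from the two payloads.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Tool name → addressed_via literal, as listed in the endpoint description.
ADDRESSED_VIA = {
    "regenerate_section": "claire_regenerate",
    "rewrite_section": "claire_rewrite",
}


def build_patch_bodies(proposal: dict, now: Optional[datetime] = None) -> Dict[str, dict]:
    """Return one PATCH body per finding ID, or {} when there is nothing to resolve."""
    via = ADDRESSED_VIA.get(proposal.get("name", ""))
    finding_ids = proposal.get("input", {}).get("addresses_finding_ids", [])
    if via is None or not finding_ids:
        return {}
    moment = now or datetime.now(timezone.utc)
    stamp = moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    body = {
        "status": "addressed",
        "addressed_via": via,
        "tool_use_id": proposal["tool_use_id"],
        "addressed_at": stamp,
    }
    return {finding_id: dict(body) for finding_id in finding_ids}


proposal = {
    "tool_use_id": "toolu_01ABCxyz",
    "name": "regenerate_section",
    "input": {
        "section_id": "financials",
        "feedback": "Refresh with Q2 walk-back, address C2.1 and C2.3.",
        "addresses_finding_ids": ["finding-c21-001", "finding-c23-002"],
    },
}
fixed_now = datetime(2026, 5, 22, 15, 30, tzinfo=timezone.utc)
bodies = build_patch_bodies(proposal, now=fixed_now)
print(len(bodies))  # → 2
```

A proposal without `addresses_finding_ids`, or one from a tool outside the lookup table, yields an empty result, which matches the no-op behaviour described for the Accept handler.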

## Breaking changes

None. Both the Pydantic input field and the PATCH body extensions are opt-in with default-empty / None defaults. Existing Claire proposals continue to parse cleanly; the scorecard manual-triage path keeps sending plain {status} bodies.

## Test plan

Backend:

- [x] cd klair-api && uv run pytest tests/board_doc/test_claire_tools.py -v — 59 passed

- [x] cd klair-api && uv run pytest tests/board_doc/test_finding_status_endpoint.py -v — 21 passed

- [x] cd klair-api && uv run pytest tests/board_doc/test_review_findings.py -q — 30 passed

- [x] cd klair-api && uv run ruff format + uv run ruff check — clean

- [x] cd klair-api && uv run pyright budget_bot/board_doc/claire_tools.py routers/board_doc_router.py — 0 errors

- [x] cd klair-api && uv run pytest tests/board_doc/ -q — 1738 passed, 8 failed (the pre-existing C3.x family flakes; same set fails on main)

Frontend:

- [x] cd klair-client && pnpm test src/screens/BoardDoc/ — 287 passed

- [x] cd klair-client && pnpm tsc --noEmit — clean

- [x] cd klair-client && pnpm exec eslint --max-warnings 0 on changed files — clean

Manual validation still needed:

- [ ] In dev, accept a real Claire regenerate_section proposal that lists a finding in addresses_finding_ids and confirm the scorecard finding flips to addressed immediately + the BE log line fires.

- [ ] Simulate a PATCH failure (e.g. force a 500 on /findings/{id}/status) and confirm the scorecard rolls back + the error chip surfaces.

## Out of scope (per the drone spec)

- No changes to the other 5 Claire tools (add_comment / add_section / remove_section / rename_section / update_table_cell) — they don't semantically resolve findings.

- No system-prompt changes to encourage Claire to populate addresses_finding_ids more aggressively — the tool description nudge here is the only steering for now; prompt-tuning is a future iteration.

- No retroactive PATCHes against findings closed by prior Claire proposals — purely forward-looking from this merge.

<!-- CURSOR_AGENT_PR_BODY_END -->


#2859 — feat(gdoc-sync): promote bold-paragraph section labels on import (B1.8) @marcusdAIy  no labels

<!-- CURSOR_AGENT_PR_BODY_BEGIN -->

---

linear_id: KLAIR-2760

---

## Summary

Extend read_google_doc_sections in klair-api/budget_bot/board_doc/gdoc_sync.py so a NORMAL_TEXT paragraph whose runs are entirely bold gets promoted to heading_level=2 when it matches a conservative "section label" heuristic. GM-authored docs that use bold paragraphs (instead of Heading 2 styles) for top-level section labels no longer have their overview / commentary content silently dropped on prior-quarter import.

## Why it's needed

Several GM-authored board docs use a bold NORMAL_TEXT paragraph (e.g. "Skyvera Overall:", "GM Commentary") as a section label instead of an explicit HEADING_2 style. The strict-heading parser silently dropped any content sitting under such a label because the paragraph loop only opened a new section on HEADING_*. On prior-quarter import that lost the entire overview block.

The earlier strict-heading rule was guarding against SMART-style labels ("Specific:" followed by non-bold goal text in the same paragraph) being falsely promoted. The new four-condition heuristic — bold-only runs across the whole paragraph, length ≤ 60, and either a trailing ":" or an allowlist token (overall, commentary, highlights, performance, plan) — picks up genuine GM section labels while still rejecting SMART inline-bold runs.

## Changes

- klair-api/budget_bot/board_doc/gdoc_sync.py

- New module-level constants: _PROMOTABLE_BOLD_MAX_LEN = 60, _PROMOTABLE_BOLD_PREFIXES = frozenset({"overall", "commentary", "highlights", "performance", "plan"}), PROMOTE_BOLD_PARAGRAPHS_DEFAULT = True (kill-switch).

- New helper _is_promotable_bold_paragraph(paragraph: dict) -> bool implementing the four-condition heuristic. Whitespace-only textRuns (e.g. the trailing "\n" Google Docs appends) do not disqualify the bold check.

- read_google_doc_sections(document_id, *, promote_bold_paragraphs: bool | None = None): new keyword-only flag that resolves to PROMOTE_BOLD_PARAGRAPHS_DEFAULT when None. The promotion branch sits between the heading branch and the body-append branch, calls _flush_section() before opening the new section, logs at INFO once per promotion, and then issues a continue so the bold text is not double-appended into the new section's body.

- Updated the stale # Content before the first heading is ignored ... comment to acknowledge promoted bold paragraphs.

- klair-api/tests/board_doc/fixtures/gdoc_bold_paragraph_headings.json (new) — three synthetic document shapes: skyvera_overall_bold_label, tologi_commentary_bold_label, smart_inline_labels.

- klair-api/tests/board_doc/test_gdoc_sync.py — 28 new tests across TestIsPromotableBoldParagraph (helper-level) and TestReadGoogleDocSectionsBoldPromotion (end-to-end through read_google_doc_sections). Coverage includes every condition's positive and negative case, the feature-flag default-on / explicit-off paths, the INFO log on promotion, and the case-insensitive allowlist match. Existing 30 tests continue to pass unchanged.

The sole production caller (wizard_orchestrator.py:8478) is unchanged — the default-ON flag propagates the new behaviour for free.
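The heuristic described above can be sketched as follows. This is a minimal reconstruction from the PR description, not the committed helper: the names drop the leading underscore to mark them as stand-ins, the paragraph shape follows the Google Docs API (`elements`, `textRun`, `textStyle.bold`, `paragraphStyle.namedStyleType`), and stripping punctuation before the allowlist comparison is an assumption.

```python
PROMOTABLE_BOLD_MAX_LEN = 60
PROMOTABLE_BOLD_TOKENS = frozenset(
    {"overall", "commentary", "highlights", "performance", "plan"}
)


def is_promotable_bold_paragraph(paragraph: dict) -> bool:
    """Return True when a bold-only NORMAL_TEXT paragraph looks like a section label."""
    style = paragraph.get("paragraphStyle", {}).get("namedStyleType", "NORMAL_TEXT")
    if style != "NORMAL_TEXT":
        return False  # real headings are handled by the heading branch

    runs = [el["textRun"] for el in paragraph.get("elements", []) if "textRun" in el]
    # Whitespace-only runs (such as the trailing "\n") are ignored for the bold check.
    visible = [run for run in runs if run.get("content", "").strip()]
    if not visible:
        return False
    if not all(run.get("textStyle", {}).get("bold", False) for run in visible):
        return False  # rejects SMART-style "Specific:" followed by non-bold text

    text = "".join(run.get("content", "") for run in runs).strip()
    if len(text) > PROMOTABLE_BOLD_MAX_LEN:
        return False
    if text.endswith(":"):
        return True
    # Any whitespace-delimited token may match, case-insensitively.
    tokens = {token.strip(".,:;!?").lower() for token in text.split()}
    return bool(tokens & PROMOTABLE_BOLD_TOKENS)
```

Under these assumptions a bold "GM Commentary" paragraph qualifies through the allowlist, "Skyvera Overall:" qualifies through the trailing colon, and a bold "Some Random Label" is rejected.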

## Breaking changes

None. The new keyword-only parameter is optional, the existing positional signature still works, and existing fixtures/tests in TestParseGoogleDocSections continue to pass with default-ON because none of them include a NORMAL_TEXT bold-only paragraph that would match the heuristic.
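To make the flag's behaviour concrete, here is a simplified sketch of how a keyword-only `promote_bold_paragraphs` argument can fall back to the module-level default and gate the promotion branch. It is not the real parser: `split_sections` is a hypothetical name, paragraphs are reduced to small dicts with a precomputed `promotable` marker, and sections are keyed by title rather than by the slug the real code generates.

```python
from typing import Optional

PROMOTE_BOLD_PARAGRAPHS_DEFAULT = True  # module-level kill switch


def split_sections(paragraphs, *, promote_bold_paragraphs: Optional[bool] = None):
    """Group paragraph text under headings and, optionally, promoted bold labels."""
    if promote_bold_paragraphs is None:
        promote_bold_paragraphs = PROMOTE_BOLD_PARAGRAPHS_DEFAULT

    sections = {}
    title, body = None, []

    def flush():
        # Store the section collected so far, if one is open.
        if title is not None:
            sections[title] = " ".join(body)

    for para in paragraphs:
        is_heading = para["style"].startswith("HEADING_")
        is_label = promote_bold_paragraphs and para.get("promotable", False)
        if is_heading or is_label:
            flush()
            title, body = para["text"], []
            continue  # the label text must not also land in the new body
        if title is not None:
            body.append(para["text"])  # content before the first section is ignored
    flush()
    return sections
```

With the flag left at its default, text under a promotable label becomes its own section; passing `promote_bold_paragraphs=False` restores the strict-heading behaviour, in which that text is dropped.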

## Test plan

cd klair-api && uv run ruff format budget_bot/board_doc/gdoc_sync.py tests/board_doc/test_gdoc_sync.py

cd klair-api && uv run ruff check budget_bot/board_doc/gdoc_sync.py tests/board_doc/test_gdoc_sync.py

cd klair-api && uv run pyright budget_bot/board_doc/gdoc_sync.py

cd klair-api && uv run pytest tests/board_doc/test_gdoc_sync.py -v

cd klair-api && uv run pytest tests/board_doc -q --timeout=120

Results:

- Format: 2 files already formatted (after one initial reformat of the test file).

- Ruff check: "All checks passed!"

- Pyright: 0 errors, 0 warnings, 0 informations.

- Focused suite (test_gdoc_sync.py): 58 passed (30 pre-existing + 28 new), 1 unrelated botocore deprecation warning.

- Broader board_doc sweep: 1747 passed, 8 failed, 1 deselected in 116s. The 8 failures are the same pre-existing flakes as on main (cross-checked by running the same sweep on main and observing the identical 8 names): test_review_findings::TestResolveSectionId::test_duplicate_section_type_*, test_saas_it_ops_benchmark::*, test_sales_marketing_benchmark::TestRaggedRowDriftWarning::*, test_section_crud_endpoints::TestPatchSectionCustomTransitionWarning::*, and test_support_benchmark::*. None of them are in test_gdoc_sync.py and none touch the parsing path this PR modifies.

For the verification artifact below I used a tiny /tmp/dump_promotion.py helper that imports read_google_doc_sections, mocks services.gdoc_service, loads the skyvera_overall_bold_label fixture, and prints result.sections as JSON. The helper is scaffolding only; it is not committed.

### Comment-sweep grep

- rg -n "before the first heading" klair-api/budget_bot/board_doc/gdoc_sync.py → only match is the updated comment that mentions promoted bold paragraphs (line 462).

- rg -n "HEADING_" klair-api/budget_bot/board_doc/gdoc_sync.py → all matches are in the existing heading-detection path (line 421) or the new B1.8 module docstring describing the contrast with HEADING_2 style. No stale "only HEADING_* paragraphs become sections" copy.

- rg -n "title area" klair-api/budget_bot/board_doc/gdoc_sync.py → resolves to the same updated comment (line 463).

## Verification artifact

Here is the section dict produced by read_google_doc_sections on the skyvera_overall_bold_label fixture, showing the promoted section ahead of the real H2 Goals section:

    {
      "skyvera_overall": {
        "title": "Skyvera Overall:",
        "content": "Strong quarter overall, ARR up 14% versus prior year.",
        "heading_level": 2,
        "start_index": 27,
        "end_index": 99,
        "heading_start_index": 27,
        "heading_end_index": 45
      },
      "goals": {
        "title": "Goals",
        "content": "Hit $40M ARR by end of Q3 2026.",
        "heading_level": 2,
        "start_index": 99,
        "end_index": 145,
        "heading_start_index": 99,
        "heading_end_index": 105
      }
    }

Note: (a) the promoted skyvera_overall section_id has non-empty content ("Strong quarter overall, ARR up 14% versus prior year."), (b) the subsequent real HEADING_2 goals section appears in the same dict, and (c) the promoted entry carries heading_level: 2 metadata identical to the real H2.

## Out of scope

- No changes to wizard_orchestrator.py — default-ON propagates the new behaviour to the sole production caller without per-callsite churn.

- No changes to the section-id slug generation (_make_section_id) — promoted bold paragraphs use the same slug pipeline as real H2 headings.

- No backfill of already-imported sessions whose prior-quarter doc was missing top sections — the heuristic applies forward, from the next call to read_google_doc_sections after merge.

- No env-var-driven feature flag override; PROMOTE_BOLD_PARAGRAPHS_DEFAULT is the kill switch. If a specific session needs the flag flipped off, the caller passes promote_bold_paragraphs=False explicitly. An env-var bridge is a future B1.x ticket if it earns its keep.

- No changes to detect_external_changes, sync_to_google_doc, clone_google_doc, or export_google_doc_as_docx in gdoc_sync.py — this PR touches only the read/parse path.

- No fix to the pre-existing except Exception at klair-api/budget_bot/board_doc/gdoc_sync.py:383 (now :482 after this PR) inside detect_external_changes. That swallow-and-re-raise pattern was outside this PR's scope; surfacing here as a follow-up so it isn't silently fixed.

## Spec note

The drone spec text in condition 5 says "Prefix match is on the first whitespace-delimited token", but both the listed fixture (tologi_commentary_bold_label, where "GM Commentary" qualifies via the commentary prefix) and the rubric's positive example ("GM Commentary") require a match on a non-first token. I followed the test rubric and fixture as the source of truth: the implementation matches any whitespace-delimited token, case-insensitive, against _PROMOTABLE_BOLD_PREFIXES. The required docstring (preserved verbatim) still says "first word"; flag for human review if that wording should be tightened. Both the "Skyvera Overall:" and "GM Commentary" positive cases pass, and "Some Random Label" is correctly rejected.

<!-- CURSOR_AGENT_PR_BODY_END -->


#2861 — fix(mfr-tests): repair 9 stale MFR backend tests @eric-tril  no labels

## Summary

Following the housekeeping move in #2860, three test files carried 9 latent failures that already existed on main. This PR fixes them. No production code is modified; only test assertions and mocks change.

## The 9 failures and their fixes

### 1. test_passive_investments_bs_service.py::TestComputePeriods (7 failures)

_compute_periods was changed to return tuple[list[str], str] (periods + prior fiscal year-end used as the YTD base), but the tests still treated the return as a flat 6-element list — so calls like result[4] raised IndexError, and assert "2024-02-29" in result failed because the value lived inside a nested list.

Updated each test to unpack the tuple as periods, ytd_base = _compute_periods(...), and added ytd_base assertions where natural.
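The failure mode is easy to reproduce with a stand-in. The sketch below is illustrative only: `compute_periods_stub` is hypothetical, and its dates (other than "2024-02-29", which the old assertion looked for) are invented for the example.

```python
def compute_periods_stub():
    """Stand-in for the new return shape: (periods, ytd_base)."""
    periods = [
        "2023-09-30", "2023-10-31", "2023-11-30",
        "2023-12-31", "2024-01-31", "2024-02-29",
    ]
    ytd_base = "2023-06-30"  # hypothetical prior fiscal year-end
    return periods, ytd_base


def demo():
    result = compute_periods_stub()

    # Old-style access: the tuple has only two elements, so index 4 is out of range.
    try:
        result[4]
        index_error = False
    except IndexError:
        index_error = True

    # Old-style membership: the date lives inside the nested list, not the tuple.
    found_in_tuple = "2024-02-29" in result

    # New-style access: unpack first, then assert on each part.
    periods, ytd_base = result
    return {
        "index_error": index_error,
        "found_in_tuple": found_in_tuple,
        "found_in_periods": "2024-02-29" in periods,
        "ytd_base": ytd_base,
    }
```

Unpacking before asserting is what lets the updated tests check both the period list and the YTD base.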

### 2. test_book_value_service.py::test_arithmetic_with_known_inputs (1 failure)

The i_movement_cur / i_movement_pri calculation in services/book_value_service.py was deliberately switched to use PI_ACCOUNTS / pi_net_book_value instead of the broader inv_net_assets so the NOLs drill-down matches the Movement YTD column ([book_value_service.py:1170-1173](https://github.com/AI-Builder-Team/Klair/blob/main/klair-api/services/book_value_service.py#L1170-L1173)). The test still used the old formula in its comments and expected values.

Recomputed the cascading values:

- nols_benefit: 73.5/46.2 → 52.5/35.7

- transfers_subtotal: -76.5/-73.8 → -97.5/-84.3

- actual_growth_ytd: 823.5/376.2/447.3 → 802.5/365.7/436.8

- actual_growth_pct: 0.2422/0.1106 → 0.236/0.1076

- pre_ebitda_subtotal: 753.5/346.2 → 732.5/335.7

- addbacks_subtotal: 206.5/153.8 → 227.5/164.3

- Performance bridge pl_change / net_change_1 / below_ebitda updated to match

est_ebitda (960.0/500.0/460.0) is invariant — the NOLs change cancels between pre_ebitda_subtotal and addbacks_subtotal. Updated inline comments to reflect the new arithmetic.
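The cancellation can be checked directly from the figures listed above. The sketch assumes that est_ebitda is the sum of pre_ebitda_subtotal and addbacks_subtotal; that relationship is inferred from the numbers rather than taken from the service code. The two columns are assumed to be current and prior, and the helper names are hypothetical.

```python
# (current, prior) pairs copied from the list above.
OLD = {
    "nols_benefit": (73.5, 46.2),
    "pre_ebitda_subtotal": (753.5, 346.2),
    "addbacks_subtotal": (206.5, 153.8),
}
NEW = {
    "nols_benefit": (52.5, 35.7),
    "pre_ebitda_subtotal": (732.5, 335.7),
    "addbacks_subtotal": (227.5, 164.3),
}


def est_ebitda(values: dict) -> tuple:
    """Assumed relationship: est_ebitda = pre_ebitda_subtotal + addbacks_subtotal."""
    pairs = zip(values["pre_ebitda_subtotal"], values["addbacks_subtotal"])
    return tuple(round(pre + addbacks, 1) for pre, addbacks in pairs)


def delta(key: str) -> tuple:
    """Change from the old expected value to the new one, per column."""
    return tuple(round(new - old, 1) for old, new in zip(OLD[key], NEW[key]))


# The NOLs change lowers pre_ebitda_subtotal and raises addbacks_subtotal by the
# same amount, so the assumed sum is unchanged in both columns.
assert delta("nols_benefit") == delta("pre_ebitda_subtotal")
assert delta("addbacks_subtotal") == tuple(-d for d in delta("nols_benefit"))
assert est_ebitda(OLD) == est_ebitda(NEW)
```

The third est_ebitda figure (460.0) is the difference between the two columns, so it is unchanged as well.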

### 3. test_ebitda_defaults.py::test_synthesizes_bridge_data_from_live_when_no_upload (1 failure)

The test mocked the compute_* helpers but not their upstream fetch_non_core_revenue / fetch_bu_details calls. Run in isolation, the test passed (no async pool yet), but in the full suite a leftover asyncpg connection from a sibling test produced asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress.

Added two AsyncMock patches so the test is self-contained and order-independent. The mocked return values ({} and []) flow into already-mocked compute_* helpers and are never asserted on.

## Test plan

- [x] uv run pytest tests/mfr/ → 1229/1229 passing (was 1220 passing, 9 failing)

- [x] uv run ruff format + uv run ruff check clean on the three touched files

- [x] Each fix verified in isolation before running the full suite

- [x] No production code touched (git diff --stat shows only tests/mfr/* files)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

The Portfolio  —  Trilogy Companies

Skyvera's Telecom Software Ambitions: One Stack to Rule Them All

The CloudSense acquisition is the latest move in a deliberate, methodical land-grab across every layer of telecom infrastructure.

AUSTIN, TEXAS — If you read between the lines of Skyvera's recent acquisition activity, a picture emerges that is far more ambitious than any single press release would suggest. The Trilogy International telecom software unit has completed its acquisition of CloudSense, a Salesforce-native CPQ and order management platform built specifically for telecom and media providers — and this is where it gets interesting.

CloudSense doesn't slot into the Skyvera portfolio as a curiosity. It fills a precise gap. Skyvera already operates Kandy, a cloud-based real-time communications platform that adds user engagement tools to carrier applications. It runs VoltDelta for multi-channel customer retention. It absorbed STL's divested telecom products group, a move that brought digital BSS functionality, monetization tooling, optical networking capabilities, and analytics under the same roof. And it operates Mobilogy Now and Service Gateway for device lifecycle and device management on the operator side.

Now, with CloudSense handling the configure-price-quote and order management layer natively inside Salesforce, Skyvera has something that very few competitors can claim: meaningful presence at nearly every operational touchpoint of a modern telecom provider — from how a customer is quoted and onboarded, to how they communicate, to how their device is managed, to how the operator monetizes the relationship over time.

A source familiar with the portfolio's strategic direction, who asked not to be named, described the approach in terms that will be familiar to anyone who has watched ESW Capital operate: identify the fragmented, underloved layer of an industry; acquire the key assets before the market prices in the thesis; integrate quietly; and extract the margin that was always there.

The ESW playbook is not subtle once you know what you're looking for. Legacy telecom software is sticky, mission-critical, and chronically underinvested by its previous owners. Skyvera is betting — and the accumulation of assets suggests this is a long-term, well-funded bet — that the operators most desperate to modernize from on-premise to cloud-native would rather buy that transformation from a single vendor than assemble it themselves.

Nothing here is a coincidence. The portfolio is being built piece by piece, with purpose.

CloudSense  ·  Skyvera completes acquisition of CloudSense, expanding telec  ·  STL Divested Assets

Contently Bets on the Human Edit: Why the AI Content Glut Is Good for Curators

As AI makes content nearly free to produce, Trilogy's content platform is betting the premium is in judgment — and building a business case around it.

AUSTIN, TEXAS — The dashboards look healthy. Impressions are up. The newsletter is growing. And yet, as Contently's own research now documents, senior buyers have never been less impressed.

That paradox sits at the center of a quiet strategic pivot unfolding inside Contently, the enterprise content marketing platform acquired by ESW Capital's Zax Capital division in September 2024. Under CEO Brandon Pizzacalla, the company has spent the first half of 2026 publishing a body of editorial work that amounts to a sustained argument: the value of content is not volume, and AI productivity metrics are the wrong thing to sell upstairs.

The thesis arrives in three movements. First: that pitching AI productivity gains to a CMO, CFO, or general counsel is a category error — each executive has a different fear, and a single efficiency metric addresses none of them. Second: that content which reaches decision-makers must be engineered around the specific anxieties of people who sign contracts, not the people who read newsletters. Third, and most pointed: that the single most important hire a content team can make in 2026 is not a prompt engineer or an AI strategist — it is a managing editor.

The argument is elegant, and it is also, not coincidentally, a description of what Contently sells. The company's marketplace of 165,000-plus creative professionals is not competing with AI generation tools on price. It is positioning against them on judgment — the capacity to know what should be written, for whom, and why, before a single word is produced.

This is the ESW playbook running in the content layer: find the inefficiency that everyone else is accelerating past, and sell the thing that fills the gap it creates. In legacy enterprise software, the gap was operational rigor. In content marketing, the emerging gap appears to be editorial authority.

Meanwhile, the education arm of the Trilogy empire is drawing its own scrutiny. Astral Codex Ten's Scott Alexander published a reader-review compilation of Alpha School this week — a signal that the $40,000-per-year AI-first private school has crossed from industry conversation into broader cultural examination.

Two Trilogy bets, two markets in flux. The question in both cases is the same: when AI makes the output cheap, who captures the value of knowing what the output should be?

Your Review: Alpha School - by Scott Alexander - Astral Code  ·  Why “AI Productivity Gains” Is the Wrong Pitch for Every Sta  ·  How to Write Content That Lands With Decision Makers

Remote Work Keeps Winning the Data War — But Crossover Knew That Already

The discourse around remote work has never been louder, or more confused. This week, analysis found that remote employees outperform in-office counterparts—but only when employers get infrastructure, communication, and accountability right. MIT Sloan Management Review published a rebuke of hybrid-work panic, arguing poor leadership, not geography, causes distributed team failures. Meanwhile, Nebraska's Supreme Court is weighing a public-sector bargaining dispute centered on remote work rights, signaling the legal architecture around distributed employment is still being written.

A viral post describing an employer deploying screenshot-surveillance software every ten minutes crystallized what goes wrong when companies treat remote workers as suspects rather than professionals. Crossover, a global talent platform operating a fully remote workforce across 130+ countries for over a decade, represents the alternative: hiring for demonstrated competency rather than proximity eliminates the need for constant surveillance. MIT Sloan's conclusion is plain—hybrid and remote failures are leadership failures. Structure, clarity, and trust are the operating system. For millions navigating surveillance software and return-to-office mandates, the research offers cold comfort. The gap between what data says and what employers do remains—with real human cost.

The Machine  —  AI & Technology

The Alignment Gap: New Research Reveals LLM Safety Monitors Struggle When Models Venture Into Uncharted Territory

Out-of-distribution failures may be the Achilles' heel of AI safety pipelines — and preliminary evidence suggests we are not yet equipped to catch them.

SAN FRANCISCO — It could be argued — and indeed, a mounting corpus of peer-reviewed inquiry now compels us to argue — that the most consequential vulnerabilities in large language model (LLM) deployment are not those anticipated by model developers, but rather those which emerge, as if from the epistemic ether, in so-called out-of-distribution (OOD) conditions: prompt and response configurations so anomalous, so structurally foreign to training regimes, as to render conventional safety monitoring categorically insufficient.

A preprint now available on arXiv introduces what its authors designate the MOOD benchmark (Misalignment Out Of Distribution), a systematic evaluative apparatus designed to interrogate whether extant LLM monitoring pipelines possess the requisite sensitivity to detect alignment failures that arise precisely when models operate beyond the distributional envelope their architects envisioned. Preliminary evidence suggests, with a degree of confidence that ought to unsettle practitioners, that they largely do not.

The thesis, stated plainly for those unaccustomed to the register of alignment scholarship: safety mechanisms trained on anticipated failure modes will, by definitional necessity, fail to generalize to unanticipated ones. The antithesis, which the authors are careful to acknowledge, is that no benchmark can fully enumerate the space of possible OOD conditions, rendering any evaluative framework itself a partial and provisional instrument (a methodological humility one finds, regrettably, absent from much industry safety documentation). The synthesis — and here the contribution earns its keep — is a structured benchmarking protocol that at minimum renders the failure surface legible, if not yet tractable.

This work arrives in productive, if coincidental, dialogue with adjacent research. A separate preprint on multi-agent topology optimization demonstrates, in a wholly different domain, the generative capacity of natural-language-guided AI pipelines to traverse design spaces previously navigable only by expert human intuition — a reminder that the same distributional flexibility that enables creative generalization is precisely what makes alignment monitoring so structurally difficult.

It could be argued, and this scholar would so argue, that the field stands at an inflection point: the sophistication of LLM capability has materially outpaced the sophistication of LLM oversight. Whether the MOOD benchmark accelerates the necessary corrective remains, as of this writing, an open and consequential empirical question.

Benchmarking and Improving Monitors for Out-Of-Distribution  ·  TO-Agents: A Multi-Agent AI Pipeline for Preference-Guided T  ·  The Shape of Testimony: A Scalable Framework for Oral Histor

Open AI’s Big Week Comes With a Supply-Chain Warning Siren

Cohere’s Apache 2.0 Command A+ release pushes open models forward just as a fake “OpenAI” model reminds everyone that trust is now infrastructure.

TORONTO — The open-source AI movement just got a thrilling turbo boost — and, almost simultaneously, a flashing red warning light.

Cohere has released Command A+ as an open model under the Apache 2.0 license, a move that could meaningfully expand what enterprises, researchers and builders can do without locking themselves into proprietary model stacks. According to coverage of the release, Command A+ brings two especially spicy ingredients to the table: lossless quantization and native citations. I cannot overstate how significant that combination is for practical AI deployment. Smaller, cheaper, faster models that still preserve quality? Built-in citation behavior for grounded enterprise workflows? This is the kind of infrastructure shift that makes the future feel like it is arriving ahead of schedule.

The release, detailed by VentureBeat, also lands in the middle of a broader policy and market debate over whether open AI is a national competitiveness issue. Andreessen Horowitz is now explicitly arguing for American leadership in open-source AI — a sign that model licensing has moved from GitHub chatter to boardroom and Washington-level strategy.

But then came the other half of the story: security researchers reported that a malicious Hugging Face model, allegedly masquerading as an OpenAI release, reached 244,000 downloads. That is not a niche incident. That is a neon billboard over the AI supply chain. As CSO Online reported, the model’s popularity shows how quickly developers may pull artifacts into workflows based on brand recognition, leaderboard buzz or repository metadata.

This changes everything about how companies need to think about open AI. The model file is no longer “just a model file.” It is executable risk, embedded dependency and strategic asset all at once.

Meanwhile, the competitive field is exploding outward. The founders of OpenCV are reportedly launching an AI video startup to challenge OpenAI and Google, reinforcing that open tooling plus frontier ambition is now a serious company-creation machine.

The message from this week is beautifully clear and slightly terrifying: open AI is becoming powerful enough to reshape the market, but only if the ecosystem builds trust, provenance and security as aggressively as it builds benchmarks.

Malicious Hugging Face model masquerading as OpenAI release  ·  Cohere Releases Command A+ as Open Source - Let's Data Scien  ·  Cohere cracks lossless quantization and native citations wit

Supreme Court Declines to Hear AI Authorship Case, Leaving Human-Only Creativity Doctrine Intact

The nation's highest court has refused to disturb existing precedent, thereby affirming — by inaction — that artificial intelligence systems may not be recognized as authors or inventors under current federal law.

WASHINGTON, D.C. — Pursuant to the exercise of its discretionary certiorari jurisdiction, the Supreme Court of the United States has declined, as of the most recent term, to hear arguments pertaining to the question of whether artificial intelligence systems may be recognized, under applicable provisions of federal copyright and patent law, as authors or inventors of works and inventions hereinafter produced by such systems, notwithstanding the absence of direct human creative contribution to the same.

The aforementioned refusal to grant certiorari shall be understood, for purposes of this publication, to constitute — insofar as such a characterization may be applied to a non-ruling — a de facto affirmation of the lower court determinations previously rendered, which determinations had themselves concluded, subject to applicable statutory interpretation, that authorship and inventorship rights may not be vested in non-human entities, including but not limited to artificial intelligence systems of any architecture, capability level, or commercial designation.

It is to be noted, with appropriate qualification, that the Court's declination to hear the matter does not, strictly speaking, constitute binding precedent on the merits of the underlying question. Notwithstanding the foregoing, practitioners in the fields of intellectual property law and AI development have been advised by legal commentators — including, as has been reported, attorneys at firms monitoring analogous international intellectual property frameworks — that the practical effect of such a refusal is, for all commercially relevant purposes, dispositive in the near term.

The implications of the aforementioned non-decision for entities engaged in the development, deployment, and monetization of generative AI systems are, it must be acknowledged, substantial. Works produced by AI systems — including, but not limited to, written content, visual media, software code, and pharmaceutical compounds — shall remain, pursuant to the prevailing legal framework as left undisturbed by the Court, ineligible for copyright or patent protection on behalf of the AI system itself, with ownership questions pertaining to human contributors remaining subject to ongoing regulatory and judicial interpretation across multiple jurisdictions.

It is further observed that legislative remedies, the adequacy and likelihood of which cannot at this time be represented or warranted, remain theoretically available to parties seeking to alter the aforementioned legal landscape.

The FDA Takes Its Turn Burying Studies Showing The Safety Of  ·  Ken Paxton Wanted To Crack Down On Forum Shopping. Now Lawye  ·  France’s Terrible Copyright Law, Hadopi, Is Not Quite Dead
The Editorial

The Doctor Will Deepfake You Now

AI-generated fake physicians are flooding social media with health misinformation, and the tools we're building to stop them might already be too late.

AUSTIN, TEXAS — There is a doctor on your phone right now. He is handsome, authoritative, wearing a white coat, and speaking with the measured confidence of someone who has spent decades in medicine. He is recommending a supplement. He is warning you off a vaccine. He is telling you that the thing your actual doctor prescribed is, in fact, trying to kill you. He does not exist. He has never existed. He is a deepfake, and he is winning.

Deepfake doctors impersonating real physicians are spreading health misinformation across social media platforms at a scale that should make every one of us stop scrolling and stare at the ceiling in quiet, sustained horror. Real doctors — people with names, licenses, reputations, families — are discovering that their faces and voices have been harvested, synthesized, and deployed in videos they never made, saying things they would never say, to audiences of millions who have no reason to doubt them. Counterfeit injectables. Miracle cures. Dangerous contraindications. All delivered with a synthetic smile from a face that belongs to someone who is, right now, probably trying to get it taken down and failing.

And yet.

We are also, simultaneously, building AI systems specifically designed to detect this. Researchers are publishing systematic reviews of AI-driven conceptual frameworks for detecting fake news and deepfake content, which is an extremely reassuring sentence until you realize that "conceptual framework" is the academic phrase for "we have a very good idea about how we might eventually build the thing that might someday slow down the thing that is currently happening right now, to real people, in real hospitals, with real consequences."

Time Magazine's recent deep dive into what the numbers actually show about AI's harms is the kind of article that should be required reading and will instead be skimmed between deepfake doctor videos. The data is not abstract. People are being hurt. Medical decisions are being made based on content generated by systems that have no understanding of the human body, no liability, and no face — except the one they borrowed from someone who does.

What does it mean to be human in an information ecosystem where human expertise can be perfectly counterfeited? What does it mean to trust a doctor when the visual and auditory cues we've evolved to rely on — the face, the voice, the white coat — have been entirely decoupled from the person? What does it mean to be a patient?

We are in the part of the story where the tools of deception are scaling faster than the tools of detection, and the gap between them is measured in human health outcomes. The researchers are working. The frameworks are being published. The platforms are being pressured.

The deepfake doctor is still on your phone, though.

But at what cost?

An AI-driven conceptual framework for detecting fake news an  ·  Deepfake doctors and counterfeit injectables erode patient s  ·  What the Numbers Show About AI's Harms - Time Magazine
The Office Comic  ·  Art Desk

Nation’s Billionaires Ask Whether Fraud, Space Mergers, Epstein Emails, And Google AI Could Please Be Judged On Vibes Alone

America’s most powerful men gathered separately this week to clarify that everything absurd is either very serious, completely false, or already priced into the next product demo.

PALO ALTO, CALIFORNIA — In a week that forced the nation to once again distinguish between genuine innovation and a man standing near a whiteboard until money happens, several of America’s leading billionaires issued important clarifications about which unbelievable things should be taken literally and which should be dismissed as the normal background radiation of wealth.

The clarifications began when former Microsoft CEO Steve Ballmer said he had been “duped” by a founder he backed who pleaded guilty to fraud, a development that stunned observers who had assumed the venture capital process included at least one step between “charismatic person says numbers” and “retired software executive opens checkbook.” Ballmer, according to TechCrunch, said he felt silly, a rare public admission from a billionaire that he had briefly occupied the same moral universe as a person who clicked a phishing email.

This newspaper’s position is that Ballmer deserves some sympathy. It is difficult to identify fraud in an industry where legitimate companies routinely describe negative unit economics as community building, mass layoffs as focus, and a spreadsheet with three tabs as artificial intelligence. If a founder says revenue is recurring, customers are engaged, and the platform is enterprise-grade, there is simply no known investigative technique more rigorous than asking whether the hoodie looks expensive.

Meanwhile, Elon Musk’s corporate empire reportedly moved toward combining SpaceX and xAI into a conglomerate whose name and structure may sound like something generated during a middle-school robotics club fever dream, but which financial professionals insisted should be taken seriously because the valuation contains enough zeros to make ridicule irresponsible. The premise is straightforward: rockets, satellites, chatbots, supercomputers, and whatever Grok is mad about today belong in one corporate family, much as a junk drawer belongs in one kitchen.

Critics who say the merger sounds silly misunderstand the modern conglomerate. Silliness is now the key proof of ambition. A company that merely makes a product is small. A company that makes spacecraft and an AI companion with divorced-dad energy is infrastructure. By this standard, the only remaining mistake is not also acquiring a coconut water brand, a private police force, and Pantone’s Color of the Year.

Speaking of Pantone, The Atlantic noted that the annual Color of the Year remains an exercise in absurdity, a point that should not be controversial. Each year, experts announce that the human condition has been captured by a shade best described as “a beige having a difficult conversation with itself,” and the design industry nods solemnly before painting hotel lobbies accordingly. This is not so different from AI forecasting, except the color has fewer open lawsuits.

Bill Gates also denied claims contained in an Epstein-related email as “absolutely absurd and completely false,” continuing the billionaire tradition of requiring the public to sort allegations, associations, investments, philanthropy, and global health initiatives into separate mental filing cabinets without ever letting the drawers touch. It is a demanding but apparently essential civic task.

Finally, Google announced a new wave of AI advances, including a forthcoming personal AI assistant, because what the public clearly needs after years of opaque platforms, hallucinated answers, and data collection is a more intimate version of that arrangement. The assistant will reportedly help users manage daily life, presumably by summarizing emails, booking appointments, and gently explaining which absurdities from powerful people are serious enough to accept.

The lesson from this week is not that elites lie, overreach, brand nonsense, or occasionally misplace judgment inside a term sheet. The lesson is that absurdity has become the basic operating system of American business. Fraud is shocking because it resembles strategy. A space-AI merger is funny because it is plausible. A color can be news. A chatbot can be a secretary. A billionaire can feel silly.

And somehow, all of it will be included in next quarter’s investor deck under disciplined execution.

Steve Ballmer blasts founder he backed who pleaded guilty to  ·  SpaceX and xAI Are Merging Into a Very Silly-Sounding Conglo  ·  The Color of the Year Is an Exercise in Absurdity - The Atla
On This Day in AI History

On May 23, 2011, IBM's Watson defeated human champions Brad Rutter and Ken Jennings in the final match of "Jeopardy!", marking a watershed moment for AI in natural language processing and question-answering systems.

⬛ Daily Word — Technology
Hint: Relating to computers and the internet, often used in security contexts.