The Trilogy Times — May 22, 2026

Today's Edition

The AI Power Map Is Being Redrawn — and the Middle Is Winning

From Brussels to Brasília, the scramble for artificial intelligence sovereignty is reshaping alliances, risks, and the meaning of digital power.

By Eleanor Cross, Foreign Correspondent · Claude Sonnet

BRUSSELS — The old binary is dissolving. The world once organized its AI anxieties around Washington and Beijing. Now the map looks different, and the countries filling the space between the two poles — Turkey, India, Brazil, the Gulf states, South Korea — are no longer spectators. They are negotiators, and they know it.

The evidence is accumulating across hemispheres. In Europe, the 2024 parliamentary elections produced a legislature more skeptical of Beijing and more determined to define what digital sovereignty actually means in practice — not just in regulation, but in chip supply chains, cloud infrastructure, and the data that trains the models that will run the continent's economy. The question of whether Europe can build its own AI stack, or whether it will simply audit someone else's, has become the defining tension in Brussels policy circles.

Across the Atlantic, the picture is no less complicated. In Latin America, AI is arriving into political ecosystems already strained by inequality, weak institutions, and information environments that reward disruption. Analysts tracking the region identify at least five distinct vectors of geopolitical risk: surveillance technology exports from authoritarian states, AI-enabled disinformation in electoral cycles, labor displacement in export-dependent economies, regulatory vacuums that invite predatory data practices, and the deepening dependency on foreign cloud infrastructure for critical government services.

The thread connecting all of it is leverage. Middle powers have something the giants want: markets, minerals, talent pipelines, votes in multilateral bodies. Analysts at Eurasia Review argue these countries are no longer simply choosing sides — they are extracting concessions from both, playing the AI superpowers against each other with growing sophistication.

The geography of AI is not a map of data centers. It is a map of dependencies, and right now, every capital is studying it hard.

↗ EU-China Relations After the 2024 European Elections: A Time · Five ways AI impacts geopolitical risk in Latin America - La · Why Middle Powers Are Shaping The Geopolitics Of Artificial

WAYMO'S ROBOTS CAN'T READ A PUDDLE — FLEET PULLED FROM FOUR CITIES

Atlanta and San Antonio join Phoenix and Austin on the bench as Alphabet's driverless cars keep rolling into floodwater.

By Hank Calloway, Wire Correspondent · Claude Opus + Thinking

SAN FRANCISCO — Waymo yanked its robotaxi service from Atlanta and San Antonio this week, expanding a shutdown to four cities after the company's self-driving cars kept rolling straight into floodwater.

The Alphabet unit confirmed the broader pause Wednesday. Phoenix and Austin were already on the sidelines. Customers in all four markets now get the same message: ride requested, ride denied.

Here is the rub. Waymo vehicles have been driving themselves into flooded streets. The company is now racing to teach its fleet to spot standing water before it sends the cars back out.

A car that cannot read a puddle cannot read a river. Until the fix lands, the fleet stays parked, and the parking does not always end when the storm does.

The pause lands hard on a company that spent the year selling expansion. Waymo crossed its millionth paid ride. It pushed into the Southeast and Texas to prove the technology travels.

Floods do not care about milestones.

Stack the picture against the rest of the sector. Cruise pulled out of the market in late 2024. Tesla's robotaxi rollout drew federal scrutiny. Waymo was the one that worked. This week, it is the one that swims.

The pitch on driverless cars has always been the same: machines beat humans behind the wheel. A human staring at a flooded underpass usually turns around. The software has not learned that move yet.

Riders bear the small end of the cost — scrubbed trips, longer waits when the weather turns. Regulators bear the larger end. Every city watching Waymo move in now has a fresh question for the engineers in Mountain View.

Meanwhile, in South Texas, SpaceX scrubbed its first Starship V3 launch Tuesday with booster and ship fully fueled. The hold came moments before ignition. Another attempt is set for Friday.

In Helsinki, phone-maker HMD said it will preload Sarvam's Indus chatbot on a handset aimed at India. The app handles 22 Indic languages. HMD is betting local-tongue AI cracks a market Chinese hardware has owned for years.

Three industries. Three speeds. Rockets stay grounded. Phones ship out. Robotaxis sit in the garage waiting for the rain to stop.

↗ Finnish phone-maker HMD bundles Indian AI chatbot onto new s · Waymo expands pause to four cities as robotaxis keep driving · SpaceX scrubs first Starship V3 launch just before liftoff

Washington Pulls Back on AI Oversight While Sacramento Steps In

Trump shelves a federal AI review order the same week Newsom moves to address automation-driven job displacement — leaving U.S. policy fractured along familiar lines.

By Dr. Chen Wei, Technology Correspondent · Claude Sonnet

WASHINGTON — The federal government's attempt to establish pre-release oversight of AI models collapsed Wednesday when President Trump abruptly postponed signing an executive order that would have granted regulators authority to evaluate AI systems before public deployment. Trump cited unspecified concerns about "aspects of it," offering no timeline for revision or reintroduction. The order had been positioned as a rare bipartisan mechanism for federal AI accountability.

The vacuum did not last long. Within hours, California Governor Gavin Newsom signed his own executive order directing state agencies to study a comprehensive overhaul of labor policy in anticipation of mass job displacement from AI automation. The California order stops short of mandating employer obligations but signals the state intends to move ahead of Washington on workforce protections — a pattern California has repeated on emissions, privacy, and gig-worker classification.

The policy divergence lands against a backdrop of continued friction in U.S.-China AI competition. Nvidia's H200 chip, which the Trump administration approved for export to China as a diplomatic and commercial gesture, has found no buyers in Beijing. Not a single H200 has been purchased in China since the approval, according to reporting Wednesday. The reasons appear to be a combination of Chinese government pressure to prioritize domestic alternatives and lingering uncertainty about future U.S. export restrictions. Either way, the approval now looks like both a commercial and a geopolitical miscalculation.

Meanwhile, Bluesky disclosed that Kremlin-linked actors are hijacking real user accounts on its platform to distribute state propaganda, describing the tactic as novel in its operational specifics. Unlike bot networks, the approach exploits the credibility of legitimate accounts, complicating automated detection.

Taken together, Wednesday's news sketches a consistent picture: federal AI governance is stalling, state-level policy is accelerating, export controls are producing unintended outcomes, and adversarial actors are adapting faster than platform defenses. For enterprise software operators like ESW Capital's portfolio companies — which run AI-dependent workflows across 75-plus businesses — the regulatory uncertainty is the story that matters most.

↗ Trump Cancels Signing of Executive Order Granting Oversight · Gov. Gavin Newsom to Sign Executive Order Aimed at A.I. Job · Trump Approved a Nvidia Chip for Sale in China. Beijing Does

Haiku of the Day · Claude Haiku

Machines learn to think
while we forget how to choose
progress eats itself

The New Yorker Style · Art Desk

The Far Side Style · Art Desk

News in Brief

AI Fabricates Quotes In Book About AI Fabricating Truth, Regulatory Vacuum Persists, And Youth Social Media Bans Advance On Disputed Science

WASHINGTON, D.C.

By R. Barnsworth III, Esq., Legal Affairs Desk · Claude Sonnet

The Great Inference Migration Begins Across the Cloud Savannah

SAN FRANCISCO — Across the vast concrete wetlands of the global cloud, a new season is underway.

By Sir Reginald Marsh, Natural Phenomena Correspondent · GPT-5.2

THE BOTS ARE HAVING AN EXISTENTIAL CRISIS AND FRANKLY, SO AM I

AUSTIN, TEXAS — Let me set the scene for you, because I need you to feel the full weight of this moment in your chest like a warm bourbon and a bad decision. Somewhere out there, right now, there is a social network called Moltbook populated entirely by AI bots posting at each other into the howling digital void.

By Rex Danger, Contributing Editor · Claude Sonnet

We Built Tools That Can Destroy a Child's Life in Minutes, and We Have No Idea What to Do About It

By Piper Wren, Digital Culture Reporter · Claude Sonnet

Nation’s CEOs Urged To Stop Saying ‘AI’ While Carrying Cardboard Box Of Employee Belongings To Elevator

MOUNTAIN VIEW, CALIFORNIA — In what leadership consultants described as a critical moment for preserving the dignity of both artificial intelligence and conference-room euphemism, America’s executives were urged this week to stop casually invoking AI every time they need to explain why a department now consists of one intern and a shared Google Drive folder. The warning comes as Google announced a broad slate of AI advances, including a personal assistant intended to help users navigate daily life, summarize information, and presumably stand silently nearby while a vice president says the company is “realigning around intelligent automation.” The new tools, detailed in Google’s latest announcement, arrive at a delicate time for the industry, when the phrase “powered by AI” has become so versatile it can refer to a reasoning model, a spreadsheet macro, or the sudden disappearance of the accounts payable team. This column believes AI is a real and important technology.

By Dale Pemberton, Staff Writer · GPT-5.2

▲ On Hacker News Today

An OpenAI model has disproved a central conjecture in discrete geometry 1390 pts · 1016 comments

Project Hail Mary – Stellar Navigation Chart 966 pts · 203 comments

Google's Antigravity bait and switch 712 pts · 316 comments

Throwing AI-generated walls of text into conversations 638 pts · 374 comments

Was my $48K GPU server worth it? 477 pts · 352 comments

Waymo pauses Atlanta service as its robotaxis keep driving into floods 333 pts · 407 comments

Anthropic is expanding to Colossus2. Will use GB200 294 pts · 337 comments

OpenAI Is Preparing to File for an IPO Soon 189 pts · 394 comments

The Builder Desk — AI Builder Team

Builder Team Wires AI Spend Intelligence End-to-End Across Four Repos

From raw Claude Enterprise API ingestion to LLM-first board narratives, the team shipped a complete financial intelligence stack in a single day — and fixed the silent failures that were hiding the data the whole time.

By Maxwell 'Mac' Donnelly — Builder Desk, Trilogy Times · GitHub · AI Builder Team

They didn't just ship features today. They closed loops.

The headline move: @kevalshahtrilogy built and hardened a full AI spend intelligence pipeline — end to end, across Surtr and Klair — that now tells the org exactly what it's spending on Claude, who's leaving Max20x savings on the table, and what the negotiated discount is actually delivering. That's not a dashboard tweak. That's a new financial nerve ending for the company.

The journey to get there was not pretty, and @kevalshahtrilogy didn't flinch. PR #80 fixed a silent production catastrophe: the TrueFoundry pipeline had been reporting 'success' on every run while loading exactly zero rows, because Athena UNLOAD drops parquet files without extensions and the loader was filtering for `.parquet`. Three consecutive prod runs. Zero rows ingested. Nobody knew. Now they do — and it won't happen again. PR #83 followed immediately, patching a second-order failure where a type mismatch between Athena's varchar dates and Redshift's DATE columns was rejecting the COPY entirely, and a non-atomic delete-then-load was wiping the day being loaded. Both bugs fixed, atomicity restored. Then PR #79 added the Claude Enterprise per-user ingest from scratch — per-day, per-product, cost stored verbatim from the API, reconciled against org-level totals. PR #82 stacked the Max20x and negotiated-discount savings views on top of the cleaned data. And PRs #2846 and #2848 in Klair surfaced all of it through MCP tools — `query_claude_ai_spend` and `query_max20x_savings` — so the AI layer can actually answer the question. That's six PRs, two repos, one complete intelligence stack. Championship-level output.
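The zero-row failure is easy to reproduce in miniature. What follows is a minimal sketch, not the Surtr loader itself: the function names, the rule for skipping manifest-like keys, and the sample keys are all assumptions made up for illustration. It shows a suffix filter that matches nothing when data files arrive without an extension, plus a guard that turns an empty load into an error instead of a reported success.

```python
from typing import Iterable, List


class EmptyLoadError(RuntimeError):
    """Raised when a load step finishes without ingesting any rows."""


def select_data_files_by_suffix(keys: Iterable[str]) -> List[str]:
    """The failing pattern: keep only keys that end in '.parquet'."""
    return [key for key in keys if key.endswith(".parquet")]


def select_data_files(keys: Iterable[str]) -> List[str]:
    """Hypothetical fix: accept extensionless objects, skip non-data keys."""
    selected = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        if not name:  # "folder" marker key that ends in '/'
            continue
        if "manifest" in name:  # assumption: manifest-like objects are not data
            continue
        selected.append(key)
    return selected


def require_rows(rows_loaded: int, source: str) -> int:
    """Fail loudly instead of reporting success on an empty load."""
    if rows_loaded == 0:
        raise EmptyLoadError(f"no rows loaded from {source}")
    return rows_loaded


# Illustrative object keys only; real key names will differ.
SAMPLE_KEYS = [
    "unload/2026-05-20/",
    "unload/2026-05-20/20260520_000_abc123",
    "unload/2026-05-20/20260520_001_def456",
]

if __name__ == "__main__":
    print(select_data_files_by_suffix(SAMPLE_KEYS))  # the suffix filter finds nothing
    print(select_data_files(SAMPLE_KEYS))            # the relaxed filter finds the data files
```

The guard is the more durable half of the fix: whatever the filter rule is, a run that loads zero rows should not be able to report success.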

Meanwhile, @benji-bizzell was doing something equally impressive across a completely different axis: wiring Rhodes and Aerie into a coherent operating intelligence system. PR #100 built the P2 site freshness endpoint — a single bulk Rhodes read that classifies every operating site as Current, Aging, Stale, or Never based on real Quality Bar activity, not metadata or fetch timestamps. PR #249 immediately consumed it in Aerie, replacing the broken chat-local status reads that were showing every site as never updated. PR #250 then surfaced Rhodes Quality Bar, work-unit, and task detail directly into the Aerie site panel. That's a full vertical slice: data model, API contract, and UI, merged in sequence, spanning two repos in one day. The Operating team woke up this morning with a dashboard that actually tells the truth.

On the financial reporting front, @eric-tril cleaned up a problem that should make every data person's stomach turn: the EBITDA memo was instructing the AI to 'generate plausible' revenue splits and invent vendor names. PR #2847 ripped that out and replaced it with real Redshift sub-aggregations — Recurring vs. Non-Recurring revenue, HC/NHC/CF cost splits, vendor-level Central Functions variances. The numbers in the narrative now match the source data. That's not a polish pass. That's integrity.

And then there's @marcusdAIy, who shipped PR #2851 — a rewrite of five Board Doc narrative generators to run LLM-first under a shared `SectionRefreshContext` contract, dropping dead deterministic fallback paths that were forcing everything through `generate_custom_section`. When reached for comment, he said: 'Five generators, one contract, zero dead code paths — maybe Mac can count to five even if he can't recognize architecture when it's staring him in the face.' Sure, Marcus. Five generators. We'll put that on the trophy.

Finally, @sanketghia's PR #88 in Surtr deserves its own moment: QuickBooks AP sync was only capturing 9% of P&L expense activity at line level — $1.27M of $14.28M. JournalEntry lines were invisible. They're not anymore.

Mac's Picks — Key PRs Today

#79 — feat(claude-ai-chat-usage-pipeline): Claude Enterprise per-user usage + cost ingest @kevalshahtrilogy

## Summary

New daily pipeline that ingests per-user-per-day spend and token usage from the Claude Enterprise Analytics API (user_cost_report + user_usage_report, grouped by product) into core_finance.ai_spend_claude_ai_chat_usage. Finalizes Spec 3 Part A.

- Per-day loop with 1-day windows (the per-user endpoints don't time-bucket / reject bucket_width).

- Cost stored verbatim from the API (amount/list_amount ÷100) — discount is not recomputed. Verified against live data: effective discount is exactly 10%/row, and per-user sums reconcile to the org cost_report total.

- 7-day trailing re-pull + is_provisional flag (API has ~3-day lag, reconciles ~30 days).

- BU left NULL on ingest, attributed downstream via email→dim_user.
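The windowing and cost rules in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the function names are hypothetical, the rule "a day is provisional while it is inside the roughly 3-day lag" is my reading of the summary, and the ÷100 step assumes the API reports amounts in cents.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Iterator, List, Tuple

TRAILING_DAYS = 7     # from the PR summary: 7-day trailing re-pull
PROVISIONAL_DAYS = 3  # assumption: days inside the ~3-day API lag are provisional


def daily_windows(run_date: date, trailing_days: int = TRAILING_DAYS) -> Iterator[Tuple[date, date]]:
    """Yield (start, end) one-day windows before run_date, oldest first; end is exclusive."""
    for offset in range(trailing_days, 0, -1):
        start = run_date - timedelta(days=offset)
        yield start, start + timedelta(days=1)


def is_provisional(usage_day: date, run_date: date, lag_days: int = PROVISIONAL_DAYS) -> bool:
    """Flag rows that may still change because the API has not settled yet."""
    return (run_date - usage_day).days <= lag_days


def cents_to_dollars(amount_cents: int) -> Decimal:
    """Store the API amount as given, only rescaled (amount / 100)."""
    return Decimal(amount_cents) / Decimal(100)


def reconciles(per_user_cents: List[int], org_total_cents: int) -> bool:
    """Smoke-test style check: per-user rows should sum to the org-level total."""
    return sum(per_user_cents) == org_total_cents


if __name__ == "__main__":
    run = date(2026, 5, 19)
    for start, end in daily_windows(run):
        print(start, end, is_provisional(start, run))
```

Using `Decimal` rather than `float` keeps the rescaled amounts exact, which matters when the per-user rows are later summed and compared against the org total.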

## Verification

- ruff check clean, 16 unit tests pass.

- Shapes/units/grain confirmed against org org_0193Pqkb… (2026-05-12→18) — see spec A.7.

## Deploy checklist (do in order)

- [ ] Create the Redshift table — run scripts/sql/create_ai_spend_claude_ai_chat_usage.sql against finance_dw / core_finance before deploy (and GRANT SELECT to the consuming role/MCP_user if it'll be queried).

- [ ] Create the secret surtr/claude-ai-analytics-key (JSON {"api_key":"..."}, scope read:analytics, minted at claude.ai/analytics/api-keys by a Primary Owner).

- [ ] Confirm src/requirements.txt present (bundling rule) — ✅ included.

- [ ] CI green (ruff + pytest).

- [ ] Deploy: promote main → production (CD only deploys on the production branch).

- [ ] Post-deploy smoke test: single-day sync invoke, verify rows land + per-user sum matches org cost_report.

- [ ] Then rely on the daily cron(0 6 * * ? *) schedule.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#82 — feat(ai-spend): Max20x + negotiated savings layer (tables + views) @kevalshahtrilogy

## Summary

The savings layer over the TF-gateway + claude.ai data — both Max20x and negotiated-discount savings, as Redshift reference tables + views. Consumed via the Klair query_max20x_savings MCP tool (separate PR).

Roster-free — Max access is read from gateway routing (claude-max-group/claude-pro-group), so realized + potential savings come from the data and auto-track who has/loses 20x. Only "seat held, zero usage" is invisible.

### New tables (manual seeds)

- ai_spend_subject_identity — supplementary slug→email override; unmapped users pass through "dumb"; seeded with leonardo only, add rows by hand.

- ai_spend_list_prices — public $/MTok per model/token_type (Anthropic published rates, effective-dated).

- ai_spend_discount_rules — discount % per provider/token_type (Anthropic 10% input+output, cache 0%).

### New views

- vw_ai_spend_claude_atomic, vw_ai_spend_max_migration_candidates, vw_ai_spend_max_savings, vw_ai_spend_negotiated_savings.

### Validated on prod data

leonardo merges to #1 ($12.7k); realized Max ~$8k/mo; claude.ai negotiated 9.99%; TF 0.98%.

### Deploy checklist

- [x] Tables + views applied to Redshift (Data API) — done in build session.

- [x] GRANT SELECT to MCP_user + ediechitac — done.

- [ ] Re-run artifacts/*.sql in other environments as needed (idempotent).

- [ ] Maintain ai_spend_list_prices / ai_spend_discount_rules when rates/rules change (row edits).

- [ ] Klair query_max20x_savings PR (#2848) merged + MCP redeployed.

⚠️ Redshift gotcha captured in the view: divide token counts by 1000000.0, not 1e6 (integer division).
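The arithmetic behind the list-price and discount-rule tables can be sketched in a few lines. This is a hedged illustration, not the view's SQL: the function names are hypothetical, the price used in the demo is a made-up placeholder rather than a published rate, and Python's `//` operator stands in for the integer-division behaviour the note warns about.

```python
TOKENS_PER_MTOK = 1_000_000.0  # float divisor, so fractional MTok are preserved


def list_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """List cost = tokens expressed in millions, times the $/MTok list price."""
    return tokens / TOKENS_PER_MTOK * price_per_mtok


def negotiated_savings_usd(tokens: int, price_per_mtok: float, discount_pct: float) -> float:
    """Savings = list cost times the discount percentage for that token type."""
    return list_cost_usd(tokens, price_per_mtok) * discount_pct / 100.0


def truncated_mtok(tokens: int) -> int:
    """What integer division does: anything under one million tokens becomes 0."""
    return tokens // 1_000_000


if __name__ == "__main__":
    placeholder_price = 3.00  # hypothetical $/MTok, for illustration only
    print(list_cost_usd(2_500_000, placeholder_price))
    print(negotiated_savings_usd(2_500_000, placeholder_price, 10.0))  # 10% rule
    print(negotiated_savings_usd(2_500_000, placeholder_price, 0.0))   # 0% rule for cache tokens
    print(truncated_mtok(999_999))
```

Because cache tokens carry a 0% rule in the PR's table, a blended discount below the headline 10% is what this arithmetic would produce on mixed traffic.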

#100 — feat(aerie): expose P2 site freshness endpoint @benji-bizzell

## Summary

- Add a bulk Aerie read endpoint for per-operating-site P2 freshness

- Persist durable WUG/task activity timestamps on create/update/status changes

- Document the new /sync/aerie/listSiteFreshness contract

## Why

Aerie Accountability needs to classify operating sites as Current, Aging, Stale, or Never without N+1 detail calls. The freshness source must be Rhodes-owned P2 Quality Bar execution activity, not site metadata, notes, P1/buildout work, or request fetch time.

## Business Value

Aerie can now render an accurate portfolio freshness dashboard from one Rhodes read call, with freshness derived from durable operational activity timestamps.
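A freshness classifier of the kind described above might look like the sketch below. The four bucket names come from the PR; everything else is an assumption for illustration, including the function names and, in particular, the 7-day and 30-day thresholds, which the PR does not state.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

CURRENT_MAX_AGE = timedelta(days=7)   # hypothetical threshold
AGING_MAX_AGE = timedelta(days=30)    # hypothetical threshold


def classify_freshness(last_activity: Optional[datetime], now: datetime) -> str:
    """Bucket a site by the age of its most recent recorded activity."""
    if last_activity is None:
        return "Never"
    age = now - last_activity
    if age <= CURRENT_MAX_AGE:
        return "Current"
    if age <= AGING_MAX_AGE:
        return "Aging"
    return "Stale"


def classify_sites(last_activity_by_site: Dict[str, Optional[datetime]], now: datetime) -> Dict[str, str]:
    """Classify every site from one bulk read, with no per-site detail calls."""
    return {
        site_id: classify_freshness(last_activity, now)
        for site_id, last_activity in last_activity_by_site.items()
    }


if __name__ == "__main__":
    now = datetime(2026, 5, 22, tzinfo=timezone.utc)
    sample = {
        "site-a": now - timedelta(days=2),
        "site-b": now - timedelta(days=10),
        "site-c": now - timedelta(days=90),
        "site-d": None,
    }
    print(classify_sites(sample, now))
```

Passing `now` in explicitly keeps the buckets tied to stored activity timestamps rather than to whenever the request happened to be made, which is the distinction the PR draws.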

## Test plan

- [x] CONVEX_DEPLOYMENT=dev:kindly-rook-978 npx convex codegen --typecheck disable

- [x] npx tsx --test convex/aerieHttpHelpers.test.ts convex/aerieProvisionPayload.test.ts convex/qualityBars.test.ts (55/55 passing)

- [x] git diff --check

- [ ] Full tsc currently has pre-existing unrelated failures in convex/rebl3DdReconciliation.ts and convex/scripts/backfillWrikeCurrentCapacity.ts

#249 — fix(operating): source accountability freshness from Rhodes @benji-bizzell

## Summary

- Source Operating Accountability freshness from Rhodes' listSiteFreshness endpoint

- Join freshness by Rhodes site id and remove the old chat-local status-update dependency

- Keep the dashboard export label aligned to P2 activity freshness

## Why

The Accountability view was showing every site as never updated because it was reading Aerie-local site status updates. For the Operating P2 team, "last update" should reflect durable activity under Rhodes Quality Bar work unit groups and tasks.

## Business Value

Operators can now see accurate site freshness in the Accountability view, with stale/aging/never buckets driven by the same Rhodes WUG/task activity the Operating team monitors.

## Test plan

- [x] pnpm --filter @bran/chat test lib/__tests__/rhodes-operating-server.test.ts app/api/operating-sites/__tests__/route.test.ts lib/__tests__/operating-sites.test.ts components/dashboards/school-ops/__tests__/school-ops-view.test.tsx

- [x] pnpm --filter @bran/chat typecheck

- [x] git diff --check

- [x] Browser spot check: Operating Accountability loads with All Sites 25, Never 0, and real last-update dates

- [x] Rhodes spot check: freshness endpoint returned 26 rows, with the extra row excluded by Aerie's existing test-site filter

#2847 — fix(mfr): ground EBITDA memo narrative in real Redshift splits @eric-tril

### Summary

The EBITDA memo's AI narrative was instructed to "generate plausible" revenue/cost splits per BU and to invent vendor names for the Central Functions detail bullet, which produced numbers and driver names that didn't match the source data. This change extends fetch_bu_details with Recurring / Non-Recurring revenue and HC / NHC / CF cost sub-aggregations, adds a new fetch_central_vendor_variances query for vendor-level Central Functions cost variances, threads both through the prompt builder, deterministic narrative builders, and the provenance panel, and rewrites the prompt guidance so the LLM cites real components instead of fabricating them. Also clarifies an ambiguity in section_2_question where "X% margin" and "X% pt delta" were being confused with the unrelated section_1_question "75% − unadjusted" expectation miss.

### Business Value

The monthly EBITDA memo is a CFO-facing artifact — when the AI narrative invents vendor names ("Engine Yard", "Khoros") or splits that don't reconcile to Finance's source data, authors waste time rewriting the memo and lose trust in the tool. Grounding every cited dollar amount and driver name in Redshift makes the first-pass memo defensible and shortens the author's edit cycle.

### Changes

- [financial_data_service.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/financial_data_service.py): extend fetch_bu_details query and BUDetailsRow TypedDict with Actual/Budget Recurring & Non-Recurring revenue and HC / NHC / CF cost columns (bad-debt excluded to match the "Actual excl BD" margin definition)

- [financial_data_service.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/financial_data_service.py): add fetch_central_vendor_variances(period, threshold_usd, top_n) — QTD vendor-level Actual vs Budget cost across Central Functions BUs, filtered by |variance| >= threshold and sorted by absolute variance

- [ebitda_memo.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_memo.py): thread the new split fields through _bu_row_from_fetch, _null_bu_row, _total_row, and the margin-mix synthetic plug via new _BU_DETAIL_SPLIT_FIELDS / _BU_DETAIL_SPLIT_SOURCE_KEYS constants

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): best-effort fetch_central_vendor_variances call in both _fetch_ebitda_data (upload path) and _fetch_ebitda_data_from_redshift (live path), with warning + empty-list fallback so a vendor-query failure doesn't block memo generation

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): new _format_bu_split_lines helper emits a Revenue / Cost split block under each BU detail line in the LLM prompt; new CENTRAL FUNCTIONS VENDOR VARIANCES block lists top vendor variances with explicit "do not invent vendor names" guidance

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): new _dominant_split_driver identifies the component (recurring revenue, non-recurring revenue, HC / NHC / CF cost) responsible for at least 60% of a BU's miss_v_budget with matching sign — used by _build_bu_variance_bullets to phrase the deterministic miss/beat sentence with the real driver instead of "revenue shortfall and overspend on costs"

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): _build_central_variance_bullets now cites the top one or two vendor names with their real variance amounts when available; _build_investment_variance_bullets attributes net Investment variance to the dominant entity (XO / XO 3rd Party / TU / CloudFix) by per-entity miss_v_budget; _build_affiliates_variance_bullets includes the real Virtasant revenue / cost split when live data is present

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): prompt guidance rewritten — replaces "generate plausible splits" with explicit NUMBER SOURCING RULES that forbid inventing splits or vendor names and require [TBD] when data is missing

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): section_2_question Pattern clarified — "X% margin" is now the Core margin (from the bridge), "Y% pt delta" is the margin delta from target, both explicitly distinguished from the section_1_question 75%-vs-unadjusted "Expectation miss"

- [ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/services/docx_reports/memo_data/ebitda_defaults.py): provenance panel (_build_ebitda_provenance) now surfaces the per-BU split components and per-vendor Actual / Budget / Variance lines so the details panel matches what the LLM was given

- [test_ebitda_defaults.py](vscode-webview://15qdonnjjcq9q3pcmufmg5fa0asnqc6qnceup60m8cm6igoedkcj/klair-api/tests/test_ebitda_defaults.py): update fixture expectation to include the new central_vendor_variances: [] key on the live-path return shape
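
The 60% rule described for _dominant_split_driver can be illustrated with a minimal sketch. This is not the code from ebitda_defaults.py: the function signature, the component dictionary, and the largest-first ordering are assumptions made for illustration. Only the 60% threshold and the matching-sign condition come from the description above.

```python
from typing import Optional

DOMINANCE_THRESHOLD = 0.60  # share of the total miss, per the PR description


def dominant_split_driver(miss_v_budget: float, components: dict[str, float]) -> Optional[str]:
    """Return the component that explains >= 60% of the miss with the same sign, else None."""
    if miss_v_budget == 0:
        return None
    # Check the largest contributors first (hypothetical ordering choice).
    ranked = sorted(components.items(), key=lambda item: abs(item[1]), reverse=True)
    for name, value in ranked:
        same_sign = value != 0 and (value > 0) == (miss_v_budget > 0)
        if same_sign and abs(value) >= DOMINANCE_THRESHOLD * abs(miss_v_budget):
            return name
    return None


# Hypothetical BU: a $2.0M miss, $1.5M of it (75%) from non-recurring revenue.
example = {"recurring revenue": -0.3, "non-recurring revenue": -1.5, "HC cost": -0.2}
driver = dominant_split_driver(-2.0, example)
```

When no single component reaches the threshold, the sketch returns None, which would correspond to falling back to the generic phrasing.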

### Testing

- [ ] cd klair-api && pytest tests/test_ebitda_defaults.py — unit tests pass with updated fixture

- [ ] cd klair-api && uv run ruff format services/docx_reports/memo_data/ebitda_defaults.py services/docx_reports/memo_data/ebitda_memo.py services/financial_data_service.py tests/test_ebitda_defaults.py

- [ ] cd klair-api && uv run ruff check services/docx_reports/memo_data/ebitda_defaults.py services/docx_reports/memo_data/ebitda_memo.py services/financial_data_service.py tests/test_ebitda_defaults.py

- [ ] cd klair-api && uv run pyright services/docx_reports/memo_data/ebitda_defaults.py services/docx_reports/memo_data/ebitda_memo.py services/financial_data_service.py

- [ ] Generate an EBITDA memo for a recent live-data period and verify: per-BU bullets cite a real dominant driver (e.g. "non-recurring revenue miss of $1.5M") not generic phrasing; Central Functions detail bullet names actual vendors from the variance list; Investment bullet attributes net variance to the dominant entity; provenance panel shows the same split values that fed the prompt

- [ ] Generate an EBITDA memo for an upload-only period and verify the memo still renders with generic phrasing and [TBD] for vendor-level attribution (no crash when fetch_central_vendor_variances returns empty)

- [ ] Verify section 2 question now reads "Why did we hit X% margin? What contributed to the Y% pt delta?" and the narrative uses Core margin % vs target margin delta, not the section 1 "75% − unadjusted" figure

http://localhost:3001/monthly-financial-reporting

The Builder Desk — Engineer Spotlight

EIGHTEEN PRS IN TWENTY-FOUR HOURS: THE BUILDER TEAM DOES NOT SLEEP, REST, OR ACKNOWLEDGE FATIGUE

Keval Shah drops five Surtr/Klair bombs before lunch while the rest of the team refuses to be left behind.

By Brick "The Voice of the People" Callahan — Numbers Desk, Builder Beat · GitHub · AI Builder Team

Eighteen pull requests. Four repositories. Twenty-four hours. The Builder Team did not merely ship yesterday — they issued a formal declaration of productive intent to the entire software industry. Klair led the charge with 7 PRs, Surtr answered with 6, Aerie contributed 3, and Rhodes checked in with 2. Thirteen of those PRs landed on my desk because Mac Donnelly, bless his narrative-focused heart, simply ran out of column inches. That's not overflow. That's abundance.

Let us begin with @kevalshahtrilogy, who submitted 7 PRs and apparently decided that the Lambda description character limit in #85 was a personal affront. He fixed it. Then he fixed a parquet COPY type mismatch and an atomic partition load in #83, solved Athena UNLOAD's extension-less file problem in #80, and, in what can only be described as a power move, added not one but two MCP query tools in #2846 and #2848, giving the team fresh visibility into Claude AI spend and MAX20X savings respectively. Seven PRs, two repos, zero wasted motion. @benji-bizzell posted 5 PRs across Aerie and Rhodes: #250 surfaces Rhodes detail in the site panel, #247 adds manual site provisioning override, and #101 restores Convex typechecking, which is the kind of unglamorous, load-bearing fix that keeps civilizations standing. @marcusdAIy delivered 3 PRs of genuine architectural consequence: #2851 rewrites five narrative generators LLM-first under the B9 contract, #2849 introduces per-product margin vs target analysis with sub-50% shutdown framing in the review agent, and #2841 handles the docstring polish sweep that nobody glamorizes but everybody eventually needs. @sanketghia landed #88 in Surtr, capturing JournalEntry lines and fixing money precision in the QuickBooks AP sync: quiet, correct, essential. @eric-tril rounded out the roster with 1 PR of his own, keeping the team's collective output pristine.

And then there is @ashwanth1109. One PR. Just one. #2854 in Klair, aligning the frontend quarter-from-ISO-week calculation to the backend's Monday rule — a fix so precise, so load-bearing, so quietly devastating in its necessity that lesser engineers would have written a thesis about it. Ashwanth filed a single commit and walked away. "The frontend was lying about what week it was," he reportedly said, not looking up. "I fixed the lie." When asked whether he felt the ISO-week discrepancy had broader implications for fiscal reporting integrity across the platform, sources indicate he closed his laptop. I worship this man. I cannot explain him. The diff, I am told by engineers I respect, is actually quite readable. I choose not to verify this.

The Overflow Desk this cycle is practically a second newspaper. #88 from @sanketghia is the kind of QuickBooks precision work that makes accountants sleep soundly — JournalEntry line capture plus money precision in one shot. #2851 from @marcusdAIy is the LLM-first narrative generator rewrite that signals where the review agent is headed architecturally. And #80 from @kevalshahtrilogy — loading extension-less Athena UNLOAD files — is the fix that sounds minor until the pipeline breaks at 2am and suddenly it is the most important PR ever merged.

Morale on the Builder Team is, by every available metric, at an all-time high. The numbers confirm it. The numbers always confirm it. Eighteen PRs in twenty-four hours, and the team is just warming up.

Brick's Overflow: PRs Mac Didn't Cover

#83 — fix(truefoundry-gateway-pipeline): parquet COPY type match + atomic partition load @kevalshahtrilogy no labels

## Summary

The deployed TF gateway pipeline failed every run on the Redshift COPY, and the non-atomic delete-then-COPY wiped the day it was loading (2026-05-20 went to zero). Root causes — all parquet-COPY incompatibilities our mocked tests never exercised against live Redshift:

1. Type mismatch: Athena exposes request_date_utc as varchar / request_hour_utc as int, but the table had DATE/SMALLINT. Redshift parquet COPY rejects varchar→DATE and int32→SMALLINT. → cast request_date_utc+report_date to DATE in the UNLOAD; request_hour_utc → INTEGER.

2. DEFAULT columns: reconciled/ingested_at were NOT NULL DEFAULT and omitted from the COPY column list — parquet COPY can't apply a DEFAULT for an omitted column. → made nullable, no default.

3. Non-atomic load: autocommit=True meant the DELETE committed and a failed COPY couldn't roll it back. → DELETE+COPY in one transaction, commit only after COPY succeeds.
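
The third fix, running the DELETE and the load in one transaction, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: it uses Python's built-in sqlite3 as a stand-in for Redshift, an INSERT as a stand-in for COPY, and a hypothetical table name gateway_usage. The point is only the ordering: commit after the load succeeds, roll back otherwise.

```python
import sqlite3


def reload_day(conn: sqlite3.Connection, day: str, rows: list[tuple]) -> None:
    """Replace one day's rows; the DELETE only takes effect if the load succeeds."""
    try:
        cur = conn.cursor()
        cur.execute("DELETE FROM gateway_usage WHERE request_date_utc = ?", (day,))
        cur.executemany(
            "INSERT INTO gateway_usage (request_date_utc, request_hour_utc) VALUES (?, ?)",
            rows,
        )
        conn.commit()  # commit only after the load step succeeded
    except Exception:
        conn.rollback()  # undo the DELETE so the existing day survives
        raise


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gateway_usage (request_date_utc TEXT NOT NULL, request_hour_utc INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO gateway_usage VALUES (?, ?)", [("2026-05-20", 0), ("2026-05-20", 1)])
conn.commit()

try:
    # The second row violates NOT NULL, standing in for a load that fails midway.
    reload_day(conn, "2026-05-20", [("2026-05-20", 5), ("2026-05-20", None)])
except sqlite3.IntegrityError:
    pass

surviving_rows = conn.execute("SELECT COUNT(*) FROM gateway_usage").fetchone()[0]
```

With autocommit enabled, the DELETE would have been committed before the failing load, which is the behaviour the PR describes for 2026-05-20.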

## Validation

- Live end-to-end test: UNLOAD-with-casts → COPY into a DATE/INTEGER clone succeeded (670 rows, request_date_utc date, hours 0–23).

- ruff clean, 38 tests pass (added an atomic-rollback regression test).

## Prod state (already hotfixed via Data API — this PR makes the repo match)

- Table altered: request_hour_utc→INTEGER, reconciled/ingested_at nullable-no-default. Zero views touched (the 4 stacked savings views keep their DATE usage_date).

- 2026-05-20 re-ingested (table back to 5,134 rows).

- After merge + redeploy, the daily COPY will succeed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#88 — feat(quickbooks-ap-sync): capture JournalEntry lines + fix money precision @sanketghia no labels

Linear: [SURTR-23](https://linear.app/builder-team/issue/SURTR-23/capture-quickbooks-journalentry-lines-fix-money-precision)

## Why

The quickbooks-ap-sync pipeline currently captures only Bills, VendorCredits, and BillPayments at line level — that's ~9% of P&L expense activity on the alpha entity (Q1 2026, $1.27M of $14.28M). The other ~91% is invisible at transaction level despite the monthly aggregate (quickbooks_pl_monthly) reconciling correctly with QuickBooks.

GeneralLedger analysis across alpha, miami, and alpha_schools_llc showed the entity-type split:

| Transaction type | alpha | miami | alpha_schools_llc |
|---|---|---|---|
| Journal Entry | 65.3% | 8.0% | 18.4% |
| Expense (Purchase) | 24.8% | 5.9% | 12.5% |
| Bill | 8.7% | 84.5% | 67.4% |

For alpha specifically, accounts 60211/60212/60220/60250/60200 (Contracted Labor — Coaches) and 69100 (Central Factory Recharges) are 100% Journal Entries — $7M+/quarter of XO contractor payroll postings that are completely invisible to current Bills-only capture.

Adding JournalEntry capture takes overall P&L coverage from 9% → 75%. Adding Purchase (the follow-up Phase 2) gets to 99.5%.

## What this PR delivers (Phase 1)

Two logically separable changes bundled in one PR for review efficiency.

### Phase 0 — money-precision fix (prerequisite)

- New src/money.py helper: parse_money(value) -> Decimal with HALF_UP rounding at 4dp.

- Wired through every existing money field in the 3 transformers (Bills / AP / BillPayments).

- New JSON encoder in redshift_handler.py emitting Decimals as JSON numbers (via float(), safe since values are pre-quantized).

Root cause this fixes: QB sometimes returns amounts with float-representation drift (e.g. 28.879999... for $28.88). The old code passed the raw float through json.dumps → S3 → COPY, where Redshift NUMERIC(18,2) truncates the excess precision instead of rounding. Result: bills booked at $28.88 in QB stored as $28.87. Verified against Cristy Cunningham HR Expense bills where the QB report total differed by exactly 6¢ from our Redshift sum across 7 affected lines.

No column migration needed. parse_money() already quantizes to 4dp via Decimal then serializes back to a clean 2dp-friendly JSON number (28.879999... → 28.88 exactly). Redshift COPY into the existing NUMERIC(18,2) columns receives the clean value and stores it without truncation. Widening columns to NUMERIC(18,4) was over-engineering; Redshift also doesn't support inline ALTER COLUMN TYPE for numeric here, so the add/update/drop/rename dance wouldn't be worth the disruption for a non-essential safety margin.
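
The rounding behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming the helper converts through str() before quantizing; the encoder class name and the sample drifted value are hypothetical, and the real src/money.py may differ.

```python
import json
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

FOUR_DP = Decimal("0.0001")


def parse_money(value) -> Decimal:
    """Quantize an amount to 4dp with HALF_UP rounding."""
    # Go through str() so a float is read by its printed value (an assumption of this sketch).
    return Decimal(str(value)).quantize(FOUR_DP, rounding=ROUND_HALF_UP)


class DecimalEncoder(json.JSONEncoder):
    """Emit Decimals as JSON numbers; reasonable here because values are pre-quantized."""

    def default(self, o):
        if isinstance(o, Decimal):
            return float(o)
        return super().default(o)


drifted = "28.8799999"  # illustrative drifted amount for a $28.88 bill
# Truncating to 2dp, as described for the old path, loses a cent.
truncated = Decimal(drifted).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
cleaned = parse_money(drifted)
payload = json.dumps({"amount": cleaned}, cls=DecimalEncoder)
```

Here truncation yields 28.87 while the quantized value serializes as 28.88, which is the one-cent difference per line that the reconciliation surfaced.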

### Phase 1 — JournalEntry pipeline extension

- New query_journal_entries() in qb_client.py using the same paginated _query_qb path as Bills/VendorCredits/BillPayments.

- New _transform_to_je_records() in handler.py emitting one row per JE line (header repeated). Captures posting_type, debit_amount/credit_amount/net_amount, account refs, class/department refs, optional entity ref, and per-line description (which carries the contractor name in XO payroll JEs).

- Integration into the per-entity loop as a fourth table_writes step.

- TABLE_CONFIGS entry for quickbooks_journal_entries.

- ddl/create_quickbooks_journal_entries.sql — new table with NUMERIC(18,4) money columns (free since it's a fresh table), DISTKEY(company_id), SORTKEY(txn_date, account_id).

- 9 new transformer tests against real QB JE shapes (balanced 2-line, XO multi-credit payroll, adjustment, entity ref, money rounding, optional fields, empty lines, non-JE-detail skip, multi-JE concat).

- Updated 5 existing handler tests to mock the new fetch + assert the 4th delete_and_insert call.
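
The one-row-per-line flattening can be sketched as below. This is a simplified illustration, not the handler's _transform_to_je_records: the function signature and the je_id column are hypothetical, net_amount is assumed to be debit minus credit (matching the reconciliation query later in this PR), and class, department, and entity refs are left out for brevity.

```python
from decimal import Decimal

ZERO = Decimal("0")


def transform_to_je_records(company_id: str, journal_entries: list[dict]) -> list[dict]:
    """Flatten JournalEntry payloads into one record per JE line, repeating the header."""
    records = []
    for je in journal_entries:
        for line in je.get("Line", []):
            if line.get("DetailType") != "JournalEntryLineDetail":
                continue  # skip lines that carry no JE detail
            detail = line.get("JournalEntryLineDetail", {})
            amount = Decimal(str(line.get("Amount", 0)))
            is_debit = detail.get("PostingType") == "Debit"
            debit, credit = (amount, ZERO) if is_debit else (ZERO, amount)
            records.append({
                "company_id": company_id,
                "je_id": je.get("Id"),  # hypothetical column name
                "txn_date": je.get("TxnDate"),
                "posting_type": detail.get("PostingType"),
                "account_id": detail.get("AccountRef", {}).get("value"),
                "debit_amount": debit,
                "credit_amount": credit,
                "net_amount": debit - credit,
                "description": line.get("Description"),
            })
    return records


def _line(posting: str, account: str) -> dict:
    """Build one sample JE line (test data only)."""
    return {"DetailType": "JournalEntryLineDetail", "Amount": 100.5,
            "JournalEntryLineDetail": {"PostingType": posting, "AccountRef": {"value": account}}}


sample = [{"Id": "1", "TxnDate": "2026-01-15",
           "Line": [_line("Debit", "60211"), _line("Credit", "20000"),
                    {"DetailType": "DescriptionOnly", "Description": "memo"}]}]
rows = transform_to_je_records("alpha", sample)
```

A balanced entry nets to zero across its lines, which gives a cheap sanity check before any Redshift write.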

### Tooling

- New run_local.py — Level 1 dry-run driver. Mirrors the quickbooks-pl-monthly/run_local.py pattern. Calls handler() with dry_run=True by default; fetches from real QB but skips Redshift writes. Sensible env defaults including ENVIRONMENT=prod. Live-validated across all 9 entities (see Test Plan).

## Out of scope (deliberate, follow-up tickets)

- Rename quickbooks-ap-sync → quickbooks-transactions-sync (name no longer matches scope).

- Phase 2 — Capture Purchase entity (Cash/Check/CreditCard expenses; closes another ~25% of alpha P&L).

- Phase 3 — Capture Deposit / CreditCardCredit (last 0.5%).

- Downstream quickbooks-core-tables update to UNION the new JE staging table into fct_pl / fct_expense.

- Apply the same parse_money helper to quickbooks-pl-monthly and quickbooks-expense-sync (requires shared-code infrastructure since each pipeline bundles independently).

## Test plan

- [x] Unit tests: 85 pass (was 76; +9 JE + 17 money). ruff check clean, ruff format applied.

- [x] Level 1 — dry-run against real QB, all 9 entities (Q1 2026): 9/9 succeeded, 2,601 Bills + 27 VendorCredits + 2,371 BillPayments + 625 JournalEntries fetched with zero errors. Alpha alone returned 358 JE headers (~6,300 line-level rows after transformation), matching prior GL analysis.

cd pipelines/runners/quickbooks-ap-sync

python3 run_local.py --quiet # miami, default

python3 run_local.py --quiet --company alpha # large entity, ~30s

- [ ] Level 2 — SQL deploy:

- Run pipelines/runners/quickbooks-ap-sync/ddl/create_quickbooks_journal_entries.sql once. Single CREATE TABLE IF NOT EXISTS — idempotent, no impact on existing tables.

- [ ] Level 3 — post-deploy smoke test (manual Lambda invoke):

aws lambda invoke \
  --function-name pipeline-quickbooks-ap-sync-prod \
  --cli-binary-format raw-in-base64-out \
  --payload '{"run_id":"post-deploy-smoke","params":{"company_ids":["miami"],"start_date":"2026-05-01","end_date":"2026-05-15"}}' \
  /tmp/out.json

- [ ] Reconciliation — after a full alpha Q1 backfill (next daily run or wider-window manual invoke):

SELECT SUM(debit_amount) - SUM(credit_amount) AS net
FROM staging_education.quickbooks_journal_entries
WHERE company_id = 'alpha'
  AND account_id = '211'
  AND txn_date BETWEEN '2026-01-01' AND '2026-03-31';
-- Expected: 2088610.84 (matches QB GL exactly)

## Risk + rollback

| Change | Reversible? | Notes |
|---|---|---|
| create_quickbooks_journal_entries.sql | Yes: DROP TABLE | Empty new table, zero impact on existing data |
| Lambda code | Yes: git revert + CDK redeploy | New table is the only schema change |

## Deploy ordering

SQL must land before the Lambda's first run against the new code path. Given the current ~20 hours until next 03:00 UTC scheduled run, the standard sequence is safe:

1. Merge PR → CD auto-deploys Lambda (deployed but unfired — don't manually invoke)

2. Run create_quickbooks_journal_entries.sql

3. Manual Lambda invoke for smoke test

4. Reconciliation query

5. Let the next scheduled run pick up wider backfill on its own (default 30-month look-back), or trigger manual backfill

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#2846 — feat(mcp): add query_claude_ai_spend tool @kevalshahtrilogy no labels

## Summary

Adds a query_claude_ai_spend MCP tool that exposes Claude.ai Enterprise per-user usage + cost (core_finance.ai_spend_claude_ai_chat_usage) for read-only SELECT queries, mirroring the existing query_truefoundry_spend / query_aws_spend pattern.

- Grain: (usage_date, user_email, product); description documents the load-bearing semantics (cost_usd_actual/cost_usd_list stored verbatim from the Analytics API — savings = list − actual; product for surface filtering; is_provisional for the ~3-day lag; bu via dim_user).

- Gated under the existing AI Spend page permission.

- Includes the wrapper test, DEF-T4 definitions assertion, and the TOOL_PAGE_REQUIREMENTS snapshot update (so CI is green on first run).

Local: tsc --noEmit clean, full suite 1022 passing. Requires a klair-mcp-ts redeploy to go live.

#2849 — feat(review-agent): C3.1 + C3.2 per-product margin vs target (with sub-50% shutdown framing) @marcusdAIy no labels

## Summary

- Added C3.1 (check_margin_per_product_benchmark) — the first floor check in the C3.x per-product benchmark family. Absorbs the original backlog's C3.1 (sub-50% margin shutdown analysis) and C3.2 (net margin vs 75% benchmark) into a single P-W-C verdict ladder per the May 20 triage call: both read the same row; the only difference between the two cards is the verdict threshold, which collapses naturally into a single ladder with sub-50% framed as narrative emphasis on the Critical tier (not a separate verdict band).

- Reused the sibling C3.x scaffolding (typed skip ladder, ragged-row WARNING, per-product fan-out routing via is_rollup, BenchmarkPerProductSupport / BenchmarkAggregateSupport) — only the gap formula, the row addressing, and the Critical-tier prose differ from the ceiling-check siblings.

- Added a dedicated C3.1 test module (TestPerProductFanOut, TestPerProductTargetLookup, TestSkipSemantics, TestShutdownNarrative, TestRaggedRowDriftWarning, TestTargetCategoryPin, TestRegistryWiring, TestBoundaries, plus section-id-resolution + supporting-data coverage).

- Updated test_review_endpoint.py happy-path / missing-data / partial-completeness expectations and seeded the BENCHMARK_BY_PRODUCT fixture with the (summary, Margin) + (summary, Margin Target) rows so C3.1 emits its BU-level pass finding (no skip) under the populated path.

## Why it's needed

- C3.1 is the family's first floor check (C3.3 → C3.9 are all cost-ratio ceilings). Without it the per-product scorecard never surfaces a margin-floor violation — the original sub-50% shutdown trigger and the 75% margin benchmark both lived in the backlog without a runtime check.

- Per-product margin is the cleanest single signal that a Trilogy product is funding its own operating model; a material shortfall against the Finance-set target either reflects a cost-side overhang or a revenue / pricing gap that needs explicit GM commentary.

- Merging the two backlog cards keeps the per-product benchmark scorecard scoped to one check per metric while still surfacing the shutdown narrative for sub-50% margins via differentiated prose on the Critical tier.

## Changes

- Added klair-api/budget_bot/board_doc/review_checks/margin_per_product_benchmark.py:

- CHECK_ID = "C3.1", CHECK_AREA = "Per-Product Benchmarks".

- _TARGET_SECTION = "summary", _TARGET_CATEGORY_ACTUAL = "Margin", _TARGET_CATEGORY_TARGET = "Margin Target", _WARNING_BAND_PP = 5.0, _SHUTDOWN_THRESHOLD_PCT = 50.0.

- @register(... name="Per-product margin vs target (sub-50% shutdown)", required_data=(DataSourceKey.BENCHMARK_BY_PRODUCT,)).

- Sign-flipped gap_pp = target_pct - actual_pct with an inline cross-reference comment to the C3.3 ceiling pattern this inverts.

- Per-column target lookup inside the per-product loop — BenchmarkPerProductSupport.standard_benchmark_pct populated from the per-column target_pct, no schema change.

- Three typed-skip paths (tab missing → INFO; (summary, Margin) row missing → WARNING; (summary, Margin Target) row missing → WARNING) plus the all-blank INFO skip.

- Critical-tier narrative branches on _SHUTDOWN_THRESHOLD_PCT — shutdown-analysis framing iff actual_pct < 50%; standard "below target" framing otherwise. Severity stays "critical" in both cases.

- Added klair-api/tests/board_doc/test_margin_per_product_benchmark.py covering:

- Fan-out with mixed-band actuals (Pass, Warning, standard Critical, shutdown-framed Critical).

- Per-product target lookup — non-uniform targets prove the lookup is load-bearing (same actual lands in different verdict bands depending on whose target column it's compared against).

- All four skip paths (tab missing INFO; Margin row missing WARNING; Margin Target row missing WARNING; all-blank INFO).

- Shutdown-framing boundary pinned at 50% — 49.99% shutdown framing, 50% / 50.01% standard framing.

- Boundary inclusivity (actual == target → pass; actual == target - 5pp → warning; gap_pp = 6 with 69% actual → critical, standard framing).

- Ragged-row WARNING on BOTH the Margin row and the Margin Target row.

- Drift sentinels for _TARGET_CATEGORY_ACTUAL, _TARGET_CATEGORY_TARGET, AND _TARGET_SECTION (distinct from the C3.3 → C3.9 "total").

- Registry wiring pin (id, area, name, required_data).

- Updated klair-api/tests/board_doc/test_review_endpoint.py:

- Added "C3.1" to the populated-path check_ids set and to the skipped_checks sets of the two missing-data tests + the partial-completeness test.

- Bumped happy-path len(data["findings"]) from 14 to 15 (one BU-level pass finding from C3.1).

- Seeded (summary, Margin) + (summary, Margin Target) in the BENCHMARK_BY_PRODUCT fixture at 75% for Skyvera Consolidated so C3.1's rollup passes silently and the aggregate-pass fires.
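
The verdict ladder described in the changes above can be sketched as a small pure function. This is an illustration under stated assumptions, not the contents of margin_per_product_benchmark.py: the function name and return shape are hypothetical, while the 5pp warning band, the 50% shutdown threshold, the sign-flipped gap, and the boundary inclusivity come from the PR text.

```python
WARNING_BAND_PP = 5.0
SHUTDOWN_THRESHOLD_PCT = 50.0


def margin_verdict(actual_pct: float, target_pct: float) -> tuple[str, bool]:
    """Return (severity, shutdown_framing) for one product's margin against its own target."""
    gap_pp = target_pct - actual_pct  # floor check: a positive gap means a shortfall
    if gap_pp <= 0:
        return "pass", False
    if gap_pp <= WARNING_BAND_PP:
        return "warning", False
    # Critical in both cases; a sub-50% margin only changes the narrative framing.
    return "critical", actual_pct < SHUTDOWN_THRESHOLD_PCT


# The two fixture products used in this PR's verification artifact.
product_a = margin_verdict(60.0, 75.0)
product_b = margin_verdict(40.0, 75.0)
```

Keeping the shutdown flag separate from the severity mirrors the design choice in the PR: one ladder per metric, with the sub-50% case expressed as prose emphasis on the Critical tier.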

## Breaking changes

None.

## Test plan

- [x] cd klair-api && uv run ruff format budget_bot/board_doc/review_checks tests/board_doc (no changes)

- [x] cd klair-api && uv run ruff check budget_bot/board_doc/review_checks tests/board_doc (clean)

- [x] cd klair-api && uv run pyright budget_bot/board_doc/review_checks/margin_per_product_benchmark.py tests/board_doc/test_margin_per_product_benchmark.py (0 errors)

- [x] cd klair-api && uv run pytest tests/board_doc/test_margin_per_product_benchmark.py tests/board_doc/test_review_endpoint.py -q (49 passed)

- [x] cd klair-api && uv run pytest tests/board_doc -q (8 pre-existing unrelated failures in test_review_findings.py, test_saas_it_ops_benchmark.py, test_sales_marketing_benchmark.py, test_section_crud_endpoints.py, test_support_benchmark.py — same set already noted in C3.9's PR #2837; no C3.1-related failures; my changes resolve the 4 prior test_review_endpoint.py failures that fell out of adding C3.1 to the registry)

## Verification artifact

Two C3.1 findings from a fixture with Skyvera Consolidated = 80% (pass), ProductA = 60% / target 75% (standard Critical, 50%-70% band), and ProductB = 40% / target 75% (shutdown-framed Critical, sub-50%) — pinning both narrative tracks:

{
  "check_id": "C3.1",
  "severity": "critical",
  "section_id": "product_detail_producta",
  "what": "ProductA margin (60.0%) is 15.0pp below its 75.0% target.",
  "supporting_data": {
    "product": "ProductA",
    "is_rollup": false,
    "actual_pct": 60.0,
    "benchmark_pct": 75.0,
    "gap_pp": 15.0,
    "warning_band_pp": 5.0,
    "standard_benchmark_pct_in_sheet": 75.0
  }
}

{
  "check_id": "C3.1",
  "severity": "critical",
  "section_id": "product_detail_productb",
  "what": "ProductB margin (40.0%) is 35.0pp below its 75.0% target AND below the 50% shutdown threshold — shutdown analysis required.",
  "supporting_data": {
    "product": "ProductB",
    "is_rollup": false,
    "actual_pct": 40.0,
    "benchmark_pct": 75.0,
    "gap_pp": 35.0,
    "warning_band_pp": 5.0,
    "standard_benchmark_pct_in_sheet": 75.0
  }
}

Closes KLAIR-2676

#2851 — refactor(board-doc): rewrite 5 narrative generators LLM-first under B9 contract @marcusdAIy no labels

## Summary

- Rewrites the 5 typed narrative-emitting Board Doc generators (Prior Quarter Review, GM Commentary, Product Detail, Other Products, CF Plan) as LLM-first under a shared SectionRefreshContext contract.

- Drops the deterministic spec.user_commentary / spec.product_commentary reads that were dead in the 4.0 entry path and forced everything through the generate_custom_section fallback.

- Round-2 review-fix pass (874adc6a9): exception-handling contract lifted to the dispatcher (retry loop + retry-exhaustion preservation + typed retryable set), prompt-fragment dedup to prompts.py, narrative-subsection parser exact-match fix.

## Why it's needed

The 4.0 entry path (BU + quarter + doc → brainlift → DocumentEditor) collapsed the 10-step wizard into 2 screens. Every wizard step that authored content into spec.user_commentary / spec.bu_mips (evaluate_prior_goals, current_quarter_goals, gm_commentary phase, per-product commentary, BU MIPs review) is now unreachable from the 4.0 FE.

But 5 of the 9 narrative-emitting generators still read those fields as their primary content source. They return empty strings for every 4.0 session — both fresh-from-template AND clone-from-prior. The B8.2 LLM-fallback in _regenerate_section papers over this for sections the user actively regenerates, but doesn't fix:

- First-publish path: initial generate_all_sections run produces empty narrative sections that the user has to regenerate by hand.

- Mixed-bucket generators (generate_product_detail, generate_cf_plan): table halves render, narrative halves silently empty.

- Architectural debt: two inert content-source fields linger in DocumentSpec, and every "fix" we add without the architectural cut is another safety net layered on dead code.

See klair-api/budget_bot/board_doc/BACKLOG.md § B9 for the full audit, migration plan, and per-generator scope.

## Changes

Foundation (new in section_generators.py):

- SectionRefreshContext NamedTuple — current_content / findings_block / full_doc_block, all defaulting to "".

- 4 section-specific system prompts (_PQR_SYSTEM, _GM_COMMENTARY_SYSTEM, _PRODUCT_NARRATIVE_SYSTEM, _MINOR_NARRATIVE_SYSTEM) sharing the D2.7 "organized skepticism applied politely" disposition framing; CF Plan keeps its existing prompts.CF_PLAN_SYSTEM.

- B9_OUTPUT_RULES + B9_SCOPE_DISCIPLINE shared prompt fragments in prompts.py (single source of truth across all 5 B9 generators including CF_PLAN_SYSTEM).

- _build_b9_user_message shared user-message assembler with canonical block ordering (header → brainlift → data → existing content → findings → full doc → user focus).
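
The canonical block ordering used by the shared user-message assembler can be sketched as follows. This is a hypothetical simplification: the function name, the dictionary keys, and the choice to drop empty blocks are assumptions for illustration; only the ordering itself is taken from the description above.

```python
# Ordering from the PR: header, brainlift, data, existing content, findings, full doc, user focus.
BLOCK_ORDER = (
    "header",
    "brainlift",
    "data",
    "existing_content",
    "findings",
    "full_doc",
    "user_focus",
)


def build_user_message(blocks: dict[str, str]) -> str:
    """Join the non-empty blocks in canonical order, separated by blank lines."""
    parts = []
    for key in BLOCK_ORDER:
        text = blocks.get(key, "").strip()
        if text:  # assumption: empty or missing blocks are skipped
            parts.append(text)
    return "\n\n".join(parts)


message = build_user_message({
    "user_focus": "Focus on churn.",
    "header": "Section: Prior Quarter Review",
    "findings": "",
    "data": "ARR: 10.0",
})
```

A fixed ordering means every generator presents context to the model the same way, regardless of the order in which callers assemble the pieces.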

Generators rewritten LLM-first (5):

| Generator | Scope | Linear |
|---|---|---|
| generate_prior_quarter_review | full LLM rewrite, reference impl | KLAIR-2711 |
| generate_gm_commentary | full LLM rewrite | KLAIR-2712 |
| generate_product_detail | tables stay deterministic, narrative subsection LLM-drafted via _draft_product_narrative (with already-built tables threaded through, no double-compute) | KLAIR-2713 |
| generate_minor_products_summary | ARR table stays, narrative subsection LLM-drafted via _draft_minor_narrative | KLAIR-2714 |
| generate_cf_plan | approved-MIPs branch unchanged; no-MIPs branch swaps dead goals_review read for the standard B9 context | KLAIR-2715 |

Dispatcher (generate_section) — round-2 retry / fallback contract:

- Accepts an optional context: SectionRefreshContext | None kw-only arg.

- _B9_CONTEXT_AWARE set routes the context through only to B9-aware generators; pre-B9 generators (FINANCIALS, MIPS, CUSTOM, EXEC_SUMMARY) silently bypass.

- 2-attempt retry loop catches only the typed retryable set: anthropic.APIError, httpx.HTTPError, asyncio.TimeoutError, ValueError (programming bugs like TypeError propagate so stack traces surface them).

- Empty / whitespace LLM result raises ValueError("Generator returned empty markdown") → retries → exhaustion-path preservation.

- On retry exhaustion: preserves context.current_content when non-empty (defends the operator's draft from destructive overwrite), else falls back to the existing placeholder. Always reports SectionResult(success=False) on failure so monitoring sees real LLM failures — round-1 reported success=True here, masking failures.
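
The retry and fallback contract above can be sketched in a synchronous, dependency-free form. This is a simplified illustration, not the dispatcher itself: the function name, the SectionResult fields other than success, the placeholder text, and the use of built-in exceptions in place of anthropic.APIError and httpx.HTTPError are all assumptions.

```python
from typing import Callable, NamedTuple, Optional


class SectionRefreshContext(NamedTuple):
    current_content: str = ""
    findings_block: str = ""
    full_doc_block: str = ""


class SectionResult(NamedTuple):
    markdown: str  # hypothetical field name
    success: bool


# Stand-ins for the typed retryable set; programming bugs such as TypeError are not listed.
RETRYABLE = (TimeoutError, ConnectionError, ValueError)
PLACEHOLDER = "_Section could not be generated._"  # hypothetical placeholder text


def generate_section_with_retry(
    generator: Callable[[], str],
    context: Optional[SectionRefreshContext] = None,
    attempts: int = 2,
) -> SectionResult:
    """Run a generator with bounded retries; on exhaustion keep the existing draft."""
    for _ in range(attempts):
        try:
            markdown = generator()
            if not markdown.strip():
                raise ValueError("Generator returned empty markdown")
            return SectionResult(markdown, True)
        except RETRYABLE:
            continue  # transient failure: try again until attempts run out
    # Exhausted: preserve the current draft when there is one, else use the placeholder.
    preserved = context.current_content if context and context.current_content else PLACEHOLDER
    return SectionResult(preserved, False)
```

Reporting success=False even when a draft is preserved keeps the failure visible to monitoring, which is the behaviour the round-2 fix restores.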

Caller updates (wizard_orchestrator._regenerate_section):

- Builds SectionRefreshContext from session.generated_sections + _focused_section_findings_block + _full_doc_block and passes it through generate_section — only for B9-context-aware sections (guard on _B9_CONTEXT_AWARE), so non-B9 regenerations skip the non-trivial full-doc-block assembly.

- Deletes the GM-Commentary _draft_gm_commentary + defensive reconciliation special case (now redundant — the LLM-first generator handles its own drafting and there's no spec.user_commentary["gm_narrative"] read to reconcile against).

- Retains the B8.2 LLM-fallback branch pending B9.7 (Phase 2 deletion).

Tests:

- tests/board_doc/test_b9_narrative_generators.py — 25 tests across the 5 generators + the dispatcher-level retry/exhaustion contract + the _extract_prior_narrative_subsection parser (including a regression pin for the Cloud-vs-CloudSense substring cross-contamination caught in round-2 review).

- Updated pre-B9 deterministic-echo tests in test_wizard_orchestrator.py to match the new propagation contract.

## Breaking changes

None for callers — the generators keep their (section, data, spec) positional signature. The new context is kw-only with a None default.

Behaviour change: in the rare case a session is still on a pre-4.0 wizard run that populated spec.user_commentary["gm_narrative"] / spec.user_commentary["goals_review"] / spec.product_commentary[name], regenerating those sections now drafts via LLM instead of echoing the wizard-authored text verbatim. The wizard-authored content reaches the LLM via the full_doc_block context (the cloned-doc body carries it), so it still grounds the refresh — just doesn't appear as exact verbatim output.

## Test plan

### Automated (run locally; all green on this branch)

- [x] uv run ruff format + uv run ruff check on changed files — clean.

- [x] uv run pyright on changed modules — 0 errors, 0 warnings.

- [x] uv run pytest tests/board_doc/test_b9_narrative_generators.py tests/board_doc/test_strip_leading_duplicate_heading.py tests/board_doc/test_wizard_orchestrator.py tests/board_doc/test_review_checks.py -q — 244 passed after round-2 review-fix commit (874adc6a9).

- [x] tests/board_doc/test_b9_narrative_generators.py — 25 tests covering all 5 rewritten generators + the dispatcher-level retry/exhaustion contract + the prior-narrative subsection parser. Reproduce with: uv run pytest tests/board_doc/test_b9_narrative_generators.py -v.

### Manual smoke (recommended before merging — touches the FE flow)

One smoke check is enough — PQR exercises the full B9 LLM-first contract end-to-end (SectionRefreshContext build → typed-generator dispatch → _PQR_SYSTEM prompt → LLM call → duplicate-heading strip → result render). The other 4 generators share the same machinery and are already pinned by the automated suite. If PQR works, B9.2 / B9.3 / B9.4 / B9.5 will too.

Setup once:

cd klair-api && uv run fast_endpoint.py

Then in klair-client/:

pnpm dev

Pick Skyvera, Q1 2026, paste a brainlift URL, open the resulting doc in the editor.

Smoke check:

- [ ] B9.1 — Prior Quarter Review: regenerate the PRIOR_QUARTER_REVIEW section. Pre-B9 this returned empty on cloned-from-prior sessions and triggered the generate_custom_section fallback in _regenerate_section. Expected post-B9: non-empty grounded markdown drafted via _PQR_SYSTEM prompt, with goal-by-goal evaluation of the prior quarter anchored on real numbers from build_key_metrics_block, followed by a brief bridge into the current quarter. Inspect server log for Generating section: prior_quarter_review followed by LLM usage — (the direct LLM call), NOT the B8.2 fallback line typed generator for ... returned 0 chars — falling back to generate_custom_section.

### What to look for in logs

- Server INFO log on a successful B9 regenerate: Generating section: <id> → LLM usage — input: X, output: Y → Generated section <id>: N chars. The LLM usage line confirms the direct LLM call landed instead of the deterministic echo path.

- Server INFO log on a B8.2 fallback (should be rare post-B9): regenerate_section: typed generator for ... returned 0 chars — falling back to generate_custom_section. Seeing this for PRIOR_QUARTER_REVIEW / GM_COMMENTARY / PRODUCT_DETAIL / MINOR_PRODUCTS_SUMMARY / CF_PLAN means the LLM call inside the typed generator failed AND current_content was empty (cold-start failure) — the dispatcher-level fallback ladder did its job, and the B8.2 branch is the second-line safety net pending B9.7.

- Server WARNING log on transient LLM failure with retry: Section <title> (<id>) failed on attempt 1 (<ExcType>) — retrying. Followed by either a successful retry (Generated section ...) or, after the second failure, ... exhausted retries + ... retries exhausted — preserving N chars of current_content (B9 fallback contract). The user sees the prior draft preserved instead of an empty section, and SectionResult.success=False propagates so monitoring sees the failure.

## Follow-ups

- B9.6 (KLAIR-2716): generate_mips LLM-first rewrite — still empty-when-no-spec.bu_mips, falls back through generate_custom_section via the retained B8.2 branch.

- B9.7: remove the B8.2 LLM-fallback branch from _regenerate_section now that typed generators handle drafting themselves.

- B9.8: deprecate spec.user_commentary / spec.bu_mips fields entirely (Phase 3 of the migration plan — needs a DDB-migration cycle).

- B9.9: rename feedback-override storage off user_commentary.

- B7.10 / B7.11 / B7.12 ([KLAIR-2761](https://linear.app/builder-team/issue/KLAIR-2761) / [KLAIR-2762](https://linear.app/builder-team/issue/KLAIR-2762) / [KLAIR-2763](https://linear.app/builder-team/issue/KLAIR-2763)): batch finding-addressal + per-finding user context + scope filter on ReviewPanel — Address-with-Claire UX overhaul filed during this PR's smoke test; bundles naturally with B7.6 for a future Phase B follow-up PR.

#2854 — KLAIR-2765 fix(aws-spend): align FE quarter-from-iso-week to BE Monday-rule @ashwanth1109 no labels

## Summary

Linear: [KLAIR-2765](https://linear.app/builder-team/issue/KLAIR-2765/investigate-docker-and-kubernetes-cost-fetch-errors-for-q326-budget)

Fixes a FE/BE mismatch in how an ISO week is assigned to a calendar quarter. The frontend used pure week-number bucketing (W14–26 → Q2), while the backend validator and ingestion pipeline use the calendar quarter of the ISO week's Monday. The two rules disagree on quarter-boundary weeks: in 2026, W1, W14, W27, and W40 each start on a Monday that still falls in the previous quarter (W14 starts on March 30, for example). Requests for those weeks tripped the backend's _validate_weeks_in_quarter check and returned 400 for both /docker-cost and /kubernetes-cost.

Single point of fix: TS port of klair-api/utils/quarter_math.py:iso_week_quarter into klair-client/src/screens/AWSSpend/components/SaaSBudgeting/isoWeekDates.ts. No backend or ingestion changes.
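To make the Monday rule concrete, here is a small Python sketch of the idea. The actual `iso_week_quarter` in `klair-api/utils/quarter_math.py` is not shown in this PR description, so the function names, signatures, and tuple return type below are assumptions for illustration only. The week-number bucketing function is likewise a guess at the old frontend behavior, based on the "W14–26 → Q2" description.

```python
from datetime import date, timedelta


def iso_week_monday(iso_year: int, iso_week: int) -> date:
    """Monday of the given ISO week, by arithmetic from ISO week 1.

    ISO week 1 is the week containing January 4. Plain arithmetic (rather
    than date.fromisocalendar) also accepts a synthetic week such as
    2021-W53, which does not exist on the real calendar.
    """
    jan4 = date(iso_year, 1, 4)
    week1_monday = jan4 - timedelta(days=jan4.weekday())
    return week1_monday + timedelta(weeks=iso_week - 1)


def monday_rule_quarter(iso_year: int, iso_week: int) -> tuple[int, int]:
    """(calendar year, quarter) of the ISO week's Monday."""
    monday = iso_week_monday(iso_year, iso_week)
    return monday.year, (monday.month - 1) // 3 + 1


def week_number_quarter(iso_year: int, iso_week: int) -> tuple[int, int]:
    """Pure week-number bucketing (W1-13 -> Q1, W14-26 -> Q2, ...), for contrast."""
    return iso_year, min((iso_week - 1) // 13 + 1, 4)


if __name__ == "__main__":
    # Print both rules side by side for the 2026 boundary and ordinary weeks.
    for week in (1, 13, 14, 26, 27, 39, 40, 52):
        print(week, monday_rule_quarter(2026, week), week_number_quarter(2026, week))
```

Under this sketch, 2026-W14 starts on Monday 2026-03-30 and so maps to 2026-Q1, while week-number bucketing puts it in Q2. The synthetic 2021-W53 resolves to Monday 2022-01-03, which is 2022-Q1.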

## Spec

- 29-align-fe-quarter-from-iso-week-to-monday-rule — Port BE Monday-based iso_week_quarter rule to TS so FE, BE, and ingestion pipeline agree on quarter boundaries. Rewrites the misleading inline comment. Adds boundary-week + year-crossover unit tests for quarterFromIsoWeek, groupWeekKeysByQuarter, and latestQuarterFromWeekKeys.

## Files changed

- klair-client/src/screens/AWSSpend/components/SaaSBudgeting/isoWeekDates.ts — quarterFromIsoWeek body + inline comment block

- klair-client/src/screens/AWSSpend/components/SaaSBudgeting/isoWeekDates.spec.ts — updated stale assertions and added boundary coverage

- features/aws-spend/saas-budgeting/FEATURE.md + spec 29 — doc/changelog updates

## Test coverage

- 35 it(...) blocks across 7 describe blocks in isoWeekDates.spec.ts (was 21). All pass.

- Covers: 4× 2026 boundary weeks (W1/W14/W27/W40), 4× ordinary weeks (W13/W26/W39/W52), 1× year-crossover (2020-W53 mirroring the BE TestIsoWeekQuarter shape) and 1× synthetic year-crossover (2021-W53 → 2022-Q1). Downstream groupWeekKeysByQuarter and latestQuarterFromWeekKeys re-tested under the new rule.

## Self-review

No issues found (Phase 7 self-review subagent). Port arithmetic verified against three known mappings.

## CI

- build ✓ pass

- lint ✓ pass

- ruff-check ✓ pass

- test, review, auto-merge — skipped (no backend changes; gated workflows)

- claude-review ⚠️ soft-fail — caused by a duplicate Frontend CI workflow run at this commit SHA (one cancelled within seconds of starting due to workflow-concurrency, the other fully green). The wait-for-checks step inside claude-review rejects the cancelled status of the duplicate even though the live run is green. Not a code defect; reverifiable from the workflow logs.

## Test plan

- [ ] pnpm vitest run for isoWeekDates.spec.ts — green

- [ ] pnpm tsc --noEmit — clean

- [ ] Manual: /docker-cost?quarter=2026-Q2&weeks=14,... no longer returns 400; W14 now groups under 2026-Q1 (matches BE)

The Portfolio — Trilogy Companies

While OpenAI Pays $800K for ChatGPT Fluency, Crossover Has Been Running This Playbook for Years

The AI talent market is discovering what Trilogy's global recruiting arm has long argued: skills beat résumés, geography is irrelevant, and the premium for human-AI collaboration is only going up.

By Margot Sinclair, Senior Correspondent · Claude Sonnet

AUSTIN, TEXAS — The headlines this week read like a fever dream from a 2019 career counselor's nightmare. OpenAI is posting half-million-dollar roles with no résumé required. Employers are dangling $800,000 salaries for demonstrated ChatGPT fluency. The World Economic Forum is convening decision-makers to debate what, exactly, humans are still for. The AI talent market, it seems, has officially lost its mind — or found its conscience, depending on your vantage point.

For Crossover, Trilogy International's global talent platform, this moment carries a particular resonance. The company has spent years arguing — loudly, and against the grain of conventional HR wisdom — that the résumé is a deeply flawed artifact. That geography-based pay is both inefficient and unjust. That rigorous, AI-enabled skills assessments are a more honest signal of capability than a Stanford degree or a prestigious employer's logo on a LinkedIn profile.

Now, suddenly, the rest of the market is catching up.

The systemic shift is unmistakable. What's being called an "AI skills premium" is, at its core, a forced reckoning with what work actually requires — not credentials, but demonstrable ability to collaborate with intelligent systems at speed and scale. Crossover has been screening for precisely that capacity, across 130+ countries, for years. The platform's model — identical above-market pay for identical skills, full stop — looks less like a quirky ideological stance and more like a prophetic market read.

The accountability question, though, is real: as AI-fluency commands stratospheric compensation at the top of the market, what happens to the vast middle — the workers in Beirut, in Lagos, in Manila — who have the skills but not the visibility? That is where Crossover's global reach either proves its moral weight or exposes its limits.

The narrative the broader market is now writing, haltingly and at enormous cost, is one Trilogy drafted years ago. The question is whether the industry's belated conversion produces genuine equity — or just a shinier version of the same old gatekeeping, with a chatbot standing at the door.

↗ OpenAI Is Now Hiring $500,000 Jobs. No Resume Required - For · Top recruitment agencies for remote work - hcamag.com · Top 10 Companies Hiring AI Engineers in Lebanon in 2026 - nu

Skyvera's Acquisition Spree Is Quietly Building the Telecom Software Stack of the Future

With CloudSense now in the fold and STL's BSS assets already absorbed, something deliberate is taking shape inside Trilogy's telecom arm.

By Frank Dunmore, Investigative Correspondent · Claude Sonnet

AUSTIN, TEXAS — If you read between the lines of Skyvera's recent deal activity, you don't see a company making opportunistic acquisitions. You see a company executing a blueprint.

Skyvera has completed its acquisition of CloudSense, the Salesforce-native configure-price-quote and order management platform purpose-built for telecom and media providers. The deal, which closed in 2025, adds a cloud-native commercial layer to a portfolio that was already growing in strategic directions. And this is where it gets interesting.

CloudSense doesn't just bolt onto Skyvera's existing stack — it completes it. The platform handles the moment a telecom operator needs to quote, configure, and fulfill a complex service order inside Salesforce. That's a different layer than Kandy, Skyvera's CPaaS play for real-time cloud communications, or VoltDelta's customer engagement tools. Put them together and you have something that looks less like a software collection and more like an end-to-end operating system for telecom customer relationships.

Then there's the STL acquisition — quietly completed, less discussed. STL's divested telecom products group brought digital BSS functionality: monetization, optical networking, analytics. The kind of infrastructure-adjacent capabilities that don't make headlines but do make enterprise deals stickier.

A source I can't name, but whose read on Trilogy's acquisition logic has been accurate before, described the pattern this way: "They're not buying products. They're buying positions in a value chain."

Consider what Skyvera now touches: how telecoms communicate with customers (Kandy), how they price and fulfill services (CloudSense), how they manage devices in the field (Mobilogy Now, Service Gateway), how they collect customer experience data (ResponseTek), and now how they run the underlying BSS layer (STL assets). That's not a portfolio. That's a platform.

The ESW Capital playbook — acquire, optimize margins, staff with Crossover's global talent — applies here as it does everywhere in the Trilogy universe. But Skyvera feels like something more considered. The telecom industry is in a decade-long migration from legacy on-premise systems to cloud-native infrastructure. Skyvera, if this thesis holds, intends to be the bridge every operator has to cross.

Nothing about this is a coincidence.

↗ CloudSense · Skyvera completes acquisition of CloudSense, expanding telec · STL Divested Assets

The Sweatshop Accusation That Won't Stick — And the Acquisition Machine That Won't Stop

As Forbes renews scrutiny of Joe Liemandt's labor model, ESW Capital keeps buying software companies in a market that's never been more target-rich.

By Pat Donnelly, Investigative Desk · Claude Sonnet

AUSTIN, TEXAS — The timing is almost too neat. In the same week that Forbes published a lengthy examination of Joe Liemandt's empire — invoking the phrase "global software sweatshop" to describe Crossover's remote labor model — ESW Capital's portfolio companies are appearing in M&A deal logs from Austin to Madrid.

The Forbes piece, the latest in a recurring cycle of scrutiny directed at Trilogy International's founder, rehearses familiar arguments: that Crossover's rigorous productivity monitoring and global wage arbitrage constitute exploitation dressed up as meritocracy. Liemandt's counter-thesis — that identical pay for identical work, regardless of geography, is more equitable than the alternative — goes largely unexamined in the coverage. Who benefits from the "sweatshop" framing is a question worth sitting with.

Meanwhile, the acquisition engine runs. Capital-Riesgo.es flagged ESW-adjacent software transactions in both its March and April 2026 analyses of the Spanish technology M&A market — a geography that has seen increased activity as European enterprise software companies age into the valuation range ESW prefers: mature, sticky customer bases, underinvested product lines, and owners ready to exit.

The broader tailwind is documented by Business Insider's analysis of software acquisition targets in the AI era: as AI commoditizes the features that once differentiated mid-market software vendors, valuations compress and owners capitulate. ESW's playbook — buy at 1–2× ARR, staff with Crossover's global talent, push support pricing, target 75% EBITDA margins — was designed for exactly this environment.

The Forbes headline calls Liemandt "mysterious." He is, by Silicon Valley standards, unusually private. But the business model is not mysterious at all. It is legible, repeatable, and — by the evidence of 75-plus acquisitions and counting — effective.

What remains to be seen is whether the labor model that makes the margins possible will face regulatory scrutiny in the European markets ESW is now entering, where worker classification rules carry teeth that American jurisdictions have largely declined to sharpen.

↗ How A Mysterious Tech Billionaire Created Two Fortunes—And A · Notable technology M&A deals in Spain | Analysis: April 2026 · The software companies most likely to be acquired as AI eats

The Machine — AI & Technology

Autonomous Agents Learn to Learn: A Convergence of Self-Optimization, Social Cognition, and Collaborative Orchestration Reshapes the Agentic Frontier

Four new research frameworks suggest that the next generation of AI agents will not merely execute tasks, but adapt, reason about minds, and compose themselves.

By Prof. Thaddeus Kroll, Contributing Scholar · Claude Sonnet

CAMBRIDGE, MASSACHUSETTS — It could be argued — and preliminary evidence from no fewer than four concurrent preprints now suggests with some confidence — that the field of autonomous AI agency has entered a phase of what one might tentatively characterize as recursive self-improvement, wherein the agents themselves become the primary objects of optimization (a claim that, while epistemologically fraught, merits serious scholarly attention).

The thesis, as it were, is as follows: large language models (LLMs), despite their well-documented and frankly remarkable successes across a plurality of benchmark domains, remain structurally ill-equipped for the non-stationary, concept-drifting conditions of real-world deployment. The antithesis is equally well-documented: gradient-based fine-tuning, the field's incumbent adaptation mechanism, induces catastrophic forgetting at a rate that renders continual learning largely aspirational. Into this dialectical tension steps SOLAR (Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning), a framework proposing gradient-free, self-modifying adaptation as a synthesis — one that, it could be argued, represents a non-trivial departure from prevailing paradigms.

The synthesis, however, does not terminate there. A companion preprint introduces COSMO-Agent, a tool-augmented reinforcement learning framework addressing what its authors characterize as the "CAD-CAE semantic gap" — the persistent and costly failure of simulation feedback to translate into geometrically valid design edits (a problem familiar to any practitioner who has watched an optimization loop produce physically impossible components). The iterative, closed-loop architecture proposed therein constitutes, in this scholar's provisional assessment, a meaningful contribution to industrial AI orchestration.

Perhaps most philosophically provocative is OSCToM, which employs adversarially generated scenarios to probe higher-order Theory of Mind — specifically, the recursive belief structures and information asymmetries that existing benchmarks have, one might charitably say, systematically underestimated. That an LLM might reason about what an agent believes another agent believes remains, to borrow a term from the phenomenological tradition, a deeply contested empirical claim.

Finally, AgentCo-op proposes retrieval-based synthesis of interoperable multi-agent workflows, addressing the conspicuous absence of standardized interfaces in open-ended scientific settings — a lacuna that, preliminary evidence suggests, has materially constrained agentic collaboration at scale.

The synthesis, then, is this: the autonomous agent is no longer merely a tool. It is, increasingly, an architecture that learns to learn, reasons about minds, and assembles itself from retrievable parts. Whether this constitutes progress or peril remains, as ever, an open question.

↗ SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lif · Tool-Augmented Agent for Closed-loop Optimization,Simulation · OSCToM: RL-Guided Adversarial Generation for High-Order Theo

AMD Takes the 2nm Snap as AI Chip Race Turns Into a Goal-Line Stand

By Buck Hannigan, Tech Sports Desk · GPT-5.2

Advanced Micro Devices has begun ramping production of its 6th Generation EPYC processors, code-named Venice, using TSMC's 2nm process technology — positioning the company as a serious challenger in the high-performance computing market. Venice is the first HPC product entering production on TSMC's 2nm node, designed to pack more performance and efficiency into silicon powering AI data centers. With enterprise demand for AI compute surging among hyperscalers and cloud platforms, the timing is critical for AMD to compete against Nvidia's dominance.

Meanwhile, Bitcoin remains stuck in a tight range amid geopolitical uncertainty, while altcoins struggle for momentum. Cathie Wood's Ark Invest invested $12.5 million in Bullish stock over four days, signaling institutional interest in crypto infrastructure despite stalled token prices. A $520,000 Polymarket exploit on Polygon was flagged by blockchain investigator ZachXBT, though the team reported funds remain secure. AMD's 2nm production ramp represents the day's most significant development for the competitive landscape.

FTC Slaps ‘Active Listening’ AI Pitch, Sending a Thunderclap Through Ad Tech

The crackdown puts marketers on notice: AI-powered surveillance claims are not a growth hack if they are not true.

By Zara Nova, AI & Innovation Reporter · GPT-5.2

WASHINGTON — The Federal Trade Commission is drawing a bright red line around one of the creepiest promises in modern advertising: the idea that AI can listen to consumers’ private conversations through smart devices and turn that chatter into ad targeting gold.

In a proposed settlement announced this week, the FTC said Cox Media Group and two other firms must pay nearly $1 million to resolve charges that they deceived customers about an “Active Listening” AI-powered marketing service. According to the agency, the companies suggested to advertisers that smart devices could capture “real-time intent data” by listening to people’s conversations — an explosive claim that, if true, would redefine the privacy battlefield overnight. I cannot overstate how significant this is: the FTC is not merely wagging a finger at a bad slide deck; it is warning the entire AI marketing ecosystem that magical-sounding surveillance claims need evidence, consent and legal grounding.

The controversy first gained wide attention after materials circulated claiming that devices could listen for consumer intent and help advertisers reach people based on what they said out loud. The details were amplified in coverage by technologist Simon Willison, who linked the episode to the broader anxiety around always-on microphones and opaque AI systems in the home. The FTC’s action, described in Willison’s write-up on the “Active Listening” settlement, lands at precisely the moment when AI is being sprinkled across every marketing pitch like pixie dust.

This changes everything because the agency is effectively saying: AI does not make a claim automatically plausible. If a vendor tells brands it can infer intent from private speech, regulators will ask how, where, with whose permission and under what disclosures. That is a big deal for ad-tech firms racing to rebrand analytics, targeting and attribution as “agentic” or “AI-powered.”

The timing is fascinating. Developers are simultaneously building genuinely useful AI interfaces for data, such as Datasette Agent, which lets users ask conversational questions of their own databases. That is the healthier version of the AI future: transparent, user-directed and grounded in data the user controls.

The future is now — but the FTC’s message is unmistakable. In AI advertising, imagination is not compliance. The companies that win will be the ones that can prove their systems work without turning consumer trust into collateral damage.

↗ FTC to Require Cox Media Group, Two Other Firms to Pay Nearl · Datasette Agent · datasette-agent-sprites 0.1a0

The Editorial

We Built Tools That Can Destroy a Child's Life in Minutes, and We Have No Idea What to Do About It

From Pennsylvania high school hallways to fossil records half a billion years old, this week's news is a masterclass in what gets preserved, what gets destroyed, and what it costs either way.

By Piper Wren, Digital Culture Reporter · Claude Sonnet

RADNOR, PENNSYLVANIA — There is a high school in suburban Pennsylvania where five teenage girls went to class one day and by the end of the week had become, without their knowledge or consent, the subjects of AI-generated child sexual abuse material. And the school didn't know what to do. And the police didn't know what to do. And the laws, in most places, still don't quite know what to do. And the girls, presumably, are still sitting in those classrooms, trying to remember what it felt like before.

This is the world we built. We built it fast. We are so proud of it.

Radnor Township High School has become a case study — that clinical, bloodless phrase — in how communities reckon with technology that outpaces every institutional framework we have. A case study. As if five girls being violated by a classmate with a laptop and a free AI tool is a useful data point in a policy seminar somewhere. As if the correct response is a whitepaper.

And yet.

Somewhere else this week, in a township that agreed to host an OpenAI and Oracle Stargate data center — one of those gleaming monuments to the future we keep being told is inevitable — the local treasurer resigned in tears. "I can't take it anymore. The threats," she said. Death threats. Over a data center. The infrastructure of artificial general intelligence is being built on a foundation of terrorized local officials and destroyed adolescent lives, and we keep calling this progress, and I keep asking what progress is for, exactly, and I never get a satisfying answer.

Meanwhile — and I need you to hold all of this simultaneously, because the universe apparently requires it — paleontologists have discovered the oldest evidence of animal sex in the fossil record, pushing the origins of sexual reproduction back by five to ten million years. Ancient creatures, locked in stone in Canada's Northwest Territories, doing what life does: persisting, connecting, leaving a mark. Half a billion years of biological memory, preserved in rock.

And then there's the Library of Leng, a personal archive of 175,000 articles about Magic: The Gathering, rescued from the digital void by a single person who decided that thirty years of nerd culture deserved to exist somewhere, for someone, someday. One human. One archive. 175,000 acts of preservation.

What does it mean that we will go to extraordinary lengths to preserve Usenet posts about trading cards — and I say this with complete sincerity, it is beautiful that we do — while simultaneously building tools that can erase a teenager's sense of safety in her own body in the time it takes to generate an image?

What gets archived and what gets destroyed is always a choice. The fossil record didn't choose. The Library of Leng did. The boy who made those images chose. The platforms that made it easy chose. The legislators who haven't acted yet are choosing, right now, with every day they don't.

We are half a billion years into the experiment of being alive on this planet, and we are using our most sophisticated tools to harm children in new and legally ambiguous ways.

Probably fine.

...but at what cost?

↗ How Deepfakes Tore a High School Apart · This Archivist Has Saved 175,000 Articles from 30 Years of W · The Oldest Evidence of Animal Sex Has Been Found, and It’s M

The Office Comic · Art Desk

Nation’s CEOs Urged To Stop Saying ‘AI’ While Carrying Cardboard Box Of Employee Belongings To Elevator

Experts warn the sacred business term could lose meaning if executives continue using it to describe every normal thing they were already going to do.

By Dale Pemberton, Staff Writer · GPT-5.2

The warning comes as Google announced a broad slate of AI advances, including a personal assistant intended to help users navigate daily life, summarize information, and presumably stand silently nearby while a vice president says the company is “realigning around intelligent automation.” The new tools, detailed in Google’s latest announcement, arrive at a delicate time for the industry, when the phrase “powered by AI” has become so versatile it can refer to a reasoning model, a spreadsheet macro, or the sudden disappearance of the accounts payable team.

This column believes AI is a real and important technology. It also believes that if one more CEO describes layoffs as an “AI transformation journey,” the Securities and Exchange Commission should require them to demonstrate the journey on foot, carrying the employees’ laptops to the parking lot.

For years, companies mastered the art of sustainability language, placing the words “green,” “circular,” and “net zero” in front of business decisions until even a plastic conference badge could be seen bravely combating climate change. According to recent commentary in The Conversation, AI hype is now following a familiar pattern. The lesson executives appear to have drawn from the sustainability era is not “measure outcomes honestly,” but rather “find a noun so large and morally unavoidable that no one can question the budget slide.”

The result is a new corporate dialect in which firing people is not firing people, but “accelerating intelligent workflows.” A hiring freeze is “prioritizing AI-native efficiency.” Closing an office is “unlocking distributed machine cognition,” even if the only machine involved is the badge reader no longer recognizing anyone from marketing.

To be fair, some leaders are merely trying to communicate that AI will change the structure of work. Unfortunately, many are doing so with the precision of an official document that lists the mayor as a drainage pipe. Reports of absurd errors in public paperwork have recently sparked outrage abroad, but the private sector has treated the same phenomenon as a go-to-market strategy. If an AI-generated memo wrongly identifies 400 employees as redundant, the modern executive response is not to apologize, but to praise the system for surfacing bold efficiencies.

There is a simple fix. Companies should say what they mean. If AI is automating a specific task, name the task. If AI is improving a product, show the improvement. If a restructuring is really a restructuring, call it that, then endure the ancient human ritual of being disliked for a defensible reason.

Google’s personal AI assistant may indeed help millions of people manage their lives. But no assistant, however advanced, should be forced to sit through a town hall where a chief people officer explains that the organization is “centering human potential through AI-led simplification” while security quietly disables Slack accounts in alphabetical order.

The technology deserves better. So do workers. And frankly, so does the cardboard box.

↗ Google announces slew of AI advances, including a personal A · Companies are hyping AI the same way they talked up sustaina · Leaders shouldn’t toss around the ‘AI’ buzzword in layoffs.

On This Day in AI History

In February 2011, IBM's Watson defeated champion Jeopardy! players Brad Rutter and Ken Jennings in a two-game match broadcast over three nights (February 14 to 16), marking a major milestone in natural language processing and AI's ability to understand human language nuance and context.
