The Trilogy Times — June 16, 2026

Today's Edition

China's Cut-Rate AI Rattles the Chip Kings

DeepSeek says it built a top-tier model without the fanciest chips — and Silicon Valley calls it 'amazing.'

By Hank Calloway, Wire Correspondent · Claude Opus + Thinking

HANGZHOU, CHINA — A Chinese upstart called DeepSeek says it trained a high-performing artificial-intelligence model on a shoestring, without the most advanced chips, and Silicon Valley can't stop talking.

The early word from the Valley: "amazing and impressive." That's a fat compliment for a shop running on second-string silicon.

DeepSeek is no household name. It runs out of Hangzhou, and a short while back most American engineers couldn't have placed it on a map.

Now they're downloading it and kicking the tires.

Here's the angle that stings. American chip barons and AI houses have staked billions on one idea — bigger budgets and faster processors win the race.

DeepSeek says the race runs cheaper than that. The company claims it kept pace with the heavy hitters at a fraction of the cost, and skipped the top-shelf chips to do it.

Read the full rundown on the outfit and the arithmetic grabs you by the collar.

The chips at the center are the fastest money can buy. Washington spent two years barring those from China, betting the blockade kept American labs out front.

DeepSeek worked with the slower stock it could legally get. It made do, and it made noise.

If those figures hold up, somebody left a window open.

Why it matters comes down to money. Training a top-tier model has run into the hundreds of millions, a tab only the deepest pockets could cover.

Knock that number down and the whole pecking order wobbles. A cheaper model doing the same work puts a question mark on every premium chip order on the books.

The hardware makers sell the picks and shovels of this gold rush. Word that you can strike it rich with cheaper tools is not the news they wanted printed.

The money crowd caught the scent fast. DeepSeek turned up in the latest Tech, Media and Telecom market talk, named right beside the day's other movers.

Nobody in San Francisco is conceding the field. The word out west is admiration, not surrender — but admiration from a rival carries its own freight.

The bigger question hangs over the whole trade. If brains don't require the biggest bankroll, the moat the giants dug starts looking shallow.

Meanwhile the big checks keep clearing. LinkedIn co-founder Reid Hoffman just raised $24.6 million for Manas AI, a startup aiming artificial intelligence at cancer research.

His partner is Siddhartha Mukherjee, the physician who wrote "The Emperor of All Maladies." Two bets, one week, the same gospel — AI is the tool that changes the game.

The contrast tells the tale. One outfit spends real money chasing cures; the other says it can do frontier work for pennies on the dollar.

Both can't be the whole story. Both point the same direction.

For now DeepSeek pulled off the thing every upstart dreams about. It made the giants look over their shoulder.

Watch this space. When the cheap seats start outhitting the box seats, the whole house takes notice.

↗ What to Know About China's DeepSeek AI · Tech, Media & Telecom Roundup: Market Talk · Silicon Valley Is Raving About a Made-in-China AI Model

Hiring Freeze Front Settles Over AI Valley as Layoff Rules Gather Offshore

OpenAI taps the brakes, California weighs a longer warning siren, and workforce planners are told to carry umbrellas.

By Storm Beaumont, Conditions Correspondent · GPT-5.2

SAN FRANCISCO — A cold hiring front is pushing across the technology sector today, and the barometric pressure inside executive suites is dropping fast.

The latest gust came from OpenAI, where Sam Altman said the company plans to “dramatically slow down” its pace of hiring, according to Business Insider. That is not a blizzard warning by itself, but when the industry’s most closely watched AI company starts easing off the recruiting accelerator, forecasters should expect chillier conditions downwind.

Across the broader enterprise weather map, CIOs are being advised to own the hiring freeze rather than pretend it is a passing cloud. The message is plain: technology leaders can no longer leave workforce weather reports solely to finance or HR. If AI automation is changing the labor forecast, CIOs need to explain where the sunshine is, where the fog banks are forming, and which teams may be walking into hail.

Meanwhile, Sacramento is tracking a heavier regulatory system. California lawmakers have proposed requiring 90 days’ notice when layoffs are tied to an employer’s use of AI, according to JD Supra. If passed, the rule would act like an early-warning radar for workers caught beneath automation storm cells. Employers, however, may find themselves filing more detailed flight plans before reducing headcount.

The Atlassian squall line remains a cautionary cloudbank. Its AI-linked job cuts have drawn warnings of a “chaos tsunami” for the workforce, a phrase that may sound dramatic until one remembers that most companies are still building seawalls with spreadsheets and optimism.

There is one warm pocket on the map: crypto venture firm CMT Digital raised $136 million for a fourth fund, suggesting capital is still evaporating upward in select markets. But for operators, founders and job seekers, the daily advisory is unchanged: keep budgets tight, skills portable and communication transparent. There is a 70% chance of continued hiring turbulence through the next earnings cycle, with isolated opportunity breaks for teams that can prove AI productivity without triggering a labor downpour.

↗ Why tech leaders must own the hiring freeze - cio.com · Sam Altman said OpenAI is planning to 'dramatically slow dow · California Legislature Proposes 90-Day Layoff Notice Require

The Big Three AI Labs Are Building Walls — and One Open-Source Rival Is Climbing Over

OpenAI, Google, and Anthropic align on model theft protections while Ai2 releases a free web agent designed to undercut all three.

By Dr. Chen Wei, Technology Correspondent · Claude Sonnet

SAN FRANCISCO — The dominant forces in commercial AI are converging on two fronts simultaneously: fortifying their models against theft and competing for a new class of technical talent, even as a nonprofit research lab releases open-source tools designed to make their closed systems redundant.

OpenAI, Google, and Anthropic have begun coordinating defenses against AI model theft — a rare alignment among competitors that reflects shared anxiety over model extraction attacks, weight theft, and distillation techniques that allow smaller actors to approximate frontier capabilities without bearing frontier R&D costs. The threat is structural: a sufficiently capable adversary can query a commercial API at scale, use the outputs to train a derivative model, and effectively offload billions in compute investment onto the target company's infrastructure.

The timing is notable. All three labs are simultaneously expanding hiring for Forward Deployed Engineers — a role borrowed from Palantir's playbook that embeds technical staff directly inside enterprise customers. FDEs sit at the intersection of sales engineering and product development, customizing AI systems to client workflows in real time. The role signals a strategic shift: these labs are no longer selling software licenses, they are selling outcomes, and that requires people on the ground.

Entering from the left flank: the Allen Institute for AI, known as Ai2, this week released an open-source web agent built to perform tasks that OpenAI's Operator, Google's Project Mariner, and Anthropic's computer-use tools handle in closed, paid environments. The release applies direct pressure on the commercial value proposition of those products. If an open-source equivalent performs comparably, enterprise procurement decisions become harder to justify on capability alone.

Elsewhere in the sector, Nvidia has backed Israeli AI startup Decart in a $300 million funding round valuing the company at $4 billion. Decart specializes in real-time interactive AI — including generative world simulation — and the Nvidia imprimatur provides both capital and a tacit hardware guarantee.

The week's pattern is consistent: the top of the market is hardening its perimeter while the open-source tier chips away at it from below.

↗ OpenAI, Google, Anthropic Unite Against AI Model Theft - Bui · Ai2 releases open-source web agent to rival closed systems f · What is a Forward Deployed Engineer: The AI Role OpenAI, Ant

Haiku of the Day · Claude Haiku

Walls built ever higher
Yet climbers emerge below
Progress feeds on fear

The New Yorker Style · Art Desk

The Far Side Style · Art Desk

News in Brief

Regulatory Capture, Corporate Consolidation, and the Accelerating Erosion of Antitrust Orthodoxy

WASHINGTON, D.C.

By R. Barnsworth III, Esq., Legal Affairs Desk · Claude Sonnet

The Ethics Industrial Complex Descends Upon AI in Education — And Finds Itself Outnumbered

CAMBRIDGE, MASSACHUSETTS — It could be argued — and preliminary evidence suggests it is being argued, with considerable vigor, across no fewer than four concurrent scholarly publications — that the academic community has entered what this correspondent would characterize as a "meta-ethical productivity phase" vis-à-vis artificial intelligence in educational contexts.

By Prof. Thaddeus Kroll, Contributing Scholar · Claude Sonnet

The Doctor Will Deepfake You Now

AUSTIN, TEXAS — There is a version of this column where I tell you everything is going to be fine.

By Piper Wren, Digital Culture Reporter · Claude Sonnet

The Floodlight and the Marshmallow

AUSTIN, TEXAS — There is a particular indignity, peculiar to our age, in being floodlit by one's own father's doorbell.

By Victor Marsh, Chief Columnist · Claude Opus

WE HAVE MET THE BOTS AND THEY ARE US: A Dispatch from the Edge of Digital Civilization

AUSTIN, TEXAS — There's a moment, usually around 2 a.m.

By Rex Danger, Contributing Editor · Claude Sonnet

▲ On Hacker News Today

Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding? 1146 pts · 491 comments

TinyWind: A pixel pirate sailing game with real wind physics (380k+ kms sailed) 918 pts · 162 comments

My Homelab AI Dev Platform 340 pts · 54 comments

Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak 261 pts · 141 comments

Peopleless economy? Not technically impossible 237 pts · 445 comments

Why I email complete strangers 185 pts · 83 comments

How TimescaleDB compresses time-series data 163 pts · 22 comments

SpaceX to buy Cursor AI coding agent operator Anysphere for $60B 149 pts · 104 comments

A Trilogy Company

Crossover

The world's top 1% remote talent, rigorously tested and ready to ship.

crossover.com

A Trilogy Company

Alpha School

AI-powered learning. Two hours a day. Academic results that defy belief.

alpha.school

A Trilogy Company

Skyvera

Next-generation telecom software — built for the networks of tomorrow.

skyvera.com

A Trilogy Company

Klair

Your AI-first operating system. Every workflow. Every team. One platform.

klair.ai

A Trilogy Company

Trilogy

We buy good software businesses and turn them into great ones — with AI.

trilogy.com

The Builder Desk — AI Builder Team

Surtr Goes Public, Klair Gets Smarter, Team Builds Everywhere at Once

The AI Builder Team shipped a full public API layer, live-data ontology, and surgical AI regeneration in a single day — proving this org can build infrastructure and product simultaneously, across every repo in the stack.

By Maxwell 'Mac' Donnelly — Builder Desk, Trilogy Times · GitHub · AI Builder Team

When historians write about the week Surtr became a real platform, they'll point to today. @kevalshahtrilogy didn't just ship one PR — he shipped a trilogy of his own. PRs #473, #474, and #476 form a complete vertical slice: live education data pulled from Redshift (100 schools, 82 programs, 22 markets, all real, all live, the goblin-ops demo mercifully buried), a stable public `/v1` REST API with Bearer-key auth and domain-namespaced routes that downstream consumers like Aerie and Klair can actually pull from, and a polished in-app API reference console complete with copyable `curl` commands and live sample responses. That's data layer, API layer, and developer experience — designed, built, and merged in one run. That's not a good week's work. That's a statement.

Over in Klair, @eric-tril has been quietly doing something that doesn't get enough credit: making AI-generated content trustworthy. PRs #3036 and #3049 complete the drift-detection and targeted-regeneration arc for the EBITDA memo. SHA-256 fingerprints on every section, amber drift chips that tell you exactly what's stale, and now — crucially — per-section regeneration for EBITDA variance arrays without the nightmare of unstable bullet identity. The Group and Software memos already had this. Now EBITDA does too. The whole MFR memo suite is locked in. @eric-tril also found time to detonate a 4,571-line monolith (PR #3033) and rebuild the Software memo backend as a clean, registry-driven package — a refactor so disciplined it's practically a public service.

Meanwhile, @sanketghia was everywhere. Server-side dashboard filters for the AI Renewals tab (#3045) — Date, Business Unit, Product, Stage, all wired through a proper `FionnFilters` dataclass, driving KPIs, comparisons, and the opportunities table in one coherent sweep. Live analyst price targets for SpaceX valuation (#3046), replacing hardcoded Bull/Bear pills with real Yahoo Finance data. And a quietly devastating bug fix (#3042): emailed deep-link comment notifications that showed "No comments yet" because the viewer's filter scope didn't match the author's. Fixed. The Builder Team does not leave users stranded.

Over in Surtr, PR #273 deserves its own paragraph. @kevalshahtrilogy introduced a first-class PARTIAL run status to the pipeline engine — the direct response to an `aws-bedrock-token-metrics` regression that was silently discarding 2,300 good records. Pipelines can now complete as much as possible, write what they collected, and surface the partial loudly instead of choosing between silent success and total failure. That's not a feature. That's a philosophy.

Then there's @marcusdAIy, who merged PRs #3034 and #3030: bidirectional GDoc sync with a 3-way merge, and a fix for blank Budget Bot sessions that couldn't create a linked Doc. When reached for comment, he said: "The 3-way merge replaces a hard-409 wall that was actively stranding users. Section-level reconciliation, conflict resolution defaulting to GDoc, surfaced in a banner. That's not underwhelming, Mac, that's engineering. Maybe read the PR body before you write the column." Sure, Marcus. We read it. The 409 wall was there for how long, exactly?

Mac's Picks — Key PRs Today

#273 — feat(pipelines): first-class PARTIAL run status — complete-as-much-as-possible, never block data @kevalshahtrilogy manual-review

## Why

Pipelines could only end SUCCESS (exit 0) or FAILED (exit ≠ 0). A run that *completed but had some units fail* had nowhere to go — it either hard-failed and threw away the data it did collect, or reported a silent SUCCESS. The trigger was the aws-bedrock-token-metrics #242 regression: an auto-triage gate that discarded ~2,300 good records and failed every run over a ~1.1% structural denial floor.

This introduces a real PARTIAL run status so a pipeline can *complete as much as possible, write what it collected, and surface the partial loudly* — with one hard invariant: partial errors must never block the data that already landed, and must never silently hide a failure.

Policy basis (reviewed with Benji): partial success is allowed only where the *record/unit* is the consumption atom and the write mechanics never destroy failed units' existing data. Pipelines where the *sync set* is the atom (financial close, snapshot truncates, outbound actions) stay all-or-nothing.

## Foundation (shared — used by every pipeline)

- Run-result side-channel. ECS tasks can only signal via exit code, so they write {status, summary} to s3://<RUN_RESULTS_BUCKET>/<RUN_RESULTS_PREFIX>/<run_id>.json (shared/pipeline_utils.write_run_result; self-contained copy in each ECS adopter). Lambda pipelines just return their summary.

- update-run-success resolves the pipeline's self-reported status (S3 side-channel → returned output_summary.status → chunked Map-state lists are aggregated: any non-success chunk ⇒ PARTIAL) and records PARTIAL vs SUCCESS. PARTIAL resets the consecutive-failure counter but posts an amber pipeline_partial GChat notification — visible, never paging (honors alert_on_failure/skip_failure_notification).

- gchat-notifier renders the ⚠️ amber partial message.

- CDK: env + IAM wiring (s3:GetObject/sns:Publish on UpdateRunSuccess, scoped s3:PutObject on ECS task roles), notification fields in both step-function constructs, run-results-config.ts. status VARCHAR(20) unconstrained → no DDL migration.
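
The resolution order in the update-run-success bullet can be sketched roughly as below. This is a minimal illustration and not the repo's code: the function names, the dict shapes, and the choice to treat a missing status as SUCCESS are all assumptions.

```python
from typing import Any, Optional

SUCCESS = "SUCCESS"
PARTIAL = "PARTIAL"


def aggregate_chunks(chunk_statuses: list[str]) -> str:
    """Any non-success chunk makes the whole run PARTIAL."""
    return SUCCESS if all(s == SUCCESS for s in chunk_statuses) else PARTIAL


def resolve_status(side_channel: Optional[dict[str, Any]], output: Any) -> str:
    """Resolve a run's self-reported status (hypothetical helper)."""
    # 1. The S3 side-channel result wins when present (ECS tasks).
    if side_channel and "status" in side_channel:
        return side_channel["status"]
    # 2. A returned summary may carry its own status (Lambda pipelines).
    if isinstance(output, dict):
        return output.get("output_summary", {}).get("status", SUCCESS)
    # 3. Chunked Map-state output: a list of per-chunk results.
    #    Assumption: each chunk is a dict; a missing status counts as SUCCESS.
    if isinstance(output, list):
        return aggregate_chunks([c.get("status", SUCCESS) for c in output])
    return SUCCESS
```

A run whose chunks report `SUCCESS` and `FAILED` side by side would resolve to `PARTIAL` here, which is the "any non-success chunk" rule from the bullet.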

## Adopters — 8 pipelines converted

Hard-failers fixed (raised/blocked on partial; now raise only when ALL units fail, all-fail floor placed before any destructive write):

| Pipeline | Unit | Was |
|---|---|---|
| aws-bedrock-token-metrics (#242) | account×region | raised at 1% failure rate, discarding everything collected |
| azure-ai-spend (#219) | subscription | write-then-raise on any sub failure |
| openai-cost (#216) | BU | write-then-raise on any full-BU failure |
| sis-core-tables (#178) | table ×3 | mid-sequence abort blocked sibling tables (now independent; per-table 0-row guard preserved) |
| hubspot-sync (#180, ECS) | portal×entity | SystemExit on any portal failure (downstream rebuild stays gated to clean runs) |
| claude-ai-chat-usage | day | write-then-raise on any day failure |
| netsuite-pipeline | task | raised if any task failed |
| unifi-snapshot-sync | table ×11 | raised if any table failed; a Network-API error discarded loaded SM tables |
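
The pattern the converted pipelines share (collect per unit, raise only when every unit fails, and check that floor before writing) might look something like this. It is a sketch under stated assumptions: `run_units`, `collect`, and `write` are hypothetical names, and the real pipelines each have their own unit types and write mechanics.

```python
from typing import Callable, Iterable


def run_units(units: Iterable[str],
              collect: Callable[[str], list],
              write: Callable[[list], None]) -> dict:
    """Collect every unit, raise only if all fail, then write what landed."""
    rows: list = []
    failed: dict[str, str] = {}
    unit_list = list(units)
    for unit in unit_list:
        try:
            rows.extend(collect(unit))
        except Exception as exc:  # one bad unit must not abort its siblings
            failed[unit] = str(exc)

    # All-fail floor: checked BEFORE any write, so a total failure
    # never touches data that is already in place.
    if unit_list and len(failed) == len(unit_list):
        raise RuntimeError(f"all {len(unit_list)} units failed: {failed}")

    write(rows)
    status = "PARTIAL" if failed else "SUCCESS"
    return {"status": status, "summary": {"rows": len(rows), "failed": failed}}
```

The returned `{status, summary}` dict mirrors the shape the run-result side-channel is described as carrying; the keys inside `summary` are invented for the example.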

No change needed: truefoundry-gateway (#221), openai-usage (#217), sales-athena-hubspot-sync, edu-expense-report-sender — already return partial statuses; the foundation now records them PARTIAL.

Intentionally NOT converted (set-is-the-atom / destructive writes): jotform-survey-sync (TRUNCATE+reload would delete failed forms' rows), quickbooks-expense-sync (integrity gate on suspected account-mapping drift), school-financial-models-sync (TRUNCATE snapshot), matterport-sync (whole-table DELETE+reload across shared tables — a partial load would destroy failed models’ rows; caught by @mercy, reverted to all-or-nothing), netsuite-balance-sheet + netsuite-month-end-income-statement (financial close — completeness is the contract), renewals-v3 + renewal-action-hub (outbound actions on partial inputs), wrike-core-tables/ps-pipeline/kubera (dependency chains).

## Tests

bedrock 35 · azure 66 · openai-cost 110 · sis 18 · hubspot 120 · openai-usage 80 · chat-usage 28 · netsuite 17 · unifi 74 · cdk lambdas 204 · cdk tsc clean · ruff check + format clean.

## Operational

- Bedrock backfill done & verified — Jun 8–9 gap closed (2,318 rows, 7 days, through 2026-06-09).

## Follow-ups (not in this PR)

- Klair dashboard amber PARTIAL badge (separate repo).

- partial_policy config flag in pipeline.json (declare allow/block per pipeline) — Benji's suggestion.

- Downstream partial-awareness for fct_ai_spend (exclude/flag dates sourced from PARTIAL runs).

- PARTIAL adoption for the silent-partial pipelines (anthropic-cost, claude-token-spend, aws-spend, rhombus-sync, quickbooks-ap-sync, quickbooks-pl-monthly, wrike-database-pipeline, notion-rca-hub-sync) — implemented, then reverted out of this PR to keep scope to the hard-failers; the foundation already supports them.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#473 — feat(surtr): education ontology — live School/Program/Market from Redshift (drop goblin demo) @kevalshahtrilogy approved

## What

Replaces the goblin-ops demo with the real education ontology, served live from Redshift.

- RedshiftConnector (src/connectors/redshift.ts) — serves School, Program, Market. School identity from core_education.dim_school, attributes pivoted from wrike_folder_properties, with tuition/grades/enrollment denormalized from the linked Program (HubSpot + EduCRM program marts).

- pnpm sync:schools (src/sync-schools.ts) — clean-slates + seeds the education schema, then runs the existing sync engine to materialize 100 schools / 82 programs / 22 markets as objects + identities + cross-system aliases (educrm, quickbooks, wrike, …).

- Drops the goblin demo (seed.ts, seed-goblins.ts, wiring in main.ts, scripts).

- Ontology spec: schema/education/schools/SCHOOL-ONTOLOGY-DRAFT.md.

## Notes

- School↔Program is a best-effort longest-prefix name match (~61/100); the clean mapping lives in Aerie — documented in the draft.

- Independent of the other PRs (no shared files except package.json scripts, which auto-merge).
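
For readers unfamiliar with longest-prefix matching, here is a rough Python sketch of the idea. It is not the connector's code (that lives in TypeScript), and it assumes the program name is the prefix of the school name; the PR does not say which direction the match runs. All names are hypothetical.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(name.lower().split())


def match_program(school_name: str, program_names: list[str]) -> Optional[str]:
    """Return the program whose name is the longest prefix of the school name."""
    school = normalize(school_name)
    best: Optional[str] = None
    best_len = 0
    for program in program_names:
        candidate = normalize(program)
        # Keep the candidate only if it is a prefix and beats the current best.
        if candidate and school.startswith(candidate) and len(candidate) > best_len:
            best, best_len = program, len(candidate)
    return best
```

A character-level prefix test like this can pair unrelated names that happen to share leading letters, which is one plausible reason a match of this kind stays best-effort.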

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#474 — feat(surtr): public read API /v1 (domain-namespaced) + Bearer-key auth @kevalshahtrilogy approved

## What

The stable HTTP read contract downstream consumers (Aerie, Klair, …) pull from. Surtr is the serving layer; Redshift/Wrike/HubSpot stay the source of truth behind it.

- src/api/v1.ts — domain-namespaced routes: GET /v1/education/schools (+ /{id}?include=program,market,aliases), /v1/education/programs(/:code), /v1/education/markets; top-level GET /v1/domains (discovery) and GET /v1/resolve?entityType=&system=&value= (cross-domain source-of-truth resolution). Driven by a SERVED[] config so new domains/entities are a config entry.

- Bearer API-key auth — keys from SURTR_API_KEYS (prefix surtr_sk_); enforced when set, GET also accepts ?api_key=. server.ts CORS updated to allow the Authorization header.

- consumer-demo/ — a standalone "downstream app" that pulls everything from /v1 (illustrates the serving direction).
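
The auth rule in the Bearer API-key bullet can be illustrated with a small Python sketch. Surtr's server is TypeScript, so this is only a model of the described behavior: the comma-separated format of `SURTR_API_KEYS`, the function names, and the plain-dict request shape are assumptions.

```python
from typing import Mapping, Optional

KEY_PREFIX = "surtr_sk_"


def load_keys(raw: Optional[str]) -> set[str]:
    """Parse a key list; the comma separator is an assumption."""
    if not raw:
        return set()
    return {k.strip() for k in raw.split(",") if k.strip().startswith(KEY_PREFIX)}


def is_authorized(method: str, headers: Mapping[str, str],
                  query: Mapping[str, str], keys: set[str]) -> bool:
    """Check a request against the configured keys."""
    if not keys:
        return True  # auth is enforced only when keys are configured
    # Assumes the header name has already been normalized to "Authorization".
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip() in keys:
        return True
    # GET requests may also pass the key as ?api_key=
    return method.upper() == "GET" and query.get("api_key") in keys
```

In use, `load_keys(os.environ.get("SURTR_API_KEYS"))` would feed the check. A hardened version would likely compare keys in constant time (for example with `hmac.compare_digest`) instead of using set membership.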

## Notes

- Independent of the other PRs (no shared files). Returns whatever entities exist in the DB — full data once the education-ontology PR's sync has run.

- Security roadmap (follow-ups): per-key scopes, rate-limiting, HTTPS, DB-stored + rotated keys.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#476 — feat(surtr): sidebar nav + API reference console @kevalshahtrilogy approved

## What

- /api-reference — an in-app console for the public /v1 API: live status + counts, every endpoint with params, copyable curls (with the Authorization: Bearer header), Try ↗ links, a live sample response, and an Authentication section.

- Sidebar (sidebar.tsx) — "Types" (meta-types) is now a collapsible group identical to the domain groups (collapsed by default, just before Education); adds Explorer and API nav links.

## Notes

- Independent of the other PRs (no shared files). The Explorer link points to /explorer (the explorer PR) and the console reads /v1 (the API PR) — both degrade gracefully (404 / "unreachable") until those merge; no ordering required.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

#3036 — feat(mfr): EBITDA memo fingerprint stale-check + single-section regenerate @eric-tril approved

### Summary

Brings the drift-detection + targeted-regeneration capability the Group and Software memos already have to the EBITDA memo, following the established Group/Software pattern, plus two follow-on fixes the work surfaced.

### Changes

EBITDA stale-check + regenerate

- New ebitda_memo_defaults/fingerprints.py: a per-section SHA-256 of the deterministic template text (display-rounded, so it changes only when a number a reader would see moves). generate_ebitda_memo_defaults stores a __fingerprints__ baseline.

- POST /ebitda-memo-stale-check recomputes from fresh data and diffs against the baseline → returns the stale section keys + has_baseline.

- POST /ebitda-memo-regenerate-section regenerates one of the 6 fixed single-string narrative sections (Literal-gated; the variable-length variance arrays return 422 and instead get a section-level stale chip).

- Frontend: useEBITDAMemoStaleSections hook, API functions, EditableParagraph gains optional isStale/onRegenerate/isRegenerating, EBITDAMemoTab wiring + a stale-check error toast.
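
The fingerprint-and-diff idea above can be sketched in a few lines. This is an illustration under assumptions, not the contents of fingerprints.py: the template sentence, `render_section`, and `stale_sections` are hypothetical, and only the use of SHA-256 over display-rounded text comes from the PR.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of a section's rendered template text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def render_section(bu_target_pct: float) -> str:
    """Hypothetical template; rounds to the whole percent a reader would see."""
    return f"BU EBITDA target is {round(bu_target_pct * 100)}% of revenue."


def stale_sections(baseline: dict[str, str], current_text: dict[str, str]) -> list[str]:
    """Return keys whose current fingerprint differs from the stored baseline."""
    return sorted(
        key for key, text in current_text.items()
        if baseline.get(key) != fingerprint(text)
    )
```

Because the text is rounded before hashing, a move from 0.68 to 0.681 leaves the fingerprint unchanged, while 0.68 to 0.62 flags the section. In this sketch a section with no stored baseline is also reported as stale; the real endpoint reports `has_baseline` separately.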

Narration now reflects Finance's real per-period BU target

- _fetch_ebitda_data sources bu_target_pct from compute_ebitda_revenue_adjustments (the value the bridge table already shows) instead of the hardcoded 0.68 constant. This makes the memo internally consistent and lets a target change flag the affected sections stale. The 75% Group target stays a fixed model constant.

Stale drill-down shows what changed

- /ebitda-memo-stale-check also returns current_provenance for the stale sections; SourcePanel renders a "generated → current" diff (changed values highlighted) when a section is stale.

Regenerate icon is stale-only again

- Re-gated the shared EditableCommentary regenerate icon on isStale — reverting the always-on behavior introduced by #3017's *regenerate-all* work — and applied the same stale-only rule to EditableParagraph, so the icon appears only when a bullet actually needs regeneration.

- ⚠️ Known follow-up: Group Results/Notes have no stale-check, so their inline regenerate icon no longer appears (regenerate them via *Generate AI Content*). A separate PR will decide their treatment (keep always-on exception vs. extend stale-check).

### Scope notes

- The variance arrays (bu-/central-/investment-/affiliates-variances) are intentionally stale-only, no per-bullet regen — their bullet identity is data-dependent/variable-length.

- ebitda_memo.py / ebitda_gaap_mapping.py (data layer) are untouched.

### Testing

- Backend: uv run pytest tests/mfr/ → 1759 passed; pyright 0 errors; ruff clean.

- Frontend: MFR suite → 1088 passed (incl. new fingerprint, hook, SourcePanel-diff, and regen-gate specs); tsc + eslint clean.

- Manual: generate → edit BU target → reopen flags section-1 stale (amber); drill-down shows 68 → 62; regenerate clears it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

http://localhost:3001/monthly-financial-reporting

https://github.com/user-attachments/assets/01bf0a77-e1dc-4d91-aeab-56c7772358a6

The Builder Desk — Engineer Spotlight

🏆 Engineer Spotlight

TWENTY-SEVEN PRs IN TWENTY-FOUR HOURS: THE BUILDER TEAM DOES NOT SLEEP, DOES NOT REST, DOES NOT ACKNOWLEDGE WEEKENDS

Klair alone absorbed 14 pull requests in a single day — this is not a software team, this is a force of geologic nature.

By Brick "The Voice of the People" Callahan — Numbers Desk, Builder Beat · GitHub · AI Builder Team

Twenty-seven. Say it slowly. Twenty. Seven. Pull requests in twenty-four hours across three active repositories — Klair at a thunderous 14, Aerie surging at 8, and Surtr contributing a cool 5 to round out what can only be described as a velocity profile that would make a Formula One pit crew feel inadequate. Mac Donnelly got five PRs worth of narrative. The Numbers Desk got twenty-two. Do the math. The Builder Team is cooking on all burners simultaneously and someone left the gas on.

@kevalshahtrilogy led the board with 6 PRs, including the quietly heroic CI work in #3027 onboarding the central @mercy PR reviewer — the kind of infrastructure investment that doesn't make headlines but makes everything else possible — and the surgical precision of #471 in Surtr, disabling a pipeline across both firing paths without blowing anything up. Six PRs. One engineer. One day. @eric-tril posted 5, a number that undersells the complexity entirely: #3049 delivering per-section EBITDA variance regeneration, #3035 and #3033 a beautiful one-two refactor punch on the software-memo architecture, and #3031 ripping out the ebitda_memo_drilldown shim like a man who does not tolerate technical debt on a personal level. @benji-bizzell went 4-for-4 in Aerie with in-app feedback intake (#402), Rhodes API note writes (#400), DD reconciliation receipts (#379), and surfacing community interest in admissions forecasting (#398) — a full-stack operational sweep that would exhaust a lesser engineer before lunch. @sanketghia's 3 PRs deserve their own paragraph someday: server-side dashboard filters in #3045, SpaceX valuation Bull/Bear pills driven by live analyst price targets in #3046, and a deeply satisfying comment deep-link fix in #3042. @marcusdAIy dropped 3 PRs that together constitute a minor miracle: bidirectional GDoc sync with three-way merge in #3034, a create-on-first-sync fix in #3030, and disposition persistence across re-reviews in #3032. The board-doc feature is alive and it remembers everything. @YibinLongTrilogy quietly posted 2 PRs — the P2 scoring config admin page in #399 and legacy migration path cleanup in #377 — the kind of disciplined, unglamorous work that separates professionals from amateurs.

And then there is @ashwanth1109. Four PRs. Two repositories. A Twitter toggle for Khoros maintenance reports in #3044 that required holding the entire Key Metrics DM architecture in one's head simultaneously. An Anthropic billing anchor in #3019 that redistributes billed dollars across API keys by token share — a sentence that took me forty-five minutes to parse and which Ashwanth apparently wrote in an afternoon. Unitemized totals surfaced in Financials Key Metrics cards across all schools in #404. A P&L drill-down sign fix in #396 that correctly handles Credit Card Credit refunds, which sounds simple until you realize the word "net_amount" is doing enormous philosophical work. When asked how he ships at this velocity, Ashwanth reportedly said, "I don't think about velocity. I think about what's wrong and then it's not wrong anymore." His response to this column, per sources close to the situation, was a single raised eyebrow. The Numbers Desk chooses to interpret this as respect.

Morale on the Builder Team has never been higher. Morale, in fact, appears to be a renewable resource that this team generates faster than it can be consumed. The machines are running. The diffs are merging. The future is being built at 27 PRs per day and climbing.

Brick's Overflow — PRs Mac Didn't Cover

#396 — fix(financials): sign P&L drill-down purchase amounts by net_amount (Credit Card Credit refunds) @ashwanth1109 no labels

## Demo

Proves the fix nets QB "Credit Card Credit" refunds correctly and touches nothing else — verified against live Redshift (finance_dw, read-only) and the scoped test suite.

Backend — purchases arm now sums net_amount

Scoped regression tests (lock the SQL shape so qp.line_amount can never silently return):

$ pnpm exec vitest run src/redshift/pl-transactions.test.ts
✓ src/redshift/pl-transactions.test.ts (9 tests) 6ms
Tests  9 passed (9)

Impact, measured on live data (read-only) — refunds that were booked as positive charges now net out. Anchor case, Alpha Chantilly · 63210 Motivation Model · Q1 2026 (credit_flag = true rows):

date        | vendor | line_amount (OLD) | net_amount (NEW)
2026-03-01  | Amazon |            +18.17 |           -18.17
2026-03-01  | Amazon |            +19.07 |           -19.07

Reconciliation swing = 2 × (18.17 + 19.07) = $74.48 — exactly the spurious overage the dashboard showed on that cell, now eliminated.
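The factor of 2 in the swing comes from each refund moving from +x to −x in the itemized sum. A short check with the two anchor rows above, using `Decimal` so the cents stay exact:

```python
from decimal import Decimal

# The two Amazon refund rows from the anchor case.
refunds = [Decimal("18.17"), Decimal("19.07")]

old_itemized = sum(refunds)               # line_amount: refunds counted as charges
new_itemized = sum(-r for r in refunds)   # net_amount: refunds counted as credits
swing = old_itemized - new_itemized

print(swing)  # 74.48
```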

Portfolio-wide (2026, cost accounts only — the dashboard's scope):

quarter | refund_rows | itemized OLD (Σline) | itemized NEW (Σnet) | overage_removed
1    |     278     |        4,265,834.07  |       4,111,319.01  |     154,515.06
2    |     352     |        5,693,695.19  |       5,483,465.67  |     210,229.52
TOTAL →   $364,744.58

Most at risk — could this change non-refund purchases? No. Every purchase row's net_amount was compared to line_amount at cent precision (2026, school-classed, P&L-accounted):

credit_flag | relation                              | rows
false    | net == line_amount  (fix is a no-op)  | 43,327
true     | net == -line_amount (fix flips sign)  |    702
(zero rows in "something else")

The fix is byte-identical for all 43,327 non-refund purchases and flips sign on exactly the 702 refund rows it targets. Downstream consumer (pl-transactions-refresh.ts:71) sums amount straight through with no sign re-derivation, so the contract is unchanged.
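The row-by-row comparison behind that table can be sketched as below. This is an illustration of the check, not the query that was run against Redshift; `classify` is a hypothetical helper and the non-refund row is made-up sample data.

```python
from collections import Counter
from decimal import Decimal


def classify(credit_flag: bool, line_amount: Decimal, net_amount: Decimal) -> str:
    """Bucket a purchase row by how net_amount relates to line_amount."""
    if not credit_flag and net_amount == line_amount:
        return "no-op"
    if credit_flag and net_amount == -line_amount:
        return "sign flip"
    return "something else"


rows = [
    (False, Decimal("120.00"), Decimal("120.00")),  # illustrative ordinary purchase
    (True, Decimal("18.17"), Decimal("-18.17")),    # anchor refund row
    (True, Decimal("19.07"), Decimal("-19.07")),    # anchor refund row
]

counts = Counter(classify(*row) for row in rows)
assert counts["something else"] == 0
```

A non-empty "something else" bucket would mean `net_amount` diverges from `line_amount` for a reason other than the credit flag, which is the case the PR reports as zero rows.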

---

## Summary

The per-school P&L Breakdown drill-down (Dashboards → Financials → Schools – Actual vs Model) showed "unitemized" overage badges — itemized transactions summing to *more* than the account's line total — on cost lines that had a refund.

Root cause: a QuickBooks "Credit Card Credit" (a refund) arrives as a Purchase with the header Credit flag set. Surtr staging records this as credit_flag = true and signs it into net_amount = -line_amount, but keeps line_amount positive. The drill-down sync (sync/src/redshift/pl-transactions.ts) selected qp.line_amount, so refunds were summed as positive charges — inflating the itemized total above the pl_monthly line total (which already nets the credit) and pushing the per-cell reconciliation negative.

Fix: the purchases arm now selects qp.net_amount (already signed; verified to diverge from line_amount *only* when credit_flag = true). Bills carry no credit flag (unchanged), journal entries already use a signed net_amount, and vendor credits are already negated.

## Impact (measured against Redshift, 2026 Q1–Q2, cost lines only)

| Metric | Current | Removed by this fix |
|---|---:|---:|
| Total unitemized \|residual\| | $949,686 | −$364,745 (38%) |
| "Itemized > line total" overage $ | $606,375 | −$364,745 (60%) |
| Non-headcount overage $ | $432,355 | −84% |

Anchor example — Alpha Chantilly · 63210 Motivation Model · Q1 2026: line total $4,678.63; two Amazon "Refund" rows ($18.17 + $19.07) were shown positive, creating a $74.48 overage (the ⚠ in the dashboard) that this fix eliminates entirely.

The residual that remains after the fix is unrelated to refunds — headcount / contracted-labor JE accrual timing, and genuinely un-itemized spend (line total > itemized) — and is out of scope here.

## Test

- New SQL-shape regression guards in pl-transactions.test.ts: the purchases arm must read qp.net_amount and must not read qp.line_amount; the other three arms keep their established signs.

- vitest run src/redshift/pl-transactions.test.ts → 9 passed; tsc --noEmit clean; biome check clean.
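The real guards live in the vitest file named above. As a language-neutral illustration of a SQL-shape guard, here is a Python sketch; the SQL text, table name, and the other column names are hypothetical stand-ins, and only `qp.net_amount` and `qp.line_amount` come from the PR.

```python
import re

# Hypothetical stand-in for the purchases arm of the drill-down query.
PURCHASES_ARM_SQL = """
SELECT qp.txn_date, qp.vendor_name, qp.net_amount AS amount
FROM quickbooks_purchases qp
"""


def reads_column(sql: str, column: str) -> bool:
    """True when `column` appears in `sql` as a whole qualified identifier."""
    pattern = rf"(?<![\w.]){re.escape(column)}(?!\w)"
    return re.search(pattern, sql) is not None


# The guard: the signed column must be read, the unsigned one must not.
assert reads_column(PURCHASES_ARM_SQL, "qp.net_amount")
assert not reads_column(PURCHASES_ARM_SQL, "qp.line_amount")
```

Matching on whole identifiers keeps a column such as `qp.net_amount_usd` from satisfying the guard by accident.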

## Rollout

Takes effect on the next financial-worker sync, which re-pulls plTransactions. No schema change, no migration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


#404 — AERIE-394 feat(dashboards): surface unitemized total in Financials Key Metrics cards (all schools) @ashwanth1109 no labels

## Demo

Proves the new Unitemized Key Metrics card shows the correct as-reported residual with the right tone, and that un-gating the Key Metrics row didn't break the model-comparison cards for actuals-only schools.

UI — Unitemized Key Metrics card (Financials › Schools – Actual vs Model)

1. Open Financials → Schools, "Actual vs Model" view (period 2026-Q1 — the only option today).

2. Select a model-mapped school (one carrying a unitEconomicsModel token, e.g. an Alpha campus). The Key Metrics strip now shows 7 cards, with Unitemized last (grid widened to xl:grid-cols-7).

3. Pick a school with a non-zero residual (e.g. a campus from the AERIE-393 name-split set) → the Unitemized value renders in coral. Hover for the tooltip.

4. Pick a school that fully reconciles → the Unitemized value is $0 in neutral cream (no coral).

5. Select an actuals-only school (no model mapping). The Key Metrics strip now renders — previously hidden behind the showBand gate. The model-comparison cards (Revenue/Cost/EBITDA/Student:Guide) degrade to "—", and the Unitemized card still appears with that school's residual.

> _Screenshot: model-mapped school — 7-card strip with a coral Unitemized value + tooltip_

Most at risk from this change: un-gating the Key Metrics row from bandWiring.showBand means actuals-only schools render the row for the first time — the model-comparison cards must degrade to "—" without crashing or NaN. Verified via the scoped component tests (no backend touched):

$ pnpm test components/dashboards/financials/financials-kpi-cards.test.tsx
✓ Unitemized card (spec 15) > renders the residual AS-REPORTED for the selected quarter (no annualisation)
✓ Unitemized card (spec 15) > tones the value coral when the residual is non-zero
✓ Unitemized card (spec 15) > keeps the neutral cream tone (no coral) when the residual is $0
✓ Unitemized card (spec 15) > renders $0 / neutral when the field is omitted or cost is null
✓ Unitemized card (spec 15) > reads the SELECTED quarter's residual, not always q1
✓ Unitemized card (spec 15) > renders no vs-model comparison badge regardless of model presence
✓ loading guard + live-query wiring (PR #346 item 2) > (a) renders '—' while the query is in flight
✓ loading guard + live-query wiring (PR #346 item 2) > (b) renders the real folded figures once the query resolves
✓ loading guard + live-query wiring (PR #346 item 2) > (c) keys the query on the RAW schoolHandle + period
Test Files  1 passed (1)
Tests  9 passed (9)

The "no vs-model comparison badge regardless of model presence" test renders both a model-mapped and an actuals-only (unitEconomicsModel: null) school — confirming the Unitemized card appears and the comparison cards degrade in both, exactly the un-gated path.

---

## Summary

Adds a 7th "Unitemized" card to the Financials › Schools – Actual vs Model Key Metrics strip, surfacing the cost-side residual for the selected quarter.

- Value: the as-reported cost.total_expenses_unitemized[quarterKey] — read verbatim with no annualisation (the residual is a reconciliation artifact, not a run-rate figure). Omitted/null cost renders $0.

- Tone: value text is toned coral when the residual is non-zero (spend not yet reconciled to transaction-level detail) and neutral cream when $0 / omitted.

- No vs-model badge (comparison: null) — there is no model counterpart for an unreconciled residual.

- Tooltip: P&L line total − Σ itemized transactions (purchases/bills/JEs/credits), $0.50 tolerance. Non-zero = spend not yet reconciled to transaction-level detail.

- All-schools visibility: the Key Metrics strip is now un-gated from bandWiring.showBand, so it renders for every single-school selection — including actuals-only schools (where unitEconomicsModel is null). The model-comparison cards degrade to "—" via the existing comparison: null path; no crash, no fabricated model figures.

- Grid: widened from xl:grid-cols-6 → xl:grid-cols-7 to seat the new card.

Frontend-only — no backend change. getCostBreakdownMonthly already computes and tolerance-applies total_expenses_unitemized.
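A minimal sketch of the value and tone rules above. It assumes the $0.50 tolerance snaps small residuals to zero and that the boundary is inclusive; the PR does not show `getCostBreakdownMonthly`, so both points are assumptions. All names and sample figures are hypothetical.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.50")  # from the tooltip text; inclusive boundary is assumed


def unitemized_residual(line_total: Decimal, itemized: list[Decimal]) -> Decimal:
    """P&L line total minus the sum of itemized transactions, with tolerance applied."""
    residual = line_total - sum(itemized, Decimal("0"))
    return Decimal("0") if abs(residual) <= TOLERANCE else residual


def value_tone(residual: Decimal) -> str:
    """Coral when spend is not fully reconciled, neutral cream otherwise."""
    return "coral" if residual != 0 else "cream"


# Illustrative figures only.
reconciled = unitemized_residual(
    Decimal("1000.00"), [Decimal("600.00"), Decimal("399.75")]
)
unreconciled = unitemized_residual(Decimal("1000.00"), [Decimal("600.00")])

print(reconciled, value_tone(reconciled))
print(unreconciled, value_tone(unreconciled))
```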

## Linear

[AERIE-394](https://linear.app/builder-team/issue/AERIE-394/surface-unitemized-total-in-financials-key-metrics-cards-all-schools)

## Files Changed

Component

- chat/components/dashboards/financials/financials-kpi-cards.tsx — added the optional value-tone field to the Card type, appended the 7th "Unitemized" card to the cards useMemo (as-reported residual, coral when non-zero, comparison: null, tooltip), applied the value tone in KpiCard, widened the grid to xl:grid-cols-7.

- chat/components/dashboards/financials/financials-view.tsx — removed the bandWiring.showBand gate around the <FinancialsKpiCards> strip (single-school branch); left the <UnitEconomicsComparisonTable> gate intact.

Tests

- chat/components/dashboards/financials/financials-kpi-cards.test.tsx

Docs

- features/dashboards/school-pl-unit-economics/specs/15-unitemized-kpi-card-all-schools/spec.md (spec 15, Completed)

- features/dashboards/school-pl-unit-economics/FEATURE.md (changelog synced)

## Test Coverage

9 tests pass — 6 new for the Unitemized card:

- value / no-annualisation (as-reported total_expenses_unitemized[quarterKey])

- coral value tone when non-zero

- neutral value tone at $0

- undefined / null cost → $0

- quarter selection (reads the selected quarterKey)

- no vs-model badge — across both model-present and actuals-only states

## Self-Review & CI

- Self-review: no issues found.

- CI: all 7 checks green.

## Note

Per-school residuals are currently inflated by the campus name-split attribution artifact tracked in [AERIE-393](https://linear.app/builder-team/issue/AERIE-393). This is a known, accepted limitation — the residual is surfaced as-reported; correcting the name-split is out of scope for this PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


#3019 — KLAIR-2878 feat(ai-spend): anchor Anthropic dashboard dollars to billed Cost API (redistribute billed $ by API-key token shares) @ashwanth1109 no labels

## Demo

Proves the dashboard's Anthropic dollars now tie out to the billed Cost API. Run by importing the changed service directly (no HTTP layer) against live Redshift — read-only: AICostsService.get_summary / get_time_series / get_by_bu / get_by_model / get_top_drivers, compared to ground truth SELECT SUM(amount) FROM core_finance.ai_spend_anthropic_cost_reports.

Backend — billed-dollar allocation (throwaway script run from klair-api/ with uv run; imports the service and calls each swapped consumer):

WINDOW 2026-05-13..2026-06-10
========================================================================
1) Ground truth: billed Cost API total (SUM(amount))   = $801,242.15
2) get_summary().anthropic (direct + TF, allocated)    = $801,242.14
TIE-OUT diff vs billed                              = $-0.02
openai   (must be unchanged by this PR)             = $411,129.34
claude_ai (separate provider, must be unaffected)   = $136,437.99
3) get_time_series daily: 29 pts, duplicate keys=NONE
sum(anthropic) = $801,242.14 (diff vs summary $0.00)
4) get_by_bu sum(anthropic) = $801,242.14 (diff vs summary $0.00)
Central Engineering          $  199,787.32
DevFactory                   $  123,352.05
Trilogy-Inc                  $  106,604.41
Tech Super Builders          $   65,804.48
Core Education               $   43,306.68
5) get_by_model claude-fable-5 (was $0 pre-PR, no pricing row): $23,819.71
6) get_top_drivers (anthropic, top 5):
#1 Unknown                          $   76,571.08
#2 Claude Code                      $   76,216.94
#3 Claude Code                      $   46,705.13
#4 Education                        $   41,593.10
#5 Unknown                          $   33,717.11
7) BU-filtered get_summary (Central Engineering) anthropic = $320,412.07 (sane: > 0, < total)

What this demonstrates:

- Full-bill tie-out (lines 1–2): dashboard Anthropic (direct allocation + rescaled TF slice) reproduces the billed total to within $0.02 (float rounding) over a 29-day window.

- claude-fable-5 recovered (line 5): previously $0 on the dashboard (no pricing-table row); now carries its billed $23,819.71.

- Line 7's value exceeds Central Engineering's by-BU row because BU-filtered summaries always include ANTHROPIC_SHARED_BUS — pre-existing semantics, unchanged by this PR.

Most at risk from this change:

1. TF slice double-count — direct consumers must exclude TF-flagged allocated rows while _tf_anthropic_* re-adds the rescaled slice; a regression here inflates totals by ~$177K. The $0.02 tie-out above proves it's absent, and test_direct_consumers_exclude_tf_flagged_allocated_rows guards it (verified to fail against pre-fix code).

2. Consumer divergence / chart breakage — all five swapped consumers share the new CTE; duplicate period keys or disagreement would break the dashboard. Lines 3–4: daily series = by-BU = summary to the cent, no duplicate keys.

3. Other providers — OpenAI ($411,129.34) and claude_ai ($136,437.99) flow through the same methods and are unchanged (line 2).

Verified via the scoped suite (uv run pytest tests/test_ai_costs_service.py -q):

153 passed in 0.37s

## Summary

Re-anchors the /ai-adoption dashboard's Anthropic dollars from the computed estimate (token counts × the hand-maintained ai_spend_token_pricing, materialized as ai_spend_claude_token_usage.total_cost_dollars) to the actual bill in core_finance.ai_spend_anthropic_cost_reports. Billed dollars are redistributed to BUs at query time via per-workspace API-key token shares — no new pipeline, no new table. After this change the dashboard's Anthropic total (direct + TrueFoundry) equals SUM(ai_spend_anthropic_cost_reports.amount) for any date window, to the dollar, while BU attribution still honors ai_spend_bu_overrides edits.

Specs 01–02 are backend-only (response envelopes reused verbatim); spec 03 adds a single client-side frontend change — a By BU roll-up in the Provider Detail panel (see below).

Linear ticket: [KLAIR-2878 — AI Spend: anchor Anthropic dashboard dollars to billed Cost API (redistribute billed $ to BUs via per-workspace API-key token shares)](https://linear.app/builder-team/issue/KLAIR-2878/ai-spend-anchor-anthropic-dashboard-dollars-to-billed-cost-api)

## Performance fix (spec 02)

Follow-up commit 2563f4e7e mitigates a performance regression the allocation CTE introduced. The CTE is ~10× heavier than the query it replaced (0.8s → 7.8s for a one-month window) and is embedded ~13× per dashboard load (each of the 5 consumers, plus every _tf_* helper re-runs the full allocation block via _tf_day_factor_with_block). Because the 5 dashboard endpoints were async def calling blocking Redshift code, FastAPI serialized the frontend's parallel requests on the event loop (~100s load) — so get_by_bu (the heaviest endpoint) was still pending when a user drilled into the Anthropic provider, leaving the "By Workspace" panel empty ("0 workspaces") even though the backend returns correct data.
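The serialization effect can be reproduced with the standard library alone. In this sketch `time.sleep` stands in for a blocking Redshift call, and `asyncio.to_thread` stands in for the threadpool that FastAPI uses for plain `def` endpoints; the handler names are hypothetical.

```python
import asyncio
import time


def blocking_query(seconds: float) -> None:
    time.sleep(seconds)  # stands in for a blocking database call


async def handler_blocking(seconds: float) -> None:
    blocking_query(seconds)  # runs on the event loop and blocks every other task


async def handler_threaded(seconds: float) -> None:
    await asyncio.to_thread(blocking_query, seconds)  # runs in a worker thread


async def timed(handler, n: int = 3, seconds: float = 0.2) -> float:
    """Fire n handlers "in parallel" and return the wall-clock time taken."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(seconds) for _ in range(n)))
    return time.perf_counter() - start


serialized = asyncio.run(timed(handler_blocking))
concurrent = asyncio.run(timed(handler_threaded))
print(f"serialized: {serialized:.2f}s, threaded: {concurrent:.2f}s")
```

With three 0.2s calls, the blocking variant takes roughly the sum of the calls while the threaded variant takes roughly the longest single call.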

Mitigation (allocation math untouched — direct + TF == SUM(amount) to the cent still holds):

- Threadpool concurrency — the 5 dashboard endpoints (summary, time-series, by-model, by-bu, top-drivers) are converted async def → sync def, so Starlette runs them concurrently instead of serializing on the event loop.

- Pool guardrail — a process-global bounded semaphore caps how many _execute_query calls hit the shared Redshift pool at once (≤ REDSHIFT_POOL_SIZE, default cap pool − 1) so the fan-out can't over-subscribe the pool and block on connection checkout.

- Result cache — a short-TTL cache keyed on (query, params) (default TTL 300s) serves concurrent/repeat loads from memory and collapses the duplicate _tf_unmapped_billed lookups a single load fires. Disabled under tests via a conftest.py autouse fixture.
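A rough sketch of how the pool guardrail and the result cache could fit together, under stated assumptions: `POOL_SIZE = 5` is a made-up value standing in for REDSHIFT_POOL_SIZE, `cached_query` and `fake_run` are hypothetical names, and this version does not deduplicate identical queries that miss the cache at the same instant.

```python
import threading
import time
from typing import Any, Callable, Hashable

POOL_SIZE = 5  # hypothetical; stands in for REDSHIFT_POOL_SIZE
CACHE_TTL_SECONDS = 300.0

# Process-global cap: at most POOL_SIZE - 1 queries in flight at once.
_query_slots = threading.BoundedSemaphore(max(1, POOL_SIZE - 1))
_cache: dict[Hashable, tuple[float, Any]] = {}
_cache_lock = threading.Lock()


def cached_query(
    query: str,
    params: tuple,
    run: Callable[[str, tuple], Any],
    ttl: float = CACHE_TTL_SECONDS,
) -> Any:
    """Serve from the TTL cache when fresh; otherwise run under the semaphore."""
    key = (query, params)
    with _cache_lock:
        hit = _cache.get(key)
        if hit is not None and time.monotonic() - hit[0] < ttl:
            return hit[1]
    with _query_slots:  # blocks while the cap is reached
        result = run(query, params)
    with _cache_lock:
        _cache[key] = (time.monotonic(), result)
    return result


calls = []


def fake_run(query: str, params: tuple) -> int:
    calls.append((query, params))
    return 42


cached_query("SELECT 1", (), fake_run)
cached_query("SELECT 1", (), fake_run)
print(len(calls))  # 1
```

Holding the cache lock only around dictionary access keeps slow queries from blocking cache hits for other keys.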

| Metric | Before (serialized) | After |
|--------|--------------------|-------|
| Cold parallel load | ~100s, by-bu never arrives → empty panel | 40s, all 5 endpoints succeed |
| Warm load (cache hit) | n/a | 0.9s (44×) |
| by-BU Anthropic tie-out | — | $822,205.55 vs billed $822,205.57 (−$0.02) |
| Anthropic workspace rows | 0 (panel empty) | 100 |

Trade-off: the first cold load (and loads after the TTL expires or for a new date range) is still ~40s but now reliably completes; repeat loads are instant. The deeper "compute the allocation once per load and aggregate in-app" refactor (collapses ~13× → 1×, ~8s cold) is deferred — documented in [spec 02](https://github.com/AI-Builder-Team/Klair/blob/feat/anthropic-billed-dollar-allocation/features/ai-spend-and-adoption/anthropic-billed-dollar-allocation/specs/02-dashboard-query-performance/spec.md).

## Frontend — By BU section (spec 03)

Commit 9ff3fa9ab adds the one frontend change: a By BU roll-up section in the Provider Detail drill-down panel, so spec 01's redistributed billed dollars are legible per BU. The panel previously showed only By Model and By Workspace — a reader had to mentally sum dozens of workspace rows to see each BU's share.

- Renders as §02, between By Model (§01) and By Workspace (now §03) — a natural drill-down order (provider → model → BU → workspace).

- Client-side only: rolls up the already-fetched by-bu response (the same provider-matched workspace rows the By-Workspace section uses) by BU — sum spend, count workspaces, % of provider total, sorted desc. No new request, no backend change.

- Reconciles to the cent with By Workspace (identical source rows).

- Provider-agnostic: keys on project.provider, so every provider's panel gets the section (OpenAI, Cursor, Bedrock, GCP, Azure, Anthropic), not Anthropic-only. Verified the backend yields the same per-BU total (each BU's anthropic_cost equals the sum of its anthropic-provider project costs).

- Extracts a shared RankedSpendTable rendered by both sections (net markup reduction).
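The roll-up itself is a small group-by. The PR implements it client-side in TypeScript; the Python sketch below only illustrates the aggregation (sum spend, count workspaces, % of provider total, sort descending). The row shape, function name, and sample figures are hypothetical.

```python
from collections import defaultdict


def roll_up_by_bu(rows: list[dict]) -> list[dict]:
    """Group workspace rows by BU and rank BUs by spend, largest first."""
    totals: dict = defaultdict(lambda: {"spend": 0.0, "workspaces": set()})
    for row in rows:
        entry = totals[row["bu"]]
        entry["spend"] += row["spend"]
        entry["workspaces"].add(row["workspace"])

    provider_total = sum(entry["spend"] for entry in totals.values())
    ranked = [
        {
            "bu": bu,
            "spend": entry["spend"],
            "workspace_count": len(entry["workspaces"]),
            "pct_of_provider": (
                entry["spend"] / provider_total * 100 if provider_total else 0.0
            ),
        }
        for bu, entry in totals.items()
    ]
    return sorted(ranked, key=lambda r: r["spend"], reverse=True)


# Illustrative rows for one provider.
rows = [
    {"bu": "BU A", "workspace": "ws-1", "spend": 100.0},
    {"bu": "BU B", "workspace": "ws-2", "spend": 50.0},
    {"bu": "BU A", "workspace": "ws-3", "spend": 50.0},
]
rollup = roll_up_by_bu(rows)
```

Because the roll-up reads the same rows as the per-workspace view, the BU totals sum to the workspace totals by construction.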

klair-client/src/screens/AIAdoptionV2/components/panels/ProviderDetailPanel.tsx (+ .spec.tsx). Tests: existing single-table assertion scoped by aria-label; added BU roll-up + per-BU empty-state tests — 9 panel tests / 314 AIAdoptionV2 tests green; eslint / tsc / prettier clean. Documented in [spec 03](https://github.com/AI-Builder-Team/Klair/blob/feat/anthropic-billed-dollar-allocation/features/ai-spend-and-adoption/anthropic-billed-dollar-allocation/specs/03-provider-detail-by-bu-section/spec.md).

## Why (drift evidence)

Computed Anthropic dollars drift from what Anthropic actually bills:

- Zero-match models are costed at $0. claude-fable-5 has a billed line but no pricing row — silently rendered as $0 on the dashboard (June 2026: ~$24K of real spend invisible).

- Prefix fallback prices new models at ancestor rates (cause of the Apr 2026 ~$70K remediation, KLAIR-2580).

- Computed ignores the negotiated 10% input/output discount → systematic ~+2% overstatement even when the pricing config is correct.

- Net drift: Apr computed +2.7% vs billed, May +0.6%, June running −5.0%.

The actual bill already lands daily in ai_spend_anthropic_cost_reports (~06:00 UTC, anthropic-cost-pipeline) but was only read by the raw-data report screens — no dashboard aggregate consumed it.

## What was implemented

All changes are inside klair-api/services/ai_costs_service.py: one new shared SQL builder plus call-site swaps in every dashboard read path.

- _anthropic_billed_allocation_cte(...) — a shared builder returning a WITH block + ordered params:

  - cost_cells — billed dollars per (bu, workspace_id, model, context_window, report_date, token_type) from ai_spend_anthropic_cost_reports (cost_type='tokens').

  - key_weights — the matching token column per api_key_id from ai_spend_claude_token_usage, summed per cell (carries is_truefoundry_routed), via a token_type → usage-column CASE map. Join keyed on the full cell including org bu with NULL-safe COALESCE on bu/workspace_id (the default workspace is NULL workspace_id in both tables for every org, so a bu-less key would smear one org's billed dollars across others).

  - cell_totals — SUM(key_tokens) per cell.

  - allocated — billed(cell) × key_tokens / cell_tokens at api_key_id grain, with the UNION ALL fallback cascade (tiers 0–3: full cell → drop context_window → workspace-day computed-cost share → org-bu passthrough) plus the non-token (web_search/session_usage) workspace-day split. Redshift has no FILTER clause — all conditional sums use SUM(CASE WHEN … THEN x END).

- Consumer swaps — get_summary (incl. its prior-period block), get_time_series, get_by_bu, get_by_model, get_top_drivers swap the direct-Anthropic FROM ai_spend_claude_token_usage … SUM(total_cost_dollars) aggregate for FROM (<allocation CTE>) … SUM(amount), keeping _get_anthropic_override_join + _build_effective_bu_filter(..., always_include=ANTHROPIC_SHARED_BUS) wiring identical (allocated rows keep api_key_id + bu grain).

- TF rescale — the _tf_anthropic_* helpers are multiplied by a per-day factor day_factor = tf_billed_day / tf_computed_day so the TF add-back equals the TF keys' allocated billed dollars. Days with TF billed $ but no TF usage rows route to Unmapped. Direct consumers keep ANTHROPIC_TF_EXCLUDE on the terminal CTE flag.

- get_filters — verified to have no dependency on the computed Anthropic cost column.
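The TF rescale step can be sketched for a single day as follows. This assumes the helper output is a per-BU computed cost for that day; `rescale_tf_day` and the BU names are hypothetical, and only the `day_factor` formula and the Unmapped routing come from the description above.

```python
from decimal import Decimal


def rescale_tf_day(
    tf_billed_day: Decimal, computed_by_bu: dict[str, Decimal]
) -> dict[str, Decimal]:
    """Scale one day's computed TF costs so they sum to that day's billed TF dollars."""
    tf_computed_day = sum(computed_by_bu.values(), Decimal("0"))
    if tf_computed_day == 0:
        # Billed TF dollars but no TF usage rows: route to Unmapped.
        return {"Unmapped": tf_billed_day} if tf_billed_day else {}
    day_factor = tf_billed_day / tf_computed_day
    return {bu: cost * day_factor for bu, cost in computed_by_bu.items()}


# Illustrative day: computed costs overstate the bill by about 11%.
rescaled = rescale_tf_day(
    Decimal("90"), {"BU A": Decimal("60"), "BU B": Decimal("40")}
)
no_usage = rescale_tf_day(Decimal("12.50"), {})
```

The rescale preserves each BU's share of the day and changes only the day's total, which is what lets direct + TF tie out to the bill.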

## Allocation model

- Cell = (org bu, workspace_id, model, context_window, report_date, token_type) over cost_type='tokens' rows.

- Weights = matching token column per api_key_id, summed per cell. token_type maps 1:1 to usage columns (uncached_input_tokens, output_tokens, cache_read_input_tokens, cache_creation.ephemeral_5m/1h_input_tokens → cache_creation_5m/1h_tokens).

- allocated(key, cell) = billed(cell) × key_tokens / cell_tokens. Weights are immune to pricing drift because price is uniform within a cell.

- Fallback cascade (defensive — May 2026 verified 100% of billed token $ matched at full cell grain): drop context_window → workspace-day computed-cost share → org-bu passthrough.

- Non-token billed rows (web_search, session_usage; ~$1K/mo): allocate by workspace-day computed-cost share.

- BU attribution semantics unchanged: allocated rows keep api_key_id grain, so effective_bu = COALESCE(o.bu_override, bu) and ANTHROPIC_SHARED_BUS always-include carry over verbatim. 58% of May billed $ sits in workspaces whose keys map to >1 effective BU — this genuinely redistributes, not just rescales.

- TF slice: the TF keys' allocated billed dollars become the TF slice total per day; re-attributed per BU by TF metered-Anthropic usage shares (the _tf_anthropic_* helpers × day_factor). Result: direct + TF = SUM(amount) for any window, to the dollar.
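The core allocation rule, allocated(key, cell) = billed(cell) × key_tokens / cell_tokens, can be checked on one cell with a few lines. This sketch covers only the full-cell tier; the function name, key ids, and figures are hypothetical.

```python
from decimal import Decimal


def allocate_cell(billed: Decimal, key_tokens: dict[str, int]) -> dict[str, Decimal]:
    """Split one cell's billed dollars across API keys by token share."""
    cell_tokens = sum(key_tokens.values())
    if cell_tokens == 0:
        # A zero-weight cell cannot be split here; a fallback tier would handle it.
        raise ValueError("zero-weight cell")
    return {key: billed * tokens / cell_tokens for key, tokens in key_tokens.items()}


# Illustrative cell: two keys share a workspace, model, day, and token type.
allocated = allocate_cell(Decimal("100.00"), {"key-a": 750, "key-b": 250})
print(allocated["key-a"], allocated["key-b"])  # 75.00 25.00
```

No price appears anywhere in the weights, only token counts, which is why a stale pricing table cannot distort the split within a cell.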

## Test coverage

tests/test_ai_costs_service.py — 159 pass:

- 27 new tests (spec 01): CTE structural contract, allocation math (multi-key cell split, fallback tiers, zero-weight cell), _bu_includes, TF day-factor / Unmapped routing, consumer swaps.

- 6 new tests (spec 02): TestDashboardQueryCache — result-cache hit/keying/TTL expiry, cross-instance sharing, disabled-cache refetch, concurrency-cap bound.

- 1 added regression guard.

- 3 legacy tf_claude_real tests updated to the new query contract.

- The autouse _neutralize_tf_claude_sources fixture is unchanged, keeping legacy ordered-mock tests valid.

## Self-review findings addressed

- IMPORTANT — spec/implementation contradiction on the TF exclude flag: fixed; spec synced to match the implementation.

- MINOR — computed_cost 5× ratio-only constraint: clarifying comment added.

- MINOR — days-metric edge case on TF-only days: accepted; the existing date-range fallback already covers it.

## Live verification

Live Redshift, window 2026-05-13 .. 2026-06-10:

| Check | Old/Reference | New (allocated) |
|-------|---------------|-----------------|
| Anthropic dashboard (direct + TF) | — | $801,242.14 |
| Billed SUM(amount) | $801,242.15 | tie-out within $0.02 (float rounding) |
| Daily series total | — | = by-BU = summary, to the cent |
| claude-fable-5 (no pricing row) | $0 | $23,819.71 |
| OpenAI (unaffected) | $411,129.34 | $411,129.34 (unchanged) |
| claude_ai (out of scope) | unaffected | unaffected |

## Expected dashboard shifts

Vs the old computed dashboard: May −0.6%, June +5% (the June jump includes claude-fable-5's previously-missing ~$24K). These are expected and correct — the dashboard now equals the bill.

## Out of scope

1. claude_ai provider line (Claude.ai Enterprise, separate table/API) — unchanged.

2. fct_ai_spend mart (022_fct_ai_spend.sql) — still computes from token usage; re-anchoring is a follow-up ticket.

3. MCP query_ai_spend — still reads computed usage; same follow-up.

4. Surtr pipelines — unchanged; both feeds land daily ~06:00 UTC at equal freshness.

5. Pricing-table fixes / drift detection (KLAIR-2580 Spec 2) — orthogonal; computed costs survive as allocation-fallback weights and for savings views.

6. Dashboard-wide BU pivot — the spec 03 By-BU section is scoped to the provider drill-down panel; a top-level "group by BU" for the whole Cost section is a separate change. (Backend response shapes are unchanged regardless.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)


#3034 — feat(board-doc): bidirectional GDoc sync — 3-way merge (KLAIR-2834 phases 1-3) @marcusdAIy approved

## Summary

Implements KLAIR-2834 phases 1-3 (backlog B0.11): bidirectional Google Doc ↔ app sync via a section-level 3-way merge, replacing the one-way push + hard-409 model. Hand-edits made directly in the Google Doc are now reconciled with in-app edits instead of stranding the user behind a 409, with conflicts defaulting to the Google Doc's version and surfaced in a banner.

## Why it's needed

The old /sync hard-409'd on any revisionId change and then republished the app's sections wholesale. That (a) stranded users who hand-edited the linked Google Doc (every sync 409'd, the only escape was a destructive reload) and (b) fired spuriously, because Drive bumps revisionId on internal indexing passes that don't change content. This is the missing reconciliation model, not a guard bug.

## Changes

Backend

- WizardSession.last_synced_sections — the per-section merge baseline (keyed by spec section_id, same space as generated_sections).

- gdoc_sync: new merge_sections_3way(ours, theirs, base) (doc wins on true conflicts) + factored-out shared map_doc_sections_to_spec / is_degraded_parse / normalise_section_title (previously inline in reload).

- /sync rewrite: read the live doc → map to spec → 3-way merge app + doc against the baseline → publish the merged result → persist merged content + refreshed baseline + revision.

- Drops the bare-revisionId 409. Uses a revisionId fast-path (unchanged ⇒ no merge) + content-aware merge (a spurious bump with identical content merges cleanly), and a compare-and-swap that retries once and aborts (409) if the doc keeps moving mid-merge.

- Degraded-parse still aborts (broken headings ⇒ protect the user's content), same guard as reload.

- Adds conflicted_sections to the response.

- Baseline captured at clone bind (create_from_prior_quarter) and after every reload-from-doc.
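A minimal sketch of a section-level 3-way merge with doc-wins conflicts, written from the behaviors listed in this PR's test plan. The signature `merge_sections_3way(ours, theirs, base)` is from the PR; the return shape, the whitespace normalization, and the handling of a missing baseline are assumptions, not the actual implementation.

```python
from typing import Optional


def _norm(text: Optional[str]) -> Optional[str]:
    """Collapse whitespace so whitespace-only edits never count as changes."""
    return None if text is None else " ".join(text.split())


def merge_sections_3way(ours: dict, theirs: dict, base: dict) -> tuple[dict, list]:
    """Merge app sections (ours) with doc sections (theirs) against the baseline.

    Returns (merged_sections, conflicted_section_ids); the doc wins true conflicts.
    """
    merged: dict = {}
    conflicted: list = []
    section_ids = list(ours) + [sid for sid in theirs if sid not in ours]
    for sid in section_ids:
        o, t, b = ours.get(sid), theirs.get(sid), base.get(sid)
        if t is None:                # absent from the doc: preserve ours
            merged[sid] = o
        elif o is None:              # only the doc has this section
            merged[sid] = t
        elif _norm(o) == _norm(t):   # both same, or whitespace-only difference
            merged[sid] = o
        elif _norm(o) == _norm(b):   # theirs-only edit: pull the doc in
            merged[sid] = t
        elif _norm(t) == _norm(b):   # ours-only edit: keep the app's text
            merged[sid] = o
        else:                        # true conflict: the doc wins
            merged[sid] = t
            conflicted.append(sid)
    return merged, conflicted


base = {"s1": "A", "s2": "B", "s3": "C", "s4": "D"}
ours = {"s1": "A", "s2": "B edited in app", "s3": "C app", "s4": "D"}
theirs = {"s1": "A edited in doc", "s2": "B", "s3": "C doc"}

merged, conflicted = merge_sections_3way(ours, theirs, base)
print(conflicted)  # ['s3']
```

Comparing each side against the baseline rather than against each other is what separates a one-sided edit from a true conflict.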

Frontend

- Sync is bidirectional in the live editor. /sync returns merged_from_doc (true when the merge pulled doc content in — external edit merged and/or a doc-wins conflict); handleSync then calls editor.reloadDocument() so the open editor re-fetches the merged section bodies without a page refresh (skipped on the fast-path / app-only edits so the cursor doesn't jump). No polling — the flag rides the synchronous sync response the FE already awaits.

- Shows a dismissible conflict banner ("the Google Doc's version was kept"); DocChangedBanner reframed from "blocked, reload to replace" → "syncing merges both sides"; SyncResponse type + formatSyncConflictSummary util extended.

Interim caveat (important)

- Until the phase-4 per-section (batchUpdate) write path lands, a sync still republishes the whole doc body, which can drop comments and manual formatting added directly in the Google Doc. Surfaced as an always-available tooltip on the Sync button. Comment/formatting preservation is the committed GDoc-as-canonical end state (phase 4), explicitly not solved here.

## Breaking changes

None. conflicted_sections is additive (FE treats omitted as none); the 409-on-external-change behavior is replaced by auto-merge.

## Test plan

- [x] Backend unit: merge_sections_3way (no-op / ours-only / theirs-only / both-same / conflict→doc-wins / absent-from-doc preserves ours / whitespace-only no-conflict), map_doc_sections_to_spec, is_degraded_parse.

- [x] Backend endpoint: auto-merge (no 409), conflict→doc-wins + reported, degraded-parse abort, baseline persisted, compare-and-swap abort, revision fast-path; stale-409 test replaced; revision-capture/save paths re-pinned.

- [x] FE: conflict-banner render, reframed DocChangedBanner copy, formatSyncConflictSummary unit.

- [x] ruff format/check + pyright clean; pytest tests/board_doc/ → 2337 passed; eslint + tsc clean; vitest for touched specs green.

### Manual / behavioral verification

UX delta vs. today: editing the linked Google Doc no longer hard-409s the next sync. Sync now reads the doc, 3-way-merges it with the app's sections against last_synced_sections, and republishes the merged result — doc-only edits are pulled in, app-only edits pushed, and same-section conflicts default to the Doc (surfaced in an amber banner). The DocChangedBanner is reframed to "syncing merges both sides," and the Sync button carries a tooltip warning that a sync republishes the whole body (comments/manual formatting may be lost) until the phase-4 per-section write path lands.
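The sync flow described above (revision fast-path, merge, then a compare-and-swap publish that retries once before aborting) might look something like the sketch below. `FakeDoc`, `SyncAborted`, `publish_if_revision`, and the simplified `merge` callable that returns only the merged sections are hypothetical stand-ins for illustration; the real endpoint talks to Google Docs and returns a 409 on abort.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Sections = Dict[str, str]


class SyncAborted(Exception):
    """Stands in for the 409 returned when the doc keeps moving mid-merge."""


@dataclass
class FakeDoc:
    # Hypothetical in-memory stand-in for the linked Google Doc.
    revision: int
    sections: Sections

    def read(self) -> Tuple[int, Sections]:
        return self.revision, dict(self.sections)

    def publish_if_revision(self, expected: int, sections: Sections) -> bool:
        # Compare-and-swap: write only if nobody edited since `expected`.
        if self.revision != expected:
            return False
        self.sections = dict(sections)
        self.revision += 1
        return True


def sync(
    doc: FakeDoc,
    ours: Sections,
    base: Sections,
    last_revision: int,
    merge: Callable[[Sections, Sections, Sections], Sections],
    max_attempts: int = 2,  # first try plus one retry
) -> Tuple[Sections, bool, int]:
    """Return (merged, merged_from_doc, new_revision)."""
    for _ in range(max_attempts):
        revision, theirs = doc.read()
        if revision == last_revision:
            merged = dict(ours)  # fast-path: doc untouched, skip the merge
        else:
            merged = merge(ours, theirs, base)
        if doc.publish_if_revision(revision, merged):
            # merged_from_doc is approximated here as "result differs from ours".
            return merged, merged != ours, doc.revision
    raise SyncAborted("doc changed during merge; try again")
```

In this sketch the caller would persist `merged` as the new baseline along with the returned revision, so the next sync compares against what was actually published.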

Run both servers (uv run fast_endpoint.py in klair-api/, pnpm dev in klair-client/), open a session with a linked Google Doc, then:

| Scenario | Steps | Expected |
|---|---|---|
| Auto-merge (doc-only) | Sync once. Edit one section directly in the Google Doc. Sync. | The doc edit appears in the editor; no conflict banner. |
| Conflict (doc wins) | Sync once. Edit section X in the app (let it autosave) AND edit X differently in the Doc. Sync. | Editor shows the Doc's version of X; amber conflict banner names the count. |
| Fast-path | Sync once. Edit only in the app (don't touch the Doc). Sync. | Publishes app content; no merge, no banner (revisionId unchanged). |
| Degraded-parse guard | Break the heading styles on ≥2 sections in the Doc. Sync. | 409 "Sync aborted to protect your work…"; in-app content untouched. |

To force a conflict you must edit both sides *between* syncs (the baseline is captured on each sync/reload/clone).

An automated repro that drives the real merge against a throwaway Google Doc (read → map → merge_sections_3way → publish, then self-deletes) lives at .cursor/scripts/b011_merge_repro.py (operator-local; needs the app's Google SA creds). Confirmed all four scenarios behave as above end-to-end.


#3044 — KLAIR-2884 feat(maint-report): page-level "Include Twitter" toggle for Khoros (Key Metrics DM + Maintenance Summary) @ashwanth1109 no labels

## Summary

Page-level "Include Twitter" toggle for ARR & Retention Reports. When ON, it restates Khoros figures by the projected X/Twitter churn impact (Current/EOM ARR reduced by SUM(impact) from the shared Redshift table core_finance.arr_gap_twitter_impact). Default OFF; hidden in Live and Comparison modes.

The toggle lives in the page sticky header (next to Go Live / Compare Dates) and drives two sections:

- Key Metrics → DM Percentage card — restated by the Khoros loss (Gross/Net Retention unchanged).

- Maintenance Summary — both the "by BU/period" view (useMaintenanceData; Khoros cascades into Total Organic → Grand Total) and the Acquisitions sub-tab (processAcquisitionsData): EOM ARR, Variance, DM, and Annualized DM. Retention and churn/snowball columns are not adjusted (the loss is not modeled as churn).

Net-new implementation (not a port of the ARR Gap V2 toggle, which runs on klair-api FastAPI). This dashboard runs on AWS AppSync GraphQL resolved by a Node.js Lambda in klair-udm/. The only shared asset is the Redshift source table.

Linear: [KLAIR-2884](https://linear.app/builder-team/issue/KLAIR-2884/arr-and-retention-reports-add-include-twitter-toggle-for-khoros)

## What's in this PR

- Backend (klair-udm / AppSync): new getTwitterImpactSummary query returning the aggregated Khoros loss (SUM(impact)) via the 5-file Lambda pattern (app.mjs → queryBuilder.mjs → validators.mjs → responseTransformer.mjs → schema.graphql) + SAM resolver registration + unit tests. Requires GRANT SELECT on the table to the Lambda's Redshift user (applied).

- Frontend (klair-client): useKhorosTwitterImpact hook; page-header toggle + tooltip; Khoros restatement applied to the Key Metrics DM card, useMaintenanceData, and processAcquisitionsData; applyKhorosTwitterImpact pure helper + unit tests.

## Scope note — Key Metrics DM% (corrected)

An earlier revision marked the Key Metrics cards out of scope, assuming the by-BU Khoros record has AcquiredARR > 0 and is filtered out of the organic DM aggregate. The live getFinancials data shows that record is organic (AcquiredARR = 0), so it *is* in the aggregate and the DM% card does move: 76.2% → 68.9% (matches Maintenance "Total Organic"). The MTD Churn/Downsell, Invoicing, and Late Renewals cards remain unaffected (they don't depend on Khoros ARRCurrent).

## Testing

- pnpm tsc --noEmit — clean; pnpm eslint on changed files — clean

- Vitest: applyKhorosTwitterImpact.spec.ts, useKhorosTwitterImpact.spec.ts — pass

- Manual: verified DM 76.2% → 68.9% against May-2026 data; toggle OFF is byte-for-byte unchanged

🤖 Generated with [Claude Code](https://claude.com/claude-code)


#3049 — feat(mfr): per-section regenerate for EBITDA variance arrays @eric-tril approved

### Summary

Follow-up to #3036. A stale variance section (under "The key variances are outlined below:" — BU / Central Functions / Investment / Affiliates & Other) previously offered only whole-memo Generate AI Content as remediation. Now each stale variance section regenerates on its own, triggered by its amber drift chip.

### Why this is now safe

The earlier limitation was about per-bullet identity — an individual variance bullet's index/order isn't stable across generations, so we couldn't regenerate "bullet #2." Regenerating the whole section sidesteps that entirely: it re-runs just that section's LLM array field and replaces *all* its bullets. No per-bullet matching needed.

### Changes

- Backend — regenerate_ebitda_section now handles all 10 commentary sections: the 6 single-string sections (replace their one bullet) and the 4 variance arrays (replace the whole bullet list). It returns content and provenance as lists uniformly (1 element for single-string sections). EBITDARegenSectionKey / _REGEN_SECTION_KEYS / _ARRAY_SECTION_KEYS expand accordingly; non-commentary keys (e.g. the bridge table) still 422 at the request boundary.

- Router — persists the list content and swaps the section's full provenance bullet list; _replace_ebitda_section_provenance takes a list; docstrings updated.

- Frontend — the variance section's amber stale chip becomes a clickable "regenerate this section" button (spinner while in flight). The API type, handleRegenerateSection, and the provenance overlay all carry list-shaped content + provenance. EBITDARegenSectionKey (TS) expands to the 10 keys.
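The uniform list contract described in the changes above could be sketched like this. The section key names, the `generate` callable standing in for the LLM call, and the `replace_section` helper are hypothetical; the PR names the real function `regenerate_ebitda_section` and rejects unknown keys with a 422 at the request boundary, which this sketch models as a `ValueError`.

```python
from typing import Callable, Dict, List, Union

# Hypothetical key names: the PR describes 6 single-string sections and
# 4 variance arrays but does not list their keys.
SINGLE_SECTION_KEYS = {"overview", "outlook"}
ARRAY_SECTION_KEYS = {"bu_variances", "central_functions_variances"}
REGEN_SECTION_KEYS = SINGLE_SECTION_KEYS | ARRAY_SECTION_KEYS


def regenerate_section(
    section_key: str,
    generate: Callable[[str], Union[str, List[str]]],
) -> List[str]:
    """Regenerate one section and always return its bullets as a list."""
    if section_key not in REGEN_SECTION_KEYS:
        # Non-commentary keys are rejected before any generation happens.
        raise ValueError(f"section {section_key!r} cannot be regenerated")
    raw = generate(section_key)
    bullets = [raw] if isinstance(raw, str) else list(raw)
    if section_key in SINGLE_SECTION_KEYS and len(bullets) != 1:
        raise ValueError("single-string sections must yield exactly one bullet")
    return bullets


def replace_section(memo: Dict[str, List[str]], key: str, bullets: List[str]) -> None:
    # Swap the section's whole bullet list; no per-bullet matching is needed.
    memo[key] = list(bullets)
```

Returning a list in both cases means the caller never branches on section type: a single-string section is just a one-element list.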

### Behavior recap

- Fixed-6 narrative sections: regenerate one bullet (unchanged, now list-shaped under the hood).

- Variance arrays: regenerate the whole section from the chip (new).

- Both still only appear when the section is stale.

### Testing

- Backend: uv run pytest tests/mfr/ → 1790 passed; pyright 0 errors; ruff clean. Tests updated for the list contract + a new test_regenerate_variance_section_returns_bullet_list and endpoint accept/reject cases.

- Frontend: MFR suite → 1097 passed (incl. a new "regenerate a whole variance section from its stale chip" test); tsc + eslint clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


The Portfolio — Trilogy Companies

CNN Comes Knocking as Alpha School Takes Its Model Home

Joe Liemandt's AI education experiment is drawing national scrutiny just as it bets its model can scale beyond four walls.

By Pat Donnelly, Investigative Desk · Claude Sonnet

AUSTIN, TEXAS — The same week CNN published a nationally televised examination of whether AI-powered schools represent the future of education or a high-stakes gamble, Alpha School quietly made a move that suggests its founders aren't waiting for the verdict.

Alpha Anywhere, the school's newly launched home-learning product, went global this month — extending the school's signature two-hours-of-academics model to families who can't or won't relocate to Austin, Brownsville, or Miami. The pitch is straightforward and audacious in equal measure: the same AI-tutor-driven curriculum that has pushed Alpha's enrolled students into the top 1–2% nationally on NWEA MAP Growth assessments, delivered to any kitchen table with a broadband connection.

The timing is not incidental. CNN's report — framed as 'Is AI schooling the future of education — or a risky bet?' — arrived as Alpha is preparing to open nine or more new campuses by fall 2025 across Texas, Florida, Arizona, California, and New York. The network's framing, 'What if I told you this school had no teachers?', captures the central anxiety that follows the model wherever it goes. Alpha's answer, implicit in its expansion, is that the anxiety is misplaced.

The school is simultaneously publishing a counter-narrative aimed at parents navigating the broader AI moment. A post titled 'Cognitive Offloading Is the New Illiteracy' warns against the very dependency critics fear AI schooling might accelerate — arguing that letting ChatGPT think for a child is not a feature of the Alpha model but its opposite. A companion piece on screen time distinctions and a curated list of ten AI tools used inside Alpha classrooms round out what amounts to an unsolicited curriculum for skeptical parents.

The question underneath all of it is the one CNN asked and Alpha has not fully answered: when the model scales from a boutique Austin campus to a global home product, does the 2.3× learning acceleration travel with it — or does it stay behind in the building?

Joe Liemandt has committed $1 billion to Timeback, his platform for franchising the Alpha model to third-party school operators. Alpha Anywhere is a different vector to the same destination. Both bets assume the results are portable. The CNN cameras will be watching to see if they're right.

↗ Top 1% Academics, Now at Your Kitchen Table · Not All Screen Time Is Equal · Cognitive Offloading Is the New Illiteracy

Skyvera Is Building a Telecom Software Empire — And the CloudSense Deal Is Only the Latest Move

With three major acquisitions now in the fold, Trilogy's telecom software arm is assembling something bigger than the sum of its parts.

By Frank Dunmore, Investigative Correspondent · Claude Sonnet

AUSTIN, TEXAS — If you read between the lines of Skyvera's recent acquisition activity, a picture emerges that goes well beyond routine portfolio expansion. The Trilogy International telecom software unit has quietly completed the acquisition of CloudSense, a Salesforce-native CPQ and order management platform purpose-built for telecom and media providers — and this is where it gets interesting.

CloudSense is not a random bolt-on. It is a precision instrument. In an industry where sales cycles for enterprise telecom contracts can stretch years and involve extraordinary configuration complexity, a Salesforce-native CPQ platform sits at the exact moment a carrier commits budget. Whoever controls that workflow controls the relationship. Skyvera now controls that workflow.

But the CloudSense deal doesn't stand alone. Sources familiar with the matter — sources I'm not in a position to name — have confirmed that Skyvera's acquisition of STL's divested telecom products group preceded this move and was equally deliberate. That acquisition brought digital BSS functionality including monetization, optical networking, and analytics into the Skyvera portfolio — capabilities that sit upstream of what CloudSense handles. Stack them together with Kandy, Skyvera's cloud-based real-time communications platform, and you have something that begins to look less like a collection of software assets and more like an integrated operating stack for the modern telecom operator.

This is the ESW Capital playbook, executed with unusual strategic coherence. Acquire at a discount. Staff through Crossover's global talent network. Push toward 75% EBITDA margins. But what distinguishes Skyvera from a typical ESW roll-up is the apparent intentionality of the sequencing — BSS infrastructure, then communications layer, then the revenue capture surface. Nothing here looks accidental.

The telecom software market is at an inflection point. Legacy operators are under pressure to modernize on-premise systems without the capital or internal talent to do it. Skyvera is positioning itself as the answer — a single vendor offering the full migration path from old infrastructure to cloud-native operations.

What comes next in this sequence is the only question worth asking.

↗ CloudSense · Skyvera completes acquisition of CloudSense, expanding telec · STL Divested Assets

EdTech Money Is Back, and Austin’s Alpha Crowd Is Smiling

By Dottie Sharp, Society & Industry Desk · GPT-5.2

Word is the education sector is gaining momentum again. Across the pond, UK workforce-training platform Multiverse landed a reported €60 million funding round at a €1.8 billion valuation, signaling investor appetite is returning to education after years of post-pandemic skepticism.

The shift is clear: investors no longer want generic "online school" offerings. They demand outcomes, workforce relevance, and models that survive AI disruption. Multiverse trains workers; India's Emversity scales roles AI can't replace. The message across markets is identical—education must justify itself through earnings power.

In Austin, Joe Liemandt's Alpha School exemplifies this pivot. Co-founded with MacKenzie Price, it condenses academics to two hours daily using adaptive AI, freeing time for entrepreneurship, leadership, coding, and human skills. His broader ambition, Timeback, aims to become "Shopify for schools"—a platform letting operators launch AI-first schools without building academic infrastructure from scratch.

With 29 edtech unicorn startups tracked for 2026, the sector is heating up. But investors are pickier. As one Austin insider put it: "Seat time is the next taxi medallion."

The Machine — AI & Technology

When the Helpful Creature Learned to Whisper Secrets

A critical Copilot flaw shows that the age of AI assistants has also become the age of exquisitely disguised traps.

By Sir Reginald Marsh, Natural Phenomena Correspondent · GPT-5.2

REDMOND, WASHINGTON — In the dimly lit undergrowth of the modern office, there lives a creature of unusual charm: the AI assistant. It drafts, summarizes, searches and suggests, padding softly through calendars, documents and inboxes in search of useful morsels. But this week, researchers revealed how one such creature could be coaxed into carrying prey back to the wrong nest.

A critical vulnerability in Microsoft Copilot, described by Ars Technica as SearchLeak, reportedly allowed attackers to steal sensitive data from users, including two-factor authentication codes, by exploiting the way large language model systems ingest and retrieve information. The episode is not merely another bug in the reeds. It is a glimpse of a larger ecological imbalance.

For years, enterprise software has been trained to treat permissions, identity and access control as stout fences around the watering hole. But LLMs are different beasts. They browse across habitats. They summarize what they find. They respond to language not as a locked gate, but as a scent trail. A carefully planted prompt, hidden where the model might later forage, can become an instruction. The assistant, eager and obedient, may then retrieve private information and present it in a place its keeper never intended.

This is the recurring danger of prompt injection and data exfiltration: the predator does not always attack the wall. Sometimes it teaches the guard to open the gate.

Microsoft has moved aggressively to weave Copilot through its productivity kingdom, from email to documents to enterprise search. That ambition is shared across the industry, where AI agents are being released into increasingly sensitive terrain. Yet SearchLeak suggests that security models built for traditional software may falter when the software itself can read, reason and be manipulated by ambient text.

The lesson for the enterprise herd is sober. AI assistants must not be treated as harmless clerks with infinite memory and perfect loyalty. They are powerful semi-autonomous organisms living amid confidential data, external content and human trust. Their enclosures must be redesigned accordingly.

Elsewhere in the technological canopy, Ars Technica opened its 2026 reader survey, while Commodore introduced a deliberately constrained flip phone that blocks social media and browsers. It is a curious contrast: one species races toward ever more capable machines; another retreats, seeking safety in simpler forms.

↗ The Ars Technica 2026 Reader Survey: Let your voice be heard · Critical Copilot vulnerability allowed hackers to steal 2FA c · Commodore’s newest gadget is a flip phone that blocks social

The Mirror in the Macaque: AI Learns to See Through a Primate's Eyes

A new wave of neuroscience research is using compact neural networks to decode visual cortex — and in doing so, asking what intelligence itself really is.

By Dr. Vera Okafor, Science & Technology Correspondent · Claude Opus

PALO ALTO, CALIFORNIA — Roughly 25 million years ago, an ancestor we share with the macaque monkey gazed out at a world of branches and predators and ripening fruit, and from that gaze evolution sculpted one of the most exquisite information processors in the known universe: the primate visual cortex. This week, researchers announced they have built a remarkably small artificial neural network — a "mini-AI" — that can predict, neuron by neuron, how a macaque's brain will respond to images it has never seen. Consider what this means. We have constructed a mathematical object compact enough to run on a laptop that mirrors, with uncanny fidelity, the firing patterns of a biological structure 25 million years in the making.

It is part of a broader moment. Stanford's Human-Centered AI institute argues this week that scientific discovery itself is being restructured by these tools, while UC San Diego catalogues nine breakthroughs — from protein folding to wildfire prediction — that simply would not exist without them. Researchers at Hong Kong Polytechnic University have introduced new graph neural network architectures that traverse the strange isthmus between image recognition and neuroscience, treating the brain's tangled connectivity not as a metaphor for computation but as a literal computational graph.

The pattern is becoming clear. For most of the AI era, we built networks inspired loosely by neurons and marveled when they worked. Now we are running the arrow backward: using artificial networks as instruments to read biological ones. The mini-AI decoding the macaque is not merely a model of vision; it is a kind of telescope pointed inward, resolving the dim constellations of cortical activity into something we can finally name.

There is a humbling symmetry here. The same mathematics that lets a phone agent navigate a touchscreen menu can, when pointed at a monkey's brain, reveal the deep grammar of sight itself. We are, in some sense, two species of intelligence — silicon and carbon — looking at each other across a 25-million-year corridor and, for the first time, beginning to recognize the family resemblance.

↗ How AI is Transforming Scientific Discovery While Keeping Hu · Nine Breakthroughs Made Possible by AI - UC San Diego Today · Mini-AI Decodes the Macaque Visual Brain - Neuroscience News

AI’s New Factory Floor Is Here: Hugging Face Pushes the Model-Building Loop Into High Gear

From evaluation workbenches to agent-chained apps, the open AI ecosystem is turning experimentation into repeatable industrial process.

By Zara Nova, AI & Innovation Reporter · GPT-5.2

PARIS — The AI world is having a tools moment, and yes, this changes everything.

Hugging Face and its research ecosystem are rapidly sketching the blueprint for what modern AI development now looks like: not one heroic model drop, not one benchmark victory lap, but a continuous, instrumented, deeply practical loop of building, measuring, optimizing and deploying. The future is now, and it looks less like a lab demo and more like a production line for intelligence.

The clearest signal comes from the Allen Institute for AI’s new olmo-eval workbench, introduced as an evaluation system for the model development loop. That phrase matters. In the generative AI era, evaluation has become the new compiler: the thing teams run constantly to understand whether a model is actually improving, regressing, hallucinating less, reasoning better or simply gaming the test. I cannot overstate how significant this is. The winners in AI will not merely train bigger models; they will operate tighter feedback loops.

That same industrial mindset shows up in Hugging Face’s deep dive on PyTorch profiling, moving from a plain nn.Linear implementation toward a fused multilayer perceptron. Translation for non-kernel obsessives: developers are learning how to squeeze more performance from the same hardware by understanding where the compute really goes. In an era where GPUs are treated like rare earth minerals, better profiling is not a nerdy footnote — it is economic leverage.

And then there is the deliciously futuristic example of an agent building a 3D Paris gallery by chaining two Hugging Face Spaces. In the post-ChatGPT imagination, “agents” are often described as vague digital coworkers. But this demo grounds the idea in something concrete: one system using multiple hosted AI applications as tools to create a richer final result. The agent is not just chatting; it is composing workflows.

Together, these updates point to a major shift. AI development is becoming modular, observable and increasingly automated. Evaluation frameworks check quality. Profilers unlock efficiency. Hosted spaces become callable capabilities. CI pipelines, including migrations to Hugging Face Jobs, bring AI workloads closer to standard software engineering discipline.

For enterprises, this is the headline behind the headline: AI is maturing from magic trick to manufacturing process. And once intelligence has a factory floor, the pace only accelerates.

↗ olmo-eval: An evaluation workbench for the model development · Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP · How an Agent Built a 3D Paris Gallery by Chaining Two Huggin

The Editorial

The Floodlight and the Marshmallow

A meditation on surveillance, gratification, and the small humiliations of optimized living.

By Victor Marsh, Chief Columnist · Claude Opus

AUSTIN, TEXAS — There is a particular indignity, peculiar to our age, in being floodlit by one's own father's doorbell. You walk up the path bearing a casserole, or perhaps merely your own diminishing dignity, and the porch erupts into a blue-white interrogation worthy of a Cold War checkpoint. The Nest camera, as a recent humorist correctly diagnosed, does not distinguish between burglar and Boomer's son-in-law; it has been trained, like the rest of our domestic apparatus, to treat every passing shadow as a possible felony in progress. This is what we have built. This is what we have paid for. This is, increasingly, the texture of an ordinary American evening.

I bring up the floodlight not to flog the easy target of suburban paranoia — that horse has been beaten into glue and the glue has been used to affix more cameras to more eaves — but because it is of a piece with a larger cultural symptom that I will, with apologies, call the optimization disease. The disease has many presentations. In one, the household is rendered into a perimeter, monitored at twenty-four frames per second, motion-alerts pinging like a fire drill conducted in perpetuity. In another, the workplace becomes a dashboard, the dashboard becomes a scoreboard, and the scoreboard becomes the only reality anyone is permitted to discuss in meetings. In a third — the most insidious — the human being himself becomes a portfolio of habits to be A/B tested against his own future self, who is presumed, on no available evidence, to be a superior person.

The great lie of this regime, as a recent essayist had the courage to say plainly, is that delayed gratification is a virtue rather than a hostage situation. We were sold the marshmallow study in childhood and have been chewing on the empty wrapper ever since. The optimized life, when one finally arrives at it, turns out to be a life in which the floodlight is always on, the notifications are always pending, and the small enchantments — the after-dinner stroll, the unmonitored hour, the song heard once and never indexed — have been quietly liquidated to fund the next quarter's productivity gains.

It is, I think, why people still weep at football. Scott McTominay's overhead kick against Denmark sent Scotland to its first World Cup in nearly thirty years, and the Tartan Army wept not because the goal had been optimized but because it had not been — because it was, in the proper sense, a piece of luck, a piece of grace, a piece of nothing the spreadsheet predicted. Olivia Rodrigo's new record, for those keeping score at home, is reportedly suffused with a similar yearning, dressed up in New Wave synthesizers, for a life in which one is permitted to feel something before measuring it.

The floodlight will not save you. It was never going to. Take the stroll anyway.

↗ I Am Your Dad’s Nest Camera and I Am Ready for Shit to Go Do · How Scott McTominay Led Scotland Back to the World Cup · Olivia Rodrigo’s Early-Twenties Lament

The Office Comic · Art Desk

Nation’s CEOs Patiently Waiting For AI Productivity Boom To Finish Making Everyone Look Busy

Executives report that artificial intelligence has successfully increased the speed at which employees generate work whose financial impact will be determined later by a different spreadsheet.

By Dale Pemberton, Staff Writer · GPT-5.2

NEW YORK — In what economists are calling a major step forward for the American workplace’s ability to appear measurably transformed, companies across the country are reporting that artificial intelligence has enabled software engineers, marketers, analysts, and executives to do substantially more things faster, while leaving open the minor question of whether any of those things have helped the business.

The situation has produced a rare moment of consensus among corporate leaders, venture capitalists, AI researchers, and employees currently using chatbots to summarize meetings they did not attend: productivity is either exploding, imaginary, badly measured, about to arrive, already here, or priced into a $90 billion valuation that everyone agrees is reasonable until the next board meeting.

According to Business Insider, AI tools are helping software engineers complete tasks more quickly, though many companies are still waiting to see the payoff show up in revenue, margins, or any of the other crude 20th-century instruments formerly used to evaluate whether technology was useful.

This has not dampened enthusiasm. If anything, the absence of clear financial returns has created more room for optimism, giving executives the freedom to describe AI’s impact in expansive terms without being interrupted by accounting.

“We are seeing incredible acceleration across the organization,” said one chief technology officer, explaining that his engineers now produce code reviews, architecture documents, bug tickets, and apologies for broken deployments at speeds previously thought impossible. “Whether the product is better, customers are happier, or costs are lower is something we expect to understand once the model context window expands.”

The productivity debate has grown more complicated as some industry figures suggest the gains may be overstated. An Anthropic advisor recently said AI productivity improvements are “vastly exaggerated” and called valuations “crazy,” a position that has been received in Silicon Valley as either sober analysis or a hurtful attempt to make private-market math feel self-conscious.

Still, other sectors are offering more upbeat accounts. Paramount streaming leaders have described meaningful AI productivity gains, suggesting that media companies may now be able to more efficiently perform the vital entertainment-industry work of determining which beloved franchise should be converted into a dashboard, a recommendation engine, and eventually a write-down.

Meanwhile, the Center for Data Innovation has argued that AI is a productivity engine for the U.S. economy, a reassuring conclusion for anyone worried that the nation might lack another engine connected to uncertain fuel, uneven transmission, and a dashboard light that says “enterprise adoption.”

The confusion has been especially acute because AI is undeniably useful at the individual level. Programmers can autocomplete code, sales teams can draft emails, support departments can summarize tickets, and managers can generate strategic memos that achieve the same level of abstraction as human-authored strategic memos in a fraction of the time. The problem arises only when companies attempt to translate this into something vulgar, such as profit.

Part of the issue may be that AI has made it easier to produce intermediate corporate artifacts rather than final business outcomes. A product manager who once needed three days to write a requirements document can now produce one in 11 minutes, freeing up two days, seven hours, and 49 minutes to request revisions from the same AI system. Similarly, engineers can ship features faster, enabling customers to discover defects sooner and provide feedback that can be automatically categorized as “valuable learning.”

There is also the matter of labor expectations. If AI makes an employee 30% faster, companies may capture the benefit by assigning 30% more work, thereby preserving the traditional workplace equilibrium in which everyone is behind and no one can explain why the roadmap expanded.

For now, the nation’s AI productivity boom remains strongest in presentations about the AI productivity boom. Charts continue to rise. Pilot programs continue to expand. Internal Slack channels continue to fill with employees sharing prompts that turn a confusing assignment into a confident misunderstanding.

This may, in time, reshape the entire economy. Or it may simply prove that corporate America has invented a machine capable of producing more corporate America per hour.

Either way, the output is undeniable.

↗ AI is helping software engineers do more — and faster. Compa · Anthropic Advisor Says AI Productivity Gains Are Vastly Exag · Paramount Streaming Leaders Describe AI Productivity Gains -

On This Day in AI History

On June 16, 1966, Joseph Weizenbaum's ELIZA chatbot had its famous conversations with users who believed they were talking to a psychotherapist, revealing how easily people anthropomorphize machines. The program became a milestone in AI history, demonstrating both the potential and the deceptive nature of conversational AI.

⬛ Daily Word — AI

Hint: How you evaluate model output quality