<!-- CURSOR_AGENT_PR_BODY_BEGIN -->
## Summary
Adds C3.7, the fifth per-product benchmark check in the C3.x family and the first whose Trilogy benchmark applies to the sum of two `(section, category)` rows rather than to a single row. For each product column, the check sums the `(total, Sales)` and `(total, Marketing)` rows and compares the combined value against the 6.0% Trilogy benchmark.
### Verdict bands
| Verdict | Range (combined Sales + Marketing % of revenue) |
| --------- | ----------------------------------------------- |
| Pass | combined ≤ 6.0% |
| Warning | 6.0% < combined ≤ 11.0% (+5pp band) |
| Critical | combined > 11.0% |
The Trilogy-wide combined Sales + Marketing benchmark is 6.0% of revenue. The 5pp warning band follows the C3.x family's preliminary calibration. It is wider than C3.4's 3pp band because Sales + Marketing is a structurally larger cost bucket whose plausible operating range varies more quarter-over-quarter than tooling spend. Finance review of this band is tracked separately; the band is not tuned within this check.
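
The bands in the table can be sketched as a small classifier. This is only an illustration of the thresholds stated above; the names `BENCHMARK_PCT`, `WARNING_BAND_PP`, and `classify` are hypothetical and are not taken from the module.

```python
BENCHMARK_PCT = 6.0    # Trilogy combined Sales + Marketing benchmark (% of revenue)
WARNING_BAND_PP = 5.0  # preliminary warning band, in percentage points


def classify(combined_pct: float) -> str:
    """Map a combined Sales + Marketing percentage to a verdict label."""
    if combined_pct <= BENCHMARK_PCT:
        return "pass"
    if combined_pct <= BENCHMARK_PCT + WARNING_BAND_PP:
        return "warning"
    return "critical"
```

Under this sketch, `classify(4.0 + 2.0)` and `classify(3.0 + 3.0)` sit on the pass boundary, `classify(7.0 + 4.0)` sits on the warning boundary, and `classify(6.0 + 5.01)` is critical, which lines up with the boundary cases listed under Tests.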
### Schema deviation from C3.3 / C3.4 / C3.5 / C3.6 / C3.8 / C3.9
C3.7 is the first per-product benchmark check whose Trilogy threshold is combined across two rows. Three concrete consequences are called out in the module docstring:
1. `_TARGET_CATEGORIES` is a tuple of category labels (`("Sales", "Marketing")`) instead of the singular `_TARGET_CATEGORY` constant the other per-product benchmarks use. The order is informational only; the check sums commutatively.
2. The check calls `benchmarks_row_for` twice (once per category) and folds the per-column cells together per product.
3. The check fires on partial data. When exactly one of the two rows is present (e.g. Sales but no Marketing for a narrow-sheet BU), the check still evaluates against whatever's parseable rather than silently skipping. `missing_categories` in `supporting_data` records which row was absent so the user sees the partial-data path explicitly.
This deviation is local to C3.7 — the docstring explicitly warns future contributors not to retrofit C3.3 / C3.4 / C3.5 / C3.6 / C3.8 / C3.9 to use a tuple-of-one for "consistency". They each correctly use the singular constant because each one reads exactly one row.
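
A minimal sketch of the per-column fold described above, assuming each product column has already been reduced to a mapping from category label to a parsed percentage (or `None`). `TARGET_CATEGORIES` and `combine_cell` are hypothetical stand-ins, not the module's actual helpers.

```python
from typing import Optional

TARGET_CATEGORIES = ("Sales", "Marketing")  # stand-in for _TARGET_CATEGORIES


def combine_cell(cells: dict) -> Optional[dict]:
    """Fold one product column's per-category cells into a combined value.

    `cells` maps a category label to its parsed percentage. A category is
    omitted (or None) when its row is absent or its cell is blank.
    """
    component_pcts = {cat: cells.get(cat) for cat in TARGET_CATEGORIES}
    present = [v for v in component_pcts.values() if v is not None]
    if not present:
        return None  # nothing parseable for this column
    return {
        "combined_actual_pct": sum(present),
        "component_pcts": component_pcts,
        "missing_categories": [c for c, v in component_pcts.items() if v is None],
    }
```

With both rows present, `combine_cell({"Sales": 5.0, "Marketing": 3.0})` yields a combined 8.0 and an empty `missing_categories`. With only Sales present, `combine_cell({"Sales": 8.0})` still yields 8.0 and records `["Marketing"]` as missing.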
### Skip ladder (three typed-skip conditions, ordered most → least specific)
1. No tab. `plan.has_benchmarks_by_product` is `False` or the table is `None` / empty.
2. Both rows absent. Neither `(total, Sales)` nor `(total, Marketing)` exists, which mirrors the IgniteTech / GFI narrow-sheet shape today. The reason string is distinct ("narrow-sheet variant") so operators can tell this apart from a sheet-author-edit rename.
3. No product columns evaluable. At least one row is present, but every product column has both per-row cells parsing as `None`.
There is no separate skip for "one row absent" — partial-data fires instead, per the design decision documented in the issue.
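
The ladder can be sketched as an ordered series of early returns. This is a simplified illustration: `skip_reason` and its arguments are hypothetical, rows are modelled as plain lists of parsed cells (`None` for an absent row), and only the "narrow-sheet variant" wording comes from this PR; the other reason strings are placeholders.

```python
from typing import Optional


def skip_reason(has_tab: bool,
                sales_row: Optional[list],
                marketing_row: Optional[list]) -> Optional[str]:
    """Return a skip reason, or None when the check should run."""
    if not has_tab:
        return "benchmarks tab not loaded"
    if sales_row is None and marketing_row is None:
        return "both rows absent (narrow-sheet variant)"
    cells = (sales_row or []) + (marketing_row or [])
    if all(cell is None for cell in cells):
        return "every Sales+Marketing cell blank"
    # Falls through for the partial-data case too: one row present, one absent.
    return None
```

Note that a single absent row never produces a skip on its own; it only matters if the remaining row has no parseable cells.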
### New shared BenchmarkCombinedSupport Pydantic model
Added in `_helpers.py` alongside the existing `BenchmarkPerProductSupport` / `BenchmarkAggregateSupport` models. It pins the schema for combined-row `supporting_data` payloads:
- `product`, `is_rollup`: column metadata (same shape as C3.3)
- `combined_actual_pct`: sum of the two per-row cells
- `component_pcts: dict[str, float | None]`: per-row values, with `None` for absent rows / blank cells
- `missing_categories: list[str]`: labels for which the row was absent or the cell was blank
- `benchmark_pct`, `gap_pp`, `warning_band_pp`, `standard_benchmark_pct_in_sheet`: same shape as `BenchmarkPerProductSupport`

The model sets `frozen=True` and `extra="forbid"`, so the `supporting_data` shape is immutable and FE-payload drift fails loudly. The existing `BenchmarkAggregateSupport` is reused as-is for the BU-level all-pass finding; its shape is sufficient.
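
To make the pinned shape and its consistency rules concrete, here is a stdlib-only stand-in. It deliberately uses a frozen `dataclass` rather than Pydantic so it runs without third-party packages, so it is not the actual `BenchmarkCombinedSupport` model. The three checks in `__post_init__` mirror the invariants named in the validator tests listed under Tests; the class name and error messages are hypothetical.

```python
from dataclasses import dataclass

TOLERANCE_PP = 0.01  # rounding tolerance named in the validator tests


@dataclass(frozen=True)
class CombinedSupportSketch:
    product: str
    is_rollup: bool
    combined_actual_pct: float
    component_pcts: dict
    missing_categories: list
    benchmark_pct: float
    gap_pp: float
    warning_band_pp: float
    standard_benchmark_pct_in_sheet: float

    def __post_init__(self) -> None:
        # missing_categories must list exactly the None-valued components.
        none_keys = sorted(k for k, v in self.component_pcts.items() if v is None)
        if none_keys != sorted(self.missing_categories):
            raise ValueError("missing_categories must match None-valued components")
        # combined_actual_pct must equal the sum of the present components.
        total = sum(v for v in self.component_pcts.values() if v is not None)
        if abs(total - self.combined_actual_pct) > TOLERANCE_PP:
            raise ValueError("combined_actual_pct must equal the component sum")
        # gap_pp must equal combined_actual_pct - benchmark_pct.
        expected_gap = self.combined_actual_pct - self.benchmark_pct
        if abs(expected_gap - self.gap_pp) > TOLERANCE_PP:
            raise ValueError("gap_pp must equal combined_actual_pct - benchmark_pct")
```

In this stand-in, assigning to a field after construction raises, and passing an unknown keyword argument raises `TypeError`, which loosely corresponds to what `frozen=True` and `extra="forbid"` provide on the Pydantic model.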
## Why it's needed
C3.7 ships the next per-product benchmark in the Trilogy review scorecard. Sales and Marketing combined spend is the largest customer-acquisition cost bucket on most SaaS products; sustained overage signals either acquisition inefficiency (CAC running ahead of payback) or a product whose growth motion needs a structural rethink. The 6.0% Trilogy benchmark is the combined target — Sales and Marketing are evaluated together because the GM controls the mix between them within a single envelope.
A BU that ships only the Sales row should still be flagged when its Sales spend alone exceeds the combined benchmark, rather than silently passing because Marketing is missing — hence the partial-data behaviour.
## Changes
### C3.7 — Sales & Marketing combined per-product benchmark
- `klair-api/budget_bot/board_doc/review_checks/_helpers.py`: added the `BenchmarkCombinedSupport` Pydantic model and the corresponding `__all__` entry. No other changes: `BenchmarkPerProductSupport`, `BenchmarkAggregateSupport`, `BenchmarkColumn`, `benchmarks_row_for`, `benchmark_product_columns`, and `_parse_cell` are stable and reused as-is.
- `klair-api/budget_bot/board_doc/review_checks/sales_marketing_benchmark.py` (new)
  - Reads the `(section="total", category="Sales")` and `(section="total", category="Marketing")` rows and sums them per product column.
  - Emits one finding per product column whose combined actual exceeds 6.0%.
  - Verdict bands: pass ≤ 6.0%; warning in (6.0%, 11.0%]; critical > 11.0%.
  - Rollup column (col 3, `<BU> Consolidated`) → targets `SectionType.FINANCIALS`; other product columns → `SectionType.PRODUCT_DETAIL` with `product_name` (falls through to doc-level when the spec has no dedicated product section).
  - All-products-pass emits a single BU-level pass finding via `BenchmarkAggregateSupport` so the scorecard reflects that the check ran.
  - Skip ladder distinguishes "tab not loaded" / "both rows absent (narrow-sheet variant)" / "every Sales+Marketing cell blank".
  - Partial-data branch: one row present + one absent → the check fires on whatever's parseable; `missing_categories` records the absent row; the `what` narrative appends a clause naming the absent row so operators see the partial-data path explicitly.
  - Narrative is tailored to Sales & Marketing: `what` names "Sales & Marketing cost"; the `why` paragraph explains combined spend as the largest customer-acquisition cost bucket and that the 6.0% Trilogy benchmark is the *combined* target (the GM controls the mix between the two buckets); remediation options cover rebalancing Sales / Marketing within the envelope, repricing/repackaging to lift the revenue denominator, and defending the gap as a deliberate growth-investment cycle. It is not copy-pasted from C3.3's engineering rationale, because sales/marketing overspend has different remediation paths.
  - Registration via the `@register` decorator (Strategy I; no `__init__.py` edits required).
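
The section-targeting rule above can be sketched as follows. The function name, the slug rule, and the use of `None` for the doc-level fallback are assumptions made for illustration; only the `financials` and `product_detail__mobilogy` identifiers appear in this PR's sample payloads.

```python
from typing import Optional


def resolve_section_id(product: str, is_rollup: bool,
                       spec_section_ids: set) -> Optional[str]:
    """Pick the section a finding targets; None stands for doc-level."""
    if is_rollup:
        return "financials"
    # Hypothetical slug rule: lowercase, spaces to underscores.
    candidate = "product_detail__" + product.lower().replace(" ", "_")
    return candidate if candidate in spec_section_ids else None
```

A rollup column always resolves to the financials section, a known product resolves to its product-detail section, and an unknown product falls back to doc-level.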
### Tests
- `klair-api/tests/board_doc/test_sales_marketing_benchmark.py` (new, 35 tests): mirrors `test_engineering_product_benchmark.py`, with the fixture builder parameterising both the `(total, Sales)` and `(total, Marketing)` rows so each test can independently dial each row. Coverage:
  - 6 verdict band + per-product fan-out tests, including dual-split boundary cases (4% + 2% and 3% + 3% both sum to 6.0% pass, 7% + 4% sums to 11.0% warning boundary, 6% + 5.01% sums to 11.01% critical, 3.9% + 2% sums to 5.9% just-below-pass)
  - 3 section-id resolution tests (rollup → `FINANCIALS`, known product → `PRODUCT_DETAIL`, unknown product → doc-level fallback)
  - 4 skip-ladder tests (no tab, both-rows-absent narrow-sheet variant, every cell blank, partial-blank cells per-column)
  - 2 partial-data firing tests (Sales-only with Marketing row missing; Marketing-only with Sales row missing) that pin both the narrative and the `supporting_data` shape on partial data
  - 1 all-pass aggregate emission test
  - 2 supporting-data shape tests pinning the new `BenchmarkCombinedSupport` payload keys, including `component_pcts` and `missing_categories`
  - 1 registry-wiring smoke test
  - 4 `BenchmarkCombinedSupport` round-trip tests (full payload, partial-data shape with `None` components, `frozen=True` rejects post-construction assignment, `extra="forbid"` rejects unknown keys)
  - 4 `BenchmarkCombinedSupport` validator tests (post-Eric-round-1): `missing_categories` ↔ `None`-keyed `component_pcts` drift, `combined_actual_pct` vs component sum, `gap_pp` vs combined - benchmark, ≤0.01pp rounding tolerance
  - 3 drift-sentinel tests (`TestTargetCategoriesPin`: literal labels + section + tuple-not-string)
  - 2 ragged-row drift WARNING tests
- klair-api/tests/board_doc/test_review_endpoint.py — seeded (total, Sales) at 3.0% and (total, Marketing) at 2.0% in the populated DataPackage fixture so C3.7 emits a clean pass alongside C3.3 / C3.4 / C3.5 / C3.6 (combined 5.0% < 6.0% benchmark). Bumped expected findings count 11 → 12; added "C3.7" to the happy-path check_ids set, the missing-data skipped_checks set (and skip-reason iteration), and the partial-completeness ran_ids set. Updated comments referencing C3.x siblings to call out C3.7.
## Test count
- 35 new tests in `test_sales_marketing_benchmark.py` (31 in the initial cut + 4 added post-Eric-round-1 to pin the new `@model_validator`)
- +1 finding in test_review_endpoint.py's happy path; +1 entry across three skipped_checks / ran_ids sets
## Breaking changes
None. New check is additive (one more entry auto-discovered into REGISTRY via Strategy I). BenchmarkCombinedSupport is purely additive in _helpers.py — no existing helper or model was modified. No FE changes: C3.7 findings render through the same FindingCard component as C3.3-C3.6 (the supporting_data dict is rendered generically by snake_case key).
## Test plan
Drone-side checks (the boxes I can verify myself — all green):
- [x] `cd klair-api && uv run pytest tests/board_doc/test_sales_marketing_benchmark.py -q` → 35 passed
- [x] `cd klair-api && uv run pytest tests/board_doc/test_review_endpoint.py -q` → 16 passed
- [x] `cd klair-api && uv run pytest tests/board_doc/test_sales_marketing_benchmark.py tests/board_doc/test_review_endpoint.py -q` → 51 passed
- [x] `cd klair-api && uv run ruff format budget_bot/board_doc/review_checks tests/board_doc` → no reformat
- [x] `cd klair-api && uv run ruff check budget_bot/board_doc/review_checks tests/board_doc` → clean
- [x] `cd klair-api && uv run pyright budget_bot/board_doc/review_checks/sales_marketing_benchmark.py budget_bot/board_doc/review_checks/_helpers.py tests/board_doc/test_sales_marketing_benchmark.py` → 0 errors / 0 warnings
Reviewer-side validation (un-checked — please confirm post-merge):
- [ ] Open the Board Doc on a BU whose Benchmark by Product data triggers C3.7 (any BU where the (total, Sales) + (total, Marketing) combined per-product cell exceeds 6.0%). Open the Review tab. Confirm the C3.7 finding appears alongside C3.3-C3.6 with the right severity (warning vs. critical based on the combined value), and the what / why / options text reads sensibly for Sales & Marketing spend (not a copy-paste of C3.3's engineering rationale).
- [ ] Run Address with Claire on the C3.7 finding once and confirm that Claire's regeneration produces a non-trivial change to the affected section (the same flow we validated for C3.3-C3.6). C3.7 reuses the same finding shape, so the address-with-claire pipeline should work unchanged.
- [ ] Spot-check the supporting_data JSON in the API response (POST /board_doc/.../review) — verify a per-product finding's supporting_data matches the BenchmarkCombinedSupport schema (product, is_rollup, combined_actual_pct, component_pcts, missing_categories, benchmark_pct, gap_pp, warning_band_pp, standard_benchmark_pct_in_sheet) and the BU-level pass finding matches BenchmarkAggregateSupport.
Substrate quirk (carried over from C3.4 / C3.5 / C3.6 — same pre-existing gap, same root cause): two C3.7 caplog-based tests (TestRaggedRowDriftWarning::test_warning_fires_when_row_shorter_than_header) pass cleanly in isolation but fail when run as part of the full tests/board_doc -q suite because some upstream test mutates logger propagation state. The C3.4 / C3.6 sibling tests exhibit the same behaviour on main; this is a Klair-side test substrate gap, not a regression from C3.7. The new test is structurally identical to its C3.6 counterpart.
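
For context on why propagation state matters here: pytest's `caplog` captures records through a handler attached to the root logger, so a logger whose `propagate` flag has been switched off never reaches it. The stdlib-only sketch below reproduces that mechanism. It does not identify which upstream test is responsible, and `ListHandler`, `captured_messages`, and the logger name are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records, standing in for the handler caplog puts on the root logger."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def captured_messages(logger_name: str, propagate: bool) -> list:
    """Log one WARNING and return what a root-level handler saw."""
    root = logging.getLogger()
    handler = ListHandler()
    root.addHandler(handler)
    logger = logging.getLogger(logger_name)
    previous = logger.propagate
    logger.propagate = propagate
    try:
        logger.warning("row shorter than header")
    finally:
        # Restore state so the sketch does not leak into other code.
        logger.propagate = previous
        root.removeHandler(handler)
    return [r.getMessage() for r in handler.records]
```

With `propagate=True` the root-level handler sees the warning; with `propagate=False` it sees nothing, which is the same symptom as a caplog assertion failing only in the full-suite run.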
## Verification artifact
### Sample finding payload — both rows present (combined-total path)
C3.7 emits the following when a non-rollup product (Mobilogy) with Sales at 5.0% and Marketing at 3.0% (combined 8.0%) trips the warning band against the 6.0% benchmark:
{"check_id": "C3.7",
"check_area": "Per-Product Benchmarks",
"severity": "warning",
"section_id": "product_detail__mobilogy",
"what": "Mobilogy Sales & Marketing cost (8.0%) is 2.0pp above the 6.0% Trilogy benchmark.",
"why": "Sales and Marketing combined spend is the largest customer-acquisition cost bucket on most SaaS products; sustained overage signals either acquisition inefficiency (CAC running ahead of payback) or a product whose growth motion needs a structural rethink. The 6.0% Trilogy benchmark is the combined target — Sales and Marketing are evaluated together because the GM controls the mix between them within a single envelope.",
"options": [
"Rebalance Sales and Marketing spend within the same envelope for Mobilogy — shift marketing dollars toward top-of-funnel motions where CAC is lower, or trim headcount in whichever bucket is running heaviest, so the combined ratio drops below the 6.0% benchmark without growing the total spend.",
"Reprice or repackage Mobilogy to lift the revenue denominator so the ratio recovers without OpEx cuts.",
"Defend the gap — document why a 2.0pp combined overage is acceptable for Mobilogy this quarter (e.g. a deliberate growth-investment cycle whose CAC payback shortens in the next plan)."
],
"preferred_action": null,
"supporting_data": {
"product": "Mobilogy",
"is_rollup": false,
"combined_actual_pct": 8.0,
"component_pcts": {"Sales": 5.0, "Marketing": 3.0},
"missing_categories": [],
"benchmark_pct": 6.0,
"gap_pp": 2.0,
"warning_band_pp": 5.0,
"standard_benchmark_pct_in_sheet": 6.0
}
}
### Sample finding payload — partial data (Sales-only path)
When the Marketing row is absent from the BU's worksheet and Sales alone at 8.0% trips the warning band on the BU's rollup column:
{"check_id": "C3.7",
"check_area": "Per-Product Benchmarks",
"severity": "warning",
"section_id": "financials",
"what": "BU Consolidated Sales & Marketing cost (8.0%) is 2.0pp above the 6.0% Trilogy benchmark. (Combined cost computed from Sales only; Marketing row not present in this BU's worksheet.)",
"why": "Sales and Marketing combined spend is the largest customer-acquisition cost bucket on most SaaS products; sustained overage signals either acquisition inefficiency (CAC running ahead of payback) or a product whose growth motion needs a structural rethink. The 6.0% Trilogy benchmark is the combined target — Sales and Marketing are evaluated together because the GM controls the mix between them within a single envelope.",
"options": [
"Rebalance Sales and Marketing spend within the same envelope for BU Consolidated — shift marketing dollars toward top-of-funnel motions where CAC is lower, or trim headcount in whichever bucket is running heaviest, so the combined ratio drops below the 6.0% benchmark without growing the total spend.",
"Reprice or repackage BU Consolidated to lift the revenue denominator so the ratio recovers without OpEx cuts.",
"Defend the gap — document why a 2.0pp combined overage is acceptable for BU Consolidated this quarter (e.g. a deliberate growth-investment cycle whose CAC payback shortens in the next plan)."
],
"preferred_action": null,
"supporting_data": {
"product": "BU Consolidated",
"is_rollup": true,
"combined_actual_pct": 8.0,
"component_pcts": {"Sales": 8.0, "Marketing": null},
"missing_categories": ["Marketing"],
"benchmark_pct": 6.0,
"gap_pp": 2.0,
"warning_band_pp": 5.0,
"standard_benchmark_pct_in_sheet": 4.0
}
}
(Critical findings carry a non-null preferred_action set to the rebalance-within-envelope option per the C3.3-pattern critical-only nudge.)
Closes KLAIR-2650
<!-- CURSOR_AGENT_PR_BODY_END -->