## Demo
<img width="2191" height="1549" alt="image" src="https://github.com/user-attachments/assets/93f6c37e-28c1-4500-b680-ddb7422c423e" />
<img width="2207" height="1614" alt="image" src="https://github.com/user-attachments/assets/ef6209f0-31e6-4508-856c-f4eb8755941c" />
## Linear
- [KLAIR-2605: Ingest SaaS budgeting source sheets into Redshift (core_finance.saas_budgeting_*)](https://linear.app/builder-team/issue/KLAIR-2605)
## Overview
Extends the operator-driven SaaS budgeting pipeline at aws-saas-budget-scripts/pipeline/ to cover two additional Google source surfaces alongside the existing Docker ingest. The weekly Central Services Units - W<NN>'<YYYY> databases tab and the mappings database mapping tab now land in Redshift under core_finance.saas_budgeting_*, behind the same python -m pipeline.main entrypoint, sharing the existing secrets / connection helpers and idempotency contract. Data-tier only — no frontend, no API surface.
## What was built
- Spec 01 — databases tab ingest: New databases_ingest.py lists weekly snapshot files in folder 1HuPt4CGkErK7JHRw2vKth9RpaI-_dpF5, picks newest-modified per (year, week) (discards logged), validates the databases tab header, projects (product, db_engine, db_server, db_name, cpu_hours, storage_gb) with hard-fail on unparseable numerics, and writes via DELETE+INSERT keyed on (snapshot_year, snapshot_week).
- Spec 02 — mapping tab ingest: New mappings_ingest.py reads the single mapping tab of sheet 1cyf34vYt8l_RPYYe8St4WFVKkovM2-dM_WzwyI2uxm0, validates the locked four-column header (database, db_engine, product, l5), rejects non-string cells, captures Drive modifiedTime, and full-replaces the table inside one transaction. Additive helpers added to drive_client.py and redshift_writer.py.
- Spec 03 — CLI multi-ingest dispatcher: Existing Docker logic extracted verbatim into docker_ingest.py; main.py rewritten as a thin argparse dispatcher with --ingest docker|databases|mappings|all. Per-ingest flag validation rejects --reingest / --quarter for mappings with a clear stderr message. Shared calendar utilities extracted to _calendar.py to break a circular import. --ingest all runs in fixed order (mappings → databases → docker) and aborts on first failure.
## Tables created
| Table | DDL | Idempotency |
|-------|-----|-------------|
| core_finance.saas_budgeting_database_units | scripts/sql/create_saas_budgeting_database_units.sql | DELETE+INSERT per (snapshot_year, snapshot_week) |
| core_finance.saas_budgeting_database_mapping | scripts/sql/create_saas_budgeting_database_mapping.sql | Full-replace in single transaction (sheet is one global snapshot) |
## CLI surface
python -m pipeline.main --ingest {docker,databases,mappings,all}[--reingest YYYY-WWW[,YYYY-WWW,...]]
[--quarter YYYY-Qn]
[--dry-run]
- Default --ingest all runs mappings → databases → docker in that order.
- --reingest / --quarter apply to docker and databases. They are rejected with a non-zero exit and clear stderr message when --ingest mappings is selected, since the mappings sheet is a single global snapshot and week-keying does not apply.
- --ingest all combined with --reingest / --quarter emits a stderr advisory and routes the flags to the week-aware ingests (mappings ignores them).
- Behavioral parity preserved for the Docker ingest: --ingest docker --reingest 2026-W18 is byte-identical to today's invocation.
## Test coverage
- 124 tests passing (59 prior + 65 new): 18 for databases ingest, 27 for mappings ingest, 16 for the dispatcher, plus 4 new tests covering the self-review fixes below.
- ruff format + ruff check clean on changed files.
- pyright clean on changed files.
## Self-review
Three findings addressed in-PR:
1. Timezone bug in _parse_modified_time — astimezone(None) was relying on local-time-zone resolution; replaced with explicit astimezone(timezone.utc) so Drive modifiedTime always lands as UTC regardless of operator machine.
2. _coerce_numeric accepted booleans — Python's bool is a subclass of int, so a stray TRUE cell would silently coerce to 1.0. Added an isinstance(raw, bool) reject branch with a ValueError naming the column and sheet id.
3. Cross-ingest semantics in main.py — Added a comment block explaining the mappings → databases → docker ordering rationale (reference data first; cheapest feedback loops before the most expensive ingest) so future contributors understand why the order is fixed.
Four minor findings cleared without code changes:
1. Pluralizing mapping to mappings_ingest — flagged but kept; matches the --ingest mappings CLI flag and the FEATURE.md naming, breaking the parity would force a doc churn for cosmetic reasons.
2. Skip-when-unchanged optimization for mappings — explicitly deferred in spec 02; the source_modified_time column is already persisted so a future spec can flip the dispatcher without a schema change.
3. Centralizing chunked-INSERT logic — write_databases_snapshot and write_mapping_snapshot each repeat the chunked-INSERT loop. Acceptable duplication for two call sites; consolidation would couple table-shape-specific column lists and is premature.
4. docker_ingest.run exit-code alignment with mappings_ingest.run dataclass return — adapter lives in main.py's dispatch loop. Changing mappings_ingest.run to return int would shed the typed MappingIngestResult that callers (and tests) rely on.
## Files changed
- Implementation modules (4): databases_ingest.py (new), mappings_ingest.py (new), docker_ingest.py (new — extracted), _calendar.py (new — shared calendar utils), plus additive edits to drive_client.py, redshift_writer.py, and a near-total rewrite of main.py.
- DDL (2): scripts/sql/create_saas_budgeting_database_units.sql, scripts/sql/create_saas_budgeting_database_mapping.sql.
- Tests (3): tests/test_databases_ingest.py, tests/test_mappings_ingest.py, tests/test_main_dispatch.py.
- Specs (4): FEATURE.md + three spec files under features/aws-spend/saas-budgeting-source-sheets-ingest/specs/.
## Test plan
- [ ] Apply DDL to Redshift sandbox cluster (create_saas_budgeting_database_units.sql, create_saas_budgeting_database_mapping.sql)
- [ ] python -m pipeline.main --ingest databases --dry-run against staging Drive folder
- [ ] python -m pipeline.main --ingest mappings --dry-run against staging sheet
- [ ] Behavioral parity check: python -m pipeline.main --ingest docker --reingest 2026-W18 produces identical output to today's python -m pipeline.main --reingest 2026-W18
- [ ] python -m pipeline.main --ingest mappings --reingest 2026-W18 exits non-zero with clear stderr message
- [ ] python -m pipeline.main --help lists docker, databases, mappings, all
🤖 Generated with [Claude Code](https://claude.com/claude-code)