{"id":577,"slug":"open-data-ingestion-stack-smb-2026","title":"Reference: open-data ingestion stack for a 1-3 person SMB operation (2026) — under $50/mo realistic","kind":"reference","scope":"business","status":"current","audiences":["claude-code","dev","candid-team"],"topics":["tech-stack","open-data","data-infrastructure"],"reference_body":"**Recommended stack for a 1-3 person team handling millions (not billions) of rows in 2026:**\n\n| Layer | Recommendation | Monthly cost @ small scale |\n|---|---|---|\n| Storage | Backblaze B2 or Cloudflare R2 (Parquet files) | $1-10 |\n| Compute / query | DuckDB locally + MotherDuck free tier (10GB, 10 CU-hours) | $0 |\n| Scheduling | GitHub Actions (2,000 free Linux min/mo private; unlimited free for public) OR cron on a $5 VPS | $0-5 |\n| Transformation | dbt-core (free) | $0 |\n| Orchestration (optional) | Dagster OSS or Prefect free tier | $0 |\n| Visualization | Observable Framework (free) / Datawrapper (free tier) / Metabase OSS | $0-10 |\n| **Total realistic minimum** | | **$5-25/month** |\n\n**Python-only minimum viable pipeline (for an agency just starting):**\n- `requests` / `httpx` to hit StatCan WDS, EIA APIv2, MSC GeoMet\n- `polars` or `duckdb` for transformation\n- `parquet` files in object storage\n- Cron + `dbt run` against DuckDB nightly\n- Quarto or Observable for the public-facing layer\n\n**Watch-outs (2026):**\n- MotherDuck Business tier moved $100 → $250/month ([[motherduck-pricing-changes-2026-business-tier]]). 
If you are scaling past the free tier, evaluate ClickHouse Cloud or self-hosted DuckDB before committing to a paid tier.\n- For >100M-row workloads, BigQuery's per-query pricing often wins on cost; at small scale, DuckDB/MotherDuck wins on simplicity.\n- For Postgres-native teams, TimescaleDB on Hetzner ($10-20/month) handles time-series data well without requiring the team to learn new tools.","rationale_body":null,"metadata":null,"links":{"outgoing":[{"slug":"motherduck-pricing-changes-2026-business-tier","title":"MotherDuck pricing 2026: Lite ($25/mo) removed; Business moved to $250/mo between Dec 2025 and Feb 2026","kind":"reference","scope":"business","link_type":"depends-on"}],"incoming":[{"slug":"rule-build-on-official-open-data-not-scraping","title":"RULE: Build Candid client data products on official open-data feeds — never on scraped sources","kind":"rule","scope":"business","link_type":"depends-on"},{"slug":"research-brief-public-data-private-moat","title":"Research brief: Public data as a private moat — building proprietary intelligence from government open data (piece 11 of 15)","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"modern-data-stack-on-budget-2026","title":"Reference: minimum viable data stack for a $1M-$10M Canadian service business (2026, C$100-C$500/month)","kind":"reference","scope":"business","link_type":"relates-to"}]},"created_at":"2026-05-22T20:32:45.243Z","updated_at":"2026-05-22T20:32:45.243Z"}