Reference: open-data ingestion stack for a 1-3 person SMB operation (2026), realistically under $50/month

Recommended stack for a 1-3 person team handling millions (not billions) of rows in 2026:

Layer | Recommendation | Monthly cost at small scale
------|----------------|----------------------------
Storage | Backblaze B2 or Cloudflare R2 (Parquet files) | $1-10
Compute / query | DuckDB locally + MotherDuck free tier (10GB, 10 CU-hours) | $0
Scheduling | GitHub Actions (2,000 free Linux min/mo for private repos; unlimited free for public repos) OR cron on a $5 VPS | $0-5
Transformation | dbt-core (free) | $0
Orchestration (optional) | Dagster OSS or Prefect free tier | $0
Visualization | Observable Framework (free) / Datawrapper (free tier) / Metabase OSS | $0-10
Total (realistic minimum) | | $5-25/month
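
To check whether a schedule fits inside the 2,000 free private-repo minutes quoted above, a rough budget calculation can help. This is a minimal sketch, not an official calculator: the function names are hypothetical, it assumes a single-job workflow on a standard Linux runner, and it assumes each job is billed rounded up to the next whole minute.

```python
import math

# Free Linux minutes per month for private repos, as quoted in the table above.
FREE_PRIVATE_MINUTES = 2000


def monthly_billed_minutes(runs_per_day, seconds_per_run, days=31):
    """Estimate billed minutes per month for a single-job workflow.

    Assumption: each run is billed rounded up to the next whole minute.
    """
    billed_per_run = math.ceil(seconds_per_run / 60)
    return runs_per_day * billed_per_run * days


def fits_free_tier(runs_per_day, seconds_per_run, days=31):
    """Return True if the estimated usage stays within the free allowance."""
    used = monthly_billed_minutes(runs_per_day, seconds_per_run, days)
    return used <= FREE_PRIVATE_MINUTES
```

Under these assumptions, one nightly run of about 200 seconds uses 124 billed minutes in a 31-day month, while the same job run hourly would use 2,976 and exceed the allowance. That is the point at which cron on a small VPS becomes the cheaper option.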

Python-only minimum viable pipeline (for an agency just starting):

  • requests / httpx to hit StatCan WDS, EIA APIv2, MSC GeoMet
  • polars or duckdb for transformation
  • parquet files in object storage
  • cron triggering a nightly dbt run against DuckDB
  • Quarto or Observable for the public-facing layer
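
The first three bullets above can be sketched as a single fetch, normalize, and write step. This is an illustrative sketch under stated assumptions, not the actual client for any of the named APIs: the payload shape ({"data": [{"period", "series", "value"}]}), the function names, and the column schema are all hypothetical, and each real source (StatCan WDS, EIA APIv2, MSC GeoMet) has its own response format that would need its own normalizer. httpx and polars are third-party packages and are imported lazily so the pure-Python normalizer can be tested without them.

```python
def normalize_rows(payload):
    """Flatten a (hypothetical) API payload into uniform row dicts.

    Rows with a missing value are skipped rather than stored as nulls.
    """
    rows = []
    for item in payload.get("data", []):
        value = item.get("value")
        if value is None:
            continue
        rows.append(
            {
                "period": str(item["period"]),
                "series": str(item.get("series", "unknown")),
                "value": float(value),
            }
        )
    return rows


def fetch_json(url, params=None, timeout=30.0):
    """GET a JSON document; raises on HTTP error status."""
    import httpx  # third-party: pip install httpx

    response = httpx.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


def write_parquet(rows, out_path):
    """Write normalized rows to a Parquet file with an explicit schema."""
    import polars as pl  # third-party: pip install polars

    schema = {"period": pl.Utf8, "series": pl.Utf8, "value": pl.Float64}
    pl.DataFrame(rows, schema=schema).write_parquet(out_path)


def run(url, out_path, params=None):
    """Fetch, normalize, and persist one source; returns the row count."""
    rows = normalize_rows(fetch_json(url, params=params))
    write_parquet(rows, out_path)
    return len(rows)
```

A nightly cron entry or scheduled workflow would call run() once per source and then upload the resulting Parquet file to object storage. Keeping the normalizer free of I/O makes it easy to unit-test against saved sample responses.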

Watch-outs (2026):