Reference: open-data ingestion stack for a 1-3 person SMB operation (2026) — under $50/mo realistic

reference·Scope: business·Status: current

Created 2026-05-22

Recommended stack for a 1-3 person team handling millions (not billions) of rows in 2026:

Layer	Recommendation	Monthly cost @ small scale
Storage	Backblaze B2 or Cloudflare R2 (Parquet files)	$1-10
Compute / query	DuckDB locally + MotherDuck free tier (10GB, 10 CU-hours)	$0
Scheduling	GitHub Actions (2,000 free Linux min/mo private; unlimited free for public) OR cron on a $5 VPS	$0-5
Transformation	dbt-core (free)	$0
Orchestration (optional)	Dagster OSS or Prefect free tier	$0
Visualization	Observable Framework (free) / Datawrapper (free tier) / Metabase OSS	$0-10
Total realistic minimum		$5-25/month

Python-only minimum viable pipeline (for an agency just starting):

Watch-outs (2026):

MotherDuck Business tier moved $100 → $250/month (MotherDuck pricing 2026: Lite ($25/mo) removed; Business moved to $250/mo between Dec 2025 and Feb 2026). If scaling past free tier, evaluate ClickHouse Cloud or self-hosted DuckDB before committing.
For >100M-row workloads, BigQuery per-query pricing often wins; for small scale, DuckDB/MotherDuck wins on simplicity.
For Postgres-native teams, TimescaleDB on Hetzner ($10-20/month) handles time-series gracefully without learning new tools.