Three data categories for SMB-facing analytics: public/government open data, live third-party feeds, and operational (first-party) data
Summary
Claim: Data inputs for SMB-facing analytics fall into three categories:
- Public / government open data — statistical agencies (Census/StatCan/Eurostat), central-bank indicators (FRED, BoC Valet), weather, GIS, transit (GTFS), business / property / permit registries.
- Live third-party feeds and APIs — commercial market data, mapping/places, embedded-analytics SaaS.
- Operational (first-party) data — the business's own transaction logs, CRM, inventory, scheduling, product-usage logs.
Source: Industry framework — synthesised from FRED docs (https://fred.stlouisfed.org/docs/api/fred/), BoC Valet docs (https://www.bankofcanada.ca/valet/docs), gtfs.org, and the build-vs-buy-data literature (https://medium.com/@audaciatech/data-products-build-vs-buy ; https://www.audacia.co.uk).
Confidence: Industry-consensus.
Why this matters for Candid: The three buckets carry very different cost and defensibility profiles. The first two categories (public open data and third-party feeds) are non-exclusive: anyone can use them. Operational (first-party) data is the only category with native defensibility. See:
- Andreessen Horowitz, "The Empty Promise of Data Moats" (Casado & Lauten, 2019): most "data network effects" are really scale effects that diminish.
- Synthesis: data is a defensible asset only when it is proprietary, hard to replicate, tightly coupled to a feedback loop, and continuously refreshed; otherwise it is an operational byproduct any competitor can buy or collect.
- R2: Build only on data you already own (transaction history, CRM, scheduling, no-show patterns); that is the only category with native defensibility.
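The distinction above, between non-exclusive categories and the four-part defensibility test, can be sketched as a small data model. This is a minimal illustration, not an established scoring method: the names `Category`, `DataSource`, `is_exclusive`, and `is_defensible` are hypothetical, and the boolean flags are a simplification of what would in practice be judgment calls.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three input categories named in the claim."""
    PUBLIC_OPEN = "public / government open data"
    THIRD_PARTY = "live third-party feeds and APIs"
    FIRST_PARTY = "operational (first-party) data"


@dataclass(frozen=True)
class DataSource:
    """Hypothetical record describing one data input."""
    name: str
    category: Category
    # The four conditions from the synthesis note (assumed to be yes/no here).
    proprietary: bool = False
    hard_to_replicate: bool = False
    feedback_loop: bool = False
    continuously_refreshed: bool = False


def is_exclusive(source: DataSource) -> bool:
    """Public and third-party data are available to any competitor."""
    return source.category is Category.FIRST_PARTY


def is_defensible(source: DataSource) -> bool:
    """Defensible only when all four conditions hold at once."""
    return all((
        source.proprietary,
        source.hard_to_replicate,
        source.feedback_loop,
        source.continuously_refreshed,
    ))


# Illustrative inputs (the flag values are assumptions, not measurements).
fred_series = DataSource("FRED indicator series", Category.PUBLIC_OPEN)
booking_history = DataSource(
    "booking and no-show history",
    Category.FIRST_PARTY,
    proprietary=True,
    hard_to_replicate=True,
    feedback_loop=True,
    continuously_refreshed=True,
)
stale_crm_export = DataSource(
    "one-off CRM export",
    Category.FIRST_PARTY,
    proprietary=True,
)
```

Under these assumptions `booking_history` passes both checks, `fred_series` passes neither, and `stale_crm_export` is exclusive but not defensible, which mirrors the point that first-party data is a necessary starting condition rather than a moat by itself.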