Research brief: Public data as a private moat — building proprietary intelligence from government open data (piece 11 of 15)

Status: Research brief, not a finished article. Compiled May 2026.

Thesis

Free is not the moat; clean is. OGL-Canada and the U.S. public-domain status of federal government works (17 USC §105) give every business identical rights to the raw data. The durable advantage comes from normalization, time-series accumulation, and provenance tracking. The gap between legally free and operationally usable is where small teams can compete with much larger incumbents that underinvest in cleaning.

The canonical demonstrations:

  • Zillow (110M-home "living database" on Census/ACS/county assessor records)
  • ATTOM Data (500M+ transactions across 2,690+ counties, 20-step normalization)
  • Cherre ($3.3T AUM powered by a property knowledge graph fusing public + vendor data)
  • FlightAware (FAA + 45 ANSPs + 30,000 user-hosted ADS-B receivers + Aireon)
  • The Climate Corporation (NOAA/NWS/USGS/NRCS/NASA → Bayer subsidiary)
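
The three ingredients named in the thesis (normalization, time-series accumulation, provenance tracking) can be sketched in a few lines. This is a minimal illustration, not a description of how any of the companies above work. The field names ("name", "value"), the Snapshot type, and the example.org URL are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One normalized observation plus the provenance needed to audit it."""
    key: str          # stable entity key after normalization
    value: float      # cleaned measurement
    as_of: date       # date the source data refers to
    source_url: str   # where the raw record came from
    raw_sha256: str   # fingerprint of the untouched raw payload


def normalize(raw: dict, source_url: str, as_of: date) -> Snapshot:
    """Clean one raw record. The field names are hypothetical."""
    payload = json.dumps(raw, sort_keys=True).encode("utf-8")
    key = " ".join(raw["name"].split()).upper()        # collapse whitespace, unify case
    value = float(str(raw["value"]).replace(",", ""))  # strip thousands separators
    return Snapshot(key, value, as_of, source_url, hashlib.sha256(payload).hexdigest())


def accumulate(history: dict, snap: Snapshot) -> None:
    """Append to the per-entity time series, skipping exact re-fetches."""
    series = history.setdefault(snap.key, [])
    duplicate = any(
        s.as_of == snap.as_of and s.raw_sha256 == snap.raw_sha256 for s in series
    )
    if not duplicate:
        series.append(snap)
        series.sort(key=lambda s: s.as_of)


# Hypothetical usage: two monthly pulls of the same entity, spelled differently.
history = {}
url = "https://example.org/open-data.csv"  # placeholder, not a real dataset
accumulate(history, normalize({"name": "  main   st ", "value": "1,250.5"}, url, date(2026, 4, 1)))
accumulate(history, normalize({"name": "Main St", "value": "1300"}, url, date(2026, 5, 1)))
```

Both pulls land under the same key, so the history grows into a time series that a competitor starting from today's raw file cannot reconstruct. The stored hash lets any published number be traced back to the exact raw payload it came from.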

What changed in 2026

  • Legal: The Ninth Circuit's hiQ v. LinkedIn (Apr 2022) and Meta v. Bright Data (N.D. Cal., Jan 2024) confirm that logged-off scraping of public data is generally permissible under the CFAA. But the $500k judgment against hiQ for User Agreement breach (logged-in scraping plus fake accounts) shows the contract trap remains.
  • Infrastructure: A one-to-three-person operation can run a real ELT stack (DuckDB local + MotherDuck free tier + dbt + GitHub Actions) for under $50/month if it stays within free tiers. MotherDuck's Business tier moved from $100 to $250/month between Dec 2025 and Feb 2026, so the cheap-managed-OLAP window may be closing.
  • AI citation: Perplexity averages 21.87 citations per response (Qwairy, Q3 2025) and favors pages with "visible statistics and proprietary data, named sources with verifiable methodology." Normalized open-data dashboards hit all three.
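
To make the load-then-transform shape of the ELT stack in the Infrastructure item concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for DuckDB so that it runs with the standard library alone. The permit data, table names, and column names are hypothetical, and in the stack described above the SQL in transform() would more plausibly live in a dbt model.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: stray whitespace, mixed case, a thousands
# separator, and a duplicated row.
RAW_CSV = """permit_id,issued,city,value
A-1,2026-01-15, Toronto ,"12,000"
A-2,2026-01-20,TORONTO,8500
A-2,2026-01-20,TORONTO,8500
"""


def load_raw(conn: sqlite3.Connection, text: str) -> None:
    """Load step: land every row untouched, as text."""
    conn.execute(
        "CREATE TABLE raw_permits (permit_id TEXT, issued TEXT, city TEXT, value TEXT)"
    )
    rows = [
        (r["permit_id"], r["issued"], r["city"], r["value"])
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO raw_permits VALUES (?, ?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    """Transform step: cleaning is expressed in SQL on top of the raw table."""
    conn.execute(
        """
        CREATE VIEW permits_clean AS
        SELECT DISTINCT
            permit_id,
            issued,
            upper(trim(city)) AS city,
            CAST(replace(value, ',', '') AS REAL) AS value
        FROM raw_permits
        """
    )


conn = sqlite3.connect(":memory:")  # stand-in for a local DuckDB file
load_raw(conn, RAW_CSV)
transform(conn)
clean = conn.execute(
    "SELECT permit_id, city, value FROM permits_clean ORDER BY permit_id"
).fetchall()
```

Keeping the raw table untouched and putting all cleaning in a view means the cleaning rules can change without re-downloading anything, which is the main reason to prefer ELT over ETL at this scale.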

Honest caveats

  • Cherre's "200+ datasets" figure is from a third-party profile; Cherre's own platform copy says "50-plus additional data sources." Use the verified $3.3T AUM figure instead.
  • Our World in Data's 89M visitor figure is from 2021; no current comparable figure was located.
  • The MotherDuck pricing change is sourced from independent technical blogs; MotherDuck did not announce it publicly. Verify before quoting in a deliverable.
  • The hiQ precedent binds only within the Ninth Circuit; consult counsel before relying on it elsewhere.
  • Defamation/accuracy risk is real when republishing claims about identifiable persons/businesses from open data. Mitigate by aggregating to neighbourhood/postal-code level, reproducing the StatCan "as is" disclaimer, and carrying a public errata policy.
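
The aggregation step in the last caveat might look like the following sketch. It assumes that "postal-code level" means the forward sortation area (the first three characters of a Canadian postal code), which is one possible reading. The record fields are hypothetical, and the disclaimer string is a placeholder to be replaced with the source's actual wording.

```python
from collections import Counter

# Placeholder only: substitute the actual disclaimer text from the source licence.
DISCLAIMER = "<reproduce the source's 'as is' disclaimer here>"


def fsa(postal_code: str) -> str:
    """Return the forward sortation area: first three characters, uppercased."""
    return postal_code.replace(" ", "").upper()[:3]


def aggregate(records: list) -> dict:
    """Count records per FSA so that no output row names a person or business."""
    counts = Counter(fsa(r["postal_code"]) for r in records)
    return {"disclaimer": DISCLAIMER, "counts": dict(sorted(counts.items()))}


# Hypothetical input rows; the business names never reach the output.
records = [
    {"business": "Hypothetical Cafe", "postal_code": "M5V 2T6"},
    {"business": "Hypothetical Garage", "postal_code": "m5v1j1"},
    {"business": "Hypothetical Bakery", "postal_code": "K1A 0B1"},
]
report = aggregate(records)
```

An area with very few records can still point to an identifiable business, so a production version would probably also need a minimum-count threshold; that is outside what this sketch shows.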