{"id":557,"slug":"research-brief-public-data-private-moat","title":"Research brief: Public data as a private moat — building proprietary intelligence from government open data (piece 11 of 15)","kind":"reference","scope":"business","status":"current","audiences":["kevin","claude-code","smb-owner","candid-team"],"topics":["agency-methodology","open-data","data-infrastructure"],"reference_body":"**Status:** Research brief — not finished article. Compiled May 2026.\n\n## Thesis\n\n**Free is not the moat — clean is.** OGL-Canada and U.S. public-domain works (17 USC §105) give every business identical raw rights. The durable advantage comes from **normalization, time-series accumulation, and provenance tracking** — the gap between *legally free* and *operationally usable* is where small teams can compete with much larger incumbents who underinvest in cleaning.\n\nThe canonical demonstrations: **Zillow** (110M-home \"living database\" on Census/ACS/county assessor records), **ATTOM Data** (500M+ transactions across 2,690+ counties, 20-step normalization), **Cherre** ($3.3T AUM powered by a property knowledge graph fusing public + vendor data), **FlightAware** (FAA + 45 ANSPs + 30,000 user-hosted ADS-B receivers + Aireon), **The Climate Corporation** (NOAA/NWS/USGS/NRCS/NASA → Bayer subsidiary).\n\n## What changed in 2026\n\n- **Legal:** Ninth Circuit hiQ v LinkedIn (Apr 2022) held that scraping publicly accessible data likely does not violate the CFAA, and Meta v Bright Data (N.D. Cal. Jan 2024) held that Meta's terms do not bar logged-off scraping of public data. But hiQ's $500K judgment for User Agreement breach (logged-in scraping + fake accounts) shows the contract trap remains.\n- **Infrastructure:** A 1-3-person operation can run a real ELT stack (DuckDB local + MotherDuck free tier + dbt + GitHub Actions) for under $50/month if it stays within free tiers. 
MotherDuck's Business tier moved $100→$250/month between Dec 2025 and Feb 2026, so the cheap-managed-OLAP window may be closing.\n- **AI citation:** Perplexity averages 21.87 citations per response (Qwairy Q3 2025); favors pages with \"visible statistics and proprietary data, named sources with verifiable methodology.\" Normalized open-data dashboards hit all three.\n\n## Honest caveats\n\n- Cherre's \"200+ datasets\" figure is from a third-party profile; Cherre's own platform copy says \"50-plus additional data sources.\" Use the verified $3.3T AUM figure instead.\n- Our World in Data's 89M visitor figure is 2021 — no current comparable figure located.\n- MotherDuck pricing change is sourced from independent technical blogs; MotherDuck did not announce publicly. Verify before quoting in a deliverable.\n- hiQ precedent is Ninth Circuit-specific; consult counsel outside California.\n- Defamation/accuracy risk is real when republishing claims about identifiable persons/businesses from open data — aggregate to neighbourhood/postal-code level, reproduce the StatCan \"as is\" disclaimer, carry a public errata policy.","rationale_body":"The \"build on official open data, not scraping\" discipline is both legally durable and strategically sound — but most SMB content marketers don't know about OGL-Canada, the Statistics Canada Open Licence, or the EIA APIv2. 
This brief makes the case + provides the licensing checklist + names businesses built on this pattern.","metadata":null,"links":{"outgoing":[{"slug":"ogl-canada-v2-perpetual-royalty-free-commercial","title":"OGL-Canada v2.0: worldwide royalty-free perpetual licence for commercial use","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"statistics-canada-open-licence-value-added","title":"Statistics Canada Open Licence: explicitly permits \"use, reproduce, publish, freely distribute, or sell value-added products\"","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"eccc-msc-open-data-end-use-licence","title":"ECCC/MSC Open Data: free anonymous access to weather/climate/water via OGC-compliant GeoMet APIs","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"cer-pipeline-throughput-ogl-canada","title":"Canada Energy Regulator: pipeline throughput/capacity/tolls + Market Snapshots, all under OGL-Canada","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"ontario-open-data-catalogue-2948-datasets","title":"Ontario Open Data Catalogue: 2,948 datasets under OGL-Ontario v1.0","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"eia-apiv2-free-petroleum-electricity","title":"U.S. 
EIA APIv2: free registered-key access to petroleum/electricity/gas/coal/STEO/AEO data; WPSR releases 10:30 AM ET Wednesdays","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"noaa-nws-public-domain-disclaimer","title":"NOAA/NWS: information on NWS web servers is in the public domain — no attribution required, provided \"as is\"","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"international-open-data-licences-2026","title":"International open-data licences (2026): UK OGL v3, OECD CC BY 4.0, Eurostat, World Bank","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"hiq-v-linkedin-cfaa-public-data-scraping-2022","title":"hiQ v LinkedIn (9th Cir. Apr 2022): scraping publicly accessible data likely doesn't violate CFAA — but hiQ still settled for $500K","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"meta-v-bright-data-logged-off-scraping-2024","title":"Meta v Bright Data (Jan 2024, N.D. Cal.): Facebook/Instagram terms don't bar logged-off scraping of public data","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"zillow-built-on-administrative-data-backbone","title":"Zillow: 110M-home \"living database\" built on Census/ACS + 3,000 county assessors + USPS + MLS feeds","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"attom-500m-transactions-2690-counties","title":"ATTOM Data: 500M+ real estate/loan transactions, 2,690+ counties, 20-step Enterprise Data Management Program","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"cherre-property-knowledge-graph-3-3t-aum","title":"Cherre: property knowledge graph powering management of $3.3T AUM globally","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"flightaware-crowdsourced-ads-b-receivers-30000","title":"FlightAware: FAA + 45-country ANSP feeds + 30,000+ user-hosted ADS-B receivers + Aireon — the crowdsourced 
moat","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"climate-corporation-bayer-noaa-nws-usgs-nasa","title":"The Climate Corporation (Bayer Crop Science since 2018): field-level overlay on NOAA + NWS + USGS + NRCS + NASA","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"local-logic-100b-data-points-canadian","title":"Local Logic (Montreal): 100B+ data points — \"largest location dataset in real estate\"","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"hellosafe-canada-statcan-osfi-barometers","title":"HelloSafe: \"Canada's leading insurance/financial comparison platform\" — built on StatCan + OSFI data","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"carfax-1984-10000-records-fax-to-35-billion","title":"Carfax: from 10,000 records faxed in 1986 to 35B+ records across 151,000+ sources — sold to S&P Global Mobility 2022","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"motherduck-pricing-changes-2026-business-tier","title":"MotherDuck pricing 2026: Lite ($25/mo) removed; Business moved to $250/mo between Dec 2025 and Feb 2026","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"open-data-ingestion-stack-smb-2026","title":"Reference: open-data ingestion stack for a 1-3 person SMB operation (2026) — under $50/mo realistic","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"profound-680m-citations-perplexity-citation-behavior","title":"Profound (Aug 2024-Jun 2025, 680M citations): only 11% domain overlap between ChatGPT and Perplexity; 13.7% between AI Overviews and AI Mode","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"attribution-checklist-by-source","title":"Reference: compliance-grade attribution checklist by open-data source","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"underexploited-canadian-open-data-by-industry","title":"Reference: 
underexploited Canadian open data by industry — highest-leverage starts for KW SMB clients","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"rule-build-on-official-open-data-not-scraping","title":"RULE: Build Candid client data products on official open-data feeds — never on scraped sources","kind":"rule","scope":"business","link_type":"relates-to"},{"slug":"rule-attribution-discipline-on-every-data-product","title":"RULE: Every Candid data product carries source attribution per the [[attribution-checklist-by-source]]. Mis-attribution terminates the licence.","kind":"rule","scope":"business","link_type":"relates-to"},{"slug":"research-brief-structured-content-as-competitive-advantage","title":"Research brief: Structured content as a competitive advantage (piece 2 of 15)","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"research-brief-marketing-sites-that-do-something","title":"Research brief: What makes a marketing site do something (piece on brochure vs platform)","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"research-brief-built-to-last","title":"Research brief: Built to Last — why most SMB sites rebuild every 3-4 years (piece 5 of 15)","kind":"reference","scope":"business","link_type":"relates-to"}],"incoming":[{"slug":"research-brief-dataset-is-the-product","title":"Research brief: The Dataset is the Product — when a service business should own its data (piece 12 of 15)","kind":"reference","scope":"business","link_type":"relates-to"},{"slug":"foundation-roadmap-15-pieces-closure","title":"CANDID REFERENCE: how the 15-brief foundation roadmap connects — the throughline from strategic frame to editorial layer","kind":"reference","scope":"business","link_type":"depends-on"}]},"created_at":"2026-05-22T20:32:45.128Z","updated_at":"2026-05-22T20:32:45.128Z"}