RULE: Build Candid client data products on official open-data feeds — never on scraped sources
Created 2026-05-22
Rule: When building data products / public dashboards / "free tools" for Candid Creative clients, always use official open-data feeds with explicit licenses — never scraped sources, even when the scraping would be legally defensible.
Why:
- Durability: OGL-Canada v2.0 grants a worldwide, royalty-free, perpetual licence that covers commercial use. Scraped pipelines can break whenever the source website is redesigned.
- Legal certainty: hiQ v LinkedIn (9th Cir., Apr 2022) held that scraping publicly accessible data likely doesn't violate the CFAA, and Meta v Bright Data (N.D. Cal., Jan 2024) found that Facebook/Instagram terms don't bar logged-off scraping of public data. Even so, hiQ still settled for $500K on contract grounds. Open-data feeds have no equivalent contract trap.
- Operational hygiene: A predictable release cadence (StatCan 8:30 ET daily, EIA WPSR 10:30 ET Wednesdays, MSC near-real-time) lets you build alerts and automations a scraper can't reliably support.
- AI citation strategy: Per Profound's 680M-citation study (Aug 2024-Jun 2025; only 11% domain overlap between ChatGPT and Perplexity, 13.7% between AI Overviews and AI Mode), Perplexity favors "named sources with verifiable methodology." A methodology line such as "We pulled this from the official StatCan WDS API" earns citations that a scraped dataset does not.
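To illustrate the operational-hygiene point, the release cadences named above can be written down as a small schedule table that an alert or refresh job keys off. The following is a minimal sketch, not a production scheduler: the `SCHEDULES` table, the feed keys, and `next_release` are hypothetical names, the cadences are taken as stated in this note (StatCan treated as every day), and `now_et` is assumed to already be expressed in Eastern Time.

```python
from datetime import datetime, time, timedelta

# Hypothetical schedule table built from the cadences named in this note.
# "weekdays" uses datetime.weekday() numbering (Monday=0); None means every day.
SCHEDULES = {
    "statcan_daily": {"at": time(8, 30), "weekdays": None},
    "eia_wpsr": {"at": time(10, 30), "weekdays": {2}},  # Wednesdays
}


def next_release(feed: str, now_et: datetime) -> datetime:
    """Return the first scheduled release strictly after now_et.

    Assumption: now_et is already in Eastern Time (naive or tz-aware);
    the result carries the same tzinfo as the input.
    """
    sched = SCHEDULES[feed]
    for offset in range(8):  # a weekly cadence always lands within 8 days
        day = now_et.date() + timedelta(days=offset)
        candidate = datetime.combine(day, sched["at"], tzinfo=now_et.tzinfo)
        allowed = sched["weekdays"]
        if allowed is not None and candidate.weekday() not in allowed:
            continue
        if candidate > now_et:
            return candidate
    raise ValueError(f"no release found for feed {feed!r}")
```

A job could call `next_release("eia_wpsr", datetime.now(ZoneInfo("America/Toronto")))` and sleep until the returned time before polling the feed. Actual publication calendars can shift (for example around holidays), so the publisher's own release calendar should stay the source of truth.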
How to apply:
- Engagement scoping: when a client asks for a public dashboard, first map the underlying data to one or more OGL-Canada / Statistics Canada / ECCC MSC / EIA / NOAA sources
- If only scraping would work: name the trade-off explicitly in the proposal (legal, maintenance, and AI-citation costs)
- The attribution checklist (Reference: compliance-grade attribution checklist by open-data source) is non-negotiable on every dashboard
- The infrastructure stack (Reference: open-data ingestion stack for a 1-3 person SMB operation (2026) — under $50/mo realistic) is the default — total realistic cost $5-25/mo at the low end
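The scoping steps above can be sketched as a small pre-proposal check. This is an illustrative sketch under stated assumptions, not the attribution checklist itself: `DataSource`, its fields, and `scoping_issues` are hypothetical names, and the actual per-source attribution wording lives in the referenced checklist.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One dataset behind a client dashboard (hypothetical record shape)."""
    name: str
    publisher: str
    license: str = ""      # e.g. "OGL-Canada v2.0"; empty means no explicit licence
    attribution: str = ""  # attribution text that will appear on the dashboard
    scraped: bool = False  # True when no official feed covers this data


def scoping_issues(sources: list[DataSource]) -> list[str]:
    """List what must be resolved before the proposal goes out."""
    issues = []
    for s in sources:
        if s.scraped:
            # Rule: scraping is only acceptable with the trade-off named up front.
            issues.append(
                f"{s.name}: scraped source; name the legal, maintenance, "
                "and AI-citation costs in the proposal"
            )
            continue
        if not s.license:
            issues.append(f"{s.name}: no explicit licence recorded")
        if not s.attribution:
            issues.append(f"{s.name}: attribution text missing")
    return issues
```

An empty list from `scoping_issues` would mean every source is an official feed with a recorded licence and attribution text; anything else is a line item to settle during engagement scoping.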
Depends on
- reference OGL-Canada v2.0: worldwide royalty-free perpetual licence for commercial use
- reference hiQ v LinkedIn (9th Cir. Apr 2022): scraping publicly accessible data likely doesn't violate CFAA — but hiQ still settled for $500K
- reference Meta v Bright Data (Jan 2024, N.D. Cal.): Facebook/Instagram terms don't bar logged-off scraping of public data
- reference Profound (Aug 2024-Jun 2025, 680M citations): only 11% domain overlap between ChatGPT and Perplexity; 13.7% between AI Overviews and AI Mode
- reference Reference: open-data ingestion stack for a 1-3 person SMB operation (2026) — under $50/mo realistic
- reference Reference: compliance-grade attribution checklist by open-data source