RULE: Build Candid client data products on official open-data feeds — never on scraped sources

rule · Scope: business · Status: current

agency-methodology regulatory-compliance open-data

Created 2026-05-22

Rule: When building data products / public dashboards / "free tools" for Candid Creative clients, always use official open-data feeds with explicit licenses — never scraped sources, even when the scraping would be legally defensible.

Why:

Durability: OGL-Canada v2.0 grants a worldwide, royalty-free, perpetual licence that covers commercial use, so the right to use the data does not lapse. Scraped pipelines, by contrast, can break on any website redesign.
Legal certainty: hiQ v LinkedIn (9th Cir., Apr 2022) held that scraping publicly accessible data likely does not violate the CFAA. Meta v Bright Data (N.D. Cal., Jan 2024) held that Facebook/Instagram terms do not bar logged-off scraping of public data. Even so, hiQ still settled for $500K on contract grounds. Open-data feeds grant the use up front in an explicit licence, so there is no equivalent contract trap.
Operational hygiene: predictable release times (StatCan 8:30 ET daily, EIA WPSR 10:30 ET Wednesdays, MSC near-real-time) let you schedule ingestion, alerts, and automations around known publication slots. A scraper has no such schedule to rely on.
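The release cadence is what makes scheduled automations possible. The sketch below shows one way a job runner could compute the next publication slot for a feed. It is a minimal sketch under stated assumptions:

- The `SCHEDULES` table and `next_release` helper are hypothetical names.
- StatCan releases are assumed to land on working days (Mon-Fri) only.
- Statutory holidays and ad-hoc schedule shifts (EIA moves the WPSR after US federal holidays, for example) are not modeled.
- MSC is left out because it is near-real-time rather than slot-based.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# Eastern time; America/New_York shares offsets with Ottawa/Toronto.
EASTERN = ZoneInfo("America/New_York")

# Hypothetical schedule table built from the cadences named in this rule.
# Weekdays use Python's convention: Monday=0 ... Sunday=6.
# Assumption: StatCan releases on working days (Mon-Fri) only.
SCHEDULES = {
    "statcan_daily": (time(8, 30), frozenset({0, 1, 2, 3, 4})),
    "eia_wpsr": (time(10, 30), frozenset({2})),  # Wednesdays
}


def next_release(now: datetime, release_time: time, weekdays: frozenset) -> datetime:
    """Return the first scheduled release strictly after `now` (an aware datetime).

    Holidays and ad-hoc schedule shifts are NOT modeled.
    """
    local_now = now.astimezone(EASTERN)
    # Today plus the next 7 days always contains each weekday at least once more.
    for offset in range(8):
        day = local_now.date() + timedelta(days=offset)
        if day.weekday() not in weekdays:
            continue
        candidate = datetime.combine(day, release_time, tzinfo=EASTERN)
        if candidate > local_now:
            return candidate
    raise ValueError("weekdays must contain at least one day in 0..6")


if __name__ == "__main__":
    now = datetime(2026, 6, 10, 11, 0, tzinfo=EASTERN)  # a Wednesday, after 10:30
    for name, (release_time, weekdays) in SCHEDULES.items():
        print(name, next_release(now, release_time, weekdays).isoformat())
```

A cron job or task queue could sleep until the returned timestamp and then pull the feed. Scraping has no equivalent anchor, because nothing tells you when the page will change.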
AI citation strategy: Profound's study of 680M citations (Aug 2024-Jun 2025) found only 11% domain overlap between ChatGPT and Perplexity, and 13.7% between AI Overviews and AI Mode. Per the same study, Perplexity favors "named sources with verifiable methodology." A line like "We pulled this from the official StatCan WDS API" gives AI engines exactly that kind of named, verifiable source, which a scraped pipeline cannot offer.

How to apply:

Engagement scoping: when a client asks for a public dashboard, first map the underlying data to one or more OGL-Canada / Statistics Canada / ECCC MSC / EIA / NOAA sources
If only scraping would work: name the trade-off explicitly in the proposal (legal + maintenance + AI-citation costs)
The attribution checklist (Reference: compliance-grade attribution checklist by open-data source) is non-negotiable on every dashboard
The infrastructure stack (Reference: open-data ingestion stack for a 1-3 person SMB operation (2026)) is the default: realistic cost stays under $50/mo, and runs $5-25/mo at the low end
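The scoping step and the "name the trade-off" step fit together as a simple split. Each requested metric either maps to an official feed or is flagged as scrape-only, and any flagged metric triggers the trade-off note in the proposal. The sketch below is a minimal illustration under stated assumptions. `OFFICIAL_FEEDS`, `ScopingResult`, and `scope_dashboard` are hypothetical names. The catalogue entries are illustrative labels, not verified dataset identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue mapping requested dashboard metrics to official feeds.
# Keys and feed labels are illustrative, not verified dataset identifiers.
OFFICIAL_FEEDS = {
    "consumer_price_index": "Statistics Canada",
    "hourly_weather_observations": "ECCC MSC",
    "us_crude_oil_inventories": "EIA WPSR",
}


@dataclass
class ScopingResult:
    mapped: dict = field(default_factory=dict)       # metric -> official feed
    scrape_only: list = field(default_factory=list)  # metrics with no official feed

    @property
    def needs_tradeoff_note(self) -> bool:
        # Any unmapped metric means the proposal must name the scraping trade-off.
        return bool(self.scrape_only)


def scope_dashboard(requested_metrics, catalogue=OFFICIAL_FEEDS) -> ScopingResult:
    """Split requested metrics into feed-backed and scrape-only buckets."""
    result = ScopingResult()
    for metric in requested_metrics:
        source = catalogue.get(metric)
        if source is None:
            result.scrape_only.append(metric)
        else:
            result.mapped[metric] = source
    return result


if __name__ == "__main__":
    plan = scope_dashboard(["consumer_price_index", "competitor_menu_prices"])
    print(plan.mapped)
    print(plan.scrape_only, plan.needs_tradeoff_note)
```

In practice the catalogue would be the engagement's own source map. The point of the split is that the trade-off conversation happens at proposal time rather than after a scraper breaks.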

Depends on

rule RULE: Every non-trivial claim carries a named source with author/institution + date + URL. Confidence flag honest.

Referenced by (1)

reference Research brief: Public data as a private moat — building proprietary intelligence from government open data (piece 11 of 15) · relates-to
