{"id":581,"slug":"rule-build-on-official-open-data-not-scraping","title":"RULE: Build Candid client data products on official open-data feeds — never on scraped sources","kind":"rule","scope":"business","status":"current","audiences":["claude-code","dev","candid-team"],"topics":["agency-methodology","regulatory-compliance","open-data"],"reference_body":"**Rule:** When building data products / public dashboards / \"free tools\" for Candid Creative clients, always use **official open-data feeds with explicit licenses**, never scraped sources, even when the scraping would be legally defensible.\n\n**Why:**\n- **Durability:** OGL-Canada is \"perpetual\" (see [[ogl-canada-v2-perpetual-royalty-free-commercial]]). Scraped pipelines can break on any website redesign.\n- **Legal certainty:** hiQ v LinkedIn ([[hiq-v-linkedin-cfaa-public-data-scraping-2022]]) held that scraping publicly accessible data likely doesn't violate the CFAA, and Meta v Bright Data ([[meta-v-bright-data-logged-off-scraping-2024]]) held that Facebook/Instagram terms don't bar logged-off scraping of public data, but hiQ still settled for $500K on contract grounds. 
Open-data feeds have no equivalent contract trap.\n- **Operational hygiene:** A predictable release cadence (StatCan 8:30 a.m. ET daily, EIA WPSR 10:30 a.m. ET Wednesdays, MSC near-real-time) lets you build alerts and automations a scraper can't reliably support.\n- **AI citation strategy:** Per Profound's 680M-citation study ([[profound-680m-citations-perplexity-citation-behavior]]), Perplexity favors *\"named sources with verifiable methodology.\"* \"We pulled this from the official StatCan WDS API\" earns citations that a scraped source does not.\n\n**How to apply:**\n- Engagement scoping: when a client asks for a public dashboard, first map the underlying data to one or more OGL-Canada / Statistics Canada / ECCC MSC / EIA / NOAA sources.\n- If only scraping would work: name the trade-off explicitly in the proposal (legal + maintenance + AI-citation costs).\n- The attribution checklist ([[attribution-checklist-by-source]]) is non-negotiable on every dashboard.\n- The infrastructure stack ([[open-data-ingestion-stack-smb-2026]]) is the default; total realistic cost is $5-25/mo at the low end.","rationale_body":null,"metadata":null,"links":{"outgoing":[{"slug":"ogl-canada-v2-perpetual-royalty-free-commercial","title":"OGL-Canada v2.0: worldwide royalty-free perpetual licence for commercial use","kind":"reference","scope":"business","link_type":"depends-on"},{"slug":"hiq-v-linkedin-cfaa-public-data-scraping-2022","title":"hiQ v LinkedIn (9th Cir. Apr 2022): scraping publicly accessible data likely doesn't violate CFAA — but hiQ still settled for $500K","kind":"reference","scope":"business","link_type":"depends-on"},{"slug":"meta-v-bright-data-logged-off-scraping-2024","title":"Meta v Bright Data (Jan 2024, N.D. 
Cal.): Facebook/Instagram terms don't bar logged-off scraping of public data","kind":"reference","scope":"business","link_type":"depends-on"},{"slug":"profound-680m-citations-perplexity-citation-behavior","title":"Profound (Aug 2024-Jun 2025, 680M citations): only 11% domain overlap between ChatGPT and Perplexity; 13.7% between AI Overviews and AI Mode","kind":"reference","scope":"business","link_type":"depends-on"},{"slug":"open-data-ingestion-stack-smb-2026","title":"Reference: open-data ingestion stack for a 1-3 person SMB operation (2026) — under $50/mo realistic","kind":"reference","scope":"business","link_type":"depends-on"},{"slug":"attribution-checklist-by-source","title":"Reference: compliance-grade attribution checklist by open-data source","kind":"reference","scope":"business","link_type":"depends-on"},{"slug":"rule-cite-with-named-source-and-url","title":"RULE: Every non-trivial claim carries a named source with author/institution + date + URL. Confidence flag honest.","kind":"rule","scope":"business","link_type":"relates-to"}],"incoming":[{"slug":"research-brief-public-data-private-moat","title":"Research brief: Public data as a private moat — building proprietary intelligence from government open data (piece 11 of 15)","kind":"reference","scope":"business","link_type":"relates-to"}]},"created_at":"2026-05-22T20:32:45.259Z","updated_at":"2026-05-22T20:32:45.259Z"}