{"id":1171,"slug":"ai-overview-citation-patterns","title":"AI Overview citation patterns (GEO/AEO)","kind":"reference","scope":"marketing-site","status":"current","audiences":["kevin","claude-code","smb-owner","candid-team"],"topics":["ai-citation","geo","aeo","measurement"],"reference_body":"## Overview\n\n\"AI Overview citation patterns\" denotes the empirical question of which web pages get cited inside generative search surfaces — Google's AI Overviews and AI Mode, ChatGPT's web-grounded answers, Perplexity, Microsoft Copilot, and the smaller Anthropic / DeepSeek / Mistral surfaces — and what content, technical, and entity-graph properties predict that citation. The discipline is variously called Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO); Whitespark's 2026 *Local Search Ranking Factors* report was the first practitioner survey to add \"AI Search Visibility\" as a formal ranking category, marking the moment AI citation moved from a side-bet to a first-class line item in local SEO planning.\n\nThis page consolidates the directly measured evidence as of mid-2026. It draws on one peer-reviewed primary study (Princeton's GEO paper, Aggarwal 2024), one peer-reviewed observational study (Schanbacher's SSRN paper on German real-estate sites), and a set of high-volume vendor analyses (Ahrefs' 17 M-citation freshness study; Seer's 3,119-query CTR study; Profound's 680 M-citation cross-platform overlap analysis; OtterlyAI's 1 M-citation third-party split; Surfer SEO's 36 M-AI-Overview / 46 M-citation tracker; BrightEdge's AI Overview prevalence tracker; Digital Applied's 5,000-site schema audit). 
It also catalogues the rule-set Candid Creative applies in production — what gets shipped on every client site, what gets refused, and where the honest answer is \"the direct evidence is thin and you should optimize for customers, not for an unverified citation hypothesis.\"\n\nThree findings frame the rest of the page. First, the strongest single primary finding is from the Princeton GEO paper, where **Quotation Addition** lifted AI-response visibility by **+41% on Position-Adjusted Word Count** (PAWC), Statistics Addition by **+31%**, and Cite Sources and Fluency Optimization tied at **+28%**; the only tactic that *hurt* visibility was keyword stuffing, at -8% to -10%. Second, the strongest single revenue translation is Seer's September 2025 study: brands cited *inside* an AI Overview earn **35% more organic clicks and 91% more paid clicks** than uncited brands on the same SERPs, while the AI Overview itself collapses overall organic CTR by 61% and paid CTR by 68%. Third, the strongest single \"do not over-promise\" finding is Dan Taylor's January 2026 Search Engine Land analysis of 107,352 AI-Overview/AI-Mode webpages, which found Core Web Vitals carry only weak negative Spearman correlations with AI visibility (LCP r = -0.12 to -0.18, CLS r = -0.05 to -0.09) — i.e., CWV is a gate, not a growth lever. 
The honest synthesis is that content patterns (quotations, statistics, citations, freshness, structured BLUF formatting) do most of the work; schema is hygiene; speed is a constraint, not a signal.\n\nSee [[princeton-geo-paper-aggarwal-2024]], [[google-may-2026-ai-optimization-guidance]], [[brightedge-ai-overviews-48pct-feb-2026]], and [[query-fan-out-google-primary]] for the upstream primary sources, and [[top10-organic-ai-citation-decoupling-2026]] for the related decoupling story this page sits alongside.\n\n## The Princeton GEO paper (Aggarwal et al., 2024) — the canonical primary study\n\nThe Princeton GEO paper — Aggarwal et al., *Generative Engine Optimization*, arXiv:2311.09735 (v3) — is the only peer-reviewed primary study that tests discrete content tactics against AI-response visibility under controlled conditions. It evaluates nine tactics across a corpus of synthetic and natural queries, scoring each on two metrics: Position-Adjusted Word Count (PAWC), which weights citation share by where the cited passage appears in the AI response, and a subjective-impression metric. Table 6 of v3 reports the per-tactic lifts on PAWC. See [[princeton-geo-paper-aggarwal-2024]] for the canonical reference card.\n\n### Quotation Addition: +41% on PAWC (the top single tactic)\n\nQuotation Addition — inserting verbatim quoted passages from named external sources — was the **best-performing single tactic** at **+41% on PAWC**. The paper's abstract and §1 state: *\"Including citations, quotations from relevant sources, and statistics can significantly boost source visibility, with an increase of over 40% across various queries.\"* Confidence is Verified (primary read of paper v3).\n\nThere is a circulating attribution error in the SEO industry that credits *Statistics Addition* with the +41% number. The paper itself names **Quotation Addition** as the +41% tactic on PAWC. 
Candid's internal research-brief 1 flags this conflict explicitly; brief 2 inherits the industry confusion. The canonical reading is: use the paper, not the SEO blogs.\n\n### Statistics Addition: +31% on PAWC (the #2 tactic)\n\nStatistics Addition — inserting concrete numbers, percentages, dates, and currency values into prose — lifted PAWC by **+31%**, the second-best result in the study. The practical implication for KB-shaped content is direct: every claim that *can* carry a number should carry one. Vague qualifiers like \"many businesses\" or \"most of the time\" are precisely the low-lift pattern the paper measured. Confidence: Verified (primary).\n\n### Cite Sources: +28% on PAWC (tied with Fluency Optimization)\n\nCite Sources — adding named external citations comprising author + institution + URL — lifted PAWC by **+28%**. Fluency Optimization (rewriting for readability without changing substantive claims) tied at +28%. Confidence: Verified (primary).\n\nThe proposed mechanism is that AI retrievers extract more reliably from passages that anchor claims to a named institution + date. The signal is the same one Wikipedia's verifiability policy enforces — attribution makes a passage *extractable as a fact*, not as an opinion.\n\n### Keyword Stuffing: -8% to -10% (the only tactic that hurts)\n\nOf the nine tactics tested, **keyword stuffing was the only one that performed worse than baseline**, driving visibility down 8-10%. Practitioners carrying forward 2018-era SEO instincts — keyword density targets, exact-match anchors, semantic stuffing of variants — measurably *reduce* their AI citation eligibility. The discipline is now non-overlapping with old-school SEO on this lever. 
Confidence: Verified (primary).\n\n### Rank-5 pages gain +115.1% (the biggest interventional upside)\n\nThe paper's strongest leverage finding is that GEO treatments (citation, quotation, statistics addition, in combination) lifted AI-response visibility by **+115.1% for pages ranked around organic position 5**. Position-1 pages saw little change. The intervention compounds where the marginal page sits — the middle of page 1. Confidence: Verified. The implication for Candid Creative client work is that clients ranking #3-#8 organically have the biggest upside from structured / GEO content; clients already ranking #1 should still ship the GEO patterns (insurance + extractability), but the lift is smaller. See [[top10-organic-ai-citation-decoupling-2026]] for the related decoupling story showing that organic rank and AI citation are diverging more broadly.\n\n### Combined lifts\n\nCombined Fluency + Statistics reached +35.8% on PAWC. The paper does not report all pairwise combinations, but the additive read is conservative — Candid's working assumption is that stacking Quotation + Statistics + Cite Sources on the same page is the durable pattern. The cap on combined lift is not measured.\n\n## Freshness as a measured citation signal\n\n### Ahrefs (2025, 17 M citations): AI-cited content is 25.7% fresher on publish date\n\nAhrefs analyzed 17 million AI citations and found AI-cited URLs averaged **1,064 days since publication** versus **1,432 days for organic SERP results** — a **25.7% freshness advantage** on publish date. On last-updated date the advantage narrowed to **13.1%** (909 days vs 1,047). Source: <https://ahrefs.com/blog/do-ai-assistants-prefer-to-cite-fresh-content/>. Confidence: Verified.\n\nA separate \"67% more citations for recently updated pages\" figure has circulated in some 2026 SEO writeups but could not be located in the Ahrefs source. The defensible numbers are 25.7% (publish) and 13.1% (updated) — not 67%. 
The finding pairs with [[seer-content-recency-2025]]: two independent datasets confirm the recency bias. A `dateModified` field that updates on real content changes is structurally beneficial; a `dateModified` that updates on cosmetic CSS changes is gaming and likely to be discounted.\n\n### Whitehat SEO (2026): 3.2× more citations within 30 days\n\nWhitehat SEO's 2026 cross-platform analysis reports: *\"Content updated within 30 days earns 3.2× more AI citations across platforms.\"* Perplexity is the most freshness-sensitive — **82% citation rate for 30-day content vs 37% for older**. Source: <https://whitehat-seo.co.uk/blog/ai-engines-comparison-citations>. The directional consistency with the Ahrefs and Seer findings is the load-bearing point; the absolute multipliers are vendor-published and should be flagged as such.\n\n### Profound (Aug 2024-Jun 2025, 680 M citations): cross-platform overlap is low\n\nProfound's analysis of **680 million citations** across the Aug 2024-June 2025 window found that **only 11% of domains are cited by both ChatGPT and Perplexity**, and Google AI Overviews and Google AI Mode cite the same URLs only **13.7%** of the time. Source: <https://www.tryprofound.com/blog/ai-platform-citation-patterns>. Confidence: Verified. The strategic read is that citation strategy is not single-target — winning on Perplexity does not automatically win on ChatGPT, and winning on AI Overviews does not automatically win on AI Mode.\n\nCompanion citation-volume data from Qwairy Q3 2025: **Perplexity averages 21.87 citations per response; ChatGPT averages 7.92** (<https://www.qwairy.co/blog/provider-citation-behavior-q3-2025>). 
The combination — high citation volume, freshness sensitivity, preference for proprietary data — makes Perplexity the platform where a normalized open-data dashboard or original-research page is most likely to be cited.\n\n## Schema and structured data as citation signals — the contested zone\n\nThe evidence on schema is the most internally contradictory part of the 2025-2026 AI-citation literature. Three findings sit in tension; the reconciliation matters for how the question is framed to clients.\n\n### Schanbacher (SSRN 2025): FAQPage and Product schema strongly predict ChatGPT visibility\n\nSchanbacher (SSRN paper id 5641050) studied **1,508 German real-estate agent websites** and found:\n\n- **FAQPage schema:** odds ratio ~**13** for ChatGPT visibility (p<0.001)\n- **Product schema:** odds ratio ~**4** (p<0.001) — sites with Product schema had **17.2% visibility vs 1.8% without** (~10× lift)\n- **Mobile-friendly:** OR ~5.2\n- **robots.txt present:** OR ~3.4\n- **Multi-level headings:** h2 OR ~3.3, h3 OR ~2.3\n\nConfidence: Verified (peer-reviewed) — but **single domain** (real estate) and **single country** (Germany). Generalizability across industries/languages is unverified.\n\n### Ahrefs (April 2026): no AI-citation lift from adding JSON-LD on already-cited pages\n\nIn direct counterpoint, Ahrefs' April 2026 controlled study on 1,885 pages found **no AI-citation lift from adding JSON-LD**; on AI Overviews specifically, treated pages declined 4.6% more than controls. See [[ahrefs-schema-no-citation-lift-2026]].\n\n### Reconciliation: Schanbacher measures acquisition; Ahrefs measures increment\n\nThe two studies are not actually in conflict once the populations are read carefully. Ahrefs measured the *increment* on pages that were already cited 100+ times. Schanbacher measured *acquisition* on a population where many pages had zero AI visibility. 
Both findings can be true: schema helps you cross from \"invisible\" to \"cited\" but does not help you go from \"cited\" to \"cited more.\" This reading aligns with Suganthan Mohanadasan's \"three lives of schema\" framing — see below.\n\n### Suganthan: schema has three lives — index-time, training-time, query-time\n\nSuganthan Mohanadasan argues schema operates in **three layers**: (1) **index-time** — Google Knowledge Graph entity disambiguation; (2) **training-time** — canonical entity stores that feed next-generation LLMs (Wikidata, Schema.org corpus); (3) **query-time** — real-time fetch by AI agents at the moment of a user query. Single-experiment studies like Ahrefs' April 2026 result only measure layer 3. Layers 1 and 2 compound over years, which is the point most \"schema is dead\" hot takes miss. Source: <https://suganthan.com/blog/three-lives-of-schema-markup/>. Confidence: Single-source synthesis (analytical framing, not empirical study).\n\nThis is the cleanest counter-frame to \"Ahrefs proved schema is dead.\" When a client cites the Ahrefs result as a reason not to invest in schema, the honest position is: Ahrefs measured one slice; the other slices accumulate value the experiment can't see.\n\n### Digital Applied 5 K-site audit (April 2026): 22% schema validity, r=+0.34 with AI citation\n\nDigital Applied audited 5,000 production sites in April 2026 and found that **71%** deploy at least one schema type, but only **22%** pass Google's Rich Results Test cleanly. Pearson correlation between clean-validation rate and AI-citation frequency: **r = +0.34**. Source: <https://digitalapplied.com/blog/schema-markup-adoption-5k-site-audit-2026>. Confidence: Single-source (vendor-published; methodology not independently audited).\n\nThe implication is that \"has schema\" is a useless metric and \"has *valid* schema\" is the one worth tracking, with the caveat that r = +0.34 is a moderate correlation, not a demonstrated causal lever. 
Most installed-base schema is broken — missing required properties, conflicting nested types, scraping-error strings in title fields. The Candid rule is to ship validated schema, not just emit `<script type=\"application/ld+json\">` and hope.\n\n### Google's own guidance: schema is not required for AI Overviews\n\nGoogle's official May 15, 2026 guidance states that schema markup is **not required** for AI Overviews. See [[google-schema-not-required-for-ai]] and [[google-may-2026-ai-optimization-guidance]]. This is consistent with the reconciliation above: schema is not a citation-acquisition cheat code, but it is necessary infrastructure for entity disambiguation, rich-result eligibility, and Knowledge-Graph membership.\n\n### Whitespark 2026: structured data as a formal AI-visibility input\n\nWhitespark's 2026 *Local Search Ranking Factors* report (surveyed 47 local SEO experts, published late 2025 / early 2026) added **\"AI Search Visibility\"** as a formal ranking category for the **first time**. Structured data, consistent citations, and curated list mentions are named as direct AI-visibility inputs. *\"Dedicated page for each service\"* ranks #1 in Local Organic factors and **#2 in AI Visibility factors**. Reputation.com's summary: *\"Structured content — clear GBP fields, accurate website copy, and schema markup — gives AI systems the context needed to generate accurate answers.\"* Source: <https://whitespark.ca/local-search-ranking-factors/>. Confidence: Verified for the category addition; weight estimates (proximity ~55%, GBP ~32%, reviews 16-20%) are practitioner-estimated industry-consensus, not measured experimentally.\n\nThe implication for an SMB clinic, contractor, or professional-services firm: a two-location operation with `LocalBusiness` schema + one page per service + a structured FAQ library will outperform a single-page prose site at AI citation, even at equal word count. 
Structured data is now formally named as an AI-visibility input in the leading practitioner survey, not just an SEO nice-to-have.\n\n### Schanbacher / FAQPage / Product — the practical takeaway\n\nThe single most actionable schema finding for SMB sites is the Schanbacher result on FAQPage and Product. Even with the single-vertical caveat, the odds-ratio magnitude (OR ~13 for FAQPage, OR ~4 for Product) is large enough that the cost-benefit on a well-structured FAQ block is overwhelmingly positive — JSON-LD is non-render-blocking, payload is 1-10 KB per page, and the worst case is \"no measurable lift\" rather than \"harm.\"\n\n## Schema for contractor sites — 2026 Google guidance\n\nTranslated to the contractor vertical specifically, the 2026 Google guidance stack is:\n\n- **`GeneralContractor` / `HomeAndConstructionBusiness` / `LocalBusiness`** — the primary business-entity schema. Pick the most specific applicable type.\n- **`Service` schema per service line** — kitchen renovation, basement finishing, custom home, etc.\n- **`Review` and `AggregateRating`** — subject to Google's authenticity requirements (reviews must be first-party, not third-party-collected).\n- **`Person` schema for principals** — with `sameAs` links to LinkedIn for E-E-A-T credentialing; pair with `hasCredential` for Gold Seal / P.Eng. / PMP.\n- **`FAQPage` schema on service pages** — surfaces in AI Overviews and zero-click features.\n- **`Article` schema on case studies** — with `datePublished`, `image`, `contentLocation` for the project address, and photographer credit where applicable. (schema.org's `Project` type is an `Organization` subtype, not an `Article` subtype, so it is not the right wrapper for a case-study page.)\n\nSources: schema.org; Google Search Central. Confidence: Verified for the schema types; Industry-consensus for the contractor-vertical adoption recommendations.\n\nBased on HTTP Archive structured-data tracking, **contractor sites lag the broader local-business average by roughly 15 percentage points** in schema adoption — a directional finding (HTTP Archive 2024/2025 structured-data chapter; not vertical-isolated). 
With **45%** of consumers now using ChatGPT/AI for local recommendations (BrightLocal LCRS 2026), schema-rich pages are increasingly the ones that get cited. The aphorism: schema-rich content gets cited; HomeStars profile pages do not.\n\nThe complete schema build-out for a Tier-2 Ontario ICI (industrial, commercial, institutional) general-contractor client is roughly:\n\n1. `Organization` (root, in layout)\n2. `LocalBusiness` / `GeneralContractor` per office location\n3. `Service` per service-line page\n4. `Person` + `hasCredential` per team-bio\n5. `FAQPage` per service page\n6. `Article` per case study (project write-up)\n7. `BreadcrumbList` site-wide\n8. `Organization` + `memberOf` for HBA affiliations on the `/affiliations` page\n\n## The technical preconditions — server-render and crawler access\n\nThe single largest hidden barrier to AI citation is not content quality; it is technical reachability. OtterlyAI's 1 M-citation study found **73% of audited sites have technical barriers** (robots.txt blocks, JS-only rendering) preventing AI crawler access. The fix is non-optional; it is the floor below which content-level optimization does nothing.\n\n### Server-render rule\n\nEvery page of every Candid client site must serve **rendered HTML at the URL** — either statically generated (Next.js export, Astro, Hugo) or server-side rendered (Next.js App Router with server components, classic SSR). **Never** ship a single-page-app shell where content arrives via client-side JavaScript.\n\nAI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) **do not execute JavaScript** at scale. See [[ai-crawlers-do-not-execute-js]]. A site that requires JS to render its content is **invisible to AI engines** — and increasingly invisible to AI-augmented search overall, since 48% of queries now trigger an AI Overview. See [[brightedge-ai-overviews-48pct-feb-2026]].\n\nCandid's default stack is Next.js 15 App Router with server components — every page renders to HTML at build time or on the server. 
Client components (`\"use client\"`) are reserved for interactivity; the *content* is always in the server-rendered tree. For non-Next.js client sites, Candid migrates to a static-generation tool before any AI-visibility work begins. There are no exceptions.\n\n### Schema as hygiene, not growth lever\n\nCandid's working rule: always ship clean, validated schema (`LocalBusiness`, `Organization`, `Service`, `Product`, `Article`, `BreadcrumbList`, `FAQPage` where genuinely applicable) on every client site. **Never** position schema to a client as the lever that will move AI citations on its own.\n\nFour findings, read together, support this stance. Two say schema does not add citations by itself: Google's May 15, 2026 guidance says schema is *not required* for AI Overviews ([[google-schema-not-required-for-ai]]), and Ahrefs' April 2026 controlled study on 1,885 pages found no AI-citation lift from adding JSON-LD, with treated pages declining 4.6% more than controls on AI Overviews ([[ahrefs-schema-no-citation-lift-2026]]). Two say it still matters: the Schanbacher peer-reviewed real-estate study shows Product schema correlates with roughly 10× ChatGPT visibility on *acquisition*, and Whitespark 2026 added AI Search Visibility as a formal category with structured data named as a direct input. Schema is **necessary infrastructure** for entity disambiguation, rich results, and Knowledge Graph membership. It is **not a citation hack**. The citation lever is the *content patterns* above schema: quotations, statistics, citations, freshness, comprehensive topic coverage.\n\nHow Candid applies it: every client site emits validated schema by default — generated programmatically from the same data source as the visible content, so the two never drift. Schema cost is essentially zero (JSON-LD is non-render-blocking, 1-10 KB per page). 
The argument made to clients is \"this is the foundation; the citation work lives on top.\"\n\n### llms.txt — skeptical\n\nOtterlyAI's 90-day measurement reported that **84 of 62,100 AI-bot requests (0.1%)** targeted /llms.txt files — *worse* than an average content page on the same domains. Source: Kai Spriestersbach, \"The llms.txt is dead,\" Medium. Confidence: Single-source, corroborated qualitatively by reports that **Google added llms.txt to its docs in December 2024 then removed it within 24 hours**.\n\nThe /llms.txt proposal was published by Jeremy Howard (Answer.AI) on September 3, 2024 — a curated Markdown file at the website root specifically for LLM retrieval (<https://answer.ai/posts/2024-09-03-llmstxt.html>). Mintlify rolled it out across all hosted docs sites in November 2024, making thousands of sites llms.txt-aware \"practically overnight.\" The traffic data has not followed.\n\nCandid's stance: don't ship llms.txt as a citation strategy. If you ship it at all, ship it as a courtesy — the durable value is structured content on the actual pages, not the index file. Candid's research briefs recommend acknowledging this skepticism early to \"inoculate against trend-chasing.\"\n\n## Who gets cited — the third-party / first-party split\n\n### OtterlyAI (Sept 2025): community platforms capture 52.5% of citations\n\nOtterlyAI's analysis of 1 M+ AI citations (Jan-Sept 2025) across ChatGPT, Perplexity, and Google AI Overviews found:\n\n- **52.5%** of citations go to community platforms (Reddit, Quora)\n- **47.5%** go to brand domains\n- **73%** of audited sites have technical barriers preventing AI crawler access\n\nSource: <https://otterly.ai/blog/the-ai-citations-report-2026/>. Confidence: Verified (vendor-published; methodology disclosed).\n\nIndustry headline framing — *\"AI Search Engines Depend 95% on Third-Party Sources\"* — is rhetorically inflated. 
The actual split is closer to 50/50, but the 47.5% brand share is concentrated among large publishers, not SMB sites.\n\nThe strategic implication is a two-track posture. (1) Make the client's own site AI-crawlable (server-render rule, schema hygiene, freshness signals). (2) Build presence on Reddit, Quora, YouTube, and authoritative third-party publications, because that is where roughly half the citations live.\n\n### Wikipedia, YouTube, Reddit: the top-cited domains\n\nAhrefs' Q1 2026 cross-platform data places Wikipedia as the **#1 cited domain in Google AI Mode** at **11.22%** of all tracked mentions; YouTube is #2 at **9.51%**. Surfer SEO Tracker (March-Aug 2025, 36 M AI Overviews, 46 M citations) reports YouTube ~23.3%, Wikipedia ~18.4%, Google.com ~16.4%. Sources: Surfer SEO (<https://surferseo.com/blog/ai-citation-report/>); The Digital Bloom (<https://thedigitalbloom.com/learn/google-ai-overviews-top-cited-domains-2025/>); Ahrefs Q1 2026. Confidence: Industry-consensus.\n\nThe common thread among top-cited domains is **structured, neutral, schema-rich content with explicit citation chains**. Wikipedia operationalizes verifiability; YouTube's metadata (description, chapters, transcripts) is structurally extractable; Reddit's thread structure surfaces atomic Q&A. This is the empirical case for KB-shaped content: dense, neutral, encyclopedic, factually attributed, structurally extractable.\n\n### Profound: cross-platform fragmentation\n\nThe 11% ChatGPT/Perplexity domain overlap and 13.7% AI Overviews / AI Mode URL overlap from the 680 M-citation Profound dataset mean that citation strategy cannot be single-target. A page that wins on ChatGPT (which favours Wikipedia / authoritative-publisher patterns) will often *not* win on Perplexity (which favours Reddit / community / proprietary-data patterns) without separate effort.\n\nPerplexity in particular favours *\"visible statistics and proprietary data, named sources with verifiable methodology\"* (Leapd 2026). 
A normalized open-data dashboard surfaced as a public page hits all three — visible statistics, proprietary normalization, named methodology — which is one of the reasons \"build on official open data, not scraping\" is a Candid pattern.\n\n## Citation behaviour by platform\n\n| Platform | Avg. citations / response | Notable bias |\n|---|---|---|\n| Perplexity | 21.87 (Qwairy Q3 2025) | Freshness-sensitive — 82% citation rate for 30-day content vs 37% for older (Whitehat SEO 2026). Reddit-heavy. |\n| ChatGPT | 7.92 (Qwairy Q3 2025) | Wikipedia-heavy. Often cites pages ranking at organic position 21+ in related Google queries (~90% of the time, per Semrush). |\n| Google AI Overviews | (volume varies) | 54.5% of citations match top organic URLs (BrightEdge Oct 2025, up from 32% in 2024); overlap >75% in YMYL sectors. |\n| Google AI Mode | (volume varies) | Wikipedia #1 at 11.22%; YouTube #2 at 9.51% (Ahrefs Q1 2026). |\n| Microsoft Copilot | (not tracked here) | Bing-aligned. |\n\nThe honest claim is that platform-specific source preferences are real and durable: Reddit-heavy for Perplexity, Wikipedia-heavy for ChatGPT, Bing-aligned for Microsoft Copilot.\n\n## Direct evidence on technical signals — what is and is not a lever\n\n### Core Web Vitals are a gate, not a signal of excellence\n\nDan Taylor (SALT.agency), writing in Search Engine Land on January 13, 2026, analyzed **n=107,352 webpages** in AI Overviews / AI Mode. Spearman correlations: **LCP r = -0.12 to -0.18 (weak negative); CLS r = -0.05 to -0.09**. Verbatim conclusion: *\"Core Web Vitals do not act as a growth lever for AI visibility. They act as a constraint. Good performance does not create an advantage. Severe failure creates disadvantage… Core Web Vitals are therefore best understood as a gate, not a signal of excellence.\"*\n\nConfidence on the Taylor analysis: **LOW** — no methodology disclosure, no published dataset, single-contributor analysis on a Semrush-owned property. 
The directional read (CWV as gate) is consistent with broader evidence but the magnitudes should not be quoted as load-bearing.\n\n### Indirect evidence (moderate): overlap with organic top-10\n\n- **seoClarity:** *\"a whopping 99.5% of the time one or more of the top 10 web results was included in the AI Overview's sources.\"*\n- **Ahrefs (July 2025):** *\"76.1% of URLs cited in AI Overviews also rank in the top 10 of Google search results.\"*\n- **Semrush AI Mode Comparison (2025, n=5,000 keywords):** Perplexity has *\"over 91% domain overlap and 82% URL overlap\"* with Google's top 10; ChatGPT has the weakest overlap. A separate Semrush study found ChatGPT cites pages ranking at position 21+ in related Google queries ~90% of the time.\n- **BrightEdge (Oct 2025):** 54.5% of citations in AI Overviews match top organic URLs (up from 32% in 2024); overlap >75% in YMYL sectors.\n\nThe honest answer for clients: optimize performance for customers and conversion, not for hypothetical AI citation gains. Citation behaviour is driven by content quality, domain authority, freshness, structured BLUF formatting, and platform-specific source preferences. **Speed is not a known direct lever; it is at most an indirect floor via organic rank.**\n\n## Revenue translation — what AI citation is worth\n\n### Seer (Sept 2025): -61% organic CTR from AI Overviews, +35% for cited brands\n\nSeer Interactive's September 2025 study of **3,119 queries / 42 organizations / 25.1 M organic impressions** found:\n\n- Organic CTR fell **61%** (1.76% → 0.61%) when an AI Overview appeared on a SERP.\n- Paid CTR fell **68%**.\n- But brands cited *inside* an AI Overview earned **+35% organic clicks and +91% paid clicks** versus uncited brands on the *same* SERPs.\n\nSource: Seer Interactive, September 2025 — <https://www.seerinteractive.com/insights/study-ai-brand-visibility-and-content-recency>; Inc.com analysis. 
Confidence: Verified.\n\nThe strategic implication for client books: the CTR collapse from AI Overviews is real (-61% organic). The halo from being cited *inside* one is also real (+35% / +91%). The middle position — visible on the SERP but not cited in the AI Overview — is the worst place to be. This is the strongest \"AI visibility matters in revenue terms\" claim available in the 2025-2026 literature.\n\n### AI Overview prevalence\n\nBrightEdge's tracking reports that **48% of queries triggered an AI Overview as of February 2026**. See [[brightedge-ai-overviews-48pct-feb-2026]]. The CTR-collapse / cited-brand-halo dynamic is therefore live on roughly half of tracked queries and is the dominant revenue story underneath the citation question.\n\n### Whitespark 2026: AI Search Visibility as a formal category\n\nThe Whitespark 2026 addition of AI Search Visibility as a formal category is itself a revenue signal — the practitioners closest to local-SEO budgets are treating AI citation as a budgeted line item, not a side experiment. *\"Dedicated page for each service\"* ranks #2 in AI Visibility factors, behind only proximity-style signals. The directional read is that the information-architecture (IA) pattern most aligned with AI citation is also the pattern most aligned with local-organic ranking — there is no large trade-off between optimizing for one and the other.\n\n## Vertical signal — legal as a leading-edge AI adopter\n\nClio's *Legal Trends* reports show AI adoption among legal professionals accelerating sharply: **19% in 2023 → 79% in October 2024 → 93% among mid-sized firms in April 2025**. Clio's CEO Jack Newton: *\"AI has reached the level of adoption the cloud took a decade to obtain.\"* Companion finding: **64% of mid-sized firms offer flat-fee billing** (specifically flat-fee, not all alternative fee structures).\n\nSources: <https://www.clio.com/about/press/clio-latest-legal-trends-report/>; Clio 2025 *Legal Trends for Mid-Sized Law Firms* (April 2025). 
Confidence: Verified — **vendor-published, flag**. Clio has a commercial interest in legal-tech adoption stats, but the direction is consistent across other 2024-25 legal-tech surveys.\n\nThe relevance to AI citation: high-AI-adoption verticals are also the verticals where end-users are most likely to use AI tools to find professional-services providers, which is why the citation pattern matters earliest and most acutely in legal. Flat-fee billing is structurally different from hourly billing — you cannot price a flat fee without historical data on per-matter cost. The 64% flat-fee figure signals that mid-sized law firms are doing the engagement-profitability analysis that vertical SaaS supports but few firms historically had the structured data for. The data layer + AI overlay is what makes flat-fee viable, and the citation question (does the firm's page get surfaced when a prospect asks ChatGPT \"best Toronto employment lawyer for severance review\") sits directly on top of that data-layer maturity.\n\n## Query Fan-Out — Google's primary retrieval mechanism\n\nUnderneath all of the above sits Google's *Query Fan-Out* — the mechanism by which a single user prompt is decomposed into many sub-queries that each pull candidate URLs, with the final AI response synthesized from the union. See [[query-fan-out-google-primary]] for the reference card. The practical reading is that \"rank for the query\" is a 2010s frame; \"be surfaced by at least one of the fan-out sub-queries\" is the 2026 frame. Comprehensive topic coverage — multiple sub-pages per service, FAQ blocks for adjacent intents, glossary entries for entity disambiguation — is the IA pattern that wins fan-out. 
This is the structural reason the KB-shaped site beats the single-prose-page site at equal word count, and the structural reason Whitespark's *\"dedicated page for each service\"* ranks #2 in AI Visibility factors.\n\n## What the direct evidence does and does not say\n\nThere is a fair-minded synthesis in mid-2026 industry reporting that the direct, controlled evidence on AI search citation remains thin. The Princeton GEO paper is the only primary peer-reviewed study; Schanbacher is peer-reviewed but single-vertical / single-country; the rest is vendor-published observational analysis with disclosed methodology but variable sample composition and zero reproducibility infrastructure. See [[ai-search-citation-direct-evidence-thin-2026]] and [[top10-organic-ai-citation-decoupling-2026]] for the broader honesty note.\n\nThe defensible operational position:\n\n1. **Ship the content patterns.** Quotations, statistics, citations, freshness, BLUF formatting, comprehensive topic coverage — these are the primary-study-backed levers (Aggarwal 2024) and they also serve human readers.\n2. **Ship the technical floor.** Server-render, validated schema, crawler access, fresh `dateModified` on real changes — these prevent disqualification and accumulate value in schema's index-time and training-time layers.\n3. **Build the third-party surface.** Reddit / YouTube / authoritative-publisher presence, because roughly half of AI citations live there (OtterlyAI: 52.5% community vs 47.5% brand).\n4. **Do not over-promise.** Speed is a gate, not a lever. Schema is hygiene, not a hack. llms.txt is courtesy, not a strategy. Direct evidence on most \"AI optimization\" claims is thinner than the trade press suggests.\n5. **Measure where measurement is possible.** Seer's -61% / +35% / +91% framing is the closest thing to a defensible revenue case. 
Track cited-vs-uncited share on the client's own commercial queries, not abstract \"AI visibility scores.\"\n\n## Sources and confidence\n\n- **Verified (primary peer-reviewed):** Aggarwal et al., *Generative Engine Optimization*, arXiv:2311.09735 v3, Table 6 (Quotation Addition +41%, Statistics +31%, Cite Sources +28%, Fluency +28%, Combined Fluency+Statistics +35.8%, Keyword Stuffing -8 to -10%, Rank-5 pages +115.1%). Schanbacher, *The Impact of JSON-LD Metadata on ChatGPT Visibility*, SSRN paper id 5641050 (FAQPage OR ~13, Product OR ~4, mobile OR ~5.2, robots.txt OR ~3.4, h2 OR ~3.3, h3 OR ~2.3) — single-vertical, single-country caveat.\n- **Verified (vendor-published, methodology disclosed):** Ahrefs 17 M-citation freshness study (25.7% / 13.1%); Ahrefs Q1 2026 cross-platform domain data (Wikipedia 11.22%, YouTube 9.51% in AI Mode); Seer Interactive Sept 2025 (n=3,119 queries / 42 orgs / 25.1 M impressions; -61% organic CTR, -68% paid CTR, +35% / +91% for cited brands); Profound 680 M-citation study (11% ChatGPT/Perplexity overlap, 13.7% AI Overviews / AI Mode overlap); OtterlyAI 1 M-citation analysis (52.5% community, 47.5% brand, 73% with technical barriers); Surfer SEO Tracker (36 M AI Overviews / 46 M citations); BrightEdge Oct 2025 (54.5% overlap with top organic, up from 32%); Qwairy Q3 2025 (Perplexity 21.87 / ChatGPT 7.92 citations per response).\n- **Verified — Google primary and industry trackers:** Google May 15, 2026 AI optimization guidance (\"schema not required for AI Overviews\"); BrightEdge Feb 2026 (48% of queries trigger an AI Overview); Whitespark 2026 *Local Search Ranking Factors* (AI Search Visibility added as formal category; n=47 expert survey).\n- **Verified — vendor-published, flag commercial interest:** Clio *Legal Trends* (AI adoption 19% → 79% → 93%; 64% flat-fee among mid-sized firms); Whitehat SEO 2026 (3.2× freshness multiplier; Perplexity 82% vs 37% citation rate by recency).\n- **Industry-consensus:** Surfer SEO and The Digital Bloom 
top-cited-domain summaries; Suganthan Mohanadasan three-lives-of-schema framing (single-source synthesis); contractor-vertical schema build-out (schema.org + Google Search Central + HTTP Archive directional adoption data); Reputation.com summary of Whitespark structured-content quote.\n- **Single-source / Directional:** Digital Applied 5 K-site audit (71% deploy schema, 22% valid, r=+0.34 — vendor, methodology not independently audited); OtterlyAI llms.txt 0.1% traffic measurement; Dan Taylor / SALT.agency CWV analysis on n=107,352 webpages (LCP r=-0.12 to -0.18, CLS r=-0.05 to -0.09 — no methodology disclosure, no published dataset, single-contributor on Semrush-owned property).\n- **Corrections / rejected industry framings:** Princeton +41% credited to Statistics Addition in many SEO blogs — the paper itself names Quotation Addition. Ahrefs \"67% more citations for recently updated pages\" cannot be located in the source; use 25.7% (publish) / 13.1% (updated). OtterlyAI \"AI Search Engines Depend 95% on Third-Party Sources\" framing is rhetorically inflated; the actual split is 52.5% / 47.5%.\n\nSee [[princeton-geo-paper-aggarwal-2024]], [[seer-content-recency-2025]], [[ahrefs-schema-no-citation-lift-2026]], [[google-schema-not-required-for-ai]], [[google-may-2026-ai-optimization-guidance]], [[ai-crawlers-do-not-execute-js]], [[top10-organic-ai-citation-decoupling-2026]], [[brightedge-ai-overviews-48pct-feb-2026]], and [[query-fan-out-google-primary]] for the upstream reference cards.","rationale_body":"Consolidated topic page absorbing 20 atomic source entries per KB-CONSOLIDATION-PLAN.md (2026-06-11).","metadata":{"kb_role":"topic","word_count":5276,"last_updated":"2026-06-11","absorbed_count":20},"links":{"outgoing":[],"incoming":[]},"created_at":"2026-06-11T13:50:19.508Z","updated_at":"2026-06-11T13:50:19.508Z"}