{"id":29,"slug":"extractability-paragraph-shape-40-60-words","title":"Extractability: a quotable paragraph leads with the answer, is 40-60 words, lives under semantic HTML, and names entities concretely","kind":"reference","scope":"business","status":"current","audiences":["claude-code","candid-team"],"topics":["geo","content-architecture","extractability"],"reference_body":"**Synthesis** of the GEO paper + Ahrefs content-helper findings + RAG chunking literature + the Digital Bloom 2025 report:\n\nA paragraph is extractable when it:\n1. Leads with the answer, not the setup\n2. Is self-contained (a reader landing here understands it without prior context)\n3. Names entities concretely (proper nouns, dates, statistics, prices)\n4. Cites a source with a named author or institution\n5. Lives in clear semantic HTML: under a meaningful `<h2>`/`<h3>`, not inside a carousel `<div>`\n6. Is **40-60 words** (matches featured-snippet research showing that 45-word paragraph snippets appear most frequently)\n7. Includes a statistic or a direct quotation\n\nA paragraph is invisible when it: hedges (\"could\", \"may\"); references \"above\" / \"below\" (chunkers strip the surrounding context those words point to); names entities vaguely (\"several companies say…\"); lives in a JS-rendered component or a PDF without a text fallback; is duplicated across many pages without adding unique facts.\n\n**Sources:** Princeton GEO paper (see [[princeton-geo-paper-aggarwal-2024]]); Digital Bloom 2025 AI Citation & LLM Visibility Report; averi.ai LLM-Optimized Content Structures guide; HiChunk / W-RAC / hierarchical segmentation studies (arXiv 2024-2026).\n\n**Confidence:** Industry-consensus.\n\n**Used in:** [[rule-lead-with-answer-40-60-words]].","rationale_body":null,"metadata":null,"links":{"outgoing":[{"slug":"princeton-geo-paper-aggarwal-2024","title":"Princeton GEO paper (Aggarwal et al., KDD '24) — the foundational generative engine optimization study","kind":"reference","scope":"business","link_type":"relates-to"}],"incoming":[{"slug":"rule-lead-with-answer-40-60-words","title":"RULE: Lead paragraphs with the direct answer. Aim for 40–60 words. Make every paragraph self-contained.","kind":"rule","scope":"business","link_type":"depends-on"}]},"created_at":"2026-05-22T18:57:39.589Z","updated_at":"2026-05-22T18:57:39.589Z"}