Research brief: The knowledge-base-backed website (piece 3 of 15)

Status: Research material, not a finished article. For Candid Creative KB. Compiled May 22, 2026.

TL;DR

  • KB-backed websites separate research from publication by treating sources, claims, and definitions as typed, citable nodes in a structured content graph. The dominant 2026 pattern is Markdown on disk with frontmatter, validated against a Zod or JSON Schema definition and published through a static-site build (Astro Content Collections, Quartz, Docusaurus, MDX on Next.js).
  • The strongest institutional exemplars are not personal "digital gardens" but Stripe docs, Our World in Data, the Stanford Encyclopedia of Philosophy, Cochrane Library, OpenAlex, Semantic Scholar, Bellingcat's OSINT toolkit, CourtListener, and Anthropic's Transformer Circuits Thread — all production sites with provenance, dating, and versioning baked into the methodology.
  • The empirical AI-citation case is real but narrow: Aggarwal et al. (KDD '24) showed 30-40% visibility lifts from adding citations, statistics, and quotations, and a 115.1% lift for rank-5 pages, but /llms.txt itself sees near-zero AI-bot traffic in measurement. The durable value is structure on the page, not the index file.
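
A minimal sketch of the "typed, citable node" idea from the first bullet, written in plain Python rather than Zod or JSON Schema so that it runs without dependencies. The field names (id, type, title, retrieved) and the three node types are assumptions made for illustration, not a schema taken from any of the tools named above, and the parser handles only flat key: value frontmatter.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical schema: these node types and field names are illustrative only.
ALLOWED_TYPES = {"source", "claim", "definition"}
REQUIRED_FIELDS = {"id", "type", "title", "retrieved"}


@dataclass(frozen=True)
class Node:
    id: str
    type: str
    title: str
    retrieved: date


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat `key: value` pairs between the leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue  # tolerate blank lines inside the frontmatter
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ValueError("missing closing frontmatter fence")


def validate_node(text: str) -> Node:
    """Reject an entry at build time if its frontmatter breaks the schema."""
    fields = parse_frontmatter(text)
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if fields["type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown node type: {fields['type']!r}")
    return Node(
        id=fields["id"],
        type=fields["type"],
        title=fields["title"],
        # date.fromisoformat raises ValueError on a malformed date
        retrieved=date.fromisoformat(fields["retrieved"]),
    )


EXAMPLE = """---
id: example-claim
type: claim
title: Example claim
retrieved: 2026-05-22
---
Body text of the entry.
"""

node = validate_node(EXAMPLE)
```

The design point is that an entry with a missing date or an unknown type fails the build instead of reaching the published site. In a real Astro or Docusaurus setup the same check would live in the framework's own schema layer.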

The 21 strongest claims to anchor future writing (filed as atomic entries)

See linked entries below. Through-lines: provenance is a discipline, not a feature; the prose-vs-data dichotomy is false; "digital garden" is a B2B liability and should be reframed as "research-backed knowledge base"; and the methodology is itself the article, so Candid's KB is the demonstration.

Caveats (the strongest gaps to acknowledge)

  • No rigorous study compares the lifetime traffic, lead quality, or revenue of KB-backed and CMS-backed marketing sites while controlling for industry. The compounding-value claim rests on analogy (Stripe, OWID, Gwern) and theory, not on randomized controlled trials.
  • The eMarketer 2025 figure (8% / 8.6% overlap with Google top 10) comes via secondary sources; the underlying Ahrefs methodology has not been independently audited.
  • OtterlyAI's 90-day llms.txt measurement is single-source and has not been replicated.
  • The Princeton GEO study (Aggarwal et al.) tested 2023-era engines. Whether its effects hold for 2026 engines is unverified.
  • "Digital garden" usage warning: collides with the unrelated walled-gardens / platforms critique. Disambiguate if both terms appear in the same piece.

Recommendations to the writer of piece 3

  1. Lead with institutional examples (Stripe, OWID, SEP), not personal gardens. Personal gardens read as side projects to skeptical B2B buyers.
  2. Use "research-backed knowledge base" or "documentation-driven website" — never "digital garden" in client-facing copy.
  3. Cite the Princeton GEO paper with the ACM DOI as the empirical anchor.
  4. Acknowledge the llms.txt skepticism early — it inoculates against trend-chasing.
  5. Use Diátaxis as the information architecture (IA) of the article itself.
  6. Build the article as the first node in the agency KB — cross-link to all related entries.
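
Recommendations 2 and 6 lend themselves to an automated check at build time. The following is a rough sketch under stated assumptions: entries are held as a mapping from entry id to body text, cross-links use a wiki-style [[entry-id]] syntax (hypothetical, since the brief does not specify a link format), and the function names are invented for illustration.

```python
import re

# Assumed link syntax: [[entry-id]] or [[entry-id|label]]; not specified in the brief.
LINK_PATTERN = re.compile(r"\[\[([^\]|#]+)")
BANNED_TERMS = ("digital garden",)


def broken_links(entries: dict[str, str]) -> dict[str, list[str]]:
    """Map each entry id to the link targets that do not exist in the KB."""
    report: dict[str, list[str]] = {}
    for entry_id, body in entries.items():
        targets = {t.strip() for t in LINK_PATTERN.findall(body)}
        missing = sorted(t for t in targets if t not in entries)
        if missing:
            report[entry_id] = missing
    return report


def banned_term_hits(entries: dict[str, str], client_facing: set[str]) -> list[str]:
    """Return ids of client-facing entries that contain a banned term."""
    return sorted(
        entry_id
        for entry_id in client_facing
        if any(term in entries[entry_id].lower() for term in BANNED_TERMS)
    )


# Hypothetical entries, for illustration only.
entries = {
    "piece-3": "See [[geo-study]] and [[llms-txt|the llms.txt note]].",
    "geo-study": "Cited by [[piece-3]]. This is not a digital garden.",
}

print(broken_links(entries))
print(banned_term_hits(entries, {"geo-study"}))
```

Only entries flagged as client-facing are checked for the banned term, so internal research notes such as this brief can still discuss "digital garden" as a term.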