Research brief: The knowledge-base-backed website (piece 3 of 15)
Created 2026-05-22
Status: Research material, not finished article. For Candid Creative KB. Compiled May 22, 2026.
TL;DR
- KB-backed websites separate research from publication by treating sources, claims, and definitions as typed, citable nodes in a structured content graph — Markdown-on-disk with frontmatter + Zod/JSON Schema validation + static-site build is the dominant 2026 pattern (Astro Content Collections, Quartz, Docusaurus, MDX-on-Next.js).
- The strongest institutional exemplars are not personal "digital gardens" but Stripe docs, Our World in Data, the Stanford Encyclopedia of Philosophy, Cochrane Library, OpenAlex, Semantic Scholar, Bellingcat's OSINT toolkit, CourtListener, and Anthropic's Transformer Circuits Thread — all production sites with provenance, dating, and versioning baked into the methodology.
- The empirical AI-citation case is real but narrow: Aggarwal et al. (KDD '24) showed 30-40% lifts from citations/statistics/quotations and a 115.1% lift for rank-5 pages, but
/llms.txtitself sees near-zero AI-bot traffic in measurement; the durable value is structure on the page, not the index file.
The 21 strongest claims to anchor future writing (filed as atomic entries)
See linked entries below. Through-lines: provenance as a discipline (not a feature); the prose-vs-data dichotomy is false; "digital garden" is a B2B liability — reframe as "research-backed knowledge base"; the methodology is itself the article — Candid's KB IS the demonstration.
Caveats (the strongest gaps to acknowledge)
- No rigorous study compares lifetime traffic/lead quality/revenue of KB-backed vs CMS-backed marketing sites controlled for industry. The compounding-value claim rests on analogy (Stripe, OWID, Gwern) and theory, not RCTs.
- The eMarketer 2025 figure (8% / 8.6% overlap with Google top 10) is via secondary sources; underlying Ahrefs methodology not independently audited.
- OtterlyAI's 90-day llms.txt measurement is single-source.
- Princeton GEO study tested 2023-era engines. 2026 engine behavior is unverified.
- "Digital garden" usage warning: collides with the unrelated walled-gardens / platforms critique. Disambiguate if both terms appear in the same piece.
Recommendations to the writer of piece 3
- Lead with institutional examples (Stripe, OWID, SEP), not personal gardens. Personal gardens read as side projects to skeptical B2B buyers.
- Use "research-backed knowledge base" or "documentation-driven website" — never "digital garden" in client-facing copy.
- Cite the Princeton GEO paper with the ACM DOI as the empirical anchor.
- Acknowledge the llms.txt skepticism early — it inoculates against trend-chasing.
- Use Diátaxis as the IA of the article itself.
- Build the article as the first node in the agency KB — cross-link to all related entries.
Related
- reference Diátaxis (Daniele Procida): four documentation types — tutorials, how-to, reference, explanation
- reference Andy Matuschak: evergreen notes — atomic, concept-oriented, densely linked, accreting over time
- reference Stripe docs as a first-class product — Markdoc framework, documentation in performance reviews
- reference Wikipedia verifiability policy: all challenged material must carry an inline citation to a reliable published source
- reference Wikipedia is the #1 cited domain in Google AI Mode (11.22%); YouTube #2 at 9.51%
- reference OtterlyAI: /llms.txt sees 0.1% of AI-bot traffic — performs worse than average content page
- reference Stanford Encyclopedia of Philosophy: every entry has permanent dated archived editions
- reference Our World in Data: CC-BY licensing, per-indicator JSON/CSV endpoints, full GitHub provenance
- reference OpenAlex (271M works) and Semantic Scholar (214M+ works) — open scholarly citation graphs at scale
- reference Astro Content Collections (stable since 2.0, modernized in Content Layer API late 2024): Zod-validated frontmatter, build-time TypeScript types
- reference Princeton GEO paper (Aggarwal et al., KDD '24) — the foundational generative engine optimization study
- rule RULE: Every non-trivial claim carries a named source with author/institution + date + URL. Confidence flag honest.
- rule RULE: Every page ships with a publish date and a last-updated date. Refresh quarterly minimum.
- reference Research brief: Structured content as a competitive advantage (piece 2 of 15)
Referenced by (6)
- reference Research brief: Owning your stack — why agency-managed platforms cost more than they save (piece 4 of 15) relates-to
- reference Reference framework: which website dimensions decay vs compound over 10 years (12-dimension matrix) relates-to
- reference Research brief: Built to Last — why most SMB sites rebuild every 3-4 years (piece 5 of 15) relates-to
- reference Research brief: Research Before Pages — methodology for KB-backed websites (piece 14 of 15) relates-to
- reference CANDID REFERENCE: how the 15-brief foundation roadmap connects — the throughline from strategic frame to editorial layer depends-on
- reference Research brief: Confidence Levels, Sources, and Dated Claims — why every statement on a credible site should be verifiable (piece 15 of 15) relates-to