Research brief: the launch-build technical foundation — what the technology must get right before a new site can be found (June 2026)

reference · Scope: business · Status: current

schema-org core-web-vitals rendering-architecture mobile-first-indexing technical-build-gates client-side-js-indexing-risk

Created 2026-06-25

Summary

TL;DR

A sound technical build is necessary but not sufficient for a new site to be found. Getting the build wrong (blocked crawling, accidental noindex, content trapped behind client-side JavaScript, a broken mobile version) can make a site effectively invisible. Getting the build perfect only clears the bar — it does not, by itself, make the site rank.
The single most important correction: structured data (schema) is eligibility-and-clarity plumbing, NOT a ranking lever. Google states on the record that structured data does not make a site rank better and that there is no special schema required to appear in its AI features (Mueller (Bluesky, April 13, 2025) — "Structured data won't make your site rank better. It's used for displaying the search features listed in Google's documentation", Google AI optimization guide — "Structured data isn't required for generative AI search, and there's no special schema.org markup you need to add").
The one place technical work is genuinely existential at launch is the rendering decision. Server-side rendering or static generation is the safe default; pure client-side JavaScript is a real, evidence-backed indexing risk that falls hardest on new, low-authority sites with no crawl-priority cushion (Vercel + MERJ July 2024 rendering study — analyzed 100,000+ Googlebot fetches; 100% HTML pages rendered; median delay 10s, p75 26s, p90 ~3h, p95 ~6h, p99 ~18h; VENDOR INCENTIVE FLAGGED + high-authority test sites, Onely November 2022 experiment (Ziemek Bućko) — brand-new zero-authority test subdomain; JS-folder page 7 took 313 HOURS vs HTML 36 hours (9x slower); first link 52h vs 25h — different stage (discovery, not render) but the high-authority/low-authority contrast is the point, Rule: SSR or SSG is the DEFAULT for any content that must be indexed; CSR for indexable content is a GAMBLE whose downside lands hardest on new low-authority sites).
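One way to make the rendering risk concrete is to test whether the phrases a page must be found for are already present in the server response, before any JavaScript runs. The sketch below is a minimal illustration, not Google's method: the function name `missing_from_initial_html` and the sample markup are hypothetical, and the check only looks at text outside script and style elements.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_initial_html(raw_html, must_have):
    """Return the phrases from must_have that are absent from the raw HTML text.

    raw_html is the unrendered server response (what a crawler sees first).
    Matching is case-insensitive and whitespace-normalised.
    """
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in must_have if phrase.lower() not in text]


# Hypothetical responses: one server-rendered, one client-side rendered shell.
ssr_page = "<html><body><h1>Blue widgets</h1><p>Free shipping</p></body></html>"
csr_page = '<html><body><div id="root"></div><script>render("Blue widgets")</script></body></html>'

print(missing_from_initial_html(ssr_page, ["Blue widgets", "free shipping"]))
print(missing_from_initial_html(csr_page, ["Blue widgets"]))
```

An empty result means the critical phrases do not depend on the render queue at all. A non-empty result on an indexable page suggests the content exists only after client-side rendering.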

The named deliverable: Gate / Hygiene / Overclaimed sort

Every major technical build factor lands in exactly one of three categories. Google does not publish numerical weightings for any of them; where magnitude is unknown, it is stated explicitly. No invented precision.

Hard gates — determine whether the site is crawled/rendered/indexed at all. See "Category 1 — HARD GATES: factors that determine whether the new site is crawled, rendered, and indexed AT ALL" and the atomic factor entries beneath it.
Efficiency / hygiene — helps Google process the site efficiently; modest effect. See "Category 2 — EFFICIENCY / HYGIENE: factors that help Google process the site efficiently with modest effect (NOT differentiators)".
Overclaimed — marketed as ranking or visibility advantages; the evidence does not support the claim. See "Category 3 — OVERCLAIMED: factors marketed as ranking/visibility advantages where the evidence does NOT support it".

Headline reframings

Schema is eligibility, not ranking (Mueller (Bluesky, April 13, 2025) — "Structured data won't make your site rank better. It's used for displaying the search features listed in Google's documentation", Mueller (December 2021) — "it's fairly rare that you would be able to provide some structured data on a page which gives us unique information that we don't see from the page itself"; Google "won't do anything with structured data that's not visible on a page", Rule: structured data is RICH-RESULT eligibility + entity/clarity + machine-readability — NOT a ranking lever; do not promise a ranking lift to clients). Implement structured data for the rich-result features your pages genuinely map to. Do not promise a ranking lift.
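Because structured data that is not visible on the page is reported to be ignored, a pre-launch check can compare JSON-LD values against the page's visible text. The following is a rough heuristic sketch, not a reproduction of any Google check: the function name, the default `keys` tuple, and the sample data are all hypothetical, and it assumes the JSON-LD is a single top-level object rather than an array or `@graph`.

```python
import json


def jsonld_values_not_visible(jsonld_text, visible_text,
                              keys=("name", "headline", "description")):
    """Return the keys whose string value does not appear in the visible text.

    Assumption: jsonld_text is one JSON object. Keys that are absent or
    hold non-string values are skipped rather than reported.
    """
    data = json.loads(jsonld_text)
    page = " ".join(visible_text.split()).lower()
    not_visible = []
    for key in keys:
        value = data.get(key)
        if isinstance(value, str) and " ".join(value.split()).lower() not in page:
            not_visible.append(key)
    return not_visible


# Hypothetical markup and page copy.
sample_jsonld = json.dumps({
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Launch checklist",
    "description": "Hidden summary",
})
sample_text = "Launch   checklist\nEverything to verify before go-live."

print(jsonld_values_not_visible(sample_jsonld, sample_text))
```

A flagged key is a prompt to either show that content on the page or drop it from the markup. Validation of rich-result eligibility itself still belongs to the Rich Results Test.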

CSR for indexable content is a gamble (Google two-stage processing — crawl raw HTML first; all 200-status pages queued for rendering ("a headless Chromium renders the page and executes the JavaScript") with delays from seconds to hours per Vercel/MERJ, Google rendering — a noindex in the INITIAL HTML is NEVER overridden by JS; the directive is applied as soon as Googlebot sees it; client-side JS cannot undo a server-rendered noindex, Rule: SSR or SSG is the DEFAULT for any content that must be indexed; CSR for indexable content is a GAMBLE whose downside lands hardest on new low-authority sites). The favourable rendering numbers come from high-authority sites (Vercel + MERJ July 2024 rendering study — analyzed 100,000+ Googlebot fetches; 100% HTML pages rendered; median delay 10s, p75 26s, p90 ~3h, p95 ~6h, p99 ~18h; VENDOR INCENTIVE FLAGGED + high-authority test sites); the punishing numbers come from a zero-authority test subdomain (Onely November 2022 experiment (Ziemek Bućko) — brand-new zero-authority test subdomain; JS-folder page 7 took 313 HOURS vs HTML 36 hours (9x slower); first link 52h vs 25h — different stage (discovery, not render) but the high-authority/low-authority contrast is the point). The downside lands hardest on exactly the sites least able to absorb it.

Cross-links to the lifecycle brief:
Google queues ALL 200-status pages for rendering, JS or not — Splitt: "you don't really see how long it takes us to render, if we render at all, when we render"
"Two waves of indexing" — Google's Martin Splitt now calls it an oversimplification; "pretty much every website, when we see them for the first time, goes to rendering" and the waves "play less and less of a role"
Google Web Rendering Service (WRS) runs on evergreen Chromium since 2019 — modern JS (ES6+, Web Components, etc.) is supported; previously frozen at Chrome 41

llms.txt, AI-specific markup, and content chunking do nothing for Google Search, including its generative AI features (Google AI optimization guide (updated June 15, 2026) — llms.txt is "ignored"; AI-specific markup and content chunking are explicitly listed as unnecessary, Rule: do NOT implement llms.txt, AI-specific markup, or content chunking for Google Search — Google explicitly says these do nothing).

Core Web Vitals are tiebreaker-class, not a major lever (Mueller — "We've been pretty clear that Core Web Vitals are not giant factors in ranking"; "relevance is still by far much more important"; CWV is tiebreaker-class, Mueller on chasing perfect Lighthouse / CWV scores — getting "those last few percent… your site's SEO generally won't change because of that", Rule: do NOT chase perfect Lighthouse / Core Web Vitals scores — Mueller: last few percent "won't change" SEO; once in the "good" range, stop). A brand-new site usually has no CrUX field data, in which case the signal does not apply at all (Google CWV at launch — brand-new sites usually have NO CrUX field data (popularity/traffic threshold); Mueller confirms the signal "is not used" without sufficient field data).
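The stop-at-good rule can be written down as a small decision function. This is a sketch under stated assumptions: the threshold values are the commonly published "good" and "poor" boundaries for LCP, INP, and CLS at the time of writing and may shift, and the function names, metric keys, and returned labels are hypothetical.

```python
# Assumed boundaries (may change): a value at or below GOOD is "good",
# a value above POOR is "poor", anything in between "needs-improvement".
GOOD = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}
POOR = {"lcp_ms": 4000, "inp_ms": 500, "cls": 0.25}


def rate(metric, value):
    """Classify one metric value against the assumed thresholds."""
    if value <= GOOD[metric]:
        return "good"
    if value <= POOR[metric]:
        return "needs-improvement"
    return "poor"


def launch_cwv_action(field_p75, content_competitive):
    """Suggest an action from field data (75th-percentile values) or its absence.

    field_p75 is None or empty when the site has no CrUX field data yet.
    """
    if not field_p75:
        # No field data: reach "good" with lab tools, then stop.
        return "lab-only"
    ratings = [rate(metric, value) for metric, value in field_p75.items()]
    if "poor" in ratings and content_competitive:
        return "invest"
    return "hold"


print(launch_cwv_action(None, content_competitive=False))
print(launch_cwv_action({"lcp_ms": 4600, "cls": 0.05}, content_competitive=True))
print(launch_cwv_action({"lcp_ms": 2300, "inp_ms": 180, "cls": 0.02}, content_competitive=True))
```

The function deliberately has no branch that rewards pushing a "good" score higher, which mirrors the advice to stop once the "good" range is reached.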

Mobile-first means the mobile version IS the indexed version — content, structured data, internal links, metadata must all be present on mobile (Content/structured-data/link parity between mobile and desktop is required — limiting links on the mobile version "can slow down discovery of new pages", Google mobile-first parity — structured data must be present on the MOBILE version; if it exists only on desktop, Google will NOT use it, Google mobile-first parity — titles, descriptions, robots meta must match between mobile and desktop, Rule: mobile parity is MANDATORY — anything missing on mobile (content, links, structured data, metadata) is missing from Google's index; do NOT trim the mobile version).
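For sites that serve separate mobile and desktop HTML, parity can be spot-checked by diffing the signals named above. The sketch below is illustrative only: `parity_gaps`, the returned keys, and the sample markup are hypothetical, and it compares raw HTML rather than rendered output. A responsive design that serves one HTML document avoids the problem by construction.

```python
from html.parser import HTMLParser


class _Signals(HTMLParser):
    """Collects links, title, selected meta tags, and JSON-LD block count."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self.meta = {}
        self.jsonld_blocks = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self.links.add(a["href"])
        elif tag == "meta" and (a.get("name") or "").lower() in ("description", "robots"):
            self.meta[a["name"].lower()] = a.get("content") or ""
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self.jsonld_blocks += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def _signals(raw_html):
    parser = _Signals()
    parser.feed(raw_html)
    parser.close()
    return parser


def parity_gaps(desktop_html, mobile_html):
    """Report what the desktop HTML has that the mobile HTML lacks or changes."""
    desktop, mobile = _signals(desktop_html), _signals(mobile_html)
    fields = {
        "title": (desktop.title.strip(), mobile.title.strip()),
        "description": (desktop.meta.get("description"), mobile.meta.get("description")),
        "robots": (desktop.meta.get("robots"), mobile.meta.get("robots")),
    }
    return {
        "missing_links": sorted(desktop.links - mobile.links),
        "jsonld_missing": mobile.jsonld_blocks < desktop.jsonld_blocks,
        "metadata_mismatch": [name for name, (d, m) in fields.items() if d != m],
    }


# Hypothetical pages: the mobile version drops one link and the JSON-LD block.
desktop_page = ('<title>Shop</title><meta name="description" content="All widgets">'
                '<a href="/a">A</a><a href="/b">B</a>'
                '<script type="application/ld+json">{}</script>')
mobile_page = ('<title>Shop</title><meta name="description" content="All widgets">'
               '<a href="/a">A</a>')

print(parity_gaps(desktop_page, mobile_page))
```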

A stray staging noindex is the most common launch-killer (Google hard gate — when Googlebot sees noindex (meta or X-Robots-Tag), it "will drop that page entirely from Google Search results"; stray staging noindex is the most common launch-killer, Rule: scan for a stray noindex BEFORE launch — meta tag, X-Robots-Tag header, CMS "discourage search engines" toggle — the most common launch-killer).
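A pre-launch scan for a stray noindex can be sketched as a function over the response headers and the initial HTML. This is a minimal example with assumptions: `find_noindex` and the sample inputs are hypothetical, the header check is a simple substring match, and fetching the page is left to the caller. It also treats the `none` robots value as a match, since that value is documented as equivalent to `noindex, nofollow`.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Collects <meta name="robots"> and <meta name="googlebot"> directives."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append((name, (a.get("content") or "").lower()))


def find_noindex(headers, raw_html):
    """Return human-readable findings for any noindex in headers or meta tags."""
    findings = []
    for key, value in headers.items():
        if key.lower() == "x-robots-tag" and "noindex" in value.lower():
            findings.append(f"header X-Robots-Tag: {value}")
    parser = _RobotsMeta()
    parser.feed(raw_html)
    parser.close()
    for name, content in parser.directives:
        tokens = {token.strip() for token in content.split(",")}
        if "noindex" in tokens or "none" in tokens:
            findings.append(f"meta {name}: {content}")
    return findings


# Hypothetical staging leftovers.
staging_html = '<head><meta name="robots" content="NOINDEX, nofollow"></head>'
print(find_noindex({"X-Robots-Tag": "noindex"}, "<p>ok</p>"))
print(find_noindex({}, staging_html))
```

Running the scan against the initial HTML matters because, as noted above, a noindex seen there is applied before any JavaScript could remove it.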

You cannot force indexing or rank with technical polish alone (Rule: you cannot reliably force Google to index a page — submission tools AID discovery but do not guarantee inclusion or ranking; "Search is never guaranteed" (Mueller)) — see the sister lifecycle brief for the deeper treatment.

Staged playbook

Stage 0 — pre-launch gates (Rule (Stage 0, pre-launch): clear the technical gates or stay invisible): pages return 200 OK, robots.txt does not block JS/CSS, scan for stray noindex, real <a href> links, SSR/SSG for indexable content, responsive design for automatic mobile parity.
Stage 1 — verify (first weeks) (Rule: in the first weeks after launch, validate RENDERED HTML via Search Console URL Inspection — confirm critical content, links, canonical, and structured data survive rendering): URL Inspection live test confirms rendered HTML contains content, links, canonical, structured data; submit the sitemap; confirm no page is orphaned (unreachable through internal links).
Stage 2 — hygiene (Rule: implement structured data ONLY for the rich-result features Google documents and your pages genuinely map to; do NOT implement Schema.org types Google doesn't document; validate with the Rich Results Test, Rule: at launch, get Core Web Vitals into the "good" range via lab tools, then STOP — invest more only if/when CrUX field data appears and shows "Poor" AND content/relevance are already competitive): schema only for documented rich-result types; CWV into "good" via lab tools, then stop.
Stage 3 — do NOT do (Rule: do NOT implement llms.txt, AI-specific markup, or content chunking for Google Search — Google explicitly says these do nothing, Rule: do NOT chase perfect Lighthouse / Core Web Vitals scores — Mueller: last few percent "won't change" SEO; once in the "good" range, stop): no llms.txt, no AI-specific markup, no chasing perfect Lighthouse, no treating technical SEO as a growth strategy.
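Two of the Stage 0 gates, robots.txt not blocking JS/CSS and real <a href> links, lend themselves to a scripted check. The sketch below uses Python's standard `urllib.robotparser`, whose matching is simple prefix-based and is not guaranteed to agree with Googlebot's handling of wildcards or rule precedence, so treat a clean result as a first pass only. The function names, the example.com URLs, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt, asset_urls, user_agent="Googlebot"):
    """Return the asset URLs that robots_txt disallows for user_agent."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not rules.can_fetch(user_agent, url)]


class _AnchorAudit(HTMLParser):
    """Separates <a> elements with a usable href from those without one."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.uncrawlable = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href and href != "#" and not href.lower().startswith("javascript:"):
            self.crawlable.append(href)
        else:
            self.uncrawlable += 1


def audit_links(raw_html):
    """Return (list of crawlable hrefs, count of anchors with no usable href)."""
    audit = _AnchorAudit()
    audit.feed(raw_html)
    audit.close()
    return audit.crawlable, audit.uncrawlable


# Hypothetical inputs.
robots = "User-agent: *\nDisallow: /assets/js/\n"
assets = [
    "https://example.com/assets/js/app.js",
    "https://example.com/assets/css/site.css",
]
nav = ('<a href="/pricing">Pricing</a>'
       '<a onclick="go()">About</a>'
       '<a href="javascript:void(0)">Blog</a>')

print(blocked_assets(robots, assets))
print(audit_links(nav))
```

Any URL returned by `blocked_assets` is a resource the renderer may be unable to load, and any anchor counted as uncrawlable is a navigation path that exists for users but not for discovery.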

Genuine unknowns

See "Genuine unknowns in the launch-build technical foundation": Google publishes no numerical weightings for any technical factor; internal-linking / PageRank-flow magnitude is unquantified; render-queue behavior for new low-authority sites is under-measured; AI-surface signal weighting is not disclosed; CWV thresholds shift over time.

Source: compass_artifact research document, June 2026. Anchored in Google's documentation, Search Central Blog, on-record statements from John Mueller, Gary Illyes, Martin Splitt, the July 2024 Vercel/MERJ rendering study, and Onely's November 2022 9x JS-discovery experiment.