Research brief: the launch-build technical foundation — what the technology must get right before a new site can be found (June 2026)
Summary
TL;DR
A sound technical build is necessary but not sufficient for a new site to be found. Getting the build wrong (blocked crawling, accidental noindex, content trapped behind client-side JavaScript, a broken mobile version) can make a site effectively invisible. Getting the build perfect only clears the bar; it does not, by itself, make the site rank.
The single most important correction: structured data (schema) is eligibility-and-clarity plumbing, NOT a ranking lever. Google states on the record that structured data does not make a site rank better and that there is no special schema required to appear in its AI features (Mueller (Bluesky, April 13, 2025) — "Structured data won't make your site rank better. It's used for displaying the search features listed in Google's documentation", Google AI optimization guide — "Structured data isn't required for generative AI search, and there's no special schema.org markup you need to add").
The one place technical work is genuinely existential at launch is the rendering decision. Server-side rendering or static generation is the safe default; pure client-side JavaScript is a real, evidence-backed indexing risk that falls hardest on new, low-authority sites with no crawl-priority cushion (Vercel + MERJ July 2024 rendering study — analyzed 100,000+ Googlebot fetches; 100% HTML pages rendered; median delay 10s, p75 26s, p90 ~3h, p95 ~6h, p99 ~18h; VENDOR INCENTIVE FLAGGED + high-authority test sites, Onely November 2022 experiment (Ziemek Bućko) — brand-new zero-authority test subdomain; JS-folder page 7 took 313 HOURS vs HTML 36 hours (9x slower); first link 52h vs 25h — different stage (discovery, not render) but the high-authority/low-authority contrast is the point, Rule: SSR or SSG is the DEFAULT for any content that must be indexed; CSR for indexable content is a GAMBLE whose downside lands hardest on new low-authority sites).
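One way to make the rendering risk concrete is to look at what the server sends before any JavaScript runs. The sketch below, using only the Python standard library, reports which critical phrases are absent from the initial HTML. The function name, the class name, and both sample pages are hypothetical and exist only for illustration; this is a rough pre-launch check, not a model of how Google processes a page.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def missing_from_initial_html(raw_html, critical_phrases):
    """Return the phrases that are NOT present in the unrendered HTML text."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [phrase for phrase in critical_phrases if phrase.lower() not in text]


# Hypothetical sample pages: a client-side-rendered shell and a server-rendered page.
CSR_SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
SSR_PAGE = "<html><body><h1>Emergency plumber in Leeds</h1><p>Call us 24/7.</p></body></html>"

print(missing_from_initial_html(CSR_SHELL, ["Emergency plumber"]))
print(missing_from_initial_html(SSR_PAGE, ["Emergency plumber", "24/7"]))
```

For the CSR shell every phrase comes back as missing, because the content only exists after JavaScript executes. For the server-rendered page the list is empty.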
The named deliverable: Gate / Hygiene / Overclaimed sort
Every major technical build factor lands in exactly one of three categories. Google does not publish numerical weightings for any of them; where magnitude is unknown, it is stated explicitly. No invented precision.
- Hard gates — determine whether the site is crawled/rendered/indexed at all. See Category 1 — HARD GATES: factors that determine whether the new site is crawled, rendered, and indexed AT ALL and the atomic factor entries beneath it.
- Efficiency / hygiene — helps Google process the site efficiently; modest effect. See Category 2 — EFFICIENCY / HYGIENE: factors that help Google process the site efficiently with modest effect (NOT differentiators).
- Overclaimed — marketed as ranking or visibility advantages; evidence does not support it. See Category 3 — OVERCLAIMED: factors marketed as ranking/visibility advantages where the evidence does NOT support it.
Headline reframings
Schema is eligibility, not ranking (Mueller (Bluesky, April 13, 2025) — "Structured data won't make your site rank better. It's used for displaying the search features listed in Google's documentation", Mueller (December 2021) — "it's fairly rare that you would be able to provide some structured data on a page which gives us unique information that we don't see from the page itself"; Google "won't do anything with structured data that's not visible on a page", Rule: structured data is RICH-RESULT eligibility + entity/clarity + machine-readability — NOT a ranking lever; do not promise a ranking lift to clients). Implement structured data for the rich-result features your pages genuinely map to. Do not promise a ranking lift.
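To apply the "documented rich-result features only" rule, it helps to list which types a page declares in its JSON-LD and compare them with the features Google documents. The sketch below does that with the standard library. The allowlist is an illustrative subset chosen for this example, not Google's list, and should be replaced with the current contents of Google's Search Gallery. All names are hypothetical, and nested `@graph` structures are deliberately not handled.

```python
import json
from html.parser import HTMLParser

# ASSUMPTION: a small illustrative subset, not an authoritative list.
ILLUSTRATIVE_DOCUMENTED_TYPES = {
    "Article", "Product", "BreadcrumbList", "LocalBusiness", "Recipe", "Event",
}


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(html):
    """Return the top-level @type values declared in the page's JSON-LD."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    types = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # invalid JSON-LD is skipped here; a validator would report it
        for item in data if isinstance(data, list) else [data]:
            declared = item.get("@type") if isinstance(item, dict) else None
            if isinstance(declared, str):
                types.append(declared)
            elif isinstance(declared, list):
                types.extend(t for t in declared if isinstance(t, str))
    return types


def undocumented_types(html, documented=ILLUSTRATIVE_DOCUMENTED_TYPES):
    """Return declared types that are not in the supplied allowlist."""
    return [t for t in jsonld_types(html) if t not in documented]


# Hypothetical page with one allowlisted type and one Schema.org type outside it.
PAGE = (
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Article", "headline": "Hi"}'
    "</script>"
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Mountain", "name": "Example Peak"}'
    "</script>"
)

print(undocumented_types(PAGE))
```

A type flagged here is not harmful; per the brief it is generally ignored. The Rich Results Test remains the tool to validate the markup that is kept.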
CSR for indexable content is a gamble (Google two-stage processing — crawl raw HTML first; all 200-status pages queued for rendering ("a headless Chromium renders the page and executes the JavaScript") with delays from seconds to hours per Vercel/MERJ, Google rendering — a noindex in the INITIAL HTML is NEVER overridden by JS; the directive is applied as soon as Googlebot sees it; client-side JS cannot undo a server-rendered noindex, Rule: SSR or SSG is the DEFAULT for any content that must be indexed; CSR for indexable content is a GAMBLE whose downside lands hardest on new low-authority sites). The favourable rendering numbers come from high-authority sites (Vercel + MERJ July 2024 rendering study — analyzed 100,000+ Googlebot fetches; 100% HTML pages rendered; median delay 10s, p75 26s, p90 ~3h, p95 ~6h, p99 ~18h; VENDOR INCENTIVE FLAGGED + high-authority test sites); the punishing numbers come from a zero-authority test subdomain (Onely November 2022 experiment (Ziemek Bućko) — brand-new zero-authority test subdomain; JS-folder page 7 took 313 HOURS vs HTML 36 hours (9x slower); first link 52h vs 25h — different stage (discovery, not render) but the high-authority/low-authority contrast is the point). The downside lands hardest on exactly the sites least able to absorb it. Cross-link to the lifecycle brief: Google queues ALL 200-status pages for rendering, JS or not — Splitt: "you don't really see how long it takes us to render, if we render at all, when we render", "Two waves of indexing" — Google's Martin Splitt now calls it an oversimplification; "pretty much every website, when we see them for the first time, goes to rendering" and the waves "play less and less of a role", Google Web Rendering Service (WRS) runs on evergreen Chromium since 2019 — modern JS (ES6+, Web Components, etc.) is supported; previously frozen at Chrome 41.
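The Onely figures quoted above reduce to two simple ratios. The short calculation below shows that the "9x" headline is a rounding of roughly 8.7x, while the first-link gap is closer to 2x. The dictionary keys are hypothetical labels for the four numbers quoted in this brief; nothing else is taken from the study.

```python
# Hours from the Onely November 2022 experiment, as quoted in this brief.
ONELY_HOURS = {
    "js_page_7": 313,
    "html_page_7": 36,
    "js_first_link": 52,
    "html_first_link": 25,
}


def slowdown(js_hours, html_hours):
    """How many times longer the JS variant took than the HTML variant."""
    return js_hours / html_hours


page_7_ratio = slowdown(ONELY_HOURS["js_page_7"], ONELY_HOURS["html_page_7"])
first_link_ratio = slowdown(ONELY_HOURS["js_first_link"], ONELY_HOURS["html_first_link"])

print(f"page 7: {page_7_ratio:.1f}x slower")          # page 7: 8.7x slower
print(f"first link: {first_link_ratio:.1f}x slower")  # first link: 2.1x slower
```

The gap widens with depth in these figures: about 2x at the first link and about 8.7x by page 7.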
llms.txt, AI-specific markup, and content chunking do nothing for Google Search, including its generative AI features (Google AI optimization guide (updated June 15, 2026) — llms.txt is "ignored"; AI-specific markup and content chunking are explicitly listed as unnecessary, Rule: do NOT implement llms.txt, AI-specific markup, or content chunking for Google Search — Google explicitly says these do nothing).
Core Web Vitals are tiebreaker-class, not a major lever (Mueller — "We've been pretty clear that Core Web Vitals are not giant factors in ranking"; "relevance is still by far much more important"; CWV is tiebreaker-class, Mueller on chasing perfect Lighthouse / CWV scores — getting "those last few percent… your site's SEO generally won't change because of that", Rule: do NOT chase perfect Lighthouse / Core Web Vitals scores — Mueller: last few percent "won't change" SEO; once in the "good" range, stop). A brand-new site usually has no CrUX field data and the signal does not apply at all (Google CWV at launch — brand-new sites usually have NO CrUX field data (popularity/traffic threshold); Mueller confirms the signal "is not used" without sufficient field data).
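The "get into the good range, then stop" rule can be written as a simple threshold check. The thresholds below (LCP 2.5 s, INP 200 ms, CLS 0.1) are the commonly documented "good" boundaries, entered here as assumptions; since this brief itself notes that CWV thresholds shift, re-check them against Google's current documentation. The names are hypothetical, and only the metrics actually supplied are checked, because lab tools may not report every metric.

```python
# ASSUMPTION: "good" upper bounds as commonly documented; verify before relying on them.
GOOD_THRESHOLDS = {
    "lcp_ms": 2500,  # Largest Contentful Paint, milliseconds
    "inp_ms": 200,   # Interaction to Next Paint, milliseconds
    "cls": 0.1,      # Cumulative Layout Shift, unitless
}


def metrics_outside_good(measured):
    """Return the names of supplied metrics that exceed the 'good' threshold."""
    return sorted(
        name for name, value in measured.items() if value > GOOD_THRESHOLDS[name]
    )


def should_stop_optimizing(measured):
    """True when every supplied metric is already within the 'good' range."""
    return not metrics_outside_good(measured)


print(should_stop_optimizing({"lcp_ms": 2100, "inp_ms": 180, "cls": 0.05}))
print(metrics_outside_good({"lcp_ms": 3200, "cls": 0.25}))
```

Once `should_stop_optimizing` returns `True`, the brief's guidance is to stop chasing further score improvements and put the effort into content and relevance.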
Mobile-first means the mobile version IS the indexed version — content, structured data, internal links, metadata must all be present on mobile (Content/structured-data/link parity between mobile and desktop is required — limiting links on the mobile version "can slow down discovery of new pages", Google mobile-first parity — structured data must be present on the MOBILE version; if it exists only on desktop, Google will NOT use it, Google mobile-first parity — titles, descriptions, robots meta must match between mobile and desktop, Rule: mobile parity is MANDATORY — anything missing on mobile (content, links, structured data, metadata) is missing from Google's index; do NOT trim the mobile version).
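Where a site serves a separate or trimmed mobile template, the parity requirement can be spot-checked by comparing the two HTML responses. The sketch below compares the title, the description and robots meta tags, the set of link targets, and the number of JSON-LD blocks. All names and both sample documents are hypothetical. It compares raw HTML only, so it says nothing about content injected by JavaScript. With a responsive design that serves the same HTML to every device, the comparison is trivially clean.

```python
from html.parser import HTMLParser


class ParitySnapshot(HTMLParser):
    """Records the elements that must match between desktop and mobile."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = set()
        self.jsonld_blocks = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name in ("description", "robots"):
            self.meta[name] = a.get("content") or ""
        elif tag == "a" and a.get("href"):
            self.links.add(a["href"])
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self.jsonld_blocks += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def snapshot(html):
    parser = ParitySnapshot()
    parser.feed(html)
    parser.close()
    return parser


def parity_gaps(desktop_html, mobile_html):
    """List what the mobile version lacks or changes relative to desktop."""
    d, m = snapshot(desktop_html), snapshot(mobile_html)
    gaps = []
    if d.title != m.title:
        gaps.append("title differs")
    for name in ("description", "robots"):
        if d.meta.get(name) != m.meta.get(name):
            gaps.append(f"meta {name} differs")
    missing_links = sorted(d.links - m.links)
    if missing_links:
        gaps.append("links missing on mobile: " + ", ".join(missing_links))
    if d.jsonld_blocks > m.jsonld_blocks:
        gaps.append("structured data missing on mobile")
    return gaps


# Hypothetical samples: the mobile template drops one link and the JSON-LD block.
DESKTOP = (
    '<title>Shop</title><meta name="description" content="Buy things">'
    '<a href="/a">A</a><a href="/b">B</a>'
    '<script type="application/ld+json">{}</script>'
)
MOBILE = (
    '<title>Shop</title><meta name="description" content="Buy things">'
    '<a href="/a">A</a>'
)

print(parity_gaps(DESKTOP, MOBILE))
```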
A stray staging noindex is the most common launch-killer (Google hard gate — when Googlebot sees noindex (meta or X-Robots-Tag), it "will drop that page entirely from Google Search results"; stray staging noindex is the most common launch-killer, Rule: scan for a stray noindex BEFORE launch — meta tag, X-Robots-Tag header, CMS "discourage search engines" toggle — the most common launch-killer).
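The pre-launch noindex scan covers two places: the X-Robots-Tag response header and the robots meta tag in the initial HTML. A CMS "discourage search engines" toggle typically surfaces through one of these, though that depends on the CMS. The sketch below checks both for `noindex` and for `none`, which Google documents as equivalent to `noindex, nofollow`. Function and class names are hypothetical. Fetching the page is left out so the example stays self-contained, so the headers and HTML are passed in by the caller.

```python
from html.parser import HTMLParser

# Directives that keep a page out of the index.
BLOCKING_DIRECTIVES = {"noindex", "none"}


def _has_blocking_directive(value):
    """True if a robots directive string contains noindex or none.

    Colons are treated as separators so that a user-agent prefix such as
    "googlebot: noindex" is also caught.
    """
    tokens = value.lower().replace(":", ",").split(",")
    return any(token.strip() in BLOCKING_DIRECTIVES for token in tokens)


class RobotsMetaParser(HTMLParser):
    """Finds <meta name="robots"> / <meta name="googlebot"> tags that block indexing."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        content = a.get("content") or ""
        if name in ("robots", "googlebot") and _has_blocking_directive(content):
            self.hits.append(f'<meta name="{name}" content="{content}">')


def find_noindex(headers, raw_html):
    """Return every noindex signal found in the response headers or initial HTML."""
    findings = []
    for key, value in headers.items():
        if key.lower() == "x-robots-tag" and _has_blocking_directive(value):
            findings.append(f"X-Robots-Tag: {value}")
    parser = RobotsMetaParser()
    parser.feed(raw_html)
    parser.close()
    findings.extend(parser.hits)
    return findings


print(find_noindex({"X-Robots-Tag": "noindex, nofollow"}, "<html></html>"))
print(find_noindex({}, '<meta name="robots" content="index, follow">'))
```

An empty list for every indexable template is the launch condition. Any finding on a page that should be indexed needs to be removed at the server or CMS level, since client-side JavaScript cannot undo a noindex present in the initial HTML.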
You cannot force indexing or rank with technical polish alone (Rule: you cannot reliably force Google to index a page — submission tools AID discovery but do not guarantee inclusion or ranking; "Search is never guaranteed" (Mueller)) — see the sister lifecycle brief for the deeper treatment.
Staged playbook
- Stage 0 — pre-launch gates (Rule (Stage 0, pre-launch): clear the technical gates or stay invisible — 200, robots.txt not blocking JS/CSS, scan for stray noindex, real <a href> links, SSR/SSG for indexable content, responsive design for mobile parity): 200 OK, robots.txt not blocking JS/CSS, scan for stray noindex, real <a href> links, SSR/SSG for indexable content, responsive design for automatic mobile parity.
- Stage 1 — verify (first weeks) (Rule: in the first weeks after launch, validate RENDERED HTML via Search Console URL Inspection — confirm critical content, links, canonical, and structured data survive rendering): URL Inspection live test confirms rendered HTML contains content, links, canonical, structured data; submit sitemap; no orphans.
- Stage 2 — hygiene (Rule: implement structured data ONLY for the rich-result features Google documents and your pages genuinely map to; do NOT implement Schema.org types Google doesn't document; validate with the Rich Results Test, Rule: at launch, get Core Web Vitals into the "good" range via lab tools, then STOP — invest more only if/when CrUX field data appears and shows "Poor" AND content/relevance are already competitive): schema only for documented rich-result types; CWV into "good" via lab tools, then stop.
- Stage 3 — do NOT do (Rule: do NOT implement llms.txt, AI-specific markup, or content chunking for Google Search — Google explicitly says these do nothing, Rule: do NOT chase perfect Lighthouse / Core Web Vitals scores — Mueller: last few percent "won't change" SEO; once in the "good" range, stop): no llms.txt, no AI-specific markup, no chasing perfect Lighthouse, no treating technical SEO as a growth strategy.
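Two of the Stage 0 gates in the playbook above lend themselves to a quick scripted check: whether robots.txt blocks the JS/CSS files the page needs, and whether navigation uses real `<a href>` links. The sketch below uses Python's standard `urllib.robotparser` and `html.parser`. Python's robots.txt parser does not reproduce Google's matching rules exactly (wildcard handling in particular differs), so treat its answer as a first pass. The function names, sample robots.txt, and sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the resource URLs that the given robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not rules.can_fetch(user_agent, url)]


class LinkAudit(HTMLParser):
    """Separates crawlable <a href> links from navigation Google will not follow."""

    def __init__(self):
        super().__init__()
        self.crawlable = []
        self.uncrawlable = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        href = (a.get("href") or "").strip()
        if tag == "a":
            if href and not href.lower().startswith("javascript:"):
                self.crawlable.append(href)
            else:
                self.uncrawlable.append("<a> without a usable href")
        elif "onclick" in a:
            # Candidate only: an onclick handler may not be navigation at all.
            self.uncrawlable.append(f"<{tag}> with onclick")


def audit_links(raw_html):
    audit = LinkAudit()
    audit.feed(raw_html)
    audit.close()
    return audit.crawlable, audit.uncrawlable


# Hypothetical robots.txt that blocks the site's own JavaScript bundle.
ROBOTS = "User-agent: *\nDisallow: /assets/js/\n"
RESOURCES = [
    "https://example.com/assets/js/app.js",
    "https://example.com/styles/site.css",
]

# Hypothetical navigation mixing one real link with two uncrawlable patterns.
NAV = (
    '<a href="/pricing">Pricing</a>'
    '<a href="javascript:void(0)">Docs</a>'
    '<button onclick="go(\'/blog\')">Blog</button>'
)

print(blocked_resources(ROBOTS, RESOURCES))
print(audit_links(NAV))
```

An `<a>` element that keeps a real `href` and also has a click handler counts as crawlable here, which matches the rule of letting the SPA router intercept the click rather than replacing the anchor.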
Genuine unknowns
Google publishes no numerical weightings for any technical factor; internal-linking / PageRank-flow magnitude is unquantified; render-queue behavior for new low-authority sites is under-measured; AI-surface signal weighting is not disclosed; CWV thresholds shift. For the full treatment, see the entry Genuine unknowns in the launch-build technical foundation.
Source: compass_artifact research document, June 2026. Anchored in Google's documentation, Search Central Blog, on-record statements from John Mueller, Gary Illyes, Martin Splitt, the July 2024 Vercel/MERJ rendering study, and Onely's November 2022 9x JS-discovery experiment.
Related entries
Related
- reference Research brief: the lifecycle of a website in Google Search — from launch to mature standing and the perpetual re-evaluation that follows (June 2026)
- reference Google sitemap tag semantics — uses `<loc>` and `<lastmod>` (when accurate); openly ignores `<changefreq>` and `<priority>` (Illyes: "a bag of noise")
- reference Google robots.txt 5xx response postpones the whole crawl — Google will not guess; DNS/server reachability is a hard gate on the entire pipeline
- reference Mobile-first indexing declared COMPLETE on October 31, 2023 — Mueller, Google Search Central Blog: "the trek to Mobile First Indexing is now complete"
- reference Content/structured-data/link parity between mobile and desktop is required — limiting links on the mobile version "can slow down discovery of new pages"
- reference Google explicitly tells sub-few-thousand-URL sites they DO NOT need to think about crawl budget — thresholds for caring are roughly 1M+ pages updated regularly, or 10k+ pages updated daily
- reference A brand-new domain has no history, so Google has little crawl demand to work with — crawls conservatively and ramps up (or doesn't) based on what it finds
- reference Illyes (May 2023 SEO Office Hours) — indexing speed "depends on a bunch of things, but the most important one is the quality of the site, followed by its popularity on the internet"
- reference Googlebot response-code handling — 200 proceeds, 3xx is followed (up to a chain limit), 4xx (incl. 410) is dropped without wasting crawl budget, 5xx slows or pauses crawl, soft-404 confuses everything
- reference Google Web Rendering Service (WRS) runs on evergreen Chromium since 2019 — modern JS (ES6+, Web Components, etc.) is supported; previously frozen at Chrome 41
- reference Google queues ALL 200-status pages for rendering, JS or not — Splitt: "you don't really see how long it takes us to render, if we render at all, when we render"
- reference "Two waves of indexing" — Google's Martin Splitt now calls it an oversimplification; "pretty much every website, when we see them for the first time, goes to rendering" and the waves "play less and less of a role"
- reference Google canonicalization — clusters similar pages, selects the single most "representative" URL as canonical; signals are HINTS not directives (redirects strong, rel=canonical strong, sitemap weak, HTTPS preferred, hreflang clustering)
- reference Google canonical selection uses ~40 signals — Allan Scott (Google "Dups" team) on Search Off the Record: "somewhere in the neighborhood of 40"
- reference Lying with `<lastmod>` erodes Google's trust in your sitemap — "eventually we're not going to believe you anymore" (Google, 2023)
- rule Rule: crawl budget is not a concern for SMB-scale sites — sub-few-thousand URLs do not need to think about it, per Google
- rule Rule: you cannot reliably force Google to index a page — submission tools AID discovery but do not guarantee inclusion or ranking; "Search is never guaranteed" (Mueller)
- rule Rule (Stage A, pre-launch / launch week): remove the gates — verify GSC, submit honest sitemap, request homepage indexing, confirm robots.txt + DNS + 200/404/410 hygiene, ensure mobile/desktop parity
- rule Rule: update sitemap `<lastmod>` only on substantive content changes — churning the date erodes Google's trust in the signal
- reference Genuine unknowns in the Google Search pipeline — exact queue priority math, render-queue position, signal weightings, re-rendering triggers, whether/when a page will ever rank
- reference Category 1 — HARD GATES: factors that determine whether the new site is crawled, rendered, and indexed AT ALL
- reference Category 2 — EFFICIENCY / HYGIENE: factors that help Google process the site efficiently with modest effect (NOT differentiators)
- reference Category 3 — OVERCLAIMED: factors marketed as ranking/visibility advantages where the evidence does NOT support it
- reference Google hard gate — only 200-status pages are queued for rendering; non-200 (4xx/5xx) may skip rendering entirely
- reference Google hard gate — `robots.txt` Disallow blocks crawling AND blocks rendering of JS/CSS; "Google Search won't render JavaScript from blocked files or on blocked pages"
- reference Google hard gate — when Googlebot sees `noindex` (meta or `X-Robots-Tag`), it "will drop that page entirely from Google Search results"; stray staging `noindex` is the most common launch-killer
- reference Google hard gate — Google follows only `<a>` elements with an `href` attribute; `onclick`, `<button>`, `javascript:void(0)` navigation is not followed
- reference Google CONDITIONAL gate — critical content locked behind client-side JS gates indexing for new low-authority sites; high-authority sites are largely fine; the bite is concentrated where it hurts most
- reference Google mobile-first parity — structured data must be present on the MOBILE version; if it exists only on desktop, Google will NOT use it
- reference Google mobile-first parity — titles, descriptions, robots meta must match between mobile and desktop
- reference Google mobile-first nuance — collapsed/hidden content in accordions is generally fine (Google no longer discounts it); ONLY content entirely omitted from mobile is the problem
- reference Internal linking & site architecture — discovery + crawl-efficiency effects are SOLID; PageRank-flow magnitude is real but UNQUANTIFIED (Google publishes no weighting; survivorship bias plagues case studies)
- reference XML sitemap — aids discovery; Google: keeping sitemaps current is "adequate" for most sites; does NOT guarantee indexing
- reference Canonicalization, redirects, duplicate-content hygiene — Google docs treat these as processing-EFFICIENCY tools, NOT ranking boosts
- reference Google AI optimization guide — "not required to have perfectly semantic HTML… Google can understand it"; semantic HTML is good practice with modest effect
- reference Mueller (Bluesky, April 13, 2025) — "Structured data won't make your site rank better. It's used for displaying the search features listed in [Google's documentation]"
- reference Mueller (December 2021) — "it's fairly rare that you would be able to provide some structured data on a page which gives us unique information that we don't see from the page itself"; Google "won't do anything with structured data that's not visible on a page"
- reference Google AI optimization guide — "Structured data isn't required for generative AI search, and there's no special schema.org markup you need to add"
- reference Google AI optimization guide (updated June 15, 2026) — `llms.txt` is "ignored"; AI-specific markup and content chunking are explicitly listed as unnecessary
- reference Mueller — "We've been pretty clear that Core Web Vitals are not giant factors in ranking"; "relevance is still by far much more important"; CWV is tiebreaker-class
- reference Mueller on chasing perfect Lighthouse / CWV scores — getting "those last few percent… your site's SEO generally won't change because of that"
- reference Google CWV at launch — brand-new sites usually have NO CrUX field data (popularity/traffic threshold); Mueller confirms the signal "is not used" without sufficient field data
- reference Google CWV — INP (Interaction to Next Paint) replaced FID as the responsiveness metric in March 2024; the three metrics are now LCP / INP / CLS
- reference "Technical SEO is how you win" is a FALSE FRAMING — it is a gate/prerequisite, NOT a differentiator; once gates are cleared, durable advantage comes from content and earned signals
- reference Rendering strategies — SSR (server-side, per-request), SSG (static, build-time), CSR (client-side, near-empty HTML shell + JS) — three architectures with very different indexing risk profiles
- reference Google two-stage processing — crawl raw HTML first; all 200-status pages queued for rendering ("a headless Chromium renders the page and executes the JavaScript") with delays from seconds to hours per Vercel/MERJ
- reference Vercel + MERJ July 2024 rendering study — analyzed 100,000+ Googlebot fetches; 100% HTML pages rendered; median delay 10s, p75 26s, p90 ~3h, p95 ~6h, p99 ~18h; VENDOR INCENTIVE FLAGGED + high-authority test sites
- reference Onely November 2022 experiment (Ziemek Bućko) — brand-new zero-authority test subdomain; JS-folder page 7 took 313 HOURS vs HTML 36 hours (9x slower); first link 52h vs 25h — different stage (discovery, not render) but the high-authority/low-authority contrast is the point
- reference Google rendering — a `noindex` in the INITIAL HTML is NEVER overridden by JS; the directive is applied as soon as Googlebot sees it; client-side JS cannot undo a server-rendered noindex
- reference Vercel + MERJ — non-Google AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do NOT execute JavaScript; over 500M GPTBot fetches tracked with zero evidence of JS execution
- reference Schema.org vs Google-supported feature types — Schema.org May 2026 dataset = 958 Itemtypes + 4,587 predicates (5,545 entries); Google's Search Gallery documents ~30 feature types; markup outside Google's documented features is generally IGNORED
- reference Google structured data — "Using structured data enables a feature to be PRESENT, it does NOT guarantee that it will be present"; eligibility ≠ rich-result display
- reference Mueller — structured data can help Google "show [the page] in more relevant search results" — a TARGETING / CLARITY effect, NOT a ranking boost
- reference Google structured-data manual action — removes RICH-RESULT eligibility; "doesn't affect how the page ranks in Google web search" — confirms the eligibility-vs-ranking distinction
- reference Adjudicating the vendor-versus-Google gap on schema — schema-plugin sellers, SEO-tool vendors, "technical SEO" agencies market schema as a ranking and AI-citation booster; contradicted by Google's explicit on-record statements; even the indirect CTR-routing argument (Berreby) is NDA-covered and concedes no direct ranking effect
- reference Responsive design (same HTML/URL across devices) is Google's recommended defense — eliminates mobile/desktop parity gaps by CONSTRUCTION; classic failure mode is a separate or aggressively-trimmed mobile template
- reference Genuine unknowns in the launch-build technical foundation — Google publishes no numerical weightings; PageRank-flow magnitude unquantified; render-queue behavior for new low-authority sites under-measured; AI-surface signal weighting undisclosed; CWV thresholds shift
- rule Rule: SSR or SSG is the DEFAULT for any content that must be indexed; CSR for indexable content is a GAMBLE whose downside lands hardest on new low-authority sites
- rule Rule: structured data is RICH-RESULT eligibility + entity/clarity + machine-readability — NOT a ranking lever; do not promise a ranking lift to clients
- rule Rule: implement structured data ONLY for the rich-result features Google documents and your pages genuinely map to; do NOT implement Schema.org types Google doesn't document; validate with the Rich Results Test
- rule Rule: do NOT implement `llms.txt`, AI-specific markup, or content chunking for Google Search — Google explicitly says these do nothing
- rule Rule: do NOT chase perfect Lighthouse / Core Web Vitals scores — Mueller: last few percent "won't change" SEO; once in the "good" range, stop
- rule Rule: at launch, get Core Web Vitals into the "good" range via lab tools, then STOP — invest more only if/when CrUX field data appears and shows "Poor" AND content/relevance are already competitive
- rule Rule: mobile parity is MANDATORY — anything missing on mobile (content, links, structured data, metadata) is missing from Google's index; do NOT trim the mobile version
- rule Rule: scan for a stray `noindex` BEFORE launch — meta tag, `X-Robots-Tag` header, CMS "discourage search engines" toggle — the most common launch-killer
- rule Rule: use real `<a href>` HTML links for any navigation Google must follow — NOT `onclick`, NOT `<button>`, NOT `javascript:void(0)`; let the SPA router intercept the click rather than replacing the anchor
- rule Rule: in the first weeks after launch, validate RENDERED HTML via Search Console URL Inspection — confirm critical content, links, canonical, and structured data survive rendering
- rule Rule (Stage 0, pre-launch): clear the technical gates or stay invisible — 200, robots.txt not blocking JS/CSS, scan for stray `noindex`, real `<a href>` links, SSR/SSG for indexable content, responsive design for mobile parity
- reference Research brief: the psychology of the launch-and-wait — owner patience and visitor first impressions on a brand-new website (June 2026)
- reference Research brief: the time dimension of a new website — ramp economics, the J-curve, owned vs rented, and the AI-era verification (June 2026)
- reference Research brief: what "success" and "progress" actually mean for a newly launched website — a leading-to-lagging indicator framework (June 2026)
- reference Research brief: how long does it actually take a new website to move through Google's pipeline — a methodology-graded benchmark report (June 2026)
- reference Research cluster: launching a new website — the six-brief synthesis on how Google handles it, what the build must get right, how long it actually takes, what it costs, what success means, and the psychology of the launch-and-wait (June 2026)