Content extraction decision tree — WP REST API default, WXR XML fallback, direct DB only for hidden postmeta

Decision tree for extracting WordPress content during migration:

  • WP REST API (/wp-json/wp/v2/posts, /pages, /media) — default choice for any WP 4.7+ site. Reliable, supports custom post types if show_in_rest => true. Sanity's official migration course documents this end-to-end.
  • WXR XML export (Tools → Export) — when REST API is firewalled; one-shot full archive; WordPress.com source sites via OAuth. WXR includes references to attachments but not the binary media files.
  • Direct DB query (WP-CLI wp db export or MySQL dump) — only when you need raw postmeta rows not exposed via REST. Typically ACF data or custom post-meta fields registered without REST exposure.

Page-builder content handling:

Source Storage format Extraction reality
Gutenberg HTML + <!-- wp:blockname --> comments Cleanly parseable. @wordpress/block-serialization-default-parser converts to AST → Portable Text / MDX.
Classic editor Plain HTML Trivial. Use turndown to convert to markdown.
ACF fields Serialized PHP in postmeta; exposed via REST with ACF-to-REST or WPGraphQL-for-ACF Manageable if planned; catastrophic if discovered mid-migration. Always audit ACF first.
Elementor JSON blob in postmeta._elementor_data Essentially un-portable. Rebuild pages from screenshots. No clean Elementor→markdown/Portable Text path exists.
Divi Shortcodes in post_content Rebuild, don't convert. Severe lock-in. See Divi 4 stored content as proprietary et_pb_* shortcodes — orphan text on theme deactivation (Divi 5 fixes this) (existing).
Bricks JSON in postmeta._bricks_page_content_2 Rebuild.

Image migration (where projects overrun):

  1. Export /wp-content/uploads/ via SFTP or WP-CLI (wp media export).
  2. Upload to new host (Cloudflare Images, Sanity CDN, Vercel Blob, or new repo public/).
  3. Rewrite every <img src> and every markdown ![alt](path) in migrated content. Regex against old domain + /wp-content/uploads/.
  4. Regenerate responsive sizes via new stack's image pipeline.
  5. Plan to budget time for this — Outsourcify (7,000-article migration) flagged it as a major engineering challenge.

Internal link rewriting: one-shot script scanning all body content for https://oldsite.com/... or relative /2019/... patterns and rewriting them.

Comments: for SMB sites <500 lifetime comments, sunset gracefully — export to JSON, archive, replace with Giscus (GitHub Discussions). For active commenter communities, migrate to Disqus or keep WordPress headless.