Content extraction decision tree — WP REST API default, WXR XML fallback, direct DB only for hidden postmeta
Created 2026-05-22
Decision tree for extracting WordPress content during migration:
- WP REST API (`/wp-json/wp/v2/posts`, `/pages`, `/media`): default choice for any WP 4.7+ site. Reliable, and supports custom post types registered with `show_in_rest => true`. Sanity's official migration course documents this end-to-end.
- WXR XML export (Tools → Export): use when the REST API is firewalled; gives a one-shot full archive; also the route for WordPress.com source sites via OAuth. WXR includes references to attachments but not the binary media files.
- Direct DB query (WP-CLI `wp db export` or a MySQL dump): only when you need raw `postmeta` rows not exposed via REST, typically ACF data or custom post-meta fields registered without REST exposure.
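A minimal sketch of paginated REST extraction, assuming a public site whose collection endpoints are readable without authentication. It relies on the `per_page` (max 100) and `page` query parameters and the `X-WP-TotalPages` header that WordPress sets on collection responses; the function name `fetchAllItems` and the injected `fetchFn` parameter are illustrative, not part of any library.

```typescript
// Minimal structural type: Node 18+ global fetch satisfies it, and so does a test mock.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

// Fetch every item in a WP REST collection ("posts", "pages", "media", or a CPT's rest_base)
// by walking ?page=1..N, where N comes from the X-WP-TotalPages response header.
async function fetchAllItems(
  siteUrl: string,
  restBase: string,
  fetchFn: FetchLike,
  perPage = 100, // WordPress rejects per_page values above 100
): Promise<unknown[]> {
  const root = siteUrl.replace(/\/$/, "");
  const items: unknown[] = [];
  let page = 1;
  let totalPages = 1;
  do {
    const url = `${root}/wp-json/wp/v2/${restBase}?per_page=${perPage}&page=${page}`;
    const res = await fetchFn(url);
    if (!res.ok) throw new Error(`GET ${url} failed with HTTP ${res.status}`);
    // A missing or malformed header is treated as a single page.
    totalPages = Number(res.headers.get("X-WP-TotalPages") ?? "1") || 1;
    const batch = await res.json();
    if (!Array.isArray(batch)) throw new Error(`Expected a JSON array from ${url}`);
    items.push(...batch);
    page += 1;
  } while (page <= totalPages);
  return items;
}
```

In Node 18+ this can be called as `fetchAllItems("https://example.com", "posts", fetch)`; for a custom post type, pass its `rest_base` instead of `"posts"`. Injecting the fetch function keeps the paging logic testable without a live site.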
Page-builder content handling:
| Source | Storage format | Extraction reality |
|---|---|---|
| Gutenberg | HTML + `<!-- wp:blockname -->` comments | Cleanly parseable. `@wordpress/block-serialization-default-parser` parses it into a block tree that you then map to Portable Text / MDX. |
| Classic editor | Plain HTML | Trivial. Use turndown to convert to Markdown. |
| ACF fields | Serialized PHP in `postmeta`; exposed via REST with ACF-to-REST or WPGraphQL-for-ACF | Manageable if planned; catastrophic if discovered mid-migration. Always audit ACF first. |
| Elementor | JSON blob in `postmeta._elementor_data` | Essentially unportable. Rebuild pages from screenshots. No clean Elementor → Markdown/Portable Text path exists. |
| Divi | Shortcodes in `post_content` | Rebuild, don't convert. Severe lock-in. See "Divi 4 stored content as proprietary et_pb_* shortcodes — orphan text on theme deactivation (Divi 5 fixes this)". |
| Bricks | JSON in `postmeta._bricks_page_content_2` | Rebuild. |
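Because these outcomes differ so sharply, an audit pass that tags each post by storage format helps size the job before anything is converted. The sketch below assumes you already have each post's raw `post_content` and its `postmeta` key/value pairs (for example from a DB export); `PostRecord`, `detectStorageFormat`, and `acfFieldNames` are hypothetical names. The ACF check relies on ACF's convention of saving a `_fieldname` row whose value is the field key (`field_...`) next to each `fieldname` value row.

```typescript
// Hypothetical audit record: raw post_content plus that post's wp_postmeta rows as key -> value.
interface PostRecord {
  id: number;
  postContent: string;
  meta: Record<string, string>;
}

type StorageFormat = "elementor" | "bricks" | "divi" | "gutenberg" | "classic";

const hasKey = (obj: Record<string, string>, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Classify a post by where its layout lives.
function detectStorageFormat(post: PostRecord): StorageFormat {
  // Builders that keep the layout in postmeta are checked first, whatever post_content holds.
  if (hasKey(post.meta, "_elementor_data")) return "elementor";
  if (hasKey(post.meta, "_bricks_page_content_2")) return "bricks";
  // Divi 4 writes [et_pb_*] shortcodes straight into post_content.
  if (/\[et_pb_[a-z_]+/.test(post.postContent)) return "divi";
  // Gutenberg delimits blocks with <!-- wp:name --> comments.
  if (/<!--\s+wp:[a-z]/.test(post.postContent)) return "gutenberg";
  return "classic";
}

// ACF saves "name" -> value plus "_name" -> "field_<key>"; the underscore rows reveal which keys are ACF fields.
function acfFieldNames(meta: Record<string, string>): string[] {
  return Object.keys(meta)
    .filter((key) => key.startsWith("_") && meta[key].startsWith("field_"))
    .map((key) => key.slice(1))
    .filter((name) => hasKey(meta, name));
}
```

Counting posts per format (and listing ACF field names per post type) before the migration starts is what turns "catastrophic if discovered mid-migration" into a line item.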
Image migration (where projects overrun):
- Export `/wp-content/uploads/` via SFTP. (WP-CLI's core `wp media` command has no export subcommand.)
- Upload to the new host (Cloudflare Images, Sanity CDN, Vercel Blob, or the new repo's `public/`).
- Rewrite every `<img src>` and every Markdown image (`![alt](url)`) in migrated content: regex against the old domain + `/wp-content/uploads/`.
- Regenerate responsive sizes via the new stack's image pipeline.
- Budget time for this: Outsourcify (7,000-article migration) flagged it as a major engineering challenge.
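A minimal sketch of the media URL rewrite, assuming the new host serves files under one base URL with the same `YYYY/MM/filename` layout as `/wp-content/uploads/`. The names `rewriteMediaUrls` and `stripSizeSuffix` are illustrative. One regex covers `<img src>`, `srcset` entries, and Markdown images, since all of them embed the same absolute URL.

```typescript
// Escape a literal string for use inside a RegExp.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// WordPress names generated sizes "<name>-<W>x<H>.<ext>"; this maps them back to the original file.
function stripSizeSuffix(path: string): string {
  return path.replace(/-\d+x\d+(\.[a-z0-9]+)$/i, "$1");
}

// Rewrite absolute uploads URLs on the old host (http, https, or protocol-relative, optional www)
// to newBase + the same year/month/file path. The match stops at whitespace, quotes, parens,
// or angle brackets, so it works inside src="...", srcset lists, and Markdown ![alt](url).
function rewriteMediaUrls(content: string, oldHost: string, newBase: string, stripSizes = false): string {
  const pattern = new RegExp(
    `(?:https?:)?//(?:www\\.)?${escapeRegExp(oldHost)}/wp-content/uploads/([^\\s"'()<>]+)`,
    "gi",
  );
  const base = newBase.replace(/\/$/, "");
  return content.replace(pattern, (_match: string, path: string) =>
    `${base}/${stripSizes ? stripSizeSuffix(path) : path}`,
  );
}
```

Stripping the `-WIDTHxHEIGHT` suffix points every reference at the original upload, which suits a stack that regenerates sizes itself. With stripping on, any surviving `srcset` lists the same file repeatedly, so it is usually cleaner to drop `srcset`/`sizes` and let the new pipeline emit them.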
Internal link rewriting: a one-shot script that scans all body content for absolute `https://oldsite.com/...` or relative `/2019/...` links and rewrites them to their new URLs.
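One way to write that script, assuming a lookup table from old permalink paths to new routes has been built beforehand (for instance from the `link` field of each REST item). The function name and the `pathMap` shape are hypothetical; links missing from the map are left unchanged so they can be reviewed by hand.

```typescript
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Rewrite links in href="..." / href='...' and Markdown [text](url) using a prebuilt
// old-path -> new-path map. Absolute links on the old host and root-relative links are both handled.
function rewriteInternalLinks(content: string, oldHost: string, pathMap: Map<string, string>): string {
  const origin = `https?://(?:www\\.)?${escapeRegExp(oldHost)}`;
  const linkPattern = new RegExp(`(href=["']|\\]\\()((?:${origin})?/[^"'\\s)]*)`, "gi");
  const originPrefix = new RegExp(`^${origin}`, "i");
  return content.replace(linkPattern, (match: string, opener: string, url: string) => {
    // Split "/2019/05/post/#comments" into the path and its ?query/#fragment suffix.
    const [, path, suffix] = /^([^?#]*)(.*)$/.exec(url.replace(originPrefix, ""))!;
    // Normalise the trailing slash so "/2019/05/post" and "/2019/05/post/" hit the same entry.
    const key = path.endsWith("/") ? path : `${path}/`;
    const target = pathMap.get(key);
    return target ? `${opener}${target}${suffix}` : match; // unknown links stay as-is for manual review
  });
}
```

Links to other domains never match, because the pattern only accepts the old host or a root-relative path after the opening `href="` or `](`.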
Comments: for SMB sites with fewer than 500 lifetime comments, sunset them gracefully (export to JSON, archive, and replace with Giscus, which runs on GitHub Discussions). For active commenter communities, migrate to Disqus or keep WordPress headless.
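A sketch of the JSON export for the sunset path, assuming comments were pulled from `/wp-json/wp/v2/comments` and paginated the same way as posts. The interface lists only a subset of the REST comment fields (`id`, `post`, `parent`, `author_name`, `date_gmt`, `content.rendered`); `buildCommentArchive` and the archive shape are illustrative. Replies are nested under their parent so a static page can render threads.

```typescript
// Subset of the fields GET /wp-json/wp/v2/comments returns in the default "view" context.
interface WpComment {
  id: number;
  post: number;
  parent: number; // 0 for top-level comments
  author_name: string;
  date_gmt: string; // e.g. "2019-05-01T09:00:00"
  content: { rendered: string };
}

// Illustrative archive shape: one thread tree per post ID.
interface ArchivedComment {
  id: number;
  author: string;
  dateGmt: string;
  html: string;
  replies: ArchivedComment[];
}

function buildCommentArchive(comments: WpComment[]): Record<number, ArchivedComment[]> {
  const nodes = new Map<number, ArchivedComment>();
  for (const c of comments) {
    nodes.set(c.id, { id: c.id, author: c.author_name, dateGmt: c.date_gmt, html: c.content.rendered, replies: [] });
  }
  const archive: Record<number, ArchivedComment[]> = {};
  // Oldest first, so threads and replies read chronologically (same-format ISO timestamps sort as strings).
  const ordered = [...comments].sort((a, b) => (a.date_gmt < b.date_gmt ? -1 : a.date_gmt > b.date_gmt ? 1 : 0));
  for (const c of ordered) {
    const node = nodes.get(c.id)!;
    const parent = c.parent ? nodes.get(c.parent) : undefined;
    if (parent) {
      parent.replies.push(node);
    } else {
      // Top-level comment, or a reply whose parent was not exported (e.g. it was deleted).
      if (!archive[c.post]) archive[c.post] = [];
      archive[c.post].push(node);
    }
  }
  return archive;
}
```

`JSON.stringify(buildCommentArchive(allComments), null, 2)` then gives a file that can be committed alongside the new site and rendered read-only above the Giscus widget.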
Related
Referenced by (3):
- Research brief: The Candid Creative WordPress Migration Playbook (piece 19) [depends-on]
- Feature-parity replacements for common WordPress plugins (forms, SEO, search, comments, commerce, membership, newsletter, analytics) [relates-to]
- Migration objection-handling map — sourced answers to every common client fear about migrating off WordPress [depends-on]