{"id":660,"slug":"ai-dev-productivity-realistic-ceiling-2026","title":"AI-assisted dev productivity 2026: published evidence ranges from 21-28% speedup (GovTech) to 19% slowdown (METR RCT) — honest ceiling is 1.3-1.7× for fluent users","kind":"reference","scope":"business","status":"current","audiences":["kevin","candid-team"],"topics":["agency-methodology","ai-dev-productivity"],"reference_body":"**Published research range — three primary sources:**\n\n- **GovTech Singapore GitHub Copilot study (arXiv:2409.17434):** *\"coding/tasks speed increased by 21-28%\"*.\n- **Longitudinal arXiv:2509.19708 (100+ developers):** *\"approximately 30-40% of code shipped to production through this tool accounts for overall 28% increase in code shipment volume\"*.\n- **METR 2025 RCT (arXiv:2507.09089, n=16 experienced OSS developers):** *\"allowing AI actually increases completion time by 19%—AI tooling slowed developers down,\"* despite developers forecasting a 24% reduction.\n\n**Confidence:** Moderate. The field is moving fast and results are workload-dependent.\n\n**Translation:**\n\n- Fluent users with skill-library discipline → meaningful (20-30%) lifts.\n- Naive power-users on unfamiliar codebases → can be slower than baseline.\n- Honest ceiling for a 2-person Candid Creative shop: **ship 1.3-1.7× the marketing-site work a 2024-equivalent shop could** once the team is fluent.\n- **Cap is set by client communication, discovery, and design — not coding.**\n\n**Where AI benefits most (high confidence):**\n\n- Greenfield Astro/Tailwind scaffolding.\n- Boilerplate refactors (renaming, type-narrowing, API client updates).\n- Framework migrations (Tailwind v3→v4, Astro 4→5→6, Next.js 14→15→16).\n- Writing tests for existing code.\n- One-off scripts and CMS schemas.\n\n**Where it benefits least (moderate confidence):**\n\n- Performance optimization (requires human judgment about what to measure).\n- Accessibility decisions.\n- Information architecture and content strategy.\n- Anything requiring institutional/client context the model doesn't have.\n\n**Push back on hype:** \"Vibe coding\" entire production marketing sites without review produces fragile code. The shops that win in 2026 use AI to accelerate the boring and keep humans on the judgment.","rationale_body":null,"metadata":null,"links":{"outgoing":[{"slug":"rule-pay-for-claude-code-plus-cursor-or-copilot","title":"RULE: Every Candid developer gets paid Claude Code + Cursor (or Copilot). ~$30-40/mo/person is not a real number against the productivity lift.","kind":"rule","scope":"business","link_type":"relates-to"}],"incoming":[{"slug":"research-brief-2026-build-standards","title":"Research brief: Candid Creative 2026 Build-Standards — web stack decision framework for SMB marketing sites & lightweight apps (piece 16)","kind":"reference","scope":"business","link_type":"relates-to"}]},"created_at":"2026-05-22T21:24:18.331Z","updated_at":"2026-05-22T21:24:18.331Z"}