Zillow: 110M-home "living database" built on Census/ACS + 3,000 county assessors + USPS + MLS feeds

Claim: Zillow's database is "built on a backbone of administrative data": Census and American Community Survey (ACS) data, tax assessments from ~3,000 counties, and sales records. The Zestimate model sits on top of public records, not beside them.

Sources:

Confidence: Verified.

Practitioner reference (from Zillow's own engineering blog):

  • Address Validation Service runs assessor records against a GIS table of ~500,000 city/state/zip/county permutations to catch upstream errors before they reach the front end
  • FillRate, computed per field per county, tracks data completeness (the share of records in which that field is populated)
  • Transaction Latency = Median(Transaction Received Date − Transaction Recorded Date), the typical lag between a transaction being recorded and the record being received: the cleanest "speed-of-data" metric in public real-estate engineering
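
The three checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Zillow's implementation: the record layout, the field names (sqft, year_built, recorded, received), the three-row reference table, and every function name are hypothetical. Latency is computed here as received date minus recorded date, so a positive number means days of delay; that sign convention is also an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical stand-in for the GIS reference table (~500,000 rows in the text).
VALID_LOCATIONS = {
    ("SEATTLE", "WA", "98101", "KING"),
    ("SEATTLE", "WA", "98109", "KING"),
    ("TACOMA", "WA", "98402", "PIERCE"),
}
LOCATION_FIELDS = ("city", "state", "zip", "county")


def normalize(value):
    """Trim and uppercase so casing or spacing differences do not cause mismatches."""
    return str(value or "").strip().upper()


def is_valid_location(record):
    """True if the record's city/state/zip/county combination is in the reference table."""
    key = tuple(normalize(record.get(f)) for f in LOCATION_FIELDS)
    return key in VALID_LOCATIONS


def is_filled(value):
    """A field counts as filled when it is neither None nor blank."""
    return value is not None and str(value).strip() != ""


def fill_rate(rows, field):
    """Share of rows with a filled value for `field` (0.0 when there are no rows)."""
    if not rows:
        return 0.0
    return sum(is_filled(r.get(field)) for r in rows) / len(rows)


def fill_rate_by_county(records, fields):
    """Return {county: {field: fill rate}}, one entry per county seen in the records."""
    by_county = defaultdict(list)
    for r in records:
        by_county[normalize(r.get("county"))].append(r)
    return {
        county: {f: fill_rate(rows, f) for f in fields}
        for county, rows in by_county.items()
    }


def transaction_latency_days(records):
    """Median of (received - recorded) in days; raises StatisticsError on empty input."""
    return median((r["received"] - r["recorded"]).days for r in records)


# Made-up sample records; the third pairs Tacoma with a zip not listed for it above.
SAMPLE = [
    {"city": "Seattle", "state": "WA", "zip": "98101", "county": "King",
     "sqft": 1200, "year_built": 1990,
     "recorded": date(2020, 1, 2), "received": date(2020, 1, 12)},
    {"city": "Seattle", "state": "WA", "zip": "98109", "county": "King",
     "sqft": None, "year_built": 2005,
     "recorded": date(2020, 1, 5), "received": date(2020, 1, 9)},
    {"city": "Tacoma", "state": "WA", "zip": "98101", "county": "Pierce",
     "sqft": 900, "year_built": "",
     "recorded": date(2020, 2, 1), "received": date(2020, 2, 21)},
]

print([is_valid_location(r) for r in SAMPLE])
print(fill_rate_by_county(SAMPLE, ["sqft", "year_built"]))
print(transaction_latency_days(SAMPLE))
```

Keeping the reference table as a set of normalized tuples makes each lookup a single hash check, which is why the sketch normalizes case and whitespace before comparing. A production table would presumably also need alias handling (alternate city names, zip codes that span counties), which is out of scope here.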

The pattern: Zillow is fundamentally an open-data company. The MLS feed is value-added; the public-records cleaning is the moat. Note: Zillow's ZTRAX dataset was discontinued in 2023, but the technical writing remains the best public account of how a major operator handles public-records cleaning.