Zillow: 110M-home "living database" built on Census/ACS + 3,000 county assessors + USPS + MLS feeds
Created 2026-05-22
Claim: Zillow's database is "built on a backbone of administrative data" — Census, ACS, ~3,000 county tax assessments, sales records. The Zestimate model sits on top of public records, not beside them.
Sources:
- https://apps.bea.gov/fesac/meetings/2016-06-10/Rao-Presentation-The-Zillow-Experience.pdf
- https://www.zillow.com/tech/public-data-challenges/
Confidence: Verified.
Practitioner reference (from Zillow's own engineering blog):
- Address Validation Service runs assessor records against a GIS table of ~500,000 city/state/zip/county permutations to catch upstream errors before they reach the front end
- FillRate, computed per field per county, tracks data completeness (the share of records in which that field is populated)
- Transaction Latency = Median (Transaction Received Date − Transaction Recorded Date), i.e. the median delay between a transaction being recorded and the record being received; the cleanest "speed-of-data" metric in public real-estate engineering
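The three checks above can be sketched in a few lines of Python. This is a minimal illustration, not Zillow's implementation: the record layout, the field names (`sqft`, `year_built`, `recorded`, `received`), the tiny `VALID_LOCALITIES` set standing in for the ~500,000-row GIS table, and every function name are hypothetical. Latency is measured here as received minus recorded, so that a delay comes out positive; that sign convention is an assumption.

```python
from datetime import date
from statistics import median

# Hypothetical stand-in for the GIS table of valid
# city/state/zip/county permutations (the real table has ~500,000 rows).
VALID_LOCALITIES = {
    ("Seattle", "WA", "98101", "King"),
    ("Tacoma", "WA", "98402", "Pierce"),
}

def is_valid_locality(record):
    """True if the record's city/state/zip/county combination is a known permutation."""
    key = (record.get("city"), record.get("state"),
           record.get("zip"), record.get("county"))
    return key in VALID_LOCALITIES

def fill_rate(records, field):
    """Share of records whose `field` is populated (neither None nor empty string)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def fill_rate_by_county(records, fields):
    """FillRate per field per county, as a nested dict: county -> field -> rate."""
    by_county = {}
    for r in records:
        by_county.setdefault(r["county"], []).append(r)
    return {county: {f: fill_rate(rows, f) for f in fields}
            for county, rows in by_county.items()}

def transaction_latency_days(transactions):
    """Median number of days between recording and receipt (assumed sign convention)."""
    return median((t["received"] - t["recorded"]).days for t in transactions)

# Illustrative assessor records; the third has a zip that does not match its city/county.
records = [
    {"city": "Seattle", "state": "WA", "zip": "98101", "county": "King",
     "sqft": 1200, "year_built": 1990},
    {"city": "Seattle", "state": "WA", "zip": "98101", "county": "King",
     "sqft": None, "year_built": 2005},
    {"city": "Tacoma", "state": "WA", "zip": "98101", "county": "Pierce",
     "sqft": 900, "year_built": None},
]

# Illustrative transactions with lags of 10, 30, and 5 days.
transactions = [
    {"recorded": date(2024, 1, 2), "received": date(2024, 1, 12)},
    {"recorded": date(2024, 1, 5), "received": date(2024, 2, 4)},
    {"recorded": date(2024, 1, 10), "received": date(2024, 1, 15)},
]

rejected = [r for r in records if not is_valid_locality(r)]
rates = fill_rate_by_county(records, ["sqft", "year_built"])
latency = transaction_latency_days(transactions)
```

With this sample data, `rejected` holds the mismatched Tacoma record, `rates["King"]["sqft"]` is 0.5, and `latency` is 10 days. The median is used rather than the mean so that a handful of very late filings does not dominate the metric.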
The pattern: Zillow is fundamentally an open-data company. The MLS feed adds value on top; the public-records cleaning is the moat.
Note: Zillow's ZTRAX dataset was discontinued in 2023, but Zillow's technical writing remains the best public account of how a major operator handles public-records cleaning.
Referenced by (3)
- reference ATTOM Data: 500M+ real estate/loan transactions, 2,690+ counties, 20-step Enterprise Data Management Program relates-to
- reference Carfax: from 10,000 records faxed in 1986 to 35B+ records across 151,000+ sources — sold to S&P Global Mobility 2022 relates-to
- reference Research brief: Public data as a private moat — building proprietary intelligence from government open data (piece 11 of 15) relates-to