# Data Moat

Data is a moat only when it is (a) proprietary, (b) compounding, and (c) actually used. Most "data moat" claims fail on at least one of the three.

## The three conditions

- **Proprietary**: the data is not trivially re-collectable by a competitor from public sources or third-party APIs. Watch for datasets that are actually public-internet scrapes dressed up as proprietary.
- **Compounding**: the dataset grows with usage in a way competitors cannot shortcut. One-time scrapes do not compound. Transactional interactions inside a workflow do.
- **Actually used**: the data materially improves the product. Data sitting in a warehouse that never touches the model or the UX is storage cost, not a moat.

## Strongest forms of data moat

- **Interaction data** inside a critical workflow: every customer touch adds a labelled example.
- **Consented labelled outcomes**: the customer confirms whether the result was good or bad, giving clean RLHF-grade supervision.
- **Rare-event corpora**: edge cases that others would have to wait years to accumulate.
- **Cross-customer aggregations**: data that no customer would share individually, but that benefits everyone in aggregate.

## Weakest forms

- Scraped public data.
- Usage logs with no labels.
- Customer-owned data the product cannot reuse under the contract.

See also: [[Data Flywheel]], [[AI era Defensibility]], [[The Age of Vertical Models]], [[AI Agent Vertical SaaS DD MOC]]

---

Tags: #AIstrategy #defensibility #investing