# Data Moat
Data is a moat only when it is (a) proprietary, (b) compounding, and (c) actually used. Most "data moat" claims fail on one of the three.
## The three conditions
- **Proprietary** — the data is not trivially re-collectable by a competitor from public sources or third-party APIs. Watch for datasets that are actually public-internet scrapes dressed up as proprietary.
- **Compounding** — the dataset grows with usage in a way competitors cannot shortcut. One-time scrapes do not compound. Transactional interactions inside a workflow do.
- **Actually used** — the data materially improves the product. Data sitting in a warehouse that never touches the model or the UX is storage cost, not a moat.
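The three conditions work as a conjunctive checklist: failing any one disqualifies the claim. A minimal sketch of that test, assuming each condition has already been judged yes/no during diligence. The `DataMoatClaim` record, its field names, and the two example claims are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMoatClaim:
    """Hypothetical diligence record; field names are illustrative only."""
    name: str
    proprietary: bool    # not trivially re-collectable from public sources or third-party APIs
    compounding: bool    # grows with usage in a way competitors cannot shortcut
    actually_used: bool  # materially improves the product (model or UX)


def failed_conditions(claim: DataMoatClaim) -> list[str]:
    """Return the conditions the claim fails, in a fixed order."""
    checks = {
        "proprietary": claim.proprietary,
        "compounding": claim.compounding,
        "actually used": claim.actually_used,
    }
    return [label for label, ok in checks.items() if not ok]


def is_moat(claim: DataMoatClaim) -> bool:
    """A claim counts as a moat only when all three conditions hold."""
    return not failed_conditions(claim)


# Two made-up claims: a one-time scrape versus transactions captured in a workflow.
scrape = DataMoatClaim("one-time public scrape", proprietary=False,
                       compounding=False, actually_used=True)
workflow = DataMoatClaim("in-workflow transactions", proprietary=True,
                         compounding=True, actually_used=True)

for claim in (scrape, workflow):
    print(claim.name, is_moat(claim), failed_conditions(claim))
```

Returning the list of failed conditions, rather than a bare boolean, keeps the note's point visible: the useful question is which of the three a given claim fails on.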
## Strongest forms of data moat
- **Interaction data** inside a critical workflow — every customer touch adds a labelled example.
- **Consented labelled outcomes** — the customer explicitly confirms whether the result was good or bad, giving clean, RLHF-grade supervision.
- **Rare-event corpora** — edge cases others have to wait years to accumulate.
- **Cross-customer aggregations** of data that no single customer would share on its own, but that every customer benefits from in aggregate.
## Weakest forms
- Scraped public data.
- Usage logs with no labels.
- Customer-owned data the product cannot reuse under the contract.
See also: [[Data Flywheel]], [[AI era Defensibility]], [[The Age of Vertical Models]], [[AI Agent Vertical SaaS DD MOC]]
---
Tags: #AIstrategy #defensibility #investing