**Prometheus** is an open-source metrics collection and alerting system originally built at SoundCloud and now a CNCF graduated project. It uses a pull-based model, scraping metrics from instrumented endpoints at regular intervals, and stores them in a local time-series database that is queried with its own language, PromQL.
---
### First Principle: Metrics are the numerical heartbeat of a system. Pull them on your schedule; don't let services control when you hear from them.
The pull model means Prometheus defines what it monitors. A service exposes a `/metrics` endpoint; Prometheus decides when to scrape it. If a service stops responding, the next scrape fails and Prometheus records `up` as `0` for that target, so the outage becomes visible within one scrape interval. In a push-based system, a dead service and a service with nothing to report can look identical: in both cases no data arrives.
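
The pull model described above can be sketched with the Python standard library alone. This is a minimal illustration, not how a production service would be instrumented (that would normally use an official client library). The metric name `demo_requests_total`, the `scrape` helper, and the `demo` function are hypothetical names chosen for this example; the response body follows the Prometheus text exposition format.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS_TOTAL = 0  # hypothetical counter owned by the demo service


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Requests handled by the demo service.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {REQUESTS_TOTAL}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """The service side: it only answers when somebody asks."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def scrape(url: str, timeout: float = 2.0):
    """The monitoring side: return (up, body), where up is 1 on success and 0 on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 1, resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        return 0, ""


def demo():
    """Scrape a live target, stop it, then scrape again to see `up` drop to 0."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 picks a free port
    url = f"http://127.0.0.1:{server.server_address[1]}/metrics"
    threading.Thread(target=server.serve_forever, daemon=True).start()
    up_before, body = scrape(url)
    server.shutdown()
    server.server_close()
    up_after, _ = scrape(url)
    return up_before, body, up_after
```

The point of the sketch is that the scraper, not the service, produces the health signal: a failed scrape is itself a data point, which is the role the `up` series plays in Prometheus.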
---
### Key Considerations
- **PromQL**: Prometheus's query language enables real-time aggregation, rate calculations, percentile estimations, and multi-dimensional slicing across any label combination.
- **Service Discovery**: Prometheus discovers targets through the [[Kubernetes]] API (commonly filtered on pod annotations via relabeling), [[OpenStack]] Nova APIs, Consul, and more, so new services are picked up automatically when they appear.
- **Alertmanager**: A companion component that handles alert routing, deduplication, grouping, and notification — sending alerts to Slack, PagerDuty, email, and webhooks.
- **Exporters**: For systems that don't natively expose `/metrics`, exporters bridge the gap: `node_exporter` for Linux host metrics, `ceph_exporter` for [[Ceph]], SNMP exporter for network gear.
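- **Scale Limitations**: A single Prometheus instance stores data on local disk without clustering or replication, and keeps 15 days of data by default. It is not designed for long-term storage or multi-cluster federation at scale. [[Thanos]] or [[Mimir]] address this.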
- **[[Grafana]] Integration**: Prometheus is one of the most widely used [[Grafana]] data sources for building dashboards, and the Prometheus + Grafana combination is a de facto standard open-source monitoring stack.
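
To make the rate calculations mentioned under PromQL concrete, here is a simplified Python sketch of what an expression such as `rate(http_requests_total[5m])` computes for one series (the metric name is illustrative). It is an approximation under stated assumptions: the real `rate()` also extrapolates to the edges of the window, which this version omits. The function name `simple_rate` and the sample data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter across the given samples.

    Simplification: no extrapolation to the window boundaries, unlike PromQL.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")

    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr < prev:
            # A counter never decreases on its own, so a drop is treated as a
            # reset (for example a process restart) and counting resumes from 0.
            increase += curr
        else:
            increase += curr - prev

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


# Five scrapes, 15 seconds apart, with a counter reset before the third one.
window = [(0, 100), (15, 130), (30, 10), (45, 40), (60, 60)]
print(simple_rate(window))  # 1.5 requests per second
```

Handling the reset is the reason counters are queried through `rate()` rather than by subtracting the first raw value from the last: a naive subtraction over this window would give a negative result.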
---
### How It Fits
```
Services + exporters (expose /metrics endpoints)
→ Prometheus (scrapes, stores, evaluates alert rules)
    → Alertmanager (routes alerts to on-call)
    → [[Grafana]] (dashboards and visualisation)
    → [[Thanos]] / [[Mimir]] (long-term storage at scale)
```
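
Downstream consumers in this pipeline, such as dashboards, read from Prometheus over its HTTP API. The sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses a response of result type `vector`. The helper names, the base URL `http://localhost:9090`, and the hosts in the sample payload are assumptions for illustration; the payload is hand-written in the documented response shape rather than captured from a real server.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: str) -> dict:
    """Map each series' label set to its sample value (returned as float)."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")

    values = {}
    for series in data["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        values[labels] = float(value)
    return values


# Hand-written sample in the documented response shape (hypothetical hosts).
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "host-a:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node", "instance": "host-b:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
})

print(instant_query_url("http://localhost:9090", "up"))
targets = parse_instant_vector(SAMPLE)
print(f"{int(sum(targets.values()))} of {len(targets)} targets are up")
```

Sample values are encoded as strings in the JSON response, which is why the parser converts them explicitly instead of relying on JSON number handling.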
[[Grafana]] | [[Thanos]] | [[Mimir]] | [[OpenTelemetry]] | [[Zabbix]] | [[Open Source Hyperscaler MoC]]