**Prometheus** is an open-source metrics collection and alerting system originally built at SoundCloud and now a CNCF graduated project. It uses a pull-based model, scraping metrics from instrumented endpoints at regular intervals, and stores them in a time-series database queried with its own language, PromQL.

---

### First Principle: Metrics are the numerical heartbeat of a system. Pull them on your schedule; don't let services control when you hear from them.

The pull model means Prometheus defines what it monitors. A service exposes a `/metrics` endpoint; Prometheus decides when to scrape it. If a service stops responding, the next scrape fails and Prometheus records the target as down (`up == 0`). A push-based system that simply receives no data cannot tell a dead service from one that has nothing to report.

---

### Key Considerations

- **PromQL**: Prometheus's query language supports real-time aggregation, rate calculations, quantile (percentile) estimation, and multi-dimensional slicing across any label combination.
- **Service Discovery**: Prometheus auto-discovers targets from the [[Kubernetes]] API (commonly filtered by pod annotations), [[OpenStack]] Nova APIs, Consul, and more, so new services are monitored automatically when they appear.
- **Alertmanager**: A companion component that handles alert routing, deduplication, grouping, and notification, sending alerts to Slack, PagerDuty, email, and webhooks.
- **Exporters**: For systems that don't natively expose `/metrics`, exporters bridge the gap: `node_exporter` for Linux hosts, `ceph_exporter` for [[Ceph]], the SNMP exporter for network gear.
- **Scale Limitations**: A single Prometheus instance is not designed for long-term storage or for querying across many clusters at scale. [[Thanos]] and [[Mimir]] address this by adding durable long-term storage and a global query view.
- **[[Grafana]] Integration**: [[Grafana]] supports Prometheus as a first-class data source for building dashboards; Prometheus + Grafana is a de facto standard monitoring stack.

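
The pull model described above can be sketched with the Python standard library alone. This is a minimal illustration, not how Prometheus itself is implemented: the metric name `demo_requests_total`, the `REQUESTS_TOTAL` dictionary, and the `scrape` and `demo` helpers are all hypothetical, and a real service would normally use an official client library rather than hand-rendering the text format. The sketch exposes a `/metrics` endpoint, pulls it once, then pulls again after the service has stopped to show how a failed scrape becomes an explicit "down" signal.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters standing in for real instrumentation.
REQUESTS_TOTAL = {"GET": 3, "POST": 1}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by method.",
        "# TYPE demo_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def scrape(url, timeout=2.0):
    """Pull one scrape and return (up, samples); up is 0 if the pull failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    except OSError:
        return 0, {}  # connection refused, timeout, HTTP error: target is down
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, value = line.rsplit(" ", 1)
            samples[name] = float(value)
    return 1, samples


def demo():
    """Scrape a live target, stop it, then scrape again."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0 picks a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{port}/metrics"
    while_up = scrape(url)
    server.shutdown()
    server.server_close()
    while_down = scrape(url, timeout=0.5)
    return while_up, while_down


if __name__ == "__main__":
    print(demo())
```

Because the scraper initiates every request, the second call returns an explicit `up` value of 0 instead of silence. That is the property the pull model is chosen for.
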
---

### How It Fits

```
Services + exporters (expose /metrics endpoints)
  → Prometheus (scrapes, stores, alerts)
  → Alertmanager (routes alerts to on-call)
  → [[Grafana]] (dashboards and visualisation)
  → [[Thanos]] / [[Mimir]] (long-term storage at scale)
```

[[Grafana]] | [[Thanos]] | [[Mimir]] | [[OpenTelemetry]] | [[Zabbix]] | [[Open Source Hyperscaler MoC]]