**Zabbix** is an open-source, enterprise-class monitoring solution for network infrastructure and hardware, the part of the stack that [[Prometheus]] doesn't handle well. Where Prometheus excels at cloud-native application metrics, Zabbix excels at SNMP polling, IPMI sensor monitoring, and network device health: the physical layer of a data center.

---

### First Principle: Software monitoring starts from the hardware up.

If a disk is failing or a CPU is overheating, you need to know before the OS does. [[Prometheus]] exporters can collect OS-level metrics, but they require a working OS on the monitored host. Zabbix reaches below the OS via IPMI to read BMC sensor data (temperatures, fan speeds, power draw, disk health) and via SNMP to poll network switches, PDUs, and UPSes.

---

### Key Considerations

- **IPMI & SNMP**: Zabbix's strongest differentiators. IPMI polling reads hardware sensor data from [[OpenBMC]] or proprietary BMCs. SNMP monitoring covers [[SONiC]] switches, network gear, and PDUs.
- **Agent-Based Monitoring**: For servers with a working OS, the Zabbix agent collects metrics (CPU, memory, disk I/O, processes, log monitoring) with low overhead.
- **Auto-Discovery**: Zabbix automatically discovers hosts via SNMP walks, IP range scanning, and agent auto-registration, which is useful when new hardware is added to the fleet.
- **Templates**: Pre-built templates for common hardware and software encode hundreds of checks and alerts. Community templates cover almost every vendor.
- **Alerting & Escalation**: Supports escalation chains: notify L1 first, then L2 if the problem is unacknowledged, then management. Integrates with PagerDuty and Opsgenie.
- **vs [[Prometheus]]**: Not a competition; they're complementary. Prometheus for application and container metrics; Zabbix for hardware, IPMI, and network device monitoring.
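
The escalation chain described under Alerting & Escalation can be sketched in a few lines of Python. This is only an illustration of the logic, not how Zabbix implements it; Zabbix configures escalations through its own action settings rather than code. The names `EscalationStep`, `CHAIN`, and `notified_tiers`, along with the 15- and 45-minute delays, are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # delay since the problem started (hypothetical value)
    recipient: str      # tier to notify at this step


# Hypothetical chain and delays, chosen only for illustration.
CHAIN = (
    EscalationStep(after_minutes=0, recipient="L1"),
    EscalationStep(after_minutes=15, recipient="L2"),
    EscalationStep(after_minutes=45, recipient="management"),
)


def notified_tiers(elapsed_minutes: int, acked_at: Optional[int] = None,
                   chain=CHAIN) -> list:
    """Return the tiers notified so far for one open problem.

    A step fires once its delay has elapsed, unless the problem was
    acknowledged at or before the moment that step came due.
    """
    tiers = []
    for step in chain:
        due = step.after_minutes <= elapsed_minutes
        acked_before_step = (
            acked_at is not None and acked_at <= step.after_minutes
        )
        if due and not acked_before_step:
            tiers.append(step.recipient)
    return tiers


if __name__ == "__main__":
    print(notified_tiers(5))                # ['L1']
    print(notified_tiers(60))               # ['L1', 'L2', 'management']
    print(notified_tiers(60, acked_at=10))  # ['L1']
```

The point of the sketch is that acknowledgement, not resolution, is what stops the chain: an L1 engineer who acknowledges at minute 10 prevents the L2 and management notifications even if the problem stays open for an hour.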

---

### How It Fits

```
[[OpenBMC]] / IPMI sensors + SNMP network devices
    → Zabbix (hardware and network monitoring)
    → [[Prometheus]] (application and OS metrics)
    → [[Grafana]] (unified dashboards)
```

[[OpenBMC]] | [[Prometheus]] | [[Grafana]] | [[SONiC]] | [[Open Source Hyperscaler MoC]]