**Zabbix** is an open-source, enterprise-class monitoring solution for network infrastructure and hardware, the part of the stack that [[Prometheus]] doesn't handle well. Where Prometheus excels at cloud-native application metrics, Zabbix is strongest at SNMP polling, IPMI sensor monitoring, and network device health: the physical layer of a data center.
---
### First Principle: Monitoring starts from the hardware up. If a disk is failing or a CPU is overheating, you need to know before the OS does.
[[Prometheus]] exporters can collect OS-level metrics, but they require a working OS on the monitored host. Zabbix reaches below the OS: via IPMI it reads BMC sensor data (temperatures, fan speeds, power draw, disk health), and via SNMP it polls network switches, PDUs, and UPSes. Because the BMC runs independently of the host OS, these readings stay available even when the host itself is hung or powered off.
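To make the BMC sensor data concrete, here is a minimal sketch that parses text in the pipe-delimited layout printed by `ipmitool sensor` and picks out readings whose status is not `ok`. The sample sensor names and values are made up for illustration, the helper names (`parse_sensor_table`, `needs_attention`) are hypothetical, and this is not how Zabbix's own IPMI poller is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sample text in the layout of `ipmitool sensor` output.
# Sensor names and values are invented for illustration.
SAMPLE_OUTPUT = """\
CPU Temp     | 45.000   | degrees C | ok | 0.000   | 0.000   | 0.000   | 95.000    | 100.000   | 100.000
Inlet Temp   | 48.000   | degrees C | nc | 0.000   | 0.000   | 5.000   | 42.000    | 50.000    | 55.000
FAN1         | 5400.000 | RPM       | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
PSU1 Power   | na       | Watts     | na | na      | na      | na      | na        | na        | na
"""


@dataclass
class SensorReading:
    name: str
    value: Optional[float]  # None when the reading is not numeric (e.g. "na")
    unit: str
    status: str


def parse_sensor_table(text: str) -> List[SensorReading]:
    """Parse the first four columns: name, value, unit, status."""
    readings = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, raw_value, unit, status = fields[:4]
        try:
            value = float(raw_value)
        except ValueError:
            value = None
        readings.append(SensorReading(name, value, unit, status))
    return readings


def needs_attention(readings: List[SensorReading]) -> List[SensorReading]:
    """Numeric sensors whose status column is anything other than 'ok'."""
    return [r for r in readings if r.value is not None and r.status != "ok"]


if __name__ == "__main__":
    for reading in needs_attention(parse_sensor_table(SAMPLE_OUTPUT)):
        print(f"{reading.name}: {reading.value} {reading.unit} ({reading.status})")
```

In practice the text would come from the BMC rather than a string constant; the parsing is kept separate from collection so it can be checked without hardware.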
---
### Key Considerations
- **IPMI & SNMP**: Zabbix's strongest differentiators. IPMI polling reads hardware sensor data from [[OpenBMC]] or proprietary BMCs. SNMP monitoring covers [[SONiC]] switches, network gear, and PDUs.
- **Agent-Based Monitoring**: For servers with working OSes, the Zabbix agent collects metrics (CPU, memory, disk I/O, processes, log monitoring) with low overhead.
- **Auto-Discovery**: Zabbix can discover hosts by scanning IP ranges with checks such as ICMP ping, SNMP, and Zabbix agent probes, and active agents can register themselves with the server. This is useful when new hardware is added to the fleet.
- **Templates**: Pre-built templates for common hardware and software bundle the checks and alert definitions for a device type, often hundreds per template. Community templates exist for many major vendors.
- **Alerting & Escalation**: Supports escalation chains — notify L1 first, then L2 if unacknowledged, then management. Integrates with PagerDuty and Opsgenie.
- **vs [[Prometheus]]**: Not a competition — they're complementary. Prometheus for application and container metrics; Zabbix for hardware, IPMI, and network device monitoring.
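The escalation chain described above (L1 first, then L2 if unacknowledged, then management) can be sketched as a small step-based model. This is plain Python illustrating the idea, not Zabbix's implementation or API: the `EscalationStep` class, the `CHAIN` contents, the step numbering, and the 10-minute step duration are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EscalationStep:
    first_step: int  # first escalation step at which this recipient is notified (1-based)
    last_step: int   # last step at which this recipient is notified (inclusive)
    recipient: str


# Hypothetical chain mirroring the L1 -> L2 -> management example.
CHAIN = [
    EscalationStep(1, 2, "L1 on-call"),
    EscalationStep(3, 4, "L2 on-call"),
    EscalationStep(5, 5, "management"),
]


def notifications(
    chain: List[EscalationStep],
    step_seconds: int,
    acked_after: Optional[int] = None,
) -> List[Tuple[int, str]]:
    """Return (elapsed_seconds, recipient) pairs sent before acknowledgement.

    acked_after is the number of seconds after the problem started at which
    someone acknowledged it; None means it was never acknowledged.
    """
    sent = []
    final_step = max(op.last_step for op in chain)
    for step in range(1, final_step + 1):
        elapsed = (step - 1) * step_seconds
        if acked_after is not None and elapsed >= acked_after:
            break  # acknowledged: stop escalating
        for op in chain:
            if op.first_step <= step <= op.last_step:
                sent.append((elapsed, op.recipient))
    return sent


if __name__ == "__main__":
    # Problem acknowledged 25 minutes in, with 10-minute steps.
    for elapsed, recipient in notifications(CHAIN, 600, acked_after=1500):
        print(f"t+{elapsed // 60:>2} min: notify {recipient}")
```

With 10-minute steps and an acknowledgement at 25 minutes, L1 is notified twice and L2 once, and management is never paged. Whether an acknowledgement actually halts escalation in a real deployment depends on how the alerting rules are configured.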
---
### How It Fits
```
[[OpenBMC]] / IPMI sensors + SNMP network devices
  → Zabbix (hardware and network monitoring)
Applications and OS exporters
  → [[Prometheus]] (application and OS metrics)
Zabbix + Prometheus
  → [[Grafana]] (unified dashboards)
```
[[OpenBMC]] | [[Prometheus]] | [[Grafana]] | [[SONiC]] | [[Open Source Hyperscaler MoC]]