Grafana Dashboard
HAMi ships a pre-built Grafana dashboard that visualizes GPU allocation, memory usage, and per-pod utilization metrics exported by the HAMi device plugin.
Import the Dashboard
- Open your Grafana instance and go to Dashboards > Import.
- Download the dashboard template:
- Upload the JSON file or paste its contents into the import dialog.
- Select your Prometheus data source and click Import.
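As an alternative to the manual import steps above, Grafana's file-based provisioning can load the same dashboard JSON at startup. The following is a minimal sketch, not part of the HAMi documentation: the provider name, folder, and path are hypothetical, and it assumes the downloaded JSON file has been copied into that directory on the Grafana host.

```yaml
# Hypothetical file: /etc/grafana/provisioning/dashboards/hami.yaml
apiVersion: 1
providers:
  - name: hami                # hypothetical provider name
    folder: HAMi              # hypothetical Grafana folder to place the dashboard in
    type: file
    disableDeletion: false
    options:
      # Assumed location of the downloaded dashboard JSON
      path: /var/lib/grafana/dashboards/hami
```

With provisioning, the dashboard's data source reference must resolve to an existing Prometheus data source, since there is no import dialog in which to select one.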
Dashboard Panels
The dashboard includes panels for:
- GPU memory allocation per pod
- GPU core utilization per pod
- Node-level GPU resource availability
- Device plugin health status
Prometheus Scrape Config
The hami-device-plugin pod on each node exposes metrics on port 31992 (configurable via devicePlugin.monitorPort). Add a scrape job:
    scrape_configs:
      - job_name: hami-device-plugin
        static_configs:
          - targets:
              - <node-ip>:31992
For Prometheus Operator, create a ServiceMonitor targeting the hami-device-plugin service on port 31992.
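A ServiceMonitor for this setup might look like the sketch below. Only the port number 31992 comes from the text above. The namespace, the label selectors, and the Service port name are assumptions, so check them against the Service actually deployed in your cluster (for example with `kubectl get svc -A --show-labels`).

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hami-device-plugin
  namespace: kube-system            # assumption: namespace where HAMi is installed
  labels:
    release: prometheus             # assumption: must match your Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - kube-system                 # assumption: same namespace as the Service
  selector:
    matchLabels:
      # assumption: replace with the labels actually set on the hami-device-plugin Service
      app.kubernetes.io/component: hami-device-plugin
  endpoints:
    - port: monitorport             # assumption: the *name* of the Service port that maps to 31992
      path: /metrics
      interval: 15s
```

The `port` field refers to a named Service port rather than a number, so the name used here must match the port name defined on the Service.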
Key metrics:
| Metric | Description |
|---|---|
| hami_host_gpu_utilization_ratio | GPU real-time utilization on host (0–100) |
| hami_host_gpu_memory_used_bytes | GPU real-time device memory usage on host |
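Once these metrics are scraped, they can also be used in Prometheus rules. The rule group below is an illustrative sketch only: the group, alert, and recorded series names, the 90 threshold, and the 10-minute window are all hypothetical choices. It relies on the 0–100 scale stated in the table and on the memory metric being reported in bytes, as its name suggests.

```yaml
groups:
  - name: hami-host-gpu             # hypothetical group name
    rules:
      # Fire when a host GPU stays above 90 (on the 0-100 scale) for 10 minutes.
      - alert: HamiHostGpuHighUtilization
        expr: hami_host_gpu_utilization_ratio > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Host GPU utilization has been above 90 for 10 minutes"
      # Convenience series: device memory in GiB instead of bytes.
      - record: hami:host_gpu_memory_used:gib
        expr: hami_host_gpu_memory_used_bytes / 1024 / 1024 / 1024
```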
Prerequisites
- Prometheus is installed and scraping the HAMi device plugin metrics endpoint.
- The HAMi device plugin is running and exposing metrics on the configured port.
For details on enabling metrics collection, see Real-time GPU Usage and Real-time Device Usage.