mirror of
https://github.com/kubernetes/node-problem-detector.git
synced 2026-02-14 18:09:57 +00:00
System Stats Monitor
System Stats Monitor is a problem daemon in Node Problem Detector. It collects predefined, health-related metrics from different system components. Each component may support further, component-specific configuration.
Currently supported components are:
- disk
See the example config file here.
Detailed Configuration Options
Global Configurations
The data collection period can be specified globally in the config file; see invokeInterval in the example.
Disk
The following metrics are collected from the disk component:
- disk/io_time: number of milliseconds spent doing I/Os on this device
- disk/weighted_io: weighted number of milliseconds spent doing I/Os on this device
- disk/avg_queue_len: average number of requests that were waiting in the queue or being serviced during the last invokeInterval
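These counters line up with the Linux /proc/diskstats fields described in the kernel's iostats documentation (field 10 is milliseconds spent doing I/Os, field 11 is the weighted value). The Go sketch below parses two made-up diskstats lines and derives an average queue length by dividing the growth of the weighted counter by the elapsed wall-clock time, which is how iostat defines it. This is an assumption for illustration, not Node Problem Detector's code; its collection path and exact avg_queue_len formula may differ. diskSample, parseDiskstatsLine, and avgQueueLen are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// diskSample holds two cumulative counters from one /proc/diskstats line.
type diskSample struct {
	Device     string
	IOTimeMs   uint64 // stats field 10: milliseconds spent doing I/Os
	WeightedMs uint64 // stats field 11: weighted milliseconds spent doing I/Os
}

// parseDiskstatsLine parses one /proc/diskstats line. The first three columns
// are major number, minor number and device name, so stats field N sits at
// index N+2 after splitting on whitespace.
func parseDiskstatsLine(line string) (diskSample, error) {
	f := strings.Fields(line)
	if len(f) < 14 {
		return diskSample{}, fmt.Errorf("expected at least 14 columns, got %d", len(f))
	}
	ioTime, err := strconv.ParseUint(f[12], 10, 64)
	if err != nil {
		return diskSample{}, err
	}
	weighted, err := strconv.ParseUint(f[13], 10, 64)
	if err != nil {
		return diskSample{}, err
	}
	return diskSample{Device: f[2], IOTimeMs: ioTime, WeightedMs: weighted}, nil
}

// avgQueueLen divides the growth of the weighted I/O counter by the elapsed
// wall-clock milliseconds (iostat's definition, assumed here).
func avgQueueLen(prev, cur diskSample, elapsedMs float64) float64 {
	if elapsedMs <= 0 || cur.WeightedMs < prev.WeightedMs {
		return 0 // no elapsed time, or the counter was reset
	}
	return float64(cur.WeightedMs-prev.WeightedMs) / elapsedMs
}

func main() {
	// Two made-up samples of the same device taken 60 s apart.
	prev, err := parseDiskstatsLine("8 0 sda 100 0 800 50 20 0 160 30 0 5000 6000")
	if err != nil {
		panic(err)
	}
	cur, err := parseDiskstatsLine("8 0 sda 200 0 1600 90 40 0 320 60 0 8000 12000")
	if err != nil {
		panic(err)
	}
	fmt.Println("device:", cur.Device)                            // sda
	fmt.Println("io_time delta (ms):", cur.IOTimeMs-prev.IOTimeMs) // 3000
	fmt.Println("avg_queue_len:", avgQueueLen(prev, cur, 60000))   // 0.1
}
```

Because both counters are cumulative, this sketch has to keep the previous sample per device before it can report anything for an interval.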
By setting the metricsConfigs field and each metric's displayName field (see the example config), you can specify which metrics are collected and their display names on the Prometheus scraping endpoint. The name of the disk block device is reported in the device metric label.
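The Go sketch below models the behaviour described above under stated assumptions: a metric is exported only if its ID appears in metricsConfigs, it is reported under its displayName, and it carries the block device name in a device label. The types metricConfig and point and the function export are hypothetical, and the printed lines only loosely resemble Prometheus exposition format; the exact metric names Node Problem Detector exposes on its endpoint are not specified here.

```go
package main

import (
	"fmt"
	"sort"
)

// metricConfig mirrors one metricsConfigs entry (hypothetical type).
type metricConfig struct {
	DisplayName string
}

// point is a hypothetical exported sample: name, labels and value.
type point struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// export keeps only the metrics listed in cfgs, reports each under its
// displayName, and attaches the block device name as the "device" label.
func export(device string, collected map[string]float64, cfgs map[string]metricConfig) []point {
	var out []point
	for id, v := range collected {
		cfg, ok := cfgs[id]
		if !ok {
			continue // not listed in metricsConfigs, so not reported
		}
		out = append(out, point{
			Name:   cfg.DisplayName,
			Labels: map[string]string{"device": device},
			Value:  v,
		})
	}
	// Go map iteration order is random, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	cfgs := map[string]metricConfig{
		"disk/io_time":       {DisplayName: "disk/io_time"},
		"disk/avg_queue_len": {DisplayName: "disk/avg_queue_len"},
	}
	collected := map[string]float64{
		"disk/io_time":       3000,
		"disk/weighted_io":   6000, // not in cfgs, so dropped
		"disk/avg_queue_len": 0.1,
	}
	for _, p := range export("sda", collected, cfgs) {
		fmt.Printf("%s{device=%q} %v\n", p.Name, p.Labels["device"], p.Value)
	}
}
```

Keeping the device as a label rather than part of the metric name lets one display name cover every disk, which is the usual Prometheus convention.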
A few other options are available:
- includeRootBlk: When set to true, add all block devices that are not a slave or holder device to the list of disks that System Stats Monitor collects metrics from. When set to false, the list of disks is not modified.
- includeAllAttachedBlk: When set to true, add all currently attached block devices to the list of disks that System Stats Monitor collects metrics from. When set to false, the list of disks is not modified.
- lsblkTimeout: System Stats Monitor uses lsblk to retrieve block device information. This option sets the timeout for calling lsblk commands.