mirror of https://github.com/kubernetes/node-problem-detector.git synced 2026-05-06 17:27:16 +00:00

Files

Xuewei Zhang 82c2368795 Metric format fixes on host/uptime and disk/*

1. host/uptime, disk/io_time and disk/weighted_io should be
counter/cumulative metrics. So we have to use the Sum aggregation method
rather than LastValue aggregation method (which will declare the metric
as gauge metric).

2. Renamed label "device" for disk/* metrics to "device_name".
This is to clarify that it is device_name (sda1) rather than device_path
(/dev/sda1)

2019-08-16 15:14:54 -07:00

types

Implement host collector as part of system-stats-monitor

2019-06-27 16:40:11 -07:00

disk_collector.go

Metric format fixes on host/uptime and disk/*

2019-08-16 15:14:54 -07:00

host_collector.go

Metric format fixes on host/uptime and disk/*

2019-08-16 15:14:54 -07:00

README.md

Update READMEs

2019-06-13 00:51:17 -07:00

system_stats_monitor_test.go

Add disk metrics support.

2019-06-13 00:51:17 -07:00

system_stats_monitor.go

Print monitor config path in the logs

2019-07-30 11:00:47 -07:00

README.md

System Stats Monitor

System Stats Monitor is a problem daemon in node problem detector. It collects pre-defined health-related metrics from different system components. Each component may allow further detailed configurations.

Currently supported components are:

disk

See example config file here.

Detailed Configuration Options

Global Configurations

The data collection period can be specified globally in the config file; see invokeInterval in the example.

Disk

The following metrics are collected from the disk component:

disk/io_time: # of milliseconds spent doing I/Os on this device
disk/weighted_io: weighted # of milliseconds spent doing I/Os on this device
disk/avg_queue_len: average # of requests that were waiting in queue or being serviced during the last invokeInterval

By setting the metricsConfigs field and displayName field (example), you can specify the list of metrics to be collected, and their display names on the Prometheus scraping endpoint. The name of the disk block device (for example sda1, not the device path /dev/sda1) will be reported in the device_name metrics label.
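To make the shape of this setting concrete, here is a minimal sketch that decodes a metricsConfigs map and lists each metric with its display name. The JSON layout (a map from metric ID to an object with a displayName field) and the Go type names are assumptions for illustration; consult the example config for the authoritative format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// metricConfig and diskConfig are hypothetical types that mirror the
// assumed layout of the disk section of the config file.
type metricConfig struct {
	DisplayName string `json:"displayName"`
}

type diskConfig struct {
	MetricsConfigs map[string]metricConfig `json:"metricsConfigs"`
}

// displayNames returns a map from metric ID to its configured display name.
// Metrics that are absent from metricsConfigs are simply not listed.
func displayNames(raw []byte) (map[string]string, error) {
	var cfg diskConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parse disk config: %w", err)
	}
	names := make(map[string]string, len(cfg.MetricsConfigs))
	for id, mc := range cfg.MetricsConfigs {
		names[id] = mc.DisplayName
	}
	return names, nil
}

func main() {
	raw := []byte(`{
		"metricsConfigs": {
			"disk/io_time":       {"displayName": "disk/io_time"},
			"disk/avg_queue_len": {"displayName": "disk/avg_queue_len"}
		}
	}`)
	names, err := displayNames(raw)
	if err != nil {
		panic(err)
	}
	// Sort the IDs so the output order is stable.
	ids := make([]string, 0, len(names))
	for id := range names {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s -> %s\n", id, names[id])
	}
}
```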

A few other options are also available:

includeRootBlk: When set to true, add all block devices that are not a slave or holder device to the list of disks that System Stats Monitor collects metrics from. When set to false, do not modify the list of disks that System Stats Monitor collects metrics from.
includeAllAttachedBlk: When set to true, add all currently attached block devices to the list of disks that System Stats Monitor collects metrics from. When set to false, do not modify the list of disks that System Stats Monitor collects metrics from.
lsblkTimeout: System Stats Monitor uses lsblk to retrieve block device information. This option sets the timeout for calling the lsblk command.