Files
node-problem-detector/pkg/kernelmonitor
Lantao Liu 532f933bd8 This PR:
1) Add lookback support in kernel monitor. After started, Kernel monitor
will check some old logs to detect problems which happened before last
node reboot.
2) Add `lookback` and `startPattern` in kernel monitor configuration.
  * `lookback` specifies how long time kernel monitor should look back.
  * `startPattern` specifies which log indicates the node is started.
  kernel monitor will clear all current node conditions once it finds
  a node start log. This makes sure that old problems won't change the
  node condition.
3) Add support for kernel panic monitoring, the null pointer and divide
0 kernel panic will be surfaced as event. Usually kernel monitor will
report these events during looking back phase.
2016-08-20 19:11:26 -07:00
..
2016-08-20 19:11:26 -07:00
2016-08-20 19:11:26 -07:00
2016-08-20 19:11:26 -07:00
2016-08-20 19:11:26 -07:00
2016-08-20 19:11:26 -07:00
2016-08-20 19:11:26 -07:00
2016-06-24 16:19:44 -07:00

Kernel Monitor

Kernel Monitor is a problem daemon in node problem detector. It monitors kernel log and detects known kernel issues following predefined rules.

The Kernel Monitor matches kernel issues according to a set of predefined rule list in config/kernel-monitor.json. The rule list is extensible.

Limitations

  • Kernel Monitor only supports file based kernel log now. It doesn't support log tools like journald. There is an open issue to add journald support.

  • Kernel Monitor has assumption on kernel log format, now it only works on Ubuntu and Debian. However, it is easy to extend it to support other log format.

Add New NodeConditions

To support new node conditions, you can extend the conditions field in config/kernel-monitor.json with new condition definition:

{
  "type": "NodeConditionType",
  "reason": "CamelCaseDefaultNodeConditionReason",
  "message": "arbitrary default node condition message"
}

Detect New Problems

To detect new problems, you can extend the rules field in config/kernel-monitor.json with new rule definition:

{
  "type": "temporary/permanent",
  "condition": "NodeConditionOfPermanentIssue",
  "reason": "CamelCaseShortReason",
  "message": "regexp matching the issue in the kernel log"
}

Change Log Path

Kernel log in different OS distros may locate in different path. The log field in config/kernel-monitor.json is the log path inside the container. You can always configure it to match your OS distro.

Support Other Log Format

Kernel monitor uses Translator plugin to translate kernel log the internal data structure. It is easy to implement a new translator for a new log format.