# System Log Monitor

*System Log Monitor* is a problem daemon in node problem detector. It monitors
a specified system daemon log and detects problems according to predefined rules.

System Log Monitor matches problems against a list of predefined rules in its
configuration file (see
[`config/kernel-monitor.json`](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) for an example).
The rule list is extensible.

## Supported sources

* System Log Monitor currently supports file-based logs, journald, and kmsg.
  Additional sources can be added by implementing a [new log
  watcher](#new-log-watcher).

## Add New NodeConditions

To support new node conditions, extend the `conditions` field in the
configuration file with a new condition definition:

```json
{
  "type": "NodeConditionType",
  "reason": "CamelCaseDefaultNodeConditionReason",
  "message": "arbitrary default node condition message"
}
```

## Detect New Problems

To detect new problems, extend the `rules` field in the configuration file
with a new rule definition:

```json
{
  "type": "temporary/permanent",
  "condition": "NodeConditionOfPermanentIssue",
  "reason": "CamelCaseShortReason",
  "pattern": "regexp matching the issue in the log",
  "patternGeneratedMessageSuffix": "Please check the network connectivity and ensure that all required services are running. For more details, see our documentation at https://example.com/docs/troubleshooting."
}
```

*Note that the pattern must match up to the end of the line, excluding the
trailing newline character. Multi-line patterns are supported.*
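The end-of-line requirement can be illustrated with a minimal Go sketch. It assumes, for illustration only, that recent log messages are joined with `"\n"` (no trailing newline) and that the rule pattern is anchored with RE2's end-of-text assertion `\z`; the helper `matchAtEnd` and the sample log lines are hypothetical. Under that assumption a match may start anywhere, but it must end exactly at the end of the newest line, and a `\n` inside the pattern lets it span consecutive lines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchAtEnd joins lines with "\n" and anchors pattern with \z, so a match
// must end at the end of the last line. It returns the matched text, if any.
func matchAtEnd(lines []string, pattern string) (string, bool) {
	re := regexp.MustCompile(pattern + `\z`)
	buf := strings.Join(lines, "\n")
	loc := re.FindStringIndex(buf)
	if loc == nil {
		return "", false
	}
	return buf[loc[0]:loc[1]], true
}

func main() {
	hung := []string{"INFO: task kworker/0:1:123 blocked for more than 120 seconds."}

	// Matches: the pattern runs through the final "." of the line.
	fmt.Println(matchAtEnd(hung, `task \S+:\w+ blocked for more than \w+ seconds\.`))

	// No match: the pattern stops before the end of the line.
	fmt.Println(matchAtEnd(hung, `task \S+:\w+ blocked`))

	// Multi-line: "\n" in the pattern spans two consecutive log lines.
	oops := []string{
		"BUG: unable to handle kernel NULL pointer dereference",
		"IP: some_function+0x10/0x20",
	}
	fmt.Println(matchAtEnd(oops, `BUG: unable to handle kernel NULL pointer dereference\nIP: .*`))
}
```

A pattern that only describes the beginning of a message therefore needs a trailing `.*` to reach the end of the line.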

## Log Watchers

System Log Monitor supports different log management tools through different
log watchers:

* [filelog](./logwatchers/filelog): Log watcher for arbitrary file-based logs.
* [journald](./logwatchers/journald): Log watcher for journald.
* [kmsg](./logwatchers/kmsg): Log watcher for the kernel ring buffer device, `/dev/kmsg`.

Set `plugin` in the configuration file to select the log watcher.

### Plugin Configuration

Log watcher-specific settings are configured in `pluginConfig`.
* **journald**:
  * source: The [`SYSLOG_IDENTIFIER`](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html)
    of the log to watch.
* **filelog**:
  * timestamp: The regular expression used to match the timestamp in a log line.
    Submatches are supported, but only the last result is used as the actual
    timestamp.
  * message: The regular expression used to match the message in a log line.
    Submatches are supported, but only the last result is used as the actual
    message.
  * timestampFormat: The format of the timestamp, written by laying out Go's
    reference time (`Mon Jan 2 15:04:05 MST 2006`) the way timestamps appear in
    the log; for example, RFC 3339 timestamps use `2006-01-02T15:04:05Z07:00`.
    (See the [Go time format constants](https://golang.org/pkg/time/#pkg-constants).)
* **kmsg**: No configuration is currently supported.

### Change Log Path

Logs may be stored at different paths on different OS distributions. The
`logPath` field in the configuration file specifies the log path, so you can
always set `logPath` to match your OS distribution.

* filelog: `logPath` is the path of the log file, e.g. `/var/log/kern.log` for
  the kernel log.
* journald: `logPath` is the journal log directory, usually `/var/log/journal`.

### New Log Watcher

System Log Monitor uses the [Log Watcher](./logwatchers/types/log_watcher.go)
interface to support different log management tools. To add support for a new
tool, implement a new log watcher against this interface.
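As a rough picture of what a log watcher does, here is a minimal, self-contained Go sketch. It assumes the interface has roughly the shape `Watch() (<-chan *Log, error)` plus `Stop()`; the `Log` type, the `LogWatcher` declaration, and `sliceWatcher` below are hypothetical stand-ins, so check `log_watcher.go` for the real definitions. How a real watcher is registered with the monitor is not shown here; the existing watchers are the reference for that.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Log is a hypothetical stand-in for the log entry a watcher emits.
type Log struct {
	Timestamp time.Time
	Message   string
}

// LogWatcher is an assumed shape for the interface in log_watcher.go:
// Watch starts watching and returns a channel of logs; Stop ends watching.
type LogWatcher interface {
	Watch() (<-chan *Log, error)
	Stop()
}

// sliceWatcher replays in-memory lines, standing in for a real source such
// as a log file, journald, or /dev/kmsg.
type sliceWatcher struct {
	lines []string
	logCh chan *Log
	stop  chan struct{}
}

func newSliceWatcher(lines []string) *sliceWatcher {
	return &sliceWatcher{lines: lines, logCh: make(chan *Log), stop: make(chan struct{})}
}

// Watch emits one Log per non-blank line and closes the channel when done
// or when Stop is called.
func (w *sliceWatcher) Watch() (<-chan *Log, error) {
	go func() {
		defer close(w.logCh)
		for _, line := range w.lines {
			msg := strings.TrimSpace(line)
			if msg == "" {
				continue // skip blank lines
			}
			select {
			case w.logCh <- &Log{Timestamp: time.Now(), Message: msg}:
			case <-w.stop:
				return
			}
		}
	}()
	return w.logCh, nil
}

// Stop signals the watching goroutine to exit; call it at most once.
func (w *sliceWatcher) Stop() { close(w.stop) }

func main() {
	var w LogWatcher = newSliceWatcher([]string{"first line", "", "second line"})
	logs, err := w.Watch()
	if err != nil {
		fmt.Println("watch error:", err)
		return
	}
	for l := range logs {
		fmt.Println(l.Message)
	}
	w.Stop()
}
```

Returning a receive-only channel keeps the consumer side simple: the monitor just ranges over logs until the watcher closes the channel.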

## Metrics Reporting

Set the top-level boolean `metricsReporting` to enable or disable metrics
reporting for System Log Monitor. If the field is omitted, it defaults to `true`.
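The "defaults to `true` when omitted" behavior can be sketched in Go by decoding the field into a `*bool`, so that a missing field (`nil`) is distinguishable from an explicit `false`. The `topLevel` struct, the helpers, and the sample fragments (which also set `plugin`, `pluginConfig`, and `logPath` as described earlier) are illustrative assumptions, not the project's actual configuration types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topLevel holds a few top-level fields named in this document.
// This struct is hypothetical. MetricsReporting is a *bool so that an
// omitted field (nil) can be told apart from an explicit false.
type topLevel struct {
	Plugin           string            `json:"plugin"`
	PluginConfig     map[string]string `json:"pluginConfig"`
	LogPath          string            `json:"logPath"`
	MetricsReporting *bool             `json:"metricsReporting"`
}

// metricsEnabled applies the documented default: true when omitted.
func metricsEnabled(c topLevel) bool {
	if c.MetricsReporting == nil {
		return true
	}
	return *c.MetricsReporting
}

func parse(data string) (topLevel, error) {
	var c topLevel
	err := json.Unmarshal([]byte(data), &c)
	return c, err
}

func main() {
	for _, data := range []string{
		`{"plugin": "journald", "pluginConfig": {"source": "kernel"}, "logPath": "/var/log/journal"}`,
		`{"plugin": "filelog", "logPath": "/var/log/kern.log", "metricsReporting": false}`,
	} {
		c, err := parse(data)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("plugin=%s logPath=%s metrics=%v\n", c.Plugin, c.LogPath, metricsEnabled(c))
	}
}
```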

Temporary problems are reported as counter metrics, for example:

```
# HELP problem_counter Number of times a specific type of problem have occurred.
# TYPE problem_counter counter
problem_counter{reason="TaskHung"} 2
```

Permanent problems are reported as both gauge and counter metrics, for
example:

```
# HELP problem_counter Number of times a specific type of problem have occurred.
# TYPE problem_counter counter
problem_counter{reason="DockerHung"} 1
# HELP problem_gauge Whether a specific type of problem is affecting the node or not.
# TYPE problem_gauge gauge
problem_gauge{condition="KernelDeadlock",reason="DockerHung"} 1
```