- Use `systemctl is-active` to check whether a service is running
- Cleaner than running `grep` on `systemctl status` output
- A success return code means the service is running/active
- A failure return code means it is not running, which could be due to a
stopped/failed service or to the service not existing
- Use `command -v` instead of `which`
Ref: https://github.com/koalaman/shellcheck/wiki/SC2230
- Follow Google "Shell Style Guide": indent, use "readonly"
- Minor: Rephrase comment, avoid all caps
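
As a rough illustration of the two checks above, here is a minimal sketch. The helper names `service_is_running` and `require_command` are hypothetical and are not taken from the script being changed.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Succeeds only when the unit is active. A stopped, failed, or
# nonexistent unit all produce a non-zero exit status.
service_is_running() {
  systemctl is-active --quiet "$1"
}

# Succeeds when the shell can resolve the command name.
# `command -v` is specified by POSIX, unlike `which`.
require_command() {
  command -v "$1" >/dev/null 2>&1
}
```

A caller could then write `if service_is_running "${unit}"; then echo "ok"; else echo "not ok"; fi` without parsing any status output.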
- Script was checking for "ip_conntrack_..." which was replaced by "nf_conntrack_..." on newer systems. Both are now supported.
- Return failure ("not ok") when the table is more than 90% full.
- Not sure what threshold is best here, but I think this is better than failing only when the table is full.
Otherwise we might end up with a value close to the max or bouncing around it.
- Replaced `cat` with "$(< file )" to avoid calling an external command
- Follow Google "Shell Style Guide": 2 space indent, use preferred "[[ test ]]", add "readonly"
- Include current connection usage in output message
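
The check described above could look roughly like the following bash sketch. The function name `check_conntrack`, the message wording, and the "unknown" fallback are assumptions made for illustration. The directory holding the counters is passed as an argument so that the sketch does not hard-code any `/proc` path.

```shell
#!/bin/bash
# Hypothetical sketch; the function name, messages, and the directory
# argument are illustrative and not taken from the original script.

readonly THRESHOLD_PERCENT=90

# Prints "ok" or "not ok" plus the current usage, reading the counters
# from the directory given as $1. Tries the nf_conntrack_* names first,
# then the older ip_conntrack_* names.
check_conntrack() {
  local dir="$1"
  local prefix count max
  for prefix in nf_conntrack ip_conntrack; do
    if [[ -r "${dir}/${prefix}_count" && -r "${dir}/${prefix}_max" ]]; then
      # "$(< file)" reads the file without spawning `cat`.
      count="$(< "${dir}/${prefix}_count")"
      max="$(< "${dir}/${prefix}_max")"
      # Integer comparison equivalent to count / max > 90%.
      if (( count * 100 > max * THRESHOLD_PERCENT )); then
        echo "not ok: conntrack table ${count}/${max}"
        return 1
      fi
      echo "ok: conntrack table ${count}/${max}"
      return 0
    fi
  done
  # Assumption: report a distinct status when neither name family exists.
  echo "unknown: no conntrack counters found in ${dir}"
  return 2
}
```

Multiplying both sides instead of dividing keeps the comparison in integer arithmetic, so a table at exactly 90% still reports "ok".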
The host collector reports three things today:
1. Host OS uptime (in seconds)
2. Host kernel version (as a metric label)
3. Host OS version (as a metric label)
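
For illustration only, the three values listed above can be gathered on a typical Linux host roughly as follows. This is a shell sketch, not the collector's implementation. The function names are hypothetical, and the default file paths are assumptions about the host.

```shell
#!/bin/sh
# Hypothetical sketch; function names and default paths are assumptions.

# Prints whole seconds of uptime from a /proc/uptime-style file
# (first field, with the fractional part dropped).
host_uptime_seconds() {
  uptime_file="${1:-/proc/uptime}"
  read -r up_field rest_field < "${uptime_file}"
  echo "${up_field%%.*}"
}

# Prints the running kernel release, e.g. for use as a label value.
host_kernel_version() {
  uname -r
}

# Prints NAME and VERSION_ID from an os-release-style file.
# The file is sourced in a subshell so its variables do not leak.
host_os_version() {
  os_release_file="${1:-/etc/os-release}"
  ( . "${os_release_file}" && echo "${NAME} ${VERSION_ID}" )
}
```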
My comment was eaten by GitHub in !152, and I wanted to raise attention in case this was meant to be an exit instead of an echo. Otherwise feel free to close!
This modifies the pattern for catching cpp problem messages produced by
ABRT. Not all such messages fit the former pattern.
For example, the following is a valid cpp problem message produced by ABRT:
Process xxx (bad_binary) crashed in Will::Fail::a() [clone .isra.2]()
but it doesn't fit the former pattern, since its last part contains
whitespace.
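
The effect can be reproduced with a small sketch. Both patterns below are hypothetical stand-ins written for this illustration; they are not the patterns used before or after this change. The first forbids whitespace in the trailing part, and the second allows it.

```shell
#!/bin/sh
# Hypothetical patterns for illustration; not the real monitor patterns.

# Trailing symbol must not contain whitespace.
OLD_PATTERN='^Process [^[:space:]]+ \([^)]+\) crashed in [^[:space:]]+\(\)$'
# Trailing symbol may contain any characters, including whitespace.
NEW_PATTERN='^Process [^[:space:]]+ \([^)]+\) crashed in .+\(\)$'

# matches PATTERN MESSAGE: succeeds when MESSAGE matches the extended regex.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

msg='Process xxx (bad_binary) crashed in Will::Fail::a() [clone .isra.2]()'
if matches "${OLD_PATTERN}" "${msg}"; then echo "old: match"; else echo "old: no match"; fi
if matches "${NEW_PATTERN}" "${msg}"; then echo "new: match"; else echo "new: no match"; fi
```

With the sample message from above, the stricter pattern fails on the space before `[clone .isra.2]`, while the relaxed one matches.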
* Change `unregister_netdevice` to be an event to fix #47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panics currently.
* Use system boot time instead of "StartPattern" to fix #48.
1) Add lookback support in kernel monitor. After it starts, the kernel
monitor checks some old logs to detect problems which happened before the
last node reboot.
2) Add `lookback` and `startPattern` in kernel monitor configuration.
* `lookback` specifies how far back the kernel monitor should look.
* `startPattern` specifies which log line indicates that the node has started.
The kernel monitor clears all current node conditions once it finds
a node start log. This makes sure that old problems won't change the
node condition.
3) Add support for kernel panic monitoring. Null pointer and
divide-by-zero kernel panics are surfaced as events. Usually the kernel
monitor reports these events during the lookback phase.