23 Commits

Author SHA1 Message Date
Andy Xie
33dffe0761 enable condition update when message changes for custom plugin 2018-12-11 13:14:49 +08:00
Zhen Wang
6b983a9ea3 Detect corrupt docker overlay2 2018-11-27 00:35:42 -08:00
Zhen Wang
1f636381b8 Detect kubelet and container runtime frequent crashes 2018-11-26 22:41:06 -08:00
Zhen Wang
ecaa61e7d3 Detect readonly filesystem 2018-11-20 11:20:48 -08:00
Jan Heidbrink
659f31c0f2 Adapt OOMKilling pattern to current kernels 2018-07-31 15:15:45 +02:00
David Ashpole
bf730e9c63 add log-counter go plugin 2018-06-20 15:55:19 -07:00
Jasmine Hegman
76ce35cddc Possibly enhanced network_problem custom plugin
My comment was eaten by GitHub in !152, and I wanted to raise attention in case this was meant to be an exit instead of an echo; otherwise feel free to close!
2018-01-05 11:26:15 -07:00
Rohit Ramkumar
69b6b58ee3 Addressed comments 2017-12-19 08:32:27 -08:00
Rohit Ramkumar
cd472c7765 Add empty conditions list 2017-11-27 11:35:48 -08:00
Rohit Ramkumar
fb12f3b70e Add network monitor script as plugin 2017-11-27 11:33:38 -08:00
Andy Xie
10dbfef1a8 add custom problem detector plugin 2017-11-22 10:14:09 +08:00
Ajit Kumar
d2de52f090 Add rule for docker image pull error 2017-06-21 13:48:58 -07:00
Julius Milan
b579984f0a Fix abrt-adaptor config for cpp problems
This modifies the pattern for catching cpp problem messages produced by
ABRT. Not all such messages fit the former pattern.
For example, the following is a valid cpp problem message produced by ABRT:

Process xxx (bad_binary) crashed in Will::Fail::a() [clone .isra.2]()

but it doesn't fit the former pattern, since its last part contains
whitespace.
2017-05-11 15:40:25 +02:00
Julius Milan
abcf6a4f4b Add ABRT adaptor config 2017-03-23 16:15:56 +01:00
Random-Liu
10fc831409 Change kernel specific name in code base and change syslog to filelog. 2017-02-15 13:07:01 -08:00
Random-Liu
27cc831408 Add arbitrary daemon log support 2017-02-10 11:32:35 -08:00
Random-Liu
d281cb8a15 Fix kernel monitor issues:
* Change `unregister_netdevice` to be an event to fix #47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panic currently.
* Use system boot time instead of "StartPattern" to fix #48.
2017-02-09 16:09:27 -08:00
Random-Liu
2ef2af99eb Update Readme.md 2017-01-19 01:59:09 -08:00
Random-Liu
c15d463ad5 Finish the journald support 2017-01-19 01:59:09 -08:00
Lantao Liu
532f933bd8 This PR:
1) Adds lookback support to the kernel monitor. After starting, the kernel
monitor checks older logs to detect problems that happened before the last
node reboot.
2) Adds `lookback` and `startPattern` to the kernel monitor configuration.
  * `lookback` specifies how far back the kernel monitor should look.
  * `startPattern` specifies which log line indicates that the node has started.
  The kernel monitor clears all current node conditions once it finds
  a node start log. This makes sure that old problems won't change the
  node condition.
3) Adds support for kernel panic monitoring; null pointer and divide-by-zero
kernel panics are surfaced as events. Usually the kernel monitor
reports these events during the lookback phase.
2016-08-20 19:11:26 -07:00
Lantao Liu
5b07afd325 1. Make source and conditions configurable.
2. Add multiple events and conditions support in problem interface.
2016-06-02 15:32:02 -07:00
Lantao Liu
8759e4d610 Use Patch instead of UpdateStatus. 2016-05-30 19:22:32 -07:00
Lantao Liu
f0312655bd Add first version of node-problem-detector 2016-05-17 15:55:33 -07:00