51 Commits

Author SHA1 Message Date
Random-Liu
51351f91b2 Cleanup kmsg log watcher. 2017-05-30 15:58:45 -07:00
Lantao Liu
be6c516cfd Merge pull request #41 from euank/kmsg-parser
logwatchers: add new kmsg-based kernel log watcher
2017-05-30 15:53:24 -07:00
Euan Kemp
73cba49db0 kmsg: update the docs to reference kmsg parser too 2017-03-09 21:38:11 -08:00
Euan Kemp
9c23921c11 logwatchers/kmsg: add initial kmsg watcher impl
This adds a logwatcher that parses kernel messages directly
from the /dev/kmsg interface. It supports any modern Linux distro
and avoids any dependency on libraries (which journald, for example,
needs).
2017-03-09 20:40:49 -08:00
Random-Liu
02d6b89536 Fix journald plugin to only look at the current boot. 2017-03-02 13:57:38 -08:00
Andy Xie
0a914cae09 refactor options pkg 2017-02-23 08:23:52 +08:00
fate-grand-order
a756ef48f3 fix misspell "timestamp" 2017-02-21 23:01:30 +08:00
Random-Liu
889d9efbc1 Add unit test for goroutine leak. 2017-02-16 00:08:56 -08:00
Random-Liu
6170b0c87f Add multiple log monitoring support. 2017-02-15 13:15:18 -08:00
Random-Liu
dba47bdc27 Update the README.md. 2017-02-15 13:07:01 -08:00
Random-Liu
10fc831409 Change kernel specific name in code base and change syslog to filelog. 2017-02-15 13:07:01 -08:00
Random-Liu
f16f0f630b Rename helpers.go to translator.go 2017-02-10 11:32:35 -08:00
Random-Liu
27cc831408 Add arbitrary daemon log support 2017-02-10 11:32:35 -08:00
Dawn Chen
5e563930c0 Merge pull request #81 from Random-Liu/fix-kernel-monitor-issues
Fix kernel monitor issues
2017-02-10 11:17:17 -08:00
Random-Liu
d281cb8a15 Fix kernel monitor issues:
* Change `unregister_netdevice` to be an event to fix #47.
* Change `KernelPanic` to `KernelOops` because we can't handle kernel
panic currently.
* Use system boot time instead of "StartPattern" to fix #48.
2017-02-09 16:09:27 -08:00
Lantao Liu
f20b892123 Merge pull request #84 from Random-Liu/fix-transition-timestamp
Only change transition timestamp when condition is changed.
2017-02-07 10:41:51 -08:00
Andy Xie
d0e0a8c765 add options pkg 2017-02-07 18:44:21 +08:00
Random-Liu
20ffe37cea Add NPD endpoints: /debug/pprof, /healthz, /conditions. 2017-02-03 11:07:06 -08:00
Dawn Chen
b66c4df364 Merge pull request #39 from Random-Liu/journald-support
Journald support
2017-02-01 12:41:51 -08:00
Random-Liu
a986976a1d Only change transition timestamp when condition is changed. 2017-01-27 14:48:28 -08:00
Lantao Liu
ba5f5a158d Merge pull request #79 from Random-Liu/change-resync-mechanism
Update NPD to only force a sync every 1 minute.
2017-01-24 00:39:50 -08:00
Random-Liu
60975f5ad5 Update NPD to only force a sync every 1 minute. 2017-01-24 00:31:46 -08:00
fate-grand-order
9ac19a240a correct spelling error in kernel_monitor.go 2017-01-22 22:21:39 +08:00
Random-Liu
2ef2af99eb Update Readme.md 2017-01-19 01:59:09 -08:00
Random-Liu
c15d463ad5 Finish the journald support 2017-01-19 01:59:09 -08:00
Lantao Liu
f0ed07a0b4 Merge pull request #72 from andyxning/enrich_info_about_nodename
detail how node-problem-detector gets node name in README
2017-01-18 11:02:56 -08:00
Andy Xie
7302c70143 add -hostname-override 2017-01-18 23:45:30 +08:00
fate-grand-order
a8a5538357 fix misspell 2017-01-17 15:13:02 +08:00
Random-Liu
6637139441 Add release tar ball support. 2017-01-13 11:13:59 -08:00
Random-Liu
aedb371d06 Add --version flag. 2017-01-12 02:07:25 -08:00
Lantao Liu
0cd7944653 Merge pull request #49 from andyxning/add_support_for_running_standalone
add support for running standalone
2017-01-09 23:35:15 -08:00
andy xie
68b379c423 add support for running npd standalone 2017-01-07 23:49:19 +08:00
andy xie
2606d52afb check for linux os 2016-12-22 10:30:42 +08:00
andy xie
2c12274333 bump kubernetes version to v1.4.0-beta.3 2016-12-20 18:11:03 +08:00
AdoHe
86f4d07547 fix data race 2016-10-31 10:40:16 -04:00
AdoHe
ff0a099eec fix test issue 2016-10-31 10:08:37 -04:00
AdoHe
1e33cddf10 mirror update 2016-10-26 08:43:04 +08:00
AdoHe
84c25077da add journald support 2016-10-08 20:28:30 -04:00
Lantao Liu
aa9e268be7 Remove the function getStartPoint because it is no longer
needed in the current logic.
2016-09-12 14:04:23 -07:00
Lantao Liu
a8f491c0d3 Fix unit test. 2016-09-09 20:00:18 -07:00
Dawn Chen
ea83111c80 Merge pull request #22 from Random-Liu/add-look-back
Kernel Monitor: Add look back support and kernel panic handling
2016-08-23 17:13:58 -07:00
Lantao Liu
9054dab4c8 Get node name from the downward api. 2016-08-22 17:51:15 -07:00
Lantao Liu
532f933bd8 This PR:
1) Adds lookback support to the kernel monitor. After starting, the kernel
monitor checks old logs to detect problems that happened before the last
node reboot.
2) Adds `lookback` and `startPattern` to the kernel monitor configuration.
  * `lookback` specifies how far back the kernel monitor should look.
  * `startPattern` specifies which log line indicates that the node has started.
  The kernel monitor clears all current node conditions once it finds
  a node start log. This makes sure that old problems won't change the
  node condition.
3) Adds support for kernel panic monitoring. Null pointer and divide-by-zero
kernel panics are surfaced as events. The kernel monitor usually
reports these events during the lookback phase.
2016-08-20 19:11:26 -07:00
Lantao Liu
5a19ac1868 Get node name from pod. This makes sure that the node
name is always consistent with kubelet.
2016-08-11 14:22:29 -07:00
Lantao Liu
acabf68e06 Add README.md for kernel monitor 2016-06-24 16:19:44 -07:00
Girish Kalele
b687dfaafc Containerize the nethealth bandwidth measurement utility 2016-06-07 20:51:30 -07:00
Girish Kalele
33a43545ca Node network health check utility - performs a quick HTTP GET test 2016-06-03 14:26:12 -07:00
Lantao Liu
29ff791f08 Hack for unsupported OS distros. 2016-06-03 01:48:26 -07:00
Lantao Liu
5b07afd325 1. Make source and conditions configurable.
2. Add multiple events and conditions support in problem interface.
2016-06-02 15:32:02 -07:00
Lantao Liu
8759e4d610 Use Patch instead of UpdateStatus. 2016-05-30 19:22:32 -07:00