Commit Graph

154 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
4ad49bbd84 Merge pull request #503 from vteratipally/label_fix
changing the label names as per the standards
2020-12-08 22:04:49 -08:00
Kubernetes Prow Robot
4dccc1ce24 Merge pull request #493 from vteratipally/kernel_cmdline_parameters
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d renaming splitWords to tokens 2020-12-08 18:34:54 +00:00
varsha teratipally
047958a49c changing the label names as per the standards 2020-12-08 02:27:22 +00:00
varsha teratipally
ffc46f977d add code to retrieve kernel command line parameters 2020-12-07 22:40:22 +00:00
Jeremy Edwards
4adec4bbc6 Introduce Windows build of Node Problem Detector 2020-12-05 23:54:52 +00:00
Kubernetes Prow Robot
bf51d6600e Merge pull request #492 from vteratipally/module_stats_branch
add code to retrieve kernel modules in a linux system from /proc/modules
2020-12-03 09:51:00 -08:00
Kubernetes Prow Robot
1e917af560 Merge pull request #455 from ZYecho/fix_newmessage
fix: print result's message when status unknown
2020-11-24 16:14:39 -08:00
varsha teratipally
2b50e4af1a add testcases for cos and ubuntu to retrieve modules 2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6 add code for retrieving kernel modules 2020-11-19 09:49:25 +00:00
Kubernetes Prow Robot
112d53b10a Merge pull request #497 from vteratipally/fs_types
avoid duplicating the disk bytes used metrics based on fstype and mount types
2020-11-18 10:48:07 -08:00
zhangyue
b51cb3219f fix: print result's message when status unknown
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2020-11-18 19:30:17 +08:00
Kubernetes Prow Robot
d8ea2538de Merge pull request #489 from abansal4032/health-check-kubelet-connection
Kubelet api server connection check in health checker
2020-11-16 14:06:42 -08:00
Kubernetes Prow Robot
33571a312d Merge pull request #478 from neoseele/master
fix: node memory metrics are off by 1024
2020-11-16 14:06:12 -08:00
varsha teratipally
1550882948 avoid duplicating the disk bytes used metrics based on fstype and mountopts 2020-11-16 20:10:46 +00:00
Archit Bansal
2513756583 Add kubelet apiserver connection fail check in health checker 2020-11-09 12:47:16 -08:00
Karan Goel
925ea7393c Collect CPU load averages in a separate metric 2020-11-09 09:41:52 -08:00
Neil
589411702a fix: node memory metrics are off by 1024
The memory unit in /proc/meminfo is kB (b/171164235)

```
MemTotal:       264129908 kB
MemFree:        153559480 kB
...
```
2020-10-19 17:26:31 +11:00
Archit Bansal
8c94d5e60c Add logging levels to custom plugin logs. 2020-08-28 12:51:50 -07:00
Archit Bansal
3a9370e01b Log custom plugin stderr only if the status is not ok.
Otherwise with plugins that run frequently and report ok status, the
logs are filled with unnecessary noise and significantly increases log
size.
2020-08-27 10:17:05 -07:00
varsha teratipally
50127b0512 changed labelname after code review 2020-08-06 00:43:45 +00:00
varsha teratipally
4c40b7e468 updated readme 2020-08-05 21:43:58 +00:00
varsha teratipally
e13210157d Add more info to disk metrics 2020-08-05 21:12:25 +00:00
Frame
9678892546 Fix typo in custom-plugin-monitor 2020-08-03 17:08:42 +08:00
Kubernetes Prow Robot
f3ab10eddb Merge pull request #442 from abansal4032/custom-plugin-logs-capture
Capture the logs from stderr of custom plugins
2020-07-29 14:18:03 -07:00
Archit Bansal
6acf5b1edb Capture the logs from stderr of custom plugins. 2020-07-29 11:57:05 -07:00
Kubernetes Prow Robot
c3cf941e98 Merge pull request #441 from abansal4032/custom-plugin-log-fix
Generate new status log only on condition change
2020-07-28 09:45:48 -07:00
Archit Bansal
f80f3e0dfa Generate status generation logs from custom plugin run only on condition change. 2020-07-24 09:39:39 -07:00
Archit Bansal
f56d0a929d Use InactiveExitTimestamp instead of ActiveEnterTimestamp for cooldown
period in health check monitor.
2020-07-16 18:53:47 -07:00
Archit Bansal
44dc4aa6c1 Add health-check-monitor 2020-05-27 14:08:42 -07:00
Abhilash Pallerlamudi
5342a50874 Add rhel support for osversion 2020-04-15 13:19:56 -07:00
Andrew DeMaria
7fd465e195 Add namespace option for events 2020-03-05 19:04:31 -07:00
Xuewei Zhang
83b09277f0 Collect more cpu/disk/memory metrics 2020-02-03 15:29:45 -08:00
Xuewei Zhang
fa7a3d7df1 Fix disk metrics unit and queue_length calculation 2020-01-02 17:19:38 -08:00
Kubernetes Prow Robot
0d0bba94e5 Merge pull request #402 from gmemcc/master
Ignore first collected disk stats to prevent metric distortion
2019-12-18 11:57:57 -08:00
Alex Wong
5a4ac81186 Only disk_avg_queue_len is distorted on first collection 2019-12-12 14:39:29 +08:00
Alex Wong
3d10c892a2 Ignore first collected disk stats to prevent metric distortion 2019-12-11 11:14:01 +08:00
yuzhiquan
9c24be2da4 cleanup: using time.Since(t) instead of t.Sub(time.Now()) 2019-12-05 18:57:53 +08:00
yuzhiquan
b458f0d028 fix: modify typo 2019-12-03 15:21:57 +08:00
Xuewei Zhang
5e55ef89f1 Make log-counter respect ENABLE_JOURNALD 2019-11-26 13:58:10 -08:00
tongxin21
d5cb44646e add an unit test for parsing the "/etc/os-release" of CentOS
add a newline character at the end
2019-11-01 13:34:22 +08:00
tongxin21
9b9f18a7ed add a case is ID="centos" 2019-10-28 19:09:15 +08:00
Lantao Liu
be7cc78aa0 Properly close channel when monitor exits.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-10-25 14:11:39 -07:00
Kubernetes Prow Robot
705cb01e0c Merge pull request #339 from wenjun93/logmonitor
avoid log channel closed caused endless loop
2019-10-25 11:27:39 -07:00
Kubernetes Prow Robot
bac3429522 Merge pull request #359 from gmemcc/hotfix-closed-channel
fix close of closed channel
2019-10-24 20:57:38 -07:00
wenjun93
4a4ebc7097 avoid log channel closed caused endless loop 2019-10-25 11:43:49 +08:00
Kubernetes Prow Robot
a999207a56 Merge pull request #367 from grosser/grosser/unwrap
untangle plugin runner a bit
2019-10-24 20:29:38 -07:00
Michael Grosser
3be50a088a untangle plugin runner a bit
add some docs and make it clearer what is actually going on
(parallel rule execution on start and then on timer)
2019-10-10 15:46:04 -07:00
Xuewei Zhang
794300af59 Add stackdriver exporter endpoint for problem_gauge 2019-09-26 13:45:17 -07:00
Matt Matejczyk
2e9da8569d Make heartbeatPeriod const into a flag. 2019-09-26 09:59:03 +02:00
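
Note on 589411702a (node memory metrics off by 1024): the commit message points out that /proc/meminfo reports values in kB. A minimal Go sketch of that unit conversion follows, assuming the file content has already been read into a string. The function name `parseMeminfoBytes` is hypothetical and is not taken from the repository; the actual fix may be structured differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfoBytes is a hypothetical helper (not the project's code).
// It parses /proc/meminfo-style text and returns each field's value,
// converting values that carry the "kB" suffix to bytes (x1024).
func parseMeminfoBytes(content string) (map[string]uint64, error) {
	result := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue // skip blank or malformed lines
		}
		name := strings.TrimSuffix(fields[0], ":")
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", name, err)
		}
		// Lines such as HugePages_Total carry no unit and are left as counts.
		if len(fields) >= 3 && fields[2] == "kB" {
			value *= 1024
		}
		result[name] = value
	}
	return result, scanner.Err()
}

func main() {
	// Sample values copied from the commit message above.
	sample := "MemTotal:       264129908 kB\nMemFree:        153559480 kB\n"
	stats, err := parseMeminfoBytes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("MemTotal bytes:", stats["MemTotal"])
	fmt.Println("MemFree bytes:", stats["MemFree"])
}
```

Checking for the unit suffix rather than multiplying every value keeps unitless counters intact, which is one plausible way to avoid reintroducing a scaling error.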