Commit Graph

135 Commits

Author SHA1 Message Date
zhangyue
b51cb3219f fix: print result's message when status unknown
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2020-11-18 19:30:17 +08:00
varsha teratipally
50127b0512 changed labelname after code review 2020-08-06 00:43:45 +00:00
varsha teratipally
4c40b7e468 updated readme 2020-08-05 21:43:58 +00:00
varsha teratipally
e13210157d Add more info to disk metrics 2020-08-05 21:12:25 +00:00
Frame
9678892546 Fix typo in custom-plugin-monitor 2020-08-03 17:08:42 +08:00
Kubernetes Prow Robot
f3ab10eddb Merge pull request #442 from abansal4032/custom-plugin-logs-capture
Capture the logs from stderr of custom plugins
2020-07-29 14:18:03 -07:00
Archit Bansal
6acf5b1edb Capture the logs from stderr of custom plugins. 2020-07-29 11:57:05 -07:00
Kubernetes Prow Robot
c3cf941e98 Merge pull request #441 from abansal4032/custom-plugin-log-fix
Generate new status log only on condition change
2020-07-28 09:45:48 -07:00
Archit Bansal
f80f3e0dfa Generate status generation logs from custom plugin run only on condition change. 2020-07-24 09:39:39 -07:00
Archit Bansal
f56d0a929d Use InactiveExitTimestamp instead of ActiveEnterTimestamp for cooldown
period in health check monitor.
2020-07-16 18:53:47 -07:00
Archit Bansal
44dc4aa6c1 Add health-check-monitor 2020-05-27 14:08:42 -07:00
Abhilash Pallerlamudi
5342a50874 Add rhel support for osversion 2020-04-15 13:19:56 -07:00
Andrew DeMaria
7fd465e195 Add namespace option for events 2020-03-05 19:04:31 -07:00
Xuewei Zhang
83b09277f0 Collect more cpu/disk/memory metrics 2020-02-03 15:29:45 -08:00
Xuewei Zhang
fa7a3d7df1 Fix disk metrics unit and queue_length calculation 2020-01-02 17:19:38 -08:00
Kubernetes Prow Robot
0d0bba94e5 Merge pull request #402 from gmemcc/master
Ignore first collected disk stats to prevent metric distortion
2019-12-18 11:57:57 -08:00
Alex Wong
5a4ac81186 Only disk_avg_queue_len is distorted on first collection 2019-12-12 14:39:29 +08:00
Alex Wong
3d10c892a2 Ignore first collected disk stats to prevent metric distortion 2019-12-11 11:14:01 +08:00
yuzhiquan
9c24be2da4 cleanup: using time.Since(t) instead of t.Sub(time.Now()) 2019-12-05 18:57:53 +08:00
yuzhiquan
b458f0d028 fix: modify typo 2019-12-03 15:21:57 +08:00
Xuewei Zhang
5e55ef89f1 Make log-counter respect ENABLE_JOURNALD 2019-11-26 13:58:10 -08:00
tongxin21
d5cb44646e add a unit test for parsing the "/etc/os-release" of CentOS
add a newline character at the end
2019-11-01 13:34:22 +08:00
tongxin21
9b9f18a7ed add a case for ID="centos" 2019-10-28 19:09:15 +08:00
Lantao Liu
be7cc78aa0 Properly close channel when monitor exits.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-10-25 14:11:39 -07:00
Kubernetes Prow Robot
705cb01e0c Merge pull request #339 from wenjun93/logmonitor
avoid log channel closed caused endless loop
2019-10-25 11:27:39 -07:00
Kubernetes Prow Robot
bac3429522 Merge pull request #359 from gmemcc/hotfix-closed-channel
fix close of closed channel
2019-10-24 20:57:38 -07:00
wenjun93
4a4ebc7097 avoid log channel closed caused endless loop 2019-10-25 11:43:49 +08:00
Kubernetes Prow Robot
a999207a56 Merge pull request #367 from grosser/grosser/unwrap
untangle plugin runner a bit
2019-10-24 20:29:38 -07:00
Michael Grosser
3be50a088a untangle plugin runner a bit
add some docs and make it clearer what is actually going on
(parallel rule execution on start and then on timer)
2019-10-10 15:46:04 -07:00
Xuewei Zhang
794300af59 Add stackdriver exporter endpoint for problem_gauge 2019-09-26 13:45:17 -07:00
Matt Matejczyk
2e9da8569d Make heartbeatPeriod const into a flag. 2019-09-26 09:59:03 +02:00
Alex Wong
60e048d2ce fix close of closed channel 2019-09-24 16:07:47 +08:00
Xuewei Zhang
e1939ebc03 Handle vendor change in k8s.io/apimachinery/pkg/util/clock
clock.Clock used to have Tick() method, but is now replaced with
NewTicker() method to prevent leaking. Changed NPD code to adapt to it.

See https://github.com/kubernetes/apimachinery/commit/10ebc22e for more
detail.
2019-09-14 15:22:09 -07:00
Xuewei Zhang
0f0e5eff0f Adding stackdriver exporter 2019-09-12 18:30:00 -07:00
Xuewei Zhang
9e789b5f99 Refactor on metrics so that names for all the views are tracked 2019-09-11 12:07:13 -07:00
Xuewei Zhang
0f2fce56e5 Change host/uptime to GAUGE metrics 2019-09-10 16:58:06 -07:00
Kubernetes Prow Robot
2a07254f96 Merge pull request #253 from finn-no/master
Empty LogPath will use journald's default path.
2019-08-27 09:22:41 -07:00
Andrew Stribblehill
09c498ad74 Empty LogPath will use journald's default path. 2019-08-27 01:55:30 +02:00
Xuewei Zhang
82c2368795 Metric format fixes on host/uptime and disk/*
1. host/uptime, disk/io_time and disk/weighted_io should be
counter/cumulative metrics. So we have to use the Sum aggregation method
rather than LastValue aggregation method (which will declare the metric
as gauge metric).

2. Renamed label "device" for disk/* metrics to "device_name".
This is to clarify that it is device_name (sda1) rather than device_path
(/dev/sda1)
2019-08-16 15:14:54 -07:00
Kubernetes Prow Robot
424b864291 Merge pull request #323 from xueweiz/test
Add a simple e2e test
2019-08-16 14:56:09 -07:00
Xuewei Zhang
f9b5e60a43 Add e2e test for NPD
The first test is a very simple test. It installs NPD on a VM, and then
verifies that NPD reports metric host_uptime in Prometheus format.
2019-08-16 01:33:29 -07:00
Lang Chi
4d37d6fb68 fix a spelling error
Signed-off-by: Lang Chi <21860405@zju.edu.cn>
2019-08-13 15:12:01 +08:00
Kubernetes Prow Robot
e280e2075a Merge pull request #320 from wangzhen127/custom-plugin-fix
Don't update condition if status stays False/Unknown for custom plugin
2019-08-07 17:09:18 -07:00
Zhen Wang
30e20c6a20 Validate that permanent problem has preset default condition 2019-08-01 23:40:16 -07:00
Zhen Wang
2f5d03280a Don't update condition if status stays False/Unknown for custom plugin 2019-08-01 23:40:16 -07:00
Zhen Wang
182a9450dd Print monitor config path in the logs 2019-07-30 11:00:47 -07:00
Kubernetes Prow Robot
599ca532e8 Merge pull request #315 from xueweiz/metrics
Report metrics from custom-plugin-monitor
2019-07-25 11:58:44 -07:00
Xuewei Zhang
94af7de97b Report metrics from custom-plugin-monitor 2019-07-25 11:28:38 -07:00
Kubernetes Prow Robot
b8ce6360d9 Merge pull request #300 from xueweiz/metrics
Report metrics from system-log-monitor
2019-07-12 15:17:06 -07:00
Xuewei Zhang
fbebcf311b Report metrics from system-log-monitor 2019-07-12 14:38:21 -07:00