Archit Bansal
44dc4aa6c1
Add health-check-monitor
2020-05-27 14:08:42 -07:00
Abhilash Pallerlamudi
5342a50874
Add rhel support for osversion
2020-04-15 13:19:56 -07:00
Andrew DeMaria
7fd465e195
Add namespace option for events
2020-03-05 19:04:31 -07:00
Xuewei Zhang
83b09277f0
Collect more cpu/disk/memory metrics
2020-02-03 15:29:45 -08:00
Xuewei Zhang
fa7a3d7df1
Fix disk metrics unit and queue_length calculation
2020-01-02 17:19:38 -08:00
Kubernetes Prow Robot
0d0bba94e5
Merge pull request #402 from gmemcc/master
...
Ignore first collected disk stats to prevent metric distortion
2019-12-18 11:57:57 -08:00
Alex Wong
5a4ac81186
Only disk_avg_queue_len is distorted on first collection
2019-12-12 14:39:29 +08:00
Alex Wong
3d10c892a2
Ignore first collected disk stats to prevent metric distortion
2019-12-11 11:14:01 +08:00
yuzhiquan
9c24be2da4
cleanup: using time.Since(t) instead of time.Now().Sub(t)
2019-12-05 18:57:53 +08:00
yuzhiquan
b458f0d028
fix: modify typo
2019-12-03 15:21:57 +08:00
Xuewei Zhang
5e55ef89f1
Make log-counter respect ENABLE_JOURNALD
2019-11-26 13:58:10 -08:00
tongxin21
d5cb44646e
add a unit test for parsing the "/etc/os-release" of CentOS
...
add a newline character at the end
2019-11-01 13:34:22 +08:00
tongxin21
9b9f18a7ed
add a case where ID="centos"
2019-10-28 19:09:15 +08:00
Lantao Liu
be7cc78aa0
Properly close channel when monitor exits.
...
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-10-25 14:11:39 -07:00
Kubernetes Prow Robot
705cb01e0c
Merge pull request #339 from wenjun93/logmonitor
...
avoid endless loop caused by closed log channel
2019-10-25 11:27:39 -07:00
Kubernetes Prow Robot
bac3429522
Merge pull request #359 from gmemcc/hotfix-closed-channel
...
fix close of closed channel
2019-10-24 20:57:38 -07:00
wenjun93
4a4ebc7097
avoid endless loop caused by closed log channel
2019-10-25 11:43:49 +08:00
Kubernetes Prow Robot
a999207a56
Merge pull request #367 from grosser/grosser/unwrap
...
untangle plugin runner a bit
2019-10-24 20:29:38 -07:00
Michael Grosser
3be50a088a
untangle plugin runner a bit
...
add some docs and make it clearer what is actually going on
(parallel rule execution on start and then on timer)
2019-10-10 15:46:04 -07:00
Xuewei Zhang
794300af59
Add stackdriver exporter endpoint for problem_gauge
2019-09-26 13:45:17 -07:00
Matt Matejczyk
2e9da8569d
Make heartbeatPeriod const into a flag.
2019-09-26 09:59:03 +02:00
Alex Wong
60e048d2ce
fix close of closed channel
2019-09-24 16:07:47 +08:00
Xuewei Zhang
e1939ebc03
Handle vendor change in k8s.io/apimachinery/pkg/util/clock
...
clock.Clock used to have Tick() method, but is now replaced with
NewTicker() method to prevent leaking. Changed NPD code to adapt to it.
See https://github.com/kubernetes/apimachinery/commit/10ebc22e for more
detail.
2019-09-14 15:22:09 -07:00
Xuewei Zhang
0f0e5eff0f
Adding stackdriver exporter
2019-09-12 18:30:00 -07:00
Xuewei Zhang
9e789b5f99
Refactor on metrics so that names for all the views are tracked
2019-09-11 12:07:13 -07:00
Xuewei Zhang
0f2fce56e5
Change host/uptime to GAUGE metrics
2019-09-10 16:58:06 -07:00
Kubernetes Prow Robot
2a07254f96
Merge pull request #253 from finn-no/master
...
Empty LogPath will use journald's default path.
2019-08-27 09:22:41 -07:00
Andrew Stribblehill
09c498ad74
Empty LogPath will use journald's default path.
2019-08-27 01:55:30 +02:00
Xuewei Zhang
82c2368795
Metric format fixes on host/uptime and disk/*
...
1. host/uptime, disk/io_time and disk/weighted_io should be
counter/cumulative metrics. So we have to use the Sum aggregation method
rather than LastValue aggregation method (which will declare the metric
as gauge metric).
2. Renamed label "device" for disk/* metrics to "device_name".
This is to clarify that it is device_name (sda1) rather than device_path
(/dev/sda1)
2019-08-16 15:14:54 -07:00
Kubernetes Prow Robot
424b864291
Merge pull request #323 from xueweiz/test
...
Add a simple e2e test
2019-08-16 14:56:09 -07:00
Xuewei Zhang
f9b5e60a43
Add e2e test for NPD
...
The first test is a very simple test. It installs NPD on a VM, and then
verifies that NPD reports metric host_uptime in Prometheus format.
2019-08-16 01:33:29 -07:00
Lang Chi
4d37d6fb68
fix a spelling error
...
Signed-off-by: Lang Chi <21860405@zju.edu.cn>
2019-08-13 15:12:01 +08:00
Kubernetes Prow Robot
e280e2075a
Merge pull request #320 from wangzhen127/custom-plugin-fix
...
Don't update condition if status stays False/Unknown for custom plugin
2019-08-07 17:09:18 -07:00
Zhen Wang
30e20c6a20
Validate that permanent problem has preset default condition
2019-08-01 23:40:16 -07:00
Zhen Wang
2f5d03280a
Don't update condition if status stays False/Unknown for custom plugin
2019-08-01 23:40:16 -07:00
Zhen Wang
182a9450dd
Print monitor config path in the logs
2019-07-30 11:00:47 -07:00
Kubernetes Prow Robot
599ca532e8
Merge pull request #315 from xueweiz/metrics
...
Report metrics from custom-plugin-monitor
2019-07-25 11:58:44 -07:00
Xuewei Zhang
94af7de97b
Report metrics from custom-plugin-monitor
2019-07-25 11:28:38 -07:00
Kubernetes Prow Robot
b8ce6360d9
Merge pull request #300 from xueweiz/metrics
...
Report metrics from system-log-monitor
2019-07-12 15:17:06 -07:00
Xuewei Zhang
fbebcf311b
Report metrics from system-log-monitor
2019-07-12 14:38:21 -07:00
Yang Guo
ddb1d76178
Support waiting for kube-apiserver to be ready with timeout during NPD startup
2019-07-09 10:24:25 -07:00
Xuewei Zhang
4944ac3e48
Implement host collector as part of system-stats-monitor
...
Host collector reports three things today:
1. Host OS uptime (in seconds)
2. Host kernel version (as a metric label)
3. Host OS version (as a metric label)
2019-06-27 16:40:11 -07:00
Xuewei Zhang
29b0740f4c
Refactor systemstatsmonitor/metric_helper.go into a metrics package
2019-06-27 16:40:05 -07:00
Xuewei Zhang
225de07427
Correctly identify failures in problem daemon starting.
2019-06-26 17:55:11 -07:00
Xuewei Zhang
cf6624661a
Update READMEs
2019-06-13 00:51:17 -07:00
Xuewei Zhang
7ad5dec712
Add disk metrics support.
2019-06-13 00:51:17 -07:00
Xuewei Zhang
23dc265971
Add Prometheus exporter.
2019-06-13 00:51:17 -07:00
Xuewei Zhang
a07176073a
Add existing monitors into the problem daemon registration hook.
2019-06-13 00:51:17 -07:00
Xuewei Zhang
63f0e35e56
Implement dynamic problemdaemon registration and initialization.
...
Added package problemdaemon. All future problem daemons should be
registered by calling problemdaemon.register().
CLI interfaces will be automatically generated for all registered
problem daemons in the form of "--config.DAEMON_NAME"
2019-06-12 18:29:18 -07:00
Xuewei Zhang
5814195ad5
Move apiserver-reporting logic into k8s_exporter.
...
Added CLI option "enable-k8s-exporter" (default to true). Users can use
this option to enable/disable exporting to Kubernetes control plane.
This commit also removes all the apiserver-specific logic from package
problemdetector.
Future exporters (e.g. to local journald, Prometheus, other control
planes) should implement types.Exporter interface.
2019-06-12 18:29:18 -07:00
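As a rough illustration of the exporter split described in 5814195ad5, the sketch below fans one problem report out to every enabled exporter. The Status fields, the Exporter method set, memoryExporter, and exportAll are all assumptions made for this example; the real types.Exporter interface may differ.

```go
package main

import "fmt"

// Status is a hypothetical, simplified problem report.
type Status struct {
	Source string
	Events []string
}

// Exporter is an assumed shape for an exporter interface.
type Exporter interface {
	ExportProblems(status *Status)
}

// memoryExporter is a hypothetical exporter that records lines in memory,
// standing in for a Kubernetes, Prometheus, or journald backend.
type memoryExporter struct {
	lines []string
}

// ExportProblems stores one line per event in the report.
func (m *memoryExporter) ExportProblems(status *Status) {
	for _, event := range status.Events {
		m.lines = append(m.lines, status.Source+": "+event)
	}
}

// exportAll sends the same status to every enabled exporter.
func exportAll(exporters []Exporter, status *Status) {
	for _, e := range exporters {
		e.ExportProblems(status)
	}
}

func main() {
	mem := &memoryExporter{}
	exportAll([]Exporter{mem}, &Status{
		Source: "kernel-monitor",
		Events: []string{"OOMKilling"},
	})
	fmt.Println(mem.lines)
}
```

With this shape, disabling a backend amounts to leaving its exporter out of the slice passed to exportAll.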