node-problem-detector

mirror of https://github.com/kubernetes/node-problem-detector.git synced 2026-02-25 23:33:53 +00:00

Author	SHA1	Message	Date
Archit Bansal	44dc4aa6c1	Add health-check-monitor	2020-05-27 14:08:42 -07:00
Abhilash Pallerlamudi	5342a50874	Add rhel support for osversion	2020-04-15 13:19:56 -07:00
Andrew DeMaria	7fd465e195	Add namespace option for events	2020-03-05 19:04:31 -07:00
Xuewei Zhang	83b09277f0	Collect more cpu/disk/memory metrics	2020-02-03 15:29:45 -08:00
Xuewei Zhang	fa7a3d7df1	Fix disk metrics unit and queue_length calculation	2020-01-02 17:19:38 -08:00
Kubernetes Prow Robot	0d0bba94e5	Merge pull request #402 from gmemcc/master Ignore first collected disk stats to prevent metric distortion	2019-12-18 11:57:57 -08:00
Alex Wong	5a4ac81186	Only disk_avg_queue_len is distorted on first collection	2019-12-12 14:39:29 +08:00
Alex Wong	3d10c892a2	Ignore first collected disk stats to prevent metric distortion	2019-12-11 11:14:01 +08:00
yuzhiquan	9c24be2da4	cleanup: using time.Since(t) instead of t.Sub(time.Now())	2019-12-05 18:57:53 +08:00
yuzhiquan	b458f0d028	fix: modify typo	2019-12-03 15:21:57 +08:00
Xuewei Zhang	5e55ef89f1	Make log-counter respect ENABLE_JOURNALD	2019-11-26 13:58:10 -08:00
tongxin21	d5cb44646e	add an unit test for parsing the "/etc/os-release" of CentOS add a newline character at the end	2019-11-01 13:34:22 +08:00
tongxin21	9b9f18a7ed	add a case is ID="centos"	2019-10-28 19:09:15 +08:00
Lantao Liu	be7cc78aa0	Properly close channel when monitor exits. Signed-off-by: Lantao Liu <lantaol@google.com>	2019-10-25 14:11:39 -07:00
Kubernetes Prow Robot	705cb01e0c	Merge pull request #339 from wenjun93/logmonitor avoid log channel closed caused endless loop	2019-10-25 11:27:39 -07:00
Kubernetes Prow Robot	bac3429522	Merge pull request #359 from gmemcc/hotfix-closed-channel fix close of closed channel	2019-10-24 20:57:38 -07:00
wenjun93	4a4ebc7097	avoid log channel closed caused endless loop	2019-10-25 11:43:49 +08:00
Kubernetes Prow Robot	a999207a56	Merge pull request #367 from grosser/grosser/unwrap untangle plugin runner a bit	2019-10-24 20:29:38 -07:00
Michael Grosser	3be50a088a	untangle plugin runner a bit add some docs and make it clearer what is actually going on (parallel rule execution on start and then on timer)	2019-10-10 15:46:04 -07:00
Xuewei Zhang	794300af59	Add stackdriver exporter endpoint for problem_gauge	2019-09-26 13:45:17 -07:00
Matt Matejczyk	2e9da8569d	Make heartbeatPeriod const into a flag.	2019-09-26 09:59:03 +02:00
Alex Wong	60e048d2ce	fix close of closed channel	2019-09-24 16:07:47 +08:00
Xuewei Zhang	e1939ebc03	Handle vendor change in k8s.io/apimachinery/pkg/util/clock clock.Clock used to have Tick() method, but is now replaced with NewTicker() method to prevent leaking. Changed NPD code to adapt to it. See https://github.com/kubernetes/apimachinery/commit/10ebc22e for more detail.	2019-09-14 15:22:09 -07:00
Xuewei Zhang	0f0e5eff0f	Adding stackdriver exporter	2019-09-12 18:30:00 -07:00
Xuewei Zhang	9e789b5f99	Refactor on metrics so that names for all the views are tracked	2019-09-11 12:07:13 -07:00
Xuewei Zhang	0f2fce56e5	Change host/uptime to GAUGE metrics	2019-09-10 16:58:06 -07:00
Kubernetes Prow Robot	2a07254f96	Merge pull request #253 from finn-no/master Empty LogPath will use journald's default path.	2019-08-27 09:22:41 -07:00
Andrew Stribblehill	09c498ad74	Empty LogPath will use journald's default path.	2019-08-27 01:55:30 +02:00
Xuewei Zhang	82c2368795	Metric format fixes on host/uptime and disk/* 1. host/uptime, disk/io_time and disk/weighted_io should be counter/cumulative metrics. SO we have to use the Sum aggregation method rather than LastValue aggregation method (which will declare the metric as gauge metric). 2. Renamed label "device" for disk/* metrics to "device_name". This is to clarify that it is device_name (sda1) rather than device_path (/dev/sda1)	2019-08-16 15:14:54 -07:00
Kubernetes Prow Robot	424b864291	Merge pull request #323 from xueweiz/test Add a simple e2e test	2019-08-16 14:56:09 -07:00
Xuewei Zhang	f9b5e60a43	Add e2e test for NPD The first test is a very simple test. It installs NPD on a VM, and then verifies that NPD reports metric host_uptime in Prometheus format.	2019-08-16 01:33:29 -07:00
Lang Chi	4d37d6fb68	fix a spelling error Signed-off-by: Lang Chi <21860405@zju.edu.cn>	2019-08-13 15:12:01 +08:00
Kubernetes Prow Robot	e280e2075a	Merge pull request #320 from wangzhen127/custom-plugin-fix Don't update condition if status stays False/Unknown for custom plugin	2019-08-07 17:09:18 -07:00
Zhen Wang	30e20c6a20	Validate that permanent problem has preset default condition	2019-08-01 23:40:16 -07:00
Zhen Wang	2f5d03280a	Don't update condition if status stays False/Unknown for custom plugin	2019-08-01 23:40:16 -07:00
Zhen Wang	182a9450dd	Print monitor config path in the logs	2019-07-30 11:00:47 -07:00
Kubernetes Prow Robot	599ca532e8	Merge pull request #315 from xueweiz/metrics Report metrics from custom-plugin-monitor	2019-07-25 11:58:44 -07:00
Xuewei Zhang	94af7de97b	Report metrics from custom-plugin-monitor	2019-07-25 11:28:38 -07:00
Kubernetes Prow Robot	b8ce6360d9	Merge pull request #300 from xueweiz/metrics Report metrics from system-log-monitor	2019-07-12 15:17:06 -07:00
Xuewei Zhang	fbebcf311b	Report metrics from system-log-monitor	2019-07-12 14:38:21 -07:00
Yang Guo	ddb1d76178	Support waiting for kube-apiserver to be ready with timout during NPD startup	2019-07-09 10:24:25 -07:00
Xuewei Zhang	4944ac3e48	Implement host collector as part of system-stats-monitor Host collector report three things today: 1. Host OS uptime (in seconds) 2. Host kernel version (as a metric label) 3. Host OS version (as a metric label)	2019-06-27 16:40:11 -07:00
Xuewei Zhang	29b0740f4c	Refactor systemstatsmonitor/metric_helper.go into a metrics package	2019-06-27 16:40:05 -07:00
Xuewei Zhang	225de07427	Correctly identify failures in problem daemon starting.	2019-06-26 17:55:11 -07:00
Xuewei Zhang	cf6624661a	Update READMEs	2019-06-13 00:51:17 -07:00
Xuewei Zhang	7ad5dec712	Add disk metrics support.	2019-06-13 00:51:17 -07:00
Xuewei Zhang	23dc265971	Add Prometheus exporter.	2019-06-13 00:51:17 -07:00
Xuewei Zhang	a07176073a	Add existing monitors into the problem daemon registration hook.	2019-06-13 00:51:17 -07:00
Xuewei Zhang	63f0e35e56	Implement dynamic problemdaemon registration and initialization. Added package problemdaemon. All future problem daemons should be registered by calling problemdaemon.register(). CLI interfaces will be automatically generated for all registered problem daemons in the form of "--config.DAEMON_NAME"	2019-06-12 18:29:18 -07:00
Xuewei Zhang	5814195ad5	Move apiserver-reporting logic into k8s_exporter. Added CLI option "enable-k8s-exporter" (default to true). Users can use this option to enable/disable exporting to Kubernetes control plane. This commit also removes all the apiserver-specific logic from package problemdetector. Future exporters (e.g. to local journald, Prometheus, other control planes) should implement types.Exporter interface.	2019-06-12 18:29:18 -07:00

125 Commits