61 Commits

Author SHA1 Message Date
Ciprian Hacman
2e69489cc6 Update golangci-lint to v2.6.2 2025-11-29 10:42:18 +02:00
Ciprian Hacman
073c3ad9f8 healthchecker: Fix systemd-service flag deprecation 2025-10-07 21:25:21 +03:00
Sergey Kanzhelev
0ce333bbc5 enabled and fixed the errcheck linter rule 2025-09-10 21:45:46 +00:00
Sergey Kanzhelev
75bf501888 format imports 2025-08-13 16:56:32 +00:00
Nick Parker
8d237a6c7c feat(k8sExporter): Options to allow disabling Events or Node Conditions
Both outputs are currently hardcoded to being enabled, this allows disabling one or the other. Defaults to both enabled to retain current behavior.

Larger clusters can save some etcd I/O by skipping one of these outputs if they aren't being consumed. In our case we aren't consuming the Events so writing them just creates more churn.
2025-02-04 14:53:19 +13:00
googs1025
0d756b78fc chore: qps flag: use float32 instead of float64 2024-11-06 13:09:20 +08:00
googs1025
17dcc94418 feature: add QPS Burst flags 2024-10-27 21:58:45 +08:00
Eric Lin
1002df5e13 Add --revert-pattern for logcounter 2024-01-17 18:21:57 +00:00
Ciprian Hacman
5210373640 Init useful flags for klog/v2 2023-09-17 11:00:42 +03:00
Manuel Rüger
e43459d86d Move glog/klog logging to klog/v2 2023-09-17 08:57:33 +03:00
Fan Shang Xiang
471ab88240 add context to long running operations 2023-07-13 10:01:18 +08:00
Kubernetes Prow Robot
e6fbdd434a Merge pull request #760 from MartinForReal/master
bump k8s.io dependencies to 1.17.2
2023-06-25 21:41:16 -07:00
guoguangwu
6dc23ca804 chore: remove refs to deprecated io/ioutil 2023-06-21 12:12:27 +08:00
Fan Shang Xiang
b5e4ef628b bump k8s.io to 1.17.2 2023-06-12 22:27:39 +08:00
Izaak Alpert (karlhungus)
b6d8069610 allow setting crictl timeout 2022-09-15 14:31:41 -04:00
tashen
a3b928467e add loopbacktime to reduce time of journalctl call 2021-05-19 13:55:55 +08:00
michelletandya
01cd8dd08c Add healthChecker functionality for kube-proxy service 2021-05-05 17:27:58 +00:00
michelletandya
c4e5400ed6 separate linux/windows health checker files. 2021-04-26 21:45:05 +00:00
Jeremy Edwards
a7f78c5668 Enable NPD to run as a Windows Service. 2021-04-02 23:03:14 -07:00
Archit Bansal
100f2bf8e6 Make log pattern check configurable in health checker 2021-02-17 17:46:18 -08:00
Archit Bansal
2513756583 Add kubelet apiserver connection fail check in health checker 2020-11-09 12:47:16 -08:00
Archit Bansal
44dc4aa6c1 Add health-check-monitor 2020-05-27 14:08:42 -07:00
Andrew DeMaria
7fd465e195 Add namespace option for events 2020-03-05 19:04:31 -07:00
Xuewei Zhang
5e55ef89f1 Make log-counter respect ENABLE_JOURNALD 2019-11-26 13:58:10 -08:00
Lantao Liu
be7cc78aa0 Properly close channel when monitor exits.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-10-25 14:11:39 -07:00
wojtekt
43728fb0fc Decrease default frequency of forced heartbeats to 5m 2019-10-24 10:39:01 +02:00
Matt Matejczyk
2e9da8569d Make heartbeatPeriod const into a flag. 2019-09-26 09:59:03 +02:00
Xuewei Zhang
0f0e5eff0f Adding stackdriver exporter 2019-09-12 18:30:00 -07:00
Yang Guo
ddb1d76178 Support waiting for kube-apiserver to be ready with timout during NPD startup 2019-07-09 10:24:25 -07:00
Xuewei Zhang
225de07427 Correctly identify failures in problem daemon starting. 2019-06-26 17:55:11 -07:00
Xuewei Zhang
be2647a686 Allow compilation time disabling for each type of Problem Daemon. 2019-06-17 16:02:45 -07:00
Lantao Liu
d520ca89bd Build node-problem-detector from a directory.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-06-13 18:54:23 -07:00
Lantao Liu
f2d17ee77b Do not import plugins unnecessarily.
Signed-off-by: Lantao Liu <lantaol@google.com>
2019-06-13 17:57:53 -07:00
Xuewei Zhang
7ad5dec712 Add disk metrics support. 2019-06-13 00:51:17 -07:00
Xuewei Zhang
23dc265971 Add Prometheus exporter. 2019-06-13 00:51:17 -07:00
Xuewei Zhang
a07176073a Add existing monitors into the problem daemon registration hook. 2019-06-13 00:51:17 -07:00
Xuewei Zhang
63f0e35e56 Implement dynamic problemdaemon registration and initialization.
Added package problemdaemon. All future problem daemons should be
registered by calling problemdaemon.register().

CLI interfaces will be automatically generated for all registered
problem daemons in the form of "--config.DAEMON_NAME"
2019-06-12 18:29:18 -07:00
Xuewei Zhang
5814195ad5 Move apiserver-reporting logic into k8s_exporter.
Added CLI option "enable-k8s-exporter" (default to true). Users can use
this option to enable/disable exporting to Kubernetes control plane.

This commit also removes all the apiserver-specific logic from package
problemdetector.

Future exporters (e.g. to local journald, Prometheus, other control
planes) should implement types.Exporter interface.
2019-06-12 18:29:18 -07:00
liangwei
4110e5350d node-problem-detector --version should not require monitors specify 2019-04-17 09:58:12 +08:00
Zhen Wang
7e766a4ec0 Disable glog writing to files for log-counter 2019-04-03 13:25:07 -07:00
Kenjiro Nakayama
a248e2a842 Add validation for the required flag
If --system-log-monitors or --custom-plugin-monitors are not
specified, npd gave us unclear message.

This patch adds the validation and clear error message.
2019-01-17 13:38:19 +09:00
Zhen Wang
1f636381b8 Detect kubelet and container runtime frequent crashes 2018-11-26 22:41:06 -08:00
SataQiu
91adf37050 fix typo: NDDE -> NODE, permenantly -> permanently 2018-11-21 17:36:08 +08:00
David Ashpole
bf730e9c63 add log-counter go plugin 2018-06-20 15:55:19 -07:00
gkGaneshR
821b8f41aa Modify unit testing of cmd/options
1. Why is this change necessary ?
   Modify unit testing of options in such a way that we specify
the WantedNodeName

2. How to verify this change ?
   Run, make test

Signed-off-by: gkGaneshR <gkganesh126@gmail.com>
2018-03-09 13:17:52 +05:30
gkGaneshR
ca76dc12ee Add dot(.) at the end of comments
Signed-off-by: gkGaneshR <gkganesh126@gmail.com>
2018-03-09 12:08:05 +05:30
gkGaneshR
c75a35099e Avoided changing hostname and changing var names
1. Why is this change necessary ?
 The program avoids changing hostname and the variable name "Options" is changed to
"options". Also, added more comments and formatted. Removed hostname in options since
it will not be changed in tests.

2. How does this change address the issue ?
 While the program is being run, the hostname is not changed. And options can't be
accessed outside(not exported).

3. How to verify this change ?
 Run, make test

Signed-off-by: gkGaneshR <gkganesh126@gmail.com>
2018-03-05 13:49:10 +05:30
gkGaneshR
a591ce52f9 Added copyright 2018 statement
1. Why is this change necessary ?
 Added copyright 2018 statement on options_test.go and added space
between // and text on the comments.

Signed-off-by: gkGaneshR <gkganesh126@gmail.com>
2018-03-04 21:05:16 +05:30
gkGaneshR
25b6c169a2 Unit testing for SetNodeNameOrDie in package cmd/options
1. Why is this change necessary ?
fixes: kubernetes/node-problem-detector#161

2. How does this change address the issue ?
Under package cmd/options, the testing for SetNodeNameOrDie need
to decide Nodename based on environment variable "NODE_NAME" or
hostname or hostnameoverride variable.

3. How to verify this change ?
Run "go test" with admin privilege

Signed-off-by: gkGaneshR <gkganesh126@gmail.com>
2018-03-02 21:03:59 +05:30
Andy Xie
10dbfef1a8 add custom problem detector plugin 2017-11-22 10:14:09 +08:00