45 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
d77d8f2992 Merge pull request #721 from UiPath/new-os-distributions
Add support for SLES, Oracle and Amazon Linux
2023-01-31 10:48:56 -08:00
Yordis Prieto Lazo
0842910049 chore: fix misspelling 2022-12-18 22:58:07 -05:00
Alexandru Matei
0afa7cc6ff Add support for SLES, Oracle and Amazon Linux 2022-10-27 14:54:42 +03:00
Andrew Garrett
72ad051dd6 Use Warn severity on K8s Event when Node condition is True
If temporary errors generate an Event with a Warn severity, then surely
permanent errors should generate an Event with at least that high of a
severity level.
2022-06-17 22:13:21 +00:00
Andrew Garrett
b1bd8e7424 Use %q instead of %s 2022-06-09 17:18:30 +00:00
Andrew Garrett
a39a7c6e0f Add condition message to event message
If you're using some monitoring solution that aggregates events from
your Kubernetes cluster, having the underlying reason why a condition
triggered could be very useful, especially if you are using custom
plugin monitors.

Co-authored-by: Micah Norman <micnorman@paypal.com>
Signed-off-by: Ryan Eschinger <reschinger@paypal.com>
2022-06-08 21:42:40 +00:00
Kubernetes Prow Robot
c083db10f0 Merge pull request #628 from mx-psi/master
Change to using new dependency name for osreleaser
2022-04-22 11:35:37 -07:00
Kubernetes Prow Robot
9c23553e0b Merge pull request #650 from yankay/fix-deprecated-maintainer-in-dockerfile
Fix deprecated "MAINTAINER" in Dockerfile
2022-04-21 12:28:12 -07:00
Neo Zhuo
11ddb5e6bf support custom /proc path 2022-04-11 18:15:08 +08:00
Kay Yan
bc89bbce56 MAINTAINER in Dockerfile is deprecated, change to label
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2022-03-07 15:27:08 +08:00
Pablo Baeyens
a859b5f027 Change to using new dependency name for osreleaser
To do this I
1. changed the name in go.mod and the Go code that used it,
2. ran `go mod tidy -go=1.15` and
3. ran `go mod vendor`.

Step 3 added another vendored dependency unrelated AFAIK to this change.
2021-11-29 16:45:48 +01:00
Julie Qi
fe09e416bd remove aufs hung check 2021-07-30 13:53:25 -07:00
Jeremy Edwards
d4933875ed Add support for basic system metrics for Windows. 2021-05-10 21:58:38 +00:00
michelletandya
c4e5400ed6 separate linux/windows health checker files. 2021-04-26 21:45:05 +00:00
Jeremy Edwards
4181ece888 Windows Support: Fix Build Regressions, Tests Pass 2021-03-14 10:24:45 -07:00
Karan Goel
8648fe265a add metric for per-cpu, per-stage timing 2021-01-29 08:46:39 -08:00
Karan Goel
2a2bab3d28 Add network interface stats
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Karan Goel
71098097c0 add metrics for process stats
Tested on a COS VM:

```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
varsha teratipally
eb38b4b598 added a new metric to retrieve os features like unknown modules 2021-01-08 21:52:16 +00:00
Kubernetes Prow Robot
4dccc1ce24 Merge pull request #493 from vteratipally/kernel_cmdline_parameters
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d renaming splitWords to tokens 2020-12-08 18:34:54 +00:00
varsha teratipally
ffc46f977d add code to retrieve kernel command line parameters 2020-12-07 22:40:22 +00:00
Jeremy Edwards
4adec4bbc6 Introduce Windows build of Node Problem Detector 2020-12-05 23:54:52 +00:00
varsha teratipally
2b50e4af1a add testcases for cos and ubuntu to retrieve modules 2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6 add code for retrieving kernel modules 2020-11-19 09:49:25 +00:00
Karan Goel
925ea7393c Collect CPU load averages in a separate metric 2020-11-09 09:41:52 -08:00
Abhilash Pallerlamudi
5342a50874 Add rhel support for osversion 2020-04-15 13:19:56 -07:00
Xuewei Zhang
83b09277f0 Collect more cpu/disk/memory metrics 2020-02-03 15:29:45 -08:00
tongxin21
d5cb44646e add a unit test for parsing the "/etc/os-release" of CentOS
add a newline character at the end
2019-11-01 13:34:22 +08:00
tongxin21
9b9f18a7ed add a case for ID="centos" 2019-10-28 19:09:15 +08:00
Xuewei Zhang
9e789b5f99 Refactor on metrics so that names for all the views are tracked 2019-09-11 12:07:13 -07:00
Xuewei Zhang
f9b5e60a43 Add e2e test for NPD
The first test is a very simple test. It installs NPD on a VM, and then
verifies that NPD reports metric host_uptime in Prometheus format.
2019-08-16 01:33:29 -07:00
Xuewei Zhang
fbebcf311b Report metrics from system-log-monitor 2019-07-12 14:38:21 -07:00
Xuewei Zhang
4944ac3e48 Implement host collector as part of system-stats-monitor
Host collector reports three things today:
1. Host OS uptime (in seconds)
2. Host kernel version (as a metric label)
3. Host OS version (as a metric label)
2019-06-27 16:40:11 -07:00
Xuewei Zhang
29b0740f4c Refactor systemstatsmonitor/metric_helper.go into a metrics package 2019-06-27 16:40:05 -07:00
Zhen Wang
1f636381b8 Detect kubelet and container runtime frequent crashes 2018-11-26 22:41:06 -08:00
Andy Xie
89cfb5261d bump kubernetes to 1.9 2018-07-09 14:59:51 +08:00
Lantao Liu
ee103dd4ac Generate event for condition change and support unknown status. 2018-06-21 15:29:53 -07:00
Tim Hockin
3468934b7d Pushes go to staging-k8s.gcr.io 2018-02-01 20:11:55 -08:00
Tim Hockin
547c65ef89 Convert registry to k8s.gcr.io 2017-12-22 09:55:16 -08:00
Andy Xie
10dbfef1a8 add custom problem detector plugin 2017-11-22 10:14:09 +08:00
Random-Liu
20ffe37cea Add NPD endpoints: /debug/pprof, /healthz, /conditions. 2017-02-03 11:07:06 -08:00
Girish Kalele
b687dfaafc Containerize the nethealth bandwidth measurement utility 2016-06-07 20:51:30 -07:00
Girish Kalele
33a43545ca Node network health check utility - performs a quick HTTP GET test 2016-06-03 14:26:12 -07:00
Lantao Liu
f0312655bd Add first version of node-problem-detector 2016-05-17 15:55:33 -07:00