Jeremy Edwards
d4933875ed
Add support for basic system metrics for Windows.
2021-05-10 21:58:38 +00:00
michelletandya
c4e5400ed6
separate linux/windows health checker files.
2021-04-26 21:45:05 +00:00
Jeremy Edwards
4181ece888
Windows Support: Fix Build Regressions, Tests Pass
2021-03-14 10:24:45 -07:00
Karan Goel
8648fe265a
add metric for per-cpu, per-stage timing
2021-01-29 08:46:39 -08:00
Karan Goel
2a2bab3d28
Add network interface stats
...
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Karan Goel
71098097c0
add metrics for process stats
...
Tested on a COS VM:
```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
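The `curl | grep` check in the commit above can also be done programmatically. The following is a minimal sketch, not code from the repository: it assumes the simple `name{labels} value` line shape shown in that output (no trailing timestamps), and the helper names `parse_sample` and `system_metrics` are hypothetical.

```python
import re

# One sample line in Prometheus text format: name{label="value",...} value
# Assumption: no optional trailing timestamp, as in the output quoted above.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One label pair inside the braces; allows backslash-escaped characters.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def system_metrics(text):
    """Return {metric name: value} for every line that starts with 'system_'."""
    values = {}
    for line in text.splitlines():
        if line.startswith("system_"):  # same filter as grep "^system_"
            name, _labels, value = parse_sample(line)
            values[name] = value
    return values


if __name__ == "__main__":
    sample = (
        'system_procs_blocked{kernel_version="5.4.49+",'
        'os_version="cos 85-13310.1041.24"} 0'
    )
    print(parse_sample(sample))
```

Passing the body of the `/metrics` response to `system_metrics` gives a dictionary that a test can assert on, for example to check that `system_procs_running` is present.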
varsha teratipally
eb38b4b598
added a new metric to retrieve os features like unknown modules
2021-01-08 21:52:16 +00:00
Kubernetes Prow Robot
4dccc1ce24
Merge pull request #493 from vteratipally/kernel_cmdline_parameters
...
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d
renaming splitWords to tokens
2020-12-08 18:34:54 +00:00
varsha teratipally
ffc46f977d
add code to retrieve kernel command line parameters
2020-12-07 22:40:22 +00:00
Jeremy Edwards
4adec4bbc6
Introduce Windows build of Node Problem Detector
2020-12-05 23:54:52 +00:00
varsha teratipally
2b50e4af1a
add testcases for cos and ubuntu to retrieve modules
2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6
add code for retrieving kernel modules
2020-11-19 09:49:25 +00:00
Karan Goel
925ea7393c
Collect CPU load averages in a separate metric
2020-11-09 09:41:52 -08:00
Abhilash Pallerlamudi
5342a50874
Add rhel support for osversion
2020-04-15 13:19:56 -07:00
Xuewei Zhang
83b09277f0
Collect more cpu/disk/memory metrics
2020-02-03 15:29:45 -08:00
tongxin21
d5cb44646e
add a unit test for parsing the "/etc/os-release" of CentOS
...
add a newline character at the end
2019-11-01 13:34:22 +08:00
tongxin21
9b9f18a7ed
add a case for ID="centos"
2019-10-28 19:09:15 +08:00
Xuewei Zhang
9e789b5f99
Refactor on metrics so that names for all the views are tracked
2019-09-11 12:07:13 -07:00
Xuewei Zhang
f9b5e60a43
Add e2e test for NPD
...
The first test is a very simple test. It installs NPD on a VM, and then
verifies that NPD reports metric host_uptime in Prometheus format.
2019-08-16 01:33:29 -07:00
Xuewei Zhang
fbebcf311b
Report metrics from system-log-monitor
2019-07-12 14:38:21 -07:00
Xuewei Zhang
4944ac3e48
Implement host collector as part of system-stats-monitor
...
Host collector reports three things today:
1. Host OS uptime (in seconds)
2. Host kernel version (as a metric label)
3. Host OS version (as a metric label)
2019-06-27 16:40:11 -07:00
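The three values listed in the commit above can be gathered on Linux from `/proc/uptime`, the running kernel's release string, and `/etc/os-release`. The sketch below only illustrates that idea and is not the project's implementation: the function names are hypothetical, and joining `ID` with `VERSION_ID` to form the OS version label is an assumption made here, not the format the collector is known to use.

```python
import platform


def parse_uptime(proc_uptime):
    """Return host uptime in whole seconds from the contents of /proc/uptime.

    The first field of /proc/uptime is the uptime in seconds (with decimals).
    """
    return int(float(proc_uptime.split()[0]))


def parse_os_release(text):
    """Parse KEY=value lines from an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def os_version(fields):
    """Build an OS version label. Assumption: 'ID VERSION_ID' is good enough."""
    return f'{fields.get("ID", "unknown")} {fields.get("VERSION_ID", "")}'.strip()


def collect():
    """Read the three host values from a live Linux system."""
    with open("/proc/uptime") as f:
        uptime = parse_uptime(f.read())
    with open("/etc/os-release") as f:
        release = parse_os_release(f.read())
    return {
        "uptime_seconds": uptime,
        "kernel_version": platform.release(),
        "os_version": os_version(release),
    }
```

Keeping the parsers separate from `collect()` means they can be unit-tested with fixture strings for each distribution, without needing a live host.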
Xuewei Zhang
29b0740f4c
Refactor systemstatsmonitor/metric_helper.go into a metrics package
2019-06-27 16:40:05 -07:00
Zhen Wang
1f636381b8
Detect kubelet and container runtime frequent crashes
2018-11-26 22:41:06 -08:00
Andy Xie
89cfb5261d
bump kubernetes to 1.9
2018-07-09 14:59:51 +08:00
Lantao Liu
ee103dd4ac
Generate event for condition change and support unknown status.
2018-06-21 15:29:53 -07:00
Tim Hockin
3468934b7d
Pushes go to staging-k8s.gcr.io
2018-02-01 20:11:55 -08:00
Tim Hockin
547c65ef89
Convert registry to k8s.gcr.io
2017-12-22 09:55:16 -08:00
Andy Xie
10dbfef1a8
add custom problem detector plugin
2017-11-22 10:14:09 +08:00
Random-Liu
20ffe37cea
Add NPD endpoints: /debug/pprof, /healthz, /conditions.
2017-02-03 11:07:06 -08:00
Girish Kalele
b687dfaafc
Containerize the nethealth bandwidth measurement utility
2016-06-07 20:51:30 -07:00
Girish Kalele
33a43545ca
Node network health check utility - performs a quick HTTP GET test
2016-06-03 14:26:12 -07:00
Lantao Liu
f0312655bd
Add first version of node-problem-detector
2016-05-17 15:55:33 -07:00