33 Commits

Author SHA1 Message Date
Jeremy Edwards
d4933875ed Add support for basic system metrics for Windows. 2021-05-10 21:58:38 +00:00
michelletandya
c4e5400ed6 separate linux/windows health checker files. 2021-04-26 21:45:05 +00:00
Jeremy Edwards
4181ece888 Windows Support: Fix Build Regressions, Tests Pass 2021-03-14 10:24:45 -07:00
Karan Goel
8648fe265a add metric for per-cpu, per-stage timing 2021-01-29 08:46:39 -08:00
Karan Goel
2a2bab3d28 Add network interface stats
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Karan Goel
71098097c0 add metrics for process stats
Tested on a COS VM:

```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
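
Note on the process stats commit above: the four `system_*` values in its sample output correspond to counters that Linux exposes in `/proc/stat` (`intr`, `processes`, `procs_running`, `procs_blocked`). The sketch below shows one way such counters could be parsed. It is an illustration only, not the code from this commit: the `procStats` type and `parseProcStat` function are hypothetical names, and the actual collector may read these values through a different library.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// procStats is a hypothetical container for the four counters that
// appear in the sample output of the commit.
type procStats struct {
	Interrupts   uint64 // first field of the "intr" line (total interrupts)
	Processes    uint64 // "processes": forks since boot
	ProcsRunning uint64 // "procs_running": currently runnable processes
	ProcsBlocked uint64 // "procs_blocked": processes blocked on I/O
}

// parseProcStat extracts the counters from text in /proc/stat format.
// Lines with other keys (cpu, ctxt, btime, ...) are ignored.
func parseProcStat(content string) (procStats, error) {
	var s procStats
	targets := map[string]*uint64{
		"intr":          &s.Interrupts,
		"processes":     &s.Processes,
		"procs_running": &s.ProcsRunning,
		"procs_blocked": &s.ProcsBlocked,
	}
	// Note: bufio.Scanner limits lines to 64 KiB by default; a very long
	// "intr" line would need a larger buffer.
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		dst, ok := targets[fields[0]]
		if !ok {
			continue
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return s, fmt.Errorf("parsing %q: %w", fields[0], err)
		}
		*dst = v
	}
	return s, sc.Err()
}

func main() {
	// Made-up sample input; on a real host the content would come from
	// reading /proc/stat.
	sample := "cpu  10 0 5 100 0 0 0 0 0 0\n" +
		"intr 87592360 12 0\n" +
		"processes 692506\n" +
		"procs_running 2\n" +
		"procs_blocked 0\n"
	s, err := parseProcStat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

Keeping the parser separate from the file read makes it straightforward to unit test against captured `/proc/stat` content.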
varsha teratipally
eb38b4b598 added a new metric to retrieve os features like unknown modules 2021-01-08 21:52:16 +00:00
Kubernetes Prow Robot
4dccc1ce24 Merge pull request #493 from vteratipally/kernel_cmdline_parameters
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d renaming splitWords to tokens 2020-12-08 18:34:54 +00:00
varsha teratipally
ffc46f977d add code to retrieve kernel command line parameters 2020-12-07 22:40:22 +00:00
Jeremy Edwards
4adec4bbc6 Introduce Windows build of Node Problem Detector 2020-12-05 23:54:52 +00:00
varsha teratipally
2b50e4af1a add testcases for cos and ubuntu to retrieve modules 2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6 add code for retrieving kernel modules 2020-11-19 09:49:25 +00:00
Karan Goel
925ea7393c Collect CPU load averages in a separate metric 2020-11-09 09:41:52 -08:00
Abhilash Pallerlamudi
5342a50874 Add rhel support for osversion 2020-04-15 13:19:56 -07:00
Xuewei Zhang
83b09277f0 Collect more cpu/disk/memory metrics 2020-02-03 15:29:45 -08:00
tongxin21
d5cb44646e add a unit test for parsing the "/etc/os-release" of CentOS
add a newline character at the end
2019-11-01 13:34:22 +08:00
tongxin21
9b9f18a7ed add a case for ID="centos" 2019-10-28 19:09:15 +08:00
Xuewei Zhang
9e789b5f99 Refactor on metrics so that names for all the views are tracked 2019-09-11 12:07:13 -07:00
Xuewei Zhang
f9b5e60a43 Add e2e test for NPD
The first test is a very simple test. It installs NPD on a VM, and then
verifies that NPD reports metric host_uptime in Prometheus format.
2019-08-16 01:33:29 -07:00
Xuewei Zhang
fbebcf311b Report metrics from system-log-monitor 2019-07-12 14:38:21 -07:00
Xuewei Zhang
4944ac3e48 Implement host collector as part of system-stats-monitor
The host collector reports three things today:
1. Host OS uptime (in seconds)
2. Host kernel version (as a metric label)
3. Host OS version (as a metric label)
2019-06-27 16:40:11 -07:00
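
Note on the host collector commit above: the three reported items can be derived from standard Linux files. The sketch below assumes uptime comes from `/proc/uptime` and the OS version from `/etc/os-release` (the kernel version is commonly available from `/proc/sys/kernel/osrelease`). The function names `parseUptime`, `parseOSRelease`, and `osVersionLabel` are hypothetical, and the label format used here (`ID` plus `VERSION_ID`) is an assumption; the real collector may format labels differently per distribution.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseUptime returns the first field of /proc/uptime-style content,
// which is the host uptime in seconds.
func parseUptime(content string) (float64, error) {
	fields := strings.Fields(content)
	if len(fields) == 0 {
		return 0, fmt.Errorf("empty uptime content")
	}
	return strconv.ParseFloat(fields[0], 64)
}

// parseOSRelease reads KEY=VALUE lines in os-release format and strips
// surrounding quotes from the values.
func parseOSRelease(content string) map[string]string {
	out := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			continue
		}
		out[parts[0]] = strings.Trim(parts[1], `"'`)
	}
	return out
}

// osVersionLabel builds an illustrative label value from ID and
// VERSION_ID. This format is an assumption made for the sketch.
func osVersionLabel(kv map[string]string) string {
	return strings.TrimSpace(kv["ID"] + " " + kv["VERSION_ID"])
}

func main() {
	// Made-up sample inputs; a real collector would read the files.
	up, err := parseUptime("12345.67 54321.00\n")
	if err != nil {
		panic(err)
	}
	kv := parseOSRelease("NAME=\"CentOS Linux\"\nID=\"centos\"\nVERSION_ID=\"7\"\n")
	fmt.Printf("uptime=%.2fs os_version=%q\n", up, osVersionLabel(kv))
}
```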
Xuewei Zhang
29b0740f4c Refactor systemstatsmonitor/metric_helper.go into a metrics package 2019-06-27 16:40:05 -07:00
Zhen Wang
1f636381b8 Detect kubelet and container runtime frequent crashes 2018-11-26 22:41:06 -08:00
Andy Xie
89cfb5261d bump kubernetes to 1.9 2018-07-09 14:59:51 +08:00
Lantao Liu
ee103dd4ac Generate event for condition change and support unknown status. 2018-06-21 15:29:53 -07:00
Tim Hockin
3468934b7d Pushes go to staging-k8s.gcr.io 2018-02-01 20:11:55 -08:00
Tim Hockin
547c65ef89 Convert registry to k8s.gcr.io 2017-12-22 09:55:16 -08:00
Andy Xie
10dbfef1a8 add custom problem detector plugin 2017-11-22 10:14:09 +08:00
Random-Liu
20ffe37cea Add NPD endpoints: /debug/pprof, /healthz, /conditions. 2017-02-03 11:07:06 -08:00
Girish Kalele
b687dfaafc Containerize the nethealth bandwidth measurement utility 2016-06-07 20:51:30 -07:00
Girish Kalele
33a43545ca Node network health check utility - performs a quick HTTP GET test 2016-06-03 14:26:12 -07:00
Lantao Liu
f0312655bd Add first version of node-problem-detector 2016-05-17 15:55:33 -07:00