Karan Goel
2a2bab3d28
Add network interface stats
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Kubernetes Prow Robot
45f70a8b26
Merge pull request #456 from ZYecho/fix_timeout
fix: fix script timeout can't work
2021-01-19 19:01:58 -08:00
Kubernetes Prow Robot
c2d7a7be62
Merge pull request #513 from karan/cpu_activity_metrics
add metrics for process stats
2021-01-19 18:38:07 -08:00
Kubernetes Prow Robot
a8a1d30310
Merge pull request #509 from jeremyje/winrun
Support filelog watching in Windows.
2021-01-19 18:37:59 -08:00
varsha teratipally
2cb1195f18
cleanup the log
2021-01-13 17:54:53 +00:00
Karan Goel
f13d2a5449
don't run os feature collector if metric not initialized
2021-01-13 09:33:13 -08:00
Jeremy Edwards
adc587f222
Support filelog watching in Windows.
2021-01-13 17:16:46 +00:00
Karan Goel
71098097c0
add metrics for process stats
Tested on a COS VM:
```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
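The four `system_*` series in commit 71098097c0 correspond to counters that the kernel exposes in /proc/stat (`intr`, `processes`, `procs_running`, `procs_blocked`). As a rough sketch of where those numbers come from, and not the collector added in that commit, the Go snippet below extracts them from /proc/stat-formatted text. The type `procStats` and the function `parseProcStat` are hypothetical names, and the sample input is illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// procStats mirrors the four system_* series (hypothetical type).
type procStats struct {
	Interrupts   uint64 // first number on the "intr" line: total interrupts serviced
	Processes    uint64 // "processes": number of forks since boot
	ProcsRunning uint64 // "procs_running": processes in runnable state
	ProcsBlocked uint64 // "procs_blocked": processes blocked waiting for I/O
}

// parseProcStat extracts those counters from /proc/stat-formatted text.
func parseProcStat(text string) (procStats, error) {
	var s procStats
	scanner := bufio.NewScanner(strings.NewReader(text))
	// The "intr" line lists one count per IRQ and can be long on large
	// machines, so allow lines longer than the default 64 KiB token limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		var dst *uint64
		switch fields[0] {
		case "intr":
			dst = &s.Interrupts
		case "processes":
			dst = &s.Processes
		case "procs_running":
			dst = &s.ProcsRunning
		case "procs_blocked":
			dst = &s.ProcsBlocked
		default:
			continue // cpu lines, ctxt, btime, softirq, ...
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return s, fmt.Errorf("parsing %q: %w", fields[0], err)
		}
		*dst = v
	}
	return s, scanner.Err()
}

func main() {
	// Illustrative input in /proc/stat layout (IRQ list shortened).
	sample := "cpu  100 0 50 1000 0 0 0 0 0 0\n" +
		"intr 87592360 12 0 7\n" +
		"processes 692506\n" +
		"procs_running 2\n" +
		"procs_blocked 0\n"
	s, err := parseProcStat(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

On a real host the input would come from `os.ReadFile("/proc/stat")`; `intr` and `processes` are cumulative since boot, while `procs_running` and `procs_blocked` are instantaneous values.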
zhangyue
4f68b251ac
fix: fix script timeout can't work
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2021-01-13 20:53:25 +08:00
varsha teratipally
f89f620909
added new line in the known_modules.json
2021-01-08 23:25:02 +00:00
varsha teratipally
eb38b4b598
added a new metric to retrieve os features like unknown modules
2021-01-08 21:52:16 +00:00
Kubernetes Prow Robot
4ad49bbd84
Merge pull request #503 from vteratipally/label_fix
changing the label names as per the standards
2020-12-08 22:04:49 -08:00
Kubernetes Prow Robot
4dccc1ce24
Merge pull request #493 from vteratipally/kernel_cmdline_parameters
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d
renaming splitWords to tokens
2020-12-08 18:34:54 +00:00
varsha teratipally
047958a49c
changing the label names as per the standards
2020-12-08 02:27:22 +00:00
varsha teratipally
ffc46f977d
add code to retrieve kernel command line parameters
2020-12-07 22:40:22 +00:00
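Commit ffc46f977d reads the kernel command line. A minimal sketch of splitting /proc/cmdline content, assuming whitespace-separated tokens of the form `key` or `key=value`, is below. `parseCmdline` and `cmdlineParam` are hypothetical names, quoted values containing spaces are not handled, and `strings.Cut` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// cmdlineParam is one kernel command-line token (hypothetical type).
type cmdlineParam struct {
	Key   string
	Value string // empty for bare flags such as "ro" or "quiet"
}

// parseCmdline splits /proc/cmdline content on whitespace, then splits each
// token on its first '=' only, so values like "root=PARTUUID=abc" or
// "console=ttyS0,115200" stay intact.
// Quoted values that contain spaces are NOT handled by this sketch.
func parseCmdline(cmdline string) []cmdlineParam {
	var params []cmdlineParam
	for _, token := range strings.Fields(cmdline) {
		key, value, _ := strings.Cut(token, "=")
		params = append(params, cmdlineParam{Key: key, Value: value})
	}
	return params
}

func main() {
	// Illustrative command line; a real caller would read /proc/cmdline.
	for _, p := range parseCmdline("BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro console=ttyS0,115200\n") {
		fmt.Printf("%s=%q\n", p.Key, p.Value)
	}
}
```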
Jeremy Edwards
4adec4bbc6
Introduce Windows build of Node Problem Detector
2020-12-05 23:54:52 +00:00
Kubernetes Prow Robot
bf51d6600e
Merge pull request #492 from vteratipally/module_stats_branch
add code to retrieve kernel modules in a linux system from /proc/modules
2020-12-03 09:51:00 -08:00
Kubernetes Prow Robot
1e917af560
Merge pull request #455 from ZYecho/fix_newmessage
fix: print result's message when status unknown
2020-11-24 16:14:39 -08:00
varsha teratipally
2b50e4af1a
add testcases for cos and ubuntu to retrieve modules
2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6
add code for retrieving kernel modules
2020-11-19 09:49:25 +00:00
Kubernetes Prow Robot
112d53b10a
Merge pull request #497 from vteratipally/fs_types
avoid duplicating the disk bytes used metrics based on fstype and mount types
2020-11-18 10:48:07 -08:00
zhangyue
b51cb3219f
fix: print result's message when status unknown
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2020-11-18 19:30:17 +08:00
Kubernetes Prow Robot
d8ea2538de
Merge pull request #489 from abansal4032/health-check-kubelet-connection
Kubelet api server connection check in health checker
2020-11-16 14:06:42 -08:00
Kubernetes Prow Robot
33571a312d
Merge pull request #478 from neoseele/master
fix: node memory metrics are off by 1024
2020-11-16 14:06:12 -08:00
varsha teratipally
1550882948
avoid duplicating the disk bytes used metrics based on fstype and mountopts
2020-11-16 20:10:46 +00:00
Archit Bansal
2513756583
Add kubelet apiserver connection fail check in health checker
2020-11-09 12:47:16 -08:00
Karan Goel
925ea7393c
Collect CPU load averages in a separate metric
2020-11-09 09:41:52 -08:00
Neil
589411702a
fix: node memory metrics are off by 1024
The memory unit in /proc/meminfo is kB (b/171164235)
```
MemTotal: 264129908 kB
MemFree: 153559480 kB
...
```
2020-10-19 17:26:31 +11:00
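To make the factor of 1024 in commit 589411702a concrete: the kernel's `kB` in /proc/meminfo means 1024 bytes, so each value must be multiplied by 1024 before it is reported in bytes. The Go sketch below applies that conversion to the two lines quoted in the commit message. `parseMeminfo` is a hypothetical helper, not the code from that commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMeminfo converts /proc/meminfo-style lines into byte counts keyed by
// field name. Values with a "kB" suffix are multiplied by 1024, because the
// kernel's "kB" here means KiB. Unit-less fields (e.g. HugePages_Total) are
// kept as plain counts.
func parseMeminfo(text string) (map[string]uint64, error) {
	result := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 2 {
			continue
		}
		key := strings.TrimSuffix(fields[0], ":")
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %s: %w", key, err)
		}
		if len(fields) == 3 && fields[2] == "kB" {
			value *= 1024 // kB -> bytes
		}
		result[key] = value
	}
	return result, scanner.Err()
}

func main() {
	sample := "MemTotal: 264129908 kB\nMemFree: 153559480 kB\n"
	mem, err := parseMeminfo(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(mem["MemTotal"], mem["MemFree"]) // 270469025792 157244907520
}
```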
Archit Bansal
8c94d5e60c
Add logging levels to custom plugin logs.
2020-08-28 12:51:50 -07:00
Archit Bansal
3a9370e01b
Log custom plugin stderr only if the status is not ok.
Otherwise, with plugins that run frequently and report an ok status, the
logs fill up with unnecessary noise and the log size increases
significantly.
2020-08-27 10:17:05 -07:00
varsha teratipally
50127b0512
changed labelname after code review
2020-08-06 00:43:45 +00:00
varsha teratipally
4c40b7e468
updated readme
2020-08-05 21:43:58 +00:00
varsha teratipally
e13210157d
Add more info to disk metrics
2020-08-05 21:12:25 +00:00
Frame
9678892546
Fix typo in custom-plugin-monitor
2020-08-03 17:08:42 +08:00
Kubernetes Prow Robot
f3ab10eddb
Merge pull request #442 from abansal4032/custom-plugin-logs-capture
Capture the logs from stderr of custom plugins
2020-07-29 14:18:03 -07:00
Archit Bansal
6acf5b1edb
Capture the logs from stderr of custom plugins.
2020-07-29 11:57:05 -07:00
Kubernetes Prow Robot
c3cf941e98
Merge pull request #441 from abansal4032/custom-plugin-log-fix
Generate new status log only on condition change
2020-07-28 09:45:48 -07:00
Archit Bansal
f80f3e0dfa
Generate status generation logs from custom plugin run only on condition change.
2020-07-24 09:39:39 -07:00
Archit Bansal
f56d0a929d
Use InactiveExitTimestamp instead of ActiveEnterTimestamp for cooldown period in health check monitor.
2020-07-16 18:53:47 -07:00
Archit Bansal
44dc4aa6c1
Add health-check-monitor
2020-05-27 14:08:42 -07:00
Abhilash Pallerlamudi
5342a50874
Add rhel support for osversion
2020-04-15 13:19:56 -07:00
Andrew DeMaria
7fd465e195
Add namespace option for events
2020-03-05 19:04:31 -07:00
Xuewei Zhang
83b09277f0
Collect more cpu/disk/memory metrics
2020-02-03 15:29:45 -08:00
Xuewei Zhang
fa7a3d7df1
Fix disk metrics unit and queue_length calculation
2020-01-02 17:19:38 -08:00
Kubernetes Prow Robot
0d0bba94e5
Merge pull request #402 from gmemcc/master
Ignore first collected disk stats to prevent metric distortion
2019-12-18 11:57:57 -08:00
Alex Wong
5a4ac81186
Only disk_avg_queue_len is distorted on first collection
2019-12-12 14:39:29 +08:00
Alex Wong
3d10c892a2
Ignore first collected disk stats to prevent metric distortion
2019-12-11 11:14:01 +08:00
yuzhiquan
9c24be2da4
cleanup: using time.Since(t) instead of t.Sub(time.Now())
2019-12-05 18:57:53 +08:00
yuzhiquan
b458f0d028
fix: modify typo
2019-12-03 15:21:57 +08:00