Kubernetes Prow Robot
9c541692ee
Merge pull request #557 from vteratipally/adfad
...
Make sure the path to known-modules.json is relative
2021-05-14 14:39:59 -07:00
Varsha Teratipally
a79b87ce7e
Make sure the path to known-modules.json is relative to the system-stats-monitor.json file
2021-05-14 21:14:55 +00:00
michelletandya
01fa5b3afd
Add windows defender problem detection custom plugin
2021-05-12 20:28:33 +00:00
Jeremy Edwards
d4933875ed
Add support for basic system metrics for Windows.
2021-05-10 21:58:38 +00:00
michelletandya
01cd8dd08c
Add healthChecker functionality for kube-proxy service
2021-05-05 17:27:58 +00:00
michelletandya
da15eb9afe
Detect containerd errors and failures.
2021-04-29 23:47:04 +00:00
michelletandya
c4e5400ed6
separate linux/windows health checker files.
2021-04-26 21:45:05 +00:00
michelletandya
344daabaa7
Update windows containerd config file to run without errors
2021-03-30 23:26:06 +00:00
Jeremy Edwards
4181ece888
Windows Support: Fix Build Regressions, Tests Pass
2021-03-14 10:24:45 -07:00
Kubernetes Prow Robot
06b5503348
Merge pull request #530 from goushicui/master
...
add memory read error
2021-02-18 07:46:51 -08:00
goushicui
7ecb76f31a
add memory read error
2021-02-09 14:08:18 +08:00
Karan Goel
8648fe265a
add metric for per-cpu, per-stage timing
2021-01-29 08:46:39 -08:00
Kubernetes Prow Robot
e34e2763cf
Merge pull request #519 from Random-Liu/fix-indention
...
Fix system-stats-monitor config indention.
2021-01-28 23:47:41 -08:00
Lantao Liu
144fad7706
Fix system-stats-monitor config indention.
2021-01-28 22:59:47 -08:00
Lantao Liu
c2ad21a380
Add containerd health checker config.
2021-01-28 22:46:55 -08:00
Karan Goel
2a2bab3d28
Add network interface stats
...
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Kubernetes Prow Robot
c2d7a7be62
Merge pull request #513 from karan/cpu_activity_metrics
...
add metrics for process stats
2021-01-19 18:38:07 -08:00
Jeremy Edwards
adc587f222
Support filelog watching in Windows.
2021-01-13 17:16:46 +00:00
Karan Goel
71098097c0
add metrics for process stats
...
Tested on a COS VM:
```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
varsha teratipally
f89f620909
added new line in the known_modules.json
2021-01-08 23:25:02 +00:00
varsha teratipally
eb38b4b598
added a new metric to retrieve os features like unknown modules
2021-01-08 21:52:16 +00:00
Kubernetes Prow Robot
59536256e3
Merge pull request #475 from vteratipally/boot_size_disk
...
catching hung task with pattern like "tasks airflow scheduler: *"
2020-11-18 14:42:50 -08:00
vteratipally
0c258bb704
Update kernel-monitor.json
2020-11-17 13:38:07 -08:00
Kubernetes Prow Robot
cff4a54d6a
Merge pull request #488 from vteratipally/io_errors
...
Add detection logic for I/O errors
2020-11-16 14:06:36 -08:00
Kubernetes Prow Robot
2d53c0a2a6
Merge pull request #481 from tosi3k/oom-regex-fix
...
Adapt OOMKilling pattern to old and new Linux kernels
2020-11-16 14:06:20 -08:00
Karan Goel
925ea7393c
Collect CPU load averages in a separate metric
2020-11-09 09:41:52 -08:00
varsha teratipally
f01b5e5cfe
Detect I/O errors
2020-11-06 03:48:33 +00:00
Antoni Zawodny
6b650e785e
Adapt OOMKilling pattern to old and new Linux kernels
2020-10-22 15:12:26 +02:00
varsha teratipally
f984abbe2e
catching hung task with pattern like tasks airflow scheduler: some of the events related to hung tasks were not identified
2020-10-08 23:04:15 +00:00
vteratipally
edfd70a16c
Update docker-monitor.json
...
fixed json format error as it doesn't allow trailing commas
2020-08-11 10:02:17 -07:00
vteratipally
fbdd9eec9a
Update docker-monitor.json
...
marking DockerContainerStartup failure as temporary
2020-08-11 09:59:46 -07:00
varsha teratipally
4ce29a95d5
removed the $ symbol as npd handles end of the line
2020-08-06 01:30:11 +00:00
varsha teratipally
95237efb4d
Detect docker startup failures
2020-08-05 21:29:11 +00:00
Archit Bansal
84188cc0aa
Set auto-repair=true by default for health check monitors.
2020-07-15 18:57:53 -07:00
Archit Bansal
44dc4aa6c1
Add health-check-monitor
2020-05-27 14:08:42 -07:00
Xuewei Zhang
83b09277f0
Collect more cpu/disk/memory metrics
2020-02-03 15:29:45 -08:00
Xuewei Zhang
b3f811d171
Add detection for ext4 errors
2019-12-06 14:49:17 -08:00
Kubernetes Prow Robot
3a41fc2fc3
Merge pull request #392 from arekkusu/origin/patch-2
...
Improve systemctl check, style + cleanup
2019-11-29 01:33:03 -08:00
Alexandre
4df720c2a0
Improve systemctl check, style + cleanup
...
- Use `systemctl is-active` to check if service is running
- Cleaner than `grep` on `systemctl status` output
- A successful return means the service is running/active
- A failed return means the service is not running, which could be due to
a stopped/failed service or a service that does not exist
- Use `command -v` instead of `which`
Ref: https://github.com/koalaman/shellcheck/wiki/SC2230
- Follow Google "Shell Style Guide": indent, use "readonly"
- Minor: Rephrase comment, avoid all caps
2019-11-29 14:14:19 +09:00
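
The approach described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the script from the repository: the function name `service_is_active` and the exit codes (0 for OK, 1 for not OK, 2 for unknown) are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; names and exit codes are illustrative assumptions.

# Exit codes assumed for this example.
readonly OK=0
readonly NONOK=1
readonly UNKNOWN=2

# Report whether the given systemd service is active.
service_is_active() {
  local service="$1"

  # `command -v` is a shell builtin, so no external `which` is needed.
  if ! command -v systemctl >/dev/null 2>&1; then
    echo "systemctl is not available"
    return "${UNKNOWN}"
  fi

  # `is-active` exits with 0 only when the unit is active, so its exit
  # status can be used directly instead of grepping `systemctl status`.
  if systemctl -q is-active "${service}"; then
    echo "${service} is running"
    return "${OK}"
  fi

  # A stopped unit, a failed unit, and a missing unit all end up here.
  echo "${service} is not running"
  return "${NONOK}"
}
```

A caller could run `service_is_active some.service` (a placeholder unit name) and use the return status as the script's exit code.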
Alexandre
a91b568149
Support "nf_conntrack", check 90% full, style
...
- Script was checking for "ip_conntrack_..." which was replaced by "nf_conntrack_..." on newer systems. Now supports both.
- Return failure ("not ok") when table is more than 90% full.
- Not sure what value is best here but I think that is better than when the table is full.
Otherwise we might end up with a value close to the max or bouncing around.
- Replaced cat by "$(< file )" to avoid calling external command
- Follow Google "Shell Style Guide": 2 space indent, use preferred "[[ test ]]", add "readonly"
- Include current connection usage in output message
2019-11-29 13:20:37 +09:00
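
The check described in the commit above could look something like the sketch below. It is an illustration under stated assumptions, not the repository's script: the function name `check_conntrack`, the directory argument, the `_count`/`_max` file suffixes, and the exit codes (0 for OK, 1 for not OK, 2 for unknown) are all hypothetical choices for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; names, file layout, and exit codes are assumptions.

readonly THRESHOLD_PERCENT=90

# Check conntrack table usage using counter files found in directory $1.
# Both the newer "nf_conntrack" and older "ip_conntrack" prefixes are tried.
check_conntrack() {
  local dir="$1" prefix count max

  for prefix in nf_conntrack ip_conntrack; do
    if [[ -r "${dir}/${prefix}_count" && -r "${dir}/${prefix}_max" ]]; then
      # "$(< file)" reads the file without spawning `cat`.
      count="$(< "${dir}/${prefix}_count")"
      max="$(< "${dir}/${prefix}_max")"

      # Integer-only form of: count / max > 90%.
      if (( count * 100 > max * THRESHOLD_PERCENT )); then
        echo "conntrack table more than ${THRESHOLD_PERCENT}% full: ${count} of ${max}"
        return 1
      fi
      echo "conntrack table usage is fine: ${count} of ${max}"
      return 0
    fi
  done

  echo "no conntrack counters found under ${dir}"
  return 2
}
```

Failing at 90% rather than at 100% follows the reasoning in the commit message: a table that hovers near its maximum would otherwise never trip the check. The comparison multiplies instead of dividing so that shell integer arithmetic does not truncate the ratio.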
Kubernetes Prow Robot
5345185ec2
Merge pull request #341 from iranzo/patch-1
...
Update network_problem.sh
2019-09-15 01:00:37 -07:00
Xuewei Zhang
0f0e5eff0f
Adding stackdriver exporter
2019-09-12 18:30:00 -07:00
Pablo Iranzo Gómez
fa94b42849
Use bashate recommendations on network_problem script
2019-09-05 15:46:45 +02:00
Xuewei Zhang
f9b5e60a43
Add e2e test for NPD
...
The first test is a very simple test. It installs NPD on a VM, and then
verifies that NPD reports metric host_uptime in Prometheus format.
2019-08-16 01:33:29 -07:00
Zhen Wang
a8527712f6
Update the detection method for docker overlay2 issue
2019-08-01 22:16:44 -07:00
Zhen Wang
570ae0cb20
Make systemd monitor look back for 5m
2019-07-30 11:17:02 -07:00
Xuewei Zhang
94af7de97b
Report metrics from custom-plugin-monitor
2019-07-25 11:28:38 -07:00
Xuewei Zhang
fbebcf311b
Report metrics from system-log-monitor
2019-07-12 14:38:21 -07:00
Xuewei Zhang
4944ac3e48
Implement host collector as part of system-stats-monitor
...
The host collector reports three things today:
1. Host OS uptime (in seconds)
2. Host kernel version (as a metric label)
3. Host OS version (as a metric label)
2019-06-27 16:40:11 -07:00
Zhen Wang
b94a555dfc
Add systemd monitor for kubelet, docker, and containerd restart events
2019-06-18 10:26:53 -07:00