559 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
1a7aa6505d Merge pull request #512 from karan/dev_net_metrics
Add network interface stats
v0.8.6
2021-01-20 14:03:00 -08:00
Karan Goel
2a2bab3d28 Add network interface stats
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Kubernetes Prow Robot
45f70a8b26 Merge pull request #456 from ZYecho/fix_timeout
fix: script timeout does not work
2021-01-19 19:01:58 -08:00
Kubernetes Prow Robot
c2d7a7be62 Merge pull request #513 from karan/cpu_activity_metrics
add metrics for process stats
2021-01-19 18:38:07 -08:00
Kubernetes Prow Robot
a8a1d30310 Merge pull request #509 from jeremyje/winrun
Support filelog watching in Windows.
2021-01-19 18:37:59 -08:00
Kubernetes Prow Robot
19fefd773f Merge pull request #515 from vteratipally/master
cleanup the log
2021-01-15 13:31:43 -08:00
varsha teratipally
2cb1195f18 cleanup the log 2021-01-13 17:54:53 +00:00
Karan Goel
f13d2a5449 don't run os feature collector if metric not initialized 2021-01-13 09:33:13 -08:00
Jeremy Edwards
adc587f222 Support filelog watching in Windows. 2021-01-13 17:16:46 +00:00
Karan Goel
71098097c0 add metrics for process stats
Tested on a COS VM:

```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
zhangyue
4f68b251ac fix: script timeout does not work
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2021-01-13 20:53:25 +08:00
Kubernetes Prow Robot
b951f24297 Merge pull request #504 from jeremyje/xupgrade
Upgrade golang.org/x/sys to prepare for Windows Service.
2021-01-12 20:02:35 -08:00
Kubernetes Prow Robot
d6d20e49fa Merge pull request #505 from vteratipally/retrieve_os_features
added a new metric to retrieve os features like unknown modules, KTD
2021-01-12 19:36:43 -08:00
Kubernetes Prow Robot
989a15bf3a Merge pull request #501 from jeremyje/multiarch
Remove Dockerfile.in rewrite hack and use updated arg in Dockerfile
2021-01-12 19:36:35 -08:00
varsha teratipally
f89f620909 added new line in the known_modules.json 2021-01-08 23:25:02 +00:00
Kubernetes Prow Robot
f564d9092a Merge pull request #510 from jeremyje/nopanic
Use Fatal instead of panic for go tests.
2021-01-08 14:43:05 -08:00
Kubernetes Prow Robot
8c16b56476 Merge pull request #511 from ForestCold/master
Update list of supported problem daemons
2021-01-08 14:19:06 -08:00
varsha teratipally
eb38b4b598 added a new metric to retrieve os features like unknown modules 2021-01-08 21:52:16 +00:00
Magic Yami
041b77bd32 Merge pull request #1 from ForestCold/Update-supported-problem-deamon-list
Update supported problem daemon list
2021-01-06 14:57:38 -08:00
Magic Yami
a210b30d36 Update supported problem daemon list
While reading through the problem daemon list, I found the original description confusing because it listed a problem daemon config (kernel monitor) and problem daemon types (custom plugin monitor) together. This change rewords the description to make it clearer. I could not find a way to categorize docker-monitor and would appreciate it if a reviewer could point that out.
2021-01-06 14:57:05 -08:00
Jeremy Edwards
a451a892ae Use Fatal instead of panic for go tests. 2020-12-22 03:01:51 +00:00
Jeremy Edwards
1da1f28cef Upgrade golang.org/x/sys to prepare for Windows Service. 2020-12-13 06:39:59 +00:00
Kubernetes Prow Robot
4ad49bbd84 Merge pull request #503 from vteratipally/label_fix
changing the label names as per the standards
2020-12-08 22:04:49 -08:00
Kubernetes Prow Robot
4dccc1ce24 Merge pull request #493 from vteratipally/kernel_cmdline_parameters
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d renaming splitWords to tokens 2020-12-08 18:34:54 +00:00
Jeremy Edwards
aadb16b3d4 Remove Dockerfile.in rewrite hack and use updated arg in Dockerfile 2020-12-08 06:31:29 +00:00
Kubernetes Prow Robot
8f2a94fd7e Merge pull request #502 from jeremyje/windows
Introduce Windows build of Node Problem Detector
2020-12-07 22:21:11 -08:00
varsha teratipally
047958a49c changing the label names as per the standards 2020-12-08 02:27:22 +00:00
varsha teratipally
ffc46f977d add code to retrieve kernel command line parameters 2020-12-07 22:40:22 +00:00
Jeremy Edwards
4adec4bbc6 Introduce Windows build of Node Problem Detector 2020-12-05 23:54:52 +00:00
Kubernetes Prow Robot
bf51d6600e Merge pull request #492 from vteratipally/module_stats_branch
add code to retrieve kernel modules in a linux system from /proc/modules
2020-12-03 09:51:00 -08:00
Kubernetes Prow Robot
1e917af560 Merge pull request #455 from ZYecho/fix_newmessage
fix: print result's message when status unknown
2020-11-24 16:14:39 -08:00
Kubernetes Prow Robot
6956e6074d Merge pull request #500 from Random-Liu/fix-staging-bucket
Change default staging bucket.
2020-11-20 09:44:51 -08:00
Lantao Liu
ed783da499 Change default staging bucket.
The new staging bucket for the promoter is `gcr.io/k8s-staging-npd`.
2020-11-20 09:08:35 -08:00
varsha teratipally
2b50e4af1a add testcases for cos and ubuntu to retrieve modules 2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6 add code for retrieving kernel modules 2020-11-19 09:49:25 +00:00
Kubernetes Prow Robot
59536256e3 Merge pull request #475 from vteratipally/boot_size_disk
catching hung task with pattern like "tasks airflow scheduler: *"
v0.8.5
2020-11-18 14:42:50 -08:00
Kubernetes Prow Robot
112d53b10a Merge pull request #497 from vteratipally/fs_types
avoid duplicating the disk bytes used metrics based on fstype and mount types
2020-11-18 10:48:07 -08:00
zhangyue
b51cb3219f fix: print result's message when status unknown
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2020-11-18 19:30:17 +08:00
vteratipally
0c258bb704 Update kernel-monitor.json 2020-11-17 13:38:07 -08:00
Kubernetes Prow Robot
438d014389 Merge pull request #425 from jsoref/grammar
Grammar
2020-11-16 21:38:04 -08:00
Kubernetes Prow Robot
3abcfb7063 Merge pull request #490 from karan/vendor
Bump some major dependencies to latest versions
2020-11-16 14:06:50 -08:00
Kubernetes Prow Robot
d8ea2538de Merge pull request #489 from abansal4032/health-check-kubelet-connection
Kubelet api server connection check in health checker
2020-11-16 14:06:42 -08:00
Kubernetes Prow Robot
cff4a54d6a Merge pull request #488 from vteratipally/io_errors
Add detection logic for I/O errors
2020-11-16 14:06:36 -08:00
Kubernetes Prow Robot
5919888571 Merge pull request #485 from karan/helm-readme
fix helm instructions
2020-11-16 14:06:28 -08:00
Kubernetes Prow Robot
2d53c0a2a6 Merge pull request #481 from tosi3k/oom-regex-fix
Adapt OOMKilling pattern to old and new Linux kernels
2020-11-16 14:06:20 -08:00
Kubernetes Prow Robot
33571a312d Merge pull request #478 from neoseele/master
fix: node memory metrics are off by 1024
2020-11-16 14:06:12 -08:00
Kubernetes Prow Robot
06e5a875be Merge pull request #430 from wawa0210/linux-only
avoid scheduling the NPD pod on Windows nodes
2020-11-16 14:06:04 -08:00
varsha teratipally
1550882948 avoid duplicating the disk bytes used metrics based on fstype and mountopts 2020-11-16 20:10:46 +00:00
Kubernetes Prow Robot
35bfe697a5 Merge pull request #484 from karan/trial-metric
Collect CPU load averages in a separate metric
2020-11-12 12:00:28 -08:00