Kubernetes Prow Robot
e842171ba0
Merge pull request #527 from stmcginnis/image-path
...
Update image path in deployment yaml
2021-02-02 16:52:28 -08:00
Kubernetes Prow Robot
49f592d67d
Merge pull request #526 from stmcginnis/changelog
...
Update CHANGELOG for past releases
2021-02-02 16:26:28 -08:00
Sean McGinnis
f604a5ae7d
Update image path in deployment yaml
...
The image location for node-problem-detector has moved under a
subdirectory now. The deployment config wasn't updated, so those using
the provided node-problem-detector.yaml file directly would end up with
ErrImagePull errors.
This updates the yaml to point to the new location and the latest
release.
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 16:15:27 -06:00
Sean McGinnis
21d5ec6761
Update CHANGELOG for past releases
...
This adds release information into the CHANGELOG for all past releases.
Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 16:02:57 -06:00
Kubernetes Prow Robot
422c088d62
Merge pull request #516 from karan/system_time
...
add metric for per-cpu, per-stage timing
2021-02-01 18:54:28 -08:00
Kubernetes Prow Robot
312f96a5a4
Merge pull request #521 from ZYecho/fix-check
...
fix check for timeout
2021-02-01 15:40:27 -08:00
zhangyue
98ba606d4f
fix check for timeout
...
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2021-01-30 21:35:00 +08:00
Karan Goel
8648fe265a
add metric for per-cpu, per-stage timing
2021-01-29 08:46:39 -08:00
Kubernetes Prow Robot
e34e2763cf
Merge pull request #519 from Random-Liu/fix-indention
...
Fix system-stats-monitor config indentation.
2021-01-28 23:47:41 -08:00
Kubernetes Prow Robot
7d87c16e03
Merge pull request #518 from Random-Liu/add-containerd-health-checker
...
Add containerd health checker config.
2021-01-28 23:11:41 -08:00
Lantao Liu
144fad7706
Fix system-stats-monitor config indentation.
2021-01-28 22:59:47 -08:00
Lantao Liu
c2ad21a380
Add containerd health checker config.
2021-01-28 22:46:55 -08:00
Kubernetes Prow Robot
1a7aa6505d
Merge pull request #512 from karan/dev_net_metrics
...
Add network interface stats
v0.8.6
2021-01-20 14:03:00 -08:00
Karan Goel
2a2bab3d28
Add network interface stats
...
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Kubernetes Prow Robot
45f70a8b26
Merge pull request #456 from ZYecho/fix_timeout
...
fix: fix script timeout can't work
2021-01-19 19:01:58 -08:00
Kubernetes Prow Robot
c2d7a7be62
Merge pull request #513 from karan/cpu_activity_metrics
...
add metrics for process stats
2021-01-19 18:38:07 -08:00
Kubernetes Prow Robot
a8a1d30310
Merge pull request #509 from jeremyje/winrun
...
Support filelog watching in Windows.
2021-01-19 18:37:59 -08:00
Kubernetes Prow Robot
19fefd773f
Merge pull request #515 from vteratipally/master
...
cleanup the log
2021-01-15 13:31:43 -08:00
varsha teratipally
2cb1195f18
cleanup the log
2021-01-13 17:54:53 +00:00
Karan Goel
f13d2a5449
don't run os feature collector if metric not initialized
2021-01-13 09:33:13 -08:00
Jeremy Edwards
adc587f222
Support filelog watching in Windows.
2021-01-13 17:16:46 +00:00
Karan Goel
71098097c0
add metrics for process stats
...
Tested on a COS VM:
```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
zhangyue
4f68b251ac
fix: fix script timeout can't work
...
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2021-01-13 20:53:25 +08:00
Kubernetes Prow Robot
b951f24297
Merge pull request #504 from jeremyje/xupgrade
...
Upgrade golang.org/x/sys to prepare for Windows Service.
2021-01-12 20:02:35 -08:00
Kubernetes Prow Robot
d6d20e49fa
Merge pull request #505 from vteratipally/retrieve_os_features
...
added a new metric to retrieve os features like unknown modules, KTD
2021-01-12 19:36:43 -08:00
Kubernetes Prow Robot
989a15bf3a
Merge pull request #501 from jeremyje/multiarch
...
Remove Dockerfile.in rewrite hack and use updated arg in Dockerfile
2021-01-12 19:36:35 -08:00
varsha teratipally
f89f620909
added new line in the known_modules.json
2021-01-08 23:25:02 +00:00
Kubernetes Prow Robot
f564d9092a
Merge pull request #510 from jeremyje/nopanic
...
Use Fatal instead of panic for go tests.
2021-01-08 14:43:05 -08:00
Kubernetes Prow Robot
8c16b56476
Merge pull request #511 from ForestCold/master
...
Update list of supported problem daemons
2021-01-08 14:19:06 -08:00
varsha teratipally
eb38b4b598
added a new metric to retrieve os features like unknown modules
2021-01-08 21:52:16 +00:00
Magic Yami
041b77bd32
Merge pull request #1 from ForestCold/Update-supported-problem-deamon-list
...
Update supported problem daemon list
2021-01-06 14:57:38 -08:00
Magic Yami
a210b30d36
Update supported problem daemon list
...
When I read through the problem daemon list, the original description confused me because it listed problem daemon configs (kernel monitor) and problem daemon types (custom plugin monitor) together. This changes the description to make it clearer. However, I couldn't find a clue for how to categorize docker-monitor, and would appreciate it if a reviewer could point that out.
2021-01-06 14:57:05 -08:00
Jeremy Edwards
a451a892ae
Use Fatal instead of panic for go tests.
2020-12-22 03:01:51 +00:00
Jeremy Edwards
1da1f28cef
Upgrade golang.org/x/sys to prepare for Windows Service.
2020-12-13 06:39:59 +00:00
Kubernetes Prow Robot
4ad49bbd84
Merge pull request #503 from vteratipally/label_fix
...
changing the label names as per the standards
2020-12-08 22:04:49 -08:00
Kubernetes Prow Robot
4dccc1ce24
Merge pull request #493 from vteratipally/kernel_cmdline_parameters
...
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d
renaming splitWords to tokens
2020-12-08 18:34:54 +00:00
Jeremy Edwards
aadb16b3d4
Remove Dockerfile.in rewrite hack and use updated arg in Dockerfile
2020-12-08 06:31:29 +00:00
Kubernetes Prow Robot
8f2a94fd7e
Merge pull request #502 from jeremyje/windows
...
Introduce Windows build of Node Problem Detector
2020-12-07 22:21:11 -08:00
varsha teratipally
047958a49c
changing the label names as per the standards
2020-12-08 02:27:22 +00:00
varsha teratipally
ffc46f977d
add code to retrieve kernel command line parameters
2020-12-07 22:40:22 +00:00
Jeremy Edwards
4adec4bbc6
Introduce Windows build of Node Problem Detector
2020-12-05 23:54:52 +00:00
Kubernetes Prow Robot
bf51d6600e
Merge pull request #492 from vteratipally/module_stats_branch
...
add code to retrieve kernel modules in a linux system from /proc/modules
2020-12-03 09:51:00 -08:00
Kubernetes Prow Robot
1e917af560
Merge pull request #455 from ZYecho/fix_newmessage
...
fix: print result's message when status unknown
2020-11-24 16:14:39 -08:00
Kubernetes Prow Robot
6956e6074d
Merge pull request #500 from Random-Liu/fix-staging-bucket
...
Change default staging bucket.
2020-11-20 09:44:51 -08:00
Lantao Liu
ed783da499
Change default staging bucket.
...
The new staging bucket for the promoter is `gcr.io/k8s-staging-npd`.
2020-11-20 09:08:35 -08:00
varsha teratipally
2b50e4af1a
add testcases for cos and ubuntu to retrieve modules
2020-11-19 10:29:12 +00:00
varsha teratipally
944efce3a6
add code for retrieving kernel modules
2020-11-19 09:49:25 +00:00
Kubernetes Prow Robot
59536256e3
Merge pull request #475 from vteratipally/boot_size_disk
...
catching hung task with pattern like "tasks airflow scheduler: *"
v0.8.5
2020-11-18 14:42:50 -08:00
Kubernetes Prow Robot
112d53b10a
Merge pull request #497 from vteratipally/fs_types
...
avoid duplicating the disk bytes used metrics based on fstype and mount types
2020-11-18 10:48:07 -08:00