584 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
06b5503348 Merge pull request #530 from goushicui/master
add memory read error
v0.8.7
2021-02-18 07:46:51 -08:00
Kubernetes Prow Robot
adf4c720b2 Merge pull request #534 from jeremyje/docs
Add instructions for developing NPD on Windows
2021-02-17 20:18:51 -08:00
Kubernetes Prow Robot
ee5f2d1aa5 Merge pull request #536 from abansal4032/configure-log-pattern
Make log pattern check configurable in health checker
2021-02-17 18:32:51 -08:00
Archit Bansal
100f2bf8e6 Make log pattern check configurable in health checker
2021-02-17 17:46:18 -08:00
Jeremy Edwards
efe02543c0 Add instructions for developing NPD on Windows
Signed-off-by: Jeremy Edwards <1312331+jeremyje@users.noreply.github.com>
2021-02-10 10:02:35 -08:00
goushicui
7ecb76f31a add memory read error
2021-02-09 14:08:18 +08:00
Kubernetes Prow Robot
fc4f167caa Merge pull request #529 from stmcginnis/macos
Allow building on macOS
2021-02-08 18:44:40 -08:00
Kubernetes Prow Robot
f3968f11ab Merge pull request #522 from karan/del_labels
Remove os_version and kernel_version labels
2021-02-03 10:10:28 -08:00
Kubernetes Prow Robot
5c1cabf237 Merge pull request #528 from stmcginnis/release-docs
Add notes on steps to perform when doing a release
2021-02-02 19:48:28 -08:00
Sean McGinnis
487915e9e4 Use GOHOSTOS to switch journald plugin build
This plugin requires libsystemd to compile, which is only available on
Linux. This uses `go env` to determine if the current build platform can
support this or not, and if not, disables the building of the plugin to
allow compilation on Windows and macOS platforms.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 20:25:17 -06:00
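
The host-OS switch described in commit 487915e9e4 can be illustrated with a minimal Go sketch. This is not the project's build logic, which the commit says relies on `go env`; the function names and the `journald` tag name below are hypothetical and chosen only for illustration. The sketch assumes the host OS string has already been obtained (for example from `go env GOHOSTOS`) and shows the decision itself: include the plugin only on Linux.

```go
package main

import (
	"fmt"
	"strings"
)

// journaldSupported reports whether the journald plugin can be compiled
// on the given host OS. Assumption (from the commit message): libsystemd
// is only available on Linux.
func journaldSupported(hostOS string) bool {
	return hostOS == "linux"
}

// buildTags returns the space-separated build tags for a host OS.
// The "journald" tag name is hypothetical, used here for illustration.
func buildTags(hostOS string, base []string) string {
	// Copy base so the caller's slice is never modified.
	tags := append([]string{}, base...)
	if journaldSupported(hostOS) {
		tags = append(tags, "journald")
	}
	return strings.Join(tags, " ")
}

func main() {
	// Sample host OS values as `go env GOHOSTOS` would report them.
	for _, hostOS := range []string{"linux", "darwin", "windows"} {
		fmt.Printf("%s: tags=%q\n", hostOS, buildTags(hostOS, nil))
	}
}
```

Keeping the decision in one small function means the macOS and Windows cases share the same path, which matches the intent of the related "Allow building on macOS" change.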
Kubernetes Prow Robot
e842171ba0 Merge pull request #527 from stmcginnis/image-path
Update image path in deployment yaml
2021-02-02 16:52:28 -08:00
Kubernetes Prow Robot
49f592d67d Merge pull request #526 from stmcginnis/changelog
Update CHANGELOG for past releases
2021-02-02 16:26:28 -08:00
Sean McGinnis
7e7bc2271e Allow building on macOS
Journald is not available on mac. To allow building the rest of the
project while working on a mac, use the same flag as the Windows build
to skip inclusion of journald.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 17:17:38 -06:00
Sean McGinnis
e7511e6eeb Add notes on steps to perform when doing a release
There are a few things that need to be done for each release. This adds
some basic notes to help make sure all steps are followed. This is a
working document, so hopefully it will be expanded over time as we
identify other steps that should be done.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 16:55:22 -06:00
Sean McGinnis
f604a5ae7d Update image path in deployment yaml
The image location for node-problem-detector has moved under a
subdirectory now. The deployment config wasn't updated, so those using
the provided node-problem-detector.yaml file directly would end up with
ErrImagePull errors.

This updates the yaml to point to the new location and the latest
release.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 16:15:27 -06:00
Sean McGinnis
21d5ec6761 Update CHANGELOG for past releases
This adds release information into the CHANGELOG for all past releases.

Signed-off-by: Sean McGinnis <sean.mcginnis@gmail.com>
2021-02-02 16:02:57 -06:00
Karan Goel
c2aceee61d remove os_versions and kernel_version labels
2021-02-02 08:25:10 -08:00
Kubernetes Prow Robot
422c088d62 Merge pull request #516 from karan/system_time
add metric for per-cpu, per-stage timing
2021-02-01 18:54:28 -08:00
Kubernetes Prow Robot
312f96a5a4 Merge pull request #521 from ZYecho/fix-check
fix check for timeout
2021-02-01 15:40:27 -08:00
zhangyue
98ba606d4f fix check for timeout
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2021-01-30 21:35:00 +08:00
Karan Goel
8648fe265a add metric for per-cpu, per-stage timing
2021-01-29 08:46:39 -08:00
Kubernetes Prow Robot
e34e2763cf Merge pull request #519 from Random-Liu/fix-indention
Fix system-stats-monitor config indention.
2021-01-28 23:47:41 -08:00
Kubernetes Prow Robot
7d87c16e03 Merge pull request #518 from Random-Liu/add-containerd-health-checker
Add containerd health checker config.
2021-01-28 23:11:41 -08:00
Lantao Liu
144fad7706 Fix system-stats-monitor config indention.
2021-01-28 22:59:47 -08:00
Lantao Liu
c2ad21a380 Add containerd health checker config.
2021-01-28 22:46:55 -08:00
Kubernetes Prow Robot
1a7aa6505d Merge pull request #512 from karan/dev_net_metrics
Add network interface stats
v0.8.6
2021-01-20 14:03:00 -08:00
Karan Goel
2a2bab3d28 Add network interface stats
We do not have to collect these often, so for now set the collection
interval to 120s (even though the Stackdriver exporter is still set to
export every 60s).
2021-01-20 08:56:34 -08:00
Kubernetes Prow Robot
45f70a8b26 Merge pull request #456 from ZYecho/fix_timeout
fix: fix script timeout can't work
2021-01-19 19:01:58 -08:00
Kubernetes Prow Robot
c2d7a7be62 Merge pull request #513 from karan/cpu_activity_metrics
add metrics for process stats
2021-01-19 18:38:07 -08:00
Kubernetes Prow Robot
a8a1d30310 Merge pull request #509 from jeremyje/winrun
Support filelog watching in Windows.
2021-01-19 18:37:59 -08:00
Kubernetes Prow Robot
19fefd773f Merge pull request #515 from vteratipally/master
cleanup the log
2021-01-15 13:31:43 -08:00
varsha teratipally
2cb1195f18 cleanup the log
2021-01-13 17:54:53 +00:00
Karan Goel
f13d2a5449 don't run os feature collector if metric not initialized
2021-01-13 09:33:13 -08:00
Jeremy Edwards
adc587f222 Support filelog watching in Windows.
2021-01-13 17:16:46 +00:00
Karan Goel
71098097c0 add metrics for process stats
Tested on a COS VM:

```
$ curl -s localhost:20257/metrics | grep "^system_"
system_interrupts_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 8.759236e+07
system_processes_total{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 692506
system_procs_blocked{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 0
system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2
```
2021-01-13 09:14:08 -08:00
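
To check output like the sample in commit 71098097c0 programmatically, a small parser is enough. The following Go sketch is an illustration only and is not part of the project; `parseSample` is a hypothetical helper. It assumes each line is a single sample in the Prometheus text format without a trailing timestamp, and it splits on the last space because label values such as `os_version` can themselves contain spaces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one metrics line into its metric name and value.
// Labels are discarded. Assumption: the line has no trailing timestamp,
// so the value is everything after the last space.
func parseSample(line string) (string, float64, error) {
	i := strings.LastIndex(line, " ")
	if i < 0 {
		return "", 0, fmt.Errorf("malformed sample: %q", line)
	}
	value, err := strconv.ParseFloat(line[i+1:], 64)
	if err != nil {
		return "", 0, err
	}
	// Drop the {label="..."} section, if present, to get the bare name.
	name := line[:i]
	if j := strings.Index(name, "{"); j >= 0 {
		name = name[:j]
	}
	return name, value, nil
}

func main() {
	// One line copied from the sample output in the commit message.
	line := `system_procs_running{kernel_version="5.4.49+",os_version="cos 85-13310.1041.24"} 2`
	name, value, err := parseSample(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name, value)
}
```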
zhangyue
4f68b251ac fix: fix script timeout can't work
Signed-off-by: zhangyue <huaihuan.zy@alibaba-inc.com>
2021-01-13 20:53:25 +08:00
Kubernetes Prow Robot
b951f24297 Merge pull request #504 from jeremyje/xupgrade
Upgrade golang.org/x/sys to prepare for Windows Service.
2021-01-12 20:02:35 -08:00
Kubernetes Prow Robot
d6d20e49fa Merge pull request #505 from vteratipally/retrieve_os_features
added a new metric to retrieve os features like unknown modules, KTD
2021-01-12 19:36:43 -08:00
Kubernetes Prow Robot
989a15bf3a Merge pull request #501 from jeremyje/multiarch
Remove Dockerfile.in rewrite hack and use updated arg in Dockerfile
2021-01-12 19:36:35 -08:00
varsha teratipally
f89f620909 added new line in the known_modules.json
2021-01-08 23:25:02 +00:00
Kubernetes Prow Robot
f564d9092a Merge pull request #510 from jeremyje/nopanic
Use Fatal instead of panic for go tests.
2021-01-08 14:43:05 -08:00
Kubernetes Prow Robot
8c16b56476 Merge pull request #511 from ForestCold/master
Update list of supported problem daemons
2021-01-08 14:19:06 -08:00
varsha teratipally
eb38b4b598 added a new metric to retrieve os features like unknown modules
2021-01-08 21:52:16 +00:00
Magic Yami
041b77bd32 Merge pull request #1 from ForestCold/Update-supported-problem-deamon-list
Update supported problem daemon list
2021-01-06 14:57:38 -08:00
Magic Yami
a210b30d36 Update supported problem daemon list
When I read through the problem daemon list, the original description confused me a little, since it listed a problem daemon config (kernel monitor) and problem daemon types (custom plugin monitor) together. This changes the description to make it clearer. However, I didn't find a clue for how to categorize docker-monitor, and would appreciate it if a reviewer could point that out.
2021-01-06 14:57:05 -08:00
Jeremy Edwards
a451a892ae Use Fatal instead of panic for go tests.
2020-12-22 03:01:51 +00:00
Jeremy Edwards
1da1f28cef Upgrade golang.org/x/sys to prepare for Windows Service.
2020-12-13 06:39:59 +00:00
Kubernetes Prow Robot
4ad49bbd84 Merge pull request #503 from vteratipally/label_fix
changing the label names as per the standards
2020-12-08 22:04:49 -08:00
Kubernetes Prow Robot
4dccc1ce24 Merge pull request #493 from vteratipally/kernel_cmdline_parameters
add code to retrieve kernel command line parameters
2020-12-08 17:58:18 -08:00
varsha teratipally
4085da817d renaming splitWords to tokens
2020-12-08 18:34:54 +00:00