483 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
f3ab10eddb Merge pull request #442 from abansal4032/custom-plugin-logs-capture
Capture the logs from stderr of custom plugins
v0.8.3
2020-07-29 14:18:03 -07:00
Archit Bansal
6acf5b1edb Capture the logs from stderr of custom plugins. 2020-07-29 11:57:05 -07:00
Kubernetes Prow Robot
c3cf941e98 Merge pull request #441 from abansal4032/custom-plugin-log-fix
Generate new status log only on condition change
2020-07-28 09:45:48 -07:00
Archit Bansal
f80f3e0dfa Generate status generation logs from custom plugin run only on condition change. 2020-07-24 09:39:39 -07:00
Kubernetes Prow Robot
ca34880303 Merge pull request #444 from abansal4032/health-check-cooldown-fix
Fix for cooldown time in health checker plugin
2020-07-17 18:32:54 -07:00
Kubernetes Prow Robot
27f1e774ef Merge pull request #443 from abansal4032/health-check-enable-repair
Set auto-repair=true by default for health check monitors.
2020-07-17 17:48:51 -07:00
Archit Bansal
f56d0a929d Use InactiveExitTimestamp instead of ActiveEnterTimestamp for cooldown period in health check monitor.
2020-07-16 18:53:47 -07:00
Archit Bansal
84188cc0aa Set auto-repair=true by default for health check monitors. 2020-07-15 18:57:53 -07:00
Kubernetes Prow Robot
061e977d1c Merge pull request #433 from bengadbois/add-health-checker-image
docker image: add health-checker binary
2020-06-09 10:47:20 -07:00
Ben Gadbois
32f770dd4e docker image: add health-checker binary 2020-06-09 08:25:31 -07:00
Kubernetes Prow Robot
452818cef8 Merge pull request #426 from abansal4032/health-check-monitor
Add health-check-monitor
v0.8.2
2020-05-27 18:02:02 -07:00
Archit Bansal
44dc4aa6c1 Add health-check-monitor 2020-05-27 14:08:42 -07:00
Kubernetes Prow Robot
1d03b66f15 Merge pull request #424 from stpabhi/rhel-support
Add rhel support for osversion
2020-04-15 14:30:45 -07:00
Abhilash Pallerlamudi
5342a50874 Add rhel support for osversion 2020-04-15 13:19:56 -07:00
Kubernetes Prow Robot
20e0147106 Merge pull request #422 from blackwith/patch-1
update system-log-monitor and image version
2020-04-08 10:45:43 -07:00
Mathieu Collin
74554c4b26 update system-log-monitor and image version 2020-04-08 11:24:56 +02:00
Kubernetes Prow Robot
633ced6c8e Merge pull request #421 from majst01/lsblk
Install util-linux to have lsblk binary
2020-03-25 10:17:03 -07:00
Stefan Majer
70c457e5df Install util-linux to have lsblk binary 2020-03-25 11:43:12 +01:00
Kubernetes Prow Robot
c709314cd7 Merge pull request #419 from KohlsTechnology/remedy-system-docs
Document Using Descheduler As a Remedy System
2020-03-10 14:01:36 -07:00
Kubernetes Prow Robot
ab5ea72c74 Merge pull request #418 from muff1nman/namespace-option
Add namespace option for events
2020-03-10 09:49:36 -07:00
Sean Malloy
f603f26afa Document Using Descheduler As a Remedy System
In addition to using draino as a remedy system, the k8s descheduler can
also be used as a remedy system.
2020-03-08 22:30:51 -05:00
Andrew DeMaria
7fd465e195 Add namespace option for events 2020-03-05 19:04:31 -07:00
Kubernetes Prow Robot
4ad6227196 Merge pull request #414 from SHLo/patch-1
fix wording
2020-02-27 14:12:38 -08:00
shlo
925d69a18d fix wording 2020-02-24 11:07:57 +08:00
Kubernetes Prow Robot
450c6c3b01 Merge pull request #410 from xueweiz/stats
Collect more CPU/disk/memory metrics
v0.8.1
2020-02-06 10:49:25 -08:00
Xuewei Zhang
8c02c6d4d2 Check metric sanity in e2e tests 2020-02-03 15:38:12 -08:00
Xuewei Zhang
83b09277f0 Collect more cpu/disk/memory metrics 2020-02-03 15:29:45 -08:00
Xuewei Zhang
9ade82734d Add github.com/prometheus/procfs library 2020-01-31 16:02:15 -08:00
Xuewei Zhang
7f9437cba0 Add github.com/shirou/gopsutil/load library 2020-01-31 15:42:57 -08:00
Kubernetes Prow Robot
aadb2b88d1 Merge pull request #405 from xueweiz/test-pr
Rent Boskos project only once per test run.
2020-01-07 13:40:19 -08:00
Kubernetes Prow Robot
140a850b63 Merge pull request #404 from xueweiz/queue
Fix disk metrics unit and queue_length calculation
2020-01-06 13:14:16 -08:00
Xuewei Zhang
fb8304bec8 Rent Boskos project only once per test run.
The old implementation rents Boskos project for each Ginkgo node.
2020-01-03 14:49:35 -08:00
Xuewei Zhang
fa7a3d7df1 Fix disk metrics unit and queue_length calculation 2020-01-02 17:19:38 -08:00
Kubernetes Prow Robot
0d0bba94e5 Merge pull request #402 from gmemcc/master
Ignore first collected disk stats to prevent metric distortion
2019-12-18 11:57:57 -08:00
Alex Wong
5a4ac81186 Only disk_avg_queue_len is distorted on first collection 2019-12-12 14:39:29 +08:00
Alex Wong
3d10c892a2 Ignore first collected disk stats to prevent metric distortion 2019-12-11 11:14:01 +08:00
Kubernetes Prow Robot
7819ffda7c Merge pull request #400 from xueweiz/patch-1
Install ginkgo executable in test/build.sh
2019-12-10 11:32:07 -08:00
Xuewei Zhang
6f27c80053 Install ginkgo executable in test/build.sh
ginkgo executable is used in e2e test to support parallelism.
Make sure to install it before running e2e test in the presubmit and CI jobs.
2019-12-06 22:35:33 -08:00
Kubernetes Prow Robot
9d584df4c6 Merge pull request #387 from xueweiz/test-pr
Add a few behavioral e2e tests
2019-12-06 15:13:54 -08:00
Xuewei Zhang
7d28dde8d8 Add e2e test for OOM kill and Docker hung
Also fixes two minor bugs:

1. Change default Boskos wait timeout to 2 minutes.
This is because the current test timeout is configured to 10 minutes.
Running each test case takes 1-2 minutes, and each node will run 1-2 test
cases. 5 minutes timeout on waiting for Boskos may cause a test timeout,
which we want to avoid.

2. Create artifact subdir with 0755 rather than 0644.
Because execution bit should be set on the directories.
2019-12-06 14:49:17 -08:00
Xuewei Zhang
8b98d08b5f Record scp command failure message to help debugging 2019-12-06 14:49:17 -08:00
Xuewei Zhang
dd37dfe12c Add e2e tests for reporting filesystem problems
Also added support for running e2e tests in parallel.
2019-12-06 14:49:17 -08:00
Xuewei Zhang
b3f811d171 Add detection for ext4 errors 2019-12-06 14:49:17 -08:00
Xuewei Zhang
5da72e86bb Add problem maker to simulate problems for e2e test 2019-12-06 14:49:17 -08:00
Kubernetes Prow Robot
7dc84e8d74 Merge pull request #395 from yuzhiquan/patch
Using time.Since(t) instead of t.Sub(time.Now())
2019-12-05 17:28:49 -08:00
yuzhiquan
9c24be2da4 cleanup: using time.Since(t) instead of t.Sub(time.Now()) 2019-12-05 18:57:53 +08:00
Xuewei Zhang
40cb3e0fec Vendor changes for gomega 2019-12-04 17:17:53 -08:00
Kubernetes Prow Robot
11e35096c4 Merge pull request #394 from yuzhiquan/master
fix: modify typo
2019-12-02 23:46:57 -08:00
yuzhiquan
b458f0d028 fix: modify typo 2019-12-03 15:21:57 +08:00
Kubernetes Prow Robot
9c3f17478b Merge pull request #393 from jiayuc/fix-make-test
fix make test early failure
2019-11-29 23:45:03 -08:00