* adding service disruption
* fixing kil services
* service log changes
* remvoing extra logging
* adding daemon set
* adding service disruption name changes
* cerberus config back
* bad string
* Include check for inside k8s scenario
* Include check for inside k8s scenario (2)
* Include check for inside k8s scenario (3)
* Include check for inside k8s scenario (4)
Pod network outage chaos scenario blocks traffic at pod level irrespective of the network policy used.
With the current network policies, it is not possible to explicitly block ports which are enabled
by allowed network policy rule. This chaos scenario addresses this issue by using OVS flow rules
to block ports related to the pod. It supports OpenShiftSDN and OVNKubernetes based networks.
Below example config blocks access to openshift console.
````
- id: pod_network_outage
config:
namespace: openshift-console
direction:
- ingress
ingress_ports:
- 8443
label_selector: 'component=ui'
````
This commit enables users to opt in to check for critical alerts firing
in the cluster post chaos at the end of each scenario. Chaos scenario is
considered as failed if the cluster is unhealthy in which case user can
start debugging to fix and harden respective areas.
Fixes https://github.com/redhat-chaos/krkn/issues/410
Other than plain style changes, introduced constants
`KUBE_BURNER_URL` and `KUBE_BURNER_VERSION`
solving the problem of having a too long string and at the same time
make it easier to bump the requirement on Kube Burner.
Signed-off-by: Sandro Bonazzola <sbonazzo@redhat.com>
This commit adds an option for the user to pick whether to install
litmus or not depending on their use case. One use case is disconnected
environments where litmus is pre-installed insted of reaching out to the
internet.
This commit:
- Leverages distribution flag in the config set by the user to skip
things not supported on OpenShift to be able to run scenarios on
Kubernetes.
- Adds sample config and scenario files that work on Kubernetes.
* Added new scenario to fill up a given volumen
* fixing small issues and style
* adding PVC as input param instead of pod name
* small fix
* get container name and volumen name
replace oc with kubectl commands
* adding yaml file to create a pv, pvc and pod to run pvc_scenario
* adding support to match both string for describe command when looking for pod_name
* added support to find the pvc from a given pod
* small fix
* small fix
This commit enables users to start Kraken to act as listener by setting
the signal to PAUSE in the config to get the cluster to a desired test or
run any setup before injecting chaos by setting the signal to RUN. This
helps in cases where we have test cases that need to coordinate the chaos
at a desired time depending on the state of the cluster/test run.
This commit enables users to simulate a downtime of an application
by blocking the traffic for the specified duration to see how
it/other components communicating with it behave in case of downtime.
This will avoid querying all namespaces for pods matching the label_selector
if defined as shown in the sample scenario config. This commit also prints a
pointer to the report generated at the end of the run.
- This eases the usage and debuggability by running the fault injection pods in
the same namespace as other resources of litmus. This will also ease the
deletion process and ensure that there are no leftover objects on the cluster.
- This commit also enables users to use the same rbac template for all the litmus
scenarios without having to pull in a specic one for each of the scenarios.
This commit adds support to create zone outage in AWS by denying both
ingress and egress traffic to the instances belonging to a particular
subnet belonging to the zone by tweaking the network acl. This creates
an outage of all the nodes in the zone - both master and workers.
Current Kraken integration with Cerberus monitors the cluster as well as the
application health post chaos and pass/fails if they are not healthy after chaos.
This commit adds ability to monitor the user application health during the chaos
and fails the run in case of downtime as it's potentially a downtime in case of
customers environment as well. It is especially useful in case of control plane
failure scenarios including API server, Etcd, Ingress etc.
This commit:
- Adds timeout to avoid operations hanging for long durations.
- Improves exception handling and exits wherever needed.
- Sets KUBECONFIG env var globoally to access the cluster.