mirror of
https://github.com/krkn-chaos/krkn.git
synced 2026-04-15 06:57:28 +00:00
Health checks implementation for application endpoints (#761)
* Hog scenario porting from arcaflow to native (#748) * added new native hog scenario * removed arcaflow dependency + legacy hog scenarios * config update * changed hog configuration structure + added average samples * fix on cpu count * removes tripledes warning * changed selector format * changed selector syntax * number of nodes option * documentation * functional tests * exception handling on hog deployment thread Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Hog scenario porting from arcaflow to native (#748) * added new native hog scenario * removed arcaflow dependency + legacy hog scenarios * config update * changed hog configuration structure + added average samples * fix on cpu count * removes tripledes warning * changed selector format * changed selector syntax * number of nodes option * documentation * functional tests * exception handling on hog deployment thread Signed-off-by: Paige Patton <prubenda@redhat.com> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * adding vsphere updates to non native Signed-off-by: Paige Patton <prubenda@redhat.com> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * adding node id to affected node Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Fixed the spelling mistake Signed-off-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * adding v4.0.8 version (#756) Signed-off-by: Paige Patton <prubenda@redhat.com> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Add autodetecting distribution (#753) Used is_openshift function from krkn lib Remove distribution from config Remove distribution from documentation Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana 
<meghanakatta8@gmail.com> * Changes include health check doc and exit_on_failure config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes include health check doc and exit_on_failure config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Added the health check config in functional test config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Modified the health checks documentation Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for debugging the functional test failing Signed-off-by: kattameghana <meghanakatta8@gmail.com> * changed the code for debugging in run_test.sh Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removed the functional test running line Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removing the health check config in common_test_config for debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Fixing functional test fialure Signed-off-by: kattameghana <meghanakatta8@gmail.com> * 
Removing the changes that are added for debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * few modifications Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Renamed timestamp Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changed the start timestamp and end timestamp data type to the datetime Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes include health check doc and exit_on_failure config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Hog scenario porting from arcaflow to native (#748) * added new native hog scenario * removed arcaflow dependency + legacy hog scenarios * config update * changed hog configuration structure + added average samples * fix on cpu count * removes tripledes warning * changed selector format * changed selector syntax * number of nodes option * documentation * functional tests * exception handling on hog deployment thread Signed-off-by: Paige Patton <prubenda@redhat.com> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * adding node id to affected node Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes include health check doc and exit_on_failure config Signed-off-by: kattameghana 
<meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Added the health check config in functional test config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Modified the health checks documentation Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for debugging the functional test failing Signed-off-by: kattameghana <meghanakatta8@gmail.com> * changed the code for debugging in run_test.sh Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removed the functional test running line Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removing the health check config in common_test_config for debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Fixing functional test fialure Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removing the changes that are added for debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * few modifications Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Renamed timestamp Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Hog scenario porting from arcaflow to native (#748) * added new native hog scenario * removed arcaflow dependency + legacy hog scenarios * config update * changed hog configuration structure + added average samples * fix on cpu count * removes tripledes 
warning * changed selector format * changed selector syntax * number of nodes option * documentation * functional tests * exception handling on hog deployment thread Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Hog scenario porting from arcaflow to native (#748) * added new native hog scenario * removed arcaflow dependency + legacy hog scenarios * config update * changed hog configuration structure + added average samples * fix on cpu count * removes tripledes warning * changed selector format * changed selector syntax * number of nodes option * documentation * functional tests * exception handling on hog deployment thread Signed-off-by: Paige Patton <prubenda@redhat.com> Signed-off-by: kattameghana <meghanakatta8@gmail.com> * adding node id to affected node Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes include health check doc and exit_on_failure config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * initial version of health checks Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for appending success response and health check config format Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Update config.yaml Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Added the health check config in functional test config Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changes for debugging the functional test failing Signed-off-by: kattameghana <meghanakatta8@gmail.com> * changed the code for debugging in run_test.sh Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removed the functional test running line Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removing the health check config in common_test_config for debugging Signed-off-by: kattameghana 
<meghanakatta8@gmail.com> * Fixing functional test fialure Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Removing the changes that are added for debugging Signed-off-by: kattameghana <meghanakatta8@gmail.com> * few modifications Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Renamed timestamp Signed-off-by: kattameghana <meghanakatta8@gmail.com> * passing the health check response as HealthCheck object Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Updated the krkn-lib version in requirements.txt Signed-off-by: kattameghana <meghanakatta8@gmail.com> * Changed the coverage Signed-off-by: kattameghana <meghanakatta8@gmail.com> --------- Signed-off-by: kattameghana <meghanakatta8@gmail.com> Signed-off-by: Paige Patton <prubenda@redhat.com> Signed-off-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb> Signed-off-by: jtydlack <139967002+jtydlack@users.noreply.github.com> Co-authored-by: Tullio Sebastiani <tsebastiani@users.noreply.github.com> Co-authored-by: Paige Patton <prubenda@redhat.com> Co-authored-by: Meghana Katta <mkatta@mkatta-thinkpadt14gen4.bengluru.csb> Co-authored-by: Paige Patton <64206430+paigerube14@users.noreply.github.com> Co-authored-by: jtydlack <139967002+jtydlack@users.noreply.github.com>
59
docs/health_checks.md
Normal file
@@ -0,0 +1,59 @@
### Health Checks

Health checks provide real-time visibility into the impact of chaos scenarios on application availability and performance. The health check configuration supports application endpoints reachable over HTTP or HTTPS, with optional authentication through a bearer token or username/password credentials.
Health checks are configured in the ```config.yaml``` file.

The system checks each configured URL periodically, at the defined interval, and records the results in telemetry. The telemetry data includes:

- Success response ```200``` when the application is running normally.
- Failure response (any status code other than ```200```) if the application experiences downtime or errors.

This helps users quickly identify application health issues and take the necessary action.

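As a rough illustration of the periodic check described above, the following Python sketch probes a list of URLs once per round and records a telemetry-style entry for each probe. It is not krkn's implementation: the function names (`fetch_status`, `check_once`, `poll`), the `rounds` parameter, and the per-probe record layout are assumptions made for this example, and only the standard library is used.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime


def fetch_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def check_once(url, fetch=fetch_status):
    """Probe url once and return a telemetry-style record (hypothetical layout)."""
    start = datetime.now()
    status_code = fetch(url)
    end = datetime.now()
    return {
        "url": url,
        "status": status_code == 200,  # only 200 counts as healthy
        "status_code": status_code,
        "start_timestamp": start.strftime("%Y-%m-%d %H:%M:%S"),
        "end_timestamp": end.strftime("%Y-%m-%d %H:%M:%S"),
    }


def poll(urls, interval=2, rounds=3, fetch=fetch_status, sleep=time.sleep):
    """Check every URL once per round, pausing `interval` seconds between rounds."""
    records = []
    for round_index in range(rounds):
        for url in urls:
            records.append(check_once(url, fetch))
        if round_index < rounds - 1:
            sleep(interval)
    return records
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access; a real run would simply call `poll(["https://example.com/health"], interval=2)`.
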
#### Sample health check config
```
health_checks:
  interval: <time_in_seconds>                       # Defines the frequency of health checks, default value is 2 seconds
  config:                                           # List of application endpoints to check
    - url: "https://example.com/health"
      bearer_token: "hfjauljl..."                   # Bearer token for authentication if any
      auth:
      exit_on_failure: True                         # If value is True exits when health check failed for application, values can be True/False
    - url: "https://another-service.com/status"
      bearer_token:
      auth: ("admin","secretpassword")              # Provide authentication credentials (username , password) in tuple format if any, ex:("admin","secretpassword")
      exit_on_failure: False
    - url: http://general-service.com
      bearer_token:
      auth:
      exit_on_failure:
```

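To make the per-endpoint fields more concrete, here is a hedged Python sketch of how one entry from the `config` list might be turned into request options. The helper names `parse_auth` and `request_options` are hypothetical, and so are two behaviours: parsing the `auth` value with `ast.literal_eval` (a YAML loader would normally hand the tuple-style value over as a plain string) and treating an empty `exit_on_failure` as `False`. krkn itself may handle both differently.

```python
import ast


def parse_auth(raw):
    """Turn '("user","password")' into a (user, password) tuple, or None if unset."""
    if not raw:
        return None
    # Assumption: the value arrives as a string that looks like a Python tuple.
    value = ast.literal_eval(raw) if isinstance(raw, str) else raw
    if not (isinstance(value, (tuple, list)) and len(value) == 2):
        raise ValueError(f"auth must be a (username, password) pair, got {raw!r}")
    return str(value[0]), str(value[1])


def request_options(entry):
    """Build headers and credentials for one endpoint entry (hypothetical helper)."""
    headers = {}
    if entry.get("bearer_token"):
        headers["Authorization"] = f"Bearer {entry['bearer_token']}"
    return {
        "url": entry["url"],
        "headers": headers,
        "auth": parse_auth(entry.get("auth")),
        # Assumption: an empty exit_on_failure is treated as False.
        "exit_on_failure": bool(entry.get("exit_on_failure")),
    }
```

For the second endpoint in the sample, `request_options` would yield empty headers and the credentials pair `("admin", "secretpassword")`; for the first, it would yield an `Authorization: Bearer ...` header and no credentials.
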
#### Sample health check telemetry
```
"health_checks": [
    {
        "url": "https://example.com/health",
        "status": False,
        "status_code": "503",
        "start_timestamp": "2025-02-25 11:51:33",
        "end_timestamp": "2025-02-25 11:51:40",
        "duration": "0:00:07"
    },
    {
        "url": "https://another-service.com/status",
        "status": True,
        "status_code": 200,
        "start_timestamp": "2025-02-25 22:18:19",
        "end_timestamp": "2025-02-25 22:22:46",
        "duration": "0:04:27"
    },
    {
        "url": "http://general-service.com",
        "status": True,
        "status_code": 200,
        "start_timestamp": "2025-02-25 22:18:19",
        "end_timestamp": "2025-02-25 22:22:46",
        "duration": "0:04:27"
    }
],
```
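
Telemetry records shaped like the sample can be post-processed to see how long each endpoint was unhealthy. The sketch below is only an illustration: `record_duration` and `downtime_by_url` are hypothetical helpers, and it assumes the records have already been loaded into Python dictionaries with the field names and timestamp format shown in the sample.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def record_duration(record):
    """Compute end_timestamp - start_timestamp for one telemetry record."""
    start = datetime.strptime(record["start_timestamp"], TIMESTAMP_FORMAT)
    end = datetime.strptime(record["end_timestamp"], TIMESTAMP_FORMAT)
    return end - start


def downtime_by_url(records):
    """Sum the duration of every failed record, grouped by URL."""
    totals = {}
    for record in records:
        if not record["status"]:
            url = record["url"]
            totals[url] = totals.get(url, timedelta()) + record_duration(record)
    return totals


sample = [
    {"url": "https://example.com/health", "status": False, "status_code": "503",
     "start_timestamp": "2025-02-25 11:51:33", "end_timestamp": "2025-02-25 11:51:40"},
    {"url": "https://another-service.com/status", "status": True, "status_code": 200,
     "start_timestamp": "2025-02-25 22:18:19", "end_timestamp": "2025-02-25 22:22:46"},
]

for url, total in downtime_by_url(sample).items():
    print(f"{url} was unhealthy for {total}")
```

Recomputing the duration from the two timestamps, rather than parsing the `duration` string, also gives a quick consistency check on the recorded values.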