Currently, when a replica loses its leadership, a new leader isn't elected until leaseDuration seconds have elapsed.
Here, that is 15s. The max time till we get a new leader is leaseDuration (15s) + retryPeriod (2s) = 17s.
This commit updates the shutdown process such that if the leader replica is sent a shutdown signal,
it sleeps for leaseDuration seconds. This allows the leader replica to continue to export events until
a new leader is elected. A new leader is elected only if the lease hasn't been renewed and leaseDuration has expired.
In addition to this, leader election now uses the leases object instead of configMaps and leases. The clusterRole
is also updated to allow writing to the leases object.
For use cases where no event loss is tolerable, users should set maxEventAgeSeconds to > 1.
* Adds ownerReferences to the exported events
This commit uses the same approach as labels and annotations and adds ownerReferences to the EnhancedEvent struct.
The flow is as follows:
* use an LRU cache to store the ownerReferences with object UID as the key
* if the object doesn't exist in cache, look up using dynamic client and store it in cache
* if the object exists in cache, return the value from cache
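
The flow above can be sketched as a small LRU cache keyed by object UID. This is an illustration built on the standard library only; `OwnerRef`, `ownerRefCache` and the `lookup` callback are hypothetical stand-ins for the real ownerReference type, the cache used by the exporter, and the dynamic client call, and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// OwnerRef is a simplified stand-in for a Kubernetes owner reference.
type OwnerRef struct {
	Kind string
	Name string
}

// entry is the value stored in each list element.
type entry struct {
	uid  string
	refs []OwnerRef
}

// ownerRefCache is a tiny LRU cache keyed by object UID.
type ownerRefCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // UID -> element in order
	lookup   func(uid string) ([]OwnerRef, error)
}

func newOwnerRefCache(capacity int, lookup func(string) ([]OwnerRef, error)) *ownerRefCache {
	return &ownerRefCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		lookup:   lookup,
	}
}

// Get returns the cached value on a hit. On a miss it calls lookup,
// stores the result, and evicts the least recently used entry if needed.
func (c *ownerRefCache) Get(uid string) ([]OwnerRef, error) {
	if el, ok := c.items[uid]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).refs, nil
	}
	refs, err := c.lookup(uid)
	if err != nil {
		return nil, err
	}
	c.items[uid] = c.order.PushFront(&entry{uid: uid, refs: refs})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).uid)
	}
	return refs, nil
}

func main() {
	calls := 0
	cache := newOwnerRefCache(2, func(uid string) ([]OwnerRef, error) {
		calls++ // counts how often the (fake) API server is hit
		return []OwnerRef{{Kind: "ReplicaSet", Name: "rs-" + uid}}, nil
	})
	for _, uid := range []string{"a", "a", "b"} {
		refs, _ := cache.Get(uid)
		fmt.Println(uid, refs[0].Name)
	}
	fmt.Println("lookups:", calls)
}
```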
* Reduce the number of GetObject calls by using a single cache to store all labels, annotations and ownerReferences
Currently, every time there's an event, the events exporter runs GetObject for metadata like labels and annotations
independently. This results in the same object being looked up multiple times for different pieces of the metadata.
The number of calls grows as we look up additional information about the object, such as ownerReferences.
So, in this change, a struct called `ObjectMetadata` is created to capture all the pieces of information that need to be added
to the EnhancedEvent. And every time there's an event, the object is fetched from the kube-apiserver if it's not in the cache already
and all pieces of metadata require only 1 call. The metadata is cached so repeated events about the same object don't
result in more calls.
Additionally, UID + ResourceVersion is used as the cacheKey, so if the object changes, it's looked up again.
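
To illustrate the single-lookup idea and the UID + ResourceVersion key, here is a hedged sketch. The field layout of `ObjectMetadata`, the `objectRef` type, the `/` separator in the key, and the `fetch` callback are assumptions made for the example; the real struct and client call may differ. Eviction is left out for brevity, and the hit/miss counters only hint at the cache metrics listed further down.

```go
package main

import "fmt"

// ObjectMetadata bundles every piece of metadata added to an event,
// so that one fetch serves labels, annotations and ownerReferences.
type ObjectMetadata struct {
	Labels          map[string]string
	Annotations     map[string]string
	OwnerReferences []string // simplified to owner names for this sketch
}

// objectRef identifies the object an event refers to.
type objectRef struct {
	UID             string
	ResourceVersion string
}

// cacheKey combines UID and ResourceVersion, so an updated object
// (new ResourceVersion) misses the cache and is fetched again.
func cacheKey(ref objectRef) string {
	return ref.UID + "/" + ref.ResourceVersion
}

// metadataCache is an unbounded map-based cache with hit/miss counters.
type metadataCache struct {
	entries map[string]ObjectMetadata
	fetch   func(ref objectRef) (ObjectMetadata, error) // stand-in for the API call
	hits    int
	misses  int
}

// Get returns all metadata for ref, fetching it at most once per key.
func (c *metadataCache) Get(ref objectRef) (ObjectMetadata, error) {
	key := cacheKey(ref)
	if md, ok := c.entries[key]; ok {
		c.hits++
		return md, nil
	}
	md, err := c.fetch(ref)
	if err != nil {
		return ObjectMetadata{}, err
	}
	c.misses++
	c.entries[key] = md
	return md, nil
}

func main() {
	cache := &metadataCache{
		entries: make(map[string]ObjectMetadata),
		fetch: func(ref objectRef) (ObjectMetadata, error) {
			return ObjectMetadata{Labels: map[string]string{"rv": ref.ResourceVersion}}, nil
		},
	}
	cache.Get(objectRef{UID: "u1", ResourceVersion: "1"}) // miss: fetched
	cache.Get(objectRef{UID: "u1", ResourceVersion: "1"}) // hit: same version
	cache.Get(objectRef{UID: "u1", ResourceVersion: "2"}) // miss: object changed
	fmt.Println("hits:", cache.hits, "misses:", cache.misses)
}
```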
One more change here is the introduction of a `deleted` field in the `EnhancedEvent.InvolvedObject` to capture whether the object
is deleted. This helps receivers identify whether a resource is deleted and create rules for it when needed.
Tests are added for these updates and the mock functions are moved to the test files.
* Make the cache size configurable
* Add metrics for number of reads served from cache and kube-apiserver respectively
* Add `deleted` field to the involvedObject in EnhancedEvent to identify whether the resource is being deleted
* add warn for when event is discarded
* add events discarded metrics
* drop event onUpdate subscription
* rename ThrottlePeriod -> MaxEventAgeSeconds
* update readme to match new name of config var
* add explicit go arch
* gofmt
* add config tests
* revert accidental change to dockerfile
* add tests for event MaxEventAgeSeconds
* add tests for event discarded to onEvent function
* Use custom role with wider read permissions (#2)
* use longer time intervals for more stable tests on slower ci
* fix failing config test
* Updated packages, tidied and changed package name from opsgenie to resmoio
* Handle removal of clusterName from API and leader election api change
* Update Dockerfile Go to 1.19
* Test case for parsing minCount properly for #43
* more extensive test case to also match apiVersion config