container.training/slides/k8s/operators-design.md
2022-06-10 21:43:56 +02:00

Designing an operator

  • Once we understand CRDs and operators, it's tempting to use them everywhere

  • Yes, we can do (almost) everything with operators ...

  • ... But should we?

  • Very often, the answer is “no!”

  • Operators are powerful, but significantly more complex than other solutions


When should we (not) use operators?

  • Operators are great if our app needs to react to cluster events

    (nodes or pods going down, and requiring extensive reconfiguration)

  • Operators might be helpful to encapsulate complexity

    (manipulate one single custom resource for an entire stack)

  • Operators are probably overkill if a Helm chart would suffice

  • That being said, if we really want to write an operator ...

    Read on!


What does it take to write an operator?

  • Writing a quick-and-dirty operator, or a POC/MVP, is easy

  • Writing a robust operator is hard

  • We will describe the general idea

  • We will identify some of the associated challenges

  • We will list a few tools that can help us


Top-down vs. bottom-up

  • Both approaches are possible

  • Let's see what they entail, and their respective pros and cons


Top-down approach

  • Start with high-level design (see next slide)

  • Pros:

    • can yield a cleaner, more robust design

  • Cons:

    • must be able to anticipate all the events that might happen

    • design will be better only to the extent of what we anticipated

    • hard to anticipate if we don't have production experience


High-level design

  • What are we solving?

    (e.g.: geographic databases backed by PostGIS with Redis caches)

  • What are our use-cases, stories?

    (e.g.: adding/resizing caches and read replicas; load balancing queries)

  • What kind of outage do we want to address?

    (e.g.: loss of individual node, pod, volume)

  • What are our non-features, the things we don't want to address?

    (e.g.: loss of datacenter/zone; differentiating between read and write queries;
    cache invalidation; upgrading to newer major versions of Redis, PostGIS, PostgreSQL)


Low-level design

  • What Custom Resource Definitions do we need?

    (one, many?)

  • How will we store configuration information?

    (part of the CRD spec fields, annotations, other?)

  • Do we need to store state? If so, where?

    • state that is small and doesn't change much can be stored via the Kubernetes API
      (e.g.: leader information, configuration, credentials)

    • things that are big and/or change a lot should go elsewhere
      (e.g.: metrics, bigger configuration file like GeoIP)


What can we store via the Kubernetes API?

  • The API server stores most Kubernetes resources in etcd

  • etcd is designed for reliability, not for performance

  • If our storage needs exceed what etcd can offer, we need to use something else:

    • either directly

    • or by extending the API server
      (for instance by using the aggregation layer, like metrics server does)


Bottom-up approach

  • Start with existing Kubernetes resources (Deployment, StatefulSet, ...)

  • Run the system in production

  • Add scripts, automation, to facilitate day-to-day operations

  • Turn the scripts into an operator

  • Pros: simpler to get started; reflects actual use-cases

  • Cons: can result in convoluted designs requiring extensive refactoring


General idea

  • Our operator will watch its custom resources and associated resources

  • Drawing state diagrams and finite state automata helps a lot

  • It's OK if some transitions lead to a big catch-all "human intervention"

  • Over time, we will learn about new failure modes and add to these diagrams

  • It's OK to start with custom resource creation / deletion and prevent any modification

    (that's the easy POC/MVP we were talking about)

  • Presentation and validation will help our users

    (more on that later)
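
The state diagram idea above can be written down as a small transition table. This is only a minimal sketch, not tied to any operator framework; the state and event names (Pending, Provisioning, pod_lost, ...) are hypothetical and were picked to illustrate the catch-all "human intervention" transition.

```python
# Hypothetical states and events for a custom resource; all names are illustrative.
TRANSITIONS = {
    ("Pending", "created"): "Provisioning",
    ("Provisioning", "pods_ready"): "Running",
    ("Running", "pod_lost"): "Provisioning",
    ("Running", "deleted"): "Terminating",
    ("Provisioning", "deleted"): "Terminating",
}

# Catch-all state for every transition that we have not designed yet.
FALLBACK = "NeedsHumanIntervention"


def next_state(state, event):
    """Return the next state, or the catch-all if the transition is unknown."""
    return TRANSITIONS.get((state, event), FALLBACK)


if __name__ == "__main__":
    state = "Pending"
    for event in ["created", "pods_ready", "pod_lost", "volume_corrupted"]:
        state = next_state(state, event)
        print(f"{event} -> {state}")
```

As we learn about new failure modes, we add entries to the table instead of falling through to the catch-all.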


Challenges

  • Reacting to infrastructure disruption can seem hard at first

  • Kubernetes gives us a lot of primitives to help:

    • Pods and Persistent Volumes will eventually recover

    • Stateful Sets give us easy ways to "add N copies" of a thing

  • The real challenges come with configuration changes

    (i.e., what to do when our users update our custom resources)

  • Keep in mind that some of the largest cloud outages haven't been caused by natural catastrophes, or even code bugs, but by configuration changes


Configuration changes

  • It is helpful to analyze and understand how Kubernetes controllers work:

    • watch resource for modifications

    • compare desired state (custom resource spec) and current state

    • issue actions to converge state

  • Configuration changes will probably require another state diagram or FSA

  • Again, it's OK to have transitions labeled as "unsupported"

    (i.e. reject some modifications because we can't execute them)
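
The watch / compare / converge steps above can be sketched as a pure function. This is a simplified illustration that uses plain dictionaries instead of a Kubernetes client; the field names (replicas, storageClass) and the action names are assumptions made for the example, not part of any real operator.

```python
# Fields that this hypothetical operator refuses to change after creation.
IMMUTABLE_FIELDS = {"storageClass"}


def reconcile(desired, observed):
    """Compare desired and observed state; return the actions needed to converge.

    Raises ValueError for modifications that the operator does not support.
    """
    for field in IMMUTABLE_FIELDS:
        if field in observed and desired.get(field) != observed[field]:
            raise ValueError(f"unsupported change to {field!r}")

    actions = []
    diff = desired["replicas"] - observed.get("replicas", 0)
    if diff > 0:
        actions.append(("add_replicas", diff))
    elif diff < 0:
        actions.append(("remove_replicas", -diff))
    return actions
```

Calling reconcile again once the actions have been applied returns an empty list, which is the property we want from a controller: running it twice must not do the work twice.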


Tools


Validation

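  • With the legacy apiextensions.k8s.io/v1beta1 API, a CRD was "free form" by default

    (we could put pretty much anything we wanted in its custom resources)

  • With apiextensions.k8s.io/v1, each version of a CRD must provide an OpenAPI v3 schema (Example)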

  • The API server will then validate resources created/edited with this schema

  • If we need stronger validation, we can use a validating admission webhook
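
To make the schema idea concrete, here is a sketch of a CRD manifest built as a Python dictionary and printed as JSON (kubectl accepts JSON manifests as well as YAML). It assumes the apiextensions.k8s.io/v1 API; the group example.com, the kind GeoDatabase, and the fields replicas and cacheSize are hypothetical.

```python
import json

# Hypothetical CRD: the group, kind, and spec fields are made up for illustration.
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    # The name of a CRD must be <plural>.<group>.
    "metadata": {"name": "geodatabases.example.com"},
    "spec": {
        "group": "example.com",
        "scope": "Namespaced",
        "names": {
            "kind": "GeoDatabase",
            "singular": "geodatabase",
            "plural": "geodatabases",
        },
        "versions": [
            {
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {
                    "openAPIV3Schema": {
                        "type": "object",
                        "properties": {
                            "spec": {
                                "type": "object",
                                "required": ["replicas"],
                                "properties": {
                                    # Reject zero or negative replica counts.
                                    "replicas": {"type": "integer", "minimum": 1},
                                    # Only accept sizes such as 256Mi or 2Gi.
                                    "cacheSize": {
                                        "type": "string",
                                        "pattern": "^[0-9]+(Mi|Gi)$",
                                    },
                                },
                            }
                        },
                    }
                },
            }
        ],
    },
}

manifest = json.dumps(crd, indent=2)
print(manifest)
```

With such a schema, a resource whose spec.replicas is missing or is not a positive integer would be rejected by the API server before our operator ever sees it.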


Presentation

  • By default, kubectl get mycustomresource won't display much information

    (just the name and age of each resource)

  • When creating a CRD, we can specify additional columns to print (Example, Docs)

  • By default, kubectl describe mycustomresource will also be generic

  • kubectl describe can show events related to our custom resources

    (for that, we need to create Event resources, and fill the involvedObject field)

  • For scalable resources, we can define a scale sub-resource

  • This will enable the use of kubectl scale and other scaling-related operations
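
The presentation features above map to a few fields of the CRD and of the Event resource. The sketch below shows one entry of a CRD's versions list (apiextensions.k8s.io/v1) with extra columns and a scale sub-resource, plus a helper that builds an Event pointing at a custom resource. The column names, JSON paths, and the GeoDatabase kind are assumptions made for the example.

```python
# One entry of spec.versions in a CRD; the paths assume that our custom
# resources have .spec.replicas and .status.replicas fields (hypothetical).
version = {
    "name": "v1",
    "served": True,
    "storage": True,
    # Extra columns shown by "kubectl get".
    "additionalPrinterColumns": [
        {"name": "Replicas", "type": "integer", "jsonPath": ".spec.replicas"},
        {"name": "Age", "type": "date", "jsonPath": ".metadata.creationTimestamp"},
    ],
    "subresources": {
        "status": {},
        # Enables "kubectl scale" and other scaling-related operations.
        "scale": {
            "specReplicasPath": ".spec.replicas",
            "statusReplicasPath": ".status.replicas",
        },
    },
}


def make_event(resource, reason, message):
    """Build a core/v1 Event manifest that refers to `resource` via involvedObject."""
    meta = resource["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "Event",
        "metadata": {
            "generateName": meta["name"] + "-",
            "namespace": meta["namespace"],
        },
        "involvedObject": {
            "apiVersion": resource["apiVersion"],
            "kind": resource["kind"],
            "name": meta["name"],
            "namespace": meta["namespace"],
            "uid": meta["uid"],
        },
        "type": "Normal",
        "reason": reason,
        "message": message,
    }
```

The operator would submit the dictionary returned by make_event to the API server with whichever client library it uses; that part is left out here.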


About scaling

  • It is possible to use the HPA (Horizontal Pod Autoscaler) with CRDs

  • But it is not always desirable

  • The HPA works very well for homogeneous, stateless workloads

  • For other workloads, your mileage may vary

  • Some systems can scale across multiple dimensions

    (for instance: increase number of replicas, or number of shards?)

  • If autoscaling is desired, the operator will have to make complex decisions

    (example: Zalando's Elasticsearch Operator (Video))


Versioning

  • As our operator evolves over time, we may have to change the CRD

    (add, remove, change fields)

  • Like every other resource in Kubernetes, custom resources are versioned

  • When creating a CRD, we need to specify a list of versions

  • Versions can be marked as stored and/or served


Stored version

  • Exactly one version has to be marked as the stored version

  • As the name implies, it is the one that will be stored in etcd

  • Resources in storage are never converted automatically

    (we need to read and re-write them ourselves)

  • Yes, this means that we can have different versions in etcd at any time

  • Our code needs to handle all the versions that still exist in storage
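
As an illustration of the "exactly one stored version" rule, the following sketch checks a list shaped like the versions field of a CRD. The version names are hypothetical, and the check only mirrors the rule stated above; the API server performs its own validation.

```python
def storage_version(versions):
    """Return the name of the stored version; raise if there is not exactly one."""
    stored = [v["name"] for v in versions if v.get("storage")]
    if len(stored) != 1:
        raise ValueError(f"expected exactly one stored version, got {stored}")
    return stored[0]


# Hypothetical list: the old version is still served, but new writes use v1.
versions = [
    {"name": "v1alpha1", "served": True, "storage": False},
    {"name": "v1", "served": True, "storage": True},
]

print(storage_version(versions))
```

Resources that were written while v1alpha1 was the stored version stay in that format until they are read and written again, which is why the operator still has to understand them.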


Served versions

  • By default, the Kubernetes API will serve resources "as-is"

    (using their stored version)

  • It will assume that all versions are compatible storage-wise

    (i.e. that the spec and fields are compatible between versions)

  • We can provide conversion webhooks to "translate" requests

    (the alternative is to upgrade all stored resources and stop serving old versions)
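
The core of a conversion webhook is a function that receives a ConversionReview request and returns a ConversionReview response. Here is a minimal sketch of that function, without the HTTPS server around it. The group example.com and the renamed field (size in v1alpha1, replicas in v1) are assumptions made for the example.

```python
import copy


def v1alpha1_to_v1(obj):
    """Hypothetical converter: v1alpha1 called the field `size`, v1 calls it `replicas`."""
    new_obj = copy.deepcopy(obj)
    new_obj["spec"]["replicas"] = new_obj["spec"].pop("size")
    return new_obj


# Maps (source apiVersion, desired apiVersion) to a converter function.
CONVERTERS = {
    ("example.com/v1alpha1", "example.com/v1"): v1alpha1_to_v1,
}


def handle_conversion(review):
    """Build the ConversionReview response for a ConversionReview request."""
    request = review["request"]
    target = request["desiredAPIVersion"]
    response = {"uid": request["uid"]}
    try:
        converted = []
        for obj in request["objects"]:
            if obj["apiVersion"] != target:
                obj = CONVERTERS[(obj["apiVersion"], target)](obj)
                obj["apiVersion"] = target
            converted.append(obj)
        response["convertedObjects"] = converted
        response["result"] = {"status": "Success"}
    except KeyError as exc:
        # Unknown version pair or missing field: report the failure to the API server.
        response["result"] = {"status": "Failed", "message": f"cannot convert: {exc}"}
    return {
        "apiVersion": review["apiVersion"],
        "kind": "ConversionReview",
        "response": response,
    }
```

A real webhook would also need converters for the opposite direction (v1 to v1alpha1), since clients can still request any served version.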


Operator reliability

  • Remember that the operator itself must be resilient

    (e.g.: the node running it can fail)

  • Our operator must be able to restart and recover gracefully

  • Do not store state locally

    (unless we can reconstruct that state when we restart)

  • As indicated earlier, we can use the Kubernetes API to store data:

    • in the custom resources themselves

    • in other resources' annotations
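
One way to avoid local state is to record progress in the status of the custom resource, so that a restarted operator can work out what is left to do. The sketch below assumes a hypothetical status.completedSteps field and made-up step names; it manipulates plain dictionaries and leaves out the call that writes the status back through the Kubernetes API.

```python
import copy

# Hypothetical provisioning steps, in the order the operator performs them.
STEPS = ["create-volume", "start-database", "load-schema", "expose-service"]


def remaining_steps(resource):
    """Return the steps that are not yet recorded in the resource's status."""
    done = resource.get("status", {}).get("completedSteps", [])
    return [step for step in STEPS if step not in done]


def record_step(resource, step):
    """Return a copy of the resource with `step` recorded as completed."""
    updated = copy.deepcopy(resource)
    completed = updated.setdefault("status", {}).setdefault("completedSteps", [])
    if step not in completed:
        completed.append(step)
    return updated
```

After a crash, the operator reads the resource again and calls remaining_steps: nothing has to be remembered in memory or on local disk.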


Beyond CRDs

  • CRDs cannot use custom storage (e.g. for time series data)

  • CRDs cannot support arbitrary subresources (like logs or exec for Pods)

  • CRDs cannot support protobuf (for faster, more efficient communication)

  • If we need these things, we can use the aggregation layer instead

  • The aggregation layer proxies all requests below a specific path to another server

    (this is used e.g. by the metrics server)

  • The Kubernetes documentation compares the features of CRDs and API aggregation
