18 KiB
Kubebuilder
-
Writing a quick and dirty operator is (relatively) easy
-
Doing it right, however ...
--
-
We need:
-
proper CRD with schema validation
-
controller performing a reconcilation loop
-
manage errors, retries, dependencies between resources
-
maybe webhooks for admission and/or conversion
-
😱
Frameworks
-
There are a few frameworks available out there:
-
kubebuilder (book): go-centric, very close to Kubernetes' core types
-
operator-framework: higher level; also supports Ansible and Helm
-
KUDO: declarative operators written in YAML
-
KOPF: operators in Python
-
...
-
Kubebuilder workflow
-
Kubebuilder will create scaffolding for us
(Go stubs for types and controllers)
-
Then we edit these types and controllers files
-
Kubebuilder generates CRD manifests from our type definitions
(and regenerates the manifests whenver we update the types)
-
It also gives us tools to quickly run the controller against a cluster
(not necessarily on the cluster)
Our objective
-
We're going to implement a useless machine
basic example | playful example | advanced example | another advanced example
-
A machine manifest will look like this:
kind: Machine apiVersion: useless.container.training/v1alpha1 metadata: name: machine-1 spec: # Our useless operator will change that to "down" switchPosition: up -
Each time we change the
switchPosition, the operator will move it back todown
(This is inspired by the uselessoperator written by V Körbes. Highly recommend!💯)
class: extra-details
Local vs remote
-
Building Go code can be a little bit slow on our modest lab VMs
-
It will typically be much faster on a local machine
-
All the demos and labs in this section will run fine either way!
Preparation
-
Install Go
(on our VMs:
sudo snap install go --classicorsudo apk add go) -
Install kubebuilder
(get a release, untar, move the
kubebuilderbinary to the$PATH) -
Initialize our workspace:
mkdir useless cd useless go mod init container.training/useless kubebuilder init --domain container.training
Create scaffolding
-
Create a type and corresponding controller:
kubebuilder create api --group useless --version v1alpha1 --kind Machine -
Answer
yto both questions -
Then we need to edit the type that just got created!
Edit type
Edit api/v1alpha1/machine_types.go.
Add the switchPosition field in the spec structure:
// MachineSpec defines the desired state of Machine
type MachineSpec struct {
// Position of the switch on the machine, for instance up or down.
SwitchPosition string ``json:"switchPosition,omitempty"``
}
⚠️ The backticks above should be simple backticks, not double-backticks. Sorry.
Go markers
We can use Go marker comments to give controller-gen extra details about how to handle our type, for instance:
//+kubebuilder:object:root=true
→ top-level type exposed through API (as opposed to "member field of another type")
//+kubebuilder:subresource:status
→ automatically generate a status subresource (very common with many types)
//+kubebuilder:printcolumn:JSONPath=".spec.switchPosition",name=Position,type=string
(See marker syntax, CRD generation, CRD validation, Object/DeepCopy )
Installing the CRD
After making these changes, we can run make install.
This will build the Go code, but also:
-
generate the CRD manifest
-
and apply the manifest to the cluster
Creating a machine
Edit config/samples/useless_v1alpha1_machine.yaml:
kind: Machine
apiVersion: useless.container.training/v1alpha1
metadata:
labels: # ...
name: machine-1
spec:
# Our useless operator will change that to "down"
switchPosition: up
... and apply it to the cluster.
Designing the controller
-
Our controller needs to:
-
notice when a
switchPositionis notdown -
move it to
downwhen that happens
-
-
Later, we can add fancy improvements (wait a bit before moving it, etc.)
Reconciler logic
-
Kubebuilder will call our reconciler when necessary
-
When necessary = when changes happen ...
-
on our resource
-
or resources that it watches (related resources)
-
-
After "doing stuff", the reconciler can return ...
-
ctrl.Result{},nil= all is good -
ctrl.Result{Requeue...},nil= all is good, but call us back in a bit -
ctrl.Result{},err= something's wrong, try again later
-
Loading an object
Open internal/controllers/machine_controller.go.
Add that code in the Reconcile method, at the TODO(user) location:
var machine uselessv1alpha1.Machine
logger := log.FromContext(ctx)
if err := r.Get(ctx, req.NamespacedName, &machine); err != nil {
logger.Info("error getting object")
return ctrl.Result{}, err
}
logger.Info(
"reconciling",
"machine", req.NamespacedName,
"switchPosition", machine.Spec.SwitchPosition,
)
Running the controller
Our controller is not done yet, but let's try what we have right now!
This will compile the controller and run it:
make run
Then:
- create a machine
- change the
switchPosition - delete the machine
--
We get a bunch of errors and go stack traces! 🤔
IgnoreNotFound
When we are called for object deletion, the object has already been deleted.
(Unless we're using finalizers, but that's another story.)
When we return err, the controller will try to access the object ...
... We need to tell it to not do that.
Don't just return err, but instead, wrap it around client.IgnoreNotFound:
return ctrl.Result{}, client.IgnoreNotFound(err)
Update the code, make run again, create/change/delete again.
--
🎉
Updating the machine
Let's try to update the machine like this:
if machine.Spec.SwitchPosition != "down" {
machine.Spec.SwitchPosition = "down"
if err := r.Update(ctx, &machine); err != nil {
logger.Info("error updating switch position")
return ctrl.Result{}, client.IgnoreNotFound(err)
}
}
Again - update, make run, test.
Spec vs Status
-
Spec = desired state
-
Status = observed state
-
If Status is lost, the controller should be able to reconstruct it
(maybe with degraded behavior in the meantime)
-
Status will almost always be a sub-resource, so that it can be updated separately
(and potentially with different permissions)
class: extra-details
Spec vs Status (in depth)
-
The
/statussubresource is handled differently by the API server -
Updates to
/statusdon't alter the rest of the object -
Conversely, updates to the object ignore changes in the status
(See the docs for the fine print.)
"Improving" our controller
-
We want to wait a few seconds before flipping the switch
-
Let's add the following line of code to the controller:
time.Sleep(5 * time.Second) -
make run, create a few machines, observe what happens
--
💡 Concurrency!
Controller logic
-
Our controller shouldn't block (think "event loop")
-
There is a queue of objects that need to be reconciled
-
We can ask to be put back on the queue for later processing
-
When we need to block (wait for something to happen), two options:
-
ask for a requeue ("call me back later")
-
yield because we know we will be notified by another resource
-
To requeue ...
return ctrl.Result{RequeueAfter: 1 * time.Second}, nil
-
That means: "try again in 1 second, and I will check if progress was made"
-
This does not guarantee that we will be called exactly 1 second later:
-
we might be called before (if other changes happen)
-
we might be called after (if the controller is busy with other objects)
-
-
If we are waiting for another Kubernetes resource to change, there is a better way
(explained on next slide)
... or not to requeue
return ctrl.Result{}, nil
-
That means: "we're done here!"
-
This is also what we should use if we are waiting for another resource
(e.g. a LoadBalancer to be provisioned, a Pod to be ready...)
-
In that case, we will need to set a watch (more on that later)
Keeping track of state
-
If we simply requeue the object to examine it 1 second later...
-
...We'll keep examining/requeuing it forever!
-
We need to "remember" that we saw it (and when)
-
Option 1: keep state in controller
(e.g. an internal
map) -
Option 2: keep state in the object
(typically in its status field)
-
Tradeoffs: concurrency / failover / control plane overhead...
"Improving" our controller, take 2
Let's store in the machine status the moment when we saw it:
type MachineStatus struct {
// Time at which the machine was noticed by our controller.
SeenAt *metav1.Time ``json:"seenAt,omitempty"``
}
⚠️ The backticks above should be simple backticks, not double-backticks. Sorry.
Note: date fields don't display timestamps in the future.
(That's why for this example it's simpler to use seenAt rather than changeAt.)
And for better visibility, add this along with the other printcolumn comments:
//+kubebuilder:printcolumn:JSONPath=".status.seenAt",name=Seen,type=date
Set seenAt
Let's add the following block in our reconciler:
if machine.Status.SeenAt == nil {
now := metav1.Now()
machine.Status.SeenAt = &now
if err := r.Status().Update(ctx, &machine); err != nil {
logger.Info("error updating status.seenAt")
return ctrl.Result{}, client.IgnoreNotFound(err)
}
return ctrl.Result{RequeueAfter: 5 * time.Second}, nil
}
(If needed, add metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" to our imports.)
Use seenAt
Our switch-position-changing code can now become:
if machine.Spec.SwitchPosition != "down" {
now := metav1.Now()
changeAt := machine.Status.SeenAt.Time.Add(5 * time.Second)
if now.Time.After(changeAt) {
machine.Spec.SwitchPosition = "down"
machine.Status.SeenAt = nil
if err := r.Update(ctx, &machine); err != nil {
logger.Info("error updating switch position")
return ctrl.Result{}, client.IgnoreNotFound(err)
}
}
}
make run, create a few machines, tweak their switches.
Owner and dependents
-
Next, let's see how to have relationships between objects!
-
We will now have two kinds of objects: machines, and switches
-
Machines will store the number of switches in their spec
-
Machines should have at least one switch, possibly multiple ones
-
Our controller will automatically create switches if needed
(a bit like the ReplicaSet controller automatically creates Pods)
-
The switches will be tied to their machine through a label
(let's pick
machine=name-of-the-machine)
Switch state
-
The position of a switch will now be stored in the switch
(not in the machine like in the first scenario)
-
The machine will also expose the combined state of the switches
(through its status)
-
The machine's status will be automatically updated by the controller
(each time a switch is added/changed/removed)
Switches and machines
[jp@hex ~]$ kubectl get machines
NAME SWITCHES POSITIONS
machine-cz2vl 3 ddd
machine-vf4xk 1 d
[jp@hex ~]$ kubectl get switches --show-labels
NAME POSITION SEEN LABELS
switch-6wmjw down machine=machine-cz2vl
switch-b8csg down machine=machine-cz2vl
switch-fl8dq down machine=machine-cz2vl
switch-rc59l down machine=machine-vf4xk
(The field status.positions shows the first letter of the position of each switch.)
Tasks
-
Create the new resource type (but don't create a controller)
-
Update
machine_types.goandswitch_types.go -
Implement logic to display machine status (status of its switches)
-
Implement logic to automatically create switches
-
Implement logic to flip all switches down immediately
-
Then tweak it so that a given machine doesn't flip more than one switch every 5 seconds
See next slides for detailed steps!
Creating the new type
kubebuilder create api --group useless --version v1alpha1 --kind Switch
Note: this time, only create a new custom resource; not a new controller.
Updating our types
-
Move the "switch position" and "seen at" to the new
Switchtype -
Update the
Machinetype to have:-
spec.switches(Go type:int, JSON type:integer) -
status.positionsof typestring
-
-
Bonus points for adding CRD Validation to the numbers of switches!
-
Then install the new CRDs with
make install -
Create a Machine, and a Switch linked to the Machine (by setting the
machinelabel)
Listing switches
-
Switches are associated to Machines with a label
(
kubectl label switch switch-xyz machine=machine-xyz) -
We can retrieve associated switches like this:
var switches uselessv1alpha1.SwitchList if err := r.List(ctx, &switches, client.InNamespace(req.Namespace), client.MatchingLabels{"machine": req.Name}, ); err != nil { logger.Error(err, "unable to list switches of the machine") return ctrl.Result{}, client.IgnoreNotFound(err) } logger.Info("Found switches", "switches", switches)
Updating status
-
Each time we reconcile a Machine, let's update its status:
status := "" for _, sw := range switches.Items { status += string(sw.Spec.Position[0]) } machine.Status.Positions = status if err := r.Status().Update(ctx, &machine); err != nil { ... -
Run the controller and check that POSITIONS gets updated
-
Add more switches linked to the same machine
-
...The POSITIONS don't get updated, unless we restart the controller
-
We'll see later how to fix that!
Creating objects
We can use the Create method to create a new object:
sw := uselessv1alpha1.Switch{
TypeMeta: metav1.TypeMeta{
APIVersion: uselessv1alpha1.GroupVersion.String(),
Kind: "Switch",
},
ObjectMeta: metav1.ObjectMeta{
GenerateName: "switch-",
Namespace: machine.Namespace,
Labels: map[string]string{"machine": machine.Name},
},
Spec: uselessv1alpha1.SwitchSpec{
Position: "down",
},
}
if err := r.Create(ctx, &sw); err != nil { ...
Create missing switches
-
In our reconciler, if a machine doesn't have enough switches, create them!
-
Option 1: directly create the number of missing switches
-
Option 2: create only one switch (and rely on later requeuing)
-
Note: option 2 won't quite work yet, since we haven't set up watches yet
Watches
-
Our controller doesn't react when switches are created/updated/deleted
-
We need to tell it to watch switches
-
We also need to tell it how to map a switch to its machine
(so that the correct machine gets queued and reconciled when a switch is updated)
Mapping a switch to its machine
Define the following helper function:
func (r *MachineReconciler) machineOfSwitch(ctx context.Context, obj client.Object) []ctrl.Request {
return []ctrl.Request{
ctrl.Request{
NamespacedName: types.NamespacedName{
Name: obj.GetLabels()["machine"],
Namespace: obj.GetNamespace(),
},
},
}
}
Telling the controller to watch switches
Update the SetupWithManager method in the controller:
// SetupWithManager sets up the controller with the Manager.
func (r *MachineReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&uselessv1alpha1.Machine{}).
Owns(&uselessv1alpha1.Switch{}).
Watches(
&uselessv1alpha1.Switch{},
handler.EnqueueRequestsFromMapFunc(r.machineOfSwitch),
).
Complete(r)
}
...And a few extra imports
Import the following packages referenced by the previous code:
"sigs.k8s.io/controller-runtime/pkg/handler"
"sigs.k8s.io/controller-runtime/pkg/source"
"k8s.io/apimachinery/pkg/types"
After this, when we update a switch, it should reflect on the machine.
(Try to change switch positions and see the machine status update!)
Flipping switches
-
Now re-add logic to flip switches that are not in "down" position
-
Re-add logic to wait a few seconds before flipping a switch
-
Change the logic to toggle one switch per machine every few seconds
(i.e. don't change all the switches for a machine; move them one at a time)
-
Handle "scale down" of a machine (by deleting extraneous switches)
-
Automatically delete switches when a machine is deleted
(ideally, using ownership information)
-
Test corner cases (e.g. changing a switch label)
Other possible improvements
-
Formalize resource ownership
(by setting
ownerReferencesin the switches) -
This can simplify the watch mechanism a bit
-
Allow to define a selector
(instead of using the hard-coded
machinelabel) -
And much more!
Acknowledgements
-
Useless Operator, by V Körbes
code | video (EN) | video (PT)
-
Zero To Operator, by Solly Ross
-
The kubebuilder book
???
:EN:- Implementing an operator with kubebuilder :FR:- Implémenter un opérateur avec kubebuilder