July 28th, 2024

A skeptic's first contact with Kubernetes

The author discusses their skepticism about Kubernetes, highlighting its complexity and workload management features. They acknowledge its strengths but express concerns about operational stability and configuration challenges.


The author, a long-time systems administrator, shares their initial skepticism about Kubernetes, primarily due to its perceived complexity and the need for tight control over workload placement. They aim to base their opinions on factual understanding rather than assumptions. The post outlines Kubernetes' core functionalities, including workload management, self-healing capabilities, and service discovery, emphasizing the importance of control loops in managing workloads. Control loops continuously adjust resources to maintain desired states, such as scaling based on latency metrics or queue depth.

The author explains Kubernetes components like Pods, Nodes, Services, and ReplicaSets, highlighting how they interact to ensure efficient workload management. They note that while creating Pods directly is straightforward, using ReplicaSets and Deployments offers better resilience and scalability. The discussion also covers storage management, contrasting ephemeral volumes with persistent volumes, and introducing StatefulSets for stable storage needs.

Despite recognizing the value of Kubernetes' design, the author raises questions about its operational stability, the lack of support for certain cloud-native architectures, and the complexities of its configuration language. They express frustration with the reliance on text-based templating in Kubernetes tools, suggesting that it complicates the user experience. Overall, the author concludes that while Kubernetes has significant merits, there are still areas for improvement and clarity.

AI: What people are saying
The comments reflect a mix of skepticism and practical insights regarding Kubernetes, echoing the author's concerns about its complexity and operational challenges.
  • Many users express frustration with YAML configuration, describing it as cumbersome and error-prone.
  • There is a consensus that while Kubernetes has powerful features, its usability can be hindered by its complexity and the steep learning curve.
  • Several commenters highlight the extensibility of Kubernetes, noting that it allows for custom metrics and autoscaling options beyond CPU usage.
  • Critiques of Helm, a popular package manager for Kubernetes, are common, with some arguing that issues with Helm shouldn't reflect on Kubernetes itself.
  • Users share their experiences with Kubernetes, often contrasting it with simpler alternatives, indicating a desire for more straightforward solutions.
22 comments
By @hbogert - 5 months
His take on text interpolation is very right. I'm a SWE turned SRE because as a developer I really enjoyed using K8s. But being a full-time SRE where I work just means YAML juggling. It's mind-numbing that everybody is okay with this; this really is our domain's assembly era, albeit with whitespace, colons, dashes and brackets.

I've found solace in CUE, which I just run locally to catch all the small errors everybody makes on a daily basis. Putting the CUE validation in our pipeline is too confronting for others, yet they're constantly making up best practices ad hoc during reviews which could've easily been codified with CUE (or some other serious config language).

By @Atreiden - 5 months
Great writeup on the core fundamentals, saved this to share with engineers who are new to k8s and need a quick primer.

Re: This piece -

> Given the Controller pattern, why isn't there support for "Cloud Native" architectures?

> I would like to have a ReplicaSet which scales the replicas based on some simple calculation for queue depth (eg: queue depth / 16 = # replicas)

> Defining interfaces for these types of events (queue depth, open connections, response latency) would be great

> Basically, Horizontal Pod Autoscaler but with sensors which are not just "CPU"

HPAs are actually still what you want here - you can configure HPAs to scale automatically based on custom metrics. If you run Prometheus (or a similar collector), you can define the metric you want (e.g. queue-depth) and the autoscaler will make scaling decisions with these in mind.
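As a rough sketch of what that can look like with the autoscaling/v2 API (assuming a metrics adapter such as prometheus-adapter exposes a queue_depth metric; the names and numbers here are placeholders):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: worker-hpa              # hypothetical name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: worker                # hypothetical Deployment to scale
      minReplicas: 1
      maxReplicas: 32
      metrics:
        - type: External            # served by the metrics adapter, not built into Kubernetes
          external:
            metric:
              name: queue_depth     # placeholder metric name
            target:
              type: AverageValue
              averageValue: "16"    # roughly "replicas = queue depth / 16"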

Resources:

https://kubernetes.io/docs/tasks/run-application/horizontal-...

https://learnk8s.io/autoscaling-apps-kubernetes

By @AcerbicZero - 5 months
This was a solid write-up; I've been using K8s (intermittently) for like 5 years now, and I still spend an inordinate amount of time looking things up and trying to convert the nonsense naming conventions into something understandable. I can think of 20 or so projects that would have run great on K8s, and 0 projects that actually ran on K8s and worked well.

Eventually, after seeing the wrong tool used for the wrong job time and time again, I came around to seeing K8s as the latest iteration of time-sharing on a mainframe, but this time with YAML and lots of extra steps.

By @cyberax - 5 months
My problem with K8s: the network abstraction layer just feels _wrong_.

It's an attempt to replicate the old "hard exterior, gooey interior" model of corporate networks.

I would very much prefer if K8s used public routable IPv6 for traffic delivery, and then simply provided an authenticated overlay on top of it.

By @jauntywundrkind - 5 months
> Why are the storage and networking implementations "out of tree" (CNI / CSI)? Given the above question, why is there explicit support for Cloud providers? eg: LoadBalancer supports AWS/GCP/Azure/..

Kubernetes has been pruning out vendor-specific code for a while now, moving it out of tree. The upcoming 1.31 release will drop a lot of existing, already deprecated support for AWS & others from Kubernetes proper. https://github.com/kubernetes/enhancements/blob/master/keps/...

There's some plan to make this non-disruptive to users, but I haven't followed it closely (I don't use these providers anyhow).

> Why are we generating a structured language (YAML), with a computer, by manually adding spaces to make the syntax valid? There should be no intermediate text-template representation like this one.

Helm is indeed a wild world. It's also worth noting that Kubernetes is pushing towards neutrality here too; Helm has never been an official tool, but Kustomize is built into kubectl & is being removed. https://github.com/orgs/kubernetes/projects/183/views/1?filt...
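For what it's worth, Kustomize stays in structured YAML instead of templating text; a minimal sketch (file names, namespace, and labels are made up):

    # kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - deployment.yaml            # plain, untemplated manifests
      - service.yaml
    namespace: staging
    images:
      - name: example/app          # override the image tag without editing the base manifest
        newTag: "1.2.3"
    commonLabels:
      app.kubernetes.io/part-of: example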

There's a variety of smart, awesome options out there. The first place I worked at that went to kube used jsonnet (which alas went unmaintained). Folks love CUE and Dhall and others. But to my knowledge there's no massive base of packaged software like what exists for Helm. Two examples: https://github.com/bitnami/charts/tree/main/bitnami https://github.com/onedr0p/home-ops . It'd be lovely to see more work outside Helm.

Thanks sysdig for your 1.31 write up, https://sysdig.com/blog/whats-new-kubernetes-1-31/

By @cybrexalpha - 5 months
> Why are we generating a structured language (YAML), with a computer, by manually adding spaces to make the syntax valid?

Yep, it sucks. It's not like nobody has tried to do better, but nothing else has the adoption of Helm. Ultimately text is, as always, universal.

If you want a fun fact: the communication between kubectl and the kube-api-server is actually in JSON, not YAML.
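To make that concrete, here is the same (made-up) object in the YAML you would write and the JSON it travels as; kubectl accepts either form with apply -f:

    # The familiar YAML form:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: example-config         # hypothetical object
    data:
      greeting: hello

    # The same object on the wire (JSON, which is also valid YAML):
    {"apiVersion": "v1", "kind": "ConfigMap",
     "metadata": {"name": "example-config"},
     "data": {"greeting": "hello"}}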

By @sigwinch28 - 5 months
If we replace “YAML” with “JSON” and then talk about naive text-based templating, it seems wild.

That’s because it is. Then we go back to YAML and add whitespace sensitivity and suddenly it’s the state-of-the-art for declaring infrastructure.

By @JojoFatsani - 5 months
The Helm YAML thing really is annoying. Unfortunately it feels like Helm is too firmly embedded to unseat at this point.

By @politelemon - 5 months
This was a useful read and somewhat gels with my experiences.

Looking at the statement at the beginning

> and it only requires you to package your workload as a Docker image, which seems like a reasonable price to pay.

it no longer holds true as you continue down the path, since Kubernetes actually requires you to do a lot more than you'd think.

By @dilyevsky - 5 months
On the last point ("Stringy Types") - k8s API types are actually defined as Protobufs[0] so they have strictly defined schemas. There are some types that are sum types (IntOrString) but generally no type depends on some other field's value afaik. Ofc that doesn't stop CRD developers from making everything a String type and then interpolating server-side (within their custom controller) based on phases of the moon and what not...

[0] - e.g https://github.com/kubernetes/api/blob/master/core/v1/genera...
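IntOrString is the one most people meet day to day, e.g. a Service's targetPort, which takes either a port number or a named container port (a small sketch with made-up names):

    apiVersion: v1
    kind: Service
    metadata:
      name: example-svc
    spec:
      selector:
        app: example
      ports:
        - port: 80
          targetPort: 8080         # IntOrString as an integer
        # - port: 443
        #   targetPort: https      # ...or as a named container port (string)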

By @azaras - 5 months
I agree with:

> My opinion is that a large part of Kubernetes' value is derived from just two concepts

I agree with the first one, "control loops," but the second one is "API-based," not "Services."

By @FridgeSeal - 5 months
Solid write up, but small nitpick with the diagram at the start:

It displays a pod containing multiple containers (this is fine and normal) but then labels some of those containers as different services.

Unless you guys are running wildly different setups, or we're talking about sidecar and init containers (which is a whole rabbit hole unto itself), I put different services in different pods.
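For reference, the legitimate multi-container case usually looks like one real service plus helpers, not several independent services sharing a pod (a hedged sketch; names and images are made up):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app
    spec:
      containers:
        - name: app                # the actual service
          image: example/app:1.0
        - name: log-shipper        # sidecar: ships the app's logs, not a separate service
          image: example/log-shipper:2.3
      # a second, independent service would normally live in its own Pod (or Deployment)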

By @mark242 - 5 months
Here's what I would like from a Kubernetes-like system.

I have a collection of machines.

I have a very simple file that defines A) what I want to run, B) how the pieces communicate to each other, and C) how I want it to scale.

Make that work without me having to think about any of the underlying infrastructure. (If this sounds suspiciously similar to Heroku, there you go)

By @jongjong - 5 months
Kubernetes provides a relatively simple abstraction to represent a cluster of machines as a single entity. It's just about as simple as it can be IMO.

Its simplicity leverages the idea that each part of your software should be fully responsible for its own lifecycle and can handle and recover from all scenarios that can impact it.

For example, if your app service happens to launch before your database service, it should be able to handle this situation seamlessly and keep trying to reconnect until the database service is up. This characteristic is generally desirable in any operating environment, but it is critical for Kubernetes. Its orchestration model is declarative, so it doesn't make much fuss over launch order... Yet it works well and simply if each of your services has this failure tolerance and self-healing characteristic.
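A hedged sketch of how that tolerance is usually surfaced to Kubernetes: probes, so traffic is withheld (or the container restarted) until the app reports that its dependencies, such as the database, are reachable. All names and timings below are placeholders:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: example-app
      template:
        metadata:
          labels:
            app: example-app
        spec:
          containers:
            - name: app
              image: example/app:1.0     # placeholder image
              readinessProbe:            # no traffic until the app says it can reach its DB
                httpGet:
                  path: /healthz
                  port: 8080
                initialDelaySeconds: 5
                periodSeconds: 10
              livenessProbe:             # restart the container if it wedges
                httpGet:
                  path: /healthz
                  port: 8080
                periodSeconds: 30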

By @INTPenis - 5 months
Reminds me of myself, and probably many others on here. I've always been a skeptic of novelty. Just like with cryptocoins, I was there on the ground floor: I remember when Docker was launching. I remember trying to understand what it was, and I remember someone comparing it to MSI for Linux.

Unlike cryptocoins I didn't miss out on being a millionaire with containers. :(

I just avoided containers until 2019! So to me it was first containers, and then kubernetes.

That way I was already sold on the whole container image concept and namespaces in your OS; I had used FreeBSD jails back in the early 2000s.

So when I understood k8s I realized it's literally just a container orchestrator. It might seem complicated but that's all to do with being able to run containers on a range of nodes instead of just one. And of course having an API to tie it all together.

Whether your project needs that or not is something you should definitely explore in depth before you set out and try to use it. Personally I prefer container hosts over k8s for most startup projects. I look forward to Talos' new podman package and being able to deploy a minimal podman container host with Talos, no k8s necessary.

By @jgalt212 - 5 months
What's the break-even number of machines you must manage before Kubernetes starts to make sense?

By @actionfromafar - 5 months
Why can't we use Meson? YAML is basically used as a bizarre build system anyway.

By @hosh - 5 months
This is a good start. But it misses an important aspect of Kubernetes that is often overlooked: extensibility. Each and every piece of Kubernetes can be swapped out for something else, including the scheduler. Custom Resource Definitions support these extensions and operators.

For example, there is no built-in autoscaler for nodes, but someone wrote one and you can add one in there. It uses a constraint solver for determining whether to expand or shrink node groups. If you want to use something else, you can find something or write it and install it.

Another example, you don't have to use kube-proxy. There are other ways to manage inter-node networking.

To address some of the questions:

> Why does it not matter if the state is unstable? If I'm operating a cluster that can't settle, I'd like to know immediately!

I'd point to the Cynefin framework to help make sense of this. Kubernetes is a tool that helps manage things in the complex domain, rather than the complicated domain. Unlike complicated systems, complex systems and complex adaptive systems may never reach a defined settled state.

> Basically, Horizontal Pod Autoscaler but with sensors which are not just "CPU"

That's already available. In addition to scaling on built-in metrics such as CPU and memory, there are ways to create custom metrics, including queue depth. You can do this because Kubernetes is extensible.

> Why are the storage and networking implementations "out of tree" (CNI / CSI)?

It used to be in-tree, until the number of third-party cloud and in-house storage and networking providers became unwieldy. This goes along with that fundamental characteristic of the Kubernetes design -- extensibility. AWS owns the Elastic Block Store CSI driver, and GCP owns the CSI driver for its storage devices. CNI allowed for the various service meshes, including exciting new ones such as the one based on eBPF.

The Cynefin framework again helps shed some light on this: the best way to respond to things in the complex domain is to try a lot of different approaches. Even if one approach doesn't look like it works now, some future state may make that previously impractical approach work well.

> Given the above question, why is there explicit support for Cloud providers?

Current versions of Kubernetes push those implementations out to CNI drivers and operators. So, for example, in order to make use of an AWS ELB/ALB for the Ingress object, you have to additionally install the AWS-specific controller. If you are using AWS's EKS service, this is managed as an EKS add-on. Under the hood, these drivers are, guess what, Pods managed by ReplicaSets managed by Deployments that listen to the Kubernetes API server for changes to Ingress resources.
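If memory serves, wiring that up looks roughly like this once the AWS Load Balancer Controller add-on is installed (the class name and annotation come from that controller; the host and service names are made up):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-ingress
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing   # interpreted by the AWS controller, not by Kubernetes itself
    spec:
      ingressClassName: alb           # routes this object to the AWS-specific controller
      rules:
        - host: app.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: example-svc
                    port:
                      number: 80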

Not everyone uses it. On one of the Kubernetes sites I worked on, we used Traefik inside Kubernetes and its custom resource definition, IngressRoute. Every time you create an Ingress, the AWS driver will create a completely new load balancer, which of course drives up cost for very little gain.

By @skywhopper - 5 months
Why is he mad at Kubernetes about Helm? Yes, Helm is a mess. Yes, lots of people use Helm to deploy things to Kubernetes, but critiques of Helm are not valid critiques of Kubernetes.

By @mcorbin - 5 months
Nice article ;)

> The number of rules may be large, which may be problematic for certain traffic forwarding implementations (iptables linearly evaluates every rule)

kube-proxy also supports IPVS out of the box, and some CNIs (like Cilium) can even replace kube-proxy entirely and rely on eBPF.
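Switching modes is a small change in the KubeProxyConfiguration, something like the following sketch (other fields left at their defaults):

    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"            # instead of the default iptables mode
    ipvs:
      scheduler: "rr"       # round robin; IPVS offers several schedulers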

> When a Pod is moved to another node, the traffic will be forwarded twice until the old DNS entry expires

Not sure I understand this one. On a standard setup, what happens is:

- Pod A is running on a node, receiving traffic from a service

- The pod is stopped by kubelet (which sends it a SIGTERM).

- The pod should shut down gracefully. During the shutdown phase, only _existing_ connections are forwarded to the stopping pod; new ones are already being forwarded elsewhere.

- If the pod stops before the terminationGracePeriodSeconds duration (default 30s), everything is fine. Otherwise, the pod is killed by kubelet. So it's up to developers to make sure pods handle signals correctly (see the sketch below).
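A rough sketch of the knobs involved, as a pod template fragment (the image, hook, and timings are placeholders):

    # Pod template fragment (e.g. inside a Deployment)
    spec:
      terminationGracePeriodSeconds: 45   # default is 30s; must cover your drain time
      containers:
        - name: app
          image: example/app:1.0          # placeholder; the app itself must handle SIGTERM
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]   # small pause so endpoint removal can propagate first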

"Misbehaving clients (eg: ones that do not re-resolve DNS before reconnecting) will continue to work" => the services IP is stable so clients don't need to re-resolve.

> Why does it not matter if the state is unstable? If I'm operating a cluster that can't settle, I'd like to know immediately!

Kubernetes exposes a lot of metrics on the control plane components and kubelet, usually in the Prometheus format. Look for example at the metrics exposed by kube-state-metrics: https://github.com/kubernetes/kube-state-metrics/tree/main/d...

With controller metrics plus kube-state-metrics covering most Kubernetes resources, you can easily build alerts for when a resource fails to reconcile (see the sketch below).
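For example, a Prometheus alerting rule on top of kube-state-metrics can flag a Deployment that never settles (the metric names come from kube-state-metrics; the duration and severity are arbitrary):

    groups:
      - name: kubernetes-reconciliation
        rules:
          - alert: DeploymentNotSettled
            expr: |
              kube_deployment_spec_replicas
                != kube_deployment_status_replicas_available
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has not reached its desired replica count for 15m"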

> Basically, Horizontal Pod Autoscaler but with sensors which are not just "CPU"

Take a look at KEDA, it's exactly this: https://keda.sh/ It "extends" the autoscaler's capabilities. If you're running Prometheus, you can for example scale on any metric stored in Prometheus (and thus exposed by your application/infrastructure components: queue depth, latency, request rate...).
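A minimal ScaledObject sketch using KEDA's Prometheus scaler, assuming a queue_depth metric exists (names, address, and threshold are placeholders):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: worker-scaler
    spec:
      scaleTargetRef:
        name: worker                 # the Deployment to scale
      minReplicaCount: 1
      maxReplicaCount: 32
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://prometheus.monitoring.svc:9090
            query: sum(queue_depth{queue="jobs"})
            threshold: "16"          # target value per replica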

Kubernetes was built to be extended like this. Same for your question "Why are the storage and networking implementations "out of tree" (CNI / CSI)?" - in my experience, support is very good today across various cloud providers and on-premise infra components. Look at Karpenter for example; it's IMO a revolution in the Kubernetes node management world.