KUBERNETES ARCHITECTURE Flashcards
Components of Kubernetes
In its simplest form, Kubernetes is made of a central manager (aka master) and some worker nodes (we will see in a follow-on chapter how you can actually run everything on a single node for testing purposes). The manager runs an API server, a scheduler, various controllers and a storage system to keep the state of the cluster, container settings, and the networking configuration.
Kubernetes exposes an API (via the API server): you can communicate with the API using a local client called kubectl or you can write your own client. The kube-scheduler sees the requests for running containers coming to the API and finds a suitable node to run that container in. Each node in the cluster runs two processes: a kubelet and a kube-proxy. The kubelet receives requests to run the containers, manages any necessary resources and watches over them on the local node. The kube-proxy creates and manages networking rules to expose the container on the network.
Pod
We have learned that Kubernetes is an orchestration system to deploy and manage containers. Containers are not managed individually; instead, they are part of a larger object called a Pod. A Pod consists of one or more containers which share an IP address, access to storage, and namespaces. Typically, one container in a Pod runs an application, while other containers support the primary application.
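As an illustration (names and images here are hypothetical), a minimal Pod manifest with a primary application container and a supporting sidecar might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # hypothetical name
spec:
  containers:
  - name: app                # primary application container
    image: nginx:1.25
  - name: helper             # supporting container; shares the Pod's IP and volumes
    image: busybox:1.36
    command: ["sh", "-c", "sleep infinity"]
```

Both containers can reach each other over localhost, since they share the Pod's network namespace.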
Orchestration
Orchestration is managed through a series of watch-loops, also known as operators or controllers. Each controller interrogates the kube-apiserver for a particular object state, modifying the object until the current state matches the declared state. The default, newest, and most feature-filled controller for containers is a Deployment. A Deployment ensures that resources are available, such as an IP address and storage, and then deploys a ReplicaSet. The ReplicaSet is a controller which deploys and restarts Pods until the requested number of replicas is running. Previously, this function was handled by the ReplicationController, which has been superseded by Deployments. There are also Jobs and CronJobs to handle single or recurring tasks.
Deployment
The default, newest, and feature-filled controller for containers is a Deployment. A Deployment ensures that resources are available, such as IP address and storage, and then deploys a ReplicaSet.
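A sketch of a Deployment manifest (names are hypothetical); the ReplicaSet it creates keeps the requested number of Pods running:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deploy       # hypothetical name
spec:
  replicas: 3                # the generated ReplicaSet maintains 3 Pods
  selector:
    matchLabels:
      app: example
  template:                  # Pod template stamped out by the ReplicaSet
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: web
        image: nginx:1.25
```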
ReplicaSet
The ReplicaSet is a controller which deploys and restarts Pods until the requested number of replicas is running. Previously, this function was handled by the ReplicationController, which has been superseded by Deployments.
Jobs and CronJobs
There are also Jobs and CronJobs to handle single or recurring tasks.
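For example, a CronJob for a recurring task might be sketched as follows (the name is hypothetical, and the API version varies with the cluster release: `batch/v1` on current clusters, `batch/v1beta1` on older ones):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron         # hypothetical name
spec:
  schedule: "*/5 * * * *"    # standard cron syntax: run every five minutes
  jobTemplate:               # each run creates a Job from this template
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: task
            image: busybox:1.36
            command: ["date"]
```

A one-off task would use a `Job` object instead, with the same inner Pod template.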
DaemonSet
A DaemonSet will ensure that a single pod is deployed on every node. These are often used for logging and metrics pods.
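A minimal DaemonSet sketch (hypothetical names; a real logging agent image would replace the stand-in):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-logger          # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-logger
  template:                  # one Pod from this template runs on every node
    metadata:
      labels:
        app: node-logger
    spec:
      containers:
      - name: agent
        image: busybox:1.36  # stand-in; a logging or metrics agent would go here
        command: ["sh", "-c", "sleep infinity"]
```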
StatefulSet
A StatefulSet can be used to deploy Pods with stable identities and in a particular order, such that subsequent Pods are only deployed if the previous Pods report a Ready status.
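A StatefulSet sketch (hypothetical names): Pods are created in order as example-db-0, example-db-1, example-db-2, each with a stable DNS name via the headless Service:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db           # hypothetical name
spec:
  serviceName: example-db    # headless Service providing stable per-Pod DNS names
  replicas: 3                # Pods start in order; each waits for the previous to be Ready
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:15
```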
Labels
Managing thousands of Pods across hundreds of nodes can be a difficult task. To make management easier, we can use labels, arbitrary strings which become part of the object metadata. These can then be used when checking or changing the state of objects without having to know individual names or UIDs. Nodes can have taints to discourage Pod assignment, unless the Pod has a matching toleration in its metadata.
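For instance, labels attached in metadata (hypothetical names and values) let you act on groups of objects by selector rather than by name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-1
  labels:                    # arbitrary key/value strings used for selection
    app: web
    tier: frontend
spec:
  containers:
  - name: web
    image: nginx:1.25
# Select by label instead of name, e.g.:
#   kubectl get pods -l app=web
```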
metadata for annotations
There is also space in metadata for annotations, which remain with the object but cannot be used to select objects with Kubernetes commands. This information could be used by third-party agents or other tools.
multi-tenancy
While using lots of smaller, commodity hardware could allow every user to have their very own cluster, often multiple users and teams share access to one or more clusters. This is referred to as multi-tenancy.
namespace
A segregation of resources, upon which resource quotas and permissions can be applied. Kubernetes objects may be created in a namespace or be cluster-scoped. Users can be limited by the object verbs allowed per namespace. Also, the LimitRange admission controller constrains resource usage in that namespace. Two objects of the same kind cannot have the same name in the same namespace.
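A sketch of a namespace with a quota applied to it (names and values are hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev                  # hypothetical namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev             # the quota only applies inside this namespace
spec:
  hard:
    pods: "10"               # at most 10 Pods in the namespace
    requests.cpu: "4"        # total CPU requests capped at 4 cores
    requests.memory: 8Gi     # total memory requests capped at 8 GiB
```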
context
A combination of user, cluster name, and namespace. A convenient way to switch between combinations of permissions and restrictions. For example, you may have a development cluster and a production cluster, or may be part of both the operations and architecture namespaces. This information is referenced from ~/.kube/config.
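A context entry inside ~/.kube/config ties the three together; a hypothetical fragment (cluster, user, and namespace names are illustrative):

```yaml
# Fragment of ~/.kube/config
contexts:
- name: dev-context          # hypothetical context name
  context:
    cluster: development     # must match an entry under clusters:
    namespace: architecture  # default namespace for this context
    user: alice              # must match an entry under users:
current-context: dev-context # the context kubectl uses by default
```

Switching is then a single command, e.g. `kubectl config use-context dev-context`.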
Resource Limits
A way to limit the amount of resources consumed by a Pod, or to request a minimum amount of resources reserved, but not necessarily consumed, by a Pod. Limits can also be set per namespace, in which case they have priority over those in the PodSpec.
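In a PodSpec, requests and limits are set per container (values here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:              # reserved for scheduling, not necessarily consumed
        cpu: 250m            # a quarter of a CPU core
        memory: 64Mi
      limits:                # hard ceiling enforced on the running container
        cpu: 500m
        memory: 128Mi
```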
Pod Security Policies
A policy to limit the ability of pods to elevate permissions or modify the node upon which they are scheduled. This wide-ranging limitation may prevent a pod from operating properly. The use of PSPs may be replaced by Open Policy Agent in the future.
Network Policies
The ability to have an inside-the-cluster firewall. Ingress and Egress traffic can be limited according to namespaces and labels as well as typical network traffic characteristics.
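A sketch of a NetworkPolicy restricting ingress by label (names, labels, and the port are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend       # hypothetical name
  namespace: dev
spec:
  podSelector:               # the Pods this policy protects
    matchLabels:
      app: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web           # only Pods labeled app=web may reach the db Pods
    ports:
    - protocol: TCP
      port: 5432
```

Note that enforcement depends on the cluster's network plugin supporting NetworkPolicy.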
Master Node
The Kubernetes master runs various server and manager processes for the cluster. Among the components of the master node are the kube-apiserver, the kube-scheduler, and the etcd database. As the software has matured, new components have been created to handle dedicated needs, such as the cloud-controller-manager, which handles tasks once handled by the kube-controller-manager, interacting with third-party tools such as Rancher or DigitalOcean for cluster management and reporting.
There are several add-ons which have become essential to a typical production cluster, such as DNS services. Others are third-party solutions where Kubernetes has not yet developed a local component, such as cluster-level logging and resource monitoring.
Master Node Components
Master node specific components:
- kube-apiserver
- kube-scheduler
- etcd Database
- Other Agents

Common node components:
- kubelet
- kube-proxy
- Container Runtime
kube-apiserver
The kube-apiserver is central to the operation of the Kubernetes cluster.
All calls, both internal and external, are handled via this agent. All actions are accepted and validated by this agent, and it is the only agent which connects to the etcd database. As a result, it acts as a master process for the entire cluster, and as the frontend of the cluster's shared state. Each API call goes through three steps: authentication, authorization, and several admission controllers.
kube-scheduler
The kube-scheduler uses an algorithm to determine which node will host a Pod of containers. The scheduler will try to view available resources (such as volumes) to bind, and then try and retry to deploy the Pod based on availability and success.
There are several ways you can affect the algorithm, or a custom scheduler could be used instead. You can also bind a Pod to a particular node, though the Pod may remain in a pending state due to other settings.
One of the first settings checked is whether the Pod can be deployed within the current quota restrictions. If so, the taints, tolerations, and labels of the Pod are used along with those of the nodes to determine the proper placement.
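For example, a node tainted with `kubectl taint nodes node1 dedicated=infra:NoSchedule` (a hypothetical key and value) repels Pods unless they carry a matching toleration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod         # hypothetical name
spec:
  tolerations:               # matches the taint dedicated=infra:NoSchedule
  - key: dedicated
    operator: Equal
    value: infra
    effect: NoSchedule
  containers:
  - name: app
    image: nginx:1.25
```

A toleration only permits scheduling onto the tainted node; to require it, the Pod would also need a nodeSelector or node affinity.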
etcd Database
The state of the cluster, networking, and other persistent information is kept in an etcd database, or, more accurately, a b+tree key-value store. Rather than finding and changing an entry, values are always appended to the end. Previous copies of the data are then marked for future removal by a compaction process. It works with curl and other HTTP libraries, and provides reliable watch queries.
Simultaneous requests to update a value all travel via the kube-apiserver, which passes the requests along to etcd in series. The first request updates the database. The second request no longer has the same version number, so the kube-apiserver replies with a 409 error to the requester. There is no logic past that response on the server side, meaning the client needs to expect this and act upon the denial to update.
There is a leader database instance along with possible followers. They communicate with each other on an ongoing basis to elect a leader, and to choose a new one in the event of failure. While very fast and potentially durable, there have been some hiccups with features like whole cluster upgrades. Starting with v1.15.1, kubeadm allows easy deployment of a multi-master cluster with stacked etcd or an external database cluster.
Other Agents
The kube-controller-manager is a core control loop daemon which interacts with the kube-apiserver to determine the state of the cluster. If the state does not match, the manager will contact the necessary controller to match the desired state. There are several controllers in use, such as endpoints, namespace, and replication. The full list has expanded as Kubernetes has matured.
Remaining in beta as of v1.16, the cloud-controller-manager interacts with agents outside the cluster, such as the underlying cloud provider. It handles tasks once handled by the kube-controller-manager. This allows faster changes without altering the core Kubernetes control process. Each kubelet must be started with the --cloud-provider=external flag.
Worker node
All worker nodes run the kubelet and kube-proxy, as well as the container engine, such as Docker or CRI-O. Other management daemons are deployed to watch these agents or provide services not yet included with Kubernetes.
The kubelet interacts with the underlying Docker Engine also installed on all the nodes, and makes sure that the containers that need to run are actually running. The kube-proxy is in charge of managing the network connectivity to the containers. It does so through the use of iptables entries. It also has the userspace mode, in which it monitors Services and Endpoints using a random high-number port to proxy traffic. Use of ipvs can be enabled, with the expectation it will become the default, replacing iptables.
Kubernetes does not have cluster-wide logging yet. Instead, another CNCF project is used, called Fluentd. When implemented, it provides a unified logging layer for the cluster, which filters, buffers, and routes messages.
Cluster-wide metrics is not quite fully mature, so Prometheus is also often deployed to gather metrics from nodes and perhaps some applications.
kubelet
The kubelet interacts with the underlying Docker Engine also installed on all the nodes, and makes sure that the containers that need to run are actually running.
It is the heavy lifter for changes and configuration on worker nodes. It accepts the API calls for Pod specifications (a PodSpec is a JSON or YAML file that describes a Pod). It will work to configure the local node until the specification has been met.
Should a Pod require access to storage, Secrets or ConfigMaps, the kubelet will ensure access or creation. It also sends back status to the kube-apiserver for eventual persistence.