Kubernetes Volume Management
Introduction
- In today’s business model, data is the most precious asset for many startups and enterprises.
- In a Kubernetes cluster, containers in Pods can be either data producers, data consumers, or both.
- While some container data is expected to be transient and is not expected to outlive a Pod, other forms of data must outlive the Pod in order to be aggregated and possibly loaded into analytics engines.
- Kubernetes must provide storage resources in order to provide data to be consumed by containers or to store data produced by containers.
- Kubernetes uses Volumes of several types and a few other forms of storage resources for container data management.
- In this chapter, we will talk about PersistentVolume and PersistentVolumeClaim objects, which help us attach persistent storage Volumes to Pods.
Volumes
- As we know, containers running in Pods are ephemeral in nature.
- All data stored inside a container is lost if the container crashes: the kubelet restarts it with a clean slate, so the new container has none of the old data.
- To overcome this problem, Kubernetes uses Volumes, storage abstractions that allow various storage technologies to be used by Kubernetes and offered to containers in Pods as storage media.
- A Volume is essentially a mount point on the container’s file system backed by a storage medium. The storage medium, content and access mode are determined by the Volume Type.
- In Kubernetes, a Volume is linked to a Pod and can be shared among the containers of that Pod.
- Although the Volume has the same life span as the Pod, meaning that it is deleted together with the Pod, the Volume outlives the containers of the Pod - this allows data to be preserved across container restarts.
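To make the sharing behavior concrete, here is a minimal sketch of a Pod whose two containers mount the same Volume. It assumes the emptyDir Volume Type (described in the next section); the Pod, container, and Volume names, the busybox image, and the mount paths are illustrative choices, not taken from the course material.

```yaml
# Hypothetical Pod: one Volume, defined at the Pod level, mounted by two containers.
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo          # hypothetical name
spec:
  volumes:
  - name: scratch                   # the Volume is linked to the Pod, not to one container
    emptyDir: {}
  containers:
  - name: writer
    image: busybox
    # Appends a timestamp to a file on the shared Volume every 5 seconds.
    command: ["/bin/sh", "-c", "while true; do date >> /data/log.txt; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox
    # Prints the most recent line written by the other container.
    command: ["/bin/sh", "-c", "while true; do tail -n 1 /logs/log.txt; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /logs              # same Volume, different mount point
      readOnly: true
```

If either container crashes and is restarted, the file on the Volume is still there; it disappears only when the Pod itself is deleted.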
Volume Types
A directory which is mounted inside a Pod is backed by the underlying Volume Type. A Volume Type decides the properties of the directory, like size, content, default access modes, etc. Some examples of Volume Types are:
- emptyDir
An empty Volume is created for the Pod as soon as it is scheduled on the worker node. The Volume’s life is tightly coupled with the Pod. If the Pod is terminated, the content of emptyDir is deleted forever.
- hostPath
With the hostPath Volume Type, we can share a directory between the host and the Pod. If the Pod is terminated, the content of the Volume is still available on the host.
- gcePersistentDisk
With the gcePersistentDisk Volume Type, we can mount a Google Compute Engine (GCE) persistent disk into a Pod.
- awsElasticBlockStore
With the awsElasticBlockStore Volume Type, we can mount an AWS EBS Volume into a Pod.
- azureDisk
With azureDisk we can mount a Microsoft Azure Data Disk into a Pod.
- azureFile
With azureFile we can mount a Microsoft Azure File Volume into a Pod.
- cephfs
With cephfs, an existing CephFS volume can be mounted into a Pod. When a Pod terminates, the volume is unmounted and the contents of the volume are preserved.
- nfs
With nfs, we can mount an NFS share into a Pod.
- iscsi
With iscsi, we can mount an existing iSCSI (SCSI over IP) volume into a Pod.
- secret
With the secret Volume Type, we can pass sensitive information, such as passwords, to Pods.
- configMap
With configMap objects, we can provide configuration data, or shell commands and arguments, to a Pod.
- persistentVolumeClaim
We can attach a PersistentVolume to a Pod using a persistentVolumeClaim.
You can learn more details about Volume Types from the documentation.
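As an illustration of how a Volume Type is selected in a Pod spec, the sketch below mounts a configMap Volume and a secret Volume into one container. The ConfigMap name app-settings and the Secret name app-secret are hypothetical and would have to exist in the same namespace before the Pod can start; the mount paths are likewise arbitrary.

```yaml
# Hypothetical Pod showing two Volume Types side by side.
apiVersion: v1
kind: Pod
metadata:
  name: config-volume-demo          # hypothetical name
spec:
  volumes:
  - name: app-config
    configMap:
      name: app-settings            # assumed, pre-existing ConfigMap
  - name: app-credentials
    secret:
      secretName: app-secret        # assumed, pre-existing Secret
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: app-config
      mountPath: /etc/app/config    # each ConfigMap key appears as a file here
      readOnly: true
    - name: app-credentials
      mountPath: /etc/app/credentials
      readOnly: true
```

Switching to another Volume Type changes only the entry under volumes; the volumeMounts section of the container stays the same.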
PersistentVolumes
- In a typical IT environment, storage is managed by the storage/system administrators. The end user will just receive instructions to use the storage but is not involved with the underlying storage management.
- In the containerized world, we would like to follow similar rules, but it becomes challenging, given the many Volume Types we have seen earlier.
- Kubernetes resolves this problem with the PersistentVolume (PV) subsystem, which provides APIs for users and administrators to manage and consume persistent storage.
- To manage the Volume, it uses the PersistentVolume API resource type, and to consume it, it uses the PersistentVolumeClaim API resource type.
- A PersistentVolume is a storage abstraction backed by one of several storage technologies, which could be local to the host where the Pod is deployed with its application container(s), network attached storage, cloud storage, or a distributed storage solution. A PersistentVolume can be statically provisioned by the cluster administrator.
- PersistentVolumes can be dynamically provisioned based on the StorageClass resource. A StorageClass contains pre-defined provisioners and parameters to create a PersistentVolume.
- Using PersistentVolumeClaims, a user sends the request for dynamic PV creation, which gets wired to the StorageClass resource.
- Some of the Volume Types that support managing storage using PersistentVolumes are:
- GCEPersistentDisk
- AWSElasticBlockStore
- AzureFile
- AzureDisk
- CephFS
- NFS
- iSCSI
- For a complete list, as well as more details, you can check out the types of PersistentVolumes in the Kubernetes documentation.
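The two provisioning paths described above can be sketched as follows. The first object is a statically provisioned PersistentVolume backed by NFS; the second is a StorageClass that a dynamic provisioner would use. All names, the NFS server address and export path, the capacity, the provisioner string, and the parameters entry are placeholders chosen for illustration; a real cluster would use the values of its own storage backend.

```yaml
# Static provisioning: an administrator creates the PersistentVolume by hand.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-example              # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual          # hypothetical class name used only for matching claims
  nfs:
    server: nfs.example.com         # placeholder NFS server
    path: /exports/data             # placeholder export path
---
# Dynamic provisioning: a StorageClass names a provisioner and its parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-example                # hypothetical name
provisioner: example.com/hypothetical-provisioner   # placeholder, not a real driver
parameters:
  type: ssd                         # provisioner-specific; shown only as an example
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```

With the StorageClass in place, a PersistentVolume is created on demand whenever a claim refers to that class, so the administrator no longer has to pre-create each volume.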
PersistentVolumeClaims
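- A PersistentVolumeClaim (PVC) is a request for storage by a user. Users request PersistentVolume resources based on storage class, access mode, and size. The access modes include:
- ReadWriteOnce (read-write by a single node)
- ReadOnlyMany (read-only by many nodes)
- ReadWriteMany (read-write by many nodes)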
- Once a suitable PersistentVolume is found, it is bound to a PersistentVolumeClaim.
- After a successful bind, the PersistentVolumeClaim resource can be used by the containers of the Pod.
- Once a user finishes their work, the attached PersistentVolumes can be released.
- The underlying PersistentVolumes can then be retained (the Retain policy, which lets an admin verify and/or aggregate data before reclaiming the volume manually), deleted (both data and volume are deleted), or recycled for future usage (only data is deleted), based on the configured persistentVolumeReclaimPolicy property.
- To learn more, you can check out the PersistentVolumeClaims documentation.
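The claim-and-consume flow can be sketched with two objects: a PersistentVolumeClaim that requests storage, and a Pod that mounts the claim through the persistentVolumeClaim Volume Type. The names, the requested size, and the mount path below are assumptions made for the example.

```yaml
# A claim for 1Gi of storage, writable from a single node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                  # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName is omitted here, so the cluster's default StorageClass
  # (if one is configured) is used; set it explicitly to target a specific class.
---
# A Pod that consumes the bound PersistentVolume through the claim.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer                # hypothetical name
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim         # must match the claim's metadata.name
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
```

The Pod refers only to the claim, never to the PersistentVolume itself, which keeps the application manifest independent of the underlying storage technology.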
Container Storage Interface (CSI)
- Container orchestrators like Kubernetes, Mesos, Docker or Cloud Foundry used to have their own methods of managing external storage using Volumes.
- For storage vendors, it was challenging to manage different Volume plugins for different orchestrators.
- Storage vendors and community members from different orchestrators started working together to standardize the Volume interface: a volume plugin built against the standardized Container Storage Interface (CSI) is designed to work on different container orchestrators.
- Explore the CSI specifications for more details.
Using a Shared hostPath Volume Type Demo Guide
This exercise guide was prepared for the video demonstration available at the end of this chapter. It includes a few commands and the sample files presented in the video.
$ vim app-blue-shared-vol.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: blue-app
  name: blue-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blue-app
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: blue-app
        type: canary
    spec:
      volumes:
      - name: host-volume
        hostPath:
          path: /home/docker/blue-shared-volume
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: host-volume
      - image: debian
        name: debian
        volumeMounts:
        - mountPath: /host-vol
          name: host-volume
        command: ["/bin/sh", "-c", "echo Welcome to BLUE App! > /host-vol/index.html ; sleep infinity"]
status: {}
Using a Shared hostPath Volume Type (Demo)
Learning Objectives (Review)
By the end of this chapter, you should be able to:
- Explain the need for persistent data management.
- Compare Kubernetes Volume types.
- Discuss PersistentVolumes and PersistentVolumeClaims.