Kubernetes - Volumes - Part-1

emptyDir, hostPath


In this article we are going to explore Kubernetes Volumes together. We will discuss what they are, why they exist, and how they actually work behind the scenes.

If you have ever wondered “Where does my data go when a Pod restarts?”, you are in the right place. Let’s dive in 🚀

🤔 Why Do We Even Need Volumes?

Let’s start with a simple question: why does Kubernetes need volumes at all?

We know that a Pod is made up of one or more containers. And we also know an important (and slightly scary) fact about containers: Containers are ephemeral. This means when a container crashes, restarts, or gets recreated, its filesystem is wiped clean. Any files written inside the container? 💥 Gone.

This is where Volumes come to the rescue. A Kubernetes Volume provides a way to decouple storage from the container’s lifecycle. So, to answer the question:

🎯 Why We Use Volumes

  • Protect application data from container crashes

  • Share data between containers in a Pod

  • Make applications more reliable and production-ready

Kubernetes offers multiple volume types, each designed for different use cases—temporary storage, shared storage, cloud disks, network storage, and more.

In the next sections, we’ll explore these volume options one by one, understand when to use what, and avoid common mistakes along the way.


📦 emptyDir — The Simplest Kubernetes Volume

The most basic volume type in Kubernetes is emptyDir. An emptyDir volume is created at the Pod level, not at the container level.

The What?

👉 But what does Pod level actually mean?

When a Pod is created, Kubernetes creates the emptyDir volume once, and every container inside that Pod can mount and access the same volume.

So the volume does not belong to a single container; it belongs to the Pod itself. As long as the Pod exists, the volume exists.

🤝 How Do Containers Share an emptyDir?

Let’s make this real with a common and very practical scenario. Imagine a Pod with two containers:

  • App Container: Runs your main application and generates logs

  • Sidecar Container: Collects those logs and ships them to a central logging system. You don’t want to add extra load or logging logic inside your main app, so you offload that responsibility to a sidecar container.

Both containers mount the same emptyDir volume. As a result, the app writes logs to the volume and the sidecar reads logs from the same volume. They are isolated containers, yet they share data seamlessly.

The important catch: data in an emptyDir volume is lost when the Pod restarts or crashes. However, the data is preserved even if a container in the Pod crashes or restarts, because emptyDir lives only as long as the Pod lives.

So, in simple terms, with emptyDir:

  • Container lifecycle ❌ does NOT affect data

  • Pod lifecycle ✅ DOES affect data

The When?

emptyDir is ideal for:

  • Temporary files

  • Cache data

  • Shared workspace between containers

  • Log sharing (like our sidecar example)

🚫 It is not meant for long-term or critical data storage.

The How?

Here’s a minimal working example of exactly the scenario we discussed.

apiVersion: v1
kind: Pod
metadata:
  name: shared-workspace
spec:
  containers:
    - name: producer
      image: busybox
      # Writes a timestamp to the volume every 5 seconds
      command: ["sh", "-c", "while true; do date >> /app/data/log.txt; sleep 5; done"]
      volumeMounts:          # <--- VolumeMount block (Producer)
        - name: shared-data
          mountPath: /app/data
    - name: consumer
      image: busybox
      # Reads the same data from the volume
      command: ["sh", "-c", "while true; do cat /app/input/log.txt 2>/dev/null; sleep 10; done"]
      volumeMounts:          # <--- VolumeMount block (Consumer)
        - name: shared-data
          mountPath: /app/input
  volumes:                  # <--- Volume definition
    - name: shared-data
      emptyDir: {}           # <--- The magic keyword

🔍 Let’s Dissect the Example Step by Step

  • Let’s begin at the bottom of the Pod spec—the volumes block. This is where we define:

    • The volume name - shared-data

    • The volume type - emptyDir

  • This tells Kubernetes: Create an empty directory when the Pod starts and attach it to this Pod. At this point, the volume exists—but no container is using it yet.

  • Now look at the two containers inside the Pod: Each container has its own volumeMounts block. Both containers mount the same volume (shared-data), but at different paths. This is intentional—and this is where things get interesting.

🤔 The Common Doubt (Very Natural One!)

You might be thinking: If the producer writes logs to /app/data, how does the consumer read them from /app/input?

Think of the volume as a room, and think of mountPath as a door. A room can have multiple doors; no matter which door you enter, you end up in the same room.

With this analogy, in our example /app/data is one door and /app/input is another. Both doors lead to the same emptyDir volume: the producer writes logs into the room through the /app/data door, and the consumer reads those same logs from the room through the /app/input door.

With emptyDir, data survives container crashes and restarts. However, once the Pod is restarted or recreated, all the data in the volume is lost.
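As a side note, the emptyDir definition accepts two optional fields worth knowing: medium: Memory backs the volume with a RAM-based tmpfs filesystem, and sizeLimit caps how large it may grow. A minimal sketch (the Pod and volume names here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod              # hypothetical name for illustration
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /app/cache
  volumes:
    - name: cache
      emptyDir:
        medium: Memory         # back the volume with tmpfs (RAM)
        sizeLimit: 128Mi       # exceeding this can get the Pod evicted
```

Keep in mind that a memory-backed emptyDir counts against the containers’ memory limits, so set sizeLimit deliberately rather than leaving it unbounded.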


But what if we want our data to survive even Pod restarts or crashes? That’s where hostPath comes in. Let’s take a look.

📦 hostPath

The main limitation of emptyDir is that the data is lost when the Pod restarts. This problem is addressed by hostPath.

The What?

With hostPath, the data is preserved no matter what happens to the Pod, as long as the Pod is scheduled on the same node.

🤔 How does this work?

The trick is simple: hostPath mounts a specific file or directory from the node’s filesystem directly into your Pod. So even if the Pod dies and a new Pod starts on the same node, the data is still there—because it never left the node in the first place.

The When?

Use hostPath when:

  • You want data to survive Pod restarts

  • You are okay with the Pod running on the same node

  • You are working in local development, single-node clusters, or testing environments

  • You need access to node-level files (logs, sockets, configs)

⚠️ Not recommended for multi-node production workloads due to portability and security concerns.

⚠️ During node maintenance or a node crash, extra care is required. Any Pod using a hostPath volume will lose its data if it gets drained and rescheduled onto a different node. The data is preserved only as long as the Pod remains on the same node and the node is up and running.

The How?

apiVersion: v1
kind: Pod
metadata:
  name: shared-workspace
spec:
  containers:
    - name: producer
      image: busybox
      # Writes a timestamp to the volume every 5 seconds
      command: ["sh", "-c", "while true; do date >> /app/data/log.txt; sleep 5; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /app/data
    - name: consumer
      image: busybox
      # Reads the same data from the volume
      command: ["sh", "-c", "while true; do cat /app/input/log.txt 2>/dev/null; sleep 10; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /app/input
  volumes:
    - name: shared-data
      hostPath: # <--- The magic keyword hostPath
        path: /tmp/shared-data
        type: DirectoryOrCreate

🔍 Let’s Dissect the Example Step by Step

  • Instead of emptyDir, we now use hostPath

  • The data is stored on the node at /tmp/shared-data

  • The type field tells Kubernetes what to expect at the given path on the node and what to do if it doesn’t exist. With DirectoryOrCreate, Kubernetes creates the directory automatically if it is missing.

📦 Common hostPath Types

| Type | What it means |
| --- | --- |
| "" (empty string) | Default value. No checks are performed |
| DirectoryOrCreate | Uses the directory, or creates it if it does not exist |
| Directory | Directory must already exist |
| FileOrCreate | Uses the file, or creates it if it does not exist |
| File | File must already exist |
| Socket | Unix domain socket must exist at the given path |
| CharDevice | Character device must exist at the given path |
| BlockDevice | Block device must exist at the given path |
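To see one of the stricter types in action, here is a sketch that hands a node-level Unix socket to a Pod using type: Socket. The socket path below assumes a containerd-based node; adjust it for your container runtime, and note the Pod name is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: runtime-client         # hypothetical name for illustration
spec:
  containers:
    - name: client
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: runtime-sock
          mountPath: /run/containerd/containerd.sock
  volumes:
    - name: runtime-sock
      hostPath:
        path: /run/containerd/containerd.sock  # must already exist on the node
        type: Socket                           # mount fails if this is not a Unix socket
```

Unlike DirectoryOrCreate, Kubernetes will refuse to start the Pod if nothing (or the wrong kind of file) exists at the path, which is exactly the safety check you want when depending on node-level sockets.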

🤔 Common Doubts

You might have this question in mind: We know that hostPath is tied to a specific node. What happens if I restart the Pod or apply a rollout that recreates Pods? Is there any guarantee that the Pod will be scheduled on the same node where my data exists? And if it gets scheduled on a different node, will the data be lost?

Yes—if the Pod is scheduled on a different node, the data will be lost. In most cases, Kubernetes tries to reschedule the Pod onto the same node it was previously running on. This is why, during normal Pod restarts or rollouts, you often see the Pod coming back on the same node and your data appearing to be “safe.”

❓ Why Does This Happen?

When the scheduler evaluates where to place a Pod, it considers existing node assignments, node availability, and resource constraints. If the node is healthy, not drained, and has sufficient resources, Kubernetes will typically place the Pod back on the same node.

However, there is no strict guarantee. If the node has crashed, is drained, is under maintenance, or is out of resources, the Pod will be rescheduled to a different node.
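If your workload truly depends on hostPath data, you can remove this scheduling uncertainty by pinning the Pod to a specific node, for example with a nodeSelector on the built-in kubernetes.io/hostname label. A sketch (the node name worker-1 is hypothetical; check kubectl get nodes for real names):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-workspace       # hypothetical name for illustration
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-1   # hypothetical node name
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /app/data
  volumes:
    - name: shared-data
      hostPath:
        path: /tmp/shared-data
        type: DirectoryOrCreate
```

The trade-off: if that node goes down, the Pod stays Pending instead of moving elsewhere, which is often the right behavior when the data only exists on that one node.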


So far, we’ve seen how hostPath and emptyDir volumes work—great for temporary data, experiments, caching, and understanding how Kubernetes handles storage inside a Pod or a node. But what if you want your data to live beyond Pod restarts, survive node failures, and stay safe from all the usual Kubernetes chaos? 🤯

That’s exactly where Persistent Volumes (PV) come into the picture. They solve the problem of long-lived, reliable storage in Kubernetes. I’ve covered Persistent Volumes in detail in the next blog, breaking down how they work, why they matter, and when you should use them in real-world setups.

If you’ve made it this far — great job 👏 You now have a solid understanding of Kubernetes’ ephemeral storage story.

👉 Continue the journey here: Persistent Volumes in Kubernetes — and let’s level up your storage game 🚀

Kubernetes in Detail

Part 4 of 4

Kubernetes in Detail is a comprehensive series that explains Kubernetes from the ground up. It covers all core and advanced Kubernetes concepts in depth, with clear explanations and practical insights.
