In this article we’re going to run through the concept of Docker data volumes: what they are, why they’re useful, the different types of volumes, how to use them, and when to use each one. We’ll also go through some examples of how to use Docker volumes via the docker command line tool.
By the time we reach the end of the article, you should be comfortable creating and using any kind of Docker data volume.
To follow this tutorial, you will need the following:
Note: Even though the Prerequisites give instructions for installing Docker on Ubuntu 14.04, the docker commands for Docker data volumes in this article should work on other operating systems as long as Docker is installed.
Working with Docker requires understanding quite a few Docker-specific concepts, and most of the documentation focuses on explaining how to use Docker’s toolset without much explanation of why you’d want to use any of those tools. This can be confusing if you’re new to Docker, so we’ll start by going through some basics and then jump into working with Docker containers. Feel free to skip ahead to the next section if you’ve worked with Docker before and just want to know how to get started with data volumes.
A Docker container is similar to a virtual machine. It basically allows you to run a pre-packaged “Linux box” inside a container. The main difference between a Docker container and a typical virtual machine is that Docker is not quite as isolated from the surrounding environment as a normal virtual machine would be. A Docker container shares the Linux kernel with the host operating system, which means it doesn’t need to “boot” the way a virtual machine would.
Since so much is shared, firing up a Docker container is a quick and cheap operation — in most cases you can bring up a full Docker container (the equivalent of a normal virtual machine) in the same time as it would take to run a normal command line program. This is great because it makes deploying complex systems a much easier and more modular process, but it’s a different paradigm from the usual virtual machine approach and has some unexpected side effects for people coming from the virtualization world.
There are three main use cases for Docker data volumes:
The third case is a little more advanced, so we won’t go into it in this tutorial, but the first two are quite common.
In the first (and simplest) case you just want the data to hang around even if you remove the container, so it’s often easiest to let Docker manage where the data gets stored.
There’s no way to directly create a “data volume” in Docker, so instead we create a data volume container with a volume attached to it. For any other containers that you then want to connect to this data volume container, use Docker’s --volumes-from option to grab the volume from this container and apply it to the current container. This is a bit unusual at first glance, so let’s run through a quick example of how we could use this approach to make our byebye file stick around even if the container is removed.
First, create a new data volume container to store our volume:
docker create -v /tmp --name datacontainer ubuntu
This created a container named datacontainer based on the ubuntu image, with a volume at the directory /tmp.
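If you’re curious where Docker actually keeps this volume’s data on the host, you can ask docker inspect. This is just a sketch: the exact field name varies between Docker versions (older releases report the paths under a Volumes field, newer ones under Mounts), so try whichever matches your version.

```shell
# Show where Docker stores the /tmp volume's data on the host.
# Older Docker versions:
docker inspect -f '{{ .Volumes }}' datacontainer
# Newer Docker versions keep the same information under Mounts:
docker inspect -f '{{ .Mounts }}' datacontainer
```

Either way, the output points at a directory managed by Docker on the host filesystem; you normally don’t need to touch it directly.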
Now, if we run a new Ubuntu container with the --volumes-from flag and run bash again as we did earlier, anything we write to the /tmp directory will get saved to the /tmp volume of our datacontainer container.
First, start the ubuntu image:
docker run -t -i --volumes-from datacontainer ubuntu /bin/bash
The -t option allocates a terminal inside the container, and the -i flag makes the connection interactive.
At the bash prompt for the ubuntu container, create a file in /tmp:
echo "I'm not going anywhere" > /tmp/hi
Go ahead and type exit
to return to your host machine’s shell. Now, run the same command again:
docker run -t -i --volumes-from datacontainer ubuntu /bin/bash
This time the hi file is already there:
cat /tmp/hi
You should see:
Output of cat /tmp/hi
I'm not going anywhere
You can add as many --volumes-from
flags as you’d like (for example, if you wanted to assemble a container that uses data from multiple data containers). You can also create as many data volume containers as you’d like.
The only caveat to this approach is that you can only choose the mount path inside the container (/tmp in our example) when you create the data volume container.
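One related detail worth knowing for cleanup: as a sketch (behavior may differ slightly across Docker versions), docker rm leaves a container’s volumes behind on disk unless you explicitly ask for them to be deleted with -v:

```shell
# Remove the data volume container.
# Without -v, the underlying volume data would be left behind on the host.
docker rm -v datacontainer
```

Only do this once no other container still needs the data, of course.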
The other common use for Docker containers is as a means of sharing files between the host machine and the Docker container. This works differently from the last example. There’s no need to create a “data-only” container first. You can simply run a container of any Docker image and override one of its directories with the contents of a directory on the host system.
As a quick real-world example, let’s say you wanted to use the official Docker Nginx image but keep a permanent copy of Nginx’s log files to analyze later. By default the nginx Docker image logs to the /var/log/nginx directory, but this is /var/log/nginx inside the Docker Nginx container, so normally it’s not reachable from the host filesystem.
Let’s create a folder to store our logs and then run a copy of the Nginx image with a shared volume, so that Nginx writes its logs to our host’s filesystem instead of to /var/log/nginx inside the container:
mkdir ~/nginxlogs
Then start the container:
docker run -d -v ~/nginxlogs:/var/log/nginx -p 5000:80 -i nginx
This run command is a little different from the ones we’ve used so far, so let’s break it down piece by piece:
-v ~/nginxlogs:/var/log/nginx — We set up a volume that links the /var/log/nginx directory from inside the Nginx container to the ~/nginxlogs directory on the host machine. Docker uses a : to split the host’s path from the container path, and the host path always comes first.
-d — Detach the process and run it in the background. Otherwise, we would just be watching an empty Nginx prompt and wouldn’t be able to use this terminal until we killed Nginx.
-p 5000:80 — Set up a port forward. The Nginx container is listening on port 80 by default, and this maps the Nginx container’s port 80 to port 5000 on the host system.
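As a side note on the -v flag: you can append :ro to the container path to make a shared directory read-only inside the container. For example, to serve a local folder of static files through Nginx without letting the container modify them (here ~/website is a hypothetical folder, and /usr/share/nginx/html is the document root used by the official nginx image):

```shell
# Mount a host folder read-only inside the container.
docker run -d -v ~/website:/usr/share/nginx/html:ro -p 5000:80 nginx
```

This is handy whenever the container should consume host data but never write to it.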
If you were paying close attention, you may have also noticed one other difference from the previous run commands. Up until now, we’ve been specifying a command at the end of all our run statements (usually /bin/bash) to tell Docker what command to run inside the container. Because the Nginx image is an official Docker image that follows Docker best practices, the creator of the image set it up to run the command that starts Nginx automatically. We can drop the usual /bin/bash here and let the creators of the image choose what command to run in the container for us.
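Before testing, you can confirm the container is actually running and see its port mapping with docker ps, and read Nginx’s own output with docker logs (substitute the container name or ID that docker ps reports; the placeholder is left as-is here since Docker generates it at run time):

```shell
docker ps                 # lists running containers, including the 5000->80 port mapping
docker logs <container>   # replace <container> with the name or ID shown by docker ps
```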
So, we now have a copy of Nginx running inside a Docker container on our machine, and our host machine’s port 5000 maps directly to that copy of Nginx’s port 80. Let’s use curl to do a quick test request:
curl localhost:5000
You’ll get a screenful of HTML back from Nginx showing that it’s up and running. More interestingly, if you look in the ~/nginxlogs folder on the host machine and take a look at the access.log file, you’ll see a log message from Nginx recording our request:
cat ~/nginxlogs/access.log
You will see something similar to:
Output of `cat ~/nginxlogs/access.log`
172.17.42.1 - - [23/Oct/2015:05:22:51 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.35.0" "-"
If you make any changes to the ~/nginxlogs folder, you’ll be able to see them from inside the Docker container in real time as well.
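To watch new requests arrive as they happen, you can tail the shared log file from the host side. Here is a self-contained sketch of the same idea, using a throwaway /tmp directory instead of the real ~/nginxlogs (the paths and log line are purely illustrative):

```shell
# Simulate a container appending a request to a log in a shared folder,
# then read the newest entry back from the host side.
mkdir -p /tmp/nginxlogs-demo
echo 'GET / 200' >> /tmp/nginxlogs-demo/access.log
tail -n 1 /tmp/nginxlogs-demo/access.log

# For the real setup from this tutorial, you would instead run:
# tail -f ~/nginxlogs/access.log
```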
That about sums it up! We’ve now covered how to create data volume containers whose volumes we can use as a way to persist data in other containers as well as how to share folders between the host filesystem and a Docker container. This covers all but the most advanced use cases when it comes to Docker data volumes.
If you are using Docker Compose, Docker data volumes can be configured in your docker-compose.yml file. Check out How To Install and Use Docker Compose on Ubuntu 14.04 for details.
Good luck and happy Dockering!
I am not a native English speaker, but I understood every single row of your excellent article. You have a talent for explaining abstract things in a simple way. Thanks a lot, Nik!
Hi! Thank you for sharing! I’m new to Docker, but I think there is something different from what I know about how modifications to a container are handled. I noticed that you use “docker run” throughout the whole process without a “--name <name>” argument, so Docker will create a new container every time, with a random name generated by Docker itself. For example, if you run
a container named “sleepy_liskov” appears on my machine; run it again and another container called “fervent_lalande” is created. They exit at once because busybox does nothing. So, my point is: in your scenario, when you created the file “byebye” in a container without a name given by you, stopped it, and ran “docker run” again, the new container is definitely not the one with the “byebye” file. But if you use
you’ll find the container (let’s call it foo_bar) with the “byebye” file; it’s not gone, just stopped. You can use
to restart it. So, when you say stop the container and start it up again, I would think you mean “docker start” and “docker stop”, not “docker run”; the container will always be there with any of your modifications until you remove it with
So, I guess maybe you tried to find the “byebye” file in the wrong place, and it may be good practice to use the “--name” argument or “--rm”. Again, I’m new to Docker. I’ve been stuck for days on how to use volumes properly on OS X, because, you know, there is another VirtualBox layer between your host and the container, and a volume mounted into the container with “-v” has some permission problems, which drives me crazy. Do you have any clue about it?
Is this still up to date? I think the changes are now persisted in the top RW layer, even if no data volume is mounted. Right?
I tried creating a file and restarting the container. The file was still there…
Quite possibly the clearest and most comprehensive overview of containers I have ever read. Thanks for making the comparison to virtual machines - as a long time VMware guy (since 2002!), I have struggled to wrap my head around the concept and learn where the similarities and differences were. Now, if you could explain how containers compare to, say, a VMware-based virtual application, you’d really win me over :-). BTW…HUGE Digital Ocean fan - I love your site!
Something definitely has changed regarding volumes. If I don’t specify a custom host path for the volume, I have no issues.
But when I want to run a container with a volume at a custom location on my host, I am not able to use that volume. This happens with all images I have tested it with:
[mru@dockster ~]$ docker run -d -v ~/nginxlogs:/var/log/nginx -p 5000:80 -i nginx
[mru@dockster ~]$ ls nginxlogs/
[mru@dockster ~]$ docker logs brave_gates
nginx: [alert] could not open error log file: open() “/var/log/nginx/error.log” failed (13: Permission denied)
2018/10/27 17:18:14 [emerg] 1#1: open() “/var/log/nginx/error.log” failed (13: Permission denied)
So, the directory gets created, but I am not able to write to it for whatever reason. This also happens when I start Docker as root, and when I specify a directory that is chmodded 777. Does anyone have an idea what goes wrong here?
Hi, you wrote: “There’s no way to directly create a “data volume” in Docker, so instead we create a data volume container with a volume attached to it.” Actually, since Docker 1.9.0 you can directly create volumes without the need for data volume containers. See: https://docs.docker.com/engine/reference/commandline/volume_create/
Hi, thank you for contributing such a clear and comprehensive tutorial on how to set up Docker data volumes for what I presume are the most common use cases. Would you be able to give some pointers on how to address the third, more advanced, scenario, i.e. sharing the contents of a persistent data volume between multiple containers? I’ve found a few resources about GlusterFS and NFS, but so far nothing as clearly outlined as your article. Cheers
You won’t find a clearer explanation than this. Thanks Nik, this is awesome :)
I do not really get what the following means: “3. To share data with other Docker containers”
I want to know more about it. Any more info or a reference link?
Hi,
First of all, thanks for the great article. I am particularly interested in sharing data between the host and a container. I am trying to create a volume that will store screenshots for failed Capybara tests, but I am having some issues.
My Dockerfile has the following line:
I can see the screenshot being saved in the container:
But I can’t see anything being created on the host.
Could you please help me figure out what is going on here?
Thanks