Most of the content of this article is taken from the official Docker documentation page "Understand images, containers, and storage drivers".
Docker is an open-source application container engine that lets developers package an application and its dependencies into a portable container, which can then be distributed to any popular Linux machine. Docker can also be used to implement virtualization. Containers are fully sandboxed and have no interfaces with one another. Docker lets developers quickly create simple, ready-to-run containerized applications, which makes applications easier to manage and deploy.
To truly understand Docker's storage drivers, you first need to understand how Docker images are built and stored, and how containers use those images.
Images and Layers
The figure below shows the layers of the ubuntu:15.04 image. There are four layers in total. Each layer consists of read-only files that record its differences from the layer beneath it.
Comparing the two figures, you can clearly see how the image is layered. The upper figure comes from the official documentation. The layer sizes shown there are simplified, but the layered structure of the ubuntu:15.04 image is the same.
The role of the Docker storage driver is to stack these image layers and present them as a single unified view. This makes the container's file system look no different from an ordinary file system.
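The stacking described above can be sketched in a few lines of Python. This is a toy model and does not show how any real driver is implemented. It makes the following assumptions:

- Each layer is a dict mapping a path to file contents.
- Layers are ordered from bottom to top.
- An upper layer's entry hides the same path in lower layers.
- A value of None stands in for a "whiteout" entry that deletes a file from the view.

The name unified_view is hypothetical.

```python
from typing import Dict, List, Optional

# Toy model (hypothetical): a layer maps a path to file contents; None marks a
# "whiteout", i.e. the path was deleted in that layer.
Layer = Dict[str, Optional[str]]


def unified_view(layers: List[Layer]) -> Dict[str, str]:
    """Merge layers ordered bottom (index 0) to top into a single view."""
    view: Dict[str, str] = {}
    for layer in layers:                      # lower layers first, so upper layers win
        for path, content in layer.items():
            if content is None:
                view.pop(path, None)          # whiteout hides the lower-layer file
            else:
                view[path] = content          # upper entry replaces any lower one
    return view


if __name__ == "__main__":
    base = {"/etc/os-release": "ubuntu 15.04", "/tmp/cache": "old"}
    update = {"/etc/os-release": "ubuntu 15.04 (patched)", "/tmp/cache": None}
    app = {"/app/run.sh": "echo hello"}
    print(unified_view([base, update, app]))
```

Upper layers are applied last, so their entries win. This is the same precedence rule a union mount uses when several layers contain the same path.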
When a new container is created, a new writable layer, called the container layer, is added on top of the image layers. All subsequent changes made in the container affect only this layer.
Note:
Container layer: read-write (writable) layer
Image layers: read-only layers
Containers and Layers
One of the main differences between an image and a container is whether there is a top-level writable layer. All data added to or modified in a container is stored in this writable layer. When a container is deleted, its writable layer is deleted too, while the underlying image layers remain unchanged. (Note the difference between the writable layer and a data volume.)
The figure below shows multiple containers sharing one image. The image layers are read-only and immutable. Each container has its own container layer on top of the same image layers, and these container layers are independent of one another.
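The sharing described above can be modeled as follows. This sketch assumes a toy setup with two parts:

- An image is a tuple of read-only mappings (types.MappingProxyType) shared by every container.
- Each container owns a private dict as its writable layer.

The names make_image and Container are hypothetical.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Tuple

Image = Tuple[Mapping[str, str], ...]


def make_image(*layers: Dict[str, str]) -> Image:
    """Freeze layers (bottom to top) into read-only mappings shared by all containers."""
    return tuple(MappingProxyType(dict(layer)) for layer in layers)


class Container:
    """Toy container (hypothetical): shared read-only image plus a private writable layer."""

    def __init__(self, image: Image):
        self.image = image                    # shared, never modified
        self.writable: Dict[str, str] = {}    # this container's own layer

    def write(self, path: str, content: str) -> None:
        self.writable[path] = content         # changes land only in the writable layer

    def read(self, path: str) -> str:
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.image):    # topmost image layer first
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)


if __name__ == "__main__":
    image = make_image({"/etc/hostname": "base"})
    c1, c2 = Container(image), Container(image)
    c1.write("/etc/hostname", "c1")
    print(c1.read("/etc/hostname"), c2.read("/etc/hostname"))
    del c1  # "deleting" the container discards its writable layer; the image is untouched
    print(image[0]["/etc/hostname"])
```

A write in one container never reaches the other, because it only touches that container's own dict. An attempt to assign into an image layer raises TypeError, because MappingProxyType is read-only.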
The Docker storage driver is responsible for managing both the image layers and the writable container layer. Different drivers do this in different ways, but two key technologies underlie container and image management: stackable image layers and copy-on-write (CoW).
Copy-on-write
For example, Xiaowen and Xiaowu are taught math by different teachers but share a single workbook. Xiaowen's homework is on page 11. So as not to affect Xiaowu, Xiaowen photocopies page 11, does the homework on the copy, and hands that in. This is a typical example of copy-on-write.
The first time a file is modified, it is first copied from the read-only layer beneath the writable layer up into the writable layer. The read-only version of the file still exists, but it is hidden by the copy in the writable layer.
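Here is a minimal copy-on-write sketch. It assumes file contents are held as bytearray objects in dicts, with one dict per read-only layer (bottom to top) and a single dict serving as the writable layer. The class name CowFS and its methods are hypothetical.

The point is the order of operations:

- Reads search from the top down and copy nothing.
- The first modification copies the whole file up before changing it.

```python
from typing import Dict, List


class CowFS:
    """Toy copy-on-write view (hypothetical): read-only lower layers plus one writable layer."""

    def __init__(self, lower_layers: List[Dict[str, bytearray]]):
        self.lower = lower_layers                  # read-only by convention, bottom to top
        self.upper: Dict[str, bytearray] = {}      # writable container layer
        self.copied_up: List[str] = []             # record of copy-up operations

    def _find(self, path: str) -> bytearray:
        if path in self.upper:
            return self.upper[path]
        for layer in reversed(self.lower):         # topmost read-only layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def read(self, path: str) -> bytes:
        return bytes(self._find(path))             # reads never copy anything up

    def modify(self, path: str, offset: int, data: bytes) -> None:
        if path not in self.upper:
            # First modification: copy the whole file into the writable layer.
            self.upper[path] = bytearray(self._find(path))
            self.copied_up.append(path)
        # Writes go to the upper copy; the lower version stays intact but hidden.
        self.upper[path][offset:offset + len(data)] = data


if __name__ == "__main__":
    image_layer = {"/etc/app.conf": bytearray(b"debug=0")}
    fs = CowFS([image_layer])
    fs.modify("/etc/app.conf", 6, b"1")
    print(fs.read("/etc/app.conf"))             # b'debug=1' from the writable layer
    print(bytes(image_layer["/etc/app.conf"]))  # b'debug=0' still in the read-only layer
```

After modify runs, the lower layer still holds the original bytes, but read returns the upper copy. This matches the "hidden, not deleted" behavior described above.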
Once you understand copy-on-write, note one consequence. The first time a file from an image layer is modified, the whole file is copied up, so a very large file causes a lot of disk I/O. It is therefore not recommended to bake large files that will be modified into the image. Use data volumes for them instead.
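The cost described above can be made concrete with a toy cost model. The model assumes file-level copy-up, as AUFS and OverlayFS perform. Block-level drivers such as devicemapper and btrfs copy at a finer granularity, so their numbers would differ.

The function copy_up_bytes is hypothetical. It charges the full file size on the first write to an image file. It charges nothing for later writes to that file, or for files that exist only in the container layer.

```python
from typing import Dict, Iterable, Set, Tuple


def copy_up_bytes(image_files: Dict[str, int],
                  writes: Iterable[Tuple[str, int]]) -> int:
    """Total bytes copied up for a sequence of (path, bytes_written) writes.

    Toy file-level model (hypothetical): the first write to a file that exists in
    the image copies the whole file; later writes to it, and writes to files that
    exist only in the container layer, copy nothing.
    """
    copied: Set[str] = set()
    total = 0
    for path, _bytes_written in writes:       # edit size does not affect copy-up cost
        if path in image_files and path not in copied:
            total += image_files[path]        # whole-file copy on first modification
            copied.add(path)
    return total


if __name__ == "__main__":
    GiB = 1024 ** 3
    files = {"/opt/model.bin": 2 * GiB}
    print(copy_up_bytes(files, [("/opt/model.bin", 1)]))  # prints 2147483648
```

Under this model, changing a single byte of a 2 GiB file in the image costs a 2 GiB copy. That is why frequently modified large files belong in a data volume.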
Data volumes and storage drivers
When a container is deleted, all data written to the container is deleted with it, except data stored in a data volume.
A data volume is a directory or file on the Docker host that is mounted directly into a container. Reads and writes to a data volume are not handled by the storage driver, so they run at close to the native speed of the host's file system. A container can mount multiple data volumes, and multiple containers can share one or more data volumes.
As shown in the figure, a Docker host runs two containers. Each container has its own storage area in the host's local file system under /var/lib/docker/… In addition, a shared data volume located at /data on the host is mounted into both containers.
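The setup in the figure can be modeled as below. This sketch makes two assumptions:

- Each volume is represented by a dict standing in for a host directory.
- A path belongs to a volume if it equals the mount point or lies beneath it.

The class VolumeContainer is hypothetical. It only illustrates that volume writes skip the container's writable layer and therefore survive when the container is deleted.

```python
from typing import Dict, Optional


class VolumeContainer:
    """Toy container (hypothetical) whose writes under a mount point go to a host store."""

    def __init__(self, volumes: Dict[str, Dict[str, str]]):
        self.volumes = volumes                # mount point -> dict standing in for a host directory
        self.writable: Dict[str, str] = {}    # container layer, managed by the storage driver

    def _volume_for(self, path: str) -> Optional[Dict[str, str]]:
        for mount, store in self.volumes.items():
            if path == mount or path.startswith(mount.rstrip("/") + "/"):
                return store
        return None

    def write(self, path: str, content: str) -> None:
        store = self._volume_for(path)
        if store is not None:
            store[path] = content             # bypasses the writable layer entirely
        else:
            self.writable[path] = content     # ordinary write into the container layer

    def read(self, path: str) -> str:
        store = self._volume_for(path)
        return store[path] if store is not None else self.writable[path]


if __name__ == "__main__":
    host_data: Dict[str, str] = {}            # plays the role of /data on the host
    first = VolumeContainer({"/data": host_data})
    second = VolumeContainer({"/data": host_data})
    first.write("/data/report.txt", "shared")
    first.write("/tmp/scratch", "private")
    print(second.read("/data/report.txt"))    # the other container sees the volume file
    del first                                 # deleting the container drops /tmp/scratch
    print(host_data)                          # the volume contents survive
```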
How to choose a storage driver
The storage drivers currently supported by docker are: OverlayFS, AUFS, Btrfs, Device Mapper, VFS, ZFS.
There is currently no single storage driver that suits every environment, so you need to choose one based on your own environment.
Storage drivers are also constantly being improved.
For stability, Docker selects a default storage driver based on your system configuration when it is installed. Using this default usually reduces your chance of encountering bugs.
If your team uses RHEL or one of its derivatives, you probably already have experience with LVM and Device Mapper. In that case, the devicemapper storage driver is recommended.
View the storage driver of the current docker engine
As shown in the figure (output of the docker info command), the storage driver is aufs and the host's backing file system is extfs.
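The storage driver and backing file system can also be read programmatically. This sketch assumes the classic human-readable docker info layout shown in the figure. In that layout, "Storage Driver" is followed by indented detail lines such as "Backing Filesystem".

The SAMPLE text is an abbreviated, illustrative example, not real output from a particular host. The function storage_info is hypothetical. Live output could be captured with subprocess.run(["docker", "info"], capture_output=True, text=True).stdout. Newer Docker versions also accept docker info --format '{{.Driver}}' to print just the driver name.

```python
from typing import Dict

# Illustrative, abbreviated text in the classic `docker info` layout (not from a real host).
SAMPLE = """\
Containers: 0
Images: 0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 0
"""


def storage_info(docker_info_output: str) -> Dict[str, str]:
    """Pick the storage driver and backing file system out of `docker info` text."""
    wanted = {"Storage Driver": "driver", "Backing Filesystem": "backing_fs"}
    result: Dict[str, str] = {}
    for line in docker_info_output.splitlines():
        key, sep, value = line.strip().partition(":")  # split on the first colon only
        if sep and key in wanted:
            result[wanted[key]] = value.strip()
    return result


if __name__ == "__main__":
    print(storage_info(SAMPLE))  # {'driver': 'aufs', 'backing_fs': 'extfs'}
```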
Storage drivers and host file systems
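Here is a small sketch of a compatibility check between a storage driver and the host's backing file system. It deliberately encodes only two constraints: the btrfs driver needs a Btrfs backing file system, and the zfs driver needs a ZFS one.

The function backing_fs_problem and the table REQUIRED_BACKING_FS are hypothetical. A None result for any other driver does not mean the combination is supported.

```python
from typing import Dict, Optional

# Hypothetical table: only two constraints are encoded. The btrfs and zfs drivers
# need a backing file system of the same type; other drivers are not checked at all.
REQUIRED_BACKING_FS: Dict[str, str] = {"btrfs": "btrfs", "zfs": "zfs"}


def backing_fs_problem(driver: str, backing_fs: str) -> Optional[str]:
    """Return a message if `driver` needs a different backing file system, else None."""
    required = REQUIRED_BACKING_FS.get(driver)
    if required is not None and backing_fs != required:
        return f"{driver} driver needs a {required} backing file system, found {backing_fs}"
    return None  # no known conflict (not a guarantee of support)


if __name__ == "__main__":
    print(backing_fs_problem("btrfs", "extfs"))
    print(backing_fs_problem("aufs", "extfs"))
```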
Set the Docker storage driver
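One common way to set the storage driver is the "storage-driver" key in the daemon configuration file /etc/docker/daemon.json. Passing the --storage-driver flag to dockerd is equivalent.

The sketch below uses a hypothetical helper, daemon_config. It only produces the JSON text with that key set, preserving any other settings. Writing the file and restarting the daemon are left out. Note that after switching drivers, images and containers created under the old driver are no longer visible.

```python
import json


def daemon_config(driver: str, existing: str = "{}") -> str:
    """Return daemon.json text with "storage-driver" set, keeping other settings."""
    config = json.loads(existing)             # parse the current file contents
    config["storage-driver"] = driver         # key read by dockerd at startup
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(daemon_config("devicemapper", '{"debug": true}'))
```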
Current status and future
Many see OverlayFS as the future of Docker storage drivers. However, it is not yet mature enough, and its stability still lags behind that of more mature storage drivers such as AUFS and devicemapper.
The chart below summarizes the advantages and disadvantages of each storage driver for reference.
Specific storage drivers
This part covers how each storage driver is implemented and is intended as reference material for readers researching big data technology. Application practitioners can stop reading here for now.