7 reasons why databases are not suitable for Docker and containerization

Introduction: Everything is being containerized these days. Under the prevailing idea that everything should be a container, containerizing stateless services has become the general trend. A question that often troubles architects is whether the database should be containerized too. In this article, author Mikhail Chinkov argues that it should not. Translated by High Availability Architecture.

 

Looking at the tech industry in 2017, containers and Docker are still the hottest buzzwords. In every domain, we have started packaging the software we build into Docker containers. Containers are used by everyone from small startups to huge microservice platforms, on everything from CI servers to the Raspberry Pi. From databases to...

Databases? Are you sure you want to put your database in a container?

Unfortunately, this is not a fictional scenario. I keep seeing fast-growing projects that persist data inside containers and run compute services and data services on the same machine. I hope experienced engineers will stay away from this approach.

Here are my reasons why containerizing databases is unreasonable today.

7 reasons why databases are not suitable for containerization

 

1. Data is not safe

 

Even if you store Docker's data on the host, there is still no guarantee against data loss. Docker volumes are designed around the Union FS image layers to provide persistent storage, but they come with no durability guarantees.

With current storage drivers, Docker still carries a risk of unreliability: if the container crashes and the database is not shut down cleanly, the data can be corrupted.
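To make the mitigation concrete, here is a minimal sketch of the usual approach: bind-mounting a host directory so the data files live outside the container's writable layer. The path /srv/mongo/data and the container name are arbitrary examples, and note that even with a bind mount, a crash mid-write can still corrupt the files.

# Keep MongoDB's data files on the host instead of in the container
docker run -d --name mongo-data -v /srv/mongo/data:/data/db mongo:3.4

# The files survive removal of the container, but an unclean shutdown can still corrupt them
docker rm -f mongo-data
ls /srv/mongo/data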

 

2. Environment requirements for running the database

 

It is common to see DBMS containers running on the same host as other services, yet their hardware requirements are very different.

Databases (especially relational ones) have high I/O requirements. Database engines typically run in dedicated environments precisely to avoid contention for shared resources. Putting your database in a container wastes your project's resources, because you have to provision a lot of extra headroom for that one instance. In the public cloud, when you need 34 GB of memory, the smallest instance that fits comes with 64 GB, and in practice those resources are never fully used.

How to deal with this? Design in layers and launch multiple instances with fixed resource budgets at each layer. Scaling horizontally is always better than scaling vertically.
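As an illustration, Docker does let you pin a fixed resource budget per container, which is one way to build such layers. This is only a sketch; the numbers and the container name are arbitrary examples, not recommendations.

# Cap one container at 4 GB of RAM and a reduced CPU share
docker run -d -m 4g --cpu-shares 512 --name mongo-small mongo:3.4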

 

3. Network problems

 

To understand Docker networking you need a solid grasp of network virtualization, and you must be prepared for the unexpected: you may find yourself debugging issues without support or extra tooling.

We know that databases need dedicated, sustained throughput under high load. We also know that containers add an isolation layer on top of the hypervisor and the host virtual machine. Yet the network is critical for database replication, which requires a stable 24/7 connection between master and slave databases. And Docker's networking issues were still unresolved as of release 1.9.

Put these issues together and containerization makes database containers hard to manage. I know you are a top engineer who can solve any problem. But how much time will you spend troubleshooting Docker networking? Wouldn't it be better to put the database in a dedicated environment and keep that time for the business goals that really matter?
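If you do end up troubleshooting, these are the standard starting points. The container name mydb and the peer host replica-2 are hypothetical, and the last command assumes ping exists inside the image.

# Inspect which networks exist and how the default bridge is configured
docker network ls
docker network inspect bridge

# Check connectivity from inside the database container to its replica
docker exec -it mydb ping -c 3 replica-2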

 

4. State

 

Packaging stateless services in Docker is cool: it enables container orchestration and removes single points of failure. But what about databases? Put the database in the same environment and it becomes stateful, which widens the scope of possible system failures. The next time an application instance crashes, the database may be affected too.
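A quick sketch of why state inside the container is dangerous; the container name throwaway is illustrative:

# Data written here lands in the container's writable layer
docker run -d --name throwaway mongo:3.4
docker exec throwaway mongo --eval 'db.test.insert({x: 1})'

# Removing the container removes the writable layer, and the data with it
docker rm -f throwaway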

 

5. Databases can't take advantage of Docker's main features

 

Before putting a database in a container, let's think about the value it would bring. Here is Docker's official definition:

 

Docker is an open platform for developers and system administrators to build, ship, and run distributed applications. Docker consists of Docker Engine (a portable, lightweight runtime and packaging tool) and Docker Hub (a cloud service for sharing applications and automating workflows). Docker enables applications to be quickly assembled from components and eliminates the differences between development, QA, and production environments. As a result, IT can ship software faster and run the same application unchanged on laptops, data center VMs, and any cloud.


From this definition, we can easily identify Docker's main features:

 

  • Easy to build new environments

  • Easy to redeploy (continuous integration)

  • Easy to scale horizontally (in practice)

  • Easy to keep environments consistent

 

Now let's think about how these features fit the database world.

Easy to set up a database? Let's see whether running the database in a container differs dramatically from running it natively:

docker run -d mongo:3.4

Compared with:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
sudo apt-get update && sudo apt-get install -y mongodb-org

 

Easy to build new environments? If we are talking about a MongoDB cluster, containerization may help. But what about configuration management systems? They are designed to solve exactly this problem with a single command, and with Ansible you can easily set up dozens of Mongo instances. As you can see, there is no significant gain in value.
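As a sketch of the configuration-management alternative, a single Ansible ad-hoc command can install MongoDB across a whole group of hosts. The inventory file and the dbservers group name are assumptions for illustration.

# Install mongodb-org on every host in the dbservers group
ansible dbservers -i inventory -m apt -a "name=mongodb-org state=present update_cache=yes" --become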

Easy to redeploy? How often do you redeploy your database to the next version? Database upgrades are not an availability problem but an engineering one (i.e., keeping the cluster available throughout the upgrade). Think about how your application will behave with the new database engine version and what problems may surface when the engine is swapped out.

Easy to scale horizontally? Do you really want to share the data directory between multiple instances? Aren't you afraid of immediate data-concurrency issues and possible data corruption? Wouldn't it be safer to deploy multiple instances, each with its own dedicated data directory, and set up master-slave replication between them?
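A rough sketch of that safer alternative: two mongod processes with separate data directories joined into a replica set. The paths, ports, and the host name db-host are placeholders.

# Each instance gets its own dedicated data directory
mongod --replSet rs0 --dbpath /data/rs0-0 --port 27017 --fork --logpath /var/log/mongo/rs0-0.log
mongod --replSet rs0 --dbpath /data/rs0-1 --port 27018 --fork --logpath /var/log/mongo/rs0-1.log

# Then, from a mongo shell connected to the first instance:
#   rs.initiate()
#   rs.add("db-host:27018")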

Easy to keep the environment consistent? How often does a database instance's environment actually change? Do you upgrade the operating system every day? Do database versions or dependencies change frequently? Is it really hard to agree on them with the engineering team?

In the end, none of these features is compelling enough to make me consider containerizing a database.

 

6. Additional isolation is bad for the database

 

I actually touched on this in the second and third reasons, but I list it separately to stress the point again: the more isolation layers we add, the more resource overhead we pay. The real payoff of containers over a dedicated environment is easy horizontal scaling, yet horizontal scaling in Docker only works for stateless compute services, not databases.

The database gains nothing from the extra isolation, so why put it in a container at all?

 

7. A poor fit for cloud platforms

 

Most projects start out on the public cloud. The cloud hides the complexity of operating and replacing virtual machines, so nobody has to swap out failed hardware in the middle of the night or on a weekend. Why worry about the environment an instance runs in when you can spin up a new one in seconds?

That is exactly what we pay cloud providers for. But when we put a database container on an instance, those conveniences disappear: since the data doesn't travel with it, a new instance is not interchangeable with the existing one. If we want to keep instances disposable, we should run the database in a dedicated, non-containerized environment and reserve elastic scaling for the compute layer alone.

 

Do these 7 points apply to all databases?

 

Perhaps not all of them, but they do apply to every database that must persist its data, and to every database with particular hardware requirements.

If we use Redis purely as a cache or a user-session store, containers should pose no problem: since the data never has to reach disk, there is no risk of data loss. But if we are considering Redis as a persistent data store, we are better off keeping the data outside the container; even if we keep flushing RDB snapshots, locating that snapshot in a rapidly changing compute cluster can be complicated.
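For the cache case, a minimal sketch: run Redis with both persistence mechanisms switched off, so losing the container loses nothing that can't be rebuilt. The container name and image tag are illustrative.

# Disable RDB snapshots (--save "") and the append-only file
docker run -d --name cache redis:3.2 redis-server --save "" --appendonly no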

We can also talk about running Elasticsearch in a container. Yes, we can store indexes in ES and rebuild them from a persistent data source. But look at the requirements: by default, Elasticsearch needs 2 to 3 GB of memory, and because of Java's GC, memory usage is not steady. Are you sure Elasticsearch is a good fit for a resource-constrained container? Wouldn't it be better to give different Elasticsearch instances different hardware configurations?
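If you do run Elasticsearch in a container, at least pin the JVM heap explicitly instead of trusting defaults. A sketch assuming the official 5.x image, where the ES_JAVA_OPTS environment variable controls heap size; the tag and the sizes are examples.

# Fix the heap at 2 GB so the JVM doesn't fight the container's memory cap
# (the host may also need: sysctl -w vm.max_map_count=262144)
docker run -d -m 3g -e ES_JAVA_OPTS="-Xms2g -Xmx2g" elasticsearch:5.2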

None of this applies to your local development environment, though. Containerize your database locally and you will save a lot of time and effort, and you will be able to replicate the production operating system: native Postgres on OS X or Windows is not 100% compatible with the Linux build. Run a container on your host OS instead of installing a package and you sidestep that problem.
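A minimal local-development sketch; the container name, password, and version tag are arbitrary examples.

# A disposable local Postgres that matches the Linux build used in production
docker run -d --name dev-pg -p 5432:5432 -e POSTGRES_PASSWORD=devpass postgres:9.6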

 

Conclusion

 

The Docker hype should cool down someday. That doesn't mean people will stop using container virtualization; it means we need to put the actual value of containerization first when designing systems.

A few days ago I saw a talk on how frameworks survive in the messy Ruby world. The idea I took away from it is the technology hype cycle. In the hype cycle's terms, Docker has been stuck in the second phase, the Peak of Inflated Expectations, for too long (High Availability Architecture editor: see Resource 1). Things will normalize once Docker reaches the final phase. I think we are all accountable for this process and should help speed it up.

 

Reference resources

 

  1. https://www.youtube.com/watch?v=9zc4DSTRGeM#7m40s

  2. Original English: https://myopsblog.wordpress.com/2017/02/06/why-databases-is-not-for-containers/

 
