Talking about the two persistence mechanisms RDB and AOF in Redis

Redis persistence

Redis is an in-memory database. If the database state in memory is not saved to disk, then once the server process exits, the database state in the server will also disappear. So Redis provides persistence functions, which are RDB (Redis DataBase) and AOF (Append Only File).

1. Persistence process

Since Redis can store its data on disk, what does that process look like?

There are five steps:

(1) The client sends a write operation to the server (the data is in the memory of the client).

(2) The database server receives the data of the write request (the data is in the memory of the server).

(3) The server calls the write system call to write the data to the disk (the data is in the buffer of the system memory).

(4) The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).

(5) The disk controller writes the data to the physical medium of the disk (the data actually falls on the disk).

These five steps describe a normal write under ideal conditions. In practice, machines fail in various ways. Consider two cases:

(1) If the Redis process fails, the data is still persisted as long as step 3 has completed; the operating system carries out the remaining two steps for us.

(2) If the operating system fails, the data is only safe once all five steps have completed.
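The difference between the two failure cases can be sketched in Python. This is a minimal illustration, not how Redis itself is written: the function name durable_append and the file name demo.log are made up for the example, and the payload is small enough that a single write call is assumed to accept all of it.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> None:
    """Append payload and wait until the kernel reports it is on the device."""
    # Steps 1-2 are assumed done: `payload` is already in this process's memory.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Step 3: write() copies the bytes into the kernel's buffer.
        # If only this process crashes afterwards, the kernel still has them.
        os.write(fd, payload)
        # Steps 4-5: fsync() asks the kernel to push the buffered bytes to the
        # disk and blocks until the device reports completion.
        os.fsync(fd)
    finally:
        os.close(fd)


path = os.path.join(tempfile.mkdtemp(), "demo.log")
durable_append(path, b"SET key value\n")
with open(path, "rb") as f:
    contents = f.read()
```

Without the fsync call the function would stop at step 3, which is enough to survive a process crash but not an operating system crash.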

Only failures during the save process are considered here. Saved data can also become corrupted, which calls for a recovery mechanism, but that is not covered here. The main question now is how Redis implements the five steps above to persist data to disk. It provides two mechanisms: RDB and AOF.

After Redis is installed, all of its settings live in the redis.conf file, including the options for both persistence mechanisms, RDB and AOF.

2. RDB mechanism

What is RDB

RDB writes a snapshot of the in-memory data set to disk at specified time intervals, and restores by reading the snapshot file straight back into memory. What is a snapshot? You can think of it as taking a picture of the data at the current moment and saving it.

Redis creates (forks) a child process to do the persistence work. The child first writes the data to a temporary file; when it finishes, the temporary file replaces the previously persisted file. Throughout the process the main process performs no disk IO, which keeps performance very high. If you need to restore large data sets and are not very sensitive to the completeness of the restored data, RDB is more efficient than AOF. The disadvantage of RDB is that data written after the last snapshot may be lost.

RDB writes the data in memory to a binary file as a snapshot, saved as dump.rdb by default.
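The "write a temporary file, then replace the old one" step can be sketched as follows. This is only an illustration of the pattern: JSON stands in for the real binary RDB format, the fork is left out, and the names save_snapshot and load_snapshot are hypothetical.

```python
import json
import os
import tempfile


def save_snapshot(data: dict, directory: str, filename: str = "dump.rdb") -> str:
    """Write data to a temp file, then swap it in as the current snapshot."""
    final_path = os.path.join(directory, filename)
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(json.dumps(data, sort_keys=True).encode("utf-8"))
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old snapshot or the new one, never a partial file.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def load_snapshot(path: str) -> dict:
    """Read a snapshot written by save_snapshot back into memory."""
    with open(path, "rb") as f:
        return json.loads(f.read().decode("utf-8"))


directory = tempfile.mkdtemp()
save_snapshot({"a": 1}, directory)
final = save_snapshot({"a": 2, "b": 3}, directory)
restored = load_snapshot(final)
```

If the save is interrupted before the replace, the previous snapshot is left untouched, which is the point of going through a temporary file.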

Manual trigger

1. save command trigger

Blocks the current Redis server until the RDB process completes. For instances with a large amount of memory this causes long blocking, so it is not recommended in production environments.

2. bgsave trigger mode

When this command is executed, Redis performs the snapshot asynchronously in the background and can still respond to client requests while the snapshot is taken. The specific process is as follows:

When this command is executed, the Redis process forks a child process. The child process is responsible for the RDB persistence work and exits automatically when it is done. Blocking occurs only during the fork phase, which is usually short.
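The effect of the fork can be seen in a small Python sketch. It assumes a Unix-like system (os.fork is not available on Windows), uses JSON in place of the RDB format, and the function name bgsave is borrowed from the command purely for illustration.

```python
import json
import os
import tempfile


def bgsave(data: dict, path: str) -> int:
    """Fork a child that writes `data` as it looked at fork time; return its pid."""
    pid = os.fork()
    if pid == 0:
        # Child: it works on its own copy of the parent's memory as of the fork.
        status = 1
        try:
            with open(path, "w", encoding="utf-8") as f:
                json.dump(data, f, sort_keys=True)
            status = 0
        finally:
            os._exit(status)  # exit immediately, skipping the parent's cleanup
    return pid


store = {"a": 1}
snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
child = bgsave(store, snapshot_path)
store["b"] = 2  # the parent keeps accepting writes while the child saves
_, wait_status = os.waitpid(child, 0)
with open(snapshot_path, encoding="utf-8") as f:
    saved = json.load(f)
```

The write to store["b"] happens after the fork, so it does not appear in the snapshot: the file reflects the data at the moment the child was created.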

Auto trigger

Automatic triggering is controlled by the configuration file. The redis.conf file contains the following settings, which we can adjust:

①save: Configures the conditions that trigger RDB persistence, that is, when the data in memory is saved to disk. For example, "save m n" means that bgsave is triggered automatically when the data set has been modified n times within m seconds.

The default configuration is as follows:

# If at least 1 key changes within 900 seconds, bgsave is triggered automatically
save 900 1
# If at least 10 keys change within 300 seconds, bgsave is triggered automatically
save 300 10
# If at least 10000 keys change within 60 seconds, bgsave is triggered automatically
save 60 10000

If you do not need persistence, comment out all save lines to disable the save function.
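One way to read the save rules is as a list of (seconds, changes) pairs that is checked periodically. The sketch below follows that reading; the function name should_bgsave and its parameters are made up for the example, and the strict "more than m seconds since the last save" comparison is an assumption about how the check is performed.

```python
from typing import List, Tuple

SavePoint = Tuple[int, int]  # (seconds, changes), as in "save m n"

DEFAULT_SAVE_POINTS: List[SavePoint] = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(
    save_points: List[SavePoint],
    changes_since_save: int,
    seconds_since_save: float,
) -> bool:
    """Return True if any save point is satisfied."""
    return any(
        changes_since_save >= changes and seconds_since_save > seconds
        for seconds, changes in save_points
    )
```

With the default rules, a single change triggers a save only after 900 seconds, while 10000 changes trigger one after 60 seconds. An empty list, which corresponds to commenting out every save line, never triggers.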

②stop-writes-on-bgsave-error: The default value is yes. It controls whether Redis stops accepting writes when RDB is enabled and the most recent background save failed. Refusing writes makes users aware that data is not being persisted to disk properly; otherwise a disaster could go unnoticed. Once a background save succeeds again, Redis automatically accepts writes again.

③rdbcompression: The default value is yes. Controls whether snapshots stored on disk are compressed.

④rdbchecksum: The default value is yes. After storing the snapshot, Redis can verify the data with the CRC64 algorithm, but this costs about 10% in performance. If you want maximum performance, you can turn this feature off.

⑤dbfilename: Sets the file name of the snapshot. The default is dump.rdb.

⑥dir: Set the storage path of the snapshot file. This configuration item must be a directory, not a file name.

Advantages of RDB:

  1. With this method, the entire Redis database is contained in a single file, which is ideal for backups. For example, you might archive a snapshot every hour for the last 24 hours and a snapshot every day for the last 30 days. With such a backup strategy, recovery after a catastrophic failure is straightforward.
  2. Maximize performance. When persistence starts, the only thing the Redis service process has to do is fork a child process. The child then does the persistence work, so the service process largely avoids performing IO operations itself.
  3. Compared with the AOF mechanism, if the data set is large, the startup efficiency of RDB will be higher.

Disadvantages of RDB:

  1. RDB cannot provide real-time or per-second persistence. Every bgsave run has to fork a child process, which is a heavyweight operation, so running it frequently is too costly.
  2. Data loss is likely. If the system goes down before the next scheduled save, everything written since the last snapshot is lost.
  3. RDB files are saved in a specific binary format. The format has gone through several versions as Redis evolved, so older versions of Redis may be unable to read RDB files written by newer versions.

3. AOF mechanism

A full backup is always time-consuming, so Redis also provides a more efficient approach: AOF. The working mechanism is simple: Redis appends every write command it receives to a file through the write function, and when it restarts it re-executes the commands in the AOF file to restore the data. Put simply, it is logging.

Persistence principle

The principle is as follows: whenever a write command arrives, it is appended directly to the AOF file.
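The append-then-replay idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: it stores one space-separated command per line rather than the format Redis really uses, it supports only SET and DEL, and all function names are hypothetical.

```python
import os
import tempfile


def apply_command(store: dict, command: list) -> None:
    """Apply one write command to the in-memory store."""
    name, *args = command
    if name == "SET":
        store[args[0]] = args[1]
    elif name == "DEL":
        store.pop(args[0], None)
    else:
        raise ValueError(f"unsupported command: {name}")


def append_command(log_path: str, command: list) -> None:
    """Append one write command to the end of the log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(" ".join(command) + "\n")


def replay(log_path: str) -> dict:
    """Rebuild the store by re-executing every logged command in order."""
    store: dict = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if line.strip():
                apply_command(store, line.split())
    return store


log_path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
live_store: dict = {}
for cmd in (["SET", "a", "1"], ["SET", "b", "2"], ["SET", "a", "3"], ["DEL", "b"]):
    apply_command(live_store, cmd)
    append_command(log_path, cmd)

recovered = replay(log_path)
```

After a restart, replaying the log yields the same state the live store had, even though the log holds four commands and the final data is a single key.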

Use AOF

To enable AOF, set appendonly yes in the configuration; it is disabled by default. The AOF file name is set through appendfilename and defaults to appendonly.aof. The save path is the same as for RDB persistence and is specified by the dir setting. The AOF workflow consists of: command write (append), file synchronization (sync), file rewrite (rewrite), and load on restart (load).

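# Whether to use append-only mode for persistence. RDB is the default, and it is sufficient for many applications.
appendonly no

# Name of the AOF file.
appendfilename "appendonly.aof"

# fsync policy for AOF persistence:
#   no: never call fsync; the operating system decides when data reaches the disk. Fastest.
#   always: call fsync on every write to guarantee the data is synced to disk.
#   everysec: call fsync once per second; up to about 1 second of data may be lost.
appendfsync everysec

# Whether to skip fsync while a rewrite is running. Keep the default, no, for data safety.
no-appendfsync-on-rewrite no

# Minimum AOF file size before an automatic rewrite is considered (default 64mb).
auto-aof-rewrite-min-size 64mb

# Growth relative to the size after the last rewrite, in percent, that triggers an automatic rewrite (default 100).
auto-aof-rewrite-percentage 100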
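The three appendfsync policies differ only in when fsync is called after a write. The sketch below models that decision; it is not how Redis schedules the call internally (this version checks the clock on each append instead of using a background task), and the class AofWriter and helper count_fsyncs are invented for the example.

```python
import os
import tempfile
import time


class AofWriter:
    """Append records to a file and fsync according to a policy."""

    def __init__(self, path: str, policy: str = "everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.last_sync = clock()
        self.fsync_calls = 0

    def append(self, record: bytes) -> None:
        os.write(self.fd, record)  # data reaches the kernel buffer
        if self.policy == "always":
            self._sync()
        elif self.policy == "everysec" and self.clock() - self.last_sync >= 1.0:
            self._sync()
        # policy "no": leave flushing entirely to the operating system

    def _sync(self) -> None:
        os.fsync(self.fd)
        self.last_sync = self.clock()
        self.fsync_calls += 1

    def close(self) -> None:
        os.close(self.fd)


def count_fsyncs(policy: str, timestamps: list) -> int:
    """Append one record at each fake timestamp; return the number of fsyncs."""
    now = [0.0]
    path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
    writer = AofWriter(path, policy, clock=lambda: now[0])
    for t in timestamps:
        now[0] = t
        writer.append(b"SET k v\n")
    writer.close()
    return writer.fsync_calls
```

For five writes spread over about two seconds, always syncs five times, everysec syncs roughly once per elapsed second, and no never syncs on its own.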

File rewriting

The AOF approach brings another problem: the persistence file keeps growing. To compact the AOF file, Redis provides the bgrewriteaof command. Redis forks a new process that saves the data in memory to a temporary file in the form of commands, rewriting the file.

Rewrite AOF files regularly to achieve compression.

The rewritten AOF file becomes smaller due to the following reasons:

1) Keys that have already expired in memory are not written to the new file.

2) The old AOF file contains commands that no longer have any effect, such as del key1, hdel key2, srem keys, set a 111, set a 222. The rewrite is generated directly from the data in memory, so the new AOF file keeps only the write commands needed for the final data.

3) Multiple write commands can be merged into one. For example, lpush list a, lpush list b, lpush list c can be turned into lpush list a b c. To keep a single command from becoming too large and overflowing the client buffer, operations on list, set, hash, and zset types are split into multiple commands of at most 64 elements each.

AOF rewriting reduces the space the file occupies. It has another purpose as well: Redis can load a smaller AOF file faster.
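The reasons above can be illustrated by generating commands from the final in-memory state rather than from the old log. This sketch makes several assumptions: the store holds only strings and lists, lists are emitted with RPUSH in their stored order, and the name rewrite_commands is hypothetical. The limit of 64 elements per command comes from the text above.

```python
from typing import Dict, List, Union

ITEMS_PER_COMMAND = 64  # maximum elements per generated command

Value = Union[str, List[str]]


def rewrite_commands(store: Dict[str, Value]) -> List[List[str]]:
    """Produce the minimal command list that recreates `store`."""
    commands: List[List[str]] = []
    for key, value in store.items():
        if isinstance(value, list):
            # Split long lists into chunks of at most ITEMS_PER_COMMAND elements.
            for start in range(0, len(value), ITEMS_PER_COMMAND):
                chunk = value[start:start + ITEMS_PER_COMMAND]
                commands.append(["RPUSH", key, *chunk])
        else:
            # Only the final value matters; earlier SETs are never replayed.
            commands.append(["SET", key, value])
    return commands


store: Dict[str, Value] = {"a": "222", "list": [str(i) for i in range(130)]}
commands = rewrite_commands(store)
```

However many times key a was overwritten, the rewritten log holds one SET for it, and the 130-element list becomes three commands of 64, 64, and 2 elements.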

The AOF rewriting process can be triggered manually and automatically:

Manual trigger: directly call the bgrewriteaof command.

Automatic trigger: the timing is determined by the auto-aof-rewrite-min-size and auto-aof-rewrite-percentage parameters.
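How the two parameters combine can be sketched as a single check. This reflects one understanding of the rule and should be treated as an assumption: the file must be larger than the minimum size, and it must have grown by at least the configured percentage relative to its size after the last rewrite. The function and parameter names are made up.

```python
MB = 1024 * 1024


def should_rewrite(
    current_size: int,
    base_size: int,
    min_size: int = 64 * MB,
    percentage: int = 100,
) -> bool:
    """Decide whether the AOF has grown enough to be rewritten automatically."""
    if percentage <= 0:
        return False  # a percentage of 0 disables automatic rewriting
    if current_size <= min_size:
        return False  # small files are never rewritten automatically
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100  # growth in percent
    return growth >= percentage
```

With the defaults, a file that was 64 MB after the last rewrite is rewritten again once it reaches 128 MB, while a 32 MB file is left alone no matter how fast it grew.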

Advantages of AOF

  1. This mechanism offers higher data safety, that is, durability. Redis provides three sync policies: sync every second (everysec), sync on every modification (always), and no explicit sync (no). Sync every second is performed asynchronously and is very efficient; the trade-off is that if the system goes down, modifications made within the last second can be lost. Sync on every modification can be regarded as synchronous persistence: every data change is recorded to disk immediately, which predictably makes it the least efficient policy. With no explicit sync, the operating system decides when data reaches the disk.

  2. Because the log file is written in append mode, a crash during writing does not destroy the content already in the log file. If the system crashes when only half of an operation has been written, there is no need to worry: before the next Redis start, the redis-check-aof tool can be used to fix the data consistency problem.

  3. If the log grows too large, Redis can automatically start the rewrite mechanism. While the rewrite runs, Redis keeps appending modifications to the old file in append mode, and at the same time creates a new file that records the modification commands executed during this period. Data safety is therefore preserved during the switch to the rewritten file.

  4. AOF is a clearly formatted, easy-to-understand log file that records all modification operations. We can also use this file to reconstruct the data.
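Points 2 and 4 can be illustrated together. In the Redis protocol encoding, each command is written as an array header followed by length-prefixed arguments, so a reader can tell where the last complete command ends. The sketch below is not how redis-check-aof is implemented; the function names are hypothetical and the parser assumes well-formed headers.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as an array of length-prefixed strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))
    return b"".join(parts)


def _read_line(buf: bytes, pos: int):
    """Return (line, next_pos) for the CRLF-terminated line at pos, or None."""
    end = buf.find(b"\r\n", pos)
    if end == -1:
        return None
    return buf[pos:end], end + 2


def complete_prefix_length(buf: bytes) -> int:
    """Return how many leading bytes of buf consist of whole commands."""
    pos = 0
    while pos < len(buf):
        header = _read_line(buf, pos)
        if header is None or not header[0].startswith(b"*"):
            break
        cursor = header[1]
        complete = True
        for _ in range(int(header[0][1:])):
            size_line = _read_line(buf, cursor)
            if size_line is None or not size_line[0].startswith(b"$"):
                complete = False
                break
            end = size_line[1] + int(size_line[0][1:]) + 2
            if end > len(buf) or buf[end - 2:end] != b"\r\n":
                complete = False
                break
            cursor = end
        if not complete:
            break
        pos = cursor  # this command is whole; move on to the next one
    return pos


first = encode_command("SET", "a", "1")
log = first + encode_command("SET", "b", "2")
truncated = log[:-5]  # simulate a crash in the middle of the last command
```

Cutting the file at the returned length drops the half-written command and keeps everything before it, which is the kind of repair the text describes.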

Disadvantages of AOF

  1. For data sets of the same size, AOF files are usually larger than RDB files, and RDB restores large data sets faster than AOF.

  2. Depending on the sync policy, AOF tends to run slower than RDB. In short, syncing every second is relatively efficient, and with syncing disabled AOF is about as efficient as RDB.


Origin blog.csdn.net/qq_43458555/article/details/108275310