Interviewer: Tell me about the design and implementation of distributed locks

Today I'll walk through the design and implementation of distributed locks. I hope it's helpful; if anything is wrong, please point it out. Let's learn and make progress together~

  • Overview of distributed locks

  • Database distributed lock

  • Redis distributed lock

  • Zookeeper distributed lock

  • Comparison of three distributed locks

1. Overview of distributed locks

Our systems are deployed in a distributed manner. In daily development, business scenarios such as flash-sale orders and limited-stock purchases need distributed locks to prevent inventory from being oversold.

A distributed lock is actually an implementation of a lock that controls different processes in a distributed system to access shared resources. If different systems or different hosts of the same system share a certain critical resource, mutual exclusion is often required to prevent mutual interference and ensure consistency.

There are generally three popular ways to implement a distributed lock in industry:

  • Distributed lock based on database implementation

  • Distributed lock based on Redis

  • Distributed lock based on Zookeeper

2. Database-based distributed lock

2.1 Distributed locks implemented by database pessimistic locks

A distributed lock can be implemented with select ... for update. Our own project's distributed scheduled tasks use a similar scheme. Let me show you a simplified version.

The table structure is as follows:

CREATE TABLE `t_resource_lock` (
  `key_resource` varchar(45) COLLATE utf8_bin NOT NULL COMMENT 'resource key (primary key)',
  `status` char(1) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'S: success, F: failure, P: processing',
  `lock_flag` int(10) unsigned NOT NULL DEFAULT '0' COMMENT '1 = locked, 0 = unlocked',
  `begin_time` datetime DEFAULT NULL COMMENT 'start time',
  `end_time` datetime DEFAULT NULL COMMENT 'end time',
  `client_ip` varchar(45) COLLATE utf8_bin NOT NULL COMMENT 'IP of the client holding the lock',
  `time` int(10) unsigned NOT NULL DEFAULT '60' COMMENT 'within this window only one node may acquire the lock once, in minutes',
  PRIMARY KEY (`key_resource`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

The pseudocode of the lock method is as follows:

@Transactional // a transaction is required
public boolean lock(String keyResource, int time){
   resourceLock = 'select * from t_resource_lock where key_resource = #{keyResource} for update';

   if(resourceLock == null){
      try{
         // no record yet: insert a new lock record
         resourceLock = new ResourceLock();
         resourceLock.setKeyResource(keyResource);
         resourceLock.setTime(time);
         resourceLock.setLockFlag(1);     // locked
         resourceLock.setStatus("P");     // processing
         resourceLock.setBeginTime(new Date());
         int count = "insert into t_resource_lock ...";
         return count == 1;               // lock acquired iff the insert succeeded
      }catch(Exception e){
         // a concurrent insert on the same primary key fails here
         return false;
      }
   }

   // a record exists: the lock may be acquired only if it is unlocked,
   // the last run finished successfully, and its time window has expired
   Date expireAt = addMinutes(resourceLock.getBeginTime(), time);
   if(resourceLock.getLockFlag() == 0 && "S".equals(resourceLock.getStatus())
      && new Date().after(expireAt)){
      resourceLock.setLockFlag(1);     // locked
      resourceLock.setStatus("P");     // processing
      resourceLock.setBeginTime(new Date());
      // update t_resource_lock ...
      return true;
   }
   // still locked, timed out without finishing, or not yet expired: acquisition fails
   return false;
}

The pseudocode of the unlock method is as follows:

public void unlock(String keyResource, String status){
      resourceLock.setLockFlag(0);      // unlock
      resourceLock.setStatus(status);   // S: success, F: failure
      // update t_resource_lock ...
      return;
}

Overall process:

try{
   if(lock(keyResource, time)){   // acquire the lock
      status = process();         // your business logic
   }
} finally{
   unlock(keyResource, status);   // release the lock
}

The overall flow of the pessimistic-lock implementation is fairly clear: select ... for update locks the record whose primary key is key_resource. If no record exists, insert one. If a record already exists, check its status and time to decide whether it has timed out. Note that the method must run inside a transaction.

2.2 Distributed locks implemented by database optimistic locks

In addition to pessimistic locks, optimistic locks can also implement distributed locks. An optimistic lock, as the name suggests, is optimistic: every update assumes there will be no concurrent conflict, and it only retries after an update fails. It is based on the CAS (compare-and-swap) idea. My previous company used this approach to deduct balances.

Add a version field that is incremented by one on every update. When updating the balance, carry the version number you read and update conditionally: if the stored version still equals it, the update succeeds; otherwise someone else modified the row concurrently, so retry.

The approximate process is as follows:

  1. Query version number and balance

select version,balance from account where user_id ='666';

Assume that the found version number is oldVersion=1.

  2. Process the logic and check the balance

if(balance < deductAmount){
   return;
}

left_balance = balance - deductAmount;

  3. Deduct the balance

update account set balance = #{left_balance} ,version = version+1 where version 
= #{oldVersion} and balance>= #{left_balance} and user_id ='666';


This approach suits low-concurrency scenarios; you generally also need to set a retry count.
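The version-based compare-and-set above can be sketched in plain Java against an in-memory stand-in for the account row; the class, method, and field names here are illustrative, not from any real library:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative in-memory stand-in for the account row (version + balance).
public class OptimisticDeduct {
    static class Row {
        final int version;
        final long balance;
        Row(int version, long balance) { this.version = version; this.balance = balance; }
    }

    private final AtomicReference<Row> row;

    public OptimisticDeduct(long initialBalance) {
        this.row = new AtomicReference<>(new Row(1, initialBalance));
    }

    // Mirrors: UPDATE account SET balance = ?, version = version + 1
    //          WHERE version = ? AND balance >= amount
    public boolean deduct(long amount, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            Row current = row.get();                    // SELECT version, balance
            if (current.balance < amount) return false; // insufficient balance
            Row next = new Row(current.version + 1, current.balance - amount);
            if (row.compareAndSet(current, next)) {     // conditional UPDATE
                return true;
            }
            // version changed concurrently -> loop and retry
        }
        return false; // gave up after maxRetries
    }

    public long balance() { return row.get().balance; }
    public int version() { return row.get().version; }
}
```

A real implementation would issue the conditional UPDATE against the database; AtomicReference.compareAndSet plays the role of the WHERE version = ? clause here.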

3. Distributed lock based on Redis

Redis distributed locks generally have the following implementation methods:

  • setnx + expire

  • setnx, with the value set to the expiration time

  • Extended commands for set (set ex px nx)

  • set ex px nx + a unique random value, checked before deletion

  • Redisson

  • Redisson + RedLock

3.1 setnx + expire

When it comes to Redis distributed locks, many people reach straight for setnx + expire, like this:

if(jedis.setnx(key, lock_value) == 1){ // setnx to acquire the lock
    expire(key, 100);                  // set the expiration time
    try {
        do something                   // business logic
    }catch(){
    }finally {
        jedis.del(key);                // release the lock
    }
}

This code can acquire the lock successfully, but do you see the problem? Acquiring the lock and setting the timeout are two separate steps. Suppose that after setnx executes, the process crashes or is restarted for maintenance just before expire runs: the lock then never expires, other threads can never acquire it, and so a distributed lock cannot be implemented this way!

3.2 setnx + value is the expiration time

long expires = System.currentTimeMillis() + expireTime; // current time + configured expiry
String expiresStr = String.valueOf(expires);

// if the lock does not exist yet, acquisition succeeds
if (jedis.setnx(key, expiresStr) == 1) {
    return true;
}

// the lock already exists: read its expiration time
String currentValueStr = jedis.get(key);

// if the stored expiration time is earlier than now, the lock has expired
if (currentValueStr != null && Long.parseLong(currentValueStr) < System.currentTimeMillis()) {

    // lock expired: GETSET the new expiry and read back the previous value
    // (see the Redis docs for GETSET if you are not familiar with it)
    String oldValueStr = jedis.getSet(key, expiresStr);

    if (oldValueStr != null && oldValueStr.equals(currentValueStr)) {
        // with concurrent threads, only the one whose old value matches
        // what it previously read may consider the lock acquired
        return true;
    }
}

// in all other cases, acquisition fails
return false;

In daily development, some developers implement distributed locks this way, but it has several drawbacks:

  • The expiration time is generated by the client itself, so in a distributed environment every client's clock must be synchronized.

  • The holder's unique identifier is not stored, so the lock may be released/unlocked by another client.

  • When the lock expires and several clients call jedis.getSet() concurrently, only one of them actually acquires the lock, but its expiration time may be overwritten by the others.
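The third drawback can be seen with a tiny in-memory stand-in for GETSET (the store and names below are illustrative, not a real Redis client): two clients that both observed the expired value race on getSet, one of them "wins", yet its expiration time ends up overwritten.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory stand-in for Redis GETSET, used only to illustrate
// the expiry-overwrite race. Not a real Redis client.
public class GetSetRace {
    private final Map<String, String> store = new HashMap<>();

    // Like GETSET: store the new value, return the previous one.
    public String getSet(String key, String value) {
        return store.put(key, value);
    }

    public String get(String key) {
        return store.get(key);
    }
}
```

If both clients read the expired value "100" and then call getSet in turn, client A gets back "100" (matching what it read) and treats the lock as acquired, while client B gets back A's value and backs off; yet B's write is what remains in the store, silently replacing A's deadline.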

3.3 Extended command of set (set ex px nx)

What do this command's parameters mean? A quick review:

SET key value [EX seconds] [PX milliseconds] [NX|XX]

  • EX seconds: set the key's expiration time, in seconds.

  • PX milliseconds: set the key's expiration time, in milliseconds.

  • NX: only set the key if it does not already exist.

  • XX: only set the key if it already exists.

if("OK".equals(jedis.set(key, lock_value, "NX", "EX", 100))){ // acquire the lock
    try {
        do something  // business logic
    }catch(){
    }finally {
        jedis.del(key); // release the lock
    }
}

There may be problems with this solution:

  • The lock expired and was released, and the business has not been executed yet.

  • The lock was accidentally deleted by another thread.

Some readers may wonder: why would the lock be deleted by another thread? In a concurrent multi-threaded scenario, if thread A holds the lock and has not released it, thread B cannot acquire it, so B should never reach the code guarded by the lock. How, then, can the lock be deleted by mistake?

Suppose threads A and B both contend for the same key. A grabs the lock first, but its business logic runs longer than the configured 100-second timeout, so Redis automatically releases the key. Thread B can now lock successfully and starts executing its own business logic. Then A finishes its work and releases the lock, but the lock it deletes is B's.

3.4 set ex px nx + check unique random value, then delete

To prevent the lock from being deleted by another thread, you can add a unique random value on top of set ex px nx and verify it before deleting, as follows:

if("OK".equals(jedis.set(key, uni_request_id, "NX", "EX", 100))){ // acquire the lock
    try {
        do something  // business logic
    }catch(){
    }finally {
        // release the lock only if it was set by the current thread
        if (uni_request_id.equals(jedis.get(key))) {
            jedis.del(key); // release the lock
        }
    }
}

Here, checking that the lock belongs to the current thread and then releasing it are not atomic. Between the check and the jedis.del() call, the lock may have expired and been taken by another client, so you would release someone else's lock.

Generally this is wrapped in a Lua script, which Redis executes atomically:

if redis.call('get',KEYS[1]) == ARGV[1] then 
   return redis.call('del',KEYS[1]) 
else
   return 0
end;
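For intuition, plain Java's ConcurrentHashMap.remove(key, value) offers the same "delete only if the value still matches" atomicity that the Lua script provides on Redis; this sketch is an analogy, not Redis code:

```java
import java.util.concurrent.ConcurrentHashMap;

// Analogy to the Lua script above: release the lock only if the stored
// token still equals the caller's token, as one atomic step.
public class CompareAndDelete {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Like SET key token NX: acquire only if no one holds the lock.
    public boolean tryLock(String key, String token) {
        return store.putIfAbsent(key, token) == null;
    }

    // Like the GET+DEL Lua script: remove(key, value) is atomic, so the
    // check and the delete cannot be interleaved by another thread.
    public boolean unlock(String key, String token) {
        return store.remove(key, token);
    }
}
```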

The unique-value-plus-atomic-delete approach is quite good and is usable in most circumstances. But one problem remains: the lock may expire and be released before the business logic finishes.

3.5 Redisson

For the problem of the lock expiring before the business finishes, we can set the lock's expiration time somewhat longer than the normal business processing time. If that still feels fragile, the thread that acquires the lock can also start a daemon thread that periodically checks whether the lock still exists and, if so, extends its expiration time so the lock is not released early.

The open-source framework Redisson solves exactly this problem. Its underlying mechanism works roughly like this:

Once a thread locks successfully, Redisson starts a watch dog: a background thread that checks every 10 seconds whether thread 1 still holds the lock, and if so, keeps extending the lifetime of the lock key. This is how Redisson's watch dog solves the problem of the lock expiring before the business finishes.
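The renewal step the watch dog performs can be sketched deterministically in plain Java (times are passed in explicitly; the class and the renewal rule here are assumptions modeled on the description above, not Redisson's actual source):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the watch dog's renewal step: push the lock's deadline
// forward only while the original holder still owns it.
public class WatchdogSketch {
    private final ConcurrentHashMap<String, String> owner = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Long> deadline = new ConcurrentHashMap<>();

    public boolean lock(String key, String holder, long nowMs, long ttlMs) {
        if (owner.putIfAbsent(key, holder) != null) return false;
        deadline.put(key, nowMs + ttlMs);
        return true;
    }

    // What the periodic check does: if (and only if) the holder still
    // owns the lock, reset the deadline to now + ttl.
    public boolean renew(String key, String holder, long nowMs, long ttlMs) {
        if (!holder.equals(owner.get(key))) return false;
        deadline.put(key, nowMs + ttlMs);
        return true;
    }

    public boolean isExpired(String key, long nowMs) {
        Long d = deadline.get(key);
        return d == null || d <= nowMs;
    }
}
```

In Redisson the renewal runs on a real timer while the holding thread is alive; this sketch only isolates the compare-then-extend logic.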

3.6 Redisson + RedLock

The solutions discussed so far are based on a single-node Redis, which is still not perfect, because Redis is generally deployed as a cluster:

Suppose thread 1 acquires the lock on the master node, but the lock key has not yet been synchronized to the slave node. Just at that moment the master fails and a slave is promoted to master. Thread 2 can then acquire a lock on the same key, yet thread 1 already holds it: the lock's safety is gone.

In order to solve this problem, Redis author antirez proposed an advanced distributed lock algorithm: Redlock . Its core idea is this:

Deploy multiple Redis masters to ensure they do not all go down at the same time. These master nodes are completely independent, with no data replication between them. Locks are acquired and released on each master instance using the same method as on a single Redis instance.

We assume that there are currently 5 Redis master nodes, and these Redis instances are running on 5 servers.

Implementation steps of RedLock:

  1. Get the current time in milliseconds.

  2. Request the lock from the 5 master nodes in sequence. The client sets a network connection and response timeout, which should be much smaller than the lock's expiration time (if the lock auto-expires after 10 seconds, the timeout is typically 5-50 ms; assume 50 ms here). If a node times out, skip it and try the next master as soon as possible.

  3. The client subtracts the start time (recorded in step 1) from the current time to get the time spent acquiring the lock. The lock is considered acquired if and only if a majority of the Redis masters (N/2+1, here 5/2+1 = 3 nodes) granted it and the time spent is less than the lock's expiration time (e.g. 10 s > 30 ms + 40 ms + 50 ms + 40 ms + 50 ms).

  4. If the lock is acquired, the key's real validity time is the original expiration time minus the time spent acquiring the lock.

  5. If acquisition fails (fewer than N/2+1 masters granted the lock, or the acquisition time exceeded the validity time), the client must unlock on all masters, including those where locking did not succeed, so that no stale lock slips through.

The simplified steps are:

  • Request locks from 5 master nodes in sequence

  • Based on the configured timeout, decide whether to skip a master node.

  • If 3 or more nodes lock successfully, and the time spent is less than the lock's validity period, the lock is acquired.

  • If acquiring the lock fails, unlock it!
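The success check in the steps above boils down to simple arithmetic: a quorum of N/2+1 nodes and a total acquisition time below the lock's TTL. A minimal sketch (method names are illustrative):

```java
// Sketch of RedLock's success check (steps 3-4 above):
// quorum of N/2+1 nodes AND total acquisition time below the lock TTL.
public class RedLockCheck {
    public static boolean acquired(int totalNodes, int lockedNodes,
                                   long elapsedMs, long ttlMs) {
        int quorum = totalNodes / 2 + 1; // 5 nodes -> 3
        return lockedNodes >= quorum && elapsedMs < ttlMs;
    }

    // Step 4: the key's real validity is the TTL minus the time spent acquiring.
    public static long remainingValidityMs(long ttlMs, long elapsedMs) {
        return ttlMs - elapsedMs;
    }
}
```

With the numbers from the example, 3 of 5 nodes and 30+40+50+40+50 = 210 ms of acquisition time against a 10 s TTL counts as success, leaving about 9.79 s of real validity.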

Redisson implements a RedLock version of the lock. Interested readers can go check it out~

4. Zookeeper distributed lock

Before learning about Zookeeper distributed locks, let's review Zookeeper's nodes.

Zookeeper's node Znode has four types:

  • Persistent Node : The default node type. After the client that created the node disconnects from zookeeper, the node still exists.

  • Persistent sequential node : a sequential node is one whose name Zookeeper numbers according to the order of creation; a persistent sequential node is a persistent node with such ordering.

  • Temporary node : Contrary to the persistent node, when the client that created the node disconnects from zookeeper, the temporary node will be deleted.

  • Temporary sequential node : a temporary node with sequential numbering.

Zookeeper's distributed lock implementation uses temporary sequential nodes. The code is not shown here; let's talk about the implementation principle of the zk distributed lock.

4.1 zk acquisition lock process

When the first client requests, the Zookeeper client creates a persistent node locks. If Client1 wants to acquire the lock, it creates a temporary sequential node lock1 under the locks node.

Next, Client1 fetches all the temporary sequential child nodes under locks and checks whether its own node lock1 has the smallest sequence number; if so, it acquires the lock successfully.

At this time, if another client Client2 tries to acquire the lock, it creates another temporary sequential node lock2 under locks.

Client2 likewise fetches all temporary sequential child nodes under locks and checks whether its own node lock2 is the smallest. It finds that lock1 is smaller, so it fails to acquire the lock. But it does not give up: it registers a Watcher on lock1, the node ranked just before its own, to monitor whether lock1 still exists. In other words, Client2 fails to grab the lock and enters a waiting state.

At this time, if yet another client Client3 tries to acquire the lock, it creates another temporary sequential node lock3 under locks.

Similarly, Client3 fetches all temporary sequential child nodes under locks, finds that its own node lock3 is not the smallest, and fails to acquire the lock. It does not give up either: it registers a Watcher on lock2, the node just before its own, to monitor whether lock2 still exists.
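The decision each client makes, acquire if my node is the smallest, otherwise watch my predecessor, can be sketched over the sorted child list (node names like lock1 are illustrative):

```java
import java.util.Collections;
import java.util.List;

// Sketch of the ZK acquisition decision: given the children of /locks,
// either we hold the lock (smallest node) or we must set a Watcher on
// the node immediately before ours.
public class ZkLockDecision {
    // Returns null if `own` is the smallest (lock acquired),
    // otherwise the predecessor node to watch.
    public static String nodeToWatch(List<String> children, String own) {
        Collections.sort(children);           // sequence numbers sort lexically here
        int idx = children.indexOf(own);
        if (idx <= 0) return null;            // smallest -> lock acquired
        return children.get(idx - 1);         // watch only the predecessor
    }
}
```

Watching only the predecessor, rather than the smallest node, avoids the "herd effect" of waking every waiting client when one lock is released.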

4.2 Release the lock

Now let's look at releasing the lock. When the Zookeeper client's business completes, or the client fails, the temporary node is deleted and the lock is released. If the task completes, Client1 explicitly calls the delete command to remove lock1.

If the client itself fails, lock1 is deleted automatically by virtue of being a temporary node.

After lock1 is deleted, Client2 is happy, because it has been listening on lock1. It receives the notification immediately, fetches all temporary sequential child nodes under locks again, finds that lock2 is now the smallest, and acquires the lock.

In the same way, after Client2 gets the lock, Client3 has its eye on it, ahaha~

  • Zookeeper is designed for distributed coordination and is easy to use: if you can't get the lock, just add a listener. This makes it a great fit for distributed locks.

  • Zookeeper also has a drawback as a distributed lock: if many clients frequently acquire and release locks, the pressure on the Zookeeper cluster is considerable.

5. Comparison of three distributed locks

5.1 Database distributed lock implementation

advantage:

  • Simple and easy to use; no need to introduce middleware such as Redis or Zookeeper.

shortcoming:

  • Not suitable for high concurrency scenarios

  • Database operations are relatively slow.

5.2 Redis distributed lock implementation

advantage:

  • Good performance, suitable for high concurrency scenarios

  • lighter weight

  • There are better framework support, such as Redisson

shortcoming:

  • Expiration time is not easy to control

  • Need to consider the scenario where the lock is accidentally deleted by other threads

5.3 Zookeeper distributed lock implementation

advantage:

  • Good reliability

  • There are well-encapsulated frameworks, such as Curator

shortcoming:

  • Performance is not as good as a Redis-based distributed lock

  • A relatively heavyweight distributed lock.

5.4 Comparison and summary

  • From a performance perspective (from high to low): Redis > Zookeeper >= database;

  • From the perspective of difficulty of understanding (from low to high): database > Redis > Zookeeper;

  • From the perspective of implementation complexity (from high to low): Zookeeper > Redis > database;

  • From a reliability perspective (from high to low): Zookeeper > Redis > database.

Origin blog.csdn.net/Javatutouhouduan/article/details/130949569