Scenario-based solutions for distributed cluster architectures

Distributed vs. clustered

Distributed:

  • A system is split into multiple subsystems. Each subsystem is responsible for its own part of the functionality, is deployed independently, and performs its own duties

Cluster:

  • Multiple instances work together. The simplest and most common cluster is one application copied into multiple deployments
  • A distributed system is always a cluster, but a cluster is not necessarily distributed
    - A distributed system is a cluster because, once the system is split, multiple instances work together
    - A cluster is not necessarily distributed because a replicated cluster copies the application instead of splitting it

1. Consistent hashing algorithm

Hash algorithms are used everywhere: in security, algorithms such as MD5 and SHA compute message digests, and in data storage and lookup, hash tables are built on hash functions.

Hash algorithm application scenarios:

  • Request load balancing (such as nginx's ip_hash strategy)
  • Distributed storage
  • MySQL database and table sharding

Problems with ordinary hash algorithms

  • Take ip_hash as an example. Conceptually, the server is chosen as hash(client IP) % number of servers. Suppose a user's IP does not change, but one of the Tomcat servers in the cluster fails and the number of servers drops. Because the divisor changes, the previous hash results no longer hold, and the impact can be very large: most clients may be routed to a different server and have to log in again
  • Therefore, when the cluster shrinks or expands, a large number of user sessions are lost
  • We cannot eliminate this entirely, but a consistent hash algorithm minimizes the number of affected clients
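To see why changing the server count hurts, here is a minimal sketch of plain modulo routing. The class and method names are hypothetical, and String.hashCode() stands in for whatever hash the load balancer really uses; it only illustrates the hash % N idea, not nginx's actual ip_hash code.

```java
import java.util.ArrayList;
import java.util.List;

class ModuloRemapDemo {
    // Hypothetical stand-in for ip_hash: server index = hash(key) % serverCount
    static int pick(String key, int serverCount) {
        return Math.floorMod(key.hashCode(), serverCount);
    }

    // Count the keys that land on a different server when the cluster size changes
    static int countMoved(List<String> keys, int before, int after) {
        int moved = 0;
        for (String key : keys) {
            if (pick(key, before) != pick(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> clients = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            clients.add("192.168." + (i / 256) + "." + (i % 256));
        }
        // Shrink the cluster from 3 servers to 2 and see how many clients are remapped
        System.out.println(countMoved(clients, 3, 2) + " of " + clients.size() + " clients remapped");
    }
}
```

For uniformly distributed hashes, a key keeps its server only when h % 3 equals h % 2, which holds for 2 of every 6 residues, so roughly two thirds of clients move.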

Consistent Hash Algorithm

[Figure: consistent hash ring]

  • Both client and server nodes are mapped onto a hash ring. Each client request is served by the first server found by moving clockwise from the client's position. When a server is removed, only the clients between it and the previous server move to the next server; when a server is added, only the clients between the new server and the previous one are affected. The number of affected clients during shrinking and expansion is therefore greatly reduced.
  • When there are only a few servers, they may be unevenly spaced on the ring, so requests are skewed toward some of them. The virtual node mechanism addresses this by mapping each physical server to several virtual nodes on the ring; the more virtual nodes, the more evenly requests are spread.
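The ring and virtual nodes described above can be sketched with a sorted map. This is a minimal illustration under assumptions: the class name, the "server#i" virtual-node naming, and the use of the first 8 bytes of MD5 as the ring position are all choices made here, not taken from nginx or any specific library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    // Ring position (64-bit hash) -> physical server that owns that virtual node
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodesPerServer;

    ConsistentHashRing(int virtualNodesPerServer) {
        this.virtualNodesPerServer = virtualNodesPerServer;
    }

    // Ring position: first 8 bytes of an MD5 digest (an assumed choice of hash)
    static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Each physical server appears on the ring as several virtual nodes, e.g. "127.0.0.1:8080#3"
    void addServer(String server) {
        for (int i = 0; i < virtualNodesPerServer; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < virtualNodesPerServer; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    // Move clockwise to the first virtual node at or after the client's position,
    // wrapping around to the start of the ring if there is none
    String route(String clientKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(clientKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

Removing a server deletes only its own virtual nodes, so a client whose first clockwise node belonged to another server keeps that server; only the removed server's clients are redistributed.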

Using a consistent hash load-balancing strategy in nginx

The ngx_http_upstream_consistent_hash module is a load balancer that uses a consistent hash algorithm internally to select the appropriate backend node.

  • Depending on its configuration parameter, the module maps requests evenly to backend machines in different ways:
  1. consistent_hash $remote_addr: map by client IP
  2. consistent_hash $request_uri: map by the URI the client requests
  3. consistent_hash $args: map by the parameters the client sends
  • ngx_http_upstream_consistent_hash is a third-party module, so it must be downloaded and compiled into nginx before use
  1. Download the nginx consistent hash load-balancing module from GitHub: https://github.com/replay/ngx_http_consistent_hash
  2. Upload the downloaded archive to the nginx server and extract it
  3. Since nginx was compiled and installed from source, go to that nginx source directory and run the following commands
    ./configure --add-module=/root/ngx_http_consistent_hash-master
    make
    make install
    
  4. nginx is now ready to use; configure the module in the nginx.conf file
    upstream lagouServer{
    	consistent_hash $request_uri;
    	server 127.0.0.1:8080;
    	server 127.0.0.1:8082;
    }
    

2. Cluster clock synchronization problem

A cluster is a set of servers that work together, so the clock on every server must be consistent.
Otherwise data becomes inconsistent. For example, if system A's clock runs ahead of system B's, two orders actually placed at the same moment are saved with different order times.

  • Approaches to cluster clock synchronization
  1. Scenario 1: every node in the cluster can access the Internet
    # Use the ntpdate command to synchronize with a network time server
    ntpdate -u ntp.api.bz # sync time from a time server
    

    Linux also provides the crond scheduler, so you can run the ntpdate command as a scheduled task, for example every 10 minutes

  2. Scenario 2: only one node in the cluster can access the Internet, or no node can
    • Choose one server, node A, as the time server, preferably a machine that can reach the Internet
    • Configure A as the time server by editing the /etc/ntp.conf file
      1. If the file contains restrict default ignore, comment it out
      2. Add the following lines
      	restrict 172.17.0.0 mask 255.255.255.0 nomodify notrap 
      	# Allow hosts on the LAN to sync; 172.17.0.0 is your LAN subnet
      	server 127.127.1.0 # local clock
      	fudge 127.127.1.0 stratum 10
      3. Restart ntpd to apply the changes and enable it at boot
      	service ntpd restart
      	chkconfig ntpd on
      
    • The other nodes in the cluster then sync their time from server A
      ntpdate 172.17.0.17
      

3. Distributed ID solutions

UUID

  • UUID (Universally Unique Identifier)
    • Disadvantages: the generated IDs are random and unordered, so they are hard to read and carry no meaning
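The JDK's java.util.UUID is enough to illustrate both the appeal and the drawback: each node generates IDs locally with no coordination, but the values are random. The class name below is hypothetical.

```java
import java.util.UUID;

class UuidIdDemo {
    // randomUUID() returns a random (version 4) UUID; no coordination between nodes is needed
    static String nextId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Prints a 36-character value such as xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx;
        // consecutive IDs have no ordering relationship
        System.out.println(nextId());
        System.out.println(nextId());
    }
}
```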

Auto-increment ID from a standalone database

  • Create a table with an auto-increment column in a standalone database. To obtain an ID, the program inserts a row into the table and then runs select LAST_INSERT_ID(); to read back the generated value
    • Disadvantages: this only works as long as the standalone database can handle the load; it is a potential performance bottleneck and a single point of failure

Snowflake algorithm

  • The Snowflake algorithm is a distributed ID generation strategy released by Twitter. It generates a 64-bit ID that fits in a Java long.
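A minimal Snowflake-style generator, assuming the commonly described Twitter layout: 1 unused sign bit, a 41-bit millisecond timestamp, a 10-bit worker ID and a 12-bit per-millisecond sequence. The class name and the choice of Twitter's original epoch (1288834974657) are assumptions for illustration; production implementations differ in how they handle clock rollback and worker ID assignment.

```java
class SnowflakeIdGenerator {
    // Assumed epoch (Twitter's original); any fixed instant in the past works
    private static final long EPOCH = 1288834974657L;
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;   // 1023
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // The clock moved backwards: refuse instead of risking duplicate IDs
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        // timestamp | worker ID | sequence, packed from high bits to low bits
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator increase over time, which keeps database indexes ordered. This is also why Snowflake depends on the cluster clock synchronization discussed earlier.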

Obtaining a globally unique ID with the Redis INCR command

  • The Redis INCR command increments the integer value stored at a key by one. If the key does not exist, its value is initialized to 0 before the INCR operation is performed. INCR is atomic, so every caller receives a distinct value.
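    import redis.clients.jedis.Jedis;

    public class RedisIdDemo {
        public static void main(String[] args) {
            // Jedis implements Closeable, so try-with-resources closes the connection
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                // INCR is atomic: concurrent callers never receive the same value
                long id = jedis.incr("id");
                System.out.println("Distributed ID obtained from Redis: " + id);
            }
        }
    }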
    

4. Distributed scheduling

Scheduled tasks vs. message queues

What they have in common

  • Asynchronous processing: steps such as registration or order handling can be processed asynchronously, one step at a time
  • Application decoupling: a single application can be split into multiple applications through scheduled tasks or message queues
  • Traffic peak shaving: both scheduled jobs and MQ can absorb traffic spikes

Essential differences

  • Scheduled tasks are time-driven, while MQ is event-driven. Time-driven processing cannot be replaced by MQ, for example interest settlement in a financial system.
  • Scheduled jobs lean toward batch processing, while MQ tends to process messages one at a time.

Distributed scheduling framework Elastic-Job (open-sourced by Dangdang, built on top of Quartz)

Elastic-Job's GitHub address: https://github.com/elasticjob
Main features:
distributed scheduling coordination, scheduling based on Quartz cron expressions, elastic scale-out and scale-in, failover, re-triggering of missed executions, and sharded task scheduling

  • Setup: add the Elastic-Job jar (API) to the application and install ZooKeeper
    [Figure: task sharding]
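Task sharding (fragmentation) splits one job into numbered sharding items, 0 to N-1, and hands each running instance a subset; each instance then processes only the data belonging to its items. The sketch below shows one even-split policy, modeled on the average allocation idea described for Elastic-Job's default strategy. The class and method names are hypothetical and this is not Elastic-Job's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AverageSharding {
    // Give each instance an equal contiguous block of items,
    // then hand any leftover items out one at a time starting from the first instance
    static Map<String, List<Integer>> assign(List<String> instances, int shardingTotalCount) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("no instances available");
        }
        int n = instances.size();
        int perInstance = shardingTotalCount / n;
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            List<Integer> items = new ArrayList<>();
            for (int item = i * perInstance; item < (i + 1) * perInstance; item++) {
                items.add(item);
            }
            result.put(instances.get(i), items);
        }
        int firstLeftover = perInstance * n;
        for (int item = firstLeftover; item < shardingTotalCount; item++) {
            result.get(instances.get(item - firstLeftover)).add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 instances, 8 sharding items
        System.out.println(assign(List.of("A", "B", "C"), 8));
    }
}
```

When an instance joins or leaves, the items are simply reassigned over the new instance list, which is how sharding supports elastic scale-out and scale-in.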

5. Session sharing solutions

Nginx's IP_Hash strategy

Requests from the same client IP are always routed to the same target server, which is also called session stickiness.
Advantages:

  • Simple configuration, no intrusion into the application, and no additional code modification is required

Disadvantages:

  • Sessions are lost when the server restarts
  • Risk of high load on a single server
  • Single point of failure

Session replication

Session replication between multiple Tomcat instances is enabled by modifying their configuration files.
Advantages:

  • No intrusion into the application
  • Makes horizontal scaling of servers easy
  • Can adapt to various load balancing strategies
  • Server restart or downtime does not cause session loss

Disadvantages:

  • Low performance
  • High memory consumption
  • Cannot store much data; the more data there is, the greater the replication delay

Session sharing (centralized session storage)

A session is essentially a cache, so why not hand session data over to dedicated caching middleware such as Redis?
Advantages:

  • Can adapt to various load balancing strategies
  • Server restart or downtime will not cause session loss
  • Strong scalability
  • Suitable for large clusters

Disadvantages:

  • Intrusive: the application must include code that interacts with Redis


Origin blog.csdn.net/u013795102/article/details/113245191