Building an MQTT cluster that supports hundreds of millions of IoT devices

1. Introduction to MQTT protocol

  MQTT is a lightweight, flexible, and scalable publish/subscribe messaging protocol widely used in the Internet of Things (IoT). It transfers small messages efficiently, which makes it well suited to low-bandwidth and unreliable networks.

1.1 Key concepts in MQTT

  • Broker: The intermediary in MQTT, responsible for receiving and forwarding messages. Clients must connect to the Broker to publish and subscribe to messages.

  • Topic: The message topic in MQTT, used to identify the type or content of the message. Clients can publish or subscribe to one or more topics. For example, a temperature sensor could publish temperature data to a topic named "temperature".

  • QoS: MQTT supports three Quality of Service (QoS) levels that govern the reliability of message delivery. QoS 0 means a message is delivered at most once, QoS 1 means at least once, and QoS 2 means exactly once.

  • Client: A device or application connected to the Broker; it can be a publisher, a subscriber, or both.
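The Topic and QoS rules above are defined by the MQTT specification rather than by any particular Broker: in a subscription topic filter, + matches exactly one topic level and # matches all remaining levels, and a subscriber receives each message at the lower of the publish QoS and its subscription QoS. The following minimal Python sketch models those two rules; the function names topic_matches and effective_qos are hypothetical, and the code does not validate malformed filters.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT subscription `topic_filter`.

    '+' matches exactly one level; '#' matches the parent level and all
    remaining levels and is only valid as the last level of the filter.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards in the first level must not match topics starting with '$' (e.g. $SYS).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            # Valid only as the final filter level; then everything below matches.
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal level mismatch
    return len(f_levels) == len(t_levels)


def effective_qos(publish_qos: int, subscription_qos: int) -> int:
    """A subscriber receives a message at the lower of the two QoS levels."""
    return min(publish_qos, subscription_qos)


# Usage: a subscriber to "sensor/+/temperature" at QoS 1 receives a QoS 2
# message published to "sensor/room1/temperature", delivered at QoS 1.
print(topic_matches("sensor/+/temperature", "sensor/room1/temperature"))
print(effective_qos(2, 1))
```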

1.2 MQTT related products and services

1.2.1 MQTT Broker

  The MQTT Broker is the core component of an MQTT system, responsible for receiving and forwarding messages. Commonly used Brokers include Eclipse Mosquitto, EMQX, HiveMQ, ActiveMQ, and Mosca; between them they offer open-source and commercial versions for users to choose from.

1.2.2 MQTT cloud platform

  The MQTT cloud platform is a cloud service based on the MQTT protocol, which can provide functions such as MQTT Broker, data storage, and data analysis, enabling developers to quickly build and manage IoT applications. Commonly used MQTT cloud platforms include AWS IoT, Microsoft Azure IoT, IBM Watson IoT, etc.

1.2.3 MQTT client library

  An MQTT client library implements the MQTT protocol on the client device so that it can communicate with the Broker. Mainstream options include Paho MQTT, MQTT.js, and Eclipse Kura, covering a range of programming languages and platforms such as Java, Python, and JavaScript.

  This article takes EMQX as its subject and builds an MQTT service cluster on the open-source edition of EMQX.

2. Function comparison between EMQX Enterprise Edition and Open Source Edition

[Figure: feature comparison of the EMQX Enterprise and Open Source editions]
Note: the data above comes from the EMQ official website (Product Overview).

  As shown in the figure above, the features in red are not supported by the open-source edition. In scalability and performance the two editions are essentially the same: both support up to 100 million connections and 5 million messages per second, which meets most business scenarios. If the team lacks O&M resources, consider the enterprise edition, which comes with a higher SLA guarantee; if the team has its own O&M capability and no strong need for the enterprise edition's extended features, consider the open-source edition.

3. EMQX cluster deployment

3.1 EMQX cluster architecture

3.1.1 Architecture of EMQX 4.x and previous versions

[Figure: EMQX 4.x cluster architecture]
  In EMQX 4.x and earlier, every node in the cluster is connected to every other node (a full mesh).

3.1.2 EMQX 5.x: the new Mria-based architecture

[Figure: EMQX 5.x cluster architecture based on Mria]
  In EMQX 5.x, cluster nodes are divided into core nodes (Mria Core) and replicant nodes (Mria Replicant), and the core nodes are connected to each other in a full mesh.

  • Core nodes: Core nodes form the data layer of the database and are connected in a full mesh. Each core node holds an up-to-date copy of the data, which provides fault tolerance: as long as one core node survives, no data is lost. Core nodes are generally static and long-lived, and auto-scaling them (frequently adding, removing, or replacing nodes) is not recommended.

  • Replicant nodes: A replicant node connects to a core node and passively replicates data updates from it. Replicant nodes may not perform write operations themselves; writes are handed over to the core nodes. Because each replicant holds a complete local copy of the data, reads are very fast, which helps reduce EMQX routing latency.
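The division of labor between core and replicant nodes can be illustrated with a toy, in-memory Python model: a write issued on a replicant is forwarded to its core node, applied on every core node, and pushed to the attached replicants, while reads are served from the replicant's local copy. This is a hypothetical sketch of the data flow described above, not how Mria is implemented (there are no transactions, networking, or failure handling); the class names and the topic-to-node route values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class CoreNode:
    """Toy core node: holds an authoritative copy of a table."""
    name: str
    table: dict = field(default_factory=dict)
    peers: list = field(default_factory=list, repr=False)       # other core nodes (full mesh)
    replicants: list = field(default_factory=list, repr=False)  # replicants attached to this core

    def write(self, key, value):
        # Apply the write on every core node, then push it to each core's replicants.
        for core in [self, *self.peers]:
            core.table[key] = value
            for rep in core.replicants:
                rep.apply_update(key, value)


@dataclass(eq=False)
class ReplicantNode:
    """Toy replicant node: read-only local copy, writes forwarded to its core."""
    name: str
    core: CoreNode = field(repr=False)
    local_copy: dict = field(default_factory=dict)

    def __post_init__(self):
        # Bootstrap: copy the whole table from the core (the costly initial sync).
        self.local_copy = dict(self.core.table)
        self.core.replicants.append(self)

    def apply_update(self, key, value):
        # Passive replication: the core pushes updates, the replicant just applies them.
        self.local_copy[key] = value

    def read(self, key):
        # Served locally, with no round trip to a core node.
        return self.local_copy.get(key)

    def write(self, key, value):
        # Replicants never write locally; the core node performs the write.
        self.core.write(key, value)


# Usage: two core nodes in a full mesh and one replicant attached to core2.
core1, core2 = CoreNode("core1"), CoreNode("core2")
core1.peers, core2.peers = [core2], [core1]
core1.write("t/1", "emqx@node1")
rep = ReplicantNode("rep1", core2)
rep.write("t/2", "emqx@node3")
```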

  This replication model can be viewed as a mix of masterless and master-slave replication. Its advantages are:

  • Higher scalability: EMQX 5.0 supports large clusters containing 23 nodes.
  • Easier cluster autoscaling: the cluster can be scaled automatically by adding or removing replicant nodes.

  In version 4.x, where all nodes are fully connected, the cost of synchronizing data between nodes grows as more nodes are added. In EMQX 5.0, replicant nodes do not take part in data writes, so adding more replicants to the cluster does not affect table update efficiency, which allows larger EMQX clusters to be built.
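A quick calculation shows why the full mesh becomes expensive. With n fully connected nodes there are n(n-1)/2 inter-node links, and every node takes part in every write; in the Mria model only the core nodes form a mesh, and each replicant is assumed here to hold a single link to one core node. The 3-core/20-replicant split below is a hypothetical example, not an EMQ recommendation.

```python
def full_mesh_links(n: int) -> int:
    """Inter-node links when every node connects to every other node (4.x style)."""
    return n * (n - 1) // 2


def mria_links(cores: int, replicants: int) -> int:
    """Core nodes form a full mesh; each replicant is assumed to attach to one core."""
    return full_mesh_links(cores) + replicants


# Hypothetical 23-node cluster: all nodes in a full mesh vs. 3 cores + 20 replicants.
print(full_mesh_links(23))  # 253
print(mria_links(3, 20))    # 23
```

Besides the link count, the number of nodes that must agree on each write drops from 23 to the 3 core nodes in this example.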

  In addition, replicant nodes are designed to be added or removed on demand without changing data redundancy, so they can be placed in an auto-scaling group for better DevOps practice. However, as the total amount of data grows, the initial replication of data from a core node becomes a relatively heavy operation, so the auto-scaling policy for replicant nodes should not be too aggressive.
  The above content is excerpted from the EMQ official website. For details on the EMQX cluster architecture, see: Deployment Architecture and Cluster Requirements.
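The advice that replicant autoscaling should not be too aggressive can be expressed as a policy with a cooldown period and a per-step limit, so that each new replicant has time to finish its initial data sync before the next change. The sketch below is a hypothetical policy, not an EMQX or Kubernetes feature; target_conn_per_node, max_step, and cooldown_s are assumed parameters that would need tuning through load testing.

```python
from dataclasses import dataclass


@dataclass
class ReplicantScaler:
    """Hypothetical, deliberately conservative autoscaling policy for replicant nodes."""
    min_nodes: int
    max_nodes: int
    target_conn_per_node: int          # hypothetical per-replicant connection capacity
    max_step: int = 1                  # change the node count by at most this much per action
    cooldown_s: float = 600.0          # minimum seconds between scaling actions
    last_action_at: float = float("-inf")

    def desired(self, current: int, connections: int, now: float) -> int:
        """Return the replicant count to run next, given the current load."""
        if now - self.last_action_at < self.cooldown_s:
            return current  # still cooling down: let the last change finish syncing
        needed = -(-connections // self.target_conn_per_node)  # ceiling division
        needed = max(self.min_nodes, min(self.max_nodes, needed))
        step = max(-self.max_step, min(self.max_step, needed - current))
        if step:
            self.last_action_at = now
        return current + step
```

Even when the load calls for five nodes at once, this policy adds one node, waits out the cooldown, and only then re-evaluates.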

3.1.3 EMQX cluster node discovery strategy

  Node discovery is a required step in creating a distributed cluster. By default, EMQX creates a cluster using the manual discovery strategy; the setting is in the /etc/emqx/emqx.conf configuration file:

cluster {
  name = emqxcl
  discovery_strategy = manual
}

  The discovery_strategy value can be manual, static, mcast, dns, etcd, or k8s, each corresponding to a different node discovery method. For details on each method, see: EMQX Cluster Node Discovery.

3.2 Deployment methods supported by EMQX cluster

3.3 EMQX cluster deployment process

  First, open the EMQ official website, find the download entry for installation packages, and select EMQX download under deployment methods. After you click EMQX download, two rows appear below for the enterprise and open-source editions; select the EMQX open-source edition, as shown in the figure below:
[Figure: selecting the EMQX open-source edition on the download page]
  Choose the package that matches your operating system and click the Free Download button. This article uses CentOS 7 as an example: select CentOS 7 and click Free Download to open the installation package download page.
[Figure: EMQX installation package download page for CentOS 7]
  Select rpm as the installation method and amd64 as the CPU architecture, then run the commands below to obtain the EMQX rpm package. If you are unsure how much CPU and memory to provision, EMQ's configuration estimator can give a rough estimate; the resources actually required in production should be confirmed against real-world operation.

3.3.1 EMQX installation and deployment

  • Download the installation package
mkdir -p /opt/emq
cd /opt/emq
wget https://www.emqx.com/zh/downloads/broker/5.0.24/emqx-5.0.24-el7-amd64.rpm
  • Install EMQX and dependencies
sudo yum install emqx-5.0.24-el7-amd64.rpm -y
  • Modify the EMQX node name: in the /etc/emqx/emqx.conf configuration file, set the name parameter to the form emqx@<node internal IP address>
node {
  name = "emqx@<node-internal-IP>"
  cookie = "emqxsecretcookie"
  data_dir = "/var/lib/emqx"
}
  • Start EMQX service
sudo systemctl start emqx

Run the commands above on every node to complete a standalone deployment on each one.

  • Check service running status
netstat -ltnp | grep emq

Alternatively, manage EMQX through its Dashboard on port 18083 at http://<node-IP>:18083. The default login for the first sign-in is username admin, password public.
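Besides netstat, a short Python script can confirm from another machine that the expected ports accept TCP connections. The port list assumes the default EMQX 5.x listeners (1883 MQTT over TCP, 8883 MQTT over SSL/TLS, 8083 WebSocket, 8084 secure WebSocket, 18083 Dashboard); adjust it if emqx.conf changes them. check_ports is a hypothetical helper name.

```python
import socket

# Assumed default EMQX 5.x listener ports; verify against your emqx.conf.
DEFAULT_PORTS = {
    1883: "MQTT over TCP",
    8883: "MQTT over SSL/TLS",
    8083: "MQTT over WebSocket",
    8084: "MQTT over secure WebSocket",
    18083: "Dashboard / HTTP API",
}


def check_ports(host: str, ports, timeout: float = 1.0) -> dict:
    """Return {port: True/False} depending on whether a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, ...
            results[port] = False
    return results


if __name__ == "__main__":
    status = check_ports("127.0.0.1", DEFAULT_PORTS)
    for port, ok in status.items():
        print(f"{port:>5} {DEFAULT_PORTS[port]:<28} {'listening' if ok else 'not reachable'}")
```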

3.3.2 Create a cluster or join a cluster

  Create a cluster using the manual discovery strategy. Suppose EMQX node 1 is named emqx@<node1-IP> and EMQX node 2 is named emqx@<node2-IP>. Run the following command on node 2 to create the cluster:

emqx_ctl cluster join emqx@<node1-IP>
  • View the cluster status. The output shown in the figure below indicates that the cluster currently has two nodes, both in running status.
emqx_ctl cluster status

[Figure: output of emqx_ctl cluster status showing two running nodes]

3.3.3 Exit the cluster

emqx_ctl cluster leave


3.4 EMQX optimization and improvement

3.4.1 Performance testing and resource optimization

3.4.2 EMQX cluster split-brain problem

  EMQX 4.x and earlier use Mnesia, the real-time distributed database that ships with Erlang/OTP. Mnesia supports two data access modes: local mode and remote mode. Local mode is a fully connected, point-to-point replication mode: as shown in Section 3.1.1, the data on each node is replicated to every other node in the cluster. Remote mode distributes data across different nodes; if the current node does not hold the data a client wants, it fetches the data from other nodes via RPC. Local mode has low network overhead and high data access efficiency, and data integrity is preserved as long as one node in the cluster is healthy. Its drawbacks are poor horizontal scalability and a risk of split brain.
   EMQX 5.x introduces the new Mria architecture. As shown in Section 3.1.2, cluster nodes are divided into core nodes and replicant nodes. Core nodes are fully connected and replicate point to point, while each replicant node only passively synchronizes data updates from a designated core node. By reducing Mnesia transaction processing between replicant nodes and core nodes, this design effectively lowers the risk of cluster split brain.

   EMQX automatically heals cluster network partitions when cluster.autoheal = on is set in /etc/emqx/emqx.conf. The healing process works as follows:

  • Nodes perform network partition confirmation 3 seconds after receiving a "Database Inconsistency" event from Mnesia.
  • After a node acknowledges that a network partition has occurred, it reports the message to the Leader node (the earliest node in the cluster to start).
  • After a delay, once all nodes are online, the Leader node creates a SplitView.
  • The Leader node selects an autoheal coordinator node in the majority partition.
  • The coordinator node restarts the nodes in the minority partition to restore the cluster.
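The Leader's decision in the steps above can be sketched as a pure function over the SplitView. This is a simplified model, not EMQX's autoheal code: it assumes a clean partition (every node in a partition reports the same membership), breaks ties between equal-sized partitions by node name, and uses the hypothetical rule that the lowest-named node in the majority acts as coordinator.

```python
def plan_autoheal(split_view: dict) -> tuple:
    """Simplified Leader-side decision for healing a network partition.

    split_view maps each node name to the set of nodes it can currently reach
    (including itself). Returns (coordinator, nodes_to_restart).
    """
    # Nodes that report the same membership belong to the same partition.
    partitions = {frozenset(view) for view in split_view.values()}
    # Majority partition = the largest one; ties broken by node names (assumption).
    majority = min(partitions, key=lambda p: (-len(p), sorted(p)))
    # Hypothetical rule: the lowest-named node in the majority coordinates.
    coordinator = min(majority)
    # Every node outside the majority partition is restarted so it rejoins.
    to_restart = sorted(n for p in partitions if p != majority for n in p)
    return coordinator, to_restart


# Usage: a 5-node cluster split into {n1, n2, n3} and {n4, n5}.
majority_side = {"emqx@n1", "emqx@n2", "emqx@n3"}
minority_side = {"emqx@n4", "emqx@n5"}
view = {n: majority_side for n in majority_side}
view.update({n: minority_side for n in minority_side})
print(plan_autoheal(view))  # ('emqx@n1', ['emqx@n4', 'emqx@n5'])
```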

3.4.3 EMQX cluster load balancing

  Load Balancing is used to balance the load of multiple network components, thereby optimizing the use of resources and avoiding failures caused by component overload. Although load balancing is not a necessary component in the cluster, it can bring some very useful features to the cluster. For example, when it is configured in the EMQX cluster, it will bring the following advantages:

  • Balance the load of EMQX to avoid single node overload;
  • Simplify the client configuration, the client only needs to connect to the load balancer, and does not need to care about the internal scaling changes of the cluster;
  • TLS/SSL termination at the load balancer reduces the burden on the EMQX cluster;
  • To improve security, with load balancing at the front end of the cluster, unwanted traffic can be blocked through settings to protect the EMQX cluster from malicious attacks.

  When a load balancer (LB) is deployed in front of the EMQX cluster, the LB handles TCP connections and distributes incoming MQTT connections and messages across the EMQX cluster nodes. The deployment architecture is as follows:

  [Figure: EMQX TCP load balancing deployment]
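How MQTT connections are spread across nodes depends on the balancing strategy configured on the LB. Below is a Python sketch of two common strategies, round-robin and least-connections (load balancers such as HAProxy provide both). Because MQTT connections are long-lived, least-connections can keep node load more even after a node restarts or joins with zero connections; the class names and node names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend nodes in a fixed rotation."""

    def __init__(self, nodes):
        self._rotation = cycle(nodes)

    def pick(self):
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Send each new connection to the node with the fewest active connections."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest-listed node.
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        # Called when a client disconnects.
        self.active[node] -= 1


rr = RoundRobinBalancer(["emqx@node1", "emqx@node2", "emqx@node3"])
print([rr.pick() for _ in range(4)])
# ['emqx@node1', 'emqx@node2', 'emqx@node3', 'emqx@node1']
```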

  Terminating SSL/TLS connections at the LB is recommended: devices connect to the LB over SSL/TLS, and the LB connects to EMQX over plain TCP. This mode gets the most performance out of the EMQX cluster. The deployment architecture is as follows:

  [Figure: EMQX load balancing deployment with TLS terminated at the LB]

  Besides deploying the cluster behind a load balancer, devices can connect to the EMQX cluster directly using DNS round-robin: all nodes are added to the DNS round-robin list, and devices reach the cluster through a domain name or a list of IP addresses. Direct connection via DNS round-robin is usually not recommended in production.
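With DNS round-robin (or a fixed IP list) and no LB in front, failover becomes the device's job: the client has to resolve every address and try them in turn. The Python sketch below shows that client-side logic with the standard socket module; resolve_all and connect_first_available are hypothetical helpers, and a real device would also need the MQTT handshake, TLS, and reconnect backoff, which this sketch omits. Pushing this logic into every device is one reason the LB approach is usually preferred.

```python
import socket


def resolve_all(hostname: str, port: int) -> list:
    """Return every distinct IPv4 (address, port) pair behind a DNS name."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({(info[4][0], port) for info in infos})


def connect_first_available(addresses, timeout: float = 2.0) -> socket.socket:
    """Try each (host, port) in order and return the first connected socket."""
    last_error = None
    for host, port in addresses:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next node
    raise ConnectionError(f"no broker reachable: {last_error}")


# Usage (hypothetical domain name):
# sock = connect_first_available(resolve_all("mqtt.example.com", 1883))
```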

Remarks: The above content comes from the official website of EMQ.

4. Summary

  A single EMQX 5.x cluster supports up to 100 million connected devices and 5 million messages per second, which covers most current IoT business scenarios, such as connecting vehicles to an Internet of Vehicles cloud platform. Because the open-source and enterprise editions are essentially the same in connection capacity and performance, the open-source edition offers high practical value. Teams with their own O&M capability can deploy the open-source edition to reduce software procurement costs; teams that lack O&M capability, or that need the enterprise edition's extended features, can consider purchasing the enterprise edition to improve service reliability and reduce maintenance effort.


Source: blog.csdn.net/hzwy23/article/details/130641881