[Nacos] Deploying a containerized Nacos cluster on k8s

Recently I deployed a three-node Nacos cluster to serve as the registry and configuration center for several small microservices, with Nginx in front as a simple proxy, and took the opportunity to study cluster and distributed deployment a little to improve availability. After deployment everything was usable, but one problem surfaced: refreshing the Nacos cluster node list would, from time to time, show one or two nodes in the DOWN or SUSPICIOUS state, which kicked off a long round of troubleshooting. The cluster kept working, but every application logged piles of Nacos connection exceptions, presumably because clients were redirected to another node as soon as they found one unresponsive. In the end the cause was that the hostname of one node's server was wrong (in another case, the cluster.conf inside the container held stale data, old content left over from the original ENV configuration), while NACOS_INETUTILS_IP_ADDRESS was configured with that server's IP in the Nacos cluster. As a result, heartbeat detection between cluster nodes failed, and after three failures a node is automatically taken offline or marked as untrusted...

Background summary

Record notes related to ideas and operations

  • Access within the cluster only
  • Opening conditions for external network ports in internal network k8s environment
  • Frequently asked questions about nacos2.2.X version

Building a cluster environment locally is easy, and setting one up directly on servers is also straightforward. Once the k8s environment and the intranet are involved, things get a bit off track: if nothing goes wrong, fine, but once a deployment fails, all kinds of problems appear, and even after a successful deployment services may still drop offline.
Problems encountered:

  • Problem 1: After the cluster was deployed on the intranet R&D-cloud k8s, a fourth node appeared alongside the original three.
  • Problem 2: The cluster deployment passes parameters via custom environment variables, yet more than one dynamic node shows up.
  • Problem 3: After mapping the original internal cluster ports through a VIP address, registered services go online and offline in large numbers.

This is not a routine deployment, but it is still worth summarizing and sorting out; these notes are for later review. The configuration files can be adapted to different environments, so to avoid adding bulk, only the pitfalls are recorded here.

Deployment architecture

(Figure: deployment architecture diagram)

  • At least three Nacos nodes are required to form a cluster;
  • Nacos Nginx Proxy is used for proxy forwarding;

Port planning

Compared with 1.X, Nacos 2.0 introduces gRPC communication and therefore needs two extra ports. The new ports are derived from the configured main port (server.port) with fixed offsets. Source: Cluster Mode Deployment (official docs).

Port   Offset from main port   Description
8848   0                       Main port; HTTP port used by clients, the console and the OpenAPI
9848   +1000                   Client gRPC port; used by clients to initiate connections and requests to the server
9849   +1001                   Server gRPC port; used for server-to-server synchronization, etc.
7848   -1000                   Jraft port; handles Raft-related requests between servers
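
For example, with the non-standard main port 8858 used in this deployment, the derived ports work out to the values in the node table further below:

main port (server.port): 8858
client gRPC:  8858 + 1000 = 9858
server gRPC:  8858 + 1001 = 9859
Jraft:        8858 - 1000 = 7858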

When requests go through a VIP/nginx, TCP (stream) forwarding must be configured; http2 forwarding must not be used, otherwise nginx will drop the connection (a sketch is given in the proxy section below). Ports 9849 and 7848 are server-to-server communication ports; do not expose them to the external network or to client tests.

Following the official port allocation requirements above, the Nacos cluster built here on three servers uses the following ports:

Node                  IP              Ports (host = container)   Image / version            Offline deployment path
nacos-node1           192.168.xx.201  8858, 9858, 9859, 7858     nacos/nacos-server:2.2.4   /root/nacos-deploy/
nacos-node2           192.168.xx.202  8858, 9858, 9859, 7858     nacos/nacos-server:2.2.4   /root/nacos-deploy/
nacos-node3           192.168.xx.203  8858, 9858, 9859, 7858     nacos/nacos-server:2.2.4   /root/nacos-deploy/
Nacos DB (MySQL)      192.168.xx.206  3306                       mysql:5.7.34               /root/nacos-db-deploy
Nacos DB (Postgres)   192.168.xx.206  5432                       postgres:12-alpine         /root/nacos-db-deploy
Nacos Nginx Proxy     192.168.xx.208  80                         nginx:1.23.2               /root/nacos-proxy-deploy
Nacos Check Health    192.168.xx.208  -                          health check script        /root/nacos-check-healthy

All listed ports must be exposed, and each is mapped identically on host and container.

K8s intra-cluster domain names

[nacos-node1] is the workload (Service) name
[pigcloud.svc.cluster.local] is the fixed service-domain suffix (pigcloud is the namespace)

nacos-node1.pigcloud.svc.cluster.local:8858 
nacos-node2.pigcloud.svc.cluster.local:8858 
nacos-node3.pigcloud.svc.cluster.local:8858

These domain names play a key role when adjusting the configuration in nginx.
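
For reference, one way to obtain such names is a per-node k8s Service; a minimal sketch for node one, assuming the namespace pigcloud from above and a hypothetical pod label app: nacos-node1 (not shown in the original):

apiVersion: v1
kind: Service
metadata:
  name: nacos-node1        # the "load name" part of the domain
  namespace: pigcloud      # the namespace part of the fixed suffix
spec:
  selector:
    app: nacos-node1       # hypothetical pod label
  ports:
    - name: http
      port: 8858
      targetPort: 8858
    - name: grpc-client
      port: 9858
      targetPort: 9858
    - name: grpc-server
      port: 9859
      targetPort: 9859
    - name: raft
      port: 7858
      targetPort: 7858

Nodes two and three would get analogous Services named nacos-node2 and nacos-node3.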

Create Nacos database

If you run Nacos in containers, you need to create the database yourself and import the official schema files for the matching version.

Compiling Nacos 2.2.4

Nacos 2.2.4 is adapted and modified here to support the PostgreSQL database.

Create mysql service

  • docker-mysql.yaml
  • Execute the yml
docker-compose -f /root/nacos-db-deploy/docker-mysql.yaml up -d
# import the official table data
docker cp /root/nacos/conf/nacos-mysql.sql nacos-mysql:/tmp
docker exec -it nacos-mysql sh
mysql -uroot -p123456
create database nacos;
use nacos;
source /tmp/nacos-mysql.sql;
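
The docker-mysql.yaml content is not shown above; a minimal sketch, assuming only values that appear elsewhere in this article (image mysql:5.7.34, container name nacos-mysql, root password 123456) plus a hypothetical data volume:

version: "3"
services:
  nacos-mysql:
    image: mysql:5.7.34
    container_name: nacos-mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: "123456"   # matches the mysql -uroot -p123456 login above
    ports:
      - "3306:3306"
    volumes:
      - /data/nacos-mysql:/var/lib/mysql   # hypothetical host data path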

Create postgres service

  • docker-postgres.yaml

  • Execute the yml

docker-compose -f /root/nacos-db-deploy/docker-postgres.yaml up -d
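
Likewise, a minimal sketch of what docker-postgres.yaml might contain, assuming the image from the planning table and the PG_* defaults used later (user postgres, password 123456, database strong_db); the container name and volume path are hypothetical:

version: "3"
services:
  nacos-postgres:
    image: postgres:12-alpine
    container_name: nacos-postgres          # hypothetical name
    restart: always
    environment:
      POSTGRES_USER: postgres               # matches PG_SERVICE_USER below
      POSTGRES_PASSWORD: "123456"           # matches PG_SERVICE_PASSWORD below
      POSTGRES_DB: strong_db                # matches PG_SERVICE_DB_NAME below
    ports:
      - "5432:5432"
    volumes:
      - /data/nacos-postgres:/var/lib/postgresql/data   # hypothetical host data path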

Nacos service

Create the yaml files

mkdir /data/nacos2.2.4_1/logs -p
mkdir /data/nacos2.2.4_2/logs -p
mkdir /data/nacos2.2.4_3/logs -p

mkdir /root/nacos-deploy/

cat << EOF > /root/nacos-deploy/nacos1.yaml
(write the corresponding content here)
EOF

cat << EOF > /root/nacos-deploy/nacos2.yaml
(write the corresponding content here)
EOF

cat << EOF > /root/nacos-deploy/nacos3.yaml
(write the corresponding content here)
EOF

The ports in each configuration are non-standard. Note the NACOS_APPLICATION_PORT environment variable: if it is not set to the specific port, the configuration falls back to the default 8848, and problems will occur when running three nodes on the same machine.
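
The yaml bodies are elided above; as a reference only, here is a minimal sketch of nacos1.yaml assembled from the image, ports, and environment keys that appear elsewhere in this article (the volume path and explicit port mappings are assumptions):

version: "3"
services:
  nacos-node1:
    image: nacos/nacos-server:2.2.4
    container_name: nacos-node1
    restart: always
    environment:
      MODE: cluster
      PREFER_HOST_MODE: ip
      NACOS_APPLICATION_PORT: "8858"        # non-standard main port, see the note above
      NACOS_SERVERS: "192.168.xx.201:8858 192.168.xx.202:8858 192.168.xx.203:8858"
      NACOS_INETUTILS_IP_ADDRESS: 192.168.xx.201
      SPRING_DATASOURCE_PLATFORM: postgresql
      PG_SERVICE_HOST: 192.168.xx.206       # DB host per the planning table
      PG_SERVICE_PORT: "5432"
      PG_SERVICE_USER: postgres
      PG_SERVICE_PASSWORD: "123456"
      PG_SERVICE_DB_NAME: strong_db
      PG_CURRENT_SCHEMA: public
    ports:
      - "8858:8858"    # main HTTP port
      - "9858:9858"    # client gRPC (main port + 1000)
      - "9859:9859"    # server gRPC (main port + 1001)
      - "7858:7858"    # Jraft (main port - 1000)
    volumes:
      - /data/nacos2.2.4_1/logs:/home/nacos/logs   # hypothetical mount for the log dir created above

nacos2.yaml and nacos3.yaml would differ only in the container name, NACOS_INETUTILS_IP_ADDRESS, and log directory; the ports are identical per the planning table.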

Run Nacos

Run on each node:

docker-compose -f /root/nacos-deploy/nacos1.yaml up -d
docker-compose -f /root/nacos-deploy/nacos2.yaml up -d
docker-compose -f /root/nacos-deploy/nacos3.yaml up -d

Accessing each of the three nodes' console pages individually should work normally, as should accessing them through the proxy;

Note: the web console itself does not use gRPC, but client-to-server gRPC calls do, and programs must be able to reach the corresponding port, so it has to be exposed. Because a non-standard port is used here, nginx can map it (in disguise) back to the standard port when proxying.

Nginx proxy configuration

Containerization is used here as well.
Configure the nginx proxy:

mkdir /root/nacos-proxy-deploy
cat <<EOF > /root/nacos-proxy-deploy/nginx.conf
(write the corresponding content here)
EOF
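
The nginx.conf body is elided; a minimal sketch of what it might contain, assuming the node addresses from the planning table, an HTTP proxy on port 80, and the TCP (stream, not http2) forwarding of the client gRPC port required by the note in the port-planning section (mapping the non-standard 9858 back to the standard 9848 is an assumption):

worker_processes auto;

events {
    worker_connections 1024;
}

http {
    # console / OpenAPI traffic: standard port 80 -> non-standard main port 8858
    upstream nacos_http {
        server 192.168.xx.201:8858;
        server 192.168.xx.202:8858;
        server 192.168.xx.203:8858;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://nacos_http;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}

# client gRPC traffic must be forwarded as raw TCP (stream), never http2
stream {
    upstream nacos_grpc {
        server 192.168.xx.201:9858;
        server 192.168.xx.202:9858;
        server 192.168.xx.203:9858;
    }
    server {
        listen 9848;   # standard client gRPC port on the proxy side
        proxy_pass nacos_grpc;
    }
}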
  • proxy container yaml
cat > /root/nacos-proxy-deploy/nacos-nginx-proxy.yaml << EOF
(write the corresponding content here)
EOF
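
A minimal sketch of the elided nacos-nginx-proxy.yaml, assuming the nginx:1.23.2 image and port 80 from the planning table; exposing 9848 matches the stream block in the nginx.conf sketch above and is likewise an assumption:

version: "3"
services:
  nacos-nginx-proxy:
    image: nginx:1.23.2
    container_name: nacos-nginx-proxy
    restart: always
    ports:
      - "80:80"        # console / HTTP
      - "9848:9848"    # assumed: stream port for client gRPC, see the nginx.conf sketch
    volumes:
      - /root/nacos-proxy-deploy/nginx.conf:/etc/nginx/nginx.conf:ro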

cd /root/nacos-proxy-deploy
# run the container
docker-compose -f nacos-nginx-proxy.yaml up -d

After setting up, check that the container is running normally, and test that instance/service registration works.

Environment check

mkdir /root/nacos-check-healthy -p
cat > /root/nacos-check-healthy/nacos_check_status.py << EOF
(write the corresponding content here)
EOF
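
The script body is elided; a minimal sketch of such a status check, assuming the Nacos 2.x console endpoint /nacos/v1/core/cluster/nodes and the node addresses used in this article (the endpoint, JSON fields, and behavior with auth enabled should be verified against your deployment):

#!/usr/bin/env python3
# Sketch: query each node's cluster-member list and print any member
# that is not in the UP state (e.g. DOWN or SUSPICIOUS, as described above).
import json
import urllib.request

NODES = [
    "192.168.xx.201:8858",
    "192.168.xx.202:8858",
    "192.168.xx.203:8858",
]

def check(node):
    # Assumed console API; with auth enabled a token would need to be attached.
    url = f"http://{node}/nacos/v1/core/cluster/nodes"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.loads(resp.read().decode("utf-8"))
    except Exception as exc:
        print(f"{node}: request failed: {exc}")
        return
    for member in data.get("data", []):
        state = member.get("state")
        if state != "UP":
            print(f"{node}: member {member.get('address')} is {state}")

if __name__ == "__main__":
    for n in NODES:
        check(n)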

R&D cloud configuration

  • Configuration
server.servlet.contextPath=${SERVER_SERVLET_CONTEXTPATH:/nacos}
server.error.include-message=ON_PARAM
server.port=${NACOS_APPLICATION_PORT:8848}
# in a k8s cluster, use domain names to pin the node addresses
nacos.inetutils.ip-address=${NACOS_INETUTILS_IP_ADDRESS:}

# configuration for using the pg database
spring.datasource.platform=${SPRING_DATASOURCE_PLATFORM:postgresql}
db.num=${PG_DATABASE_NUM:1}
db.jdbcDriverName=${PG_SERVICE_DRIVER:org.postgresql.Driver}
db.url.0=jdbc:${SPRING_DATASOURCE_PLATFORM:postgresql}://${PG_SERVICE_HOST:localhost}:${PG_SERVICE_PORT:5432}/${PG_SERVICE_DB_NAME:strong_db}?currentSchema=${PG_CURRENT_SCHEMA:public}&tcpKeepAlive=true&reWriteBatchedInserts=true
db.user.0=${PG_SERVICE_USER:postgres}
db.password.0=${PG_SERVICE_PASSWORD:123456}

  • Environment configuration

Different configurations are used here depending on the R&D cloud environment.

KeyValue.JVM_XMS=1g
KeyValue.JVM_XMX=1g
KeyValue.JVM_XMN=512m
KeyValue.JVM_MS=128m
KeyValue.JVM_MMS=320m
KeyValue.PREFER_HOST_MODE=ip
# these entries are for pg; if using mysql, add/adjust ${PARAM_NAME:PARAM_DEF_VALUE} entries yourself
KeyValue.SPRING_DATASOURCE_PLATFORM=postgresql
KeyValue.PG_SERVICE_HOST=192.168.xx.208
KeyValue.PG_SERVICE_PORT=5432
KeyValue.PG_SERVICE_USER=postgres
KeyValue.PG_SERVICE_PASSWORD=123456
KeyValue.PG_SERVICE_DB_NAME=strong_db
KeyValue.PG_CURRENT_SCHEMA=public
# for single-node operation, use MODE=standalone
KeyValue.MODE=cluster
# cluster configuration
KeyValue.NACOS_SERVERS=nacos-node1.pigcloud.svc.cluster.local:8858 nacos-node2.pigcloud.svc.cluster.local:8858 nacos-node3.pigcloud.svc.cluster.local:8858
# multi-NIC IP selection
KeyValue.NACOS_INETUTILS_IP_ADDRESS=nacos-node1.pigcloud.svc.cluster.local

Common problems

  • Normally, after a service registers with one node, it is synchronized to the other nodes.
  • When the nodes cannot elect a leader,
    check the log: /data/nacos2.2.4_1/logs/alipay-jraft.log
    Instance/service related log: /data/nacos2.2.4_1/naming-raft.log
  • After setting up proxy forwarding, be sure to expose the gRPC port.
    Port 8848 is exposed so the nacos client can log in and obtain a token; subsequent operations carry that token. Port 9848 is then used for the client's gRPC requests, so if proxy forwarding is configured, be sure to expose it as well, otherwise connections will fail and be dropped.

Origin: blog.csdn.net/u010638673/article/details/131738325