How Nacos sends its heartbeat repeatedly

While reading the Nacos source code I ran into a detail that ties in with the ScheduledThreadPoolExecutor notes I wrote yesterday, so I am writing it down here.

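When Nacos registers a service, the client starts sending heartbeats on a timer. The heartbeat is submitted with ScheduledThreadPoolExecutor.schedule(). As I noted yesterday, that method runs the task only once. ScheduledThreadPoolExecutor also provides two methods for periodic execution, but Nacos does not use them and calls schedule() instead. So how does the heartbeat repeat? The task re-submits itself every time it runs, a kind of recursive scheduling.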

The Nacos heartbeat is simply nacos-client sending a request to nacos-server at regular intervals to say that the service is still healthy, so the call has to repeat on a timer.
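To make the difference concrete, here is a minimal sketch (not Nacos code) that compares a plain schedule() call with a task that calls schedule() on itself. The class name OneShotVsRearm and both helper methods are made up for this illustration; only the ScheduledThreadPoolExecutor API is real.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OneShotVsRearm {

    /** Submits a task once with schedule() and counts how often it ran within waitMillis. */
    static int runOneShot(long delayMillis, long waitMillis) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        AtomicInteger runs = new AtomicInteger();
        try {
            executor.schedule(() -> { runs.incrementAndGet(); }, delayMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return runs.get();
    }

    /** The task submits itself again at the end of run() until it has run `times` times. */
    static int runRearming(long delayMillis, int times) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);
        Runnable task = new Runnable() {
            @Override
            public void run() {
                runs.incrementAndGet();
                done.countDown();
                if (done.getCount() > 0) {
                    // Re-arm: this is the only thing that makes the task repeat.
                    executor.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                }
            }
        };
        try {
            executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("one-shot runs: " + runOneShot(20, 200));
        System.out.println("re-arming runs: " + runRearming(20, 3));
    }
}
```

The first helper waits ten times longer than the delay and the task still runs only once; the second repeats only because run() calls schedule() again.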

/**
 * Add beat information.
 * Registers the heartbeat task. Note that this call schedules only one heartbeat;
 * the run method of BeatTask calls schedule again to keep it going.
 * @param serviceName service name
 * @param beatInfo    beat information
 */
public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
	NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
	String key = buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort());
	BeatInfo existBeat = null;
	//fix #1733
	if ((existBeat = dom2Beat.remove(key)) != null) {
		// Mark the previous beat for this key as stopped so its task chain ends.
		existBeat.setStopped(true);
	}
	dom2Beat.put(key, beatInfo);
	/**
	 * This schedule call sends the heartbeat only once.
	 */
	executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
	MetricsMonitor.getDom2BeatSizeMonitor().set(dom2Beat.size());
}

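The method below is BeatTask's run method, which is what the thread pool actually executes when the scheduled task fires.
Most of run() deals with the heartbeat response. The part that makes the heartbeat repeat is the very last line, which calls executorService.schedule() again.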

@Override
public void run() {
	if (beatInfo.isStopped()) {
		// Stopped beats return without re-scheduling, which ends the chain.
		return;
	}
	long nextTime = beatInfo.getPeriod();
	try {
		/**
		 * Call the server's heartbeat endpoint.
		 * The response carries the interval to wait before the next heartbeat.
		 */
		JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
		long interval = result.get("clientBeatInterval").asLong();
		boolean lightBeatEnabled = false;
		if (result.has(CommonParams.LIGHT_BEAT_ENABLED)) {
			lightBeatEnabled = result.get(CommonParams.LIGHT_BEAT_ENABLED).asBoolean();
		}
		BeatReactor.this.lightBeatEnabled = lightBeatEnabled;
		if (interval > 0) {
			nextTime = interval;
		}
		int code = NamingResponseCode.OK;
		if (result.has(CommonParams.CODE)) {
			code = result.get(CommonParams.CODE).asInt();
		}
		if (code == NamingResponseCode.RESOURCE_NOT_FOUND) {
			// The server does not know this instance, so register it again.
			Instance instance = new Instance();
			instance.setPort(beatInfo.getPort());
			instance.setIp(beatInfo.getIp());
			instance.setWeight(beatInfo.getWeight());
			instance.setMetadata(beatInfo.getMetadata());
			instance.setClusterName(beatInfo.getCluster());
			instance.setServiceName(beatInfo.getServiceName());
			instance.setInstanceId(instance.getInstanceId());
			instance.setEphemeral(true);
			try {
				serverProxy.registerService(beatInfo.getServiceName(),
						NamingUtils.getGroupName(beatInfo.getServiceName()), instance);
			} catch (Exception ignore) {
			}
		}
	} catch (NacosException ex) {
		NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, code: {}, msg: {}",
				JacksonUtils.toJson(beatInfo), ex.getErrCode(), ex.getErrMsg());

	}
	/**
	 * The heartbeat task re-submits itself here, which is what makes the heartbeat repeat.
	 */
	executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
}

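In short: Nacos submits the heartbeat with schedule(), and the scheduled task calls schedule() again at the end of run(). Because that call sits after the try/catch, the next heartbeat is scheduled even when sendBeat throws a NacosException. The only exit visible in this snippet is the isStopped() check at the top of run() (an unchecked exception escaping run() would also skip the re-schedule).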
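The control flow of run() can be reduced to a small runnable sketch. This is not the Nacos implementation: Beat, BeatTask and runThenStop below are simplified, hypothetical stand-ins, and the "send" is simulated by failing every second attempt. It only illustrates two things from the snippet above: the re-schedule sits after the try/catch, and the stopped flag is what ends the chain.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StoppableBeatDemo {

    /** Hypothetical stand-in for BeatInfo with only the fields this sketch needs. */
    static class Beat {
        final long periodMillis;
        volatile boolean stopped;

        Beat(long periodMillis) {
            this.periodMillis = periodMillis;
        }
    }

    static class BeatTask implements Runnable {
        private final ScheduledExecutorService executor;
        private final Beat beat;
        private final AtomicInteger attempts;

        BeatTask(ScheduledExecutorService executor, Beat beat, AtomicInteger attempts) {
            this.executor = executor;
            this.beat = beat;
            this.attempts = attempts;
        }

        @Override
        public void run() {
            if (beat.stopped) {
                return; // the chain ends here because nothing is re-scheduled
            }
            try {
                // Simulated send: every second attempt fails.
                if (attempts.incrementAndGet() % 2 == 0) {
                    throw new IllegalStateException("simulated send failure");
                }
            } catch (IllegalStateException ex) {
                // Swallowed so that a failed beat does not break the loop.
            }
            // Placed after the try/catch, so it runs on success and on failure alike.
            executor.schedule(new BeatTask(executor, beat, attempts), beat.periodMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Runs the loop for runMillis, stops it, then reads the attempt count twice. */
    static int[] runThenStop(long periodMillis, long runMillis) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        Beat beat = new Beat(periodMillis);
        AtomicInteger attempts = new AtomicInteger();
        int[] counts = new int[2];
        try {
            executor.schedule(new BeatTask(executor, beat, attempts), periodMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(runMillis);
            beat.stopped = true;
            Thread.sleep(periodMillis * 3); // let any in-flight task finish
            counts[0] = attempts.get();
            Thread.sleep(periodMillis * 3);
            counts[1] = attempts.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = runThenStop(10, 200);
        System.out.println("attempts after stop: " + counts[0] + ", a bit later: " + counts[1]);
    }
}
```

The two counts come out equal: once the flag is set, the next task returns early and nothing is scheduled after it, even though half of the earlier attempts "failed" without interrupting the loop.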

One question is worth thinking about here: why doesn't Nacos use the executor's periodic methods, scheduleAtFixedRate or scheduleWithFixedDelay?

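My first guess was that Nacos might stop calling schedule() when the server's heartbeat endpoint returns an error, but I could not find any place in the code that stops the task in that case, so this is still an open question for me. If anyone knows the reason, I would appreciate a pointer.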
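One thing the run() code does show, and this is my own inference rather than a documented design decision, is that nextTime is recomputed on every run from the clientBeatInterval field in the server response. scheduleAtFixedRate and scheduleWithFixedDelay take their period or delay once, when the task is submitted, so a server-controlled interval is easier to express with self-rescheduling. The sketch below illustrates that idea; fakeServerInterval and collectDelays are hypothetical names, and the "server" is just a local function.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class VariableDelayDemo {

    static final long DEFAULT_PERIOD_MILLIS = 20;

    /** Hypothetical server reply: 20 ms for the first two beats, 60 ms afterwards. */
    static long fakeServerInterval(int beatNumber) {
        return beatNumber <= 2 ? 20 : 60;
    }

    /** Runs `beats` heartbeats and returns the delay chosen after each one. */
    static List<Long> collectDelays(int beats) {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        List<Long> delays = new CopyOnWriteArrayList<>();
        AtomicInteger beatNumber = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(beats);
        Runnable task = new Runnable() {
            @Override
            public void run() {
                // Same shape as BeatTask.run(): start from the default, let the reply override it.
                long nextTime = DEFAULT_PERIOD_MILLIS;
                long interval = fakeServerInterval(beatNumber.incrementAndGet());
                if (interval > 0) {
                    nextTime = interval;
                }
                delays.add(nextTime);
                done.countDown();
                if (done.getCount() > 0) {
                    // The delay can differ on every round, which the fixed-rate methods do not allow.
                    executor.schedule(this, nextTime, TimeUnit.MILLISECONDS);
                }
            }
        };
        try {
            executor.schedule(task, DEFAULT_PERIOD_MILLIS, TimeUnit.MILLISECONDS);
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return delays;
    }

    public static void main(String[] args) {
        System.out.println(collectDelays(4)); // prints [20, 20, 60, 60]
    }
}
```

A second difference is documented in the JDK: with scheduleAtFixedRate and scheduleWithFixedDelay, if any execution of the task throws an exception, all later executions are suppressed. With self-rescheduling, the task decides for itself whether to continue.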
