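Introduction
Netflix Hystrix uses a sliding window to collect metrics about command invocations. Hystrix 1.5 redesigned the sliding window as a data stream (a reactive stream, i.e. an Observable in RxJava). Consumers use the window by subscribing to the stream and transforming it before further processing, which gives developers more flexibility. Hystrix already relies heavily on RxJava, and a sliding window is by nature a continuously changing data stream: the data in each bucket comes from an endless flow of events. That makes it a natural fit for RxJava's observer pattern and reactive programming model. Implementing it with RxJava has one major benefit: the window is built from RxJava operators, so the thread safety of writes and aggregation can be delegated to RxJava's threading model. All of these operations run on RxJava's background threads, and RxJava guarantees their ordering and thread safety.
The sliding-window implementations all live in the com.netflix.hystrix.metric.consumer package; this article only analyzes BucketedRollingCounterStream. The class hierarchy is as follows:
At the top, the abstract class BucketedCounterStream provides the basic bucketed counter: it aggregates all events into buckets at the configured time interval. The abstract class BucketedRollingCounterStream builds the sliding window on top of it and aggregates the buckets into metric data. The bottom layer consists of the concrete implementations. For example, HealthCountsStream aggregates into health-check data (HystrixCommandMetrics.HealthCounts, which counts successful and failed invocations) for use by HystrixCircuitBreaker.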
BucketedCounterStream
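The abstract class BucketedCounterStream provides the basic bucketed counter. Users of Hystrix usually configure two values: timeInMilliseconds and numBuckets. The former is the length of the sliding window (a time interval) and the latter is the number of buckets in the window, so each bucket covers bucketSizeInMs = timeInMilliseconds / numBuckets (referred to below as one bucket interval). Every bucket interval (bucketSizeInMs), BucketedCounterStream aggregates all invocation events from that interval into one bucket.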
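The arithmetic above can be sketched in a few lines of plain Java. The class and method names (BucketSizing, bucketSizeInMs, bucketIndex) are hypothetical and are not part of Hystrix; the sketch only illustrates how a window length and a bucket count determine the bucket interval, and which bucket an event would fall into given the time elapsed since the stream started.

```java
final class BucketSizing {
    /** Length of one bucket interval in milliseconds. */
    static int bucketSizeInMs(int timeInMilliseconds, int numBuckets) {
        return timeInMilliseconds / numBuckets;
    }

    /** Zero-based index of the bucket that an event at elapsedMs (since stream start) falls into. */
    static long bucketIndex(long elapsedMs, int bucketSizeInMs) {
        return elapsedMs / bucketSizeInMs;
    }

    public static void main(String[] args) {
        int size = bucketSizeInMs(10_000, 10);        // 10 s window split into 10 buckets
        System.out.println(size);                     // prints 1000
        System.out.println(bucketIndex(2_500, size)); // prints 2 (the third bucket)
    }
}
```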
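The type parameters in the class definition:
Event: a Hystrix event, such as a command starting or finishing execution.
Bucket: the type of a bucket.
Output: the final output type of the aggregation.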
public abstract class BucketedCounterStream<Event extends HystrixEvent, Bucket, Output> {
    protected final int numBuckets;
    // stream of buckets, one emitted per bucket interval
    protected final Observable<Bucket> bucketedStream;
    // folds the events of one time window into a single bucket
    private final Func1<Observable<Event>, Observable<Bucket>> reduceBucketToSummary;

    protected BucketedCounterStream(final HystrixEventStream<Event> inputEventStream, final int numBuckets, final int bucketSizeInMs,
                                    final Func2<Bucket, Event, Bucket> appendRawEventToBucket) {
        this.numBuckets = numBuckets;
        // aggregation: reduce all events of a window into a bucket, starting from an empty bucket
        this.reduceBucketToSummary = new Func1<Observable<Event>, Observable<Bucket>>() {
            @Override
            public Observable<Bucket> call(Observable<Event> eventBucket) {
                return eventBucket.reduce(getEmptyBucketSummary(), appendRawEventToBucket);
            }
        };
        final List<Bucket> emptyEventCountsToStart = new ArrayList<Bucket>();
        for (int i = 0; i < numBuckets; i++) {
            emptyEventCountsToStart.add(getEmptyBucketSummary());
        }
        // events are read from inputEventStream
        this.bucketedStream = Observable.defer(new Func0<Observable<Bucket>>() {
            @Override
            public Observable<Bucket> call() {
                return inputEventStream
                        .observe()
                        // time-based window: collect the events of one bucket interval
                        .window(bucketSizeInMs, TimeUnit.MILLISECONDS) //bucket it by the counter window so we can emit to the next operator in time chunks, not on every OnNext
                        // summarize the events of that interval into one Bucket
                        .flatMap(reduceBucketToSummary) //for a given bucket, turn it into a long array containing counts of event types
                        // initialization: prepend numBuckets empty buckets
                        .startWith(emptyEventCountsToStart); //start it with empty arrays to make consumer logic as generic as possible (windows are always full)
            }
        });
    }
    // remaining members omitted
}
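BucketedCounterStream thus turns the raw events into a stream whose unit is the Bucket, producing one bucket every bucketSizeInMs. BucketedRollingCounterStream then aggregates the buckets produced by each bucket interval according to the size of the sliding window.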
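To make the bucketing step concrete, here is a minimal simulation in plain Java, without RxJava. It is only a sketch under stated assumptions: events are represented by their timestamps in milliseconds since the stream started, a bucket is reduced to a simple event count, and the class name TumblingBuckets is hypothetical. Like reduce with a seed value, it yields an empty (zero) bucket for an interval in which no event arrived.

```java
import java.util.ArrayList;
import java.util.List;

final class TumblingBuckets {
    /**
     * Groups event timestamps into consecutive fixed-size intervals and reduces each
     * interval to its event count. Intervals without events yield 0.
     */
    static List<Integer> bucketCounts(long[] eventTimesMs, int bucketSizeInMs, int intervals) {
        int[] counts = new int[intervals];
        for (long t : eventTimesMs) {
            long idx = t / bucketSizeInMs;
            if (t >= 0 && idx < intervals) {
                counts[(int) idx]++; // append the event to its bucket
            }
        }
        List<Integer> buckets = new ArrayList<>();
        for (int c : counts) {
            buckets.add(c);
        }
        return buckets;
    }

    public static void main(String[] args) {
        long[] events = {100, 900, 1500, 3200, 3900, 3999};
        // Four 1000 ms intervals; the third one receives no events.
        System.out.println(bucketCounts(events, 1000, 4)); // prints [2, 1, 0, 3]
    }
}
```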
BucketedRollingCounterStream
public abstract class BucketedRollingCounterStream<Event extends HystrixEvent, Bucket, Output> extends BucketedCounterStream<Event, Bucket, Output> {
    private Observable<Output> sourceStream;
    private final AtomicBoolean isSourceCurrentlySubscribed = new AtomicBoolean(false);

    protected BucketedRollingCounterStream(HystrixEventStream<Event> stream, final int numBuckets, int bucketSizeInMs,
                                           final Func2<Bucket, Event, Bucket> appendRawEventToBucket,
                                           final Func2<Output, Bucket, Output> reduceBucket) {
        super(stream, numBuckets, bucketSizeInMs, appendRawEventToBucket);
        // fold the buckets of one window into an Output and keep only the final value
        Func1<Observable<Bucket>, Observable<Output>> reduceWindowToSummary = window -> window.scan(getEmptyOutputValue(), reduceBucket).skip(numBuckets);
        this.sourceStream = bucketedStream      // stream in which each item is the bucket of one bucket interval
                .window(numBuckets, 1)          // group buckets by the number of buckets in the sliding window, sliding by one
                .flatMap(reduceWindowToSummary) // aggregate each group of buckets into the final data object
                .doOnSubscribe(() -> isSourceCurrentlySubscribed.set(true))
                .doOnUnsubscribe(() -> isSourceCurrentlySubscribed.set(false))
                .share()                        // all subscribers share one upstream subscription and see the same data
                .onBackpressureDrop();          // flow control: drop data instead of buffering when a consumer is too slow
    }

    @Override
    public Observable<Output> observe() {
        return sourceStream;
    }

    /* package-private */ boolean isSourceCurrentlySubscribed() {
        return isSourceCurrentlySubscribed.get();
    }
}
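The last two constructor parameters are functions: one aggregates the event stream into a bucket (appendRawEventToBucket), and the other aggregates buckets into the output object (reduceBucket).
BucketedRollingCounterStream implements the observe method, which returns sourceStream, an Observable that subscribers consume. This sourceStream is the final form of the sliding window, so how is it derived? The core is again the window and flatMap operators. The window operator here is a different overload from the time-based one used in BucketedCounterStream: it groups a fixed number of items from the stream into a window, and its second argument, skip = 1, advances the window one item at a time, continuously collecting items. This is what actually implements the sliding window. Within each window, scan folds the buckets starting from the empty output value and emits every intermediate result, and skip(numBuckets) discards all but the last one, which is the aggregate over the full window.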
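The effect of window(numBuckets, 1) followed by scan and skip can be simulated without RxJava. The sketch below is an illustration under assumptions, not the Hystrix implementation: buckets are plain long counts, the output is their sum, and the class name RollingWindowSim is hypothetical. It also mirrors the startWith step by padding the stream with numBuckets empty buckets, so the first value is produced immediately and every window is full.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingWindowSim {
    /** Emits one sum per full window of numBuckets consecutive buckets, sliding by one bucket. */
    static List<Long> rollingSums(List<Long> buckets, int numBuckets) {
        List<Long> padded = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            padded.add(0L); // empty buckets prepended, as startWith does in the real stream
        }
        padded.addAll(buckets);

        List<Long> outputs = new ArrayList<>();
        for (int start = 0; start + numBuckets <= padded.size(); start++) {
            long sum = 0; // empty output value
            for (int i = start; i < start + numBuckets; i++) {
                sum += padded.get(i); // fold each bucket into the output
            }
            outputs.add(sum); // only the aggregate over the full window is kept
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Long> buckets = List.of(2L, 1L, 0L, 3L);
        // Windows over [0,0,0,2,1,0,3] with size 3 and step 1.
        System.out.println(rollingSums(buckets, 3)); // prints [0, 2, 3, 3, 4]
    }
}
```

Each new bucket yields exactly one new output, and a bucket stops contributing once it has slid out of the last numBuckets positions.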
HealthCountsStream
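Now consider one concrete sliding-window implementation, HealthCountsStream. It provides real-time health-check data, HystrixCommandMetrics.HealthCounts, which counts successful and failed invocations.
BucketedCounterStream has three type parameters. As a reminder:
public abstract class BucketedCounterStream<Event extends HystrixEvent, Bucket, Output> {
Here, the three type parameters are bound as follows:
Event: HystrixCommandCompletion, which represents the completion of a command. The execution result can be obtained from it, along with all the events (HystrixEventType) that were produced.
Bucket: long[], which holds the counts of the various events. The index is the ordinal of the event type enum and the value is the number of events of that type.
Output: HystrixCommandMetrics.HealthCounts, which holds the total number of executions, the number of failures, and the failure percentage, for use by the circuit breaker.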
private static final ConcurrentMap<String, HealthCountsStream> streams = new ConcurrentHashMap<String, HealthCountsStream>();
private static final int NUM_EVENT_TYPES = HystrixEventType.values().length;

private HealthCountsStream(final HystrixCommandKey commandKey, final int numBuckets, final int bucketSizeInMs,
                           Func2<long[], HystrixCommandCompletion, long[]> reduceCommandCompletion) {
    super(HystrixCommandCompletionStream.getInstance(commandKey), numBuckets, bucketSizeInMs, reduceCommandCompletion, healthCheckAccumulator);
}
Next, look at the two accumulators.
Aggregating events into a bucket
public static final Func2<long[], HystrixCommandCompletion, long[]> appendEventToBucket = new Func2<long[], HystrixCommandCompletion, long[]>() {
    @Override
    public long[] call(long[] initialCountArray, HystrixCommandCompletion execution) {
        ExecutionResult.EventCounts eventCounts = execution.getEventCounts();
        for (HystrixEventType eventType: ALL_EVENT_TYPES) {
            switch (eventType) {
                case EXCEPTION_THROWN: break; //this is just a sum of other anyway - don't do the work here
                default:
                    initialCountArray[eventType.ordinal()] += eventCounts.getCount(eventType);
                    break;
            }
        }
        return initialCountArray;
    }
};
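The idea of a long[] bucket indexed by enum ordinal can be shown in isolation. In this sketch the EventType enum is a hypothetical three-value stand-in for HystrixEventType, and each call appends a single event rather than the per-completion counts that the real accumulator adds.

```java
import java.util.Arrays;

final class OrdinalBucket {
    /** Hypothetical stand-in for HystrixEventType. */
    enum EventType { SUCCESS, FAILURE, TIMEOUT }

    /** Increments the slot whose index is the ordinal of the event type. */
    static long[] append(long[] bucket, EventType event) {
        bucket[event.ordinal()]++;
        return bucket;
    }

    public static void main(String[] args) {
        long[] bucket = new long[EventType.values().length]; // empty bucket: one slot per event type
        append(bucket, EventType.SUCCESS);
        append(bucket, EventType.SUCCESS);
        append(bucket, EventType.TIMEOUT);
        System.out.println(Arrays.toString(bucket)); // prints [2, 0, 1]
    }
}
```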
Sliding-window counting based on buckets
private static final Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts> healthCheckAccumulator = new Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts>() {
    @Override
    public HystrixCommandMetrics.HealthCounts call(HystrixCommandMetrics.HealthCounts healthCounts, long[] bucketEventCounts) {
        return healthCounts.plus(bucketEventCounts);
    }
};
healthCounts.plus
// the same accumulator, written equivalently as a method reference
private static final Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts> healthCheckAccumulator = HystrixCommandMetrics.HealthCounts::plus;

// the actual implementation, located in the HystrixCommandMetrics.HealthCounts class
public HealthCounts plus(long[] eventTypeCounts) {
    long updatedTotalCount = totalCount; // total number of requests so far
    long updatedErrorCount = errorCount; // number of failures so far
    long successCount = eventTypeCounts[HystrixEventType.SUCCESS.ordinal()];
    long failureCount = eventTypeCounts[HystrixEventType.FAILURE.ordinal()];
    long timeoutCount = eventTypeCounts[HystrixEventType.TIMEOUT.ordinal()];
    long threadPoolRejectedCount = eventTypeCounts[HystrixEventType.THREAD_POOL_REJECTED.ordinal()];
    long semaphoreRejectedCount = eventTypeCounts[HystrixEventType.SEMAPHORE_REJECTED.ordinal()];
    // add the counts of all events in this bucket
    updatedTotalCount += (successCount + failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    // add the counts of the error events (failure, timeout, thread pool rejected, semaphore rejected)
    updatedErrorCount += (failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    return new HealthCounts(updatedTotalCount, updatedErrorCount);
}
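The plus method follows an immutable accumulator pattern: each bucket produces a new snapshot rather than mutating the old one. A minimal sketch of that pattern follows. The class HealthSnapshot is hypothetical, it collapses the four error categories into a single failures argument, and its integer error-percentage formula is an assumption of this sketch rather than a quotation of the Hystrix source.

```java
final class HealthSnapshot {
    final long total;
    final long errors;
    final int errorPercentage;

    HealthSnapshot(long total, long errors) {
        this.total = total;
        this.errors = errors;
        // Integer arithmetic avoids floating-point rounding; 0 when nothing was recorded.
        this.errorPercentage = total > 0 ? (int) (errors * 100 / total) : 0;
    }

    static HealthSnapshot empty() {
        return new HealthSnapshot(0, 0);
    }

    /** Returns a new snapshot that includes the counts of one more bucket. */
    HealthSnapshot plus(long successes, long failures) {
        return new HealthSnapshot(total + successes + failures, errors + failures);
    }

    public static void main(String[] args) {
        HealthSnapshot first = HealthSnapshot.empty().plus(8, 2);
        System.out.println(first.errorPercentage);  // prints 20 (2 errors out of 10)
        HealthSnapshot second = first.plus(0, 6);
        System.out.println(second.errorPercentage); // prints 50 (8 errors out of 16)
    }
}
```

Because every snapshot is immutable, a reader such as the circuit breaker can hold on to one without any locking while the stream keeps producing new ones.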
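Writing events
The HealthCountsStream constructor shows that the event stream object is HystrixCommandCompletionStream. In Hystrix, the functions to execute are wrapped as commands (Command) following the command pattern, and each command triggers events as it executes. Among them, the command completion event HystrixCommandCompletion is the most central event in Hystrix: it can represent a command that succeeded, timed out, threw an exception, and so on. The circuit breaker's counting also depends on the HystrixCommandCompletion event.
Constructing a HealthCountsStream obtains the HystrixCommandCompletionStream, creating it if it does not exist. This is the stream that carries the events.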
public class HystrixCommandCompletionStream implements HystrixEventStream<HystrixCommandCompletion> {
    private final HystrixCommandKey commandKey;
    private final Subject<HystrixCommandCompletion, HystrixCommandCompletion> writeOnlySubject;
    private final Observable<HystrixCommandCompletion> readOnlyStream;
    // maps each command key to its stream
    private static final ConcurrentMap<String, HystrixCommandCompletionStream> streams = new ConcurrentHashMap<String, HystrixCommandCompletionStream>();

    public static HystrixCommandCompletionStream getInstance(HystrixCommandKey commandKey) {
        // returns the HystrixCommandCompletionStream for this key, creating it if it does not exist
    }

    HystrixCommandCompletionStream(final HystrixCommandKey commandKey) {
        this.commandKey = commandKey;
        this.writeOnlySubject = new SerializedSubject<HystrixCommandCompletion, HystrixCommandCompletion>(PublishSubject.<HystrixCommandCompletion>create());
        this.readOnlyStream = writeOnlySubject.share();
    }

    public void write(HystrixCommandCompletion event) {
        writeOnlySubject.onNext(event);
    }

    @Override
    public Observable<HystrixCommandCompletion> observe() {
        return readOnlyStream;
    }
}
The events themselves are written by HystrixThreadEventStream.
In AbstractCommand#toObservable, when execution finishes or the subscription is cancelled, AbstractCommand#handleCommandEnd is called, which eventually calls com.netflix.hystrix.metric.HystrixThreadEventStream#executionDone.
public void executionDone(ExecutionResult executionResult, HystrixCommandKey commandKey, HystrixThreadPoolKey threadPoolKey) {
    HystrixCommandCompletion event = HystrixCommandCompletion.from(executionResult, commandKey, threadPoolKey);
    writeOnlyCommandCompletionSubject.onNext(event);
}
Now look at how writeOnlyCommandCompletionSubject is set up. Each event is eventually written to the HystrixCommandCompletionStream, which is the inputEventStream of BucketedCounterStream, that is, the source of events.
writeOnlyCommandCompletionSubject = PublishSubject.create();
writeOnlyCommandCompletionSubject
        .onBackpressureBuffer()
        .doOnNext(writeCommandCompletionsToShardedStreams)
        .unsafeSubscribe(Subscribers.empty());

private static final Action1<HystrixCommandCompletion> writeCommandCompletionsToShardedStreams = new Action1<HystrixCommandCompletion>() {
    @Override
    public void call(HystrixCommandCompletion commandCompletion) {
        HystrixCommandCompletionStream commandStream = HystrixCommandCompletionStream.getInstance(commandCompletion.getCommandKey());
        commandStream.write(commandCompletion);
    }
};
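The write path above routes each completion event to a stream that is looked up, or created on first use, by command key. The sketch below illustrates that routing in plain Java. It is not the Hystrix code: KeyedEventStream is a hypothetical class, events are plain strings, subscribers are simple callbacks instead of Rx subscribers, and computeIfAbsent is this sketch's way to get-or-create an instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

final class KeyedEventStream {
    // One stream per command key, created lazily.
    private static final ConcurrentMap<String, KeyedEventStream> STREAMS = new ConcurrentHashMap<>();

    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    /** Returns the stream for the key, creating it atomically if it does not exist. */
    static KeyedEventStream getInstance(String commandKey) {
        return STREAMS.computeIfAbsent(commandKey, k -> new KeyedEventStream());
    }

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    /** Pushes the event to every current subscriber of this stream. */
    void write(String event) {
        for (Consumer<String> s : subscribers) {
            s.accept(event);
        }
    }

    /** Routes an event to the stream of its command key. */
    static void route(String commandKey, String event) {
        getInstance(commandKey).write(event);
    }

    public static void main(String[] args) {
        List<String> seenByA = new ArrayList<>();
        getInstance("CommandA").subscribe(seenByA::add);
        route("CommandA", "SUCCESS");
        route("CommandB", "TIMEOUT"); // different key, so CommandA's subscriber does not see it
        System.out.println(seenByA);  // prints [SUCCESS]
    }
}
```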
Reading the statistics
Whether the circuit breaker is open is decided in circuitBreaker.allowRequest:
public boolean allowRequest() {
    if (properties.circuitBreakerForceOpen().get()) {
        return false;
    }
    if (properties.circuitBreakerForceClosed().get()) {
        isOpen();
        return true;
    }
    return !isOpen() || allowSingleTest();
}

public boolean isOpen() {
    if (circuitOpen.get()) {
        return true;
    }
    // we're closed, so let's see if errors have made us so we should trip the circuit open
    HealthCounts health = metrics.getHealthCounts();
    if (health.getTotalRequests() < properties.circuitBreakerRequestVolumeThreshold().get()) {
        return false;
    }
    if (health.getErrorPercentage() < properties.circuitBreakerErrorThresholdPercentage().get()) {
        return false;
    } else {
        if (circuitOpen.compareAndSet(false, true)) {
            circuitOpenedOrLastTestedTime.set(System.currentTimeMillis());
            return true;
        } else {
            // How could previousValue be true? If another thread was going through this code at the same time a race-condition could have
            // caused another thread to set it to true already even though we were in the process of doing the same
            // In this case, we know the circuit is open, so let the other thread set the currentTime and report back that the circuit is open
            return true;
        }
    }
}

public boolean allowSingleTest() {
    long timeCircuitOpenedOrWasLastTested = circuitOpenedOrLastTestedTime.get();
    if (circuitOpen.get() && System.currentTimeMillis() > timeCircuitOpenedOrWasLastTested + properties.circuitBreakerSleepWindowInMilliseconds().get()) {
        if (circuitOpenedOrLastTestedTime.compareAndSet(timeCircuitOpenedOrWasLastTested, System.currentTimeMillis())) {
            return true;
        }
    }
    return false;
}
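Stripped of its atomic state handling, the decision in isOpen and allowSingleTest reduces to two pure checks. The sketch below restates them as static functions so the thresholds are easy to reason about. TripDecision and its parameter names are hypothetical, the threshold values in main are only example inputs, and the compare-and-set steps of the original are deliberately left out.

```java
final class TripDecision {
    /** True when the window has enough traffic and the error rate has reached the threshold. */
    static boolean shouldTrip(long totalRequests, int errorPercentage,
                              int requestVolumeThreshold, int errorThresholdPercentage) {
        if (totalRequests < requestVolumeThreshold) {
            return false; // too few requests in the window to judge
        }
        return errorPercentage >= errorThresholdPercentage;
    }

    /** True when the circuit is open and the sleep window since the last open or test has elapsed. */
    static boolean sleepWindowElapsed(boolean circuitOpen, long nowMs,
                                      long openedOrLastTestedMs, long sleepWindowMs) {
        return circuitOpen && nowMs > openedOrLastTestedMs + sleepWindowMs;
    }

    public static void main(String[] args) {
        System.out.println(shouldTrip(19, 100, 20, 50)); // prints false: below the volume threshold
        System.out.println(shouldTrip(20, 50, 20, 50));  // prints true: error rate reached the threshold
        System.out.println(sleepWindowElapsed(true, 6_000, 0, 5_000)); // prints true
    }
}
```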
The key step for obtaining the statistics is HealthCounts health = metrics.getHealthCounts();
Its call path is:
metrics.getHealthCounts
=> healthCountsStream.getLatest
=> BucketedCounterStream#getLatest
The call ends up in BucketedCounterStream#getLatest:
public Output getLatest() {
    startCachingStreamValuesIfUnstarted();
    if (counterSubject.hasValue()) {
        return counterSubject.getValue();
    } else {
        return getEmptyOutputValue();
    }
}

// subscribe if not yet started; the latest value is then served through counterSubject
public void startCachingStreamValuesIfUnstarted() {
    if (subscription.get() == null) {
        // subscribe counterSubject to the aggregated stream
        Subscription candidateSubscription = observe().subscribe(counterSubject);
        if (subscription.compareAndSet(null, candidateSubscription)) {
            // this thread won the race; keep the subscription
        } else {
            // another thread subscribed first; discard the redundant subscription
            candidateSubscription.unsubscribe();
        }
    }
}
The data source here is obtained through observe(), which BucketedRollingCounterStream overrides to return the aggregated sourceStream. The code is:
BucketedRollingCounterStream#observe
@Override
public Observable<Output> observe() {
    return sourceStream;
}
BucketedCounterStream#counterSubject thus subscribes to BucketedRollingCounterStream#sourceStream, which carries the final aggregated statistics.
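The lazy, race-safe subscription in startCachingStreamValuesIfUnstarted is an instance of a common compare-and-set idiom: create a candidate, try to install it, and cancel it if another thread got there first. A minimal sketch of the idiom follows; SubscribeOnce and Handle are hypothetical names, and Handle merely stands in for an Rx Subscription.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SubscribeOnce {
    /** Hypothetical stand-in for an Rx Subscription. */
    static final class Handle {
        private volatile boolean cancelled;

        void cancel() {
            cancelled = true;
        }

        boolean isCancelled() {
            return cancelled;
        }
    }

    private final AtomicReference<Handle> subscription = new AtomicReference<>(null);

    /** Installs a subscription on first use and returns the one that is active. */
    Handle startIfUnstarted() {
        if (subscription.get() == null) {
            Handle candidate = new Handle(); // in the real code: observe().subscribe(counterSubject)
            if (!subscription.compareAndSet(null, candidate)) {
                candidate.cancel(); // lost the race: another thread installed its subscription first
            }
        }
        return subscription.get();
    }

    public static void main(String[] args) {
        SubscribeOnce stream = new SubscribeOnce();
        Handle first = stream.startIfUnstarted();
        Handle second = stream.startIfUnstarted();
        System.out.println(first == second);     // prints true: only one subscription is kept
        System.out.println(first.isCancelled()); // prints false
    }
}
```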
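Default configuration
The defaults are all defined in HystrixCommandProperties.
metrics.rollingStats.timeInMilliseconds
The duration of the statistical rolling window. The default is 10000 (10 s). It is also the window over which the circuit breaker evaluates health.
metrics.rollingStats.numBuckets
The number of buckets the rolling statistical window is divided into. The default is 10. The duration of each bucket follows from timeInMilliseconds and numBuckets.
circuitBreaker.errorThresholdPercentage
The error-rate threshold at which the circuit trips. With the default of 50%, the circuit opens when the failure rate within one sliding window reaches 50%, provided the request volume threshold checked in isOpen has also been met.
circuitBreaker.sleepWindowInMilliseconds
This setting relates to automatic recovery of the circuit breaker. To check whether the backend service has recovered, a single request can be let through as a probe. After the circuit trips, at least sleepWindow must elapse before such a probe request is allowed. The default is 5 s.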
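As a sketch of how these four settings relate, the following plain Java class reads them from a java.util.Properties object using the hystrix.command.default. key prefix and falls back to the defaults listed above. RollingWindowConfig is a hypothetical helper written for this article, not a Hystrix class (Hystrix resolves its properties through its own machinery), and the check that the window divides evenly into buckets is a validation chosen for this sketch.

```java
import java.util.Properties;

final class RollingWindowConfig {
    static final String PREFIX = "hystrix.command.default.";

    final int timeInMilliseconds;
    final int numBuckets;
    final int errorThresholdPercentage;
    final int sleepWindowInMilliseconds;

    RollingWindowConfig(Properties p) {
        timeInMilliseconds = intOf(p, "metrics.rollingStats.timeInMilliseconds", 10_000);
        numBuckets = intOf(p, "metrics.rollingStats.numBuckets", 10);
        errorThresholdPercentage = intOf(p, "circuitBreaker.errorThresholdPercentage", 50);
        sleepWindowInMilliseconds = intOf(p, "circuitBreaker.sleepWindowInMilliseconds", 5_000);
        if (numBuckets <= 0 || timeInMilliseconds % numBuckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
    }

    private static int intOf(Properties p, String name, int defaultValue) {
        return Integer.parseInt(p.getProperty(PREFIX + name, String.valueOf(defaultValue)));
    }

    /** Duration of one bucket, derived from the window length and bucket count. */
    int bucketSizeInMs() {
        return timeInMilliseconds / numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(new RollingWindowConfig(new Properties()).bucketSizeInMs()); // prints 1000

        Properties custom = new Properties();
        custom.setProperty(PREFIX + "metrics.rollingStats.numBuckets", "20");
        System.out.println(new RollingWindowConfig(custom).bucketSizeInMs()); // prints 500
    }
}
```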