How the Hystrix Circuit Breaker Works

Introduction

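Netflix Hystrix collects call metrics in sliding (rolling) windows. Hystrix 1.5 redesigned the rolling window as a data stream (a reactive stream, i.e. an RxJava Observable). Consumers use the window by subscribing to that stream and transforming it before further processing, which gives developers much more flexibility. RxJava is a natural fit: Hystrix already uses it heavily, and a rolling window is by nature a continuously changing stream in which every bucket is filled from an endless flow of events. That maps directly onto the observer pattern and reactive programming. Building the window from RxJava operators has one major benefit: the thread safety of writing and aggregating data is delegated to RxJava's threading model instead of hand-written locking. The operators run on RxJava's background (scheduler) threads, and because the Observable contract requires notifications to be delivered serially, RxJava keeps the processing ordered and thread-safe.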

All rolling-window implementations live in the com.netflix.hystrix.metric.consumer package; this article analyzes only BucketedRollingCounterStream. The class diagram is as follows:

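At the top, the abstract class BucketedCounterStream provides the basic bucket counter: it aggregates all events into buckets at the configured time interval. The abstract class BucketedRollingCounterStream builds on it to implement the rolling window and aggregates the buckets into metric data. The bottom layer holds the concrete implementations. For example, HealthCountsStream ultimately produces health-check data (HystrixCommandMetrics.HealthCounts, which counts successful and failed calls) for HystrixCircuitBreaker to use.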

BucketedCounterStream

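The abstract class BucketedCounterStream provides the basic bucket counter. Hystrix users normally configure two values, timeInMilliseconds and numBuckets: the former is the length (time span) of the rolling window, the latter the number of buckets in that window. Each bucket therefore covers bucketSizeInMs = timeInMilliseconds / numBuckets (call this one unit window period). Every unit window period (bucketSizeInMs), BucketedCounterStream aggregates all call events from that period into one bucket.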
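A minimal plain-Java sketch of this arithmetic (not Hystrix code; BucketMath and its methods are made-up names). The divisibility check reflects the assumption that timeInMilliseconds should be an exact multiple of numBuckets, and bucketIndex assumes buckets are aligned to the moment the stream starts.

```java
// Plain-Java sketch (not Hystrix code) of the bucket arithmetic.
final class BucketMath {

    // Length of one bucket ("unit window period") in milliseconds.
    // ASSUMPTION: timeInMilliseconds must be an exact multiple of numBuckets.
    static int bucketSizeInMs(int timeInMilliseconds, int numBuckets) {
        if (numBuckets <= 0 || timeInMilliseconds % numBuckets != 0) {
            throw new IllegalArgumentException("timeInMilliseconds must be divisible by numBuckets");
        }
        return timeInMilliseconds / numBuckets;
    }

    // Index of the bucket an event falls into, counting from the start of the stream.
    // ASSUMPTION: buckets are aligned to the moment the stream starts (elapsedMs = 0).
    static long bucketIndex(long elapsedMs, int bucketSizeInMs) {
        return elapsedMs / bucketSizeInMs;
    }

    public static void main(String[] args) {
        int size = bucketSizeInMs(10_000, 10);        // 10 s window split into 10 buckets
        System.out.println(size);                     // 1000
        System.out.println(bucketIndex(2_500, size)); // 2, i.e. the third bucket
    }
}
```

With the default 10 s window and 10 buckets, each bucket spans one second.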

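Type parameters of the class:

Event: a Hystrix event, such as a command starting or finishing execution.

Bucket: the bucket type

Output: the final output type of the aggregation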

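public abstract class BucketedCounterStream<Event extends HystrixEvent, Bucket, Output> {
    protected final int numBuckets;
    // stream of buckets; each element holds the data aggregated over one bucket period
    protected final Observable<Bucket> bucketedStream;
    // reduces all events of one time window into a single bucket
    private final Func1<Observable<Event>, Observable<Bucket>> reduceBucketToSummary;
    // (other fields, such as subscription and counterSubject used by getLatest(), omitted)

    protected BucketedCounterStream(final HystrixEventStream<Event> inputEventStream, final int numBuckets, final int bucketSizeInMs,
                                    final Func2<Bucket, Event, Bucket> appendRawEventToBucket) {
        this.numBuckets = numBuckets;
        // aggregation: fold every event of one window into an empty bucket
        this.reduceBucketToSummary = new Func1<Observable<Event>, Observable<Bucket>>() {
            @Override
            public Observable<Bucket> call(Observable<Event> eventBucket) {
                return eventBucket.reduce(getEmptyBucketSummary(), appendRawEventToBucket);
            }
        };

        final List<Bucket> emptyEventCountsToStart = new ArrayList<Bucket>();
        for (int i = 0; i < numBuckets; i++) {
            emptyEventCountsToStart.add(getEmptyBucketSummary());
        }

        // turn the events emitted by inputEventStream into buckets
        this.bucketedStream = Observable.defer(new Func0<Observable<Bucket>>() {
            @Override
            public Observable<Bucket> call() {
                return inputEventStream
                        .observe()
                        // window operator: collect the events of one bucket period
                        .window(bucketSizeInMs, TimeUnit.MILLISECONDS) //bucket it by the counter window so we can emit to the next operator in time chunks, not on every OnNext
                        // reduce those events into one bucket
                        .flatMap(reduceBucketToSummary)//for a given bucket, turn it into a long array containing counts of event types
                        // initialization: prepend numBuckets empty buckets
                        .startWith(emptyEventCountsToStart);           //start it with empty arrays to make consumer logic as generic as possible (windows are always full)
            }
        });
    }

    abstract Bucket getEmptyBucketSummary();

    abstract Output getEmptyOutputValue();

    public abstract Observable<Output> observe();

    // ... getLatest(), startCachingStreamValuesIfUnstarted() and other members omitted
}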

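BucketedCounterStream thus turns the event stream into a stream of buckets, producing one bucket every bucketSizeInMs. BucketedRollingCounterStream then aggregates the buckets produced by each unit window according to the size of the rolling window.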
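To make the output of this pipeline concrete, here is a plain-Java model of it without RxJava. It is a sketch under simplifying assumptions: events are hypothetical (timestamp, type index) pairs, each event adds exactly 1 to its type's slot, and a finished run is processed as one batch rather than as a live stream.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (no RxJava) of BucketedCounterStream's output:
// startWith(numBuckets empty buckets), then one bucket per elapsed time window.
final class BucketingModel {

    // HYPOTHETICAL event: time since the stream started (ms) and an event-type index.
    static final class TimedEvent {
        final long timeMs;
        final int type;

        TimedEvent(long timeMs, int type) {
            this.timeMs = timeMs;
            this.type = type;
        }
    }

    static List<long[]> bucketize(List<TimedEvent> events, long durationMs,
                                  int bucketSizeInMs, int numBuckets, int numTypes) {
        List<long[]> buckets = new ArrayList<>();
        // startWith(emptyEventCountsToStart): the first rolling window is already "full"
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new long[numTypes]);
        }
        // window(bucketSizeInMs): one window per elapsed period, even if it saw no events
        long windows = durationMs / bucketSizeInMs;
        for (long w = 0; w < windows; w++) {
            long start = w * bucketSizeInMs;
            long end = start + bucketSizeInMs;
            long[] bucket = new long[numTypes];       // plays the role of getEmptyBucketSummary()
            for (TimedEvent e : events) {
                if (e.timeMs >= start && e.timeMs < end) {
                    bucket[e.type]++;                 // plays the role of appendRawEventToBucket
                }
            }
            buckets.add(bucket);
        }
        return buckets;
    }
}
```

With three events at 100 ms (type 0), 200 ms (type 1) and 1500 ms (type 0), a 2000 ms run, 1000 ms buckets and numBuckets = 3, the result is three empty buckets followed by {1, 1} and {1, 0}.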

BucketedRollingCounterStream

public abstract class BucketedRollingCounterStream<Event extends HystrixEvent, Bucket, Output> extends BucketedCounterStream<Event, Bucket, Output> {
    private Observable<Output> sourceStream;
    private final AtomicBoolean isSourceCurrentlySubscribed = new AtomicBoolean(false);
    protected BucketedRollingCounterStream(HystrixEventStream<Event> stream, final int numBuckets, int bucketSizeInMs,
                                           final Func2<Bucket, Event, Bucket> appendRawEventToBucket,
                                           final Func2<Output, Bucket, Output> reduceBucket) {
        super(stream, numBuckets, bucketSizeInMs, appendRawEventToBucket);
        Func1<Observable<Bucket>, Observable<Output>> reduceWindowToSummary = window -> window.scan(getEmptyOutputValue(), reduceBucket).skip(numBuckets);
        this.sourceStream = bucketedStream      // stream of buckets, one per unit window
                .window(numBuckets, 1)          // overlapping windows of numBuckets buckets, advancing one bucket at a time
                .flatMap(reduceWindowToSummary) // reduce each window of buckets into one Output
                .doOnSubscribe(() -> isSourceCurrentlySubscribed.set(true))
                .doOnUnsubscribe(() -> isSourceCurrentlySubscribed.set(false))
                .share()                        // all subscribers see the same data
                .onBackpressureDrop();          // if a consumer is too slow, drop data instead of buffering it
    }
    @Override
    public Observable<Output> observe() {
        return sourceStream;
    }
    /* package-private */ boolean isSourceCurrentlySubscribed() {
        return isSourceCurrentlySubscribed.get();
    }
}

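The last two constructor parameters are two functions: one that aggregates the event stream into buckets (appendRawEventToBucket), and one that aggregates buckets into the output object (reduceBucket).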

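BucketedRollingCounterStream implements the observe method and returns sourceStream, a publisher of type Observable, for subscribers to consume. This sourceStream is the final form of the rolling window. How is it derived? The core is again the window and flatMap operators. Unlike the time-based window(bucketSizeInMs, TimeUnit.MILLISECONDS) used in BucketedCounterStream, this window overload is count-based: it groups a fixed number of items (numBuckets buckets) into one window, and its second argument, skip = 1, starts a new window at every bucket, so the window slides forward one bucket at a time. This is what actually implements the rolling window. Inside each window, scan(getEmptyOutputValue(), reduceBucket) emits the empty value followed by one running total per bucket, and skip(numBuckets) drops everything except the last running total, so each full window yields exactly one Output covering all numBuckets buckets (this is why BucketedCounterStream starts the stream with numBuckets empty buckets).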
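The window(numBuckets, 1) / scan / skip combination can be modeled on a finite list in plain Java. This is an illustrative sketch, not the RxJava implementation: RollingModel is a made-up name, each outer loop iteration plays the role of one window, and a window that is not full produces no output, just as skip(numBuckets) swallows it in the real stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java model of:
// bucketedStream.window(numBuckets, 1).flatMap(w -> w.scan(empty, reduceBucket).skip(numBuckets))
final class RollingModel {

    static <B, O> List<O> roll(List<B> buckets, int numBuckets, O empty,
                               BiFunction<O, B, O> reduceBucket) {
        List<O> outputs = new ArrayList<>();
        // window(numBuckets, 1): a window opens at every bucket and holds up to numBuckets buckets
        for (int start = 0; start < buckets.size(); start++) {
            int end = Math.min(start + numBuckets, buckets.size());
            // scan(empty, reduceBucket): emits the seed first, then every running total
            List<O> scanned = new ArrayList<>();
            O acc = empty;
            scanned.add(acc);
            for (int i = start; i < end; i++) {
                acc = reduceBucket.apply(acc, buckets.get(i));
                scanned.add(acc);
            }
            // skip(numBuckets): only a full window (numBuckets + 1 values) leaves one output
            for (int i = numBuckets; i < scanned.size(); i++) {
                outputs.add(scanned.get(i));
            }
        }
        return outputs;
    }
}
```

With buckets 0, 0, 0, 1, 2, 3, numBuckets = 3 and addition as reduceBucket, the outputs are 0, 1, 3 and 6; the last two windows hold fewer than three buckets and emit nothing. The three leading zeros play the role of the empty buckets from startWith.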

HealthCountsStream

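Now look at a concrete rolling-window implementation, HealthCountsStream. It provides real-time health-check data, HystrixCommandMetrics.HealthCounts, which counts successful and failed calls.

Recall the three type parameters of BucketedCounterStream: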

public abstract class BucketedCounterStream<Event extends HystrixEvent, Bucket, Output> {
    // ...
}

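Here the three type parameters are:

Event: HystrixCommandCompletion, which represents a completed command execution. The execution result can be read from it, and all events (HystrixEventType) produced by the execution can be extracted from it.

Bucket: long[], holding one count per event type. The index is the ordinal of the event-type enum constant, and the value is the number of events of that type.

Output: HystrixCommandMetrics.HealthCounts, which holds the total number of executions, the number of failures and the failure percentage, for use by the circuit breaker.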

private static final ConcurrentMap<String, HealthCountsStream> streams = new ConcurrentHashMap<String, HealthCountsStream>();

private static final int NUM_EVENT_TYPES = HystrixEventType.values().length;

private HealthCountsStream(final HystrixCommandKey commandKey, final int numBuckets, final int bucketSizeInMs,
                           Func2<long[], HystrixCommandCompletion, long[]> reduceCommandCompletion) {
    super(HystrixCommandCompletionStream.getInstance(commandKey), numBuckets, bucketSizeInMs, reduceCommandCompletion, healthCheckAccumulator);
}

Next, the two accumulators.

Aggregating events into a bucket

public static final Func2<long[], HystrixCommandCompletion, long[]> appendEventToBucket = new Func2<long[], HystrixCommandCompletion, long[]>() {
    @Override
    public long[] call(long[] initialCountArray, HystrixCommandCompletion execution) {
        ExecutionResult.EventCounts eventCounts = execution.getEventCounts();
        for (HystrixEventType eventType: ALL_EVENT_TYPES) {
            switch (eventType) {
                case EXCEPTION_THROWN: break; //this is just a sum of other anyway - don't do the work here
                default:
                    initialCountArray[eventType.ordinal()] += eventCounts.getCount(eventType);
                    break;
            }
        }
        return initialCountArray;
    }
};
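The ordinal-indexed long[] layout can be shown with a cut-down enum. EventType and AppendModel are hypothetical (the real HystrixEventType has many more constants), and a Map<EventType, Integer> stands in for ExecutionResult.EventCounts.

```java
import java.util.Map;

// HYPOTHETICAL cut-down event enum; the real HystrixEventType has many more constants.
enum EventType { SUCCESS, FAILURE, TIMEOUT, EXCEPTION_THROWN }

final class AppendModel {

    // Mirrors appendEventToBucket: each type's count is added at index type.ordinal();
    // EXCEPTION_THROWN is skipped because it is a sum of other event types.
    // The Map stands in for ExecutionResult.EventCounts; missing types count as 0.
    static long[] appendEventToBucket(long[] bucket, Map<EventType, Integer> eventCounts) {
        for (EventType type : EventType.values()) {
            if (type == EventType.EXCEPTION_THROWN) {
                continue;
            }
            bucket[type.ordinal()] += eventCounts.getOrDefault(type, 0);
        }
        return bucket;
    }
}
```

Because the same array is mutated and returned, reduce can keep feeding it to the next completion within the same bucket period.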

Accumulating buckets into the rolling-window count

private static final Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts> healthCheckAccumulator = new Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts>() {
    @Override
    public HystrixCommandMetrics.HealthCounts call(HystrixCommandMetrics.HealthCounts healthCounts, long[] bucketEventCounts) {
        return healthCounts.plus(bucketEventCounts);
    }
};

healthCounts.plus

// the accumulator above, written more compactly as a method reference:
private static final Func2<HystrixCommandMetrics.HealthCounts, long[], HystrixCommandMetrics.HealthCounts> healthCheckAccumulator = HystrixCommandMetrics.HealthCounts::plus;
// the actual implementation, in class HystrixCommandMetrics.HealthCounts
public HealthCounts plus(long[] eventTypeCounts) {
    long updatedTotalCount = totalCount; // total request count so far
    long updatedErrorCount = errorCount; // error count so far
    long successCount = eventTypeCounts[HystrixEventType.SUCCESS.ordinal()];
    long failureCount = eventTypeCounts[HystrixEventType.FAILURE.ordinal()];
    long timeoutCount = eventTypeCounts[HystrixEventType.TIMEOUT.ordinal()];
    long threadPoolRejectedCount = eventTypeCounts[HystrixEventType.THREAD_POOL_REJECTED.ordinal()];
    long semaphoreRejectedCount = eventTypeCounts[HystrixEventType.SEMAPHORE_REJECTED.ordinal()];
    // add all counted events of this bucket to the total
    updatedTotalCount += (successCount + failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    // add the failure events (failures, timeouts, thread-pool rejections, semaphore rejections) to the error count
    updatedErrorCount += (failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    return new HealthCounts(updatedTotalCount, updatedErrorCount);
}
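A self-contained sketch of the same arithmetic, with the five counts passed in directly instead of through a long[] (HealthCountsModel is a made-up name). The errorPercentage formula is an assumption about how HealthCounts derives the percentage: truncated to an int, and 0 when there are no requests.

```java
// Plain-Java sketch of HealthCounts: immutable totals plus a derived error percentage.
final class HealthCountsModel {
    final long totalCount;
    final long errorCount;
    final int errorPercentage;

    HealthCountsModel(long totalCount, long errorCount) {
        this.totalCount = totalCount;
        this.errorCount = errorCount;
        // ASSUMPTION: truncated integer percentage, 0 when there were no requests
        this.errorPercentage = totalCount > 0 ? (int) ((double) errorCount / totalCount * 100) : 0;
    }

    static HealthCountsModel empty() {
        return new HealthCountsModel(0, 0);
    }

    // Same arithmetic as HealthCounts.plus: every counted event adds to the total,
    // and everything except successes also adds to the error count.
    HealthCountsModel plus(long success, long failure, long timeout,
                           long threadPoolRejected, long semaphoreRejected) {
        long errors = failure + timeout + threadPoolRejected + semaphoreRejected;
        return new HealthCountsModel(totalCount + success + errors, errorCount + errors);
    }
}
```

Adding a bucket with 6 successes, 2 failures, 1 timeout and 1 thread-pool rejection to the empty value gives 10 requests, 4 errors and 40%.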

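Writing events

From the HealthCountsStream constructor we can see that the event stream is a HystrixCommandCompletionStream. Hystrix wraps each piece of work in a command (the Command pattern), and each command fires events as it executes. The command-completion event, HystrixCommandCompletion, is the most central one: it can represent a command that succeeded, timed out, threw an exception, and so on. The circuit breaker's counts also depend on HystrixCommandCompletion events.

When a HealthCountsStream is constructed, it obtains the HystrixCommandCompletionStream for the command key, creating one if it does not exist yet. This is the stream that carries the events.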

public class HystrixCommandCompletionStream implements HystrixEventStream<HystrixCommandCompletion> {
    private final HystrixCommandKey commandKey;

    private final Subject<HystrixCommandCompletion, HystrixCommandCompletion> writeOnlySubject;
    private final Observable<HystrixCommandCompletion> readOnlyStream;

    // maps each command key name to its stream
    private static final ConcurrentMap<String, HystrixCommandCompletionStream> streams = new ConcurrentHashMap<String, HystrixCommandCompletionStream>();

    public static HystrixCommandCompletionStream getInstance(HystrixCommandKey commandKey) {
        // returns the HystrixCommandCompletionStream for this key, creating it if absent (body omitted)
    }

    HystrixCommandCompletionStream(final HystrixCommandKey commandKey) {
        this.commandKey = commandKey;

        this.writeOnlySubject = new SerializedSubject<HystrixCommandCompletion, HystrixCommandCompletion>(PublishSubject.<HystrixCommandCompletion>create());
        this.readOnlyStream = writeOnlySubject.share();
    }

    public void write(HystrixCommandCompletion event) {
        writeOnlySubject.onNext(event);
    }


    @Override
    public Observable<HystrixCommandCompletion> observe() {
        return readOnlyStream;
    }
}

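The events themselves are actually written by HystrixThreadEventStream.

In AbstractCommand#toObservable, AbstractCommand#handleCommandEnd is called when execution finishes or the subscription is cancelled, and it eventually calls com.netflix.hystrix.metric.HystrixThreadEventStream#executionDone.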

public void executionDone(ExecutionResult executionResult, HystrixCommandKey commandKey, HystrixThreadPoolKey threadPoolKey) {
    HystrixCommandCompletion event = HystrixCommandCompletion.from(executionResult, commandKey, threadPoolKey);
    writeOnlyCommandCompletionSubject.onNext(event);
}

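Now look at how writeOnlyCommandCompletionSubject is set up. Each event is eventually written into the HystrixCommandCompletionStream for its command key, which is exactly the inputEventStream of BucketedCounterStream, i.e. the event source.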

writeOnlyCommandCompletionSubject = PublishSubject.create();
writeOnlyCommandCompletionSubject
        .onBackpressureBuffer()
        .doOnNext(writeCommandCompletionsToShardedStreams)
        .unsafeSubscribe(Subscribers.empty());
        
private static final Action1<HystrixCommandCompletion> writeCommandCompletionsToShardedStreams = new Action1<HystrixCommandCompletion>() {
    @Override
    public void call(HystrixCommandCompletion commandCompletion) {
        HystrixCommandCompletionStream commandStream = HystrixCommandCompletionStream.getInstance(commandCompletion.getCommandKey());
        commandStream.write(commandCompletion);
    }
}; 
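A plain-Java model of this write path, without RxJava. Completion, KeyStream and CompletionRouting are hypothetical stand-ins for HystrixCommandCompletion, HystrixCommandCompletionStream and writeCommandCompletionsToShardedStreams. computeIfAbsent is used for get-or-create; the body of Hystrix's own getInstance is omitted above, so this says nothing about how Hystrix itself locks. Unlike SerializedSubject, this KeyStream does not serialize concurrent write() calls.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Plain-Java model: executionDone -> route by command key -> per-key stream -> its subscribers.
final class CompletionRouting {

    // HYPOTHETICAL stand-in for HystrixCommandCompletion: only a command key and an outcome.
    static final class Completion {
        final String commandKey;
        final String outcome;

        Completion(String commandKey, String outcome) {
            this.commandKey = commandKey;
            this.outcome = outcome;
        }
    }

    // HYPOTHETICAL stand-in for HystrixCommandCompletionStream: write() fans out to all subscribers.
    static final class KeyStream {
        private final List<Consumer<Completion>> subscribers = new CopyOnWriteArrayList<>();

        void observe(Consumer<Completion> subscriber) {
            subscribers.add(subscriber);
        }

        void write(Completion event) {
            subscribers.forEach(s -> s.accept(event));
        }
    }

    // one stream per command key name, like the static "streams" map above
    private static final ConcurrentMap<String, KeyStream> STREAMS = new ConcurrentHashMap<>();

    static KeyStream getInstance(String commandKey) {
        return STREAMS.computeIfAbsent(commandKey, k -> new KeyStream());
    }

    // plays the role of executionDone + writeCommandCompletionsToShardedStreams
    static void executionDone(Completion event) {
        getInstance(event.commandKey).write(event);
    }
}
```

A subscriber on the "GetUser" stream sees only completions whose command key is "GetUser"; completions for other keys go to their own streams.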

Reading the statistics

Checking whether the circuit breaker is open: circuitBreaker.allowRequest

public boolean allowRequest() {
    if (properties.circuitBreakerForceOpen().get()) {
        return false;
    }
    if (properties.circuitBreakerForceClosed().get()) {
        // still run isOpen() so its calculations happen as usual, but ignore the result
        isOpen();
        return true;
    }
    return !isOpen() || allowSingleTest();
}

public boolean isOpen() {
    if (circuitOpen.get()) {
        return true;
    }

    // we're closed, so let's see if errors have made us so we should trip the circuit open
    HealthCounts health = metrics.getHealthCounts();
    if (health.getTotalRequests() < properties.circuitBreakerRequestVolumeThreshold().get()) {
        return false;
    }

    if (health.getErrorPercentage() < properties.circuitBreakerErrorThresholdPercentage().get()) {
        return false;
    } else {
        if (circuitOpen.compareAndSet(false, true)) {
            circuitOpenedOrLastTestedTime.set(System.currentTimeMillis());
            return true;
        } else {
            // How could previousValue be true? If another thread was going through this code at the same time a race-condition could have
            // caused another thread to set it to true already even though we were in the process of doing the same
            // In this case, we know the circuit is open, so let the other thread set the currentTime and report back that the circuit is open
            return true;
        }
    }
}

public boolean allowSingleTest() {
    long timeCircuitOpenedOrWasLastTested = circuitOpenedOrLastTestedTime.get();
    if (circuitOpen.get() && System.currentTimeMillis() > timeCircuitOpenedOrWasLastTested + properties.circuitBreakerSleepWindowInMilliseconds().get()) {
        if (circuitOpenedOrLastTestedTime.compareAndSet(timeCircuitOpenedOrWasLastTested, System.currentTimeMillis())) {
            return true;
        }
    }
    return false;
}
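The decision logic above can be exercised deterministically with a plain-Java model. It is a sketch under assumptions: BreakerModel is a made-up name, the health snapshot is set by hand instead of coming from HealthCountsStream, the clock is injectable, the force-open/force-closed flags are left out, and the step that closes the circuit again after a successful probe is not part of the code above, so the model never closes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Plain-Java model of allowRequest / isOpen / allowSingleTest (force flags omitted).
final class BreakerModel {
    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMs;
    private final LongSupplier clock;                        // injectable clock, in ms
    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    private final AtomicLong openedOrLastTestedTime = new AtomicLong();

    // ASSUMPTION: latest health snapshot set by hand instead of metrics.getHealthCounts()
    private volatile long totalRequests;
    private volatile int errorPercentage;

    BreakerModel(int requestVolumeThreshold, int errorThresholdPercentage,
                 long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    void updateHealth(long totalRequests, int errorPercentage) {
        this.totalRequests = totalRequests;
        this.errorPercentage = errorPercentage;
    }

    boolean isOpen() {
        if (circuitOpen.get()) {
            return true;
        }
        // too little traffic, or error rate below the threshold: stay closed
        if (totalRequests < requestVolumeThreshold || errorPercentage < errorThresholdPercentage) {
            return false;
        }
        // trip the circuit; only the thread that wins the CAS records the time
        if (circuitOpen.compareAndSet(false, true)) {
            openedOrLastTestedTime.set(clock.getAsLong());
        }
        return true;
    }

    boolean allowSingleTest() {
        long last = openedOrLastTestedTime.get();
        long now = clock.getAsLong();
        // after the sleep window, let exactly one caller through by moving the timestamp forward
        return circuitOpen.get() && now > last + sleepWindowMs
                && openedOrLastTestedTime.compareAndSet(last, now);
    }

    boolean allowRequest() {
        return !isOpen() || allowSingleTest();
    }
}
```

For example, with a volume threshold of 20, a 50% error threshold and a 5000 ms sleep window: 10 requests at 100% errors are still allowed (too few requests), 30 requests at 60% errors trip the circuit, and once more than 5000 ms have passed exactly one probe request is let through before requests are rejected again.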

As the code shows, the key to reading the statistics is HealthCounts health = metrics.getHealthCounts();

Its call path is:

metrics.getHealthCounts
=> healthCountsStream.getLatest
=> BucketedCounterStream#getLatest

so the call ends up in BucketedCounterStream#getLatest:

public Output getLatest() {
    startCachingStreamValuesIfUnstarted();
    if (counterSubject.hasValue()) {
        return counterSubject.getValue();
    } else {
        return getEmptyOutputValue();
    }
}
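// if not started yet, subscribe; the latest data is then served from counterSubject
public void startCachingStreamValuesIfUnstarted() {
    if (subscription.get() == null) {
        // the stream is not started yet: subscribe counterSubject to the rolling stream
        Subscription candidateSubscription = observe().subscribe(counterSubject);
        if (subscription.compareAndSet(null, candidateSubscription)) {
            // won the race to set the subscription; nothing else to do
        } else {
            // lost the race to another thread: cancel this duplicate subscription
            candidateSubscription.unsubscribe();
        }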
    }
}

The data source here is obtained through observe(), which BucketedRollingCounterStream overrides to return the aggregated sourceStream. The code is:

BucketedRollingCounterStream#observe

@Override
public Observable<Output> observe() {
    return sourceStream;
}

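So BucketedCounterStream#counterSubject subscribes to BucketedRollingCounterStream#sourceStream, i.e. to the final computed statistics, and getLatest returns the most recent value it has received.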
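The lazy subscribe-and-cache pattern of getLatest can be sketched without RxJava. LatestValueCache and its Source interface are hypothetical: an AtomicReference plays the role of the BehaviorSubject counterSubject, and the Runnable returned by subscribe stands in for Subscription.unsubscribe().

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Plain-Java model of getLatest(): subscribe lazily on first read, cache the newest value,
// and fall back to an empty value until the stream has emitted something.
final class LatestValueCache<T> {

    // HYPOTHETICAL source: subscribe() starts delivery and returns a cancel action.
    interface Source<V> {
        Runnable subscribe(Consumer<V> consumer);
    }

    private final Source<T> source;
    private final Supplier<T> emptyValue;
    private final AtomicReference<T> latest = new AtomicReference<>();       // role of counterSubject
    private final AtomicReference<Runnable> subscription = new AtomicReference<>();

    LatestValueCache(Source<T> source, Supplier<T> emptyValue) {
        this.source = source;
        this.emptyValue = emptyValue;
    }

    void startCachingIfUnstarted() {
        if (subscription.get() == null) {
            Runnable candidate = source.subscribe(latest::set);
            if (!subscription.compareAndSet(null, candidate)) {
                candidate.run(); // lost the race: cancel the duplicate subscription
            }
        }
    }

    T getLatest() {
        startCachingIfUnstarted();
        T value = latest.get();
        return value != null ? value : emptyValue.get();
    }
}
```

The first read subscribes and returns the empty value; later reads return whatever the stream emitted most recently, and repeated reads never add a second subscription.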

Default configuration

The defaults are defined in HystrixCommandProperties.

metrics.rollingStats.timeInMilliseconds

The duration of the statistical rolling window, default 10000 (10 s). It is also the basic time span over which the circuit breaker computes its statistics.

metrics.rollingStats.numBuckets

The number of buckets the rolling statistical window is divided into, default 10. Together, timeInMilliseconds and numBuckets determine the duration of each bucket.

circuitBreaker.errorThresholdPercentage

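The error-rate threshold that trips the breaker. With the default of 50%, the circuit opens when the failure rate within one rolling window reaches 50%, provided the window also holds at least circuitBreaker.requestVolumeThreshold requests, which isOpen checks first.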

circuitBreaker.sleepWindowInMilliseconds

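This setting governs automatic recovery. To check whether the backend service has recovered, the breaker can let a single request through as a probe. sleepWindow is how long the breaker must wait after tripping before it lets a probe request through. The default is 5 s (5000 ms).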



Reposted from blog.csdn.net/adolph09/article/details/127827437