Hystrix circuit breakers and related concepts

1 Problems faced by distributed systems

  1. Service avalanche
    When microservices call one another, suppose microservice A calls microservices B and C, which in turn call other microservices. This is called "fan-out". If a microservice somewhere on the fan-out path responds too slowly or becomes unavailable, calls to microservice A tie up more and more system resources until the system crashes. This is the so-called "avalanche effect". For high-traffic applications, a single failing back-end dependency can saturate all resources on all servers within seconds. Worse than outright failure, such applications can also increase latency between services and put strain on queues, threads, and other system resources, causing further cascading failures across the system. All of this shows the need to isolate and manage failures and latency so that the failure of a single dependency cannot take down the entire application or system. In practice, when an instance of a module fails but the module keeps receiving traffic and keeps calling other modules, the result is a cascading failure, or avalanche.
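The "saturated within a few seconds" claim can be checked with simple arithmetic. The sketch below is only an illustration: the function name and the numbers (a pool of 200 threads, 50 requests per second) are assumptions, not values from Hystrix or from any real system.

```python
def seconds_until_saturated(pool_size, requests_per_second, latency_seconds):
    """Return how long a thread pool survives, or None if it never fills.

    Each request holds one thread for `latency_seconds`, so in steady state
    about requests_per_second * latency_seconds threads are busy.
    """
    steady_state_busy = requests_per_second * latency_seconds
    if steady_state_busy <= pool_size:
        return None  # the pool can absorb this load indefinitely
    # Threads are claimed at the arrival rate, and none is released before
    # the pool is full because the latency is longer than the fill time.
    return pool_size / requests_per_second


# Healthy dependency: 0.1 s per call keeps about 5 threads busy.
healthy = seconds_until_saturated(200, 50, 0.1)
# Degraded dependency: 30 s per call would need 1500 threads.
degraded = seconds_until_saturated(200, 50, 30)
```

With these assumed numbers, `healthy` is `None` and `degraded` is `4.0`: once the dependency slows down, every thread on the server is occupied after four seconds.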

2 What is Hystrix

Hystrix is an open source library for handling latency and fault tolerance in distributed systems. In a distributed system, calls to many dependencies will inevitably fail through timeouts, exceptions, and so on.
Hystrix ensures that when one dependency has a problem, the overall service does not fail, which avoids cascading failures and improves the resilience of the distributed system.
The "circuit breaker" itself is a kind of switching device. When a service unit fails, the circuit breaker's fault monitoring (similar to a blown fuse) returns an
expected, processable alternative response (fallback) to the caller,
instead of making the caller wait a long time or throwing an exception the caller cannot handle. This ensures that the caller's threads are not
occupied unnecessarily for long periods, which prevents the fault from spreading through the distributed system and turning into an avalanche.
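The idea of returning a fallback instead of waiting can be sketched in a few lines. This is not Hystrix code: `call_with_fallback`, the pool size, and the sample services are hypothetical names chosen for illustration, and the sketch uses only Python's standard library.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical pool dedicated to one dependency, so that its slowness
# cannot consume the caller's own threads.
_dependency_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(primary, fallback, timeout_seconds):
    """Run `primary` in the pool; on timeout or error return `fallback()`."""
    future = _dependency_pool.submit(primary)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        future.cancel()  # only has an effect if the task has not started yet
        return fallback()
    except Exception:
        return fallback()


def slow_service():
    time.sleep(0.5)  # simulates a dependency that responds too slowly
    return "real response"


def broken_service():
    raise RuntimeError("dependency down")


def friendly_fallback():
    return "The server is busy, please try again later"
```

Calling `call_with_fallback(slow_service, friendly_fallback, 0.05)` returns the fallback message after roughly 50 ms, so the caller is released even though the slow call is still running in the dependency's pool.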

What Hystrix can do
https://github.com/Netflix/Hystrix/wiki/How-To-Use

1. Service degradation
2. Service circuit breaking
3. Near-real-time monitoring

3 The difference between service degradation, circuit breaking, and rate limiting

Service degradation
"The server is busy, please try again later": instead of making the client wait, return a friendly message immediately through a fallback.

What conditions will trigger a downgrade:

  • Timeout
  • Exception at run time
  • Downtime
  • Circuit breaker open
  • Thread pool/semaphore is full
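The last trigger, a full thread pool or semaphore, can be sketched as follows. `SemaphoreGuard` is a hypothetical class written for this illustration, not part of Hystrix; it only shows the idea of rejecting a call immediately, rather than queueing it, once a concurrency limit is reached.

```python
import threading


class SemaphoreGuard:
    """Reject calls instead of queueing once `max_concurrent` are in flight."""

    def __init__(self, max_concurrent):
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def run(self, primary, fallback):
        # Non-blocking acquire: a full semaphore degrades immediately.
        if not self._permits.acquire(blocking=False):
            return fallback()
        try:
            return primary()
        except Exception:
            return fallback()  # an exception is also a degradation trigger
        finally:
            self._permits.release()
```

With `SemaphoreGuard(1)`, a second call that arrives while the first is still running receives the fallback right away, so callers never pile up behind a slow dependency.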

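Service circuit breaking
The circuit breaker has three states: closed, open, and half-open.

It is analogous to a fuse: once the service reaches its maximum load, further access is refused outright, like tripping a switch to cut the power,
and the service degradation method is then called to return a friendly message.

Service degradation -> circuit breaking -> recovery of the call chain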

Under what circumstances does the circuit breaker start to work

Three important parameters govern the circuit breaker: the snapshot time window, the request volume threshold, and the error percentage threshold.
1: Snapshot time window: To decide whether to open, the circuit breaker needs request and error statistics. The period over which they are collected is the snapshot time window, which defaults to the most recent 10 seconds.
2: Request volume threshold: Within the snapshot time window, the total number of requests must reach this threshold before the circuit breaker is eligible to open. The default is 20, which means that if the Hystrix command is called fewer than 20 times within 10 seconds,
the circuit breaker will not open even if every request times out or fails for some other reason.
3: Error percentage threshold: Once the total number of requests in the snapshot time window has reached the request volume threshold, the error percentage is checked. For example, if 30 calls occur and 15 of them end in timeout exceptions, the error percentage is
50%, which reaches the default 50% threshold, so the circuit breaker opens.
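The two threshold checks can be written as a small decision function. This is a sketch of the rule as described above, not Hystrix source code; the function and parameter names are made up, and only the default values (20 requests, 50%) come from the text.

```python
def should_open(total_requests, failed_requests,
                request_volume_threshold=20, error_threshold_percent=50):
    """Decide whether one snapshot window's counts should open the breaker."""
    if total_requests == 0 or total_requests < request_volume_threshold:
        return False  # too little traffic to judge, even if every call failed
    error_percent = 100 * failed_requests / total_requests
    return error_percent >= error_threshold_percent
```

For instance, `should_open(19, 19)` is `False` because the volume threshold is not met, while `should_open(30, 15)` is `True` because 15 failures out of 30 calls is exactly 50%.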


Conditions for opening or closing the circuit breaker

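1. A request volume threshold must be met (by default, at least 20 requests within 10 seconds).
2. The failure rate must reach a threshold (by default, 50% of the requests within 10 seconds fail).
3. When both thresholds are reached, the circuit breaker opens.
4. While it is open, no requests are forwarded.
5. After a period of time (5 seconds by default), the circuit breaker is half-open and lets one request through. If it succeeds, the circuit breaker closes; if it fails, the circuit breaker stays open. Steps 4 and 5 then repeat.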

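After the circuit breaker opens

1: When further requests arrive, the main logic is not called; the fallback is called directly. In this way the circuit breaker detects errors automatically and switches the fallback logic in as the main logic, which reduces response latency.
2: How does the original main logic recover?
Hystrix implements automatic recovery for this.
When the circuit breaker opens and cuts off the main logic,
Hystrix starts a sleep window. During this window,
the fallback logic temporarily acts as the main logic.

When the sleep window expires, the circuit breaker enters the half-open state
and releases a single request to the original main logic. If that request returns normally,
the circuit breaker closes
and the main logic is restored. If the request still fails,
the circuit breaker returns to the open state and the sleep window starts again.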
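The closed, open, and half-open cycle described above can be modelled as a small state machine. The class below is a deliberately simplified sketch, not Hystrix's implementation: it is single-threaded, its counters never expire (there is no rolling snapshot window), and every name in it is hypothetical. The `clock` parameter is an assumption added so the behaviour can be exercised without real waiting.

```python
import time


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"

    def __init__(self, volume_threshold=20, error_percent=50,
                 sleep_window=5.0, clock=time.monotonic):
        self.volume_threshold = volume_threshold
        self.error_percent = error_percent
        self.sleep_window = sleep_window
        self.clock = clock
        self.state = self.CLOSED
        self.total = self.failed = 0
        self.opened_at = None

    def call(self, primary, fallback):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.sleep_window:
                return fallback()        # short-circuit: main logic is skipped
            self.state = self.HALF_OPEN  # sleep window over: allow one trial
        try:
            result = primary()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_success(self):
        if self.state == self.HALF_OPEN:
            self._reset()                # trial succeeded: close again
        else:
            self.total += 1

    def _on_failure(self):
        if self.state == self.HALF_OPEN:
            self._trip()                 # trial failed: reopen, restart window
            return
        self.total += 1
        self.failed += 1
        if (self.total >= self.volume_threshold
                and 100 * self.failed / self.total >= self.error_percent):
            self._trip()

    def _trip(self):
        self.state = self.OPEN
        self.opened_at = self.clock()

    def _reset(self):
        self.state = self.CLOSED
        self.total = self.failed = 0
```

A caller wraps every request as `breaker.call(real_call, fallback)`. While the breaker is open the real call is never attempted, which is what keeps a failing dependency from holding the caller's threads.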


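Service rate limiting

For high-concurrency operations such as flash sales, requests must not all crowd in at once. They queue up and are processed in order, N per second.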
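The "N per second" rule can be sketched with a fixed one-second window. This is a generic illustration, not a Hystrix feature; `PerSecondLimiter` and its `clock` parameter are hypothetical. The sketch only decides whether a request may proceed now, and leaves queueing or degrading the rejected requests to the caller.

```python
import time


class PerSecondLimiter:
    """Admit at most `limit` requests in each one-second window."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.window_start = clock()
        self.admitted = 0

    def try_acquire(self):
        now = self.clock()
        if now - self.window_start >= 1.0:
            self.window_start = now  # a new one-second window begins
            self.admitted = 0
        if self.admitted < self.limit:
            self.admitted += 1
            return True
        return False  # over the limit: the caller should queue or degrade
```

A fixed window is the simplest choice but lets a burst straddle two windows; a token bucket or sliding window smooths that out at the cost of slightly more bookkeeping.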

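Hystrix Dashboard

In addition to isolating calls to dependent services, Hystrix provides near-real-time call monitoring (Hystrix Dashboard). Hystrix continuously records execution information for every request made through it
and presents it to the user as statistical reports and charts, including how many requests are executed per second, how many succeed, and how many fail. Netflix implemented monitoring of these metrics through the
hystrix-metrics-event-stream project.
Spring Cloud also provides an integration of Hystrix Dashboard, which turns the monitoring data into a
visual interface.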
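The kind of counting such a dashboard relies on can be sketched with per-second buckets over a sliding window. This is not how Hystrix or hystrix-metrics-event-stream is implemented; `RollingCounter`, its 10-second default, and its `clock` parameter are assumptions chosen to mirror the numbers used earlier in this article.

```python
import time
from collections import deque


class RollingCounter:
    """Keep per-second success/failure counts for the last `window` seconds."""

    def __init__(self, window=10, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.buckets = deque()  # entries are [second, successes, failures]

    def record(self, success):
        second = int(self.clock())
        self._evict(second)
        if not self.buckets or self.buckets[-1][0] != second:
            self.buckets.append([second, 0, 0])
        self.buckets[-1][1 if success else 2] += 1

    def snapshot(self):
        self._evict(int(self.clock()))
        successes = sum(b[1] for b in self.buckets)
        failures = sum(b[2] for b in self.buckets)
        return {"total": successes + failures,
                "success": successes, "failure": failures}

    def _evict(self, current_second):
        # Drop buckets that have slid out of the window.
        while self.buckets and self.buckets[0][0] <= current_second - self.window:
            self.buckets.popleft()
```

A monitoring page could poll `snapshot()` once per second and plot the three numbers; the same totals are what a circuit breaker would compare against its thresholds.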

Origin blog.csdn.net/qq_44783283/article/details/111185499