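Effects of the Various Timeout Settings in Spring Cloud

1. Preface (the Spring Cloud version used below is Dalston.RC1)

In the Spring Cloud framework, timeouts are usually configured at three layers:

1) Zuul gateway
For routes defined with an explicit url, use the following settings: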

# default: 10000
zuul.host.socket-timeout-millis=2000
# default: 2000
zuul.host.connect-timeout-millis=4000

For routes defined with a serviceId, the timeouts are set through ribbon.ReadTimeout and ribbon.SocketTimeout.

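2) Ribbon

ribbon:
  OkToRetryOnAllOperations: false # whether to retry all request types, not only GET; default false
  ReadTimeout: 5000   # read timeout for a request sent through Ribbon; default 5000
  ConnectTimeout: 3000 # connection timeout for a Ribbon request; default 2000
  MaxAutoRetries: 0     # number of retries on the current instance; default 0
  MaxAutoRetriesNextServer: 1 # number of retries on other instances; default 1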

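Notes:

When OkToRetryOnAllOperations is false, only GET requests are retried after a failed read. When it is true, every request is retried. For write operations such as PUT or POST, this can have harmful effects if the server endpoint is not idempotent (for example, the same write applied twice), so enable OkToRetryOnAllOperations with caution.
More precisely, with the default setting, GET requests are retried on both connection failures and read failures; non-GET requests are retried only on connection failures.
When routing through Zuul, the retry mechanism above can be turned off with the following two properties: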

zuul.retryable=false
zuul.routes.<route>.retryable=false

Here zuul.retryable disables retries globally, while zuul.routes.<route>.retryable=false disables them for one specific route.
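The retry rules in the notes above can be written down as a small decision function. This is a simplified model for illustration only, not Ribbon's actual implementation: the names RetryRuleSketch, FailureType and shouldRetry are hypothetical, and the function ignores the retry counters (MaxAutoRetries, MaxAutoRetriesNextServer) and zuul.retryable.

```java
/** Simplified model of the retry rules described above (not Ribbon's real code). */
class RetryRuleSketch {

    /** The two failure kinds discussed in the text. */
    enum FailureType { CONNECT, READ }

    /**
     * @param httpMethod               HTTP method of the failed request, e.g. "GET"
     * @param failure                  whether the connect phase or the read phase failed
     * @param okToRetryOnAllOperations value of ribbon.OkToRetryOnAllOperations
     * @return true if the request is eligible for a retry
     */
    static boolean shouldRetry(String httpMethod, FailureType failure,
                               boolean okToRetryOnAllOperations) {
        if (okToRetryOnAllOperations) {
            return true; // every method is retried on any failure
        }
        if ("GET".equalsIgnoreCase(httpMethod)) {
            return true; // GET is retried on connect and on read failures
        }
        // Non-GET requests are retried only when the connection itself failed.
        return failure == FailureType.CONNECT;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry("GET", FailureType.READ, false));     // true
        System.out.println(shouldRetry("POST", FailureType.READ, false));    // false
        System.out.println(shouldRetry("POST", FailureType.CONNECT, false)); // true
        System.out.println(shouldRetry("POST", FailureType.READ, true));     // true
    }
}
```

The second case is the one that protects non-idempotent writes: a POST whose response timed out may already have been processed by the server, so it is not sent again unless OkToRetryOnAllOperations is explicitly enabled.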

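3) Hystrix circuit breaker

hystrix:
  command:
    default:  # "default" applies globally; use a service id instead to target one application
      execution:
        timeout:
          # if enabled is false, request timeouts are left to Ribbon; if true, the Hystrix timeout also triggers the fallback
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 1000 # Hystrix command timeout; default 1000 ms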

feign.hystrix.enabled: true
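Conceptually, the Hystrix timeout puts a time limit around the remote call and returns a fallback value when the limit is exceeded or the call fails. The sketch below shows only that idea with plain JDK classes; it does not use Hystrix, and the names TimeoutFallbackSketch and callWithTimeout are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Plain-JDK illustration of "call with a time limit, otherwise fall back". */
class TimeoutFallbackSketch {

    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, Supplier<T> fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            // Wait for the result, but no longer than the configured timeout.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // interrupt the slow call
            return fallback.get(); // the time limit was exceeded
        } catch (ExecutionException e) {
            return fallback.get(); // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String fast = callWithTimeout(() -> { Thread.sleep(50); return "ok"; }, 1000, () -> "fallback");
        String slow = callWithTimeout(() -> { Thread.sleep(1000); return "ok"; }, 100, () -> "fallback");
        System.out.println(fast); // ok
        System.out.println(slow); // fallback
    }
}
```

The ExecutionException branch corresponds to the case where the wrapped call fails on its own, for instance with a read timeout from the HTTP client, before the outer time limit is reached.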

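2. Testing the effect of each setting

I started one Eureka service registry.

I started two instances of the eureka-client service, on ports 8087 and 8088, to be load balanced.

I started one eureka-feign service that calls a method on eureka-client, to simulate what happens when eureka-client takes too long to process a request.

The eureka-client method: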

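  /**
   * Endpoint for testing timeouts and retries: blocks for the requested time.
   *
   * @param mills how long to block, in milliseconds
   * @return a message naming the instance (by port) that served the request
   */
  @RequestMapping("/timeOut")
  public String timeOut(@RequestParam int mills) {
    // Log message: "[client service-{port}] [timeOut method] request received, blocking for {mills} ms"
    log.info("[client服务-{}] [timeOut方法]收到请求,阻塞{}ms", port, mills);
    try {
      Thread.sleep(mills);
    } catch (InterruptedException e) {
      // Restore the interrupt flag rather than only printing the stack trace
      Thread.currentThread().interrupt();
    }
    // Log message: "[client service-{port}] [timeOut] returning the response"
    log.info("[client服务-{}] [timeOut]返回请求",port);
    // Response body: "client service-{port} request ok!!!"
    return String.format("client服务-%s 请求ok!!!", port);
  }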

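eureka-feign calls the client method; the mills parameter controls how long the client thread sleeps:

    /**
     * Endpoint for testing timeouts and retries: forwards the call to eureka-client through Feign.
     *
     * @param mills how long eureka-client should block, in milliseconds
     * @return the response from eureka-client
     */
    @RequestMapping("/timeOut")
    public String timeOut(@RequestParam int mills){
        // Log message: "starting the call"
        log.info("开始调用");
        return feignService.timeOut( mills );
    }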

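The Feign service interface method:

    /**
     * Tests the Spring Cloud timeout mechanism.
     *
     * @param mills how long eureka-client should block, in milliseconds
     * @return the response from eureka-client
     */
    @RequestMapping(value = "/timeOut",method = RequestMethod.GET)
    String timeOut(@RequestParam(value = "mills") int mills);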

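Test 1

ribbon:
  OkToRetryOnAllOperations: false # whether to retry all request types, not only GET; default false
  ReadTimeout: 1000   # read timeout for a request sent through Ribbon; default 5000
  ConnectTimeout: 3000 # connection timeout for a Ribbon request; default 2000
  MaxAutoRetries: 0     # number of retries on the current instance; default 0
  MaxAutoRetriesNextServer: 1 # number of retries on other instances; default 1

hystrix:
  command:
    default:  # "default" applies globally; use a service id instead to target one application
      execution:
        timeout:
          # if enabled is false, request timeouts are left to Ribbon; if true, the Hystrix timeout also triggers the fallback
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 5000 # Hystrix command timeout; default 1000 ms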

Request with 900 ms:

client服务-8081 请求ok!!!

2019-04-25 23:07:35.823  INFO 10088 --- [nio-8081-exec-6] c.j.l.s.e.c.controller.HelloController   : [client服务-8081] [timeOut方法]收到请求,阻塞900ms
2019-04-25 23:07:36.725  INFO 10088 --- [nio-8081-exec-6] c.j.l.s.e.c.controller.HelloController   : [client服务-8081] [timeOut]返回请求

The request succeeded.

Request with 2000 ms:

The Hystrix fallback was triggered.

2019-04-25 23:10:54.005  INFO 15260 --- [nio-8083-exec-1] c.j.l.s.f.c.c.ConsumerController         : 开始调用
2019-04-25 23:10:54.210 DEBUG 15260 --- [hello-service-1] c.j.l.s.f.consumer.service.HelloService  : [HelloService#timeOut] ---> GET http://hello-service/hello-service/timeOut?mills=2000 HTTP/1.1
2019-04-25 23:10:54.210 DEBUG 15260 --- [hello-service-1] c.j.l.s.f.consumer.service.HelloService  : [HelloService#timeOut] ---> END HTTP (0-byte body)
2019-04-25 23:10:54.215  INFO 15260 --- [hello-service-1] s.c.a.AnnotationConfigApplicationContext : Refreshing SpringClientFactory-hello-service: startup date [Thu Apr 25 23:10:54 CST 2019]; parent: org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@455b6df1
2019-04-25 23:10:54.304  INFO 15260 --- [hello-service-1] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2019-04-25 23:10:54.628  INFO 15260 --- [hello-service-1] c.netflix.config.ChainedDynamicProperty  : Flipping property: hello-service.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-04-25 23:10:54.657  INFO 15260 --- [hello-service-1] c.n.u.concurrent.ShutdownEnabledTimer    : Shutdown hook installed for: NFLoadBalancer-PingTimer-hello-service
2019-04-25 23:10:54.690  INFO 15260 --- [hello-service-1] c.netflix.loadbalancer.BaseLoadBalancer  : Client: hello-service instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=hello-service,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2019-04-25 23:10:54.701  INFO 15260 --- [hello-service-1] c.n.l.DynamicServerListLoadBalancer      : Using serverListUpdater PollingServerListUpdater
2019-04-25 23:10:54.734  INFO 15260 --- [hello-service-1] c.netflix.config.ChainedDynamicProperty  : Flipping property: hello-service.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-04-25 23:10:54.737  INFO 15260 --- [hello-service-1] c.n.l.DynamicServerListLoadBalancer      : DynamicServerListLoadBalancer for client hello-service initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=hello-service,current list of Servers=[localhost:8080, localhost:8081],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone;  Instance count:2;   Active connections count: 0;    Circuit breaker tripped count: 0;   Active connections per server: 0.0;]
},Server stats: [[Server:localhost:8081;    Zone:defaultZone;   Total Requests:0;   Successive connection failure:0;    Total blackout seconds:0;   Last connection made:Thu Jan 01 08:00:00 CST 1970;  First connection made: Thu Jan 01 08:00:00 CST 1970;    Active Connections:0;   total failure count in last (1000) msecs:0; average resp time:0.0;  90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;  max resp time:0.0;  stddev resp time:0.0]
, [Server:localhost:8080;   Zone:defaultZone;   Total Requests:0;   Successive connection failure:0;    Total blackout seconds:0;   Last connection made:Thu Jan 01 08:00:00 CST 1970;  First connection made: Thu Jan 01 08:00:00 CST 1970;    Active Connections:0;   total failure count in last (1000) msecs:0; average resp time:0.0;  90 percentile resp time:0.0;    95 percentile resp time:0.0;    min resp time:0.0;  max resp time:0.0;  stddev resp time:0.0]
]}ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList@49af8c14
2019-04-25 23:10:55.708  INFO 15260 --- [erListUpdater-0] c.netflix.config.ChainedDynamicProperty  : Flipping property: hello-service.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-04-25 23:10:56.829 DEBUG 15260 --- [hello-service-1] c.j.l.s.f.consumer.service.HelloService  : [HelloService#timeOut] <--- ERROR SocketTimeoutException: Read timed out (2621ms)
2019-04-25 23:10:56.833 DEBUG 15260 --- [hello-service-1] c.j.l.s.f.consumer.service.HelloService  : [HelloService#timeOut] java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    ...

2019-04-25 23:10:56.835 DEBUG 15260 --- [hello-service-1] c.j.l.s.f.consumer.service.HelloService  : [HelloService#timeOut] <--- END ERROR
The read timed out and the fallback was returned.

Further requests with 4000 ms and 6000 ms also triggered the fallback.

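Test 2

Change the two timeout values:

ReadTimeout: 3000   # read timeout for a request sent through Ribbon; default 5000
ConnectTimeout: 1000 # connection timeout for a Ribbon request; default 2000

ribbon:
  OkToRetryOnAllOperations: false # whether to retry all request types, not only GET; default false
  ReadTimeout: 3000   # read timeout for a request sent through Ribbon; default 5000
  ConnectTimeout: 1000 # connection timeout for a Ribbon request; default 2000
  MaxAutoRetries: 0     # number of retries on the current instance; default 0
  MaxAutoRetriesNextServer: 1 # number of retries on other instances; default 1

hystrix:
  command:
    default:  # "default" applies globally; use a service id instead to target one application
      execution:
        timeout:
          # if enabled is false, request timeouts are left to Ribbon; if true, the Hystrix timeout also triggers the fallback
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 5000 # Hystrix command timeout; default 1000 ms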

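Request with 2000 ms:

It succeeded.

client服务-8081 请求ok!!!

Request with 4000 ms:

The fallback was triggered.

A request with 6000 ms also triggered the fallback.

This shows that, of ReadTimeout and ConnectTimeout, it is ReadTimeout that decides when a slow call to a service fails with a timeout or falls back. ConnectTimeout only limits the time spent establishing the connection to the service, so it generally does not need to be long; about 1 to 2 seconds is enough.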
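The difference between the two limits can be reproduced outside Spring Cloud. The sketch below is an illustration under stated assumptions, not the article's setup: it uses the JDK's HttpURLConnection and the JDK's built-in com.sun.net.httpserver server instead of Feign and Ribbon, and the names ReadTimeoutDemo, startSlowServer, call and demo are hypothetical. The local server accepts the connection immediately but answers late, so only the read timeout can fire.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ReadTimeoutDemo {

    /** Starts a local server whose /timeOut handler sleeps for delayMillis before answering. */
    static HttpServer startSlowServer(long delayMillis) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/timeOut", exchange -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Returns the response body, or "read timed out" if no response arrives within the read timeout. */
    static String call(int port, int connectTimeoutMillis, int readTimeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://localhost:" + port + "/timeOut").toURL().openConnection();
        conn.setConnectTimeout(connectTimeoutMillis); // limit for establishing the connection
        conn.setReadTimeout(readTimeoutMillis);       // limit for waiting on response data
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (SocketTimeoutException e) {
            return "read timed out";
        } finally {
            conn.disconnect();
        }
    }

    /** Runs one call that finishes in time and one that exceeds the read timeout. */
    static String[] demo() {
        try {
            HttpServer server = startSlowServer(300);
            try {
                int port = server.getAddress().getPort();
                String inTime = call(port, 1000, 2000); // read timeout longer than the server delay
                String tooSlow = call(port, 1000, 100); // read timeout shorter than the server delay
                return new String[] { inTime, tooSlow };
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println(results[0]);
        System.out.println(results[1]);
    }
}
```

Both calls use the same 1000 ms connect timeout and connect without trouble; only the read timeout differs, which mirrors the observation that the connect timeout plays no part when the service is reachable but slow.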

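Test 3

Now test which of ReadTimeout and timeoutInMilliseconds takes effect.

The configuration in Test 2 was:

ReadTimeout: 3000   # read timeout for a request sent through Ribbon; default 5000
ConnectTimeout: 1000 # connection timeout for a Ribbon request; default 2000
timeoutInMilliseconds: 5000 # Hystrix command timeout; default 1000 ms

With that configuration the 4000 ms request fell back and the 2000 ms request succeeded, which shows that ReadTimeout was the limit in effect. Now change it to:

ReadTimeout: 5000   # read timeout for a request sent through Ribbon; default 5000
ConnectTimeout: 1000 # connection timeout for a Ribbon request; default 2000
timeoutInMilliseconds: 3000 # Hystrix command timeout; default 1000 ms

ribbon:
  OkToRetryOnAllOperations: false # whether to retry all request types, not only GET; default false
  ReadTimeout: 5000   # read timeout for a request sent through Ribbon; default 5000
  ConnectTimeout: 1000 # connection timeout for a Ribbon request; default 2000
  MaxAutoRetries: 0     # number of retries on the current instance; default 0
  MaxAutoRetriesNextServer: 1 # number of retries on other instances; default 1

hystrix:
  command:
    default:  # "default" applies globally; use a service id instead to target one application
      execution:
        timeout:
          # whether the Hystrix timeout is enabled
          enabled: true
        isolation:
          thread:
            timeoutInMilliseconds: 3000 # Hystrix command timeout; default 1000 ms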

feign.hystrix.enabled: true

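Request with 2000 ms: succeeded

client服务-8081 请求ok!!!

Request with 4000 ms: fallback

The fallback was triggered.

This shows that the Hystrix setting timeoutInMilliseconds: 3000 took effect.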

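Test 4

One more setting to test:

If hystrix.command.default.execution.timeout.enabled is false, Hystrix no longer triggers the fallback based on its own timeout. The call is then bounded only by Ribbon's ReadTimeout: once that time is exceeded, eureka-feign throws a timeout exception, and Hystrix falls back because of that exception.

hystrix:
  command:
    default:  # "default" applies globally; use a service id instead to target one application
      execution:
        timeout:
          # whether the Hystrix timeout is enabled
          enabled: false

Request with 4000 ms: succeeded

client服务-8081 请求ok!!!

Request with 6000 ms: fallback, because Ribbon's ReadTimeout is 5000.

The fallback was triggered.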
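The outcomes of Tests 1 to 4 are consistent with a simple timing model, sketched below. It is an approximation and rests on several assumptions: every instance needs the same time to answer, connection time and framework overhead are ignored, and Ribbon makes (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1) attempts before giving up. The class and method names are hypothetical, and the mechanism named in each result is what the model infers, not something the tests measured directly.

```java
/** Simplified model of how ReadTimeout, retries and the Hystrix timeout interact. */
class TimeoutModelSketch {

    /**
     * Predicts the outcome of one call, assuming every instance needs
     * serviceDelayMillis to answer.
     */
    static String outcome(long serviceDelayMillis, long readTimeoutMillis,
                          int maxAutoRetries, int maxAutoRetriesNextServer,
                          boolean hystrixTimeoutEnabled, long hystrixTimeoutMillis) {
        boolean ribbonSucceeds = serviceDelayMillis < readTimeoutMillis;
        // Each instance is tried (maxAutoRetries + 1) times, on (maxAutoRetriesNextServer + 1) instances.
        long attempts = (maxAutoRetries + 1L) * (maxAutoRetriesNextServer + 1L);
        // Moment at which Ribbon either returns the response or gives up.
        long ribbonEndMillis = ribbonSucceeds ? serviceDelayMillis : attempts * readTimeoutMillis;

        if (hystrixTimeoutEnabled && hystrixTimeoutMillis <= ribbonEndMillis) {
            return "FALLBACK (Hystrix timeout)";
        }
        return ribbonSucceeds ? "OK" : "FALLBACK (Ribbon read timeout)";
    }

    public static void main(String[] args) {
        // Test 1: ReadTimeout 1000, Hystrix timeout 5000
        System.out.println(outcome(900, 1000, 0, 1, true, 5000));   // OK
        System.out.println(outcome(2000, 1000, 0, 1, true, 5000));  // FALLBACK (Ribbon read timeout)
        // Test 2: ReadTimeout 3000, Hystrix timeout 5000
        System.out.println(outcome(2000, 3000, 0, 1, true, 5000));  // OK
        System.out.println(outcome(4000, 3000, 0, 1, true, 5000));  // FALLBACK (Hystrix timeout)
        // Test 3: ReadTimeout 5000, Hystrix timeout 3000
        System.out.println(outcome(2000, 5000, 0, 1, true, 3000));  // OK
        System.out.println(outcome(4000, 5000, 0, 1, true, 3000));  // FALLBACK (Hystrix timeout)
        // Test 4: ReadTimeout 5000, Hystrix timeout disabled
        System.out.println(outcome(4000, 5000, 0, 1, false, 3000)); // OK
        System.out.println(outcome(6000, 5000, 0, 1, false, 3000)); // FALLBACK (Ribbon read timeout)
    }
}
```

In the model, the 4000 ms request of Test 2 first hits ReadTimeout at 3000 ms, is retried on the other instance, and is then cut off by the Hystrix timeout at 5000 ms before the retry can finish.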

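3. Summary

The tests above lead to the following conclusions:

If hystrix.command.default.execution.timeout.enabled is true, two settings limit how long a call may take: Ribbon's ReadTimeout and Hystrix's timeoutInMilliseconds. Whichever is smaller takes effect.
If hystrix.command.default.execution.timeout.enabled is false, Hystrix does not time out on its own; it falls back on the exception raised when Ribbon's ReadTimeout expires, so the behavior depends on Ribbon.
Ribbon's ConnectTimeout limits the time to establish the connection to the service; it only comes into play when the service cannot be reached or there is a network problem.
Ribbon also has MaxAutoRetries (retries on the current instance) and MaxAutoRetriesNextServer (retries on other instances). When ReadTimeout or ConnectTimeout expires, Ribbon retries according to these settings.
Because of Ribbon's retry mechanism, the Hystrix timeout should normally be longer than ReadTimeout, and ReadTimeout longer than ConnectTimeout; otherwise Hystrix falls back before any retry has happened.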
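To keep the retry mechanism working, the suggested Hystrix timeout, in theory (actual behavior takes precedence), is: (1 + MaxAutoRetries + MaxAutoRetriesNextServer) * ReadTimeout. When both retry settings are nonzero, each additional instance is itself retried MaxAutoRetries times, so a more conservative upper bound is (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1) * (ReadTimeout + ConnectTimeout).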
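As a worked example of the sizing rule, the sketch below computes the formula given in the summary next to a more conservative variant. The variant is an assumption added here, not part of the original tests: it counts (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1) attempts and charges each attempt with ConnectTimeout as well as ReadTimeout. The class and method names are hypothetical.

```java
/** Worked example for sizing the Hystrix timeout relative to Ribbon's settings. */
class HystrixTimeoutBudget {

    /** Formula from the summary: (1 + MaxAutoRetries + MaxAutoRetriesNextServer) * ReadTimeout. */
    static long suggestedByArticle(int maxAutoRetries, int maxAutoRetriesNextServer,
                                   long readTimeoutMillis) {
        return (1L + maxAutoRetries + maxAutoRetriesNextServer) * readTimeoutMillis;
    }

    /** Conservative variant (an assumption): all attempts, each paying connect plus read timeout. */
    static long worstCase(int maxAutoRetries, int maxAutoRetriesNextServer,
                          long readTimeoutMillis, long connectTimeoutMillis) {
        long attempts = (maxAutoRetries + 1L) * (maxAutoRetriesNextServer + 1L);
        return attempts * (readTimeoutMillis + connectTimeoutMillis);
    }

    public static void main(String[] args) {
        // Settings from Test 2: MaxAutoRetries 0, MaxAutoRetriesNextServer 1,
        // ReadTimeout 3000, ConnectTimeout 1000
        System.out.println(suggestedByArticle(0, 1, 3000)); // 6000
        System.out.println(worstCase(0, 1, 3000, 1000));    // 8000
        // With one retry on the same instance as well, the two results diverge further
        System.out.println(suggestedByArticle(1, 1, 3000)); // 9000
        System.out.println(worstCase(1, 1, 3000, 1000));    // 16000
    }
}
```

With the Test 2 settings, either result is above the 5000 ms Hystrix timeout that was configured there, which fits the observation that the 4000 ms request fell back before the retry could complete.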

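This article is reposted from: 简单谈谈什么是Hystrix，以及SpringCloud的各种超时时间配置效果，和简单谈谈微服务优化 (A Brief Look at What Hystrix Is, the Effects of the Various Spring Cloud Timeout Settings, and Microservice Optimization)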