Zuul/Ribbon/Hystrix not retrying on a different instance
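Background
I'm using Spring Cloud Brixton.RC2 with Zuul and Eureka.
I have a gateway service annotated with @EnableZuulProxy and a book-service with a status method. Through configuration, I can simulate work in the status method by sleeping for a defined amount of time.
The Zuul route is simple: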
zuul.routes.foos.path=/foos/**
zuul.routes.foos.serviceId=reservation-service
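I run two instances of the book-service. When I set the sleep time below the Hystrix timeout threshold (1000 ms), I can see requests going to both instances of the book service. This works well.
Problem
I understand that if the Hystrix command fails, Ribbon should be able to retry the command on a different server. This should make the failure transparent to the client.
After reading the Ribbon configuration, I added the following configuration to Zuul: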
# not sure which of these two to use
zuul.routes.reservation-service.retryable=true
zuul.routes.foos.retryable=true
# I don't want to retry on the same host; I also tried 1, which doesn't work either
ribbon.MaxAutoRetries=0
ribbon.MaxAutoRetriesNextServer=2
ribbon.OkToRetryOnAllOperations=true
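As a rough aid for reading these settings: in Ribbon's documented semantics, MaxAutoRetries counts retries on the same server (excluding the first try) and MaxAutoRetriesNextServer counts additional servers to try (excluding the first server). Assuming those semantics, the upper bound on attempts for one request can be sketched as follows. The class and method names are hypothetical and are not part of Ribbon.

```java
// Hypothetical helper, not part of Ribbon: computes the upper bound on
// attempts implied by the two retry settings.
final class RetryBudget {
    private RetryBudget() {}

    /**
     * @param maxAutoRetries           retries on the same server (first try excluded)
     * @param maxAutoRetriesNextServer additional servers to try (first server excluded)
     * @return the maximum number of attempts made for one logical request
     */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        if (maxAutoRetries < 0 || maxAutoRetriesNextServer < 0) {
            throw new IllegalArgumentException("retry counts must be non-negative");
        }
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    public static void main(String[] args) {
        // Settings from the question: no same-host retry, up to two other servers.
        System.out.println(maxAttempts(0, 2)); // prints 3
    }
}
```

With the values above, a request could be attempted up to three times, each time on a different server pick, provided nothing cuts the call short before Ribbon gets to retry.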
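Now I update the configuration so that only one service sleeps for more than 1 s, which means I have one healthy service and one bad one.
When I call the gateway, the calls are sent to both instances and half of them return a 500. In the gateway I see the Hystrix timeout: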
com.netflix.zuul.exception.ZuulException: Forwarding error
[...]
Caused by: com.netflix.hystrix.exception.HystrixRuntimeException: reservation-service timed-out and no fallback available.
[...]
Caused by: java.util.concurrent.TimeoutException: null
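Why doesn't Ribbon retry the call on the other instance?
Am I missing something?
References
- A related question (unresolved)
- Ribbon configuration
- According to this commit, Zuul should support retries through Ribbon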
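By default, Zuul uses the SEMAPHORE isolation strategy, which does not allow setting a timeout. I haven't been able to get load balancing to work with this strategy. What worked for me (following your example):
1) Change Zuul's isolation to THREAD: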
hystrix:
  command:
    reservation-service:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 100000
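Important: timeoutInMilliseconds = 100000 is like saying there is no Hystrix timeout. Why? Because if Hystrix times out first, there won't be any load balancing (I only tested this by playing with timeoutInMilliseconds).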
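One way to reason about how large the Hystrix timeout needs to be: Hystrix wraps the whole Ribbon call, retries included, so its timeout has to exceed the time Ribbon may spend across all attempts. The sketch below assumes the commonly used estimate (ReadTimeout + ConnectTimeout) × (MaxAutoRetries + 1) × (MaxAutoRetriesNextServer + 1) and plugs in the Ribbon values from this answer's configuration. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates the worst-case time Ribbon may spend on one
// logical request and compares it with a Hystrix timeout.
final class TimeoutCheck {
    private TimeoutCheck() {}

    // Assumed estimate: every attempt may use the full connect and read timeout.
    static long ribbonWorstCaseMillis(long readTimeout, long connectTimeout,
                                      int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (readTimeout + connectTimeout)
                * (maxAutoRetries + 1)
                * (maxAutoRetriesNextServer + 1);
    }

    // True when the Hystrix timeout is longer than Ribbon's worst case,
    // so Hystrix would not cut the call short before the retries finish.
    static boolean hystrixLeavesRoomForRetries(long hystrixTimeoutMillis,
                                               long ribbonWorstCaseMillis) {
        return hystrixTimeoutMillis > ribbonWorstCaseMillis;
    }

    public static void main(String[] args) {
        long worstCase = ribbonWorstCaseMillis(800, 250, 0, 2);
        System.out.println(worstCase);                                      // prints 3150
        System.out.println(hystrixLeavesRoomForRetries(1000, worstCase));   // prints false
        System.out.println(hystrixLeavesRoomForRetries(100000, worstCase)); // prints true
    }
}
```

Under this estimate, the 1000 ms Hystrix threshold mentioned in the question is too short for even one retry after an 800 ms read timeout, while 100000 ms is far more than the roughly 3150 ms worst case.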
Then, configure Ribbon's ReadTimeout to the desired value:
reservation-service:
  ribbon:
    ReadTimeout: 800
    ConnectTimeout: 250
    OkToRetryOnAllOperations: true
    MaxAutoRetriesNextServer: 2
    MaxAutoRetries: 0
In this case, after the call to the 1-second service times out in Ribbon, it retries with the 500 ms service.
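The behavior described here can be illustrated with a toy model. It does not use Ribbon at all. It only mimics "time out on the slow server, then move to the next one" with simulated latencies, and it assumes simple round-robin selection. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of "retry on the next server". It does not call Ribbon and only
// compares simulated latencies against a read timeout.
final class NextServerRetrySimulation {
    private NextServerRetrySimulation() {}

    /**
     * Returns the index of the first server that answers within readTimeoutMillis,
     * or -1 if every allowed attempt times out.
     */
    static int firstResponder(List<Long> latenciesMillis, int firstServer,
                              long readTimeoutMillis, int maxAutoRetriesNextServer) {
        int n = latenciesMillis.size();
        if (n == 0) {
            return -1; // no servers to try
        }
        for (int attempt = 0; attempt <= maxAutoRetriesNextServer; attempt++) {
            int server = (firstServer + attempt) % n; // assumed round-robin choice
            if (latenciesMillis.get(server) <= readTimeoutMillis) {
                return server; // answered before the read timeout
            }
            // Otherwise the read timed out; fall through and try the next server.
        }
        return -1;
    }

    public static void main(String[] args) {
        // Index 0 is the slow instance (1000 ms), index 1 the healthy one (500 ms).
        List<Long> latencies = Arrays.asList(1000L, 500L);
        System.out.println(firstResponder(latencies, 0, 800, 2)); // prints 1
    }
}
```

With a read timeout of 800 ms, the first attempt on the slow instance fails and the second attempt lands on the healthy instance, which matches the sequence visible in the log.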
Below is the log I got in the Zuul instance:
o.s.web.servlet.DispatcherServlet : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/api/stories]
o.s.web.servlet.DispatcherServlet : Last-Modified value for [/api/stories] is: -1
c.n.zuul.http.HttpServletRequestWrapper : Path = null
c.n.zuul.http.HttpServletRequestWrapper : Transfer-Encoding = null
c.n.zuul.http.HttpServletRequestWrapper : Content-Encoding = null
c.n.zuul.http.HttpServletRequestWrapper : Content-Length header = -1
c.n.loadbalancer.ZoneAwareLoadBalancer : Zone aware logic disabled or there is only one zone
c.n.loadbalancer.LoadBalancerContext : storyteller-api using LB returned Server: localhost:7799 for request /api/stories
---> ATTEMPTING THE SLOW SERVICE
com.netflix.niws.client.http.RestClient : RestClient sending new Request(GET: ) http://localhost:7799/api/stories
c.n.http4.MonitoredConnectionManager : Get connection: {}->http://localhost:7799, timeout = 250
com.netflix.http4.NamedConnectionPool : [{}->http://localhost:7799] total kept alive: 1, total issued: 0, total allocated: 1 out of 200
com.netflix.http4.NamedConnectionPool : No free connections [{}->http://localhost:7799][null]
com.netflix.http4.NamedConnectionPool : Available capacity: 50 out of 50 [{}->http://localhost:7799][null]
com.netflix.http4.NamedConnectionPool : Creating new connection [{}->http://localhost:7799]
com.netflix.http4.NFHttpClient : Attempt 1 to execute request
com.netflix.http4.NFHttpClient : Closing the connection.
c.n.http4.MonitoredConnectionManager : Released connection is not reusable.
com.netflix.http4.NamedConnectionPool : Releasing connection [{}->http://localhost:7799][null]
com.netflix.http4.NamedConnectionPool : Notifying no-one, there are no waiting threads
--- HERE'S RIBBON'S TIMEOUT
c.n.l.reactive.LoadBalancerCommand : Got error com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out when executed on server localhost:7799
c.n.loadbalancer.ZoneAwareLoadBalancer : Zone aware logic disabled or there is only one zone
c.n.loadbalancer.LoadBalancerContext : storyteller-api using LB returned Server: localhost:9977 for request /api/stories
---> HERE IT RETRIES
com.netflix.niws.client.http.RestClient : RestClient sending new Request(GET: ) http://localhost:9977/api/stories
c.n.http4.MonitoredConnectionManager : Get connection: {}->http://localhost:9977, timeout = 250
com.netflix.http4.NamedConnectionPool : [{}->http://localhost:9977] total kept alive: 1, total issued: 0, total allocated: 1 out of 200
com.netflix.http4.NamedConnectionPool : Getting free connection [{}->http://localhost:9977][null]
com.netflix.http4.NFHttpClient : Stale connection check
com.netflix.http4.NFHttpClient : Attempt 1 to execute request
com.netflix.http4.NFHttpClient : Connection can be kept alive indefinitely
c.n.http4.MonitoredConnectionManager : Released connection is reusable.
com.netflix.http4.NamedConnectionPool : Releasing connection [{}->http://localhost:9977][null]
com.netflix.http4.NamedConnectionPool : Pooling connection [{}->http://localhost:9977][null]; keep alive indefinitely
com.netflix.http4.NamedConnectionPool : Notifying no-one, there are no waiting threads
o.s.web.servlet.DispatcherServlet : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
o.s.web.servlet.DispatcherServlet : Successfully completed request
o.s.web.servlet.DispatcherServlet : DispatcherServlet with name 'dispatcherServlet' processing GET request for [/favicon.ico]
o.s.w.s.handler.SimpleUrlHandlerMapping : Matching patterns for request [/favicon.ico] are [/**/favicon.ico]
o.s.w.s.handler.SimpleUrlHandlerMapping : URI Template variables for request [/favicon.ico] are {}
o.s.w.s.handler.SimpleUrlHandlerMapping : Mapping [/favicon.ico] to HandlerExecutionChain with handler [ResourceHttpRequestHandler [locations=[ServletContext resource [/], class path resource [META-INF/resources/], class path resource [resources/], class path resource [static/], class path resource [public/], class path resource []], resolvers=[org.springframework.web.servlet.resource.PathResourceResolver@a0d875d]]] and 1 interceptor
o.s.web.servlet.DispatcherServlet : Last-Modified value for [/favicon.ico] is: -1
o.s.web.servlet.DispatcherServlet : Null ModelAndView returned to DispatcherServlet with name 'dispatcherServlet': assuming HandlerAdapter completed request handling
o.s.web.servlet.DispatcherServlet : Successfully completed request