Reactive Microservices in Java (Translation, Part 13): Stability and Resilience Patterns

Stability and Resilience Patterns

In distributed systems, failures are first-class citizens, and you have to live with them. Your microservices must be aware that the services they call can fail for many reasons. Every interaction between microservices can fail in some way, and you need to be prepared for that failure. Failures take different forms, ranging from network errors to semantic errors.

Managing Failures in Reactive Microservices

Reactive microservices are responsible for managing failures locally, and they must avoid propagating a failure to another microservice. In other words, you should not pass the "hot potato" on to another microservice. The code of a reactive microservice therefore treats failures as first-class citizens.

The Vert.x development model makes failure a central entity. When you use the callback development model, the handler typically receives an AsyncResult as a parameter. This structure encapsulates the result of an asynchronous operation. On success, you can retrieve the result. On failure, it contains a Throwable describing the reason for the failure.
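A rough, JDK-only sketch of this success-or-failure shape is shown below. It is not Vert.x code: CompletableFuture stands in for the asynchronous operation, its handle callback plays the role of an AsyncResult handler, and the names ResultHandling and describe are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// JDK-only analogue of an AsyncResult handler: the callback receives either
// a result or a Throwable. Class and method names are illustrative assumptions.
final class ResultHandling {

    // Turns the outcome of an asynchronous operation into a message,
    // handling the failure locally instead of propagating it.
    static CompletableFuture<String> describe(CompletableFuture<String> operation) {
        return operation.handle((result, failure) -> {
            if (failure == null) {
                return "succeeded: " + result;
            }
            return "failed: " + failure.getMessage();
        });
    }

    public static void main(String[] args) {
        System.out.println(describe(CompletableFuture.completedFuture("hello")).join());
        System.out.println(describe(
            CompletableFuture.failedFuture(new IllegalStateException("service down"))).join());
    }
}
```

The point of the sketch is that the failure branch sits right next to the success branch, so the caller cannot forget that the operation may fail.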

When using the RxJava APIs, failures can be managed in the subscribe method, which accepts an error handler alongside the handler for emitted items.

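If a failure occurs in one of the observed streams, the error handler is called. You can also handle the failure earlier, inside the stream itself (for example with a recovery operator such as RxJava's onErrorReturn), so that the error handler in the subscribe method is never reached.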
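The idea of recovering early, before the final consumer sees anything, can be sketched without RxJava. The example below is a JDK-only analogue rather than the RxJava API: CompletableFuture.exceptionally plays the role of an early recovery step, and the names EarlyRecovery and greeting are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// JDK-only analogue of handling a failure early in a pipeline.
// Class and method names are illustrative assumptions.
final class EarlyRecovery {

    // Builds a greeting from an asynchronous name lookup. If the lookup fails,
    // the pipeline recovers with a default value, so downstream consumers
    // only ever see a successful result.
    static CompletableFuture<String> greeting(CompletableFuture<String> nameLookup) {
        return nameLookup
            .thenApply(name -> "hello " + name)
            .exceptionally(failure -> "hello stranger");
    }

    public static void main(String[] args) {
        // The terminal consumer needs no error branch: failures were handled upstream.
        greeting(CompletableFuture.completedFuture("vert.x")).thenAccept(System.out::println);
        greeting(CompletableFuture.failedFuture(new RuntimeException("lookup failed")))
            .thenAccept(System.out::println);
    }
}
```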

Managing failures isn't fun, but it has to be done. The code of a reactive microservice is responsible for making an appropriate decision when it faces a failure, and it also needs to be prepared to see its requests to other microservices fail.

 

Using Timeouts

When dealing with distributed interactions, we often use timeouts. A timeout is a simple mechanism that allows you to stop waiting for a response once you think it will not come. A well-placed timeout provides failure isolation: it keeps the failure limited to the microservice it affects, and it lets you handle the timeout and continue executing in a degraded mode.
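As a minimal sketch of a timeout with a degraded mode, the JDK's CompletableFuture.completeOnTimeout (Java 9 and later) can substitute a default value when no answer arrives in time. The class name TimeoutFallback, the method withTimeout, and the default string are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Sketch of "stop waiting and continue in a degraded mode".
// Class, method, and default value are illustrative assumptions.
final class TimeoutFallback {

    // Waits at most timeoutMillis for the remote answer, then completes
    // with a degraded default instead of waiting forever.
    static CompletableFuture<String> withTimeout(CompletableFuture<String> remoteCall,
                                                 long timeoutMillis) {
        return remoteCall.completeOnTimeout(
            "default (degraded mode)", timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverAnswers = new CompletableFuture<>();
        System.out.println(withTimeout(neverAnswers, 100).join());
        System.out.println(
            withTimeout(CompletableFuture.completedFuture("fresh data"), 100).join());
    }
}
```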

Timeouts are often used together with retries. When a timeout occurs, we can try again. Retrying an operation immediately after a failure has a number of effects, and only some of them are beneficial. If the operation failed because of a significant problem in the called microservice, it is likely to fail again if retried immediately. However, some transient failures can be overcome with a retry, especially network failures such as dropped messages. Whether or not to retry is a decision you make for each operation.
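The timeout-then-retry combination can be sketched with the JDK alone. This is an illustration under stated assumptions, not the Vert.x or RxJava API: TimeoutRetry and callWithRetry are hypothetical names, the sketch retries immediately and on any failure (not only on timeouts), and it relies on CompletableFuture.orTimeout from Java 9 and later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of a per-attempt timeout combined with a bounded number of retries.
// Class and method names are illustrative assumptions.
final class TimeoutRetry {

    // Invokes the operation, waiting at most timeoutMillis per attempt.
    // On failure or timeout it tries again until maxAttempts is exhausted,
    // then fails with the last error.
    static <T> CompletableFuture<T> callWithRetry(Supplier<CompletableFuture<T>> operation,
                                                  int maxAttempts, long timeoutMillis) {
        return operation.get()
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            .<CompletableFuture<T>>handle((value, failure) -> {
                if (failure == null) {
                    return CompletableFuture.completedFuture(value);
                }
                if (maxAttempts > 1) {
                    return callWithRetry(operation, maxAttempts - 1, timeoutMillis);
                }
                return CompletableFuture.failedFuture(failure);
            })
            .thenCompose(next -> next);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical flaky operation: the first two calls never answer.
        Supplier<CompletableFuture<String>> flaky = () ->
            calls.incrementAndGet() < 3
                ? new CompletableFuture<String>()
                : CompletableFuture.completedFuture("ok");
        String result = callWithRetry(flaky, 3, 50).join();
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Bounding the number of attempts matters: an unbounded retry loop keeps the consumer waiting and keeps loading a service that may already be struggling.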

Remember that a timeout does not mean the operation failed: in a distributed system, there are many possible reasons for a missing response. Take an example with two microservices, a and b. a sends a request to b, but the response does not arrive in time, so a gets a timeout. In this scenario, three types of failure could have occurred:

    1. The message between a and b was lost - the operation was not executed.
    2. The operation in b failed - the operation did not complete.
    3. The response message between b and a was lost - the operation was executed successfully, but a did not get the response.

This last case is often overlooked and can be harmful. Here, combining the timeout with a retry can break the integrity of the system, because the operation may be executed twice. Retries can only be used with idempotent operations, that is, operations you can invoke multiple times without changing the result beyond the initial call. Before using a retry, always check that your system can handle reattempted operations gracefully.
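One common way to make an operation safe to retry, which the text above does not prescribe, is to tag each request with a unique identifier and have the receiver remember which identifiers it has already processed. The sketch below assumes that approach; IdempotentAccount, deposit, and the request identifiers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an idempotent receiver keyed by a caller-supplied request id.
// Class, method, and id format are illustrative assumptions.
final class IdempotentAccount {
    private long balance;
    // Balance recorded after each processed request, keyed by request id.
    private final Map<String, Long> processed = new HashMap<>();

    // Applies the deposit at most once per requestId. A retried request
    // returns the result recorded the first time instead of depositing again.
    synchronized long deposit(String requestId, long amount) {
        Long recorded = processed.get(requestId);
        if (recorded != null) {
            return recorded;
        }
        balance += amount;
        processed.put(requestId, balance);
        return balance;
    }

    synchronized long balance() {
        return balance;
    }

    public static void main(String[] args) {
        IdempotentAccount account = new IdempotentAccount();
        account.deposit("req-1", 100);
        account.deposit("req-1", 100); // retry after a lost response: no double deposit
        System.out.println(account.balance());
    }
}
```

With such a receiver, the third failure case above (the response was lost, but the operation succeeded) no longer causes the operation to be applied twice when the caller retries.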

Retrying also makes the consumer wait even longer for a response, which is not a good thing either. Returning a fallback is usually better than retrying an operation too many times. In addition, continually calling a failing service may not help it recover. These two concerns are handled by another resilience pattern, the circuit breaker, which is covered later.

To be continued!

Original address:

https://developers.redhat.com/promotions/building-reactive-microservices-in-java/
