Retry Mechanism For Reliable Systems
Why Do We Need To Have A Retry Mechanism?
As a game provider, our system occasionally fails when it calls a tenant's backend service through webhooks. These failures have many possible causes: server faults, network problems, software bugs, transient errors, or even mistakes by system operators. We therefore support a retry mechanism to reduce the probability of failure and keep the game session running as intended.
Where Do We Set Up The Retry Mechanism?
Retry With Sidecar In Service Mesh
- We use Envoy as a sidecar proxy and configure retries on it.
Some HTTP client libraries provide a retry mechanism out of the box, but implementing retries correctly and effectively is not easy, and making sure every game follows the same retry rules takes time and effort. So we decided to let the Envoy proxy handle that part.
Retry mechanism in sequence diagram
Some configuration in Envoy
Name | Value | Description |
---|---|---|
retry_on | 5xx,gateway-error,connect-failure,reset | Envoy will attempt a retry if the upstream server responds with any 5xx response code, or does not respond at all (disconnect/reset/read timeout). |
num_retries | 4 | Maximum number of retry attempts. |
retry_back_off.base_interval | 0.1s | Base interval used to compute the next backoff. |
retry_back_off.max_interval | 1s | Maximum interval between retries. |
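To make the backoff settings more concrete, here is a minimal sketch of how long a request might wait before each retry. It assumes the fully jittered exponential backoff described in Envoy's documentation, where the wait before retry N is drawn from the range 0 to (2^N - 1) times the base interval, capped at the maximum interval. The function names are hypothetical, and Envoy's actual implementation may differ in detail.

```python
import random

# Values taken from the configuration table above, in milliseconds.
BASE_INTERVAL_MS = 100   # retry_back_off.base_interval
MAX_INTERVAL_MS = 1000   # retry_back_off.max_interval
NUM_RETRIES = 4          # num_retries


def backoff_upper_bound_ms(retry_number: int) -> int:
    """Upper bound of the wait before the given retry (1-based).

    Assumption: the bound is (2^N - 1) * base, capped at the max interval.
    """
    uncapped = (2 ** retry_number - 1) * BASE_INTERVAL_MS
    return min(uncapped, MAX_INTERVAL_MS)


def sample_backoff_ms(retry_number: int, rng: random.Random) -> float:
    """Draw one jittered wait between 0 and the upper bound."""
    return rng.uniform(0, backoff_upper_bound_ms(retry_number))


if __name__ == "__main__":
    rng = random.Random()
    for n in range(1, NUM_RETRIES + 1):
        print(
            f"retry {n}: wait up to {backoff_upper_bound_ms(n)} ms, "
            f"sampled {sample_backoff_ms(n, rng):.0f} ms"
        )
```

Under these assumptions the upper bounds for the four retries are 100 ms, 300 ms, 700 ms, and 1000 ms, so the retries add at most about 2.1 s of waiting in total, not counting the time each request itself takes.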
Manual Retry From The Back Office
We also provide a tool for retrying manually.
You can view the list of failed transactions on this site. Pressing the retry icon on the right side of a transaction sends it again. Note that not every transaction can be retried; only transactions that qualify as a valid case are eligible.
Preview retry request before sending
Retries history
Retry Rules
When our game-facing service receives one of the following response codes, the request is retried:
Code | Description |
---|---|
500 | Internal Server Error |
502 | Bad Gateway |
503 | Service Unavailable |
504 | Gateway Timeout |
1308 | Expired Tenant Player Token |
1309 | Player Inactive |
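The rules in this table can be expressed as a simple lookup. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes that 1308 and 1309 are application-level error codes rather than HTTP status codes, since they fall outside the HTTP status range.

```python
# Codes copied from the retry rules table above.
RETRYABLE_HTTP_STATUS = frozenset({500, 502, 503, 504})

# Assumption: these are application-level error codes, not HTTP statuses.
RETRYABLE_APP_CODES = frozenset({
    1308,  # Expired Tenant Player Token
    1309,  # Player Inactive
})


def is_retryable(code: int) -> bool:
    """Return True if the given response code is listed as retryable."""
    return code in RETRYABLE_HTTP_STATUS or code in RETRYABLE_APP_CODES


if __name__ == "__main__":
    for code in (200, 404, 503, 1308):
        print(code, "retry" if is_retryable(code) else "do not retry")
```

Keeping the codes in one place like this makes it easy to check a response against the rules without repeating the list in each caller.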