Retry Mechanism For Reliable Systems

· 3 min read
Tony
Engineer@Nautilus Games

Why Do We Need To Have A Retry Mechanism?

As a game provider, we occasionally see failures when our system calls a tenant's backend service via webhooks. These failures can have many causes: servers, networks, software, transient errors, or even mistakes by system operators. We therefore support a retry mechanism to reduce the probability of failure and ensure that the game session continues as intended.

Where Do We Set Up The Retry Mechanism?

Retry With Sidecar In Service Mesh

  • We use Envoy as a sidecar proxy and configure retries on it
info

Some HTTP client libraries also provide a retry mechanism out of the box. However, implementing retries correctly and effectively is not easy, and making sure every single game follows the same retry rules takes time and effort. So we decided to let the Envoy proxy handle that part.

Retry mechanism in sequence diagram

Overall Flow

Some configuration in Envoy

Name | Value | Description
retry_on | 5xx,gateway-error,connect-failure,reset | Envoy will attempt a retry if the upstream server responds with any 5xx response code, or does not respond at all (disconnect/reset/read timeout).
num_retries | 4 | The number of retries.
retry_back_off.base_interval | 0.1s | The base interval used for the next back-off computation.
retry_back_off.max_interval | 1s | The maximum interval between retries.
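
To make the back-off values above more concrete, here is a minimal Python sketch of how the delay windows grow. It assumes the fully jittered exponential back-off described in Envoy's documentation, where the delay before retry N is drawn from the range [0, (2^N - 1) * base_interval) and capped at max_interval. The function names are hypothetical and exist only for illustration. This is not Envoy's actual implementation.

```python
import random

# Values taken from the configuration table above.
BASE_INTERVAL_S = 0.1
MAX_INTERVAL_S = 1.0
NUM_RETRIES = 4


def backoff_upper_bound(retry_number, base=BASE_INTERVAL_S, cap=MAX_INTERVAL_S):
    """Upper bound (seconds) of the delay window before the given retry (1-based).

    Assumption: the window is [0, (2^N - 1) * base), capped at `cap`.
    """
    return min((2 ** retry_number - 1) * base, cap)


def sample_backoff(retry_number, rng=random):
    """Draw one jittered delay (seconds) for the given retry."""
    return rng.uniform(0.0, backoff_upper_bound(retry_number))


# One upper bound per configured retry.
upper_bounds = [backoff_upper_bound(n) for n in range(1, NUM_RETRIES + 1)]

if __name__ == "__main__":
    for n, bound in enumerate(upper_bounds, start=1):
        print(f"retry {n}: delay drawn from 0 to {bound:.1f}s")
```

Under these assumptions, the four retries wait at most roughly 0.1 s, 0.3 s, 0.7 s, and 1 s, so the back-off adds about 2.1 s in the worst case before the call is reported as failed.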

Manual Retry Support From The Back Office

We also provide a tool for retrying manually.

You can check the list of failed transactions on this site. Pressing the retry icon on the right side sends the request again. Please note that not every transaction can be retried: only those that fall into a valid retry case are eligible.

Manual Retry Tool

Preview retry request before sending

Preview Retry

Retries history

Retries History

Retry Rules

A transaction falls into the retry case when our game receives one of the following response codes:

Code | Description
500 | Internal Server Error
502 | Bad Gateway
503 | Service Unavailable
504 | Gateway Timeout
1308 | Expired Tenant Player Token
1309 | Player Inactive
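
The rule table above can be sketched as a simple lookup. This is only an illustration: the names `RETRYABLE_CODES` and `is_retryable` are hypothetical, and the article does not describe how the codes are carried in the tenant's response or whether other conditions also apply before a retry is allowed.

```python
# Codes copied from the retry rules table; descriptions are for readability only.
RETRYABLE_CODES = {
    500: "Internal Server Error",
    502: "Bad Gateway",
    503: "Service Unavailable",
    504: "Gateway Timeout",
    1308: "Expired Tenant Player Token",
    1309: "Player Inactive",
}


def is_retryable(code: int) -> bool:
    """Return True if the code appears in the retry rules table."""
    return code in RETRYABLE_CODES


if __name__ == "__main__":
    for code in (503, 404, 1308):
        label = "retry" if is_retryable(code) else "no retry"
        print(code, label)
```

Keeping the codes in one table-like structure means the same rule can be shared by every game, which matches the goal of having a single set of retry rules.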