This AIP is currently under review. This means that the editors have read it and are in high-level concurrence, and while it is not yet fully approved, we have a good faith expectation that the AIP will be approved in something close to its current state.

AIP-194

Retries for gRPC clients

RPCs sometimes fail. When one does, the client performing the RPC needs to know whether it is safe to retry the operation. When status codes are used consistently across multiple APIs, clients can respond to failures appropriately.

Guidance

Clients should retry requests that are unary, that are non-transactional, and for which repeated runs would not cause unintended state changes.

Clients should not retry transactional requests; instead these requests should have application-level retry logic that retries the entire transaction block from the start.

Clients should not retry requests in which repeated runs would cause unintended state changes.

Note: This AIP does not cover client streaming or bi-directional streaming.
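The three conditions above can be read as a single predicate. The following is a minimal sketch, not part of any gRPC library: the `RequestTraits` type, its field names, and the `is_retryable_request` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTraits:
    """Hypothetical description of an RPC, used only for this sketch."""

    idempotent: bool     # repeated runs cause no unintended state changes
    transactional: bool  # the request is part of a transaction
    unary: bool          # one request message, one response message


def is_retryable_request(traits: RequestTraits) -> bool:
    """Return True only when all three conditions from the guidance hold."""
    return traits.idempotent and not traits.transactional and traits.unary


# Illustrative requests (names are assumptions, not real API methods).
get_book = RequestTraits(idempotent=True, transactional=False, unary=True)
commit = RequestTraits(idempotent=True, transactional=True, unary=True)
```

Under these assumptions, `get_book` would be eligible for client-level retries, while `commit` would be left to application-level logic that reruns the whole transaction.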

Retryable codes

The following codes should be retried for retryable requests:

  • UNAVAILABLE: This code generally results from transient network failures. It is retryable under the expectation that the connection will become available again soon.
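One way a client might act on this is a bounded retry loop that only reacts to UNAVAILABLE. This AIP does not prescribe a backoff schedule, so the attempt limit, delays, and jitter below are assumptions, as are the `Code`, `RpcError`, and `call_with_retry` names, which are defined locally to keep the sketch self-contained rather than taken from a gRPC library.

```python
import enum
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Code(enum.Enum):
    """Small subset of gRPC status codes, redefined locally for the sketch."""

    OK = 0
    INVALID_ARGUMENT = 3
    UNAVAILABLE = 14


class RpcError(Exception):
    """Hypothetical error type carrying a status code."""

    def __init__(self, code: Code) -> None:
        super().__init__(code.name)
        self.code = code


def call_with_retry(
    rpc: Callable[[], T],
    max_attempts: int = 4,       # assumed limit, not specified by the AIP
    initial_delay: float = 0.1,  # seconds; assumed
    multiplier: float = 2.0,     # assumed
    max_delay: float = 5.0,      # seconds; assumed
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `rpc`, retrying with jittered exponential backoff on UNAVAILABLE only."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except RpcError as err:
            # Any other code, or an exhausted attempt budget, goes to the caller.
            if err.code is not Code.UNAVAILABLE or attempt == max_attempts:
                raise
            sleep(random.uniform(0, delay))  # random wait up to the current delay
            delay = min(delay * multiplier, max_delay)
    raise AssertionError("unreachable")


# Usage: a fake RPC that is unavailable twice, then succeeds.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise RpcError(Code.UNAVAILABLE)
    return "ok"


result = call_with_retry(flaky, sleep=lambda _: None)
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting. A production client would also need to stop retrying once the application's deadline has passed.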

Appendix

Non-retryable codes

The following codes should not be retried for any request:

  • OK: The request succeeded.
  • CANCELLED: An application can cancel a request, which must be honored.
  • DEADLINE_EXCEEDED: An application can set a deadline, which must be honored.
  • INVALID_ARGUMENT: Retrying a request with an invalid argument will never succeed.
  • DATA_LOSS: This is an unrecoverable error and must immediately be surfaced to the application.

Generally non-retryable codes

The following codes generally should not be retried for any request:

  • RESOURCE_EXHAUSTED: This code may be a signal that quota is exhausted. Retries may therefore not succeed for several hours, and in the meantime they may have billing implications. If RESOURCE_EXHAUSTED is used for reasons other than quota and the resource is expected to become available much sooner, it may be retryable.
  • INTERNAL: This code generally means that some internal part of the system has failed, and usually means a bug should be filed against the system. Such errors should immediately be surfaced to the application.
  • UNKNOWN: Unlike INTERNAL, this code is reserved for truly unknown-to-the-system errors, and therefore may not be safe to retry. These should immediately be surfaced to the application.
  • ABORTED: This code typically means that the request failed due to a sequencer check failure or transaction abort. Such failures should not be retried for an individual request; they should be retried at a higher level (the entire transaction, for example).
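To illustrate what retrying ABORTED at a higher level can look like, the sketch below reruns an entire transaction body from the start instead of retrying the single request that failed. The `Code`, `RpcError`, and `run_transaction` names are hypothetical and defined locally. The attempt limit is an assumption, and backoff between attempts is omitted for brevity.

```python
import enum
from typing import Callable, TypeVar

T = TypeVar("T")


class Code(enum.Enum):
    """Small subset of gRPC status codes, redefined locally for the sketch."""

    ABORTED = 10
    INTERNAL = 13


class RpcError(Exception):
    """Hypothetical error type carrying a status code."""

    def __init__(self, code: Code) -> None:
        super().__init__(code.name)
        self.code = code


def run_transaction(body: Callable[[], T], max_attempts: int = 3) -> T:
    """Run `body` (begin, reads, writes, commit) and rerun all of it on ABORTED."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return body()
        except RpcError as err:
            # Only ABORTED restarts the transaction; other codes are surfaced.
            if err.code is not Code.ABORTED or attempt == max_attempts:
                raise
    raise AssertionError("unreachable")


# Usage: the first run is aborted after its read, the second run commits.
log = []


def transfer() -> str:
    log.append("begin")
    log.append("read")
    if log.count("begin") == 1:
        raise RpcError(Code.ABORTED)  # simulated abort on the first run
    log.append("commit")
    return "committed"


outcome = run_transaction(transfer)
```

Note that the second run repeats the "begin" and "read" steps; nothing from the aborted run is reused.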

Some codes may be retried if a system is designed without synchronization or signaling between its components. For example, a client might retry NOT_FOUND on a read operation that is designed to block until the resource is created. However, systems of this kind are generally not encouraged.

Therefore, the following codes should not be retried for any request:

  • NOT_FOUND: A client should not retry until a resource is created.
  • ALREADY_EXISTS: A client should not retry until a resource is deleted.
  • PERMISSION_DENIED: A client should not retry until it has permission.
  • UNAUTHORIZED: A client should not retry until it is authorized.
  • UNAUTHENTICATED: A client should not retry until it is authenticated.
  • FAILED_PRECONDITION: A client should not retry until system state changes.
  • OUT_OF_RANGE: A client should not retry until the range is extended.
  • UNIMPLEMENTED: A client should not retry until the RPC is implemented.
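The classification in this document can be collected into a lookup table. The sketch below is one possible encoding, keyed by status code name. The `Policy` enum, the `POLICY` table, and the `should_retry` helper are hypothetical names. Treating unrecognized names as non-retryable is an assumption. UNAUTHORIZED is left out because it is not among the canonical gRPC status codes.

```python
import enum


class Policy(enum.Enum):
    RETRYABLE = "retryable"
    NON_RETRYABLE = "non-retryable"
    GENERALLY_NON_RETRYABLE = "generally non-retryable"


POLICY = {
    # Retryable for retryable requests.
    "UNAVAILABLE": Policy.RETRYABLE,
    # Not retried for any request.
    "OK": Policy.NON_RETRYABLE,
    "CANCELLED": Policy.NON_RETRYABLE,
    "DEADLINE_EXCEEDED": Policy.NON_RETRYABLE,
    "INVALID_ARGUMENT": Policy.NON_RETRYABLE,
    "DATA_LOSS": Policy.NON_RETRYABLE,
    "NOT_FOUND": Policy.NON_RETRYABLE,
    "ALREADY_EXISTS": Policy.NON_RETRYABLE,
    "PERMISSION_DENIED": Policy.NON_RETRYABLE,
    "UNAUTHENTICATED": Policy.NON_RETRYABLE,
    "FAILED_PRECONDITION": Policy.NON_RETRYABLE,
    "OUT_OF_RANGE": Policy.NON_RETRYABLE,
    "UNIMPLEMENTED": Policy.NON_RETRYABLE,
    # Generally not retried; exceptions are described in the text.
    "RESOURCE_EXHAUSTED": Policy.GENERALLY_NON_RETRYABLE,
    "INTERNAL": Policy.GENERALLY_NON_RETRYABLE,
    "UNKNOWN": Policy.GENERALLY_NON_RETRYABLE,
    "ABORTED": Policy.GENERALLY_NON_RETRYABLE,
}


def should_retry(code_name: str, request_is_retryable: bool) -> bool:
    """Retry only a retryable request that failed with a retryable code."""
    # Assumption: names missing from the table are treated as non-retryable.
    policy = POLICY.get(code_name, Policy.NON_RETRYABLE)
    return request_is_retryable and policy is Policy.RETRYABLE
```

The helper deliberately answers "no" for the generally non-retryable group. A client that knows, for example, that RESOURCE_EXHAUSTED is short-lived in a particular API would need to opt in explicitly.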

Further reading

  • For parallel or retried request disambiguation, see AIP-154.