AIP-151

Long-running operations

Occasionally, an API may need to expose a method that takes a significant amount of time to complete. In these situations, it is often a poor user experience to simply block while the task runs; rather, it is better to return some kind of promise to the user and allow the user to check back in later.

The long-running operations pattern is roughly analogous to a Python Future, or a Node.js Promise. Essentially, the user is given a token that can be used to track progress and retrieve the result.

Guidance

Individual API methods that might take a significant amount of time to complete should return a google.longrunning.Operation object instead of the ultimate response message.

// Write a book.
rpc WriteBook(WriteBookRequest) returns (google.longrunning.Operation) {
  option (google.api.http) = {
    post: "/v1/{parent=publishers/*}/books:write"
    body: "*"
  };
  option (google.longrunning.operation_info) = {
    response_type: "WriteBookResponse"
    metadata_type: "WriteBookMetadata"
  };
}
  • The response type must be google.longrunning.Operation. The Operation proto definition must not be copied into individual APIs.
  • The method must include a google.longrunning.operation_info annotation defining the response and metadata types.
    • The response and metadata types must be defined in the file where the RPC appears, or a file imported by that file.
    • If the response and metadata types are defined in another package, the fully-qualified message name must be used.
    • The response type should not be google.protobuf.Empty (except for Delete methods), unless it is certain that response data will never be needed. If response data might be added in the future, define an empty message for the RPC response and use that.
    • The metadata type should not be google.protobuf.Empty, unless it is certain that metadata will never be needed. If metadata might be added in the future, define an empty message for the RPC metadata and use that.
  • APIs with methods that return Operation must implement the Operations service. To avoid inconsistency, individual APIs must not define their own interfaces for long-running operations.
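
The type rules above can be sketched as a pair of message definitions for the WriteBook example. This is a minimal sketch rather than a definitive layout: the package name and the single response field are assumptions made for illustration, and the metadata message is deliberately left empty to show the "define an empty message" option.

```proto
syntax = "proto3";

// Hypothetical package name, chosen only for this sketch.
package example.library.v1;

// Response for WriteBook. Defined as its own message instead of
// google.protobuf.Empty so that fields can be added later.
message WriteBookResponse {
  // Resource name of the finished book (hypothetical field).
  string book = 1;
}

// Metadata for WriteBook. No metadata is needed yet, but an empty,
// API-owned message leaves room to add fields without changing the
// metadata_type named in the operation_info annotation.
message WriteBookMetadata {}
```

Because both messages live in the same package as the RPC, the annotation can refer to them by their short names; a type from another package would need its fully-qualified name.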

Note: User expectations can vary on what is considered “a significant amount of time” depending on what work is being done. A good rule of thumb is 10 seconds.

Standard methods

APIs may return an Operation from the Create, Update, or Delete standard methods if appropriate. In this case, the response type in the operation_info annotation must be the standard and expected response type for that standard method.
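
As one possible shape for this, the sketch below declares long-running Create and Delete methods. The service name, package, request messages, and the OperationMetadata message are hypothetical; only the Operation return type and the two annotations come from the guidance above.

```proto
syntax = "proto3";

// Hypothetical package name, chosen only for this sketch.
package example.library.v1;

import "google/api/annotations.proto";
import "google/longrunning/operations.proto";
// Imported so that the Delete response type is defined in a file
// imported by the file where the RPC appears.
import "google/protobuf/empty.proto";

service Library {
  // Long-running Create: the eventual response is the resource itself,
  // as it would be for a synchronous Create.
  rpc CreateBook(CreateBookRequest) returns (google.longrunning.Operation) {
    option (google.api.http) = {
      post: "/v1/{parent=publishers/*}/books"
      body: "book"
    };
    option (google.longrunning.operation_info) = {
      response_type: "Book"
      metadata_type: "OperationMetadata"
    };
  }

  // Long-running Delete: the eventual response is Empty, named with its
  // fully-qualified name because it lives in another package.
  rpc DeleteBook(DeleteBookRequest) returns (google.longrunning.Operation) {
    option (google.api.http) = {
      delete: "/v1/{name=publishers/*/books/*}"
    };
    option (google.longrunning.operation_info) = {
      response_type: "google.protobuf.Empty"
      metadata_type: "OperationMetadata"
    };
  }
}

// Minimal resource and request messages (hypothetical fields).
message Book {
  string name = 1;
}

message CreateBookRequest {
  string parent = 1;
  Book book = 2;
}

message DeleteBookRequest {
  string name = 1;
}

// Empty for now; fields can be added later without changing the annotation.
message OperationMetadata {}
```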

When creating or deleting a resource with a long-running operation, the resource should be included in List and Get calls; however, the resource should indicate that it is not usable, generally with a state enum.
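
A state enum for this purpose might look like the following sketch. The enum values and field numbers are assumptions; the guidance only asks that the resource signal that it is not yet (or no longer) usable.

```proto
syntax = "proto3";

// Hypothetical package name, chosen only for this sketch.
package example.library.v1;

message Book {
  // Lifecycle states of a book (hypothetical values).
  enum State {
    // Default value; not a meaningful state.
    STATE_UNSPECIFIED = 0;
    // The create operation is still running; the book is not usable.
    CREATING = 1;
    // The book is ready for use.
    ACTIVE = 2;
    // The delete operation is still running; the book is not usable.
    DELETING = 3;
  }

  // Resource name of the book.
  string name = 1;

  // Current lifecycle state, set by the service.
  State state = 2;
}
```

With this shape, a Get or List call made while the operation is running would return the book with state CREATING or DELETING, so clients can tell it apart from a usable resource.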

Expiration

APIs may allow their operation resources to expire once sufficient time has elapsed since the operation completed.

Note: A good rule of thumb for operation expiry is 30 days.

Errors

Errors that prevent a long-running operation from starting must return an error response (AIP-193), similar to any other method.

Errors that occur over the course of an operation may be placed in the metadata message. The errors themselves must still be represented with a google.rpc.Status object.
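
One way to carry such errors is a repeated google.rpc.Status field on the metadata message. The sketch below assumes a hypothetical package and field names; the guidance only requires that each error be represented as a google.rpc.Status.

```proto
syntax = "proto3";

// Hypothetical package name, chosen only for this sketch.
package example.library.v1;

import "google/rpc/status.proto";

// Metadata for the WriteBook operation.
message WriteBookMetadata {
  // Rough completion percentage, from 0 to 100 (hypothetical field).
  int32 progress_percent = 1;

  // Errors encountered while the operation runs that did not stop it
  // (hypothetical field name). Each entry is a google.rpc.Status.
  repeated google.rpc.Status partial_failures = 2;
}
```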

Changelog

  • 2019-09-23: Added guidance on errors.
  • 2019-08-23: Added guidance about fully-qualified message names when the message name is in another package.
  • 2019-08-01: Changed the examples from “shelves” to “publishers”, to present a better example of resource ownership.