Understanding Event Subscription Retries

When implementing a message delivery system, a few caveats must be addressed to ensure stability, consistency, and a good user experience. One of the challenges of a message delivery system is ensuring that messages reach their destination and knowing what to do when they fail to arrive.

Some integrations can tolerate a delivery failure: they drop the message and move on to the next one. In other integrations, failure to deliver a message cannot be ignored. For example, a financial integration might attempt to deliver a message but instead receive an HTTP status code of 404, which indicates that the server could not find the endpoint to which the message was to be delivered. In such cases a missing message could mean someone not being paid for their time or an organization going over budget on contracted resources.

Workfront Strategy for Event Subscription Retries

Because customers leverage the Workfront platform as a core piece of their daily knowledge work, the Workfront Event Subscription framework provides a mechanism to ensure that the delivery of each message is attempted to the fullest extent possible.

Currently, event-triggered outbound messages that fail delivery to customer endpoints are resent every 10 minutes until delivery is successful, for up to two hours. Customers need to ensure that any endpoints consuming outbound messages from Workfront Event Subscriptions are set up to return a “200 OK” response to Workfront when delivery is successful.
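
As an illustration of the endpoint requirement above, the following is a minimal sketch of a receiver that answers with 200 only after it has stored the message. It uses only the Python standard library. The names acknowledge, EventHandler, RECEIVED, and serve, as well as port 8080, are hypothetical and chosen for this example. The sketch also assumes that messages arrive as HTTP POST requests and that any non-200 status is treated as a failed delivery; neither detail is confirmed here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store; a real integration would persist or enqueue.
RECEIVED = []


def acknowledge(body: bytes, store) -> int:
    """Store the message and return the HTTP status code to send back."""
    try:
        store(body)
    except Exception:
        # Assumption: a non-200 status is treated as a failed delivery,
        # so the message stays eligible for a later retry.
        return 500
    return 200  # "200 OK": the message was received and stored


class EventHandler(BaseHTTPRequestHandler):
    """Accepts POSTed messages (assumed method) and acknowledges them."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(acknowledge(body, RECEIVED.append))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the listener; the port number is an arbitrary example value."""
    HTTPServer(("", port), EventHandler).serve_forever()
```

Calling serve() starts the listener. Returning 200 only after the message has been stored means that a storage error surfaces as a non-200 response rather than a silently lost message.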

NOTE Because Workfront Event Subscriptions is still developing as a product, some of the parameters and the values used in calculating message retry operations are subject to change. When changes like these are made, proper and timely communication of the changes will take place.

Handling Failed Event-Triggered Outbound Messages

The following flowchart shows the strategy for reattempting message deliveries with Workfront Event Subscriptions: 


The following explanations correspond with the steps depicted in the flowchart:

    1. Message fails to be delivered.
    2. Message delivery failure information is logged.

      All failed attempts to deliver a message are logged so that debugging may be performed to determine the root cause of a given failure or series of failures.

    3. Message attempt count is incremented.
    4. Message is placed onto the message retry queue.

      As shown in the preceding flowchart, the message queue used for processing message delivery retries is a separate queue from the one that processes the initial delivery attempt for each message. This allows the near real-time flow of messages to continue unimpeded by the failure of any subset of messages.

    5. Message is consumed from the message retry queue.
    6. Each message has a 10-minute buffer that must elapse between delivery attempts.

      (Conditional) If the message has exceeded the maximum number of retry attempts (currently set to 12 to allow for at least two hours of retries), the message is considered a permanent failure and is dropped so that no further attempts to deliver the message are made. The process terminates here.

    7. (Conditional) If the message is ready to be retried, an HTTP request containing the message is sent to the configured URI of the matching subscription, and the process continues with Step 8.
      If the message is not ready to be retried, the current thread sleeps for 10 seconds, then restarts the process at Step 4.
    8. (Conditional) If the message delivery is successful, no further action is required.
      If the message delivery is unsuccessful, the process restarts at Step 1 as a new message retry.
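
The steps above can be sketched as a small simulation. This is not the Workfront implementation. It is a minimal model that uses the values stated in the steps (a 10-minute buffer, a maximum of 12 retries, and a 10-second sleep) and assumes a single worker, a simulated clock in place of real sleeping, and a caller-supplied send function in place of the HTTP request. The names Message, record_failure, and process_retries are hypothetical.

```python
import logging
from collections import deque
from dataclasses import dataclass

RETRY_BUFFER = 600  # seconds: 10-minute buffer between attempts (Step 6)
MAX_RETRIES = 12    # 12 retries x 10 minutes = 120 minutes = two hours
POLL_SLEEP = 10     # seconds slept when a message is not yet due (Step 7)


@dataclass
class Message:
    payload: str
    attempts: int = 0      # failed delivery attempts recorded so far
    last_attempt: int = 0  # simulated time of the latest failed attempt


def record_failure(queue, msg, now):
    """Steps 1-4: log the failure, count it, and queue the message for retry."""
    logging.warning("delivery failed: %s (failure %d)", msg.payload, msg.attempts + 1)
    msg.attempts += 1
    msg.last_attempt = now
    queue.append(msg)


def process_retries(queue, send, now=0):
    """Steps 5-8 on a simulated clock. Returns (delivered, dropped, now)."""
    delivered, dropped = [], []
    while queue:
        msg = queue.popleft()                    # Step 5
        if msg.attempts > MAX_RETRIES:           # Step 6: permanent failure
            dropped.append(msg)
        elif now - msg.last_attempt < RETRY_BUFFER:
            now += POLL_SLEEP                    # Step 7: simulated sleep
            queue.append(msg)                    # back to Step 4
        elif send(msg):                          # Step 7: attempt delivery
            delivered.append(msg)                # Step 8: success
        else:
            record_failure(queue, msg, now)      # Step 8: restart at Step 1
    return delivered, dropped, now
```

In this model, a message whose initial delivery fails at time 0 and whose endpoint never recovers is retried at 600-second intervals and dropped after the twelfth retry, at simulated time 7200 seconds (two hours).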