
Retrying deliveries

This documentation is aimed at webhook senders, but it is also useful for those receiving webhooks.

Sometimes the delivery of a webhook fails. This can happen for several reasons, for example:

  • The receiver returned an error status code (such as a 4xx or 5xx status code)
  • The receiver's servers are down
  • DNS failed to resolve the receiver's host
  • The receiver's servers are experiencing an unusual traffic spike and cannot handle more requests

The highest priority in such scenarios is to not drop any webhooks, in order to avoid losing data. This is achieved by having a retry mechanism in place. Webhooks.uno automatically retries failed deliveries by default.

You can still control a few options of the retry mechanism in the WebhookDefinition.

The retry mechanism

When an attempt to deliver a webhook message to a receiver fails, the webhook message is preserved and delivery is attempted again later. Parameters such as how long delivery keeps being attempted before it is considered failed are configured in the WebhookDefinition.

A WebhookDefinition is associated with a Topic object. Thus, it is possible to have distinct retry behaviors for individual topics.

The retry mechanism is configured by two attributes of the WebhookDefinition:

  • retry_wait_factor: a multiplier value that controls how long and how often delivery will be retried
  • retry_max_retries: the maximum number of retries before the delivery is considered failed

Wait time

The time to wait between attempts is derived from the retry_wait_factor attribute. After a delivery attempt fails, wait_time is calculated with the equation below. It is the time, in seconds, to wait before attempting delivery again.

wait_time = (retry_wait_factor/100)*30 +
2**(attempt_counter*(retry_wait_factor/100)) +
rand(60)

where:

  • attempt_counter is the number of the attempt
  • retry_wait_factor is the multiplier value configured in the WebhookDefinition
  • rand(60) is a random factor: an integer from 0 to 59, in seconds
  • ** is the exponentiation operator

The value of attempt_counter starts from 1 and goes up to retry_max_retries. The first delivery attempt is not considered a retry, which means the first retry attempt only occurs after the webhook delivery failed at least once.

The retry_wait_factor must be an integer >= 10 and <= 200.
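
As a minimal sketch, the equation above can be written in Python as follows. The function name wait_time and the optional jitter parameter are assumptions made for illustration (they are not part of Webhooks.uno); jitter only exists so the random factor can be pinned to a known value.

```python
import random
from typing import Optional


def wait_time(attempt_counter: int, retry_wait_factor: int,
              jitter: Optional[int] = None) -> float:
    """Seconds to wait after a failed attempt, following the documented equation.

    `jitter` is a hypothetical parameter for this sketch: pass a fixed value to
    make the result deterministic, or leave it as None for a random 0..59.
    """
    if not 10 <= retry_wait_factor <= 200:
        raise ValueError("retry_wait_factor must be an integer >= 10 and <= 200")
    if attempt_counter < 1:
        raise ValueError("attempt_counter starts from 1")

    factor = retry_wait_factor / 100
    if jitter is None:
        jitter = random.randrange(60)  # integer within 0 to 59, like rand(60)

    return factor * 30 + 2 ** (attempt_counter * factor) + jitter
```

For example, wait_time(1, 100, jitter=0) evaluates to 32.0 seconds: 30 from the linear term plus 2 from the exponential term.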

The following table shows the wait times for wait factors (denoted f) of 100 and 150. You can use this table as a reference when adjusting your values. The random factor is assumed to be zero.

To experiment with different values, check out this Repl.it.

Attempt   f=100 wait   f=100 accum time*   f=150 wait   f=150 accum time*
1         32s          32s                 48s          48s
2         34s          1m 6s               53s          1m 41s
3         38s          1m 44s              1m 8s        2m 49s
4         46s          2m 30s              1m 49s       4m 38s
5         1m 2s        3m 32s              3m 47s       8m 25s
6         1m 34s       5m 6s               9m 17s       17m 42s
7         2m 38s       7m 44s              24m 54s      42m 36s
8         4m 46s       12m 30s             1h 9m        1h 51m
9         9m 2s        21m 32s             3h 13m       5h 5m
10        17m 34s      39m 6s              9h 6m        14h 12m
11        34m 38s      1h 13m              1d 1h        1d 15h
12        1h 8m        2h 22m              3d 0h        4d 16h
13        2h 17m       4h 39m              8d 13h       13d 6h
14        4h 33m       9h 13m              24d 6h       37d 13h
15        9h 6m        18h 19m             68d 15h      106d 5h

The wait columns show the time to wait for the next retry after each attempt. The accum time shows the accumulated waited time since the first delivery was attempted.
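
To try other wait factors, a schedule of waits and accumulated times can be computed with a short script. This is only a sketch: the names retry_schedule and humanize are made up for illustration, the random factor is taken as zero, and rounding each wait up to a whole second is an assumption about how the reference values were produced.

```python
import math


def retry_schedule(retry_wait_factor: int, retry_max_retries: int):
    """Return (attempt, wait, accumulated) tuples, in whole seconds.

    The random factor is left out, and each wait is rounded up before it is
    added to the running total (an assumption of this sketch).
    """
    factor = retry_wait_factor / 100
    schedule = []
    accumulated = 0
    for attempt in range(1, retry_max_retries + 1):
        wait = math.ceil(factor * 30 + 2 ** (attempt * factor))
        accumulated += wait
        schedule.append((attempt, wait, accumulated))
    return schedule


def humanize(seconds: int) -> str:
    """Format seconds using the two largest units, e.g. '1m 6s' or '3d 0h'."""
    minutes, secs = divmod(seconds, 60)
    hours, mins = divmod(minutes, 60)
    days, hrs = divmod(hours, 24)
    if days:
        return f"{days}d {hrs}h"
    if hours:
        return f"{hours}h {mins}m"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


if __name__ == "__main__":
    for attempt, wait, accumulated in retry_schedule(150, 15):
        print(attempt, humanize(wait), humanize(accumulated))
```

Because the exponential term dominates after the first few attempts, small changes to retry_wait_factor have a large effect on the later rows.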

Maximum retry attempts

Retrying delivery forever is probably not a good idea. Thus, the maximum number of retries that will be performed is controlled by the retry_max_retries attribute.

As an example, let's say that retry_max_retries = 2. In this case, delivery of the webhook message will be attempted up to three times: once initially, plus two retries if the attempts keep failing.

If retry_max_retries is zero, then there will be no retries. That is, only the initial delivery attempt will be performed. If it fails, no further attempts will be made.
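
The relationship between retry_max_retries and the total number of attempts can be sketched as a simple loop. This is an illustration only, not how Webhooks.uno is implemented: deliver_with_retries, send and sleep are hypothetical names, and a real system would schedule the retry for later rather than block while waiting.

```python
import random
import time
from typing import Callable


def deliver_with_retries(send: Callable[[], bool],
                         retry_wait_factor: int,
                         retry_max_retries: int,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try delivery once, then retry at most `retry_max_retries` times.

    `send` is a hypothetical callable returning True when delivery succeeds.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    factor = retry_wait_factor / 100

    # The first delivery attempt is not counted as a retry.
    if send():
        return True

    for attempt_counter in range(1, retry_max_retries + 1):
        wait = (factor * 30
                + 2 ** (attempt_counter * factor)
                + random.randrange(60))
        sleep(wait)
        if send():
            return True

    # All retries were used up: the delivery is considered failed.
    return False
```

With retry_max_retries = 2 and a receiver that always fails, send is called three times; with retry_max_retries = 0 it is called exactly once.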

Reliability considerations

When adjusting the retry_* attributes of a WebhookDefinition, keep in mind that webhook messages that are waiting to be delivered are kept in the Redis database of webhooks.uno.

Since Redis keeps its data in RAM, long waiting times mean that undelivered messages stay in memory for longer, and your Redis deployment could run out of memory.
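
A rough back-of-the-envelope estimate can help when choosing values. The sketch below assumes a steady rate of failing messages and an average message size, both hypothetical inputs you would need to measure yourself; it ignores Redis' own per-key overhead, so treat the result as a lower bound rather than a sizing guarantee.

```python
def max_retention_seconds(retry_wait_factor: int, retry_max_retries: int) -> float:
    """Worst-case time a message keeps waiting, with the random factor at 59s."""
    factor = retry_wait_factor / 100
    return sum(
        factor * 30 + 2 ** (attempt * factor) + 59
        for attempt in range(1, retry_max_retries + 1)
    )


def estimated_backlog_bytes(failing_messages_per_second: float,
                            avg_message_bytes: int,
                            retry_wait_factor: int,
                            retry_max_retries: int) -> float:
    """Approximate payload bytes waiting for retry during a sustained outage.

    Uses backlog = arrival rate * retention time * message size. The rate and
    size arguments are assumptions of this sketch, not Webhooks.uno settings.
    """
    retention = max_retention_seconds(retry_wait_factor, retry_max_retries)
    return failing_messages_per_second * retention * avg_message_bytes
```

For instance, with retry_wait_factor = 100 and retry_max_retries = 2, a message waits at most 184 seconds in total, so one failing 1000-byte message per second adds up to roughly 184000 bytes of pending payload.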