Webhooks: How to Design for Missed Events
Webhooks are the backbone of real-time integrations - but they are not perfectly reliable. Network failures, server downtime, and rate limiting mean webhooks can be missed or arrive out of order. Here is how to design systems that stay consistent even when webhooks fail.
Why Shopify Webhooks Can Fail
Shopify webhooks are highly reliable, but they operate under best-effort delivery semantics. They are not guaranteed. Here are common failure modes:
1. Endpoint Downtime
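Your webhook endpoint might be down during deployments, infrastructure failures, or scaling events. Shopify retries failed deliveries with backoff, but only for a limited window (its documentation has described 8 retries over 4 hours; check the current docs for the exact schedule). If every attempt in that window fails, the event is lost, and repeated failures can lead Shopify to remove the webhook subscription altogether.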
2. Network Timeouts
Shopify enforces a short timeout on webhook delivery (five seconds, per its documentation). If your endpoint does not respond within this window, the delivery counts as failed and is retried like any other failure. Once the retries are exhausted, the event is lost.
3. Rate Limiting
If your endpoint receives more webhooks than it can handle, it might return 429 (Too Many Requests) or drop connections. Shopify treats any non-2xx response as a failed delivery and retries, which adds load to an endpoint that is already saturated.
4. Out-of-Order Delivery
Webhooks can arrive out of order, because delivery order is not guaranteed. An orders/updated webhook might arrive before the orders/create webhook for the same order. This is especially common during high traffic or when retries are involved.
Design Pattern 1: Webhook + Polling Hybrid
Do not rely solely on webhooks. Use webhooks for real-time notifications, and run a periodic polling job as a fallback: query the API for records changed since the last run and repair anything your webhook handlers missed.
Design Pattern 2: Event Replay
Store every webhook payload in a durable event log before processing it. If you later discover missed events or processing errors, you can replay the affected events from the log instead of waiting for a redelivery that may never come.
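Here is one way the log could look, sketched in Python with SQLite standing in for whatever durable store you use. The table name, column names, and the handler signature are assumptions made for this example.

```python
import json
import sqlite3
from typing import Callable


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and create if needed) the event log. Use a file path in production."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS webhook_events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               topic TEXT NOT NULL,
               payload TEXT NOT NULL,
               received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
               processed_at TEXT
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, topic: str, raw_body: bytes) -> int:
    """Persist the raw payload first, before any business logic runs."""
    cur = conn.execute(
        "INSERT INTO webhook_events (topic, payload) VALUES (?, ?)",
        (topic, raw_body.decode("utf-8")),
    )
    conn.commit()
    return cur.lastrowid


def replay(conn: sqlite3.Connection, handler: Callable[[str, dict], None]) -> int:
    """Run handler over every unprocessed event, oldest first.

    Events whose handler raises stay unprocessed and are picked up
    by the next replay. Returns the number of events processed.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM webhook_events "
        "WHERE processed_at IS NULL ORDER BY id"
    ).fetchall()
    processed = 0
    for event_id, topic, payload in rows:
        try:
            handler(topic, json.loads(payload))
        except Exception:
            continue  # leave the row unprocessed for a later replay
        conn.execute(
            "UPDATE webhook_events SET processed_at = CURRENT_TIMESTAMP WHERE id = ?",
            (event_id,),
        )
        conn.commit()
        processed += 1
    return processed
```

Because a replay can run the same event through the handler more than once, this pattern only works safely when the handler is idempotent.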
Design Pattern 3: Idempotency Keys
Attach an idempotency key to every webhook you process, and record the key once the work is done. Duplicate deliveries and replays then hit the recorded key and do not cause duplicate side effects.
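A small Python sketch of the idea, again using SQLite as a stand-in store. The key would typically be a unique identifier that arrives with the delivery (Shopify sends a webhook id header that can serve this purpose; verify the exact header name against the current docs). The table name and the process_once helper are hypothetical.

```python
import sqlite3
from typing import Callable


def make_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_webhooks (idempotency_key TEXT PRIMARY KEY)"
    )
    return conn


def process_once(
    conn: sqlite3.Connection, key: str, side_effect: Callable[[], None]
) -> bool:
    """Run side_effect only the first time key is seen.

    Returns True if the side effect ran, False if the key was a duplicate.
    The key is claimed and the side effect runs inside one transaction, so
    if the side effect raises, the claim is rolled back and a retry can succeed.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        try:
            conn.execute(
                "INSERT INTO processed_webhooks (idempotency_key) VALUES (?)", (key,)
            )
        except sqlite3.IntegrityError:
            return False  # primary key already exists: duplicate delivery
        side_effect()
    return True
```

The rollback only protects work done in the same database. If the side effect calls an external system (sending an email, charging a card), pass the same key along to that system where it supports one.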
Design Pattern 4: Delivery Receipts
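Record a receipt for every webhook you process successfully. Then compare those receipts against what you expected to receive (for example, the records a polling job returns) and alert on anything missing, or on a topic that has gone unexpectedly quiet.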
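The sketch below shows both checks in Python with an in-memory ledger. ReceiptLedger and its method names are made up for this example, and the list of expected ids is assumed to come from some independent source such as a polling job.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class ReceiptLedger:
    # Maps (topic, resource id) to the time the webhook was processed.
    receipts: dict[tuple[str, int], datetime] = field(default_factory=dict)

    def mark_processed(
        self, topic: str, resource_id: int, at: Optional[datetime] = None
    ) -> None:
        """Call this only after the webhook has been handled successfully."""
        self.receipts[(topic, resource_id)] = at or datetime.now(timezone.utc)

    def missing(self, topic: str, expected_ids: Iterable[int]) -> list[int]:
        """Ids we expected a webhook for but hold no receipt for."""
        return sorted(i for i in set(expected_ids) if (topic, i) not in self.receipts)

    def is_silent(self, topic: str, now: datetime, max_quiet: timedelta) -> bool:
        """True if no webhook for topic was processed within max_quiet."""
        times = [t for (tp, _), t in self.receipts.items() if tp == topic]
        return not times or now - max(times) > max_quiet
```

Wire the results of missing and is_silent into whatever alerting you already use. The right max_quiet depends on the store's normal traffic: a quiet hour is alarming for a busy shop and routine for a small one.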
Design Pattern 5: Ordering Guarantees
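Handle out-of-order webhooks gracefully by storing a version or last-updated timestamp for each entity and ignoring any webhook that is older than the state you already hold. Treat updates as upserts, so an update that arrives before its create still produces a valid record.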
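As a rough illustration in Python, assuming each payload carries an id and an ISO-8601 updated_at with a UTC offset. The in-memory store and the apply_if_newer name are placeholders for your own persistence layer.

```python
from datetime import datetime


def apply_if_newer(store: dict[int, dict], payload: dict) -> bool:
    """Upsert payload unless the store already holds a same-age or newer version.

    Returns True if the payload was applied, False if it was stale.
    """
    incoming = datetime.fromisoformat(payload["updated_at"])
    current = store.get(payload["id"])
    if current is not None:
        held = datetime.fromisoformat(current["updated_at"])
        if held >= incoming:
            return False  # stale or duplicate: keep what we have
    # Unknown entity or newer version: write it, whichever topic it came from.
    store[payload["id"]] = payload
    return True
```

With this in place, an update that arrives first creates the record, and the late create is recognised as older and skipped. In a real database, do the comparison and the write in one atomic statement or transaction so two concurrent deliveries cannot both pass the check.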
Conclusion
Webhooks are powerful but imperfect. Production systems must assume webhooks will fail and design accordingly. The patterns above are not optional extras - they are reliability requirements. Implement them from day one, not after your first production incident.