How to Monitor Webhooks in Production (And Catch Failures Before They Break Your App)

6 min read · Updated March 17, 2026

Webhook integrations often look fine in development. Production is where the real problems show up: failed deliveries, retries, delayed events, and payload changes. By the time someone notices a problem, the useful context is often buried in logs or already gone.

Webhook testing tools help you simulate events during development, but most tooling is built for testing, not for ongoing monitoring of live traffic. In production, events fail to deliver, payloads change without notice, and volume drops go unnoticed until a user complains.

Webhook monitoring in production means capturing real webhook events, tracking schema changes over time, and alerting when integrations break. This guide covers what to watch for and how to catch failures early.

For more on delivery attempts, retries, and silence, see our companion guide on how to monitor webhook retries and failures.

Why webhook monitoring matters

Webhooks are external and asynchronous. Unlike a normal request/response flow where you get an immediate error or success, webhook failures are harder to spot. The provider sends a request; your endpoint may or may not process it correctly. A webhook can return 200 and still leave you with broken downstream behavior.

Without monitoring, you often find out something went wrong when a user reports it or a batch job fails. By then, the original event may be long gone.

What can go wrong with webhooks in production

In practice, production webhook issues fall into a few buckets:

Failed deliveries — The provider retries and eventually gives up. Your endpoint was down, slow, or returned non-2xx. You may not notice until you check the provider's dashboard.
Delayed or retried events — Events arrive out of order or with significant delay. Logic that assumes ordering can break.
Duplicated events — Retries can deliver the same event more than once. Idempotency matters; without it you risk double-processing.
Missing fields — The provider changed the payload. A field your code expects is gone. Handler logic that assumes it exists will throw.
Type changes — A field that was a number is now a string, or vice versa. Serialization, formatting, or calculations break.
Handler logic that silently fails — The webhook returns 200, but your handler throws after responding, or a downstream step fails. The provider thinks it succeeded; you have inconsistent state.
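
The duplicated-events case above is usually handled by making the handler idempotent. Here is a minimal sketch, assuming the provider puts a unique "id" and a "type" on every event (many do, but check yours). The class name, the in-memory set, and the sample payload are all hypothetical; a real service would keep seen IDs in a durable store.

```python
import json


class IdempotentProcessor:
    """Processes each webhook event at most once, keyed by the event ID."""

    def __init__(self):
        # In-memory set for illustration only. A real deployment would use a
        # durable store, for example a table with a unique constraint.
        self._seen_ids = set()
        self.processed = []

    def handle(self, raw_body: str) -> str:
        event = json.loads(raw_body)
        event_id = event["id"]  # assumption: provider sends a unique "id"
        if event_id in self._seen_ids:
            return "duplicate"
        self._apply(event)
        # Mark the ID only after the work succeeded, so a failed attempt
        # can still be retried by the provider.
        self._seen_ids.add(event_id)
        return "processed"

    def _apply(self, event: dict) -> None:
        # Placeholder for real business logic.
        self.processed.append(event["type"])
```

Marking the ID after the work completes means a crash mid-handler leaves the event eligible for reprocessing, which is usually what you want when the provider retries.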

I've seen teams spend hours tracing a bug that turned out to be a field that quietly disappeared from the payload three weeks ago. By the time someone notices, the original event may already be buried in logs.

What to monitor for

Useful monitoring covers several dimensions. Delivery success and failure rates tell you whether events are reaching your endpoint. Retry patterns — how often providers retry, and whether retries eventually succeed — indicate reliability. Volume changes matter too: a sudden drop in events might mean the provider stopped sending, or your endpoint is rejecting them.

Event type distribution helps spot anomalies. If you normally receive a mix of invoice.paid and customer.updated and one type disappears, something changed. Payload structure changes are harder to catch but often the root cause of subtle bugs. Unusual error spikes — handler exceptions, validation failures, downstream timeouts — round out the picture.
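
As a rough illustration of the volume and event-type checks described above, the sketch below compares a baseline window of event types against the current window. The function names and the min_baseline_count threshold are assumptions made for this example, not part of any particular tool.

```python
from collections import Counter


def missing_event_types(baseline_events, current_events, min_baseline_count=5):
    """Return event types that were common in the baseline but absent now."""
    baseline = Counter(baseline_events)
    current = Counter(current_events)
    return sorted(
        event_type
        for event_type, count in baseline.items()
        # Ignore rare types so one-off events do not trigger a false alarm.
        if count >= min_baseline_count and current[event_type] == 0
    )


def volume_drop_ratio(baseline_count, current_count):
    """Fraction by which volume fell versus the baseline (0.0 means no drop)."""
    if baseline_count == 0:
        return 0.0
    return max(0.0, (baseline_count - current_count) / baseline_count)
```

Feeding it two lists of event type strings, such as last week's and today's, is enough to spot a type like customer.updated going silent.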

Logging vs structured webhook monitoring

Raw logging is a reasonable starting point. Log each incoming webhook, maybe the event type and a few key fields. When something breaks, you grep the logs and hope the event is still there:

grep "invoice.payment_succeeded" app.log

Searching logs is painful when you need to compare events over time or understand patterns across retries and failures. Did the payload structure change between last week and today? Which events failed and why? Log aggregation helps, but unstructured logs don't answer schema-level questions.

Structured webhook monitoring captures events in a queryable form. You store the payload (or its schema), event metadata, delivery status, and timestamps. That lets you diff schemas, track failure rates per event type, and spot drift before it causes production issues.
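
To make "queryable form" concrete, here is one possible shape using Python's built-in sqlite3 module. The table name, column names, and the 'ok' / 'failed' status labels are assumptions chosen for this sketch; adapt them to whatever your handler actually records.

```python
import json
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a small event store. Schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS webhook_events (
               event_id    TEXT PRIMARY KEY,
               event_type  TEXT NOT NULL,
               received_at REAL NOT NULL,
               status      TEXT NOT NULL,
               payload     TEXT NOT NULL
           )"""
    )
    return conn


def record_event(conn, event_id, event_type, status, payload, received_at=None):
    """Store one event; a redelivery with the same ID overwrites the old row."""
    if received_at is None:
        received_at = time.time()
    conn.execute(
        "INSERT OR REPLACE INTO webhook_events VALUES (?, ?, ?, ?, ?)",
        (event_id, event_type, received_at, status, json.dumps(payload)),
    )
    conn.commit()


def failure_rate_by_type(conn):
    """Return {event_type: fraction of stored events with status 'failed'}."""
    rows = conn.execute(
        """SELECT event_type,
                  AVG(CASE WHEN status = 'failed' THEN 1.0 ELSE 0.0 END)
           FROM webhook_events
           GROUP BY event_type"""
    ).fetchall()
    return dict(rows)
```

With events in a table like this, questions such as "which event types failed most last week?" become a query instead of a log search.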

Detecting schema changes over time

One of the harder production problems is schema drift. The request still arrives. It parses. But the payload shape has changed enough to break assumptions — a field removed, a type changed, a nested object flattened. Your handler may return 200 while downstream logic fails.

Before

data.object.currency
data.object.amount_due

After

data.object.amount_due

Detecting this requires comparing payload structures across events. Extract the JSON schema (field paths and types) from each payload, normalize it, and diff against previously seen schemas for the same event type. For a detailed walkthrough, see our guide on detecting breaking changes in Stripe webhooks.
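
The extract-and-diff step can be sketched in a few lines. This is a simplified illustration, not a full JSON Schema implementation: it maps each dotted field path to a JSON type name and compares two such maps. The function names and the sample values ("usd", 1200) are made up for the example; the field paths mirror the before/after case above.

```python
def json_type(value):
    """Name the JSON type of a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def extract_schema(value, path=""):
    """Map every field path in a payload to its JSON type name."""
    schema = {}
    if path:
        schema[path] = json_type(value)
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            schema.update(extract_schema(child, child_path))
    elif isinstance(value, list):
        # Simplification: array items share one path, so with mixed item
        # types the last item seen determines the recorded type.
        for item in value:
            schema.update(extract_schema(item, f"{path}[]"))
    return schema


def diff_schemas(old, new):
    """Report removed paths, added paths, and paths whose type changed."""
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "type_changed": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }


before = {"data": {"object": {"currency": "usd", "amount_due": 1200}}}
after = {"data": {"object": {"amount_due": 1200}}}
print(diff_schemas(extract_schema(before), extract_schema(after)))
# {'removed': ['data.object.currency'], 'added': [], 'type_changed': []}
```

Removed paths and type changes are the ones most likely to break a handler; added paths are usually safe to treat as informational.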

Alerting on webhook issues

Alerts become useful when they signal real problems, not noise. Repeated failures for the same event type — the provider keeps retrying and failing — warrant an alert. Sudden delivery drops, where volume falls off a cliff, usually mean something is wrong.

Newly detected breaking schema changes are another good trigger. If a field disappears or a type changes in a way that could break your code, you want to know before users do. Unexpected error spikes in your handler or downstream systems also deserve attention.
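
One way to express "repeated failures for the same event type" as a rule is a sliding time window per event type. The sketch below is an assumption-laden starting point: the class name, the default of 3 failures in 300 seconds, and the boolean return value are all hypothetical, and it leaves out alert deduplication and delivery to a channel.

```python
from collections import defaultdict, deque


class FailureAlerter:
    """Signals when one event type fails `threshold` times in `window_seconds`."""

    def __init__(self, threshold=3, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # event_type -> failure timestamps

    def record_failure(self, event_type, timestamp):
        """Record a failure; return True if the alert condition is now met."""
        window = self._failures[event_type]
        window.append(timestamp)
        # Drop failures that are older than the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold
```

A single transient failure stays below the threshold, while a provider that keeps retrying and failing for the same event type crosses it quickly.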

HookHound helps teams detect structural webhook changes and notifies them when those changes matter. It distinguishes breaking from non-breaking changes and sends alerts to Slack or email so you can fix things before production breaks.

Tools for webhook observability in production

In practice you combine several approaches:

Application logs — Log incoming webhooks, handler results, and errors. Basic but essential. Use structured logging when possible.
Provider dashboards — Stripe, GitHub, and others show delivery status, retries, and failures. Check them when debugging delivery issues.
Request inspection tools — RequestBin, ngrok, Hookdeck. Good for capturing and inspecting individual requests. See our guide on webhook testing tools for development. For production debugging, our guide on debugging webhook integration failures in production covers inspecting real requests and comparing payloads over time.
Custom monitoring — Store events in a database or event store. Build dashboards for delivery rates, error rates, and volume. Add schema extraction and diffing if you need it.
Production webhook monitoring tools — Capture real events, extract schemas, detect changes over time, and alert when something breaks. Useful when you want schema monitoring without building it yourself.
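
For the application-logs item above, structured logging can be as simple as emitting one JSON object per webhook. This sketch uses only Python's standard logging and json modules; the JsonFormatter class, the log_webhook helper, and the field names (event_id, event_type, outcome) are assumptions for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Formats each log record as a single JSON object."""

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        # Fields passed via extra={"webhook": {...}} land on the record.
        entry.update(getattr(record, "webhook", {}))
        return json.dumps(entry, sort_keys=True)


def log_webhook(logger, event_id, event_type, outcome):
    """Log one handled webhook with machine-readable fields."""
    logger.info(
        "webhook handled",
        extra={"webhook": {
            "event_id": event_id,
            "event_type": event_type,
            "outcome": outcome,
        }},
    )
```

Attach JsonFormatter to a handler with handler.setFormatter(JsonFormatter()) and each line becomes something a log aggregator can filter by event_type or outcome instead of matching free text.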

FAQ

What is the best way to monitor webhooks?

There's no single best approach — it depends on your scale and criticality. Start with delivery success/failure tracking and error rates. Add schema monitoring if you depend on specific payload shapes. Combine application logs, provider dashboards, and structured event capture for a complete picture.

Are logs enough for webhook monitoring?

Logs work for basic debugging, but searching through them to compare events over time or spot patterns across retries is painful. When you need to answer "did the payload structure change?" or "why did delivery drop last week?", structured event capture and schema tracking are more useful.

How do I know if a webhook payload changed?

Capture incoming payloads, extract their JSON structure (field paths and types), and compare schemas across events. Breaking changes include removed fields, type changes, and structural shifts. See our guide on detecting breaking changes in Stripe webhooks for a concrete approach.

Should I alert on every webhook failure?

Usually no. Transient failures happen — network blips, brief provider outages. Alert on repeated failures for the same event, sudden delivery drops, or newly detected breaking schema changes. Tune thresholds so alerts matter instead of becoming noise.