onotify: A scalable, flexible Alertmanager

Prometheus has quickly become the de facto standard for collecting, storing, and alerting on time series metrics. However, adopting Prometheus locks you into the Prometheus ecosystem: exporters, middleware boxes, and, critically, Alertmanager. Despite this tight coupling, Alertmanager has traditionally avoided implementing the business logic necessary to function as a fully featured alerting platform, instead delegating that responsibility to downstream alerting sinks such as PagerDuty or Slack. In this talk, I will highlight these limitations and contextualize them within Prometheus’s historical development. I will then introduce onotify, a drop-in serverless replacement for Alertmanager built on top of Cloudflare Workers. onotify addresses many of Alertmanager’s business logic challenges, including multi-tenancy, rate limiting, scaling, and alert meta-observability, while scaling almost infinitely and providing the tools that platform teams need to manage their alerting system as they grow to thousands, or even tens of thousands, of alerts.

Speaker

  • Colin Douch
    DuckDuckGo

    Colin previously tech led the Observability Team at Cloudflare and currently works at DuckDuckGo, focusing on improving the external reliability and internal observability of DuckDuckGo’s increasingly complex set of privacy-preserving products. Starting in mining, he has been working, advising, and researching in the Monitoring and Observability space for close to 15 years, gaining a wide perspective on the difficulties that modern companies, big and small, face in properly introspecting their systems. Originally from New Zealand, he currently lives in London, where he frequently runs talks on Observability developments, introducing new graduates to the world of Observability and usually teaching the old-timers something new too.

Date

Nov 20 2025

Time

14:45 - 15:15

Location

Elisabeth