Earlier today, from roughly 19:04 to 19:09 UTC, message sends and webhook deliveries returned 500s across the fleet.
Our Valkey instance was OOM-killed at 19:01. The pod's memory limit was 512 MiB, a low default left over from initial setup, and Valkey itself had no maxmemory ceiling configured. With no ceiling, the configured eviction policy had no threshold to trigger on, so normal growth carried the working set past the pod limit without any eviction taking place. Valkey came back about a second later, but because it's an in-memory store, all ephemeral state was wiped, including the snowflake node-ID leases held by every API pod. A few minutes later every API pod tried to renew its lease, found it gone, and treated the loss as terminal rather than recoverable, and that failure is what surfaced as 500s. We mitigated by rolling both API deployments, which let pods acquire fresh leases on startup.
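To make "recoverable rather than terminal" concrete, here is a minimal sketch of lease renewal that re-acquires a lost lease instead of failing. It is an illustration under stated assumptions, not the code running in the API pods: the InMemoryStore class stands in for Valkey, and every name in it (NodeIdLease, the snowflake:node: key prefix, the 30-second TTL, the 1024-ID pool) is hypothetical.

```python
from typing import Dict, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for Valkey: maps key -> (owner, expiry time)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def _live_owner(self, key: str, now: float) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]

    def set_if_absent(self, key: str, owner: str, ttl: float, now: float) -> bool:
        """Claim the key only if nobody currently holds it."""
        if self._live_owner(key, now) is not None:
            return False
        self._data[key] = (owner, now + ttl)
        return True

    def extend_if_owner(self, key: str, owner: str, ttl: float, now: float) -> bool:
        """Push the expiry out only if `owner` still holds the key."""
        if self._live_owner(key, now) != owner:
            return False
        self._data[key] = (owner, now + ttl)
        return True

    def flush(self) -> None:
        """Simulate a restart of the store: all state is lost."""
        self._data.clear()


class NodeIdLease:
    """Hypothetical per-pod lease on a snowflake node ID."""

    def __init__(self, store: InMemoryStore, owner: str,
                 max_nodes: int = 1024, ttl: float = 30.0) -> None:
        self.store = store
        self.owner = owner
        self.max_nodes = max_nodes
        self.ttl = ttl
        self.node_id: Optional[int] = None

    @staticmethod
    def _key(node_id: int) -> str:
        return f"snowflake:node:{node_id}"

    def acquire(self, now: float) -> int:
        """Claim the lowest free node ID."""
        for candidate in range(self.max_nodes):
            if self.store.set_if_absent(self._key(candidate), self.owner, self.ttl, now):
                self.node_id = candidate
                return candidate
        raise RuntimeError("no free node IDs")

    def renew(self, now: float) -> int:
        """Extend the lease; if it has vanished, recover instead of failing."""
        if self.node_id is not None:
            key = self._key(self.node_id)
            if self.store.extend_if_owner(key, self.owner, self.ttl, now):
                return self.node_id
            # Lease is gone (store restart or expiry). Try to reclaim the
            # same ID first so pods that have not yet renewed keep theirs.
            if self.store.set_if_absent(key, self.owner, self.ttl, now):
                return self.node_id
        # Our old ID was taken by someone else, or we never had one.
        return self.acquire(now)
```

In this sketch a caller would have to check the value returned by renew and stop issuing IDs under the old node ID if it changed. Reclaiming the previous ID first narrows, but does not close, the window in which a newly started pod could claim an ID that an existing pod is still using before its next renewal.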
Apologies for the disruption, and thanks for flying Fluxer.