Things did stabilise for a while, and we were ready to close the incident, but the problem resurfaced. With the new infrastructure in place, hardware is no longer the issue; this is now purely a software problem that appears under high concurrency. We're approaching it from another angle that should provide a durable solution, even under heavy load. As a reminder, we're still only two engineers, and we aren't available throughout the whole day. Thanks for bearing with us a little longer!