The gateway has been running smoothly for the past 10 days, with no major issues.
We've now scheduled maintenance to move our gateway infrastructure onto a new code version. For context, that infrastructure is a cluster of 54 nodes spread across six specialised tiers for redundancy and load distribution, and it's served us very well. The new version introduces a system for safe, durable, and strongly consistent code updates that can roll out across the whole cluster in seconds, all through a standardised process that minimises human error. It also stabilises the wider deployment pipeline, should we ever need to carry out a rolling deployment of our stateful tiers.
Our lead scientist, Rick Sanchez, reckons it'll be a quick twenty-minute adventure, in and out, though you might want to take that with a pinch of salt. The work we've put into boosting our backend's performance and helping it absorb load means we can promise a faster recovery time than on previous occasions.
Once this new durable hotpatch deployment system is in place, we shouldn't ever need to "restart Fluxer" again, unless something catastrophic triggers cascading failures in our real-time stack. We're aiming for that 99.99% uptime!
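To put that 99.99% figure in perspective, here is a rough back-of-the-envelope sketch of how much downtime such a target leaves room for. The function name `downtime_budget_minutes` and the 365-day and 30-day periods are illustrative assumptions, not part of our tooling, and whether planned maintenance counts against the target is also an assumption.

```python
def downtime_budget_minutes(uptime_target: float, period_days: float) -> float:
    """Return the minutes of downtime allowed over `period_days`
    for an uptime target given as a fraction (e.g. 0.9999).

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be a fraction between 0 and 1")
    minutes_in_period = period_days * 24 * 60
    return (1.0 - uptime_target) * minutes_in_period


if __name__ == "__main__":
    yearly = downtime_budget_minutes(0.9999, 365)
    monthly = downtime_budget_minutes(0.9999, 30)
    print(f"Yearly budget:  {yearly:.1f} minutes")   # about 52.6 minutes
    print(f"Monthly budget: {monthly:.1f} minutes")  # about 4.3 minutes

    # Share of the yearly budget a twenty-minute window would use,
    # assuming planned maintenance is counted as downtime.
    print(f"Twenty minutes is {20 / yearly:.0%} of the yearly budget")
```

In other words, under these assumptions a single twenty-minute restart would use up well over a third of a year's allowance, which is why avoiding full restarts matters so much for that goal.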