Fluxer - We're letting people back in slowly! – Incident details

Fluxer API (api.fluxer.app) experiencing degraded performance

Resolved
Degraded performance
Started about 1 month ago
Lasted 2 days

Affected

Fluxer API (api.fluxer.app)

Major outage from 7:13 AM to 2:50 PM, Degraded performance from 2:50 PM to 4:39 PM, Under maintenance from 4:39 PM to 7:31 AM, Operational from 7:31 AM to 9:58 AM

Updates
  • Resolved
    This incident has been resolved.
  • Monitoring

    It's finally here: we're going to start letting traffic in slowly, and then probably more rapidly. We will be rolling out sessions in smaller groups to keep a close eye on the infrastructure as we do this, but since it is a lower-traffic time of day, it shouldn't be an issue (hopefully... knock on wood).
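
    For readers curious what admitting sessions in smaller groups can look like, here is a minimal sketch. It is an assumption about the general technique, not a description of how Fluxer actually gates sessions: each user ID is hashed into one of 100 stable buckets, and raising a single percentage admits more buckets over time. The name `is_admitted` and the bucket count are hypothetical.

```python
import hashlib


def is_admitted(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the current rollout percentage.

    Hypothetical helper: hashing gives every user a stable bucket in 0..99,
    so anyone admitted at 10% stays admitted when the dial moves to 50%.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

    Because the bucket depends only on the user ID, the rollout can be paused or widened without anyone who is already in losing access.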

  • Update

    We're awake again and working through the remaining pre-takeoff checklist!

  • Update

    Sharing exact ETAs hasn't worked out so well so far, but there isn’t much left to do, and the two people working on this migration could really use some catch-up sleep!

  • Investigating

    And right as I posted that update, it went down again.

    We’re going to limit access to the app and show a clear message while we finish the migration. These outages are pulling time away from the migration work, and the team working on it is extremely small. Right now it is a single person, me.

    We do have one new hire though, and are working towards expanding the team!

    A massive surge of new signups is also overwhelming the single production server we are working to migrate away from. That server is currently serving about 120,000 users, and we received those users in just two weeks.

    If everything goes as expected, things will be back up and better, fully on the new environment, at 10 AM UTC on Saturday! That includes wider voice server coverage in Johannesburg, Mumbai, São Paulo, Sydney, Tokyo, Miami, Dallas, Madrid, Frankfurt, Nuremberg, and Stockholm, with more to come, plus improved anti-abuse and platform moderation tools to fight spam and raids.

    I can also reveal that we've got a surprise for all pre-existing Plutonium and lifetime Visionary users, and all non-paying users too, as soon as everything is back and running.

    Thanks for your patience, and have an awesome weekend!

  • Resolved

    We're really sorry about the downtime!

    Things are going to get better soon, but right now we have to keep two worlds alive at the same time. We have to maintain the old production environment that is already overloaded, and we also have to keep working through issues in the new environment we're trying to move everyone to.

    The hard part is that a lot of people want back in all at once, and most things are still running on that old environment. That creates the classic thundering herd effect. Requests pile up, some time out, clients retry, the retries add even more load, and it can spiral into downtime across multiple layers of the stack.

    We've had to tweak a lot of things to blunt that surge and break the feedback loop. These are well-documented symptoms of systems that scale fast, including Discord in its early days.

    We honestly did not expect to be operating at this scale so quickly, and we are an extremely small team. It is basically a single person driving the core work (with one new hire just today!). We're trying to do better <3
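
    One common way to blunt the retry spiral described above is for clients to wait a randomized, exponentially growing time between retries, so they do not all come back at the same instant. The sketch below illustrates that general technique (exponential backoff with full jitter). It is not Fluxer's actual client or server logic, and the function name and default values are assumptions chosen for illustration.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return one randomized wait time (in seconds) per retry attempt.

    Hypothetical helper: the upper bound doubles on each attempt up to
    `cap`, and the actual wait is drawn uniformly between 0 and that bound
    so that many clients retrying together spread out over time.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays
```

    A client would sleep for each delay in turn before retrying and give up once the list is exhausted, which keeps failed requests from immediately adding load to a server that is already struggling.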

  • Identified

    The API has been brought back online. However, the real-time Gateway is being slammed with requests to bring your communities back online. We have identified the slowness and are working on unclogging the queue.

  • Monitoring

    We implemented a fix to get everyone back in and are currently monitoring the result. You may experience some missing servers and instability.

  • Identified

    We are currently working on resolving the elevated error rates on the API.