Cloud Service Provider outage

Incident Report for FMX System

Postmortem

We want to thank you for using FMX to manage your operations. We understand that FMX is often a critical component of running your organization and therefore we take any service disruptions seriously. This postmortem report will help you to better understand what caused the interruption in service as well as how we plan to avoid issues like these in the future.

The root cause of the outage

On the evening of July 18th, we discovered that the FMX platform was unavailable. After investigating, we determined that the cause was a region-wide outage at the Microsoft Azure datacenters in the Central US region. The Azure team has published more details about the outage under Tracking ID: 1K80-N_8

Solution

To solve the problem, we simply monitored the situation and waited for the Azure remediations to be implemented.

Future mitigations

On July 30th, we conducted our own internal postmortem discussing future mitigation tactics that we can employ to reduce our reliance on an individual region or cloud provider, improve communication, and improve our monitoring capabilities. This discussion aligned very closely with the Azure Well-Architected Framework.

Regards,

FMX Team

Posted Jul 31, 2024 - 16:40 EDT

Resolved

This incident has been resolved by our cloud service provider. We will continue to keep an eye out for any further issues. Again, our sincere apologies for any service disruption.

Posted Jul 19, 2024 - 01:02 EDT

Monitoring

Our cloud provider has deployed a fix that appears to have resolved the current incident. We'll continue to monitor the situation and provide updates as they happen.

Posted Jul 18, 2024 - 22:47 EDT

Update

We are seeing sites come back online. We're continuing to monitor the incident to ensure our cloud service provider's mitigation is successful.

Posted Jul 18, 2024 - 21:43 EDT

Update

According to our cloud provider, they've "determined the underlying cause and are currently applying mitigation through multiple workstreams." We're continuing to monitor the situation and will provide updates as they become available.

Posted Jul 18, 2024 - 21:38 EDT

Update

We are continuing to monitor our cloud provider's status pages and will provide more updates as they become available.

Posted Jul 18, 2024 - 20:56 EDT

Investigating

Our cloud service provider is experiencing an outage. We will update this message as we learn more details. We apologize for the inconvenience.

Posted Jul 18, 2024 - 18:52 EDT

This incident affected: Web App, API, Email, and Reporting Dashboards.