Resolved -
This incident has been resolved. A resource intensive process was initiated during a busy period that overwhelmed a subset of our internal resources and resulted in intermittent failure to complete requests. Manual scaling of component-specific resources was needed in addition to our automated scaling to recover. We have implemented some immediate efficiency improvements and are working on implementing additional larger improvements in the coming days.
May 15, 16:43 PDT
Update -
We are continuing to work on a long term fix and monitoring to ensure our mitigation is holding.
May 15, 13:23 PDT
Identified -
The issue has been identified and temporary mitigation put in place. We have not observed internal errors being surfaced in the last 20 minutes. We are continuing to monitor while we work on a longer term fix.
May 15, 11:06 PDT
Investigating -
We detected a spike in internal errors affecting some hosts. Behavior is intermittent and refreshing the page should result in successful load.
May 15, 10:36 PDT