Update - We unfortunately encountered an additional bug yesterday that threatened the stability of the underlying volume storage cluster (Ceph). As a result, there was a short period of additional storage disruption overnight (at about 21:45 NZ time) while we worked with support partners to stabilise the underlying Ceph cluster. This may have resulted in storage errors or hung task warnings within VMs at that time. We're now monitoring. Please let us know if you need support with any related issues.
Apr 16, 2024 - 10:01 NZST
Monitoring - We've identified a specific node that has been dragging storage performance down and have now taken it out of the Ceph cluster. IO performance has recovered considerably since this change. We're continuing to monitor.
Apr 15, 2024 - 17:02 NZST
Investigating - Unfortunately, we've seen further indications of storage performance issues over the weekend (affecting Ceph volumes accessed through the Cinder block storage service). The impact seems to have been less pronounced since Friday, though we expect VMs/users to be experiencing intermittent latency spikes and lower average storage performance. We're continuing to investigate and will be escalating with our support partners today.
Please let us know if you experience any problems with your VMs.
Apr 15, 2024 - 09:28 NZST
Monitoring - A fix has been implemented and we are monitoring the results.
Apr 12, 2024 - 17:06 NZST
Update - We are continuing to work on this and are planning a switch restart.
Apr 12, 2024 - 15:10 NZST
Identified - The underlying network issue has reappeared. Users may still experience some performance degradation, especially with storage, or other intermittent errors. We're continuing to investigate.
Apr 12, 2024 - 13:14 NZST
Monitoring - We've mitigated the underlying storage problem, which appears to be network-related but limited to specific hosts. Investigation of the network issue is ongoing.
Apr 12, 2024 - 12:29 NZST
Identified - We're working to resolve a storage performance issue in the underlying Ceph storage cluster that provides all block storage for Flexible HPC Cloud VM instances. This may be noticeably impacting IO performance from within cloud instances, which could have intermittent knock-on impacts on services within tenant environments. Examples include slow or incomplete login interactions and database or web timeouts.
Apr 12, 2024 - 11:30 NZST