Object storage and ceph-hdd volume issue

Incident Report for NeSI Status

Resolved

We're pleased to advise that, after juggling objects between drives and nodes for a couple of weeks, Ceph has now worked its way to a much more even data layout.
Posted Mar 11, 2025 - 12:19 NZDT

Identified

We have a data imbalance issue within the distributed storage that underpins the FlexiHPC object storage service and "ceph-hdd" type volumes. This caused the object store to report intermittent errors to clients over the weekend when they attempted to write new data. Cloud instances using the ceph-hdd volume type may have experienced intermittent I/O errors and/or high latency. We expect most FlexiHPC users to be unaffected, as ceph-ssd is the default volume type and is not impacted.

Due to the amount of data involved, resolving this issue and rebalancing the cluster may take some time. Please stay tuned for updates. If your deployment has been impacted by these issues and you need assistance recovering, please reach out to support.
Posted Feb 17, 2025 - 10:20 NZDT
This incident affected: Flexible High Performance Cloud Services (Public API of the FlexiHPC Service) and Flexible High Performance Cloud.