Update - We unfortunately encountered an additional bug yesterday that threatened the stability of the underlying volume storage cluster (Ceph). As a result, there was a short period of additional storage disruption overnight (at about 21:45 NZ time) while we worked with support partners to stabilise the underlying Ceph cluster. This may have resulted in storage errors or hung task warnings within VMs at that time. We're now monitoring. Please let us know if you need support with any related issues.
Apr 16, 2024 - 10:01 NZST
Monitoring - We've identified a specific node that had been dragging storage performance down and have now removed it from the Ceph cluster. IO performance has recovered considerably since this change. We're continuing to monitor.
Apr 15, 2024 - 17:02 NZST
Investigating - Unfortunately, we've seen further indications of storage performance issues over the weekend (affecting Ceph volumes accessed through the Cinder Block service). The impact seems to have been less pronounced since Friday, though we expect VMs/users may still be experiencing intermittent latency spikes and lower average storage performance. We're continuing to investigate and will be escalating with our support partners today.

Please let us know if you experience any problems with your VMs.

Apr 15, 2024 - 09:28 NZST
Monitoring - A fix has been implemented and we are monitoring the results.
Apr 12, 2024 - 17:06 NZST
Update - We are continuing to work on this and are planning a switch restart.
Apr 12, 2024 - 15:10 NZST
Identified - The underlying network issue has recurred. Users may still experience some performance degradation, especially with storage, or other intermittent errors. We're continuing to investigate.
Apr 12, 2024 - 13:14 NZST
Monitoring - We've mitigated the underlying storage problem, which appears to be network-related but limited to specific hosts. Our investigation of the network issue is ongoing.
Apr 12, 2024 - 12:29 NZST
Identified - We're working to resolve a storage performance issue in the underlying Ceph storage cluster that provides all block storage for Flexible HPC Cloud VM instances. This may be noticeably impacting IO performance from within cloud instances, which could have knock-on intermittent impacts on services within tenant environments; examples include slow or incomplete login interactions, or database and web timeouts.
Apr 12, 2024 - 11:30 NZST
Update - We've managed to stabilise things, which has reduced the impact, but we don't have a root cause yet, so we are continuing to investigate.
Mar 15, 2024 - 08:59 NZDT
Investigating - We're seeing issues on the Maui Ancil Slurm cluster and impacts to Jupyter. This seems to be filesystem-related, though we are still investigating and escalating with our support partners.
Mar 14, 2024 - 20:42 NZDT

About This Site

New Zealand eScience Infrastructure High Performance Compute and Storage Service Status

Apply for Access Operational
Data Transfer Operational
Submit new HPC Jobs Operational
Jobs running on HPC Operational
Jupyter on NeSI (beta) Operational
HPC Storage Operational
Long-term Storage (Early Access) Operational
User Support System Operational
Flexible High Performance Cloud Operational
NeSI HPC Compute Infrastructure Operational
HPC Lander node Operational
HPC Login nodes - Māui Operational
HPC Login nodes - Mahuika Operational
HPC Compute nodes - Māui Operational
HPC Compute nodes - Mahuika Operational
Mahuika Extension nodes - Mahuika Operational
Māui Ancillary nodes Operational
Mahuika Ancillary nodes Operational
NeSI Storage Infrastructure Degraded Performance
HPC Shared Storage system Degraded Performance
Online storage Operational
Nearline storage Operational
Scratch storage Operational
NeSI Data Transfer Infrastructure Operational
NeSI HPC Facility (Greta Point, Wellington) DTN Operational
Flexible High Performance Cloud Services Operational (100.0% uptime over the past 90 days)
Virtual Compute Service Operational
Bare Metal Compute Service Operational
FlexiHPC Dashboard (web interface) Operational (100.0% uptime over the past 90 days)
FlexiHPC CLI interface Operational (100.0% uptime over the past 90 days)
Public API of the FlexiHPC Service Operational (100.0% uptime over the past 90 days)
Past Incidents
Apr 21, 2024

No incidents reported today.

Apr 20, 2024

No incidents reported.

Apr 19, 2024

No incidents reported.

Apr 18, 2024

No incidents reported.

Apr 17, 2024

No incidents reported.

Apr 16, 2024

Unresolved incident: Storage performance issue impacting virtual instances.

Apr 15, 2024
Apr 14, 2024

No incidents reported.

Apr 13, 2024

No incidents reported.

Apr 12, 2024
Completed - The scheduled maintenance has been completed.
Apr 12, 16:58 NZST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 12, 15:45 NZST
Scheduled - We've been experiencing stability issues in our network, which are impacting block storage and other services. A switch restart is needed for troubleshooting. This may result in short-term traffic disruption (a few minutes).
Apr 12, 15:14 NZST
Apr 11, 2024

No incidents reported.

Apr 10, 2024
Resolved - This incident has been resolved.
Apr 10, 15:01 NZST
Update - We are continuing to monitor for any further issues.
Apr 10, 14:43 NZST
Monitoring - A fix has been implemented and we are monitoring the results.
Apr 10, 14:43 NZST
Investigating - jupyter.nesi.org.nz is experiencing login issues at the moment. We are investigating the issue and apologise for the inconvenience. If you have further questions or queries, please contact us at support@nesi.org.nz.
Apr 10, 14:25 NZST
Apr 9, 2024

No incidents reported.

Apr 8, 2024

No incidents reported.

Apr 7, 2024

No incidents reported.