In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 02, 2025 - 15:00 NZDT
Scheduled - We are upgrading the core software underlying our Flexible HPC research cloud platform. This upgrade is essential for improving the stability, performance, security and reliability of the Flexible HPC research cloud platform.

There is no specific action required from users relating to this upgrade. If you experience any issues during this period, our advice is to wait and retry after the change has completed. You are welcome to contact us if you do experience any issues.

We understand that these interruptions may cause inconvenience, and we appreciate your support as we undertake important work in maintaining and improving our infrastructure. We aim to minimise downtime and disruption and keep you informed throughout.

Technical Information

This change is to the underlying OpenStack environment, upgrading OpenStack from the currently deployed 2023.1 Antelope release to the 2024.1 Caracal release.

This release includes a range of new features and fixes to address bugs, improve security, and enhance performance. Highlights of the 2024.1 Caracal release are listed at: https://releases.openstack.org/caracal/highlights.html

Impact

During the upgrade, you might experience brief periods where the web dashboard or the APIs for programmatic access to the platform are unavailable.

Users might experience temporary failures in launching new compute instances, creating storage volumes, or managing network resources due to these brief API interruptions.
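
For users who script against the platform, the "wait and retry" advice can be sketched roughly as below. This is an illustration only, not an official client: the function name `retry_with_backoff`, its parameters, and the choice of `ConnectionError` and `TimeoutError` as the retryable errors are all assumptions, so adjust `retry_on` to whatever exceptions your own client library raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 2.0,
    factor: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff.

    Hypothetical helper: the names and defaults are illustrative assumptions.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)  # wait before trying again
            delay *= factor  # wait longer after each failure
    raise ValueError("attempts must be at least 1")
```

A call that launches an instance or creates a volume could be wrapped as `retry_with_backoff(lambda: my_client_call())`, where `my_client_call` stands in for your own code. With the defaults above, the waits between attempts are 2, 4, 8 and 16 seconds.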

Upgrade Process

This is an automated rolling upgrade with platform services upgraded sequentially to minimise overall downtime.

Communication

We will provide regular updates on this Statuspage.

For support during the upgrade or at any other time, please contact us: support@nesi.org.nz

Apr 2, 2025 15:00 - Apr 3, 2025 15:00 NZDT
Monitoring - We're pleased to advise that we have been able to narrow down the issue and apply upgrades that seem to have resolved all the related problems we're aware of.

The actual issue was some buggy ARP handling behaviour in OVN, which caused some unlucky packets to disappear into a network black hole.

Apr 02, 2025 - 14:02 NZDT
Investigating - We have identified an ongoing issue with external connectivity via Neutron router gateways and/or Floating IPs. We are still investigating the root cause of this issue and have identified temporary workarounds.

Users may randomly run into issues with external connectivity (via Neutron router gateways and/or Floating IPs) breaking in weird and wonderful ways. In the simpler cases, connectivity into a Floating IP just stops working, yet internal connectivity with the instance in question is fine and it can connect out externally. However, K8s clusters seem to experience a more subtle degradation resulting in timeouts talking to external APIs (including Flexi OpenStack itself). We've seen examples where connections from some source ports work and connections from others don't.

If you experience any issues like this, please raise a support request with the details of the network, router or subnet in question and we will implement a workaround for you.
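
If you want to gather details for a support request, the source-port symptom described above can be checked with something like the following. It is a minimal sketch under assumptions, not a NeSI tool: the function name `probe_source_ports` is hypothetical, it handles IPv4 TCP only, and a source port that is already in use locally is reported as a failure just like a timeout.

```python
import socket
from typing import Dict, Iterable


def probe_source_ports(
    host: str,
    port: int,
    source_ports: Iterable[int],
    timeout: float = 3.0,
) -> Dict[int, bool]:
    """Try one TCP connection to (host, port) from each local source port.

    Returns a mapping of source port -> True if the connection succeeded.
    Hypothetical helper for illustration only.
    """
    results: Dict[int, bool] = {}
    for src in source_ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow quick re-use of a source port between repeated runs.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(timeout)
        try:
            sock.bind(("", src))  # pin the local (source) port
            sock.connect((host, port))
            results[src] = True
        except OSError:
            # Covers timeouts, refused connections and bind failures.
            results[src] = False
        finally:
            sock.close()
    return results
```

Calling `probe_source_ports("api.example.org", 443, range(40000, 40020))` (the hostname is a placeholder) from an affected instance gives a per-port result; a mix of `True` and `False` values would be consistent with the behaviour described in this incident and is useful to include in a support request.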

Mar 11, 2025 - 12:26 NZDT

About This Site

New Zealand eScience Infrastructure High Performance Compute and Storage Service Status

Apply for Access Operational
Data Transfer Operational
Submit new HPC Jobs Operational
Jobs running on HPC Operational
Jupyter on NeSI (beta) Operational
NeSI OnDemand Operational
99.8 % uptime over the past 90 days
HPC Storage Operational
Long-term Storage (Early Access) Operational
User Support System Operational
Flexible High Performance Cloud Under Maintenance
NeSI HPC Compute Infrastructure Operational
HPC Lander node Operational
HPC Login nodes - Māui Operational
HPC Login nodes - Mahuika Operational
HPC Compute nodes - Māui Operational
HPC Compute nodes - Mahuika Operational
Mahuika Extension nodes - Mahuika Operational
Māui Ancillary nodes Operational
Mahuika Ancillary nodes Operational
NeSI Storage Infrastructure Operational
HPC Shared Storage system Operational
Online storage Operational
Nearline storage Operational
Scratch storage Operational
NeSI Data Transfer Infrastructure Operational
NeSI HPC Facility (Greta Point, Wellington) DTN Operational
Flexible High Performance Cloud Services Under Maintenance
99.97 % uptime over the past 90 days
Virtual Compute Service Operational
Bare Metal Compute Service Operational
FlexiHPC Dashboard (web interface) Under Maintenance
99.97 % uptime over the past 90 days
FlexiHPC CLI interface Under Maintenance
99.97 % uptime over the past 90 days
Public API of the FlexiHPC Service Under Maintenance
99.97 % uptime over the past 90 days

Scheduled Maintenance

my.nesi.org.nz system update Apr 8, 2025 15:30-17:00 NZST

We will be undergoing scheduled maintenance during this time.
Posted on Apr 01, 2025 - 12:30 NZDT
Apr 2, 2025

Unresolved incidents: Major upgrade to NeSI’s Flexible HPC research cloud platform, External router and floating IP connectivity randomly degraded.

Apr 1, 2025

No incidents reported.

Mar 31, 2025

No incidents reported.

Mar 30, 2025

No incidents reported.

Mar 29, 2025

No incidents reported.

Mar 28, 2025

No incidents reported.

Mar 27, 2025

No incidents reported.

Mar 26, 2025

No incidents reported.

Mar 25, 2025
Resolved - This incident has been resolved.
Mar 25, 16:14 NZDT
Monitoring - A fix has been implemented and we are monitoring the results.
Dec 10, 08:37 NZDT
Investigating - We are experiencing some issues with the backend storage supporting the Maui_ancil Slurm controllers. To fix this, we will immediately shut down the Maui_ancil controllers for a short period. New job submissions will not be possible while this occurs, but existing jobs should be unaffected.
Dec 9, 16:09 NZDT
Resolved - This incident has been resolved.
Mar 25, 16:13 NZDT
Monitoring - Good news, everyone! The Slurm controller services are back online as of yesterday, and we're keeping a close eye on things. Unfortunately, a few hundred jobs decided to take an unexpected holiday break due to the outage. 😅

In a classic case of split-brain, the controllers couldn't agree on what was happening, so the status of some failed jobs might be a bit... unreliable. We recommend you double-check those failed jobs to see if they need restarting, as some of them may have completed. Apologies for the hiccup, and may your holidays be as glitch-free as possible! 🎄✨

Dec 24, 12:57 NZDT
Identified - We've identified an error with the Slurm controller that was preventing it from starting and are now working to get it back online. The root cause is so far unknown but may be down to a filesystem issue. It is likely that many jobs have been killed or finished in a failure state. We will provide an update on follow-up actions for users once the service is stable.
Dec 23, 10:34 NZDT
Investigating - The Mahuika Slurm Controller is not functioning. This affects the ability to submit new jobs and to launch Jupyter sessions.
Our apologies: because of reduced staffing over the holiday period, we are unable to give a time when this will be fixed.

Dec 22, 10:06 NZDT
Mar 24, 2025

No incidents reported.

Mar 23, 2025

No incidents reported.

Mar 22, 2025

No incidents reported.

Mar 21, 2025

No incidents reported.

Mar 20, 2025
Resolved - This incident has been resolved.
Mar 20, 17:12 NZDT
Monitoring - We are actively monitoring the situation, as the issue now appears to have subsided.

Our team is reviewing logs to determine the root cause of the intermittent outage. We appreciate your patience and understanding.

Mar 19, 15:22 NZDT
Investigating - We are currently investigating an issue related to the Research Developer Cloud.

This is causing issues with dashboard access, object storage, APIs and possibly other services as well.

We will provide an update once we know more.

Mar 19, 13:30 NZDT
Completed - The scheduled maintenance has been completed.
Mar 20, 17:05 NZDT
Verifying - Verification is currently underway for the maintenance items.
Mar 20, 17:05 NZDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Mar 20, 16:32 NZDT
Scheduled - We will be undergoing scheduled maintenance during this time.
Mar 20, 16:31 NZDT
Resolved - This incident has been resolved.
Mar 20, 09:52 NZDT
Monitoring - We have identified and resolved the issue with the NFS server. We will continue to monitor to ensure that the issue has been cleared.

We thank you for your patience during this unplanned outage.

Mar 19, 14:50 NZDT
Investigating - We are currently investigating an issue with the NFS server.

This is causing connection issues with NeSI OnDemand, as users are not able to get onto the system.

We apologise for the inconvenience and will provide another update soon.

Mar 19, 11:45 NZDT
Mar 19, 2025