Investigating - We have identified an ongoing issue with external connectivity via Neutron router gateways and/or Floating IPs. We are still investigating the root cause, but have identified temporary workarounds.
Users may randomly run into issues with external connectivity (via Neutron router gateways and/or Floating IPs) breaking in weird and wonderful ways. In the simpler cases, connectivity into a Floating IP just stops working, yet internal connectivity with the instance in question is fine and it can still connect out externally. However, K8s clusters seem to experience a more subtle degradation resulting in timeouts when talking to external APIs (including Flexi OpenStack itself); we've seen examples where connections from some source ports work while connections from others do not.
If you experience any issues like this, please raise a support request with the details of the network, router, or subnet in question, and we will implement a workaround for you.
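In case it is useful, below is a minimal sketch of one way to collect those details with the openstacksdk Python library. It assumes openstacksdk is installed and a clouds.yaml entry is already configured; the cloud name "flexi" and the network name "my-network" are placeholders, not identifiers from our environment.

# Minimal sketch: gather router, network/subnet, and Floating IP details
# to include in a support request. Assumes openstacksdk is installed and
# clouds.yaml is configured; "flexi" and "my-network" are placeholder names.
import openstack

conn = openstack.connect(cloud="flexi")

# Routers whose external gateway appears to be misbehaving.
for router in conn.network.routers():
    print("router:", router.id, router.name, router.external_gateway_info)

# The affected tenant network and its subnets.
network = conn.network.find_network("my-network")
if network is not None:
    print("network:", network.id, network.name)
    for subnet_id in network.subnet_ids:
        subnet = conn.network.get_subnet(subnet_id)
        print("  subnet:", subnet.id, subnet.cidr)

# Floating IPs that have stopped responding from outside.
for fip in conn.network.ips():
    print("floating ip:", fip.floating_ip_address,
          "->", fip.fixed_ip_address, "status:", fip.status)

Copying the relevant IDs from this output into your support request will help us apply the workaround faster.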
Mar 11, 2025 - 12:26 NZDT
Resolved -
This incident has been resolved.
Mar 25, 16:14 NZDT
Monitoring -
A fix has been implemented and we are monitoring the results.
Dec 10, 08:37 NZDT
Investigating -
We are experiencing some issues with the backend storage supporting the Maui_ancil Slurm controllers. To fix this, we will be shutting down the Maui_ancil controllers immediately for a short period. New job submissions will not be possible whilst this occurs, but existing jobs should be unaffected.
Dec 9, 16:09 NZDT
Resolved -
This incident has been resolved.
Mar 25, 16:13 NZDT
Monitoring -
Good news, everyone! The Slurm controller services are back online as of yesterday, and we're keeping a close eye on things. Unfortunately, a few hundred jobs decided to take an unexpected holiday break due to the outage.
In a classic case of split-brain, the controllers couldn't agree on what was happening, so the status of some failed jobs might be a bit... unreliable. We recommend you double-check those failed jobs to see if they need restarting; some of them may in fact have completed. Apologies for the hiccup, and may your holidays be as glitch-free as possible!
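If it helps with that checking, the following is a minimal sketch (calling sacct via Python's subprocess module) that lists jobs which ended in a failure-like state during the outage. The date window and the list of states are assumptions based on this incident and should be adjusted to suit your own jobs.

# Minimal sketch: list jobs that ended in a failure-like state during the
# outage window so they can be reviewed and resubmitted if needed.
# The date range and state list are assumptions; adjust them as needed.
import subprocess

cmd = [
    "sacct",
    "-X",                 # one row per job allocation, not per job step
    "-n", "-P",           # no header, parseable '|'-delimited output
    "--starttime=2024-12-22",
    "--endtime=2024-12-24",
    "--state=FAILED,CANCELLED,NODE_FAIL,TIMEOUT",
    "--format=JobID,JobName,State,End",
]

output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

for line in output.strip().splitlines():
    job_id, job_name, state, end = line.split("|")
    print(f"{job_id:>12}  {state:<12} {end}  {job_name}")

Jobs listed here that did not actually complete can then be resubmitted with sbatch as usual.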
Dec 24, 12:57 NZDT
Identified -
We've identified an error with the Slurm controller that was preventing it from starting and are now working to get it back online. The root cause is so far unknown but may be down to a filesystem issue. It is likely that many jobs have been killed or finished in a failure state. We will provide an update on follow-up actions for users once the service is stable.
Dec 23, 10:34 NZDT
Investigating -
The Mahuika Slurm Controller is not functioning. This affects the ability to submit new jobs and to launch Jupyter sessions. Our apologies: due to reduced staffing levels over the holiday period, we are unable to give a time when this will be fixed.
Dec 22, 10:06 NZDT
Resolved -
This incident has been resolved.
Mar 20, 09:52 NZDT
Monitoring -
We have identified and resolved the issue with the NFS server. We will continue to monitor to ensure that the issue has been cleared.
We thank you for your patience during this unplanned outage.
Mar 19, 14:50 NZDT
Investigating -
We are currently investigating an issue with the NFS server.
This is causing connection issues with NeSI OnDemand, as users are not able to get onto the system.
We apologize for the inconvenience and will provide another update soon.
Mar 19, 11:45 NZDT