tag:status.nesi.org.nz,2005:/historyNeSI Status Status - Incident History2024-03-29T07:40:30+13:00NeSI Statustag:status.nesi.org.nz,2005:Incident/203148442024-03-26T18:00:03+13:002024-03-26T18:00:03+13:00my.nesi.org.nz system update<p><small>Mar <var data-var='date'>26</var>, <var data-var='time'>18:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Mar <var data-var='date'>26</var>, <var data-var='time'>16:00</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Mar <var data-var='date'>21</var>, <var data-var='time'>12:40</var> NZDT</small><br><strong>Scheduled</strong> - We will be undergoing scheduled maintenance during this time to update the system.</p>tag:status.nesi.org.nz,2005:Incident/203071992024-03-21T12:48:16+13:002024-03-21T12:48:16+13:00Slow Jupyter logins<p><small>Mar <var data-var='date'>21</var>, <var data-var='time'>12:48</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Mar <var data-var='date'>20</var>, <var data-var='time'>18:07</var> NZDT</small><br><strong>Investigating</strong> - We're aware of and investigating some poor performance logging into Jupyter and spawning notebooks</p>tag:status.nesi.org.nz,2005:Incident/202409222024-03-15T08:59:08+13:002024-03-15T08:59:08+13:00Filesystem issue<p><small>Mar <var data-var='date'>15</var>, <var data-var='time'>08:59</var> NZDT</small><br><strong>Update</strong> - We've managed to stabilise things, which has reduced the impact, but don't have a root cause yet so are continuing to investigate</p><p><small>Mar <var data-var='date'>14</var>, <var data-var='time'>20:42</var> NZDT</small><br><strong>Investigating</strong> - We're seeing issues on the Maui Ancil Slurm cluster and impacts to Jupyter. 
This seems to be filesystem related, though we are still investigating and escalating with our support partners</p>tag:status.nesi.org.nz,2005:Incident/201427152024-03-05T09:19:07+13:002024-03-05T09:19:07+13:00SSH: trouble logging in via SSH? Try Jupyter<p><small>Mar <var data-var='date'> 5</var>, <var data-var='time'>09:19</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Mar <var data-var='date'> 5</var>, <var data-var='time'>09:18</var> NZDT</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Mar <var data-var='date'> 2</var>, <var data-var='time'>22:46</var> NZDT</small><br><strong>Investigating</strong> - We are seeing issues with logging in to SSH, which mightn't be resolved until business hours on Monday 4 March 2024.<br /><br />Logging in via your web browser using https://jupyter.nesi.org.nz/ works well, and it has a terminal too - see our docs on Jupyter for more information: https://support.nesi.org.nz/hc/en-gb/articles/360001555615-Jupyter-on-NeSI</p>tag:status.nesi.org.nz,2005:Incident/200746072024-02-28T18:00:56+13:002024-02-28T18:00:56+13:00my.nesi.org.nz system update<p><small>Feb <var data-var='date'>28</var>, <var data-var='time'>18:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Feb <var data-var='date'>28</var>, <var data-var='time'>16:00</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. 
We will provide updates as necessary.</p><p><small>Feb <var data-var='date'>26</var>, <var data-var='time'>15:12</var> NZDT</small><br><strong>Scheduled</strong> - We will be undergoing scheduled maintenance during this time.</p>tag:status.nesi.org.nz,2005:Incident/200339342024-02-28T13:07:09+13:002024-02-28T13:07:09+13:00Intermittent network connectivity issues within Flexi<p><small>Feb <var data-var='date'>28</var>, <var data-var='time'>13:07</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Feb <var data-var='date'>23</var>, <var data-var='time'>11:33</var> NZDT</small><br><strong>Monitoring</strong> - After working with support partners overnight we've identified a configuration regression related to an earlier incident. This regression appears to have caused performance issues for networking and virtual machine shared storage, which in turn impacted many workloads across the cloud. The problematic nodes have been removed from production; this seems to have stabilised the situation, and services have now been restored. Users are encouraged to log in and check their services, workflows and jobs, and to get in touch if they are experiencing any ongoing issues. Our apologies for the issues and unplanned downtime!</p><p><small>Feb <var data-var='date'>22</var>, <var data-var='time'>18:27</var> NZDT</small><br><strong>Update</strong> - We are continuing to remove the problematic nodes from the storage cluster.<br /><br />Connections to running instances will still be intermittent.<br /><br />We apologise for any inconvenience caused during this time.</p><p><small>Feb <var data-var='date'>22</var>, <var data-var='time'>16:09</var> NZDT</small><br><strong>Update</strong> - We are in the process of removing problematic nodes; however, this is not a quick process.<br /><br />Connections to instances may be slow or intermittent. 
<br /><br />We apologise for any inconvenience caused during this time.</p><p><small>Feb <var data-var='date'>22</var>, <var data-var='time'>12:16</var> NZDT</small><br><strong>Identified</strong> - We have found an issue with the underlying storage cluster that could be contributing to the network issues users are experiencing. <br /><br />Running instances look to be fine at this stage; however, external connections may be slow or intermittent. <br /><br />We apologise for any inconvenience caused during this time. If you are experiencing any issues, please reach out to support so we can advise.</p><p><small>Feb <var data-var='date'>22</var>, <var data-var='time'>10:06</var> NZDT</small><br><strong>Update</strong> - We are continuing to monitor for any further issues.</p><p><small>Feb <var data-var='date'>21</var>, <var data-var='time'>17:26</var> NZDT</small><br><strong>Monitoring</strong> - We are currently monitoring a possible network issue within the Flexi/RDC Space.<br /><br />If you are experiencing any issues, please reach out to support so we can advise.</p><p><small>Feb <var data-var='date'>21</var>, <var data-var='time'>13:45</var> NZDT</small><br><strong>Investigating</strong> - We are currently investigating a possible network issue within the Flexi/RDC Space.<br /><br />Running instances look to be fine at this stage.<br /><br />If you are experiencing any issues, please reach out to support so we can advise.</p>tag:status.nesi.org.nz,2005:Incident/200830412024-02-28T12:26:10+13:002024-02-28T12:26:11+13:00Jupyter connection issues<p><small>Feb <var data-var='date'>28</var>, <var data-var='time'>12:26</var> NZDT</small><br><strong>Resolved</strong> - We have resolved the issue and users should be able to connect to Jupyter.<br /><br />We apologise for the inconvenience. If you are having any issues at all, please reach out to support.</p><p><small>Feb <var data-var='date'>27</var>, <var data-var='time'>13:05</var> NZDT</small><br><strong>Monitoring</strong> - We 
have identified and implemented a fix for the issues with connecting to Jupyter.<br /><br />We apologise for the inconvenience. If you are having any issues at all, please reach out to support.</p><p><small>Feb <var data-var='date'>27</var>, <var data-var='time'>12:17</var> NZDT</small><br><strong>Investigating</strong> - We are currently investigating an issue with connecting to Jupyter.<br /><br />We apologise for the inconvenience and will provide further updates soon.</p>tag:status.nesi.org.nz,2005:Incident/200237822024-02-21T10:36:58+13:002024-02-21T10:36:58+13:00JupyterHub login issues for all users<p><small>Feb <var data-var='date'>21</var>, <var data-var='time'>10:36</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Feb <var data-var='date'>20</var>, <var data-var='time'>15:00</var> NZDT</small><br><strong>Monitoring</strong> - We have applied a fix to the JupyterHub service and users should be able to log in again.<br /><br />We apologise for the disruption. Please reach out to support if you need any assistance.</p><p><small>Feb <var data-var='date'>20</var>, <var data-var='time'>14:38</var> NZDT</small><br><strong>Update</strong> - We are currently investigating an issue with JupyterHub user logins that has spread to a wider audience and is now affecting all user logins.<br /><br />We are working to resolve the issue quickly and are sorry for any inconvenience caused.</p><p><small>Feb <var data-var='date'>20</var>, <var data-var='time'>10:18</var> NZDT</small><br><strong>Investigating</strong> - We are currently investigating an issue with recently created JupyterHub users. 
Users that have older accounts shouldn't have any issues logging in.<br /><br />We are sorry for any inconvenience caused and are working to resolve the situation.</p>tag:status.nesi.org.nz,2005:Incident/199879082024-02-15T16:24:10+13:002024-02-15T16:24:10+13:00Intermittent Connections to Flexi and RDC Platform<p><small>Feb <var data-var='date'>15</var>, <var data-var='time'>16:24</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Feb <var data-var='date'>15</var>, <var data-var='time'>13:30</var> NZDT</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Feb <var data-var='date'>15</var>, <var data-var='time'>13:09</var> NZDT</small><br><strong>Investigating</strong> - We are currently investigating an issue with connectivity to the RDC Dashboard and connections to Floating IPs used for SSH or web traffic.<br /><br />All instances are still running, and this should only affect connections from external applications or users into the RDC/Flexi.</p>tag:status.nesi.org.nz,2005:Incident/199355962024-02-09T14:04:58+13:002024-02-09T14:04:58+13:00Production database upgrade<p><small>Feb <var data-var='date'> 9</var>, <var data-var='time'>14:04</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Feb <var data-var='date'> 8</var>, <var data-var='time'>15:04</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. 
We will provide updates as necessary.</p><p><small>Feb <var data-var='date'> 8</var>, <var data-var='time'>15:03</var> NZDT</small><br><strong>Scheduled</strong> - We will be undergoing scheduled maintenance during this time.</p>tag:status.nesi.org.nz,2005:Incident/198805552024-02-08T15:00:56+13:002024-02-08T15:00:56+13:00my.nesi.org.nz system maintenance<p><small>Feb <var data-var='date'> 8</var>, <var data-var='time'>15:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Feb <var data-var='date'> 8</var>, <var data-var='time'>10:00</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Feb <var data-var='date'> 1</var>, <var data-var='time'>13:59</var> NZDT</small><br><strong>Scheduled</strong> - We are upgrading my.nesi.org.nz portal and linked applications. Jupyter on NeSI may be occasionally unavailable during this period.</p>tag:status.nesi.org.nz,2005:Incident/199261522024-02-08T00:00:56+13:002024-02-08T00:00:56+13:00Flexible High Performance Cloud network maintenance<p><small>Feb <var data-var='date'> 8</var>, <var data-var='time'>00:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Feb <var data-var='date'> 7</var>, <var data-var='time'>18:00</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. 
We will provide updates as necessary.</p><p><small>Feb <var data-var='date'> 7</var>, <var data-var='time'>13:10</var> NZDT</small><br><strong>Update</strong> - We will be undergoing scheduled maintenance during this time.</p><p><small>Feb <var data-var='date'> 7</var>, <var data-var='time'>12:50</var> NZDT</small><br><strong>Update</strong> - NeSI’s Flexible High Performance Cloud infrastructure (and the Research Developer Cloud service that it offers) is getting a new more permanent home on the Internet thanks to REANNZ. This means the public IP addresses available for floating IPs will be changing.<br /> <br /> In preparation for this transition we will be performing a maintenance on the Flexi networks. This could potentially disrupt traffic in flight for a few seconds as the switches are restarted.<br /> <br /> Running instances will continue to run however may have intermittent network access. There will be no change to existing floating IPs and the Neutron ports they are associated with. There will be follow up maintenance and communication soon regarding the help we need from users to complete this transition.</p><p><small>Feb <var data-var='date'> 7</var>, <var data-var='time'>12:49</var> NZDT</small><br><strong>Scheduled</strong> - NeSI’s Flexible High Performance Cloud infrastructure (and the Research Developer Cloud service that it offers) is getting a new more permanent home on the Internet thanks to REANNZ. This means the public IP addresses available for floating IPs will be changing.<br /> <br /> In preparation for this transition we will be performing a maintenance on the Flexi networks. This could potentially disrupt traffic in flight for a few seconds as the switches are restarted.<br /> <br /> Running instances will continue to run however may have intermittent network access. There will be no change to existing floating IPs and the Neutron ports they are associated with. 
There will be follow-up maintenance and communication soon regarding the help we need from users to complete this transition.</p>tag:status.nesi.org.nz,2005:Incident/198332902024-01-26T14:00:00+13:002024-01-26T14:00:00+13:00NeSI Lander nodes update<p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>14:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>12:01</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>11:25</var> NZDT</small><br><strong>Scheduled</strong> - We will be applying updates to the NeSI Lander nodes. Interruption of services is not expected, but sessions open over the maintenance period will be disconnected.</p>tag:status.nesi.org.nz,2005:Incident/198249332024-01-26T08:13:49+13:002024-01-26T08:13:49+13:00Hypervisor Error<p><small>Jan <var data-var='date'>26</var>, <var data-var='time'>08:13</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jan <var data-var='date'>25</var>, <var data-var='time'>17:29</var> NZDT</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Jan <var data-var='date'>25</var>, <var data-var='time'>16:27</var> NZDT</small><br><strong>Investigating</strong> - We are experiencing a hypervisor error that is impacting several services. 
Our team is actively working to address the issue and restore full functionality as quickly as possible.</p>tag:status.nesi.org.nz,2005:Incident/197979962024-01-25T18:01:07+13:002024-01-25T18:01:07+13:00my.nesi.org.nz system update and release<p><small>Jan <var data-var='date'>25</var>, <var data-var='time'>18:01</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>25</var>, <var data-var='time'>15:32</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>23</var>, <var data-var='time'>09:11</var> NZDT</small><br><strong>Scheduled</strong> - We will be undergoing scheduled maintenance during this time to update the system.</p>tag:status.nesi.org.nz,2005:Incident/197288922024-01-19T21:15:21+13:002024-01-19T21:15:21+13:00Flexi HPC OpenStack Control plane OS upgrades - continues<p><small>Jan <var data-var='date'>19</var>, <var data-var='time'>21:15</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>17</var>, <var data-var='time'>09:15</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>17</var>, <var data-var='time'>09:08</var> NZDT</small><br><strong>Scheduled</strong> - OpenStack Controllers will be upgraded sequentially from CentOS 8 to Rocky 9. No service impact is expected. 
This will continue throughout this week.</p>tag:status.nesi.org.nz,2005:Incident/197447682024-01-18T16:00:27+13:002024-01-18T16:00:27+13:00Authentication services maintenance<p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>16:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>14:01</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>18</var>, <var data-var='time'>12:44</var> NZDT</small><br><strong>Scheduled</strong> - We will be doing some upgrades on the directory services used for authentication and DNS services in the HPCF for Maui & Mahuika. The upgrade process may cause minor disruptions such as delayed logins and authentication failures. If you have any issues during this period, please wait a few minutes and try again.</p>tag:status.nesi.org.nz,2005:Incident/197171052024-01-17T09:00:28+13:002024-01-17T09:00:28+13:00Flexi HPC OpenStack Control plane OS upgrades<p><small>Jan <var data-var='date'>17</var>, <var data-var='time'>09:00</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>16</var>, <var data-var='time'>21:00</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>16</var>, <var data-var='time'>11:52</var> NZDT</small><br><strong>Scheduled</strong> - OpenStack Controllers will be upgraded sequentially from CentOS 8 to Rocky 9. 
No service impact is expected.</p>tag:status.nesi.org.nz,2005:Incident/197195502024-01-16T17:57:49+13:002024-01-16T17:57:49+13:00jupyter.nesi.org.nz login issues<p><small>Jan <var data-var='date'>16</var>, <var data-var='time'>17:57</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Jan <var data-var='date'>16</var>, <var data-var='time'>17:28</var> NZDT</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Jan <var data-var='date'>16</var>, <var data-var='time'>17:13</var> NZDT</small><br><strong>Investigating</strong> - jupyter.nesi.org.nz is experiencing login issues at the moment.<br /><br />We are investigating the issue and apologise for the inconvenience. If you have further questions or queries, please, contact us at support@nesi.org.nz</p>tag:status.nesi.org.nz,2005:Incident/196656752024-01-11T06:30:25+13:002024-01-11T07:46:13+13:00Minor Version Update for Greta Point Ceph instance<p><small>Jan <var data-var='date'>11</var>, <var data-var='time'>06:30</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'>10</var>, <var data-var='time'>22:30</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'>10</var>, <var data-var='time'>22:27</var> NZDT</small><br><strong>Scheduled</strong> - We are doing a minor version update on our Ceph cluster in Greta Point. This is a routine operation. All endpoints provided by Ceph are clustered in a highly available setting and will be available throughout the whole operation. 
This should therefore have no effect on any user-facing service.<br /><br />If you are experiencing any problems with access to rados/ceph in Greta Point, feel free to notify us immediately via the #flexi-hpcloud channel in Slack.</p>tag:status.nesi.org.nz,2005:Incident/196545212024-01-10T00:33:30+13:002024-01-10T00:33:30+13:00Minor Version Update for Tamaki Ceph instance<p><small>Jan <var data-var='date'>10</var>, <var data-var='time'>00:33</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>21:46</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Jan <var data-var='date'> 9</var>, <var data-var='time'>21:40</var> NZDT</small><br><strong>Scheduled</strong> - We are doing a minor version update on our Ceph cluster. This is a routine operation. All endpoints provided by Ceph are clustered in a highly available setting and will be available throughout the whole operation. 
This should therefore have no effect on any user-facing service.<br /><br />If you are experiencing any problems with Flexible High Performance Cloud, feel free to notify us immediately via the #flexi-hpcloud channel in Slack.</p>tag:status.nesi.org.nz,2005:Incident/194739492023-12-19T20:20:08+13:002023-12-19T20:20:08+13:00Reboot/outage of Maui XC50<p><small>Dec <var data-var='date'>19</var>, <var data-var='time'>20:20</var> NZDT</small><br><strong>Completed</strong> - Maintenance is complete and tests indicate Maui is working as expected.</p><p><small>Dec <var data-var='date'>19</var>, <var data-var='time'>20:08</var> NZDT</small><br><strong>Verifying</strong> - Verification is currently underway for the maintenance items.</p><p><small>Dec <var data-var='date'>19</var>, <var data-var='time'>17:46</var> NZDT</small><br><strong>Update</strong> - We are extending the outage by 2 hours; the new end time is 20:00 (8pm).</p><p><small>Dec <var data-var='date'>19</var>, <var data-var='time'>13:30</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Dec <var data-var='date'>19</var>, <var data-var='time'>10:00</var> NZDT</small><br><strong>Scheduled</strong> - NeSI engineers require a reboot of Maui (XC50) to locate and fix ongoing issues.</p>tag:status.nesi.org.nz,2005:Incident/194239082023-12-19T09:55:26+13:002023-12-19T09:55:26+13:00Maui XC50 Slurm controller failover<p><small>Dec <var data-var='date'>19</var>, <var data-var='time'>09:55</var> NZDT</small><br><strong>Completed</strong> - This maintenance did not result in the outcome we had hoped for. 
We will close this maintenance in favour of a complete outage/reboot.</p><p><small>Dec <var data-var='date'>18</var>, <var data-var='time'>19:52</var> NZDT</small><br><strong>Verifying</strong> - Verification is currently underway for the maintenance items.</p><p><small>Dec <var data-var='date'>18</var>, <var data-var='time'>10:41</var> NZDT</small><br><strong>Update</strong> - Due to continuing issues we are going to fail over to our primary controller. There may be some job submission and Slurm command timeouts during the failover process.</p><p><small>Dec <var data-var='date'>14</var>, <var data-var='time'>15:19</var> NZDT</small><br><strong>In progress</strong> - We continue to see issues with the controllers on Maui XC50 and continue to investigate and escalate to vendors.</p><p><small>Dec <var data-var='date'>14</var>, <var data-var='time'>15:17</var> NZDT</small><br><strong>Update</strong> - We are continuing to verify the maintenance items.</p><p><small>Dec <var data-var='date'>14</var>, <var data-var='time'>13:08</var> NZDT</small><br><strong>Verifying</strong> - We have failed over to the primary controller and are verifying the stability of the system.</p><p><small>Dec <var data-var='date'>14</var>, <var data-var='time'>12:38</var> NZDT</small><br><strong>In progress</strong> - Due to continuing issues we are going to fail over to our primary controller at 13:00 NZT. 
There may be some job submission and Slurm command timeouts during the failover process.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>19:30</var> NZDT</small><br><strong>Update</strong> - The failover is complete; however, intermittent issues remain.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>18:12</var> NZDT</small><br><strong>Verifying</strong> - Verification is currently underway for the maintenance items.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>17:47</var> NZDT</small><br><strong>Update</strong> - Failover of the Slurm controller did not happen cleanly. We have extended maintenance until 18:15.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>16:45</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. We will provide updates as necessary.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>16:43</var> NZDT</small><br><strong>Update</strong> - We will be undergoing scheduled maintenance during this time.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>16:41</var> NZDT</small><br><strong>Scheduled</strong> - We have located an issue on the primary Maui XC50 Slurm controller. To mitigate it, we need to add a new controller to the configuration and fail over to that node. Because we need to restart the slurmd daemons on all nodes, this will cause disruption to the jobs currently running.</p>tag:status.nesi.org.nz,2005:Incident/194050562023-12-13T23:30:10+13:002023-12-13T23:30:10+13:00FlexiHPC Network failover testing<p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>23:30</var> NZDT</small><br><strong>Completed</strong> - The scheduled maintenance has been completed.</p><p><small>Dec <var data-var='date'>13</var>, <var data-var='time'>20:31</var> NZDT</small><br><strong>In progress</strong> - Scheduled maintenance is currently in progress. 
We will provide updates as necessary.</p><p><small>Dec <var data-var='date'>12</var>, <var data-var='time'>10:23</var> NZDT</small><br><strong>Scheduled</strong> - We are undertaking network maintenance and testing on Wednesday evening, Dec 13th, from 8:30pm to 11:30pm. Access to the FlexiHPC platforms may be affected during this time.</p>tag:status.nesi.org.nz,2005:Incident/194024302023-12-12T12:10:07+13:002023-12-12T12:10:07+13:00Maui jobs stuck in completing state<p><small>Dec <var data-var='date'>12</var>, <var data-var='time'>12:10</var> NZDT</small><br><strong>Resolved</strong> - This incident has been resolved.</p><p><small>Dec <var data-var='date'>12</var>, <var data-var='time'>09:09</var> NZDT</small><br><strong>Monitoring</strong> - A fix has been implemented and we are monitoring the results.</p><p><small>Dec <var data-var='date'>12</var>, <var data-var='time'>05:08</var> NZDT</small><br><strong>Investigating</strong> - Jobs on Maui are getting stuck in a `completing` or `CG` state and failing to release resources to allow new jobs to start.<br /><br />We are still investigating the issue, with no current ETA on resolution.</p>