Mahuika cluster jobs fail to complete cleanly
Incident Report for NeSI Status
Postmortem

Switching the BCM head node master from 01 to 02 cleared all of our alerts.

Posted Sep 14, 2021 - 15:54 NZST

Resolved
This incident has been resolved.
Posted Sep 14, 2021 - 15:53 NZST
Monitoring
The service has now been restored and jobs are running in an orderly fashion. We will continue monitoring the situation.
Posted Sep 14, 2021 - 13:48 NZST
Investigating
Due to a network issue, jobs on Mahuika compute nodes may fail to leave the completing state and remain stuck in the queue with state CG. We are investigating the issue now.
Posted Sep 14, 2021 - 12:53 NZST
This incident affected: NeSI HPC Compute Infrastructure (HPC Compute nodes - Mahuika).