The InfiniBand problem with wbh001 has now been resolved. This hugemem node is available again and running jobs in Mahuika Slurm.
Posted Jun 19, 2019 - 17:06 NZST
Since the platform maintenance last Thursday (13th June), node wbh001, the only node in the hugemem partition of Mahuika, has been offline due to an InfiniBand networking problem. Vendor support has been engaged and the severity escalated; however, an estimated time to recovery is unavailable because the root cause is still being investigated.
Posted Jun 19, 2019 - 11:10 NZST
This incident affected: NeSI High Performance Computing and Storage (Mahuika Ancillary nodes).