WHAT HAPPENED?

A storage server attached to the VM cluster hung, causing several VMs to pause. TSO refreshed the connection on the storage server and brought the VMs back online.

Affected servers:
arrakis.cc.gatech.edu (YP)
ende1.cc.gatech.edu
ende2.cc.gatech.edu
ende3.cc.gatech.edu
sarge.cc.gatech.edu (databases)

WHEN DID IT HAPPEN?

The issue was first reported to TSO shortly before 9pm on Saturday, 9/7/19, and was resolved by 10:45pm the same day.

WHY DID IT HAPPEN?

The storage server experienced hung NFS mounts, which caused several VMs to pause when they could not reach their storage. Restarting the NFS process on the storage server fixed the issue.

WHO IS AFFECTED?

Users of the servers listed above, including anyone using databases located on sarge.cc.gatech.edu, may have experienced a brief outage.

WHAT DO YOU NEED TO DO?

No user action is required. The issue has been resolved, and TSO is continuing to monitor.

WHO SHOULD YOU CONTACT FOR QUESTIONS?

Feel free to contact the TSO Help Desk (CCB 148, 404.894.7065, helpdesk@cc.gatech.edu).

Owner of Alert
TSO