WHAT HAPPENED?
The killerbee cluster currently cannot be reached via new SSH sessions.
WHEN DID IT HAPPEN?
The issue was first discovered at 8:30 AM on Friday, February 5, 2016 and is ongoing.
WHY DID IT HAPPEN?
The cause is unknown at this time.
WHO WAS AFFECTED?
CoC faculty and students attempting to access the killerbee cluster machines via SSH.
WHAT DO YOU NEED TO DO?
No user action is required.
HAS THE PROBLEM BEEN RESOLVED?
No. TSO is currently troubleshooting the issue.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
Feel free to contact the TSO Help Desk (CCB 148, 404.894.7065, helpdesk@cc.gatech.edu).
UPDATE:
The issue was traced to the home directory server, holder.cc.gatech.edu, which had gone offline due to overheating and was causing the login issues. Upon investigating, TSO discovered a cooling issue in the KACB 2219 Data Center. Condensation was heavy in the room, and the air handlers were emitting mist. TSO shut down some non-essential backend servers and contacted Campus Facilities. Campus Facilities arrived shortly thereafter and is investigating.
UPDATE:
Campus Facilities changed a setting to address the temperature and humidity in the room. Holder was restarted. After refreshing the mounts for holder, the killerbees were returned to service at 11:30 AM, Friday, February 5.
UPDATE:
Campus Facilities placed all four HVAC units on override to ensure they remain on for the weekend. They are bringing in the contractor, Johnson Controls, to perform additional troubleshooting and prevent a recurrence.
Owner of Alert
TSO