High Performance Computing Survey

By Brady Black

I am currently an intern at the University of North Carolina, where we are in the middle of the design phase for a new 64-CPU cluster. This survey was created to gather information about what other cluster administrators have discovered and to provide insight into different, better, or worse ways of approaching cluster administration.

Please be assured that the information gathered from this survey will be used to provide a better understanding of current issues facing the High Performance Computing industry and will not specifically mention any participants unless specific consent is received. If you would like a copy of the aggregate results please let me know at the bottom of the survey.

Where are you from:

Do you have a website I could reference:

What is the primary function of your cluster(s):

What is the current number of nodes on your cluster?
   2 - 10 Nodes
   11 - 25 Nodes
   26 - 50 Nodes
   51 - 100 Nodes
   101 - 200 Nodes
   201 - 300 Nodes
   301 - 400 Nodes
   401 + Nodes

At what point do you believe adding more nodes is not worth the trouble?

In a typical week, how many nodes are out of production for any period of time?
   2 - 10 Nodes
   11 - 25 Nodes
   26 - 50 Nodes
   51 + Nodes

How many users does your cluster serve?
    1 - 20 Users
    21 - 50 Users
    51 - 100 Users
    101 - 200 Users
    201 - 300 Users
    301 - 500 Users
    501 - 1000 Users
    1001 - 2000 Users
    2001+ Users

How many applications do you officially support for your users?
    1 - 10 Applications
    11 - 25 Applications
    26 - 50 Applications
    51 - 100 Applications
    101 - 200 Applications
    201 + Applications

What is used to monitor cluster node hardware? Is there a specific tool you use?

Are you satisfied with your node monitoring results?
    Yes / No

What do you use the hardware monitoring results for?

What is used to monitor the cluster network?

Are you satisfied with the network monitoring results?
    Yes / No

What are the network monitoring results used for?

Do you manage and aggregate node logs and mail? If so, how?
    Yes / No

If you perform any monitoring, how do you present the monitored data and to whom?

What operating system(s) do you run on your cluster:
    Mac OS X

How are operating system updates accomplished?

How are BIOS updates managed?

What do you use for job scheduling?
    PBS Pro
    Sun Grid Engine

Do you run any performance analysis? If so, how is this analysis performed?
    Yes / No

How are the performance analysis results used, and are they beneficial?

Does your cluster have remote management capability?
    Yes / No

Do you have a service level agreement (SLA)?
    9 - 5 weekdays (70%)

How often is downtime scheduled?

How is your downtime carried out?

In your opinion, what was the best "bang for your buck" option you installed in your cluster?

What is the most common problem you have with your cluster?

Would you like a copy of the aggregate results obtained through this survey?
    Yes / No
Send the results to:

Comments/Questions/Contact Information
(If you have any questions please be sure to leave a way for me to contact you.)

Do I have permission to publish specific information about your cluster?
    Yes / No