Tuning the Cluster
Tuning LIM
LIM provides critical services to all LSF components. In addition to the timely collection of resource information, LIM provides host selection and job placement policies. If you are using Platform MultiCluster, LIM determines how different clusters should exchange load and resource information. You can tune LIM policies and parameters to improve performance.
LIM uses load thresholds to determine whether to place remote jobs on a host. If one or more LSF load indices exceed the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy and LIM will not recommend jobs to that host. You can also tune LIM load thresholds.
You can also change default LIM behavior and pre-select hosts to be elected master to improve performance.
Adjusting LIM Parameters
There are two main goals in adjusting LIM configuration parameters: improving response time, and reducing interference with interactive use. To improve response time, tune LSF to correctly select the best available host for each job. To reduce interference, tune LSF to avoid overloading any host.
LIM policies are advisory information for applications. Applications can either use the placement decision from LIM, or make further decisions based on information from LIM.
Most of the LSF interactive tools use LIM policies to place jobs on the network. LSF uses load and resource information from LIM and makes its own placement decisions based on other factors in addition to load information.
Files that affect LIM are lsf.shared and lsf.cluster.cluster_name, where cluster_name is the name of your cluster.
RUNWINDOW parameter
LIM thresholds and run windows affect the job placement advice of LIM. Job placement advice is not enforced by LIM.
The RUNWINDOW parameter defined in lsf.cluster.cluster_name specifies one or more time windows during which a host is considered available. If the current time is outside all the defined time windows, the host is considered locked and LIM will not advise any applications to run jobs on the host.
Load Thresholds
Load threshold parameters define the conditions beyond which a host is considered busy by LIM and are a major factor in influencing performance. No jobs will be dispatched to a busy host by LIM's policy. Each of these parameters is a load index value, so that if the host load goes beyond that value, the host becomes busy.
Thresholds can be set for any load index supported internally by the LIM, and for any external load index.
If a particular load index is not specified, LIM assumes that there is no threshold for that load index. Define looser values for load thresholds if you want to aggressively run jobs on a host.
See Load Thresholds for more details.
In this section
- Load indices that affect LIM performance
- Comparing LIM load thresholds
- If LIM often reports a host as busy
- If interactive jobs slow down response
- Multiprocessor systems
Load indices that affect LIM performance
For more details on load indices see Load Indices.
Comparing LIM load thresholds
To tune LIM load thresholds, compare the output of lsload to the thresholds reported by lshosts -l.
The lsload and lsmon commands display an asterisk * next to each load index that exceeds its threshold.
Example
Consider the following output from lshosts -l and lsload:
lshosts -l
HOST_NAME: hostD
...
LOAD_THRESHOLDS:
r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
   -  3.5     -   -  15   -   -   -    -   2M   1M
HOST_NAME: hostA
...
LOAD_THRESHOLDS:
r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
   -  3.5     -   -  15   -   -   -    -   2M   1M
lsload
HOST_NAME  status  r15s  r1m  r15m   ut     pg  ls  it  tmp  swp  mem
hostD      ok       0.0  0.0   0.0   0%    0.0   6   0  30M  32M  10M
hostA      busy     1.9  2.1   1.9  47%  *69.6  21   0  38M  96M  60M
In this example, the hosts have the following characteristics:
- hostD is ok.
- hostA is busy. The pg (paging rate) index is 69.6, above the threshold of 15.
If LIM often reports a host as busy
If LIM often reports a host as busy when the CPU utilization and run queue lengths are relatively low and the system is responding quickly, the most likely cause is the paging rate threshold. Try raising the pg threshold.
Different operating systems assign subtly different meanings to the paging rate statistic, so the threshold needs to be set at different levels for different host types. In particular, HP-UX systems need to be configured with significantly higher pg values; try starting at a value of 50.
There is a point of diminishing returns. As the paging rate rises, eventually the system spends too much time waiting for pages and the CPU utilization decreases. Paging rate is the factor that most directly affects perceived interactive response. If a system is paging heavily, it feels very slow.
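The comparison of lsload values against lshosts -l thresholds can be sketched in a few lines of Python. This is only an illustration, not LSF code: the function names are hypothetical, the values are copied by hand from the example output, and the set of indices for which a lower value means a heavier load (it, tmp, swp, mem) is an assumption based on what those indices measure.

```python
# Hypothetical helper, not part of LSF: flags load indices beyond their thresholds.
from typing import Dict, List, Optional

# Assumption: for these indices a LOWER value means a more heavily loaded host
# (idle time, free tmp space, free swap, free memory). All others count upward.
DECREASING = {"it", "tmp", "swp", "mem"}


def exceeded_indices(load: Dict[str, float],
                     thresholds: Dict[str, Optional[float]]) -> List[str]:
    """Return the names of the load indices that are beyond their threshold."""
    over = []
    for name, limit in thresholds.items():
        if limit is None or name not in load:
            continue  # "-" in lshosts -l output: no threshold for this index
        value = load[name]
        crossed = value < limit if name in DECREASING else value > limit
        if crossed:
            over.append(name)
    return over


def host_status(load: Dict[str, float],
                thresholds: Dict[str, Optional[float]]) -> str:
    """A host is busy as soon as one index is beyond its threshold."""
    return "busy" if exceeded_indices(load, thresholds) else "ok"


# Values from the example output (tmp, swp, and mem in MB, ut in percent).
thresholds = {"r15s": None, "r1m": 3.5, "r15m": None, "ut": None, "pg": 15,
              "io": None, "ls": None, "it": None, "tmp": None, "swp": 2, "mem": 1}
host_d = {"r15s": 0.0, "r1m": 0.0, "r15m": 0.0, "ut": 0, "pg": 0.0,
          "ls": 6, "it": 0, "tmp": 30, "swp": 32, "mem": 10}
host_a = {"r15s": 1.9, "r1m": 2.1, "r15m": 1.9, "ut": 47, "pg": 69.6,
          "ls": 21, "it": 0, "tmp": 38, "swp": 96, "mem": 60}

print("hostD:", host_status(host_d, thresholds), exceeded_indices(host_d, thresholds))
print("hostA:", host_status(host_a, thresholds), exceeded_indices(host_a, thresholds))
```

For hostA only pg is flagged, which matches the asterisk in the lsload output; raising the pg threshold above 69.6 in this sketch would make hostA report ok.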
If interactive jobs slow down response
If you find that interactive jobs slow down system response too much while LIM still reports your host as ok, reduce the CPU run queue length thresholds (r15s, r1m, r15m). Likewise, increase the CPU run queue length thresholds if hosts become busy at low loads.
Multiprocessor systems
On multiprocessor systems, CPU run queue lengths (r15s, r1m, r15m) are compared to the effective run queue lengths as displayed by the lsload -E command.
CPU run queue lengths should be configured as the load limit for a single processor. Sites with a variety of uniprocessor and multiprocessor machines can use a standard value for r15s, r1m and r15m in the configuration files, and the multiprocessor machines will automatically run more jobs.
Note that the normalized run queue length displayed by lsload -N is scaled by the number of processors. See Load Indices for the concept of effective and normalized run queue lengths.
Changing Default LIM Behavior to Improve Performance
You may want to change the default LIM behavior in the following cases:
- In very large sites. As the size of the cluster becomes large (500 hosts or more), reconfiguration of the cluster causes each LIM to re-read the configuration files. This can take quite some time.
- In sites where each host in the cluster cannot share a common configuration directory or exact replica.
In this section
- Default LIM behavior
- Change default LIM behavior
- Reconfiguration and LSF_MASTER_LIST
- How LSF works with LSF_MASTER_LIST
- Considerations
Default LIM behavior
By default, each LIM running in an LSF cluster must read the configuration files lsf.shared and lsf.cluster.cluster_name to obtain information about resource definitions, host types, host thresholds, etc. This includes master and slave LIMs.
This requires that each host in the cluster share a common configuration directory or an exact replica of the directory.
Change default LIM behavior
The parameter LSF_MASTER_LIST in lsf.conf identifies which hosts can become master. Hosts not listed in LSF_MASTER_LIST are treated as slave-only hosts and are never considered for election as master.
Set LSF_MASTER_LIST (lsf.conf)
- Edit lsf.conf and set the parameter LSF_MASTER_LIST to indicate hosts that are candidates to become the master host. For example:
LSF_MASTER_LIST="hostA hostB hostC"
The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.
- Save your changes.
- Reconfigure the cluster:
lsadmin reconfig
badmin mbdrestart
Reconfiguration and LSF_MASTER_LIST
If you change LSF_MASTER_LIST
Whenever you change the parameter LSF_MASTER_LIST, reconfigure the cluster with lsadmin reconfig and badmin mbdrestart.
If you change lsf.cluster.cluster_name or lsf.shared
If you make changes that do not affect load report messages, such as adding or removing slave-only hosts, you only need to restart the LIMs on all master candidates with the command lsadmin limrestart and the specific host names.
For example:
lsadmin limrestart hostA hostB hostC
If you make changes that affect load report messages, such as load indices, you must restart all the LIMs in the cluster. Use the command lsadmin reconfig.
How LSF works with LSF_MASTER_LIST
The files lsf.shared and lsf.cluster.cluster_name are shared only among LIMs listed as candidates to be elected master with the parameter LSF_MASTER_LIST.
The preferred master host is no longer the first host in the cluster list in lsf.cluster.cluster_name, but the first host in the list specified by LSF_MASTER_LIST in lsf.conf.
Whenever you reconfigure, only master LIM candidates read lsf.shared and lsf.cluster.cluster_name to get updated information. The elected master LIM sends configuration information to slave LIMs.
The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.
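The effect of the ordering in LSF_MASTER_LIST can be pictured with a short Python sketch. It is a simplification under a stated assumption: the first listed candidate that is currently up is taken as master. The real LIM election involves more than this, and both function names are hypothetical.

```python
# Hypothetical illustration of "preferred order"; not the actual LIM election code.
from typing import Iterable, List, Optional


def parse_master_list(value: str) -> List[str]:
    """Split an LSF_MASTER_LIST value such as "hostA hostB hostC" into host names."""
    return value.strip().strip('"').split()


def preferred_master(candidates: List[str], hosts_up: Iterable[str]) -> Optional[str]:
    """Return the first candidate, in listed order, that is currently up.

    Hosts that are up but not listed are slave-only and are never returned.
    """
    up = set(hosts_up)
    for host in candidates:
        if host in up:
            return host
    return None  # assumption: no candidate up means no master can be chosen


candidates = parse_master_list('"hostA hostB hostC"')
print(preferred_master(candidates, ["hostA", "hostB", "hostC", "hostX"]))  # hostA
print(preferred_master(candidates, ["hostB", "hostC", "hostX"]))           # hostB
print(preferred_master(candidates, ["hostX"]))                             # None
```

In the last call hostX is up but is not in the list, so under this model it stays slave-only and no master is returned.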
Considerations
Generally, the files lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates should be identical.
When the cluster is started up or reconfigured, LSF rereads configuration files and compares lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates.
In some cases in which identical files are not shared, files may be out of sync. This section describes situations that may arise should lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates not be identical to those of the elected master host.
LSF_MASTER_LIST defined
When LSF_MASTER_LIST is defined, LSF only rejects candidate master hosts listed in LSF_MASTER_LIST from the cluster if the number of load indices in lsf.cluster.cluster_name or lsf.shared for master candidates is different from the number of load indices in the lsf.cluster.cluster_name or lsf.shared files of the elected master.
A warning is logged in the log file lim.log.master_host_name and the cluster continues to run, but without the hosts that were rejected.
If you want the hosts that were rejected to be part of the cluster, ensure the number of load indices in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates and restart LIMs on the master and all master candidates:
lsadmin limrestart hostA hostB hostC
LSF_MASTER_LIST defined, and master host goes down
If LSF_MASTER_LIST is defined and the elected master host goes down, and if the number of load indices in lsf.cluster.cluster_name or lsf.shared for the new elected master is different from the number of load indices in the files of the master that went down, LSF will reject all master candidates that do not have the same number of load indices in their files as the newly elected master. LSF will also reject all slave-only hosts. This could cause a situation in which only the newly elected master is considered part of the cluster.
A warning is logged in the log file lim.log.new_master_host_name and the cluster continues to run, but without the hosts that were rejected.
To resolve this, from the current master host, restart all LIMs:
lsadmin limrestart all
All slave-only hosts will be considered part of the cluster. Master candidates with a different number of load indices in their lsf.cluster.cluster_name or lsf.shared files will be rejected.
When the master that was down comes back up, you will have the same situation as described in LSF_MASTER_LIST defined. You will need to ensure load indices defined in lsf.cluster.cluster_name and lsf.shared for all master candidates are identical and restart LIMs on all master candidates.
Improving performance of mbatchd query requests on UNIX
You can improve mbatchd query performance on UNIX systems using the following methods:
- Multithreading: On UNIX platforms that support thread programming, you can change default mbatchd behavior to use multithreading and increase performance of query requests when you use the bjobs command. Multithreading is beneficial for busy clusters with many jobs and frequent query requests. This may indirectly increase overall mbatchd performance.
- Hard CPU affinity: You can specify the master host CPUs on which mbatchd child query processes can run. This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.
In this section
- How mbatchd works without multithreading
- Configure mbatchd to use multithreading
- Set a query-dedicated port for mbatchd
- Specify an expiry time for child mbatchds (optional)
- Specify hard CPU affinity
- Configure mbatchd to push new job information to child mbatchd
How mbatchd works without multithreading
Ports
By default, mbatchd uses the port defined by the parameter LSB_MBD_PORT in lsf.conf or looks into the system services database for port numbers to communicate with LIM and job request commands.
It uses this port number to receive query requests from clients.
Servicing requests
For every query request received, mbatchd forks a child mbatchd to service the request. Each child mbatchd processes the request and then exits.
Configure mbatchd to use multithreading
When mbatchd has a dedicated port specified by the parameter LSB_QUERY_PORT in lsf.conf, it forks a child mbatchd which in turn creates threads to process query requests.
As soon as mbatchd has forked a child mbatchd, the child mbatchd takes over and listens on the port to process more query requests. For each query request, the child mbatchd creates a thread to process it.
The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job status changes, a new job is submitted, or until the time specified in MBD_REFRESH_TIME in lsb.params has passed.
MBD_REFRESH_TIME specifies the interval, in seconds, at which mbatchd forks a new child mbatchd to service query requests, so that the information sent back to clients stays up to date. A child mbatchd processes query requests by creating threads.
MBD_REFRESH_TIME has the following syntax:
MBD_REFRESH_TIME=seconds [min_refresh_time]
where min_refresh_time defines the minimum time (in seconds) that the child mbatchd will stay to handle queries. The valid range for MBD_REFRESH_TIME is 0 to 300 seconds, and its default is 5 seconds.
- If MBD_REFRESH_TIME is < min_refresh_time, the child mbatchd exits at MBD_REFRESH_TIME even if the job changes status or a new job is submitted before MBD_REFRESH_TIME expires.
- If MBD_REFRESH_TIME > min_refresh_time
  - the child mbatchd exits at min_refresh_time if a job changes status or a new job is submitted before the min_refresh_time
  - the child mbatchd exits after the min_refresh_time when a job changes status or a new job is submitted
- If MBD_REFRESH_TIME > min_refresh_time and no job changes status or a new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME
The default for min_refresh_time is 10 seconds.
If you use the bjobs command and do not get up-to-date information, you may want to decrease the value of MBD_REFRESH_TIME or MIN_REFRESH_TIME in lsb.params to make it likely that successive job queries could get the newly-submitted job information.
note: Lowering the value of MBD_REFRESH_TIME or MIN_REFRESH_TIME increases the load on mbatchd and might negatively affect performance.
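The exit rules for a child mbatchd can be written out as a small Python function. This is one reading of the rules listed above, not LSF source code: the function name is hypothetical, times are measured in seconds from the moment the child is forked, and treating MBD_REFRESH_TIME equal to min_refresh_time like the "less than" case is an assumption, because the text does not cover it.

```python
# Hypothetical model of when a child mbatchd exits; an interpretation, not LSF code.
from typing import Optional


def child_exit_time(refresh: float, min_refresh: float,
                    event_at: Optional[float] = None) -> float:
    """Return the seconds after forking at which the child mbatchd exits.

    refresh     -- MBD_REFRESH_TIME
    min_refresh -- min_refresh_time
    event_at    -- time of the first job status change or new job submission,
                   or None if nothing happens
    """
    if refresh <= min_refresh:
        # Events are ignored; the child lives until MBD_REFRESH_TIME.
        # (The equality case is an assumption.)
        return refresh
    if event_at is None or event_at >= refresh:
        # No event in time: the child lives until MBD_REFRESH_TIME.
        return refresh
    # An event before min_refresh_time is held until min_refresh_time;
    # an event after it ends the child at the time of the event.
    return max(event_at, min_refresh)


print(child_exit_time(5, 10, event_at=2))    # defaults: exits at 5
print(child_exit_time(30, 10, event_at=3))   # exits at 10
print(child_exit_time(30, 10, event_at=15))  # exits at 15
print(child_exit_time(30, 10))               # exits at 30
```

With the default values (5 and 10), job events never shorten the life of the child, so under this reading a bjobs query can lag by up to 5 seconds.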
- Specify a query-dedicated port for the mbatchd by setting LSB_QUERY_PORT in lsf.conf.
See Set a query-dedicated port for mbatchd.
- Optional: Set an interval of time to indicate when a new child mbatchd is to be forked by setting MBD_REFRESH_TIME in lsb.params. The default value of MBD_REFRESH_TIME is 5 seconds, and valid values are 0-300 seconds.
See Specify an expiry time for child mbatchds (optional).
- Optional: Use NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.
See Configure mbatchd to push new job information to child mbatchd.
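Before editing lsb.params, it can help to sanity-check the value you plan to use for MBD_REFRESH_TIME. The following Python sketch is a hypothetical pre-flight check, not an LSF tool. It assumes the value is written as one or two whitespace-separated integers, takes the 0 to 300 range and the defaults (5 and 10 seconds) from the text, and does not enforce an upper bound on min_refresh_time, because none is stated.

```python
# Hypothetical pre-flight check for an MBD_REFRESH_TIME value; not an LSF tool.
from typing import Tuple

DEFAULT_REFRESH = 5       # default MBD_REFRESH_TIME, per the text
DEFAULT_MIN_REFRESH = 10  # default min_refresh_time, per the text


def parse_mbd_refresh_time(value: str) -> Tuple[int, int]:
    """Parse "seconds [min_refresh_time]" into (refresh, min_refresh).

    Raises ValueError for non-integer fields, extra fields, a refresh time
    outside 0 to 300, or a negative min_refresh_time.
    """
    parts = value.split()
    if not parts:
        return DEFAULT_REFRESH, DEFAULT_MIN_REFRESH
    if len(parts) > 2:
        raise ValueError("expected: seconds [min_refresh_time]")
    refresh = int(parts[0])  # int() raises ValueError on bad input
    if not 0 <= refresh <= 300:
        raise ValueError("MBD_REFRESH_TIME must be between 0 and 300 seconds")
    min_refresh = int(parts[1]) if len(parts) == 2 else DEFAULT_MIN_REFRESH
    if min_refresh < 0:
        raise ValueError("min_refresh_time must not be negative")
    return refresh, min_refresh


print(parse_mbd_refresh_time("30"))     # (30, 10)
print(parse_mbd_refresh_time("30 15"))  # (30, 15)
```

A value such as "301" is rejected with ValueError, which mirrors the documented upper limit of 300 seconds.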
Set a query-dedicated port for mbatchd
To change the default mbatchd behavior so that mbatchd forks a child mbatchd that can create threads, specify a port number with LSB_QUERY_PORT in lsf.conf.
tip: This configuration only works on UNIX platforms that support thread programming.
- Log on to the host as the primary LSF administrator.
- Edit lsf.conf.
- Add the LSB_QUERY_PORT parameter and specify a port number that will be dedicated to receiving requests from hosts.
- Save the lsf.conf file.
- Reconfigure the cluster:
badmin mbdrestart
Specify an expiry time for child mbatchds (optional)
Use MBD_REFRESH_TIME in lsb.params to define how often mbatchd forks a new child mbatchd.
- Log on to the host as the primary LSF administrator.
- Edit lsb.params.
- Add the MBD_REFRESH_TIME parameter and specify a time interval in seconds to fork a child mbatchd.
The default value for this parameter is 5 seconds. Valid values are 0 to 300 seconds.
- Save the lsb.params file.
- Reconfigure the cluster as follows:
badmin reconfig
Specify hard CPU affinity
You can specify the master host CPUs on which mbatchd child query processes can run (hard CPU affinity). This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.
When you define this parameter, LSF runs mbatchd child query processes only on the specified CPUs. However, the operating system can assign other processes to run on the same CPU if utilization of the bound CPU is lower than utilization of the unbound CPUs.
- Identify the CPUs on the master host that will run mbatchd child query processes.
  - Linux: To obtain a list of valid CPUs, view the file /proc/cpuinfo (for example, run the command cat /proc/cpuinfo)
  - Solaris: To obtain a list of valid CPUs, run the command psrinfo
- In the file lsb.params, define the parameter MBD_QUERY_CPUS.
For example, if you specify:
MBD_QUERY_CPUS=1 2
the mbatchd child query processes will run only on CPU numbers 1 and 2 on the master host.
You can specify CPU affinity only for master hosts that use one of the following operating systems:
  - Linux 2.6 or higher
  - Solaris 8 or higher
If failover to a master host candidate occurs, LSF maintains the hard CPU affinity, provided that the master host candidate has the same CPU configuration as the original master host. If the configuration differs, LSF ignores the CPU list and reverts to default behavior.
- Verify that the mbatchd child query processes are bound to the correct CPUs on the master host.
  - Start up a query process by running a query command such as bjobs.
  - Check to see that the query process is bound to the correct CPU.
    - Linux: Run the command taskset -p <pid>
    - Solaris: Run the command ps -AP
Configure mbatchd to push new job information to child mbatchd
Prerequisites: LSB_QUERY_PORT must be defined in lsf.conf.
If you have enabled multithreaded mbatchd support, the bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because child mbatchd job information is not updated. Use NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.
When NEWJOB_REFRESH=Y, the parent mbatchd pushes new job information to a child mbatchd. Job queries with bjobs display new jobs submitted after the child mbatchd was created.
- Log on to the host as the primary LSF administrator.
- Edit lsb.params.
- Add NEWJOB_REFRESH=Y.
You should set MBD_REFRESH_TIME in lsb.params to a value greater than 10 seconds.
- Save the lsb.params file.
- Reconfigure the cluster as follows:
badmin reconfig
Platform Computing Inc.
www.platform.com