How the System Works
LSF can be configured in different ways that affect the scheduling of jobs. By default, this is how LSF handles a new job:
- Receive the job. Create a job file. Return the job ID to the user.
- Schedule the job and select the best available host.
- Dispatch the job to a selected host.
- Set the environment on the host.
- Start the job.
The life cycle of a job starts when you submit the job to LSF. On the command line, bsub is used to submit jobs, and you can specify many options to bsub to modify the default behavior, including the use of a JSDL file. Jobs must be submitted to a queue.
Queues represent a set of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. Queues implement different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts.
A queue is a network-wide holding place for jobs. Jobs enter the queue via the bsub command. LSF can be configured to have one or more default queues. Jobs that are not submitted to a specific queue will be assigned to the first default queue that accepts them. Queues have the following attributes associated with them:
- Priority, where a larger integer is a higher priority
- Name, which uniquely identifies the queue
- Queue limits, which restrict hosts, number of jobs, users, groups, processors, etc.
- Standard UNIX limits: memory, swap, process, CPU, etc.
- Scheduling policies: FCFS, fairshare, preemptive, exclusive
- Run conditions
- Load-sharing threshold conditions, which apply load sharing to the queue
- nice(1) value, which sets the UNIX scheduler priority
Example queue:

Begin Queue
QUEUE_NAME  = normal
PRIORITY    = 30
STACKLIMIT  = 2048
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
QJOB_LIMIT  = 60   # job limit of the queue
PJOB_LIMIT  = 2    # job limit per processor
ut          = 0.2
io          = 50/240
USERS       = all
HOSTS       = all
NICE        = 20
End Queue
Queue priority

Queue priority defines the order in which queues are searched to determine which job will be processed. Queues are assigned a priority by the LSF administrator, where a higher number has a higher priority. Queues are serviced by LSF in order of priority from the highest to the lowest. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
Automatic queue selection
Typically, a cluster has multiple queues. When you submit a job to LSF you might define which queue the job will enter. If you submit a job without specifying a queue name, LSF considers the requirements of the job and automatically chooses a suitable queue from a list of candidate default queues. If you did not define any candidate default queues, LSF will create a new queue using all the default settings, and submit the job to that queue.
Viewing default queues
Use bparams to display the default queues:

bparams
Default Queues:  normal ...
The user can override this list by defining the environment variable LSB_DEFAULTQUEUE.
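The precedence just described can be sketched in Python. This is a simplified model, not LSF's implementation; treating the DEFAULT_QUEUE parameter value as a space-separated list of queue names is an assumption for illustration:

```python
# Sketch of candidate default-queue resolution: the per-user
# LSB_DEFAULTQUEUE environment variable overrides the cluster-wide
# DEFAULT_QUEUE parameter. Illustrative model only, not LSF source.
import os

def candidate_queues(default_queue_param, environ=os.environ):
    """Return the list of candidate default queues, in order."""
    user_list = environ.get("LSB_DEFAULTQUEUE")
    chosen = user_list if user_list else default_queue_param
    return chosen.split()

print(candidate_queues("normal idle", environ={}))
# ['normal', 'idle']
print(candidate_queues("normal idle", environ={"LSB_DEFAULTQUEUE": "short"}))
# ['short']
```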
How automatic queue selection works
LSF selects a suitable queue according to:
- User access restriction - Queues that do not allow this user to submit jobs are not considered.
- Host restriction - If the job explicitly specifies a list of hosts on which the job can be run, then the selected queue must be configured to send jobs to all hosts in the list.
- Queue status - Closed queues are not considered.
- Exclusive execution restriction - If the job requires exclusive execution, then queues that are not configured to accept exclusive jobs are not considered.
- Job's requested resources - These must be within the resource allocation limits of the selected queue.
If multiple queues satisfy the above requirements, then the first queue listed in the candidate queues (as defined by the DEFAULT_QUEUE parameter or the LSB_DEFAULTQUEUE environment variable) that satisfies the requirements is selected.
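The selection rules above can be sketched as a filter over the candidate queues. This is a simplified Python model, not LSF's implementation; the Queue and Job fields here are illustrative stand-ins for LSF's real configuration:

```python
# Simplified sketch of automatic queue selection: the first candidate
# queue that passes every restriction is chosen. Field names are
# assumptions for illustration, not LSF's real data structures.
from dataclasses import dataclass, field

@dataclass
class Queue:
    name: str
    open: bool = True                        # closed queues are skipped
    allowed_users: set = field(default_factory=lambda: {"all"})
    hosts: set = field(default_factory=lambda: {"all"})
    accepts_exclusive: bool = False
    max_mem: int = 2 ** 30                   # resource allocation limit (MB)

@dataclass
class Job:
    user: str
    hosts: set = field(default_factory=set)  # explicit host list; empty = any
    exclusive: bool = False
    mem: int = 0                             # requested memory (MB)

def select_queue(job, candidates):
    """Return the first candidate queue that satisfies all restrictions."""
    for q in candidates:
        if not q.open:
            continue                         # queue status
        if "all" not in q.allowed_users and job.user not in q.allowed_users:
            continue                         # user access restriction
        if job.hosts and "all" not in q.hosts and not job.hosts <= q.hosts:
            continue                         # host restriction
        if job.exclusive and not q.accepts_exclusive:
            continue                         # exclusive execution restriction
        if job.mem > q.max_mem:
            continue                         # resource allocation limits
        return q
    return None

queues = [Queue("night", open=False), Queue("normal"), Queue("priority")]
print(select_queue(Job(user="alice", mem=512), queues).name)
# normal  (first open queue that accepts the job)
```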
When a batch job is submitted to a queue, LSF Batch holds it in a job file until conditions are right for it to be executed. Then the job file is used to execute the job.
On UNIX, the job file is a Bourne shell script run at execution time. On Windows, the job file is a batch file processed at execution time.
Job Scheduling and Dispatch
Submitted jobs sit in queues until they are scheduled and dispatched to a host for execution. When a job is submitted to LSF, many factors control when and where the job starts to run:
- Active time window of the queue or hosts
- Resource requirements of the job
- Availability of eligible hosts
- Various job slot limits
- Job dependency conditions
- Fairshare constraints
- Load conditions
First-Come, First-Served (FCFS) scheduling
By default, jobs in a queue are dispatched in first-come, first-served (FCFS) order. This means that jobs are dispatched according to their order in the queue. Since jobs are ordered according to job priority, this does not necessarily mean that jobs will be dispatched in the order of submission. The order of jobs in the queue can also be modified by the user or administrator.
Service level agreement (SLA) scheduling
An SLA in LSF is a "just-in-time" scheduling policy that defines an agreement between LSF administrators and LSF users. The SLA scheduling policy defines how many jobs should be run from each SLA to meet the configured goals.
Fairshare scheduling and other policies
If a fairshare scheduling policy has been specified for the queue or if host partitions have been configured, jobs are dispatched in accordance with these policies instead. To solve diverse problems, LSF allows multiple scheduling policies in the same cluster. LSF has several queue scheduling policies such as exclusive, preemptive, fairshare, and hierarchical fairshare.
Scheduling and dispatch
Jobs are scheduled at regular intervals (5 seconds by default, configured by the parameter JOB_SCHEDULING_INTERVAL in
lsb.params). Once jobs are scheduled, they can be immediately dispatched to hosts.
To prevent overloading any host, LSF waits a short time between dispatching jobs to the same host. The delay is configured by the JOB_ACCEPT_INTERVAL parameter in
lsb.queues. JOB_ACCEPT_INTERVAL controls the number of seconds to wait after dispatching a job to a host before dispatching a second job to the same host. The default is 60 seconds. If JOB_ACCEPT_INTERVAL is set to zero, more than one job can be started on a host at a time. A host running a short job which finishes before JOB_ACCEPT_INTERVAL has elapsed is free to accept a new job without waiting.
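The JOB_ACCEPT_INTERVAL behavior can be modeled as a simple per-host throttle. This is a Python sketch of the rule described above, not LSF code:

```python
# Sketch of the JOB_ACCEPT_INTERVAL throttle: after a host accepts a
# job, it is ineligible for further dispatch until the interval elapses
# or the job finishes early. Simplified model, not LSF's implementation.
class DispatchThrottle:
    def __init__(self, accept_interval=60):
        self.accept_interval = accept_interval  # seconds; 0 disables throttling
        self.last_dispatch = {}                 # host -> time of last dispatch

    def can_dispatch(self, host, now):
        last = self.last_dispatch.get(host)
        if last is None or self.accept_interval == 0:
            return True
        return now - last >= self.accept_interval

    def record_dispatch(self, host, now):
        self.last_dispatch[host] = now

    def job_finished(self, host):
        # A short job that finishes frees the host immediately.
        self.last_dispatch.pop(host, None)

t = DispatchThrottle(accept_interval=60)
t.record_dispatch("hostA", now=0)
print(t.can_dispatch("hostA", now=30))   # False: still within the interval
t.job_finished("hostA")
print(t.can_dispatch("hostA", now=30))   # True: the job finished early
```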
Some operating systems, such as Linux and AIX, let you increase the number of file descriptors that can be allocated to the master host. If you want fast job dispatching, you do not need to limit the number of file descriptors to 1024. To take advantage of the greater number of file descriptors, you must set the parameter LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf to a value greater than 300 and less than or equal to one-half the value of MAX_SBD_CONNS defined in lsb.params. LSB_MAX_JOB_DISPATCH_PER_SESSION defines the maximum number of jobs that mbatchd can dispatch during one job scheduling session. You must restart sbatchd when you change the value of this parameter for the change to take effect.
Jobs are not necessarily dispatched in order of submission.
Each queue has a priority number set by an LSF Administrator when the queue is defined. LSF tries to start jobs from the highest priority queue first.
By default, LSF considers jobs for dispatch in the following order:
- For each queue, from highest to lowest priority. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
- For each job in the queue, according to FCFS order
- If any host is eligible to run this job, start the job on the best eligible host, and mark that host ineligible to start any other job until JOB_ACCEPT_INTERVAL has passed
Jobs can be dispatched out of turn if pre-execution conditions are not met, specific hosts or resources are busy or unavailable, or a user has reached the user job slot limit.
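The default dispatch order above can be sketched as follows. This is a simplified Python model; as described, jobs in equal-priority queues are merged in first-come, first-served order:

```python
# Sketch of the default dispatch order: queues from highest to lowest
# priority; jobs from equal-priority queues merged by submission time.
# Illustrative model only, not LSF's scheduler.
def dispatch_order(queues):
    """queues: list of (priority, [(submit_time, job_name), ...]).
    Returns the order in which jobs are considered for dispatch."""
    by_priority = {}
    for prio, jobs in queues:
        by_priority.setdefault(prio, []).extend(jobs)
    order = []
    for prio in sorted(by_priority, reverse=True):   # highest priority first
        # FCFS within the same priority level, across queues
        order.extend(name for _, name in sorted(by_priority[prio]))
    return order

queues = [(30, [(1, "j1"), (4, "j2")]), (50, [(2, "j3")]), (30, [(3, "j4")])]
print(dispatch_order(queues))
# ['j3', 'j1', 'j4', 'j2']
```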
Viewing job order in queue
Use bjobs to see the order in which jobs in a queue will actually be dispatched for the FCFS policy.
Changing job order in queue (btop and bbot)
Use the btop and bbot commands to change the job order in the queue.
See Changing Job Order Within Queues for more information.
Eligible hosts

Each time LSF attempts to dispatch a job, it checks to see which hosts are eligible to run the job. A number of conditions determine whether a host is eligible:
- Host dispatch windows
- Resource requirements of the job
- Resource requirements of the queue
- Host list of the queue
- Host load levels
- Job slot limits of the host
A host is only eligible to run a job if all the conditions are met. If a job is queued and there is an eligible host for that job, the job is placed on that host. If more than one host is eligible, the job is started on the best host based on both the job and the queue resource requirements.
Host load levels
A host is available if the values of the load indices (such as
mem) of the host are within the configured scheduling thresholds. There are two sets of scheduling thresholds: host and queue. If any load index on the host exceeds the corresponding host threshold or queue threshold, the host is not eligible to run any job.
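The two-level threshold check can be sketched in Python. This is a simplification: it treats every index as "higher is worse", whereas some real LSF indices, such as mem, are thresholds on available capacity:

```python
# Sketch of host eligibility against scheduling thresholds: a host is
# eligible only if no load index exceeds either its host-level or its
# queue-level threshold. Simplified model, not LSF's implementation.
def host_eligible(load, host_thresholds, queue_thresholds):
    """load: {index: value}; thresholds: {index: max allowed value}.
    Missing thresholds are treated as unlimited."""
    for index, value in load.items():
        if value > host_thresholds.get(index, float("inf")):
            return False                     # exceeds the host threshold
        if value > queue_thresholds.get(index, float("inf")):
            return False                     # exceeds the queue threshold
    return True

load = {"r1m": 0.3, "pg": 2.5, "ut": 0.4}
print(host_eligible(load, {"ut": 0.9}, {"r1m": 0.4, "pg": 3.0}))  # True
print(host_eligible(load, {"ut": 0.9}, {"r1m": 0.2}))             # False: r1m too high
```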
Viewing host load levels
- Use the bhosts -l command to display the host thresholds.
- Use the bqueues -l command to display the queue thresholds.
When LSF tries to place a job, it obtains current load information for all hosts.
The load levels on each host are compared to the scheduling thresholds configured for that host in the lsb.hosts file, as well as the per-queue scheduling thresholds configured in lsb.queues. If any load index exceeds either its per-queue or its per-host scheduling threshold, no new job is started on that host.
Viewing eligible hosts
Use the bjobs -lp command to display the names of hosts that cannot accept a job at the moment, together with the reasons the job cannot be accepted.
Resource requirements at the queue level can also be used to specify scheduling conditions (for example, r1m<0.4 && pg<3).
A higher priority or earlier batch job is only bypassed if no hosts are available that meet the requirements of that job.
If a host is available but is not eligible to run a particular job, LSF looks for a later job to start on that host. LSF starts the first job found for which that host is eligible.
Job Execution Environment
When LSF runs your jobs, it tries to make execution as transparent to the user as possible. By default, the execution environment is maintained to be as close to the submission environment as possible. LSF copies the environment from the submission host to the execution host. The execution environment includes the following:
- Environment variables needed by the job
- Working directory where the job begins running
- Other system-dependent environment settings, such as resource usage limits
Since a network can be heterogeneous, it is often impossible or undesirable to reproduce the submission host's execution environment on the execution host. For example, if the home directory is not shared between the submission and execution hosts, LSF runs the job in the /tmp directory on the execution host. If the DISPLAY environment variable is something like :0.0, then it must be processed before it can be used on the execution host. LSF handles these cases automatically.
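Two of the adjustments described above can be sketched in Python. This is purely illustrative; the exact rewriting LSF performs is internal to LSF:

```python
# Sketch of two execution-environment adjustments (illustrative only):
# falling back to /tmp when the submission working directory does not
# exist on the execution host, and qualifying a bare DISPLAY value so
# it refers back to the submission host.
import os

def choose_cwd(submit_cwd, path_exists=os.path.isdir):
    """Use the submission directory if it exists here, else /tmp."""
    return submit_cwd if path_exists(submit_cwd) else "/tmp"

def rewrite_display(display, submission_host):
    """A bare ':0.0' only makes sense on the submission host; qualify it."""
    if display.startswith(":"):
        return submission_host + display
    return display

print(choose_cwd("/no/such/dir", path_exists=lambda p: False))  # /tmp
print(rewrite_display(":0.0", "hostA"))                         # hostA:0.0
```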
To change the default execution environment, you can configure a job starter. For resource control, LSF also changes parts of the execution environment of jobs, such as nice values and resource usage limits.
Shared user directories
LSF works best when user home directories are shared across all hosts in the cluster. To provide transparent remote execution, you should share user home directories on all LSF hosts.
To provide transparent remote execution, LSF commands determine the user's current working directory and use that directory on the remote host.
For example, if the command cc file.c is executed remotely, cc only finds the correct file.c if the remote command runs in the same directory.
LSF automatically creates a .lsbatch subdirectory in the user's home directory on the execution host. This directory is used to store temporary input and output files for jobs.
Executables and the PATH environment variable
Search paths for executables (the PATH environment variable) are passed to the remote execution host unchanged. In mixed clusters, LSF works best when the user binary directories (for example,
/usr/local/bin) have the same path names on different host types. This makes the PATH variable valid on all hosts.
LSF configuration files are normally stored in a shared directory. This makes administration easier. There is little performance penalty for this, because the configuration files are not frequently read.
Fault Tolerance
LSF is designed to continue operating even if some of the hosts in the cluster are unavailable. One host in the cluster acts as the master, but if the master host becomes unavailable another host takes over. LSF is available as long as there is one available host in the cluster.
LSF can tolerate the failure of any host or group of hosts in the cluster. When a host crashes, all jobs running on that host are lost. No other pending or running jobs are affected. Important jobs can be submitted to LSF with an option to automatically restart if the job is lost because of a host failure.
Dynamic master host
The LSF master host is chosen dynamically. If the current master host becomes unavailable, another host takes over automatically. The failover master host is selected from the list defined by the LSF_MASTER_LIST parameter in lsf.conf (set in install.config at installation). The first available host in the list acts as the master. LSF might be unavailable for a few minutes while hosts are waiting to be contacted by the new master.
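The failover rule (the first available host in the configured master list acts as master) can be sketched as:

```python
# Sketch of failover master selection from LSF_MASTER_LIST: scan the
# configured list in order and pick the first available host.
# Simplified model, not LSF's election code.
def pick_master(master_list, is_available):
    for host in master_list:
        if is_available(host):
            return host
    return None   # no host available: batch services are down

masters = ["hostA", "hostB", "hostC"]
up = {"hostB", "hostC"}
print(pick_master(masters, lambda h: h in up))   # hostB takes over
```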
Running jobs are managed by sbatchd on each server host. When the new mbatchd starts, it polls the sbatchd on each host and finds the current status of its jobs. If sbatchd fails but the host is still running, jobs running on the host are not lost. When sbatchd is restarted it regains control of all jobs running on the host.
If the cluster is partitioned by a network failure, a master LIM takes over on each side of the partition. Interactive load-sharing remains available, as long as each host still has access to the LSF executables.
Event log file (lsb.events)
Fault tolerance in LSF depends on the event log file, lsb.events, which is kept on the primary file server. Every event in the system is logged in this file, including all job submissions and job and host status changes. If the master host becomes unavailable, a new master is chosen by lim, and sbatchd on the new master starts a new mbatchd. The new mbatchd reads the lsb.events file to recover the state of the system.
For sites not wanting to rely solely on a central file server for recovery information, LSF can be configured to maintain a duplicate event log by keeping a replica of lsb.events. The replica is stored on the file server and used if the primary copy is unavailable. When using LSF's duplicate event log function, the primary event log is stored on the first master host and re-synchronized with the replicated copy when the host recovers.
If the network is partitioned, only one of the partitions can access lsb.events, so batch services are only available on one side of the partition. A lock file is used to make sure that only one mbatchd is running in the cluster.
If an LSF server host fails, jobs running on that host are lost. No other jobs are affected. Jobs can be submitted as rerunnable, so that they automatically run again from the beginning, or as checkpointable, so that they start again from a checkpoint on another host if they are lost because of a host failure.
If all of the hosts in a cluster go down, all running jobs are lost. When a host comes back up and takes over as master, it reads the lsb.events file to get the state of all batch jobs. Jobs that were running when the systems went down are assumed to have exited, and email is sent to the submitting user. Pending jobs remain in their queues, and are scheduled as hosts become available.
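The recovery behavior described above, replaying the event log and marking previously running jobs as exited, can be sketched in Python. The event format here is invented for illustration; the real lsb.events format differs:

```python
# Sketch of recovering batch state by replaying an event log, in the
# spirit of lsb.events. After a full-cluster failure, jobs that were
# running are assumed to have exited; pending jobs stay queued.
def replay(events):
    """events: list of (event_type, job_id). Returns job_id -> state."""
    state = {}
    for event, job in events:
        if event == "SUBMIT":
            state[job] = "PEND"
        elif event == "START":
            state[job] = "RUN"
        elif event == "DONE":
            state[job] = "DONE"
    for job, s in state.items():
        if s == "RUN":
            state[job] = "EXIT"   # assumed lost in the cluster failure
    return state

log = [("SUBMIT", 1), ("SUBMIT", 2), ("START", 1), ("SUBMIT", 3), ("DONE", 2)]
print(replay(log))
# {1: 'EXIT', 2: 'DONE', 3: 'PEND'}
```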
Job exception handling
You can configure hosts and queues so that LSF detects exceptional conditions while jobs are running, and takes appropriate action automatically. You can customize which exceptions are detected and the corresponding actions. By default, LSF does not detect any exceptions.
See Handling Host-level Job Exceptions and Handling Job Exceptions in Queues for more information about job-level exception management.
Platform Computing Inc.