Interactive Jobs with bsub
Contents
- About Interactive Jobs
- Submitting Interactive Jobs
- Performance Tuning for Interactive Batch Jobs
- Interactive Batch Job Messaging
- Running X Applications with bsub
- Writing Job Scripts
- Registering utmp File Entries for Interactive Batch Jobs
About Interactive Jobs
It is sometimes desirable from a system management point of view to control all workload through a single centralized scheduler.
Running an interactive job through the LSF batch system allows you to take advantage of batch scheduling policies and host selection features for resource-intensive jobs. You can submit a job and the least loaded host is selected to run the job.
Since all interactive batch jobs are subject to LSF policies, you will have more control over your system. For example, you may dedicate two servers as interactive servers, and disable interactive access to all other servers by defining an interactive queue that only uses the two interactive servers.
Scheduling policies
Running an interactive batch job allows you to take advantage of batch scheduling policies and host selection features for resource-intensive jobs.
An interactive batch job is scheduled using the same policy as all other jobs in a queue. This means an interactive job can wait for a long time before it gets dispatched. If fast response time is required, interactive jobs should be submitted to high-priority queues with loose scheduling constraints.
Interactive queues
You can configure a queue to be interactive-only, batch-only, or both interactive and batch with the parameter INTERACTIVE in lsb.queues.
See the Platform LSF Configuration Reference for information about configuring interactive queues in the lsb.queues file.
Interactive jobs with non-batch utilities
Non-batch utilities such as lsrun, lsgrun, etc., use LIM simple placement advice for host selection when running interactive tasks. For more details on using non-batch utilities to run interactive tasks, see Running Interactive and Remote Tasks.
Submitting Interactive Jobs
Use the bsub -I option to submit batch interactive jobs, and the bsub -Is and -Ip options to submit batch interactive jobs in pseudo-terminals.
Pseudo-terminals are not supported for Windows.
For more details, see the bsub command.
Finding out which queues accept interactive jobs
Before you submit an interactive job, use the bqueues -l command to find out which queues accept interactive jobs.
If the output of this command contains the following, this is a batch-only queue. This queue does not accept interactive jobs:
SCHEDULING POLICIES: NO_INTERACTIVE
If the output contains the following, this is an interactive-only queue:
SCHEDULING POLICIES: ONLY_INTERACTIVE
If neither of the above is defined, or if SCHEDULING POLICIES is not in the output of bqueues -l, both interactive and batch jobs are accepted by the queue.
You configure interactive queues in the lsb.queues file.
Submit an interactive job
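Before submitting, the queue check described above can be scripted. The following is a minimal sketch and not part of LSF: classify_queue is a hypothetical helper name, it assumes the bqueues -l output for a single queue arrives on standard input, and it relies only on the NO_INTERACTIVE and ONLY_INTERACTIVE strings mentioned above.

```shell
# Classify a queue from `bqueues -l` output supplied on standard input.
# Prints one of: batch-only, interactive-only, both.
# classify_queue is a hypothetical helper, not an LSF command.
classify_queue() {
    # Keep only the SCHEDULING POLICIES line, if the output has one.
    policies=$(grep 'SCHEDULING POLICIES' || true)
    case "$policies" in
        *ONLY_INTERACTIVE*) echo "interactive-only" ;;
        *NO_INTERACTIVE*)   echo "batch-only" ;;
        *)                  echo "both" ;;
    esac
}

# Intended use on a host with LSF installed (the queue name is an example):
#   bqueues -l normal | classify_queue
```

If bqueues -l is run without a queue name it reports every queue, so the output would need to be split per queue before being passed to a helper like this one.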
- Use the bsub -I option to submit an interactive batch job.
For example:
bsub -I ls
Submits a batch interactive job which displays the output of ls at the user's terminal.
% bsub -I -q interactive -n 4,10 lsmake
<<Waiting for dispatch ...>>
This example starts Platform Make on 4 to 10 processors and displays the output on the terminal.
A new job cannot be submitted until the interactive job is completed or terminated.
When an interactive job is submitted, a message is displayed while the job is awaiting scheduling. The bsub command stops display of output from the shell until the job completes, and no mail is sent to the user by default. A user can press CTRL-C at any time to terminate the job.
Interactive jobs cannot be checkpointed.
Interactive batch jobs cannot be rerunnable (bsub -r).
You can submit interactive batch jobs to rerunnable queues (RERUNNABLE=y in lsb.queues) or rerunnable application profiles (RERUNNABLE=y in lsb.applications).
Submit an interactive job by using a pseudo-terminal
Submission of interactive jobs using a pseudo-terminal is not supported for Windows for either the lsrun or bsub LSF commands.
bsub -Ip
- To submit a batch interactive job by using a pseudo-terminal, use the bsub -Ip option.
For example:
% bsub -Ip vi myfile
Submits a batch interactive job to edit myfile.
When you specify the -Ip option, bsub submits a batch interactive job and creates a pseudo-terminal when the job starts. Some applications, such as vi, require a pseudo-terminal in order to run correctly.
bsub -Is
- To submit a batch interactive job and create a pseudo-terminal with shell mode support, use the bsub -Is option.
For example:
% bsub -Is csh
Submits a batch interactive job that starts up csh as an interactive shell.
When you specify the -Is option, bsub submits a batch interactive job and creates a pseudo-terminal with shell mode support when the job starts. This option should be specified for submitting interactive shells, or applications which redefine the CTRL-C and CTRL-Z keys (for example, jove).
Submit an interactive job and redirect streams to files
bsub -i, -o, -e
You can use the -I option together with the -i, -o, and -e options of bsub to selectively redirect streams to files. For more details, see the bsub(1) man page.
- To save the standard error stream in the job.err file, while standard input and standard output come from the terminal:
% bsub -I -q interactive -e job.err lsmake
Split stdout and stderr
If in your environment there is a wrapper around bsub and LSF commands so that end-users are unaware of LSF and LSF-specific options, you can redirect standard output and standard error of batch interactive jobs to a file with the > operator.
By default, both standard error messages and output messages for batch interactive jobs are written to stdout on the submission host.
- To write both stderr and stdout to mystdout (with the default setting, the job's error messages arrive on stdout, so both streams end up in mystdout):
bsub -I myjob 2>mystderr 1>mystdout
- To redirect stdout and stderr to different files, set LSF_INTERACTIVE_STDERR=y in lsf.conf or as an environment variable.
For example, with LSF_INTERACTIVE_STDERR set:
bsub -I myjob 2>mystderr 1>mystdout
stderr is redirected to mystderr, and stdout to mystdout.
See the Platform LSF Configuration Reference for more details on LSF_INTERACTIVE_STDERR.
Submit an interactive job, redirect streams to files, and display streams
When using any of the interactive bsub options (for example, -I, -Is, or -ISs) together with the -o or -e options, you can also have your output displayed on the console by using the -tty option.
- To run an interactive job, redirect the error stream to a file, and display the stream on the console:
% bsub -I -q interactive -e job.err -tty lsmake
Performance Tuning for Interactive Batch Jobs
LSF is often used on systems that support both interactive and batch users. On one hand, users are often concerned that load sharing will overload their workstations and slow down their interactive tasks. On the other hand, some users want to dedicate some machines for critical batch jobs so that they have guaranteed resources. Even if all your workload is batch jobs, you still want to reduce resource contentions and operating system overhead to maximize the use of your resources.
Numerous parameters can be used to control your resource allocation and to avoid undesirable contention.
Types of load conditions
Because interference is often reflected in the load indices, LSF responds to load changes to avoid or reduce contention. LSF can take actions on jobs to reduce interference before or after jobs are started. These actions are triggered by different load conditions. Most of the conditions can be configured at both the queue level and the host level. Conditions defined at the queue level apply to all hosts used by the queue, while conditions defined at the host level apply to all queues using the host.
Scheduling conditions
These conditions, if met, trigger the start of more jobs. The scheduling conditions are defined in terms of load thresholds or resource requirements.
At the queue level, scheduling conditions are configured as either resource requirements or scheduling load thresholds, as described in lsb.queues. At the host level, the scheduling conditions are defined as scheduling load thresholds, as described in lsb.hosts.
Suspending conditions
These conditions affect running jobs. When these conditions are met, a SUSPEND action is performed on a running job.
At the queue level, suspending conditions are defined as STOP_COND, as described in lsb.queues, or as a suspending load threshold. At the host level, suspending conditions are defined as a stop load threshold, as described in lsb.hosts.
Resuming conditions
These conditions determine when a suspended job can be resumed. When these conditions are met, a RESUME action is performed on a suspended job.
At the queue level, resume conditions are defined by RESUME_COND in lsb.queues, or by the loadSched thresholds for the queue if RESUME_COND is not defined.
Types of load indices
To reduce interference between jobs effectively, choose the load indices that match the kind of contention you want to avoid. Below are examples of a few frequently used indices.
Paging rate (pg)
The paging rate (pg) load index relates strongly to the perceived interactive performance. If a host is paging applications to disk, the user interface feels very slow.
The paging rate is also a reflection of a shortage of physical memory. When an application is being paged in and out frequently, the system is spending a lot of time performing overhead, resulting in reduced performance.
The paging rate load index can be used as a threshold to either stop sending more jobs to the host, or to suspend an already running batch job to give priority to interactive users.
This parameter can be used in different configuration files to achieve different purposes. If you define a paging rate threshold in lsf.cluster.cluster_name, the host becomes busy from LIM's point of view when the threshold is exceeded, so LIM no longer advises jobs to run on this host.
By including paging rate in queue or host scheduling conditions, jobs can be prevented from starting on machines with a heavy paging rate, or can be suspended or even killed if they are interfering with the interactive user on the console.
A job suspended due to the pg threshold will not be resumed, even if the resume conditions are met, unless the machine is interactively idle for more than PG_SUSP_IT seconds.
Interactive idle time (it)
Strict control can be achieved using the idle time (it) index. This index measures the number of minutes since any interactive terminal activity. Interactive terminals include hard wired ttys, rlogin and lslogin sessions, and X shell windows such as xterm. On some hosts, LIM also detects mouse and keyboard activity.
This index is typically used to prevent batch jobs from interfering with interactive activities. By defining the suspending condition in the queue as it<1 && pg>50, a job from this queue will be suspended if the machine is not interactively idle and the paging rate is higher than 50 pages per second. Furthermore, by defining the resuming condition as it>5 && pg<10 in the queue, a suspended job from the queue will not resume unless the host has been idle for more than five minutes and the paging rate is less than ten pages per second.
The it index is only non-zero if no interactive users are active. Setting the it threshold to five minutes allows a reasonable amount of think time for interactive users, while making the machine available for load sharing if the users are logged in but absent.
For lower priority batch queues, it is appropriate to set an it suspending threshold of two minutes and a scheduling threshold of ten minutes in the lsb.queues file. Jobs in these queues are suspended while the execution host is in use, and resume after the host has been idle for a longer period. For hosts where all batch jobs, no matter how important, should be suspended, set a per-host suspending threshold in the lsb.hosts file.
CPU run queue length (r15s, r1m, r15m)
Running more than one CPU-bound process on a machine (or more than one process per CPU for multiprocessors) can reduce the total throughput because of operating system overhead, as well as interfering with interactive users. Some tasks such as compiling can create more than one CPU-intensive task.
Queues should normally set CPU run queue scheduling thresholds below 1.0, so that hosts already running compute-bound jobs are left alone. LSF scales the run queue thresholds for multiprocessor hosts by using the effective run queue lengths, so multiprocessors automatically run one job per processor in this case.
For short to medium-length jobs, the r1m index should be used. For longer jobs, you might want to add an r15m threshold. The exception is high priority queues, where turnaround time is more important than total throughput. For high priority queues, an r1m scheduling threshold of 2.0 is appropriate.
See Load Indices for the concept of effective run queue length.
CPU utilization (ut)
The ut parameter measures the amount of CPU time being used. When all the CPU time on a host is in use, there is little to gain from sending another job to that host unless the host is much more powerful than others on the network. A ut threshold of 90% prevents jobs from going to a host where the CPU does not have spare processing cycles.
If a host has very high pg but low ut, then it may be desirable to suspend some jobs to reduce the contention.
Some commands report the ut percentage as a number from 0 to 100, and some report it as a decimal number between 0 and 1. The configuration parameter in the lsf.cluster.cluster_name file, the other configuration files, and the bsub -R resource requirement string all take a fraction in the range from 0 to 1, so a 90% threshold is written as 0.9.
The command bhist shows the execution history of batch jobs, including the time spent waiting in queues or suspended because of system load.
The command bjobs -p shows why a job is pending.
Scheduling conditions and resource thresholds
Three parameters, RES_REQ, STOP_COND and RESUME_COND, can be specified in the definition of a queue. Scheduling conditions are a more general way of specifying job dispatching conditions at the queue level. These parameters take resource requirement strings as values, which allows you to specify conditions in a more flexible manner than using the loadSched or loadStop thresholds.
Interactive Batch Job Messaging
LSF can display messages to stderr or the Windows console when the following changes occur with interactive batch jobs:
- Job state
- Pending reason
- Suspend reason
Other job status changes, like switching the job's queue, are not displayed.
Limitations
Interactive batch job messaging is not supported in a MultiCluster environment.
Windows
Interactive batch job messaging is not fully supported on Windows. Only changes in the job state that occur before the job starts running are displayed. No messages are displayed after the job starts.
Configure interactive batch job messaging
Messaging for interactive batch jobs can be specified cluster-wide or in the user environment.
Cluster level
- To enable interactive batch job messaging for all users in the cluster, the LSF administrator configures the following parameters in lsf.conf:
- LSB_INTERACT_MSG_ENH=Y
- (Optional) LSB_INTERACT_MSG_INTVAL
LSB_INTERACT_MSG_INTVAL specifies the time interval, in seconds, in which LSF updates messages about any changes to the pending status of the job. The default interval is 60 seconds. LSB_INTERACT_MSG_INTVAL is ignored if LSB_INTERACT_MSG_ENH is not set.
User level
- To enable messaging for interactive batch jobs, LSF users can define LSB_INTERACT_MSG_ENH and LSB_INTERACT_MSG_INTVAL as environment variables.
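A user-level setup might look like the following in a Bourne-style shell. This is only a sketch: the interval of 30 seconds is an arbitrary example value, and the commented bsub line simply reuses a command from elsewhere in this document.

```shell
# Enable interactive batch job messaging for jobs submitted from this shell.
# Bourne-style syntax; csh users would use setenv instead.
LSB_INTERACT_MSG_ENH=Y
LSB_INTERACT_MSG_INTVAL=30   # example value, in seconds (the default is 60)
export LSB_INTERACT_MSG_ENH LSB_INTERACT_MSG_INTVAL

# Jobs submitted afterwards from this shell see the settings, for example:
#   bsub -Is csh
```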
The user-level definition of LSB_INTERACT_MSG_ENH overrides the definition in lsf.conf.
Example messages
Job in pending state
The following example shows messages displayed when a job is in pending state:
bsub -Is -R "ls < 2" csh
Job <2812> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<< Job's resource requirements not satisfied: 2 hosts; >>
<< Load information unavailable: 1 host; >>
<< Just started a job recently: 1 host; >>
<< Load information unavailable: 1 host; >>
<< Job's resource requirements not satisfied: 1 host; >>
Job terminated by user
The following example shows messages displayed when a job in pending state is terminated by the user:
bsub -m hostA -b 13:00 -Is sh
Job <2015> is submitted to default queue <normal>.
Job will be scheduled after Fri Nov 19 13:00:00 1999
<<Waiting for dispatch ...>>
<< New job is waiting for scheduling >>
<< The job has a specified start time >>
bkill 2015
<< Job <2015> has been terminated by user or administrator >>
<<Terminated while pending>>
Job suspended then resumed
The following example shows messages displayed when a job is dispatched, suspended, and then resumed:
bsub -m hostA -Is sh
Job <2020> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<< New job is waiting for scheduling >>
<<Starting on hostA>>
bstop 2020
<< The job was suspended by user >>
bresume 2020
<< Waiting for re-scheduling after being resumed by user >>
Running X Applications with bsub
You can start an X session on the least loaded host by submitting it as a batch job:
bsub xterm
An xterm is started on the least loaded host in the cluster.
When you run X applications using lsrun or bsub, the environment variable DISPLAY is handled properly for you. It behaves as if you were running the X application on the local machine.
Configure SSH X11 forwarding for jobs
Prerequisites: X11 forwarding must already be working outside LSF.
- Install SSH and enable X11 forwarding for all hosts that will submit and run these jobs (UNIX hosts only).
- (Optional) In lsf.conf, specify an SSH command for LSB_SSH_XFORWARD_CMD.
The command can include the full path and options.
Writing Job Scripts
You can build a job file one line at a time, or create it from another file, by running bsub without specifying a job to submit. When you do this, you start an interactive session in which bsub reads command lines from the standard input and submits them as a single batch job. You are prompted with bsub> for each line.
You can use the bsub -Zs command to spool a file.
For more details on bsub options, see the bsub(1) man page.
Writing a job file one line at a time
UNIX example
% bsub -q simulation
bsub> cd /work/data/myhomedir
bsub> myjob arg1 arg2 ......
bsub> rm myjob.log
bsub> ^D
Job <1234> submitted to queue <simulation>.
In the above example, the 3 command lines run as a Bourne shell (/bin/sh) script. Only valid Bourne shell command lines are acceptable in this case.
Windows example
C:\> bsub -q simulation
bsub> cd \\server\data\myhomedir
bsub> myjob arg1 arg2 ......
bsub> del myjob.log
bsub> ^Z
Job <1234> submitted to queue <simulation>.
In the above example, the 3 command lines run as a batch file (.BAT). Note that only valid Windows batch file command lines are acceptable in this case.
Specifying job options in a file
In this example, options to run the job are specified in the options_file.
% bsub -q simulation < options_file
Job <1234> submitted to queue <simulation>.
UNIX
On UNIX, the options_file must be a text file that contains Bourne shell command lines. It cannot be a binary executable file.
Windows
On Windows, the options_file must be a text file containing Windows batch file command lines.
Spooling a job command file
Use bsub -Zs to spool a job command file to the directory specified by the JOB_SPOOL_DIR parameter in lsb.params, and use the spooled file as the command file for the job.
Use the bmod -Zsn command to modify or remove the command file after the job has been submitted. Removing or modifying the original input file does not affect the submitted job.
Redirecting a script to bsub standard input
You can redirect a script to the standard input of the bsub command:
% bsub < myscript
Job <1234> submitted to queue <test>.
In this example, the myscript file contains job submission options as well as command lines to execute. When the bsub command reads a script from its standard input, the script file can be modified right after bsub returns, ready for the next job submission.
When the script is specified on the bsub command line, the script is not spooled:
% bsub myscript
Job <1234> submitted to default queue <normal>.
In this case the command line myscript is spooled, instead of the contents of the myscript file. Later modifications to the myscript file can affect job behavior.
Specifying embedded submission options
You can specify job submission options in scripts read from standard input by the bsub command using lines starting with #BSUB:
% bsub -q simulation
bsub> #BSUB -q test
bsub> #BSUB -o outfile -R "mem>10"
bsub> myjob arg1 arg2
bsub> #BSUB -J simjob
bsub> ^D
Job <1234> submitted to queue <simulation>.
Note that:
- Command-line options override embedded options. In this example, the job is submitted to the simulation queue rather than the test queue.
- Submission options can be specified anywhere in the standard input. In the above example, the -J option of bsub is specified after the command to be run.
- More than one option can be specified on one line, as shown in the example above.
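The same embedded options can be kept in a file and redirected to bsub instead of being typed at the bsub> prompt. The sketch below writes such a file and lists its #BSUB lines as a quick check. It reuses the queue, job name, and command from the example above, the temporary file is only for illustration, and the submission itself is left as a comment because it needs a host with LSF installed.

```shell
# Write a job script with embedded #BSUB options
# (names reused from the interactive example above).
script=$(mktemp)
cat > "$script" <<'EOF'
#BSUB -q test
#BSUB -o outfile -R "mem>10"
#BSUB -J simjob
myjob arg1 arg2
EOF

# List the embedded options as a quick check before submitting.
grep '^#BSUB' "$script"

# On a host with LSF installed, submit by redirecting standard input:
#   bsub < "$script"
```

Submitting with bsub -q simulation < "$script" would again send the job to the simulation queue, since command-line options override embedded ones.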
Running a job under a particular shell
By default, LSF runs batch jobs using the Bourne (/bin/sh) shell. You can specify the shell under which a job is to run. This is done by specifying an interpreter in the first line of the script.
For example:
% bsub
bsub> #!/bin/csh -f
bsub> set coredump=`ls |grep core`
bsub> if ( "$coredump" != "") then
bsub> mv core core.`date | cut -d" " -f1`
bsub> endif
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.
The bsub command must read the job script from standard input to set the execution shell. If you do not specify a shell in the script, the script is run using /bin/sh. If the first line of the script starts with a # not immediately followed by an exclamation mark (!), then /bin/csh is used to run the job.
For example:
% bsub
bsub> # This is a comment line. This tells the system to use /bin/csh to
bsub> # interpret the script.
bsub>
bsub> setenv DAY `date | cut -d" " -f1`
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.
If running jobs under a particular shell is required frequently, you can specify an alternate shell using a command-level job starter and run your jobs interactively. See Controlling Execution Environment Using Job Starters for more details.
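Because a leading comment line silently switches the job to /bin/csh, it can be worth checking a script's first line before submitting it. The following sketch applies the three first-line rules described above. job_shell is a hypothetical helper name and not an LSF command, and it only inspects the text of the script. It does not query LSF.

```shell
# Print the interpreter that the first-line rules above imply for a job
# script read on standard input. job_shell is a hypothetical helper.
job_shell() {
    # Read only the first line; an empty script falls through to the default.
    IFS= read -r first_line || true
    case "$first_line" in
        '#!'*)
            # Explicit interpreter line: print everything after "#!".
            interp=${first_line#\#!}
            printf '%s\n' "$interp"
            ;;
        '#'*)
            # A comment that is not "#!" selects the C shell.
            echo "/bin/csh"
            ;;
        *)
            # Anything else runs under the default Bourne shell.
            echo "/bin/sh"
            ;;
    esac
}

# Intended use:
#   job_shell < myscript
```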
Registering utmp File Entries for Interactive Batch Jobs
LSF administrators can configure the cluster to track user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is. User and account information is registered as entries in the UNIX utmp file, which holds information for commands such as who. Registering user information for interactive batch jobs in utmp allows more accurate job accounting.
Configuration and operation
To enable utmp file registration, the LSF administrator sets the LSB_UTMP parameter in lsf.conf.
When LSB_UTMP is defined, LSF registers the job by adding an entry to the utmp file on the execution host when the job starts. After the job finishes, LSF removes the entry for the job from the utmp file.
Limitations
- Registration of utmp file entries is supported on the following platforms:
- SGI IRIX (6.4 and later)
- Solaris (all versions)
- HP-UX (all versions)
- Linux (all versions)
- utmp file registration is not supported in a MultiCluster environment.
- Because interactive batch jobs submitted with bsub -I are not associated with a pseudo-terminal, utmp file registration is not supported for these jobs.
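To confirm whether LSB_UTMP (or any other lsf.conf parameter mentioned in this document) is defined on a host, the file can be inspected directly. The sketch below assumes lsf.conf consists of NAME=VALUE lines with # comments. lsf_conf_get is a hypothetical helper name, and the usage line assumes the LSF_ENVDIR environment variable points at the directory that holds lsf.conf.

```shell
# Print the value(s) of a parameter from an lsf.conf-style file.
# Prints nothing if the parameter is absent or commented out.
# lsf_conf_get is a hypothetical helper, not an LSF command.
lsf_conf_get() {
    param=$1
    conf_file=$2
    # Match uncommented "NAME = value" lines and strip everything up to the value.
    sed -n "s/^[[:space:]]*${param}[[:space:]]*=[[:space:]]*//p" "$conf_file"
}

# Intended use on a host with LSF installed:
#   lsf_conf_get LSB_UTMP "$LSF_ENVDIR/lsf.conf"
```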
Platform Computing Inc.
www.platform.com