Non-Shared File Systems
- About Directories and Files
- Using LSF with Non-Shared File Systems
- Remote File Access
- File Transfer Mechanism (lsrcp)
About Directories and Files
LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts.
LSF includes support for copying user data to the execution host before running a batch job, and for copying results back after the job executes.
In networks where the file systems are not shared, this can be used to give remote jobs access to local data.
Supported file systems
On UNIX systems, LSF supports the following shared file systems:
- Network File System (NFS). NFS file systems can be mounted permanently or on demand using
- Andrew File System (AFS)
- Distributed File System (DCE/DFS)
On Windows, directories containing LSF files can be shared among hosts from a Windows server machine.
Non-shared directories and files
LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes. See Remote File Access for more information.
Some networks do not share files between hosts. LSF can still be used on these networks, with reduced fault tolerance. See Using LSF with Non-Shared File Systems for information about using LSF in a network without a shared file system.
Using LSF with Non-Shared File Systems
To install LSF on a cluster without shared file systems, follow the complete installation procedure on every host to install all the binaries, man pages, and configuration files.
After you have installed LSF on every host, you must update the configuration files on all hosts so that they contain the complete cluster configuration. Configuration files must be the same on all hosts.
You must choose one host to act as the LSF master host. LSF configuration files and working directories must be installed on this host, and the master host must be listed first in
You can use the parameter LSF_MASTER_LIST in lsf.conf to define which hosts can be considered to be elected master hosts. In some cases, this may improve performance.
For Windows password authentication in a non-shared file system environment, you must define the parameter LSF_MASTER_LIST in lsf.conf so that jobs will run with correct permissions. If you do not define this parameter, LSF assumes that the cluster uses a shared file system environment.
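As an illustration, an lsf.conf entry naming the master candidates might look like the fragment below. The host names hostA, hostB, and hostC are placeholders chosen for this sketch, not values from this document. On a cluster without shared file systems, the same line would need to appear in lsf.conf on every host.

```shell
# lsf.conf (fragment)
# hostA, hostB, hostC are hypothetical host names used only for illustration.
LSF_MASTER_LIST="hostA hostB hostC"
```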
Some fault tolerance can be introduced by choosing more than one host as a possible master host, and using NFS to mount the LSF working directory on only these hosts. All the possible master hosts must be listed first in
cluster_name. As long as one of these hosts is available, LSF continues to operate.
Remote File Access
Using LSF with non-shared file space
LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.
LSF attempts to run a job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the
If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.
LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submission host after the job completes. The file operations are specified with the -f option to bsub.
LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used. See File Transfer Mechanism (lsrcp) for more information.
The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.
- local_file: File name on the submission host
- remote_file: File name on the execution host
The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.
- operator: Operation to perform on the file. The operator must be surrounded by white space.
Valid values for operator are:
- >: local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.
- <: remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.
- <<: remote_file is appended to local_file after the job completes. local_file is created if it does not exist.
- >< or <>: Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.
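The rules for a file specification can be sketched as a small shell function that reports what a given specification asks for. This is an illustrative helper only: describe_fspec is a hypothetical name, it does not call LSF, and it assumes the append operator is written <<, as in the bsub command reference.

```shell
# Describe what a "bsub -f" file specification asks LSF to do.
# Hypothetical helper for illustration; it does not contact LSF.
describe_fspec() {
    set -f        # no file name globbing while splitting the spec
    set -- $1     # split "local_file operator [remote_file]" on white space
    set +f
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "syntax error: expected 'local_file operator [remote_file]'" >&2
        return 1
    fi
    local_file=$1
    operator=$2
    remote_file=${3:-$1}    # remote_file defaults to local_file
    case $operator in
        '>')  echo "before job: copy $local_file to $remote_file" ;;
        '<')  echo "after job: copy $remote_file to $local_file" ;;
        '<<') echo "after job: append $remote_file to $local_file" ;;
        '><'|'<>')
              echo "before job: copy $local_file to $remote_file"
              echo "after job: copy $remote_file to $local_file" ;;
        *)    echo "syntax error: unknown operator '$operator'" >&2
              return 1 ;;
    esac
}
```

For example, describe_fspec "/data/data3 > data3" reports a copy to the execution host before the job, and describe_fspec "batch_data <>" reports a copy in each direction. A specification with no operator, such as "job_out", is rejected.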
If the submission and execution hosts have different directory structures, you must ensure that the directory where local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.
You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
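One way to follow this advice is to derive remote_file from local_file by dropping the directory part. The function below is a minimal sketch of that idea; fspec_into_cwd is a hypothetical name and the function only builds the specification string, it does not submit anything.

```shell
# Build a "-f" specification whose remote_file has no path, so the file lands
# in the job's current working directory on the execution host.
# Hypothetical helper for illustration.
fspec_into_cwd() {
    local_file=$1
    remote_file=${local_file##*/}    # strip everything up to the last "/"
    printf '%s > %s\n' "$local_file" "$remote_file"
}
```

It could be used as bsub -f "$(fspec_into_cwd /data/data3)" myjob data3, which passes the specification "/data/data3 > data3" to bsub.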
If the input file specified with bsub -i is not found on the execution host, the file is copied from the submission host using the LSF remote file access facility and is removed from the execution host after the job finishes.
bsub -o and bsub -e
The output files specified with the -o and -e options to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system.
For example, the following command stores the job output in the job_out file and copies the file back to the submission host:
bsub -o job_out -f "job_out <" myjob
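The same pattern extends to the error file. The sketch below prints, without running it, a bsub command line that writes both files on the execution host and copies both back. The function name bsub_with_copyback and the file name job_err are assumptions made for this example.

```shell
# Print (not run) a bsub command line that stores standard output and
# standard error on the execution host and copies both files back.
# Hypothetical helper; file and job names are supplied by the caller.
bsub_with_copyback() {
    out_file=$1
    err_file=$2
    shift 2
    printf 'bsub -o %s -e %s -f "%s <" -f "%s <" %s\n' \
        "$out_file" "$err_file" "$out_file" "$err_file" "$*"
}
```

Calling bsub_with_copyback job_out job_err myjob prints a command with one -f option per file, since -f is repeated to specify multiple files.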
To submit myjob to LSF, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:
bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:
bsub -f "batch_data <>" batch_update batch_data
File Transfer Mechanism (lsrcp)
The LSF remote file access mechanism (bsub -f) uses lsrcp to process the file transfer. The lsrcp command tries to connect to RES on the submission host to handle the file transfer.
See Remote File Access for more information about using bsub -f.
Limitations to lsrcp
Because LSF client hosts do not run RES, jobs that are submitted from client hosts should only specify bsub -f if rcp is allowed. You must set up the permissions for rcp if account mapping is used.
File transfer using lsrcp is not supported in the following contexts:
- If LSF account mapping is used; lsrcp fails when running under a different user account
- LSF client hosts do not run RES, so lsrcp cannot contact RES on the submission host
See Authorization options for more information.
In these situations, use the following workarounds:
rcp on UNIX
If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the $HOME/.rhosts file in order to use rcp. See the rsh(1) man pages for more information on using the rcp command.
Custom file transfer mechanism
You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp. This might be done to take advantage of a faster interconnection network, or to overcome limitations with the existing lsrcp. sbatchd looks for the lsrcp executable in the LSF_BINDIR directory as specified in the lsf.conf file.
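A replacement could be as small as a wrapper that accepts the same arguments and hands them to another copy program. The sketch below assumes the lsrcp calling convention is lsrcp [-a] source_file target_file; check the lsrcp man page before relying on that. The names custom_lsrcp and TRANSFER_CMD are inventions of this example, and scp is only one possible backend. Append mode (-a) is deliberately not implemented, so a real replacement would need to add it.

```shell
# Hypothetical stand-in for lsrcp; in practice the body would be a script
# installed as "lsrcp" in LSF_BINDIR.
# Assumed calling convention: lsrcp [-a] source_file target_file
custom_lsrcp() {
    append=no
    if [ "$1" = "-a" ]; then
        append=yes
        shift
    fi
    if [ "$#" -ne 2 ]; then
        echo "usage: lsrcp [-a] source_file target_file" >&2
        return 2
    fi
    if [ "$append" = yes ]; then
        # Appending is not implemented in this sketch.
        echo "lsrcp: -a (append) is not supported by this wrapper" >&2
        return 1
    fi
    # TRANSFER_CMD is an assumption of this sketch; it defaults to scp.
    ${TRANSFER_CMD:-scp} "$1" "$2"
}
```

Setting TRANSFER_CMD=echo turns the wrapper into a dry run that prints the source and target it would have copied, which is a convenient way to check the argument handling before installing it.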
Platform Computing Inc.