Non-Shared File Systems
Contents
- About Directories and Files
- Using LSF with Non-Shared File Systems
- Remote File Access
- File Transfer Mechanism (lsrcp)
About Directories and Files
LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts.
LSF includes support for copying user data to the execution host before running a batch job, and for copying results back after the job executes.
In networks where the file systems are not shared, this can be used to give remote jobs access to local data.
Supported file systems
UNIX
On UNIX systems, LSF supports the following shared file systems:
- Network File System (NFS). NFS file systems can be mounted permanently or on demand using automount.
- Andrew File System (AFS)
- Distributed File System (DCE/DFS)
Windows
On Windows, directories containing LSF files can be shared among hosts from a Windows server machine.
Non-shared directories and files
LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes. See Remote File Access for more information.
Some networks do not share files between hosts. LSF can still be used on these networks, with reduced fault tolerance. See Using LSF with Non-Shared File Systems for information about using LSF in a network without a shared file system.
Using LSF with Non-Shared File Systems
LSF installation
To install LSF on a cluster without shared file systems, follow the complete installation procedure on every host to install all the binaries, man pages, and configuration files.
Configuration files
After you have installed LSF on every host, you must update the configuration files on all hosts so that they contain the complete cluster configuration. Configuration files must be the same on all hosts.
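One way to spot configuration drift is to compare checksums of the configuration files from two hosts. The following is a minimal sketch and not an LSF tool: the function names config_sums and same_config are hypothetical, and it assumes that a copy of each host's configuration directory has already been gathered into a local directory by some other means.

```shell
#!/bin/sh
# Hypothetical helpers, not part of LSF.

# Print "checksum size name" for every regular file in directory $1.
config_sums() {
    (cd "$1" && for f in *; do [ -f "$f" ] && cksum "$f"; done)
}

# Succeed only if both directories hold the same file names with
# identical contents.
same_config() {
    [ "$(config_sums "$1")" = "$(config_sums "$2")" ]
}
```

For example, `same_config conf.hostA conf.hostB || echo "configuration differs"` reports a mismatch between two collected copies; the directory names here are placeholders.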
Master host
You must choose one host to act as the LSF master host. LSF configuration files and working directories must be installed on this host, and the master host must be listed first in lsf.cluster.cluster_name.
You can use the parameter LSF_MASTER_LIST in lsf.conf to define which hosts can be considered to be elected master hosts. In some cases, this may improve performance.
For Windows password authentication in a non-shared file system environment, you must define the parameter LSF_MASTER_LIST in lsf.conf so that jobs will run with correct permissions. If you do not define this parameter, LSF assumes that the cluster uses a shared file system environment.
Fault tolerance
Some fault tolerance can be introduced by choosing more than one host as a possible master host, and using NFS to mount the LSF working directory on only these hosts. All the possible master hosts must be listed first in lsf.cluster.cluster_name. As long as one of these hosts is available, LSF continues to operate.
Remote File Access
Using LSF with non-shared file space
LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.
LSF attempts to run a job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the /net automount option.
If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.
LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes. The file operations are specified with the -f option to bsub.
LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used. See File Transfer Mechanism (lsrcp) for more information.
bsub -f
The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.
local_file
File name on the submission host
remote_file
File name on the execution host
The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.
operator
Operation to perform on the file. The operator must be surrounded by white space.
Valid values for operator are:
>
local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.
<
remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.
<<
remote_file is appended to local_file after the job completes. local_file is created if it does not exist.
><, <>
Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.
If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.
You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
bsub -i
If the input file specified with bsub -i is not found on the execution host, the file is copied from the submission host using the LSF remote file access facility and is removed from the execution host after the job finishes.
bsub -o and bsub -e
The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system.
For example, the following command stores the job output in the job_out file and copies the file back to the submission host:
bsub -o job_out -f "job_out <" myjob
Example
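To make the -f rules concrete, the following sketch reads a file specification the way the operator descriptions in the bsub -f section say it is interpreted, and prints the copy operations it implies. It is illustrative only: describe_f_spec is a hypothetical helper, it does not call LSF, and it assumes the specification has the form local_file, operator, and an optional remote_file separated by white space.

```shell
#!/bin/sh
# Hypothetical helper, not part of LSF: explain a bsub -f specification.
# The body runs in a subshell so that "set -f" does not leak to the caller.
describe_f_spec() (
    set -f                 # no pathname expansion while splitting
    set -- $1              # split the specification on white space
    if [ $# -lt 2 ] || [ $# -gt 3 ]; then
        echo "syntax error: expected 'local_file operator [remote_file]'" >&2
        exit 2
    fi
    local_file=$1
    operator=$2
    remote_file=${3:-$1}   # remote_file defaults to local_file
    case $operator in
        '>')  echo "before job: copy $local_file -> $remote_file (overwrite)" ;;
        '<')  echo "after job: copy $remote_file -> $local_file (overwrite)" ;;
        '<<') echo "after job: append $remote_file to $local_file" ;;
        '><'|'<>')
              echo "before job: copy $local_file -> $remote_file (overwrite)"
              echo "after job: copy $remote_file -> $local_file (overwrite)" ;;
        *)    echo "syntax error: unknown operator '$operator'" >&2
              exit 2 ;;
    esac
)

describe_f_spec "/data/data3 > data3"
describe_f_spec "job_out <"
```

The second call shows the default: with no remote_file given, the same name is used on both hosts. A specification with no operator, such as "batch_data", is rejected.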
To submit myjob to LSF, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:
bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:
bsub -f "batch_data <>" batch_update batch_data
File Transfer Mechanism (lsrcp)
The LSF remote file access mechanism (bsub -f) uses lsrcp to process the file transfer. The lsrcp command tries to connect to RES on the submission host to handle the file transfer.
See Remote File Access for more information about using bsub -f.
Limitations to lsrcp
Because LSF client hosts do not run RES, jobs that are submitted from client hosts should only specify bsub -f if rcp is allowed. You must set up the permissions for rcp if account mapping is used.
File transfer using lsrcp is not supported in the following contexts:
- If LSF account mapping is used; lsrcp fails when running under a different user account
- LSF client hosts do not run RES, so lsrcp cannot contact RES on the submission host
See Authorization options for more information.
Workarounds
In these situations, use the following workarounds:
rcp on UNIX
If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the /etc/hosts.equiv or $HOME/.rhosts file in order to use rcp.
See the rcp(1) and rsh(1) man pages for more information on using the rcp command.
Custom file transfer mechanism
You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp. This might be done to take advantage of a faster interconnection network, or to overcome limitations with the existing lsrcp. sbatchd looks for the lsrcp executable in the LSF_BINDIR directory as specified in the lsf.conf file.
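As a rough illustration of such a replacement, the sketch below forwards an lsrcp-style request to scp. It rests on several assumptions that are not confirmed by this document: that the arguments follow the form lsrcp [-a] source_file target_file, with each file written as [[user_name@]host_name:]path, and that scp is installed and permitted between the hosts. The function name lsrcp_via_scp and the COPY_CMD variable are hypothetical, and the append option (-a) is deliberately left unhandled.

```shell
#!/bin/sh
# Sketch only: hand an lsrcp-style transfer to scp.
# COPY_CMD is a hypothetical override used for dry runs (e.g. COPY_CMD=echo).
lsrcp_via_scp() {
    if [ "$1" = "-a" ]; then
        echo "append mode (-a) is not handled in this sketch" >&2
        return 1
    fi
    if [ $# -ne 2 ]; then
        echo "usage: lsrcp [-a] source_file target_file" >&2
        return 1
    fi
    # scp accepts the same [user@]host:path form, so both arguments are
    # passed through unchanged. -q suppresses progress output and -p
    # preserves modification times and modes.
    ${COPY_CMD:-scp} -q -p "$1" "$2"
}
```

Setting COPY_CMD=echo before calling lsrcp_via_scp prints the scp arguments instead of copying anything, which is a convenient way to check the mapping before installing a real replacement under the name lsrcp in LSF_BINDIR.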
Platform Computing Inc.
www.platform.com