Managing LSF on Platform EGO
- About LSF on Platform EGO
- LSF and EGO directory structure
- Configuring LSF and EGO
- Managing LSF daemons through EGO
- Administrative Basics
- Logging and troubleshooting
- Frequently asked questions
About LSF on Platform EGO
LSF on Platform EGO allows EGO to serve as the central resource broker, enabling enterprise applications to benefit from sharing of resources across the enterprise grid.
Scalability-EGO enhances LSF scalability. Currently, the LSF scheduler has to deal with a large number of jobs. EGO provides management functionality for multiple schedulers that co-exist in one EGO environment. In LSF Version 7, although only a single instance of LSF is available on EGO, the foundation is established for greater scalability in follow-on releases that will allow multiple instances of LSF on EGO.
Robustness-In previous releases, LSF functioned as both scheduler and resource manager. EGO decouples these functions, making the entire system more robust. EGO reduces or eliminates downtime for LSF users while resources are added or removed.
Reliability-In situations where service is degraded due to noncritical failures, such as failure of sbatchd or res, LSF by default does not automatically restart the daemons. The EGO Service Controller can monitor all LSF daemons and automatically restart them if they fail. Similarly, the EGO Service Controller can also monitor and restart other critical processes such as FLEXnet.
Additional scheduling functionality-EGO provides the foundation for EGO-enabled SLA, which provides LSF with additional and important scheduling functionality.
Centralized management and administration framework.
Single reporting framework across the various application heads built around EGO.
What is Platform EGO?
Platform Enterprise Grid Orchestrator (EGO) allows developers, administrators, and users to treat a collection of distributed software and hardware resources on a shared computing infrastructure (cluster) as parts of a single virtual computer.
EGO assesses the demands of competing business services (consumers) operating within a cluster and dynamically allocates resources so as to best meet a company's overriding business objectives. These objectives might include
- Reducing the time or the cost of providing key business services
- Maximizing the revenue generated by existing computing infrastructure
- Configuring, enforcing, and auditing service plans for multiple consumers
- Ensuring high availability and business continuity through disaster scenarios
- Simplifying IT management and reducing management costs
- Consolidating divergent and mixed computing resources into a single virtual infrastructure that can be shared transparently between many business users
Platform EGO also provides a full suite of services to support and manage resource orchestration. These include cluster management, configuration and auditing of service-level plans, resource facilitation to provide fail-over if a master host goes down, monitoring and data distribution.
EGO is only sensitive to the resource requirements of business services; EGO has no knowledge of any run-time dynamic parameters that exist for them. This means that EGO does not interfere with how a business service chooses to use the resources it has been allocated.
How does Platform EGO work?
Platform products work in various ways to match business service (consumer) demands for resources with an available supply of resources. While a specific clustered application manager or consumer (for example, an LSF cluster) identifies what its resource demands are, Platform EGO is responsible for supplying those resources. Platform EGO determines the number of resources each consumer is entitled to, takes into account a consumer's priority and overall objectives, and then allocates the number of required resources (for example, the number of slots, virtual machines, or physical machines).
Once the consumer receives its allotted resources from Platform EGO, the consumer applies its own rules and policies. How the consumer decides to balance its workload across the fixed resources allotted to it is not the responsibility of EGO.
So how does Platform EGO know the demand? Administrators or developers use various EGO interfaces (such as the SDK or CLI) to tell EGO what constitutes a demand for more resources. When Platform EGO identifies that there is a demand, it then distributes the required resources based on the resource plans given to it by the administrator or developer.
For all of this to happen smoothly, various components are built into Platform EGO. Each EGO component performs a specific job.
Platform EGO components
Platform EGO comprises a collection of cluster orchestration software components. The following figure shows overall architecture and how these components fit within a larger system installation and interact with each other:
Key EGO concepts
A consumer represents an entity that can demand resources from the cluster. A consumer might be a business service, a business process that is a complex collection of business services, an individual user, or an entire line of business.
Resources are physical and logical entities that can be requested by a client. For example, an application (client) requests a processor (resource) in order to run.
Resources also have attributes. For example, a host has attributes of memory, processor utilization, operating system type, and so on.
Resource distribution tree
The resource distribution tree identifies consumers of the cluster resources, and organizes them into a manageable structure.
Resource groups are logical groups of hosts. Resource groups provide a simple way of organizing and grouping resources (hosts) for convenience; instead of creating policies for individual resources, you can create and apply them to an entire group. Groups can be made of resources that satisfy a specific requirement in terms of OS, memory, swap space, CPU factor and so on, or that are explicitly listed by name.
Resource distribution plans
The resource distribution plan, or resource plan, defines how cluster resources are distributed among consumers. The plan takes into account the differences between consumers and their needs, resource properties, and various other policies concerning consumer rank and the allocation of resources.
The distribution priority is to satisfy each consumer's reserved ownership, then distribute remaining resources to consumers that have demand.
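The two-phase priority described above (satisfy reserved ownership first, then distribute what remains to consumers with demand) can be sketched as follows. The consumer names, slot counts, and round-robin handout are illustrative assumptions, not the actual EGO algorithm:

```python
# Sketch of EGO-style two-phase distribution: reserved ownership first,
# then remaining resources go to consumers that still have demand.
# Consumer names and numbers are illustrative, not from a real plan.

def distribute(total_slots, consumers):
    """consumers: list of dicts with 'name', 'owned', 'demand' keys."""
    alloc = {}
    # Phase 1: satisfy reserved ownership (capped by actual demand).
    for c in consumers:
        alloc[c["name"]] = min(c["owned"], c["demand"], total_slots)
        total_slots -= alloc[c["name"]]
    # Phase 2: hand out what is left, one slot at a time, to consumers
    # whose demand is not yet met.
    unmet = [c for c in consumers if alloc[c["name"]] < c["demand"]]
    while total_slots > 0 and unmet:
        for c in list(unmet):
            if total_slots == 0:
                break
            alloc[c["name"]] += 1
            total_slots -= 1
            if alloc[c["name"]] >= c["demand"]:
                unmet.remove(c)
    return alloc

plan = [
    {"name": "payroll", "owned": 4, "demand": 6},
    {"name": "risk",    "owned": 2, "demand": 8},
]
print(distribute(10, plan))  # {'payroll': 6, 'risk': 4}
```

With 10 slots, both consumers first receive their owned share (4 and 2), and the remaining 4 slots are split among the consumers that still have unmet demand.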
A service is a self-contained, continuously running process that accepts one or more requests and returns one or more responses. Services may have multiple concurrent service instances running on multiple hosts. All Platform EGO services are automatically enabled by default at installation.
Use the egosh command to check service status.
If EGO is disabled, the egosh command cannot find ego.conf or cannot contact vemkd (not started), and the following message is displayed: You cannot run the egosh command because the administrator has chosen not to enable EGO in lsf.conf: LSF_ENABLE_EGO=N.
EGO user accounts
A user account is a Platform system user who can be assigned to any role for any consumer in the tree. User accounts include optional contact information, a name, and a password.
LSF and EGO directory structure
The following tables describe the purpose of each sub-directory and whether they are writable or non-writable by LSF.
| Directory Path | Description | Attribute |
|---|---|---|
| LSF_TOP/7.0 | LSF 7.0 binaries and other machine-dependent files | Non-writable |
| LSF_TOP/conf | LSF 7.0 configuration files. You must be LSF administrator or root to edit files in this directory | Writable by the LSF administrator, master host, and master candidate hosts |
| LSF_TOP/log | LSF 7.0 log files | Writable by all hosts in the cluster |
| LSF_TOP/work | LSF 7.0 working directory | Writable by the master host and master candidate hosts, and accessible to slave hosts |
EGO, GUI, and PERF directories
| Directory Path | Description | Attribute |
|---|---|---|
| LSF_BINDIR | EGO binaries and other machine-dependent files | Non-writable |
| LSF_CONFDIR/ego/ | EGO services configuration and log files | Writable |
| LSF_CONFDIR/ego/ | EGO kernel configuration, log files and working directory, including conf/log/work | Writable |
| LSB_SHAREDIR/ | EGO working directory | Writable |
| LSF_TOP/perf/1.2 | PERF commands, library and schema | Non-writable |
| LSF_CONFDIR/perf/ | PERF configuration | Writable |
| LSB_SHAREDIR/ | PERF embedded data files for derby | Writable |
| LSF_TOP/perf/1.2/etc | PERF script command for services | Non-writable |
| LSF_TOP/log/perf (PERF_LOGDIR) | PERF log files | Writable |
| LSB_SHAREDIR/ | PERF working directory | Writable |
| LSF_TOP/jre | Java Runtime Environment | Non-writable |
| LSF_TOP/gui | GUI | Non-writable |
| LSF_CONFDIR/gui/ | GUI configuration | Writable |
| LSB_SHAREDIR/ | GUI working directory | Writable |
| LSF_TOP/log/gui (GUI_LOGDIR) | GUI log files | Writable |
| LSF_TOP/gui/2.0/ | GUI binaries and tomcat | Non-writable |
| LSF_TOP/gui/2.0/tomcat | Tomcat web server | Writable |
note:Several directories under LSF_TOP/gui/1.2/tomcat are writable by Tomcat servers. You should install the whole Tomcat directory on a writable file system.
Example directory structures
UNIX and Linux
The following figures show typical directory structures for a new UNIX or Linux installation with lsfinstall. Depending on which products you have installed and platforms you have selected, your directory structure may vary.
Windows
The following diagram shows an example directory structure for a Windows installation.
Configuring LSF and EGO
EGO configuration files for LSF daemon management (res.xml and sbatchd.xml)
The following files are the EGO service configuration files for the LSF daemons:
- res.xml-EGO service configuration file for res
- sbatchd.xml-EGO service configuration file for sbatchd
When LSF daemon control through EGO Service Controller is configured, lsadmin uses the reserved EGO service name res to control the LSF res daemon, and badmin uses the reserved EGO service name sbatchd to control the LSF sbatchd daemon.
How to handle parameters in lsf.conf with corresponding parameters in ego.conf
When EGO is enabled, existing LSF parameters (parameter names beginning with LSF_) that are set only in lsf.conf operate as usual because LSF daemons and commands read both lsf.conf and ego.conf.
Some existing LSF parameters have corresponding EGO parameter names in ego.conf (LSF_CONFDIR/lsf.conf is a separate file from ego.conf, which is located under a /kernel subdirectory). You can keep your existing LSF parameters in lsf.conf, or you can set the corresponding EGO parameters in ego.conf if they have not already been set in lsf.conf.
You cannot set LSF parameters in ego.conf, but you can set EGO parameters related to LIM, PIM, and ELIM in either lsf.conf or ego.conf. You cannot set any other EGO parameters (parameter names beginning with EGO_) in lsf.conf. If EGO is not enabled, you can only set these parameters in lsf.conf.
note: If you specify a parameter in lsf.conf and you also specify the corresponding parameter in ego.conf, the value in ego.conf takes precedence over the conflicting value in lsf.conf. If the parameter is not set in either lsf.conf or ego.conf, which default takes effect depends on whether EGO is enabled: if EGO is not enabled, the LSF default takes effect; if EGO is enabled, the EGO default takes effect. In most cases, the default is the same. Some parameters in lsf.conf do not have exactly the same behavior, valid values, syntax, or default value as the corresponding parameter in ego.conf, so in general, you should not set them in both files. If you need LSF parameters for backwards compatibility, set them only in lsf.conf.
If you have LSF 6.2 hosts in your cluster, they can only read lsf.conf, so you must set LSF parameters only in lsf.conf.
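The precedence rules in the note above can be summarized in a small sketch. The values and defaults below are placeholders, and the simple LSF_-to-EGO_ name rewrite is an illustration only; the real correspondence is given by the table of corresponding parameters later in this section:

```python
# Sketch of the lsf.conf / ego.conf precedence rule described above:
# ego.conf wins over lsf.conf; if neither file sets the parameter,
# which default applies depends on whether EGO is enabled.
# Values and defaults below are illustrative placeholders.

def effective_value(name, lsf_conf, ego_conf, ego_enabled,
                    lsf_default=None, ego_default=None):
    ego_name = name.replace("LSF_", "EGO_", 1)  # simplified name mapping
    if ego_enabled and ego_name in ego_conf:
        return ego_conf[ego_name]          # ego.conf takes precedence
    if name in lsf_conf:
        return lsf_conf[name]
    return ego_default if ego_enabled else lsf_default

lsf_conf = {"LSF_LIM_PORT": 6879}
ego_conf = {"EGO_LIM_PORT": 7869}
print(effective_value("LSF_LIM_PORT", lsf_conf, ego_conf, ego_enabled=True))
# 7869 -- the ego.conf setting wins over the conflicting lsf.conf setting
```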
LSF and EGO corresponding parameters
The following table summarizes existing LSF parameters that have corresponding EGO parameter names. You must continue to set other LSF parameters in lsf.conf.
| lsf.conf parameter | ego.conf parameter |
|---|---|
| LSF_API_CONNTIMEOUT | EGO_LIM_CONNTIMEOUT |
| LSF_API_RECVTIMEOUT | EGO_LIM_RECVTIMEOUT |
| LSF_CLUSTER_ID (Windows) | EGO_CLUSTER_ID (Windows) |
| LSF_CONF_RETRY_INT | EGO_CONF_RETRY_INT |
| LSF_CONF_RETRY_MAX | EGO_CONF_RETRY_MAX |
| LSF_DEBUG_LIM | EGO_DEBUG_LIM |
| LSF_DHPC_ENV | EGO_DHPC_ENV |
| LSF_DYNAMIC_HOST_TIMEOUT | EGO_DYNAMIC_HOST_TIMEOUT |
| LSF_DYNAMIC_HOST_WAIT_TIME | EGO_DYNAMIC_HOST_WAIT_TIME |
| LSF_ENABLE_DUALCORE | EGO_ENABLE_DUALCORE |
| LSF_GET_CONF | EGO_GET_CONF |
| LSF_GETCONF_MAX | EGO_GETCONF_MAX |
| LSF_LIM_DEBUG | EGO_LIM_DEBUG |
| LSF_LIM_PORT | EGO_LIM_PORT |
| LSF_LOCAL_RESOURCES | EGO_LOCAL_RESOURCES |
| LSF_LOG_MASK | EGO_LOG_MASK |
| LSF_MASTER_LIST | EGO_MASTER_LIST |
| LSF_PIM_INFODIR | EGO_PIM_INFODIR |
| LSF_PIM_SLEEPTIME | EGO_PIM_SLEEPTIME |
| LSF_PIM_SLEEPTIME_UPDATE | EGO_PIM_SLEEPTIME_UPDATE |
| LSF_RSH | EGO_RSH |
| LSF_STRIP_DOMAIN | EGO_STRIP_DOMAIN |
| LSF_TIME_LIM | EGO_TIME_LIM |
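For scripted configuration audits, the correspondence in the table above can be captured as a lookup. This is a sketch for illustration; note that not every pair is a simple LSF_-to-EGO_ rename (for example, LSF_API_CONNTIMEOUT maps to EGO_LIM_CONNTIMEOUT), which is why an explicit table is safer than a string rewrite:

```python
# The LSF-to-EGO parameter name correspondence from the table above,
# captured as a dictionary for use in configuration-audit scripts.
LSF_TO_EGO = {
    "LSF_API_CONNTIMEOUT": "EGO_LIM_CONNTIMEOUT",
    "LSF_API_RECVTIMEOUT": "EGO_LIM_RECVTIMEOUT",
    "LSF_CLUSTER_ID": "EGO_CLUSTER_ID",  # Windows only
    "LSF_CONF_RETRY_INT": "EGO_CONF_RETRY_INT",
    "LSF_CONF_RETRY_MAX": "EGO_CONF_RETRY_MAX",
    "LSF_DEBUG_LIM": "EGO_DEBUG_LIM",
    "LSF_DHPC_ENV": "EGO_DHPC_ENV",
    "LSF_DYNAMIC_HOST_TIMEOUT": "EGO_DYNAMIC_HOST_TIMEOUT",
    "LSF_DYNAMIC_HOST_WAIT_TIME": "EGO_DYNAMIC_HOST_WAIT_TIME",
    "LSF_ENABLE_DUALCORE": "EGO_ENABLE_DUALCORE",
    "LSF_GET_CONF": "EGO_GET_CONF",
    "LSF_GETCONF_MAX": "EGO_GETCONF_MAX",
    "LSF_LIM_DEBUG": "EGO_LIM_DEBUG",
    "LSF_LIM_PORT": "EGO_LIM_PORT",
    "LSF_LOCAL_RESOURCES": "EGO_LOCAL_RESOURCES",
    "LSF_LOG_MASK": "EGO_LOG_MASK",
    "LSF_MASTER_LIST": "EGO_MASTER_LIST",
    "LSF_PIM_INFODIR": "EGO_PIM_INFODIR",
    "LSF_PIM_SLEEPTIME": "EGO_PIM_SLEEPTIME",
    "LSF_PIM_SLEEPTIME_UPDATE": "EGO_PIM_SLEEPTIME_UPDATE",
    "LSF_RSH": "EGO_RSH",
    "LSF_STRIP_DOMAIN": "EGO_STRIP_DOMAIN",
    "LSF_TIME_LIM": "EGO_TIME_LIM",
}

def ego_name(lsf_name):
    """Return the corresponding EGO parameter name, or None if the
    parameter has no EGO equivalent and must stay in lsf.conf."""
    return LSF_TO_EGO.get(lsf_name)

print(ego_name("LSF_MASTER_LIST"))  # EGO_MASTER_LIST
```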
Parameters that have changed in LSF 7
The default for LSF_LIM_PORT has changed to accommodate the EGO default port configuration. On EGO, default ports start with lim at 7869, and are numbered consecutively for pem, vemkd, and egosc.
This is different from previous LSF releases, where the default LSF_LIM_PORT was 6879. res, sbatchd, and mbatchd continue to use the default pre-version 7 ports 6878, 6881, and 6882.
Upgrade installation preserves any existing port settings. pem, vemkd, and egosc use default EGO ports starting at 7870, if they do not conflict with existing port settings.
EGO connection ports and base port
On every host, a set of connection ports must be free for use by LSF and EGO components.
LSF and EGO require exclusive use of certain ports for communication. EGO uses the same four consecutive ports on every host in the cluster. The first of these is called the base port.
The default EGO base connection port is 7869. By default, EGO uses four consecutive ports starting from the base port. By default, EGO uses ports 7869-7872.
The ports can be customized by customizing the base port. For example, if the base port is 6880, EGO uses ports 6880-6883.
LSF and EGO need the same ports on every host, so you must specify the same base port on every host.
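The base-port rule above is simple enough to express directly. The helper below lists the four consecutive ports EGO uses for a given base port:

```python
# EGO uses four consecutive connection ports starting at a base port
# (default 7869). This helper lists the range for a given base port.
def ego_ports(base_port=7869, count=4):
    return list(range(base_port, base_port + count))

print(ego_ports())      # [7869, 7870, 7871, 7872]
print(ego_ports(6880))  # [6880, 6881, 6882, 6883]
```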
Special resource groups for LSF master hosts
By default, Platform LSF installation defines a special resource group named ManagementHosts for the Platform LSF master host. (In general, Platform LSF master hosts are dedicated hosts; the ManagementHosts EGO resource group serves this purpose.)
Platform LSF master hosts must not be subject to any lend, borrow, or reclaim policies. They must be exclusively owned by the Platform LSF consumer.
The default Platform EGO configuration is such that the LSF_MASTER_LIST hosts and the execution hosts are in different resource groups so that different resource plans can be applied to each group.
Managing LSF daemons through EGO
| Daemons in LSF_SERVERDIR | Description |
|---|---|
| vemkd | Started by lim on master host |
| pem | Started by lim on every host |
| egosc | Started by vemkd on master host |
| Daemons in LSF_SERVERDIR | Description |
|---|---|
| lim | Runs on every host. On UNIX, lim is either started by lsadmin through rsh/ssh or started through an rc file. On Windows, lim is started as a Windows service. |
| pim | Started by lim on every host |
| mbatchd | Started by sbatchd on master host |
| mbschd | Started by mbatchd on master host |
| sbatchd | Under OS startup mode, sbatchd is either started by lsadmin through rsh/ssh or started through an rc file on UNIX; on Windows, sbatchd is started as a Windows service. Under EGO Service Controller mode, sbatchd is started by pem as an EGO service on every host. |
| res | Under OS startup mode, res is either started by lsadmin through rsh/ssh or started through an rc file on UNIX; on Windows, res is started as a Windows service. Under EGO Service Controller mode, res is started by pem as an EGO service on every host. |
Operating System daemon control
Operating system startup mode is the same as in previous releases:
- On UNIX, administrators configure the autostart of lim, res, and sbatchd in the operating system (inittab) and use lsadmin and badmin to start LSF daemons manually through rsh or ssh.
- On Windows, lim, res, and sbatchd are started as Windows services.
EGO Service Controller daemon control
Under EGO Service Controller mode, administrators configure the EGO Service Controller to start res and sbatchd, and restart them if they fail.
You can still run lsadmin and badmin to start LSF manually, but internally, lsadmin and badmin communicate with the EGO Service Controller, which actually starts res and sbatchd as EGO services.
If EGO Service Controller management is configured and you run lsadmin resshutdown to manually shut down LSF, the LSF daemons are not restarted automatically by EGO. You must run badmin hstartup to start the LSF daemons manually.
Permissions required for daemon control
To control all daemons in the cluster, you must:
- Be logged on as root or as a user listed in the /etc/lsf.sudoers file. See the Platform LSF Configuration Reference for configuration details of lsf.sudoers.
- Be able to run the rsh or ssh commands across all LSF hosts without having to enter a password. See your operating system documentation for information about configuring the rsh and ssh commands. The shell command specified by LSF_RSH in lsf.conf is used before rsh is tried.
Bypass EGO login at startup (lsf.sudoers)
Prerequisites: You must be the LSF administrator (lsfadmin) or root to configure lsf.sudoers.
When LSF daemon control through EGO Service Controller is configured, users must have EGO credentials for EGO to start the res and sbatchd services. By default, the EGO user logon command prompts for the user name and password of the EGO administrator to get EGO credentials.
Configure lsf.sudoers to bypass EGO login so that res and sbatchd start without prompting.
Set the following parameters:
- LSF_EGO_ADMIN_USER-User name of the EGO administrator. The default administrator name is Admin.
- LSF_EGO_ADMIN_PASSWD-Password of the EGO administrator.
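A minimal lsf.sudoers fragment for this bypass might look like the following. The user name and password shown are placeholders; keep the file owned by root and unreadable by other users, since it stores a credential:

```
LSF_EGO_ADMIN_USER=Admin
LSF_EGO_ADMIN_PASSWD=your_password_here
```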
EGO control of HPC Portal and PERF services
When EGO is enabled in the cluster, EGO may control services for components such as the HPC Portal or LSF Reports (PERF). This is recommended. It allows failover among multiple management hosts, and allows EGO cluster commands to start, stop, and restart the services.
HPC Portal not controlled by EGO
If HPC Portal is not controlled by EGO, you must specify the host to run HPC Portal. Use the pmcadmin command to start and stop HPC Portal. Use the pmcsetrc.sh command to enable automatic startup on the host (the daemon will restart if the host is restarted).
PERF services not controlled by EGO
If the PERF services are not controlled by EGO, you must specify the host to run PERF services such as purger. Use the perfadmin command to start and stop these services on the host. Use the perfsetrc.sh command to enable automatic startup of these services on the host (the daemons will restart if the host is restarted). If the PERF host is not the same as the Derby database host, run the same commands on the Derby database host to control the Derby database service.
See Administering and Using Platform EGO for detailed information about EGO administration.
Set the command-line environment
On Linux hosts, set the environment before you run any LSF or EGO commands. You need to do this once for each session you open. Administrator accounts such as egoadmin use LSF and EGO commands to configure and start the cluster.
You need to reset the environment if the environment changes during your session, for example, if you run egoconfig mghost, which changes the location of some configuration files.
If Platform EGO is enabled in the LSF cluster (LSF_ENABLE_EGO=Y and LSF_EGO_ENVDIR are defined in lsf.conf), cshrc.lsf and profile.lsf set the EGO environment variables.
See the Platform EGO Reference for more information about these variables. See the Platform LSF Configuration Reference for more information about cshrc.lsf and profile.lsf.
Logging and troubleshooting
LSF log files
LSF event and account log location
LSF uses directories for temporary work files, log files, transaction files, and spooling.
LSF keeps track of all jobs in the system by maintaining a transaction log in the work subtree.
The following files maintain the state of the LSF system:
LSF uses the lsb.events file to keep track of the state of all jobs. Each job is a transaction from job submission to job completion. The LSF system keeps track of everything associated with the job in the lsb.events file.
The events file is automatically trimmed and old job events are stored in lsb.events.1, lsb.events.2, and so on. When mbatchd starts, it refers only to the lsb.events file, not the old events files. The bhist command can refer to these files.
LSF error log location
If the optional LSF_LOGDIR parameter is defined in lsf.conf, error messages from LSF servers are logged to files in this directory.
If LSF_LOGDIR is defined, but the daemons cannot write to files there, the error log files are created in /tmp.
If LSF_LOGDIR is not defined, errors are logged to the system error logs (syslog) using the LOG_DAEMON facility. syslog messages are highly configurable, and the default configuration varies widely from system to system. Start by looking for the file /etc/syslog.conf, and read the man pages for syslogd and syslog.conf.
If the error log is managed by syslog, it is probably already being automatically cleared.
If LSF daemons cannot find lsf.conf when they start, they will not find the definition of LSF_LOGDIR. In this case, error messages go to syslog. If you cannot find any error messages in the log files, they are likely in the syslog.
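The decision rules above can be condensed into a small sketch that is handy when scripting cluster health checks. The return values are descriptive labels, not actual paths (the fallback location is assumed from the text above):

```python
# Sketch of where LSF daemon error messages end up, per the rules above:
# LSF_LOGDIR defined and writable -> files in that directory;
# defined but not writable -> fallback files elsewhere; undefined -> syslog.
def error_log_destination(lsf_logdir=None, writable=True):
    if lsf_logdir is None:
        return "syslog"                 # logged via the LOG_DAEMON facility
    if not writable:
        return "fallback directory"     # daemons cannot write to LSF_LOGDIR
    return lsf_logdir

print(error_log_destination("/var/log/lsf"))  # /var/log/lsf
print(error_log_destination(None))            # syslog
```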
LSF daemon error logs
LSF log files are reopened each time a message is logged, so if you rename or remove a daemon log file, the daemons will automatically create a new log file.
The LSF daemons log messages when they detect problems or unusual situations.
The daemons can be configured to put these messages into files.
The error log file names for the LSF system daemons are:
- lim.log.host_name
- res.log.host_name
- pim.log.host_name
- sbatchd.log.host_name
- mbatchd.log.host_name
- mbschd.log.host_name
LSF daemons log error messages at different levels so that you can choose to log all messages, or only log messages that are deemed critical. Message logging for LSF daemons is controlled by the parameter LSF_LOG_MASK in lsf.conf. Possible values for this parameter can be any log priority symbol that is defined in /usr/include/sys/syslog.h. The default value for LSF_LOG_MASK is LOG_WARNING.
LSF log directory permissions and ownership
Ensure that the LSF_LOGDIR directory is writable by root. The LSF administrator must own LSF_LOGDIR.
EGO log files
Log files contain important run-time information about the general health of EGO daemons, workload submissions, and other EGO system events. Log files are an essential troubleshooting tool during production and testing.
The naming convention for most EGO log files is the name of the daemon plus the host name the daemon is running on.
The following table outlines the daemons and their associated log file names. Log files on Windows hosts have a .txt extension.

| Daemon | Log file name |
|---|---|
| ESC (EGO Service Controller) | esc.log |
| PEM (Process Execution Manager) | pem.log |
| VEMKD (Platform LSF Kernel Daemon) | vemkd.log |
| WSM (HPC Portal/WEBGUI) | wsm.log |
| WSG (Web Service Gateway) | wsg.log |
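The naming convention above (daemon name plus host name, with a .txt extension on Windows) can be sketched as a helper for scripts that locate a daemon's log file. The exact separator and extension behavior are assumptions based on the convention stated in the text:

```python
# Sketch of the EGO log file naming convention described above:
# log file name = daemon name + host name; Windows hosts add .txt.
# Separator and extension details are assumptions, not verified format.
def ego_log_name(daemon, hostname, windows=False):
    name = f"{daemon}.log.{hostname}"
    return name + ".txt" if windows else name

print(ego_log_name("vemkd", "hostA"))              # vemkd.log.hostA
print(ego_log_name("pem", "hostB", windows=True))  # pem.log.hostB.txt
```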
Most log entries are informational in nature. It is not uncommon to have a large (and growing) log file and still have a healthy cluster.
EGO log file locations
By default, most Platform LSF log files are found in LSF_LOGDIR. The service controller log files, the HPC Portal log files (WSM and Catalina), the web service gateway log files, and the service directory log files (logged by BIND) have their own locations; see the log file table later in this section.
EGO log entry format
Log file entries follow the format
date time_zone log_level [process_id:thread_id] action: description
where the date is expressed in YYYY-MM-DD hh:mm:ss.sss. For example:
2006-03-14 11:02:44.000 Eastern Standard Time ERROR [2488:1036] vemkdexit: vemkd is halting.
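A log entry in this format can be split into its fields with a regular expression, which is useful when filtering large log files. This is a sketch based on the single example entry above; real entries may vary (for example, in time-zone spelling):

```python
import re

# Parse an EGO log entry of the form shown above:
#   date time_zone log_level [pid:tid] action: description
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<tz>.+?) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<pid>\d+):(?P<tid>\d+)\] "
    r"(?P<message>.*)$"
)

entry = ("2006-03-14 11:02:44.000 Eastern Standard Time "
         "ERROR [2488:1036] vemkdexit: vemkd is halting.")
m = LOG_RE.match(entry)
print(m.group("level"), m.group("pid"), m.group("message"))
```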
EGO log classes
Every log entry belongs to a log class. You can use log class as a mechanism to filter log entries by area. Log classes in combination with log levels allow you to troubleshoot using log entries that only address, for example, configuration.
Log classes can be adjusted at run time.
Valid logging classes are as follows:
| Class | Description |
|---|---|
| LC_ALLOC | Logs messages related to the resource allocation engine |
| LC_AUTH | Logs messages related to users and authentication |
| LC_CLIENT | Logs messages related to clients |
| LC_COMM | Logs messages related to communications |
| LC_CONF | Logs messages related to configuration |
| LC_CONTAINER | Logs messages related to activities |
| LC_EVENT | Logs messages related to the event notification service |
| LC_MEM | Logs messages related to memory allocation |
| LC_PEM | Logs messages related to the process execution manager (pem) |
| LC_PERF | Logs messages related to performance |
| LC_QUERY | Logs messages related to client queries |
| LC_RECOVER | Logs messages related to recovery and data persistence |
| LC_RSRC | Logs messages related to resources, including host status changes |
| LC_SYS | Logs messages related to system calls |
| LC_TRACE | Logs the steps of the program |
EGO log levels
There are nine log levels that allow administrators to control the level of event information that is logged.
When you are troubleshooting, increase the log level to obtain as much detailed information as you can. When you are finished troubleshooting, decrease the log level to prevent the log files from becoming too large.
Valid logging levels are as follows:
| Number | Level | Description |
|---|---|---|
| 0 | LOG_EMERG | Log only those messages in which the system is unusable. |
| 1 | LOG_ALERT | Log only those messages for which action must be taken immediately. |
| 2 | LOG_CRIT | Log only those messages that are critical. |
| 3 | LOG_ERR | Log only those messages that indicate error conditions. |
| 4 | LOG_WARNING | Log only those messages that are warnings or more serious messages. This is the default level of debug information. |
| 5 | LOG_NOTICE | Log those messages that indicate normal but significant conditions or warnings and more serious messages. |
| 6 | LOG_INFO | Log all informational messages and more serious messages. |
| 7 | LOG_DEBUG | Log all debug-level messages. |
| 8 | LOG_TRACE | Log all available messages. |
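Since each level includes everything more serious than itself, a mask check reduces to an index comparison. The sketch below models that threshold behavior (the helper name and API are illustrative, not part of EGO):

```python
# The nine EGO log levels, in increasing verbosity, as listed above.
LEVELS = ["LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR", "LOG_WARNING",
          "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG", "LOG_TRACE"]

def is_logged(message_level, log_mask="LOG_WARNING"):
    """A message is written when its level is at or below the mask's
    verbosity (e.g. LOG_ERR messages pass a LOG_WARNING mask)."""
    return LEVELS.index(message_level) <= LEVELS.index(log_mask)

print(is_logged("LOG_ERR"))                # True: more serious than mask
print(is_logged("LOG_INFO"))               # False: suppressed at default
print(is_logged("LOG_INFO", "LOG_DEBUG"))  # True: verbose mask passes it
```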
EGO log level and class information retrieved from configuration files
When EGO is enabled, the EGO daemons read ego.conf to retrieve the following information (as corresponds to the particular daemon):
- EGO_LOG_MASK: The log level used to determine the amount of detail logged.
- EGO_DEBUG_PEM: The log class setting for pem.
- EGO_DEBUG_VEMKD: The log class setting for vemkd.
The web server monitor daemon (wsm) reads wsm.conf to retrieve the following information:
- LOG_LEVEL: The configured log class controlling the level of event information that is logged.
The web service gateway daemon (wsg) reads wsg.conf to retrieve the following information:
- WSG_PORT: The port on which the Web service gateway (WebServiceGateway) should run
- WSG_SSL: Whether the daemon should use Secure Socket Layer (SSL) for communication.
- WSG_DEBUG_DETAIL: The log level used to determine the amount of detail logged for debugging purposes.
- WSG_LOGDIR: The directory location where wsg.log files are written.
The service director daemon (named) reads named.conf to retrieve the following information:
- logging, severity: The configured severity log class controlling the level of event information that is logged (dynamic). In the case of a log class set to debug, a log level is required to determine the amount of detail logged for debugging purposes.
Why do log files grow so quickly?
Every time an EGO system event occurs, a log file entry is added to a log file. Most entries are informational in nature, except when there is an error condition. If your log levels provide entries for all information (for example, if you have set them to LOG_DEBUG), the files will grow quickly.
- During regular EGO operation, set your log levels to LOG_WARNING. With this setting, critical errors are logged but informational entries are not, keeping the log file size to a minimum.
- For troubleshooting purposes, set your log level to LOG_DEBUG. Because of the quantity of messages you will receive when subscribed to this log level, change the level back to LOG_WARNING as soon as you are finished troubleshooting.
tip:If your log files are too long, you can always rename them for archive purposes. New, fresh log files will then be created and will log all new events.
How often should I maintain log files?
The growth rate of the log files is dependent on the log level and the complexity of your cluster. If you have a large cluster, daily log file maintenance may be required.
We recommend using a log file rotation utility to do unattended maintenance of your log files. Failure to do timely maintenance could result in a full file system which hinders system performance and operation.
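On Linux, one common way to do this unattended maintenance is a logrotate rule. The log path below is a placeholder; substitute your cluster's actual log directory, and adjust the schedule and retention to your cluster's growth rate:

```
/path/to/lsf/log/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```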
Troubleshoot using multiple EGO log files
EGO log file locations and content
If a service does not start as expected, open the appropriate service log file and review the run-time information contained within it to discover the problem. Look for relevant entries such as insufficient disk space, lack of memory, or network problems that result in unavailable hosts.
| Log file | What it contains |
|---|---|
| catalina.out | System errors and debug information from Tomcat web server startup. |
| esc.log | Service failures and service instance restarts based on availability plans. Errors surrounding HPC Portal startup are logged here. |
| named.log | Information gathered during the updating and querying of service instance location; logged by BIND, a DNS server. |
| pem.log | Remote operations (start, stop, control activities, failures). Tracked results for resource utilization of all processes associated with the host, and information for accounting or chargeback. |
| vemkd.log | Aggregated host information about the state of individual resources, status of allocation requests, consumer hierarchy, resource assignment to consumers, and started operating system-level processes. |
| wsg.log | Service failures surrounding web services interfaces for web service clients (applications). |
| wsm.log | Information collected by the web server monitor daemon. Failures of the WEBGUI service that runs the HPC Portal are logged here. |
Matching service error messages and corresponding log files
| If you receive this message... | This may be the problem... | Review this log file |
|---|---|---|
| failed to create vem working directory | Cannot create work directory during startup | vemkd |
| failed to open lock file | Cannot get lock file during startup | vemkd |
| failed to open host event file | Cannot recover during startup because cannot open event file | vemkd |
| lim port is not defined | EGO_LIM_PORT in ego.conf is not defined | lim |
| master candidate can not set GET_CONF=lim | Wrong parameter defined for master candidate host (for example, EGO_GET_CONF=LIM) | lim |
| there is no valid host in EGO_MASTER_LIST | No valid host in master list | lim |
| | Cannot get local host name during startup | pem |
| temp directory (%s) not exist or not accessible, exit | Tmp directory does not exist | pem |
| incorrect EGO_PEM_PORT value %s, exit | EGO_PEM_PORT is a negative number | pem |
| | Tmp directory does not exist | esc |
| cannot initialize the listening TCP port %d | Socket error | esc |
| cannot log on | Log on to vemkd failed | esc |
| JAVA_HOME is not defined, exit | WEBGUI service profile is wrong | wsm |
| failed to get hostname: %s | Host name configuration problem | wsm |
| event_init ( ) failed | EGO event plugin configuration problem in ego.conf | wsm |
| loadeventplug ( ) failed | Event library problem | wsm |
| cannot write to child | Web server is down or there is no response | wsm |
| child no reply | Web server is down or there is no response | wsm |
| vem_register: error in invoking vem_register function | VEM service registration failed | wsg |
| you are not authorized to unregister a service | Either you are not authorized to unregister a service, or there is no registry client | wsg |
| request has invalid signature: TSIG service.ego: tsig verify failure (BADTIME) | Resource record updating failed | named |
For more information
- About Platform LSF logging and troubleshooting, see Error and Event Logging and Troubleshooting and Error Messages.
- About Platform EGO logging and troubleshooting, see Administering and Using Platform EGO.
Frequently asked questions
Does LSF 7 on EGO support a grace period when reclamation is configured in the resource plan?
No. Resources are immediately reclaimed even if you set a resource reclaim grace period.
Does LSF 7 on EGO support upgrade of the master host only?
Under EGO Service Controller daemon management mode on Windows, does PEM start sbatchd and res directly or does it ask Windows to start sbatchd and res as Windows Services?
On Windows, LSF still installs sbatchd and res as Windows services. If EGO Service Controller daemon control is selected during installation, the Windows service will be set up as Manual. PEM will start up the sbatchd and res directly, not as Windows Services.
What's the benefit of LSF daemon management through the EGO Service Controller?
EGO Service Controller provides High Availability services to sbatchd and res, and faster cluster startup than startup with lsadmin and badmin.
How does the hostsetup script work in LSF 7?
The hostsetup script functions essentially the same as in previous versions. It sets up a host to use the LSF cluster and configures LSF daemons to start automatically. In LSF 7, running
hostsetup --top=/path --boot="y" checks the EGO service definition files sbatchd.xml and res.xml. If sbatchd and res startup is set to "Automatic", the host rc setting will only start lim. If set to "Manual", the host rc setting will start lim, res, and sbatchd as in previous versions.
Is non-shared mixed cluster installation supported, for example, adding UNIX hosts to a Windows cluster, or adding Windows hosts to a UNIX cluster?
In LSF 7, non-shared installation is supported. For example, to add a UNIX host to a Windows cluster, set up the Windows cluster first, then run lsfinstall -s -f slave.config. In slave.config, put the Windows hosts in LSF_MASTER_LIST. After startup, the UNIX host will become an LSF host. Adding a Windows host is even simpler: run the Windows installer and enter the current UNIX master host name. After installation, all daemons will automatically start and the host will join the cluster.
As EGO and LSF share base configuration files, how are other resources handled in EGO in addition to hosts and slots?
Same as in previous releases. In LSF 7, mbatchd still communicates with LIM to get available resources. By default, LSF can schedule jobs to make use of all resources started in the cluster. If EGO-enabled SLA scheduling is configured, LSF only schedules jobs to use resources on hosts allocated by EGO.
How about compatibility for external scripts and resources like elim, melim, esub and others?
LSF 7 supports full compatibility for these external executables. elim.xxx is started under LSF_SERVERDIR as usual. By default, LIM is located under LSF_SERVERDIR.
Can Platform LSF MultiCluster share one EGO base?
No, each LSF cluster must run on top of one EGO cluster.
Can EGO consumer policies replace MultiCluster lease mode?
Conceptually, both define resource borrowing and lending policies. However, current EGO consumer policies can only work with slot resources within one EGO cluster. MultiCluster lease mode supports other load indices and external resources between multiple clusters. If you are using MultiCluster lease mode to share only slot resources between clusters, and you are able to merge those clusters into a single cluster, you should be able to use EGO consumer policy and submit jobs to EGO-enabled SLA scheduling to achieve the same goal.
Platform Computing Inc.