lamexec: Run non-MPI programs on LAM nodes.

SYNOPSIS

lamexec [-fhvD] [-c # | -np #] [-nw | -w] [-pty] [-s node] [-x VAR1[=VALUE1][,VAR2[=VALUE2],...]] [where] program [-- args]

OPTIONS

-c #: Synonym for -np (see below).
-D: Use the executable program location as the current working directory for created processes. The current working directory of the created processes will be set before the user's program is invoked.
-f: Do not configure standard I/O file descriptors - use defaults.
-h: Print useful information on this command.
-np #: Run this many copies of the program on the given nodes. This option indicates that the specified file is an executable program and not an application schema. If no nodes are specified, all LAM nodes are considered for scheduling; LAM will schedule the programs in a round-robin fashion, "wrapping around" (and scheduling multiple copies on a single node) if necessary.
-nw: Do not wait for all processes to complete before exiting lamexec. This option is mutually exclusive with -w.
-pty: Enable pseudo-tty support. Among other things, this enables line-buffered output (which is probably what you want). This feature is not enabled by default only because it is new and has not yet been extensively tested.
-s node: Load the program from this node. This option is not valid on the command line if an application schema is specified.
-v: Be verbose; report on important steps as they are done.
-w: Wait for all applications to exit before lamexec exits.
-x: Export the specified environment variables to the remote nodes before executing the program. Existing environment variables can be specified (see the Examples section, below), or new variable names specified with corresponding values. The parser for the -x option is not very sophisticated; it does not even understand quoted values. Users are advised to set variables in the environment, and then use -x to export (not define) them.
where: A set of node and/or CPU identifiers indicating where to start the program.
-- args: Pass these runtime arguments to every new process. This must always be the last argument to lamexec. This option is not valid on the command line if an application schema is specified.
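
The round-robin placement described under -np can be pictured with a small shell sketch. This is an illustration only: the function name round_robin is hypothetical, the real placement is decided inside LAM, and the exact ordering shown here is an assumption.

```shell
# Illustration only: the real placement is decided inside LAM.
# Assigns $1 copies to $2 nodes in round-robin order, wrapping around
# to the first node once every node has received a copy.
round_robin() {
    copies=$1
    nodes=$2
    i=0
    while [ "$i" -lt "$copies" ]; do
        echo "copy $i -> n$((i % nodes))"
        i=$((i + 1))
    done
}

# Five copies on three nodes: n0 and n1 each receive two copies.
round_robin 5 3
```

With five copies and three nodes, the last two copies wrap around to n0 and n1, which is the "multiple copies on a single node" case mentioned above.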

DESCRIPTION

lamexec is essentially a clone of mpirun(1), but is intended for non-MPI programs.

One invocation of lamexec starts a non-MPI application running under LAM. To start the same program on all LAM nodes, the application can be specified on the lamexec command line. To start multiple applications on the LAM nodes, an application schema is required in a separate file. See appschema(5) for a description of the application schema syntax, but it essentially contains multiple lamexec command lines, less the command name itself. The ability to specify different options for different instantiations of a program is another reason to use an application schema.
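
As a rough sketch of what such a file might look like, the following writes a two-line application schema and then starts it. The file name myapp and the program names prog1 and prog2 are placeholders, and the schema lines simply follow the "command line without the command name" rule stated above; appschema(5) remains the authority on the exact syntax.

```shell
# Write a hypothetical application schema named "myapp".
# Each line is a lamexec command line without the leading "lamexec";
# prog1 and prog2 are placeholder program names.
cat > myapp <<'EOF'
n0 prog1
n1-2 prog2
EOF

# Start everything listed in the schema, reporting each process.
# Guarded so the sketch does nothing on a machine without LAM installed.
if command -v lamexec >/dev/null 2>&1; then
    lamexec -v myapp
fi
```

Because neither nodes nor -c appear on the lamexec command line itself, the file is treated as an application schema rather than as an executable.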

Location Nomenclature

The location nomenclature that is used for the where clause mentioned in the SYNOPSIS section, above, is identical to mpirun(1)'s nomenclature. See the mpirun(1) man page for a lengthy discussion of the location nomenclature.

Note that the by-CPU syntax, while valid for lamexec, is not quite as meaningful because process rank ordering in MPI_COMM_WORLD is irrelevant. As such, the by-node nomenclature is typically the preferred syntax for lamexec.

Application Schema or Executable Program?

To distinguish the two different forms, lamexec looks on the command line for nodes or the -c option. If neither is specified, then the file named on the command line is assumed to be an application schema. If either one or both are specified, then the file is assumed to be an executable program. If nodes and -c both are specified, then copies of the program are started on the specified nodes according to an internal LAM scheduling policy. Specifying just one node effectively forces LAM to run all copies of the program in one place. If -c is given, but not nodes, then all LAM nodes are used. If nodes is given, but not -c, then one copy of the program is run on each node.
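
The rule above can be summarized as a small decision function. This is a sketch of the documented behavior, not code taken from LAM; the helper name classify_file and its yes/no arguments are hypothetical.

```shell
# Hypothetical helper (not part of LAM): mirrors the rule described above.
# $1 = "yes" if nodes were given on the command line, else "no"
# $2 = "yes" if -c (or its synonym -np) was given, else "no"
classify_file() {
    if [ "$1" = "no" ] && [ "$2" = "no" ]; then
        echo "application schema"
    else
        echo "executable program"
    fi
}

classify_file no no     # like: lamexec myapp       -> application schema
classify_file yes no    # like: lamexec N prog1     -> executable program
classify_file no yes    # like: lamexec -c 8 prog1  -> executable program
```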

Program Transfer

By default, LAM searches for executable programs on the target node where a particular instantiation will run. If the file system is not shared, the target nodes are homogeneous, and the program is frequently recompiled, it can be convenient to have LAM transfer the program from a source node (usually the local node) to each target node. The -s option specifies this behavior and identifies the single source node.

Locating Files

LAM looks for an executable program by searching the directories in the user's PATH environment variable as defined on the source node(s). This behavior is consistent with logging into the source node and executing the program from the shell. On remote nodes, the "." path is the home directory.

LAM looks for an application schema in three directories: the local directory, the value of the LAMAPPLDIR environment variable, and laminstalldir/boot, where "laminstalldir" is the directory where LAM/MPI was installed.

Standard I/O

LAM directs UNIX standard input to /dev/null on all remote nodes. On the local node that invoked lamexec, standard input is inherited from lamexec. The default is what used to be the -w option to prevent conflicting access to the terminal.

LAM directs UNIX standard output and error to the LAM daemon on all remote nodes. LAM ships all captured output/error to the node that invoked lamexec and prints it on the standard output/error of lamexec. Local processes inherit the standard output/error of lamexec and transfer to it directly.

Thus it is possible to redirect standard I/O for LAM applications by using the typical shell redirection procedure on lamexec.

% lamexec N my_app < my_input > my_output

The -f option avoids all the setup required to support standard I/O described above. Remote processes are completely directed to /dev/null and local processes inherit file descriptors from lamboot(1).

Pseudo-tty support

The -pty option enables pseudo-tty support for process output. This allows, among other things, for line-buffered output from remote nodes (which is probably what you want).

This option is not currently the default for lamexec because it has not been thoroughly tested on a variety of different Unixes. Users are encouraged to use -pty and report any problems back to the LAM Team.

Current Working Directory

The current working directory for new processes created on the local node is inherited from lamexec. The current working directory for new processes created on remote nodes is the remote user's home directory. This default behavior is overridden by the -D option.

The -D option will change the current working directory of new processes to the directory where the executable resides before the user's program is invoked.

An alternative to the -D option is the -wd option. -wd allows the user to specify an arbitrary current working directory (vs. the location of the executable). Note that the -wd option can be used in application schema files (see appschema(5)) as well.

Process Environment

Processes in the application inherit their environment from the LAM daemon on the node on which they are running. The environment of a LAM daemon is fixed upon booting of the LAM with lamboot(1) and is inherited from the user's shell. On the origin node this will be the shell from which lamboot(1) was invoked and on remote nodes this will be the shell started by rsh(1). When running dynamically linked applications which require the LD_LIBRARY_PATH environment variable to be set, care must be taken to ensure that it is correctly set when booting the LAM.

Exported Environment Variables

The -x option to lamexec can be used to export specific environment variables to the new processes. While the syntax of the -x option allows the definition of new variables, note that the parser for this option is currently not very sophisticated; it does not even understand quoted values. Users are advised to set variables in the environment and use -x to export them, not to define them.
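
A minimal sketch of the recommended pattern follows: set and export the variable in the shell, then pass only its name to -x. The variable name MY_SETTING, its value, and the program name prog1 are placeholders chosen for illustration.

```shell
# Set and export the variable in the local environment first.
# MY_SETTING and its value are hypothetical; the value contains a space,
# which is the kind of value the -x parser should not be asked to define.
MY_SETTING="two words"
export MY_SETTING

# Then name the variable with -x so that lamexec exports it, rather than
# defining it with VAR=VALUE on the lamexec command line.
# Guarded so the sketch does nothing on a machine without LAM installed.
if command -v lamexec >/dev/null 2>&1; then
    lamexec N -x MY_SETTING prog1
fi
```

The shell, not the -x parser, handles the quoting here, which is the point of the advice above.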

EXAMPLES

lamexec N prog1: Load and execute prog1 on all nodes. Search for the executable file on each node.
lamexec -c 8 prog1: Run 8 copies of prog1 wherever LAM wants to run them.
lamexec n8-10 -v -nw -s n3 prog1 -- -q: Load and execute prog1 on nodes 8, 9, and 10. Search for prog1 on node 3 and transfer it to the three target nodes. Report as each process is created. Give "-q" as a command line argument to each new process. Do not wait for the processes to complete before exiting lamexec.
lamexec -v myapp: Parse the application schema, myapp, and start all processes specified in it. Report as each process is created.
lamexec N N -pty -wd /workstuff/output -x DISPLAY run_app.csh: Run the application "run_app.csh" (presumably a C shell script) twice on each node in the system (ideal for 2-way SMPs). Also enable pseudo-tty support, change directory to /workstuff/output, and export the DISPLAY variable to the new processes (perhaps the shell script will invoke an X application such as xv to display output).
lamexec -np 5 -D `pwd`/my_application: A common usage of lamexec in environments where a filesystem is shared between all nodes in the multicomputer. Using the shell-escaped "pwd" command specifies the full name of the executable to run. This removes the need to put the directory in the path; the remote nodes will have an absolute filename to execute (and, because of -D, will change to its directory upon invocation).

DIAGNOSTICS

lamexec: Exec format error: A non-ASCII character was detected in the application schema. This is usually a command line usage error where lamexec is expecting an application schema and an executable file was given.
lamexec: syntax error in application schema, line XXX: The application schema cannot be parsed because of a usage or syntax error on the given line in the file.
filename: No such file or directory: This error can occur in two cases. Either the named file cannot be located or it has been found but the user does not have sufficient permissions to execute the program or read the application schema.

RETURN VALUE

lamexec returns 0 if all processes started by lamexec exit normally. A non-zero value is returned if an internal error occurred in lamexec, or if one or more processes exited abnormally. If an internal error occurred in lamexec, the corresponding error code is returned. If one or more processes exit with a non-zero exit code, lamexec returns the exit code of the first process that it notices died abnormally. In general this will be the first process that died, but that is not guaranteed.

However, note that if the -nw switch is used, the return value from lamexec does not indicate the exit status of the processes started by it.
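
A calling script can act on this return value in the usual shell way. The wrapper below is a hypothetical sketch (report_status is not part of LAM); it runs whatever command line it is given and reports the status according to the rules above.

```shell
# Hypothetical wrapper (not part of LAM): runs the given command line and
# reports its exit status. Zero means all processes exited normally;
# non-zero means an internal lamexec error or an abnormal process exit.
# If -nw is used, the status says nothing about the started processes.
report_status() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "all processes exited normally"
    else
        echo "lamexec error or abnormal process exit (status $status)"
    fi
    return "$status"
}
```

A typical call would be:

% report_status lamexec N prog1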

SEE ALSO

mpimsg(1), mpirun(1), mpitask(1), loadgo(1)

lamexec (1)