Checkpoint a running parallel process using the Open MPI Checkpoint/Restart Service (CRS).
Note: ompi-checkpoint and orte-checkpoint are exact synonyms for each other. Using either name results in identical behavior.
ompi-checkpoint [ options ] <PID_OF_MPIRUN>
orte-checkpoint attempts to notify a running parallel job, identified by the PID of its mpirun process, that a checkpoint of the job has been requested. A global snapshot handle reference is presented to the user; this reference is passed to ompi-restart to restart the job.
<PID_OF_MPIRUN>
Process ID of the mpirun process.
-h | --help
Display help for this command.
-w | --nowait
Do not wait for the application to finish checkpointing before returning.
-s | --status
Display status messages regarding the progression of the checkpoint request.
--term
After checkpointing the running job, terminate it.
-v | --verbose
Enable verbose output for debugging.
-gmca | --gmca <key> <value>
Pass global MCA parameters that are applicable to all contexts. <key> is the parameter name; <value> is the parameter value.
-mca | --mca <key> <value>
Send arguments to various MCA modules.
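As a minimal sketch of how the argument and options above combine on a command line, the shell function below wraps a single checkpoint request. The function name checkpoint_job and the OMPI_CHECKPOINT override variable are hypothetical, introduced here only so the command name can be swapped out for a dry run; they are not part of Open MPI. Only options listed above are assumed to exist.

```shell
# Hypothetical helper: request a checkpoint of the job owned by one mpirun PID.
# OMPI_CHECKPOINT is an assumed override for the command name (handy for dry runs).
checkpoint_job() {
    mpirun_pid="$1"
    # Reject a missing argument or anything that is not a plain integer PID.
    case "$mpirun_pid" in
        ''|*[!0-9]*)
            echo "usage: checkpoint_job <PID_OF_MPIRUN> [options]" >&2
            return 2
            ;;
    esac
    shift
    # Remaining arguments (for example --status or --term) pass through
    # unchanged, followed by the PID of mpirun as the final argument.
    "${OMPI_CHECKPOINT:-ompi-checkpoint}" "$@" "$mpirun_pid"
}
```

For example, checkpoint_job 1234 --status would run ompi-checkpoint --status 1234, and checkpoint_job 1234 --term would checkpoint the job and then terminate it. The PID 1234 is a placeholder.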
orte-checkpoint can be invoked multiple times on the same job, provided the invocations do not overlap. The user does not need to specify the checkpointer to be used here; it is determined entirely by each of the running processes in the job being checkpointed.
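One way to keep repeated invocations from overlapping is to issue them sequentially from a single loop. The sketch below assumes this approach; the function name periodic_checkpoint, its three arguments, and the OMPI_CHECKPOINT override variable are hypothetical and are not provided by Open MPI.

```shell
# Hypothetical helper: take up to <count> checkpoints of one job, one at a time.
# Arguments: <PID_OF_MPIRUN> <seconds between checkpoints> <maximum count>
periodic_checkpoint() {
    mpirun_pid="$1"
    interval="$2"
    count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop once the mpirun process no longer exists.
        kill -0 "$mpirun_pid" 2>/dev/null || break
        # Without -w the command waits for the checkpoint to finish before
        # returning, so the next request cannot overlap with this one.
        "${OMPI_CHECKPOINT:-ompi-checkpoint}" "$mpirun_pid" || return 1
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, periodic_checkpoint 1234 600 6 would request at most six checkpoints, pausing ten minutes after each one. The PID and timing values are placeholders.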