sagemaker.train.container_drivers.distributed_drivers.mpi_driver
This module is the entry point for the MPI driver script.
Functions
main(): Main function for the MPI driver script.
- sagemaker.train.container_drivers.distributed_drivers.mpi_driver.main()
Main function for the MPI driver script.
The MPI driver is responsible for setting up the MPI environment, generating the correct MPI commands, and launching the MPI job.
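To illustrate what "generating the correct MPI commands" can involve, here is a minimal sketch that assembles an `mpirun` argument list. It assumes Open MPI's `mpirun` and its `--host`, `-np`, and `-x` options. The helper name `build_mpi_command`, its parameters, and the host names in the demo are hypothetical and are not taken from the driver's source; the real driver may build its command differently.

```python
import shlex
from typing import Dict, List, Optional


def build_mpi_command(
    hosts: List[str],
    slots_per_host: int,
    entry_script: str,
    env_vars: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical helper: assemble an Open MPI `mpirun` argument list."""
    if not hosts:
        raise ValueError("at least one host is required")
    if slots_per_host < 1:
        raise ValueError("slots_per_host must be positive")

    # Open MPI accepts "host:slots" pairs, comma-separated, after --host.
    host_spec = ",".join(f"{host}:{slots_per_host}" for host in hosts)
    command = [
        "mpirun",
        "--host", host_spec,
        "-np", str(len(hosts) * slots_per_host),  # total number of ranks
    ]
    # Each -x NAME=value exports one environment variable to every rank.
    for name, value in sorted((env_vars or {}).items()):
        command += ["-x", f"{name}={value}"]
    # Assumption: the user script is a Python file run with `python`.
    command += ["python", entry_script]
    return command


if __name__ == "__main__":
    cmd = build_mpi_command(
        ["algo-1", "algo-2"], 4, "train.py", {"NCCL_DEBUG": "INFO"}
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the caller pass it straight to `subprocess.run` without shell quoting concerns.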
Execution Lifecycle:
1. Set up general environment variables at /etc/environment
2. Start the SSHD daemon
3. Bootstrap worker nodes
    a. Wait to establish a connection with the master node
    b. Wait for the master node to write the status file
4. Bootstrap the master node
    a. Wait to establish connections with the worker nodes
    b. Generate the MPI command
    c. Execute the MPI command with the user script provided in entry_script
    d. Write the status file to the worker nodes
5. Exit
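The master/worker handshake in the lifecycle can be sketched as follows. This is an illustration under stated assumptions, not the driver's implementation: the function names, the status file location, and the polling strategy are all hypothetical, and the SSH connection checks are omitted. The master runs the command and then signals the workers; each worker blocks until the status file appears.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import Callable, List


def wait_for_status_file(
    path: Path, timeout: float = 60.0, poll_interval: float = 0.1
) -> bool:
    """Poll until `path` exists; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_interval)
    return path.exists()


def run_master(command: List[str], notify_workers: Callable[[int], None]) -> int:
    """Master branch: execute the MPI command, then signal the workers."""
    returncode = subprocess.run(command, check=False).returncode
    # Signal the workers even if the job failed, so they do not wait forever.
    notify_workers(returncode)
    return returncode


def run_worker(status_file: Path, timeout: float = 60.0) -> int:
    """Worker branch: block until the master's status file shows up."""
    return 0 if wait_for_status_file(status_file, timeout) else 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical status file; a real cluster would copy it to each worker.
        status_file = Path(tmp) / "mpi_status"
        # Stand-in for the generated MPI command.
        command = [sys.executable, "-c", "print('training step')"]
        code = run_master(command, lambda rc: status_file.write_text(str(rc)))
        print("master exit code:", code)
        print("worker exit code:", run_worker(status_file, timeout=1.0))
```

Passing `notify_workers` as a callable keeps the sketch independent of how the status file actually reaches the workers, which this page does not specify.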