sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver
This module is the entry point for the MPI driver script.
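Functions
| bootstrap_master_node(*args, **kwargs) | |
| bootstrap_worker_node(*args, **kwargs) | |
| get_mpirun_command(*args, **kwargs) | |
| main() | Main function for the MPI driver script. |
| start_sshd_daemon(*args, **kwargs) | |
| write_env_vars_to_file(*args, **kwargs) | |
| write_status_file_to_workers(*args, **kwargs) | |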
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.bootstrap_master_node(*args, **kwargs)
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.bootstrap_worker_node(*args, **kwargs)
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.get_mpirun_command(*args, **kwargs)
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.main()
Main function for the MPI driver script.
The MPI driver is responsible for setting up the MPI environment, generating the correct MPI command, and launching the MPI job.
Execution Lifecycle:
1. Setup General Environment Variables at /etc/environment
2. Start SSHD Daemon
3. Bootstrap Worker Nodes
   - Wait to establish connection with Master Node
   - Wait for Master Node to write status file
4. Bootstrap Master Node
   - Wait to establish connection with Worker Nodes
   - Generate MPI Command
   - Execute MPI Command with user script provided in entry_script
   - Write status file to Worker Nodes
5. Exit
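The lifecycle above branches on node role: every node performs the common setup, then a worker only waits while the master drives the job. The following is a minimal sketch of that ordering, not the module's actual implementation. The function `run_lifecycle`, the step names, and the `is_master` flag are all hypothetical and exist only for illustration; each step is passed in as a plain callable so the ordering can be checked without SSH or MPI installed.

```python
from typing import Callable, Dict, List

# Hypothetical step names mirroring the documented lifecycle.
COMMON_STEPS = ["write_env_vars", "start_sshd"]
WORKER_STEPS = ["wait_for_master_connection", "wait_for_status_file"]
MASTER_STEPS = [
    "wait_for_worker_connections",
    "generate_mpi_command",
    "execute_mpi_command",
    "write_status_file_to_workers",
]


def run_lifecycle(is_master: bool, steps: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the steps for this node's role in order; return the names executed."""
    order = COMMON_STEPS + (MASTER_STEPS if is_master else WORKER_STEPS)
    executed = []
    for name in order:
        steps[name]()  # a raised exception aborts the remaining steps
        executed.append(name)
    return executed


def make_noop_steps() -> Dict[str, Callable[[], None]]:
    """Build do-nothing stand-ins for every step, for dry runs."""
    all_names = COMMON_STEPS + WORKER_STEPS + MASTER_STEPS
    return {name: (lambda: None) for name in all_names}
```

Calling `run_lifecycle(False, make_noop_steps())` walks the worker path and `run_lifecycle(True, make_noop_steps())` walks the master path. Injecting the steps keeps the ordering logic separate from the side effects, which is one way to make a driver like this testable.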
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.start_sshd_daemon(*args, **kwargs)
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.write_env_vars_to_file(*args, **kwargs)
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_driver.write_status_file_to_workers(*args, **kwargs)
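The helpers listed here are documented only as `(*args, **kwargs)`, so their real parameters are not shown on this page. As a rough illustration of two of the tasks the lifecycle names (writing environment variables to a file and generating an MPI command), here is a sketch under stated assumptions. The functions `write_env_file_sketch` and `build_mpirun_command_sketch`, their parameters, and the example host names are hypothetical and are not the module's API. The command uses Open MPI style `--host` and `-np` options, and it is only assembled, never executed.

```python
from pathlib import Path
from typing import Dict, List


def write_env_file_sketch(env: Dict[str, str], path: Path) -> None:
    """Write one KEY="value" line per variable, sorted by key.

    Hypothetical helper: the real driver targets /etc/environment, but the
    path is a parameter here so the sketch can run without root access.
    """
    lines = [f'{key}="{value}"' for key, value in sorted(env.items())]
    path.write_text("\n".join(lines) + "\n")


def build_mpirun_command_sketch(
    hosts: List[str], slots_per_host: int, entry_script: str
) -> List[str]:
    """Assemble an Open MPI style argument list for the given hosts.

    Hypothetical helper: assumes every host offers the same number of slots
    and that the entry script is a Python file.
    """
    host_arg = ",".join(f"{host}:{slots_per_host}" for host in hosts)
    total_processes = len(hosts) * slots_per_host
    return [
        "mpirun",
        "--host", host_arg,
        "-np", str(total_processes),
        "python", entry_script,
    ]
```

For example, `build_mpirun_command_sketch(["algo-1", "algo-2"], 4, "train.py")` yields an argument list that requests 8 processes across the two hosts. Returning a list rather than a single string avoids shell quoting issues if the result is later passed to `subprocess.run`.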