sagemaker.train.container_drivers.distributed_drivers.mpi_driver

This module is the entry point for the MPI driver script.

Functions

main()

Main function for the MPI driver script.

sagemaker.train.container_drivers.distributed_drivers.mpi_driver.main()

Main function for the MPI driver script.

The MPI driver is responsible for setting up the MPI environment, generating the correct MPI command, and launching the MPI job.

Execution Lifecycle:

1. Setup General Environment Variables at /etc/environment
2. Start SSHD Daemon
3. Bootstrap Worker Nodes
   1. Wait to establish connection with Master Node
   2. Wait for Master Node to write status file
4. Bootstrap Master Node
   1. Wait to establish connection with Worker Nodes
   2. Generate MPI Command
   3. Execute MPI Command with user script provided in entry_script
   4. Write status file to Worker Nodes
5. Exit
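The worker/master split in the lifecycle above can be sketched roughly as follows. This is an illustration only, not the driver's actual implementation: the names `build_mpi_command`, `wait_for_status_file`, and `run_lifecycle` are hypothetical, the `mpirun` flags follow Open MPI conventions and may differ from what the driver generates, and a local file path stands in for the status file that the real driver writes to the worker nodes. Environment setup, SSHD startup, and connection checks are omitted.

```python
import time
from pathlib import Path
from typing import Callable, List


def build_mpi_command(hosts: List[str], slots_per_host: int, entry_script: str) -> List[str]:
    """Assemble an illustrative mpirun invocation (Open MPI-style flags, assumed)."""
    host_list = ",".join(f"{host}:{slots_per_host}" for host in hosts)
    total_processes = len(hosts) * slots_per_host
    return ["mpirun", "--host", host_list, "-np", str(total_processes), "python", entry_script]


def wait_for_status_file(path: Path, timeout_s: float = 5.0, poll_s: float = 0.05) -> bool:
    """Poll until the status file exists or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_s)
    return path.exists()


def run_lifecycle(
    current_host: str,
    master_host: str,
    hosts: List[str],
    slots_per_host: int,
    entry_script: str,
    status_file: Path,
    launch: Callable[[List[str]], None],
    timeout_s: float = 5.0,
) -> str:
    """Follow the worker or master branch of the lifecycle (hypothetical helper)."""
    if current_host != master_host:
        # Worker branch: block until the master signals completion.
        finished = wait_for_status_file(status_file, timeout_s=timeout_s)
        return "worker-done" if finished else "worker-timeout"

    # Master branch: generate the command, run it, then signal the workers.
    command = build_mpi_command(hosts, slots_per_host, entry_script)
    launch(command)
    status_file.write_text("done")
    return "master-done"
```

Passing the launcher in as a callable keeps the sketch testable without an MPI installation; on a real cluster it could be something like `lambda cmd: subprocess.run(cmd, check=True)`.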