sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils
This module provides MPI-related utility functions for the container drivers.
Functions
| bootstrap_master_node | Bootstrap the master node. |
| bootstrap_worker_node | Bootstrap the worker nodes. |
| get_mpirun_command | Fetch the mpirun command. |
| start_sshd_daemon | Start the SSH daemon on the current node. |
| validate_smddpmprun | Whether smddpmprun is installed. |
| validate_smddprun | Whether smddprun is installed. |
| | Write environment variables to the /etc/environment file. |
| | Write the status file to all worker nodes. |
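A plausible reason for writing environment variables to /etc/environment is that processes mpirun launches on remote hosts over SSH start from a fresh login environment and do not inherit the container's variables. On systems where sshd uses PAM, pam_env reads /etc/environment at login. The sketch below shows one way such a helper could look. The name write_env_file, the KEY="value" line format, overwriting rather than appending, and skipping values that contain quotes or newlines are all assumptions, not the SDK's implementation.

```python
import os
from typing import Mapping, Optional


def write_env_file(path: str = "/etc/environment",
                   env: Optional[Mapping[str, str]] = None) -> None:
    """Write environment variables as KEY="value" lines (hypothetical helper).

    Defaults to the current process environment. Values that contain a double
    quote or a newline are skipped because this simple format cannot quote them.
    """
    source = os.environ if env is None else env
    # Assumption: the file is overwritten, not appended to.
    with open(path, "w", encoding="utf-8") as handle:
        for key, value in source.items():
            if '"' in value or "\n" in value:
                continue  # not representable in this simple KEY="value" format
            handle.write(f'{key}="{value}"\n')
```

Passing an explicit path and mapping makes the helper easy to exercise without root access, for example write_env_file("/tmp/env.test", {"NCCL_DEBUG": "INFO"}).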
Classes
| CustomHostKeyPolicy | Class to handle host key policy for SageMaker distributed training SSH connections. |
- class sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.CustomHostKeyPolicy
Bases: paramiko.client.MissingHostKeyPolicy
Class to handle host key policy for SageMaker distributed training SSH connections.
Example:
>>> client = paramiko.SSHClient()
>>> client.set_missing_host_key_policy(CustomHostKeyPolicy())
>>> # Will succeed for SageMaker algorithm containers
>>> client.connect('algo-1234.internal')
>>> # Will raise SSHException for other unknown hosts
>>> client.connect('unknown-host')  # raises SSHException
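paramiko calls a policy's missing_host_key(client, hostname, key) method when a server presents a host key that the client does not yet know. A policy accepts the key by returning normally and rejects it by raising. The sketch below reproduces the behaviour shown in the example. The name AlgoHostKeyPolicy and the rule "hostname starts with algo-" are assumptions inferred from the 'algo-1234.internal' example, not the SDK's actual check. Stand-in base classes let the sketch run when paramiko is not installed.

```python
try:
    from paramiko import MissingHostKeyPolicy, SSHException
except ImportError:  # stand-ins so the sketch runs without paramiko installed
    class MissingHostKeyPolicy:  # type: ignore[no-redef]
        """Minimal stand-in for paramiko.MissingHostKeyPolicy."""

    class SSHException(Exception):  # type: ignore[no-redef]
        """Minimal stand-in for paramiko.SSHException."""


class AlgoHostKeyPolicy(MissingHostKeyPolicy):
    """Accept unknown host keys only for hosts named like 'algo-*' (assumed rule)."""

    def missing_host_key(self, client, hostname, key):
        # paramiko invokes this hook when the server's host key is not yet known.
        if hostname.startswith("algo-"):
            # Record the key in the client's in-memory host keys, similar to
            # what paramiko's AutoAddPolicy does.
            client.get_host_keys().add(hostname, key.get_name(), key)
            return
        raise SSHException(f"Unknown host key for {hostname!r}")
```

Compared with AutoAddPolicy, which trusts every host, restricting acceptance to a known naming scheme keeps the convenience inside the training cluster while still refusing unexpected hosts.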
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.bootstrap_master_node(worker_hosts: List[str])
Bootstrap the master node.
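This page does not say what bootstrapping the master involves. One plausible step is that the master waits until every host in worker_hosts is reachable before it launches mpirun. The sketch below implements that assumed step as a plain TCP reachability poll. The name wait_for_hosts, port 22 (the default SSH port), and the timeout values are all hypothetical. The SDK may instead attempt real SSH connections.

```python
import socket
import time
from typing import List


def wait_for_hosts(hosts: List[str], port: int = 22,
                   timeout: float = 60.0, interval: float = 1.0) -> None:
    """Block until every host accepts a TCP connection on `port`.

    Raises TimeoutError if some hosts are still unreachable after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    pending = list(hosts)
    while pending:
        still_pending = []
        for host in pending:
            try:
                # A completed TCP handshake is treated as "reachable".
                with socket.create_connection((host, port), timeout=interval):
                    pass
            except OSError:  # refused, unreachable, or timed out
                still_pending.append(host)
        pending = still_pending
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"hosts not reachable on port {port}: {pending}")
            time.sleep(interval)
```

For example, wait_for_hosts(["algo-2", "algo-3"]) would return once both workers accept connections on port 22.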
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.bootstrap_worker_node(master_host: str, status_file: str = '/tmp/done.algo-1')
Bootstrap the worker nodes.
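Two details on this page point to a simple handshake: the status_file default of '/tmp/done.algo-1', and the summary "Write the status file to all worker nodes". Under that reading, the master writes a marker file to each worker, and a worker stays alive until the marker appears. The local sketch below implements that assumed handshake. wait_for_status_file and write_status_file are hypothetical names, and the master's remote write over SSH is replaced by a local file write.

```python
import os
import time


def wait_for_status_file(status_file: str = "/tmp/done.algo-1",
                         timeout: float = 3600.0,
                         interval: float = 1.0) -> None:
    """Block until `status_file` exists, or raise TimeoutError (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(status_file):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{status_file} did not appear within {timeout}s")
        time.sleep(interval)


def write_status_file(status_file: str) -> None:
    """Create an empty marker file; a local stand-in for the master's remote write."""
    with open(status_file, "w", encoding="utf-8"):
        pass
```

Polling for a file keeps the worker side free of any networking code. The only shared contract is the agreed path.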
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.get_mpirun_command(host_count: int, host_list: List[str], num_processes: int, additional_options: List[str], entry_script_path: str)
Fetch the mpirun command.
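The signature suggests that the command is assembled from the host list, a process count, extra options, and the entry script. The sketch below builds an Open MPI style argv from the same parameters. Several details are assumptions: the name build_mpirun_command, treating num_processes as per-host, the host:slots syntax for --host, and the python_executable parameter. The SDK's real command may add other flags.

```python
import shlex
from typing import List


def build_mpirun_command(host_count: int,
                         host_list: List[str],
                         num_processes: int,
                         additional_options: List[str],
                         entry_script_path: str,
                         python_executable: str = "python3") -> List[str]:
    """Assemble an mpirun argv (hypothetical; num_processes assumed to be per host)."""
    if host_count != len(host_list):
        raise ValueError(f"host_count={host_count} but {len(host_list)} hosts given")
    # Open MPI's --host accepts comma-separated host:slots entries.
    host_spec = ",".join(f"{host}:{num_processes}" for host in host_list)
    return [
        "mpirun",
        "--host", host_spec,
        "-np", str(host_count * num_processes),  # total ranks across all hosts
        *additional_options,
        python_executable,
        entry_script_path,
    ]


if __name__ == "__main__":
    cmd = build_mpirun_command(2, ["algo-1", "algo-2"], 4, ["-x", "NCCL_DEBUG"], "train.py")
    print(shlex.join(cmd))
```

Returning a list instead of a shell string lets the caller pass it straight to subprocess without quoting issues. shlex.join is used only to display it.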
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.start_sshd_daemon()
Start the SSH daemon on the current node.
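Starting the SSH daemon lets mpirun on the master reach this node. The sketch below shows one minimal way to do this from Python. The function name start_sshd and the /usr/sbin/sshd path (common on Debian and Ubuntu images) are assumptions. sshd's -D flag keeps it in the foreground instead of detaching, so the returned Popen handle tracks the server process itself.

```python
import os
import subprocess


def start_sshd(sshd_path: str = "/usr/sbin/sshd") -> subprocess.Popen:
    """Launch sshd as a foreground child process (hypothetical helper)."""
    if not os.path.exists(sshd_path):
        raise RuntimeError(f"SSH daemon not found at {sshd_path}")
    # -D: do not detach or daemonize, so the returned handle refers to sshd.
    return subprocess.Popen([sshd_path, "-D"])
```

In practice, sshd also needs host keys and a valid configuration in the image. This sketch only handles the launch.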
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.validate_smddpmprun() -> bool
Whether smddpmprun is installed.
- Returns:
True if both are installed
- Return type:
bool
- sagemaker.core.modules.train.container_drivers.distributed_drivers.mpi_utils.validate_smddprun() -> bool
Whether smddprun is installed.
- Returns:
True if installed
- Return type:
bool
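A common way to answer "is this launcher installed?" is a PATH lookup with shutil.which. The sketch below applies that lookup to smddprun and smddpmprun. The helper names are hypothetical, and each helper checks only the single executable it names. The SDK may run a different check, for example invoking the command, and may verify additional dependencies.

```python
import shutil


def is_on_path(executable: str) -> bool:
    """Return True if `executable` resolves to a runnable file on PATH."""
    return shutil.which(executable) is not None


def smddprun_available() -> bool:
    """Hypothetical PATH-based check for the smddprun launcher."""
    return is_on_path("smddprun")


def smddpmprun_available() -> bool:
    """Hypothetical PATH-based check for the smddpmprun launcher."""
    return is_on_path("smddpmprun")
```

A PATH lookup is cheap and has no side effects, which suits a check that runs before choosing how to launch training.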