sagemaker.train.container_drivers.common.utils

This module provides utility functions for the container drivers.

Functions

execute_commands(commands)

Execute the provided commands and return the exit code, along with the failure traceback if any.

get_process_count([process_count])

Get the number of processes to run on each node in the training job.

get_python_executable()

Get the Python executable path.

hyperparameters_to_cli_args(hyperparameters)

Convert the hyperparameters to CLI arguments.

is_master_node()

Check if the current node is the master node.

is_worker_node()

Check if the current node is a worker node.

log_subprocess_output(pipe)

Log the output from the subprocess.

read_distributed_json([distributed_json])

Read the distribution config JSON file.

read_hyperparameters_json([hyperparameters_json])

Read the hyperparameters config JSON file.

read_source_code_json([source_code_json])

Read the source code config JSON file.

safe_deserialize(data)

Safely deserialize data from a JSON string.

safe_serialize(data)

Serialize the data without wrapping strings in quotes.

write_failure_file([message])

Write a failure file with the message.

sagemaker.train.container_drivers.common.utils.execute_commands(commands: List[str]) → Tuple[int, str]

Execute the provided commands and return the exit code, along with the failure traceback if any.
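The following is a minimal sketch of how a caller might act on the returned tuple, assuming the first element is the process exit code and the second is the failure text (empty on success). The `run_commands` helper is a hypothetical stand-in built on the standard library `subprocess` module, not the library implementation, and it assumes the list is passed to the process as a single argument vector.

```python
import subprocess
import sys
from typing import List, Tuple


def run_commands(commands: List[str]) -> Tuple[int, str]:
    """Hypothetical stand-in: run one argv list and return (exit_code, error_text)."""
    result = subprocess.run(commands, capture_output=True, text=True)
    # Assumption: on failure, the captured stderr plays the role of the traceback.
    error_text = result.stderr if result.returncode != 0 else ""
    return result.returncode, error_text


# A command that succeeds and one that raises, both run with the current interpreter.
ok_code, ok_err = run_commands([sys.executable, "-c", "print('hello')"])
bad_code, bad_err = run_commands([sys.executable, "-c", "raise ValueError('boom')"])

if bad_code != 0:
    # A driver would typically record the failure text before exiting.
    print(f"command failed with exit code {bad_code}")
```

Returning the pair rather than raising lets the caller decide whether to record the failure and which exit code to propagate.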

sagemaker.train.container_drivers.common.utils.get_process_count(process_count: int | None = None) → int

Get the number of processes to run on each node in the training job.

sagemaker.train.container_drivers.common.utils.get_python_executable() → str

Get the Python executable path.

sagemaker.train.container_drivers.common.utils.hyperparameters_to_cli_args(hyperparameters: Dict[str, Any]) → List[str]

Convert the hyperparameters to CLI arguments.
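As an illustration of the conversion, here is a minimal sketch that assumes each hyperparameter becomes a `--name value` pair, with strings passed through unquoted and other values JSON-encoded. The `to_cli_args` name and the exact output format are assumptions for this example; the library's actual formatting may differ.

```python
import json
from typing import Any, Dict, List


def to_cli_args(hyperparameters: Dict[str, Any]) -> List[str]:
    """Hypothetical sketch: emit ``--name value`` pairs for each hyperparameter."""
    cli_args: List[str] = []
    for name, value in hyperparameters.items():
        # Assumption: strings pass through unquoted; other values are JSON-encoded.
        text = value if isinstance(value, str) else json.dumps(value)
        cli_args.extend([f"--{name}", text])
    return cli_args


args = to_cli_args({"epochs": 10, "optimizer": "adam", "layers": [64, 32]})
print(args)
# ['--epochs', '10', '--optimizer', 'adam', '--layers', '[64, 32]']
```

Producing a flat list of strings means the result can be appended directly to an argument vector for a training script.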

sagemaker.train.container_drivers.common.utils.is_master_node() → bool

Check if the current node is the master node.

sagemaker.train.container_drivers.common.utils.is_worker_node() → bool

Check if the current node is a worker node.

sagemaker.train.container_drivers.common.utils.log_subprocess_output(pipe: IO[bytes])

Log the output from the subprocess.

sagemaker.train.container_drivers.common.utils.read_distributed_json(distributed_json: Dict[str, Any] = '/opt/ml/input/data/sm_drivers/distributed.json')

Read the distribution config JSON file.

sagemaker.train.container_drivers.common.utils.read_hyperparameters_json(hyperparameters_json: Dict[str, Any] = '/opt/ml/input/config/hyperparameters.json')

Read the hyperparameters config JSON file.

sagemaker.train.container_drivers.common.utils.read_source_code_json(source_code_json: Dict[str, Any] = '/opt/ml/input/data/sm_drivers/sourcecode.json')

Read the source code config JSON file.
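The three readers above each default to a file path inside the training container, so the argument is a path string in practice even though it is annotated as a dictionary. A minimal sketch of that pattern follows, assuming each reader loads the file and returns the parsed content. The `read_json_config` helper and the sample keys are hypothetical, and the example writes a temporary file so it can run outside a container.

```python
import json
import os
import tempfile
from typing import Any, Dict


def read_json_config(path: str) -> Dict[str, Any]:
    """Hypothetical helper: load a JSON config file into a dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Write a throwaway config so the example does not depend on /opt/ml paths.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "hyperparameters.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"epochs": "10", "lr": "0.001"}, f)
    config = read_json_config(config_path)

print(config)
```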

sagemaker.train.container_drivers.common.utils.safe_deserialize(data: Any) → Any

Safely deserialize data from a JSON string.

This function handles the following cases:

1. If data is not a string, it returns the input as-is.
2. If data is a JSON-encoded string, it attempts to deserialize it using json.loads().
3. If data is a string but cannot be decoded as JSON, it returns the original string.

Returns:

The deserialized data, or the original input if it cannot be JSON-decoded.

Return type:

Any
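The three documented cases can be sketched as follows. This is an illustrative reimplementation based only on the description above, under the hypothetical name `deserialize_sketch`; the library function may handle additional cases.

```python
import json
from typing import Any


def deserialize_sketch(data: Any) -> Any:
    """Illustrative sketch of the three documented cases (not the library code)."""
    # Case 1: non-string input is returned unchanged.
    if not isinstance(data, str):
        return data
    try:
        # Case 2: a JSON-encoded string is decoded.
        return json.loads(data)
    except json.JSONDecodeError:
        # Case 3: a string that is not valid JSON is returned as-is.
        return data


print(deserialize_sketch(5))              # 5
print(deserialize_sketch('{"lr": 0.1}'))  # {'lr': 0.1}
print(deserialize_sketch("adam"))         # adam
```

Note that under this sketch a numeric string such as `"10"` is valid JSON and therefore comes back as the integer `10`.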

sagemaker.train.container_drivers.common.utils.safe_serialize(data)

Serialize the data without wrapping strings in quotes.

This function handles the following cases:

1. If data is a string, it returns the string as-is without wrapping in quotes.
2. If data is serializable (e.g., a dictionary, list, int, float), it returns the JSON-encoded string using json.dumps().
3. If data cannot be serialized (e.g., a custom object), it returns the string representation of the data using str(data).

Parameters:

data (Any) – The data to serialize.

Returns:

The serialized JSON-compatible string or the string representation of the input.

Return type:

str
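The documented cases can be sketched as follows. This is an illustrative reimplementation based only on the description above, under the hypothetical name `serialize_sketch`; the `Custom` class exists only to demonstrate the fallback to `str(data)`.

```python
import json
from typing import Any


def serialize_sketch(data: Any) -> str:
    """Illustrative sketch of the three documented cases (not the library code)."""
    # Case 1: strings are returned unchanged, without surrounding quotes.
    if isinstance(data, str):
        return data
    try:
        # Case 2: JSON-serializable values are encoded with json.dumps().
        return json.dumps(data)
    except (TypeError, ValueError):
        # Case 3: anything json.dumps() rejects falls back to str().
        return str(data)


class Custom:
    """Hypothetical non-serializable object used for the fallback case."""

    def __str__(self) -> str:
        return "custom-object"


print(serialize_sketch("adam"))      # adam
print(serialize_sketch({"a": 1}))    # {"a": 1}
print(serialize_sketch(Custom()))    # custom-object
```

Skipping the quotes for plain strings matters when the result is placed on a command line: `json.dumps("adam")` would yield `"adam"` with literal quote characters.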

sagemaker.train.container_drivers.common.utils.write_failure_file(message: str | None = None)

Write a failure file with the message.