sagemaker.core.fw_utils#

Utility methods used by framework classes.

Module Attributes

UploadedCode(s3_prefix, script_name)

sagemaker.fw_utils.UploadedCode: An object containing the S3 prefix and script name.

Functions

create_image_uri(region, framework, ...[, ...])

Deprecated method.

framework_name_from_image(image_uri)

Extract the framework and Python version from the image name.

framework_version_from_tag(image_tag)

Extract the framework version from the image tag.

get_mp_parameters(distribution)

Get the model parallelism parameters provided by the user.

model_code_key_prefix(...)

Returns the s3 key prefix for uploading code during model deployment.

parse_mp_parameters(params)

Parse the model parallelism parameters provided by the user.

profiler_config_deprecation_warning(...)

Deprecation message if framework profiling is specified for TensorFlow (TF) >= 2.12 or PyTorch (PT) >= 2.0.

python_deprecation_warning(framework, ...)

Placeholder docstring.

tar_and_upload_dir(session, bucket, ...[, ...])

Package source files and upload a compressed tar file to S3.

validate_distribution(distribution, ...)

Check if distribution strategy is correctly invoked by the user.

validate_distribution_for_instance_type(...)

Check if the provided distribution strategy is supported for the instance_type.

validate_mp_config(config)

Validate the configuration dictionary for model parallelism.

validate_smdistributed(instance_type, ...[, ...])

Check if smdistributed strategy is correctly invoked by the user.

validate_source_code_input_against_pipeline_variables([...])

Validate source code input against pipeline variables.

validate_source_dir(script, directory)

Validate that the source directory exists and it contains the user script.

validate_torch_distributed_distribution(...)

Check if torch_distributed distribution strategy is correctly invoked by the user.

validate_version_or_image_args(...)

Checks if version or image arguments are specified.

warn_if_parameter_server_with_multi_gpu(...)

Warn the user about training when it doesn't leverage all the GPU cores.

Classes

UploadedCode(s3_prefix, script_name)

sagemaker.fw_utils.UploadedCode: An object containing the S3 prefix and script name.

class sagemaker.core.fw_utils.UploadedCode(s3_prefix, script_name)#

Bases: tuple

sagemaker.fw_utils.UploadedCode: An object containing the S3 prefix and script name.

This is for the source code used for the entry point with an Estimator. It can be instantiated with positional or keyword arguments.

s3_prefix#

Alias for field number 0

script_name#

Alias for field number 1
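
Because the class derives from tuple and exposes its fields as aliases for positions 0 and 1, it behaves like a named tuple. The following is a minimal stand-in built with collections.namedtuple, not an import of the real class; the bucket and script names are made up for illustration.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented class (assumption:
# the real class behaves like a plain two-field named tuple).
UploadedCode = namedtuple("UploadedCode", ["s3_prefix", "script_name"])

# Positional and keyword construction produce equal objects.
by_position = UploadedCode("s3://my-bucket/code/sourcedir.tar.gz", "train.py")
by_keyword = UploadedCode(
    s3_prefix="s3://my-bucket/code/sourcedir.tar.gz",
    script_name="train.py",
)

# Fields can be read by name, by index, or by unpacking.
s3_prefix, script_name = by_position
```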

sagemaker.core.fw_utils.create_image_uri(region, framework, instance_type, framework_version, py_version=None, account=None, accelerator_type=None, optimized_families=None)[source]#

Deprecated method. Please use sagemaker.image_uris.retrieve().

Parameters:
  • region (str) – AWS region where the image is uploaded.

  • framework (str) – framework used by the image.

  • instance_type (str) – SageMaker instance type. Used to determine device type (cpu/gpu/family-specific optimized).

  • framework_version (str) – The version of the framework.

  • py_version (str) – Optional. Python version, e.g. py38, py39, py310, py311. If not specified, the image URI will not include a Python component.

  • account (str) – AWS account that contains the image. (default: ‘520713654638’)

  • accelerator_type (str) – SageMaker Elastic Inference accelerator type.

  • optimized_families (str) – Deprecated. A no-op argument.

Returns:

the image uri

sagemaker.core.fw_utils.framework_name_from_image(image_uri)[source]#

Extract the framework and Python version from the image name.

Parameters:

image_uri (str) – Image URI, which should be one of the following forms:

  • legacy: ‘<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<fw>-<py_ver>-<device>:<container_version>’

  • legacy: ‘<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<fw>-<py_ver>-<device>:<fw_version>-<device>-<py_ver>’

  • current: ‘<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<fw>:<fw_version>-<device>-<py_ver>’

  • current: ‘<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-rl-<fw>:<rl_toolkit><rl_version>-<device>-<py_ver>’

  • current: ‘<account>.dkr.ecr.<region>.amazonaws.com/<fw>-<image_scope>:<fw_version>-<device>-<py_ver>’

  • current: ‘<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-xgboost:<fw_version>-<container_version>’

Returns:

A tuple containing:

  • str: The framework name

  • str: The Python version

  • str: The image tag

  • str: If the TensorFlow image is script mode

Return type:

tuple
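
To make the URI forms concrete, here is a rough sketch that splits only the ‘sagemaker-<fw>:<fw_version>-<device>-<py_ver>’ form with a regular expression. The helper name split_image_uri and the pattern are hypothetical and are not the library's implementation; the other forms listed above are deliberately not handled, and the account ID in the usage note is a dummy value.

```python
import re

# Hypothetical pattern covering a single documented form:
# <account>.dkr.ecr.<region>.amazonaws.com/sagemaker-<fw>:<fw_version>-<device>-<py_ver>
_CURRENT_FORM = re.compile(
    r"^(?P<account>\d+)\.dkr\.ecr\.(?P<region>[^.]+)\.amazonaws\.com/"
    r"sagemaker-(?P<fw>[a-z]+):"
    r"(?P<tag>(?P<fw_version>[\d.]+)-(?P<device>cpu|gpu)-(?P<py>py\d+))$"
)


def split_image_uri(image_uri):
    """Return (framework, py_version, tag), or None if the form is not recognized."""
    match = _CURRENT_FORM.match(image_uri)
    if match is None:
        return None
    return match.group("fw"), match.group("py"), match.group("tag")
```

For instance, split_image_uri("123456789012.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet:1.4.1-gpu-py3") yields the framework "mxnet", the Python version "py3", and the tag "1.4.1-gpu-py3" under this sketch.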

sagemaker.core.fw_utils.framework_version_from_tag(image_tag)[source]#

Extract the framework version from the image tag.

Parameters:

image_tag (str) – Image tag, which should take one of the forms ‘<framework_version>-<device>-<py_version>’ or ‘<xgboost_version>-<container_version>’.

Returns:

The framework version.

Return type:

str
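
The two tag forms can be illustrated with a small sketch. The helper version_from_tag and its regular expressions are hypothetical, written only to show how a version could be pulled out of each form; the actual function may accept more variants or treat unrecognized tags differently.

```python
import re


def version_from_tag(image_tag):
    """Return the version part of a tag, or None if neither form matches."""
    # Form 1: <framework_version>-<device>-<py_version>
    match = re.match(r"^(.+)-(?:cpu|gpu)-py\d+$", image_tag)
    if match:
        return match.group(1)
    # Form 2: <xgboost_version>-<container_version>
    match = re.match(r"^(\d+\.\d+)-\d+$", image_tag)
    if match:
        return match.group(1)
    return None
```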

sagemaker.core.fw_utils.get_mp_parameters(distribution)[source]#

Get the model parallelism parameters provided by the user.

Parameters:

distribution – distribution dictionary defined by the user.

Returns:

dictionary containing model parallelism parameters used for training.

Return type:

params

sagemaker.core.fw_utils.model_code_key_prefix(code_location_key_prefix, model_name, image)[source]#

Returns the s3 key prefix for uploading code during model deployment.

The returned location is a concatenation of up to two parts:
  1. code_location_key_prefix if it exists

  2. model_name or a name derived from the image

Parameters:
  • code_location_key_prefix (str) – the s3 key prefix from code_location

  • model_name (str) – the name of the model

  • image (str) – the image from which a default name can be extracted

Returns:

the key prefix to be used in uploading code

Return type:

str
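
The concatenation described above can be sketched as follows. The helper join_key_prefix is hypothetical; because this page does not say how a default name is derived from the image, the sketch takes that name as an explicit default_name argument instead of computing it.

```python
def join_key_prefix(code_location_key_prefix, model_name, default_name):
    """Join the optional prefix with the model name (or a fallback name)."""
    # Part 2: prefer the explicit model name, else the fallback.
    name = model_name or default_name
    # Part 1 is included only if it exists; empty parts are skipped.
    parts = [part for part in (code_location_key_prefix, name) if part]
    return "/".join(parts)
```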

sagemaker.core.fw_utils.parse_mp_parameters(params)[source]#

Parse the model parallelism parameters provided by the user.

Parameters:

params – a string representing path to an existing config, or a config dict.

Returns:

a dict of parsed config.

Return type:

parsed

Raises:

ValueError – if params is not a string or a dict, or the config file cannot be parsed as json.
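
The accepted inputs and the error case can be sketched in a few lines. load_mp_config is a hypothetical stand-in, not the library function; it assumes the config file is plain JSON and does not validate the keys it reads.

```python
import json


def load_mp_config(params):
    """Return a config dict from a dict or from a path to a JSON file."""
    if isinstance(params, dict):
        return params
    if isinstance(params, str):
        try:
            with open(params, "r") as handle:
                return json.load(handle)
        except json.JSONDecodeError as err:
            raise ValueError(f"Cannot parse {params} as JSON") from err
    raise ValueError("params must be a dict or a path to a JSON config file")
```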

sagemaker.core.fw_utils.profiler_config_deprecation_warning(profiler_config, image_uri, framework_name, framework_version)[source]#

Deprecation message if framework profiling is specified for TensorFlow (TF) >= 2.12 or PyTorch (PT) >= 2.0.

sagemaker.core.fw_utils.python_deprecation_warning(framework, latest_supported_version)[source]#

Placeholder docstring.

sagemaker.core.fw_utils.tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory=None, dependencies=None, kms_key=None, s3_resource=None, settings: SessionSettings | None = None) UploadedCode[source]#

Package source files and upload a compressed tar file to S3.

The S3 location will be s3://<bucket>/s3_key_prefix/sourcedir.tar.gz. If directory is an S3 URI, an UploadedCode object will be returned, but nothing will be uploaded to S3 (this allows reuse of code already in S3).

If directory is None, the script will be added to the archive at ./<basename of script>. If directory is not None, the (recursive) contents of the directory will be added to the archive; directory is treated as the base path of the archive, and the script name is assumed to be a filename or relative path inside the directory.

Parameters:
  • session (boto3.Session) – Boto session used to access S3.

  • bucket (str) – S3 bucket to which the compressed file is uploaded.

  • s3_key_prefix (str) – Prefix for the S3 key.

  • script (str) – Script filename or path.

  • directory (str) – Optional. Directory containing the source file. If it starts with “s3://”, no action is taken.

  • dependencies (List[str]) – Optional. A list of paths to directories (absolute or relative) containing additional libraries that will be copied into /opt/ml/lib

  • kms_key (str) – Optional. KMS key ID used to upload objects to the bucket (default: None).

  • s3_resource (boto3.resource("s3")) – Optional. Pre-instantiated Boto3 Resource for S3 connections, can be used to customize the configuration, e.g. set the endpoint URL (default: None).

  • settings (sagemaker.session_settings.SessionSettings) – Optional. The settings of the SageMaker Session, can be used to override the default encryption behavior (default: None).

Returns:

An object with the S3 bucket and key (S3 prefix) and script name.

Return type:

sagemaker.fw_utils.UploadedCode
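
The archive layout rules can be tried locally without touching S3. The sketch below covers only the packaging step using the standard tarfile module; build_source_archive is a hypothetical helper, and the S3 upload, dependencies, and KMS handling are left out.

```python
import os
import tarfile


def build_source_archive(script, directory=None, target="sourcedir.tar.gz"):
    """Create a gzip-compressed tar archive laid out as described above."""
    with tarfile.open(target, "w:gz") as archive:
        if directory is None:
            # Only the script is archived, at ./<basename of script>.
            archive.add(script, arcname=os.path.basename(script))
        else:
            # The directory is the base path: add each entry recursively,
            # named relative to the directory.
            for name in sorted(os.listdir(directory)):
                archive.add(os.path.join(directory, name), arcname=name)
    return target
```

Listing the result with tarfile.open(target).getnames() shows whether the script ended up at the archive root, which is where the entry point is expected under either rule.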

sagemaker.core.fw_utils.validate_distribution(distribution: Dict, instance_groups: List[InstanceGroup], framework_name: str, framework_version: str, py_version: str, image_uri: str, kwargs: Dict) Dict[source]#

Check if distribution strategy is correctly invoked by the user.

Currently checks the dataparallel, modelparallel, and heterogeneous cluster setups, and validates that the strategy requested by the user is supported.

Parameters:
  • distribution (dict) –

    A dictionary with information to enable distributed training. (Defaults to None if distributed training is not enabled.) For example:

    {
        "smdistributed": {
            "dataparallel": {
                "enabled": True
            }
        }
    }
    

  • instance_groups ([InstanceGroup]) – A list containing the instance groups used for training.

  • framework_name (str) – A string representing the name of framework selected.

  • framework_version (str) – A string representing the framework version selected.

  • py_version (str) – A string representing the Python version selected, e.g. py38, py39, py310, py311.

  • image_uri (str) – A string representing a Docker image URI.

  • kwargs (dict) – Additional kwargs passed to this function

Returns:

updated dictionary with validated information to enable distributed training.

Return type:

distribution(dict)

Raises:

ValueError – if the distribution dictionary isn’t correctly formatted, multiple strategies are requested simultaneously, an unsupported strategy is requested, strategy-specific inputs are incorrect or unsupported, or the heterogeneous cluster setup is incorrect.

sagemaker.core.fw_utils.validate_distribution_for_instance_type(instance_type, distribution)[source]#

Check if the provided distribution strategy is supported for the instance_type.

Parameters:
  • instance_type (str) – A string representing the type of training instance selected.

  • distribution (dict) – A dictionary with information to enable distributed training.

sagemaker.core.fw_utils.validate_mp_config(config)[source]#

Validate the configuration dictionary for model parallelism.

Parameters:

config (dict) – Dictionary holding configuration keys and values.

Raises:

ValueError – If any of the keys have incorrect values.

sagemaker.core.fw_utils.validate_smdistributed(instance_type, framework_name, framework_version, py_version, distribution, image_uri=None)[source]#

Check if smdistributed strategy is correctly invoked by the user.

Currently, two strategies are supported: dataparallel or modelparallel. Validate if the user requested strategy is supported.

Currently, only one strategy can be specified at a time. Validate if the user has requested more than one strategy simultaneously.

Validate if the smdistributed dict arg is syntactically correct.

Additionally, perform strategy-specific validations.

Parameters:
  • instance_type (str) – A string representing the type of training instance selected.

  • framework_name (str) – A string representing the name of framework selected.

  • framework_version (str) – A string representing the framework version selected.

  • py_version (str) – A string representing the Python version selected, e.g. py38, py39, py310, py311.

  • distribution (dict) –

    A dictionary with information to enable distributed training. (Defaults to None if distributed training is not enabled.) For example:

    {
        "smdistributed": {
            "dataparallel": {
                "enabled": True
            }
        }
    }
    

  • image_uri (str) – A string representing a Docker image URI.

Raises:

ValueError – if the distribution dictionary isn’t correctly formatted, multiple strategies are requested simultaneously, an unsupported strategy is requested, or strategy-specific inputs are incorrect or unsupported.
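
The structural checks described for the smdistributed dictionary can be sketched as below. check_smdistributed is a hypothetical helper, not the library function: it covers only the "supported strategy" and "one strategy at a time" rules, assumes each strategy carries an "enabled" key as in the example above, and skips every instance-type, framework-version, and strategy-specific check.

```python
SUPPORTED_STRATEGIES = ("dataparallel", "modelparallel")


def check_smdistributed(distribution):
    """Return the single requested strategy name, or raise ValueError."""
    smdistributed = distribution.get("smdistributed")
    if not isinstance(smdistributed, dict):
        raise ValueError("'smdistributed' must map to a dictionary")

    # Reject strategy names other than the two documented ones.
    unsupported = sorted(set(smdistributed) - set(SUPPORTED_STRATEGIES))
    if unsupported:
        raise ValueError(f"Unsupported strategies: {unsupported}")

    # Only one strategy can be specified at a time.
    if len(smdistributed) != 1:
        raise ValueError("Specify exactly one smdistributed strategy")

    strategy, options = next(iter(smdistributed.items()))
    # Assumption: each strategy carries an "enabled" flag.
    if not isinstance(options, dict) or "enabled" not in options:
        raise ValueError(f"'{strategy}' must be a dictionary with an 'enabled' key")
    return strategy
```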

sagemaker.core.fw_utils.validate_source_code_input_against_pipeline_variables(entry_point: str | PipelineVariable | None = None, source_dir: str | PipelineVariable | None = None, git_config: Dict[str, str] | None = None, enable_network_isolation: bool | PipelineVariable = False)[source]#

Validate source code input against pipeline variables.

Parameters:
  • entry_point (str or PipelineVariable) – The path to the local Python source file that should be executed as the entry point to training (default: None).

  • source_dir (str or PipelineVariable) – The path to a directory with any other training source code dependencies aside from the entry point file (default: None).

  • git_config (Dict[str, str]) – Git configurations used for cloning files (default: None).

  • enable_network_isolation (bool or PipelineVariable) – Specifies whether container will run in network isolation mode (default: False).

sagemaker.core.fw_utils.validate_source_dir(script, directory)[source]#

Validate that the source directory exists and it contains the user script.

Parameters:
  • script (str) – Script filename.

  • directory (str) – Directory containing the source file.

Raises:

ValueError – If directory does not exist, is not a directory, or does not contain script.
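
The three failure conditions map directly onto os.path checks. The sketch below uses a hypothetical helper, check_source_dir, and mirrors the documented conditions only; the actual function may handle edge cases such as an empty directory argument differently.

```python
import os


def check_source_dir(script, directory):
    """Raise ValueError unless directory exists and contains script."""
    # Covers both "does not exist" and "is not a directory".
    if not os.path.isdir(directory):
        raise ValueError(f"'{directory}' is not an existing directory")
    # The script is looked up relative to the directory.
    if not os.path.isfile(os.path.join(directory, script)):
        raise ValueError(f"No file named '{script}' was found in '{directory}'")
    return True
```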

sagemaker.core.fw_utils.validate_torch_distributed_distribution(instance_type, distribution, framework_version, py_version, image_uri, entry_point)[source]#

Check if torch_distributed distribution strategy is correctly invoked by the user.

Parameters:
  • instance_type (str) – A string representing the type of training instance selected.

  • distribution (dict) –

    A dictionary with information to enable distributed training. (Defaults to None if distributed training is not enabled.) For example:

    {
        "torch_distributed": {
            "enabled": True
        }
    }
    

  • framework_version (str) – A string representing the framework version selected.

  • py_version (str) – A string representing the Python version selected, e.g. py38, py39, py310, py311.

  • image_uri (str) – A string representing a Docker image URI.

  • entry_point (str or PipelineVariable) – The absolute or relative path to the local Python source file that should be executed as the entry point to training.

Raises:

ValueError – if py_version is not python3 or framework_version is not compatible with instance types

sagemaker.core.fw_utils.validate_version_or_image_args(framework_version, py_version, image_uri)[source]#

Checks if version or image arguments are specified.

Validates framework and model arguments to enforce version or image specification.

Parameters:
  • framework_version (str) – The version of the framework.

  • py_version (str) – A string representing the Python version selected, e.g. py38, py39, py310, py311.

  • image_uri (str) – The URI of the image.

Raises:

ValueError – if image_uri is None and either framework_version or py_version is None.
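
The rule stated under Raises fits in a single condition. The following sketch uses a hypothetical helper name and error message, and encodes nothing beyond that documented rule.

```python
def check_version_or_image(framework_version, py_version, image_uri):
    """Require either an image URI or both version arguments."""
    if image_uri is None and (framework_version is None or py_version is None):
        raise ValueError(
            "framework_version and py_version are both required "
            "when image_uri is not provided"
        )
```

Under this rule an explicit image URI makes both version arguments optional, while omitting it makes both mandatory.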

sagemaker.core.fw_utils.warn_if_parameter_server_with_multi_gpu(training_instance_type, distribution)[source]#

Warn the user about training when it doesn’t leverage all the GPU cores.

Warn the user that training will not fully leverage all the GPU cores if parameter server is enabled and a multi-GPU instance is selected. Distributed training with the default parameter server setup doesn’t support multi-GPU instances.

Parameters:
  • training_instance_type (str) – A string representing the type of training instance selected.

  • distribution (dict) –

    A dictionary with information to enable distributed training. (Defaults to None if distributed training is not enabled.) For example:

    {
        "parameter_server": {
            "enabled": True
        }
    }