sagemaker.mlops.workflow.function_step

Functions

step
    Decorator for converting a Python function to a pipeline step.

Classes

DelayedReturn
    A proxy to the function returns of arbitrary type.
- class sagemaker.mlops.workflow.function_step.DelayedReturn(function_step: _FunctionStep, reference_path: tuple = ())[source]
  Bases: StepOutput
  A proxy to the function returns of arbitrary type.
  When a function decorated with @step is invoked, the return value of that function is of type DelayedReturn. If the DelayedReturn object represents a Python collection, such as a tuple, list, or dict, you can reference the child items in the following ways:

      a_member = a_delayed_return[2]
      a_member = a_delayed_return["a_key"]
      a_member = a_delayed_return[2]["a_key"]
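  For instance, a step function that returns a tuple yields an indexable DelayedReturn. A minimal sketch, assuming the import path matches this page's module name; the function name, step name, and instance type are hypothetical placeholders:

      from sagemaker.mlops.workflow.function_step import step

      @step(name="split-data", instance_type="ml.m5.xlarge")
      def split(ratio: float):
          # Hypothetical body; returns a tuple, so the DelayedReturn is indexable.
          train, test = ["a", "b"], ["c"]
          return train, test

      delayed = split(0.8)     # a DelayedReturn, not the actual tuple
      train_data = delayed[0]  # proxy for the first tuple element
      test_data = delayed[1]   # proxy for the second tuple element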
  - property expr: Dict[str, Any] | List[Dict[str, Any]]
    Get the expression structure for workflow service calls.
- sagemaker.mlops.workflow.function_step.step(_func=None, *, name: str | None = None, display_name: str | None = None, description: str | None = None, retry_policies: List[RetryPolicy] | None = None, dependencies: str = None, pre_execution_commands: List[str] = None, pre_execution_script: str = None, environment_variables: Dict[str, str | PipelineVariable] | None = None, image_uri: str | PipelineVariable | None = None, instance_count: int | PipelineVariable = 1, instance_type: str | PipelineVariable | None = None, job_conda_env: str | PipelineVariable | None = None, job_name_prefix: str | None = None, keep_alive_period_in_seconds: int | PipelineVariable = 0, max_retry_attempts: int | PipelineVariable = 1, max_runtime_in_seconds: int | PipelineVariable = 86400, role: str = None, security_group_ids: List[str | PipelineVariable] | None = None, subnets: List[str | PipelineVariable] | None = None, tags: List[Dict[str, str | PipelineVariable]] | Dict[str, str | PipelineVariable] | None = None, volume_kms_key: str | PipelineVariable | None = None, volume_size: int | PipelineVariable = 30, encrypt_inter_container_traffic: bool | PipelineVariable | None = None, spark_config: SparkConfig = None, use_spot_instances: bool | PipelineVariable = False, max_wait_time_in_seconds: int | PipelineVariable | None = None)[source]
Decorator for converting a Python function to a pipeline step.
This decorator wraps the annotated code into a DelayedReturn object, which can then be passed to a pipeline as a step. This creates a new pipeline that proceeds from the step of the DelayedReturn object.
If the value for a parameter is not set, the decorator first looks up the value in the SageMaker configuration file. If no value is specified in the configuration file or no configuration file is found, the decorator selects the default specified in the following list. For more information, see Configuring and using defaults with the SageMaker Python SDK.
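To illustrate this flow, here is a minimal sketch; the function, step, and pipeline names are hypothetical placeholders, and Pipeline is assumed to be the standard class from sagemaker.workflow.pipeline:

    from sagemaker.mlops.workflow.function_step import step
    from sagemaker.workflow.pipeline import Pipeline

    @step(instance_type="ml.m5.xlarge")
    def train(learning_rate: float) -> float:
        # Hypothetical body; executes remotely when the pipeline runs.
        return learning_rate * 2

    delayed = train(0.01)  # a DelayedReturn placeholder for the result
    pipeline = Pipeline(name="my-pipeline", steps=[delayed])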
- Parameters:
  - _func – A Python function to run as a SageMaker pipeline step.
  - name (str) – The name of the pipeline step. Defaults to a name generated from the function name and a uuid4 identifier to avoid duplicates.
  - display_name (str) – The display name of the pipeline step. Defaults to the function name.
  - description (str) – The description of the pipeline step. Defaults to the function docstring. If there is no docstring, it defaults to the function file path.
  - retry_policies (List[RetryPolicy]) – A list of retry policies configured for this step. Defaults to None.
  - dependencies (str) – The path to a dependencies file. Defaults to None. If dependencies is provided, the value must be one of the following (see the sketch after this parameter list for an example that passes a requirements.txt file):
    - A path to a conda environment.yml file. The following conditions apply:
      - If job_conda_env is set, the conda environment is updated by installing dependencies from the yaml file, and the function is invoked within that conda environment. For this to succeed, the specified conda environment must already exist in the image.
      - If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, the conda environment is updated by installing dependencies from the yaml file, and the function is invoked within that conda environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.
      - If none of the previous conditions are met, a new conda environment named sagemaker-runtime-env is created, and the function annotated with the remote decorator is invoked in that conda environment.
    - A path to a requirements.txt file. The following conditions apply:
      - If job_conda_env is set in the remote decorator, dependencies are installed within that conda environment, and the function annotated with the remote decorator is invoked in the same conda environment. For this to succeed, the specified conda environment must already exist in the image.
      - If the environment variable SAGEMAKER_JOB_CONDA_ENV is set in the image, dependencies are installed within that conda environment, and the function annotated with the remote decorator is invoked in that environment. For this to succeed, the conda environment name must already be set in SAGEMAKER_JOB_CONDA_ENV, and SAGEMAKER_JOB_CONDA_ENV must already exist in the image.
      - If none of the above conditions are met, conda is not used. Dependencies are installed at the system level without any virtual environment, and the function annotated with the remote decorator is invoked using the Python runtime available in the system path.
    - None. SageMaker assumes that there are no dependencies to install while executing the remote annotated function in the training job.
  - pre_execution_commands (List[str]) – A list of commands to execute prior to running the pipeline step. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.
  - pre_execution_script (str) – A path to a script file to execute prior to running the pipeline step. Only one of pre_execution_commands or pre_execution_script can be specified at the same time. Defaults to None.
  - environment_variables (dict[str, str] or dict[str, PipelineVariable]) – Environment variables to be used inside the step. Defaults to None.
  - image_uri (str, PipelineVariable) – The universal resource identifier (URI) of a Docker image on Amazon Elastic Container Registry (ECR). Defaults to the following, based on where the SDK is running:
    - If you specify spark_config and want to run the step in a Spark application, image_uri should be None. A SageMaker Spark image is used for training; otherwise, a ValueError is thrown.
    - If you use SageMaker Studio notebooks, the image used as the kernel image for the notebook is used.
    - Otherwise, it resolves to a base Python image with the same Python version as the environment running the local code.
    - If no compatible image is found, a ValueError is thrown.
  - instance_count (int, PipelineVariable) – The number of instances to use. Defaults to 1. Note that pipeline steps do not support values of instance_count greater than 1 for non-Spark jobs.
  - instance_type (str, PipelineVariable) – The Amazon Elastic Compute Cloud (EC2) instance type to use to run the SageMaker job, for example ml.c4.xlarge. If not provided, a ValueError is thrown.
  - job_conda_env (str, PipelineVariable) – The name of the conda environment to activate during the job’s runtime. Defaults to None.
  - job_name_prefix (str) – The prefix used to create the underlying SageMaker job.
  - keep_alive_period_in_seconds (int, PipelineVariable) – The duration in seconds to retain and reuse provisioned infrastructure after the completion of a training job. This infrastructure is also known as SageMaker managed warm pools; using warm pools reduces the latency spent provisioning new resources. Defaults to 0. Note that additional charges associated with warm pools may apply. Setting this parameter also activates a persistent cache feature, which reduces job start-up latency beyond what warm pools alone provide, because package sources downloaded in previous runs are cached.
  - max_retry_attempts (int, PipelineVariable) – The maximum number of times the job is retried after an InternalServerFailure error from the SageMaker service. Defaults to 1.
  - max_runtime_in_seconds (int, PipelineVariable) – The upper limit in seconds for the training job to run. After this amount of time, SageMaker terminates the job regardless of its current status. Defaults to 1 day (86400 seconds).
  - role (str) – The IAM role (either name or full ARN) used to run your SageMaker training job. Defaults to one of the following:
    - The SageMaker default IAM role, if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks.
    - Otherwise, a ValueError is thrown.
  - security_group_ids (List[str, PipelineVariable]) – A list of security group IDs. Defaults to None, and the training job is created without a VPC config.
  - subnets (List[str, PipelineVariable]) – A list of subnet IDs. Defaults to None, and the job is created without a VPC config.
  - tags (Optional[Tags]) – Tags attached to the job. Defaults to None, and the training job is created without tags.
  - volume_kms_key (str, PipelineVariable) – An Amazon Key Management Service (KMS) key used to encrypt an Amazon Elastic Block Storage (EBS) volume attached to the training instance. Defaults to None.
  - volume_size (int, PipelineVariable) – The size in GB of the storage volume that stores input and output data during training. Defaults to 30.
  - encrypt_inter_container_traffic (bool, PipelineVariable) – A flag that specifies whether traffic between training containers is encrypted for the training job. Defaults to False.
  - spark_config (SparkConfig) – Configuration of the Spark application that runs on the Spark image. If spark_config is specified, a SageMaker Spark image URI is used for training. Note that image_uri cannot be specified at the same time, otherwise a ValueError is thrown. Defaults to None.
  - use_spot_instances (bool, PipelineVariable) – Specifies whether to use SageMaker Managed Spot instances for training. If enabled, max_wait_time_in_seconds should also be set. Defaults to False.
  - max_wait_time_in_seconds (int, PipelineVariable) – Timeout in seconds waiting for the spot training job. After this amount of time, Amazon SageMaker stops waiting for the managed spot training job to complete. Defaults to None.
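To tie several of these parameters together, here is a sketch of a more fully configured step; the step name, requirements path, and role ARN are hypothetical placeholders:

    from sagemaker.mlops.workflow.function_step import step

    @step(
        name="preprocess",                  # hypothetical step name
        dependencies="./requirements.txt",  # installed before the function runs
        instance_type="ml.m5.xlarge",
        keep_alive_period_in_seconds=300,   # keep a warm pool for 5 minutes
        max_retry_attempts=3,               # retry on InternalServerFailure
        use_spot_instances=True,
        max_wait_time_in_seconds=3600,      # required companion of use_spot_instances
        role="arn:aws:iam::111122223333:role/ExampleRole",  # placeholder ARN
    )
    def preprocess(s3_prefix: str) -> str:
        # Hypothetical body; runs remotely with the configuration above.
        return s3_prefix + "/processed"

Because use_spot_instances is True, max_wait_time_in_seconds is also set, as its parameter description requires; image_uri is omitted so the SDK resolves a default image as described under image_uri.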