sagemaker.core.processing

This module contains code related to the Processor class, which is used for Amazon SageMaker Processing Jobs. These jobs let users perform data pre-processing, post-processing, feature engineering, data validation, model evaluation, and interpretation on Amazon SageMaker.

Functions

logs_for_processing_job(sagemaker_session, ...)

Display logs for a given processing job, optionally tailing them until the job is complete.

Classes

`FeatureStoreOutput`(**kwargs)	Configuration for processing job outputs in Amazon SageMaker Feature Store.
`FrameworkProcessor`(image_uri[, role, ...])	Handles Amazon SageMaker processing tasks using ModelTrainer for code packaging.
`Processor`([role, image_uri, instance_count, ...])	Handles Amazon SageMaker Processing tasks.
`ScriptProcessor`([role, image_uri, command, ...])	Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.

class sagemaker.core.processing.FeatureStoreOutput(**kwargs)

Bases: ApiObject

Configuration for processing job outputs in Amazon SageMaker Feature Store.

feature_group_name: str | None = None

class sagemaker.core.processing.FrameworkProcessor(image_uri[, role, ...])

Bases: ScriptProcessor

Handles Amazon SageMaker processing tasks using ModelTrainer for code packaging.

framework_entrypoint_command = ['/bin/bash']

Runs a processing job.

Parameters:

code (str) – This can be an S3 URI or a local path to a file with the framework script to run.
source_dir (str) – Path (absolute, relative or an S3 URI) to a directory with any other processing source code dependencies aside from the entry point file (default: None).
requirements (str) – Path to a requirements.txt file relative to source_dir (default: None).
inputs (list[ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects (default: None).
outputs (list[ProcessingOutput]) – Outputs for the processing job. These can be specified as either path strings or ProcessingOutput objects (default: None).
arguments (list[str] or list[PipelineVariable]) – A list of string arguments to be passed to a processing job (default: None).
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file (default: None).

Returns:

None or pipeline step arguments in case the Processor instance is built with PipelineSession
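The `code`, `source_dir`, and `requirements` parameters accept different kinds of locations. As a rough illustration only, and not the library's implementation, the sketch below classifies a location as an S3 URI or a local path and resolves `requirements` relative to `source_dir`. The helper names `is_s3_uri` and `resolve_requirements`, as well as the bucket and file names, are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def is_s3_uri(location: str) -> bool:
    """Return True if the location looks like an S3 URI (s3://bucket/key)."""
    parsed = urlparse(location)
    return parsed.scheme == "s3" and bool(parsed.netloc)


def resolve_requirements(source_dir: str, requirements: str) -> str:
    """Join a requirements path onto source_dir (hypothetical helper).

    Assumes forward-slash separators, which holds for S3 URIs and POSIX paths.
    """
    return posixpath.join(source_dir.rstrip("/"), requirements)


print(is_s3_uri("s3://my-bucket/code/preprocess.py"))
print(is_s3_uri("src/preprocess.py"))
print(resolve_requirements("s3://my-bucket/code", "requirements.txt"))
```

A check like this only inspects the string; it does not verify that the object or file exists.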

class sagemaker.core.processing.Processor([role, image_uri, instance_count, ...])

Bases: object

Handles Amazon SageMaker Processing tasks.

JOB_CLASS_NAME = 'processing-job'

Runs a processing job.

Parameters:

inputs (list[ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects (default: None).
outputs (list[ProcessingOutput]) – Outputs for the processing job. These can be specified as either path strings or ProcessingOutput objects (default: None).
arguments (list[str] or list[PipelineVariable]) – A list of string arguments to be passed to a processing job (default: None).
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration. Optionally, the dict can contain three keys: ‘ExperimentName’, ‘TrialName’, and ‘TrialComponentDisplayName’. The behavior of setting these keys is as follows:
  * If ExperimentName is supplied but TrialName is not, a Trial will be automatically created and the job’s Trial Component associated with the Trial.
  * If TrialName is supplied and the Trial already exists, the job’s Trial Component will be associated with the Trial.
  * If neither ExperimentName nor TrialName is supplied, the Trial Component will be unassociated.
  * TrialComponentDisplayName is used for display in Studio.
  * Both ExperimentName and TrialName will be ignored if the Processor instance is built with PipelineSession. However, the value of TrialComponentDisplayName is honored for display in Studio.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file (default: None).

Returns:

None or pipeline step arguments in case the Processor instance is built with PipelineSession

Raises:

ValueError – if logs is True but wait is False.
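The interaction between `wait` and `logs` can be pictured as a simple argument check. The following is a minimal sketch of that rule, assuming it is enforced before the job starts; `check_wait_and_logs` is a hypothetical helper and the error message is illustrative, not the library's wording.

```python
def check_wait_and_logs(wait: bool = True, logs: bool = True) -> None:
    """Reject the combination logs=True with wait=False (hypothetical helper)."""
    if logs and not wait:
        raise ValueError(
            "logs can only be shown when wait is True; "
            "set wait=True or logs=False."
        )


check_wait_and_logs(wait=True, logs=True)    # accepted: wait and stream logs
check_wait_and_logs(wait=False, logs=False)  # accepted: start the job and return
try:
    check_wait_and_logs(wait=False, logs=True)
except ValueError as err:
    print(f"rejected: {err}")
```

Because both parameters default to True, a caller who only wants to start a job without blocking would need to set both `wait=False` and `logs=False`.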

class sagemaker.core.processing.ScriptProcessor([role, image_uri, command, ...])

Bases: Processor

Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.

Runs a processing job.

Parameters:

code (str) – This can be an S3 URI or a local path to a file with the framework script to run.
inputs (list[ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects (default: None).
outputs (list[ProcessingOutput]) – Outputs for the processing job. These can be specified as either path strings or ProcessingOutput objects (default: None).
arguments (list[str]) – A list of string arguments to be passed to a processing job (default: None).
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration. Optionally, the dict can contain three keys: ‘ExperimentName’, ‘TrialName’, and ‘TrialComponentDisplayName’. The behavior of setting these keys is as follows:
  * If ExperimentName is supplied but TrialName is not, a Trial will be automatically created and the job’s Trial Component associated with the Trial.
  * If TrialName is supplied and the Trial already exists, the job’s Trial Component will be associated with the Trial.
  * If neither ExperimentName nor TrialName is supplied, the Trial Component will be unassociated.
  * TrialComponentDisplayName is used for display in Studio.
  * Both ExperimentName and TrialName will be ignored if the Processor instance is built with PipelineSession. However, the value of TrialComponentDisplayName is honored for display in Studio.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file (default: None).

Returns:

None or pipeline step arguments in case the Processor instance is built with PipelineSession
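The `experiment_config` rules described for this method can be read as a small decision table. The sketch below restates them in code for illustration; `describe_association` is a hypothetical helper, it assumes that a supplied TrialName refers to a Trial that already exists, and it does not call any SageMaker API.

```python
from typing import Dict, Optional


def describe_association(
    experiment_config: Optional[Dict[str, str]],
    pipeline_session: bool = False,
) -> str:
    """Summarize how a job's Trial Component would be associated."""
    config = experiment_config or {}
    experiment = config.get("ExperimentName")
    trial = config.get("TrialName")

    if pipeline_session:
        # ExperimentName and TrialName are ignored under PipelineSession;
        # only TrialComponentDisplayName is honored (for display in Studio).
        return "names ignored (PipelineSession)"
    if trial:
        # Assumption: the named Trial already exists.
        return f"associated with existing trial {trial!r}"
    if experiment:
        return f"new trial created under experiment {experiment!r}"
    return "unassociated"


print(describe_association({"ExperimentName": "churn-study"}))
print(describe_association({"ExperimentName": "churn-study", "TrialName": "run-1"}))
print(describe_association(None))
```

TrialComponentDisplayName does not affect the association in any branch, which is why the sketch never reads it.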

sagemaker.core.processing.logs_for_processing_job(sagemaker_session, job_name, wait=False, poll=10)

Display logs for a given processing job, optionally tailing them until the job is complete.

Parameters:

job_name (str) – Name of the processing job to display the logs for.
wait (bool) – Whether to keep looking for new log entries until the job completes (default: False).
poll (int) – The interval in seconds between polling for new log entries and job completion (default: 10).

Raises:

ValueError – If the processing job fails.
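The roles of `wait` and `poll` can be illustrated with a generic polling loop. This is a sketch under stated assumptions, not the function's actual implementation: `tail_job_logs`, `describe_status`, and `fetch_new_lines` are hypothetical names, the two callables stand in for whatever the session uses to query the job and its logs, and the status strings are assumed to follow the processing job status values "InProgress", "Completed", "Stopped", and "Failed".

```python
import time
from typing import Callable, Iterable, List


def tail_job_logs(
    describe_status: Callable[[], str],
    fetch_new_lines: Callable[[], Iterable[str]],
    wait: bool = False,
    poll: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect log lines, optionally polling until the job is finished."""
    lines: List[str] = []
    while True:
        lines.extend(fetch_new_lines())
        status = describe_status()
        if status == "Failed":
            raise ValueError("Processing job failed")
        # Stop when the job is finished, or after one pass if not waiting.
        if status in ("Completed", "Stopped") or not wait:
            return lines
        sleep(poll)


# Simulated job: two in-progress polls, then completion.
statuses = iter(["InProgress", "InProgress", "Completed"])
batches = iter([["starting"], ["step 1"], ["done"]])
result = tail_job_logs(
    describe_status=lambda: next(statuses),
    fetch_new_lines=lambda: next(batches),
    wait=True,
    poll=0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(result)
```

Passing the sleep function as a parameter keeps the loop testable without real delays; with `wait=False` the loop makes a single pass and returns whatever log lines are available at that moment.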
