sagemaker.core.processing
This module contains code related to the Processor class, which is used for Amazon SageMaker Processing Jobs. These jobs let users perform data pre-processing, post-processing, feature engineering, data validation, model evaluation, and interpretation on Amazon SageMaker.
Functions
logs_for_processing_job | Display logs for a given processing job, optionally tailing them until the job is complete.
Classes
FeatureStoreOutput | Configuration for processing job outputs in Amazon SageMaker Feature Store.
FrameworkProcessor | Handles Amazon SageMaker processing tasks using ModelTrainer for code packaging.
Processor | Handles Amazon SageMaker Processing tasks.
ScriptProcessor | Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.
- class sagemaker.core.processing.FeatureStoreOutput(**kwargs)
Bases: ApiObject
Configuration for processing job outputs in Amazon SageMaker Feature Store.
- feature_group_name: str | None = None
- class sagemaker.core.processing.FrameworkProcessor(image_uri: str | PipelineVariable, role: str | PipelineVariable | None = None, instance_count: int | PipelineVariable | None = None, instance_type: str | PipelineVariable | None = None, command: List[str] | None = None, volume_size_in_gb: int | PipelineVariable = 30, volume_kms_key: str | PipelineVariable | None = None, output_kms_key: str | PipelineVariable | None = None, code_location: str | None = None, max_runtime_in_seconds: int | PipelineVariable | None = None, base_job_name: str | None = None, sagemaker_session: Session | None = None, env: Dict[str, str | PipelineVariable] | None = None, tags: List[Dict[str, str | PipelineVariable]] | Dict[str, str | PipelineVariable] | None = None, network_config: NetworkConfig | None = None)
Bases: ScriptProcessor
Handles Amazon SageMaker processing tasks using ModelTrainer for code packaging.
- framework_entrypoint_command = ['/bin/bash']
- run(code: str, source_dir: str | None = None, requirements: str | None = None, inputs: List[ProcessingInput] | None = None, outputs: List[ProcessingOutput] | None = None, arguments: List[str | PipelineVariable] | None = None, wait: bool = True, logs: bool = True, job_name: str | None = None, experiment_config: Dict[str, str] | None = None, kms_key: str | None = None)
Runs a processing job.
- Parameters:
code (str) – This can be an S3 URI or a local path to a file with the framework script to run.
source_dir (str) – Path (absolute, relative or an S3 URI) to a directory with any other processing source code dependencies aside from the entry point file (default: None).
requirements (str) – Path to a requirements.txt file relative to source_dir (default: None).
inputs (list[ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects (default: None).
outputs (list[ProcessingOutput]) – Outputs for the processing job. These can be specified as either path strings or ProcessingOutput objects (default: None).
arguments (list[str] or list[PipelineVariable]) – A list of string arguments to be passed to a processing job (default: None).
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file (default: None).
- Returns:
None, or pipeline step arguments if the Processor instance is built with PipelineSession.
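As a rough illustration of how the code, source_dir, and requirements arguments relate, the sketch below assembles keyword arguments for FrameworkProcessor.run and shows where a requirements file would sit when it is given relative to source_dir. The helper names (requirements_location, build_run_kwargs) and the file names are hypothetical; only the parameter names come from the signature above, and the SDK's own path handling may differ.

```python
import posixpath


def requirements_location(source_dir, requirements):
    """Return the requirements path joined onto source_dir (hypothetical helper).

    The documentation states that `requirements` is relative to `source_dir`,
    so a plain join is used here for both local paths and S3 URIs.
    """
    return posixpath.join(source_dir, requirements)


def build_run_kwargs(code, source_dir=None, requirements=None, arguments=None):
    """Collect keyword arguments matching the FrameworkProcessor.run signature."""
    kwargs = {"code": code, "wait": True, "logs": True}
    if source_dir is not None:
        kwargs["source_dir"] = source_dir
    if requirements is not None:
        kwargs["requirements"] = requirements
    if arguments is not None:
        kwargs["arguments"] = list(arguments)
    return kwargs


run_kwargs = build_run_kwargs(
    code="preprocess.py",             # hypothetical entry point script
    source_dir="src",                 # hypothetical directory with helper modules
    requirements="requirements.txt",  # resolved relative to source_dir
    arguments=["--split", "0.2"],
)
resolved = requirements_location("src", "requirements.txt")

# With a FrameworkProcessor instance built elsewhere, the call would be:
# processor.run(**run_kwargs)
```

Keeping the arguments in a dict makes it easy to log or inspect them before a job is actually submitted.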
- class sagemaker.core.processing.Processor(role: str | None = None, image_uri: str | PipelineVariable | None = None, instance_count: int | PipelineVariable | None = None, instance_type: str | PipelineVariable | None = None, entrypoint: List[str | PipelineVariable] | None = None, volume_size_in_gb: int | PipelineVariable = 30, volume_kms_key: str | PipelineVariable | None = None, output_kms_key: str | PipelineVariable | None = None, max_runtime_in_seconds: int | PipelineVariable | None = None, base_job_name: str | None = None, sagemaker_session: Session | None = None, env: Dict[str, str | PipelineVariable] | None = None, tags: List[Dict[str, str | PipelineVariable]] | Dict[str, str | PipelineVariable] | None = None, network_config: NetworkConfig | None = None)
Bases: object
Handles Amazon SageMaker Processing tasks.
- JOB_CLASS_NAME = 'processing-job'
- run(inputs: List[ProcessingInput] | None = None, outputs: List[ProcessingOutput] | None = None, arguments: List[str | PipelineVariable] | None = None, wait: bool = True, logs: bool = True, job_name: str | None = None, experiment_config: Dict[str, str] | None = None, kms_key: str | None = None)
Runs a processing job.
- Parameters:
inputs (list[ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects (default: None).
outputs (list[ProcessingOutput]) – Outputs for the processing job. These can be specified as either path strings or ProcessingOutput objects (default: None).
arguments (list[str] or list[PipelineVariable]) – A list of string arguments to be passed to a processing job (default: None).
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration. Optionally, the dict can contain three keys: ‘ExperimentName’, ‘TrialName’, and ‘TrialComponentDisplayName’. The behavior of setting these keys is as follows:
* If ExperimentName is supplied but TrialName is not, a Trial will be automatically created and the job’s Trial Component associated with the Trial.
* If TrialName is supplied and the Trial already exists, the job’s Trial Component will be associated with the Trial.
* If neither ExperimentName nor TrialName is supplied, the trial component will be unassociated.
* TrialComponentDisplayName is used for display in Studio.
* Both ExperimentName and TrialName will be ignored if the Processor instance is built with PipelineSession. However, the value of TrialComponentDisplayName is honored for display in Studio.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file (default: None).
- Returns:
None, or pipeline step arguments if the Processor instance is built with PipelineSession.
- Raises:
ValueError – if logs is True but wait is False.
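The interaction between wait and logs can be sketched as a small pre-flight check. This is an illustration of the documented rule only, not the SDK's internal code; the function name check_wait_and_logs and the error message are assumptions.

```python
def check_wait_and_logs(wait=True, logs=True):
    """Validate a wait/logs combination before calling Processor.run.

    Hypothetical helper mirroring the documented rule: requesting logs
    without waiting for the job raises ValueError.
    """
    if logs and not wait:
        # Message text is illustrative; the SDK's wording may differ.
        raise ValueError("logs can only be shown when wait is True")
    return {"wait": wait, "logs": logs}


# Fire-and-forget submission: both flags must be turned off together.
async_flags = check_wait_and_logs(wait=False, logs=False)

# Blocking submission that streams logs: the defaults already satisfy the rule.
blocking_flags = check_wait_and_logs()
```

Because both parameters default to True, passing wait=False alone is the combination that triggers the error; logs=False has to be set alongside it.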
- class sagemaker.core.processing.ScriptProcessor(role: str | PipelineVariable | None = None, image_uri: str | PipelineVariable | None = None, command: List[str] | None = None, instance_count: int | PipelineVariable | None = None, instance_type: str | PipelineVariable | None = None, volume_size_in_gb: int | PipelineVariable = 30, volume_kms_key: str | PipelineVariable | None = None, output_kms_key: str | PipelineVariable | None = None, max_runtime_in_seconds: int | PipelineVariable | None = None, base_job_name: str | None = None, sagemaker_session: Session | None = None, env: Dict[str, str | PipelineVariable] | None = None, tags: List[Dict[str, str | PipelineVariable]] | Dict[str, str | PipelineVariable] | None = None, network_config: NetworkConfig | None = None)
Bases: Processor
Handles Amazon SageMaker processing tasks for jobs using a machine learning framework.
- run(code: str, inputs: List[ProcessingInput] | None = None, outputs: List[ProcessingOutput] | None = None, arguments: List[str | PipelineVariable] | None = None, wait: bool = True, logs: bool = True, job_name: str | None = None, experiment_config: Dict[str, str] | None = None, kms_key: str | None = None)
Runs a processing job.
- Parameters:
code (str) – This can be an S3 URI or a local path to a file with the framework script to run.
inputs (list[ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects (default: None).
outputs (list[ProcessingOutput]) – Outputs for the processing job. These can be specified as either path strings or ProcessingOutput objects (default: None).
arguments (list[str]) – A list of string arguments to be passed to a processing job (default: None).
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the base job name and current timestamp.
experiment_config (dict[str, str]) – Experiment management configuration. Optionally, the dict can contain three keys: ‘ExperimentName’, ‘TrialName’, and ‘TrialComponentDisplayName’. The behavior of setting these keys is as follows:
* If ExperimentName is supplied but TrialName is not, a Trial will be automatically created and the job’s Trial Component associated with the Trial.
* If TrialName is supplied and the Trial already exists, the job’s Trial Component will be associated with the Trial.
* If neither ExperimentName nor TrialName is supplied, the trial component will be unassociated.
* TrialComponentDisplayName is used for display in Studio.
* Both ExperimentName and TrialName will be ignored if the Processor instance is built with PipelineSession. However, the value of TrialComponentDisplayName is honored for display in Studio.
kms_key (str) – The ARN of the KMS key that is used to encrypt the user code file (default: None).
- Returns:
None, or pipeline step arguments if the Processor instance is built with PipelineSession.
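The experiment_config rules listed for run can be summarized in a short sketch. The helper names and the returned labels below are hypothetical and exist only to make the documented cases explicit; the three dictionary keys are the ones named in the parameter description. The case where TrialName refers to a Trial that does not yet exist is not covered by the documentation and is not modeled here.

```python
def make_experiment_config(experiment_name=None, trial_name=None, display_name=None):
    """Build an experiment_config dict using only the documented keys."""
    config = {}
    if experiment_name is not None:
        config["ExperimentName"] = experiment_name
    if trial_name is not None:
        config["TrialName"] = trial_name
    if display_name is not None:
        config["TrialComponentDisplayName"] = display_name
    return config


def describe_association(config, pipeline_session=False):
    """Return a label for how the job's Trial Component would be associated.

    The labels are illustrative names for the documented cases.
    """
    if pipeline_session:
        # ExperimentName and TrialName are ignored under PipelineSession.
        return "ignored"
    if "TrialName" in config:
        # Documented for the case where the named Trial already exists.
        return "named_trial"
    if "ExperimentName" in config:
        # A Trial is created automatically under the experiment.
        return "new_trial"
    return "unassociated"


config = make_experiment_config(experiment_name="churn-exp", display_name="preprocess")
label = describe_association(config)
```

The resulting dict would be passed as experiment_config=config to run; TrialComponentDisplayName affects only how the component is shown in Studio.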
- sagemaker.core.processing.logs_for_processing_job(sagemaker_session, job_name, wait=False, poll=10)
Display logs for a given processing job, optionally tailing them until the job is complete.
- Parameters:
job_name (str) – Name of the processing job to display the logs for.
wait (bool) – Whether to keep looking for new log entries until the job completes (default: False).
poll (int) – The interval in seconds between polling for new log entries and job completion (default: 10, per the function signature).
- Raises:
ValueError – If the processing job fails.
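The wait and poll semantics described above amount to a polling loop, sketched below with stand-in callables instead of real AWS calls. Everything here is an assumption for illustration: tail_logs, get_status, and get_new_lines are hypothetical names, and the status strings are placeholders for whatever the service reports. It is not the implementation of logs_for_processing_job.

```python
import time


def tail_logs(get_status, get_new_lines, wait=False, poll=10, emit=print):
    """Print new log lines, optionally polling until the job leaves 'InProgress'.

    get_status and get_new_lines are hypothetical callables standing in for
    the service calls that a real implementation would make.
    """
    while True:
        for line in get_new_lines():
            emit(line)
        status = get_status()
        if status == "Failed":
            # Mirrors the documented behavior of raising when the job fails.
            raise ValueError("processing job failed")
        if status != "InProgress" or not wait:
            return status
        time.sleep(poll)  # interval between polls, in seconds


# Simulated job: two polls in progress, then completion.
statuses = iter(["InProgress", "InProgress", "Completed"])
batches = iter([["step 1"], ["step 2"], ["done"]])
collected = []
final_status = tail_logs(
    get_status=lambda: next(statuses),
    get_new_lines=lambda: next(batches),
    wait=True,
    poll=0,  # no delay for the simulation
    emit=collected.append,
)
```

With wait=False the loop body runs once, so only the log lines available at call time are shown.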