sagemaker.core.inputs#

Amazon SageMaker channel configurations for S3 data sources and file system data sources

Classes

BatchDataCaptureConfig(destination_s3_uri[, ...])

Configuration object passed in when creating a batch transform job.

CreateModelInput([instance_type, ...])

A class containing parameters that can be used to create a SageMaker model.

FileSystemInput(file_system_id, ...[, ...])

Amazon SageMaker channel configurations for file system data sources.

ShuffleConfig(seed)

For configuring channel shuffling using a seed.

TrainingInput(s3_data[, distribution, ...])

Amazon SageMaker channel configurations for S3 data sources.

TransformInput(data[, data_type, ...])

A class containing parameters for configuring input data for a batch transform job.

class sagemaker.core.inputs.BatchDataCaptureConfig(destination_s3_uri: str, kms_key_id: str | None = None, generate_inference_id: bool | None = None)[source]#

Bases: object

Configuration object passed in when creating a batch transform job.

Specifies configuration related to batch transform job data capture for use with Amazon SageMaker Model Monitoring.
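As an illustration of the request structure this object maps to, the sketch below builds a plain dict following the `DataCaptureConfig` shape accepted by the CreateTransformJob API. The helper function is hypothetical, not part of the SDK; only the field names come from the AWS API.

```python
# Hypothetical helper mirroring the fields of BatchDataCaptureConfig.
# The resulting dict follows the DataCaptureConfig shape from the
# CreateTransformJob API; this function is illustrative, not SDK code.
def batch_data_capture_request(destination_s3_uri, kms_key_id=None,
                               generate_inference_id=None):
    config = {"DestinationS3Uri": destination_s3_uri}
    if kms_key_id is not None:
        config["KmsKeyId"] = kms_key_id
    if generate_inference_id is not None:
        config["GenerateInferenceId"] = generate_inference_id
    return config

capture = batch_data_capture_request(
    "s3://my-bucket/captures", generate_inference_id=True
)
```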

class sagemaker.core.inputs.CreateModelInput(instance_type: str | None = None, accelerator_type: str | None = None)[source]#

Bases: object

A class containing parameters that can be used to create a SageMaker model.

Parameters:
  • instance_type (str) – The EC2 instance type to use for model deployment.

  • accelerator_type (str) – The Elastic Inference accelerator type.

accelerator_type: str#
instance_type: str#
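As a minimal sketch of what this class carries, a stand-in with the same two optional fields can be written as a dataclass (the real class lives in `sagemaker.core.inputs`; this mirror is for illustration only):

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in mirroring CreateModelInput's two fields; not the SDK class.
@dataclass
class CreateModelInputSketch:
    instance_type: Optional[str] = None
    accelerator_type: Optional[str] = None

inp = CreateModelInputSketch(instance_type="ml.m5.xlarge",
                             accelerator_type="ml.eia2.medium")
```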
class sagemaker.core.inputs.FileSystemInput(file_system_id, file_system_type, directory_path, file_system_access_mode='ro', content_type=None)[source]#

Bases: object

Amazon SageMaker channel configurations for file system data sources.

config#

A SageMaker file system DataSource.

Type:

dict[str, dict]
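A sketch of the dict stored in `config`, following the `FileSystemDataSource` shape from the CreateTrainingJob API. The helper function and exact nesting are assumptions for illustration; check the SDK source for the precise layout.

```python
# Builds a channel data-source dict in the FileSystemDataSource shape
# from the CreateTrainingJob API (illustrative helper, not SDK code).
def file_system_config(file_system_id, file_system_type, directory_path,
                       file_system_access_mode="ro"):
    return {
        "DataSource": {
            "FileSystemDataSource": {
                "FileSystemId": file_system_id,
                "FileSystemType": file_system_type,  # 'EFS' or 'FSxLustre'
                "DirectoryPath": directory_path,
                "FileSystemAccessMode": file_system_access_mode,  # 'ro'/'rw'
            }
        }
    }

cfg = file_system_config("fs-0123456789abcdef0", "EFS", "/training-data")
```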

class sagemaker.core.inputs.ShuffleConfig(seed)[source]#

Bases: object

For configuring channel shuffling using a seed.

For more detail, see the AWS documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_ShuffleConfig.html
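In the CreateTrainingJob request, the shuffle configuration appears on the channel as a one-field dict, `{'Seed': <int>}` (shape taken from the AWS API reference linked above). A minimal sketch, with a hypothetical helper method that is not part of the SDK:

```python
# Minimal mirror of ShuffleConfig; to_request_dict is a hypothetical
# helper showing the {'Seed': <int>} shape from the AWS API reference.
class ShuffleConfigSketch:
    def __init__(self, seed):
        self.seed = seed

    def to_request_dict(self):
        return {"Seed": self.seed}

shuffle = ShuffleConfigSketch(seed=42).to_request_dict()
```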

class sagemaker.core.inputs.TrainingInput(s3_data: str | PipelineVariable, distribution: str | PipelineVariable | None = None, compression: str | PipelineVariable | None = None, content_type: str | PipelineVariable | None = None, record_wrapping: str | PipelineVariable | None = None, s3_data_type: str | PipelineVariable = 'S3Prefix', instance_groups: List[str | PipelineVariable] | None = None, input_mode: str | PipelineVariable | None = None, attribute_names: List[str | PipelineVariable] | None = None, target_attribute_name: str | PipelineVariable | None = None, shuffle_config: ShuffleConfig | None = None, hub_access_config: dict | None = None, model_access_config: dict | None = None)[source]#

Bases: object

Amazon SageMaker channel configurations for S3 data sources.

config#

A SageMaker DataSource referencing a SageMaker S3DataSource.

Type:

dict[str, dict]
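A sketch of the channel dict a `TrainingInput` produces, following the `S3DataSource` shape from the CreateTrainingJob API. Field names are assumed from the AWS API reference; the SDK may add or omit keys depending on the constructor arguments.

```python
# Builds a training channel dict in the S3DataSource shape from the
# CreateTrainingJob API (illustrative helper, not SDK code).
def training_channel(s3_data, distribution="FullyReplicated",
                     s3_data_type="S3Prefix", content_type=None):
    channel = {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": s3_data_type,  # e.g. 'S3Prefix', 'ManifestFile'
                "S3Uri": s3_data,
                "S3DataDistributionType": distribution,
            }
        }
    }
    if content_type is not None:
        channel["ContentType"] = content_type
    return channel

train = training_channel("s3://my-bucket/train/", content_type="text/csv")
```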

add_hub_access_config(hub_access_config=None)[source]#

Add Hub Access Config to the channel’s configuration.

Parameters:
  • hub_access_config (dict) – The HubAccessConfig to be added to the channel's configuration.

add_model_access_config(model_access_config=None)[source]#

Add Model Access Config to the channel’s configuration.

Parameters:

model_access_config (dict) – Whether model terms of use have been accepted.
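The two methods above attach access-config dicts to the channel's S3 data source. The sketch below shows the assumed shapes (`HubContentArn` and `AcceptEula` field names come from the AWS CreateTrainingJob API; the placement inside the channel dict is an assumption, not guaranteed to match the SDK internals):

```python
# Assumed access-config shapes from the CreateTrainingJob API.
hub_access_config = {
    "HubContentArn": "arn:aws:sagemaker:us-east-1:111122223333:hub-content/example"
}
model_access_config = {"AcceptEula": True}  # model terms of use accepted

# Illustrative placement on a channel's S3DataSource.
channel = {"DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                           "S3Uri": "s3://my-bucket/model/"}}}
channel["DataSource"]["S3DataSource"]["HubAccessConfig"] = hub_access_config
channel["DataSource"]["S3DataSource"]["ModelAccessConfig"] = model_access_config
```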

class sagemaker.core.inputs.TransformInput(data: str, data_type: str = 'S3Prefix', content_type: str | None = None, compression_type: str | None = None, split_type: str | None = None, input_filter: str | None = None, output_filter: str | None = None, join_source: str | None = None, model_client_config: dict | None = None, batch_data_capture_config: dict | None = None)[source]#

Bases: object

A class containing parameters for configuring input data for a batch transform job.

It can be used when calling sagemaker.transformer.Transformer.transform().

Parameters:
  • data (str) – The S3 location of the input data that the model can consume.

  • data_type (str) – The data type for a batch transform job. (default: 'S3Prefix')

  • content_type (str) – The multi-purpose internet email extension (MIME) type of the data. (default: None)

  • compression_type (str) – If your transform data is compressed, specify the compression type. Valid values: 'Gzip', None (default: None)

  • split_type (str) – The method to use to split the transform job’s data files into smaller batches. Valid values: 'Line', 'RecordIO', 'TFRecord', None (default: None)

  • input_filter (str) – A JSONPath expression for selecting a portion of the input data to pass to the algorithm. For example, you can use this parameter to exclude fields, such as an ID column, from the input. If you want SageMaker to pass the entire input dataset to the algorithm, accept the default value $. For more information on batch transform data processing, input, join, and output, see Associate Prediction Results with Input Records in the Amazon SageMaker developer guide. Example value: $. For more information about valid values for this parameter, see JSONPath Operators in the Amazon SageMaker developer guide. (default: $)

  • output_filter (str) –

    A JSONPath expression for selecting a portion of the joined dataset to save in the output file for a batch transform job. If you want SageMaker to store the entire input dataset in the output file, leave the default value, $. If you specify indexes that aren’t within the dimension size of the joined dataset, you get an error. Example value: $. For more information about valid values for this parameter, see JSONPath Operators in the Amazon SageMaker developer guide. (default: $)

  • join_source (str) – Specifies the source of the data to join with the transformed data. The default value is None, which specifies not to join the input with the transformed data. If you want the batch transform job to join the original input data with the transformed data, set to Input. Valid values: None, Input (default: None)

  • model_client_config (dict) –

    Configures the timeout and maximum number of retries for processing a transform job invocation.

    • 'InvocationsTimeoutInSeconds' (int) - The timeout value in seconds for an invocation request. The default value is 600.

    • 'InvocationsMaxRetries' (int) - The maximum number of retries when invocation requests fail.

    (default: {'InvocationsTimeoutInSeconds': 600, 'InvocationsMaxRetries': 3})

  • batch_data_capture_config (dict) – The dict is an object of BatchDataCaptureConfig and specifies configuration related to batch transform job for use with Amazon SageMaker Model Monitoring. For more information, see Capture data from batch transform job in the Amazon SageMaker developer guide. (default: None)

batch_data_capture_config: dict#
compression_type: str#
content_type: str#
data: str#
data_type: str#
input_filter: str#
join_source: str#
model_client_config: dict#
output_filter: str#
split_type: str#
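Putting the parameters above together, the sketch below shows how they surface in a CreateTransformJob request. The request shape (`TransformInput`, `DataProcessing`, `ModelClientConfig`) follows the AWS API reference; the SDK assembles this internally, so treat the dict as an illustration rather than SDK output.

```python
# Illustrative CreateTransformJob fragment covering the TransformInput
# parameters documented above (shape from the AWS API reference).
transform_request = {
    "TransformInput": {
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                        "S3Uri": "s3://my-bucket/batch-in/"}},
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    "DataProcessing": {
        "InputFilter": "$[1:]",   # drop the first column (e.g. an ID)
        "OutputFilter": "$",      # keep the entire joined record
        "JoinSource": "Input",    # join predictions with the input records
    },
    "ModelClientConfig": {
        "InvocationsTimeoutInSeconds": 600,
        "InvocationsMaxRetries": 3,
    },
}
```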