SageMaker Serve

Model serving and inference capabilities for deploying and managing ML models.

Model Deployment

Local SageMaker Serve development package.

This __init__.py file imports key modules used by inference scripts to prevent Python module resolution conflicts with external serve.py files.

These imports “prime” the module cache so that sagemaker.serve is recognized as a package, preventing conflicts when inference scripts import from its submodules.

class sagemaker.serve.InferenceSpec

Bases: ABC

Abstract base class for holding custom load, invoke and prepare functions.

Provides a skeleton for customization to override the methods load, invoke and prepare.

get_model(): Return the HuggingFace model name for the inference spec.

abstract invoke(input_object: object, model: object)

Given a model object and an input, run inference and return the result.

Parameters:

input_object (object) – The input to model
model (object) – The model object

abstract load(model_dir: str)

Load the model stored in model_dir and return the model object.

Parameters: model_dir (str) – Path to the directory where the model is stored.

postprocess(predictions: object): Custom post-processing function.

prepare(*args, **kwargs): Custom prepare function.

preprocess(input_data: object): Custom pre-processing function.
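A minimal sketch of a concrete InferenceSpec, assuming the class is importable from sagemaker.serve as documented above. If the SDK is not installed, the code falls back to a small stand-in ABC that mirrors only the two documented abstract methods, so the example still runs. The UpperCaseSpec class and its toy "model" (a callable that upper-cases text) are hypothetical and only illustrate which methods a subclass must override.

```python
from abc import ABC, abstractmethod

try:
    from sagemaker.serve import InferenceSpec
except ImportError:
    # Stand-in used only when the SDK is absent; it mirrors the two
    # abstract methods documented for sagemaker.serve.InferenceSpec.
    class InferenceSpec(ABC):
        @abstractmethod
        def load(self, model_dir: str):
            """Load and return the model object."""

        @abstractmethod
        def invoke(self, input_object: object, model: object):
            """Run inference and return the result."""


class UpperCaseSpec(InferenceSpec):
    """Hypothetical spec whose 'model' upper-cases its text input."""

    def load(self, model_dir: str):
        # A real spec would deserialize artifacts found in model_dir here.
        return lambda text: text.upper()

    def invoke(self, input_object: object, model: object):
        # Delegate to the model object returned by load().
        return model(input_object)


spec = UpperCaseSpec()
model = spec.load("/tmp/model")
print(spec.invoke("hello", model))  # HELLO
```

Because load() and invoke() are abstract, instantiating InferenceSpec directly raises TypeError; only subclasses that override both can be created.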

class sagemaker.serve.ModelBuilder(model: object | str | ModelTrainer | BaseTrainer | TrainingJob | ModelPackage | List[Model] | None = None, model_path: str | None = <factory>, inference_spec: InferenceSpec | None = None, schema_builder: SchemaBuilder | None = None, modelbuilder_list: List[ModelBuilder] | None = None, role_arn: str | None = None, sagemaker_session: Session | None = None, image_uri: str | PipelineVariable | None = None, s3_model_data_url: str | PipelineVariable | Dict[str, Any] | None = None, source_code: SourceCode | None = None, env_vars: Dict[str, str | PipelineVariable] | None = <factory>, model_server: ModelServer | None = None, model_metadata: Dict[str, Any] | None = None, log_level: int | None = 10, content_type: str | None = None, accept_type: str | None = None, compute: Compute | None = None, network: Networking | None = None, instance_type: str | None = None, mode: Mode | None = Mode.SAGEMAKER_ENDPOINT, shared_libs: List[str] = <factory>, dependencies: Dict[str, Any] | None = <factory>, image_config: Dict[str, str | PipelineVariable] | None = None)

Bases: _InferenceRecommenderMixin, _ModelBuilderServers, _ModelBuilderUtils

Unified interface for building and deploying machine learning models.

ModelBuilder provides a streamlined workflow for preparing and deploying ML models to Amazon SageMaker. It supports multiple frameworks (PyTorch, TensorFlow, HuggingFace, etc.), model servers (TorchServe, TGI, Triton, etc.), and deployment modes (SageMaker endpoints, local containers, in-process).

The typical workflow involves three steps:

1. Initialize ModelBuilder with your model and configuration.
2. Call build() to create a deployable Model resource.
3. Call deploy() to create an Endpoint resource for inference.

Example

>>> from sagemaker.serve.model_builder import ModelBuilder
>>> from sagemaker.serve.mode.function_pointers import Mode
>>>
>>> # Initialize with a trained model
>>> model_builder = ModelBuilder(
...     model=my_pytorch_model,
...     role_arn="arn:aws:iam::123456789012:role/SageMakerRole",
...     instance_type="ml.m5.xlarge",
...     mode=Mode.SAGEMAKER_ENDPOINT,  # the default; shown for clarity
... )
>>>
>>> # Build the model (creates SageMaker Model resource)
>>> model = model_builder.build()
>>>
>>> # Deploy to endpoint (creates SageMaker Endpoint resource)
>>> endpoint = model_builder.deploy(endpoint_name="my-endpoint")
>>>
>>> # Make predictions
>>> result = endpoint.invoke(data=input_data)

Parameters:

model – The model to deploy. Can be a trained model object, ModelTrainer, TrainingJob, ModelPackage, or JumpStart model ID string. Either model or inference_spec is required.
model_path – Local directory path where model artifacts are stored or will be downloaded.
inference_spec – Custom inference specification with load() and invoke() functions.
schema_builder – Defines input/output schema for serialization and deserialization.
modelbuilder_list – List of ModelBuilder objects for multi-model deployments.
pipeline_models – List of Model objects for creating inference pipelines.
role_arn – IAM role ARN for SageMaker to assume.
sagemaker_session – Session object for managing SageMaker API interactions.
image_uri – Container image URI. Auto-selected if not specified.
s3_model_data_url – S3 URI where model artifacts are stored or will be uploaded.
source_code – Source code configuration for custom inference code.
env_vars – Environment variables to set in the container.
model_server – Model server to use (TORCHSERVE, TGI, TRITON, etc.).
model_metadata – Dictionary to override model metadata (HF_TASK, MLFLOW_MODEL_PATH, etc.).
log_level – Logging level for ModelBuilder operations (default: logging.DEBUG).
content_type – MIME type of input data. Auto-derived from schema_builder if provided.
accept_type – MIME type of output data. Auto-derived from schema_builder if provided.
compute – Compute configuration specifying instance type and count.
network – Network configuration including VPC settings and network isolation.
instance_type – EC2 instance type for deployment (e.g., ‘ml.m5.large’).
mode – Deployment mode (SAGEMAKER_ENDPOINT, LOCAL_CONTAINER, or IN_PROCESS).

Note

ModelBuilder returns sagemaker.core.resources.Model and sagemaker.core.resources.Endpoint objects, not the deprecated PySDK Model and Predictor classes. Use endpoint.invoke() instead of predictor.predict() for inference.

accept_type: str | None = None

build(...) → Model | ModelBuilder | None

Build a deployable Model instance with ModelBuilder.

Creates a SageMaker Model resource with the appropriate container image, model artifacts, and configuration. This method prepares the model for deployment but does not deploy it to an endpoint. Use the deploy() method to create an endpoint.

Note: This returns a sagemaker.core.resources.Model object, not the deprecated PySDK Model class.

Parameters:

model_name (str, optional) – The name for the SageMaker model. If not specified, a unique name will be generated. (Default: None).
mode (Mode, optional) – The mode of operation. Options are SAGEMAKER_ENDPOINT, LOCAL_CONTAINER, or IN_PROCESS. (Default: None, uses mode from initialization).
role_arn (str, optional) – The IAM role ARN for SageMaker to assume when creating the model and endpoint. (Default: None).
sagemaker_session (Session, optional) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, uses the session from initialization or creates one using the default AWS configuration chain. (Default: None).
region (str, optional) – The AWS region for deployment. If specified and different from the current region, a new session will be created. (Default: None).

Returns:

A sagemaker.core.resources.Model resource that represents the created SageMaker model, or a ModelBuilder instance for multi-model scenarios.

Return type:

Union[Model, ModelBuilder, None]

Example

>>> model_builder = ModelBuilder(model=my_model, role_arn=role)
>>> model = model_builder.build()  # Creates Model resource
>>> endpoint = model_builder.deploy()  # Creates Endpoint resource
>>> result = endpoint.invoke(data=input_data)

compute: Compute | None = None

configure_for_torchserve(shared_libs: List[str] | None = None, dependencies: Dict[str, Any] | None = None, image_config: Dict[str, str | PipelineVariable] | None = None) → ModelBuilder: Configure ModelBuilder for TorchServe deployment.

content_type: str | None = None

dependencies: Dict[str, Any] | None

deploy(endpoint_name: str = None, initial_instance_count: int | None = 1, instance_type: str | None = None, wait: bool = True, update_endpoint: bool | None = False, container_timeout_in_seconds: int = 300, inference_config: ServerlessInferenceConfig | AsyncInferenceConfig | BatchTransformInferenceConfig | ResourceRequirements | None = None, custom_orchestrator_instance_type: str = None, custom_orchestrator_initial_instance_count: int = None, **kwargs) → Endpoint | LocalEndpoint | Transformer

Deploy the built model to an Endpoint.

Creates a SageMaker EndpointConfig and deploys an Endpoint resource from the model created by build(). The model must be built before calling deploy().

Note: This returns a sagemaker.core.resources.Endpoint object, not the deprecated PySDK Predictor class. Use endpoint.invoke() to make predictions.

Parameters:

endpoint_name (str) – The name of the endpoint to create. If not specified, a unique endpoint name will be generated. (Default: None).
initial_instance_count (int, optional) – The initial number of instances to run in the endpoint. Required for instance-based endpoints. (Default: 1).
instance_type (str, optional) – The EC2 instance type to deploy this model to. For example, ‘ml.p2.xlarge’. Required for instance-based endpoints unless using serverless inference. (Default: None).
wait (bool) – Whether the call should wait until the deployment completes. (Default: True).
update_endpoint (bool) – Flag to update the model in an existing Amazon SageMaker endpoint. If True, deploys a new EndpointConfig to an existing endpoint and deletes resources from the previous EndpointConfig. (Default: False).
container_timeout_in_seconds (int) – The timeout value, in seconds, for the container to respond to requests. (Default: 300).
inference_config (Union[ServerlessInferenceConfig, AsyncInferenceConfig, BatchTransformInferenceConfig, ResourceRequirements], optional) – Unified inference configuration parameter. Can be used instead of individual config parameters. (Default: None).
custom_orchestrator_instance_type (str, optional) – Instance type for custom orchestrator deployment. (Default: None).
custom_orchestrator_initial_instance_count (int, optional) – Initial instance count for custom orchestrator deployment. (Default: None).

Returns:

A sagemaker.core.resources.Endpoint resource representing the deployed endpoint, a LocalEndpoint for local mode, or a Transformer for batch transform inference.

Return type:

Union[Endpoint, LocalEndpoint, Transformer]

Example

>>> model_builder = ModelBuilder(model=my_model, role_arn=role, instance_type="ml.m5.xlarge")
>>> model = model_builder.build()  # Creates Model resource
>>> endpoint = model_builder.deploy(endpoint_name="my-endpoint")  # Creates Endpoint resource
>>> result = endpoint.invoke(data=input_data)  # Make predictions

deploy_local(endpoint_name: str = 'endpoint', container_timeout_in_seconds: int = 300, **kwargs) → LocalEndpoint

Deploy the built model to local mode for testing.

Deploys the model locally using either LOCAL_CONTAINER mode (runs in a Docker container) or IN_PROCESS mode (runs in the current Python process). This is useful for testing and development before deploying to SageMaker endpoints. The model must be built with mode=Mode.LOCAL_CONTAINER or mode=Mode.IN_PROCESS before calling this method.

Note: This returns a LocalEndpoint object for local inference, not a SageMaker Endpoint resource. Use local_endpoint.invoke() to make predictions.

Parameters:

endpoint_name (str) – The name for the local endpoint. (Default: “endpoint”).
container_timeout_in_seconds (int) – The timeout value, in seconds, for the container to respond to requests. (Default: 300).

Returns:

A LocalEndpoint object for making local predictions.

Return type:

LocalEndpoint

Raises:

ValueError – If the model was not built with LOCAL_CONTAINER or IN_PROCESS mode.

Example

>>> model_builder = ModelBuilder(
...     model=my_model,
...     role_arn=role,
...     mode=Mode.LOCAL_CONTAINER
... )
>>> model = model_builder.build()
>>> local_endpoint = model_builder.deploy_local()
>>> result = local_endpoint.invoke(data=input_data)
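The precondition listed under Raises can be illustrated with a small helper. check_local_mode is hypothetical and not part of the SDK; it only restates the documented rule that deploy_local() accepts models built in LOCAL_CONTAINER or IN_PROCESS mode. The Mode import path is taken from the ModelBuilder example above; if the SDK is not installed, a stand-in enum with the three documented members is used (its numeric values are illustrative only).

```python
from enum import Enum

try:
    from sagemaker.serve.mode.function_pointers import Mode
except ImportError:
    # Stand-in with the three documented modes; values are illustrative.
    class Mode(Enum):
        IN_PROCESS = 1
        LOCAL_CONTAINER = 2
        SAGEMAKER_ENDPOINT = 3


# Modes that deploy_local() is documented to accept.
LOCAL_MODES = {Mode.LOCAL_CONTAINER, Mode.IN_PROCESS}


def check_local_mode(mode) -> None:
    """Hypothetical check mirroring deploy_local()'s documented ValueError."""
    if mode not in LOCAL_MODES:
        raise ValueError(
            f"deploy_local() requires LOCAL_CONTAINER or IN_PROCESS mode, got {mode}"
        )


check_local_mode(Mode.LOCAL_CONTAINER)  # passes silently
try:
    check_local_mode(Mode.SAGEMAKER_ENDPOINT)
except ValueError as err:
    print(err)
```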

display_benchmark_metrics(**kwargs) → None: Display benchmark metrics for JumpStart models.

enable_network_isolation()

Whether to enable network isolation when creating this Model.

Returns: Whether network isolation should be enabled.
Return type: bool

env_vars: Dict[str, str | PipelineVariable] | None

fetch_endpoint_names_for_base_model() → Set[str]

Fetches endpoint names for the base model.

Returns: Set of endpoint names for the base model.

classmethod from_jumpstart_config(jumpstart_config: JumpStartConfig, role_arn: str | None = None, compute: Compute | None = None, network: Networking | None = None, image_uri: str | None = None, env_vars: Dict[str, str] | None = None, model_kms_key: str | None = None, resource_requirements: ResourceRequirements | None = None, tolerate_vulnerable_model: bool | None = None, tolerate_deprecated_model: bool | None = None, sagemaker_session: Session | None = None, schema_builder: SchemaBuilder | None = None) → ModelBuilder

Create a ModelBuilder instance from a JumpStart configuration.

This class method provides a convenient way to create a ModelBuilder for deploying pre-trained models from Amazon SageMaker JumpStart. It automatically retrieves the appropriate model artifacts, container images, and default configurations for the specified JumpStart model.

Parameters:

jumpstart_config (JumpStartConfig) – Configuration object specifying the JumpStart model to use. Must include model_id and optionally model_version and inference_config_name.
role_arn (str, optional) – The IAM role ARN for SageMaker to assume when creating the model and endpoint. If not specified, attempts to use the default SageMaker execution role. (Default: None).
compute (Compute, optional) – Compute configuration specifying instance type and instance count for deployment. For example, Compute(instance_type='ml.g5.xlarge', instance_count=1). (Default: None).
network (Networking, optional) – Network configuration including VPC settings and network isolation. For example, Networking(vpc_config={'Subnets': […], 'SecurityGroupIds': […]}, enable_network_isolation=False). (Default: None).
image_uri (str, optional) – Custom container image URI. If not specified, uses the default JumpStart container image for the model. (Default: None).
env_vars (Dict[str, str], optional) – Environment variables to set in the container. These will be merged with default JumpStart environment variables. (Default: None).
model_kms_key (str, optional) – KMS key ARN used to encrypt model artifacts when uploading to S3. (Default: None).
resource_requirements (ResourceRequirements, optional) – The compute resource requirements for deploying the model to an inference component based endpoint. (Default: None).
tolerate_vulnerable_model (bool, optional) – If True, allows deployment of models with known security vulnerabilities. Use with caution. (Default: None).
tolerate_deprecated_model (bool, optional) – If True, allows deployment of deprecated JumpStart models. (Default: None).
sagemaker_session (Session, optional) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, creates one using the default AWS configuration chain. (Default: None).
schema_builder (SchemaBuilder, optional) – Schema builder for defining input/output schemas. If not specified, uses default schemas for the JumpStart model. (Default: None).

Returns:

A configured ModelBuilder instance ready to build and deploy the specified JumpStart model.

Return type:

ModelBuilder

Example

>>> from sagemaker.core.jumpstart.configs import JumpStartConfig
>>> from sagemaker.serve.model_builder import ModelBuilder
>>>
>>> js_config = JumpStartConfig(
...     model_id="huggingface-llm-mistral-7b",
...     model_version="*"
... )
>>>
>>> from sagemaker.core.training.configs import Compute
>>>
>>> model_builder = ModelBuilder.from_jumpstart_config(
...     jumpstart_config=js_config,
...     compute=Compute(instance_type="ml.g5.2xlarge", instance_count=1)
... )
>>>
>>> model = model_builder.build()  # Creates Model resource
>>> endpoint = model_builder.deploy()  # Creates Endpoint resource
>>> result = endpoint.invoke(data=input_data)  # Make predictions

get_deployment_config() → Dict[str, Any] | None: Gets the deployment config to apply to the model.

image_config: Dict[str, str | PipelineVariable] | None = None

image_uri: str | PipelineVariable | None = None

inference_spec: InferenceSpec | None = None

instance_type: str | None = None

is_repack() → bool

Whether the source code needs to be repacked before uploading to S3.

Returns: Whether the source code needs to be repacked.
Return type: bool

list_deployment_configs() → List[Dict[str, Any]]: List deployment configs for the model in the current region.

log_level: int | None = 10

mode: Mode | None = Mode.SAGEMAKER_ENDPOINT

model: object | str | ModelTrainer | BaseTrainer | TrainingJob | ModelPackage | List[Model] | None = None

model_metadata: Dict[str, Any] | None = None

model_path: str | None

model_server: ModelServer | None = None

modelbuilder_list: List[ModelBuilder] | None = None

network: Networking | None = None

optimize(...) → Model

Create an optimized deployable Model instance with ModelBuilder.

Runs a SageMaker model optimization job to quantize, compile, or shard the model for improved inference performance. Returns a Model resource that can be deployed using the deploy() method.

Note: This returns a sagemaker.core.resources.Model object.

Parameters:

output_path (str, optional) – S3 URI where the optimized model artifacts will be stored. If not specified, uses the default output path. (Default: None).
instance_type (str, optional) – Target deployment instance type that the model is optimized for. For example, ‘ml.p4d.24xlarge’. (Default: None).
role_arn (str, optional) – IAM execution role ARN for the optimization job. If not specified, uses the role from initialization. (Default: None).
sagemaker_session (Session, optional) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, uses the session from initialization or creates one using the default AWS configuration chain. (Default: None).
region (str, optional) – The AWS region for the optimization job. If specified and different from the current region, a new session will be created. (Default: None).
model_name (str, optional) – The name for the optimized SageMaker model. If not specified, a unique name will be generated. (Default: None).
tags (Tags, optional) – Tags for labeling the model optimization job. (Default: None).
job_name (str, optional) – The name of the model optimization job. If not specified, a unique name will be generated. (Default: None).
accept_eula (bool, optional) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted. The accept_eula value must be explicitly defined as True in order to accept the end-user license agreement (EULA) that some models require. (Default: None).
quantization_config (Dict, optional) – Quantization configuration specifying the quantization method and parameters. For example: {'OverrideEnvironment': {'OPTION_QUANTIZE': 'awq'}}. (Default: None).
compilation_config (Dict, optional) – Compilation configuration for optimizing the model for specific hardware. (Default: None).
speculative_decoding_config (Dict, optional) – Speculative decoding configuration for improving inference latency of large language models. (Default: None).
sharding_config (Dict, optional) – Model sharding configuration for distributing large models across multiple devices. (Default: None).
env_vars (Dict, optional) – Additional environment variables to pass to the optimization container. (Default: None).
vpc_config (Dict, optional) – VPC configuration for the optimization job. Should contain ‘Subnets’ and ‘SecurityGroupIds’ keys. (Default: None).
kms_key (str, optional) – KMS key ARN used to encrypt the optimized model artifacts when uploading to S3. (Default: None).
image_uri (str, optional) – Custom container image URI for the optimization job. If not specified, uses the default optimization container. (Default: None).
max_runtime_in_sec (int) – Maximum job execution time in seconds. The optimization job will be stopped if it exceeds this time. (Default: 36000).

Returns:

A sagemaker.core.resources.Model resource containing the optimized model artifacts, ready for deployment.

Return type:

Model

Example

>>> model_builder = ModelBuilder(model=my_model, role_arn=role)
>>> optimized_model = model_builder.optimize(
...     instance_type="ml.g5.xlarge",
...     quantization_config={'OverrideEnvironment': {'OPTION_QUANTIZE': 'awq'}}
... )
>>> endpoint = model_builder.deploy()  # Deploy the optimized model
>>> result = endpoint.invoke(data=input_data)

register(...)

Create a model package for creating SageMaker models or for listing on AWS Marketplace.

Parameters:

content_types (list[str] or list[PipelineVariable]) – The supported MIME types for the input data.
response_types (list[str] or list[PipelineVariable]) – The supported MIME types for the output data.
inference_instances (list[str] or list[PipelineVariable]) – A list of the instance types that are used to generate inferences in real-time (default: None).
transform_instances (list[str] or list[PipelineVariable]) – A list of the instance types on which a transformation job can be run or on which an endpoint can be deployed (default: None).
model_package_name (str or PipelineVariable) – Model Package name. Mutually exclusive with model_package_group_name; using model_package_name makes the Model Package unversioned (default: None).
model_package_group_name (str or PipelineVariable) – Model Package Group name. Mutually exclusive with model_package_name; using model_package_group_name makes the Model Package versioned (default: None).
model_metrics (ModelMetrics) – ModelMetrics object (default: None).
metadata_properties (MetadataProperties) – MetadataProperties object (default: None).
marketplace_cert (bool) – A boolean value indicating if the Model Package is certified for AWS Marketplace (default: False).
approval_status (str or PipelineVariable) – Model Approval Status, values can be “Approved”, “Rejected”, or “PendingManualApproval” (default: “PendingManualApproval”).
description (str) – Model Package description (default: None).
drift_check_baselines (DriftCheckBaselines) – DriftCheckBaselines object (default: None).
customer_metadata_properties (dict[str, str] or dict[str, PipelineVariable]) – A dictionary of key-value paired metadata properties (default: None).
domain (str or PipelineVariable) – Domain values can be “COMPUTER_VISION”, “NATURAL_LANGUAGE_PROCESSING”, “MACHINE_LEARNING” (default: None).
task (str or PipelineVariable) – Task values which are supported by Inference Recommender are “FILL_MASK”, “IMAGE_CLASSIFICATION”, “OBJECT_DETECTION”, “TEXT_GENERATION”, “IMAGE_SEGMENTATION”, “CLASSIFICATION”, “REGRESSION”, “OTHER” (default: None).
sample_payload_url (str or PipelineVariable) – The S3 path where the sample payload is stored (default: None).
nearest_model_name (str or PipelineVariable) – Name of a pre-trained machine learning model benchmarked by Amazon SageMaker Inference Recommender (default: None).
data_input_configuration (str or PipelineVariable) – Input object for the model (default: None).
skip_model_validation (str or PipelineVariable) – Indicates if you want to skip model validation. Values can be “All” or “None” (default: None).
source_uri (str or PipelineVariable) – The URI of the source for the model package (default: None).
model_card (ModelCard or ModelPackageModelCard) – A document that contains qualitative and quantitative information about a model (default: None).
model_life_cycle (ModelLifeCycle) – ModelLifeCycle object (default: None).
accept_eula (bool) – For models that require a Model Access Config, specify True or False to indicate whether model terms of use have been accepted (default: None).
model_type (JumpStartModelType) – Type of JumpStart model (default: None).

Returns:

A sagemaker.model.ModelPackage instance, or pipeline step arguments if the Model instance is built with a PipelineSession.

Note

The following parameters are inherited from ModelBuilder.__init__ and do not need to be passed to register():

- image_uri: Use self.image_uri (defined in __init__)
- framework: Use self.framework (defined in __init__)
- framework_version: Use self.framework_version (defined in __init__)
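The argument rules in the register() parameter list (model_package_name and model_package_group_name are mutually exclusive, and approval_status accepts three values) can be checked before calling the method. check_register_args is a hypothetical helper, not part of the SDK; it makes no SageMaker calls and only restates those documented constraints. The parameter docs do not say what happens when neither name is given, so the sketch returns None in that case.

```python
from typing import Optional

# Approval statuses listed in the register() parameter docs.
APPROVAL_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def check_register_args(
    model_package_name: Optional[str] = None,
    model_package_group_name: Optional[str] = None,
    approval_status: str = "PendingManualApproval",
) -> Optional[str]:
    """Hypothetical pre-check; returns the kind of Model Package the args imply."""
    if model_package_name and model_package_group_name:
        raise ValueError(
            "model_package_name and model_package_group_name are mutually exclusive"
        )
    if approval_status not in APPROVAL_STATUSES:
        raise ValueError(f"unsupported approval_status: {approval_status!r}")
    if model_package_group_name:
        return "versioned"  # a group name yields a versioned package
    if model_package_name:
        return "unversioned"  # a bare name yields an unversioned package
    return None  # neither given: not covered by the parameter docs


print(check_register_args(model_package_group_name="my-model-group"))  # versioned
```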

role_arn: str | None = None

s3_model_data_url: str | PipelineVariable | Dict[str, Any] | None = None

sagemaker_session: Session | None = None

schema_builder: SchemaBuilder | None = None

set_deployment_config(config_name: str, instance_type: str) → None: Sets the deployment config to apply to the model.

shared_libs: List[str]

source_code: SourceCode | None = None

to_string(obj: object)

Convert an object to a string.

This helper also handles converting PipelineVariable objects to strings.

Parameters: obj (object) – The object to be converted.
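A minimal sketch of the behavior described for to_string(): plain objects go through str(), while pipeline variables are asked to render themselves. This assumes pipeline-variable objects expose their own to_string() method; PipelineVariableStub is a hypothetical stand-in, not an SDK class, and the real conversion may produce a deferred runtime expression rather than a literal string.

```python
class PipelineVariableStub:
    """Hypothetical stand-in for a pipeline variable (not an SDK class)."""

    def __init__(self, name: str):
        self.name = name

    def to_string(self) -> str:
        # Assumption: pipeline variables render themselves via to_string().
        return f"<pipeline variable: {self.name}>"


def to_string(obj: object) -> str:
    """Convert obj to a string, letting pipeline variables render themselves."""
    if isinstance(obj, PipelineVariableStub):
        return obj.to_string()
    return str(obj)


print(to_string(42))  # 42
print(to_string(PipelineVariableStub("InstanceType")))  # <pipeline variable: InstanceType>
```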

transformer(instance_count, instance_type, strategy=None, assemble_with=None, output_path=None, output_kms_key=None, accept=None, env=None, max_concurrent_transforms=None, max_payload=None, tags=None, volume_kms_key=None)

Return a Transformer that uses this Model.

Parameters:

instance_count (int) – Number of EC2 instances to use.
instance_type (str) – Type of EC2 instance to use, for example, ‘ml.c4.xlarge’.
strategy (str) – The strategy used to decide how to batch records in a single request (default: None). Valid values: ‘MultiRecord’ and ‘SingleRecord’.
assemble_with (str) – How the output is assembled (default: None). Valid values: ‘Line’ or ‘None’.
output_path (str) – S3 location for saving the transform result. If not specified, results are stored to a default bucket.
output_kms_key (str) – Optional. KMS key ID for encrypting the transform output (default: None).
accept (str) – The accept header passed by the client to the inference endpoint. If it is supported by the endpoint, it will be the format of the batch transform output.
env (dict) – Environment variables to be set for use during the transform job (default: None).
max_concurrent_transforms (int) – The maximum number of HTTP requests to be made to each individual transform container at one time.
max_payload (int) – Maximum size of the payload in a single HTTP request to the container in MB.
tags (Optional[Tags]) – Tags for labeling a transform job. If none are specified, the tags used for the training job are used for the transform job.
volume_kms_key (str) – Optional. KMS key ID for encrypting the volume attached to the ML compute instance (default: None).
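The strategy and max_payload parameters interact: with 'SingleRecord' each HTTP request carries one record, while with 'MultiRecord' as many records as fit under max_payload are packed into one request. The sketch below is a simplified model of that grouping, not the service's actual splitting logic. batch_records is a hypothetical helper, records are assumed to be already split, and 1 MB is assumed to mean 2**20 bytes.

```python
from typing import List

VALID_STRATEGIES = ("SingleRecord", "MultiRecord")


def batch_records(
    records: List[bytes], strategy: str, max_payload_mb: int
) -> List[List[bytes]]:
    """Simplified model of how records are grouped into transform requests."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}, got {strategy!r}")
    if strategy == "SingleRecord":
        # One record per HTTP request.
        return [[record] for record in records]
    # MultiRecord: pack records greedily until the next one would exceed the limit.
    limit = max_payload_mb * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for record in records:
        if current and size + len(record) > limit:
            batches.append(current)
            current, size = [], 0
        # An oversized single record still gets its own batch in this sketch.
        current.append(record)
        size += len(record)
    if current:
        batches.append(current)
    return batches


records = [b"x" * 400_000] * 3
print([len(b) for b in batch_records(records, "MultiRecord", max_payload_mb=1)])  # [2, 1]
```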

class sagemaker.serve.ModelServer(value)

Bases: Enum

An enum of the supported model servers.

DJL_SERVING = 4

MMS = 2

SMD = 8

TEI = 7

TENSORFLOW_SERVING = 3

TGI = 6

TORCHSERVE = 1

TRITON = 5
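Because ModelServer is a standard Enum, a member can be looked up by name, for example when the server is chosen from a config file. parse_model_server is a hypothetical helper, not part of the SDK. The import path is the documented sagemaker.serve.ModelServer; if the SDK is not installed, a stand-in enum with the members and values listed above is used.

```python
from enum import Enum

try:
    from sagemaker.serve import ModelServer
except ImportError:
    # Stand-in mirroring the members and values documented above.
    class ModelServer(Enum):
        TORCHSERVE = 1
        MMS = 2
        TENSORFLOW_SERVING = 3
        DJL_SERVING = 4
        TRITON = 5
        TGI = 6
        TEI = 7
        SMD = 8


def parse_model_server(name: str) -> ModelServer:
    """Hypothetical helper: case-insensitive lookup of a ModelServer member."""
    try:
        return ModelServer[name.strip().upper()]
    except KeyError:
        valid = ", ".join(member.name for member in ModelServer)
        raise ValueError(
            f"unknown model server {name!r}; expected one of: {valid}"
        ) from None


server = parse_model_server("torchserve")
print(server.name, server.value)
```

Lookup by name with ModelServer["TGI"] raises KeyError for unknown names; the helper converts that into a ValueError that lists the valid choices.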
