sagemaker.core.inference_recommender.inference_recommender_mixin

Classes

InferenceRecommenderMixin()

A mixin class for SageMaker Inference Recommender that is extended by Model.

ModelLatencyThreshold(percentile, ...)

Used to store inference request/response latency to perform endpoint load testing.

Phase(duration_in_seconds, ...)

Used to store phases of a traffic pattern to perform endpoint load testing.

class sagemaker.core.inference_recommender.inference_recommender_mixin.InferenceRecommenderMixin[source]#

Bases: object

A mixin class for SageMaker Inference Recommender that is extended by Model.

right_size(sample_payload_url: str | None = None, supported_content_types: List[str] | None = None, supported_instance_types: List[str] | None = None, job_name: str | None = None, framework: str | None = None, job_duration_in_seconds: int | None = None, hyperparameter_ranges: List[Dict[str, CategoricalParameter]] | None = None, phases: List[Phase] | None = None, traffic_type: str | None = None, max_invocations: int | None = None, model_latency_thresholds: List[ModelLatencyThreshold] | None = None, max_tests: int | None = None, max_parallel_tests: int | None = None, log_level: str | None = 'Verbose')[source]#

Recommends an instance type for a SageMaker or BYOC model.

Creates a SageMaker Model or uses a registered ModelPackage to start an Inference Recommender job.

The name of the created model is accessible in the name field of this Model after right_size returns.

Parameters:
  • sample_payload_url (str) – The S3 path where the sample payload is stored.

  • supported_content_types (list[str]) – The supported MIME types for the input data.

  • supported_instance_types (list[str]) – A list of the instance types that this model is expected to work on. (default: None).

  • job_name (str) – The name of the Inference Recommendations Job. (default: None).

  • framework (str) – The machine learning framework of the Image URI. Required only if you bring your own custom container (default: None).

  • job_duration_in_seconds (int) – The maximum duration, in seconds, that the job can run (default: None).

  • hyperparameter_ranges (list[Dict[str, sagemaker.parameter.CategoricalParameter]]) –

    Specifies the hyperparameters to be used during endpoint load tests. instance_types must be specified as a hyperparameter range; env_vars can be specified as an optional hyperparameter range (default: None). Example:

    hyperparameter_ranges = [{
        'instance_types': CategoricalParameter(['ml.c5.xlarge', 'ml.c5.2xlarge']),
        'OMP_NUM_THREADS': CategoricalParameter(['1', '2', '3', '4'])
    }]
    

  • phases (list[Phase]) – Shape of the traffic pattern to use in the load test (default: None).

  • traffic_type (str) – Specifies the traffic pattern type. Currently only supports one type ‘PHASES’ (default: None).

  • max_invocations (int) – The maximum number of requests per minute expected for the endpoint (default: None).

  • model_latency_thresholds (list[ModelLatencyThreshold]) – Defines the maximum response latency for the endpoint to support (default: None).

  • max_tests (int) – Restricts how many endpoints in total are allowed to be spun up for this job (default: None).

  • max_parallel_tests (int) – Restricts how many concurrent endpoints this job is allowed to spin up (default: None).

  • log_level (str) – Specifies the inline output when waiting for right_size to complete (default: “Verbose”).

Returns:

A SageMaker Model object. See Model() for full details.

Return type:

sagemaker.model.Model
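As an illustration of how the phased-load parameters fit together, the sketch below builds a traffic pattern and a latency threshold using local dataclass stand-ins for Phase and ModelLatencyThreshold (mirroring the constructor signatures documented on this page), so it runs without the sagemaker package installed. The final right_size call is shown commented out because it requires a real Model object and AWS credentials; its S3 path, instance types, and numeric values are placeholders, not recommendations.

```python
from dataclasses import dataclass


# Local stand-ins mirroring the documented signatures of
# Phase(duration_in_seconds, initial_number_of_users, spawn_rate) and
# ModelLatencyThreshold(percentile, value_in_milliseconds). These are
# illustrative only; the real classes live in
# sagemaker.core.inference_recommender.inference_recommender_mixin.
@dataclass
class Phase:
    duration_in_seconds: int
    initial_number_of_users: int
    spawn_rate: int


@dataclass
class ModelLatencyThreshold:
    percentile: str
    value_in_milliseconds: int


# A two-phase traffic pattern: a warm-up phase followed by a heavier phase.
phases = [
    Phase(duration_in_seconds=120, initial_number_of_users=1, spawn_rate=1),
    Phase(duration_in_seconds=120, initial_number_of_users=2, spawn_rate=1),
]

# Cap acceptable p95 model latency at 100 ms during the load test.
thresholds = [ModelLatencyThreshold(percentile="P95", value_in_milliseconds=100)]

# The load test runs for the sum of all phase durations.
total_duration_in_seconds = sum(p.duration_in_seconds for p in phases)

# With the real sagemaker objects, the call would look like this
# (S3 path and instance types are placeholders):
# model.right_size(
#     sample_payload_url="s3://example-bucket/payload.tar.gz",
#     supported_content_types=["application/json"],
#     supported_instance_types=["ml.c5.xlarge", "ml.c5.2xlarge"],
#     phases=phases,
#     traffic_type="PHASES",
#     model_latency_thresholds=thresholds,
#     max_invocations=300,
# )
```

Keeping the traffic pattern and thresholds in plain objects like this makes it easy to reuse the same load-test shape across several right_size calls.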

class sagemaker.core.inference_recommender.inference_recommender_mixin.ModelLatencyThreshold(percentile: str, value_in_milliseconds: int)[source]#

Bases: object

Used to store inference request/response latency to perform endpoint load testing.

Required for an Advanced Inference Recommendations Job.

class sagemaker.core.inference_recommender.inference_recommender_mixin.Phase(duration_in_seconds: int, initial_number_of_users: int, spawn_rate: int)[source]#

Bases: object

Used to store phases of a traffic pattern to perform endpoint load testing.

Required for an Advanced Inference Recommendations Job.