sagemaker.core.inference_recommender.inference_recommender_mixin#
Mixin and helper classes for SageMaker Inference Recommender.
Classes

- InferenceRecommenderMixin – A mixin class for SageMaker Inference Recommender.

- ModelLatencyThreshold – Used to store inference request/response latency to perform endpoint load testing.

- Phase – Used to store phases of a traffic pattern to perform endpoint load testing.
- class sagemaker.core.inference_recommender.inference_recommender_mixin.InferenceRecommenderMixin[source]#
Bases: object

A mixin class for SageMaker Inference Recommender that will be extended by Model.

- right_size(sample_payload_url: str | None = None, supported_content_types: List[str] | None = None, supported_instance_types: List[str] | None = None, job_name: str | None = None, framework: str | None = None, job_duration_in_seconds: int | None = None, hyperparameter_ranges: List[Dict[str, CategoricalParameter]] | None = None, phases: List[Phase] | None = None, traffic_type: str | None = None, max_invocations: int | None = None, model_latency_thresholds: List[ModelLatencyThreshold] | None = None, max_tests: int | None = None, max_parallel_tests: int | None = None, log_level: str | None = 'Verbose')[source]#
Recommends an instance type for a SageMaker or BYOC model.
Create a SageMaker Model, or use a registered ModelPackage, to start an Inference Recommender job. The name of the created model is accessible in the name field of this Model after right_size returns.

- Parameters:
sample_payload_url (str) – The S3 path where the sample payload is stored.
supported_content_types (list[str]) – The supported MIME types for the input data.
supported_instance_types (list[str]) – A list of the instance types that this model is expected to work on. (default: None).
job_name (str) – The name of the Inference Recommendations Job. (default: None).
framework (str) – The machine learning framework of the Image URI. Only required to specify if you bring your own custom containers (default: None).
job_duration_in_seconds (int) – The maximum duration, in seconds, that the job is allowed to run. (default: None).
hyperparameter_ranges (list[Dict[str, sagemaker.parameter.CategoricalParameter]]) –
Specifies the hyperparameters to be used during endpoint load tests. instance_types must be specified as a hyperparameter range. env_vars can be specified as an optional hyperparameter range. (default: None). Example:

    hyperparameter_ranges = [{
        'instance_types': CategoricalParameter(['ml.c5.xlarge', 'ml.c5.2xlarge']),
        'OMP_NUM_THREADS': CategoricalParameter(['1', '2', '3', '4']),
    }]
phases (list[Phase]) – Shape of the traffic pattern to use in the load test (default: None).
traffic_type (str) – Specifies the traffic pattern type. Currently only supports one type ‘PHASES’ (default: None).
max_invocations (int) – Defines the minimum invocations per minute that the endpoint must support (default: None).
model_latency_thresholds (list[ModelLatencyThreshold]) – Defines the maximum response latency for endpoints to support (default: None).
max_tests (int) – Restricts the total number of endpoints that may be spun up for this job (default: None).
max_parallel_tests (int) – Restricts the number of concurrent endpoints this job is allowed to spin up (default: None).
log_level (str) – Specifies the verbosity of the inline output while waiting for right_size to complete (default: “Verbose”).
- Returns:

A SageMaker Model object. See Model() for full details.

- Return type:

sagemaker.model.Model
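As a sketch, a Default Inference Recommendations job might be configured as follows. The bucket path and values are hypothetical, and the right_size call itself requires AWS credentials and a deployable Model, so it is shown commented out:

```python
# Hypothetical keyword arguments for a Default Inference Recommendations job.
# The S3 path is illustrative; the sample payload must already exist in S3.
right_size_kwargs = {
    "sample_payload_url": "s3://my-bucket/payloads/sample.json",
    "supported_content_types": ["application/json"],
    "supported_instance_types": ["ml.c5.xlarge", "ml.c5.2xlarge"],
    "job_duration_in_seconds": 7200,
    "log_level": "Verbose",
}

# With a sagemaker.model.Model instance (requires AWS credentials and a role):
# model = model.right_size(**right_size_kwargs)
# print(model.name)  # name of the model created by the job
```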
- class sagemaker.core.inference_recommender.inference_recommender_mixin.ModelLatencyThreshold(percentile: str, value_in_milliseconds: int)[source]#
Bases: object

Used to store inference request/response latency to perform endpoint load testing.
Required for an Advanced Inference Recommendations Job
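To illustrate the shape of this helper without requiring the sagemaker package, here is a minimal pure-Python stand-in with the same two fields; in real code, import ModelLatencyThreshold from this module instead:

```python
from dataclasses import dataclass

@dataclass
class ModelLatencyThreshold:
    """Stand-in mirroring the documented fields of the real class."""
    percentile: str             # latency percentile, e.g. "P95"
    value_in_milliseconds: int  # upper latency bound for that percentile

# Require that 95% of inference requests complete within 100 ms.
threshold = ModelLatencyThreshold(percentile="P95", value_in_milliseconds=100)
```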
- class sagemaker.core.inference_recommender.inference_recommender_mixin.Phase(duration_in_seconds: int, initial_number_of_users: int, spawn_rate: int)[source]#
Bases: object

Used to store phases of a traffic pattern to perform endpoint load testing.
Required for an Advanced Inference Recommendations Job
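Again as a pure-Python sketch of the field layout (the unit of spawn_rate is an assumption here; consult the service documentation for the exact semantics), a two-phase traffic pattern could look like:

```python
from dataclasses import dataclass

@dataclass
class Phase:
    """Stand-in mirroring the documented fields of the real class."""
    duration_in_seconds: int
    initial_number_of_users: int
    spawn_rate: int  # assumed: users added per minute during the phase

# Ramp up traffic for two minutes, then hold a steady load for five minutes.
phases = [
    Phase(duration_in_seconds=120, initial_number_of_users=1, spawn_rate=5),
    Phase(duration_in_seconds=300, initial_number_of_users=11, spawn_rate=0),
]
total_duration = sum(p.duration_in_seconds for p in phases)
```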