sagemaker.core.inference_recommender.inference_recommender_mixin

Classes

InferenceRecommenderMixin()

A mixin class for SageMaker Inference Recommender that is extended by Model.

ModelLatencyThreshold(percentile, ...)

Used to store inference request/response latency to perform endpoint load testing.

Phase(duration_in_seconds, ...)

Used to store phases of a traffic pattern to perform endpoint load testing.

class sagemaker.core.inference_recommender.inference_recommender_mixin.InferenceRecommenderMixin[source]#

Bases: object

A mixin class for SageMaker Inference Recommender that is extended by Model.

right_size(sample_payload_url: str | None = None, supported_content_types: List[str] | None = None, supported_instance_types: List[str] | None = None, job_name: str | None = None, framework: str | None = None, job_duration_in_seconds: int | None = None, hyperparameter_ranges: List[Dict[str, CategoricalParameter]] | None = None, phases: List[Phase] | None = None, traffic_type: str | None = None, max_invocations: int | None = None, model_latency_thresholds: List[ModelLatencyThreshold] | None = None, max_tests: int | None = None, max_parallel_tests: int | None = None, log_level: str | None = 'Verbose')[source]#

Recommends an instance type for a SageMaker or BYOC model.

Creates a SageMaker Model or uses a registered ModelPackage to start an Inference Recommender job.

The name of the created model is accessible in the name field of this Model after right_size returns.

Parameters:
  • sample_payload_url (str) – The S3 path where the sample payload is stored.

  • supported_content_types (list[str]) – The supported MIME types for the input data.

  • supported_instance_types (list[str]) – A list of the instance types that this model is expected to work on. (default: None).

  • job_name (str) – The name of the Inference Recommendations Job. (default: None).

  • framework (str) – The machine learning framework of the Image URI. Required only if you bring your own custom container (default: None).

  • job_duration_in_seconds (int) – The maximum duration, in seconds, that the job can run (default: None).

  • hyperparameter_ranges (list[Dict[str, sagemaker.parameter.CategoricalParameter]]) –

    Specifies the hyperparameters to be used during endpoint load tests. instance_types must be specified as a hyperparameter range; env_vars can be specified as an optional hyperparameter range (default: None). Example:

    hyperparameter_ranges = [{
        'instance_types': CategoricalParameter(['ml.c5.xlarge', 'ml.c5.2xlarge']),
        'OMP_NUM_THREADS': CategoricalParameter(['1', '2', '3', '4'])
    }]
    

  • phases (list[Phase]) – Shape of the traffic pattern to use in the load test (default: None).

  • traffic_type (str) – Specifies the traffic pattern type. Currently only supports one type ‘PHASES’ (default: None).

  • max_invocations (int) – The maximum number of requests per minute expected for the endpoint (default: None).

  • model_latency_thresholds (list[ModelLatencyThreshold]) – Defines the maximum response latency for the endpoint to support (default: None).

  • max_tests (int) – Restricts how many endpoints in total are allowed to be spun up for this job (default: None).

  • max_parallel_tests (int) – Restricts how many concurrent endpoints this job is allowed to spin up (default: None).

  • log_level (str) – Specifies the inline output when waiting for right_size to complete (default: “Verbose”).

Returns:

A SageMaker Model object. See Model() for full details.

Return type:

sagemaker.model.Model
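As an illustration of how the phased-load parameters fit together, the sketch below builds a traffic pattern and a latency threshold using local dataclass stand-ins for Phase and ModelLatencyThreshold (mirroring the constructor signatures documented on this page), so it runs without the sagemaker package installed. The final right_size call is shown commented out because it requires a real Model object and AWS credentials; its S3 path, instance types, and numeric values are placeholders, not recommendations.

```python
from dataclasses import dataclass


# Local stand-ins mirroring the documented signatures of
# Phase(duration_in_seconds, initial_number_of_users, spawn_rate) and
# ModelLatencyThreshold(percentile, value_in_milliseconds). These are
# illustrative only; the real classes live in
# sagemaker.core.inference_recommender.inference_recommender_mixin.
@dataclass
class Phase:
    duration_in_seconds: int
    initial_number_of_users: int
    spawn_rate: int


@dataclass
class ModelLatencyThreshold:
    percentile: str
    value_in_milliseconds: int


# A two-phase traffic pattern: a warm-up phase followed by a heavier phase.
phases = [
    Phase(duration_in_seconds=120, initial_number_of_users=1, spawn_rate=1),
    Phase(duration_in_seconds=120, initial_number_of_users=2, spawn_rate=1),
]

# Cap acceptable p95 model latency at 100 ms during the load test.
thresholds = [ModelLatencyThreshold(percentile="P95", value_in_milliseconds=100)]

# The load test runs for the sum of all phase durations.
total_duration_in_seconds = sum(p.duration_in_seconds for p in phases)

# With the real sagemaker objects, the call would look like this
# (S3 path and instance types are placeholders):
# model.right_size(
#     sample_payload_url="s3://example-bucket/payload.tar.gz",
#     supported_content_types=["application/json"],
#     supported_instance_types=["ml.c5.xlarge", "ml.c5.2xlarge"],
#     phases=phases,
#     traffic_type="PHASES",
#     model_latency_thresholds=thresholds,
#     max_invocations=300,
# )
```

Keeping the traffic pattern and thresholds in plain objects like this makes it easy to reuse the same load-test shape across several right_size calls.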

class sagemaker.core.inference_recommender.inference_recommender_mixin.ModelLatencyThreshold(percentile: str, value_in_milliseconds: int)[source]#

Bases: object

Used to store inference request/response latency to perform endpoint load testing.

Required for an Advanced Inference Recommendations Job.

class sagemaker.core.inference_recommender.inference_recommender_mixin.Phase(duration_in_seconds: int, initial_number_of_users: int, spawn_rate: int)[source]#

Bases: object

Used to store phases of a traffic pattern to perform endpoint load testing.

Required for an Advanced Inference Recommendations Job.