sagemaker.core.model_monitor.model_monitoring

sagemaker.core.model_monitor.model_monitoring#

This module contains code related to Amazon SageMaker Model Monitoring.

These classes assist with suggesting baselines and creating monitoring schedules for data captured by SageMaker Endpoints.

Classes

BaseliningJob(sagemaker_session, job_name, ...)

Provides functionality to retrieve baseline-specific files output from baselining job.

BatchTransformInput(...[, s3_input_mode, ...])

Accepts parameters that specify a batch transform input for monitoring schedule.

DefaultModelMonitor([role, instance_count, ...])

Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions.

EndpointInput(endpoint_name, destination[, ...])

Accepts parameters that specify an endpoint input for monitoring execution.

ModelMonitor([role, image_uri, ...])

Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions.

ModelQualityMonitor([role, instance_count, ...])

Amazon SageMaker model monitor to monitor quality metrics for an endpoint.

MonitoringExecution(sagemaker_session, ...)

Provides functionality to retrieve monitoring-specific files from monitoring executions.

MonitoringInput(start_time_offset, ...)

Accepts parameters specifying batch transform or endpoint inputs for monitoring execution.

MonitoringOutput(source[, destination, ...])

Accepts parameters that specify an S3 output for a monitoring job.

class sagemaker.core.model_monitor.model_monitoring.BaseliningJob(sagemaker_session, job_name, inputs, outputs, output_kms_key=None)[source]#

Bases: object

Provides functionality to retrieve baseline-specific files output from baselining job.

baseline_statistics(file_name='statistics.json', kms_key=None)[source]#

Returns a sagemaker.model_monitor.Statistics object representing the statistics JSON file generated by this baselining job.

Parameters:
  • file_name (str) – The name of the json-formatted statistics file

  • kms_key (str) – The kms key to use when retrieving the file.

Returns:

The Statistics object representing the file that was generated by the job.

Return type:

sagemaker.model_monitor.Statistics

Raises:

UnexpectedStatusException – This is thrown if the job is not in a ‘Complete’ state.

describe()[source]#

Describe the processing job.

classmethod from_processing_job(processing_job)[source]#

Initializes a Baselining job from a processing job.

Parameters:

processing_job (sagemaker.processing.ProcessingJob) – The ProcessingJob from which to initialize the BaseliningJob instance.

Returns:

The instance of ProcessingJob created using the current job name.

Return type:

sagemaker.processing.BaseliningJob

suggested_constraints(file_name='constraints.json', kms_key=None)[source]#

Returns a sagemaker.model_monitor.Constraints object representing the constraints JSON file generated by this baselining job.

Parameters:
  • file_name (str) – The name of the json-formatted constraints file

  • kms_key (str) – The kms key to use when retrieving the file.

Returns:

The Constraints object representing the file that was generated by the job.

Return type:

sagemaker.model_monitor.Constraints

Raises:

UnexpectedStatusException – This is thrown if the job is not in a ‘Complete’ state.
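As a minimal sketch of how the two accessors above might be used together, the helper below collects both baseline artifacts from a finished job. The helper name `collect_baseline_artifacts` is hypothetical and not part of the SDK; it assumes only that `job` exposes `baseline_statistics()` and `suggested_constraints()` with the signatures documented above. It does not catch the UnexpectedStatusException that either call raises when the job is not complete.

```python
def collect_baseline_artifacts(job, kms_key=None):
    """Return (statistics, constraints) from a finished baselining job.

    `job` is assumed to be a BaseliningJob-like object. This helper is
    illustrative only and is not part of the SDK.
    """
    # Both calls rely on the documented default file names
    # ('statistics.json' and 'constraints.json').
    statistics = job.baseline_statistics(kms_key=kms_key)
    constraints = job.suggested_constraints(kms_key=kms_key)
    return statistics, constraints
```

Returning both objects as a pair keeps them together for a later create_monitoring_schedule call, which expects statistics and constraints to be supplied alongside each other.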

class sagemaker.core.model_monitor.model_monitoring.BatchTransformInput(data_captured_destination_s3_uri: str, destination: str, dataset_format: MonitoringDatasetFormat, s3_input_mode: str = 'File', s3_data_distribution_type: str = 'FullyReplicated', start_time_offset: str | None = None, end_time_offset: str | None = None, features_attribute: str | None = None, inference_attribute: str | None = None, probability_attribute: str | None = None, probability_threshold_attribute: str | None = None, exclude_features_attribute: str | None = None)[source]#

Bases: MonitoringInput

Accepts parameters that specify a batch transform input for monitoring schedule.

It also provides a method to turn those parameters into a dictionary.

class sagemaker.core.model_monitor.model_monitoring.DefaultModelMonitor(role=None, instance_count=1, instance_type='ml.m5.xlarge', volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)[source]#

Bases: ModelMonitor

Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions.

Use this class when you want to utilize Amazon SageMaker Monitoring’s plug-and-play solution that only requires your dataset and optional pre/postprocessing scripts. For a more customized experience, consider using the ModelMonitor class instead.

JOB_DEFINITION_BASE_NAME = 'data-quality-job-definition'#

classmethod attach(monitor_schedule_name, sagemaker_session=None)[source]#

Sets this object’s schedule name to the name provided.

This allows subsequent describe_schedule or list_executions calls to point to the given schedule.

Parameters:
  • monitor_schedule_name (str) – The name of the schedule to attach to.

  • sagemaker_session (sagemaker.core.helper.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.

create_monitoring_schedule(endpoint_input=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, constraints=None, statistics=None, monitor_schedule_name=None, schedule_cron_expression=None, enable_cloudwatch_metrics=True, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#

Creates a monitoring schedule to monitor an Amazon SageMaker Endpoint.

If constraints and statistics are provided, or if they are able to be retrieved from a previous baselining job associated with this monitor, those will be used. If constraints and statistics cannot be automatically retrieved, baseline_inputs will be required in order to kick off a baselining job.

Parameters:
  • endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput. (default: None)

  • record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.

  • post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.

  • output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files. Default: “s3://<default_session_bucket>/<job_name>/output”

  • constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an s3_uri pointing to a constraints JSON file.

  • statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an s3_uri pointing to a statistics JSON file.

  • monitor_schedule_name (str) – Schedule name. If not specified, the processor generates a default job name, based on the image name and current timestamp.

  • schedule_cron_expression (str) – The cron expression that dictates the frequency at which this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. Default: Daily.

  • enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.

  • batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)

  • data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

  • data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
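The data analysis window parameters take offsets relative to the current time, such as “-PT1H”. The sketch below shows one way to assemble keyword arguments for a one-time schedule. Both helper names are hypothetical, and passing the literal string "NOW" as schedule_cron_expression is an assumption drawn from the “(NOW)” note in the parameter descriptions above; check CronExpressionGenerator for the supported expressions.

```python
def hours_before_now(hours: int) -> str:
    """Format an offset of `hours` before the current time, e.g. '-PT1H'."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return f"-PT{hours}H"


def one_time_schedule_kwargs(endpoint_name, start_hours_ago, end_hours_ago=0):
    """Build keyword arguments for a one-time monitoring schedule.

    Hypothetical helper: the keys match the create_monitoring_schedule
    signature documented above; the "NOW" value is an assumption.
    """
    if start_hours_ago <= end_hours_ago:
        raise ValueError("the window must start before it ends")
    return {
        "endpoint_input": endpoint_name,
        "schedule_cron_expression": "NOW",  # assumed one-time marker
        "data_analysis_start_time": hours_before_now(start_hours_ago),
        "data_analysis_end_time": hours_before_now(end_hours_ago),
    }
```

The resulting dictionary could then be unpacked into a call such as `my_monitor.create_monitoring_schedule(**one_time_schedule_kwargs("my-endpoint", 1))`, where `my_monitor` and the endpoint name are placeholders.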

delete_monitoring_schedule()[source]#

Deletes the monitoring schedule and its job definition.

latest_monitoring_constraint_violations()[source]#

Returns the sagemaker.model_monitor.ConstraintViolations generated by the latest monitoring execution.

Returns:

The ConstraintViolations object representing the file generated by the latest monitoring execution.

Return type:

sagemaker.model_monitoring.ConstraintViolations

latest_monitoring_statistics()[source]#

Returns the sagemaker.model_monitor.Statistics.

These are the statistics generated by the latest monitoring execution.

Returns:

The Statistics object representing the file generated by the latest monitoring execution.

Return type:

sagemaker.model_monitoring.Statistics

classmethod monitoring_type()[source]#

Type of the monitoring job.

run_baseline()[source]#

Not implemented.

‘.run_baseline()’ is only allowed for ModelMonitor objects. Please use suggest_baseline for DefaultModelMonitor objects, instead.

Raises:

NotImplementedError

suggest_baseline(baseline_dataset, dataset_format, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, wait=True, logs=True, job_name=None, monitoring_config_override=None)[source]#

Suggest baselines for use with Amazon SageMaker Model Monitoring Schedules.

Parameters:
  • baseline_dataset (str) – The path to the baseline_dataset file. This can be a local path or an S3 uri.

  • dataset_format (dict) – The format of the baseline_dataset.

  • record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.

  • post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.

  • output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files. Default: “s3://<default_session_bucket>/<job_name>/output”

  • wait (bool) – Whether the call should wait until the job completes (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

  • job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the image name and current timestamp.

  • monitoring_config_override (DataQualityMonitoringConfig) – monitoring_config object to override the global monitoring_config parameter of constraints suggested by the Model Monitor container. If not specified, the values suggested by the container are used.

Returns:

The ProcessingJob object representing the baselining job.

Return type:

sagemaker.processing.ProcessingJob
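Several suggest_baseline parameters accept either a local path or an S3 uri, and output_s3_uri defaults to a location derived from the session bucket and job name. The sketch below illustrates both conventions in plain Python. The helper names are hypothetical, and the scheme check is a simple heuristic rather than the SDK's own validation.

```python
from urllib.parse import urlparse


def is_s3_uri(location: str) -> bool:
    """Return True if `location` looks like s3://bucket/..., else False."""
    parsed = urlparse(location)
    # A usable S3 uri needs both the 's3' scheme and a bucket name.
    return parsed.scheme == "s3" and bool(parsed.netloc)


def default_output_uri(default_session_bucket: str, job_name: str) -> str:
    """Mirror the documented default output location pattern."""
    return f"s3://{default_session_bucket}/{job_name}/output"
```

A check like this can help decide up front whether a dataset or script needs to be uploaded or is already in S3.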

update_monitoring_schedule(endpoint_input=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, statistics=None, constraints=None, schedule_cron_expression=None, instance_count=None, instance_type=None, volume_size_in_gb=None, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, env=None, network_config=None, enable_cloudwatch_metrics=None, role=None, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#

Updates the existing monitoring schedule.

Parameters:
  • endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput.

  • record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.

  • post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.

  • output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files.

  • statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an S3 uri pointing to a statistics JSON file.

  • constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an S3 uri pointing to a constraints JSON file.

  • schedule_cron_expression (str) – The cron expression that dictates the frequency that this job runs at. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions.

  • instance_count (int) – The number of instances to run the jobs with.

  • instance_type (str) – Type of EC2 instance to use for the job, for example, ‘ml.m5.xlarge’.

  • volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).

  • volume_kms_key (str) – A KMS key for the job’s volume.

  • output_kms_key (str) – The KMS key id for the job’s outputs.

  • max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600

  • env (dict) – Environment variables to be passed to the job.

  • network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.

  • enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.

  • role (str) – An AWS IAM role name or ARN. The Amazon SageMaker jobs use this role.

  • batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)

  • data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

  • data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

class sagemaker.core.model_monitor.model_monitoring.EndpointInput(endpoint_name, destination, s3_input_mode='File', s3_data_distribution_type='FullyReplicated', start_time_offset=None, end_time_offset=None, features_attribute=None, inference_attribute=None, probability_attribute=None, probability_threshold_attribute=None, exclude_features_attribute=None)[source]#

Bases: object

Accepts parameters that specify an endpoint input for monitoring execution.

It also provides a method to turn those parameters into a dictionary.

class sagemaker.core.model_monitor.model_monitoring.ModelMonitor(role=None, image_uri=None, instance_count=1, instance_type='ml.m5.xlarge', entrypoint=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)[source]#

Bases: object

Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions.

Use this class when you want to provide your own container image containing the code you’d like to run, in order to produce your own statistics and constraint validation files. For a more guided experience, consider using the DefaultModelMonitor class instead.

classmethod attach(monitor_schedule_name, sagemaker_session=None)[source]#

Sets this object’s schedule name to point to the Amazon SageMaker Monitoring Schedule name.

This allows subsequent describe_schedule or list_executions calls to point to the given schedule.

Parameters:
  • monitor_schedule_name (str) – The name of the schedule to attach to.

  • sagemaker_session (sagemaker.core.helper.session_helper.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.

baseline_statistics(file_name='statistics.json')[source]#

Returns a Statistics object representing the statistics JSON file generated by the latest baselining job.

Parameters:

file_name (str) – The name of the .json statistics file

Returns:

The Statistics object representing the file that was generated by the job.

Return type:

sagemaker.model_monitor.Statistics

create_monitoring_schedule(endpoint_input=None, output=None, statistics=None, constraints=None, monitor_schedule_name=None, schedule_cron_expression=None, batch_transform_input=None, arguments=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#

Creates a monitoring schedule to monitor an Amazon SageMaker Endpoint.

If constraints and statistics are provided, or if they are able to be retrieved from a previous baselining job associated with this monitor, those will be used. If constraints and statistics cannot be automatically retrieved, baseline_inputs will be required in order to kick off a baselining job.

Parameters:
  • endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput. (default: None)

  • output (sagemaker.model_monitor.MonitoringOutput) – The output of the monitoring schedule. (default: None)

  • statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an S3 uri pointing to a statistics JSON file. (default: None)

  • constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an S3 uri pointing to a constraints JSON file. (default: None)

  • monitor_schedule_name (str) – Schedule name. If not specified, the processor generates a default job name, based on the image name and current timestamp. (default: None)

  • schedule_cron_expression (str) – The cron expression that dictates the frequency at which this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. If not specified (default: None), the schedule runs daily.

  • batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)

  • arguments ([str]) – A list of string arguments to be passed to a processing job.

  • data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

  • data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

delete_monitoring_schedule()[source]#

Deletes the monitoring schedule (subclass is responsible for deleting job definition)

describe_latest_baselining_job()[source]#

Describe the latest baselining job kicked off by the suggest workflow.

describe_schedule()[source]#

Describes the schedule that this object represents.

Returns:

A dictionary response with the monitoring schedule description.

Return type:

dict

get_latest_execution_logs(wait=False)[source]#

Get the processing job logs for the most recent monitoring execution

Parameters:

wait (bool) – Whether the call should wait until the job completes (default: False).

Raises:

ValueError – If no execution job or processing job for the last execution has run

Returns: None
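Because get_latest_execution_logs raises ValueError when there is no execution to read logs from, a caller may want to treat that case as "nothing to show" rather than an error. The wrapper below is a minimal sketch of that pattern. The function name is hypothetical, and `monitor` is assumed to be any object exposing the method documented above.

```python
def try_get_latest_execution_logs(monitor, wait=False) -> bool:
    """Fetch the latest execution logs, returning False if none exist.

    Illustrative wrapper, not part of the SDK.
    """
    try:
        monitor.get_latest_execution_logs(wait=wait)
    except ValueError:
        # Documented case: no execution or processing job has run yet.
        return False
    return True
```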

latest_monitoring_constraint_violations(file_name='constraint_violations.json')[source]#

Returns the sagemaker.model_monitor.ConstraintViolations generated by the latest monitoring execution.

Parameters:

file_name (str) – The name of the constraint violations file to be retrieved. Only override if generating a custom file name.

Returns:

The ConstraintViolations object representing the file generated by the latest monitoring execution.

Return type:

sagemaker.model_monitoring.ConstraintViolations

latest_monitoring_statistics(file_name='statistics.json')[source]#

Returns the sagemaker.model_monitor.Statistics generated by the latest monitoring execution.

Parameters:

file_name (str) – The name of the statistics file to be retrieved. Only override if generating a custom file name.

Returns:

The Statistics object representing the file generated by the latest monitoring execution.

Return type:

sagemaker.model_monitoring.Statistics

list_executions()[source]#

Get the list of the latest monitoring executions in descending order of “ScheduledTime”.

Statistics or violations can be retrieved as in the following example:

>>> my_executions = my_monitor.list_executions()
>>> second_to_last_execution_statistics = my_executions[-1].statistics()
>>> second_to_last_execution_violations = my_executions[-1].constraint_violations()

Returns:

List of MonitoringExecutions in descending order of “ScheduledTime”.

Return type:

[sagemaker.model_monitor.MonitoringExecution]

list_monitoring_alert_history(monitoring_alert_name: str | None = None, sort_by: str | None = 'CreationTime', sort_order: str | None = 'Descending', next_token: str | None = None, max_results: int | None = 10, creation_time_before: str | None = None, creation_time_after: str | None = None, status_equals: str | None = None)[source]#

Lists the alert history associated with the given schedule_name and alert_name.

Parameters:
  • monitoring_alert_name (Optional[str]) – The name of the alert_name to filter on. If not provided, does not filter on it. Default: None.

  • sort_by (Optional[str]) – The field to sort by. Can be one of: “Name”, “CreationTime”. Default: “CreationTime”.

  • sort_order (Optional[str]) – The sort order. Can be one of: “Ascending”, “Descending”. Default: “Descending”.

  • next_token (Optional[str]) – The pagination token. Default: None.

  • max_results (Optional[int]) – The maximum number of results to return. Must be between 1 and 100. Default: 10.

  • creation_time_before (Optional[str]) – Only return alert history created before this time. Default: None.

  • creation_time_after (Optional[str]) – Only return alert history created after this time. Default: None.

  • status_equals (Optional[str]) – Only return alert history with this status. Default: None.

Returns:

A list of monitoring alert history summaries, together with the next pagination token (str).

Return type:

List[MonitoringAlertHistorySummary]
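The next_token and max_results parameters imply the usual pagination loop. The generator below is a sketch of that loop under one stated assumption: that list_monitoring_alert_history returns a (summaries, next_token) pair, as the Returns description suggests. The name `iter_alert_history` is hypothetical and not part of the SDK.

```python
def iter_alert_history(monitor, page_size=10, **filters):
    """Yield alert history summaries across all pages.

    Assumes `monitor.list_monitoring_alert_history` returns a
    (summaries, next_token) pair; extra keyword filters such as
    status_equals are passed through unchanged.
    """
    # Because this is a generator, the check runs on first iteration.
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    next_token = None
    while True:
        summaries, next_token = monitor.list_monitoring_alert_history(
            next_token=next_token, max_results=page_size, **filters
        )
        yield from summaries
        if not next_token:
            # No further pages to request.
            break
```

Using a generator means later pages are requested only when the caller keeps iterating.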

list_monitoring_alerts(next_token: str | None = None, max_results: int | None = 10)[source]#

List the monitoring alerts.

Parameters:
  • next_token (Optional[str]) – The pagination token. Default: None

  • max_results (Optional[int]) – The maximum number of results to return. Must be between 1 and 100. Default: 10.

Returns:

A list of monitoring alert summaries, together with the next pagination token (str).

Return type:

List[MonitoringAlertSummary]

classmethod monitoring_type()[source]#

Type of the monitoring job.

run_baseline(baseline_inputs, output, arguments=None, wait=True, logs=True, job_name=None)[source]#

Run a processing job meant to baseline your dataset.

Parameters:
  • baseline_inputs ([sagemaker.processing.ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects.

  • output (sagemaker.processing.ProcessingOutput) – Destination of the constraint_violations and statistics json files.

  • arguments ([str]) – A list of string arguments to be passed to a processing job.

  • wait (bool) – Whether the call should wait until the job completes (default: True).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).

  • job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the image name and current timestamp.

start_monitoring_schedule()[source]#

Starts the monitoring schedule.

stop_monitoring_schedule()[source]#

Stops the monitoring schedule.

suggested_constraints(file_name='constraints.json')[source]#

Returns a Constraints object representing the constraints JSON file generated by the latest baselining job.

Parameters:

file_name (str) – The name of the .json constraints file

Returns:

The Constraints object representing the file that was generated by the job.

Return type:

sagemaker.model_monitor.Constraints

update_monitoring_alert(monitoring_alert_name: str, data_points_to_alert: int | None, evaluation_period: int | None)[source]#

Update the monitoring schedule alert.

Parameters:
  • monitoring_alert_name (str) – The name of the monitoring alert to update.

  • data_points_to_alert (int) – The data point to alert.

  • evaluation_period (int) – The period to evaluate the alert status.

Returns: None

update_monitoring_schedule(endpoint_input=None, output=None, statistics=None, constraints=None, schedule_cron_expression=None, instance_count=None, instance_type=None, entrypoint=None, volume_size_in_gb=None, volume_kms_key=None, output_kms_key=None, arguments=None, max_runtime_in_seconds=None, env=None, network_config=None, role=None, image_uri=None, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#

Updates the existing monitoring schedule.

If more options than schedule_cron_expression are to be updated, a new job definition will be created to hold them. The old job definition will not be deleted.

Parameters:
  • endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput.

  • output (sagemaker.model_monitor.MonitoringOutput) – The output of the monitoring schedule.

  • statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an S3 uri pointing to a statistics JSON file.

  • constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an S3 uri pointing to a constraints JSON file.

  • schedule_cron_expression (str) – The cron expression that dictates the frequency that this job runs at. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions.

  • instance_count (int) – The number of instances to run the jobs with.

  • instance_type (str) – Type of EC2 instance to use for the job, for example, ‘ml.m5.xlarge’.

  • entrypoint (str) – The entrypoint for the job.

  • volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).

  • volume_kms_key (str) – A KMS key for the job’s volume.

  • output_kms_key (str) – The KMS key id for the job’s outputs.

  • arguments ([str]) – A list of string arguments to be passed to a processing job.

  • max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600

  • env (dict) – Environment variables to be passed to the job.

  • network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.

  • role (str) – An AWS IAM role name or ARN. The Amazon SageMaker jobs use this role.

  • image_uri (str) – The uri of the image to use for the jobs started by the Monitor.

  • batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)

  • data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

  • data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
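The rule stated for this method, that changing anything beyond schedule_cron_expression creates a new job definition, can be expressed as a small predicate. The sketch below restates that rule for planning purposes. It is not the SDK's own check, the function name is hypothetical, and it assumes that a value of None means "leave unchanged".

```python
def needs_new_job_definition(**updates) -> bool:
    """Return True if the updates go beyond schedule_cron_expression.

    Keyword names are expected to match the update_monitoring_schedule
    parameters documented above; None values are treated as unchanged.
    """
    changed = {name for name, value in updates.items() if value is not None}
    return bool(changed - {"schedule_cron_expression"})
```

Since the old job definition is not deleted, a predicate like this can flag updates that will leave an extra job definition behind.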

class sagemaker.core.model_monitor.model_monitoring.ModelQualityMonitor(role=None, instance_count=1, instance_type='ml.m5.xlarge', volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)[source]#

Bases: ModelMonitor

Amazon SageMaker model monitor to monitor quality metrics for an endpoint.

Please see the __init__ method of its base class for how to instantiate it.

JOB_DEFINITION_BASE_NAME = 'model-quality-job-definition'#

classmethod attach(monitor_schedule_name, sagemaker_session=None)[source]#

Sets this object’s schedule name to the name provided.

This allows subsequent describe_schedule or list_executions calls to point to the given schedule.

Parameters:
  • monitor_schedule_name (str) – The name of the schedule to attach to.

  • sagemaker_session (sagemaker.core.helper.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.

create_monitoring_schedule(endpoint_input=None, ground_truth_input=None, problem_type=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, constraints=None, monitor_schedule_name=None, schedule_cron_expression=None, enable_cloudwatch_metrics=True, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#

Creates a monitoring schedule.

Parameters:
  • endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput. (default: None)

  • ground_truth_input (str) – S3 URI to ground truth dataset. (default: None)

  • problem_type (str) – The type of problem of this model quality monitoring. Valid values are “Regression”, “BinaryClassification”, “MulticlassClassification”. (default: None)

  • record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.

  • post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.

  • output_s3_uri (str) – S3 destination of the constraint_violations and analysis result. Default: “s3://<default_session_bucket>/<job_name>/output”

  • constraints (sagemaker.model_monitor.Constraints or str) – If provided it will be used for monitoring the endpoint. It can be a Constraints object or an S3 uri pointing to a constraints JSON file.

  • monitor_schedule_name (str) – Schedule name. If not specified, the processor generates a default job name, based on the image name and current timestamp.

  • schedule_cron_expression (str) – The cron expression that dictates the frequency at which this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. Default: Daily.

  • enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.

  • batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform

  • data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

  • data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
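Since problem_type accepts only three documented values, validating it before calling the SDK can surface typos early. The following is a minimal sketch. The constant and function names are hypothetical, and the SDK may perform its own validation independently.

```python
# The three values listed in the problem_type description above.
VALID_PROBLEM_TYPES = (
    "Regression",
    "BinaryClassification",
    "MulticlassClassification",
)


def check_problem_type(problem_type: str) -> str:
    """Return `problem_type` unchanged if valid, else raise ValueError."""
    if problem_type not in VALID_PROBLEM_TYPES:
        raise ValueError(
            f"problem_type must be one of {VALID_PROBLEM_TYPES}, "
            f"got {problem_type!r}"
        )
    return problem_type
```

The comparison is case-sensitive, so a value such as "regression" is rejected.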

delete_monitoring_schedule()[source]#

Deletes the monitoring schedule and its job definition.

classmethod monitoring_type()[source]#

Type of the monitoring job.

suggest_baseline(baseline_dataset, dataset_format, problem_type, inference_attribute=None, probability_attribute=None, ground_truth_attribute=None, probability_threshold_attribute=None, post_analytics_processor_script=None, output_s3_uri=None, wait=False, logs=False, job_name=None)[source]#

Suggest baselines for use with Amazon SageMaker Model Monitoring Schedules.

Parameters:
  • baseline_dataset (str) – The path to the baseline_dataset file. This can be a local path or an S3 uri.

  • dataset_format (dict) – The format of the baseline_dataset.

  • problem_type (str) – The type of problem of this model quality monitoring. Valid values are “Regression”, “BinaryClassification”, “MulticlassClassification”.

  • inference_attribute (str) – Index or JSONpath to locate predicted label(s). Only used for ModelQualityMonitor.

  • probability_attribute (str or int) – Index or JSONpath to locate probabilities. Only used for ModelQualityMonitor.

  • ground_truth_attribute (str) – Index to locate actual label(s). Only used for ModelQualityMonitor.

  • probability_threshold_attribute (float) – Threshold used to convert probabilities to binary labels. Only used for ModelQualityMonitor.

  • post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.

  • output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics JSON files. Default: “s3://<default_session_bucket>/<job_name>/output”

  • wait (bool) – Whether the call should wait until the job completes (default: False).

  • logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: False).

  • job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the image name and current timestamp.

Returns:

The ProcessingJob object representing the baselining job.

Return type:

sagemaker.processing.ProcessingJob
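
To picture what probability_threshold_attribute means for a binary classifier, the sketch below maps predicted probabilities to 0/1 labels. The function to_binary_labels is hypothetical and is not how the monitoring container is known to work. Treating a probability equal to the threshold as positive is an assumption of this sketch.

```python
from typing import List


def to_binary_labels(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Map each probability to 1 if it reaches the threshold, else 0."""
    # Assumption of this sketch: a probability equal to the threshold counts as positive.
    return [1 if p >= threshold else 0 for p in probabilities]


labels = to_binary_labels([0.1, 0.5, 0.92], threshold=0.5)
```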

update_monitoring_schedule(endpoint_input=None, ground_truth_input=None, problem_type=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, constraints=None, schedule_cron_expression=None, enable_cloudwatch_metrics=None, role=None, instance_count=None, instance_type=None, volume_size_in_gb=None, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, env=None, network_config=None, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#

Updates the existing monitoring schedule.

If more options than schedule_cron_expression are to be updated, a new job definition will be created to hold them. The old job definition will not be deleted.

Parameters:
  • endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput.

  • ground_truth_input (str) – S3 URI to ground truth dataset.

  • problem_type (str) – The type of problem of this model quality monitoring. Valid values are “Regression”, “BinaryClassification”, “MulticlassClassification”.

  • record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.

  • post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.

  • output_s3_uri (str) – S3 destination of the constraint_violations and analysis result. Default: “s3://<default_session_bucket>/<job_name>/output”

  • constraints (sagemaker.model_monitor.Constraints or str) – If provided it will be used for monitoring the endpoint. It can be a Constraints object or an S3 uri pointing to a constraints JSON file.

  • schedule_cron_expression (str) – The cron expression that dictates how frequently this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. Default: Daily.

  • enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.

  • role (str) – An AWS IAM role. The Amazon SageMaker jobs use this role.

  • instance_count (int) – The number of instances to run the jobs with.

  • instance_type (str) – Type of EC2 instance to use for the job, for example, ‘ml.m5.xlarge’.

  • volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).

  • volume_kms_key (str) – A KMS key for the job’s volume.

  • output_kms_key (str) – The KMS key id for the job’s outputs.

  • max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600

  • env (dict) – Environment variables to be passed to the job.

  • network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.

  • batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform.

  • data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)

  • data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
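
The rule stated above, that updating anything beyond schedule_cron_expression creates a new job definition, can be sketched as a simple check. The helper needs_new_job_definition is hypothetical. It only mirrors the documented behaviour and is not the SDK's actual logic. The cron strings are illustrative values.

```python
from typing import Any


def needs_new_job_definition(**updates: Any) -> bool:
    """Return True if any option other than schedule_cron_expression is set."""
    return any(
        value is not None
        for name, value in updates.items()
        if name != "schedule_cron_expression"
    )


# Only the cron expression changes: the existing job definition can be kept.
cron_only = needs_new_job_definition(schedule_cron_expression="cron(0 * ? * * *)")

# An instance type is also supplied: a new job definition would be needed.
with_instance = needs_new_job_definition(
    schedule_cron_expression="cron(0 * ? * * *)", instance_type="ml.m5.xlarge"
)
```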

class sagemaker.core.model_monitor.model_monitoring.MonitoringExecution(sagemaker_session, job_name, inputs, output, output_kms_key=None)[source]#

Bases: ProcessingJob

Provides functionality to retrieve monitoring-specific files from monitoring executions.

constraint_violations(file_name='constraint_violations.json', kms_key=None)[source]#

Returns a sagemaker.model_monitor.ConstraintViolations object representing the constraint violations JSON file generated by this monitoring execution.

Parameters:
  • file_name (str) – The name of the json-formatted constraint violations file.

  • kms_key (str) – The kms key to use when retrieving the file.

Returns:

The ConstraintViolations object representing the file that was generated by the monitoring execution.

Return type:

sagemaker.model_monitor.ConstraintViolations

Raises:

UnexpectedStatusException – This is thrown if the job is not in a ‘Complete’ state.
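
As a rough sketch of the behaviour described above, the snippet below checks the job status before parsing a violations document. Everything in it is an assumption made for illustration: the helper load_violations is hypothetical, RuntimeError stands in for UnexpectedStatusException, “Completed” is assumed to be the success status, and the JSON layout with a top-level “violations” list is assumed rather than taken from this page.

```python
import json
from typing import Dict, List


def load_violations(job_status: str, raw_json: str) -> List[Dict[str, str]]:
    """Parse a constraint violations document once the job has finished."""
    # Assumption: "Completed" is the terminal success status of the job.
    if job_status != "Completed":
        # RuntimeError stands in for the SDK's UnexpectedStatusException.
        raise RuntimeError(f"job is in state {job_status!r}, not 'Completed'")
    # Assumption: the file holds a top-level "violations" list.
    return json.loads(raw_json).get("violations", [])


# Illustrative document, not real job output.
sample = json.dumps(
    {
        "violations": [
            {
                "feature_name": "age",
                "constraint_check_type": "data_type_check",
                "description": "illustrative entry, not real job output",
            }
        ]
    }
)
violations = load_violations("Completed", sample)
```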

describe()[source]#

Describe the processing job.

classmethod from_processing_arn(sagemaker_session, processing_job_arn)[source]#

Initializes a MonitoringExecution from a processing job ARN.

Parameters:
  • processing_job_arn (str) – ARN of the processing job to create a MonitoringExecution out of.

  • sagemaker_session (sagemaker.core.helper.session_helper.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.

Returns:

The instance of ProcessingJob created using the current job name.

Return type:

sagemaker.model_monitor.MonitoringExecution
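
Because the returned object is created from the job name, the name has to be recovered from the ARN first. The sketch below shows one way to do that, assuming the usual processing-job ARN layout. The helper job_name_from_arn is hypothetical and is not claimed to match the SDK's implementation. The ARN in the example is made up.

```python
def job_name_from_arn(processing_job_arn: str) -> str:
    """Extract the job name from a processing job ARN."""
    # ARN layout: arn:<partition>:sagemaker:<region>:<account>:processing-job/<name>
    parts = processing_job_arn.split(":", 5)
    prefix = "processing-job/"
    if len(parts) < 6 or not parts[5].startswith(prefix):
        raise ValueError(f"not a processing job ARN: {processing_job_arn!r}")
    return parts[5][len(prefix):]


# Made-up ARN used purely for illustration.
name = job_name_from_arn(
    "arn:aws:sagemaker:us-west-2:123456789012:processing-job/my-monitoring-job"
)
```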

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property output#

Get the first output from processing_output_config.

property outputs#

Get all outputs from processing_output_config.

statistics(file_name='statistics.json', kms_key=None)[source]#

Returns a sagemaker.model_monitor.Statistics object representing the statistics JSON file generated by this monitoring execution.

Parameters:
  • file_name (str) – The name of the json-formatted statistics file.

  • kms_key (str) – The kms key to use when retrieving the file.

Returns:

The Statistics object representing the file that was generated by the execution.

Return type:

sagemaker.model_monitor.Statistics

Raises:

UnexpectedStatusException – This is thrown if the job is not in a ‘Complete’ state.

class sagemaker.core.model_monitor.model_monitoring.MonitoringInput(start_time_offset: str, end_time_offset: str, features_attribute: str, inference_attribute: str, probability_attribute: str | int, probability_threshold_attribute: float)[source]#

Bases: object

Accepts parameters specifying batch transform or endpoint inputs for monitoring execution.

MonitoringInput accepts parameters that specify additional parameters while monitoring jobs. It also provides a method to turn those parameters into a dictionary.

Parameters:
  • start_time_offset (str) – Monitoring start time offset, e.g. “-PT1H”

  • end_time_offset (str) – Monitoring end time offset, e.g. “-PT0H”.

  • features_attribute (str) – JSONpath to locate features in a JSON Lines dataset. Only used for ModelBiasMonitor and ModelExplainabilityMonitor.

  • inference_attribute (str) – Index or JSONpath to locate predicted label(s). Only used for ModelQualityMonitor, ModelBiasMonitor, and ModelExplainabilityMonitor.

  • probability_attribute (str or int) – Index or JSONpath to locate probabilities. Only used for ModelQualityMonitor, ModelBiasMonitor, and ModelExplainabilityMonitor.

  • probability_threshold_attribute (float) – Threshold used to convert probabilities to binary labels. Only used for ModelQualityMonitor, ModelBiasMonitor, and ModelExplainabilityMonitor.
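
The idea of holding these parameters and turning them into a dictionary can be sketched with a plain dataclass. MonitoringInputSketch is a hypothetical stand-in, not the SDK class. The snake_case keys, the optional fields, and the rule of dropping unset values are all assumptions of this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class MonitoringInputSketch:
    """Hypothetical stand-in for MonitoringInput; not the SDK class."""

    start_time_offset: Optional[str] = None
    end_time_offset: Optional[str] = None
    features_attribute: Optional[str] = None
    inference_attribute: Optional[str] = None
    probability_attribute: Optional[Union[str, int]] = None
    probability_threshold_attribute: Optional[float] = None

    def to_dict(self) -> Dict[str, Any]:
        """Return only the parameters that were actually set."""
        return {key: value for key, value in asdict(self).items() if value is not None}


params = MonitoringInputSketch(
    start_time_offset="-PT1H",
    end_time_offset="-PT0H",
    probability_attribute=0,
    probability_threshold_attribute=0.5,
).to_dict()
```

The comparison against None rather than a truthiness test keeps a column index of 0 in the result.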

end_time_offset: str#
features_attribute: str#
inference_attribute: str#
probability_attribute: str | int#
probability_threshold_attribute: float#
start_time_offset: str#

class sagemaker.core.model_monitor.model_monitoring.MonitoringOutput(source, destination=None, s3_upload_mode='Continuous')[source]#

Bases: object

Accepts parameters that specify an S3 output for a monitoring job.

It also provides a method to turn those parameters into a dictionary.