sagemaker.core.model_monitor.model_monitoring#
This module contains code related to Amazon SageMaker Model Monitoring.
These classes assist with suggesting baselines and creating monitoring schedules for data captured by SageMaker Endpoints.
Classes
| BaseliningJob | Provides functionality to retrieve baseline-specific files output from a baselining job. |
| BatchTransformInput | Accepts parameters that specify a batch transform input for a monitoring schedule. |
| DefaultModelMonitor | Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions. |
| EndpointInput | Accepts parameters that specify an endpoint input for monitoring execution. |
| ModelMonitor | Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions. |
| ModelQualityMonitor | Amazon SageMaker model monitor to monitor quality metrics for an endpoint. |
| MonitoringExecution | Provides functionality to retrieve monitoring-specific files from monitoring executions. |
| MonitoringInput | Accepts parameters specifying batch transform or endpoint inputs for monitoring execution. |
| MonitoringOutput | Accepts parameters that specify an S3 output for a monitoring job. |
- class sagemaker.core.model_monitor.model_monitoring.BaseliningJob(sagemaker_session, job_name, inputs, outputs, output_kms_key=None)[source]#
Bases: object
Provides functionality to retrieve baseline-specific files output from a baselining job.
- baseline_statistics(file_name='statistics.json', kms_key=None)[source]#
Returns a sagemaker.model_monitor.Statistics object representing the statistics JSON file generated by this baselining job.
- Parameters:
file_name (str) – The name of the JSON-formatted statistics file.
kms_key (str) – The KMS key to use when retrieving the file.
- Returns:
The Statistics object representing the file that was generated by the job.
- Return type:
sagemaker.model_monitor.Statistics
- Raises:
UnexpectedStatusException – Raised if the job is not in the ‘Completed’ state.
- classmethod from_processing_job(processing_job)[source]#
Initializes a BaseliningJob from a processing job.
- Parameters:
processing_job (sagemaker.processing.ProcessingJob) – The ProcessingJob instance used for baselining.
- Returns:
The BaseliningJob instance created using the current job name.
- Return type:
sagemaker.model_monitor.BaseliningJob
- suggested_constraints(file_name='constraints.json', kms_key=None)[source]#
Returns a sagemaker.model_monitor.Constraints object representing the constraints JSON file generated by this baselining job.
- Parameters:
file_name (str) – The name of the JSON-formatted constraints file.
kms_key (str) – The KMS key to use when retrieving the file.
- Returns:
The Constraints object representing the file that was generated by the job.
- Return type:
sagemaker.model_monitor.Constraints
- Raises:
UnexpectedStatusException – Raised if the job is not in the ‘Completed’ state.
- class sagemaker.core.model_monitor.model_monitoring.BatchTransformInput(data_captured_destination_s3_uri: str, destination: str, dataset_format: MonitoringDatasetFormat, s3_input_mode: str = 'File', s3_data_distribution_type: str = 'FullyReplicated', start_time_offset: str | None = None, end_time_offset: str | None = None, features_attribute: str | None = None, inference_attribute: str | None = None, probability_attribute: str | None = None, probability_threshold_attribute: str | None = None, exclude_features_attribute: str | None = None)[source]#
Bases: MonitoringInput
Accepts parameters that specify a batch transform input for a monitoring schedule.
It also provides a method to turn those parameters into a dictionary.
- class sagemaker.core.model_monitor.model_monitoring.DefaultModelMonitor(role=None, instance_count=1, instance_type='ml.m5.xlarge', volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)[source]#
Bases: ModelMonitor
Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions.
Use this class when you want to utilize Amazon SageMaker Monitoring’s plug-and-play solution that only requires your dataset and optional pre/postprocessing scripts. For a more customized experience, consider using the ModelMonitor class instead.
- JOB_DEFINITION_BASE_NAME = 'data-quality-job-definition'#
- classmethod attach(monitor_schedule_name, sagemaker_session=None)[source]#
Sets this object’s schedule name to the name provided.
This allows subsequent describe_schedule or list_executions calls to point to the given schedule.
- Parameters:
monitor_schedule_name (str) – The name of the schedule to attach to.
sagemaker_session (sagemaker.core.helper.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
- create_monitoring_schedule(endpoint_input=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, constraints=None, statistics=None, monitor_schedule_name=None, schedule_cron_expression=None, enable_cloudwatch_metrics=True, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#
Creates a monitoring schedule to monitor an Amazon SageMaker Endpoint.
If constraints and statistics are provided, or if they are able to be retrieved from a previous baselining job associated with this monitor, those will be used. If constraints and statistics cannot be automatically retrieved, baseline_inputs will be required in order to kick off a baselining job.
- Parameters:
endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput. (default: None)
record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.
post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.
output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files. Default: “s3://<default_session_bucket>/<job_name>/output”
constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an s3_uri pointing to a constraints JSON file.
statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an s3_uri pointing to a statistics JSON file.
monitor_schedule_name (str) – Schedule name. If not specified, the processor generates a default job name, based on the image name and current timestamp.
schedule_cron_expression (str) – The cron expression that dictates how often this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. Default: Daily.
enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.
batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)
data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
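The data_analysis_start_time and data_analysis_end_time values are offsets written in the same style as the “-PT1H” example. A minimal sketch of building such strings follows; hours_before_now is a hypothetical helper, not part of the SDK, and it assumes whole-hour offsets in the ISO 8601 duration style.

```python
def hours_before_now(hours: int) -> str:
    """Return an offset string such as "-PT1H" for a whole number of hours.

    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return f"-PT{hours}H"


# Keyword arguments for an analysis window that starts three hours before
# the run and ends one hour before it.
window = {
    "data_analysis_start_time": hours_before_now(3),
    "data_analysis_end_time": hours_before_now(1),
}
```

The resulting dictionary could be unpacked into create_monitoring_schedule alongside the other arguments.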
- latest_monitoring_constraint_violations()[source]#
Returns the sagemaker.model_monitor.ConstraintViolations generated by the latest monitoring execution.
- Returns:
The ConstraintViolations object representing the file generated by the latest monitoring execution.
- Return type:
sagemaker.model_monitor.ConstraintViolations
- latest_monitoring_statistics()[source]#
Returns the sagemaker.model_monitor.Statistics.
These are the statistics generated by the latest monitoring execution.
- Returns:
The Statistics object representing the file generated by the latest monitoring execution.
- Return type:
sagemaker.model_monitor.Statistics
- run_baseline()[source]#
Not implemented.
run_baseline() is only supported for ModelMonitor objects. For DefaultModelMonitor objects, use suggest_baseline instead.
- Raises:
NotImplementedError – Always; DefaultModelMonitor does not support this method.
- suggest_baseline(baseline_dataset, dataset_format, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, wait=True, logs=True, job_name=None, monitoring_config_override=None)[source]#
Suggest baselines for use with Amazon SageMaker Model Monitoring Schedules.
- Parameters:
baseline_dataset (str) – The path to the baseline_dataset file. This can be a local path or an S3 uri.
dataset_format (dict) – The format of the baseline_dataset.
record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.
post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.
output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files. Default: “s3://<default_session_bucket>/<job_name>/output”
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the image name and current timestamp.
monitoring_config_override (DataQualityMonitoringConfig) – A monitoring_config object that overrides the global monitoring_config parameter of the constraints suggested by the Model Monitor container. If not specified, the values suggested by the container are used.
- Returns:
The ProcessingJob object representing the baselining job.
- Return type:
sagemaker.processing.ProcessingJob
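A call to suggest_baseline could be wrapped as in the sketch below. The helper name start_baseline_suggestion is hypothetical, and the dataset_format dictionary is an assumed shape for a CSV file with a header row; only the keyword names come from the signature documented here.

```python
def start_baseline_suggestion(monitor, baseline_s3_uri: str, output_s3_uri: str):
    """Start a baselining job without blocking and return the call's result.

    `monitor` is assumed to be a DefaultModelMonitor, or any object with the
    same suggest_baseline signature. Hypothetical helper, not part of the SDK.
    """
    return monitor.suggest_baseline(
        baseline_dataset=baseline_s3_uri,
        # Assumed dict shape for a CSV dataset that has a header row.
        dataset_format={"csv": {"header": True}},
        output_s3_uri=output_s3_uri,
        wait=False,  # return immediately instead of waiting for completion
        logs=False,  # logs are only meaningful when wait is True
    )
```

Passing wait=False leaves the caller responsible for checking that the job has finished before reading its statistics or constraints.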
- update_monitoring_schedule(endpoint_input=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, statistics=None, constraints=None, schedule_cron_expression=None, instance_count=None, instance_type=None, volume_size_in_gb=None, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, env=None, network_config=None, enable_cloudwatch_metrics=None, role=None, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#
Updates the existing monitoring schedule.
- Parameters:
endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput.
record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.
post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.
output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files.
statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an S3 uri pointing to a statistics JSON file.
constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an S3 uri pointing to a constraints JSON file.
schedule_cron_expression (str) – The cron expression that dictates the frequency that this job runs at. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions.
instance_count (int) – The number of instances to run the jobs with.
instance_type (str) – Type of EC2 instance to use for the job, for example, ‘ml.m5.xlarge’.
volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key (str) – A KMS key for the job’s volume.
output_kms_key (str) – The KMS key id for the job’s outputs.
max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600
env (dict) – Environment variables to be passed to the job.
network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.
role (str) – An AWS IAM role name or ARN. The Amazon SageMaker jobs use this role.
batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)
data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
- class sagemaker.core.model_monitor.model_monitoring.EndpointInput(endpoint_name, destination, s3_input_mode='File', s3_data_distribution_type='FullyReplicated', start_time_offset=None, end_time_offset=None, features_attribute=None, inference_attribute=None, probability_attribute=None, probability_threshold_attribute=None, exclude_features_attribute=None)[source]#
Bases: object
Accepts parameters that specify an endpoint input for monitoring execution.
It also provides a method to turn those parameters into a dictionary.
- class sagemaker.core.model_monitor.model_monitoring.ModelMonitor(role=None, image_uri=None, instance_count=1, instance_type='ml.m5.xlarge', entrypoint=None, volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)[source]#
Bases: object
Sets up Amazon SageMaker Monitoring Schedules and baseline suggestions.
Use this class when you want to provide your own container image containing the code you’d like to run, in order to produce your own statistics and constraint validation files. For a more guided experience, consider using the DefaultModelMonitor class instead.
- classmethod attach(monitor_schedule_name, sagemaker_session=None)[source]#
Sets this object’s schedule name to point to the Amazon SageMaker Monitoring Schedule name.
This allows subsequent describe_schedule or list_executions calls to point to the given schedule.
- Parameters:
monitor_schedule_name (str) – The name of the schedule to attach to.
sagemaker_session (sagemaker.core.helper.session_helper.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
- baseline_statistics(file_name='statistics.json')[source]#
Returns a Statistics object representing the statistics JSON file generated by the latest baselining job.
- Parameters:
file_name (str) – The name of the .json statistics file.
- Returns:
The Statistics object representing the file that was generated by the job.
- Return type:
sagemaker.model_monitor.Statistics
- create_monitoring_schedule(endpoint_input=None, output=None, statistics=None, constraints=None, monitor_schedule_name=None, schedule_cron_expression=None, batch_transform_input=None, arguments=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#
Creates a monitoring schedule to monitor an Amazon SageMaker Endpoint.
If constraints and statistics are provided, or if they are able to be retrieved from a previous baselining job associated with this monitor, those will be used. If constraints and statistics cannot be automatically retrieved, baseline_inputs will be required in order to kick off a baselining job.
- Parameters:
endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput. (default: None)
output (sagemaker.model_monitor.MonitoringOutput) – The output of the monitoring schedule. (default: None)
statistics (sagemaker.model_monitor.Statistics or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an S3 uri pointing to a statistics JSON file. (default: None)
constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an S3 uri pointing to a constraints JSON file. (default: None)
monitor_schedule_name (str) – Schedule name. If not specified, the processor generates a default job name, based on the image name and current timestamp. (default: None)
schedule_cron_expression (str) – The cron expression that dictates how often this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. If not specified (default: None), the schedule runs daily.
batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)
arguments ([str]) – A list of string arguments to be passed to a processing job.
data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
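For schedule_cron_expression, the sketch below shows what such strings might look like. Both helpers are hypothetical, and the cron(minute hour day-of-month month day-of-week year) layout is an assumption that should be checked against sagemaker.model_monitor.CronExpressionGenerator, which remains the reference for valid expressions.

```python
def daily_cron(hour: int = 0) -> str:
    """Build an expression that fires once a day at the given hour.

    Hypothetical helper; the cron(...) layout is an assumption.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return f"cron(0 {hour} ? * * *)"


def hourly_cron() -> str:
    """Build an expression that fires at the top of every hour (assumed layout)."""
    return "cron(0 * ? * * *)"
```

In practice, the generator class is the safer choice because it only produces expressions the service accepts.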
- delete_monitoring_schedule()[source]#
Deletes the monitoring schedule. Subclasses are responsible for deleting the job definition.
- describe_latest_baselining_job()[source]#
Describe the latest baselining job kicked off by the suggest workflow.
- describe_schedule()[source]#
Describes the schedule that this object represents.
- Returns:
A dictionary response with the monitoring schedule description.
- Return type:
dict
- get_latest_execution_logs(wait=False)[source]#
Gets the processing job logs for the most recent monitoring execution.
- Parameters:
wait (bool) – Whether the call should wait until the job completes (default: False).
- Raises:
ValueError – If no execution job or processing job for the last execution has run.
- Returns:
None
- latest_monitoring_constraint_violations(file_name='constraint_violations.json')[source]#
Returns the sagemaker.model_monitor.ConstraintViolations generated by the latest monitoring execution.
- Parameters:
file_name (str) – The name of the constraint violations file to be retrieved. Only override if generating a custom file name.
- Returns:
The ConstraintViolations object representing the file generated by the latest monitoring execution.
- Return type:
sagemaker.model_monitor.ConstraintViolations
- latest_monitoring_statistics(file_name='statistics.json')[source]#
Returns the sagemaker.model_monitor.Statistics generated by the latest monitoring execution.
- Parameters:
file_name (str) – The name of the statistics file to be retrieved. Only override if generating a custom file name.
- Returns:
The Statistics object representing the file generated by the latest monitoring execution.
- Return type:
sagemaker.model_monitor.Statistics
- list_executions()[source]#
Gets the list of the latest monitoring executions in descending order of “ScheduledTime”.
Statistics or constraint violations can then be retrieved from an execution, as in this example:
Example
>>> my_executions = my_monitor.list_executions()
>>> second_to_last_execution_statistics = my_executions[-1].statistics()
>>> second_to_last_execution_violations = my_executions[-1].constraint_violations()
- Returns:
List of MonitoringExecutions in descending order of “ScheduledTime”.
- Return type:
[sagemaker.model_monitor.MonitoringExecution]
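Building on the list_executions example, the sketch below gathers the statistics and constraint violations for every execution in one pass. summarize_executions is a hypothetical helper; it assumes each item exposes statistics() and constraint_violations() as used in that example, and it does not handle executions that have not yet produced their files.

```python
def summarize_executions(executions):
    """Pair each execution with its statistics and constraint violations.

    `executions` is assumed to be the list returned by list_executions().
    Hypothetical helper for illustration; not part of the SDK.
    """
    return [
        {
            "execution": execution,
            "statistics": execution.statistics(),
            "violations": execution.constraint_violations(),
        }
        for execution in executions
    ]
```

The returned list keeps the same order as the input, so the ordering described for list_executions carries over.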
- list_monitoring_alert_history(monitoring_alert_name: str | None = None, sort_by: str | None = 'CreationTime', sort_order: str | None = 'Descending', next_token: str | None = None, max_results: int | None = 10, creation_time_before: str | None = None, creation_time_after: str | None = None, status_equals: str | None = None)[source]#
Lists the alert history associated with the given schedule_name and alert_name.
- Parameters:
monitoring_alert_name (Optional[str]) – The name of the alert_name to filter on. If not provided, does not filter on it. Default: None.
sort_by (Optional[str]) – The field to sort by. Can be one of: “Name”, “CreationTime”. Default: “CreationTime”.
sort_order (Optional[str]) – The sort order. Can be one of: “Ascending”, “Descending”. Default: “Descending”.
next_token (Optional[str]) – The pagination token. Default: None.
max_results (Optional[int]) – The maximum number of results to return. Must be between 1 and 100. Default: 10.
creation_time_before (Optional[str]) – Only include alert history created before this time. Default: None.
creation_time_after (Optional[str]) – Only include alert history created after this time. Default: None.
status_equals (Optional[str]) – Only include alert history with this status. Default: None.
- Returns:
The list of monitoring alert history entries, followed by the next pagination token (str).
- list_monitoring_alerts(next_token: str | None = None, max_results: int | None = 10)[source]#
List the monitoring alerts.
- Parameters:
next_token (Optional[str]) – The pagination token. Default: None
max_results (Optional[int]) – The maximum number of results to return. Must be between 1 and 100. Default: 10.
- Returns:
The list of monitoring alerts, followed by the next pagination token (str).
- Return type:
List[MonitoringAlertSummary]
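Because results are paginated, collecting every alert means following next_token across calls. The sketch below shows one way to do that; iter_monitoring_alerts is a hypothetical helper, and it assumes list_monitoring_alerts returns an (alerts, next_token) pair with next_token set to None on the last page.

```python
def iter_monitoring_alerts(monitor, page_size: int = 10):
    """Yield every alert by following next_token until it is exhausted.

    Assumes list_monitoring_alerts returns an (alerts, next_token) pair.
    Hypothetical helper for illustration; not part of the SDK.
    """
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    next_token = None
    while True:
        alerts, next_token = monitor.list_monitoring_alerts(
            next_token=next_token, max_results=page_size
        )
        yield from alerts
        if not next_token:
            break
```

Since this is a generator, no request is made until the caller starts iterating, for example with list(iter_monitoring_alerts(my_monitor)).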
- run_baseline(baseline_inputs, output, arguments=None, wait=True, logs=True, job_name=None)[source]#
Run a processing job meant to baseline your dataset.
- Parameters:
baseline_inputs ([sagemaker.processing.ProcessingInput]) – Input files for the processing job. These must be provided as ProcessingInput objects.
output (sagemaker.processing.ProcessingOutput) – Destination of the constraint_violations and statistics json files.
arguments ([str]) – A list of string arguments to be passed to a processing job.
wait (bool) – Whether the call should wait until the job completes (default: True).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: True).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the image name and current timestamp.
- suggested_constraints(file_name='constraints.json')[source]#
Returns a Constraints object representing the constraints JSON file generated by the latest baselining job.
- Parameters:
file_name (str) – The name of the .json constraints file.
- Returns:
The Constraints object representing the file that was generated by the job.
- Return type:
sagemaker.model_monitor.Constraints
- update_monitoring_alert(monitoring_alert_name: str, data_points_to_alert: int | None, evaluation_period: int | None)[source]#
Update the monitoring schedule alert.
- Parameters:
monitoring_alert_name (str) – The name of the monitoring alert to update.
data_points_to_alert (int) – The data point to alert.
evaluation_period (int) – The period to evaluate the alert status.
- Returns:
None
- update_monitoring_schedule(endpoint_input=None, output=None, statistics=None, constraints=None, schedule_cron_expression=None, instance_count=None, instance_type=None, entrypoint=None, volume_size_in_gb=None, volume_kms_key=None, output_kms_key=None, arguments=None, max_runtime_in_seconds=None, env=None, network_config=None, role=None, image_uri=None, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#
Updates the existing monitoring schedule.
If more options than schedule_cron_expression are to be updated, a new job definition will be created to hold them. The old job definition will not be deleted.
- Parameters:
endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput.
output (sagemaker.model_monitor.MonitoringOutput) – The output of the monitoring schedule.
statistics (sagemaker.model_monitor.Statistic or str) – If provided alongside constraints, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Statistics object or an S3 uri pointing to a statistics JSON file.
constraints (sagemaker.model_monitor.Constraints or str) – If provided alongside statistics, these will be used for monitoring the endpoint. This can be a sagemaker.model_monitor.Constraints object or an S3 uri pointing to a constraints JSON file.
schedule_cron_expression (str) – The cron expression that dictates the frequency that this job runs at. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions.
instance_count (int) – The number of instances to run the jobs with.
instance_type (str) – Type of EC2 instance to use for the job, for example, ‘ml.m5.xlarge’.
entrypoint (str) – The entrypoint for the job.
volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key (str) – A KMS key for the job’s volume.
output_kms_key (str) – The KMS key id for the job’s outputs.
arguments ([str]) – A list of string arguments to be passed to a processing job.
max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600
env (dict) – Environment variables to be passed to the job.
network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
role (str) – An AWS IAM role name or ARN. The Amazon SageMaker jobs use this role.
image_uri (str) – The uri of the image to use for the jobs started by the Monitor.
batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform (default: None)
data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
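The rule stated for update_monitoring_schedule, that changing anything beyond schedule_cron_expression results in a new job definition, can be expressed as a small check. needs_new_job_definition is a hypothetical helper that only illustrates the documented rule; the SDK's own logic may differ.

```python
def needs_new_job_definition(**updates) -> bool:
    """Return True if any option other than schedule_cron_expression is set.

    Illustrates the documented rule only; not the SDK's implementation.
    """
    return any(
        value is not None
        for name, value in updates.items()
        if name != "schedule_cron_expression"
    )
```

A check like this could help callers anticipate when an old job definition will be left behind and may need separate cleanup.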
- class sagemaker.core.model_monitor.model_monitoring.ModelQualityMonitor(role=None, instance_count=1, instance_type='ml.m5.xlarge', volume_size_in_gb=30, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, base_job_name=None, sagemaker_session=None, env=None, tags=None, network_config=None)[source]#
Bases: ModelMonitor
Amazon SageMaker model monitor to monitor quality metrics for an endpoint.
Please see the __init__ method of its base class for how to instantiate it.
- JOB_DEFINITION_BASE_NAME = 'model-quality-job-definition'#
- classmethod attach(monitor_schedule_name, sagemaker_session=None)[source]#
Sets this object’s schedule name to the name provided.
This allows subsequent describe_schedule or list_executions calls to point to the given schedule.
- Parameters:
monitor_schedule_name (str) – The name of the schedule to attach to.
sagemaker_session (sagemaker.core.helper.session.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
- create_monitoring_schedule(endpoint_input=None, ground_truth_input=None, problem_type=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, constraints=None, monitor_schedule_name=None, schedule_cron_expression=None, enable_cloudwatch_metrics=True, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#
Creates a monitoring schedule.
- Parameters:
endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput. (default: None)
ground_truth_input (str) – S3 URI to ground truth dataset. (default: None)
problem_type (str) – The type of problem of this model quality monitoring. Valid values are “Regression”, “BinaryClassification”, “MulticlassClassification”. (default: None)
record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.
post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.
output_s3_uri (str) – S3 destination of the constraint_violations and analysis result. Default: “s3://<default_session_bucket>/<job_name>/output”
constraints (sagemaker.model_monitor.Constraints or str) – If provided, these are used for monitoring the endpoint. This can be a Constraints object or an S3 uri pointing to a constraints JSON file.
monitor_schedule_name (str) – Schedule name. If not specified, the processor generates a default job name, based on the image name and current timestamp.
schedule_cron_expression (str) – The cron expression that dictates how often this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. Default: Daily.
enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.
batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform
data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
- suggest_baseline(baseline_dataset, dataset_format, problem_type, inference_attribute=None, probability_attribute=None, ground_truth_attribute=None, probability_threshold_attribute=None, post_analytics_processor_script=None, output_s3_uri=None, wait=False, logs=False, job_name=None)[source]#
Suggest baselines for use with Amazon SageMaker Model Monitoring Schedules.
- Parameters:
baseline_dataset (str) – The path to the baseline_dataset file. This can be a local path or an S3 uri.
dataset_format (dict) – The format of the baseline_dataset.
problem_type (str) – The type of problem of this model quality monitoring. Valid values are “Regression”, “BinaryClassification”, “MulticlassClassification”.
inference_attribute (str) – Index or JSONpath to locate the predicted label(s). Only used for ModelQualityMonitor.
probability_attribute (str or int) – Index or JSONpath to locate the probabilities. Only used for ModelQualityMonitor.
ground_truth_attribute (str) – Index to locate the actual label(s). Only used for ModelQualityMonitor.
probability_threshold_attribute (float) – Threshold used to convert probabilities to binary labels. Only used for ModelQualityMonitor.
post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.
output_s3_uri (str) – Desired S3 destination of the constraint_violations and statistics json files. Default: “s3://<default_session_bucket>/<job_name>/output”
wait (bool) – Whether the call should wait until the job completes (default: False).
logs (bool) – Whether to show the logs produced by the job. Only meaningful when wait is True (default: False).
job_name (str) – Processing job name. If not specified, the processor generates a default job name, based on the image name and current timestamp.
- Returns:
The ProcessingJob object representing the baselining job.
- Return type:
sagemaker.processing.ProcessingJob
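Since problem_type accepts only the three values listed in the parameters, a caller may want to validate it before starting a baselining job. The sketch below does that; VALID_PROBLEM_TYPES and check_problem_type are hypothetical names, not part of the SDK.

```python
# The three values documented for problem_type.
VALID_PROBLEM_TYPES = ("Regression", "BinaryClassification", "MulticlassClassification")


def check_problem_type(problem_type: str) -> str:
    """Return problem_type unchanged if it is one of the documented values.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if problem_type not in VALID_PROBLEM_TYPES:
        raise ValueError(
            f"problem_type must be one of {VALID_PROBLEM_TYPES}, got {problem_type!r}"
        )
    return problem_type
```

The comparison is case-sensitive, so a value such as “regression” is rejected before any job is submitted.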
- update_monitoring_schedule(endpoint_input=None, ground_truth_input=None, problem_type=None, record_preprocessor_script=None, post_analytics_processor_script=None, output_s3_uri=None, constraints=None, schedule_cron_expression=None, enable_cloudwatch_metrics=None, role=None, instance_count=None, instance_type=None, volume_size_in_gb=None, volume_kms_key=None, output_kms_key=None, max_runtime_in_seconds=None, env=None, network_config=None, batch_transform_input=None, data_analysis_start_time=None, data_analysis_end_time=None)[source]#
Updates the existing monitoring schedule.
If more options than schedule_cron_expression are to be updated, a new job definition will be created to hold them. The old job definition will not be deleted.
- Parameters:
endpoint_input (str or sagemaker.model_monitor.EndpointInput) – The endpoint to monitor. This can either be the endpoint name or an EndpointInput.
ground_truth_input (str) – S3 URI to ground truth dataset.
problem_type (str) – The type of problem of this model quality monitoring. Valid values are “Regression”, “BinaryClassification”, “MulticlassClassification”.
record_preprocessor_script (str) – The path to the record preprocessor script. This can be a local path or an S3 uri.
post_analytics_processor_script (str) – The path to the record post-analytics processor script. This can be a local path or an S3 uri.
output_s3_uri (str) – S3 destination of the constraint_violations and analysis result. Default: “s3://<default_session_bucket>/<job_name>/output”
constraints (sagemaker.model_monitor.Constraints or str) – If provided it will be used for monitoring the endpoint. It can be a Constraints object or an S3 uri pointing to a constraints JSON file.
schedule_cron_expression (str) – The cron expression that dictates how often this job runs. See sagemaker.model_monitor.CronExpressionGenerator for valid expressions. Default: Daily.
enable_cloudwatch_metrics (bool) – Whether to publish cloudwatch metrics as part of the baselining or monitoring jobs.
role (str) – An AWS IAM role. The Amazon SageMaker jobs use this role.
instance_count (int) – The number of instances to run the jobs with.
instance_type (str) – Type of EC2 instance to use for the job, for example, ‘ml.m5.xlarge’.
volume_size_in_gb (int) – Size in GB of the EBS volume to use for storing data during processing (default: 30).
volume_kms_key (str) – A KMS key for the job’s volume.
output_kms_key (str) – The KMS key id for the job’s outputs.
max_runtime_in_seconds (int) – Timeout in seconds. After this amount of time, Amazon SageMaker terminates the job regardless of its current status. Default: 3600
env (dict) – Environment variables to be passed to the job.
network_config (sagemaker.network.NetworkConfig) – A NetworkConfig object that configures network isolation, encryption of inter-container traffic, security group IDs, and subnets.
batch_transform_input (sagemaker.model_monitor.BatchTransformInput) – Inputs to run the monitoring schedule on the batch transform.
data_analysis_start_time (str) – Start time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
data_analysis_end_time (str) – End time for the data analysis window for the one time monitoring schedule (NOW), e.g. “-PT1H” (default: None)
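The rule stated above for update_monitoring_schedule (a new job definition is created whenever anything other than schedule_cron_expression is updated) can be restated as a small predicate. This is only a sketch of the documented rule, not the SDK's implementation; `needs_new_job_definition` is a hypothetical helper, and the cron string is an illustrative value.

```python
from typing import Any, Dict


def needs_new_job_definition(updates: Dict[str, Any]) -> bool:
    """Return True if any option other than schedule_cron_expression is set.

    Options left as None are treated as "not updated", mirroring the
    None defaults in the method signature.
    """
    return any(
        value is not None
        for name, value in updates.items()
        if name != "schedule_cron_expression"
    )


# Only the schedule changes: the existing job definition can be kept.
cron_only = {"schedule_cron_expression": "cron(0 * ? * * *)", "instance_type": None}
# The instance type changes: a new job definition would be created.
with_instance = {"schedule_cron_expression": None, "instance_type": "ml.m5.xlarge"}

print(needs_new_job_definition(cron_only))
print(needs_new_job_definition(with_instance))
```

Because old job definitions are not deleted, repeated updates that touch more than the cron expression may leave unused job definitions behind.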
- class sagemaker.core.model_monitor.model_monitoring.MonitoringExecution(sagemaker_session, job_name, inputs, output, output_kms_key=None)[source]#
Bases: ProcessingJob
Provides functionality to retrieve monitoring-specific files from monitoring executions.
- constraint_violations(file_name='constraint_violations.json', kms_key=None)[source]#
Returns a sagemaker.model_monitor.ConstraintViolations object representing the constraint violations JSON file generated by this monitoring execution.
- Parameters:
file_name (str) – The name of the json-formatted constraint violations file.
kms_key (str) – The kms key to use when retrieving the file.
- Returns:
- The ConstraintViolations object
representing the file that was generated by the monitoring execution.
- Return type:
sagemaker.model_monitor.ConstraintViolations
- Raises:
UnexpectedStatusException – This is thrown if the job is not in a ‘Complete’ state.
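Once the constraint violations file has been retrieved, its contents can be summarized with the standard library. The sketch below assumes the data quality layout of a top-level "violations" list whose entries carry "feature_name", "constraint_check_type", and "description"; the sample document is made up for illustration, and `group_violations` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Made-up sample in the assumed constraint_violations.json layout.
SAMPLE = """
{
  "violations": [
    {"feature_name": "age", "constraint_check_type": "data_type_check",
     "description": "Data type does not match the baseline."},
    {"feature_name": "income", "constraint_check_type": "baseline_drift_check",
     "description": "Distribution drifted from the baseline."},
    {"feature_name": "zip", "constraint_check_type": "data_type_check",
     "description": "Data type does not match the baseline."}
  ]
}
"""


def group_violations(raw_json: str) -> Dict[str, List[str]]:
    """Group violating feature names by their constraint check type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for violation in json.loads(raw_json).get("violations", []):
        grouped[violation["constraint_check_type"]].append(violation["feature_name"])
    return dict(grouped)


for check_type, features in group_violations(SAMPLE).items():
    print(check_type, features)
```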
- classmethod from_processing_arn(sagemaker_session, processing_job_arn)[source]#
Initializes a MonitoringExecution from a processing job ARN.
- Parameters:
processing_job_arn (str) – ARN of the processing job to create a MonitoringExecution out of.
sagemaker_session (sagemaker.core.helper.session_helper.Session) – Session object which manages interactions with Amazon SageMaker APIs and any other AWS services needed. If not specified, one is created using the default AWS configuration chain.
- Returns:
- The MonitoringExecution instance created
from the processing job ARN.
- Return type:
sagemaker.model_monitor.MonitoringExecution
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- property output#
Get the first output from processing_output_config.
- property outputs#
Get all outputs from processing_output_config.
- statistics(file_name='statistics.json', kms_key=None)[source]#
Returns a sagemaker.model_monitor.Statistics object representing the statistics JSON file generated by this monitoring execution.
- Parameters:
file_name (str) – The name of the json-formatted statistics file.
kms_key (str) – The kms key to use when retrieving the file.
- Returns:
- The Statistics object representing the file that
was generated by the execution.
- Return type:
sagemaker.model_monitor.Statistics
- Raises:
UnexpectedStatusException – This is thrown if the job is not in a ‘Complete’ state.
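As an illustration of what can be done with a retrieved statistics file, the sketch below computes the share of missing values per feature. It assumes the data quality layout, in which each entry of "features" has a "name" and a "numerical_statistics" or "string_statistics" object containing "common" counts ("num_present", "num_missing"); the sample document is made up, and `missing_ratio` is a hypothetical helper.

```python
import json
from typing import Dict

# Made-up sample in the assumed statistics.json layout.
SAMPLE_STATS = """
{
  "version": 0,
  "dataset": {"item_count": 4},
  "features": [
    {"name": "age", "inferred_type": "Integral",
     "numerical_statistics": {"common": {"num_present": 4, "num_missing": 0}}},
    {"name": "income", "inferred_type": "Fractional",
     "numerical_statistics": {"common": {"num_present": 3, "num_missing": 1}}}
  ]
}
"""


def missing_ratio(raw_json: str) -> Dict[str, float]:
    """Return the fraction of missing values for each feature."""
    ratios: Dict[str, float] = {}
    for feature in json.loads(raw_json).get("features", []):
        stats = feature.get("numerical_statistics") or feature.get("string_statistics") or {}
        common = stats.get("common", {})
        present = common.get("num_present", 0)
        missing = common.get("num_missing", 0)
        total = present + missing
        # Features without counts are reported as 0.0 rather than dividing by zero.
        ratios[feature["name"]] = missing / total if total else 0.0
    return ratios


print(missing_ratio(SAMPLE_STATS))
```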
- class sagemaker.core.model_monitor.model_monitoring.MonitoringInput(start_time_offset: str, end_time_offset: str, features_attribute: str, inference_attribute: str, probability_attribute: str | int, probability_threshold_attribute: float)[source]#
Bases: object
Accepts parameters specifying batch transform or endpoint inputs for monitoring execution.
MonitoringInput holds the additional parameters shared by monitoring job inputs. It also provides a method to turn those parameters into a dictionary.
- Parameters:
start_time_offset (str) – Monitoring start time offset, e.g. “-PT1H”
end_time_offset (str) – Monitoring end time offset, e.g. “-PT0H”.
features_attribute (str) – JSONpath to locate features in JSONlines dataset. Only used for ModelBiasMonitor and ModelExplainabilityMonitor.
inference_attribute (str) – Index or JSONpath to locate predicted label(s). Only used for ModelQualityMonitor, ModelBiasMonitor, and ModelExplainabilityMonitor.
probability_attribute (str or int) – Index or JSONpath to locate probabilities. Only used for ModelQualityMonitor, ModelBiasMonitor, and ModelExplainabilityMonitor.
probability_threshold_attribute (float) – Threshold used to convert probabilities to binary labels. Only used for ModelQualityMonitor, ModelBiasMonitor, and ModelExplainabilityMonitor.
- end_time_offset: str#
- features_attribute: str#
- inference_attribute: str#
- probability_attribute: str | int#
- probability_threshold_attribute: float#
- start_time_offset: str#
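The start_time_offset and end_time_offset values are ISO 8601 style durations such as “-PT1H” and “-PT0H”. The sketch below shows how such a pair could translate into a concrete analysis window. It assumes the offsets are applied relative to the scheduled execution time, and it only handles the hours and minutes forms; `parse_offset` and `analysis_window` are hypothetical helpers, not SDK functions.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Supports only a subset of ISO 8601 durations: optional sign, hours, minutes.
_OFFSET = re.compile(r"^(?P<sign>-)?PT(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?$")


def parse_offset(offset: str) -> timedelta:
    """Convert an offset such as '-PT1H' or 'PT30M' into a timedelta."""
    match = _OFFSET.match(offset)
    if match is None or (match["hours"] is None and match["minutes"] is None):
        raise ValueError(f"Unsupported offset: {offset!r}")
    delta = timedelta(
        hours=int(match["hours"] or 0),
        minutes=int(match["minutes"] or 0),
    )
    return -delta if match["sign"] else delta


def analysis_window(
    scheduled_at: datetime, start_offset: str, end_offset: str
) -> Tuple[datetime, datetime]:
    """Apply both offsets to the scheduled time (assumed reference point)."""
    return scheduled_at + parse_offset(start_offset), scheduled_at + parse_offset(end_offset)


scheduled = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = analysis_window(scheduled, "-PT1H", "-PT0H")
print(start.isoformat(), end.isoformat())
```

Under these assumptions, the pair “-PT1H” and “-PT0H” selects the hour of captured data immediately preceding the scheduled time.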