sagemaker.train.evaluate.custom_scorer_evaluator#
Custom Scorer Evaluator for SageMaker Model Evaluation Module.
This module provides evaluation capabilities using custom scorer metrics, supporting both built-in preset metrics and custom evaluator implementations for flexible model evaluation workflows.
Functions

get_builtin_metrics() | Get the built-in metrics enum for custom scorer evaluation.

Classes

CustomScorerEvaluator(*[, region, role, ...]) | Custom scorer evaluation job for preset or custom evaluator metrics.
- class sagemaker.train.evaluate.custom_scorer_evaluator.CustomScorerEvaluator(*, region: str | None = None, role: str | None = None, sagemaker_session: Any | None = None, model: str | BaseTrainer | ModelPackage, base_eval_name: str | None = None, s3_output_path: str, mlflow_resource_arn: str | None = None, mlflow_experiment_name: str | None = None, mlflow_run_name: str | None = None, networking: VpcConfig | None = None, kms_key_id: str | None = None, model_package_group: str | ModelPackageGroup | None = None, evaluator: str | Any, dataset: Any, evaluate_base_model: bool = False)[source]#
Bases: BaseEvaluator

Custom scorer evaluation job for preset or custom evaluator metrics.
This evaluator supports both preset metrics (via built-in metrics enum) and custom evaluator implementations for specialized evaluation needs.
- evaluator#
Built-in metric enum value, Evaluator object, or Evaluator ARN string. Required. Use get_builtin_metrics() for available preset metrics.
- Type:
Union[str, Any]
- dataset#
Dataset for evaluation. Required. Accepts an S3 URI, a Dataset ARN, or a DataSet object; see the sketch below.
- Type:
Any
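A minimal sketch of the three accepted forms. The bucket name and ARN are illustrative placeholders, and the DataSet import path is an assumption made by analogy with the Evaluator class in the example further down:

# 1. S3 URI string
dataset = "s3://my-bucket/eval-data/dataset.jsonl"

# 2. Dataset ARN string (placeholder account and region)
dataset = "arn:aws:sagemaker:us-west-2:123456789012:hub-content/AIRegistry/DataSet/my-dataset/1"

# 3. DataSet object from the AI registry (import path assumed, by analogy
# with sagemaker.ai_registry.evaluator.Evaluator)
# from sagemaker.ai_registry.dataset import DataSet
# dataset = DataSet(...)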
- mlflow_resource_arn#
ARN of the MLflow tracking server for experiment tracking. Optional. If not provided, the system will attempt to resolve it using the default MLflow app experience (checks domain match, account default, or creates a new app). Inherited from BaseEvaluator.
- Type:
Optional[str]
- evaluate_base_model#
Whether to evaluate the base model in addition to the custom model. Set to True to score both models for comparison. Defaults to False (evaluates only the custom model); see the sketch below.
- Type:
bool
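A minimal comparison-run sketch, assuming the preset-metric setup from the class example below; resource names are placeholders:

evaluator = CustomScorerEvaluator(
    evaluator=BuiltInMetric.PRIME_MATH,
    dataset=my_dataset,
    model="my-model",
    s3_output_path="s3://bucket/output",
    evaluate_base_model=True,  # score the base model as well as the custom model
)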
- region#
AWS region. Inherited from BaseEvaluator.
- Type:
Optional[str]
- sagemaker_session#
SageMaker session object. Inherited from BaseEvaluator.
- Type:
Optional[Any]
- model#
Model for evaluation. Inherited from BaseEvaluator.
- Type:
Union[str, Any]
- base_eval_name#
Base name for evaluation jobs. Inherited from BaseEvaluator.
- Type:
Optional[str]
- s3_output_path#
S3 location for evaluation outputs. Inherited from BaseEvaluator.
- Type:
str
- mlflow_experiment_name#
MLflow experiment name. Inherited from BaseEvaluator.
- Type:
Optional[str]
- mlflow_run_name#
MLflow run name. Inherited from BaseEvaluator.
- Type:
Optional[str]
- kms_key_id#
KMS key ID for encryption. Inherited from BaseEvaluator.
- Type:
Optional[str]
- model_package_group#
Model package group. Inherited from BaseEvaluator.
- Type:
Optional[Union[str, ModelPackageGroup]]
Example
from sagemaker.train.evaluate.custom_scorer_evaluator import (
    CustomScorerEvaluator,
    get_builtin_metrics,
)
from sagemaker.ai_registry.evaluator import Evaluator

# Using a preset metric
BuiltInMetric = get_builtin_metrics()
evaluator = CustomScorerEvaluator(
    evaluator=BuiltInMetric.PRIME_MATH,
    dataset=my_dataset,
    model="my-model",
    s3_output_path="s3://bucket/output",
    mlflow_resource_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-tracking-server/my-server",
)

# Using a custom evaluator
my_evaluator = Evaluator.create(
    name="my-custom-evaluator",
    function_source="/path/to/evaluator.py",
    sub_type="AWS/Evaluator",
)
evaluator = CustomScorerEvaluator(
    evaluator=my_evaluator,
    dataset=my_dataset,
    model="my-model",
    s3_output_path="s3://bucket/output",
    mlflow_resource_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-tracking-server/my-server",
)

# Using an evaluator ARN string
evaluator = CustomScorerEvaluator(
    evaluator="arn:aws:sagemaker:us-west-2:123456789012:hub-content/AIRegistry/Evaluator/my-evaluator/1",
    dataset=my_dataset,
    model="my-model",
    s3_output_path="s3://bucket/output",
    mlflow_resource_arn="arn:aws:sagemaker:us-west-2:123456789012:mlflow-tracking-server/my-server",
)

job = evaluator.evaluate()
- base_eval_name: str | None#
- dataset: Any#
- evaluate() → EvaluationPipelineExecution[source]#
Create and start a custom scorer evaluation job.
- Returns:
The created custom scorer evaluation execution
- Return type:
EvaluationPipelineExecution
Example
evaluator = CustomScorerEvaluator(
    evaluator=BuiltInMetric.CODE_EXECUTIONS,
    dataset=my_dataset,
    model="my-model",
    s3_output_path="s3://bucket/output",
    mlflow_resource_arn="arn:...",
)
execution = evaluator.evaluate()
execution.wait()
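Once wait() returns, the outcome can be inspected through the execution's status object, using the same overall_status field shown in the get_all() example below:

print(execution.status.overall_status)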
- evaluate_base_model: bool#
- evaluator: str | Any#
- classmethod get_all(session: Any | None = None, region: str | None = None)[source]#
Get all custom scorer evaluation executions.
Uses EvaluationPipelineExecution.get_all() to retrieve all custom scorer evaluation executions as an iterator.
- Parameters:
session (Optional[Any]) – Optional boto3 session. If not provided, will be inferred.
region (Optional[str]) – Optional AWS region. If not provided, will be inferred.
- Yields:
EvaluationPipelineExecution – Custom scorer evaluation execution instances
Example
# Get all custom scorer evaluations as an iterator
evaluations = CustomScorerEvaluator.get_all()
all_executions = list(evaluations)

# Or iterate directly
for execution in CustomScorerEvaluator.get_all():
    print(f"{execution.name}: {execution.status.overall_status}")

# With a specific session/region
evaluations = CustomScorerEvaluator.get_all(session=my_session, region="us-west-2")
all_executions = list(evaluations)
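Because the executions are yielded lazily, the iterator can also be filtered inline. A sketch, assuming a completed-run status string; the exact value compared against is illustrative, not a documented constant:

finished = [
    e for e in CustomScorerEvaluator.get_all()
    if e.status.overall_status == "Completed"  # illustrative status value
]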
- property hyperparameters#
Get evaluation hyperparameters as a FineTuningOptions object.
This property provides access to evaluation hyperparameters with validation, type checking, and user-friendly information display. Hyperparameters are lazily loaded from the JumpStart Hub when first accessed.
- Returns:
Dynamic object with evaluation hyperparameters
- Return type:
FineTuningOptions
- Raises:
ValueError – If base model name is not available or if hyperparameters cannot be loaded
Example
evaluator = CustomScorerEvaluator(...)

# Access current values
print(evaluator.hyperparameters.temperature)

# Modify values (with validation)
evaluator.hyperparameters.temperature = 0.5

# Get as a dictionary
params = evaluator.hyperparameters.to_dict()

# Display parameter information
evaluator.hyperparameters.get_info()
evaluator.hyperparameters.get_info('temperature')
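Because loading is lazy and can fail (see Raises above), a guarded first access may be useful. A sketch, not part of the documented API:

try:
    temperature = evaluator.hyperparameters.temperature
except ValueError as err:
    # Raised when the base model name is unavailable or the
    # hyperparameters cannot be loaded from the JumpStart Hub
    print(f"Hyperparameters unavailable: {err}")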
- kms_key_id: str | None#
- mlflow_experiment_name: str | None#
- mlflow_resource_arn: str | None#
- mlflow_run_name: str | None#
- model: str | BaseTrainer | ModelPackage#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}#
Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.
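The arbitrary_types_allowed entry is what permits non-pydantic field types such as BaseTrainer on this class. A minimal, generic pydantic sketch (the class names here are stand-ins, not SDK types):

from pydantic import BaseModel, ConfigDict

class CustomTrainer:  # stand-in for a non-pydantic type like BaseTrainer
    pass

class Job(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    trainer: CustomTrainer  # accepted only because arbitrary_types_allowed=True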
- model_package_group: str | ModelPackageGroup | None#
- model_post_init(context: Any, /) → None#
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- region: str | None#
- role: str | None#
- s3_output_path: str#
- sagemaker_session: Any | None#
- sagemaker.train.evaluate.custom_scorer_evaluator.get_builtin_metrics() → Type[_BuiltInMetric][source]#
Get the built-in metrics enum for custom scorer evaluation.
This utility function provides access to preset metrics for custom scorer evaluation.
- Returns:
The built-in metric enum class
- Return type:
Type[_BuiltInMetric]
Example
from sagemaker.train.evaluate import get_builtin_metrics

BuiltInMetric = get_builtin_metrics()
evaluator = CustomScorerEvaluator(
    evaluator=BuiltInMetric.PRIME_MATH,
    dataset=my_dataset,
    model="my-model",
    s3_output_path="s3://bucket/output",
    mlflow_resource_arn="arn:...",
)