sagemaker.train.evaluate.base_evaluator


Base evaluator module for SageMaker Model Evaluation.

This module provides the base class for all evaluators in the SageMaker Model Evaluation Module. It handles common functionality such as model resolution, MLflow integration, and AWS resource configuration.

Classes

BaseEvaluator(*[, region, role, ...])

Base class for SageMaker model evaluators.

Bases: BaseModel

Base class for SageMaker model evaluators.

Provides common functionality for all evaluators including model resolution, MLflow integration, and AWS resource configuration. Subclasses must implement the evaluate() method.

region

AWS region for evaluation jobs. If not provided, the SAGEMAKER_REGION environment variable is used, falling back to the default region.

Type:: Optional[str]
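The lookup order described above can be sketched as a small helper. This is an illustration, not the library's code: `resolve_region` and `FALLBACK_REGION` are hypothetical names, and in practice the default region comes from your AWS configuration rather than a hard-coded constant.

```python
import os
from typing import Optional

# Hypothetical fallback; the real default depends on your AWS configuration.
FALLBACK_REGION = "us-west-2"


def resolve_region(region: Optional[str] = None,
                   default_region: str = FALLBACK_REGION) -> str:
    """Sketch of the documented order: explicit value, then
    SAGEMAKER_REGION, then a default region."""
    if region:
        return region
    env_region = os.environ.get("SAGEMAKER_REGION")
    if env_region:
        return env_region
    return default_region


print(resolve_region("eu-west-1"))  # an explicit value always wins
```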

role

IAM execution role ARN for SageMaker pipeline and training jobs. If not provided, the role is derived from the session’s caller identity. Set it explicitly when running outside SageMaker-managed environments (e.g., local notebooks, CI/CD), where the caller identity is not a SageMaker-assumable role.

Type:: Optional[str]
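In SageMaker-managed environments the caller identity is usually an STS assumed-role ARN. The sketch below maps such an ARN back to an IAM role ARN; `role_arn_from_caller_arn` is hypothetical and may differ from how the SDK actually derives the role. It also illustrates why the fallback can fail elsewhere: an IAM user identity has no role to map to.

```python
def role_arn_from_caller_arn(caller_arn: str) -> str:
    """Hypothetical helper: map an STS assumed-role ARN to an IAM role ARN."""
    # arn:<partition>:sts::<account>:assumed-role/<role-name>/<session-name>
    parts = caller_arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "sts":
        raise ValueError(f"not an STS ARN: {caller_arn}")
    partition, account, resource = parts[1], parts[4], parts[5]
    kind, _, rest = resource.partition("/")
    if kind != "assumed-role" or not rest:
        raise ValueError(f"not an assumed-role ARN: {caller_arn}")
    role_name = rest.split("/", 1)[0]
    return f"arn:{partition}:iam::{account}:role/{role_name}"


print(role_arn_from_caller_arn(
    "arn:aws:sts::123456789012:assumed-role/MyEvalRole/session-1"))
```

An IAM path on the original role (for example `role/service-role/...`) does not appear in the STS ARN, so this naive mapping can produce the wrong ARN for such roles; passing `role` explicitly avoids the issue.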

sagemaker_session

SageMaker session object. If not provided, a default session will be created automatically.

Type:: Optional[Any]

model

Model for evaluation. Can be one of:

- JumpStart model ID (str): e.g., ‘llama3-2-1b-instruct’
- ModelPackage object: a fine-tuned model package
- ModelPackage ARN (str): e.g., ‘arn:aws:sagemaker:region:account:model-package/name/version’
- BaseTrainer object: a completed training job (i.e., it must have _latest_training_job with output_model_package_arn populated)

Type:: Union[str, Any]
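The four accepted forms can be told apart roughly as follows. This dispatcher is a hypothetical sketch: it assumes a ModelPackage object exposes a `model_package_arn` attribute, treats any string that is not a model package ARN as a JumpStart model ID, and recognizes a trainer only by the `_latest_training_job.output_model_package_arn` condition stated above.

```python
import re
from typing import Any

# e.g. arn:aws:sagemaker:us-west-2:123456789012:model-package/name/3
_MODEL_PACKAGE_ARN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:[a-z0-9-]+:\d{12}:model-package/.+"
)


def classify_model_input(model: Any) -> str:
    """Hypothetical dispatcher over the accepted `model` forms."""
    if isinstance(model, str):
        if _MODEL_PACKAGE_ARN.match(model):
            return "model_package_arn"
        return "jumpstart_model_id"
    # Assumed attribute on ModelPackage objects.
    if hasattr(model, "model_package_arn"):
        return "model_package"
    # Documented requirement for BaseTrainer objects.
    job = getattr(model, "_latest_training_job", None)
    if job is not None and getattr(job, "output_model_package_arn", None):
        return "trainer"
    raise TypeError(f"unsupported model input: {model!r}")


print(classify_model_input("llama3-2-1b-instruct"))
```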

base_eval_name

Optional base name for evaluation jobs. This name is used as the PipelineExecutionDisplayName when creating the SageMaker pipeline execution. The actual display name will be “{base_eval_name}-{timestamp}”. This parameter can be used to cross-reference the pipeline execution ARN with a human-readable display name in the SageMaker console. If not provided, a unique name will be generated automatically in the format “eval-{model_name}-{uuid}”.

Type:: Optional[str]
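The naming behavior can be sketched as two steps: pick or generate the base name, then append a timestamp to form the display name. The helper names, the 8-character uuid4 suffix, and the `%Y%m%d%H%M%S` timestamp format are assumptions for illustration; the library may use different formats.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def resolve_base_eval_name(base_eval_name: Optional[str], model_name: str) -> str:
    """Use the given base name, or generate "eval-{model_name}-{uuid}"."""
    if base_eval_name:
        return base_eval_name
    # Assumed: a short random suffix taken from uuid4.
    return f"eval-{model_name}-{uuid.uuid4().hex[:8]}"


def display_name(base: str, now: Optional[datetime] = None) -> str:
    """Build "{base_eval_name}-{timestamp}" (timestamp format assumed)."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y%m%d%H%M%S')}"


print(display_name(resolve_base_eval_name("nightly-mmlu", "llama3-2-1b-instruct")))
```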

s3_output_path

S3 location for evaluation outputs. Required.

Type:: str
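Because `s3_output_path` is required, it can help to check it before constructing an evaluator. A minimal standard-library sketch; `split_s3_uri` is a hypothetical helper, and the library may apply its own, different validation.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://bucket/prefix, got {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://my-bucket/evals/run1/"))
```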

mlflow_resource_arn

MLflow resource ARN for experiment tracking. Optional. If not provided, the system will attempt to resolve it using the default MLflow app experience (checks domain match, account default, or creates a new app). Supported formats:

- MLflow tracking server: arn:aws:sagemaker:region:account:mlflow-tracking-server/name
- MLflow app: arn:aws:sagemaker:region:account:mlflow-app/app-id

Type:: Optional[str]
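A sketch that tells the two supported ARN formats apart with a regular expression. The pattern and the `mlflow_arn_kind` helper are illustrative assumptions based only on the formats listed above; they are not the resolution logic the evaluator uses.

```python
import re
from typing import Optional

_MLFLOW_ARN = re.compile(
    r"^arn:aws[a-z-]*:sagemaker:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"(?P<kind>mlflow-tracking-server|mlflow-app)/(?P<name>[^/]+)$"
)


def mlflow_arn_kind(arn: str) -> Optional[str]:
    """Return "mlflow-tracking-server", "mlflow-app", or None if unrecognized."""
    match = _MLFLOW_ARN.match(arn)
    return match.group("kind") if match else None


print(mlflow_arn_kind("arn:aws:sagemaker:us-east-1:123456789012:mlflow-app/app-abc123"))
```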

mlflow_experiment_name

Optional MLflow experiment name for tracking evaluation runs.

Type:: Optional[str]

mlflow_run_name

Optional MLflow run name for tracking individual evaluation executions.

Type:: Optional[str]

networking

VPC configuration for evaluation jobs. Accepts a sagemaker_core.shapes.VpcConfig object with security_group_ids and subnets attributes. When provided, evaluation jobs will run within the specified VPC for enhanced security and access to private resources.

Type:: Optional[VpcConfig]
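The real `networking` value is a `sagemaker_core.shapes.VpcConfig`. To keep the sketch self-contained it uses a hypothetical stand-in, `VpcConfigSketch`, with the same two attribute names. `to_api_dict` shows the `SecurityGroupIds`/`Subnets` structure the SageMaker API uses for VPC settings; how the evaluator actually forwards the config is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VpcConfigSketch:
    """Stand-in for sagemaker_core.shapes.VpcConfig (not the real class)."""
    security_group_ids: List[str] = field(default_factory=list)
    subnets: List[str] = field(default_factory=list)

    def validate(self) -> None:
        # The SageMaker API expects at least one security group and one subnet.
        if not self.security_group_ids or not self.subnets:
            raise ValueError("need at least one security group and one subnet")

    def to_api_dict(self) -> Dict[str, List[str]]:
        # Key names follow the SageMaker API's VpcConfig structure.
        return {"SecurityGroupIds": list(self.security_group_ids),
                "Subnets": list(self.subnets)}


cfg = VpcConfigSketch(["sg-0123456789abcdef0"], ["subnet-0a1b2c3d"])
cfg.validate()
print(cfg.to_api_dict())
```

With the real class you would presumably pass `networking=VpcConfig(security_group_ids=[...], subnets=[...])` to an evaluator subclass; the keyword form is inferred from the attribute names above.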

kms_key_id

AWS KMS key ID for encrypting output data. When provided, evaluation job outputs will be encrypted using this KMS key for enhanced data security.

Type:: Optional[str]
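SageMaker's output-encryption settings accept a KMS key as a key ID, a key ARN, an alias name, or an alias ARN. Assuming `kms_key_id` is forwarded to such a setting (this page does not say so), the hypothetical `kms_key_form` helper below classifies a value into one of those forms before you submit a job.

```python
import re

# Checked in order; ARN patterns require ":key/" or ":alias/" respectively.
_KMS_FORMS = [
    ("key_id", re.compile(
        r"^(?:mrk-[0-9a-f]{32}|"
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$")),
    ("key_arn", re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/.+$")),
    ("alias_name", re.compile(r"^alias/.+$")),
    ("alias_arn", re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:alias/.+$")),
]


def kms_key_form(value: str) -> str:
    """Return which KMS identifier form `value` looks like."""
    for form, pattern in _KMS_FORMS:
        if pattern.match(value):
            return form
    raise ValueError(f"unrecognized KMS key identifier: {value!r}")


print(kms_key_form("alias/ExampleAlias"))
```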

model_package_group

Model package group. Accepts:

1. ARN string (e.g., ‘arn:aws:sagemaker:region:account:model-package-group/name’)
2. ModelPackageGroup object (the ARN is extracted from its model_package_group_arn attribute)
3. Model package group name string (the object is fetched and its ARN extracted)

Required when model is a JumpStart model ID. Optional when model is a ModelPackage ARN or object, in which case it is inferred automatically.

Type:: Optional[Union[str, ModelPackageGroup]]
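The three accepted forms, plus the requirement rule, can be sketched as follows. `resolve_group_arn` and `require_group` are hypothetical; `lookup_arn_by_name` stands in for whatever call fetches the ModelPackageGroup by name, so the sketch runs without AWS access.

```python
from typing import Any, Callable, Optional


def resolve_group_arn(model_package_group: Any,
                      lookup_arn_by_name: Callable[[str], str]) -> Optional[str]:
    """Normalize an ARN string, group object, or group name to an ARN."""
    if model_package_group is None:
        return None
    if isinstance(model_package_group, str):
        if model_package_group.startswith("arn:"):
            if ":model-package-group/" not in model_package_group:
                raise ValueError("ARN is not a model package group ARN")
            return model_package_group
        # Plain name: fetch the group and take its ARN (stubbed here).
        return lookup_arn_by_name(model_package_group)
    arn = getattr(model_package_group, "model_package_group_arn", None)
    if not arn:
        raise TypeError("object has no model_package_group_arn")
    return arn


def require_group(model_is_jumpstart_id: bool, group_arn: Optional[str]) -> None:
    """A group is mandatory when `model` is a JumpStart model ID."""
    if model_is_jumpstart_id and group_arn is None:
        raise ValueError("model_package_group is required for JumpStart model IDs")
```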

class Config

Bases: object

arbitrary_types_allowed = True

base_eval_name: str | None

evaluate() → Any

Create and start an evaluation execution.

This method must be implemented by subclasses to define the specific evaluation logic for different evaluation types (benchmark, custom scorer, LLM-as-judge, etc.).

Returns:: The created evaluation execution object.
Return type:: EvaluationPipelineExecution
Raises:: NotImplementedError – This is an abstract method that must be implemented by subclasses.

Example

>>> # In a subclass implementation
>>> class CustomEvaluator(BaseEvaluator):
...     def evaluate(self):
...         # Create pipeline definition
...         pipeline_definition = self._build_pipeline()
...         # Start execution
...         return EvaluationPipelineExecution.start(...)
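The contract described above, a base `evaluate()` that raises `NotImplementedError` until a subclass overrides it, can be shown without any SageMaker dependency. `EvaluatorSketch` and `EchoEvaluator` are hypothetical stand-ins, and the returned dict replaces the `EvaluationPipelineExecution` a real subclass would return.

```python
from typing import Any, Dict


class EvaluatorSketch:
    """Stand-in for BaseEvaluator's evaluate() contract (not the real class)."""

    def evaluate(self) -> Any:
        raise NotImplementedError("Subclasses must implement evaluate()")


class EchoEvaluator(EvaluatorSketch):
    def evaluate(self) -> Dict[str, str]:
        # A real subclass would build and start a pipeline execution here.
        return {"status": "Started"}


print(EchoEvaluator().evaluate())
```

With this pattern, a subclass that forgets to override `evaluate()` still instantiates and fails only when `evaluate()` is called.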

kms_key_id: str | None

mlflow_experiment_name: str | None

mlflow_resource_arn: str | None

mlflow_run_name: str | None

model: str | BaseTrainer | ModelPackage

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_package_group: str | ModelPackageGroup | None

networking: VpcConfig | None

region: str | None

role: str | None

s3_output_path: str

sagemaker_session: Any | None
