SageMaker V3 JumpStart Training Example#

This notebook demonstrates how to use SageMaker V3 ModelTrainer with JumpStart models for easy model training and fine-tuning.

Prerequisites#

Note: Ensure you have sagemaker-train and ipywidgets installed in your environment. The ipywidgets package is required to monitor training job progress in Jupyter notebooks.
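If either package is missing, it can be installed from pip (a minimal sketch; run in a notebook cell with a leading `!` or in a terminal):

```shell
pip install sagemaker-train ipywidgets
```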

# Import required libraries
import json
import uuid

from sagemaker.train.model_trainer import ModelTrainer
from sagemaker.core.jumpstart import JumpStartConfig
from sagemaker.core.helper.session_helper import Session, get_execution_role

Step 1: Setup Session and Configuration#

Initialize the SageMaker session and define our training configuration.

# Initialize SageMaker session
sagemaker_session = Session()
role = get_execution_role()

# Configuration
JOB_NAME_PREFIX = "js-v3-training-example"

# Generate unique identifier
unique_id = str(uuid.uuid4())[:8]
base_job_name = f"{JOB_NAME_PREFIX}-{unique_id}"

print(f"Base job name: {base_job_name}")
print(f"SageMaker execution role: {role}")
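The truncated UUID keeps job names unique across runs while staying within SageMaker's limits: training job names must be at most 63 characters and match the pattern `[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}` (alphanumerics and hyphens, starting with an alphanumeric). A quick stdlib-only sanity check of the generated base name:

```python
import re
import uuid

JOB_NAME_PREFIX = "js-v3-training-example"
base_job_name = f"{JOB_NAME_PREFIX}-{str(uuid.uuid4())[:8]}"

# Pattern and length limit from the SageMaker CreateTrainingJob API.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")

# Leave room for the longest per-model suffix used below ("-catboost").
assert len(base_job_name) + len("-catboost") <= 63
assert JOB_NAME_PATTERN.match(base_job_name)
print(f"'{base_job_name}' is a valid training job name")
```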

Step 2: Train HuggingFace BERT Model#

Train a HuggingFace BERT model for text classification using JumpStart.

# Configure JumpStart for HuggingFace BERT
bert_jumpstart_config = JumpStartConfig(
    model_id="huggingface-spc-bert-base-cased",
    accept_eula=False  # This model doesn't require EULA acceptance
)

# Create ModelTrainer from JumpStart config
bert_trainer = ModelTrainer.from_jumpstart_config(
    jumpstart_config=bert_jumpstart_config,
    base_job_name=f"{base_job_name}-bert",
    hyperparameters={
        "epochs": 1,  # Set to 1 for quick demonstration
        "learning_rate": 5e-5,
        "train_batch_size": 32
    },
    sagemaker_session=sagemaker_session
)

print("BERT ModelTrainer created successfully from JumpStart config!")
# Start BERT training
print("Starting BERT training job...")
print("Note: This will use the default JumpStart dataset and may take 10-15 minutes.")

bert_trainer.train()
print("BERT training job completed!")

Step 3: Train XGBoost Classification Model#

Train an XGBoost model for classification tasks using JumpStart.

# Configure JumpStart for XGBoost
xgboost_jumpstart_config = JumpStartConfig(
    model_id="xgboost-classification-model"
)

# Create ModelTrainer from JumpStart config
xgboost_trainer = ModelTrainer.from_jumpstart_config(
    jumpstart_config=xgboost_jumpstart_config,
    base_job_name=f"{base_job_name}-xgboost",
    hyperparameters={
        "num_round": 10,  # Reduced for quick demonstration
        "max_depth": 5,
        "eta": 0.2,
        "objective": "binary:logistic"
    },
    sagemaker_session=sagemaker_session
)

print("XGBoost ModelTrainer created successfully from JumpStart config!")
# Start XGBoost training
print("Starting XGBoost training job...")
print("Note: This will use the default JumpStart dataset and should complete in 5-10 minutes.")

xgboost_trainer.train()
print("XGBoost training job completed!")

Step 4: Train CatBoost Regression Model#

Train a CatBoost model for regression tasks using JumpStart.

# Configure JumpStart for CatBoost
catboost_jumpstart_config = JumpStartConfig(
    model_id="catboost-regression-model"
)

# Create ModelTrainer from JumpStart config
catboost_trainer = ModelTrainer.from_jumpstart_config(
    jumpstart_config=catboost_jumpstart_config,
    base_job_name=f"{base_job_name}-catboost",
    hyperparameters={
        "iterations": 50,  # Reduced for quick demonstration
        "learning_rate": 0.1,
        "depth": 6,
        "loss_function": "RMSE"
    },
    sagemaker_session=sagemaker_session
)

print("CatBoost ModelTrainer created successfully from JumpStart config!")
# Start CatBoost training
print("Starting CatBoost training job...")
print("Note: This will use the default JumpStart dataset and should complete in 5-10 minutes.")

catboost_trainer.train()
print("CatBoost training job completed!")

Step 5: Review Training Results#

Check the status and results of our training jobs.

# Display training job information
training_jobs = [
    ("BERT", bert_trainer),
    ("XGBoost", xgboost_trainer),
    ("CatBoost", catboost_trainer)
]

print("Training Job Summary:")
print("=" * 50)

for model_name, trainer in training_jobs:
    job_name = trainer._latest_training_job.training_job_name
    model_artifacts = trainer._latest_training_job.model_artifacts
    
    print(f"\n{model_name} Model:")
    print(f"  Job Name: {job_name}")
    print(f"  Model Artifacts: {model_artifacts}")
    print("  Status: Completed")

Step 6: Access Training Metrics (Optional)#

View training metrics and logs from CloudWatch.

# Example: Access training job details
print("Training Job Details:")
print("\nTo view detailed training metrics and logs:")
print("1. Go to the SageMaker Console")
print("2. Navigate to 'Training' > 'Training jobs'")
print("3. Search for jobs with prefix:", base_job_name)
print("4. Click on any job to view metrics, logs, and model artifacts")

# You can also access logs programmatically
print("\nProgrammatic access to logs:")
for model_name, trainer in training_jobs:
    print(f"{model_name}: trainer._latest_training_job.describe()")

Step 7: Discover Other Models Available in the JumpStart Hub (Optional)#

Use the JumpStart utilities to list and search the public model catalog.

from sagemaker.core.jumpstart.notebook_utils import list_jumpstart_models
from sagemaker.core.jumpstart.search import search_public_hub_models

# List all available JumpStart models
models = list_jumpstart_models()

# Filter by framework (e.g., HuggingFace)
huggingface_models = list_jumpstart_models(filter="framework == huggingface")

print(huggingface_models)
# Search for specific models
results = search_public_hub_models(query="bert")

print(results)
# Search with logical expressions
text_gen_models = search_public_hub_models(
    query="@task:text-generation"
)

print(text_gen_models)
# Complex queries
hf_bert = search_public_hub_models(
    query="@framework:huggingface AND bert"
)

print(hf_bert)

Summary#

This notebook demonstrated:

  1. Creating ModelTrainer instances from JumpStart configurations

  2. Training multiple model types (BERT, XGBoost, CatBoost) with custom hyperparameters

  3. Using JumpStart’s built-in datasets and training scripts

  4. Monitoring training job progress and results

Benefits of JumpStart Training:#

  • Pre-configured models: No need to write training scripts or handle data preprocessing

  • Best practices: Optimized hyperparameters and training configurations

  • Multiple frameworks: Support for HuggingFace, XGBoost, CatBoost, and more

  • Easy customization: Override hyperparameters while keeping proven defaults

  • Built-in datasets: Start training immediately with curated datasets

Next Steps:#

  • Deploy trained models using SageMaker V3 ModelBuilder

  • Fine-tune models with your own datasets

  • Experiment with different hyperparameters

  • Set up automated training pipelines

JumpStart training with the V3 ModelTrainer makes it easy to get started with machine learning while keeping the flexibility to customize as needed!