SageMaker V3 Custom InferenceSpec Example#
This notebook demonstrates how to create and deploy custom models using InferenceSpec with SageMaker V3 ModelBuilder.
Prerequisites#
Note: Ensure you have sagemaker and ipywidgets installed in your environment. The ipywidgets package is required to monitor endpoint deployment progress in Jupyter notebooks.
# Import required libraries
import json
import uuid
import tempfile
import os
import torch
import torch.nn as nn
from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.core.resources import EndpointConfig
Step 1: Create a Simple PyTorch Model#
First, let’s create a simple neural network model for demonstration.
class SimpleModel(nn.Module):
    """Tiny 4-in / 2-out classifier: one linear layer followed by softmax."""

    def __init__(self):
        super().__init__()
        # Single fully-connected layer mapping 4 input features to 2 class scores.
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        # Turn the raw class scores into probabilities along the class axis.
        logits = self.linear(x)
        return torch.softmax(logits, dim=1)
# Instantiate the demo network and pick a scratch directory for its artifacts.
pytorch_model = SimpleModel()
model_path = tempfile.mkdtemp()

# Trace the model with a representative input so it can be serialized as
# TorchScript — a self-contained artifact that loads without the Python class.
example_batch = torch.tensor([[0.1, 0.2, 0.3, 0.4]], dtype=torch.float32)
traced = torch.jit.trace(pytorch_model, example_batch)

model_file = os.path.join(model_path, "model.pth")
torch.jit.save(traced, model_file)
print(f"Model saved to: {model_file}")
Step 2: Define Custom InferenceSpec#
Create a custom InferenceSpec that defines how to load and run inference with our model.
class SimpleModelSpec(InferenceSpec):
    """InferenceSpec describing how to load and invoke the demo model."""

    def load(self, model_dir: str):
        """Load the TorchScript artifact from ``model_dir``.

        Falls back to a freshly-initialized (untrained) SimpleModel when the
        artifact file is missing, so the container still starts.
        """
        model = SimpleModel()
        artifact = os.path.join(model_dir, "model.pth")
        if os.path.exists(artifact):
            model = torch.jit.load(artifact, map_location='cpu')
        model.eval()
        return model

    def invoke(self, input_object: object, model: object):
        """Run one forward pass and return predictions as nested lists."""
        if isinstance(input_object, list):
            batch = torch.tensor(input_object, dtype=torch.float32)
        else:
            # Unexpected payload type: substitute a canned input so the
            # demo endpoint still responds instead of erroring out.
            batch = torch.tensor([[0.1, 0.2, 0.3, 0.4]], dtype=torch.float32)
        with torch.no_grad():
            output = model(batch)
        return output.tolist()
# Confirmation that the spec class above is now defined in the notebook kernel.
print("Custom InferenceSpec defined successfully!")
Step 3: Create Schema Builder#
Define the input/output schema for our model.
# Create schema builder with sample input/output.
# The samples tell SchemaBuilder what JSON shapes to expect at the endpoint:
# a batch of 4-feature rows in, a batch of 2-class probability rows out.
sample_input = [[0.1, 0.2, 0.3, 0.4]]  # List format for JSON serialization
sample_output = [[0.9, 0.1]]  # Expected output format (one probability row)
schema_builder = SchemaBuilder(sample_input, sample_output)
print("Schema builder created successfully!")
Step 4: Configure ModelBuilder#
Set up the ModelBuilder with our custom InferenceSpec.
# Naming configuration for the resources this notebook creates.
MODEL_NAME_PREFIX = "custom-spec-model"
ENDPOINT_NAME_PREFIX = "custom-spec-endpoint"

# Suffix every resource name with a short random token so repeated runs of
# the notebook never collide on AWS resource names.
unique_id = str(uuid.uuid4())[:8]
model_name = "-".join([MODEL_NAME_PREFIX, unique_id])
endpoint_name = "-".join([ENDPOINT_NAME_PREFIX, unique_id])
# Wire the custom spec, artifacts, schema, and server choice into ModelBuilder.
inference_spec = SimpleModelSpec()
model_builder = ModelBuilder(
    model_path=model_path,
    model_server=ModelServer.TORCHSERVE,
    schema_builder=schema_builder,
    inference_spec=inference_spec,
)
print(f"ModelBuilder configured for model: {model_name}")
print(f"Target endpoint: {endpoint_name}")
Step 5: Build the Model#
Build the model artifacts for deployment.
# Build the model.
# build() packages the InferenceSpec, schema, and the artifacts under
# model_path into a deployable SageMaker model resource named model_name.
core_model = model_builder.build(model_name=model_name)
print(f"Model Successfully Created: {core_model.model_name}")
Step 6: Deploy the Model#
Deploy the model to a SageMaker endpoint.
# Deploy the model.
# NOTE(review): deploy() provisions a live endpoint and can take several
# minutes; it incurs AWS charges until the cleanup step below runs.
core_endpoint = model_builder.deploy(endpoint_name=endpoint_name)
print(f"Endpoint Successfully Created: {core_endpoint.endpoint_name}")
Step 7: Test the Model#
Send test requests to verify the model works correctly.
def _predict(endpoint, payload):
    """Send *payload* (a list of 4-feature rows) to *endpoint* as JSON and
    return the decoded list of class-probability rows."""
    response = endpoint.invoke(
        body=json.dumps(payload),
        content_type="application/json",
    )
    return json.loads(response.body.read().decode('utf-8'))

# Test 1: Single prediction
test_data_1 = [[0.1, 0.2, 0.3, 0.4]]
prediction_1 = _predict(core_endpoint, test_data_1)
print(f"Single Prediction: {prediction_1}")

# Test 2: Batch prediction
test_data_2 = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.2, 0.3, 0.4, 0.5],
]
prediction_2 = _predict(core_endpoint, test_data_2)
print(f"Batch Prediction: {prediction_2}")
Step 8: Clean Up Resources#
Clean up all created resources and temporary files.
# Clean up AWS resources.
# NOTE(review): this assumes deploy() created an endpoint config whose name
# equals the endpoint name — confirm against ModelBuilder's deploy behavior.
core_endpoint_config = EndpointConfig.get(endpoint_config_name=core_endpoint.endpoint_name)
# Fetch the config handle first (above), then delete model, endpoint, config.
core_model.delete()
core_endpoint.delete()
core_endpoint_config.delete()
# Clean up temporary files (the traced-model directory from Step 1).
import shutil
shutil.rmtree(model_path)
print("All resources and temporary files successfully deleted!")
Summary#
This notebook demonstrated:
Creating a simple PyTorch model
Defining a custom InferenceSpec with load() and invoke() methods
Setting up schema builders for input/output validation
Configuring ModelBuilder with TorchServe
Building and deploying the model
Testing both single and batch predictions
Proper cleanup of resources
Custom InferenceSpecs provide maximum flexibility for deploying any model with custom preprocessing, postprocessing, and inference logic!