SageMaker V3 In-Process Mode Example#

This notebook demonstrates how to use SageMaker V3 ModelBuilder in In-Process mode for fast local development and testing.

# Import required libraries
import uuid

from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.mode.function_pointers import Mode

Step 1: Define a Simple InferenceSpec#

Create a lightweight InferenceSpec for mathematical operations - perfect for in-process testing.

class MathInferenceSpec(InferenceSpec):
    """Simple math operations for IN_PROCESS testing."""
    
    def load(self, model_dir: str):
        """Load a simple math 'model'."""
        return {"operation": "multiply", "factor": 2.0}
    
    def invoke(self, input_object, model):
        """Perform math operation."""
        if isinstance(input_object, dict) and "numbers" in input_object:
            numbers = input_object["numbers"]
        elif isinstance(input_object, list):
            numbers = input_object
        else:
            numbers = [float(input_object)]
        
        factor = model["factor"]
        result = [num * factor for num in numbers]
        
        return {"result": result, "operation": f"multiply by {factor}"}

print("Math InferenceSpec defined successfully!")
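
Because in-process mode keeps everything in plain Python, you can sanity-check the load/invoke logic directly before handing it to ModelBuilder. The sketch below mirrors `MathInferenceSpec` but drops the SageMaker base class so it runs without the SDK installed (the name `MathSpecSketch` is illustrative, not part of the SDK):

```python
# Standalone sketch of MathInferenceSpec's logic; the SageMaker
# InferenceSpec base class is omitted so this runs without the SDK.
class MathSpecSketch:
    def load(self, model_dir: str):
        """Return the same toy 'model' as MathInferenceSpec."""
        return {"operation": "multiply", "factor": 2.0}

    def invoke(self, input_object, model):
        """Normalize the input to a list of numbers, then scale it."""
        if isinstance(input_object, dict) and "numbers" in input_object:
            numbers = input_object["numbers"]
        elif isinstance(input_object, list):
            numbers = input_object
        else:
            numbers = [float(input_object)]
        factor = model["factor"]
        return {"result": [n * factor for n in numbers],
                "operation": f"multiply by {factor}"}

spec = MathSpecSketch()
model = spec.load("/tmp")
out = spec.invoke({"numbers": [1.0, 2.0, 3.0]}, model)
print(out["result"])  # [2.0, 4.0, 6.0]
```

This exercises all three input-normalization branches that the endpoint tests in Step 6 will hit.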

Step 2: Create Schema Builder#

Define the expected input and output formats by passing sample payloads; SchemaBuilder uses them to derive how requests and responses are serialized.

# Create schema builder for math operations
sample_input = {"numbers": [1.0, 2.0, 3.0]}
sample_output = {"result": [2.0, 4.0, 6.0], "operation": "multiply by 2.0"}
schema_builder = SchemaBuilder(sample_input, sample_output)

print("Schema builder created successfully!")
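
Since the endpoint will later be invoked with `content_type="application/json"`, it is worth confirming the sample payloads are JSON-serializable. A quick standalone check (independent of SchemaBuilder itself):

```python
import json

# The same sample payloads passed to SchemaBuilder above
sample_input = {"numbers": [1.0, 2.0, 3.0]}
sample_output = {"result": [2.0, 4.0, 6.0], "operation": "multiply by 2.0"}

# Both round-trip cleanly through JSON, matching the
# "application/json" content type used at invoke time
for payload in (sample_input, sample_output):
    assert json.loads(json.dumps(payload)) == payload

print("Sample payloads are JSON round-trip safe")
```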

Step 3: Configure ModelBuilder for In-Process Mode#

Set up ModelBuilder to run in IN_PROCESS mode for fast local execution.

# Configuration
MODEL_NAME_PREFIX = "inprocess-math-model"
ENDPOINT_NAME_PREFIX = "inprocess-math-endpoint"

# Generate unique identifiers
unique_id = str(uuid.uuid4())[:8]
model_name = f"{MODEL_NAME_PREFIX}-{unique_id}"
endpoint_name = f"{ENDPOINT_NAME_PREFIX}-{unique_id}"

# Create ModelBuilder in IN_PROCESS mode
inference_spec = MathInferenceSpec()
model_builder = ModelBuilder(
    inference_spec=inference_spec,
    schema_builder=schema_builder,
    mode=Mode.IN_PROCESS  # This is the key difference!
)

print(f"ModelBuilder configured for in-process model: {model_name}")
print(f"Target endpoint: {endpoint_name}")

Step 4: Build the Model#

Build the model - this is very fast in in-process mode!

# Build the model
core_model = model_builder.build(model_name=model_name)
print(f"Model Successfully Created: {core_model.model_name}")

Step 5: Deploy Locally#

Deploy the model locally - no containers or AWS resources needed!

# Deploy locally in in-process mode
local_endpoint = model_builder.deploy_local(endpoint_name=endpoint_name)
print(f"Local Endpoint Successfully Created: {local_endpoint.endpoint_name}")
print("Note: This runs entirely in your Python process - no containers!")

Step 6: Test the In-Process Model#

Test various mathematical operations with instant responses.

# Test 1: Simple multiplication
test_data_1 = {"numbers": [1.0, 2.0, 3.0]}

result_1 = local_endpoint.invoke(
    body=test_data_1,
    content_type="application/json"
)

print(f"Test 1 - Simple multiplication: {result_1.body}")

# Test 2: Larger numbers
test_data_2 = {"numbers": [10.5, 20.3, 30.7, 40.1]}

result_2 = local_endpoint.invoke(
    body=test_data_2,
    content_type="application/json"
)

print(f"Test 2 - Larger numbers: {result_2.body}")

# Test 3: Single number (alternative input format)
test_data_3 = [5.0]  # Direct list format

result_3 = local_endpoint.invoke(
    body=test_data_3,
    content_type="application/json"
)

print(f"Test 3 - Single number: {result_3.body}")

Step 7: Performance Testing#

Demonstrate the speed advantage of in-process mode.

import time

# Performance test - multiple rapid requests
# time.perf_counter() is preferred over time.time() for benchmarking
start_time = time.perf_counter()
num_requests = 100

for i in range(num_requests):
    test_data = {"numbers": [i, i + 1, i + 2]}
    result = local_endpoint.invoke(
        body=test_data,
        content_type="application/json"
    )

total_time = time.perf_counter() - start_time
avg_time = total_time / num_requests

print("Performance Test Results:")
print(f"- Total requests: {num_requests}")
print(f"- Total time: {total_time:.3f} seconds")
print(f"- Average time per request: {avg_time*1000:.2f} ms")
print(f"- Requests per second: {num_requests/total_time:.1f}")
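
For comparison, the arithmetic itself is nearly free; the sketch below times the raw multiply logic with no serving layer, so you can gauge how much of each request is framework overhead versus compute (timings will vary by machine, so no expected output is shown):

```python
import time

def multiply(numbers, factor=2.0):
    """The same math the InferenceSpec performs, with no serving layer."""
    return [n * factor for n in numbers]

start = time.perf_counter()
for i in range(100):
    multiply([i, i + 1, i + 2])
direct_time = time.perf_counter() - start

print(f"100 direct calls took {direct_time * 1000:.3f} ms total")
```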

Step 8: Clean Up#

Clean up the in-process resources (very fast!).

# Clean up in-process endpoint
if local_endpoint and getattr(local_endpoint, 'in_process_mode_obj', None):
    local_endpoint.in_process_mode_obj.destroy_server()

print("In-process model and endpoint successfully cleaned up!")
print("Note: No AWS resources were created, so no cloud cleanup needed.")

Summary#

This notebook demonstrated:

  1. Creating a simple InferenceSpec for mathematical operations

  2. Configuring ModelBuilder for IN_PROCESS mode

  3. Building and deploying models locally without containers

  4. Testing with various input formats

  5. Performance testing showing the speed of in-process execution

  6. Quick cleanup with no AWS resources

Benefits of In-Process Mode#

  • Ultra-fast: No container startup time

  • No AWS costs: Runs entirely locally

  • Perfect for development: Rapid iteration and testing

  • Easy debugging: Direct access to Python objects

  • Lightweight: Minimal resource usage

Use in-process mode for development, testing, and lightweight inference tasks!