SageMaker V3 In-Process Mode Example#
This notebook demonstrates how to use SageMaker V3 ModelBuilder in In-Process mode for fast local development and testing.
# Import required libraries
import uuid
from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.mode.function_pointers import Mode
Step 1: Define a Simple InferenceSpec#
Create a lightweight InferenceSpec for mathematical operations; its minimal dependencies make it ideal for in-process testing.
class MathInferenceSpec(InferenceSpec):
    """Simple math operations for IN_PROCESS testing."""

    def load(self, model_dir: str):
        """Load a simple math 'model'."""
        return {"operation": "multiply", "factor": 2.0}

    def invoke(self, input_object, model):
        """Perform the math operation on the input."""
        if isinstance(input_object, dict) and "numbers" in input_object:
            numbers = input_object["numbers"]
        elif isinstance(input_object, list):
            numbers = input_object
        else:
            numbers = [float(input_object)]
        factor = model["factor"]
        result = [num * factor for num in numbers]
        return {"result": result, "operation": f"multiply by {factor}"}
print("Math InferenceSpec defined successfully!")
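Before involving ModelBuilder at all, the load/invoke logic can be sanity-checked as plain Python. The sketch below duplicates the spec's two methods as free functions (the `InferenceSpec` base class adds no behavior this check relies on); the names `load` and `invoke` simply mirror the class above.

```python
# Plain-Python sanity check of the load/invoke logic -- no SageMaker imports needed.
# These free functions mirror MathInferenceSpec's methods for quick testing.

def load(model_dir: str):
    """Return the same 'model' dict the spec loads."""
    return {"operation": "multiply", "factor": 2.0}

def invoke(input_object, model):
    """Same input handling as MathInferenceSpec.invoke."""
    if isinstance(input_object, dict) and "numbers" in input_object:
        numbers = input_object["numbers"]
    elif isinstance(input_object, list):
        numbers = input_object
    else:
        numbers = [float(input_object)]
    factor = model["factor"]
    return {"result": [n * factor for n in numbers], "operation": f"multiply by {factor}"}

model = load("/tmp")
print(invoke({"numbers": [1.0, 2.0, 3.0]}, model))  # {'result': [2.0, 4.0, 6.0], 'operation': 'multiply by 2.0'}
```

All three input shapes (dict, list, scalar) can be exercised this way in milliseconds.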
Step 2: Create Schema Builder#
Define the expected input and output format.
# Create schema builder for math operations
sample_input = {"numbers": [1.0, 2.0, 3.0]}
sample_output = {"result": [2.0, 4.0, 6.0], "operation": "multiply by 2.0"}
schema_builder = SchemaBuilder(sample_input, sample_output)
print("Schema builder created successfully!")
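Since SchemaBuilder derives its request/response translation from these sample payloads, a quick sanity check is that both samples survive a JSON round trip; a minimal standard-library sketch:

```python
import json

sample_input = {"numbers": [1.0, 2.0, 3.0]}
sample_output = {"result": [2.0, 4.0, 6.0], "operation": "multiply by 2.0"}

# Both payloads must round-trip through JSON cleanly, since requests and
# responses will be serialized this way when content_type is application/json.
assert json.loads(json.dumps(sample_input)) == sample_input
assert json.loads(json.dumps(sample_output)) == sample_output
print("Sample payloads are JSON round-trippable.")
```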
Step 3: Configure ModelBuilder for In-Process Mode#
Set up ModelBuilder to run in IN_PROCESS mode for fast local execution.
# Configuration
MODEL_NAME_PREFIX = "inprocess-math-model"
ENDPOINT_NAME_PREFIX = "inprocess-math-endpoint"
# Generate unique identifiers
unique_id = str(uuid.uuid4())[:8]
model_name = f"{MODEL_NAME_PREFIX}-{unique_id}"
endpoint_name = f"{ENDPOINT_NAME_PREFIX}-{unique_id}"
# Create ModelBuilder in IN_PROCESS mode
inference_spec = MathInferenceSpec()
model_builder = ModelBuilder(
    inference_spec=inference_spec,
    schema_builder=schema_builder,
    mode=Mode.IN_PROCESS,  # This is the key difference!
)
print(f"ModelBuilder configured for in-process model: {model_name}")
print(f"Target endpoint: {endpoint_name}")
Step 4: Build the Model#
Build the model. This step is very fast in in-process mode because no container image is built or pulled.
# Build the model
core_model = model_builder.build(model_name=model_name)
print(f"Model Successfully Created: {core_model.model_name}")
Step 5: Deploy Locally#
Deploy the model locally; in-process mode needs no containers or AWS resources.
# Deploy locally in in-process mode
local_endpoint = model_builder.deploy_local(endpoint_name=endpoint_name)
print(f"Local Endpoint Successfully Created: {local_endpoint.endpoint_name}")
print("Note: This runs entirely in your Python process - no containers!")
Step 6: Test the In-Process Model#
Test various mathematical operations with instant responses.
# Test 1: Simple multiplication
test_data_1 = {"numbers": [1.0, 2.0, 3.0]}
result_1 = local_endpoint.invoke(
    body=test_data_1,
    content_type="application/json",
)
print(f"Test 1 - Simple multiplication: {result_1.body}")
# Test 2: Larger numbers
test_data_2 = {"numbers": [10.5, 20.3, 30.7, 40.1]}
result_2 = local_endpoint.invoke(
    body=test_data_2,
    content_type="application/json",
)
print(f"Test 2 - Larger numbers: {result_2.body}")
# Test 3: Single number (alternative input format)
test_data_3 = [5.0] # Direct list format
result_3 = local_endpoint.invoke(
    body=test_data_3,
    content_type="application/json",
)
print(f"Test 3 - Single number: {result_3.body}")
Step 7: Performance Testing#
Demonstrate the speed advantage of in-process mode.
import time

# Performance test: many rapid requests back to back
start_time = time.perf_counter()  # perf_counter is the right clock for interval timing
num_requests = 100

for i in range(num_requests):
    test_data = {"numbers": [i, i + 1, i + 2]}
    result = local_endpoint.invoke(
        body=test_data,
        content_type="application/json",
    )

end_time = time.perf_counter()
total_time = end_time - start_time
avg_time = total_time / num_requests
print("Performance Test Results:")
print(f"- Total requests: {num_requests}")
print(f"- Total time: {total_time:.3f} seconds")
print(f"- Average time per request: {avg_time*1000:.2f} ms")
print(f"- Requests per second: {num_requests/total_time:.1f}")
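Averages can hide outliers, so a helper that also reports the 95th-percentile latency gives a fuller picture. The sketch below is plain Python and is demonstrated against a stand-in workload; to apply it here, pass a lambda that calls `local_endpoint.invoke(...)` instead.

```python
import statistics
import time

def latency_stats(fn, n=100):
    """Time n calls to fn and return (mean_ms, p95_ms)."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    p95 = samples[int(0.95 * (len(samples) - 1))]  # nearest-rank 95th percentile
    return statistics.mean(samples), p95

# Stand-in workload; swap in lambda: local_endpoint.invoke(body=..., content_type=...)
mean_ms, p95_ms = latency_stats(lambda: sum(i * 2.0 for i in range(100)))
print(f"mean={mean_ms:.3f} ms  p95={p95_ms:.3f} ms")
```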
Step 8: Clean Up#
Clean up the in-process resources (very fast!).
# Clean up in-process endpoint
if local_endpoint and getattr(local_endpoint, "in_process_mode_obj", None):
    local_endpoint.in_process_mode_obj.destroy_server()
print("In-process model and endpoint successfully cleaned up!")
print("Note: No AWS resources were created, so no cloud cleanup needed.")
Summary#
This notebook demonstrated:

- Creating a simple InferenceSpec for mathematical operations
- Configuring ModelBuilder for IN_PROCESS mode
- Building and deploying models locally without containers
- Testing with various input formats
- Performance testing showing the speed of in-process execution
- Quick cleanup with no AWS resources
Benefits of In-Process Mode#

- Ultra-fast: no container startup time
- No AWS costs: runs entirely locally
- Perfect for development: rapid iteration and testing
- Easy debugging: direct access to Python objects
- Lightweight: minimal resource usage
Use in-process mode for development, testing, and lightweight inference tasks!