Intelligent Defaults and Logging Configuration in SageMakerCore#


Introduction#

In this notebook, we will walk through the setup and usage of Intelligent Defaults in the SageMakerCore SDK. Additionally, this notebook contains a section on configuring logging levels to assist in debugging issues that arise while using the SDK.

Intelligent Defaults#

Intelligent Defaults is a feature of the SageMakerCore SDK that lets users define default values to be auto-populated into AWS API request parameters. For example, if a user or admin wants all of their AWS resources to use a specific VPC config during creation, this can be defined in the Intelligent Defaults config. Intelligent Defaults supports:

  1. GlobalDefaults - default values applied across SageMaker API calls

  2. Resource Specific Defaults - defaults applied only when creating a specific resource

An example of the structure of the Intelligent Defaults config is below:

{
    "SchemaVersion": "1.0",
    "SageMaker": {
        "PythonSDK": {
            "Resources": {
                "GlobalDefaults": {
                    "vpc_config": {
                        "security_group_ids": [
                            "sg-xxxxxxxxxxxxxxxxx" // Replace with security group id
                        ],
                        "subnets": [
                            "subnet-xxxxxxxxxxxxxxxxx", // Replace with subnet id
                            "subnet-xxxxxxxxxxxxxxxxx" // Replace with subnet id
                        ]
                    }
                    // ...
                },
                "TrainingJob": {
                    "role_arn": "arn:aws:xxxxxxxxxxx:role/xxxxx", // Replace with role arn
                    "output_data_config": {
                        "s3_output_path": "s3://xxxxxxxxxxx" // Replace with S3 URI
                    },
                    // ...
                }
            }
        }
    }
}
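To illustrate how the two scopes interact, below is a minimal, hypothetical sketch of the merge behavior, with resource-specific defaults layered on top of GlobalDefaults. This is not the SDK's actual implementation, only the general idea; `resolve_defaults` and the sample values are illustrative.

```python
# Hypothetical sketch: resource-specific defaults override GlobalDefaults.
# Not the SDK's real resolution logic -- for illustration only.
def resolve_defaults(config, resource_name):
    resources = config["SageMaker"]["PythonSDK"]["Resources"]
    merged = dict(resources.get("GlobalDefaults", {}))   # start with global defaults
    merged.update(resources.get(resource_name, {}))      # resource defaults win on conflict
    return merged

config = {
    "SchemaVersion": "1.0",
    "SageMaker": {"PythonSDK": {"Resources": {
        "GlobalDefaults": {"vpc_config": {"subnets": ["subnet-123"]}},
        "TrainingJob": {"role_arn": "arn:aws:iam::111122223333:role/demo"},
    }}},
}

defaults = resolve_defaults(config, "TrainingJob")
print(sorted(defaults))  # ['role_arn', 'vpc_config']
```

Here a TrainingJob ends up with both its own `role_arn` default and the global `vpc_config`, which matches the behavior described later in this notebook.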

Logging Levels#

To assist in debugging issues originating within the SDK, SageMakerCore provides a simple utility method - configure_logging()

To set the logging level users have 2 options:

  1. Pass the desired log level as a string parameter to the utility method - configure_logging("DEBUG")

  2. Set the LOG_LEVEL=INFO environment variable and call configure_logging() without a parameter

In a later section of this notebook, we will walk through how these options look in practice.

Pre-Requisites#

Install Latest SageMakerCore#

All SageMakerCore beta distributions will be released to a private s3 bucket. After being allowlisted, run the cells below to install the latest version of SageMakerCore from s3://sagemaker-core-beta-artifacts/sagemaker_core-latest.tar.gz

Ensure you are using a kernel with Python version >= 3.8
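The version requirement can be checked directly from the notebook kernel:

```python
import sys

# Fail fast if the kernel's Python version is below the supported minimum
assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version.split()[0]}"
print("Python version OK:", sys.version.split()[0])
```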

# Uninstall previous version of sagemaker-core and restart kernel
!pip uninstall sagemaker-core -y
# Install the latest version of sagemaker-core

!pip install sagemaker-core --upgrade
# Check the version of sagemaker-core
!pip show -v sagemaker-core

Install Additional Packages#

# Install additional packages

!pip install -U scikit-learn pandas boto3

Setup#

Let’s start by specifying:

  • AWS region.

  • The IAM role ARN used to give learning and hosting access to your data. Ensure your environment has AWS credentials configured.

  • The S3 bucket that you want to use for storing training and model data.

from sagemaker.core.helper.session_helper import Session, get_execution_role
from rich import print

# Get region, role, bucket

sagemaker_session = Session()
region = sagemaker_session.boto_region_name
role = get_execution_role()
bucket = sagemaker_session.default_bucket()
print(role)

Load and Prepare Dataset#

For this example, we will be using the Iris dataset from sklearn.datasets to train a model with the XGBoost container.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

import pandas as pd

# Get IRIS Data

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target
import os

# Prepare Data

os.makedirs('./data', exist_ok=True)

iris_df = iris_df[['target'] + [col for col in iris_df.columns if col != 'target']]

train_data, test_data = train_test_split(iris_df, test_size=0.2, random_state=42)

train_data.to_csv('./data/train.csv', index=False, header=False)

Upload Data to S3#

In this step, we will upload the training data to the S3 bucket configured earlier using sagemaker_session.default_bucket()

# Upload Data

prefix = "DEMO-scikit-iris"
TRAIN_DATA = "train.csv"
DATA_DIRECTORY = "data"

train_input = sagemaker_session.upload_data(
    DATA_DIRECTORY, bucket=bucket, key_prefix="{}/{}".format(prefix, DATA_DIRECTORY)
)

s3_input_path = "s3://{}/{}/data/{}".format(bucket, prefix, TRAIN_DATA)
s3_output_path = "s3://{}/{}/output".format(bucket, prefix)

print(s3_input_path)
print(s3_output_path)

Fetch the XGBoost Image URI#

In this step, we will fetch the XGBoost Image URI we will use as an input parameter when creating an AWS TrainingJob

from sagemaker.core import image_uris

image = image_uris.retrieve(
    framework="xgboost",
    region=region,
    version='latest'
)

Intelligent Defaults#

Create Intelligent Defaults JSON#

In order for SageMakerCore to pick up the Intelligent Defaults configs and populate API calls, we must first create the JSON config file and set the SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE environment variable.

Below we will create the config file at data/defaults.json and assign its path to the SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE environment variable.

import os
import json

DEFAULTS_CONTENT = {
    "SchemaVersion": "1.0",
    "SageMaker": {
        "PythonSDK": {
            "Resources": {
                "GlobalDefaults": {
                    "vpc_config": {
                        "security_group_ids": [
                            "sg-xxxxxxxxxxxxxxxxx" # Replace with security group id
                        ],
                        "subnets": [
                            "subnet-xxxxxxxxxxxxxxxxx", # Replace with subnet id
                            "subnet-xxxxxxxxxxxxxxxxx" # Replace with subnet id
                        ]
                    }
                },
                "TrainingJob": {
                    "role_arn": role,
                    "output_data_config": {
                        "s3_output_path": s3_output_path
                    }
                }
            }
        }
    }
}

path_to_defaults = os.path.join(DATA_DIRECTORY, "defaults.json")
with open(path_to_defaults, "w") as f:
    json.dump(DEFAULTS_CONTENT, f, indent=4)

# Set the path of the config file in an environment variable
os.environ['SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE'] = path_to_defaults
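Note that, unlike the annotated example shown earlier, the defaults file must be strictly valid JSON (no // comments, no trailing commas). A standalone sanity check like the one below can catch a malformed config early; it writes a throwaway config to a temporary file and reads it back, without touching data/defaults.json or the real SAGEMAKER_CORE_ADMIN_CONFIG_OVERRIDE variable.

```python
import json
import os
import tempfile

# Throwaway config used only for this round-trip check
minimal_config = {"SchemaVersion": "1.0", "SageMaker": {"PythonSDK": {"Resources": {}}}}

# Write the config to a temporary file, as the notebook does for data/defaults.json
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(minimal_config, f)
    tmp_path = f.name

# Read it back; json.load raises ValueError on comments or trailing commas
with open(tmp_path) as f:
    loaded = json.load(f)

print(loaded == minimal_config)  # True
os.remove(tmp_path)
```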

Using GlobalDefaults#

In the below example, a Cluster resource will be created using the vpc_config defined under the SageMaker.PythonSDK.Resources.GlobalDefaults.

import time
from sagemaker.core.resources import Cluster
from sagemaker.core.shapes import ClusterInstanceGroupSpecification, ClusterLifeCycleConfig
    
cluster_name_v3 = 'xgboost-cluster-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

# Use vpc_config from Intelligent Defaults JSON config file under the SageMaker.PythonSDK.Resources.GlobalDefaults key
cluster = Cluster.create(
    cluster_name=cluster_name_v3,
    instance_groups=[
        ClusterInstanceGroupSpecification(
            instance_count=1, 
            instance_group_name="instance-group-11",
            instance_type="ml.m5.4xlarge",
            life_cycle_config=ClusterLifeCycleConfig(source_s3_uri=s3_input_path, on_create="dothis"),
            execution_role=role
        )
    ]
)
cluster.wait_for_status("InService")

Using Resource Defaults#

In the below example, a TrainingJob resource will be created using the role_arn and output_data_config defined under the SageMaker.PythonSDK.Resources.TrainingJob key.

Note: Because TrainingJob also accepts a vpc_config parameter, the vpc_config parameter will be populated from the GlobalDefaults

import time
from sagemaker.core.resources import TrainingJob
from sagemaker.core.shapes import AlgorithmSpecification, Channel, DataSource, S3DataSource, ResourceConfig, StoppingCondition

job_name_v3 = 'xgboost-iris-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

# Use role and output_data_config from Intelligent Defaults JSON config file under the SageMaker.PythonSDK.Resources.TrainingJob key
# Use vpc_config from Intelligent Defaults JSON config file under the SageMaker.PythonSDK.Resources.GlobalDefaults key

training_job = TrainingJob.create(
    training_job_name=job_name_v3,
    hyper_parameters={
        'objective': 'multi:softmax',
        'num_class': '3',
        'num_round': '10',
        'eval_metric': 'merror'
    },
    algorithm_specification=AlgorithmSpecification(
        training_image=image,
        training_input_mode='File'
    ),
    input_data_config=[
        Channel(
            channel_name='train',
            content_type='csv',
            compression_type='None',
            record_wrapper_type='None',
            data_source=DataSource(
                s3_data_source=S3DataSource(
                    s3_data_type='S3Prefix',
                    s3_uri=s3_input_path,
                    s3_data_distribution_type='FullyReplicated'
                )
            )
        )
    ],
    resource_config=ResourceConfig(
        instance_type='ml.m4.xlarge',
        instance_count=1,
        volume_size_in_gb=30
    ),
    stopping_condition=StoppingCondition(
        max_runtime_in_seconds=600
    )
)

Configure Logging Levels#

Below are 2 examples of how a SageMakerCore user could configure the logging level of the SDK to assist with debugging.

To set the logging level users have 2 options:

  1. Pass the desired log level as a string parameter to the utility method - configure_logging("DEBUG")

  2. Set the LOG_LEVEL=INFO environment variable and call configure_logging() without a parameter

Configure Logging with Parameter#

# Setting log_level to DEBUG using configure_logging with string parameter 
from sagemaker.core.utils import configure_logging

configure_logging('DEBUG')
# Get TrainingJob with DEBUG log_level
from sagemaker.core.resources import TrainingJob

training_job = TrainingJob.get(job_name_v3)

Configure Logging with Environment Variable#

# Setting log_level to INFO using an environment variable.
# Note: a shell command like `!export LOG_LEVEL=INFO` runs in a subshell
# and does not affect the kernel, so set the variable via os.environ instead.
import os
os.environ['LOG_LEVEL'] = 'INFO'

configure_logging()
# List TrainingJobs with INFO log_level
training_job = TrainingJob.get(job_name_v3)

Delete All SageMaker Resources#

The following code block will call the delete() method for any SageMakerCore resources created during the execution of this notebook that were assigned to local or global variables. If you created any additional deletable resources without assigning the returned object to a unique variable, you will need to delete the resource manually by doing something like:

resource = Resource.get("resource-name")
resource.delete()
# Delete any sagemaker core resource objects created in this notebook
def delete_all_sagemaker_resources():
    # Note: locals() inside this function would be empty, so scan the
    # notebook's global namespace for SageMakerCore resource objects
    all_objects = list(globals().values())
    deletable_objects = [
        obj for obj in all_objects
        if hasattr(obj, 'delete') and obj.__class__.__module__ == 'sagemaker.core.resources'
    ]

    for obj in deletable_objects:
        obj.delete()

delete_all_sagemaker_resources()