sagemaker.core.debugger.metrics_config

The various types of metrics configurations that can be specified in FrameworkProfile.

Classes

DataloaderProfilingConfig([start_step, ...])

The configuration for framework metrics to be collected for data loader profiling.

DetailedProfilingConfig([start_step, ...])

The configuration for framework metrics to be collected for detailed profiling.

HorovodProfilingConfig([start_step, ...])

The configuration for framework metrics from Horovod distributed training.

MetricsConfigBase(name, start_step, ...)

The base class for the metrics configuration.

PythonProfilingConfig([start_step, ...])

The configuration for framework metrics to be collected for Python profiling.

SMDataParallelProfilingConfig([start_step, ...])

Configuration for framework metrics collected from a SageMaker Distributed training job.

StepRange(start_step, num_steps)

Configuration for the range of steps to profile.

TimeRange(start_unix_time, duration)

Configuration for the range of Unix time to profile.

class sagemaker.core.debugger.metrics_config.DataloaderProfilingConfig(start_step=None, num_steps=None, start_unix_time=None, duration=None, profile_default_steps=False, metrics_regex='.*')

Bases: MetricsConfigBase

The configuration for framework metrics to be collected for data loader profiling.

class sagemaker.core.debugger.metrics_config.DetailedProfilingConfig(start_step=None, num_steps=None, start_unix_time=None, duration=None, profile_default_steps=False)

Bases: MetricsConfigBase

The configuration for framework metrics to be collected for detailed profiling.

class sagemaker.core.debugger.metrics_config.HorovodProfilingConfig(start_step=None, num_steps=None, start_unix_time=None, duration=None, profile_default_steps=False)

Bases: MetricsConfigBase

The configuration for framework metrics from Horovod distributed training.

class sagemaker.core.debugger.metrics_config.MetricsConfigBase(name, start_step, num_steps, start_unix_time, duration)

Bases: object

The base class for the metrics configuration.

It determines the step or time range to profile and validates the input value pairs. The available profiling range parameter pairs are (start_step, num_steps) and (start_unix_time, duration). The two pairs are mutually exclusive, and this class checks that only one of them is used. If values from both pairs are specified, a FOUND_BOTH_STEP_AND_TIME_FIELDS error occurs.
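The pairing rule can be illustrated with a minimal, self-contained sketch. It does not import or reproduce the SageMaker implementation: the function name `select_profiling_range`, the returned tuples, and the use of `ValueError` are hypothetical, and only the error name FOUND_BOTH_STEP_AND_TIME_FIELDS comes from the description above.

```python
from typing import Optional, Tuple


def select_profiling_range(
    start_step: Optional[int] = None,
    num_steps: Optional[int] = None,
    start_unix_time: Optional[int] = None,
    duration: Optional[float] = None,
) -> Tuple[str, tuple]:
    """Return which parameter pair is in use (hypothetical helper, not the library API)."""
    has_step_range = start_step is not None or num_steps is not None
    has_time_range = start_unix_time is not None or duration is not None

    # The two pairs are mutually exclusive, so reject a mix of both.
    if has_step_range and has_time_range:
        raise ValueError("FOUND_BOTH_STEP_AND_TIME_FIELDS")

    if has_step_range:
        return ("steps", (start_step, num_steps))
    # Assumption: with no step values given, fall back to the time pair.
    return ("time", (start_unix_time, duration))


step_choice = select_profiling_range(start_step=5, num_steps=2)
time_choice = select_profiling_range(start_unix_time=1700000000, duration=60)

# Mixing one value from each pair triggers the error.
try:
    select_profiling_range(start_step=5, duration=60)
    mixed_error = None
except ValueError as err:
    mixed_error = str(err)
```

In this sketch, supplying even a single value from each pair counts as using both pairs. Whether the library treats partial pairs the same way is an assumption.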

to_json_string()

Convert this metrics configuration to a dictionary formatted as a string.

Calling eval on the return value is the same as calling _to_json directly.

Returns:

This metrics configuration as a dictionary, formatted as a string.

Return type:

str
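The eval remark above suggests that the returned string is a Python-literal rendering of the dictionary. The sketch below shows that round trip on a stand-in dictionary, assuming that reading holds. The keys and values are hypothetical, and `ast.literal_eval` is used here as a safer substitute for `eval` when the string contains only literals.

```python
import ast

# Stand-in for the dictionary a metrics configuration would produce
# (hypothetical keys, not taken from the library).
config_dict = {"StartStep": 5, "NumSteps": 2}

# Assumption: the string form is the dictionary's literal representation.
config_string = repr(config_dict)

# literal_eval parses Python literals only, so it cannot run arbitrary code.
round_tripped = ast.literal_eval(config_string)
```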

class sagemaker.core.debugger.metrics_config.PythonProfilingConfig(start_step=None, num_steps=None, start_unix_time=None, duration=None, profile_default_steps=False, python_profiler=PythonProfiler.CPROFILE, cprofile_timer=cProfileTimer.TOTAL_TIME)

Bases: MetricsConfigBase

The configuration for framework metrics to be collected for Python profiling.

class sagemaker.core.debugger.metrics_config.SMDataParallelProfilingConfig(start_step=None, num_steps=None, start_unix_time=None, duration=None, profile_default_steps=False)

Bases: MetricsConfigBase

Configuration for framework metrics collected from a SageMaker Distributed training job.

class sagemaker.core.debugger.metrics_config.StepRange(start_step, num_steps)

Bases: object

Configuration for the range of steps to profile.

It returns the target steps in dictionary format that you can pass to the FrameworkProfile class.

to_json()

Convert the step range into a dictionary.

Returns:

The step range as a dictionary.

Return type:

dict
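As an illustration of the shape of such a class, here is a small stand-in that stores a step range and returns it as a dictionary. It is a sketch, not the SageMaker source: the class name `StepRangeSketch` and the dictionary keys `StartStep` and `NumSteps` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class StepRangeSketch:
    """Hypothetical stand-in for a step range; not the library class."""

    start_step: int
    num_steps: int

    def to_json(self) -> dict:
        # Key names are assumed for illustration.
        return {"StartStep": self.start_step, "NumSteps": self.num_steps}


# Profile three steps beginning at step 10.
step_range = StepRangeSketch(start_step=10, num_steps=3).to_json()
```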

class sagemaker.core.debugger.metrics_config.TimeRange(start_unix_time, duration)

Bases: object

Configuration for the range of Unix time to profile.

It returns the target time duration in dictionary format that you can pass to the FrameworkProfile class.

to_json()

Convert the time range into a dictionary.

Returns:

The time range as a dictionary.

Return type:

dict
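A time range needs a Unix timestamp for its start. The sketch below derives one from a timezone-aware `datetime` and builds a dictionary from it. The helper `time_range_dict`, the key names, and the reading of `duration` as seconds are assumptions, not details confirmed by this page.

```python
from datetime import datetime, timezone


def time_range_dict(start: datetime, duration: float) -> dict:
    """Hypothetical helper: build a time-range dictionary from a datetime."""
    # datetime.timestamp() returns seconds since the Unix epoch as a float.
    start_unix_time = int(start.timestamp())
    # Key names and the unit of duration (seconds) are assumed for illustration.
    return {"StartTimeInSecSinceEpoch": start_unix_time, "Duration": duration}


# Use an explicit UTC timezone so the result does not depend on the local clock.
start = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
time_range = time_range_dict(start, duration=120.0)
```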