sagemaker.core.remote_function.spark_config#

This module defines the Spark job configuration for remote functions.

Classes

SparkConfig([submit_jars, submit_py_files, ...])

Initializes the Spark configuration for a remote function.

SparkConfigUtils()

Utility class for Spark configurations.

class sagemaker.core.remote_function.spark_config.SparkConfig(submit_jars: List[str] | None = None, submit_py_files: List[str] | None = None, submit_files: List[str] | None = None, configuration: List[Dict] | Dict | None = None, spark_event_logs_uri: str | None = None)[source]#

Bases: object

Initializes the Spark configuration for a remote function.

submit_jars#

A list of paths to the JAR files to submit to the Spark job. Each location can be a valid S3 URI or a local path to the JAR. Defaults to None.

Type:

Optional[List[str]]

submit_py_files#

A list of paths to the Python files to submit to the Spark job. Each location can be a valid S3 URI or a local path to the Python file. Defaults to None.

Type:

Optional[List[str]]

submit_files#

A list of paths to the files to submit to the Spark job. Each location can be a valid S3 URI or a local path to the file. Defaults to None.

Type:

Optional[List[str]]

configuration#

Configuration for Hadoop, Spark, or Hive, given as a list or dictionary of EMR-style classifications. See https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html. Defaults to None.

Type:

list[dict] or dict

spark_event_logs_uri#

S3 URI to which Spark application events are published. Defaults to None.

Type:

str

configuration: List[Dict] | Dict | None#
spark_event_logs_uri: str | None#
submit_files: List[str] | None#
submit_jars: List[str] | None#
submit_py_files: List[str] | None#
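
As a minimal sketch of how the attributes above fit together, the snippet below assembles keyword arguments that match the constructor signature. The bucket name, file paths, and property values are placeholders chosen for illustration, not values taken from this reference. The commented-out lines show how the arguments would presumably be passed to SparkConfig in an environment where the SDK is installed.

```python
import json

# Placeholder values for illustration only; none of these locations exist.
spark_kwargs = {
    "submit_jars": ["s3://example-bucket/jars/custom-udfs.jar"],
    "submit_py_files": ["./helpers.py"],
    "submit_files": ["./lookup.csv"],
    # EMR-style classification list (see the EMR documentation linked above).
    "configuration": [
        {
            "Classification": "spark-defaults",
            "Properties": {"spark.executor.memory": "4g"},
        }
    ],
    "spark_event_logs_uri": "s3://example-bucket/spark-events/",
}

# The configuration is expected to serialize cleanly to JSON.
print(json.dumps(spark_kwargs["configuration"], indent=2))

# With the SDK installed, the arguments would be passed as keywords:
# from sagemaker.core.remote_function.spark_config import SparkConfig
# spark_config = SparkConfig(**spark_kwargs)
```

Keeping the arguments in a plain dictionary makes it easy to inspect or log them before the SparkConfig object is created.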
class sagemaker.core.remote_function.spark_config.SparkConfigUtils[source]#

Bases: object

Utility class for Spark configurations.

static validate_configuration(configuration: Dict)[source]#

Validates the user-provided Hadoop/Spark/Hive configuration.

This ensures that the list or dictionary the user provides serializes to JSON matching the schema of EMR’s application configuration.

Parameters:

configuration (Dict) – A dict containing the configuration overrides for the default values. For more information, see: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
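
To illustrate the kind of shape check described here, the following is a hypothetical helper, not the SDK's validate_configuration. The function name check_emr_classifications is made up for this sketch. It assumes each entry carries a string "Classification", an optional dict "Properties", and an optional nested "Configurations" list, as in the EMR documentation examples; the real schema enforced by the library may differ.

```python
import json
from typing import Any, Dict, List, Union

Classification = Dict[str, Any]


def check_emr_classifications(
    configuration: Union[Classification, List[Classification]],
) -> None:
    """Hypothetical shape check; NOT the SDK's validate_configuration."""
    # Accept either a single classification dict or a list of them.
    items = configuration if isinstance(configuration, list) else [configuration]
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"Expected a dict, got {type(item).__name__}")
        if not isinstance(item.get("Classification"), str):
            raise ValueError("Each entry needs a string 'Classification'")
        if not isinstance(item.get("Properties", {}), dict):
            raise ValueError("'Properties' must be a dict when present")
        nested = item.get("Configurations", [])
        if not isinstance(nested, list):
            raise ValueError("'Configurations' must be a list when present")
        # Nested classifications are assumed to share the same shape.
        check_emr_classifications(nested)
    # Raises TypeError if any value cannot be serialized to JSON.
    json.dumps(configuration)


check_emr_classifications(
    {"Classification": "spark-defaults", "Properties": {"spark.executor.memory": "4g"}}
)
```

The helper returns nothing on success and raises on the first malformed entry, which keeps the calling code to a single line.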

static validate_s3_uri(spark_output_s3_path)[source]#

Validate whether the URI uses an S3 scheme.

Currently only the scheme is checked; deeper S3 validation is planned for the future.

Parameters:

spark_output_s3_path (str) – The S3 URI of the Spark output path.
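
A scheme-only check of this kind can be sketched with the standard library. The helper below, require_s3_scheme, is a hypothetical name used for illustration. It is an assumption about what such a check might look like, not the SDK's implementation, and raising ValueError is likewise an assumed behavior.

```python
from urllib.parse import urlparse


def require_s3_scheme(uri: str) -> None:
    """Hypothetical check: raise ValueError unless the URI scheme is 's3'."""
    scheme = urlparse(uri).scheme
    if scheme != "s3":
        raise ValueError(f"Expected an 's3://' URI, got scheme {scheme!r}: {uri}")


# Placeholder bucket name; a URI with the s3 scheme passes silently.
require_s3_scheme("s3://example-bucket/spark-events/")
```

Because only the scheme is inspected, this sketch does not confirm that the bucket or prefix exists.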