sagemaker.train.aws_batch.training_queue

Define Queue class for AWS Batch service

Classes

TrainingQueue(queue_name)

TrainingQueue class for AWS Batch service

class sagemaker.train.aws_batch.training_queue.TrainingQueue(queue_name: str)

Bases: object

TrainingQueue class for AWS Batch service

Use this class to create a new queue and submit jobs to the AWS Batch service.

get_job(job_name)

Get a Batch job according to job_name.

Parameters:
  • job_name – Batch job name.

Returns: The TrainingQueuedJob whose name matches job_name.

list_jobs(job_name: str | None = None, status: str | None = 'RUNNING') → List[TrainingQueuedJob]

List Batch jobs according to job_name or status.

Parameters:
  • job_name – Batch job name.

  • status – Batch job status.

Returns: A list of TrainingQueuedJob objects.
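As a rough sketch of how list_jobs might be used to survey a queue, the helper below calls it once per status and counts the results. The helper name count_jobs_by_status and the BATCH_STATUSES tuple are illustrative and not part of the SDK. The status strings are the standard AWS Batch job states, and it is an assumption that list_jobs accepts each of them. The queue argument is expected to be a TrainingQueue.

```python
from typing import Dict, Iterable

# Standard AWS Batch job states. Assumption: list_jobs accepts each of these
# as its `status` argument (this page only shows the default, 'RUNNING').
BATCH_STATUSES = (
    "SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
    "RUNNING", "SUCCEEDED", "FAILED",
)


def count_jobs_by_status(queue, statuses: Iterable[str] = BATCH_STATUSES) -> Dict[str, int]:
    """Call queue.list_jobs(status=...) once per status and count the results."""
    return {status: len(queue.list_jobs(status=status)) for status in statuses}
```

Passing the queue in as an argument keeps the helper free of AWS calls of its own, so it can be exercised against a stand-in object in tests.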

list_jobs_by_share(status: str | None = 'RUNNING', share_identifier: str | None = None, quota_share_name: str | None = None) → List[TrainingQueuedJob]

List Batch jobs according to status and share.

Parameters:
  • status – Batch job status.

  • share_identifier – Batch fairshare share identifier.

  • quota_share_name – Batch quota management share name.

Returns: A list of TrainingQueuedJob objects.
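One possible use of list_jobs_by_share is to compare how many jobs each fair-share identifier has in a given status. The sketch below is illustrative only: jobs_per_share is a hypothetical helper, the share identifiers are supplied by the caller, and queue is expected to be a TrainingQueue.

```python
from typing import Dict, Iterable, List


def jobs_per_share(queue, share_identifiers: Iterable[str], status: str = "RUNNING") -> Dict[str, List]:
    """Group the jobs in one status by fair-share identifier.

    Hypothetical helper: it only forwards the documented `status` and
    `share_identifier` arguments to queue.list_jobs_by_share.
    """
    return {
        share: queue.list_jobs_by_share(status=status, share_identifier=share)
        for share in share_identifiers
    }
```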

map(training_job: ModelTrainer, inputs, job_names: List[str] | None = None, retry_config: Dict | None = None, priority: int | None = None, share_identifier: str | None = None, timeout: Dict | None = None, tags: Dict | None = None, quota_share_name: str | None = None) → List[TrainingQueuedJob]

Submit queued jobs for the provided ModelTrainer and return a list of TrainingQueuedJob objects.

Parameters:
  • training_job – Training job ModelTrainer object.

  • inputs – List of training job inputs.

  • job_names – List of Batch job names.

  • retry_config – Retry config for the Batch jobs.

  • priority – Scheduling priority for the Batch jobs.

  • share_identifier – Share identifier for the Batch jobs.

  • timeout – Timeout configuration for the Batch jobs.

  • tags – Tags to apply to the Batch jobs. These tags apply to the Batch jobs only.

  • quota_share_name – Quota share name for the Batch jobs.

Returns: A list of TrainingQueuedJob objects, each with its Batch job ARN and job name.
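A minimal sketch of calling map with generated job names follows. It assumes that map submits one job per entry in inputs and pairs job_names with inputs by position, which this page does not state explicitly. The helper submit_batch and the name_prefix parameter are hypothetical; queue is expected to be a TrainingQueue and trainer a ModelTrainer.

```python
from typing import List, Sequence


def submit_batch(queue, trainer, inputs: Sequence, name_prefix: str, **map_kwargs) -> List:
    """Submit jobs through queue.map with one generated name per input."""
    # Assumption: job_names is matched to inputs by position, so build
    # exactly one name per input.
    job_names = [f"{name_prefix}-{index}" for index in range(len(inputs))]
    return queue.map(
        training_job=trainer,
        inputs=list(inputs),
        job_names=job_names,
        **map_kwargs,  # e.g. priority=..., share_identifier=..., tags=...
    )
```

Generating the names in one place keeps them predictable, which may make it easier to look a job up later with get_job.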

submit(training_job: ModelTrainer, inputs, job_name: str | None = None, retry_config: Dict | None = None, priority: int | None = None, share_identifier: str | None = None, timeout: Dict | None = None, tags: Dict | None = None, quota_share_name: str | None = None, preemption_config: Dict | None = None) → TrainingQueuedJob

Submit a queued job and return a TrainingQueuedJob object.

Parameters:
  • training_job – Training job ModelTrainer object.

  • inputs – Training job inputs.

  • job_name – Batch job name.

  • retry_config – Retry configuration for Batch job.

  • priority – Scheduling priority for Batch job.

  • share_identifier – Share identifier for Batch job.

  • timeout – Timeout configuration for Batch job.

  • tags – Tags to apply to the Batch job. These tags apply to the Batch job only.

  • quota_share_name – Quota share name for the Batch job.

  • preemption_config – Preemption configuration.

Returns: A TrainingQueuedJob object with the Batch job ARN and job name.
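The parameter list above does not describe the shape of the retry_config and timeout dictionaries. The sketch below assumes they follow the retryStrategy and timeout structures of the AWS Batch SubmitJob API ({"attempts": n} and {"attemptDurationSeconds": n}); that mapping is an assumption, not something this page confirms. The helper build_submit_kwargs is hypothetical and only assembles keyword arguments for submit.

```python
from typing import Any, Dict, Optional


def build_submit_kwargs(
    job_name: str,
    attempts: int = 1,
    timeout_seconds: Optional[int] = None,
    priority: Optional[int] = None,
    tags: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for TrainingQueue.submit, skipping unset options."""
    # Assumed shape: mirrors AWS Batch's retryStrategy structure.
    kwargs: Dict[str, Any] = {"job_name": job_name, "retry_config": {"attempts": attempts}}
    if timeout_seconds is not None:
        # Assumed shape: mirrors AWS Batch's job timeout structure.
        kwargs["timeout"] = {"attemptDurationSeconds": timeout_seconds}
    if priority is not None:
        kwargs["priority"] = priority
    if tags:
        kwargs["tags"] = dict(tags)
    return kwargs


# Intended usage, with `queue` a TrainingQueue and `trainer` a ModelTrainer:
#   job = queue.submit(trainer, inputs, **build_submit_kwargs("demo-job", attempts=2))
```

Leaving unset options out of the dictionary lets submit fall back to its own defaults instead of receiving explicit None values.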