sagemaker.core.git_utils

Placeholder docstring

Functions

git_clone_repo(git_config, entry_point[, ...])

Clone the Git repo containing the training code and serving code.

sagemaker.core.git_utils.git_clone_repo(git_config, entry_point, source_dir=None, dependencies=None)

Clone the Git repo containing the training code and serving code.

This method also validates git_config and sets entry_point, source_dir and dependencies to the right file or directory in the cloned repo.

Parameters:
  • git_config (dict[str, str]) –

    Git configuration used for cloning files, including repo, branch, commit, 2FA_enabled, username, password and token. Only the repo field is required; all other fields are optional. repo specifies the Git repository where your training script is stored. If you don’t provide branch, the default value ‘master’ is used. If you don’t provide commit, the latest commit on the specified branch is used. 2FA_enabled, username, password and token are used for authentication. If 2FA_enabled is not provided, 2FA is treated as disabled.

    For GitHub and GitHub-like repos, when an SSH URL is provided, it doesn’t matter whether 2FA is enabled or disabled; either the SSH key pair must have no passphrase, or ssh-agent must be configured so that ‘git clone’ with an SSH URL does not prompt for the passphrase. When an https URL is provided: if 2FA is disabled, either token or username+password is used for authentication if provided (token takes priority); if 2FA is enabled, only token is used for authentication if provided. If the required authentication info is not provided, the Python SDK tries to authenticate with local credential storage. If that also fails, an error is raised.

    For CodeCommit repos, 2FA is not supported, so ‘2FA_enabled’ should not be provided. CodeCommit has no tokens, so ‘token’ should not be provided either. When ‘repo’ is an SSH URL, the requirements are the same as for GitHub-like repos. When ‘repo’ is an https URL, username+password is used for authentication if both are provided; otherwise, the Python SDK tries to authenticate with either the CodeCommit credential helper or local credential storage.

  • entry_point (str) – A relative location to the Python source file which should be executed as the entry point to training or model hosting in the Git repo.

  • source_dir (str) – A relative location to a directory with other training or model hosting source code dependencies aside from the entry point file in the Git repo (default: None). The structure within this directory is preserved when training on Amazon SageMaker.

  • dependencies (list[str]) – A list of relative locations to directories with any additional libraries that will be exported to the container in the Git repo (default: []).
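
The authentication rules described for git_config can be easier to follow as code. The sketch below builds two example git_config dicts from the documented fields and encodes the documented https rules for GitHub-like repos in a small helper. The helper name pick_https_auth, the repository URLs and the placeholder credentials are all hypothetical, and passing 2FA_enabled as a Python bool is an assumption; this is an illustration of the rules above, not the SDK's implementation.

```python
# Example git_config dicts using only the documented fields.
# URLs and credentials are hypothetical placeholders.
github_https_token = {
    "repo": "https://github.com/example-org/example-repo.git",
    "branch": "main",
    "token": "<personal-access-token>",
    "2FA_enabled": True,  # assumption: supplied as a Python bool
}

# For CodeCommit, neither '2FA_enabled' nor 'token' should be provided.
codecommit_https = {
    "repo": "https://git-codecommit.us-west-2.amazonaws.com/v1/repos/example-repo",
    "username": "<username>",
    "password": "<password>",
}


def pick_https_auth(git_config: dict) -> str:
    """Hypothetical helper: name the credential that the documented rules
    would select for a GitHub-like https URL."""
    two_fa = git_config.get("2FA_enabled", False)  # missing means 2FA disabled
    if "token" in git_config:
        # A token is used whether or not 2FA is enabled, and takes priority.
        return "token"
    if not two_fa and "username" in git_config and "password" in git_config:
        # username+password is only considered when 2FA is disabled.
        return "username+password"
    # Nothing usable was provided: fall back to locally stored credentials.
    return "local credential storage"
```

For instance, a config with 2FA enabled and only username+password falls through to local credential storage, because the documented rules allow only a token in that case.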

Returns:

A dict that contains the updated values of entry_point, source_dir and dependencies.

Return type:

dict

Raises:
  • CalledProcessError – If (1) cloning the Git repo fails, (2) checking out the required branch fails, or (3) checking out the required commit fails.

  • ValueError – If (1) the specified entry point does not exist in the repo, (2) the specified source dir does not exist in the repo, (3) the specified dependencies do not exist in the repo, or (4) git_config is provided in the wrong format.
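
A caller may want to tell the two documented failure modes apart. The following is a minimal sketch of one way to do that, assuming the documented CalledProcessError is subprocess.CalledProcessError. The wrapper name clone_or_explain is hypothetical, and the clone function is passed in as an argument so the sketch runs without the SDK installed.

```python
import subprocess
from typing import Callable, List, Optional


def clone_or_explain(
    clone_fn: Callable[..., dict],
    git_config: dict,
    entry_point: str,
    source_dir: Optional[str] = None,
    dependencies: Optional[List[str]] = None,
) -> Optional[dict]:
    """Hypothetical wrapper: call clone_fn and report which documented
    error category occurred. Returns the result dict, or None on failure."""
    try:
        return clone_fn(git_config, entry_point, source_dir, dependencies)
    except subprocess.CalledProcessError as err:
        # Assumption: raised when git clone or git checkout itself fails.
        print(f"git command failed with exit code {err.returncode}; "
              "check the repo URL, branch, commit and credentials")
    except ValueError as err:
        # Raised for a malformed git_config or a path missing from the repo.
        print(f"invalid git_config or path not found in the repo: {err}")
    return None
```

With the SDK installed, sagemaker.core.git_utils.git_clone_repo could be passed as clone_fn; on success the returned dict holds the updated values of entry_point, source_dir and dependencies.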