sagemaker.core.git_utils
Placeholder docstring
Functions
git_clone_repo | Git clone repo containing the training code and serving code.
- sagemaker.core.git_utils.git_clone_repo(git_config, entry_point, source_dir=None, dependencies=None)
Git clone repo containing the training code and serving code.
This method also validates git_config, and sets entry_point, source_dir and dependencies to the right file or directory in the cloned repo.
- Parameters:
git_config (dict[str, str]) –
Git configurations used for cloning files, including repo, branch, commit, 2FA_enabled, username, password and token. The repo field is required. All other fields are optional. repo specifies the Git repository where your training script is stored. If you don’t provide branch, the default value ‘master’ is used. If you don’t provide commit, the latest commit in the specified branch is used. 2FA_enabled, username, password and token are used for authentication. If 2FA_enabled is not provided, 2FA is treated as disabled.
For GitHub and GitHub-like repos, when SSH URLs are provided, it doesn’t matter whether 2FA is enabled or disabled; you should either have no passphrase for the SSH key pair, or have ssh-agent configured so that you are not prompted for the SSH passphrase when you run ‘git clone’ with SSH URLs. When https URLs are provided: if 2FA is disabled, either token or username+password is used for authentication if provided (token takes priority); if 2FA is enabled, only token is used for authentication if provided. If the required authentication info is not provided, the Python SDK tries to use local credential storage to authenticate. If that also fails, an error is raised.
For CodeCommit repos, 2FA is not supported, so ‘2FA_enabled’ should not be provided. There is no token in CodeCommit, so ‘token’ should not be provided either. When ‘repo’ is an SSH URL, the requirements are the same as for GitHub-like repos. When ‘repo’ is an https URL, username+password is used for authentication if provided; otherwise, the Python SDK tries to use either the CodeCommit credential helper or local credential storage for authentication.
entry_point (str) – A relative location to the Python source file which should be executed as the entry point to training or model hosting in the Git repo.
source_dir (str) – A relative location to a directory with other training or model hosting source code dependencies aside from the entry point file in the Git repo (default: None). The structure within this directory is preserved when training on Amazon SageMaker.
dependencies (list[str]) – A list of relative locations to directories with any additional libraries that will be exported to the container in the Git repo (default: []).
- Returns:
A dict that contains the updated values of entry_point, source_dir and dependencies.
- Return type:
dict
- Raises:
CalledProcessError – If (1) cloning the Git repo fails, (2) checking out the required branch fails, or (3) checking out the required commit fails.
ValueError – If (1) the specified entry point does not exist in the repo, (2) the specified source dir does not exist in the repo, (3) the specified dependencies do not exist in the repo, or (4) git_config has the wrong format.
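The rules above can be illustrated with a minimal sketch. Everything named here other than git_clone_repo and the git_config keys is hypothetical: the helper names check_git_config and clone_for_training, the placeholder URLs and credentials, and the assumption that a CodeCommit URL can be recognized by the substring "git-codecommit". The pre-check only mirrors the prose of this page; it is not the SDK's own validation.

```python
import subprocess

# Keys listed in the git_config description above.
ALLOWED_KEYS = {"repo", "branch", "commit", "2FA_enabled", "username", "password", "token"}


def check_git_config(git_config):
    """Hypothetical local pre-check that mirrors the documented rules."""
    if "repo" not in git_config:
        raise ValueError("git_config must contain 'repo'")
    unknown = set(git_config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown git_config keys: {sorted(unknown)}")
    # Assumption: CodeCommit repo URLs contain this substring.
    if "git-codecommit" in git_config["repo"]:
        for key in ("2FA_enabled", "token"):
            if key in git_config:
                raise ValueError(f"'{key}' should not be provided for CodeCommit repos")
    return git_config


def clone_for_training(git_config, entry_point, source_dir=None, dependencies=None):
    """Clone the repo and return the updated paths, or None on failure."""
    # Imported lazily so the helpers above can be used without the SDK installed.
    from sagemaker.core.git_utils import git_clone_repo

    try:
        return git_clone_repo(
            check_git_config(git_config), entry_point, source_dir, dependencies
        )
    except subprocess.CalledProcessError as err:
        # Cloning, or checking out the branch or commit, failed.
        print(f"git failed with exit code {err.returncode}")
    except ValueError as err:
        # Malformed git_config, or a path that is missing from the cloned repo.
        print(f"invalid input: {err}")
    return None


# Placeholder values for illustration only.
github_config = {
    "repo": "https://github.com/example-org/example-repo.git",
    "branch": "main",
    "token": "<personal-access-token>",
}
codecommit_config = {
    "repo": "https://git-codecommit.us-west-2.amazonaws.com/v1/repos/example-repo",
    "username": "<username>",
    "password": "<password>",
}
```

A call such as clone_for_training(github_config, "train.py", source_dir="src") would then return the dict described under Returns, assuming those placeholder paths exist in the repo. Setting branch explicitly avoids relying on the ‘master’ default for repos whose default branch has a different name.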