Lineage Tracking#
Amazon SageMaker Lineage enables events that happen within SageMaker to be traced via a graph structure. The data simplifies generating reports, making comparisons, or discovering relationships between events. For example, you can easily trace both how a model was generated and where the model was deployed.
The lineage graph is created automatically by SageMaker and you can directly create or modify your own graphs.
Key Concepts#
Lineage Graph - A connected graph tracing your machine learning workflow end to end.
Artifacts - Represents a URI addressable object or data. Artifacts are typically inputs or outputs to Actions.
Actions - Represents an action taken such as a computation, transformation, or job.
Contexts - Provides a method to logically group other entities.
Associations - A directed edge in the lineage graph that links two entities.
Lineage Traversal - Starting from an arbitrary point, trace the lineage graph to discover and analyze relationships between steps in your workflow.
Experiments - Experiment entities (Experiments, Trials, and Trial Components) are also part of the lineage graph and can be associated with Artifacts, Actions, or Contexts.
Use Cases#
The notebook from the SageMaker examples repository demonstrates the following major use cases for lineage tracking:
Creating Lineage Contexts - Group related lineage entities under a logical workflow context
Listing Lineage Entities - Query and enumerate existing contexts, actions, artifacts, and associations
Creating Actions - Record computational steps such as model builds, transformations, or training jobs
Creating Artifacts - Register data inputs (datasets, labels) and outputs (trained models) as lineage artifacts
Creating Associations - Link entities together with directed edges to form the lineage graph
Traversing Associations - Query incoming and outgoing associations to understand entity relationships
Cleaning Up Lineage Data - Delete associations and entities when they are no longer needed
V3 Migration Notes#
In SageMaker Python SDK V3, lineage classes have moved from sagemaker.lineage to sagemaker.core.lineage. The old import paths still work via compatibility shims but emit deprecation warnings.
V2 Import |
V3 Import |
|---|---|
|
|
|
|
|
|
|
|
|
|
The API signatures for create, list, delete, and association management remain the same. The key change is the import path.
Use Case 1: Session Setup#
Initialize a SageMaker session and set up common variables.
V2 (Legacy):
import boto3
import sagemaker
region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
default_bucket = sagemaker_session.default_bucket()
V3:
import boto3
from sagemaker.core.helper.session_helper import Session
region = boto3.Session().region_name
sagemaker_session = Session()
default_bucket = sagemaker_session.default_bucket()
Use Case 2: Creating a Lineage Context#
Contexts provide a method to logically group other lineage entities. Each context name must be unique across all other contexts.
V2 (Legacy):
from datetime import datetime
from sagemaker.lineage.context import Context
unique_id = str(int(datetime.now().replace(microsecond=0).timestamp()))
context_name = f"machine-learning-workflow-{unique_id}"
ml_workflow_context = Context.create(
context_name=context_name,
context_type="MLWorkflow",
source_uri=unique_id,
properties={"example": "true"},
)
V3:
from datetime import datetime
from sagemaker.core.lineage.context import Context
unique_id = str(int(datetime.now().replace(microsecond=0).timestamp()))
context_name = f"machine-learning-workflow-{unique_id}"
ml_workflow_context = Context.create(
context_name=context_name,
context_type="MLWorkflow",
source_uri=unique_id,
properties={"example": "true"},
)
Use Case 3: Listing Contexts#
Enumerate existing contexts sorted by creation time.
V2 (Legacy):
from sagemaker.lineage.context import Context
contexts = Context.list(sort_by="CreationTime", sort_order="Descending")
for ctx in contexts:
print(ctx.context_name)
V3:
from sagemaker.core.lineage.context import Context
contexts = Context.list(sort_by="CreationTime", sort_order="Descending")
for ctx in contexts:
print(ctx.context_name)
Use Case 4: Creating an Action#
Actions represent computational steps such as model builds, transformations, or training jobs.
V2 (Legacy):
from sagemaker.lineage.action import Action
model_build_action = Action.create(
action_name=f"model-build-step-{unique_id}",
action_type="ModelBuild",
source_uri=unique_id,
properties={"Example": "Metadata"},
)
V3:
from sagemaker.core.lineage.action import Action
model_build_action = Action.create(
action_name=f"model-build-step-{unique_id}",
action_type="ModelBuild",
source_uri=unique_id,
properties={"Example": "Metadata"},
)
Use Case 5: Creating Associations#
Associations are directed edges in the lineage graph. The association_type can be Produced, DerivedFrom, AssociatedWith, or ContributedTo.
V2 (Legacy):
from sagemaker.lineage.association import Association
context_action_association = Association.create(
source_arn=ml_workflow_context.context_arn,
destination_arn=model_build_action.action_arn,
association_type="AssociatedWith",
)
V3:
from sagemaker.core.lineage.association import Association
context_action_association = Association.create(
source_arn=ml_workflow_context.context_arn,
destination_arn=model_build_action.action_arn,
association_type="AssociatedWith",
)
Use Case 6: Traversing Associations#
Query incoming and outgoing associations to understand how entities are related in the lineage graph.
V2 (Legacy):
from sagemaker.lineage.association import Association
# List incoming associations to an action
incoming = Association.list(destination_arn=model_build_action.action_arn)
for association in incoming:
print(f"{model_build_action.action_name} has incoming association from {association.source_name}")
# List outgoing associations from a context
outgoing = Association.list(source_arn=ml_workflow_context.context_arn)
for association in outgoing:
print(f"{ml_workflow_context.context_name} has outgoing association to {association.destination_name}")
V3:
from sagemaker.core.lineage.association import Association
# List incoming associations to an action
incoming = Association.list(destination_arn=model_build_action.action_arn)
for association in incoming:
print(f"{model_build_action.action_name} has incoming association from {association.source_name}")
# List outgoing associations from a context
outgoing = Association.list(source_arn=ml_workflow_context.context_arn)
for association in outgoing:
print(f"{ml_workflow_context.context_name} has outgoing association to {association.destination_name}")
Use Case 7: Creating Artifacts#
Artifacts represent URI-addressable objects or data, such as datasets, labels, or trained models.
V2 (Legacy):
from sagemaker.lineage.artifact import Artifact
input_test_images = Artifact.create(
artifact_name="mnist-test-images",
artifact_type="TestData",
source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz",
)
input_test_labels = Artifact.create(
artifact_name="mnist-test-labels",
artifact_type="TestLabels",
source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz",
)
output_model = Artifact.create(
artifact_name="mnist-model",
artifact_type="Model",
source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz",
)
V3:
from sagemaker.core.lineage.artifact import Artifact
input_test_images = Artifact.create(
artifact_name="mnist-test-images",
artifact_type="TestData",
source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-images-idx3-ubyte.gz",
)
input_test_labels = Artifact.create(
artifact_name="mnist-test-labels",
artifact_type="TestLabels",
source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/t10k-labels-idx1-ubyte.gz",
)
output_model = Artifact.create(
artifact_name="mnist-model",
artifact_type="Model",
source_types=[{"SourceIdType": "Custom", "Value": unique_id}],
source_uri=f"https://sagemaker-example-files-prod-{region}.s3.amazonaws.com/datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz",
)
Use Case 8: Linking Artifacts to Actions#
Associate data artifacts as inputs to an action, and the action’s output to a model artifact, forming a complete lineage chain.
V2 (Legacy):
from sagemaker.lineage.association import Association
# Link input data to the model build action
Association.create(
source_arn=input_test_images.artifact_arn,
destination_arn=model_build_action.action_arn,
)
Association.create(
source_arn=input_test_labels.artifact_arn,
destination_arn=model_build_action.action_arn,
)
# Link the action output to the model artifact
Association.create(
source_arn=model_build_action.action_arn,
destination_arn=output_model.artifact_arn,
)
V3:
from sagemaker.core.lineage.association import Association
# Link input data to the model build action
Association.create(
source_arn=input_test_images.artifact_arn,
destination_arn=model_build_action.action_arn,
)
Association.create(
source_arn=input_test_labels.artifact_arn,
destination_arn=model_build_action.action_arn,
)
# Link the action output to the model artifact
Association.create(
source_arn=model_build_action.action_arn,
destination_arn=output_model.artifact_arn,
)
Use Case 9: Cleaning Up Lineage Data#
Delete associations first, then delete the entities themselves. Associations must be removed before their source or destination entities can be deleted.
V2 (Legacy):
import sagemaker
from sagemaker.lineage.association import Association
from sagemaker.lineage.context import Context
from sagemaker.lineage.action import Action
from sagemaker.lineage.artifact import Artifact
sagemaker_session = sagemaker.session.Session()
def delete_associations(arn):
for summary in Association.list(destination_arn=arn):
assct = Association(
source_arn=summary.source_arn,
destination_arn=summary.destination_arn,
sagemaker_session=sagemaker_session,
)
assct.delete()
for summary in Association.list(source_arn=arn):
assct = Association(
source_arn=summary.source_arn,
destination_arn=summary.destination_arn,
sagemaker_session=sagemaker_session,
)
assct.delete()
# Delete context
delete_associations(ml_workflow_context.context_arn)
Context(context_name=ml_workflow_context.context_name, sagemaker_session=sagemaker_session).delete()
# Delete action
delete_associations(model_build_action.action_arn)
Action(action_name=model_build_action.action_name, sagemaker_session=sagemaker_session).delete()
# Delete artifacts
for artifact in [input_test_images, input_test_labels, output_model]:
delete_associations(artifact.artifact_arn)
Artifact(artifact_arn=artifact.artifact_arn, sagemaker_session=sagemaker_session).delete()
V3:
from sagemaker.core.helper.session_helper import Session
from sagemaker.core.lineage.association import Association
from sagemaker.core.lineage.context import Context
from sagemaker.core.lineage.action import Action
from sagemaker.core.lineage.artifact import Artifact
sagemaker_session = Session()
def delete_associations(arn):
for summary in Association.list(destination_arn=arn):
assct = Association(
source_arn=summary.source_arn,
destination_arn=summary.destination_arn,
sagemaker_session=sagemaker_session,
)
assct.delete()
for summary in Association.list(source_arn=arn):
assct = Association(
source_arn=summary.source_arn,
destination_arn=summary.destination_arn,
sagemaker_session=sagemaker_session,
)
assct.delete()
# Delete context
delete_associations(ml_workflow_context.context_arn)
Context(context_name=ml_workflow_context.context_name, sagemaker_session=sagemaker_session).delete()
# Delete action
delete_associations(model_build_action.action_arn)
Action(action_name=model_build_action.action_name, sagemaker_session=sagemaker_session).delete()
# Delete artifacts
for artifact in [input_test_images, input_test_labels, output_model]:
delete_associations(artifact.artifact_arn)
Artifact(artifact_arn=artifact.artifact_arn, sagemaker_session=sagemaker_session).delete()
Caveats#
Associations cannot be created between two experiment entities (e.g., between an Experiment and Trial).
Associations can only be created between Action, Artifact, or Context resources.
Maximum number of manually created lineage entities:
Artifacts: 6000
Contexts: 500
Actions: 3000
Associations: 6000
There is no limit on the number of lineage entities created automatically by SageMaker.
Lineage Tracking Example#
For a complete end-to-end V3 example, see the lineage tracking notebook: