sagemaker.mlops.feature_store#
SageMaker FeatureStore V3 - powered by sagemaker-core.
- class sagemaker.mlops.feature_store.AthenaQuery(catalog: str, database: str, table_name: str, sagemaker_session: Session)[source]#
Bases:
object
Class to manage querying of feature store data with AWS Athena.
This class instantiates an AthenaQuery object that is used to retrieve data from the feature store via standard SQL queries.
- catalog#
name of the data catalog.
- Type:
str
- database#
name of the database.
- Type:
str
- table_name#
name of the table.
- Type:
str
- as_dataframe(**kwargs) DataFrame[source]#
Download the result of the current query and load it into a DataFrame.
- Parameters:
**kwargs (object) – keyword arguments passed through to pandas.read_csv, giving finer control over how the result file is parsed. For more info read: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
- Returns:
A pandas DataFrame containing the query result.
- catalog: str#
- database: str#
- get_query_execution() Dict[str, Any][source]#
Get execution status of the current query.
- Returns:
Response dict from Athena.
- run(query_string: str, output_location: str, kms_key: str = None, workgroup: str = None) str[source]#
Execute a SQL query given a query string, output location and kms key.
This method executes the SQL query using Athena and outputs the results to output_location and returns the execution id of the query.
- Parameters:
query_string – SQL query string.
output_location – S3 URI of the query result.
kms_key – KMS key id. If set, will be used to encrypt the query result file.
workgroup (str) – The name of the workgroup in which the query is being started.
- Returns:
Execution id of the query.
- table_name: str#
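The methods above suggest a three-step flow: call run, check get_query_execution until the query finishes, then call as_dataframe. The sketch below covers only the polling step. It assumes the returned dict follows the shape of Athena's GetQueryExecution response (`QueryExecution` → `Status` → `State`); `wait_for_query`, `poll_seconds`, and `max_polls` are hypothetical names introduced for illustration and are not part of the library.

```python
import time
from typing import Any, Dict

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(query: Any, poll_seconds: float = 2.0, max_polls: int = 150) -> str:
    """Poll query.get_query_execution() until Athena reports a terminal state.

    `query` is any object exposing get_query_execution(), for example an
    AthenaQuery on which run() has already been called (assumption).
    """
    for _ in range(max_polls):
        response: Dict[str, Any] = query.get_query_execution()
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query still running after {max_polls} polls")
```

Once the helper returns `SUCCEEDED`, calling as_dataframe on the same object would be the natural next step; any other terminal state means there is no result file to load.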
- class sagemaker.mlops.feature_store.CollectionTypeEnum(value)[source]#
Bases:
Enum
Collection types: List, Set, or Vector.
- LIST = 'List'#
- SET = 'Set'#
- VECTOR = 'Vector'#
- class sagemaker.mlops.feature_store.DataCatalogConfig(*, table_name: str | PipelineVariable, catalog: str | PipelineVariable, database: str | PipelineVariable)[source]#
Bases:
Base
The metadata of the Glue table which serves as the data catalog for the OfflineStore.
- table_name#
The name of the Glue table.
- catalog#
The name of the Glue table catalog.
- database#
The name of the Glue table database.
- catalog: str | PipelineVariable#
- database: str | PipelineVariable#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- table_name: str | PipelineVariable#
- class sagemaker.mlops.feature_store.DatasetBuilder(_sagemaker_session: Session, _base: FeatureGroup | DataFrame, _output_path: str, _record_identifier_feature_name: str | None = None, _event_time_identifier_feature_name: str | None = None, _included_feature_names: List[str] | None = None, _kms_key_id: str | None = None, _event_time_identifier_feature_type: FeatureTypeEnum | None = None)[source]#
Bases:
object
DatasetBuilder definition.
This class instantiates a DatasetBuilder object that comprises a base, a list of feature names, an output path and a KMS key ID.
- _base#
A base which can be either a FeatureGroup or a pandas.DataFrame and will be used to merge other FeatureGroups and generate a Dataset.
- Type:
Union[FeatureGroup, DataFrame]
- _output_path#
An S3 URI which stores the output .csv file.
- Type:
str
- _record_identifier_feature_name#
A string representing the record identifier feature if base is a DataFrame (default: None).
- Type:
str
- _event_time_identifier_feature_name#
A string representing the event time identifier feature if base is a DataFrame (default: None).
- Type:
str
- _included_feature_names#
A list of strings representing features to be included in the output. If not set, all features will be included in the output. (default: None).
- Type:
List[str]
- _kms_key_id#
A KMS key id. If set, will be used to encrypt the result file (default: None).
- Type:
str
- _point_in_time_accurate_join#
A boolean representing if point-in-time join is applied to the resulting dataframe when calling “to_dataframe”. When set to True, users can retrieve data using “row-level time travel” according to the event times provided to the DatasetBuilder. This requires that the entity dataframe with event times is submitted as the base in the constructor (default: False).
- Type:
bool
- _include_duplicated_records#
A boolean representing whether the resulting dataframe when calling “to_dataframe” should include duplicated records (default: False).
- Type:
bool
- _include_deleted_records#
A boolean representing whether the resulting dataframe when calling “to_dataframe” should include deleted records (default: False).
- Type:
bool
- _number_of_recent_records#
An integer representing how many records will be returned for each record identifier (default: 1).
- Type:
int
- _number_of_records#
An integer representing the number of records that should be returned in the resulting dataframe when calling “to_dataframe” (default: None).
- Type:
int
- _write_time_ending_timestamp#
A datetime that represents the latest write time for a record to be included in the resulting dataset. Records with a newer write time will be omitted from the resulting dataset. (default: None).
- Type:
datetime.datetime
- _event_time_starting_timestamp#
A datetime that represents the earliest event time for a record to be included in the resulting dataset. Records with an older event time will be omitted from the resulting dataset. (default: None).
- Type:
datetime.datetime
- _event_time_ending_timestamp#
A datetime that represents the latest event time for a record to be included in the resulting dataset. Records with a newer event time will be omitted from the resulting dataset. (default: None).
- Type:
datetime.datetime
- _feature_groups_to_be_merged#
A list of FeatureGroupToBeMerged which will be joined to base (default: []).
- Type:
List[FeatureGroupToBeMerged]
- _event_time_identifier_feature_type#
A FeatureTypeEnum representing the type of event time identifier feature (default: None).
- Type:
- as_of(timestamp: datetime) DatasetBuilder[source]#
Set write_time_ending_timestamp field with provided input.
- Parameters:
timestamp (datetime.datetime) – A datetime; only records written before this time are included in the dataset.
- Returns:
This DatasetBuilder object.
- classmethod create(base: FeatureGroup | DataFrame, output_path: str, session: Session, record_identifier_feature_name: str | None = None, event_time_identifier_feature_name: str | None = None, included_feature_names: List[str] | None = None, kms_key_id: str | None = None) DatasetBuilder[source]#
Create a DatasetBuilder for generating a Dataset.
- Parameters:
base – A FeatureGroup or DataFrame to use as the base.
output_path – S3 URI for output.
session – SageMaker session.
record_identifier_feature_name – Required if base is DataFrame.
event_time_identifier_feature_name – Required if base is DataFrame.
included_feature_names – Features to include in output.
kms_key_id – KMS key for encryption.
- Returns:
DatasetBuilder instance.
- include_deleted_records() DatasetBuilder[source]#
Include deleted records in dataset.
- Returns:
This DatasetBuilder object.
- include_duplicated_records() DatasetBuilder[source]#
Include duplicated records in dataset.
- Returns:
This DatasetBuilder object.
- point_in_time_accurate_join() DatasetBuilder[source]#
Enable point-in-time accurate join.
- Returns:
This DatasetBuilder object.
- to_csv_file() tuple[str, str][source]#
Get query string and result in .csv format file.
- Returns:
- A tuple containing:
str: The S3 path of the .csv file
str: The query string executed
- Return type:
tuple
Note
This method returns a tuple (csv_path, query_string). To get just the CSV path: csv_path, _ = builder.to_csv_file()
- to_dataframe() tuple[DataFrame, str][source]#
Get query string and result in pandas.DataFrame.
- Returns:
- A tuple containing:
pd.DataFrame: The pandas DataFrame object
str: The query string executed
- Return type:
tuple
Note
This method returns a tuple (dataframe, query_string). To get just the DataFrame: df, _ = builder.to_dataframe()
- with_event_time_range(starting_timestamp: datetime | None = None, ending_timestamp: datetime | None = None) DatasetBuilder[source]#
Set event_time_starting_timestamp and event_time_ending_timestamp with provided inputs.
- Parameters:
starting_timestamp (datetime.datetime) – A datetime; only records with an event time after it are included in the dataset (default: None).
ending_timestamp (datetime.datetime) – A datetime; only records with an event time before it are included in the dataset (default: None).
- Returns:
This DatasetBuilder object.
- with_feature_group(feature_group: FeatureGroup, target_feature_name_in_base: str | None = None, included_feature_names: List[str] | None = None, feature_name_in_target: str | None = None, join_comparator: JoinComparatorEnum = JoinComparatorEnum.EQUALS, join_type: JoinTypeEnum = JoinTypeEnum.INNER_JOIN) DatasetBuilder[source]#
Join FeatureGroup with base.
- Parameters:
feature_group (FeatureGroup) – A target FeatureGroup which will be joined to base.
target_feature_name_in_base (str) – A string representing the feature name in base which will be used as a join key (default: None).
included_feature_names (List[str]) – A list of strings representing features to be included in the output (default: None).
feature_name_in_target (str) – A string representing the feature name in the target feature group that will be compared to the target feature in the base feature group. If None is provided, the record identifier feature will be used in the SQL join. (default: None).
join_comparator (JoinComparatorEnum) – A JoinComparatorEnum representing the comparator used when joining the target feature in the base feature group and the feature in the target feature group. (default: JoinComparatorEnum.EQUALS).
join_type (JoinTypeEnum) – A JoinTypeEnum representing the type of join between the base and target feature groups. (default: JoinTypeEnum.INNER_JOIN).
- Returns:
This DatasetBuilder object.
- with_number_of_recent_records_by_record_identifier(n: int) DatasetBuilder[source]#
Set number_of_recent_records field with provided input.
- Parameters:
n (int) – The number of most recent records to return for each record identifier.
- Returns:
This DatasetBuilder object.
- with_number_of_records_from_query_results(n: int) DatasetBuilder[source]#
Set number_of_records field with provided input.
- Parameters:
n (int) – The number of records to return.
- Returns:
This DatasetBuilder object.
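Because each configuration method returns the DatasetBuilder itself, the calls can be chained before the final to_dataframe call. The following is a minimal sketch of that pattern, not a definitive recipe: `build_recent_dataset` is a hypothetical helper, and `builder` is assumed to be the object returned by DatasetBuilder.create.

```python
from datetime import datetime
from typing import Any


def build_recent_dataset(builder: Any, start: datetime, end: datetime) -> Any:
    """Chain DatasetBuilder options and return only the DataFrame.

    Every method used here is documented as returning the builder,
    which is what makes the fluent chain possible.
    """
    configured = (
        builder
        .with_event_time_range(starting_timestamp=start, ending_timestamp=end)
        .as_of(end)  # ignore records written after `end`
        .with_number_of_recent_records_by_record_identifier(1)  # latest record per id
    )
    # to_dataframe() returns (dataframe, query_string); keep only the frame.
    dataframe, _query_string = configured.to_dataframe()
    return dataframe
```

Keeping the query string instead of discarding it can be useful for debugging, since it shows the SQL that was executed.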
- class sagemaker.mlops.feature_store.DeletionModeEnum(value)[source]#
Bases:
Enum
Deletion modes for delete_record.
- HARD_DELETE = 'HardDelete'#
- SOFT_DELETE = 'SoftDelete'#
- class sagemaker.mlops.feature_store.ExpirationTimeResponseEnum(value)[source]#
Bases:
Enum
ExpiresAt response toggle.
- DISABLED = 'Disabled'#
- ENABLED = 'Enabled'#
- class sagemaker.mlops.feature_store.FeatureDefinition(*, feature_name: str | PipelineVariable, feature_type: str | PipelineVariable, collection_type: str | PipelineVariable | None = Unassigned(), collection_config: CollectionConfig | None = Unassigned())[source]#
Bases:
Base
A list of features. You must include FeatureName and FeatureType. Valid FeatureTypes are Integral, Fractional and String.
- feature_name#
The name of a feature. The type must be a string. FeatureName cannot be any of the following: is_deleted, write_time, api_invocation_time. The name must start with an alphanumeric character and can only include alphanumeric characters, underscores, and hyphens. Spaces are not allowed.
- feature_type#
The value type of a feature. Valid values are Integral, Fractional, or String.
- collection_type#
A grouping of elements where each element within the collection must have the same feature type (String, Integral, or Fractional). List: an ordered collection of elements. Set: an unordered collection of unique elements. Vector: a specialized list that represents a fixed-size array of elements; you determine the vector dimension, and the elements must have the Fractional feature type.
- collection_config#
Configuration for your collection.
- collection_config: CollectionConfig | None#
- collection_type: str | PipelineVariable | None#
- feature_name: str | PipelineVariable#
- feature_type: str | PipelineVariable#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
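The naming and typing rules above can be checked locally before any request is sent. The sketch below is one possible client-side check under stated assumptions: "alphanumeric" is taken to mean ASCII letters and digits, no length limit is enforced, and the helper names (`is_valid_feature_name`, `is_valid_definition`) are hypothetical rather than part of the library.

```python
import re
from typing import Optional

# Names reserved by Feature Store, per the feature_name description.
RESERVED_FEATURE_NAMES = {"is_deleted", "write_time", "api_invocation_time"}
VALID_FEATURE_TYPES = {"Integral", "Fractional", "String"}
VALID_COLLECTION_TYPES = {"List", "Set", "Vector"}

# Starts with an alphanumeric character; then alphanumerics, underscores, hyphens.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")


def is_valid_feature_name(name: str) -> bool:
    """Return True if `name` follows the documented naming rules."""
    return name not in RESERVED_FEATURE_NAMES and _NAME_PATTERN.fullmatch(name) is not None


def is_valid_definition(feature_type: str, collection_type: Optional[str] = None) -> bool:
    """Return True if the type combination is allowed by the rules above."""
    if feature_type not in VALID_FEATURE_TYPES:
        return False
    if collection_type is None:
        return True
    if collection_type not in VALID_COLLECTION_TYPES:
        return False
    # A Vector must hold Fractional elements.
    return collection_type != "Vector" or feature_type == "Fractional"
```

The service performs its own validation, so a check like this only gives earlier feedback; it does not replace the ValidationError handling described for the API calls.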
- class sagemaker.mlops.feature_store.FeatureGroup(*, feature_group_name: str | PipelineVariable, feature_group_arn: str | PipelineVariable | None = Unassigned(), record_identifier_feature_name: str | PipelineVariable | None = Unassigned(), event_time_feature_name: str | PipelineVariable | None = Unassigned(), feature_definitions: List[FeatureDefinition] | None = Unassigned(), creation_time: datetime | None = Unassigned(), last_modified_time: datetime | None = Unassigned(), online_store_config: OnlineStoreConfig | None = Unassigned(), offline_store_config: OfflineStoreConfig | None = Unassigned(), throughput_config: ThroughputConfigDescription | None = Unassigned(), role_arn: str | PipelineVariable | None = Unassigned(), feature_group_status: str | PipelineVariable | None = Unassigned(), offline_store_status: OfflineStoreStatus | None = Unassigned(), last_update_status: LastUpdateStatus | None = Unassigned(), failure_reason: str | PipelineVariable | None = Unassigned(), description: str | PipelineVariable | None = Unassigned(), next_token: str | PipelineVariable | None = Unassigned(), online_store_replicas: List[OnlineStoreReplica] | None = Unassigned(), online_store_read_write_type: str | PipelineVariable | None = Unassigned(), online_store_total_size_bytes: int | None = Unassigned(), online_store_total_item_count: int | None = Unassigned(), created_by: UserContext | None = Unassigned(), last_modified_by: UserContext | None = Unassigned())[source]#
Bases:
Base
Class representing resource FeatureGroup
- feature_group_arn#
The Amazon Resource Name (ARN) of the FeatureGroup.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- feature_group_name#
The name of the FeatureGroup.
- record_identifier_feature_name#
The name of the Feature used for RecordIdentifier, whose value uniquely identifies a record stored in the feature store.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- event_time_feature_name#
The name of the feature that stores the EventTime of a Record in a FeatureGroup. An EventTime is a point in time when a new event occurs that corresponds to the creation or update of a Record in a FeatureGroup. All Records in the FeatureGroup have a corresponding EventTime.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- feature_definitions#
A list of the Features in the FeatureGroup. Each feature is defined by a FeatureName and FeatureType.
- Type:
List[sagemaker.core.shapes.shapes.FeatureDefinition] | None
- creation_time#
A timestamp indicating when SageMaker created the FeatureGroup.
- Type:
datetime.datetime | None
- next_token#
A token to resume pagination of the list of Features (FeatureDefinitions).
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- last_modified_time#
A timestamp indicating when the feature group was last updated.
- Type:
datetime.datetime | None
- online_store_config#
The configuration for the OnlineStore.
- Type:
- offline_store_config#
The configuration of the offline store. It includes the following configurations: Amazon S3 location of the offline store. Configuration of the Glue data catalog. Table format of the offline store. Option to disable the automatic creation of a Glue table for the offline store. Encryption configuration.
- Type:
- throughput_config#
- role_arn#
The Amazon Resource Name (ARN) of the IAM execution role used to persist data into the OfflineStore if an OfflineStoreConfig is provided.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- feature_group_status#
The status of the feature group.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- offline_store_status#
The status of the OfflineStore. Notifies you if replicating data into the OfflineStore has failed. Returns either: Active or Blocked
- Type:
- last_update_status#
A value indicating whether the update made to the feature group was successful.
- Type:
- failure_reason#
The reason that the FeatureGroup failed to be replicated in the OfflineStore. This failure can occur because: The FeatureGroup could not be created in the OfflineStore. The FeatureGroup could not be deleted from the OfflineStore.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- description#
A free form description of the feature group.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- online_store_replicas#
- Type:
List[sagemaker.core.shapes.shapes.OnlineStoreReplica] | None
- online_store_read_write_type#
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- online_store_total_size_bytes#
The size of the OnlineStore in bytes.
- Type:
int | None
- online_store_total_item_count#
- Type:
int | None
- created_by#
- Type:
- last_modified_by#
- Type:
- batch_get_record(identifiers: List[BatchGetRecordIdentifier], expiration_time_response: str | PipelineVariable | None = Unassigned(), session: Session | None = None, region: str | None = None) BatchGetRecordResponse | None[source]#
Retrieves a batch of Records from a FeatureGroup.
- Parameters:
identifiers – A list containing the name or Amazon Resource Name (ARN) of the FeatureGroup, the list of names of Features to be retrieved, and the corresponding RecordIdentifier values as strings.
expiration_time_response – Parameter to request ExpiresAt in response. If Enabled, BatchGetRecord will return the value of ExpiresAt, if it is not null. If Disabled and null, BatchGetRecord will return null.
session – Boto3 session.
region – Region name.
- Returns:
BatchGetRecordResponse
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
AccessForbidden – You do not have permission to perform an action.
InternalFailure – An internal failure occurred. Try your request again. If the problem persists, contact Amazon Web Services customer support.
ServiceUnavailable – The service is currently unavailable.
ValidationError – There was an error validating your request.
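Of the errors listed, InternalFailure explicitly invites a retry, and ServiceUnavailable describes a condition that may clear on its own. A minimal retry wrapper along those lines is sketched below. Treating ServiceUnavailable as retryable is an assumption, `call_with_retry` and its parameters are hypothetical, and the wrapper reads the `response` attribute by duck typing so that the example does not need botocore installed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Error codes treated as transient in this sketch (assumption).
RETRYABLE_CODES = {"InternalFailure", "ServiceUnavailable"}


def call_with_retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Run `call`, retrying with exponential backoff on transient error codes."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception as error:
            # botocore's ClientError exposes the service error under `response`.
            response = getattr(error, "response", None)
            code = response.get("Error", {}).get("Code") if isinstance(response, dict) else None
            if code not in RETRYABLE_CODES or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call such as batch_get_record could be wrapped in a lambda and passed as `call`; errors like AccessForbidden or ValidationError are re-raised immediately because retrying them would not help.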
- classmethod create(feature_group_name: str | PipelineVariable, record_identifier_feature_name: str | PipelineVariable, event_time_feature_name: str | PipelineVariable, feature_definitions: List[FeatureDefinition], online_store_config: OnlineStoreConfig | None = Unassigned(), offline_store_config: OfflineStoreConfig | None = Unassigned(), throughput_config: ThroughputConfig | None = Unassigned(), role_arn: str | PipelineVariable | None = Unassigned(), description: str | PipelineVariable | None = Unassigned(), tags: List[Tag] | None = Unassigned(), use_pre_prod_offline_store_replicator_lambda: bool | None = Unassigned(), session: Session | None = None, region: str | PipelineVariable | None = None) FeatureGroup | None[source]#
Create a FeatureGroup resource
- Parameters:
feature_group_name – The name of the FeatureGroup. The name must be unique within an Amazon Web Services Region in an Amazon Web Services account. The name: Must start with an alphanumeric character. Can only include alphanumeric characters, underscores, and hyphens. Spaces are not allowed.
record_identifier_feature_name – The name of the Feature whose value uniquely identifies a Record defined in the FeatureStore. Only the latest record per identifier value will be stored in the OnlineStore. RecordIdentifierFeatureName must be one of the feature definitions' names. You use the RecordIdentifierFeatureName to access data in a FeatureStore. This name: Must start with an alphanumeric character. Can only contain alphanumeric characters, hyphens, and underscores. Spaces are not allowed.
event_time_feature_name – The name of the feature that stores the EventTime of a Record in a FeatureGroup. An EventTime is a point in time when a new event occurs that corresponds to the creation or update of a Record in a FeatureGroup. All Records in the FeatureGroup must have a corresponding EventTime. An EventTime can be a String or Fractional. Fractional: EventTime feature values must be a Unix timestamp in seconds. String: EventTime feature values must be an ISO-8601 string. The following formats are supported: yyyy-MM-dd'T'HH:mm:ssZ and yyyy-MM-dd'T'HH:mm:ss.SSSZ, where yyyy, MM, and dd represent the year, month, and day respectively, and HH, mm, ss, and, if applicable, SSS represent the hour, minute, second, and milliseconds respectively. 'T' and Z are constants.
feature_definitions – A list of Feature names and types. Name and Type are compulsory for each Feature. Valid FeatureTypes are Integral, Fractional and String. FeatureNames cannot be any of the following: is_deleted, write_time, api_invocation_time. You can create up to 2,500 FeatureDefinitions per FeatureGroup.
online_store_config – You can turn the OnlineStore on or off by specifying True for the EnableOnlineStore flag in OnlineStoreConfig. You can also include an Amazon Web Services KMS key ID (KMSKeyId) for at-rest encryption of the OnlineStore. The default value is False.
offline_store_config – Use this to configure an OfflineFeatureStore. This parameter allows you to specify: The Amazon Simple Storage Service (Amazon S3) location of an OfflineStore. A configuration for an Amazon Web Services Glue or Amazon Web Services Hive data catalog. A KMS encryption key to encrypt the Amazon S3 location used for OfflineStore. If a KMS encryption key is not specified, all data at rest is encrypted by default using an Amazon Web Services KMS key. By defining your bucket-level key for SSE, you can reduce Amazon Web Services KMS requests costs by up to 99 percent. Format for the offline store table. Supported formats are Glue (Default) and Apache Iceberg. To learn more about this parameter, see OfflineStoreConfig.
throughput_config
role_arn – The Amazon Resource Name (ARN) of the IAM execution role used to persist data into the OfflineStore if an OfflineStoreConfig is provided.
description – A free-form description of a FeatureGroup.
tags – Tags used to identify Features in each FeatureGroup.
use_pre_prod_offline_store_replicator_lambda
session – Boto3 session.
region – Region name.
- Returns:
The FeatureGroup resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
ResourceInUse – Resource being accessed is in use.
ResourceLimitExceeded – You have exceeded a SageMaker resource limit. For example, you might have too many training jobs created.
ConfigSchemaValidationError – Raised when a configuration file does not adhere to the schema
LocalConfigNotFoundError – Raised when a configuration file is not found in local file system
S3ConfigNotFoundError – Raised when a configuration file is not found in S3
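The two EventTime encodings described for event_time_feature_name (Unix seconds for Fractional, an ISO-8601 string for String) can be produced from a Python datetime as sketched below. The helper names are hypothetical, and normalizing to UTC before appending the literal Z is an assumption made here so that the constant Z is always accurate.

```python
from datetime import datetime, timezone


def to_event_time_string(moment: datetime, milliseconds: bool = False) -> str:
    """Format an aware datetime as yyyy-MM-dd'T'HH:mm:ssZ (optionally with .SSS)."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc = moment.astimezone(timezone.utc)
    if milliseconds:
        # strftime has no millisecond directive, so derive it from microseconds.
        return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def to_event_time_seconds(moment: datetime) -> float:
    """Return the Unix timestamp in seconds for a Fractional EventTime feature."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.timestamp()
```

Rejecting naive datetimes avoids silently interpreting them in the local time zone of whichever machine runs the ingestion code.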
- created_by: UserContext | None#
- creation_time: datetime | None#
- delete() None[source]#
Delete a FeatureGroup resource
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
ResourceNotFound – The resource being accessed was not found.
- delete_record(record_identifier_value_as_string: str | PipelineVariable, event_time: str | PipelineVariable, target_stores: List[str | PipelineVariable] | None = Unassigned(), deletion_mode: str | PipelineVariable | None = Unassigned(), session: Session | None = None, region: str | None = None) None[source]#
Deletes a Record from a FeatureGroup in the OnlineStore.
- Parameters:
record_identifier_value_as_string – The value for the RecordIdentifier that uniquely identifies the record, in string format.
event_time – Timestamp indicating when the deletion event occurred. EventTime can be used to query data at a certain point in time.
target_stores – A list of stores from which you’re deleting the record. By default, Feature Store deletes the record from all of the stores that you’re using for the FeatureGroup.
deletion_mode – The name of the deletion mode for deleting the record. By default, the deletion mode is set to SoftDelete.
session – Boto3 session.
region – Region name.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
AccessForbidden – You do not have permission to perform an action.
InternalFailure – An internal failure occurred. Try your request again. If the problem persists, contact Amazon Web Services customer support.
ServiceUnavailable – The service is currently unavailable.
ValidationError – There was an error validating your request.
- description: str | PipelineVariable | None#
- event_time_feature_name: str | PipelineVariable | None#
- failure_reason: str | PipelineVariable | None#
- feature_definitions: List[FeatureDefinition] | None#
- feature_group_arn: str | PipelineVariable | None#
- feature_group_name: str | PipelineVariable#
- feature_group_status: str | PipelineVariable | None#
- classmethod get(feature_group_name: str | PipelineVariable, next_token: str | PipelineVariable | None = Unassigned(), session: Session | None = None, region: str | PipelineVariable | None = None) FeatureGroup | None[source]#
Get a FeatureGroup resource
- Parameters:
feature_group_name – The name or Amazon Resource Name (ARN) of the FeatureGroup you want described.
next_token – A token to resume pagination of the list of Features (FeatureDefinitions). 2,500 Features are returned by default.
session – Boto3 session.
region – Region name.
- Returns:
The FeatureGroup resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
ResourceNotFound – The resource being accessed was not found.
- classmethod get_all(name_contains: str | PipelineVariable | None = Unassigned(), feature_group_status_equals: str | PipelineVariable | None = Unassigned(), offline_store_status_equals: str | PipelineVariable | None = Unassigned(), creation_time_after: datetime | None = Unassigned(), creation_time_before: datetime | None = Unassigned(), sort_order: str | PipelineVariable | None = Unassigned(), sort_by: str | PipelineVariable | None = Unassigned(), session: Session | None = None, region: str | PipelineVariable | None = None) ResourceIterator[FeatureGroup][source]#
Get all FeatureGroup resources
- Parameters:
name_contains – A string that partially matches one or more FeatureGroup names. Filters FeatureGroups by name.
feature_group_status_equals – A FeatureGroup status. Filters by FeatureGroup status.
offline_store_status_equals – An OfflineStore status. Filters by OfflineStore status.
creation_time_after – Use this parameter to search for FeatureGroups created after a specific date and time.
creation_time_before – Use this parameter to search for FeatureGroups created before a specific date and time.
sort_order – The order in which feature groups are listed.
sort_by – The value on which the feature group list is sorted.
max_results – The maximum number of results returned by ListFeatureGroups.
next_token – A token to resume pagination of ListFeatureGroups results.
session – Boto3 session.
region – Region name.
- Returns:
Iterator for listed FeatureGroup resources.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
- get_record(record_identifier_value_as_string: str | PipelineVariable, feature_names: List[str | PipelineVariable] | None = Unassigned(), expiration_time_response: str | PipelineVariable | None = Unassigned(), session: Session | None = None, region: str | None = None) GetRecordResponse | None[source]#
Use for OnlineStore serving from a FeatureStore.
- Parameters:
record_identifier_value_as_string – The value that corresponds to RecordIdentifier type and uniquely identifies the record in the FeatureGroup.
feature_names – List of names of Features to be retrieved. If not specified, the latest values for all the Features are returned.
expiration_time_response – Parameter to request ExpiresAt in response. If Enabled, GetRecord will return the value of ExpiresAt, if it is not null. If Disabled and null, GetRecord will return null.
session – Boto3 session.
region – Region name.
- Returns:
GetRecordResponse
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
```
try:
    # AWS service call here
except botocore.exceptions.ClientError as e:
    error_message = e.response['Error']['Message']
    error_code = e.response['Error']['Code']
```
AccessForbidden – You do not have permission to perform an action.
InternalFailure – An internal failure occurred. Try your request again. If the problem persists, contact Amazon Web Services customer support.
ResourceNotFound – The resource being accessed was not found.
ServiceUnavailable – The service is currently unavailable.
ValidationError – There was an error validating your request.
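A record returned by get_record is often easier to work with as a plain mapping from feature name to value. The sketch below assumes, without confirmation from this page, that the response object exposes a `record` list whose items carry `feature_name` and `value_as_string` attributes (mirroring the FeatureName and ValueAsString fields of the GetRecord API); `record_to_dict` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def record_to_dict(response: Optional[Any]) -> Dict[str, Optional[str]]:
    """Flatten a get_record response into {feature_name: value_as_string}.

    Returns an empty dict when there is no response or no record list,
    since get_record is documented as possibly returning None.
    """
    records = getattr(response, "record", None)
    if not isinstance(records, list):
        return {}
    # Attribute names below are assumptions about the response shape.
    return {item.feature_name: item.value_as_string for item in records}
```

Values stay as strings here; converting them to int or float would depend on each feature's declared FeatureType.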
- last_modified_by: UserContext | None#
- last_modified_time: datetime | None#
- last_update_status: LastUpdateStatus | None#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- next_token: str | PipelineVariable | None#
- offline_store_config: OfflineStoreConfig | None#
- offline_store_status: OfflineStoreStatus | None#
- online_store_config: OnlineStoreConfig | None#
- online_store_read_write_type: str | PipelineVariable | None#
- online_store_replicas: List[OnlineStoreReplica] | None#
- online_store_total_item_count: int | None#
- online_store_total_size_bytes: int | None#
- put_record(record: List[FeatureValue], target_stores: List[str | PipelineVariable] | None = Unassigned(), ttl_duration: TtlDuration | None = Unassigned(), session: Session | None = None, region: str | None = None) None[source]#
The PutRecord API is used to ingest a list of Records into your feature group.
- Parameters:
record – List of FeatureValues to be inserted. This is a full overwrite. To update only a few of the feature values: (1) use GetRecord to retrieve the latest record, (2) update the record returned from GetRecord, and (3) use PutRecord to write the updated feature values.
target_stores – A list of stores to which you’re adding the record. By default, Feature Store adds the record to all of the stores that you’re using for the FeatureGroup.
ttl_duration – Time to live duration, where the record is hard deleted after the expiration time is reached; ExpiresAt = EventTime + TtlDuration. For information on HardDelete, see the DeleteRecord API in the Amazon SageMaker API Reference guide.
session – Boto3 session.
region – Region name.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
AccessForbidden – You do not have permission to perform an action.
InternalFailure – An internal failure occurred. Try your request again. If the problem persists, contact Amazon Web Services customer support.
ServiceUnavailable – The service is currently unavailable.
ValidationError – There was an error validating your request.
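Because PutRecord overwrites the whole record, a partial update has to be merged with the latest record first. The sketch below shows only that merge step, using a local stand-in class (`FeatureValueStub`) and a hypothetical helper (`merge_record`); no service call is made, and the feature names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureValueStub:
    """Local stand-in mirroring two of the documented FeatureValue fields."""
    feature_name: str
    value_as_string: str


def merge_record(latest: List[FeatureValueStub],
                 updates: Dict[str, str]) -> List[FeatureValueStub]:
    """Return a full record: values from `latest`, overridden by `updates`."""
    merged = {fv.feature_name: fv.value_as_string for fv in latest}
    merged.update(updates)
    return [FeatureValueStub(name, value) for name, value in merged.items()]


# `latest` plays the role of the record returned by GetRecord.
latest = [
    FeatureValueStub("customer_id", "42"),
    FeatureValueStub("event_time", "2024-01-01T00:00:00Z"),
    FeatureValueStub("balance", "10.5"),
]
# The merged list is what would be passed to put_record as `record`.
full_record = merge_record(
    latest, {"balance": "12.0", "event_time": "2024-01-02T00:00:00Z"}
)
```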
- record_identifier_feature_name: str | PipelineVariable | None#
- refresh() FeatureGroup | None[source]#
Refresh a FeatureGroup resource
- Returns:
The FeatureGroup resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
ResourceNotFound – The resource being accessed was not found.
- role_arn: str | PipelineVariable | None#
- throughput_config: ThroughputConfigDescription | None#
- update(add_online_store_replica: AddOnlineStoreReplicaAction | None = Unassigned(), feature_additions: List[FeatureDefinition] | None = Unassigned(), online_store_config: OnlineStoreConfigUpdate | None = Unassigned(), description: str | PipelineVariable | None = Unassigned(), throughput_config: ThroughputConfigUpdate | None = Unassigned()) FeatureGroup | None[source]#
Update a FeatureGroup resource
- Parameters:
add_online_store_replica
feature_additions – Updates the feature group. Updating a feature group is an asynchronous operation. When you get an HTTP 200 response, you’ve made a valid request. It takes some time after you’ve made a valid request for Feature Store to update the feature group.
- Returns:
The FeatureGroup resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
ResourceLimitExceeded – You have exceeded a SageMaker resource limit. For example, you might have too many training jobs created.
ResourceNotFound – The resource being accessed was not found.
- wait_for_delete(poll: int = 5, timeout: int | None = None) None[source]#
Wait for a FeatureGroup resource to be deleted.
- Parameters:
poll – The number of seconds to wait between each poll.
timeout – The maximum number of seconds to wait before timing out.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
TimeoutExceededError – If the resource does not reach a terminal state before the timeout.
DeleteFailedStatusError – If the resource reaches a failed state.
WaiterError – Raised when an error occurs while waiting.
- wait_for_status(target_status: Literal['Creating', 'Created', 'CreateFailed', 'Deleting', 'DeleteFailed'], poll: int = 5, timeout: int | None = None) None[source]#
Wait for a FeatureGroup resource to reach certain status.
- Parameters:
target_status – The status to wait for.
poll – The number of seconds to wait between each poll.
timeout – The maximum number of seconds to wait before timing out.
- Raises:
TimeoutExceededError – If the resource does not reach a terminal state before the timeout.
FailedStatusError – If the resource reaches a failed state.
WaiterError – Raised when an error occurs while waiting.
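The `poll` and `timeout` parameters describe a plain polling loop. The following is a sketch of that pattern only, not the library's waiter: `wait_for` and `get_status` are hypothetical names, and the built-in `RuntimeError` and `TimeoutError` stand in for FailedStatusError and TimeoutExceededError.

```python
import time
from typing import Callable, Optional, Sequence


def wait_for(get_status: Callable[[], str],
             target_status: str,
             failed_statuses: Sequence[str] = ("CreateFailed", "DeleteFailed"),
             poll: float = 5,
             timeout: Optional[float] = None) -> str:
    """Poll `get_status` until it returns `target_status`."""
    start = time.monotonic()
    while True:
        status = get_status()
        if status == target_status:
            return status
        if status in failed_statuses:
            # Stand-in for FailedStatusError.
            raise RuntimeError(f"resource reached failed status {status!r}")
        if timeout is not None and time.monotonic() - start >= timeout:
            # Stand-in for TimeoutExceededError.
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        time.sleep(poll)


# Simulated status sequence instead of a real describe call.
statuses = iter(["Creating", "Creating", "Created"])
result = wait_for(lambda: next(statuses), "Created", poll=0)
```

With `timeout=None` the loop keeps polling until the target or a failed status is seen, which matches the documented default.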
- class sagemaker.mlops.feature_store.FeatureGroupToBeMerged(features: List[str], included_feature_names: List[str], projected_feature_names: List[str], catalog: str, database: str, table_name: str, record_identifier_feature_name: str, event_time_identifier_feature: FeatureDefinition, target_feature_name_in_base: str | None = None, table_type: TableType | None = None, feature_name_in_target: str | None = None, join_comparator: JoinComparatorEnum = JoinComparatorEnum.EQUALS, join_type: JoinTypeEnum = JoinTypeEnum.INNER_JOIN)[source]#
Bases:
object
FeatureGroup metadata which will be used for SQL join.
This class instantiates a FeatureGroupToBeMerged object that comprises a list of feature names, a list of feature names which will be included in SQL query, a database, an Athena table name, a feature name of record identifier, a feature name of event time identifier and a feature name of base which is the target join key.
- features#
A list of strings representing feature names of this FeatureGroup.
- Type:
List[str]
- included_feature_names#
A list of strings representing features to be included in the SQL join.
- Type:
List[str]
- projected_feature_names#
A list of strings representing features to be included for final projection in output.
- Type:
List[str]
- catalog#
A string representing the catalog.
- Type:
str
- database#
A string representing the database.
- Type:
str
- table_name#
A string representing the Athena table name of this FeatureGroup.
- Type:
str
- record_identifier_feature_name#
A string representing the record identifier feature.
- Type:
str
- event_time_identifier_feature#
A FeatureDefinition representing the event time identifier feature.
- Type:
FeatureDefinition
- target_feature_name_in_base#
A string representing the feature name in base which will be used as target join key (default: None).
- Type:
str
- table_type#
A TableType representing the type of table, either a Feature Group or a pandas DataFrame (default: None).
- Type:
TableType
- feature_name_in_target#
A string representing the feature name in the target feature group that will be compared to the target feature in the base feature group. If None is provided, the record identifier feature will be used in the SQL join. (default: None).
- Type:
str
- join_comparator#
A JoinComparatorEnum representing the comparator used when joining the target feature in the base feature group and the feature in the target feature group. (default: JoinComparatorEnum.EQUALS).
- Type:
JoinComparatorEnum
- join_type#
A JoinTypeEnum representing the type of join between the base and target feature groups. (default: JoinTypeEnum.INNER_JOIN).
- Type:
JoinTypeEnum
- catalog: str#
- database: str#
- event_time_identifier_feature: FeatureDefinition#
- feature_name_in_target: str = None#
- features: List[str]#
- included_feature_names: List[str]#
- join_comparator: JoinComparatorEnum = '='#
- join_type: JoinTypeEnum = 'JOIN'#
- projected_feature_names: List[str]#
- record_identifier_feature_name: str#
- table_name: str#
- target_feature_name_in_base: str = None#
- class sagemaker.mlops.feature_store.FeatureMetadata(*, feature_group_name: str | PipelineVariable, feature_name: str | PipelineVariable, feature_group_arn: str | PipelineVariable | None = Unassigned(), feature_identifier: str | PipelineVariable | None = Unassigned(), feature_type: str | PipelineVariable | None = Unassigned(), creation_time: datetime | None = Unassigned(), last_modified_time: datetime | None = Unassigned(), description: str | PipelineVariable | None = Unassigned(), parameters: List[FeatureParameter] | None = Unassigned())[source]#
Bases:
Base
Class representing resource FeatureMetadata
- feature_group_arn#
The Amazon Resource Name (ARN) of the feature group that contains the feature.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- feature_group_name#
The name of the feature group that you’ve specified.
- feature_name#
The name of the feature that you’ve specified.
- feature_type#
The data type of the feature.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- creation_time#
A timestamp indicating when the feature was created.
- Type:
datetime.datetime | None
- last_modified_time#
A timestamp indicating when the metadata for the feature group was modified. For example, if you add a parameter describing the feature, the timestamp changes to reflect the last time you
- Type:
datetime.datetime | None
- feature_identifier#
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- description#
The description you added to describe the feature.
- Type:
str | sagemaker.core.helper.pipeline_variable.PipelineVariable | None
- parameters#
The key-value pairs that you added to describe the feature.
- Type:
List[sagemaker.core.shapes.shapes.FeatureParameter] | None
- creation_time: datetime | None#
- description: str | PipelineVariable | None#
- feature_group_arn: str | PipelineVariable | None#
- feature_group_name: str | PipelineVariable#
- feature_identifier: str | PipelineVariable | None#
- feature_name: str | PipelineVariable#
- feature_type: str | PipelineVariable | None#
- classmethod get(feature_group_name: str | PipelineVariable, feature_name: str | PipelineVariable, session: Session | None = None, region: str | PipelineVariable | None = None) FeatureMetadata | None[source]#
Get a FeatureMetadata resource
- Parameters:
feature_group_name – The name or Amazon Resource Name (ARN) of the feature group containing the feature.
feature_name – The name of the feature.
session – Boto3 session.
region – Region name.
- Returns:
The FeatureMetadata resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
ResourceNotFound – The resource being accessed was not found.
- last_modified_time: datetime | None#
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- parameters: List[FeatureParameter] | None#
- refresh() FeatureMetadata | None[source]#
Refresh a FeatureMetadata resource
- Returns:
The FeatureMetadata resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
ResourceNotFound – The resource being accessed was not found.
- update(description: str | PipelineVariable | None = Unassigned(), parameter_additions: List[FeatureParameter] | None = Unassigned(), parameter_removals: List[str | PipelineVariable] | None = Unassigned()) FeatureMetadata | None[source]#
Update a FeatureMetadata resource
- Parameters:
parameter_additions – A list of key-value pairs that you can add to better describe the feature.
parameter_removals – A list of parameter keys that you can specify to remove parameters that describe your feature.
- Returns:
The FeatureMetadata resource.
- Raises:
botocore.exceptions.ClientError – This exception is raised for AWS service related errors. The error message and error code can be parsed from the exception as follows:
    try:
        # AWS service call here
    except botocore.exceptions.ClientError as e:
        error_message = e.response['Error']['Message']
        error_code = e.response['Error']['Code']
ResourceNotFound – The resource being accessed was not found.
- class sagemaker.mlops.feature_store.FeatureParameter(*, key: str | PipelineVariable | None = Unassigned(), value: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
A key-value pair that you specify to describe the feature.
- key#
A key that must contain a value to describe the feature.
- value#
The value that belongs to a key.
- key: str | PipelineVariable | None#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value: str | PipelineVariable | None#
- class sagemaker.mlops.feature_store.FeatureTypeEnum(value)[source]#
Bases:
Enum
Feature data types: Fractional, Integral, or String.
- FRACTIONAL = 'Fractional'#
- INTEGRAL = 'Integral'#
- STRING = 'String'#
- class sagemaker.mlops.feature_store.FeatureValue(*, feature_name: str | PipelineVariable, value_as_string: str | PipelineVariable | None = Unassigned(), value_as_string_list: List[str | PipelineVariable] | None = Unassigned())[source]#
Bases:
Base
The value associated with a feature.
- feature_name#
The name of a feature that a feature value corresponds to.
- value_as_string#
The value in string format associated with a feature. Used when your CollectionType is None. Note that feature types can be String, Integral, or Fractional. This value represents all three types as a string.
- value_as_string_list#
The list of values in string format associated with a feature. Used when your CollectionType is a List, Set, or Vector. Note that feature types can be String, Integral, or Fractional. These values represent all three types as strings.
- feature_name: str | PipelineVariable#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- value_as_string: str | PipelineVariable | None#
- value_as_string_list: List[str | PipelineVariable] | None#
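Since every feature type is carried as a string, Python values need converting before they go into a FeatureValue. The helper below is a hypothetical sketch (`to_feature_value` is not part of this module) that returns keyword arguments named after the documented fields; sorting set members is a choice made here only to keep the output deterministic.

```python
from typing import Any, Dict


def to_feature_value(feature_name: str, value: Any) -> Dict[str, Any]:
    """Build keyword arguments shaped like the documented FeatureValue fields."""
    if isinstance(value, (list, tuple, set)):
        # Collections (List, Set, Vector) go into value_as_string_list.
        items = sorted(value, key=str) if isinstance(value, set) else value
        return {
            "feature_name": feature_name,
            "value_as_string_list": [str(item) for item in items],
        }
    # Scalars (String, Integral, Fractional) go into value_as_string.
    return {"feature_name": feature_name, "value_as_string": str(value)}


age = to_feature_value("age", 31)
embedding = to_feature_value("embedding", [0.1, 0.2])
```

The resulting dicts could then be unpacked into the constructor, for example `FeatureValue(**age)`, assuming the keyword-only signature shown above.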
- class sagemaker.mlops.feature_store.Filter(*, name: str | PipelineVariable, operator: str | PipelineVariable | None = Unassigned(), value: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
A conditional statement for a search expression that includes a resource property, a Boolean operator, and a value. Resources that match the statement are returned in the results from the Search API. If you specify a Value, but not an Operator, SageMaker uses the equals operator. In search, there are several property types:
Metrics – To define a metric filter, enter a value using the form “Metrics.<name>”, where <name> is a metric name. For example, the following filter searches for training jobs with an “accuracy” metric greater than “0.9”: { “Name”: “Metrics.accuracy”, “Operator”: “GreaterThan”, “Value”: “0.9” }
HyperParameters – To define a hyperparameter filter, enter a value with the form “HyperParameters.<name>”. Decimal hyperparameter values are treated as a decimal in a comparison if the specified Value is also a decimal value. If the specified Value is an integer, the decimal hyperparameter values are treated as integers. For example, the following filter is satisfied by training jobs with a “learning_rate” hyperparameter that is less than “0.5”: { “Name”: “HyperParameters.learning_rate”, “Operator”: “LessThan”, “Value”: “0.5” }
Tags – To define a tag filter, enter a value with the form Tags.<key>.
- name#
A resource property name. For example, TrainingJobName. For valid property names, see SearchRecord. You must specify a valid property for the resource.
- operator#
A Boolean binary operator that is used to evaluate the filter. The operator field contains one of the following values:
Equals – The value of Name equals Value.
NotEquals – The value of Name doesn’t equal Value.
Exists – The Name property exists.
NotExists – The Name property does not exist.
GreaterThan – The value of Name is greater than Value. Not supported for text properties.
GreaterThanOrEqualTo – The value of Name is greater than or equal to Value. Not supported for text properties.
LessThan – The value of Name is less than Value. Not supported for text properties.
LessThanOrEqualTo – The value of Name is less than or equal to Value. Not supported for text properties.
In – The value of Name is one of the comma delimited strings in Value. Only supported for text properties.
Contains – The value of Name contains the string Value. Only supported for text properties.
A SearchExpression can include the Contains operator multiple times when the value of Name is one of the following: Experiment.DisplayName, Experiment.ExperimentName, Experiment.Tags, Trial.DisplayName, Trial.TrialName, Trial.Tags, TrialComponent.DisplayName, TrialComponent.TrialComponentName, TrialComponent.Tags, TrialComponent.InputArtifacts, TrialComponent.OutputArtifacts. A SearchExpression can include only one Contains operator for all other values of Name. In these cases, if you include multiple Contains operators in the SearchExpression, the result is the following error message: “‘CONTAINS’ operator usage limit of 1 exceeded.”
- value#
A value used with Name and Operator to determine which resources satisfy the filter’s condition. For numerical properties, Value must be an integer or floating-point decimal. For timestamp properties, Value must be an ISO 8601 date-time string of the following format: YYYY-mm-dd’T’HH:MM:SS.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: str | PipelineVariable#
- operator: str | PipelineVariable | None#
- value: str | PipelineVariable | None#
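To make the filter shape and the timestamp format concrete, here is a small sketch that builds the request-style dictionaries quoted in the description. `make_filter` and `timestamp_value` are hypothetical helpers, and `CreationTime` is used purely as an example property name; check the list of valid property names before relying on it.

```python
from datetime import datetime
from typing import Dict, Optional


def make_filter(name: str,
                operator: Optional[str] = None,
                value: Optional[str] = None) -> Dict[str, str]:
    """Build a filter dict, omitting Operator and Value when not given."""
    result = {"Name": name}
    if operator is not None:
        result["Operator"] = operator
    if value is not None:
        result["Value"] = value
    return result


def timestamp_value(moment: datetime) -> str:
    """Format a datetime as YYYY-mm-dd'T'HH:MM:SS for timestamp properties."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


accuracy_filter = make_filter("Metrics.accuracy", "GreaterThan", "0.9")
created_after = make_filter(
    "CreationTime",  # example property name, an assumption of this sketch
    "GreaterThan",
    timestamp_value(datetime(2024, 1, 15, 8, 30, 0)),
)
```

The Filter class itself takes the snake_case keywords `name`, `operator`, and `value`; the capitalized keys here mirror the JSON form shown in the description.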
- class sagemaker.mlops.feature_store.FilterOperatorEnum(value)[source]#
Bases:
Enum
Filter operators.
- CONTAINS = 'Contains'#
- EQUALS = 'Equals'#
- EXISTS = 'Exists'#
- GREATER_THAN = 'GreaterThan'#
- GREATER_THAN_OR_EQUAL_TO = 'GreaterThanOrEqualTo'#
- IN = 'In'#
- LESS_THAN = 'LessThan'#
- LESS_THAN_OR_EQUAL_TO = 'LessThanOrEqualTo'#
- NOT_EQUALS = 'NotEquals'#
- NOT_EXISTS = 'NotExists'#
- sagemaker.mlops.feature_store.FractionalFeatureDefinition(feature_name: str, collection_type: ListCollectionType | SetCollectionType | VectorCollectionType | None = None) FeatureDefinition[source]#
Create a feature definition with Fractional type.
- exception sagemaker.mlops.feature_store.IngestionError(failed_rows: List[int], message: str)[source]#
Bases:
Exception
Exception raised for errors during ingestion.
- failed_rows#
List of row indices that failed to ingest.
- message#
Error message.
- class sagemaker.mlops.feature_store.IngestionManagerPandas(feature_group_name: str, feature_definitions: Dict[str, Dict[Any, Any]], max_workers: int = 1, max_processes: int = 1)[source]#
Bases:
object
Class to manage the multi-threaded data ingestion process.
This class will manage the data ingestion process which is multi-threaded.
- feature_group_name#
name of the Feature Group.
- Type:
str
- feature_definitions#
dictionary of feature definitions where the key is the feature name and the value is the FeatureDefinition. The FeatureDefinition contains the data type of the feature.
- Type:
Dict[str, Dict[Any, Any]]
- max_workers#
number of threads to create.
- Type:
int
- max_processes#
number of processes to create. Each process spawns max_workers threads.
- Type:
int
- property failed_rows: List[int]#
Get rows that failed to ingest.
- Returns:
List of row indices that failed to be ingested.
- feature_definitions: Dict[str, Dict[Any, Any]]#
- feature_group_name: str#
- max_processes: int = 1#
- max_workers: int = 1#
- run(data_frame: DataFrame, target_stores: List[str] = None, wait: bool = True, timeout: int | float = None)[source]#
Start the ingestion process.
- Parameters:
data_frame (DataFrame) – source DataFrame to be ingested.
target_stores (List[str]) – list of target stores (“OnlineStore”, “OfflineStore”). If None, the default target store is used.
wait (bool) – whether to wait for the ingestion to finish or not.
timeout (Union[int, float]) – concurrent.futures.TimeoutError will be raised if timeout is reached.
- Raises:
ValueError – If wait=False with max_workers=1 and max_processes=1.
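The documented constraint on `wait` and the relationship between processes and threads can be expressed in a few lines. This is an illustrative check written for this page, not the library's own validation; `check_run_arguments` is a hypothetical name.

```python
def check_run_arguments(wait: bool, max_workers: int, max_processes: int) -> int:
    """Validate the documented constraint and return the total thread count."""
    if not wait and max_workers == 1 and max_processes == 1:
        # Mirrors the documented ValueError condition for run().
        raise ValueError(
            "wait=False requires max_workers > 1 or max_processes > 1"
        )
    # Each process spawns max_workers threads.
    return max_processes * max_workers


total_threads = check_run_arguments(wait=False, max_workers=4, max_processes=2)
```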
- sagemaker.mlops.feature_store.IntegralFeatureDefinition(feature_name: str, collection_type: ListCollectionType | SetCollectionType | VectorCollectionType | None = None) FeatureDefinition[source]#
Create a feature definition with Integral type.
- class sagemaker.mlops.feature_store.JoinComparatorEnum(value)[source]#
Bases:
Enum
An enumeration.
- EQUALS = '='#
- GREATER_THAN = '>'#
- GREATER_THAN_OR_EQUAL_TO = '>='#
- LESS_THAN = '<'#
- LESS_THAN_OR_EQUAL_TO = '<='#
- NOT_EQUAL_TO = '<>'#
- class sagemaker.mlops.feature_store.JoinTypeEnum(value)[source]#
Bases:
Enum
An enumeration.
- CROSS_JOIN = 'CROSS JOIN'#
- FULL_JOIN = 'FULL JOIN'#
- INNER_JOIN = 'JOIN'#
- LEFT_JOIN = 'LEFT JOIN'#
- RIGHT_JOIN = 'RIGHT JOIN'#
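The enum values are SQL fragments, so a join type and a comparator can be dropped straight into a clause. The sketch below mirrors the documented values in two local enums and assembles one clause; `join_clause`, the aliases, and the table and feature names are all hypothetical, and this is not the query builder the library uses.

```python
from enum import Enum


class JoinType(Enum):
    """Local mirror of the documented JoinTypeEnum values."""
    INNER_JOIN = "JOIN"
    LEFT_JOIN = "LEFT JOIN"
    RIGHT_JOIN = "RIGHT JOIN"
    FULL_JOIN = "FULL JOIN"
    CROSS_JOIN = "CROSS JOIN"


class JoinComparator(Enum):
    """Local mirror of the documented JoinComparatorEnum values."""
    EQUALS = "="
    GREATER_THAN = ">"
    GREATER_THAN_OR_EQUAL_TO = ">="
    LESS_THAN = "<"
    LESS_THAN_OR_EQUAL_TO = "<="
    NOT_EQUAL_TO = "<>"


def join_clause(base_alias: str, base_feature: str,
                target_table: str, target_alias: str, target_feature: str,
                join_type: JoinType = JoinType.INNER_JOIN,
                comparator: JoinComparator = JoinComparator.EQUALS) -> str:
    """Assemble one SQL join clause from the enum values."""
    return (
        f'{join_type.value} "{target_table}" {target_alias} '
        f'ON {base_alias}."{base_feature}" {comparator.value} '
        f'{target_alias}."{target_feature}"'
    )


clause = join_clause("base", "customer_id", "orders_fg", "t0", "customer_id")
```

The defaults match the documented defaults of FeatureGroupToBeMerged: an inner join with the equals comparator.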
- class sagemaker.mlops.feature_store.ListCollectionType[source]#
Bases:
object
List collection type.
- collection_config = None#
- collection_type = 'List'#
- class sagemaker.mlops.feature_store.OfflineStoreConfig(*, s3_storage_config: S3StorageConfig, disable_glue_table_creation: bool | None = Unassigned(), data_catalog_config: DataCatalogConfig | None = Unassigned(), table_format: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
The configuration of an OfflineStore. Provide an OfflineStoreConfig in a request to CreateFeatureGroup to create an OfflineStore. To encrypt an OfflineStore using at-rest data encryption, specify an Amazon Web Services Key Management Service (KMS) key ID, or KMSKeyId, in S3StorageConfig.
- s3_storage_config#
The Amazon Simple Storage Service (Amazon S3) location of OfflineStore.
- disable_glue_table_creation#
Set to True to disable the automatic creation of an Amazon Web Services Glue table when configuring an OfflineStore. If set to False, Feature Store will name the OfflineStore Glue table following Athena’s naming recommendations. The default value is False.
- data_catalog_config#
The metadata of the Glue table that is autogenerated when an OfflineStore is created.
- table_format#
Format for the offline store table. Supported formats are Glue (Default) and Apache Iceberg.
- data_catalog_config: DataCatalogConfig | None#
- disable_glue_table_creation: bool | None#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- s3_storage_config: S3StorageConfig#
- table_format: str | PipelineVariable | None#
- class sagemaker.mlops.feature_store.OnlineStoreConfig(*, security_config: OnlineStoreSecurityConfig | None = Unassigned(), enable_online_store: bool | None = Unassigned(), ttl_duration: TtlDuration | None = Unassigned(), storage_type: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
Use this to specify the Amazon Web Services Key Management Service (KMS) Key ID, or KMSKeyId, for at-rest data encryption. You can turn OnlineStore on or off by specifying the EnableOnlineStore flag. The default value is False.
- security_config#
Use to specify KMS Key ID (KMSKeyId) for at-rest encryption of your OnlineStore.
- enable_online_store#
Turn OnlineStore off by specifying False for the EnableOnlineStore flag. Turn OnlineStore on by specifying True for the EnableOnlineStore flag. The default value is False.
- ttl_duration#
Time to live duration, where the record is hard deleted after the expiration time is reached; ExpiresAt = EventTime + TtlDuration. For information on HardDelete, see the DeleteRecord API in the Amazon SageMaker API Reference guide.
- storage_type#
Option for different tiers of low latency storage for real-time data retrieval.
Standard – A managed low latency data store for feature groups.
InMemory – A managed data store for feature groups that supports very low latency retrieval.
- enable_online_store: bool | None#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- security_config: OnlineStoreSecurityConfig | None#
- storage_type: str | PipelineVariable | None#
- ttl_duration: TtlDuration | None#
- class sagemaker.mlops.feature_store.OnlineStoreSecurityConfig(*, kms_key_id: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
The security configuration for OnlineStore.
- kms_key_id#
The Amazon Web Services Key Management Service (KMS) key ARN that SageMaker Feature Store uses to encrypt the Amazon S3 objects at rest using Amazon S3 server-side encryption. The caller (either user or IAM role) of CreateFeatureGroup must have the following permissions to the OnlineStore KmsKeyId: “kms:Encrypt”, “kms:Decrypt”, “kms:DescribeKey”, “kms:CreateGrant”, “kms:RetireGrant”, “kms:ReEncryptFrom”, “kms:ReEncryptTo”, “kms:GenerateDataKey”, “kms:ListAliases”, “kms:ListGrants”, “kms:RevokeGrant”. The caller (either user or IAM role) of all DataPlane operations (PutRecord, GetRecord, DeleteRecord) must have the following permissions to the KmsKeyId: “kms:Decrypt”.
- kms_key_id: str | PipelineVariable | None#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class sagemaker.mlops.feature_store.OnlineStoreStorageTypeEnum(value)[source]#
Bases:
Enum
Storage types for online store.
- IN_MEMORY = 'InMemory'#
- STANDARD = 'Standard'#
- class sagemaker.mlops.feature_store.ResourceEnum(value)[source]#
Bases:
Enum
Resource types for search.
- FEATURE_GROUP = 'FeatureGroup'#
- FEATURE_METADATA = 'FeatureMetadata'#
- class sagemaker.mlops.feature_store.S3StorageConfig(*, s3_uri: str | PipelineVariable, kms_key_id: str | PipelineVariable | None = Unassigned(), resolved_output_s3_uri: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
The Amazon Simple Storage Service (Amazon S3) location and security configuration for OfflineStore.
- s3_uri#
The S3 URI, or location in Amazon S3, of OfflineStore. S3 URIs have a format similar to the following: s3://example-bucket/prefix/.
- kms_key_id#
The Amazon Web Services Key Management Service (KMS) key ARN of the key used to encrypt any objects written into the OfflineStore S3 location. The IAM role ARN that is passed as a parameter to CreateFeatureGroup must have the following permissions to the KmsKeyId: “kms:GenerateDataKey”.
- resolved_output_s3_uri#
The S3 path where offline records are written.
- kms_key_id: str | PipelineVariable | None#
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- resolved_output_s3_uri: str | PipelineVariable | None#
- s3_uri: str | PipelineVariable#
- class sagemaker.mlops.feature_store.SearchExpression(*, filters: List[Filter] | None = Unassigned(), nested_filters: List[NestedFilters] | None = Unassigned(), sub_expressions: List[SearchExpression] | None = Unassigned(), operator: str | PipelineVariable | None = Unassigned())[source]#
Bases:
Base
A multi-expression that searches for the specified resource or resources in a search. All resource objects that satisfy the expression’s condition are included in the search results. You must specify at least one subexpression, filter, or nested filter. A SearchExpression can contain up to twenty elements. A SearchExpression contains the following components:
A list of Filter objects. Each filter defines a simple Boolean expression composed of a resource property name, Boolean operator, and value.
A list of NestedFilter objects. Each nested filter defines a list of Boolean expressions using a list of resource properties. A nested filter is satisfied if a single object in the list satisfies all Boolean expressions.
A list of SearchExpression objects. A search expression object can be nested in a list of search expression objects.
A Boolean operator: And or Or.
- filters#
A list of filter objects.
- nested_filters#
A list of nested filter objects.
- sub_expressions#
A list of search expression objects.
- operator#
A Boolean operator used to evaluate the search expression. If you want every conditional statement in all lists to be satisfied for the entire search expression to be true, specify And. If only a single conditional statement needs to be true for the entire search expression to be true, specify Or. The default value is And.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- nested_filters: List[NestedFilters] | None#
- operator: str | PipelineVariable | None#
- sub_expressions: List[SearchExpression] | None#
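The And/Or semantics are easiest to see with a small local evaluator. The sketch below works on plain dicts, supports only a handful of operators, and ignores nested filters; it illustrates how filters and sub-expressions combine and is not how the Search API is implemented. The function names and the sample resource are hypothetical.

```python
from typing import Any, Dict


def matches_filter(resource: Dict[str, Any], flt: Dict[str, str]) -> bool:
    """Evaluate one filter against a resource dict (subset of operators)."""
    name = flt["name"]
    operator = flt.get("operator", "Equals")  # Equals is the documented default
    value = flt.get("value")
    if operator == "Exists":
        return name in resource
    if operator == "NotExists":
        return name not in resource
    if name not in resource:
        return False
    actual = resource[name]
    if operator == "Equals":
        return str(actual) == value
    if operator == "NotEquals":
        return str(actual) != value
    if operator == "GreaterThan":
        return float(actual) > float(value)
    if operator == "LessThan":
        return float(actual) < float(value)
    if operator == "Contains":
        return value in str(actual)
    raise ValueError(f"operator not covered by this sketch: {operator}")


def matches_expression(resource: Dict[str, Any], expr: Dict[str, Any]) -> bool:
    """Combine filters and sub-expressions with And (default) or Or."""
    results = [matches_filter(resource, f) for f in expr.get("filters", [])]
    results += [
        matches_expression(resource, sub)
        for sub in expr.get("sub_expressions", [])
    ]
    combine = any if expr.get("operator", "And") == "Or" else all
    return combine(results)


resource = {"name": "customers", "size_bytes": 2048}
expression = {
    "operator": "And",
    "filters": [{"name": "name", "operator": "Contains", "value": "cust"}],
    "sub_expressions": [{
        "operator": "Or",
        "filters": [
            {"name": "size_bytes", "operator": "GreaterThan", "value": "4096"},
            {"name": "size_bytes", "operator": "Exists"},
        ],
    }],
}
is_match = matches_expression(resource, expression)
```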
- class sagemaker.mlops.feature_store.SearchOperatorEnum(value)[source]#
Bases:
Enum
Search operators.
- AND = 'And'#
- OR = 'Or'#
- class sagemaker.mlops.feature_store.SetCollectionType[source]#
Bases:
object
Set collection type.
- collection_config = None#
- collection_type = 'Set'#
- class sagemaker.mlops.feature_store.SortOrderEnum(value)[source]#
Bases:
Enum
Sort orders.
- ASCENDING = 'Ascending'#
- DESCENDING = 'Descending'#
- sagemaker.mlops.feature_store.StringFeatureDefinition(feature_name: str, collection_type: ListCollectionType | SetCollectionType | VectorCollectionType | None = None) FeatureDefinition[source]#
Create a feature definition with String type.
- class sagemaker.mlops.feature_store.TableFormatEnum(value)[source]#
Bases:
Enum
Offline store table formats.
- GLUE = 'Glue'#
- ICEBERG = 'Iceberg'#
- class sagemaker.mlops.feature_store.TableType(value)[source]#
Bases:
Enum
An enumeration.
- DATA_FRAME = 'DataFrame'#
- FEATURE_GROUP = 'FeatureGroup'#
- class sagemaker.mlops.feature_store.TargetStoreEnum(value)[source]#
Bases:
Enum
Store types for put_record.
- OFFLINE_STORE = 'OfflineStore'#
- ONLINE_STORE = 'OnlineStore'#
- class sagemaker.mlops.feature_store.ThroughputConfig(*, throughput_mode: str | PipelineVariable, provisioned_read_capacity_units: int | None = Unassigned(), provisioned_write_capacity_units: int | None = Unassigned())[source]#
Bases:
BaseUsed to set feature group throughput configuration. There are two modes: ON_DEMAND and PROVISIONED. With on-demand mode, you are charged for data reads and writes that your application performs on your feature group. You do not need to specify read and write throughput because Feature Store accommodates your workloads as they ramp up and down. You can switch a feature group to on-demand only once in a 24 hour period. With provisioned throughput mode, you specify the read and write capacity per second that you expect your application to require, and you are billed based on those limits. Exceeding provisioned throughput will result in your requests being throttled. Note: PROVISIONED throughput mode is supported only for feature groups that are offline-only, or use the Standard tier online store.
- throughput_mode#
The mode used for your feature group throughput: ON_DEMAND or PROVISIONED.
- Type:
str | PipelineVariable
- provisioned_read_capacity_units#
For provisioned feature groups with online store enabled, this indicates the read throughput you are billed for and can consume without throttling. This field is not applicable for on-demand feature groups.
- Type:
int | None
- provisioned_write_capacity_units#
For provisioned feature groups, this indicates the write throughput you are billed for and can consume without throttling. This field is not applicable for on-demand feature groups.
- Type:
int | None
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- provisioned_read_capacity_units: int | None#
- provisioned_write_capacity_units: int | None#
- throughput_mode: str | PipelineVariable#
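The rules in the ThroughputConfig description can be sketched as a small client-side check. This is an illustration only: `ThroughputSettings` and `check_throughput` are hypothetical names, not part of sagemaker.mlops.feature_store, and the positivity check on capacity units is an assumption. The mode strings are the values listed under ThroughputModeEnum.

```python
from dataclasses import dataclass
from typing import Optional

# Mode strings as listed under ThroughputModeEnum in this reference.
ON_DEMAND = "OnDemand"
PROVISIONED = "Provisioned"


@dataclass
class ThroughputSettings:
    """Hypothetical stand-in for ThroughputConfig, for illustration only."""

    throughput_mode: str
    provisioned_read_capacity_units: Optional[int] = None
    provisioned_write_capacity_units: Optional[int] = None


def check_throughput(settings: ThroughputSettings) -> None:
    """Raise ValueError if the settings contradict the documented rules."""
    units = {
        "provisioned_read_capacity_units": settings.provisioned_read_capacity_units,
        "provisioned_write_capacity_units": settings.provisioned_write_capacity_units,
    }
    if settings.throughput_mode == ON_DEMAND:
        # Capacity units are documented as not applicable in on-demand mode.
        given = [name for name, value in units.items() if value is not None]
        if given:
            raise ValueError(f"not applicable in on-demand mode: {given}")
    elif settings.throughput_mode == PROVISIONED:
        # Assumption: any capacity that is supplied must be a positive integer.
        for name, value in units.items():
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be greater than 0")
    else:
        raise ValueError(f"unknown throughput mode: {settings.throughput_mode!r}")


check_throughput(ThroughputSettings(ON_DEMAND))
check_throughput(ThroughputSettings(PROVISIONED, 10, 5))
```

A check like this only catches contradictions before a request is sent; the service remains the authority on which combinations it accepts.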
- class sagemaker.mlops.feature_store.ThroughputModeEnum(value)[source]#
Bases:
Enum
Throughput modes for feature group.
- ON_DEMAND = 'OnDemand'#
- PROVISIONED = 'Provisioned'#
- class sagemaker.mlops.feature_store.TtlDuration(*, unit: str | PipelineVariable | None = Unassigned(), value: int | None = Unassigned())[source]#
Bases:
Base
Time-to-live (TTL) duration. The record is hard deleted after the expiration time is reached; ExpiresAt = EventTime + TtlDuration. For information on HardDelete, see the DeleteRecord API in the Amazon SageMaker API Reference guide.
- unit#
TtlDuration time unit.
- Type:
str | PipelineVariable | None
- value#
TtlDuration time value.
- Type:
int | None
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'protected_namespaces': (), 'validate_assignment': True}#
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- unit: str | PipelineVariable | None#
- value: int | None#
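The relation ExpiresAt = EventTime + TtlDuration can be worked through with the standard library. The sketch below is not the library's implementation: `expires_at` is a hypothetical helper, and the unit names in the mapping are an assumption that should be checked against the TtlDuration entry in the Amazon SageMaker API Reference.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit names, mapped to the matching timedelta keyword argument.
_UNIT_TO_TIMEDELTA_KWARG = {
    "Seconds": "seconds",
    "Minutes": "minutes",
    "Hours": "hours",
    "Days": "days",
    "Weeks": "weeks",
}


def expires_at(event_time: datetime, unit: str, value: int) -> datetime:
    """Return EventTime + TtlDuration for the given unit and value."""
    try:
        kwarg = _UNIT_TO_TIMEDELTA_KWARG[unit]
    except KeyError:
        raise ValueError(f"unsupported TTL unit: {unit!r}") from None
    return event_time + timedelta(**{kwarg: value})


event_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(event_time, "Days", 7).isoformat())
# prints 2024-01-08T12:00:00+00:00
```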
- class sagemaker.mlops.feature_store.VectorCollectionType(dimension: int)[source]#
Bases:
object
Vector collection type with dimension.
- collection_type = 'Vector'#
- sagemaker.mlops.feature_store.as_hive_ddl(feature_group_name: str, database: str = 'sagemaker_featurestore', table_name: str | None = None) str[source]#
Generate Hive DDL for a FeatureGroup’s offline store table.
The table schema is generated from the feature definitions. Each column is named after its feature, and its data type is inferred from the feature type: Integral maps to INT, Fractional maps to FLOAT, and String maps to STRING.
- Parameters:
feature_group_name – Name of the FeatureGroup.
database – Hive database name (default: “sagemaker_featurestore”).
table_name – Hive table name (default: feature_group_name).
- Returns:
CREATE EXTERNAL TABLE DDL string.
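The type mapping described for as_hive_ddl can be illustrated without calling the service. The sketch below is a simplified assumption of the shape of the output: `sketch_hive_ddl` is a hypothetical helper, it takes feature names and types directly rather than looking up a FeatureGroup, and it omits whatever extra columns, storage format, and location clauses the real function emits.

```python
from typing import Sequence, Tuple

# Feature type to Hive data type, as stated in the as_hive_ddl description.
HIVE_TYPES = {"Integral": "INT", "Fractional": "FLOAT", "String": "STRING"}


def sketch_hive_ddl(
    database: str,
    table_name: str,
    features: Sequence[Tuple[str, str]],
) -> str:
    """Build a minimal CREATE EXTERNAL TABLE statement from (name, type) pairs."""
    columns = ",\n".join(
        f"  {name} {HIVE_TYPES[feature_type]}" for name, feature_type in features
    )
    return f"CREATE EXTERNAL TABLE {database}.{table_name} (\n{columns}\n)"


ddl = sketch_hive_ddl(
    "sagemaker_featurestore",
    "orders",  # hypothetical table name
    [("order_id", "Integral"), ("price", "Fractional"), ("status", "String")],
)
print(ddl)
```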
- sagemaker.mlops.feature_store.create_athena_query(feature_group_name: str, session: Session)[source]#
Create an AthenaQuery for a FeatureGroup.
- Parameters:
feature_group_name – Name of the FeatureGroup.
session – Session instance for Athena boto calls.
- Returns:
AthenaQuery initialized with data catalog config.
- Raises:
RuntimeError – If no metastore is configured.
- sagemaker.mlops.feature_store.get_session_from_role(region: str, assume_role: str | None = None) Session[source]#
Get a Session from a region and optional IAM role.
- Parameters:
region – AWS region name.
assume_role – IAM role ARN to assume (default: None).
- Returns:
Session instance.
- sagemaker.mlops.feature_store.ingest_dataframe(feature_group_name: str, data_frame: DataFrame, max_workers: int = 1, max_processes: int = 1, wait: bool = True, timeout: int | float = None)[source]#
Ingest a pandas DataFrame to a FeatureGroup.
- Parameters:
feature_group_name – Name of the FeatureGroup.
data_frame – DataFrame to ingest.
max_workers – Threads per process (default: 1).
max_processes – Number of processes (default: 1).
wait – Wait for ingestion to complete (default: True).
timeout – Timeout in seconds (default: None).
- Returns:
IngestionManagerPandas instance.
- Raises:
ValueError – If max_workers or max_processes <= 0.
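The threading side of ingestion can be sketched with the standard library. This is not how ingest_dataframe is implemented: `split_rows` and `ingest_rows` are hypothetical helpers, rows are plain dictionaries instead of a DataFrame, the `put_record` callable stands in for the real write call, and the max_processes dimension is left out. It does mirror the documented ValueError for a non-positive max_workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def split_rows(rows: Sequence[Row], parts: int) -> List[Sequence[Row]]:
    """Split rows into at most `parts` contiguous chunks of near-equal size."""
    if parts <= 0:
        raise ValueError("parts must be greater than 0")
    if not rows:
        return []
    size = -(-len(rows) // parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def ingest_rows(
    rows: Sequence[Row],
    put_record: Callable[[Row], None],
    max_workers: int = 1,
) -> int:
    """Send every row through put_record, one chunk per thread; return the row count."""
    if max_workers <= 0:
        raise ValueError("max_workers must be greater than 0")

    def ingest_chunk(chunk: Sequence[Row]) -> int:
        for row in chunk:
            put_record(row)
        return len(chunk)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(ingest_chunk, split_rows(rows, max_workers)))


received: List[Row] = []
count = ingest_rows([{"id": i} for i in range(10)], received.append, max_workers=3)
print(count)
```

Chunking by contiguous ranges keeps each thread's work predictable; a real ingestion path would also need to collect and report the rows that failed.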
- sagemaker.mlops.feature_store.load_feature_definitions_from_dataframe(data_frame: DataFrame, online_storage_type: str | None = None) Sequence[FeatureDefinition][source]#
Infer FeatureDefinitions from DataFrame dtypes.
Each column name is used as the feature name, and the feature type is inferred from the column dtype: integer dtypes map to Integral, float dtypes map to Fractional, and all other dtypes map to String.
When online_storage_type is “InMemory”, collection-type columns in the DataFrame are inferred as List instead of String.
- Parameters:
data_frame – DataFrame to infer features from.
online_storage_type – “Standard” or “InMemory” (default: None).
- Returns:
List of FeatureDefinition objects.
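The dtype rule above can be sketched on dtype names alone, which keeps the example free of pandas. This is an approximation, not the library's logic: `infer_feature_type` and `infer_feature_types` are hypothetical helpers, matching on the dtype name is an assumption, and collection-type inference for the in-memory online store is not covered.

```python
import re
from typing import Dict, List, Tuple


def infer_feature_type(dtype_name: str) -> str:
    """Map a dtype name such as 'int64' or 'float32' to a feature type name."""
    name = dtype_name.lower()
    # Full match so that names like 'interval[...]' are not mistaken for integers.
    if re.fullmatch(r"u?int\d*", name):
        return "Integral"
    if re.fullmatch(r"float\d*", name):
        return "Fractional"
    return "String"


def infer_feature_types(column_dtypes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (feature_name, feature_type) pairs in column order."""
    return [(column, infer_feature_type(dtype)) for column, dtype in column_dtypes.items()]


features = infer_feature_types(
    {"customer_id": "int64", "score": "float64", "city": "object"}
)
print(features)
# prints [('customer_id', 'Integral'), ('score', 'Fractional'), ('city', 'String')]
```

With pandas available, a mapping of this shape could be built from a DataFrame with `{column: str(dtype) for column, dtype in df.dtypes.items()}`.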
Modules
- Dataset Builder for FeatureStore.
- Feature Definitions for FeatureStore.
- Utilities for working with FeatureGroups and FeatureStores.
- Multi-threaded data ingestion for FeatureStore using SageMaker Core.
- Enums for FeatureStore operations.