sagemaker.mlops.feature_store.athena_query

Classes

AthenaQuery(catalog, database, table_name, ...)

Class to manage querying of feature store data with AWS Athena.

class sagemaker.mlops.feature_store.athena_query.AthenaQuery(catalog: str, database: str, table_name: str, sagemaker_session: Session)

Bases: object

Class to manage querying of feature store data with AWS Athena.

An AthenaQuery object retrieves data from the feature store via standard SQL queries.

catalog

name of the data catalog.

Type:

str

database

name of the database.

Type:

str

table_name

name of the table.

Type:

str

sagemaker_session

instance of the Session class to perform boto calls.

Type:

Session

as_dataframe(**kwargs) → DataFrame

Download the result of the current query and load it into a DataFrame.

Parameters:

**kwargs (object) – keyword arguments passed through to pandas.read_csv to control how the query result is parsed. For details, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

Returns:

A pandas DataFrame containing the query result.

catalog: str
database: str
get_query_execution() → Dict[str, Any]

Get execution status of the current query.

Returns:

Response dict from Athena.
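A minimal sketch of reading the status out of that response, assuming the dict follows the shape of Athena's GetQueryExecution API, where the state sits under QueryExecution, Status, State. The helper names query_state and is_finished are hypothetical and not part of the SDK.

```python
from typing import Any, Dict, Optional

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def query_state(response: Dict[str, Any]) -> Optional[str]:
    """Return the execution state from a GetQueryExecution-style response.

    Hypothetical helper: assumes the dict returned by get_query_execution()
    mirrors the Athena API shape. Returns None if the keys are missing.
    """
    return response.get("QueryExecution", {}).get("Status", {}).get("State")


def is_finished(response: Dict[str, Any]) -> bool:
    """Return True once the query has reached a terminal state."""
    return query_state(response) in TERMINAL_STATES


# Example with a hand-written response in the assumed shape.
sample = {"QueryExecution": {"Status": {"State": "RUNNING"}}}
print(query_state(sample), is_finished(sample))
```

With a real object, the call would look like query_state(query.get_query_execution()), where query is an AthenaQuery on which run has already been called.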

run(query_string: str, output_location: str, kms_key: str = None, workgroup: str = None) → str

Execute a SQL query given a query string, an output location, and an optional KMS key.

This method executes the SQL query using Athena, writes the results to output_location, and returns the execution id of the query.

Parameters:
  • query_string (str) – SQL query string.

  • output_location (str) – S3 URI where the query result is written.

  • kms_key (str) – KMS key id. If set, it is used to encrypt the query result file.

  • workgroup (str) – Name of the workgroup in which the query is started.

Returns:

Execution id of the query.
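The methods documented on this page suggest a call sequence of run, then wait, then as_dataframe. The sketch below wraps that sequence in a hypothetical helper, fetch_dataframe, which is not part of the SDK. It assumes query is an already-constructed AthenaQuery (or any object exposing the same three methods).

```python
from typing import Any


def fetch_dataframe(
    query: Any, sql: str, output_location: str, **read_csv_kwargs: Any
) -> Any:
    """Run `sql` through an AthenaQuery-like object and return the result.

    Hypothetical helper: `query` is assumed to be an already-constructed
    AthenaQuery, or anything exposing run, wait, and as_dataframe.
    """
    # run() starts the query and returns its execution id.
    execution_id = query.run(query_string=sql, output_location=output_location)
    print(f"started Athena query {execution_id}")
    # wait() blocks until the current query finishes.
    query.wait()
    # Extra keyword arguments are forwarded to as_dataframe, which passes
    # them on to pandas.read_csv.
    return query.as_dataframe(**read_csv_kwargs)
```

A call might look like fetch_dataframe(query, f'SELECT * FROM "{query.table_name}" LIMIT 10', "s3://my-bucket/athena-results/"), where the bucket name is a placeholder. Note that wait only blocks until the query finishes, so checking get_query_execution before loading results may be worthwhile.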

sagemaker_session: Session
table_name: str
wait()

Wait for the current query to finish.