AWS Data Flashcards
Run a SQL query against the contents of a single S3 object
S3 Select
Which service can set permissions at both the column and row levels?
AWS Lake Formation
Columnar format
Apache Parquet, ORC
Why use a columnar format such as Apache Parquet with Athena?
Athena reads only the columns a query needs, which reduces the amount of data scanned and lowers cost.
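A rough way to see the saving is to estimate how much data a query scans in each format. The sketch below is an illustration only: it assumes every column takes an equal share of the data, ignores compression and partitioning, and uses an assumed price per TB scanned that should be checked against current Athena pricing. The function name and the table dimensions are hypothetical.

```python
# Rough estimate only: assumes equal-sized columns and ignores compression,
# partitioning, and file metadata overhead.
PRICE_PER_TB_USD = 5.0  # assumed price per TB scanned; verify against current pricing


def estimated_scan_tb(total_tb: float, total_columns: int,
                      columns_read: int, columnar: bool) -> float:
    """Return the approximate amount of data scanned, in TB."""
    if not columnar:
        # Row-oriented formats such as CSV are read in full.
        return total_tb
    # Columnar formats let the engine skip columns the query does not use.
    return total_tb * columns_read / total_columns


# Hypothetical table: 1 TB, 20 columns, query touches 2 of them.
csv_tb = estimated_scan_tb(1.0, 20, 2, columnar=False)
parquet_tb = estimated_scan_tb(1.0, 20, 2, columnar=True)

print(f"CSV:     {csv_tb:.2f} TB scanned, about ${csv_tb * PRICE_PER_TB_USD:.2f}")
print(f"Parquet: {parquet_tb:.2f} TB scanned, about ${parquet_tb * PRICE_PER_TB_USD:.2f}")
```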
VACUUM FULL vs VACUUM REINDEX
VACUUM FULL - Reclaims disk space from deleted rows and re-sorts all rows in the table. This is the default VACUUM operation in Redshift.
VACUUM REINDEX - Re-analyzes the distribution of values in interleaved sort key columns, then runs a full VACUUM. Applies to tables with interleaved sort keys.
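As a quick reminder of what the two commands look like, here is a minimal Python sketch that builds the SQL text. The helper `vacuum_statement` and the table name `sales` are hypothetical; the sketch only formats a string and does not connect to Redshift.

```python
# Modes accepted by this sketch; Redshift defaults to FULL when none is given.
ALLOWED_MODES = {"FULL", "SORT ONLY", "DELETE ONLY", "REINDEX"}


def vacuum_statement(table: str, mode: str = "FULL") -> str:
    """Build a Redshift VACUUM statement for the given table and mode."""
    mode = mode.upper()
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported VACUUM mode: {mode}")
    # Very simple identifier check so the sketch never emits arbitrary SQL.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected table name: {table}")
    return f"VACUUM {mode} {table};"


print(vacuum_statement("sales"))             # reclaim space and re-sort
print(vacuum_statement("sales", "REINDEX"))  # for interleaved sort keys
```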
Is Redshift better for batch analytics or low-latency queries?
Redshift is better suited to batch analytics.
Apache Spark DataFrames
Distributed collections of data organized into named columns, which allow SQL operations to run in parallel on massive datasets.
Glue DataBrew
Visual data preparation tool for cleaning and preprocessing data
Covers the T (transform) of ETL
AwsGlueSessionUserRestrictedServiceRole
Provides full access to all AWS Glue resources except for sessions. Allows users to create and use only the interactive sessions that are associated with the user. This policy also includes other permissions needed by AWS Glue to manage Glue resources in other AWS services
An AWS Glue workflow can be started by events delivered through
Amazon EventBridge
The organization wants to retrieve all medical records for each patient. However, the records don’t have a common unique identifier.
AWS Glue FindMatches ML transform
AWS-managed KMS Key: automatic rotation every
1 year
Customer-managed KMS Key: automatic rotation (must be enabled) every
1 year
GRANT EXECUTE
In Redshift, the permission that allows users to run a stored procedure
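A minimal sketch of the statement, built as a string in Python. The helper `grant_execute`, the procedure `refresh_sales`, and the user `analyst` are hypothetical names used only for illustration; Redshift identifies a procedure by its name together with its argument types.

```python
from typing import Sequence


def grant_execute(procedure: str, arg_types: Sequence[str], grantee: str) -> str:
    """Build a GRANT EXECUTE statement for a Redshift stored procedure."""
    # The argument types are part of the procedure signature.
    signature = ", ".join(arg_types)
    return f"GRANT EXECUTE ON PROCEDURE {procedure}({signature}) TO {grantee};"


print(grant_execute("refresh_sales", ["int"], "analyst"))
```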
Which services should be used to monitor and record all activity on Amazon Redshift resources for audit and traceability purposes?
AWS CloudTrail, together with Amazon Redshift's built-in audit logging feature
How can an AWS Glue crawler in account 11111 access an S3 bucket in account 99999?
- Configure an IAM role in account 11111 with permissions to access the S3 bucket in account 99999, and associate this role with the AWS Glue crawler.
- Implement a bucket policy on the S3 bucket in account 99999 that explicitly permits the crawler's IAM role in account 11111 to access the bucket.
- Update the AWS Glue crawler's configuration in account 11111 to target the S3 bucket containing the VPC flow logs in account 99999.
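The bucket policy in the second step might look like the sketch below, which builds the policy document as a Python dictionary and prints it as JSON. The role name and bucket name are hypothetical, the account ID is the placeholder from the card (real account IDs have 12 digits), and the two read-only actions are an assumption about what a crawler needs as a minimum; a real setup may require more.

```python
import json

CRAWLER_ACCOUNT_ID = "11111"                  # placeholder from the card
CRAWLER_ROLE_NAME = "ExampleGlueCrawlerRole"  # hypothetical role name
BUCKET_NAME = "example-flow-logs-bucket"      # hypothetical bucket name


def crawler_bucket_policy(account_id: str, role_name: str, bucket: str) -> dict:
    """Return a bucket policy granting read access to a cross-account role."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowCrawlerListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowCrawlerReadObjects",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = crawler_bucket_policy(CRAWLER_ACCOUNT_ID, CRAWLER_ROLE_NAME, BUCKET_NAME)
print(json.dumps(policy, indent=2))
```

The IAM role in account 11111 still needs its own identity policy allowing the same actions on the bucket, since cross-account access requires both sides to permit it.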