AWS Data Flashcards

1
Q

Query data directly in S3 only (retrieve a subset of an object with SQL)

A

S3 Select

2
Q

Which service can set permissions at both the column and row levels?

A

AWS Lake Formation
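
As a rough illustration, Lake Formation expresses combined column and row restrictions as a data cells filter. The sketch below only builds the `TableData` argument for the `CreateDataCellsFilter` API; the account ID, database, table, filter name, columns, and row expression are all hypothetical placeholders, and the actual API call is left as a comment because it needs credentials.

```python
def build_data_cells_filter(catalog_id, database, table, name, columns, row_filter):
    """Build the TableData argument for Lake Formation's CreateDataCellsFilter."""
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        # Column-level restriction: only these columns are visible.
        "ColumnNames": list(columns),
        # Row-level restriction: only rows matching this expression are visible.
        "RowFilter": {"FilterExpression": row_filter},
    }


# All values below are hypothetical placeholders.
table_data = build_data_cells_filter(
    catalog_id="111111111111",
    database="sales_db",
    table="orders",
    name="de_orders_no_pii",
    columns=["order_id", "country", "amount"],
    row_filter="country = 'DE'",
)

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)
```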

3
Q

Columnar format

A

Apache Parquet and Apache ORC

4
Q

Why use a columnar format such as Apache Parquet with Athena?

A

Athena reads only the columns a query needs, which reduces the amount of data scanned and lowers cost.
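
A back-of-the-envelope sketch of the effect, under simplifying assumptions that are not from the card: every column has the same size, compression is ignored, and the table size and column counts are made-up numbers.

```python
def bytes_scanned(total_bytes, total_columns, columns_read, columnar):
    """Estimate bytes scanned, assuming equally sized columns and no compression."""
    if columnar:
        # A columnar reader can skip the columns the query does not reference.
        return total_bytes * columns_read // total_columns
    # A row-oriented format (e.g. CSV) has to be read in full.
    return total_bytes


table_bytes = 1_000_000_000_000  # hypothetical 1 TB table
row_scan = bytes_scanned(table_bytes, total_columns=50, columns_read=5, columnar=False)
col_scan = bytes_scanned(table_bytes, total_columns=50, columns_read=5, columnar=True)

print(f"row-oriented: {row_scan} bytes, columnar: {col_scan} bytes")
```

Because Athena bills by the amount of data scanned, reading 5 of 50 equally sized columns would scan about a tenth of the data in this simplified model.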

5
Q

VACUUM FULL vs VACUUM REINDEX

A

VACUUM FULL - Reclaims disk space from deleted rows and re-sorts all rows in the table. This is the default VACUUM operation in Redshift.
VACUUM REINDEX - Analyzes the distribution of values in interleaved sort key columns, then performs a full VACUUM.

6
Q

Is Redshift better for batch analytics or for low-latency queries?

A

Redshift is better suited to batch analytics.

7
Q

Apache Spark DataFrames

A

Distributed collections of data organized into named columns, which enable parallel SQL-style operations on massive datasets.

8
Q

Glue DataBrew

A

Visual data preparation tool for cleaning and preprocessing data
The T (transform) of ETL

9
Q

AwsGlueSessionUserRestrictedServiceRole

A

Provides full access to all AWS Glue resources except for sessions. Allows users to create and use only the interactive sessions that are associated with the user. This policy also includes other permissions that AWS Glue needs to manage Glue resources in other AWS services.

10
Q

An AWS Glue workflow can be started by an event only through

A

Amazon EventBridge

11
Q

An organization wants to retrieve all medical records for each patient, but the records don't share a common unique identifier. Which feature can link them?

A

AWS Glue FindMatches ML

12
Q

AWS managed KMS key: automatic rotation occurs every

A

1 year

13
Q

Customer managed KMS key: automatic rotation (must be enabled) occurs every

A

1 year by default (the rotation period can be set from 90 to 2560 days)
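
A minimal sketch of turning rotation on for a customer managed key. The helper name and the `FakeKmsClient` stand-in are hypothetical and exist only so the example runs without AWS credentials; with boto3, `kms_client` would be `boto3.client("kms")`, whose `enable_key_rotation` call accepts `KeyId` and an optional `RotationPeriodInDays`.

```python
def enable_rotation(kms_client, key_id, period_days=365):
    """Enable automatic rotation on a customer managed key."""
    # KMS accepts rotation periods between 90 and 2560 days.
    if not 90 <= period_days <= 2560:
        raise ValueError("rotation period must be between 90 and 2560 days")
    kms_client.enable_key_rotation(KeyId=key_id, RotationPeriodInDays=period_days)
    return period_days


class FakeKmsClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def enable_key_rotation(self, **kwargs):
        self.calls.append(kwargs)


client = FakeKmsClient()
enable_rotation(client, "example-key-id")  # placeholder key ID, default 365 days
print(client.calls)
```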

14
Q

GRANT EXECUTE

A

In Redshift, this permission allows a user to run a stored procedure (or a user-defined function).
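
For illustration, the statement could be assembled like this. The procedure name, argument type, and user are hypothetical, and the helper does no quoting or validation, so it is only suitable for trusted input.

```python
def grant_execute_on_procedure(procedure, arg_types, grantee):
    """Build a Redshift GRANT EXECUTE statement for a stored procedure."""
    # Redshift identifies a procedure by its name plus argument types.
    signature = ", ".join(arg_types)
    return f"GRANT EXECUTE ON PROCEDURE {procedure}({signature}) TO {grantee};"


# Hypothetical procedure and user names.
statement = grant_execute_on_procedure("refresh_sales", ["date"], "report_user")
print(statement)
```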

15
Q

Which services can oversee and document all activity on an Amazon Redshift resource, for audit and traceability purposes?

A

AWS CloudTrail
and
Amazon Redshift's built-in audit logging feature (must be enabled)

16
Q

How can an AWS Glue crawler in account 11111 access an S3 bucket in account 99999?

A
  1. Configure an IAM role in account 11111 with permissions to access the S3 bucket in account 99999, and associate this role with the AWS Glue crawler.
  2. Implement a bucket policy on the S3 bucket in account 99999 that explicitly permits the AWS Glue crawler's IAM role in account 11111 to access the bucket.
  3. Update the AWS Glue crawler's configuration in account 11111 to target the S3 bucket containing the VPC flow logs in account 99999.
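
Step 2 above could look roughly like the bucket policy generated below. This is a sketch under assumptions: the role name and bucket name are hypothetical, the 12-digit account ID stands in for the card's shorthand "11111", and read-only list/get access is assumed to be enough for the crawler.

```python
import json


def crawler_bucket_policy(role_arn, bucket):
    """Build a bucket policy that lets a cross-account role list and read a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrawlerList",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AllowCrawlerRead",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:GetObject",
                # GetObject applies to the objects inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical role in the crawler's account and bucket in the other account.
policy = crawler_bucket_policy(
    role_arn="arn:aws:iam::111111111111:role/GlueCrawlerRole",
    bucket="example-flow-logs-bucket",
)
print(json.dumps(policy, indent=2))
```

The role in the crawler's account still needs its own IAM policy allowing the same actions (step 1); the bucket policy alone is not sufficient for cross-account access.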