Tutorial Dojo Flashcards
AWS Data Exchange
-3rd-party datasets in S3
-accessed via the GetDataSet API
Redshift concurrency scaling
and workload management
-handles concurrent users and unpredictable spikes, e.g. BI workloads
-WLM can set query priority
-manual WLM supports up to 8 queues, with a max of 50 concurrency slots across all queues
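The limits above can be sanity-checked against a sketch of a manual WLM config (queue names are hypothetical), in the shape of the JSON you would set via the wlm_json_configuration cluster parameter:

```python
# Sketch of a manual Redshift WLM configuration (queue names hypothetical),
# matching the JSON passed to the wlm_json_configuration cluster parameter.
wlm_config = [
    {"name": "bi_dashboards", "query_group": ["bi"], "query_concurrency": 15},
    {"name": "etl_jobs", "user_group": ["etl_users"], "query_concurrency": 10},
    {"name": "default", "query_concurrency": 5},
]

# Manual WLM caps: at most 8 queues, at most 50 slots summed across all queues.
total_slots = sum(q["query_concurrency"] for q in wlm_config)
```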
Athena workgroups
-organize and manage queries
-can use Apache Spark for analytics
-security and access control
AWS DataSync
-moves data from on-prem storage to AWS storage services like S3 or EFS
S3 event notification
-event type=ObjectCreated for example
-can trigger a Lambda function based on prefix/suffix filters, e.g. suffix .csv
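The notification flow above can be sketched as a minimal Lambda handler (bucket/key names hypothetical). S3 can already filter by suffix before invoking, so the in-code check is just defensive:

```python
import urllib.parse

def handler(event, context):
    """Minimal sketch of a Lambda invoked by an S3 ObjectCreated notification."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # object keys arrive URL-encoded in the event payload
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".csv"):          # defensive double-check of the suffix filter
            processed.append((bucket, key))
    return processed

# Shape of a (trimmed) S3 notification event for local testing:
sample_event = {"Records": [{"s3": {"bucket": {"name": "my-bucket"},
                                    "object": {"key": "data/2024%2F01/report.csv"}}}]}
```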
AWS Glue for Ray
-Glue job type for scaling AI/Python workloads and native Python libraries with Ray
-Ray datasets are based on Apache Arrow
Amazon Managed Service for Apache Flink
-for real-time and time-series analysis
-sliding windows: fixed-length intervals that can overlap
Object lambda
-adds your code to GET requests, enabling real-time transformation as data is retrieved
-on the fly
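The on-the-fly idea can be sketched as the transformation an Object Lambda function would apply; the redaction function below is hypothetical, and the actual WriteGetObjectResponse call is left in comments so the sketch runs without AWS access:

```python
import re

def redact_emails(body: str) -> str:
    """Hypothetical on-the-fly transformation: mask anything that looks like
    an email address before the caller ever sees the object."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED]", body)

# In the real handler, roughly (boto3, sketched only):
#   original = fetch event["getObjectContext"]["inputS3Url"]
#   s3.write_get_object_response(
#       Body=redact_emails(original),
#       RequestRoute=event["getObjectContext"]["outputRoute"],
#       RequestToken=event["getObjectContext"]["outputToken"])
```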
AWS Graviton instances
-custom AWS-designed processors for the best price-performance for workloads
Lambda provisioned concurrency
-pre-initializes execution environments so functions scale without cold-start latency
Redshift data sharing
-share read access across clusters, workgroups, accounts, regions
-live data
Step Functions troubleshooting
-state machine fails to start at a step? check the state machine's IAM role
Glue’s sensitive data detection feature
-automatically recognizes PII and can redact it
S3 VPC gateway endpoint
-add a route in the route table with the endpoint as target for traffic destined to S3
Athena federated query
-connectors using lambda
-query NoSQL, SQL, Timestream, etc.
Kinesis real-time reporting with Redshift (streaming ingestion)
-create an external schema for the data stream
-create a materialized view referencing the schema, with auto refresh
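The two steps above look roughly like the SQL below (schema, role ARN, and stream names are hypothetical), held here as strings:

```python
# Sketch of Redshift streaming ingestion from a Kinesis data stream.
create_schema = """
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';
"""

# AUTO REFRESH YES keeps the view continuously updated with live stream data.
create_view = """
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_schema."my-click-stream";
"""
```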
STL_ALERT_EVENT_LOG
-Redshift system view that helps identify performance issues and suggested fixes
Glue resource policy
-think Finance and HR each running their own ETL and accessing only their own databases
S3 access point
-for multiple application access
-for cross-account access
-works with bucket policy
MSCK REPAIR TABLE
-Athena command to run when new data lands in new Hive-style partitions of an existing table
-makes new partitions visible but does not necessarily speed up performance
EFS and Lambda
-Lambda can mount EFS file systems seamlessly
Improve Kinesis performance when processing
-add shards
-configure the parallelization factor
-register the Lambda function as a consumer with enhanced fan-out
-use exponential backoff and retries
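The backoff idea in the last bullet can be sketched as a small helper (the function and its defaults are illustrative choices, not a Kinesis API):

```python
import random

def backoff_delays(base: float = 0.1, cap: float = 5.0, attempts: int = 6):
    """Exponential backoff with full jitter: the wait window doubles each
    retry, is capped, and a random draw de-synchronizes competing consumers."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]
```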
Glue catalog partition predicates (frame)
&
Push down predicate
-server-side filtering during frame creation (before data is even loaded)
-faster than client-side filtering, where data is first loaded into memory
-push_down_predicate is similar, but prunes partitions after listing them rather than server-side in the catalog
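Both options take the same SQL-like predicate string over partition columns; a tiny builder (helper name hypothetical) plus the Glue call, sketched in comments so it runs without Glue libraries:

```python
def partition_predicate(**parts) -> str:
    """Hypothetical helper: build a Glue partition predicate string,
    e.g. "year='2024' and month='01'", from keyword arguments."""
    return " and ".join(f"{col}='{val}'" for col, val in parts.items())

# In a Glue job, roughly:
#   frame = glueContext.create_dynamic_frame.from_catalog(
#       database="sales", table_name="orders",
#       additional_options={
#           "catalogPartitionPredicate": partition_predicate(year="2024", month="01")})
# or pass the same string as push_down_predicate=... instead.
```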
Transient EMR clusters
-think batch jobs
-cluster is created for the job and terminated when it finishes
SQS settings
DelaySeconds -how long before a message becomes visible in the queue
VisibilityTimeout -prevents a message from being received/processed multiple times
maxReceiveCount -number of times a message can be received before being moved to the dead-letter queue
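These settings map onto the queue attribute map passed at creation (queue/DLQ names hypothetical; note maxReceiveCount lives inside the RedrivePolicy JSON, and all SQS attribute values are strings):

```python
import json

# Sketch of the Attributes map for sqs.create_queue (boto3 call in comments).
queue_attributes = {
    "DelaySeconds": "30",        # message hidden for 30s after being sent
    "VisibilityTimeout": "120",  # consumer has 120s to process before redelivery
    "RedrivePolicy": json.dumps({
        "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",
        "maxReceiveCount": "5",  # after 5 receives, message moves to the DLQ
    }),
}
# boto3: sqs.create_queue(QueueName="my-queue", Attributes=queue_attributes)

redrive = json.loads(queue_attributes["RedrivePolicy"])
```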
Athena notebooks
-interactive python coding environment
-execute Spark code interactively, with visualizations
CloudWatch container insights
-for microservices and container apps
OpenSearch storage tiers
-hot = fastest access, most expensive
-UltraWarm = less frequently accessed, cheaper
-cold = infrequent access; can be attached back to UltraWarm when needed
Stored procedures and Aurora
-a procedure in Aurora can trigger a Lambda, e.g. when a loan is approved
MSK Kafka ACLs
-think microservices
-control which apps can read/write which topics
CloudTrail data events vs management events
-data events = data-plane operations, e.g. S3 object-level PUTs
-management events = control-plane operations, e.g. deleting resources
Glue DataBrew masking techniques
-substitution = "aron" changed to "donny"
-probabilistic = different ciphertext each time for the same input
-nulling = replacing values with null / deleting them
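The three behaviors can be illustrated with toy stand-ins (these are not DataBrew's actual algorithms, just the property each technique describes):

```python
import secrets

def substitute(value: str, mapping: dict) -> str:
    """Substitution: deterministically swap a value for a stand-in."""
    return mapping.get(value, value)

def probabilistic_encrypt(value: str) -> bytes:
    """Probabilistic: a fresh random nonce makes the ciphertext differ on
    every call, even for the same input (sketched as nonce + XOR)."""
    nonce = secrets.token_bytes(len(value))
    return nonce + bytes(b ^ n for b, n in zip(value.encode(), nonce))

def null_out(value: str):
    """Nulling: drop the value entirely."""
    return None
```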
Athena Partition projection
-improves query performance by computing partitions from config instead of metastore lookups
-good when data is already partitioned and keeps growing
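Projection is configured through table properties; a sketch for a date-partitioned table (bucket, path, and the dt column are hypothetical):

```python
# Sketch of Athena partition-projection TBLPROPERTIES for a table
# partitioned by a "dt" date column; Athena computes partition values
# from the range/format instead of reading them from the metastore.
projection_properties = {
    "projection.enabled": "true",
    "projection.dt.type": "date",
    "projection.dt.range": "2023-01-01,NOW",
    "projection.dt.format": "yyyy-MM-dd",
    "storage.location.template": "s3://my-bucket/logs/dt=${dt}/",
}
```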
Redshift distribution styles
-EVEN = rows spread evenly across nodes. Good when there are no joins / no clear dist key
-KEY = rows with the same key stored together. Good for columns frequently filtered or joined on
-ALL = full copy on each node. Best for small, static tables
-AUTO = Redshift chooses and may change the style over time; use when unclear
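Example DDL for three of the styles (table and column names hypothetical), held as strings:

```python
# KEY: co-locate rows that join on customer_id.
ddl_key = ("CREATE TABLE sales (order_id INT, customer_id INT) "
           "DISTSTYLE KEY DISTKEY (customer_id);")

# ALL: small, static lookup table copied to every node.
ddl_all = ("CREATE TABLE country_codes (code CHAR(2), name VARCHAR(64)) "
           "DISTSTYLE ALL;")

# EVEN: no clear dist key, spread rows round-robin.
ddl_even = "CREATE TABLE raw_events (payload VARCHAR(1024)) DISTSTYLE EVEN;"
```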
Sagemaker canvas
-no-code visual canvas.
-simplifies whole process from cleaning to prediction
Redshift vacuum commands
VACUUM FULL -the default; same as plain VACUUM
VACUUM DELETE ONLY -reclaims disk space only; doesn't speed up queries
VACUUM REINDEX -analyzes interleaved sort keys, then performs a full vacuum
VACUUM SORT ONLY -sorts without reclaiming disk space. Use when rows are unsorted but space isn't an issue
Sagemaker workflows/lineage tracking
-save steps in workflow
-visually, think of the Step Functions editor
CloudWatch contributor insights & dynamodb
-view of dynamodb traffic trends
DynamoDB key cardinality
-when throttling, use a high-cardinality partition key so requests distribute more evenly
-fixes hot-partition issues
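One common way to raise cardinality is write sharding: append a deterministic suffix so writes for a hot key spread across N partitions (the helper, key format, and shard count are illustrative choices, not a DynamoDB API):

```python
import hashlib

def sharded_key(base_key: str, item_id: str, shards: int = 10) -> str:
    """Sketch of write sharding: derive a stable shard suffix from the item id
    so the same item always maps to the same shard, while writes for a hot
    base key spread across `shards` partitions."""
    suffix = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % shards
    return f"{base_key}#{suffix}"
```

Reads for the base key then fan out over the N suffixes (e.g. parallel queries for `2024-06-01#0` … `2024-06-01#9`).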
RDS performance insights
-dashboard that gathers database load and performance metrics